Prosecution Insights
Last updated: April 19, 2026
Application No. 18/768,487

Artificial Intelligence Based Content Immersion Environment Generation

Status: Non-Final Office Action (§103)
Filed: Jul 10, 2024
Examiner: LI, JAI WEI TOMMY
Art Unit: 2613
Tech Center: 2600 (Communications)
Assignee: Disney Enterprises Inc.
OA Round: 1 (Non-Final)
Grant Probability: Favorable
Estimated OA Rounds: 1-2
Estimated Time to Grant: 2y 9m

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 0 resolved; -62.0% vs Tech Center average). With no resolved cases yet, the 0% rate reflects an empty sample rather than a record of denials.
Interview Lift: +0.0% (minimal), measured across resolved cases with interview.
Avg Prosecution: 2y 9m typical timeline; 9 applications currently pending.
Career History: 9 total applications across all art units.

Statute-Specific Performance

§103: 46.2% (+6.2% vs Tech Center average)
§102: 53.9% (+13.9% vs Tech Center average)
Tech Center average shown is an estimate (the deltas imply roughly 40.0% under both statutes). Based on career data from 0 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claim 17 is objected to because of the following informalities: typographic error: "vide" should be "video". Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Singh (U.S. Pub. No. 2024/0135649) in view of Goodrich et al. (U.S. Pub. No. 2023/0281930).

Regarding claim 1, Singh discloses a system comprising: a computing platform including a hardware processor and a system memory (Singh: paragraph 5, line(s) 1-3 "system for auto-generating and sharing customized virtual environments comprises a processor and a memory"), the system memory storing an artificial intelligence based (AI-based) content immersion environment generator (Singh: paragraph 34, line(s) 3-7 "memory 114 may store a user interface application 152, an object extraction model 154, a machine learning model 156"); the hardware processor configured to execute the AI-based content immersion environment generator to (Singh: paragraph 42, line(s) 4-8 "virtual interaction engine 110 may be implemented by the processor 108 by executing a user interface application 152, an object extraction model 154, a machine learning model 156, and a virtual event rendering model 158 to auto-generate and share customized virtual environments 131a-131d"; also, paragraph 31, line(s) 7-9 "the virtual interaction engine 110 may be configured to auto-generate customized virtual environments"): receive media content, the media content including a plurality of video frames (Singh: paragraph 42, line(s) 1-3 "Virtual interaction engine 110 may include, but is not limited to, one or more separate and independent software and/or hardware components of a server"; also, paragraph 44, line(s) 1-2 "The server 104 may receive a plurality of user data objects 162 in a series of video frames"); identify one or more video frames of the plurality of video frames for use in generating a content immersion environment for a display of the media content (Singh: paragraph 44, line(s) 12-18 "Each user data object 162 may be included in a video frame corresponding to a timestamp 150.
For example, the user data objects 162 may be associated with the use profile 134, audio data, video data, or textual information which the server 104 receives during the interaction. The user data objects 162 may include user behavior objects 164 and user device objects 166 associated with a series of changing events or instances caused by users behaviors and user device operations during the interaction. The server 104 may extract a set of user behavior objects 164 and a set of user device objects 166 from the set of the user data objects 162."; also, paragraph 55, line(s) 1-12 "the server 104 extracts a set of user behavior objects 164 and a set of user device objects 166 from the set of the user data objects 162. In some embodiments, each user behavior object 164 is associated with a type of user behavior corresponding to at least one virtual environment object 146. For example, the set of the user behavior objects 164 correspond to one or more user behaviors through the user device 102 and the avatar 132 on the plurality of the virtual environment objects 146 during the interaction. The server 104 may generate the set of the user behavior objects 164 based on the user behaviors, user interaction patterns, and the context information."; also, paragraph 20, line(s) 2-3 "The user device 102 may be configured to display the virtual environment"); analyze features of each of the one or more video frames to provide one or more respective depth maps of the one or more video frames; and generate, based on the one or more video frames, using a trained AI model of the AI-based content immersion environment generator and the one or more respective depth maps, a three-dimensional (3-D) content immersion environment corresponding respectively to each of the one or more video frames to provide one or more 3-D content immersion environments for the display of the media content. Singh does not disclose the process to analyze features of each of the one or more video frames to provide one or more respective depth maps of the one or more video frames; and generate, based on the one or more video frames, using a trained AI model of the AI-based content immersion environment generator and the one or more respective depth maps, a three-dimensional (3-D) content immersion environment corresponding respectively to each of the one or more video frames to provide one or more 3-D content immersion environments for the display of the media content. However, in a similar field of endeavor, Goodrich discloses the process to analyze features of each of the one or more video frames to provide one or more respective depth maps of the one or more video frames (Goodrich: Fig. 14; also, paragraph 121, line(s) 4-17 "An image includes one or more real-world features, such as a user's face or real-world object(s) detected in the image. In some embodiments, an image includes metadata describing the image. For example, the depth data includes data corresponding to a depth map including depth information based on light rays emitted from a light emitting module directed to an object (e.g., a user's face) having features with different depths (e.g., eyes, ears, nose, lips, etc.).
By way of example, a depth map is similar to an image but instead of each pixel providing a color, the depth map indicates distance from a camera to that part of the image (e.g., in absolute terms, or relative to other pixels in the depth map)"); and generate, based on the one or more video frames, using a trained AI model of the AI-based content immersion environment generator and the one or more respective depth maps, a three-dimensional (3-D) content immersion environment (Goodrich: paragraph 156, line(s) 5-9 "machine learning techniques can train a machine learning model from training data from shared 3D messages in an example (or other image data). Such heuristics include using face tracking and portrait segmentation to generate a depth map of a person.") corresponding respectively to each of the one or more video frames to provide one or more 3-D content immersion environments for the display of the media content (Goodrich: Fig. 13; also, paragraph 16, line(s) 1-2 "capturing image information and generating a 3D message in a display of a client device"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Singh's system that comprises a hardware processor and a system memory, where the hardware processor and system memory execute and store the AI-based content immersion environment generator, identify video frames, and display them with the media content, with the features of analyzing features of one or more frames to provide a respective depth map of such video frames and utilizing a trained AI model with the respective depth maps to generate an immersive 3D environment. As demonstrated by Goodrich, one could choose to analyze and implement a respective depth mapping based on video frames and generate a 3D immersive environment.

Regarding claim 2, Singh as modified by Goodrich discloses the system of claim 1, wherein the AI-based content immersion environment generator includes a graphical user interface (GUI) (Goodrich: Fig. 12; also, paragraph 63, line(s) 18-21 "In some embodiments, a modified image or video stream may be presented in a graphical user interface displayed on the mobile client device"), and wherein before identification of the one or more video frames is performed, the hardware processor is further configured to execute the AI-based content immersion environment generator to (Goodrich: paragraph 63, line(s) 22-29 "The transform system may implement a complex convolutional neural network on a portion of the image or video stream to generate and apply the selected modification. That is, the user may capture the image or video stream and be presented with a modified result in real time or near real time once a modification icon has been selected. Further, the modification may be persistent while the video stream is being captured and the selected modification icon remains toggled.
Machine taught neural networks may be used to enable such modifications."): receive, via the GUI from a user of the system, data identifying the one or more video frames or one or more instructions for use when identifying the one or more video frames (Goodrich: paragraph 63, line(s) 6-18 "The transform system operating within the messaging client application 104 determines the presence of a face within the image or video stream and provides modification icons associated with a computer animation model to transform image data, or the computer animation model can be present as associated with an interface described herein. The modification icons include changes which may be the basis for modifying the user's face within the image or video stream as part of the modification operation. Once a modification icon is selected, the transform system initiates a process to convert the image of the user to reflect the selected modification icon (e.g., generate a smiling face on the user)").

Regarding claim 3, Singh as modified by Goodrich discloses the system of claim 2, wherein the one or more instructions are received via the GUI (Goodrich: paragraph 63, line(s) 22-29 "The transform system may implement a complex convolutional neural network on a portion of the image or video stream to generate and apply the selected modification. That is, the user may capture the image or video stream and be presented with a modified result in real time or near real time once a modification icon has been selected. Further, the modification may be persistent while the video stream is being captured and the selected modification icon remains toggled. Machine taught neural networks may be used to enable such modifications."), and wherein the one or more instructions command identification of one or more video frames per shot or per scene of the media content (Goodrich: paragraph 63, line(s) 18-27 "In some embodiments, a modified image or video stream may be presented in a graphical user interface displayed on the mobile client device as soon as the image or video stream is captured and a specified modification is selected. The transform system may implement a complex convolutional neural network on a portion of the image or video stream to generate and apply the selected modification. That is, the user may capture the image or video stream and be presented with a modified result").

Regarding claim 4, Singh as modified by Goodrich discloses the system of claim 2, wherein the one or more instructions are received via the GUI, and wherein the one or more instructions command identification of one or more video frames per specified timecode interval of the media content (Singh: paragraph 51, line(s) 7-8 "The plurality of the spatial video frames 149 may correspond to different timestamps in the time sequence"; also, paragraph 44, line(s) 4-12 "Each user data object 162 may be included in a video frame corresponding to a timestamp 150. For example, the user data objects 162 may be associated with the use profile 134, audio data, video data, or textual information which the server 104 receives during the interaction"). Singh does not disclose receiving the one or more instructions via the GUI.
However, in a similar field of endeavor, Goodrich additionally discloses the one or more instructions are received via the GUI (Goodrich: paragraph 63, line(s) 22-29 "The transform system may implement a complex convolutional neural network on a portion of the image or video stream to generate and apply the selected modification. That is, the user may capture the image or video stream and be presented with a modified result in real time or near real time once a modification icon has been selected. Further, the modification may be persistent while the video stream is being captured and the selected modification icon remains toggled. Machine taught neural networks may be used to enable such modifications."). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Singh's system of claim 2, and the identification of one or more video frames per specified timecode interval of the media content, with the features of receiving one or more instructions via the GUI. As demonstrated by Goodrich, one could add in the support for a GUI to provide further instructions on one or more video frames per specified timecode interval of the media content.

Regarding claim 5, Singh as modified by Goodrich discloses the system of claim 2, wherein the hardware processor is further configured to execute the AI-based content immersion environment generator to (Singh: paragraph 42, line(s) 4-8 "The virtual interaction engine 110 may be implemented by the processor 108 by executing a user interface application 152, an object extraction model 154, a machine learning model 156, and a virtual event rendering model 158 to auto-generate and share customized virtual environments 131a-131d"; also, paragraph 31, line(s) 7-9 "the virtual interaction engine 110 may be configured to auto-generate customized virtual environments"): output, via the GUI to the user, the one or more 3-D content immersion environments. Singh does not disclose the output, via the GUI to the user, the one or more 3-D content immersion environments. However, in a similar field of endeavor, Goodrich additionally discloses the output, via the GUI to the user, the one or more 3-D content immersion environments (Goodrich: paragraph 63, line(s) 18-29 "In some embodiments, a modified image or video stream may be presented in a graphical user interface displayed on the mobile client device as soon as the image or video stream is captured and a specified modification is selected. The transform system may implement a complex convolutional neural network on a portion of the image or video stream to generate and apply the selected modification. That is, the user may capture the image or video stream and be presented with a modified result in real time or near real time once a modification icon has been selected. Further, the modification may be persistent while the video stream is being captured and the selected modification icon remains toggled. Machine taught neural networks may be used to enable such modifications."). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Singh's system of claim 2, in which the hardware processor is further configured to execute the AI-based content immersion environment generator, with the features of outputting, via the GUI to the user, the one or more 3-D content immersion environments.
As demonstrated by Goodrich, one could further configure a machine learning model that generates immersion environments and returns the results via the GUI as one or more 3D content immersion environments.

Regarding claim 6, Singh as modified by Goodrich discloses the system of claim 1, wherein identifying the one or more video frames of the plurality of video frames for use in generating the content immersion environment is performed using metadata included with the media content (Goodrich: paragraph 55, line(s) 1-6 "Data and various systems using augmented reality content generators or other such transform systems to modify content using this data can thus involve detection of objects (e.g., faces, hands, bodies, cats, dogs, surfaces, objects, etc.), tracking of such objects as they leave, enter, and move around the field of view in video frames,").

Regarding claim 7, Singh as modified by Goodrich discloses the system of claim 1, wherein to generate the 3-D content immersion environment corresponding respectively to each of the one or more video frames, the hardware processor is further configured to execute the AI-based content immersion environment generator to (Singh: paragraph 42, line(s) 4-8 "The virtual interaction engine 110 may be implemented by the processor 108 by executing a user interface application 152, an object extraction model 154, a machine learning model 156, and a virtual event rendering model 158 to auto-generate and share customized virtual environments 131a-131d"; also, paragraph 31, line(s) 7-9 "the virtual interaction engine 110 may be configured to auto-generate customized virtual environments"): determine whether any of the one or more video frames contains an image segment depicting a human, humanoid, or animal; and inpaint the image segment, when determining determines that the one or more video frames contains the image segment, thereby obscuring the human, humanoid, or animal to provide one or more inpainted video frames; wherein generating a 3-D content immersion environment corresponding to an inpainted video frame uses the inpainted video frame. Singh does not disclose the ability to generate the 3-D content immersion environment corresponding respectively to each of the one or more video frames and determine whether any of the one or more video frames contains an image segment depicting a human, humanoid, or animal; and inpaint the image segment, when determining determines that the one or more video frames contains the image segment, thereby obscuring the human, humanoid, or animal to provide one or more inpainted video frames; wherein generating a 3-D content immersion environment corresponding to an inpainted video frame uses the inpainted video frame. However, in a similar field of endeavor, Goodrich discloses additional features to generate the 3-D content immersion environment corresponding respectively to each of the one or more video frames (Goodrich: paragraph 55, line(s) 1-13 "Data and various systems using augmented reality content generators or other such transform systems to modify content using this data can thus involve detection of objects (e.g., faces, hands, bodies, cats, dogs, surfaces, objects, etc.), tracking of such objects as they leave, enter, and move around the field of view in video frames, and the modification or transformation of such objects as they are tracked. In various embodiments, different methods for achieving such transformations may be used.
For example, some embodiments may involve generating a three-dimensional mesh model of the object or objects, and using transformations and animated textures of the model within the video to achieve the transformation"), and determine whether any of the one or more video frames contains an image segment depicting a human, humanoid, or animal; and inpaint the image segment, when determining determines that the one or more video frames contains the image segment, thereby obscuring the human, humanoid, or animal to provide one or more inpainted video frames; wherein generating a 3-D content immersion environment corresponding to an inpainted video frame uses the inpainted video frame (Goodrich: Fig. 15; also, paragraph 18, line(s) 1-3 "FIG. 15 is an example illustrating a depth inpainting mask and depth inpainting, according to some example embodiments."; also, paragraph 161, line(s) 1-3 "image and depth data processing module 706 generates a segmentation mask based at least in part on the image data"; also, paragraph 161, line(s) 6-10 "a prediction is made for every pixel to assign the pixel to a particular object class (e.g., face/portrait or background), and the segmentation mask is determined based on the groupings of the classified pixels (e.g., face/portrait or background)."; also, paragraph 162, line(s) 5-9 "the image and depth data processing module 706 performs a background inpainting technique that eliminates the portrait (e.g., including the user's face) from the background and blurring the background to focus on the person in the frame."; also, paragraph 166, line(s) 1-5 "generates a view of the 3D message using at least the background inpainted image, the inpainted depth map, and the post-processed foreground image, which are assets that are included the generated 3D message"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Singh's system of claim 1 where the hardware processor is further configured to execute the AI-based content immersion environment generator with the features to generate the 3-D content immersion environment corresponding respectively to each of the one or more video frames, and determine whether any of the one or more video frames contains an image segment depicting a human, humanoid, or animal; and inpaint the image segment, when determining determines that the one or more video frames contains the image segment, thereby obscuring the human, humanoid, or animal to provide one or more inpainted video frames; wherein generating a 3-D content immersion environment corresponding to an inpainted video frame uses the inpainted video frame. As demonstrated by Goodrich, one could implement the features of generating a 3D content immersion environment that correspond to one or more video frames and further configure a machine learning model to determine whether any of the one or more video frames contains an image segment depicting a human, humanoid, or animal; and inpaint the image segment, when determining determines that the one or more video frames contains the image segment, thereby obscuring the human, humanoid, or animal to provide one or more inpainted video frames; wherein generating a 3-D content immersion environment corresponding to an inpainted video frame uses the inpainted video frame. 
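Editor's note: to make the inpainting limitation above concrete, here is a minimal illustrative sketch, not drawn from Singh or Goodrich. It assumes a decoded frame plus a binary mask marking a detected human, humanoid, or animal (the mask source, e.g. any person-segmentation model, is an assumption), and uses OpenCV's inpainting to fill the masked region from surrounding background so the figure is obscured before the frame feeds environment generation.

import cv2
import numpy as np

def obscure_segment(frame_bgr: np.ndarray, segment_mask: np.ndarray) -> np.ndarray:
    """Obscure a detected figure by inpainting its pixels.

    frame_bgr: H x W x 3 uint8 frame. segment_mask: H x W uint8, 255 where the
    human/humanoid/animal was detected (mask source assumed, e.g. a segmentation model).
    """
    # Dilate the mask slightly so the fill covers soft edges around the figure.
    kernel = np.ones((7, 7), np.uint8)
    mask = cv2.dilate(segment_mask, kernel, iterations=2)
    # Fill the masked region from surrounding background content (radius 5, Telea method).
    return cv2.inpaint(frame_bgr, mask, 5, cv2.INPAINT_TELEA)

Any inpainting method would do here; Telea is used only because it ships with OpenCV.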
Regarding claim 8, Singh as modified by Goodrich discloses the system of claim 1, wherein the hardware processor is further configured to execute the AI-based content immersion environment generator to (Singh: paragraph 42, line(s) 4-8 "The virtual interaction engine 110 may be implemented by the processor 108 by executing a user interface application 152, an object extraction model 154, a machine learning model 156, and a virtual event rendering model 158 to auto-generate and share customized virtual environments 131a-131d"; also, paragraph 31, line(s) 7-9 "the virtual interaction engine 110 may be configured to auto-generate customized virtual environments"): identify, using an AI-based visual analyzer of the AI-based content immersion environment generator, one or more interaction-suitable features depicted in at least one of the one or more video frames (Singh: paragraph 7, line(s) 4-7 "generates customized virtual environments in real-time based on user behaviors and user device operations corresponding to changes to virtual environment objects during the interactions"; also, paragraph 14, line(s) 4-7 "auto-generate and share customized virtual environments for users to perform interactions through user devices 102 and avatars 132 with an entity. In one embodiment, system 100 comprises a server 104"; also, paragraph 15, line(s) 2-27 "the server 104 to auto-generate customized virtual environments 131 with seamless integration with user needs and preferences for an avatar 132 associated with a user device 102 to implement an interaction with an entity. For example, the server 104 may extract user behavior objects 164 and user device objects 166 from user data objects 162 of the interaction. The server 104 may apply a machine learning model 156 to map the user behavior objects 164 and the user device objects 166 to corresponding virtual environment objects 146 in a virtual operation area 140 in a virtual environment 130. The server 104 may integrate the user behavior objects 164 and the user device objects 166 with the corresponding virtual environment objects 146 into a set of interaction objects 168. The server 104 may generate customized virtual environment objects 148 based on the interaction objects 168. The customized virtual environment objects 148 may be rendered in a customized virtual environment 131 corresponding to the interaction. The server 104 may update the user behavior objects 164 and the user device objects 166 in real time based on the detected new user behaviors and new parameters through the user device 102. The customized virtual environment 131 may be modified in synchronization with the updated user behavior objects 164 and the user device objects 166 to facilitate a user seamless interaction in real-time."); wherein a 3-D content immersion environment corresponding to the at least one of the one or more video frames includes at least one interactive environmental feature corresponding to at least one of the identified one or more interaction-suitable features. Singh does not disclose a 3-D content immersion environment corresponding to the at least one of the one or more video frames that includes at least one interactive environmental feature corresponding to at least one of the identified one or more interaction-suitable features. 
However, in a similar field of endeavor, Goodrich discloses additional features where a 3-D content immersion environment corresponding to the at least one of the one or more video frames (Goodrich: paragraph 55, line(s) 1-13 "Data and various systems using augmented reality content generators or other such transform systems to modify content using this data can thus involve detection of objects (e.g., faces, hands, bodies, cats, dogs, surfaces, objects, etc.), tracking of such objects as they leave, enter, and move around the field of view in video frames, and the modification or transformation of such objects as they are tracked. In various embodiments, different methods for achieving such transformations may be used. For example, some embodiments may involve generating a three-dimensional mesh model of the object or objects, and using transformations and animated textures of the model within the video to achieve the transformation") and includes at least one interactive environmental feature corresponding to at least one of the identified one or more interaction-suitable features (Goodrich: paragraph 94, line(s) 1-9 "a three-dimensional (3D) message refers to an interactive 3D image including at least image and depth data. In an example embodiment, a 3D message is rendered using the subject system to visualize the spatial detail/geometry of what the camera sees, in addition to a traditional image texture. When a viewer interacts with this 3D message by moving a client device, the movement triggers corresponding changes in the perspective the image and geometry are rendered at to the viewer"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Singh's system of claim 1, the hardware processor is further configured to execute the AI-based content immersion environment generator to: identify, using an AI-based visual analyzer of the AI-based content immersion environment generator, one or more interaction-suitable features depicted in at least one of the one or more video frames with the features of a 3-D content immersion environment corresponding to the at least one of the one or more video frames that includes at least one interactive environmental feature corresponding to at least one of the identified one or more interaction-suitable features. 
As demonstrated by Goodrich, one could implement a 3-D content immersion environment corresponding to the at least one of the one or more video frames that includes at least one interactive environmental feature corresponding to at least one of the identified one or more interaction-suitable features with a system of claim 1 that can be further configured to execute the AI-based content immersion environment generator and then utilize the machine learning model to analyze one or more interaction-suitable features depicted by one or more video frames.

Regarding claim 9, Singh as modified by Goodrich discloses the system of claim 1, wherein the hardware processor is further configured to execute the AI-based content immersion environment generator to (Singh: paragraph 42, line(s) 4-8 "The virtual interaction engine 110 may be implemented by the processor 108 by executing a user interface application 152, an object extraction model 154, a machine learning model 156, and a virtual event rendering model 158 to auto-generate and share customized virtual environments 131a-131d"; also, paragraph 31, line(s) 7-9 "the virtual interaction engine 110 may be configured to auto-generate customized virtual environments"): merge the media content with the one or more 3-D content immersion environments to provide an enhanced media content configured for rendering on a display of a user system (Singh: paragraph 4, line(s) 5-14 "The disclosed system is configured to apply a machine learning model to map the user behavior objects and the user device objects to corresponding virtual environment objects in the virtual environment for generating a customized virtual environment. The user behavior objects and the user device objects are integrated with the corresponding virtual environment objects into a set of interaction objects"; also, paragraph 45, line(s) 6-7 "The virtual environment objects 146 may be three dimensional (3D) spatial objects"; also, paragraph 20, line(s) 2-3 "The user device 102 may be configured to display the virtual environment").

Regarding claim 10, Singh as modified by Goodrich discloses the system of claim 9, wherein the user system comprises a virtual reality (VR) device (Singh: paragraph 4, line(s) 1-5 "The disclosed system is configured to dynamically generate customized virtual environments based on user behaviors or preferences for an avatar associated with a user device (e.g., augmented reality (AR)/virtual reality (VR) headset) to interact with an entity").
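Editor's note on the merge step of claims 9-10: the sketch below is a hedged, deliberately naive illustration (not taken from either reference) of combining media content with a generated environment for display. It pastes the current frame as a centered "virtual screen" onto a rendered view of the environment; a real system would instead texture a quad inside the 3-D scene of the VR device.

import cv2
import numpy as np

def merge_into_environment(env_view: np.ndarray, frame: np.ndarray,
                           scale: float = 0.5) -> np.ndarray:
    """Composite a media frame onto a rendered environment view as a centered
    virtual screen (illustrative stand-in for the claimed merge step)."""
    out = env_view.copy()
    h, w = env_view.shape[:2]
    fh, fw = int(h * scale), int(w * scale)
    resized = cv2.resize(frame, (fw, fh))   # fit the frame to the screen area
    y0, x0 = (h - fh) // 2, (w - fw) // 2
    out[y0:y0 + fh, x0:x0 + fw] = resized   # paste at the center of the view
    return out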
Regarding claim 11, Singh discloses a method for use by a system including a computing platform having a hardware processor and a system memory storing an artificial intelligence based (AI-based) content immersion environment generator (Singh: paragraph 5, line(s) 1-3 "system for auto-generating and sharing customized virtual environments comprises a processor and a memory"; also, paragraph 34, line(s) 3-7 "memory 114 may store a user interface application 152, an object extraction model 154, a machine learning model 156"), the method comprising: receiving media content, by the AI-based content immersion environment generator executed by the hardware processor, the media content including a plurality of video frames (Singh: paragraph 42, line(s) 1-3 "Virtual interaction engine 110 may include, but is not limited to, one or more separate and independent software and/or hardware components of a server "; also, paragraph 44, line(s) 1-2 "The server 104 may receive a plurality of user data objects 162 in a series of video frames"; also, paragraph 42, line(s) 4-8 "The virtual interaction engine 110 may be implemented by the processor 108 by executing a user interface application 152, an object extraction model 154, a machine learning model 156, and a virtual event rendering model 158 to auto-generate and share customized virtual environments 131a-131d"; also, paragraph 31, line(s) 7-9 "the virtual interaction engine 110 may be configured to auto-generate customized virtual environments"); identifying, by the AI-based content immersion environment generator executed by the hardware processor, one or more video frames of the plurality of video frames for use in generating a content immersion environment for a display of the media content (Singh: paragraph 42, line(s) 1-3 "Virtual interaction engine 110 may include, but is not limited to, one or more separate and independent software and/or hardware components of a server "; also, paragraph 44, line(s) 1-2 "The server 104 may receive a plurality of user data objects 162 in a series of video frames"; also, paragraph 42, line(s) 4-8 "The virtual interaction engine 110 may be implemented by the processor 108 by executing a user interface application 152, an object extraction model 154, a machine learning model 156, and a virtual event rendering model 158 to auto-generate and share customized virtual environments 131a-131d"; also, paragraph 31, line(s) 7-9 "the virtual interaction engine 110 may be configured to auto-generate customized virtual environments"; also, paragraph 44, line(s) 12-18 "Each user data object 162 may be included in a video frame corresponding to a timestamp 150. For example, the user data objects 162 may be associated with the use profile 134, audio data, video data, or textual information which the server 104 receives during the interaction. The user data objects 162 may include user behavior objects 164 and user device objects 166 associated with a series of changing events or instances caused by users behaviors and user device operations during the interaction. The server 104 may extract a set of user behavior objects 164 and a set of user device objects 166 from the set of the user data objects 162."; also, paragraph 55, line(s) 1-12 "the server 104 extracts a set of user behavior objects 164 and a set of user device objects 166 from the set of the user data objects 162. In some embodiments, each user behavior object 164 is associated with a type of user behavior corresponding to at least one virtual environment object 146. 
For example, the set of the user behavior objects 164 correspond to one or more user behaviors through the user device 102 and the avatar 132 on the plurality of the virtual environment objects 146 during the interaction. The server 104 may generate the set of the user behavior objects 164 based on the user behaviors, user interaction patterns, and the context information."; also, paragraph 20, line(s) 2-3 "The user device 102 may be configured to display the virtual environment"); analyzing, by the AI-based content immersion environment generator executed by the hardware processor, foreground, features of each of the one or more video frames to provide one or more respective depth maps of the one or more video frames; and generating, based on the one or more video frames, by the AI-based content immersion environment generator executed by the hardware processor using a trained AI model of the AI-based content immersion environment generator and the one or more respective depth maps a three-dimensional (3-D) content immersion environment corresponding respectively to each of the one or more video frames to provide one or more 3-D content immersion environments for the display of the media content. Singh does not disclose the means for analyzing, by the AI-based content immersion environment generator executed by the hardware processor, foreground, features of each of the one or more video frames to provide one or more respective depth maps of the one or more video frames; and generating, based on the one or more video frames, by the AI-based content immersion environment generator executed by the hardware processor using a trained AI model of the AI-based content immersion environment generator and the one or more respective depth maps a three-dimensional (3-D) content immersion environment corresponding respectively to each of the one or more video frames to provide one or more 3-D content immersion environments for the display of the media content. However, in a similar field of endeavor, Goodrich discloses the means for analyzing, by the AI-based content immersion environment generator executed by the hardware processor, foreground, features of each of the one or more video frames to provide one or more respective depth maps of the one or more video frames (Goodrich: paragraph 165, line(s) 5-8 "the image and depth data processing module 706 provides a post-process foreground image by applying, using the depth normal map, a 3D effect(s) to a foreground region of the image data"; also, paragraph 166, line(s) 1-5 "generates a view of the 3D message using at least the background inpainted image, the inpainted depth map, and the post-processed foreground image, which are assets that are included the generated 3D message"; also, Fig. 14; also, paragraph 121, line(s) 4-17 "An image includes one or more real-world features, such as a user's face or real-world object(s) detected in the image. In some embodiments, an image includes metadata describing the image. For example, the depth data includes data corresponding to a depth map including depth information based on light rays emitted from a light emitting module directed to an object (e.g., a user's face) having features with different depths (e.g., eyes, ears, nose, lips, etc.). 
By way of example, a depth map is similar to an image but instead of each pixel providing a color, the depth map indicates distance from a camera to that part of the image (e.g., in absolute terms, or relative to other pixels in the depth map)"); and generating, based on the one or more video frames, by the AI-based content immersion environment generator executed by the hardware processor using a trained AI model of the AI-based content immersion environment generator and the one or more respective depth maps a three-dimensional (3-D) content immersion environment (Goodrich: paragraph 156, line(s) 5-9 "machine learning techniques can train a machine learning model from training data from shared 3D messages in an example (or other image data). Such heuristics include using face tracking and portrait segmentation to generate a depth map of a person.") corresponding respectively to each of the one or more video frames to provide one or more 3-D content immersion environments for the display of the media content (Goodrich: Fig. 13; also, paragraph 16, line(s) 1-2 "capturing image information and generating a 3D message in a display of a client device"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Singh's method for use by a system including a computing platform having a hardware processor and a system memory storing an artificial intelligence based (AI-based) content immersion environment generator with the features of analyzing, by the AI-based content immersion environment generator executed by the hardware processor, foreground, features of each of the one or more video frames to provide one or more respective depth maps of the one or more video frames and generating, based on the one or more video frames, by the AI-based content immersion environment generator executed by the hardware processor using a trained AI model of the AI-based content immersion environment generator and the one or more respective depth maps a three-dimensional (3-D) content immersion environment corresponding respectively to each of the one or more video frames to provide one or more 3-D content immersion environments for the display of the media content. As demonstrated by Goodrich, one could implement the process of analyzing by means of an AI based content immersion environment generator executed by the hardware processor foreground, features of each of the one or more video frames to provide one or more respective depth maps of the one or more video frames and generating, based on the one or more video frames, by the AI-based content immersion environment generator executed by the hardware processor using a trained AI model of the AI-based content immersion environment generator and the one or more respective depth maps a three-dimensional (3-D) content immersion environment corresponding respectively to each of the one or more video frames to provide one or more 3-D content immersion environments for the display of the media content. 
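Editor's note: Goodrich's quoted definition (each depth-map pixel encodes distance from the camera rather than color) implies how a per-frame depth map supports building 3-D geometry, the step claims 1 and 11 recite. The following is a minimal sketch of that computation under assumed pinhole intrinsics (fx, fy, cx, cy are illustrative inputs, not values from either reference): each pixel is back-projected into a 3-D point.

import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project an H x W depth map into an N x 3 point cloud.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    The intrinsics are assumed values, not taken from Singh or Goodrich.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

The resulting points could then be meshed or splatted to form the claimed 3-D content immersion environment; that downstream step is where a trained model such as the one Goodrich describes would come in.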
Regarding claim 12, Singh as modified by Goodrich discloses the method of claim 11, wherein the AI-based content immersion environment generator includes a graphical user interface (GUI) (Goodrich: Fig. 12; also, paragraph 63, line(s) 18-21 "In some embodiments, a modified image or video stream may be presented in a graphical user interface displayed on the mobile client device"), and wherein before identification of the one or more video frames is performed, the method further comprises: receiving via the GUI from a user of the system, by the AI-based content immersion environment generator executed by the hardware processor, data identifying the one or more video frames or one or more instructions for use when identifying the one or more video frames (Goodrich: paragraph 63, line(s) 22-29 "The transform system may implement a complex convolutional neural network on a portion of the image or video stream to generate and apply the selected modification. That is, the user may capture the image or video stream and be presented with a modified result in real time or near real time once a modification icon has been selected. Further, the modification may be persistent while the video stream is being captured and the selected modification icon remains toggled. Machine taught neural networks may be used to enable such modifications."; also, paragraph 63, line(s) 6-18 "The transform system operating within the messaging client application 104 determines the presence of a face within the image or video stream and provides modification icons associated with a computer animation model to transform image data, or the computer animation model can be present as associated with an interface described herein. The modification icons include changes which may be the basis for modifying the user's face within the image or video stream as part of the modification operation. Once a modification icon is selected, the transform system initiates a process to convert the image of the user to reflect the selected modification icon (e.g., generate a smiling face on the user)").

Regarding claim 13, Singh as modified by Goodrich discloses the method of claim 12, wherein the one or more instructions are received via the GUI (Goodrich: paragraph 63, line(s) 22-29 "The transform system may implement a complex convolutional neural network on a portion of the image or video stream to generate and apply the selected modification. That is, the user may capture the image or video stream and be presented with a modified result in real time or near real time once a modification icon has been selected. Further, the modification may be persistent while the video stream is being captured and the selected modification icon remains toggled. Machine taught neural networks may be used to enable such modifications."), and wherein the one or more instructions command identification of one or more video frames per shot or per scene of the media content (Goodrich: paragraph 63, line(s) 18-27 "In some embodiments, a modified image or video stream may be presented in a graphical user interface displayed on the mobile client device as soon as the image or video stream is captured and a specified modification is selected. The transform system may implement a complex convolutional neural network on a portion of the image or video stream to generate and apply the selected modification. That is, the user may capture the image or video stream and be presented with a modified result").
Regarding claim 14, Singh as modified by Goodrich discloses the method of claim 12, wherein the one or more instructions are received via the GUI, and wherein the one or more instructions command identification of one or more video frames per specified timecode interval of the media content (Singh: paragraph 51, line(s) 7-8 "The plurality of the spatial video frames 149 may correspond to different timestamps in the time sequence"; also, paragraph 44, line(s) 4-12 "Each user data object 162 may be included in a video frame corresponding to a timestamp 150. For example, the user data objects 162 may be associated with the use profile 134, audio data, video data, or textual information which the server 104 receives during the interaction"). Singh does not disclose that the one or more instructions are received via the GUI. However, in a similar field of endeavor, Goodrich further discloses additional features wherein the one or more instructions are received via the GUI (Goodrich: paragraph 63, line(s) 22-29 "The transform system may implement a complex convolutional neural network on a portion of the image or video stream to generate and apply the selected modification. That is, the user may capture the image or video stream and be presented with a modified result in real time or near real time once a modification icon has been selected. Further, the modification may be persistent while the video stream is being captured and the selected modification icon remains toggled. Machine taught neural networks may be used to enable such modifications."). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Singh's method of claim 12, and the identification of one or more video frames per specified timecode interval of the media content, with the features of receiving one or more instructions via the GUI. As demonstrated by Goodrich, one could add in the support for a GUI to provide further instructions on one or more video frames per specified timecode interval of the media content.

Regarding claim 15, Singh as modified by Goodrich discloses the method of claim 12, outputting via the GUI to the user, by the AI-based content immersion environment generator executed by the hardware processor (Singh: paragraph 42, line(s) 4-8 "The virtual interaction engine 110 may be implemented by the processor 108 by executing a user interface application 152, an object extraction model 154, a machine learning model 156, and a virtual event rendering model 158 to auto-generate and share customized virtual environments 131a-131d"; also, paragraph 31, line(s) 7-9 "the virtual interaction engine 110 may be configured to auto-generate customized virtual environments"), the one or more 3-D content immersion environments. Singh does not disclose outputting via the GUI to the user and the one or more 3-D content immersion environments. However, in a similar field of endeavor, Goodrich further discloses outputting via the GUI to the user (Goodrich: paragraph 63, line(s) 18-29 "In some embodiments, a modified image or video stream may be presented in a graphical user interface displayed on the mobile client device as soon as the image or video stream is captured and a specified modification is selected. The transform system may implement a complex convolutional neural network on a portion of the image or video stream to generate and apply the selected modification.
That is, the user may capture the image or video stream and be presented with a modified result in real time or near real time once a modification icon has been selected. Further, the modification may be persistent while the video stream is being captured and the selected modification icon remains toggled. Machine taught neural networks may be used to enable such modifications.") and one or more 3-D content immersion environments (Goodrich: paragraph 63, line(s) 18-29 "In some embodiments, a modified image or video stream may be presented in a graphical user interface displayed on the mobile client device as soon as the image or video stream is captured and a specified modification is selected. The transform system may implement a complex convolutional neural network on a portion of the image or video stream to generate and apply the selected modification. That is, the user may capture the image or video stream and be presented with a modified result in real time or near real time once a modification icon has been selected. Further, the modification may be persistent while the video stream is being captured and the selected modification icon remains toggled. Machine taught neural networks may be used to enable such modifications."). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Singh's method of claim 12 where the hardware processor executes an AI-based content immersion environment generator with the features of outputting via the GUI to the user and the one or more 3-D content immersion environments. As demonstrated by Goodrich, one could combine the features of Singh's invention with the features of Goodrich's invention.

Regarding claim 16, Singh as modified by Goodrich discloses the method of claim 11, wherein identifying the one or more video frames of the plurality of video frames for use in generating the content immersion environment is performed using metadata included with the media content (Goodrich: paragraph 55, line(s) 1-6 "Data and various systems using augmented reality content generators or other such transform systems to modify content using this data can thus involve detection of objects (e.g., faces, hands, bodies, cats, dogs, surfaces, objects, etc.), tracking of such objects as they leave, enter, and move around the field of view in video frames,").
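Editor's note: claims 4 and 14 recite identifying frames per specified timecode interval (claim 16, via metadata). A minimal sketch of the interval case follows, assuming OpenCV video decoding and a fixed interval in seconds; it is an editor's illustration, not code from either reference.

import cv2

def frames_per_interval(video_path: str, interval_s: float):
    """Yield (timecode_s, frame) pairs, one frame per interval_s of media time."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if metadata lacks a rate
    step = max(1, round(fps * interval_s))   # frames between selected frames
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield index / fps, frame
        index += 1
    cap.release()

A per-shot or per-scene variant (claims 3 and 13) would replace the fixed step with a shot-boundary detector or with shot markers carried in the media's metadata.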
Regarding claim 17, Singh as modified by Goodrich discloses the method of claim 11, wherein to generate the 3-D content immersion environment corresponding respectively to each of the one or more video frames, the method further comprises: determining, by the AI-based content immersion environment generator executed by the hardware processor (Singh: paragraph 42, line(s) 4-8 "The virtual interaction engine 110 may be implemented by the processor 108 by executing a user interface application 152, an object extraction model 154, a machine learning model 156, and a virtual event rendering model 158 to auto-generate and share customized virtual environments 131a-131d"; also, paragraph 31, line(s) 7-9 "the virtual interaction engine 110 may be configured to auto-generate customized virtual environments"), whether any of the one or more video frames contains an image segment depicting a human, humanoid, or animal; and inpainting the image segment, by the AI-based content immersion environment generator executed by the hardware processor when determining determines that the one or more vide frames contains the image segment, thereby obscuring the human, humanoid, or animal to provide one or more inpainted video frames; wherein generating a 3-D content immersion environment corresponding to an inpainted video frame uses the inpainted video frame. Singh does not disclose the feature to generate the 3-D content immersion environment corresponding respectively to each of the one or more video frames and whether any of the one or more video frames contains an image segment depicting a human, humanoid, or animal; and inpainting the image segment, by the AI-based content immersion environment generator executed by the hardware processor when determining determines that the one or more vide frames contains the image segment, thereby obscuring the human, humanoid, or animal to provide one or more inpainted video frames; wherein generating a 3-D content immersion environment corresponding to an inpainted video frame uses the inpainted video frame. However, in a similar field of endeavor, Goodrich, further discloses the ability to generate the 3-D content immersion environment corresponding respectively to each of the one or more video frames (Goodrich: paragraph 55, line(s) 1-13 "Data and various systems using augmented reality content generators or other such transform systems to modify content using this data can thus involve detection of objects (e.g., faces, hands, bodies, cats, dogs, surfaces, objects, etc.), tracking of such objects as they leave, enter, and move around the field of view in video frames, and the modification or transformation of such objects as they are tracked. In various embodiments, different methods for achieving such transformations may be used. 
For example, some embodiments may involve generating a three-dimensional mesh model of the object or objects, and using transformations and animated textures of the model within the video to achieve the transformation") whether any of the one or more video frames contains an image segment depicting a human, humanoid, or animal; and inpainting the image segment, by the AI-based content immersion environment generator executed by the hardware processor when determining determines that the one or more vide frames contains the image segment, thereby obscuring the human, humanoid, or animal to provide one or more inpainted video frames; wherein generating a 3-D content immersion environment corresponding to an inpainted video frame uses the inpainted video frame (Goodrich: Fig. 15; also, paragraph 18, line(s) 1-3 "FIG. 15 is an example illustrating a depth inpainting mask and depth inpainting, according to some example embodiments."; also, paragraph 161, line(s) 1-3 "image and depth data processing module 706 generates a segmentation mask based at least in part on the image data"; also, paragraph 161, line(s) 6-10 "a prediction is made for every pixel to assign the pixel to a particular object class (e.g., face/portrait or background), and the segmentation mask is determined based on the groupings of the classified pixels (e.g., face/portrait or background)."; also, paragraph 162, line(s) 5-9 "the image and depth data processing module 706 performs a background inpainting technique that eliminates the portrait (e.g., including the user's face) from the background and blurring the background to focus on the person in the frame."; also, paragraph 166, line(s) 1-5 "generates a view of the 3D message using at least the background inpainted image, the inpainted depth map, and the post-processed foreground image, which are assets that are included the generated 3D message"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Singh's method of claim 11 where the AI-based content immersion environment generator is executed by a hardware processor with the features of generating the 3-D content immersion environment corresponding respectively to each of the one or more video frames whether any of the one or more video frames contains an image segment depicting a human, humanoid, or animal; and inpainting the image segment, by the AI-based content immersion environment generator executed by the hardware processor when determining determines that the one or more vide frames contains the image segment, thereby obscuring the human, humanoid, or animal to provide one or more inpainted video frames; wherein generating a 3-D content immersion environment corresponding to an inpainted video frame uses the inpainted video frame. 
Regarding claim 18, Singh as modified by Goodrich discloses the method of claim 11, further comprising identifying, by the AI-based content immersion environment generator executed by the hardware processor (Singh: paragraph 42, line(s) 4-8 and paragraph 31, line(s) 7-9, both quoted above), and using an AI-based visual analyzer of the AI-based content immersion environment generator, one or more interaction-suitable features depicted in at least one of the one or more video frames (Singh: paragraph 7, line(s) 4-7 "generates customized virtual environments in real-time based on user behaviors and user device operations corresponding to changes to virtual environment objects during the interactions"; also, paragraph 14, line(s) 4-7 "auto-generate and share customized virtual environments for users to perform interactions through user devices 102 and avatars 132 with an entity. In one embodiment, system 100 comprises a server 104"; also, paragraph 15, line(s) 2-27 "the server 104 to auto-generate customized virtual environments 131 with seamless integration with user needs and preferences for an avatar 132 associated with a user device 102 to implement an interaction with an entity. For example, the server 104 may extract user behavior objects 164 and user device objects 166 from user data objects 162 of the interaction. The server 104 may apply a machine learning model 156 to map the user behavior objects 164 and the user device objects 166 to corresponding virtual environment objects 146 in a virtual operation area 140 in a virtual environment 130. The server 104 may integrate the user behavior objects 164 and the user device objects 166 with the corresponding virtual environment objects 146 into a set of interaction objects 168. The server 104 may generate customized virtual environment objects 148 based on the interaction objects 168. The customized virtual environment objects 148 may be rendered in a customized virtual environment 131 corresponding to the interaction. The server 104 may update the user behavior objects 164 and the user device objects 166 in real time based on the detected new user behaviors and new parameters through the user device 102. The customized virtual environment 131 may be modified in synchronization with the updated user behavior objects 164 and the user device objects 166 to facilitate a user seamless interaction in real-time."); wherein a 3-D content immersion environment corresponding to the at least one of the one or more video frames includes at least one interactive environmental feature corresponding to at least one of the identified one or more interaction-suitable features.

Singh does not disclose that the 3-D content immersion environment corresponding to the at least one of the one or more video frames includes at least one interactive environmental feature corresponding to at least one of the identified one or more interaction-suitable features.

However, in a similar field of endeavor, Goodrich further discloses a 3-D content immersion environment corresponding to the at least one of the one or more video frames (Goodrich: paragraph 55, line(s) 1-13, quoted above) that includes at least one interactive environmental feature corresponding to at least one of the identified one or more interaction-suitable features (Goodrich: paragraph 94, line(s) 1-9 "a three-dimensional (3D) message refers to an interactive 3D image including at least image and depth data. In an example embodiment, a 3D message is rendered using the subject system to visualize the spatial detail/geometry of what the camera sees, in addition to a traditional image texture. When a viewer interacts with this 3D message by moving a client device, the movement triggers corresponding changes in the perspective the image and geometry are rendered at to the viewer").

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified Singh's method of claim 11, in which the AI-based content immersion environment generator, using its AI-based visual analyzer, identifies one or more interaction-suitable features depicted in at least one of the one or more video frames, with Goodrich's feature of a 3-D content immersion environment that includes at least one interactive environmental feature corresponding to at least one of the identified interaction-suitable features. As demonstrated by Goodrich, one of ordinary skill could implement these features.
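To make the claimed "interactive environmental feature" concrete: in Goodrich's 3-D message, moving the client device changes the perspective at which image and depth data are rendered. Below is a minimal, hypothetical sketch of that depth-driven parallax; every name here is an assumption for illustration, and the crude forward warp leaves holes, which is precisely what Goodrich's inpainted background and inpainted depth map exist to fill.

```python
# Hypothetical sketch: an interactive feature rendered with depth-dependent
# parallax, so device movement changes the viewing perspective.
import numpy as np
from dataclasses import dataclass

@dataclass
class InteractiveFeature:
    texture: np.ndarray   # H x W x 3 image patch
    depth: np.ndarray     # H x W depth map (larger values = farther away)

def camera_offset(tilt_x: float, tilt_y: float, gain: float = 0.05) -> np.ndarray:
    # Map device tilt (radians) to a small virtual-camera translation,
    # so physically moving the device shifts the rendered perspective.
    return gain * np.array([np.sin(tilt_x), np.sin(tilt_y)])

def render_with_parallax(feature: InteractiveFeature, offset: np.ndarray) -> np.ndarray:
    # Crude forward warp: displace each pixel inversely with its depth, so
    # nearer pixels move more than farther ones (depth-dependent parallax).
    h, w = feature.depth.shape
    ys, xs = np.indices((h, w))
    inv = 1.0 / np.maximum(feature.depth, 1e-3)
    xi = np.clip((xs + offset[0] * w * inv).round().astype(int), 0, w - 1)
    yi = np.clip((ys + offset[1] * h * inv).round().astype(int), 0, h - 1)
    out = np.zeros_like(feature.texture)
    out[yi, xi] = feature.texture[ys, xs]  # splat; holes remain where pixels diverge
    return out
```

A render loop would feed live device tilt from the IMU into camera_offset each frame, so the feature visibly responds to the viewer's movement.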
Regarding claim 19, Singh as modified by Goodrich discloses the method of claim 11, further comprising merging, by the AI-based content immersion environment generator executed by the hardware processor, the media content with the one or more 3-D content immersion environments to provide an enhanced media content configured for rendering on a display of a user system (Singh: paragraph 42, line(s) 4-8 and paragraph 31, line(s) 7-9, both quoted above; also, paragraph 4, line(s) 5-14 "The disclosed system is configured to apply a machine learning model to map the user behavior objects and the user device objects to corresponding virtual environment objects in the virtual environment for generating a customized virtual environment. The user behavior objects and the user device objects are integrated with the corresponding virtual environment objects into a set of interaction objects"; also, paragraph 45, line(s) 6-7 "The virtual environment objects 146 may be three dimensional (3D) spatial objects"; also, paragraph 20, line(s) 2-3 "The user device 102 may be configured to display the virtual environment").

Regarding claim 20, Singh as modified by Goodrich discloses the method of claim 19, wherein the user system comprises a virtual reality (VR) device (Singh: paragraph 4, line(s) 1-5 "The disclosed system is configured to dynamically generate customized virtual environments based on user behaviors or preferences for an avatar associated with a user device (e.g., augmented reality (AR)/virtual reality (VR) headset) to interact with an entity").

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAI WEI TOMMY LI, whose telephone number is (571) 272-1170. The examiner can normally be reached 6:00 AM-4:00 PM EST.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Xiao Wu, can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JAI W LI/ Junior Examiner, Art Unit 2613
/XIAO M WU/ Supervisory Patent Examiner, Art Unit 2613
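As an illustration of the claim 19 "merging" limitation rejected above: one simple reading is to render the generated 3-D environment and composite each source video frame onto a designated in-environment screen region. The sketch below is hypothetical throughout; Singh describes the integration at the object level rather than as code.

```python
# Hypothetical sketch: merge source media with a rendered 3-D environment
# by compositing each frame onto an in-environment "screen" region.
import cv2
import numpy as np

def merge_frame(env_render: np.ndarray, frame: np.ndarray,
                screen_box: tuple[int, int, int, int]) -> np.ndarray:
    # env_render: H x W x 3 render of the 3-D immersion environment
    # frame:      source media frame to embed
    # screen_box: (top, left, height, width) of the in-environment screen
    top, left, h, w = screen_box
    out = env_render.copy()
    out[top:top + h, left:left + w] = cv2.resize(frame, (w, h))
    return out

def enhanced_media(env_renders, frames, screen_box):
    # The "enhanced media content" of claim 19: the per-frame merge, ready
    # for rendering on a user system such as the VR device of claim 20.
    return [merge_frame(e, f, screen_box) for e, f in zip(env_renders, frames)]
```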

Prosecution Timeline

Jul 10, 2024: Application Filed
Jan 28, 2026: Non-Final Rejection, §103 (current)

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: Favorable
Median Time to Grant: 2y 9m
PTA Risk: Low
Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.
