Prosecution Insights
Last updated: April 19, 2026
Application No. 18/058,865

GENERATING 3D VIDEO USING 2D IMAGES AND AUDIO WITH BACKGROUND KEYED TO 2D IMAGE-DERIVED METADATA

Status: Non-Final OA (§103)
Filed: Nov 26, 2022
Examiner: WANG, YUEHAN
Art Unit: 2617
Tech Center: 2600 (Communications)
Assignee: Sony Interactive Entertainment LLC
OA Round: 5 (Non-Final)

Grant Probability: 83% (Favorable)
Expected OA Rounds: 5-6
Time to Grant: 2y 7m
Grant Probability with Interview: 96%

Examiner Intelligence

Career Allow Rate: 83% (404 granted / 485 resolved; +21.3% vs TC avg), above average
Interview Lift: +12.9% (moderate), among resolved cases with interview
Typical Timeline: 2y 7m average prosecution; 47 applications currently pending
Career History: 532 total applications across all art units

Statute-Specific Performance

§101: 4.3% (-35.7% vs TC avg)
§103: 69.6% (+29.6% vs TC avg)
§102: 8.3% (-31.7% vs TC avg)
§112: 6.6% (-33.4% vs TC avg)

Tech Center averages are estimates. Based on career data from 485 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

Applicant's amendments filed on 05 November, 2025 have been entered. Claims 4 and 15 have been previously canceled. Claims 1-3, 5-14, and 16-22 are still pending in this application, with claims 1, 11, and 16 being independent.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 05 November, 2025 has been entered.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6-13, and 15-21 are rejected under 35 U.S.C. 103 as being unpatentable over IDS Snap Inc. (US 20210065454 A1), referred to herein as Snap, in view of Arora et al. (US 10574974 B2), referred to herein as Arora, and Kopeinigg et al. (US 20200312037 A1), referred to herein as Kopeinigg.

Regarding claim 1, Snap discloses an apparatus comprising: at least one processor (Fig. 7, para [0131]) configured to:

Snap does not, but Arora teaches: extract 2D image information from a 2D image showing at least one object that has at least one feature (Fig. 7, 708: Extract first features from the first image data; Fig. 7, 712: Determine matching features between the first and second features);

Snap further teaches: match the at least one feature in the 2D image information to at least one 3D model (Figs. 1, 6; para [0124]);

Snap does not, but DAHA teaches: customize the at least one 3D model based on the at least one feature (para [0002]: "creation of a 3D data model from a 2D image… The software matches common features in the images");

Snap further teaches: generate an animation of the at least one customized 3D model based on the 2D image information (Fig. 1, para [0046]: "A media overlay may include audio and visual content and visual effects. Examples of audio and visual content include pictures, texts, logos, animations, and sound effects. An example of a visual effect includes color overlaying. The audio and visual content or the visual effects can be applied to a media content item (e.g., a photo) at the client device 102."); and

Kopeinigg further teaches: cause a presentation of the animation as a 3D interactive video with the at least one customized 3D model (para [0024]: "Custom mesh 204 may input a parametric 3D model 220 that is used to form custom 3D model 208 by applying one or more parameters 216 to parametric 3D model 220. Animation stage 210 may then input a motion capture video 222 and output a set of presentation data 224 that is configured for presentation to a user 226.").

It would have been obvious to one of ordinary skill in the art to have modified the systems and methods of Snap to apply the method of Visual Simultaneous Localization and Mapping (Visual SLAM), such that the system can generate high-quality 3D models, as suggested by Arora (col 1, ln. 34-37). It would have been obvious to one of ordinary skill in the art to have modified the systems and methods of Snap to apply the animation stage 210 to generate presentation data 224 which, as shown, causes custom 3D model 208 to perform a particular motion, as suggested by Kopeinigg (para [0062]).

Regarding claim 2, Snap in view of Arora and Kopeinigg discloses the apparatus of Claim 1, and Snap further teaches wherein the processor is configured to: receive from at least one input device at least one interaction signal (Fig. 7, para [0138]: "By way of example, the user can utilize various inputs to rotate the selectable graphical items onto and off of the display screen in manner corresponding to a carousel providing a cyclic view of the graphical items."); and alter presentation of the 3D interactive video at least in part based on the interaction signal (Figs. 12-13; para [0138]: "By way of example, the user can utilize various inputs to rotate the selectable graphical items onto and off of the display screen in manner corresponding to a carousel providing a cyclic view of the graphical items ... In an example, augmented reality content generators can be organized into respective groups for including on the carousel arrangement thereby enabling rotating through augmented reality content generators by group.").
Regarding claim 3, Snap in view of Arora and Kopeinigg discloses the apparatus of Claim 2, and Snap further teaches wherein the at least one input device comprises at least one touch element on at least one computer simulation controller (Figs. 12-13; para [0141]).

Regarding claim 6, Snap in view of Arora and Kopeinigg discloses the apparatus of Claim 1, and Snap further teaches wherein the at least one feature of the 2D image information comprises human image texture (para [0181]: "For instance, machine learning models can be applied in a beautification operation such as convolutional neural networks, generative adversarial networks, and the like. Such machine learning models can be utilized to preserve facial feature structures, smooth blemishes or remove wrinkles, or preserve facial skin texture in facial image data.").

Regarding claim 7, Snap in view of Arora and Kopeinigg discloses the apparatus of Claim 1, and Snap further teaches wherein the at least one feature of the 2D image information comprises human image motion (Figs. 17-18; para [0222]).

Regarding claim 8, Snap in view of Arora and Kopeinigg discloses the apparatus of Claim 1, and Snap further teaches wherein the at least one feature of the 2D image information comprises human image facial emotion (para [0062]: "The complex image manipulations may include size and shape changes, emotion transfers (e.g., changing a face from a frown to a smile), state transfers ...").

Regarding claim 9, Snap in view of Arora and Kopeinigg discloses the apparatus of Claim 1, and Snap further teaches wherein the at least one feature of the 2D image information comprises environment type (Fig. 9, para [0161]: "In an embodiment, the image and depth data processing module 706 determines the segmentation mask using a convolutional neural network to perform dense prediction tasks where a prediction is made for every pixel to assign the pixel to a particular object class (e.g., face/portrait or background), and the segmentation mask is determined based on the groupings of the classified pixels (e.g., face/portrait or background).").

Regarding claim 10, Snap in view of Arora and Kopeinigg discloses the apparatus of Claim 1, and Snap further teaches wherein the processor is programmed to: alter presentation of the 3D interactive video responsive to input from a time slider input element (Fig. 12, para [0138]).

Regarding claim 11, Snap in view of Arora and Kopeinigg discloses a device. Snap teaches a non-transitory computer-readable medium storing instructions that, when executed by one or more processors of a computing device, cause the one or more processors to perform operations (Fig. 7, para [0131]) to: identify at least one texture of at least one human image in a 2D image (para [0181]: "For instance, machine learning models can be applied in a beautification operation such as convolutional neural networks, generative adversarial networks, and the like. Such machine learning models can be utilized to preserve facial feature structures, smooth blemishes or remove wrinkles, or preserve facial skin texture in facial image data."). The metes and bounds of the claim substantially correspond to the claim as set forth in Claim 1; thus they are rejected on similar grounds and rationale as their corresponding limitations.

Regarding claim 12, Snap in view of Arora and Kopeinigg discloses the device of Claim 11, and Snap further teaches wherein the instructions are executable to: present on at least one 3D display the 3D video (Fig. 12, para [0197]).
Regarding claim 13, Snap in view of Arora and Kopeinigg discloses the device of Claim 11, and Snap further teaches wherein the instructions are executable to: receive from at least one input device at least one interaction signal (Fig. 7, para [0138]: "By way of example, the user can utilize various inputs to rotate the selectable graphical items onto and off of the display screen in manner corresponding to a carousel providing a cyclic view of the graphical items."); and alter presentation of the 3D video at least in part based on the interaction signal (Figs. 12-13; para [0138]: "By way of example, the user can utilize various inputs to rotate the selectable graphical items onto and off of the display screen in manner corresponding to a carousel providing a cyclic view of the graphical items ... In an example, augmented reality content generators can be organized into respective groups for including on the carousel arrangement thereby enabling rotating through augmented reality content generators by group.").

Regarding claim 15, Snap in view of Arora and Kopeinigg discloses the device of Claim 11, and Snap further teaches comprising the at least one processor (Fig. 7, para [0131]).

Regarding claim 16, Snap in view of Arora and Kopeinigg discloses a method. Snap teaches accessing 2D images in a 3D spatial memory (para [0094]: "In an example embodiment, a 3D message is rendered using the subject system to visualize the spatial detail/geometry of what the camera sees, in addition to a traditional image texture. When a viewer interacts with this 3D message by moving a client device, the movement triggers corresponding changes in the perspective the image and geometry are rendered at to the viewer .... In an embodiment, the subject system provides 3D effects that work in conjunction with other components of the system to process depth data, which provides particles, shaders, 2D assets and 3D geometry that can inhabit different depth-planes within messages."). The metes and bounds of the claim substantially correspond to the claim as set forth in Claim 1; thus they are rejected on similar grounds and rationale as their corresponding limitations.

Regarding claim 17, Snap in view of Arora and Kopeinigg discloses the method of Claim 16, and Snap further teaches comprising: animating at least one character in the 3D video based on action in the 2D images (Fig. 18, para [0225]).

Regarding claim 18, Snap in view of Arora and Kopeinigg discloses the method of Claim 16, and Snap further teaches comprising: playing audible dialog in the 3D video based at least in part on an image of a user viewing the 3D video (Fig. 1, para [0046]: "A media overlay may include audio and visual content and visual effects. Examples of audio and visual content include pictures, texts, logos, animations, and sound effects. An example of a visual effect includes color overlaying. The audio and visual content or the visual effects can be applied to a media content item (e.g., a photo) at the client device 102.").

Regarding claim 19, Snap in view of Arora and Kopeinigg discloses the method of Claim 16, and Snap further teaches comprising: animating at least one character in the 3D video based on input from an input device (Fig. 18, para [0225]).

Regarding claim 20, Snap in view of Arora and Kopeinigg discloses the apparatus of Claim 1, and Snap further teaches wherein the at least one processor is configured to match the at least one feature in the 2D image information by at least: generating a classification of the at least one feature in the 2D image information (para [0021]: machine learning techniques such as deep convolutional neural networks that perform classifications of each pixel in the depth map, encoder-decoder architecture for segmentation, fully convolutional networks, feature maps, deconvolutional networks, unsupervised feature learning, and the like); Kopeinigg further teaches wherein the at least one 3D model is selected from a repository of 3D models based on the classification (para [0045]: parametric 3D model 220 may serve as a generic model of the subject type in question (e.g., the human subject type in the ongoing example provided here of girl 302) that may take the form of various subjects of the subject type when different parameters are applied; para [0048]: parametric 3D model 220 may include each of the joints and model bones of skeletal model 602 (e.g., a plurality of joints shared by all subjects of the particular subject type) and allow these to be customized based on parameters 216).

Regarding claim 21, Snap in view of Arora and Kopeinigg discloses the apparatus of Claim 20, and Snap further teaches wherein the classification is generated based on at least one of: a physical use attribute, an image background attribute, or a physical location attribute (para [0161]: assign the pixel to a particular object class (e.g., face/portrait or background), and the segmentation mask is determined based on the groupings of the classified pixels (e.g., face/portrait or background)).

Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over IDS Snap Inc. (US 20210065454 A1), referred to herein as Snap, in view of Arora et al. (US 10574974 B2), referred to herein as Arora, Kopeinigg et al. (US 20200312037 A1), referred to herein as Kopeinigg, and IDS Nokia Technologies OY (US 20180276476 A1), referred to herein as Nokia.
Regarding claim 5, Snap in view of Arora and Kopeinigg discloses the apparatus of Claim 1, but does not explicitly state wherein the processor is programmed to: alter presentation of the 3D interactive video responsive to an incoming telephone call. However, Nokia discloses wherein the processor is programmed to: alter presentation of the 3D interactive video responsive to an incoming telephone call (Fig. 2, para [0080]). It would have been obvious to one of ordinary skill in the art to have modified the systems and methods of Snap to alter the presentation of the interactive media in response to a phone call, such that the system can alert the user of emergency notifications during an interactive media session, as suggested by Nokia (para [0079]-[0081]).

Regarding claim 14, Snap in view of Arora and Kopeinigg discloses the device of Claim 11, but does not explicitly state wherein the instructions are executable to: alter presentation of the 3D interactive video responsive to an incoming telephone call. However, Nokia discloses altering the presentation of the 3D interactive video responsive to an incoming telephone call (Fig. 2, para [0080]). It would have been obvious to one of ordinary skill in the art to have modified the systems and methods of Snap to alter the presentation of the interactive media in response to a phone call, such that the system can alert the user of emergency notifications during an interactive media session, as suggested by Nokia (para [0079]-[0081]).

Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over IDS Snap Inc. (US 20210065454 A1), referred to herein as Snap, in view of Arora et al. (US 10574974 B2), referred to herein as Arora, Kopeinigg et al. (US 20200312037 A1), referred to herein as Kopeinigg, and Singh et al. (US 20220245888 A1), referred to herein as Singh.
Regarding claim 22, Snap in view of Arora and Kopeinigg discloses the apparatus of Claim 1, but does not explicitly state wherein the at least one processor is configured to: determine audio information that corresponds to the at least one feature in the 2D image information; and synchronize the audio information with the animation of the at least one customized 3D model based on the 2D image information. However, Singh discloses wherein the at least one processor is configured to: determine audio information that corresponds to the at least one feature in the 2D image information; and synchronize the audio information with the animation of the at least one customized 3D model based on the 2D image information (para [0035]: "the animation layer(s) 124 can include an audio feature or audio hotspot. For instance, an audio file (e.g., a sound effect, a voice, music, or other audio sample) can be associated with the mapping locator 118 such that an interaction with the mapping locator 118"). It would have been obvious to one of ordinary skill in the art to have modified the systems and methods of Snap to include an audio feature or audio hotspot in the animation layer(s) in the 3D virtual environment, as suggested by Singh (para [0035]).

Response to Arguments

Applicant's arguments, see page 7, filed on 05 November, 2025, with respect to the rejection of claims 1, 11, and 16 under § 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Kopeinigg.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Samantha (Yuehan) Wang, whose telephone number is (571) 270-5011. The examiner can normally be reached Monday-Friday, 8am-5pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, King Poon, can be reached at (571) 272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Samantha (YUEHAN) WANG/
Primary Examiner, Art Unit 2617

Prosecution Timeline

Nov 26, 2022: Application Filed
Sep 20, 2024: Non-Final Rejection (§103)
Oct 11, 2024: Response Filed
Dec 17, 2024: Final Rejection (§103)
Feb 05, 2025: Examiner Interview Summary
Feb 05, 2025: Applicant Interview (Telephonic)
Feb 24, 2025: Request for Continued Examination
Feb 25, 2025: Response after Non-Final Action
Mar 11, 2025: Non-Final Rejection (§103)
Jun 17, 2025: Applicant Interview (Telephonic)
Jun 17, 2025: Examiner Interview Summary
Jun 18, 2025: Response Filed
Aug 06, 2025: Final Rejection (§103)
Oct 28, 2025: Applicant Interview (Telephonic)
Oct 28, 2025: Examiner Interview Summary
Nov 05, 2025: Request for Continued Examination
Nov 14, 2025: Response after Non-Final Action
Feb 13, 2026: Non-Final Rejection (§103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597178: VECTOR OBJECT PATH SEGMENT EDITING
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12597506: ENDOSCOPIC EXAMINATION SUPPORT APPARATUS, ENDOSCOPIC EXAMINATION SUPPORT METHOD, AND RECORDING MEDIUM
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12586286: DIFFERENTIABLE REAL-TIME RADIANCE FIELD RENDERING FOR LARGE SCALE VIEW SYNTHESIS
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12586261: IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12567182: USING AUGMENTED REALITY TO VISUALIZE OPTIMAL WATER SENSOR PLACEMENT
Granted Mar 03, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 83%
With Interview: 96% (+12.9%)
Median Time to Grant: 2y 7m
PTA Risk: High

Based on 485 resolved cases by this examiner. Grant probability derived from career allow rate.
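The projected figures above follow from the examiner's career data by simple arithmetic: the baseline grant probability is the career allow rate (404 granted of 485 resolved), and the with-interview figure adds the observed +12.9 percentage-point interview lift. A minimal sketch of that derivation, assuming the dashboard uses this straightforward aggregation (the exact model behind the report is not disclosed):

```python
# Counts and lift taken from the report above; the aggregation method is an assumption.
GRANTED = 404          # career grants by this examiner
RESOLVED = 485         # career resolved cases
INTERVIEW_LIFT = 12.9  # percentage-point lift among resolved cases with an interview

def grant_probability() -> float:
    """Career allow rate, used directly as the baseline grant probability (in %)."""
    return 100 * GRANTED / RESOLVED

def grant_probability_with_interview() -> float:
    """Baseline plus the observed interview lift, capped at 100%."""
    return min(100.0, grant_probability() + INTERVIEW_LIFT)

print(round(grant_probability()))                 # 83
print(round(grant_probability_with_interview()))  # 96
```

This reproduces the displayed 83% and 96% headline numbers (404/485 is roughly 83.3%, plus 12.9 gives roughly 96.2%, rounded for display).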
