DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Election/Restrictions
Applicant's election of Group I, claims 1-9 and 14-20, without traverse is acknowledged. Claims 10-13 are withdrawn from further consideration as being drawn to a nonelected invention.
Claim Objections
Claim 1 is objected to because of the following informalities: the recitation “one more circuits” appears to contain a typographical error and apparently should read “one or more circuits.” Appropriate correction is required.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-2, 5-9, 14-15, and 17-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Xian (US 11748940 B1).
Regarding claim 1 (Original), Xian discloses a processor (col. 8, lines 45-55: the system uses a space-time neural radiance field (NeRF) framework to build a 4D representation of (x, y, z, t) for the dynamic scene; Fig. 4; col. 18, lines 30-40: use a spatiotemporal representation of a scene to generate an image of the scene for an arbitrary view position, view direction, and time; Fig. 5; col. 20, lines 33-55: the computer system 500 includes a processor 502, memory 504, storage 1006, an input/output interface, and a bus; processor 502 includes hardware for executing instructions) comprising:
one more circuits to (Fig. 5; col. 20, lines 33-55: computer system 500 includes a processor 502, memory 504, storage 1006, an input/output interface, and a bus; processor 502 includes hardware for executing instructions; the processor corresponds to the claimed circuits; col. 23, lines 5-15: one or more semiconductor based or other integrated circuits, i.e., ICs):
determine, using a neural network and based on a three-dimensional (3D) representation of one or more scenes, a four-dimensional (4D) representation of the one or more scenes (col. 1, lines 58-67: preserve motion and texture details for conveying a vivid sense of 3D; the system uses a space-time neural radiance fields (NeRF) framework to build a 4D representation (x, y, z, t) for a dynamic scene; col. 8, lines 45-55: the system uses a space-time neural radiance field (NeRF) framework to build a 4D representation of (x, y, z, t) for the dynamic scene; a space-time representation is a 4D representation; col. 11, lines 1-15: the system represents a 4D space-time radiance field as a function that maps a spatiotemporal location (x, t) to the emitted color and volume density, where x represents the 3D location (x, y, z) and t represents time; col. 17, lines 55-65: the system collectively creates the 4D representation based on all images in the video), the 3D representation generated by a featurizer using a plurality of first image frames from video data of the one or more scenes (col. 9, lines 1-10: the system constrains the time-varying geometry of the dynamic scene representation using per-frame scene depth of the input video; col. 16, lines 24-50: the system uses volume rendering for both RGB and depth; using the volume rendering, the system determines a depth value for each 3D position along the cast ray and modulates these depth values with corresponding volume densities); and
determine, from the 4D representation, a target image having a target pose and a target time (col. 8, lines 60-67: multiple posed images; col. 17, lines 60-67: output the 4D representation; represent the spatiotemporal neural radiance or radiance fields; the system renders an image for any viewpoint, view direction, and time moment of the scene; col. 18, lines 10-20: generate an image for a particular viewpoint, view direction, and time; the system casts a ray for each and every pixel of the image; Fig. 4; col. 18, lines 30-40: use a spatiotemporal representation of a scene to generate an image of the scene for an arbitrary view position, view direction, and time; an arbitrary view position, view direction, and time are a pose and a time).
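Examiner's note: for clarity of the mapping above, the cited space-time radiance field can be summarized in standard NeRF-style notation (the notation below is the examiner's illustration and is not quoted from Xian):

    F_Θ : (x, y, z, t, d) → (c, σ)

where (x, y, z) is the 3D location, t is the time, d is the viewing direction, c is the emitted RGB color, and σ is the volume density (cf. Xian, col. 11, lines 1-15). Determining a target image for a target pose and a target time then amounts to evaluating this field along camera rays cast from the target pose at the target time.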
Regarding claim 2 (Original), Xian discloses the processor of claim 1, wherein the featurizer comprises at least one of a latent diffusion model, a flow model, or a depth model (Xian; col. 16, lines 24-50: the system uses volume rendering for both RGB and depth; modulates these depth values with corresponding volume densities; col. 19, lines 5-20: a depth; the one or more time-varying geometries in the scene may be constrained using depth maps of the series of images of the video).
Regarding claim 5 (Original), Xian discloses the processor of claim 1, wherein the one or more circuits are to apply a volume rendering to the 4D representation, according to the target pose and the target time, to retrieve the target image (Xian; col. 8, lines 55-65: the space-time NeRF framework uses and applies a continuous volume rendering method; allows the color of a pixel to be determined by integrating the radiance modulated by the volume density along the camera ray; col. 10, lines 50-60: the continuous volume rendering of the pixel colors is approximated by numerical quadrature; col. 12, lines 60-67).
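Examiner's note: the numerical quadrature cited above takes, in the standard NeRF-style formulation (examiner's illustration; the exact expression used by Xian may differ in detail), the form

    C(r) ≈ Σ_{i=1}^{N} T_i (1 − exp(−σ_i δ_i)) c_i,   where   T_i = exp(−Σ_{j<i} σ_j δ_j),

with c_i and σ_i the color and volume density sampled at the i-th point along camera ray r, and δ_i the spacing between adjacent samples. The pixel color is thus the radiance modulated by the volume density and accumulated transmittance along the ray, consistent with the cited passages of Xian.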
Regarding claim 6 (Original), Xian discloses the processor of claim 1, wherein the neural network comprises a transformer, and the one or more circuits are to update the transformer by (Xian; col. 15, lines 1-10: adjust the network parameters to minimize the corresponding loss metrics; the neural network whose parameters are adjusted corresponds to the claimed transformer):
identifying a second image frame of the video data of the one or more scenes, the second image frame having a second pose and a second time (Fig. 2A; col. 14, lines 1-10: the camera moves from left to right generating a series of images; randomly identify and sample the space and time position within the collection of images; randomly sample the time t, then another time tl, to draw the 3D coordinates and identify a second image; col. 14, lines 10-15: the system may randomly sample some 3D positions at one time and another time to determine whether the output color values are the same);
determining, from the 4D representation, an estimated image having the second pose and the second time (col. 11, lines 15-35: this loss function may minimize the photometric loss between the ground truth images and the generated images; an estimated image is a generated image with a different pose and time; Fig. 2A; col. 14, lines 40-60: generate a space-time representation of a scene; the neural network 201 generates the output data including the RGB color values, the depths, the empty space locations, and the static scene locations); and
updating the transformer according to a comparison of the estimated image with the second image frame (Xian; Fig. 2A; col. 14, lines 60-67: use the depth reconstruction loss function 214 to compare to the ground truth depth values from the depth maps 304, use the empty-space loss function 215 to compare to the ground truth empty space locations 205, and use the static scene loss function 216 to compare to the ground truth static scene locations 206; col. 15, lines 1-10: all comparison results may be fed back to the neural network 201; adjust the network parameters to minimize the corresponding loss metrics based on the comparison; col. 15, lines 20-45: the system may fix the weights for the losses).
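Examiner's note: the claimed step of updating the transformer according to a comparison of the estimated image with the second image frame corresponds to an ordinary render, compare, and back-propagate training step. The sketch below is purely illustrative (it is not code from Xian or from the instant application; names such as render_from_4d and field are hypothetical placeholders, and a simple mean-squared photometric comparison is used for illustration):

    import torch

    def update_step(field, render_from_4d, second_frame, second_pose, second_time, optimizer):
        # Determine, from the 4D representation, an estimated image having
        # the second pose and the second time.
        estimated = render_from_4d(field, second_pose, second_time)
        # Compare the estimated image with the ground-truth second image frame.
        loss = torch.mean((estimated - second_frame) ** 2)
        # Feed the comparison back and adjust the network parameters to
        # minimize the corresponding loss metric.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()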
Regarding claim 7 (Original), Xian discloses the processor of claim 6, wherein the one or more circuits are to perform the comparison of the estimated image and the second image frame according to a photometric loss function (Xian; col. 11, lines 15-35: this loss function may minimize the photometric loss between the ground truth images and the generated images; col. 14, lines 25-41: loss function; Fig. 2A; col. 14, lines 40-50: the total loss function includes a linear combination, a weighted linear combination, or any suitable combination of all four loss functions; Fig. 2A; col. 14, lines 60-67: use the depth reconstruction loss function 214 to compare to the ground truth depth values from the depth maps 304, use the empty-space loss function 215 to compare to the ground truth empty space locations 205, and use the static scene loss function 216 to compare to the ground truth static scene locations 206).
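Examiner's note: the cited weighted combination of loss terms can be summarized (examiner's illustrative notation; the weights λ and symbols are not taken from Xian) as

    L_total = λ_photo · L_photo + λ_depth · L_depth + λ_empty · L_empty + λ_static · L_static,

where L_photo is the photometric loss between the generated (estimated) images and the ground truth images, e.g., L_photo = Σ_r || Ĉ(r) − C(r) ||², and the remaining terms are the depth reconstruction, empty-space, and static scene losses compared against their respective ground truths as cited above.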
Regarding claim 8 (Original), Xian discloses the processor of claim 6, wherein the plurality of first image frames have a plurality of first time steps, and the second image frame has a second time step subsequent to the plurality of first time steps (Xian; col. 11, lines 1-10: map a spatiotemporal location (x, t) to the emitted color and volume density, where x represents the 3D location (x, y, z) and t represents time; col. 11, lines 10-21: time steps; col. 12, lines 15-30: time-varying geometry of the dynamic scene; Fig. 2A; col. 14, lines 1-20).
Regarding claim 9 (Original), Xian discloses the processor of claim 1, wherein the processor is comprised in at least one of:
a system for generating synthetic data;
a system for performing simulation operations;
a system for performing conversational AI operations;
a system for performing collaborative content creation for 3D assets;
a system comprising one or more large language models (LLMs);
a system for performing digital twin operations;
a system for performing light transport simulation;
a system for performing deep learning operations;
a system implemented using an edge device;
a system implemented using a robot;
a control system for an autonomous or semi-autonomous machine;
a perception system for an autonomous or semi-autonomous machine;
a system incorporating one or more virtual machines (VMs);
a system implemented at least partially in a data center; or
a system implemented at least partially using cloud computing resources (the limitations are recited in the alternative, so disclosure of any one of them anticipates the claim; Xian; col. 20, lines 15-25: span multiple data centers; one or more cloud components in one or more networks).
Regarding claim 14 (Original), Xian discloses a method (col. 8, lines 45-55: the system uses a space-time neural radiance field (NeRF) framework to build a space-time representation, e.g., a 4D representation of (x, y, z, t) precluding view dependency, for the dynamic scene; Fig. 4; col. 18, lines 30-40: use a spatiotemporal representation of a scene to generate an image of the scene for an arbitrary view position, view direction, and time; Fig. 5; col. 20, lines 33-55: the computer system 500 includes a processor 502, memory 504, storage 1006, an input/output interface, and a bus; processor 502 includes hardware for executing instructions) comprising:
generating, using one or more processors by a featurizer using a plurality of first image frames from video data of one or more scenes, a three-dimensional (3D) representation of the one or more scenes (col. 9, lines 1-10: the system constrains the time-varying geometry of the dynamic scene representation using per-frame scene depth of the input video; col. 16, lines 24-50: the system uses volume rendering for both RGB and depth; using the volume rendering, the system determines a depth value for each 3D position along the cast ray and modulates these depth values with corresponding volume densities);
The remaining claim limitations are similar to those recited in claim 1. Therefore, the same rationale used to reject claim 1 is also applied to reject claim 14.
Regarding claim 15 (Original), Xian discloses the method of claim 14.
The remaining claim limitations are similar to those recited in claim 2. Therefore, the same rationale used to reject claim 2 is also applied to reject claim 15.
Regarding claim 17 (Original), Xian discloses the method of claim 14, further comprising:
The remaining claim limitations are similar to those recited in claim 5. Therefore, the same rationale used to reject claim 5 is also applied to reject claim 17.
Regarding claim 18 (Original), Xian discloses the method of claim 14, further comprising:
The remaining claim limitations are similar to those recited in claim 6. Therefore, the same rationale used to reject claim 6 is also applied to reject claim 18.
Regarding claim 19 (Original), Xian discloses the method of claim 18, further comprising:
The remaining claim limitations are similar to those recited in claim 7. Therefore, the same rationale used to reject claim 7 is also applied to reject claim 19.
Regarding claim 20 (Original), Xian discloses the method of claim 18.
The remaining claim limitations are similar to those recited in claim 8. Therefore, the same rationale used to reject claim 8 is also applied to reject claim 20.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Xian (US 11748940 B1) in view of Montero (US 20250166311 A1).
Regarding claim 3 (Original), Xian discloses the processor of claim 1, wherein the featurizer is a pre-trained model configured using data (Xian; col. 16, lines 15-25: the training sample pool at each training iteration; col. 18, lines 55-65: the neural network is pre-trained based on randomly selected training samples; col. 19, lines 1-10: the neural network is pre-trained under one or more constraints; col. 19, lines 30-35: the neural network is pre-trained using a total loss function corresponding to the data).
Xian fails to explicitly disclose that the data is vehicle camera data.
In the same field of endeavor, Montero teaches that the data is vehicle camera data ([0101]: the one or more image sensors are mounted to a vehicle; [0161]: the inputs 1202 for training are obtained using sensors attached to a vehicle).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Xian such that the data is vehicle camera data, as taught by Montero. The motivation for doing so would have been to obtain the inputs 1202 for training using sensors attached to a vehicle and to evaluate the generated outputs of the view synthesis model during training, as taught by Montero in Fig. 2 and paragraphs [0161]-[0162].
Claims 4 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Xian (US 11748940 B1) in view of Zhou (US 10970518 B1).
Regarding claim 4 (Original), Xian discloses the processor of claim 1, wherein the 4D representation comprises at least one of a 4D tensor or a 4D neural radiance field (NeRF) (Xian; col. 1, lines 60-67: the system may use a space-time neural radiance fields (NeRF) framework to build a 4D representation (x, y, z, t) for a dynamic scene; col. 8, lines 45-55: the system uses a space-time neural radiance field (NeRF) framework to build a 4D representation of (x, y, z, t) for the dynamic scene; col. 17, lines 60-67: output the 4D representation in the form of a neural network that was trained to represent the spatiotemporal neural radiance or radiance fields).
Xian fails to explicitly disclose that the 3D representation comprises a 3D feature cloud.
In the same field of endeavor, Zhou teaches that the 3D representation comprises a 3D feature cloud (col. 5, lines 10-20: learn effective features from point clouds and predict accurate 3D bounding boxes; col. 5, lines 60-67: features in the 3D point cloud; col. 6, lines 10-20: determine voxel features for a plurality of voxels of the point cloud).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Xian such that the 3D representation comprises a 3D feature cloud, as taught by Zhou. The motivation for doing so would have been to learn effective features from point clouds and predict accurate 3D bounding boxes, and to improve network robustness in detecting objects from different distances and with diverse sizes, as taught by Zhou in col. 5, lines 10-20 and col. 16, lines 55-65.
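Examiner's note: for clarity of the mapping of the claimed "3D feature cloud" to Zhou's per-voxel features, the sketch below is purely illustrative (it is not code from Zhou or from the instant application; the function name and the choice of centroid features are the examiner's hypothetical example):

    import numpy as np

    def voxel_features(points, voxel_size=0.5):
        # points: (N, 3) array of 3D point coordinates from a point cloud.
        # Assign each point to a voxel of a regular 3D grid.
        voxel_idx = np.floor(points / voxel_size).astype(np.int64)
        voxels, inverse, counts = np.unique(voxel_idx, axis=0,
                                            return_inverse=True, return_counts=True)
        # Aggregate a simple per-voxel feature: the centroid of the points
        # falling in that voxel (Zhou learns richer features; this is only a sketch).
        sums = np.zeros((len(voxels), 3))
        np.add.at(sums, inverse, points)
        centroids = sums / counts[:, None]
        return voxels, centroids  # voxel indices and their associated features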
Regarding claim 16 (Original), Xian discloses the method of claim 14.
The remaining claim limitations are similar to those recited in claim 4. Therefore, the same rationale used to reject claim 4 is also applied to reject claim 16.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hai Tao Sun whose telephone number is (571)272-5630. The examiner can normally be reached 9:00AM-6:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Hajnik, can be reached at 571-272-7642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HAI TAO SUN/Primary Examiner, Art Unit 2616