Office Action Analysis: 18397081 — SCENE GENERATION USING NEURAL RADIANCE FIELDS

Examiner Intelligence

WEI, XIAOMING View full profile →
Grants 83% — above average
Career Allowance Rate
30 granted / 36 resolved
+21.3% vs TC avg
Strong +24% interview lift
Without
With
+24.0%
Interview Lift
resolved cases with interview
Typical timeline
2y 4m
Avg Prosecution
21 currently pending
Career history
60
Total Applications
across all art units
Statute-Specific Performance

§103
98.9%
+58.9% vs TC avg
§102
1.1%
-38.9% vs TC avg
Black line = Tech Center average estimate • Based on career data from 36 resolved cases
Office Action

§103
DETAILED ACTION

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Response to Amendment
The office action is in response to Applicant’s amendment filed 01/28/2026 which has been entered and made of record. Claims 1-2, 8 and 15 have been amended. No claim has been newly added. Claims 1-20 are pending in the application. 

Response to Arguments
Applicant’s arguments, filed 01/28/2026, with respect to the rejection(s) under 35 U.S.C. 103 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Sharma and Li as fully explained below.
 
Applicant argues Sharma, Pumarola and Irshad, taken individually or in combination, do not teach the newly amended independent claims.
Examiner agrees Sharma, Pumarola and Irshad do not teach the newly amended independent claims. However, a new ground of rejection is made in view of Sharma and Li.
 
Conclusions: The rejections set in the previous Office Action are shown to have been proper, and the claims are rejected below. New citations and parenthetical remarks can be considered new grounds of rejection and such new grounds of rejection are necessitated by the Applicant's amendments to the claims. Therefore, the present Office Action is made final.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s)  1-3, 8-10, 15-17 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sharma et al. (US 20240005627 A1), hereinafter as Sharma, in view of Li et al. (“Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes”), hereinafter as Li.
Regarding claim 1, Sharma teaches A method of generating one or more images (Sharma paragraph [0086] “Some aspects of the present disclosure render images from novel camera views via differentiable volume rendering.”) comprising:
accessing a three-dimensional (3D) representation of an environment (Sharma teaches a 3D neural scene as the 3D representation of an environment. paragraph [0054] “Some aspects of the present disclosure are directed to observing unlabeled multi-view videos at training time for learning to map a single image observation of a complex scene, such as a street with cars, to a 3D neural scene representation. According to these aspects of the present disclosure, the 3D neural scene representation is disentangled into movable and immovable parts while completing a 3D structure”); determining one or more static features and one or more dynamic features of the 3D representation (Sharma teaches immovable scene part as static feature of the static ground plane, further teaches movable scene parts as dynamic feature on dynamic ground plane. paragraph [0054] “Some aspects of the present disclosure separately parameterize movable and immovable scene parts via 2D neural ground planes. For example, these ground planes are implemented as 2D grids of features aligned with the ground plane that can be locally decoded into 3D neural radiance fields.” And paragraph [0083] “As shown in FIG. 6C, the resulting 2D grid of features represented by the entangled neural ground plane 670 is decomposed and separated into a static ground plane 680 and a dynamic ground plane 690 using a 2D CNN 672.”); determining, using the one or more static features, one or more static density values (Sharma paragraph [0091] “decoding the query points using both the static ground plane 680 and the dynamic ground plane 690 yields two sets of values (density, color) for each point. As shown in FIG. 7C, the contribution from static and dynamic components are composed along the ray. Given the color and density for static (c.sup.S, σ.sup.S)”); determining, using the one or more dynamic feature, one or more dynamic density values (Sharma paragraph [0091] “decoding the query points using both the static ground plane 680 and the dynamic ground plane 690 yields two sets of values (density, color) for each point. As shown in FIG. 7C, the contribution from static and dynamic components are composed along the ray……dynamic (c.sup.D, σ.sup.D) parts”); 
Sharma fails to teach predicting a forward flow vector and a backward flow vector based on the one or more dynamic density values; temporally aggregating dynamic features using the predicted forward flow vector and backward flow vector and generating the one or more images based on the one or more static density values, the one or more dynamic density values, and the aggregated dynamic features. Li teaches predicting a forward flow vector and a backward flow vector based on the one or more dynamic density values (Li teaches predicting forward and backward flow vectors, further teaches using multiple loss functions based on density values to train forward and backward flows, Page 3, Left Column, Third Paragraph, “To capture scene dynamics, we extend the static scenario described in Eq. 1 by including time in the domain and explicitly modeling 3D motion as dense scene flow fields. For a given 3D point x and time i, the model predicts not just reflectance and opacity, but also forward and backward 3D scene flow Fi = (fi→i+1,fi→i−1), which denote 3D offset vectors that point to the position of x at times i + 1 and i − 1 respectively.”, Page 3, Right Column, First Paragraph, “we achieve this by warping each 3D sampled point location xi along a ray ri during volume tracing using the predicted scene flows fields Fi to look up the RGB color cj and opacity σj from neighboring time j. This yields a rendered image, denoted ˆ Cj→i, of the scene at time j with both camera and scene motion warped to time i: 
    PNG
    media_image1.png
    112
    563
    media_image1.png
    Greyscale

We minimize the mean squared error (MSE) between each warped rendered view and the ground truth view: 
    PNG
    media_image2.png
    68
    510
    media_image2.png
    Greyscale
”); temporally aggregating dynamic features using the predicted forward flow vector and backward flow vector (Li teaches combining forward and backward flow vector with time in the dynamic scene, Page 3, Left Column, Last paragraph, “Our dynamic model is thus defined as:                         
                            
                                            c
                                        
                                            i
                                        
                                    ,
                                    
                                            σ
                                        
                                            i
                                        
                                    ,
                                    
                                            F
                                        
                                            i
                                        
                                    ,
                                    
                                            W
                                        
                                            i
                                        
                            =
                             
                                    F
                                
                                    θ
                                
                                    d
                                    y
                                
                            (
                            x
                            ,
                            d
                            ,
                            i
                            )
                        
                     Note that for convenience, we use the subscript i to indicate a value at a specific time i.”, where Fi = (fi→i+1,fi→i−1) stands for forward and backward flows) and generating the one or more images based on the one or more static density values, the one or more dynamic density values, and the aggregated dynamic features ( Li teaches combining dynamic scene with static scene, and using volume rendering to combine static density, and dynamic density from scene flow, Page 5, Left Column, Third Paragraph, “We model each representation with its own MLP, where the dynamic scene component is represented with Eq. 4                         
                            
                                            c
                                        
                                            i
                                        
                                    ,
                                    
                                            σ
                                        
                                            i
                                        
                                    ,
                                    
                                            F
                                        
                                            i
                                        
                                    ,
                                    
                                            W
                                        
                                            i
                                        
                            =
                             
                                    F
                                
                                    θ
                                
                                    d
                                    y
                                
                            (
                            x
                            ,
                            d
                            ,
                            i
                            )
                        
                    , and the static one is represented as a variant of Eq. 1                         
                            
                                    c
                                    ,
                                     
                                    σ
                                
                            =
                            
                                    F
                                
                                    θ
                                
                            (
                            x
                            ,
                            d
                            )
                        
                    ,                         
                            
                                    C
                                    ,
                                    σ
                                    ,
                                    v
                                
                            =
                            
                                    F
                                
                                    θ
                                
                                    s
                                    t
                                
                                    x
                                    ,
                                     
                                    d
                                
                            ,
                             
                    Eq. 12 where v is an unsupervised 3D blending weight field, that linearly blends the RGBσ from static and dynamic scene representations along each ray. Page 3, Right Column, Figure 2, “Scene flow fields warping. To render a frame at time i, we perform volume tracing along ray ri with RGBσ at time i, giving us the pixel color Cˆ i(ri) (left). To warp the scene from time j to i, we offset each step along ri using scene flow fi→j and volume trace with the associated color and opacity (cj , σj ) (right).”).
	Sharma and Li are in the same field of endeavor, namely computer graphics, especially in the field of image generation using neural radiance field. Li teaches a neural network based static and dynamic feature rendering system to achieve better rendering result (Li Page 1, Left Column, Abstract, “We show that our representation can be used for complex dynamic scenes, including thin structures, view-dependent effects, and natural degrees of motion. We conduct a number of experiments that demonstrate our approach significantly outperforms recent monocular view synthesis methods, and show qualitative results of space-time view synthesis on a variety of real-world videos.”).Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Li with the method of Sharma to achieve better rendering results.

Regarding claim 2, Sharma in view of Li teaches the method of claim 1, The method of claim 1, and further teach further comprising generating one or more color values based on the one or more static features and the one or more dynamic features (Sharma paragraph [0091] “decoding the query points using both the static ground plane 680 and the dynamic ground plane 690 yields two sets of values (density, color) for each point.”); and generating the one or more images using the one or more color values (Sharma paragraph [0091] “Given the color and density for static (c.sup.S, σ.sup.S) and dynamic (c.sup.D, σ.sup.D) parts, the density of the combined scene is calculated as σ.sup.S+σ.sup.D. The color at the sampled point is computed as a weighted linear combination w.sup.Sc.sup.S+w.sup.Dc.sup.D where w.sup.S=(1−exp(−δσ.sup.S))/(1−exp(−δ(σ.sup.S+σ.sup.D)), w.sup.D=(1−exp(−δσ.sup.D))/(1−exp(−δ(σ.sup.S+σ.sup.D)), and δ is the distance between adjacent samples on the camera ray.”).

Regarding claim 3, Sharma in view of Li teaches the method of claim 1, The method of claim 1, and further teach wherein the 3D representation of the environment is a neural radiance field (NeRF) (Sharma paragraph [0054] “the 3D neural scene representation is disentangled into movable and immovable parts while completing a 3D structure. Some aspects of the present disclosure separately parameterize movable and immovable scene parts via 2D neural ground planes. For example, these ground planes are implemented as 2D grids of features aligned with the ground plane that can be locally decoded into 3D neural radiance fields.”).

Regarding claim 8, Sharma teaches A non-transitory computer readable storage medium storing thereon executable instructions that, as a result of being executed by one or more processors of a computer system (Sharma paragraph [0007] “A non-transitory computer-readable medium having program code recorded thereon of conditional neural ground planes for static-dynamic disentanglement is described. The program code is executed by a processor. The non-transitory computer-readable medium includes program code to extract, using a convolutional neural network (CNN), CNN image features from an image to form a feature tensor.”), cause the computer system to: access a three-dimensional (3D) representation of an environment (Sharma teaches a 3D neural scene as the 3D representation of an environment. Paragraph [0054] “Some aspects of the present disclosure are directed to observing unlabeled multi-view videos at training time for learning to map a single image observation of a complex scene, such as a street with cars, to a 3D neural scene representation. According to these aspects of the present disclosure, the 3D neural scene representation is disentangled into movable and immovable parts while completing a 3D structure”); determine one or more static features and one or more dynamic features of the 3D representation (Sharma teaches immovable scene part as static feature of the static ground plane, further teaches movable scene parts as dynamic feature on dynamic ground plane. paragraph [0054] “Some aspects of the present disclosure separately parameterize movable and immovable scene parts via 2D neural ground planes. For example, these ground planes are implemented as 2D grids of features aligned with the ground plane that can be locally decoded into 3D neural radiance fields.” And paragraph [0083] “As shown in FIG. 6C, the resulting 2D grid of features represented by the entangled neural ground plane 670 is decomposed and separated into a static ground plane 680 and a dynamic ground plane 690 using a 2D CNN 672.”); determine, using the one or more static features, one or more static density values (Sharma paragraph [0091] “decoding the query points using both the static ground plane 680 and the dynamic ground plane 690 yields two sets of values (density, color) for each point. As shown in FIG. 7C, the contribution from static and dynamic components are composed along the ray. Given the color and density for static (c.sup.S, σ.sup.S)”); determine, using the one or more dynamic feature, one or more dynamic density values (Sharma paragraph [0091] “decoding the query points using both the static ground plane 680 and the dynamic ground plane 690 yields two sets of values (density, color) for each point. As shown in FIG. 7C, the contribution from static and dynamic components are composed along the ray……dynamic (c.sup.D, σ.sup.D) parts”); 
Sharma fails to teach predict a forward flow vector and a backward flow vector based on the one or more dynamic density values, temporally aggregate dynamic features using the predicted forward flow vector and backward flow vector and generate one or more images based on the one or more static density values, the one or more dynamic density values, and the aggregated dynamic features. Li teaches predict a forward flow vector and a backward flow vector based on the one or more dynamic density values (Li teaches predicting forward and backward flow vectors, further teaches using multiple loss functions based on density values to train forward and backward flows, Page 3, Left Column, Third Paragraph, “To capture scene dynamics, we extend the static scenario described in Eq. 1 by including time in the domain and explicitly modeling 3D motion as dense scene flow fields. For a given 3D point x and time i, the model predicts not just reflectance and opacity, but also forward and backward 3D scene flow Fi = (fi→i+1,fi→i−1), which denote 3D offset vectors that point to the position of x at times i + 1 and i − 1 respectively.”, Page 3, Right Column, First Paragraph, “we achieve this by warping each 3D sampled point location xi along a ray ri during volume tracing using the predicted scene flows fields Fi to look up the RGB color cj and opacity σj from neighboring time j. This yields a rendered image, denoted ˆ Cj→i, of the scene at time j with both camera and scene motion warped to time i: 
    PNG
    media_image1.png
    112
    563
    media_image1.png
    Greyscale

We minimize the mean squared error (MSE) between each warped rendered view and the ground truth view: 
    PNG
    media_image2.png
    68
    510
    media_image2.png
    Greyscale
”); temporally aggregate dynamic features using the predicted forward flow vector and backward flow vector (Li teaches combining forward and backward flow vector with time in the dynamic scene, Page 3, Left Column, Last paragraph, “Our dynamic model is thus defined as:                         
                            
                                            c
                                        
                                            i
                                        
                                    ,
                                    
                                            σ
                                        
                                            i
                                        
                                    ,
                                    
                                            F
                                        
                                            i
                                        
                                    ,
                                    
                                            W
                                        
                                            i
                                        
                            =
                             
                                    F
                                
                                    θ
                                
                                    d
                                    y
                                
                            (
                            x
                            ,
                            d
                            ,
                            i
                            )
                        
                     Note that for convenience, we use the subscript i to indicate a value at a specific time i.”, where Fi = (fi→i+1,fi→i−1) stands for forward and backward flows) and generate one or more images based on the one or more static density values, the one or more dynamic density values, and the aggregated dynamic features ( Li teaches combining dynamic scene with static scene, and using volume rendering to combine static density, and dynamic density from scene flow, Page 5, Left Column, Third Paragraph, “We model each representation with its own MLP, where the dynamic scene component is represented with Eq. 4                         
                            
                                            c
                                        
                                            i
                                        
                                    ,
                                    
                                            σ
                                        
                                            i
                                        
                                    ,
                                    
                                            F
                                        
                                            i
                                        
                                    ,
                                    
                                            W
                                        
                                            i
                                        
                            =
                             
                                    F
                                
                                    θ
                                
                                    d
                                    y
                                
                            (
                            x
                            ,
                            d
                            ,
                            i
                            )
                        
                    , and the static one is represented as a variant of Eq. 1                         
                            
                                    c
                                    ,
                                     
                                    σ
                                
                            =
                            
                                    F
                                
                                    θ
                                
                            (
                            x
                            ,
                            d
                            )
                        
                    ,                         
                            
                                    C
                                    ,
                                    σ
                                    ,
                                    v
                                
                            =
                            
                                    F
                                
                                    θ
                                
                                    s
                                    t
                                
                                    x
                                    ,
                                     
                                    d
                                
                            ,
                             
                    Eq. 12 where v is an unsupervised 3D blending weight field, that linearly blends the RGBσ from static and dynamic scene representations along each ray. Page 3, Right Column, Figure 2, “Scene flow fields warping. To render a frame at time i, we perform volume tracing along ray ri with RGBσ at time i, giving us the pixel color Cˆ i(ri) (left). To warp the scene from time j to i, we offset each step along ri using scene flow fi→j and volume trace with the associated color and opacity (cj , σj ) (right).”).
	Sharma and Li are in the same field of endeavor, namely computer graphics, especially in the field of image generation using neural radiance field. Li teaches a neural network based static and dynamic feature rendering system to achieve better rendering result (Li Page 1, Left Column, Abstract, “We show that our representation can be used for complex dynamic scenes, including thin structures, view-dependent effects, and natural degrees of motion. We conduct a number of experiments that demonstrate our approach significantly outperforms recent monocular view synthesis methods, and show qualitative results of space-time view synthesis on a variety of real-world videos.”).Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Li with the method of Sharma to achieve better rendering results.

Regarding claim 9, claim 9 has similar limitations as claim 2, therefore it is rejected under the same rationale as claim 2.

Regarding claim 10, claim 10 has similar limitations as claim 3, therefore it is rejected under the same rationale as claim 3.

Regarding claim 15, Sharma teaches A system comprising: one or more processors to (Sharma paragraph [0037] “The static-dynamic object disentanglement system 300 includes the processor 320 coupled to the computer-readable medium 322. The processor 320 performs processing, including the execution of software stored on the computer-readable medium 322 to provide static-dynamic object disentanglement functionality based on single images”): access a three-dimensional (3D) representation of an environment (Sharma teaches a 3D neural scene as the 3D representation of an environment. Paragraph [0054] “Some aspects of the present disclosure are directed to observing unlabeled multi-view videos at training time for learning to map a single image observation of a complex scene, such as a street with cars, to a 3D neural scene representation. According to these aspects of the present disclosure, the 3D neural scene representation is disentangled into movable and immovable parts while completing a 3D structure”); determine one or more static features and one or more dynamic features of the 3D representation (Sharma teaches immovable scene part as static feature of the static ground plane, further teaches movable scene parts as dynamic feature on dynamic ground plane. paragraph [0054] “Some aspects of the present disclosure separately parameterize movable and immovable scene parts via 2D neural ground planes. For example, these ground planes are implemented as 2D grids of features aligned with the ground plane that can be locally decoded into 3D neural radiance fields.” And paragraph [0083] “As shown in FIG. 6C, the resulting 2D grid of features represented by the entangled neural ground plane 670 is decomposed and separated into a static ground plane 680 and a dynamic ground plane 690 using a 2D CNN 672.”); determine, using the one or more static features, one or more static density values (Sharma paragraph [0091] “decoding the query points using both the static ground plane 680 and the dynamic ground plane 690 yields two sets of values (density, color) for each point. As shown in FIG. 7C, the contribution from static and dynamic components are composed along the ray. Given the color and density for static (c.sup.S, σ.sup.S)”); determine, using the one or more dynamic feature, one or more dynamic density values (Sharma paragraph [0091] “decoding the query points using both the static ground plane 680 and the dynamic ground plane 690 yields two sets of values (density, color) for each point. As shown in FIG. 7C, the contribution from static and dynamic components are composed along the ray……dynamic (c.sup.D, σ.sup.D) parts”); 
Sharma fails to teach predict a forward flow vector and a backward flow vector based on the one or more dynamic density values, temporally aggregate dynamic features using the predicted forward flow vector and backward flow vector; and generate one or more images based on the one or more static density values, the one or more dynamic density values, and the aggregated dynamic features. Li teaches predict a forward flow vector and a backward flow vector based on the one or more dynamic density values (Li teaches predicting forward and backward flow vectors, further teaches using multiple loss functions based on density values to train forward and backward flows, Page 3, Left Column, Third Paragraph, “To capture scene dynamics, we extend the static scenario described in Eq. 1 by including time in the domain and explicitly modeling 3D motion as dense scene flow fields. For a given 3D point x and time i, the model predicts not just reflectance and opacity, but also forward and backward 3D scene flow Fi = (fi→i+1,fi→i−1), which denote 3D offset vectors that point to the position of x at times i + 1 and i − 1 respectively.”, Page 3, Right Column, First Paragraph, “we achieve this by warping each 3D sampled point location xi along a ray ri during volume tracing using the predicted scene flows fields Fi to look up the RGB color cj and opacity σj from neighboring time j. This yields a rendered image, denoted ˆ Cj→i, of the scene at time j with both camera and scene motion warped to time i: 
    PNG
    media_image1.png
    112
    563
    media_image1.png
    Greyscale

We minimize the mean squared error (MSE) between each warped rendered view and the ground truth view: 
    PNG
    media_image2.png
    68
    510
    media_image2.png
    Greyscale
”);, temporally aggregate dynamic features using the predicted forward flow vector and backward flow vector (Li teaches combining forward and backward flow vector with time in the dynamic scene, Page 3, Left Column, Last paragraph, “Our dynamic model is thus defined as:                         
                            
                                            c
                                        
                                            i
                                        
                                    ,
                                    
                                            σ
                                        
                                            i
                                        
                                    ,
                                    
                                            F
                                        
                                            i
                                        
                                    ,
                                    
                                            W
                                        
                                            i
                                        
                            =
                             
                                    F
                                
                                    θ
                                
                                    d
                                    y
                                
                            (
                            x
                            ,
                            d
                            ,
                            i
                            )
                        
                     Note that for convenience, we use the subscript i to indicate a value at a specific time i.”, where Fi = (fi→i+1,fi→i−1) stands for forward and backward flows); and generate one or more images based on the one or more static density values, the one or more dynamic density values, and the aggregated dynamic features ( Li teaches combining dynamic scene with static scene, and using volume rendering to combine static density, and dynamic density from scene flow, Page 5, Left Column, Third Paragraph, “We model each representation with its own MLP, where the dynamic scene component is represented with Eq. 4                         
                            
                                            c
                                        
                                            i
                                        
                                    ,
                                    
                                            σ
                                        
                                            i
                                        
                                    ,
                                    
                                            F
                                        
                                            i
                                        
                                    ,
                                    
                                            W
                                        
                                            i
                                        
                            =
                             
                                    F
                                
                                    θ
                                
                                    d
                                    y
                                
                            (
                            x
                            ,
                            d
                            ,
                            i
                            )
                        
                    , and the static one is represented as a variant of Eq. 1                         
                            
                                    c
                                    ,
                                     
                                    σ
                                
                            =
                            
                                    F
                                
                                    θ
                                
                            (
                            x
                            ,
                            d
                            )
                        
                    ,                         
                            
                                    C
                                    ,
                                    σ
                                    ,
                                    v
                                
                            =
                            
                                    F
                                
                                    θ
                                
                                    s
                                    t
                                
                                    x
                                    ,
                                     
                                    d
                                
                            ,
                             
                    Eq. 12 where v is an unsupervised 3D blending weight field, that linearly blends the RGBσ from static and dynamic scene representations along each ray. Page 3, Right Column, Figure 2, “Scene flow fields warping. To render a frame at time i, we perform volume tracing along ray ri with RGBσ at time i, giving us the pixel color Cˆ i(ri) (left). To warp the scene from time j to i, we offset each step along ri using scene flow fi→j and volume trace with the associated color and opacity (cj , σj ) (right).”).
	Sharma and Li are in the same field of endeavor, namely computer graphics, especially in the field of image generation using neural radiance field. Li teaches a neural network based static and dynamic feature rendering system to achieve better rendering result (Li Page 1, Left Column, Abstract, “We show that our representation can be used for complex dynamic scenes, including thin structures, view-dependent effects, and natural degrees of motion. We conduct a number of experiments that demonstrate our approach significantly outperforms recent monocular view synthesis methods, and show qualitative results of space-time view synthesis on a variety of real-world videos.”).Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Li with the method of Sharma to achieve better rendering results.

Regarding claim 16, claim 16 has similar limitations as claim 2, therefore it is rejected under the same rationale as claim 2.

Regarding claim 17, claim 17 has similar limitations as claim 3, therefore it is rejected under the same rationale as claim 3.

Regarding claim 20, Sharma in view of Li teaches the system of claim 15, the system of claim 15, further teaches wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a first system for performing simulation operations; a second system for performing deep learning operations; a third system implemented using an edge device; a fourth system implemented using a robot; a fifth system incorporating one or more virtual machines (VMs); a sixth system implemented at least partially in a data center; a seventh system for performing digital twin operations; an eighth system for performing light transport simulation; a nineth system for performing collaborative content creation for 3D assets; a tenth system for performing conversational Artificial Intelligence operations; an eleventh system for generating synthetic data; a twelfth system for implementing a web-hosted service for detecting program workload inefficiencies; an application as an application programming interface ("API"); a thirteenth system implemented at least partially using cloud computing resources; a fourteenth system for presenting one or more of virtual reality content, augmented reality content, or mixed reality content; or a fifteenth system implementing one or more large language models (LLMs) (Sharma FIG. 3, paragraph [0034] “The static-dynamic object disentanglement system 300 may be a component of a vehicle, a robotic device, or other device. For example, as shown in FIG. 3, the static-dynamic object disentanglement system 300 is a component of the car 350. Aspects of the present disclosure are not limited to the static-dynamic object disentanglement system 300 being a component of the car 350, as other devices, such as a bus, motorcycle, or other like vehicle, are also contemplated for using the static-dynamic object disentanglement system 300. The car 350 may be autonomous or semi-autonomous.”).

Claim(s)  5-6, 12, 13 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sharma et al. (US 20240005627 A1), hereinafter as Sharma, in view of Li et al. (“Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes”), hereinafter as Li, and further in view of NPL Pumarola et al. (“D-NeRF: Neural Radiance Fields for Dynamic Scenes”), hereinafter as Pumarola. 
Regarding claim 5, Sharma in view of Li teaches the method of claim 1, The method of claim 1, but fails to explicitly teach wherein the one or more static features are determined using a feature encoder associated with a position within the environment. Pumarola teaches wherein the one or more static features are determined using a feature encoder associated with a position within the environment (Pumarola Page 4, Left column, fourth paragraph and Right column, second paragraph, “On the one hand we have the Canonical Network, an MLP (multilayer perceptron) Ψx(x, d) → (c, σ) is trained to encode the scene in the canonical configuration such that given a 3D point x and a view direction d returns its emitted color c and volume density σ.” And “The canonical network Ψx is trained so as to encode volumetric density and color of the scene in canonical configuration. Concretely, given the 3D coordinates x of a point, we first encode it into a 256-dimensional feature vector. This feature vector is then concatenated with the camera viewing direction d, and propagated through a fully connected layer to yield the emitted color c and volume density σ for that given point in the canonical space.”).
Sharma, Li and Pumarola are in the same field of endeavor, namely computer graphics, especially in the field of image generation using neural radiance field. Pumarola teaches dynamic-NeRF to decompose learning into two modules in order to render dynamic changing neural radiance field with high-quality images (Pumarola Page 2, Left Column, fourth paragraph, “We show that by decomposing learning into a canonical scene and scene flow D-NeRF is able to render high-quality images while controlling both camera view and time components.”).Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Pumarola with the method of Sharma and Li to achieve high quality rendering images with dynamic changing scenes.

Regarding claim 6, Sharma in view of Li teaches the method of claim 1, The method of claim 1, but fails to explicitly teach wherein the one or more dynamic features are determined using a feature encoder associated with a position and indication of time associated with the 3D representation of the environment. Pumarola teaches wherein the one or more dynamic features are determined using a feature encoder associated with a position and indication of time associated with the 3D representation of the environment (Pumarola Page 4, Left column, fourth paragraph and Right column, third paragraph, “The second module is called Deformation Network and consists of another MLP Ψt(x, t) → ∆x which predicts a deformation field defining the transformation between the scene at time t and the scene in its canonical configuration.” And “Formally, given a 3D point x at time t, Ψt is trained to output the displacement ∆x that transforms the given point to its position in the canonical space as x + ∆x….. for both the canonical and the deformation networks, we first encode x, d and t into a higher dimension space. We use the same positional encoder”).
Sharma, Li and Pumarola are in the same field of endeavor, namely computer graphics, especially in the field of image generation using neural radiance field. Pumarola teaches dynamic-NeRF to decompose learning into two modules in order to render dynamic changing neural radiance field with high-quality images (Pumarola Page 2, Left Column, fourth paragraph, “We show that by decomposing learning into a canonical scene and scene flow D-NeRF is able to render high-quality images while controlling both camera view and time components.”).Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Pumarola with the method of Sharma and Li to achieve high quality rendering images with dynamic changing scenes.

Regarding claim 12, claim 12 has similar limitations as claim 5, therefore it is rejected under the same rationale as claim 5.

Regarding claim 13, claim 13 has similar limitations as claim 6, therefore it is rejected under the same rationale as claim 6.

Regarding claim 19, claim 19 has similar limitations as claim 6, therefore it is rejected under the same rationale as claim 6.

Claim(s)  4, 7, 11, 14 and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sharma et al. (US 20240005627 A1), hereinafter as Sharma, in view of Li et al. (“Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes”), hereinafter as Li, and further in view of Irshad et al. (US 20240171724 A1), hereinafter as Irshad. 
Regarding claim 4, Sharma in view of Li teaches the method of claim 1, The method of claim 1, but fails to teach further comprising determining a viewing direction associated with an autonomous machine; and using the viewing direction to generate the one or more images. Irshad teaches further comprising determining a viewing direction associated with an autonomous machine (Irshad teaches an autonomous vehicle with sensor system, further teaches the sensor system can determine the position and orientation change of vehicle, which includes the change of viewing direction of the autonomous vehicle. Irshad paragraph [0046-0052] “Accordingly the electronic control unit 50 of the vehicle 2 for example can include one or more autonomous driving module(s) 160. The autonomous driving module(s) 160 can be configured to receive data from the sensor system 52 and/or any other type of system capable of capturing information relating to the vehicle 2 and/or the external environment of the vehicle 2…… vehicle sensor(s) 52 can detect, determine, and/or sense information about the vehicle 2 itself, or can be configured to detect, and/or sense position and orientation changes of the vehicle 2, such as, for example, based on inertial acceleration……The NeO 360 system 170 can receive sensor data 250 from one or more sensors 52 and provide neural fields for sparse novel view synthesis of outdoor scenes” and paragraph [0076] “The radiance field decoder D, also referred to herein as NeRF decoder 268, is tasked with predicting a color c and density a for any arbitrary 3D location x and viewing direction d from triplanes S and residual features f.sub.r.”); and using the viewing direction to generate the one or more images (Irshad paragraph [0101] “In one example the decoder 268 predicts a color and density for an arbitrary 3D location and a viewing direction from the triplanar representation. The decoder 268 uses near and far rendering MLPs to decode color and density used to render the local and global feature representations of the novel scene. The near and far rendering MLPs output density and color for a 3D point and a viewing direction. The novel scene is rendered in a 360 degree view.”).
Sharma, Li and Irshad are in the same field of endeavor, namely computer graphics, especially in the field of image generation using neural radiance field. Irshad teaches a neural radiance field based system to generate new views and novel scenes in autonomous vehicle navigation to achieve better rendering result (Irshad paragraph [0087] “the present disclosure shows that the generated volume can also be employed in a computationally efficient way to estimate the entire scene's appearance and enable accurate neural rendering”).Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Irshad with the method of Sharma and Li to achieve high quality rendering results.

Regarding claim 7, Sharma in view of Li teaches the method of claim 1, The method of claim 1, but fails to explicitly teach further comprising using one or more neural networks to generate the one or more images. Irshad teaches further comprising using one or more neural networks to generate the one or more images (Irshad paragraph [0098] “Step 604 includes encoding with the encoder 254 the at least one inputted posed RGB image. In step 606, the at least one encoded RGB image is output from the encoder 254 and input into a far multi-layer perceptron (MLP) (e.g., far MLP 258 of FIG. 2, 4, or 5) for representing background images and a near multi-layer perceptron (MLP) (e.g., near MLP 260 of FIG. 2, 4, or 5) for representing foreground images. In an example the far MLP 258 and the near MLP 260 are neural networks.”).
Sharma, Li and Irshad are in the same field of endeavor, namely computer graphics, especially in the field of image generation using neural radiance field. Irshad teaches a neural radiance field based system to generate new views and novel scenes in autonomous vehicle navigation to achieve better rendering result (Irshad paragraph [0087] “the present disclosure shows that the generated volume can also be employed in a computationally efficient way to estimate the entire scene's appearance and enable accurate neural rendering”).Therefore, it would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Irshad with the method of Sharma and Li to achieve high quality rendering results.

Regarding claim 11, claim 11 has similar limitations as claim 4, therefore it is rejected under the same rationale as claim 4.

Regarding claim 14, claim 14 has similar limitations as claim 7, therefore it is rejected under the same rationale as claim 7.

Regarding claim 18, claim 18 has similar limitations as claim 4, therefore it is rejected under the same rationale as claim 4.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAOMING WEI whose telephone number is (571)272-3831. The examiner can normally be reached M-F 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/XIAOMING WEI/Examiner, Art Unit 2611                                                                                                                                                                                                        

/KEE M TUNG/Supervisory Patent Examiner, Art Unit 2611
Read full office action
Prosecution Timeline

Dec 27, 2023
Application Filed
Aug 28, 2025
Non-Final Rejection mailed — §103
Oct 06, 2025
Examiner Interview Summary
Oct 06, 2025
Applicant Interview (Telephonic)
Jan 27, 2026
Response Filed
Mar 12, 2026
Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/518,941
Patent 12633010
System and Method for Single-pass Path Rendering Using Coverage Counting
2y 5m to grant Granted May 19, 2026
18/280,302
Patent 12626670
IMAGE DISPLAY METHODS AND APPARATUSES
2y 8m to grant Granted May 12, 2026
18/317,890
Patent 12620187
SYSTEMS, METHODS, AND USER INTERFACES FOR GENERATING A THREE-DIMENSIONAL VIRTUAL REPRESENTATION OF AN OBJECT
2y 11m to grant Granted May 05, 2026
18/532,291
Patent 12614335
METHOD AND SYSTEM FOR DISTRIBUTED REAL-TIME RENDERING
2y 4m to grant Granted Apr 28, 2026
18/157,183
Patent 12603064
CIRCUIT AND METHOD FOR VIDEO DATA CONVERSION AND DISPLAY DEVICE
3y 2m to grant Granted Apr 14, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
83%
Grant Probability
99%
With Interview (+24.0%)
2y 4m (~0m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 36 resolved cases by this examiner. Grant probability derived from career allowance rate.
SCENE GENERATION USING NEURAL RADIANCE FIELDS

Interview Optional

Examiner Intelligence

Statute-Specific Performance

Office Action

Prosecution Timeline

Precedent Cases

Applications granted by this same examiner with similar technology

Strategy Recommendation AI-generated — please review before filing

Prosecution Projections

Ready to respond to this office action?

SCENE GENERATION USING NEURAL RADIANCE FIELDS

Interview Optional

Examiner Intelligence

Statute-Specific Performance

Office Action

Prosecution Timeline

Precedent Cases

Applications granted by this same examiner with similar technology

Strategy Recommendation AI-generated — please review before filing

Prosecution Projections

Ready to respond to this office action?

Sign in with your work email