Prosecution Insights
Last updated: April 19, 2026
Application No. 18/275,332

RENDERING NEW IMAGES OF SCENES USING GEOMETRY-AWARE NEURAL NETWORKS CONDITIONED ON LATENT VARIABLES

Current Status: Non-Final Office Action (§103)
Filed: Aug 01, 2023
Examiner: AHN, CHRISTINE YERA
Art Unit: 2615
Tech Center: 2600 — Communications
Assignee: DeepMind Technologies Limited
OA Round: 3 (Non-Final)

Predicted Outcome:
Grant Probability: 69% (Favorable)
Expected OA Rounds: 3-4
Estimated Time to Grant: 2y 7m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 69% (11 granted / 16 resolved; +6.8% vs Tech Center average, above average)
Interview Lift: +37.5% (allow rate on resolved cases with an interview vs. without)
Typical Timeline: 2y 7m average prosecution; 34 applications currently pending
Career History: 50 total applications across all art units
Statute-Specific Performance

§101: 5.2% (-34.8% vs TC avg)
§102: 21.9% (-18.1% vs TC avg)
§103: 49.6% (+9.6% vs TC avg)
§112: 20.1% (-19.9% vs TC avg)

Tech Center averages are estimates. Based on career data from 16 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

2. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on January 28, 2026 has been entered.

Response to Amendment

3. The amendment filed January 28, 2026 has been entered. Claims 1, 3-17, and 19-22 remain pending in the application. Applicant's amendments to the Drawings have overcome each and every objection.

Response to Arguments

4. Applicant's arguments filed January 28, 2026 have been fully considered but they are not persuasive.

5. Applicant argues that Rezende et al. (U.S. Patent Application Publication No. 2019/0258907 A1), hereinafter referred to as Rezende, and Schwarz et al. ("GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis"), hereinafter referred to as Schwarz, fail to teach, at each time step, "evaluating an objective function for the time step that measures an error of images rendered using a scene representation neural network as conditioned on the current latent variable for the time step" and "determining updated parameters of the probability distribution over the latent space using a refinement neural network by processing a network input to the refinement neural network comprising: (i) a representation of the scene generated by processing the plurality of observations using the encoding neural network and (ii) gradients of the objective function for the time step with respect to the parameters of the probability distribution of latent variables," as recited in amended claims 1, 16, and 17. The Examiner replies that Rezende, Paragraphs 76-77 and Figure 4 step 408, teaches that a latent variable is sampled at each time step. Rezende then teaches in Paragraph 72 evaluating a loss function that measures an error of images rendered using the sampled latent variable for that time step. Rezende also teaches in Paragraph 76 a "latent variable neural network … generate as output a set of sufficient statistics of the prior distribution". The latent variable neural network corresponds to the claimed refinement neural network, the prior distribution corresponds to the probability distribution over the latent space, and the set of sufficient statistics corresponds to the parameters of that distribution. Therefore, this teaches determining updated parameters of the probability distribution using a refinement neural network. Paragraph 76 further teaches that the latent variable neural network takes as input the state of the recurrent neural network and a set of latent variable neural network parameters. Paragraph 50 teaches using the encoding neural network to create the scene representations, and Paragraph 51 teaches that the recurrent neural network final state is the semantic representation. Thus, the latent variable neural network, which takes in the recurrent neural network state, teaches the refinement neural network taking as input a representation of the scene.
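For readers tracking the technology rather than the claim charts: the disputed limitation describes a sample-render-score-refine loop over a latent distribution. The following minimal, self-contained sketch illustrates that data flow under stated assumptions; every module here (encoder, refiner, renderer) is a toy stand-in, and nothing below is drawn from Rezende, Schwarz, or Menick.

```python
import torch
import torch.nn as nn

LATENT, EMBED = 8, 16

encoder = nn.Linear(32, EMBED)                       # observation -> embedding (stand-in)
refiner = nn.Linear(EMBED + 2 * LATENT, 2 * LATENT)  # (scene repr, grads) -> (mu, log_var)
renderer = nn.Linear(LATENT, 32)                     # toy latent-conditioned "scene network"

def refine(observations, steps=3):
    # (i) representation of the scene: averaged observation embeddings
    # (claim 4 recites averaging the embeddings)
    scene_repr = torch.stack([encoder(o) for o in observations]).mean(0)
    mu, log_var = torch.zeros(LATENT), torch.zeros(LATENT)
    for _ in range(steps):
        mu = mu.detach().requires_grad_(True)
        log_var = log_var.detach().requires_grad_(True)
        # sample a current latent variable for this time step
        # (reparameterization, so gradients reach the distribution parameters)
        z = mu + torch.randn(LATENT) * (0.5 * log_var).exp()
        # objective: error of images rendered while conditioned on z
        loss = sum(((renderer(z) - o) ** 2).mean() for o in observations)
        # (ii) gradients of the objective w.r.t. the distribution parameters
        g_mu, g_lv = torch.autograd.grad(loss, (mu, log_var))
        # refinement network consumes the scene representation plus gradients
        out = refiner(torch.cat([scene_repr, g_mu, g_lv]))
        mu, log_var = out[:LATENT], out[LATENT:]
    return mu, log_var

mu, log_var = refine([torch.randn(32) for _ in range(4)])
```

The detach-and-require-grad at the top of each iteration keeps gradient flow local to the current time step, matching the per-time-step gradients recited in the claim.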
Furthermore, Applicant's arguments with respect to claims 1, 16, 17, and their dependents have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Menick et al. (U.S. Patent Application Publication No. 2021/0004677 A1), hereinafter referred to as Menick, teaches in Paragraph 5 "determining a gradient of a loss function … and adjusting the current parameter values of the … prior neural network based on the gradient". Paragraph 14 teaches that the prior neural network outputs parameters for the prior probability distribution. Thus, this teaches passing in a gradient of the objective function for each time step with respect to the current parameters of the probability distribution.

Claim Rejections - 35 USC § 103

6. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

7. The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

8. Claims 1, 3-4, 6, 11-14, 16-17, 19-20, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Rezende et al. (U.S. Patent Application Publication No. 2019/0258907 A1), hereinafter referred to as Rezende, in view of Schwarz et al. ("GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis"), hereinafter referred to as Schwarz, and Menick et al. (U.S. Patent Application Publication No. 2021/0004677 A1), hereinafter referred to as Menick.

9. Regarding claim 1, Rezende teaches a method performed by one or more data processing apparatus (Paragraph 92 teaches a data processing apparatus which can execute a computer program) for rendering a new image that depicts a scene from a perspective of a camera at a new camera location (Paragraph 9 teaches rendering a new image of a scene from a camera at a new camera location), the method comprising: receiving a plurality of observations characterizing the scene (Paragraph 4 teaches receiving a plurality of observations characterizing the scene), wherein each observation comprises: (i) an image of the scene (Paragraph 4 teaches each observation comprises an image of the scene), and (ii) data identifying a location of a camera that captured the image of the scene (Paragraph 4 teaches each observation comprises data identifying a location of a camera capturing the image of the scene); iteratively updating, using an encoding neural network, parameters of a probability distribution over a latent space over a sequence of time steps, comprising, at each time step (Paragraph 76 teaches "at each of a pre-determined number of time steps, a latent variable neural network of the generator processes a state of the recurrent neural network for the time step in accordance with a set of latent variable neural network parameters to generate as output a set of sufficient statistics of the prior distribution for a subset of the latent variables corresponding to the time step". The set of sufficient statistics are the parameters of the probability distribution, so this teaches iteratively updating those parameters. Paragraph 76 also teaches that the latent variable neural network takes as input the state of the recurrent neural network and a set of latent variable neural network parameters. Paragraph 50 teaches using the observation neural network (encoding neural network) to create the scene representations, and Paragraph 51 teaches that the recurrent neural network final state is the semantic representation. Thus, the encoding neural network is involved in iteratively updating the parameters of the probability distribution): sampling a current latent variable for the time step in accordance with the probability distribution over the latent space (Paragraph 77 and Figure 4 step 408 teach "at each time step, the generator model determines values of the latent variables corresponding to the time step" and "to generate a new image…the generator model determines values of the latent variables corresponding to the time step by sampling from the prior distribution". This teaches sampling a current latent variable for the time step from the probability distribution over the latent space); evaluating an objective function for the time step that measures an error of images rendered using a scene representation neural network as conditioned on the current latent variable for the time step (Paragraph 72 teaches a loss function that calculates the probability that the image, output from the scene representation neural network conditioned on a latent variable sampled for the time step as taught in Paragraph 77, matches the target image x. This teaches measuring an error of the images rendered), and determining updated parameters of the probability distribution over the latent space using a refinement neural network by processing a network input to the refinement neural network comprising: (i) a representation of the scene generated by processing the plurality of observations using the encoding neural network (Paragraph 76 teaches a "latent variable neural network … generate as output a set of sufficient statistics of the prior distribution". The latent variable neural network corresponds to the claimed refinement neural network, the prior distribution corresponds to the probability distribution over the latent space, and the set of sufficient statistics corresponds to the parameters of that distribution. Therefore, this teaches determining updated parameters of the probability distribution using a refinement neural network. Paragraph 76 also teaches that the latent variable neural network takes as input the state of the recurrent neural network and a set of latent variable neural network parameters. Paragraph 50 teaches using the encoding neural network to create the scene representations, and Paragraph 51 teaches that the recurrent neural network final state is the semantic representation. Thus, the latent variable neural network, which takes in the recurrent neural network state, teaches the refinement neural network taking as input a representation of the scene); sampling a latent variable representing the scene in accordance with the determined probability distribution of latent variables (Paragraph 77 and Figure 4 step 408 teach "at each time step, the generator model determines values of the latent variables corresponding to the time step" and "to generate a new image…the generator model determines values of the latent variables corresponding to the time step by sampling from the prior distribution". This teaches sampling a latent variable for the time step from the probability distribution over the latent space);
and rendering the new image that depicts the scene from the perspective of the camera at the new camera location using the scene representation neural network conditioned on the latent variable representing the scene (Paragraph 9 teaches rendering a new image of a scene from a camera at a new location by using the generator neural network, or scene representation neural network; Paragraph 78 and Figure 4 step 410 teach updating states of the generator model when processing the latent variables, which teaches conditioning the neural network with the latent variables), wherein the scene representation neural network, the encoding neural network, and the refinement neural network have been jointly trained using training examples that include example observations of example scenes (Paragraph 12 teaches the generator neural network and observation neural network are jointly trained with a plurality of training observations. Paragraph 3 teaches observations are images of the environment, which can be a scene. Paragraph 76 teaches the latent variable neural network, which is the refinement neural network, is within the generator model. Thus, when the generator model is trained, the refinement neural network can also be considered to be trained).

However, Rezende is not relied upon for the below claim language: wherein the scene representation neural network is configured to process the current latent variable for the time step to define a geometric model of the scene as a three-dimensional (3D) radiance field; determining updated parameters of the probability distribution over the latent space using a refinement neural network by processing a network input to the refinement neural network comprising: (ii) gradients of the objective function for the time step with respect to the parameters of the probability distribution of latent variables; and rendering the new image by projecting radiance values from the geometric model of the scene as a 3D radiance field onto an image plane of the camera at the new camera location.

Schwarz teaches wherein the scene representation neural network is configured to process the current latent variable for the time step to define a geometric model of the scene as a three-dimensional (3D) radiance field (Section 3, Paragraph 1 teaches representing a scene by its radiance field; Page 5, Figure 2 and its description show a scene representation neural network with a generator and discriminator. The network defines a geometric model of the scene as a 3D radiance field within the box labeled 'Conditional Radiance Field', and is conditioned by the latent variables z_s and z_a; the Section 3.2.1 'Conditional Radiance Field' subsection teaches encoding each 3D location and viewing direction to an RGB color value c and volume density σ. This mapping represents the scene as a conditional radiance field g_θ, i.e., a 3D radiance field); and rendering the new image that depicts the scene from the perspective of the camera at the new camera location using the scene representation neural network conditioned on the latent variable representing the scene by projecting radiance values from the geometric model of the scene as a 3D radiance field onto an image plane of the camera at the new camera location (Page 5, Figure 2 shows the network is conditioned by the latent variables z_s and z_a and takes as inputs a new camera location x and a viewing direction d, or perspective of the camera. The generator then outputs a predicted patch P' using volume rendering on the radiance output from the conditional radiance field; Section 3.2, Paragraph 2 and Figure 2 teach using the generator to create the radiance values and then synthesize a patch P', which can be considered the new image to be rendered; the Section 3.2.1 'Volume Rendering' subsection teaches getting the color of the pixel corresponding to ray r to create the predicted patch P'. Identifying the pixel corresponding to ray r inherently involves projecting the ray onto the image plane).

Rezende and Schwarz are considered analogous to the claimed invention because both are in the same field of rendering novel views. Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the method of rendering a new image from a new perspective taught by Rezende to use radiance fields to render the new image, as taught by Schwarz, in order to generate high-resolution images that have visual fidelity and 3D consistency (Schwarz Page 2, Paragraph 2).

However, Rezende and Schwarz are not relied upon for the below claim language: determining updated parameters of the probability distribution over the latent space using a refinement neural network by processing a network input to the refinement neural network comprising: (ii) gradients of the objective function for the time step with respect to the parameters of the probability distribution of latent variables.

Menick teaches this limitation (Paragraph 5 teaches "determining a gradient of a loss function … and adjusting the current parameter values of the … prior neural network based on the gradient". Paragraph 14 teaches that the prior neural network outputs parameters for the prior probability distribution, so the prior neural network teaches the refinement neural network. Thus, this teaches passing in a gradient of the objective function with respect to the current parameters of the probability distribution). Rezende and Schwarz are considered analogous to the claimed invention because both are in the same field of rendering novel views. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the method of rendering a new image from a new perspective taught by Rezende in view of Schwarz to pass gradients of an objective function to the refinement neural network, as taught by Menick, in order to train the networks to accomplish tasks more effectively (Menick Paragraph 51).
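The Schwarz (GRAF) mapping above turns on rendering by casting a per-pixel ray from the image plane of the new camera and compositing radiance returned by a latent-conditioned field. Here is a small, self-contained sketch of that idea; the tiny MLP stands in for the conditional radiance field, and the camera model, sampling scheme, and dimensions are illustrative assumptions, not GRAF's actual architecture.

```python
import torch
import torch.nn as nn

# Toy latent-conditioned radiance field: (location, direction, latent) -> (RGB, density)
field = nn.Sequential(nn.Linear(3 + 3 + 8, 64), nn.ReLU(), nn.Linear(64, 4))

def render_pixel(origin, direction, z, n_samples=32, near=0.5, far=4.0):
    """Color of the pixel whose ray starts at the camera center `origin`
    and passes through the image plane along `direction`."""
    t = torch.linspace(near, far, n_samples)
    pts = origin + t[:, None] * direction              # sample points x_i along the ray
    dirs = direction.expand(n_samples, 3)              # viewing direction per sample
    lat = z.expand(n_samples, 8)                       # latent conditioning per sample
    out = field(torch.cat([pts, dirs, lat], dim=-1))   # query (c_i, sigma_i)
    rgb, sigma = torch.sigmoid(out[:, :3]), torch.relu(out[:, 3])
    delta = (far - near) / n_samples                   # spacing between samples
    alpha = 1.0 - torch.exp(-sigma * delta)            # chance the ray stops here
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    return (trans[:, None] * alpha[:, None] * rgb).sum(dim=0)  # composited pixel color

color = render_pixel(torch.zeros(3), torch.tensor([0.0, 0.0, 1.0]), torch.randn(8))
```

Each pixel of the new image corresponds to one such ray through the camera's image plane; rendering the full image is this computation repeated per pixel.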
10. Regarding claim 3, Rezende in view of Schwarz and Menick teaches the limitations of claim 1. Rezende further teaches the method wherein iteratively updating, using the encoding neural network, parameters of the probability distribution over the latent space comprises: generating a respective representation of each observation (Paragraphs 46-47 teach that observations are generated from the environment and provided to the system 100. The observations can be considered to be represented by the data structure that consists of the images 118 and viewpoint data 120. The viewpoint data identifies the location of the camera, which includes its position, yaw, and pitch); processing the respective representation of each observation using the encoding neural network to generate a corresponding embedding of each observation (Figure 1 and Paragraphs 4 and 50 teach processing each observation through the observation neural network 104. The observation neural network can be considered an encoding neural network which processes the observation and generates a lower-dimensional representation of each observation. This lower-dimensional representation can be considered the corresponding embedding of each observation); and combining the embeddings of the observations to generate the representation of the scene for processing by the refinement neural network (Paragraph 50 teaches combining "the lower-dimensional numerical representations of each observation 102 to generate as output a numerical semantic representation". Paragraph 76 teaches the latent variable neural network takes as input the state of the recurrent neural network, and Paragraph 51 teaches the recurrent neural network final state is the semantic representation, which is created by combining the embeddings of the observations. Thus, this teaches that the refinement neural network takes combined embeddings as input).

11. Regarding claim 4, Rezende in view of Schwarz and Menick teaches the limitations of claim 3. However, Rezende and Schwarz are not relied upon for the below claim language: the method wherein combining the embeddings of the observations to generate the representation of the scene for processing by the refinement neural network comprises: averaging the embeddings of the observations to generate the representation of the scene for processing by the refinement neural network. Menick teaches this limitation (Paragraph 106 teaches "determining a code that represents the current observation" and that "the system may determine the code to be the mean vector", which is then used by the prior neural network (refinement neural network) to "generate the prior distribution over the latent space for the next iteration"). Rezende and Schwarz are considered analogous to the claimed invention because both are in the same field of rendering novel views.
Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the method of rendering a new image from a new perspective taught by Rezende in view of Schwarz to pass gradients of an objective function to the refinement neural network, as taught by Menick, in order to train the networks to accomplish tasks more effectively (Menick Paragraph 51).

12. Regarding claim 6, Rezende in view of Schwarz and Menick teaches the limitations of claim 1. Rezende further teaches the method wherein the latent variable representing the scene comprises a plurality of latent sub-variables (Paragraph 9 teaches the latent variables comprise variables inferred by the generator neural network conditioned by the data identifying the new camera location and the numeric representation of a scene; Paragraph 56 teaches passing the semantic numeric representation and query viewpoint data as inputs into the generator, which when processed make up the probability distribution from which latent variables are sampled. Thus, the semantic numeric representation and query viewpoint data can be considered latent sub-variables since, when processed, they create the latent variable).

13. Regarding claim 11, Rezende in view of Schwarz and Menick teaches the limitations of claim 1. However, Rezende fails to teach the method wherein rendering the new image comprises, for each pixel of the new image: identifying a ray corresponding to the pixel that projects into the scene from the image plane of the camera at the new camera location; determining, for each of a plurality of spatial locations on the ray, a radiance emitted in a direction of the ray at the spatial location on the ray using the scene representation neural network conditioned on the latent variable representing the scene; and rendering a color of the pixel in the new image based on the radiances emitted in the direction of the ray at the plurality of spatial locations on the ray. Schwarz teaches the method wherein rendering the new image comprises, for each pixel of the new image: identifying a ray corresponding to the pixel that projects into the scene from the image plane of the camera at the new camera location (the Section 3.2.1 'Volume Rendering' subsection teaches getting the color of the pixel corresponding to ray r to create the predicted patch P', which is the image. The pixel corresponding to ray r is identified when ray r is projected onto the image plane at that pixel. The color of the pixel is obtained from the (c_r, σ_r) output of the conditional radiance field mapping of the spatial location x, which can be considered the new location of the camera); determining, for each of a plurality of spatial locations on the ray, a radiance emitted in a direction of the ray at the spatial location on the ray using the scene representation neural network conditioned on the latent variable representing the scene (the Figure 2 description shows outputting (c_r, σ_r) given the representation of each spatial location x_i^r along the ray r, the representation of the viewing direction d, and the latent variables z_s and z_a representing the scene. c_r is the color, or radiance, of that ray, so the radiance is determined in the viewing direction d at the plurality of spatial locations x_i^r.
The latent variables z_s and z_a are input to and condition the conditional radiance field, or scene representation neural network); and rendering a color of the pixel in the new image based on the radiances emitted in the direction of the ray at the plurality of spatial locations on the ray (the Section 3.2.1 'Volume Rendering' subsection teaches getting the color of the pixel corresponding to all the points along the ray r to create the predicted patch P'). Rezende and Schwarz are considered analogous to the claimed invention because both are in the same field of rendering novel views. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the method of rendering a new image from a new perspective taught by Rezende in view of Menick to use radiance fields to render the new image, as taught by Schwarz, in order to generate high-resolution images that have visual fidelity and 3D consistency (Schwarz Page 2, Paragraph 2).

14. Regarding claim 12, Rezende in view of Schwarz and Menick teaches the limitations of claim 11. However, Rezende fails to teach the method further comprising: determining, for each of the plurality of spatial locations on the ray, a volume density of the scene at the spatial location which characterizes a likelihood that the ray would terminate at the spatial location; and rendering the color of the pixel in the new image based on both the radiances emitted in the direction of the ray and the volume densities at the plurality of spatial locations on the ray. Schwarz teaches the method further comprising: determining, for each of the plurality of spatial locations on the ray, a volume density of the scene at the spatial location which characterizes a likelihood that the ray would terminate at the spatial location (the Section 3.2.1 'Volume Rendering' subsection teaches having the volume density σ_r for all points along the ray r. The volume density is created through Equation 9 in the Section 3.2.1 'Conditional Radiance Field' subsection, which maps the location x on the ray, together with the shape latent code z_s, to the density; the Figure 1 description teaches a continuous function g_θ which shows a distribution according to the color and volume density and exhibits a likelihood of the object's shape or surfaces. The existence of volume density indicates a surface or the existence of an object, and rays inherently end, or terminate, at dense surfaces); and rendering the color of the pixel in the new image based on both the radiances emitted in the direction of the ray and the volume densities at the plurality of spatial locations on the ray (the Section 3.2.1 'Volume Rendering' subsection teaches rendering the color of pixels based on the direction of the ray r and the volume density σ_r for all points along the ray; Equation 3 in the Section 3.1 'Volume Rendering' subsection teaches the mapping from the volume densities to the color). Rezende and Schwarz are considered analogous to the claimed invention because both are in the same field of rendering novel views. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the method of rendering a new image from a new perspective taught by Rezende in view of Menick to use radiance fields to render the new image, as taught by Schwarz, in order to generate high-resolution images that have visual fidelity and 3D consistency (Schwarz Page 2, Paragraph 2).
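For reference, the "likelihood that the ray would terminate" language in claim 12 corresponds to the transmittance term of the volume-rendering quadrature standard in NeRF-family methods such as GRAF, usually written (with δ_i the spacing between adjacent samples) as:

```latex
\hat{C}(r) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) c_i,
\qquad
T_i = \exp\!\Bigl(-\sum_{j<i} \sigma_j \delta_j\Bigr)
```

Here T_i is the probability that the ray reaches sample i without terminating, and (1 - e^{-σ_i δ_i}) is the probability that it terminates in that interval; accumulating these products over all samples yields the pixel color, which is the accumulation recited in claim 14.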
15. Regarding claim 13, Rezende in view of Schwarz and Menick teaches the limitations of claim 12. However, Rezende fails to teach the method wherein, for each of the plurality of spatial locations on the ray, determining the radiance emitted in the direction of the ray at the spatial location and the volume density at the spatial location comprises: providing a representation of the spatial location on the ray and a representation of the direction of the ray to the scene representation neural network conditioned on the latent variable representing the scene to generate an output that defines the radiance emitted in the direction of the ray at the spatial location and the volume density at the spatial location. Schwarz teaches this limitation, including providing a representation of the spatial location on the ray (Figure 2 shows x_r, which is a representation of the spatial location on the ray) and a representation of the direction of the ray (Figure 2 shows d_r, which is a representation of the direction of the ray) to the scene representation neural network conditioned on the latent variable representing the scene to generate an output that defines the radiance emitted in the direction of the ray at the spatial location and the volume density at the spatial location (the Figure 2 description shows outputting (c_r, σ_r) given the representation of each spatial location x_i^r along the ray r, the representation of the viewing direction d, and the latent variables z_s and z_a representing the scene. The latent variables z_s and z_a are input to and condition the conditional radiance field, or scene representation neural network. The output (c_r, σ_r) defines the radiance through c_r and the volume density through σ_r). Rezende and Schwarz are considered analogous to the claimed invention because both are in the same field of rendering novel views. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the method of rendering a new image from a new perspective taught by Rezende in view of Menick to use radiance fields to render the new image, as taught by Schwarz, in order to generate high-resolution images that have visual fidelity and 3D consistency (Schwarz Page 2, Paragraph 2).

16. Regarding claim 14, Rezende in view of Schwarz and Menick teaches the limitations of claim 12. However, Rezende fails to teach the method wherein rendering the color of the pixel in the new image based on both the radiances emitted in the direction of the ray and the volume densities at the plurality of spatial locations on the ray comprises: accumulating the radiances emitted in the direction of the ray and the volume densities at the plurality of spatial locations on the ray.
Schwarz teaches this limitation: accumulating the radiances emitted in the direction of the ray and the volume densities at the plurality of spatial locations on the ray (the Section 3.2.1 'Volume Rendering' subsection teaches accumulating the radiances and volume densities, represented by (c_i^r, σ_i^r), of all the points i along the ray r to obtain the resulting color c_r; the Section 3.1 'Volume Rendering' subsection also shows accumulating {(c_i^r, σ_i^r)}_{i=1}^{N} for all points i = 1 to N and mapping them to a color value c_r). Rezende and Schwarz are considered analogous to the claimed invention because both are in the same field of rendering novel views. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the method of rendering a new image from a new perspective taught by Rezende in view of Menick to use radiance fields to render the new image, as taught by Schwarz, in order to generate high-resolution images that have visual fidelity and 3D consistency (Schwarz Page 2, Paragraph 2).

17. Regarding claim 16, claim 16 is the non-transitory computer storage media claim (Rezende Paragraph 91 teaches a non-transitory storage medium storing program instructions) corresponding to method claim 1 and is accordingly rejected using substantially similar rationale to that set forth with respect to claim 1.

18. Regarding claim 17, claim 17 is the system claim (Rezende Paragraph 96 teaches computers executing instructions stored in a memory) corresponding to method claim 1 and is accordingly rejected using substantially similar rationale to that set forth with respect to claim 1.

19. Regarding claim 19, the claim is similar in scope to claim 3. Therefore, rationale similar to that applied in the rejection of claim 3 applies herein.

20. Regarding claim 20, the claim is similar in scope to claim 4. Therefore, rationale similar to that applied in the rejection of claim 4 applies herein.

21. Regarding claim 22, Rezende in view of Schwarz and Menick teaches the limitations of claim 17. Rezende further teaches the system wherein the latent variable representing the scene comprises a plurality of latent sub-variables (Paragraph 9 teaches the latent variables comprise variables inferred by the generator neural network conditioned by the data identifying the new camera location and the numeric representation of a scene; Paragraph 56 teaches passing the semantic numeric representation and query viewpoint data as inputs into the generator, which when processed make up the probability distribution from which latent variables are sampled. Thus, the semantic numeric representation and query viewpoint data can be considered latent sub-variables since, when processed, they create the latent variable).

22. Claims 5, 15, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Rezende et al. (U.S. Patent Application Publication No. 2019/0258907 A1), hereinafter referred to as Rezende, in view of Schwarz et al. ("GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis"), hereinafter referred to as Schwarz, and Menick et al. (U.S. Patent Application Publication No. 2021/0004677 A1), hereinafter referred to as Menick, as applied to claims 1 and 17 above, and further in view of Kar et al. (U.S. Patent Application Publication No. 2020/0226736 A1), hereinafter referred to as Kar.
23. Regarding claim 5, Rezende in view of Schwarz and Menick teaches the limitations of claim 1. Rezende further teaches the method wherein evaluating the objective function for the time step that measures the error of images rendered using the scene representation neural network as conditioned on the current latent variable for the time step comprises: conditioning the scene representation neural network on the current latent variable for the time step (Paragraph 78 and Figure 4 step 410 teach updating states of the generator model when processing the sampled current latent variables for that time step. The generator model is the scene representation neural network, so this teaches conditioning the neural network with the latent variables for the time step); rendering a rendered image for the time step that depicts the scene from a perspective of a camera at a target camera location using the scene representation neural network as conditioned on the current latent variable for the time step (Paragraph 81 and Figure 4 step 414 teach generating a new image for the time step depicting the scene from a viewpoint of a camera at a new camera location using the generator model, which is the scene representation neural network conditioned on the latent variable as taught in step 410. The new camera location can be considered the target camera location); and evaluating the objective function for the time step by determining an error (Paragraph 72 teaches a loss function that calculates the probability that the image, output from the scene representation neural network conditioned on a latent variable sampled for the time step as taught in Paragraph 77, matches the target image x. This teaches measuring an error of the images rendered). However, Rezende is not relied upon for the below claim language: evaluating the objective function between: (i) the rendered image for the time step that depicts the scene from the perspective of the camera at the target camera location, and (ii) a target image of the scene captured from the camera at the target camera location. Kar teaches evaluating the objective function between: (i) the rendered image for the time step that depicts the scene from the perspective of the camera at the target camera location (Paragraph 297 teaches generating "a novel view from a target viewpoint"), and (ii) a target image of the scene captured from the camera at the target camera location (Paragraph 297 teaches "that novel view may then be compared with a view of the 3D scene rendered directly from the target viewpoint. A loss function between the actual rendering and the novel view may be used to update the model". This teaches calculating an error through the loss function between the actual rendering, which is the target image, and the novel view, which is the rendered image. The loss function is the objective function). Rezende, Schwarz, and Kar are considered analogous to the claimed invention because all are in the same field of rendering novel views. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the method of rendering a new image from a new perspective taught by Rezende in view of Schwarz and Menick with an objective function that depends on both the rendered and target images, as taught by Kar, in order to update the model and alter weights (Kar Paragraph 307) to achieve an output image closer to the target image.
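Reduced to its mechanics, the claim 5 objective scores a latent-conditioned rendering from the target viewpoint against the captured target image. A minimal sketch, assuming a hypothetical `render` callable and a pixelwise MSE standing in for the likelihood-style loss the Office Action attributes to Rezende Paragraph 72:

```python
import torch

def objective_for_time_step(render, z_t, target_pose, target_image):
    """Error of the image rendered from the target camera location while the
    scene network is conditioned on the current latent z_t (illustrative)."""
    rendered = render(z_t, target_pose)               # image at target viewpoint
    return ((rendered - target_image) ** 2).mean()    # pixelwise error vs. target image
```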
24. Regarding claim 15, Rezende in view of Schwarz and Menick teaches the limitations of claim 1. Rezende further teaches the method wherein the scene representation neural network has a plurality of neural network parameters (Paragraph 58 teaches the generator model has parameters that are adjusted through a training engine. The generator model is the scene representation neural network and thus has a plurality of neural network parameters), wherein before being used to render the new image of the scene, the scene representation neural network is trained to determine trained values of the neural network parameters from initial values of the neural network parameters (Paragraph 56 teaches the initial values of the neural network parameters set by the semantic representation, or observation embeddings; Paragraph 58 teaches the generator model has its parameter values adjusted according to the backpropagation of gradients from an objective function), wherein training the scene representation neural network comprises, for each of a plurality of other scenes (Paragraph 48 teaches the environment could be a video which contains multiple frames, and each frame can depict another scene within the natural or computer-generated world): conditioning the scene representation neural network on a latent variable representing the other scene (Paragraph 9 teaches the latent variables are determined by the generator neural network conditioned upon the data for a particular scene. This teaches a latent variable representing the other scene when the environment is a video with multiple frames and multiple scenes, as taught in Paragraph 48; Paragraph 78 and Figure 4 step 410 teach updating states of the generator model by processing the latent variables, which is conditioning the neural network); rendering one or more images that each depict the other scene from the perspective of a camera at a location in the other scene using the scene representation neural network conditioned on the latent variable representing the other scene (Paragraph 9 teaches rendering a new image of a scene from a camera at a new camera location; Paragraph 78 and Figure 4 step 410 teach updating states of the generator model by processing the latent variables, which is conditioning the neural network); and updating current values of the neural network parameters of the scene representation neural network using gradients of an objective function (Paragraph 58 and Figure 1 teach the generator model has its parameter values adjusted according to the backpropagation of gradients from an objective function through the training engine 124). However, Rezende, Schwarz, and Menick are not relied upon for the below claim language: an objective function that depends on the images of the other scene rendered.
Kar teaches an objective function that depends on the images of the other scene rendered (Paragraph 297 teaches using a loss function between the actual rendering, which is the target image, and the novel view, which is the rendered image. The loss function is the objective function). Rezende, Schwarz, and Kar are considered analogous to the claimed invention because all are in the same field of rendering novel views. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the method of rendering a new image taught by Rezende in view of Schwarz and Menick with an objective function that depends on both the rendered and target images, as taught by Kar, in order to update the model and alter weights (Kar Paragraph 307) to achieve an output image closer to the target image.
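The claim 15 training regime described above is, mechanically, an outer loop over scenes: condition on a per-scene latent, render a view, and update the shared network parameters with gradients of an image-space objective. A minimal sketch, with `scene_net` and `refiner_step` as hypothetical stand-ins rather than the application's or Kar's actual code:

```python
import torch

def train(scene_net, refiner_step, scenes, lr=1e-3):
    """scene_net: latent-conditioned renderer (an nn.Module);
    refiner_step: callable inferring a latent from a scene's observations;
    scenes: iterable of (observations, (target_pose, target_image))."""
    opt = torch.optim.Adam(scene_net.parameters(), lr=lr)
    for observations, (target_pose, target_image) in scenes:
        z = refiner_step(scene_net, observations)       # latent for this scene
        rendered = scene_net(z, target_pose)            # render a view of the scene
        loss = ((rendered - target_image) ** 2).mean()  # objective vs. rendered images
        opt.zero_grad()
        loss.backward()                                 # gradients of the objective
        opt.step()                                      # update network parameters
```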
25. Regarding claim 21, Rezende in view of Schwarz and Menick teaches the limitations of claim 17. Rezende further teaches wherein evaluating the objective function for the time step that measures the error of images rendered using the scene representation neural network as conditioned on the current latent variable for the time step comprises: conditioning the scene representation neural network on the current latent variable for the time step (Paragraph 9 teaches the latent variables are determined by the generator neural network conditioned upon the data for a particular scene; Paragraph 78 and Figure 4 step 410 teach updating states of the generator model by processing the latent variables, which is conditioning the neural network); rendering a rendered image for the time step that depicts the scene from a perspective of a camera at a target camera location using the scene representation neural network as conditioned on the current latent variable for the time step (Paragraph 9 teaches rendering a new image of a scene from a perspective of a camera at a target camera location; Paragraph 78 and Figure 4 step 410 teach updating states of the generator model by processing the latent variables, which is conditioning the scene representation neural network); and evaluating the objective function for the time step by determining an error (Paragraph 72 teaches a loss function that calculates the probability that the image, output from the scene representation neural network conditioned on a latent variable sampled for the time step as taught in Paragraph 77, matches the target image x. This teaches measuring an error of the images rendered). However, Rezende is not relied upon for the below claim language: evaluating the objective function between: (i) the rendered image for the time step that depicts the scene from the perspective of the camera at the target camera location, and (ii) a target image of the scene captured from the camera at the target camera location. Kar teaches evaluating the objective function between: (i) the rendered image for the time step that depicts the scene from the perspective of the camera at the target camera location (Paragraph 297 teaches generating "a novel view from a target viewpoint"), and (ii) a target image of the scene captured from the camera at the target camera location (Paragraph 297 teaches "that novel view may then be compared with a view of the 3D scene rendered directly from the target viewpoint. A loss function between the actual rendering and the novel view may be used to update the model". This teaches calculating an error through the loss function between the actual rendering, which is the target image, and the novel view, which is the rendered image. The loss function is the objective function). Rezende, Schwarz, and Kar are considered analogous to the claimed invention because all are in the same field of rendering novel views. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the system of rendering a new image from a new perspective taught by Rezende in view of Schwarz and Menick with an objective function that depends on both the rendered and target images, as taught by Kar, in order to update the model and alter weights (Kar Paragraph 307) to achieve an output image closer to the target image.

26. Claims 7-10 are rejected under 35 U.S.C. 103 as being unpatentable over Rezende et al. (U.S. Patent Application Publication No. 2019/0258907 A1), hereinafter referred to as Rezende, in view of Schwarz et al. ("GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis"), hereinafter referred to as Schwarz, and Menick et al. (U.S. Patent Application Publication No. 2021/0004677 A1), hereinafter referred to as Menick, as applied to claim 6 above, and further in view of Vaswani et al. ("Attention is All You Need", cited in an IDS), hereinafter referred to as Vaswani.

27. Regarding claim 7, Rezende in view of Schwarz and Menick teaches the limitations of claim 6. Rezende further teaches the plurality of latent sub-variables of the latent variable (Paragraph 9 teaches the latent variables comprise variables inferred by the generator neural network conditioned by the data identifying the new camera location and the numeric representation of a scene; Paragraph 56 teaches passing the semantic numeric representation and query viewpoint data as inputs into the generator, which when processed make up the probability distribution from which latent variables are sampled. Thus, the semantic numeric representation and query viewpoint data can be considered latent sub-variables since, when processed, they create the latent variable). However, Rezende fails to teach the method wherein the scene representation neural network comprises a sequence of one or more update blocks, wherein each update block is configured to: receive a current joint embedding of a spatial location in the scene and a viewing direction; and update the current joint embedding of the spatial location in the scene and the viewing direction using attention over one or more of the plurality of latent sub-variables of the latent variable.
Schwarz teaches the current joint embedding of a spatial location in the scene and a viewing direction (the Section 3.2.1 'Conditional Radiance Field' subsection teaches in Equation 6 that the viewing direction d and spatial location x are mapped to a (c, σ) pair, which can be considered the current joint embedding). Rezende and Schwarz are considered analogous to the claimed invention because both are in the same field of rendering novel views. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the method of rendering a new image from a new perspective taught by Rezende with the joint embedding used to create radiance fields, as taught by Schwarz, in order to generate high-resolution images that have visual fidelity and 3D consistency (Schwarz Page 2, Paragraph 2). However, Rezende, Schwarz, and Menick fail to teach the method wherein the scene representation neural network comprises a sequence of one or more update blocks, wherein each update block is configured to receive a current joint embedding. Vaswani teaches the scene representation neural network comprising a sequence of one or more update blocks (Section 3 and Figure 1 teach an architecture which has a sequence of N layers; each layer can be considered an update block), wherein each update block is configured to: receive a current joint embedding (the Section 3.1 'Encoder' subsection and Figure 1 show an input embedding being passed in, which can be the current joint embedding. The current joint embedding is passed into the transformer, or update block, updated through that update block, and then passed into another update block. Thus, each update block receives the current joint embedding as it is being updated); and update the current joint embedding using attention over one or more of the latent sub-variables (the Section 3.1 'Encoder' subsection teaches that the layer, or update block, consists of a multi-head self-attention mechanism. The input embedding is passed into the update block and can be considered both the current joint embedding and the latent sub-variable). Rezende and Schwarz are considered analogous to the claimed invention because both are in the same field of novel view synthesis. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Vaswani is considered analogous to the claimed invention because it is in the same field of recurrent models that deal with time steps. Thus, it would have been obvious to modify the method of rendering a new image from a new perspective taught by Rezende in view of Schwarz and Menick with the update blocks and attention mechanism taught by Vaswani in order to optimize sequential computation and allow for more parallelism (Vaswani Section 2, Paragraphs 2-4).
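Claims 7-8 describe an architecture of stacked update blocks applying multi-head query-key-value attention from a joint (location, direction) embedding over latent sub-variables. An illustrative sketch using PyTorch's stock attention module follows; the dimensions and residual structure are assumptions for the example, not the application's disclosed architecture.

```python
import torch
import torch.nn as nn

class UpdateBlock(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        # multi-head query-key-value attention (claim 8)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, joint_emb, sub_variables):
        # Query: the current joint embedding. Keys/values: latent sub-variables.
        # (Claim 10 would restrict keys/values to the block's assigned sub-variable.)
        upd, _ = self.attn(joint_emb, sub_variables, sub_variables)
        return self.norm(joint_emb + upd)              # residual update of the embedding

blocks = nn.ModuleList([UpdateBlock() for _ in range(3)])  # sequence of update blocks
emb = torch.randn(1, 1, 64)      # joint embedding of (spatial location, viewing direction)
latents = torch.randn(1, 5, 64)  # five latent sub-variables
for blk in blocks:
    emb = blk(emb, latents)      # updated through each block in the sequence
```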
28. Regarding claim 8, Rezende in view of Schwarz, Menick, and Vaswani teaches the limitations of claim 7. However, Rezende, Schwarz, and Menick fail to teach the method wherein the attention is multi-head query-key-value attention. Vaswani teaches the method wherein the attention is multi-head query-key-value attention (Section 3.2.2 and Figure 2 teach that the attention is a multi-head query-key-value attention function). Rezende and Schwarz are considered analogous to the claimed invention because both are in the same field of novel view synthesis. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Vaswani is considered analogous to the claimed invention because it is in the same field of recurrent models that deal with time steps. Thus, it would have been obvious to modify the method of rendering a new image from a new perspective taught by Rezende in view of Schwarz and Menick with the multi-head query-key-value attention mechanism taught by Vaswani in order to allow the model to include information from different representation subspaces at different positions (Vaswani Section 3.2.2).

29. Regarding claim 9, Rezende in view of Schwarz, Menick, and Vaswani teaches the limitations of claim 7. Rezende further teaches the method wherein the scene representation neural network is configured to receive a representation of a spatial location in the scene and a representation of a viewing direction (Paragraph 53 and Figure 1 teach that query viewpoint data 108 is received by the generator model 110, which is the scene representation neural network. The query viewpoint data is the representation of the location of the camera in the scene, containing both the camera's position and viewing direction; Paragraph 9 teaches that when Rezende refers to a location of a camera, it includes both the position of the camera and the viewing direction). However, Rezende is not relied upon for the below claim language: generate a joint embedding of the spatial location in the scene and the viewing direction from the representation of the spatial location in the scene and the representation of the viewing direction; update the joint embedding using each update block in the sequence of one or more update blocks; and generate the output that defines the radiance emitted in the viewing direction at the spatial location in the scene from the joint embedding as updated by a final update block in the sequence of update blocks. Schwarz teaches generating a joint embedding of the spatial location in the scene and the viewing direction from the representation of the spatial location in the scene and the representation of the viewing direction (the Section 3.2.1 'Conditional Radiance Field' subsection teaches in Equation 6 that the viewing direction d and spatial location x are mapped to a (c, σ) pair, which can be considered the joint embedding), and generating the output that defines the radiance emitted in the viewing direction at the spatial location in the scene (the Figure 2 description shows output (c_r, σ_r) is created given the representation of the spatial location x_r, the representation of the viewing direction d_r, and the latent variables z_s and z_a representing the scene. The output (c_r, σ_r) is the joint embedding of the color and volume density; it defines the radiance through c_r in the viewing direction d_r at the spatial location x_r; the Section 3.2.1 'Volume Rendering' subsection teaches obtaining the color c_r by combining the results of all the color and volume density output pairs (c_i^r, σ_i^r)). Rezende and Schwarz are considered analogous to the claimed invention because both are in the same field of rendering novel views. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations.
Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date to modify the method of rendering a new image from a new perspective taught by Rezende with the joint embedding used to create radiance fields, as taught by Schwarz, in order to generate high-resolution images that have visual fidelity and 3D consistency (Schwarz Page 2, Paragraph 2). However, Rezende, Schwarz, and Menick are not relied upon for the below claim language: updating the joint embedding using each update block in the sequence of one or more update blocks and the joint embedding as updated by a final update block in the sequence of update blocks. Vaswani teaches updating the joint embedding using each update block in the sequence of one or more update blocks (Section 3 and Figure 1 teach an architecture which has a sequence of N layers, each of which can be considered an update block; the Section 3.1 'Encoder' subsection and Figure 1 show an input embedding being passed in, which gets updated after each update block) and the joint embedding as updated by a final update block in the sequence of update blocks (Section 3 and Figure 1 show an input embedding passed through the N update blocks and then passed to the decoder as the output embedding. After passing through the update blocks in the encoder, it is output as the final updated joint embedding). Rezende and Schwarz are considered analogous to the claimed invention because both are in the same field of novel view synthesis. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Vaswani is considered analogous to the claimed invention because it is in the same field of recurrent models that deal with time steps. Thus, it would have been obvious to modify the method of rendering a new image from a new perspective taught by Rezende in view of Schwarz and Menick with the multi-head query-key-value attention mechanism taught by Vaswani in order to allow the model to include information from different representation subspaces at different positions (Vaswani Section 3.2.2).

30. Regarding claim 10, Rezende in view of Schwarz, Menick, and Vaswani teaches the limitations of claim 7. However, Rezende fails to teach wherein each latent sub-variable comprises a plurality of channels, wherein each update block is assigned a respective latent sub-variable, and wherein for each update block, updating the current joint embedding using attention over one or more of the plurality of latent sub-variables of the latent variable comprises: updating the current joint embedding using attention over only the latent sub-variable that is assigned to the update block. Schwarz teaches wherein each latent sub-variable comprises a plurality of channels (the Section 3.2.1 'Conditional Radiance Field' subsection teaches in Equation 6 that the viewing direction d and spatial location x are mapped to a (c, σ) pair, which can be considered the joint embedding. The c and σ components of the pair can be considered the channels of the joint embedding, or latent sub-variable. Paragraph 9 teaches the latent variables comprise variables inferred by the generator neural network conditioned by the data identifying the new camera location and the numeric representation of a scene.
30. Regarding claim 10, Rezende in view of Schwarz, Menick, and Vaswani teaches the limitations of claim 7. However, Rezende fails to teach wherein each latent sub-variable comprises a plurality of channels, wherein each update block is assigned a respective latent sub-variable, and wherein for each update block, updating the current joint embedding using attention over one or more of the plurality of latent sub-variables of the latent variable comprises: updating the current joint embedding using attention over only the latent sub-variable that is assigned to the update block.

Schwarz teaches wherein each latent sub-variable comprises a plurality of channels (Section 3.2.1, 'Conditional Radiance Field' subsection, teaches in Equation 6 that the viewing direction d and the spatial location x are mapped to a (c, σ) pair, which can be considered the joint embedding; the c and σ components of the pair can be considered the channels of the joint embedding or latent sub-variable. Paragraph 9 teaches that the latent variables comprise variables inferred by the generator neural network conditioned on the data identifying the new camera location and the numeric representation of a scene. Paragraph 56 teaches passing the semantic numeric representation and query viewpoint data as inputs into the generator, which, when processed, make up the probability distribution from which latent variables are sampled. Thus, the semantic numeric representation and query viewpoint data can be considered latent sub-variables, since when processed they create the latent variable).

Rezende and Schwarz are considered analogous to the claimed invention because both are in the same field of rendering novel views. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date to modify the method of rendering a new image from a new perspective taught by Rezende with the joint embedding used to create radiance fields to render the new image as taught by Schwarz, in order to generate high-resolution images that have visual fidelity and 3D consistency (Schwarz Page 2, Paragraph 2).

However, Rezende, Schwarz, and Menick fail to teach the method wherein each update block is assigned a respective latent sub-variable, and wherein for each update block, updating the current joint embedding using attention over one or more of the plurality of latent sub-variables of the latent variable comprises: updating the current joint embedding using attention over only the latent sub-variable that is assigned to the update block.

Vaswani teaches this arrangement (Section 3 and Figure 1 teach an architecture with a sequence of N layers, each of which can be considered an update block; Section 3.1, 'Encoder' subsection, and Figure 1 show an input embedding, which can be considered the current joint embedding, being passed into each update block. The joint embedding is passed into a layer, or update block, updated by that block, and then passed into the next update block. The joint embedding can be considered the latent sub-variable as mapped in claim 6).

Rezende and Schwarz are considered analogous to the claimed invention because both are in the same field of novel view synthesis. Menick is considered analogous to the claimed invention because both are in the same field of training neural networks with scene observations. Vaswani is considered analogous to the claimed invention because it is in the same field of recurrent models that deal with time steps. Thus, it would have been obvious to modify the method of rendering a new image from a new perspective taught by Rezende in view of Schwarz and Menick with the multi-head attention query-key-value mechanism taught by Vaswani in order to allow the model to include information from different representation subspaces at different positions (Vaswani Section 3.2.2).
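For claim 10's requirement that each update block attend over only its assigned latent sub-variable, a minimal cross-attention sketch is below. The split of the latent variable into per-block sub-variables, the channel count, and the single-head attention are illustrative assumptions layered on top of the Vaswani mechanism; neither reference provides this code.

```python
# Hedged sketch of the claim 10 arrangement: the latent variable is split into
# sub-variables (one per update block, each with several channels), and block i
# updates the joint embedding by attending over only its own sub-variable z_i.
import numpy as np

rng = np.random.default_rng(1)
D, N_BLOCKS, CHANNELS = 16, 4, 8  # illustrative sizes

def cross_attention(E, Z, Wq, Wk, Wv):
    # Queries come from the joint embedding E; keys and values come from the
    # latent sub-variable Z, so E is updated using attention over Z only.
    Q, K, V = E @ Wq, Z @ Wk, Z @ Wv
    scores = Q @ K.T / np.sqrt(D)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

# One latent sub-variable per update block, each with CHANNELS channels.
sub_variables = [rng.normal(size=(CHANNELS, D)) for _ in range(N_BLOCKS)]
params = [tuple(rng.normal(scale=0.1, size=(D, D)) for _ in range(3))
          for _ in range(N_BLOCKS)]

E = rng.normal(size=(10, D))  # current joint embedding
for z_i, (Wq, Wk, Wv) in zip(sub_variables, params):
    # Each block attends over only the sub-variable assigned to it.
    E = E + cross_attention(E, z_i, Wq, Wk, Wv)
print(E.shape)
```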
Conclusion

31. Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHRISTINE Y. AHN, whose telephone number is (571) 272-0672. The examiner can normally be reached M-F, 9 am-5 pm.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Alicia Harrington, can be reached at (571) 272-2330. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHRISTINE YERA AHN/
Examiner, Art Unit 2615

/ALICIA M HARRINGTON/
Supervisory Patent Examiner, Art Unit 2615

Prosecution Timeline

Aug 01, 2023
Application Filed
May 16, 2025
Non-Final Rejection — §103
Jul 29, 2025
Interview Requested
Aug 06, 2025
Applicant Interview (Telephonic)
Aug 06, 2025
Examiner Interview Summary
Aug 20, 2025
Response Filed
Oct 27, 2025
Final Rejection — §103
Jan 07, 2026
Interview Requested
Jan 28, 2026
Request for Continued Examination
Feb 09, 2026
Response after Non-Final Action
Mar 31, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602877
BODY MODEL PROCESSING METHODS AND APPARATUSES, ELECTRONIC DEVICES AND STORAGE MEDIA
2y 5m to grant · Granted Apr 14, 2026
Patent 12548187
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM
2y 5m to grant · Granted Feb 10, 2026
Patent 12456274
FACIAL EXPRESSION AND POSE TRANSFER UTILIZING AN END-TO-END MACHINE LEARNING MODEL
2y 5m to grant · Granted Oct 28, 2025
Patent 12450810
ANIMATED FACIAL EXPRESSION AND POSE TRANSFER UTILIZING AN END-TO-END MACHINE LEARNING MODEL
2y 5m to grant · Granted Oct 21, 2025
Patent 12439025
APPARATUS, SYSTEM, METHOD, STORAGE MEDIUM, AND FILE FORMAT
2y 5m to grant · Granted Oct 07, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
69%
Grant Probability
99%
With Interview (+37.5%)
2y 7m
Median Time to Grant
High
PTA Risk
Based on 16 resolved cases by this examiner. Grant probability derived from career allow rate.
