Prosecution Insights
Last updated: April 19, 2026
Application No. 18/571,748

EDITABLE FREE-VIEWPOINT VIDEO USING A LAYERED NEURAL REPRESENTATION

Status: Non-Final OA — §103
Filed: Dec 19, 2023
Examiner: DUFFY, CAROLINE TABANCAY
Art Unit: 2662
Tech Center: 2600 — Communications
Assignee: ShanghaiTech University
OA Round: 1 (Non-Final)
Grant Probability: 80% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 1m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 80% — above average (62 granted / 78 resolved; +17.5% vs TC avg)
Interview Lift: +26.9% — strong (allow rate among resolved cases with an interview vs. without)
Typical Timeline: 3y 1m average prosecution; 18 applications currently pending
Career History: 96 total applications across all art units
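
The allow-rate figures above are simple ratios over the examiner's resolved cases. Below is a minimal Python sketch of how they could be reproduced; the 62/78 counts come from this page, while the Tech Center average used (62.0%) is an assumption back-solved from the "+17.5% vs TC avg" delta, not an independently reported number.

```python
# Minimal sketch: reproducing the headline examiner statistics from raw counts.
# Only the 62 granted / 78 resolved totals come from this report; the Tech Center
# average below is an assumption back-solved from the displayed "+17.5%" delta.

granted, resolved = 62, 78
career_allow_rate = granted / resolved          # ~0.795, shown as 80%
assumed_tc_average = 0.620                      # assumption, see note above
delta_vs_tc = career_allow_rate - assumed_tc_average

print(f"Career allow rate: {career_allow_rate:.1%}")   # ~79.5%
print(f"Delta vs TC avg:   {delta_vs_tc:+.1%}")        # ~+17.5%
```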

Statute-Specific Performance

§101: 13.8% (-26.2% vs TC avg)
§103: 58.2% (+18.2% vs TC avg)
§102: 7.7% (-32.3% vs TC avg)
§112: 18.2% (-21.8% vs TC avg)
Tech Center averages are estimates. Based on career data from 78 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Applicant's claim for the benefit of prior-filed application PCT/CN2021/108513 filed 07/26/2021 under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 12/29/2023 is being considered by the examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 2, 5, 6, 8-11, 13, 15, 16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ost et al. (Neural Scene Graphs for Dynamic Scenes, published March 5th, 2021), in view of Pumarola (D-NeRF: Neural Radiance Fields for Dynamic Scenes, published 2020).
Regarding Claim 1, Ost teaches "A computer-implemented method, comprising: obtaining a plurality of videos of a scene from a plurality of views, wherein the scene comprises an environment and one or more dynamic entities" (Ost, Section 5.1, paragraph 1 discloses "Each training sequence consists of up to 90 time steps or 9 seconds and images of size 1242 × 375, each from two camera perspectives, and up to 12 unique, dynamic objects from different object classes"); "generating a 3D bounding-box for each dynamic entity of the one or more dynamic entities in the scene" (Ost, see Figure 1 and the Figure 1 caption "The nodes are visualized as boxes with their local cartesian coordinate axis"; see also Section 3.2, "Dynamic Nodes," where the "bounding box size" is defined as sO = [LO, HO, WO]);

[Figure 1 of Ost]

"encoding, by a computer device, a machine learning model comprising an environment layer and a dynamic entity layer for each dynamic entity in the scene" (Ost, Section 3.1, paragraph 1 recites "A single background representation model in the top of Fig. 2 approximates all static parts of a scene with a neural radiance fields, that, departing from prior work, lives on sparse planes instead of a volume"; where a background representation model is an environment layer; see also Fig. 2, below. Ost, Section 3.2, paragraph 5, recites "In the following, we describe two augmented models for neural radiance fields which are also illustrated in Fig. 2, which represent scenes as shown in Fig. 1"; where the dynamic model of Fig. 2 is a dynamic entity layer);

[Figure 2 of Ost]

"wherein the environment layer represents a continuous function of space" (Ost, Section 3.1 discloses "We map the positional and directional inputs with γ(x) and γ(d) to higher-frequency feature vectors and pass those as input to the background model, resulting in the following two stages of the representation network"; see Equations 3 and 4 of Ost; where Fθbckg,1 is a continuous function of space); "and the dynamic entity layer represents a continuous function of space" (Ost, Section 3.2, paragraph 5, recites "The continuous volumetric scene functions Fθc from Eq. 6 are modeled with a MLP architecture presented in the bottom of Fig. 2"; where a continuous volumetric scene function is a continuous function of space); "wherein the dynamic entity layer comprises a deformation module" (Ost, Section 3.2, paragraph 3 recites "We modify the input to the mapping function, conditioning the volume density on the global 3D location x of the sampled point and a 256-dimensional latent vector lo, resulting in the following new first stage [equation image]"; where Fθc,1 is a deformation module); "and a neural radiance module" (Ost, Section 3.2, paragraph 5 recites "To ensure the volumetric consistency, the pose is only considered for the emitted color and not the density. This adds the pose to the inputs y(t) and d of the second stage, that is [equation image]"; where Fθc,2 is a neural radiance module); "the deformation module is configured to deform a spatial coordinate in accordance with" (Ost, Section 3.2, paragraph 3 recites "Conditioning on the latent code allows shared weights θc between all objects of class c"; where a shared weight is a trained deformation weight; where the y(x) output is a deformed spatial coordinate); "and the neural radiance module is configured to derive a density value and a color in accordance with the deformed spatial coordinate" (Ost, Section 3.2, paragraph 5 recites "The continuous volumetric scene functions Fθc from Eq. 6 are modeled with a MLP architecture presented in the bottom of Fig. 2. The representation maps a latent vector lO, points x ∈ [−1,1] and a viewing direction d in the local frame of O to its corresponding volumetric density σ and directional c emitted color"; see also Equations (9) and (10) of Ost [equation images]; where y(x, lO) is a deformed spatial coordinate; where d is a direction. Ost, Section 3.2, paragraph 3 recites "Conditioning on the latent code allows shared weights θc between all objects of class c"; where a shared weight is a trained radiance weight); "training the machine learning model using the plurality of videos to obtain a trained machine learning model" (Ost, Section 5, paragraph 2 discloses "We choose to train our method on scenes from the KITTI dataset [11]. For experiments on synthetic data, from KITTI's virtual counterpart, the Virtual KITTI 2 Dataset [5, 11], and for videos, we refer to the Supplementary Material and our project page"); "and rendering the scene in accordance with the trained machine learning model" (Ost, Section 4.1 discloses "Images of the learned scene are rendered using a ray casting approach"; see also Figure 3 of Ost).

[Figure 3 of Ost]

Ost does not explicitly teach "wherein the environment layer represents a continuous function of space and time of the environment, and the dynamic entity layer represents a continuous function of space and time of the dynamic entity," "the deformation module is configured to deform a spatial coordinate in accordance with a timestamp and a trained deformation weight to obtain a deformed spatial coordinate," and "and the neural radiance module is configured to derive a density value and a color in accordance with the deformed spatial coordinate, the timestamp, a direction, and a trained radiance weight" (emphasis added).

However, in an analogous field of endeavor, Pumarola teaches "wherein the environment layer represents a continuous function of space and time of the environment, and the dynamic entity layer represents a continuous function of space and time of the dynamic entity" (Pumarola, Section 1, paragraph 3 recites "The learned model then allows to synthesize novel images, providing control in the continuum (θ,φ,t) of the camera views and time component, or equivalently, the dynamic state of the scene (see Fig. 1)"; where controlling the camera views and time component is a function of space and time. Pumarola, Section 4, paragraph 2 also discloses "The second module is called Deformation Network and consists of another MLP Ψt(x,t) → ∆x which predicts a deformation field defining the transformation between the scene at time t and the scene in its canonical configuration"; where Ψt is a function of space and time); "the deformation module is configured to deform a spatial coordinate in accordance with a timestamp and a trained deformation weight to obtain a deformed spatial coordinate" (Pumarola, Section 4, paragraph 2 also discloses "The second module is called Deformation Network and consists of another MLP Ψt(x,t) → ∆x which predicts a deformation field defining the transformation between the scene at time t and the scene in its canonical configuration." Pumarola, Section 5, paragraph 2 discloses "we sort the input images according to their time stamps (from lower to higher) and then we apply a curriculum learning strategy where we incrementally add images with higher time stamps"); "and the neural radiance module is configured to derive a density value and a color in accordance with the deformed spatial coordinate, the timestamp, a direction, and a trained radiance weight" (Pumarola, Section 4, paragraph 6 recites "As shown in previous works [34, 43, 26], directly feeding raw coordinates and angles to a neural network results in low performance. Thus, for both the canonical and the deformation networks, we first encode x, d and t into a higher dimension space.").

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ost to incorporate the teachings of Pumarola by incorporating time components and using time stamps to map dynamic scenes with a deformation network. The prior art Ost contained a 'base' method upon which the claimed invention can be seen as an 'improvement.' Ost teaches a method of rendering novel scene compositions from dynamic scenes, separating background and object representations. The prior art Pumarola contained a 'comparable' method that has been improved in the same way as the claimed invention. Pumarola teaches a method of rendering novel images from dynamic scenes, also incorporating time components and time stamps. One of ordinary skill in the art could have applied the known 'improvement' technique in the same way to the 'base' method, and the results would have been predictable to one of ordinary skill in the art. That is, one of ordinary skill in the art could have incorporated the time components as taught by Pumarola into the method of Ost, for both background and object representations, and produced predictable results. Additionally, Ost teaches that the method is tested on multiple time steps: Ost, Section 5.1, paragraph 1 discloses "Each training sequence consists of up to 90 time steps or 9 seconds and images of size 1242 × 375, each from two camera perspectives, and up to 12 unique, dynamic objects from different object classes." Thus, incorporating a time element is taught by Ost, and Pumarola teaches the improvement technique of incorporating the time element into the deformation network itself.
Finally, one of ordinary skill in the art would be motivated to combine the Ost and Pumarola references in order to render high quality novel views of objects: Pumarola, Section 7 discloses "A thorough evaluation demonstrates that D-NeRF is able to synthesize high quality novel views of scenes undergoing different types of deformation, from articulated objects to human bodies performing complex body postures." Accordingly, the combination of Ost and Pumarola discloses the invention of Claim 1.

Regarding Claim 2, the combination of Ost and Pumarola teaches "The computer-implemented method of claim 1, wherein the scene comprises a first dynamic entity and a second dynamic entity" (Ost, Figure 1 shows objects l1 and l2, and thus teaches a first and second dynamic entity).

Regarding Claim 5, the combination of Ost and Pumarola teaches "The computer-implemented method of claim 1, wherein the deformation module comprises a multi-layer perceptron (MLP)" (Ost, Section 3.2, paragraph 5 recites "The continuous volumetric scene functions Fθc from Eq. 6 are modeled with a MLP architecture presented in the bottom of Fig. 2"; where Fθc,1 is a deformation module).

Regarding Claim 6, the combination of Ost and Pumarola teaches "The computer-implemented method of claim 5, wherein the deformation module comprises an 8-layer multi-layer perceptron (MLP) with a skip connection at a fourth layer" (see Ost, Figure 2, where Fθc,1 shows a skip connection that concatenates to the fifth layer activation).

Regarding Claim 8, the combination of Ost and Pumarola teaches "The computer-implemented method of claim 1, further comprising: rendering each dynamic entity in accordance with the 3D bounding-box" (Ost, Figure 3 and the bounding box in Figure 3(b); where the rendering pipeline is rendering in accordance with the 3D bounding-box).

Regarding Claim 9, the combination of Ost and Pumarola teaches "The computer-implemented method of claim 8, further comprising: computing intersections of a ray with the 3D bounding-box" (Ost, Section 4.1, paragraph 4 recites "Volumetric Rendering Each ray rj traced through the scene is discretized at Nd sampling points at each of the mj dynamic node intersections and at Ns planes, resulting in a set of quadrature points {{ti}i=1Ns+mjNd}j"); "obtaining a rendering segment of the dynamic object in accordance with the intersections; and rendering the dynamic entity in accordance with the rendering segment" (Ost, Section 4.1, paragraph 4 recites "The transmitted color c(r(ti)) and volumetric density σ(r(ti)) at each intersection point are predicted from the respective radiance fields in the static node Fθbckg or dynamic node Fθc").

Regarding Claim 10, the combination of Ost and Pumarola teaches "The computer-implemented method of claim 2, further comprising: training each dynamic entity layer in accordance with the 3D bounding-box" (Ost, Section 4.2, paragraph 1 discloses "For each dynamic scene, we optimize a set of representation networks at each node F. Our training set consists of N tuples {(Ik,Sk)}Nk=1, the images Ik ∈ RH×W×3 of the scene and the corresponding scene graph Sk"; where the scene graph contains 3D bounding boxes; see Figure 1).
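
For orientation on the architecture at issue in the Claim 1 and Claims 5-6 mappings above, the following is a minimal PyTorch sketch of the kind of two-module dynamic-entity layer the rejection describes: a deformation module that warps a spatial coordinate given a timestamp, followed by a neural radiance module that returns a density value and a color. It is an illustrative sketch only, not code from the application, Ost, or Pumarola; the 256-unit width, the 8-layer depth, and the skip connection at the fourth layer are chosen merely to mirror the architecture recited in Claims 5-6.

```python
# Illustrative sketch only (assumed architecture): a D-NeRF-style dynamic-entity layer.
# The deformation module maps (x, t) -> x + delta_x; the radiance module maps the
# deformed coordinate, timestamp, and view direction to (density, color).
import torch
import torch.nn as nn


class DeformationModule(nn.Module):
    """8-layer MLP; the raw input is re-injected at the fourth layer (skip connection)."""

    def __init__(self, in_dim=4, width=256, skip_at=3, depth=8):
        super().__init__()
        self.skip_at = skip_at
        dims, dim = [], in_dim
        for i in range(depth):
            if i == skip_at:
                dim += in_dim            # concatenate the raw input back in
            dims.append((dim, width))
            dim = width
        self.layers = nn.ModuleList(nn.Linear(d_in, d_out) for d_in, d_out in dims)
        self.out = nn.Linear(width, 3)   # predicted offset delta_x

    def forward(self, x, t):
        inp = torch.cat([x, t], dim=-1)  # spatial coordinate + timestamp
        h = inp
        for i, layer in enumerate(self.layers):
            if i == self.skip_at:
                h = torch.cat([h, inp], dim=-1)
            h = torch.relu(layer(h))
        return x + self.out(h)           # deformed spatial coordinate


class RadianceModule(nn.Module):
    """Maps (deformed coordinate, timestamp, direction) to a density value and an RGB color."""

    def __init__(self, width=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3 + 1 + 3, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )
        self.density_head = nn.Linear(width, 1)
        self.color_head = nn.Sequential(nn.Linear(width, 3), nn.Sigmoid())

    def forward(self, x_deformed, t, d):
        h = self.trunk(torch.cat([x_deformed, t, d], dim=-1))
        return torch.relu(self.density_head(h)), self.color_head(h)


# Usage: one dynamic-entity layer = deformation module followed by radiance module.
deform, radiance = DeformationModule(), RadianceModule()
x, t, d = torch.rand(1024, 3), torch.rand(1024, 1), torch.randn(1024, 3)
sigma, rgb = radiance(deform(x, t), t, d)
```

Note that in D-NeRF itself the canonical radiance network is conditioned only on position and direction; feeding the timestamp into the radiance module here simply mirrors the claim language rather than either reference.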
Regarding Claim 11, the combination of Ost and Pumarola teaches "The computer-implemented method of claim 10, further comprising: training the environment layer, the dynamic entity layers for the first dynamic entity, and the second dynamic entity together with a loss function" (Ost, Section 4.2 recites "At each step, the gradient to each trainable node in L and F intersecting with the rays in R is computed and back propagated"; where the trainable nodes L and F are defined in Section 3 as "leaf nodes F = Fθbckg∪{Fθc}Nclassc=1 represents both static and dynamic representation models, L ={lo}Nobjo=1 are leaf nodes that assign latent object codes to each representation leaf node"; thus, the loss function of Ost trains both static (environment) and dynamic (dynamic entity) representation models).

Regarding Claim 13, the combination of Ost and Pumarola teaches "The computer-implemented method of claim 2, further comprising: applying an affine transformation to the 3D bounding box to obtain a new bounding-box" (Ost, Section 3, paragraph 2, [scene-graph definition images from Ost]; where assigning affine transformations for edges is applying affine transformations to 3D bounding boxes) "and rendering the scene in accordance with the new bounding-box" (Ost, Section 4.1 discloses "Images of the learned scene are rendered using a ray casting approach"; see also Figure 3 of Ost).

Regarding Claim 15, the combination of Ost and Pumarola teaches "The computer-implemented method of claim 2, further comprising: applying a retiming transformation to the timestamp to obtain a new timestamp; and rendering the scene in accordance with the new timestamp" (Pumarola, Section 6, paragraph 1 discloses "Finally, we demonstrate D-NeRF ability to synthesize novel views at an arbitrary time in several complex dynamic scenes (Sec. 6.3)."). The proposed combination, as well as the motivation for combining the Ost and Pumarola references, presented in the rejection of Claim 1 apply to Claim 15 and are incorporated herein by reference. Thus, the method recited in Claim 15 is met by the combination of Ost and Pumarola.

Regarding Claim 16, the combination of Ost and Pumarola teaches "The computer-implemented method of claim 2, further comprising: rendering the scene without the first dynamic entity" (Ost, Section 5, paragraph 1 discloses "We then modify the learned graphs to synthesize unseen frames of novel object arrangements, temporal variations, novel scene compositions and from novel views"; see also Figure 4(e); note that in novel scene (e), the red vehicle is not rendered, for example; thus Ost teaches rendering a scene without a first dynamic entity).

Regarding Claim 18, the combination of Ost and Pumarola teaches "The computer-implemented method of claim 1, wherein the environment layer comprises a neural radiance module, and the neural radiance module is configured to derive a density value and a color in accordance with the spatial coordinate, the timestamp, a direction, and a trained radiance weight" (Ost, Section 3.1, paragraph 1 discloses "The static background node function Fθbckg: (x,d) → (c,σ) maps a point x to its volumetric density and combined with a viewing direction to an emitted color. Thus, the background representation is implicitly stored in the weights θbckg"; where a viewing direction is a direction; where x is a spatial coordinate; where the weights θbckg are a trained radiance weight. Pumarola, Section 4, paragraph 2 also discloses "The second module is called Deformation Network and consists of another MLP Ψt(x,t) → ∆x which predicts a deformation field defining the transformation between the scene at time t and the scene in its canonical configuration." Pumarola, Section 5, paragraph 2 discloses "we sort the input images according to their time stamps (from lower to higher) and then we apply a curriculum learning strategy where we incrementally add images with higher time stamps"; where it would be obvious to one of ordinary skill in the art that the methods of Pumarola applied to objects may be applied to the background, as Ost teaches weights of both background and dynamic nodes, and Ost teaches a method using a time component in an MLP; the combination would be obvious to one of ordinary skill in the art). The proposed combination, as well as the motivation for combining the Ost and Pumarola references, presented in the rejection of Claim 1 apply to Claim 18 and are incorporated herein by reference. Thus, the method recited in Claim 18 is met by the combination of Ost and Pumarola.

Regarding Claim 19, the combination of Ost and Pumarola teaches "The computer-implemented method of claim 1, wherein the environment layer comprises a deformation module and a neural radiance module, the deformation module is configured to deform a spatial coordinate in accordance with a timestamp and a trained deformation weight, and the neural radiance module is configured to derive a density value and a color in accordance with the deformed spatial coordinate, the timestamp, a direction, and a trained radiance weight" (Ost, Section 3.1 recites "The static background node function Fθbckg: (x,d) → (c,σ) maps a point x to its volumetric density and combined with a viewing direction to an emitted color. Thus, the background representation is implicitly stored in the weights θbckg"; see also Figure 2, top; where θbckg,1 is a trained radiance weight; where θbckg,2 is a trained radiance weight. Pumarola, Section 4, paragraph 6 recites "As shown in previous works [34, 43, 26], directly feeding raw coordinates and angles to a neural network results in low performance. Thus, for both the canonical and the deformation networks, we first encode x, d and t into a higher dimension space"). The proposed combination, as well as the motivation for combining the Ost and Pumarola references, presented in the rejection of Claim 1 apply to Claim 19 and are incorporated herein by reference. Thus, the method recited in Claim 19 is met by the combination of Ost and Pumarola.

Regarding Claim 20, the combination of Ost and Pumarola teaches "The computer-implemented method of claim 1, wherein the environment layer comprises a multi-layer perceptron (MLP)" (Ost, Section 3.1, paragraph 1 recites "Thus, the background representation is implicitly stored in the weights θbckg. We use a set of mapping functions, Fourier encodings [37], to aid learning high-frequency functions in the MLP model.").

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Ost et al. (Neural Scene Graphs for Dynamic Scenes, published March 5th, 2021), in view of Pumarola (D-NeRF: Neural Radiance Fields for Dynamic Scenes, published 2020), further in view of Oh (US 2022/0159261 A1), further in view of Wu (Visual Tracking With Multiview Trajectory Prediction, published 2020). Regarding Claim 3, the combination of Ost and Pumarola does not explicitly teach the method of Claim 3.
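
The Claim 8-9 and Claim 13 mappings above rest on two small geometric operations: applying an affine transformation to a dynamic entity's 3D bounding box, and intersecting a camera ray with the box to obtain the segment along which that entity is sampled and rendered. The numpy sketch below illustrates the generic corner-transform and slab-method ray/box test; it is not code from the application or from Ost, and the specific box and transform in the usage lines are arbitrary.

```python
# Illustrative sketch only: affine-transform a 3D bounding box and compute the
# entry/exit distances of a ray against the resulting axis-aligned box, which
# bound the "rendering segment" for a dynamic entity (cf. Claims 8-9 and 13).
import numpy as np


def transform_box_corners(box_min, box_max, affine):
    """Apply a 4x4 affine transform to the 8 box corners and return the
    axis-aligned bounds of the transformed corners (a new, conservative box)."""
    xs, ys, zs = zip(box_min, box_max)
    corners = np.array([[x, y, z, 1.0] for x in xs for y in ys for z in zs])
    moved = (affine @ corners.T).T[:, :3]
    return moved.min(axis=0), moved.max(axis=0)


def ray_box_segment(origin, direction, box_min, box_max):
    """Slab-method ray/AABB test: return (t_near, t_far) or None if the ray misses."""
    direction = np.where(direction == 0.0, 1e-12, direction)  # avoid division by zero
    t0 = (box_min - origin) / direction
    t1 = (box_max - origin) / direction
    t_near = np.minimum(t0, t1).max()
    t_far = np.maximum(t0, t1).min()
    if t_far < max(t_near, 0.0):
        return None                      # ray misses the box (or box is behind it)
    return max(t_near, 0.0), t_far       # sample the entity only on this segment


# Usage: translate a unit box by (2, 0, 0) and intersect a ray shot along +x.
affine = np.eye(4)
affine[:3, 3] = [2.0, 0.0, 0.0]
bmin, bmax = transform_box_corners(np.array([-0.5] * 3), np.array([0.5] * 3), affine)
print(ray_box_segment(np.zeros(3), np.array([1.0, 0.0, 0.0]), bmin, bmax))  # entry/exit ~ (1.5, 2.5)
```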
However, in an analogous field of endeavor, Oh teaches "The computer-implemented method of claim 1, further comprising: obtaining a point cloud for each frame of the plurality of videos, wherein each video of the plurality of videos comprises a plurality of frames" (Oh, [0326] discloses "A patch generator (or patch generation) 21002 generates patches from the point cloud data. The patch generator generates point cloud data or point cloud video as one or more pictures/frames"); "reconstruct a depth map for each view to be rendered to obtain a reconstructed depth map" (Oh, [0326] discloses "In addition, a geometry picture/frame, which is in the form of a depth map that represents the information about the position (geometry) of each point constituting the point cloud video on a patch-by-patch basis, may be generated"); and "generating an initial 2D bounding-box in each view for each dynamic entity" (Oh, [0326] discloses "When points constituting the point cloud video is divided into one or more patches (sets of points that constitute the point cloud video, wherein the points belonging to the same patch are adjacent to each other in the 3D space and are mapped in the same direction among the planar faces of a 6-face bounding box when mapped to a 2D image) and mapped to a 2D plane, an occupancy map picture/frame in a binary map, which indicates presence or absence of data at the corresponding position in the 2D plane"; where a 6-face bounding box mapped to a 2D image is a 2D bounding-box; see the 2D patch in Fig. 7).

[Fig. 7 of Oh]

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ost and Pumarola to incorporate the teachings of Oh by generating point cloud data, generating a depth map, and generating 2D projections of 3D bounding boxes, or patches. One of ordinary skill in the art would be motivated to combine the Ost, Pumarola, and Oh references in order to improve point cloud data transmission for universal applications: Oh, [0011] discloses "A point cloud data transmission method, a point cloud data transmission apparatus, a point cloud data reception method, and a point cloud data reception apparatus according to embodiments may provide universal point cloud content such as an self-driving service."

The combination of Ost, Pumarola, and Oh does not explicitly teach "generating the 3D bounding-box for each dynamic entity using a trajectory prediction network (TPN)." However, in an analogous field of endeavor, Wu teaches "generating the 3D bounding-box for each dynamic entity using a trajectory prediction network (TPN)" (Wu, Section III.E discloses "Our key novelty in multiview tracking is the proposed Trajectory Prediction Network (TPN) for handling tracking failure using cross-view trajectory prediction" and "At time t, based on 3D geometrical constrains, the object position gtb in view b can be transformed from its location gta in a"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ost, Pumarola, and Oh to incorporate the teachings of Wu by using a TPN to determine object position based on 3D geometrical constraints. One of ordinary skill in the art would be motivated to combine the Ost, Pumarola, Oh, and Wu references in order to detect object position even when there is occlusion: Wu, Abstract discloses "A key innovation in our framework is a cross-camera trajectory prediction network (TPN), which implicitly and dynamically encodes camera geometric relations, and hence addresses missing target issues such as occlusion." Thus, Claim 3 is met by Ost, Pumarola, Oh, and Wu.

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Ost et al. (Neural Scene Graphs for Dynamic Scenes, published March 5th, 2021), in view of Pumarola (D-NeRF: Neural Radiance Fields for Dynamic Scenes, published 2020), further in view of Mildenhall et al. (NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, published 2020).

Regarding Claim 7, the combination of Ost and Pumarola does not explicitly teach the method of Claim 7. However, in an analogous field of endeavor, Mildenhall teaches "The computer-implemented method of claim 3, wherein each frame comprises a frame number, and the frame number is encoded into a high dimension feature using positional encoding" (Mildenhall, Section 5.1 discloses "A similar mapping is used in the popular Transformer architecture [47], where it is referred to as a positional encoding. However, Transformers use it for a different goal of providing the discrete positions of tokens in a sequence as input to an architecture that does not contain any notion of order. In contrast, we use these functions to map continuous input coordinates into a higher dimensional space to enable our MLP to more easily approximate a higher frequency function"; where the 5D input coordinates contain directional information of the direction, and thus represent a frame number (a view from a particular direction)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ost and Pumarola to incorporate the teachings of Mildenhall by applying positional encoding. Ost explicitly recites that the method of Ost builds on the teachings of Mildenhall: Ost, Section 3, paragraph 5 discloses "For the model representation nodes, F, we follow Mildenhall et al. [22] and represent scene objects as augmented implicit neural radiance fields." One of ordinary skill in the art would be motivated to combine the Ost, Pumarola, and Mildenhall references in order to improve sampling to sample regions with visible content: Mildenhall, Section 3, paragraph 5 discloses "For the model representation nodes, F, we follow Mildenhall et al. [22] and represent scene objects as augmented implicit neural radiance fields." Accordingly, the combination of Ost, Pumarola, and Mildenhall discloses the invention of Claim 7.
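
Claim 7 turns on encoding a frame number into a high-dimensional feature with positional encoding, per the Mildenhall citation above. The sketch below shows the standard Fourier-feature formulation, γ(p) = (sin(2⁰πp), cos(2⁰πp), …, sin(2^(L-1)πp), cos(2^(L-1)πp)), applied to a normalized frame number; the ten frequency bands are an arbitrary illustrative choice, not a parameter taken from the application or the references.

```python
# Minimal sketch of Fourier-feature positional encoding, as used in NeRF-style models.
# Applied to a normalized frame number or timestamp, it yields a high-dimensional feature.
import numpy as np


def positional_encoding(p, num_bands=10):
    """Encode scalar(s) p (ideally normalized to roughly [-1, 1]) into 2 * num_bands features."""
    p = np.atleast_1d(np.asarray(p, dtype=float))
    freqs = 2.0 ** np.arange(num_bands) * np.pi    # 2^k * pi, k = 0..L-1
    angles = p[..., None] * freqs                  # broadcast over frequency bands
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


# Usage: encode frame number 37 of a 90-frame sequence as a 20-dimensional feature.
t = 37 / 90.0
print(positional_encoding(t).shape)   # (1, 20)
```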
Regarding Claim 14, the combination of Ost and Pumarola does not explicitly teach "The computer-implemented method of claim 13, further comprising: applying an inverse transformation on sampled pixels for the dynamic entity." However, in an analogous field of endeavor, Mildenhall teaches "The computer-implemented method of claim 13, further comprising: applying an inverse transformation on sampled pixels for the dynamic entity" (Mildenhall, Section 5.2, paragraph 3 recites "We sample a second set of Nf locations from this distribution using inverse transform sampling, evaluate our 'fine' network at the union of the first and second set of samples, and compute the final rendered color of the ray using Eq. 3 but using all Nc + Nf samples"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ost and Pumarola to incorporate the teachings of Mildenhall by applying an inverse transform. Ost explicitly recites that the method of Ost builds on the teachings of Mildenhall: Ost, Section 3, paragraph 5 discloses "For the model representation nodes, F, we follow Mildenhall et al. [22] and represent scene objects as augmented implicit neural radiance fields." One of ordinary skill in the art would be motivated to combine the Ost, Pumarola, and Mildenhall references in order to improve sampling to sample regions with visible content: Mildenhall, Section 5.2, paragraph 3 discloses "This procedure allocates more samples to regions we expect to contain visible content. This addresses a similar goal as importance sampling, but we use the sampled values as a nonuniform discretization of the whole integration domain rather than treating each sample as an independent probabilistic estimate of the entire integral." Accordingly, the combination of Ost, Pumarola, and Mildenhall discloses the invention of Claim 14.

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Ost et al. (Neural Scene Graphs for Dynamic Scenes, published March 5th, 2021), in view of Pumarola (D-NeRF: Neural Radiance Fields for Dynamic Scenes, published 2020), further in view of Zhou et al. (US 2016/0358315 A1). Regarding Claim 17, the combination of Ost and Pumarola does not explicitly teach the method of Claim 17. However, in an analogous field of endeavor, Zhou teaches "The computer-implemented method of claim 2, further comprising: scaling a density value for the first dynamic entity with a scalar to obtain a scaled density value" (Zhou, [0132] discloses "The rendering density of the trace image may be related to a sampling frequency and/or the scaling value of trace image"); "and rendering the scene in accordance with the scaled density value for the first dynamic entity" (Zhou, [0132] discloses "Based on the determined rendering density, a dot or line pattern rendering mode can be selected."). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Ost and Pumarola to incorporate the teachings of Zhang by rendering at a scaled density value. One of ordinary skill in the art would be motivated to combine the Ost, Pumarola, and Zhang references in order to adjust the rendering method for cost or effects: Zhang, [0052] discloses "In one embodiment, a first rendering rule may be selecting a rendering method, where the rendering method is a direct manifestation of a trace dynamic image. Different rendering methods may lead to different costs in rendering and different rendering effects." Accordingly, the combination of Ost, Pumarola, and Zhang discloses the invention of Claim 17.

Allowable Subject Matter

Claims 4 and 12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Regarding Claim 4, none of the cited prior art Ost, Pumarola, Oh, nor Wu explicitly teaches the method of Claim 3. Zhang et al. (US 11184558 B1) teaches a framework for visual object tracking by producing segmentation masks (Zhang, Abstract recites "A reframing engine may processes video clips using a segmentation and hotspot module to determine a salient region of an object, generate a mask of the object, and track the trajectory of an object in the video clips."). However, none of the previously cited references explicitly teach an averaged depth value in accordance with the reconstructed depth map. Thus, none of the previously cited prior art provides a motivation to teach, alone or in combination, the ordered combination of "The computer-implemented method of claim 3, further comprising: predicting a mask of a dynamic object in each frame from each view; calculating an averaged depth value of the dynamic object in accordance with the reconstructed depth map; obtaining a refined mask of the dynamic object in accordance with the calculated averaged depth value; and compositing a label map of the dynamic object in accordance with the refined mask."

Regarding Claim 12, none of the cited prior art Ost, Pumarola, Oh, nor Wu explicitly teaches the method of Claim 12. Although segmenting scenes by label map is well known in the art (Ehmann et al. (US 2015/0054974 A1) discloses "The initial segmentation map comprises an object label map U*, which is the crude segmentation of the current frame into background and foreground"), none of the cited prior art explicitly teaches a proportion of each dynamic object in accordance with a label map (understood under the broadest reasonable interpretation to be a size, portion or ratio of each dynamic object). Thus, none of the cited prior art provides a motivation to teach, alone or in combination, the ordered combination of "The computer-implemented method of claim 11, further comprising: calculating a proportion of each dynamic object in accordance with a label map; training the environment layer, the dynamic entity layers for the first dynamic entity and the second dynamic entity in accordance with the proportion for the first dynamic entity and the second dynamic entity."

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CAROLINE TABANCAY DUFFY, whose telephone number is (703) 756-1859. The examiner can normally be reached Monday - Friday, 8:00 am - 5:30 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Amandeep Saini, can be reached at 571-272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CAROLINE TABANCAY DUFFY/
Examiner, Art Unit 2662

/AMANDEEP SAINI/
Supervisory Patent Examiner, Art Unit 2662

Prosecution Timeline

Dec 19, 2023 — Application Filed
Feb 04, 2026 — Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602753 — ULTRASOUND IMAGE PROCESSING APPARATUS (2y 5m to grant; granted Apr 14, 2026)
Patent 12602788 — METHOD AND SYSTEM FOR FULLY AUTOMATICALLY SEGMENTING CEREBRAL CORTEX SURFACE BASED ON GRAPH NETWORK (2y 5m to grant; granted Apr 14, 2026)
Patent 12597130 — IMAGE PROCESSING APPARATUS, OPERATION METHOD OF IMAGE PROCESSING APPARATUS, AND OPERATION PROGRAM OF IMAGE PROCESSING APPARATUS (2y 5m to grant; granted Apr 07, 2026)
Patent 12580081 — SYSTEMS AND METHODS FOR DIRECTLY PREDICTING CANCER PATIENT SURVIVAL BASED ON HISTOPATHOLOGY IMAGES (2y 5m to grant; granted Mar 17, 2026)
Patent 12567130 — REAL-TIME BLIND REGISTRATION OF DISPARATE VIDEO IMAGE STREAMS (2y 5m to grant; granted Mar 03, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 80%
With Interview: 99% (+26.9%)
Median Time to Grant: 3y 1m
PTA Risk: Low
Based on 78 resolved cases by this examiner. Grant probability derived from career allow rate.
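
The note above says the grant probability is derived from the career allow rate, with a separate figure for cases where an examiner interview was held. A minimal sketch of that kind of derivation follows; only the 62 granted / 78 resolved totals come from this report, and the with/without-interview split below is invented for illustration, so the resulting interview figures show the computation rather than reproduce the displayed 99% and +26.9%.

```python
# Minimal sketch (hypothetical interview split): deriving projection-style figures
# from resolved-case counts. Only 62 granted / 78 resolved come from this report.
granted, resolved = 62, 78
grant_probability = granted / resolved                        # ~79.5%, shown as 80%

interviewed = {"granted": 28, "resolved": 28}                 # hypothetical counts
not_interviewed = {"granted": 34, "resolved": 50}             # hypothetical counts

rate_with = interviewed["granted"] / interviewed["resolved"]
rate_without = not_interviewed["granted"] / not_interviewed["resolved"]
interview_lift = rate_with - rate_without                     # percentage-point lift

print(f"Grant probability: {grant_probability:.0%}")
print(f"With interview:    {rate_with:.0%}  (lift {interview_lift:+.1%})")
```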
