Last updated: May 29, 2026
Application No. 18/799,247
POINT-BASED NEURAL RADIANCE FIELD FOR THREE DIMENSIONAL SCENE REPRESENTATION

Non-Final OA §103
Filed: Aug 09, 2024
Priority: Jul 09, 2022 (continuation of 12/073,507)
Examiner: SHENG, XIN
Art Unit: 2619
Tech Center: 2600 (Communications)
Assignee: Adobe Inc.
OA Round: 1 (Non-Final)
Interview: Optional

Examiner has a relatively high allowance rate (72%) and a +17.2% interview lift. A written response may suffice.
Based on 404 resolved cases, 2023–2026
Examiner Intelligence

SHENG, XIN
Grants 72% (above average)
Career Allowance Rate: 72% (293 granted / 404 resolved), +10.5% vs TC avg
Interview Lift: +17.2% (strong), comparing resolved cases with and without an interview
Typical timeline: 2y 4m average prosecution; 14 currently pending
Career history: 421 total applications across all art units
Statute-Specific Performance

§101: 1.6% (-38.4% vs TC avg)
§103: 94.5% (+54.5% vs TC avg)
§102: 1.0% (-39.0% vs TC avg)
§112: 0.3% (-39.7% vs TC avg)
Based on career data from 404 resolved cases; Tech Center averages are estimates.
Office Action

§103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 5, 8-9, 12, 15-16 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Mildenhall et al ("NeRF: Representing scenes as neural radiance fields for view synthesis." Communications of the ACM 65.1 (2021): 99-106.) in view of Tay et al (US20190311546) and Carrigg et al (US20200271451), and further in view of Sawazaki (JP2005250662).

Regarding Claim 1. Mildenhall teaches A method, comprising:
accessing a plurality of input two-dimensional (2D) images corresponding to a plurality of views of an object (Mildenhall, abstract, the paper describes a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. The algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, ɸ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
Page 2, Figure 1 presents a method that optimizes a continuous 5D neural radiance field representation (volume density and view-dependent color at any continuous location) of a scene from a set of input images. We use techniques from volume rendering to accumulate samples of this scene representation along rays to render the scene from any viewpoint. Here, we visualize the set of 100 input views of the synthetic Drums scene randomly captured on a surrounding hemisphere, and we show two novel views rendered from our optimized NeRF representation.);
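The quoted Mildenhall passages describe accumulating densities and colors along each camera ray with classic volume rendering. As a minimal sketch, assuming the samples on one ray are already ordered front to back, the standard discrete NeRF compositing rule (segment opacity α_i = 1 - exp(-σ_i δ_i), sample weight T_i α_i) can be written as follows; the function name and signature are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def composite_ray(sigmas: Sequence[float], colors: Sequence[Color],
                  deltas: Sequence[float]) -> Tuple[Color, List[float]]:
    """Alpha-composite samples along one ray, front to back.

    sigmas: volume density at each sample; colors: RGB at each sample;
    deltas: distance from each sample to the next one.
    Returns the pixel color and the per-sample weights T_i * alpha_i.
    """
    if not (len(sigmas) == len(colors) == len(deltas)):
        raise ValueError("sigmas, colors and deltas must have equal length")
    transmittance = 1.0  # T_i: fraction of light reaching sample i unoccluded
    weights: List[float] = []
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        w = transmittance * alpha
        weights.append(w)
        for k in range(3):
            rgb[k] += w * color[k]
        transmittance *= 1.0 - alpha
    return (rgb[0], rgb[1], rgb[2]), weights
```

An empty ray (all densities zero) yields black, and a very dense first sample hides everything behind it, which is the behavior the weights are meant to capture.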

Mildenhall fails to explicitly teach, however, Tay teaches a point cloud generation model (Tay, abstract, the invention describes a method of rendering 2D and 3D data within a 3D virtual environment in the field of autonomous vehicles. The method includes accessing a 2D color image recorded by a 2D color camera and a 3D point cloud recorded by a 3D depth sensor at approximately a first time, the 2D color camera and the 3D depth sensor defining intersecting fields of view and facing outwardly from an autonomous vehicle; detecting a cluster of points in the 3D point cloud representing a continuous surface approximating a plane; isolating a cluster of color pixels in the 2D color image depicting the continuous surface; projecting the cluster of color pixels onto the plane to define a set of synthetic 3D color points in the 3D point cloud, the cluster of points and the set of synthetic 3D color points representing the continuous surface; and rendering points in the 3D point cloud and the set of synthetic 3D color points on a display.
[0010] As shown in FIGS. 1 and 2, a first method S100 for rendering 2D and 3D data within a 3D virtual environment.).
Mildenhall and Tay are analogous art because they both teach methods of rendering a 3D scene based on 2D image inputs and 3D information such as depth and/or point cloud. Tay further teaches a method of calculating pixel color based on 2D image inputs and the 3D point cloud. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the 3D scene rendering method based on 2D images and 3D information (taught in Mildenhall), to further project the 2D image onto the 3D point cloud (taught in Tay), so as to detect color for a cluster of points representing a continuous surface at a given time, which can be used as a viewing aid for an autonomous vehicle (Tay, [0053-0055]).
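Tay is cited for coloring 3D points from a concurrently recorded 2D color image. Tay's own procedure (fitting a plane to a point cluster and projecting color pixels onto it) is not reproduced here; the sketch instead shows the commonly used reverse step of projecting each 3D point into the color image with a pinhole camera model and sampling the nearest pixel. It assumes the points are already expressed in the camera frame (z pointing forward) and that the intrinsics fx, fy, cx, cy are known; all names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
RGB = Tuple[int, int, int]


def project_point(p: Point, fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a camera-frame point to pixel coordinates (u, v)."""
    x, y, z = p
    if z <= 0.0:  # behind the camera: no valid projection
        return None
    return fx * x / z + cx, fy * y / z + cy


def colorize_points(points: Sequence[Point], image: Sequence[Sequence[RGB]],
                    fx: float, fy: float, cx: float, cy: float) -> List[Tuple[Point, RGB]]:
    """Attach the nearest-pixel color to every point that lands inside the image."""
    height, width = len(image), len(image[0])
    colored = []
    for p in points:
        uv = project_point(p, fx, fy, cx, cy)
        if uv is None:
            continue
        u, v = int(round(uv[0])), int(round(uv[1]))
        if 0 <= u < width and 0 <= v < height:
            colored.append((p, image[v][u]))  # image is indexed row (v) first
    return colored
```

Points behind the camera or outside the image simply receive no color; a real pipeline would also handle occlusion and lens distortion.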

The combination of Mildenhall and Tay fails to explicitly teach, however, Carrigg teaches extracting, from the input 2D images using a point cloud generation model, 2D image feature maps describing edges and corners of the input 2D images (Carrigg, abstract, the invention describes a method for using a transport device (TD) to traverse a substantially discontinuous surface feature (SDSF). The transport device (TD) can be, for example, but not limited to, an autonomous device or a semi-autonomous device, to navigate in environments that can include features such as substantially discontinuous surface features. The substantially discontinuous surface feature traversal feature can enable the TD to travel on an expanded variety of surfaces. In particular, substantially discontinuous surface features can be accurately identified and labeled so that the TD can automatically maintain the performance of the TD during ingress and egress of the substantially discontinuous surface feature.
[0002] The present teachings relate generally to surface feature detection and traversal. Surface feature traversal is challenging because surface features, for example, but not limited to, substantially discontinuous surface features (SDSFs), can be found amidst heterogeneous topology, and that topology can be unique to a specific geography. SDSFs, such as, for example, but not limited to, inclines, edges, curbs, steps, and curb-like geometries (referred to herein, in a non-limiting way, as SDSFs or simply surface features), however, can include some typical characteristics that can assist in their identification.
[0024] … Yet another aspect of the SDSF traversal feature of the present teachings is that traversing SDSFs of varying geometries is possible. Geometries can include, for example, but not limited to, squared and contoured SDSFs.
[0052] Referring now to FIG. 3, in some configurations, map processor 104 can include, but is not limited to including, feature extraction that can include line of sight filtering 121 of point cloud data 131 and mapped trajectory 133. Line of sight filtering can remove points that are hidden from the direct line of sight of the sensors collecting the point cloud data and forming the mapped trajectory. Reduced point cloud data 132 can be further processed by organizing 151 reduced point cloud data 132 according to pre-selected criteria possibly associated with a specific feature.);
Mildenhall, Tay and Carrigg are analogous art because they all teach methods of rendering a 3D scene based on 2D image inputs and 3D information such as depth and/or point cloud. Carrigg further teaches a method of associating points with detected surface features. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the 3D scene rendering method based on 2D images and 3D information (taught in Mildenhall and Tay), to further classify points and associate them with detected surface features (taught in Carrigg), so as to provide a method to locate surface features based on a multi-part model that is associated with several criteria for SDSF identification (Carrigg, [0002-0003]).
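Carrigg [0052] describes line of sight filtering that removes points hidden from the direct line of sight of the sensors. The quoted text does not say how hidden points are identified, so this sketch assumes an angular z-buffer: points are binned by azimuth and elevation as seen from the sensor, and only the closest point in each bin survives. The function name and the 1° default bin size are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def line_of_sight_filter(points: Sequence[Point], sensor: Point,
                         bin_deg: float = 1.0) -> List[Point]:
    """Keep the closest point in each (azimuth, elevation) bin seen from the sensor.

    Farther points falling in an already-occupied bin are treated as hidden
    behind the closer one (an angular z-buffer approximation of occlusion).
    """
    nearest: Dict[Tuple[int, int], Tuple[float, Point]] = {}
    for p in points:
        dx, dy, dz = p[0] - sensor[0], p[1] - sensor[1], p[2] - sensor[2]
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        if r == 0.0:
            continue  # a point at the sensor origin has no viewing direction
        az = math.degrees(math.atan2(dy, dx))
        el = math.degrees(math.asin(max(-1.0, min(1.0, dz / r))))
        key = (math.floor(az / bin_deg), math.floor(el / bin_deg))
        if key not in nearest or r < nearest[key][0]:
            nearest[key] = (r, p)
    return [p for _, p in nearest.values()]
```

Coarser bins remove more points (including some that are actually visible), so the bin size trades completeness against occlusion accuracy.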

The combination of Mildenhall, Tay and Carrigg further teaches generating, using the 2D image feature maps, a neural point cloud representing a three-dimensional (3D) scene that includes the object (Tay, [0014] By then projecting 2D image feeds from these laterally-facing cameras onto their corresponding image planes while rendering concurrent 3D LIDAR data (e.g., concurrent 3D point clouds recorded by a set of LIDAR sensors on the autonomous vehicle) within the virtual environment according to Blocks of the first method S100, the system can present clusters of points passing through an image plane while 2D images projected onto the image plane depict a vehicle moving past the field of view of the camera that recorded these 2D images, as shown in FIGS. 3A-3D. The human annotator viewing the virtual environment may quickly, visually discern a correlation between this cluster of points and the vehicle depicted in this sequence of 2D images projected onto the image plane given alignment between these 3D and 2D data within the 3D virtual environment. Accordingly, the human annotator may quickly select and label this cluster of points as a vehicle.);
determining, using a neural point volume rendering model applied to the neural point cloud, a color value for each pixel of an output image (Mildenhall, page 4, par 4, page 5, par 1, we represent a continuous scene as a 5D vector-valued function whose input is a 3D location x = (x, y, z) and 2D viewing direction (θ, ɸ), and whose output is an emitted color c = (r, g, b) and volume density σ. In practice, we express direction as a 3D Cartesian unit vector d. We approximate this continuous 5D scene representation with an MLP network Fϴ: (x, d) → (c, σ) and optimize its weights ϴ to map from each input 5D coordinate to its corresponding volume density and directional emitted color.
Page 5, par 2-3, We encourage the representation to be multiview consistent by restricting the network to predict the volume density σ as a function of only the location x, while allowing the RGB color c to be predicted as a function of both location and viewing direction. To accomplish this, the MLP Fϴ first processes the input 3D coordinate x with 8 fully-connected layers (using ReLU activations and 256 channels per layer), and outputs σ and a 256-dimensional feature vector. This feature vector is then concatenated with the camera ray's viewing direction and passed to one additional fully-connected layer (using a ReLU activation and 128 channels) that output the view-dependent RGB color.
See Fig. 3 for an example of how our method uses the input viewing direction to represent non-Lambertian effects. As shown in Fig. 4, a model trained without view dependence (only x as input) has difficulty representing specularities.
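The Mildenhall passage gives the network layout: 8 fully-connected ReLU layers of 256 channels on the 3D location, producing σ and a 256-dimensional feature, followed by one 128-channel ReLU layer on that feature concatenated with the viewing direction. The sketch reproduces only this layout, with untrained random weights in plain Python; the ReLU on σ, the sigmoid on RGB and the weight initialization are assumptions, and positional encoding is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random weights (He-style scale, an assumption) and zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    """Fully-connected layer y = W x + b, with W stored as one row per output."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def sigmoid(a: float) -> float:
    # numerically stable logistic function
    if a >= 0.0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


class NeRFStyleMLP:
    """Trunk on x -> (sigma, feature); head on (feature, d) -> RGB."""

    def __init__(self, width: int = 256, depth: int = 8, head_width: int = 128, seed: int = 0):
        rng = random.Random(seed)
        dims = [3] + [width] * depth  # `depth` layers: 3 -> width, then width -> width
        self.trunk = [init_layer(a, b, rng) for a, b in zip(dims[:-1], dims[1:])]
        self.sigma_layer = init_layer(width, 1, rng)
        self.feature_layer = init_layer(width, width, rng)
        self.head = init_layer(width + 3, head_width, rng)
        self.rgb_layer = init_layer(head_width, 3, rng)

    def __call__(self, x: Sequence[float], d: Sequence[float]) -> Tuple[float, List[float]]:
        h = list(x)
        for layer in self.trunk:  # ReLU layers that see the 3D location only
            h = relu(dense(h, layer))
        sigma = max(0.0, dense(h, self.sigma_layer)[0])  # density ignores the view direction
        feature = dense(h, self.feature_layer)
        g = relu(dense(feature + list(d), self.head))  # concatenate the viewing direction
        rgb = [sigmoid(a) for a in dense(g, self.rgb_layer)]
        return sigma, rgb
```

Because σ is computed before the direction is concatenated, querying the same location from two directions returns the same density, which is the multiview-consistency constraint described in the quote.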
Tay, [0017] The autonomous vehicle can also implement one or more local neural networks to process LIDAR feeds (i.e., sequences of LIDAR images) video feeds (or sequences of color photographic images), and/or other sensor data substantially in real-time in order to localize the autonomous vehicle to a known location and orientation in real space, to interpret (or "perceive") its surroundings, and to then select and execute navigational actions. For example, a controller integrated into the autonomous vehicle can: pass LIDAR and video feeds into a localization/perception neural network to detect and characterize static object),

The combination of Mildenhall, Tay and Carrigg fails to explicitly teach, however, Sawazaki teaches each color value based on a shading point color value and a density value (Sawazaki, abstract, [0001], the invention describes a line drawing apparatus, a line drawing method, and a line drawing program, and is particularly suitable for anti-aliasing of lines included in the outline of a filled figure. The system includes: a shading module LM calculates a concentration value under the consideration of a straight line Lcol or a background color Bcol in each pixel; and a shading module RM calculates a concentration value under the consideration of a painted color Fcol or a straight line color Lcol in each pixel; and a color compounding module M0 calculates the output color information Col of each pixel by compounding the concentration values of those respective pixels.
[0008] Furthermore, according to a linear drawing apparatus according to one aspect of the present invention, the color information calculation means comprises: a first shading module that calculates a density value of a fill color or a background color in a pixel based on the proportion of the fill area or a background area that occupies the pixel; a second shading module that calculates a density value of a linear color in a pixel based on the proportion of the linear line that separates the fill area and the background area that occupies the pixel; and a color synthesis module that calculates color information of the pixel relating to the linear line based on the density value of at least one of the fill color or background color and the density value of the linear color.); and 
Mildenhall, Tay, Carrigg and Sawazaki are analogous art because they all teach methods of rendering image pixel color. The combination of Mildenhall, Tay and Carrigg further teaches rendering a 3D scene based on 2D image inputs and 3D information such as depth and/or point cloud. Carrigg further teaches a method of rendering image color pixels associated with detected surface features. Sawazaki further teaches synthesizing pixel color based on shading module density value calculation. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the 3D scene rendering method based on 2D images and 3D information (taught in Mildenhall, Tay and Carrigg), to further use the pixel color synthesis method based on shading module density value calculation (taught in Sawazaki), so as to provide a method to perform anti-aliasing of straight lines of image pixels (Sawazaki, abstract, [0007]).
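Sawazaki [0008] describes a first shading module that derives a density value for the fill or background color from the proportion of the pixel that area occupies, a second module that does the same for the line color, and a synthesis module that combines them. The exact compounding formula is not given in the quoted text, so this sketch assumes a plain coverage-weighted linear blend (fill over background, then line over the result); the names are hypothetical. Note that "density" here is a 2D coverage weight, not a volume density along a ray.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def blend(a: RGB, b: RGB, wa: float) -> RGB:
    """Linear mix of two colors: wa * a + (1 - wa) * b."""
    return (wa * a[0] + (1.0 - wa) * b[0],
            wa * a[1] + (1.0 - wa) * b[1],
            wa * a[2] + (1.0 - wa) * b[2])


def antialiased_edge_pixel(fill: RGB, background: RGB, line: RGB,
                           fill_coverage: float, line_coverage: float) -> RGB:
    """Combine two coverage-derived density values into one pixel color.

    fill_coverage: fraction of the pixel occupied by the filled area (rest is background).
    line_coverage: fraction of the pixel occupied by the outline.
    """
    for c in (fill_coverage, line_coverage):
        if not 0.0 <= c <= 1.0:
            raise ValueError("coverage values must lie in [0, 1]")
    base = blend(fill, background, fill_coverage)  # first shading module
    return blend(line, base, line_coverage)       # second module plus synthesis
```

For a pixel half covered by a white fill on black and a quarter covered by a red outline, the result is (0.625, 0.375, 0.375).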

The combination of Mildenhall, Tay, Carrigg and Sawazaki further teaches generating the output image of the 3D scene based on the neural point cloud and the color value for each pixel; and storing the output image of the 3D scene (Tay, [0014] By then projecting 2D image feeds from these laterally-facing cameras onto their corresponding image planes while rendering concurrent 3D LIDAR data (e.g., concurrent 3D point clouds recorded by a set of LIDAR sensors on the autonomous vehicle) within the virtual environment according to Blocks of the first method S100, the system can present clusters of points passing through an image plane while 2D images projected onto the image plane depict a vehicle moving past the field of view of the camera that recorded these 2D images, as shown in FIGS. 3A-3D. The human annotator viewing the virtual environment may quickly, visually discern a correlation between this cluster of points and the vehicle depicted in this sequence of 2D images projected onto the image plane given alignment between these 3D and 2D data within the 3D virtual environment. Accordingly, the human annotator may quickly select and label this cluster of points as a vehicle.
In order to display the generated 3D virtual environment on the screen, the 3D output image must be stored in memory, including cache.).

Regarding Claim 2. The combination of Mildenhall, Tay, Carrigg and Sawazaki further teaches The method of claim 1, further comprising receiving at least one of the input 2D images from a camera device (Tay, page 15, claim 1, A method for augmenting 3D depth map data with 2D color image data comprising: accessing a first 2D color image recorded at a first time via a 2D color camera arranged on an autonomous vehicle;).
The reasoning for combination of Mildenhall, Tay, Carrigg and Sawazaki is the same as described in Claim 1.

Regarding Claim 5. The combination of Mildenhall, Tay, Carrigg and Sawazaki further teaches The method of claim 1, wherein the neural point cloud comprises a plurality of neural points, wherein generating the neural point cloud comprises assigning, to each neural point of the plurality of neural points, a location, a confidence value representing a probability that the location is within a proximity to a surface of the object within the 3D scene, and a feature representing an appearance of the 3D scene at the location (Carrigg, abstract, the invention describes a method for using a transport device (TD) to traverse a substantially discontinuous surface feature (SDSF). The transport device (TD) can be, for example, but not limited to, an autonomous device or a semi-autonomous device, to navigate in environments that can include features such as substantially discontinuous surface features. The substantially discontinuous surface feature traversal feature can enable the TD to travel on an expanded variety of surfaces. In particular, substantially discontinuous surface features can be accurately identified and labeled so that the TD can automatically maintain the performance of the TD during ingress and egress of the substantially discontinuous surface feature.
[0081] Referring now to FIG. 19, system 1100 for navigating a TD towards a goal point across at least one SDSF can include, but is not limited to including, path line processor 1103, SDSF detector 1109, and SDSF controller 1127. System 1100 can be operably coupled with surface processor 1601 that can process sensor information that can include, for example, but not limited to, images of the surroundings of TD 101 (FIG. 20A). Surface processor 1601 can provide real-time surface feature updates, including indications of SDSFs. In some configurations, cameras can provide RGB-D data whose points can be classified according to surface type. In some configurations, system 1100 can process the points that have been classified as SDSFs and their associated probabilities.
Mildenhall, page 4, par 4, page 5, par 1, we represent a continuous scene as a 5D vector-valued function whose input is a 3D location x = (x, y, z) and 2D viewing direction (θ, ɸ), and whose output is an emitted color c = (r, g, b) and volume density σ.).
The reasoning for combination of Mildenhall, Tay, Carrigg and Sawazaki is the same as described in Claim 1.
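Claim 5 attaches three attributes to each neural point: a location, a confidence value giving the probability that the location is near the object's surface, and an appearance feature. A minimal data-structure sketch follows; the class name, the probability check and the `confident_points` filter are hypothetical illustrations, not features recited in the claim.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class NeuralPoint:
    """One neural point carrying the three attributes recited in claim 5."""
    location: Tuple[float, float, float]
    confidence: float  # probability that `location` lies near the object's surface
    feature: List[float] = field(default_factory=list)  # appearance descriptor at `location`

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be a probability in [0, 1]")


def confident_points(cloud: List[NeuralPoint], threshold: float = 0.5) -> List[NeuralPoint]:
    """Hypothetical helper: keep only points likely to lie near a surface."""
    return [p for p in cloud if p.confidence >= threshold]
```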

Claim 8 is similar in scope to Claim 1, and thus is rejected under the same rationale.
Claim 9 is similar in scope to Claim 2, and thus is rejected under the same rationale.
Claim 12 is similar in scope to Claim 5, and thus is rejected under the same rationale.
Claim 15 is similar in scope to Claim 1, and thus is rejected under the same rationale.
Claim 16 is similar in scope to Claim 2, and thus is rejected under the same rationale.
Claim 18 is similar in scope to Claim 5, and thus is rejected under the same rationale.

Claims 3-4, 10-11 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Mildenhall et al in view of Tay et al, Carrigg et al and Sawazaki, and further in view of Long et al (US11017586).

Regarding Claim 3. The combination of Mildenhall, Tay, Carrigg and Sawazaki further teaches The method of claim 1, further comprising accessing view coordinates defining a viewing angle for the 3D scene (Mildenhall, page 5, par 2-3, We encourage the representation to be multiview consistent by restricting the network to predict the volume density σ as a function of only the location x, while allowing the RGB color c to be predicted as a function of both location and viewing direction. To accomplish this, the MLP Fϴ first processes the input 3D coordinate x with 8 fully-connected layers (using ReLU activations and 256 channels per layer), and outputs σ and a 256-dimensional feature vector. This feature vector is then concatenated with the camera ray's viewing direction and passed to one additional fully-connected layer (using a ReLU activation and 128 channels) that output the view-dependent RGB color.),

The combination of Mildenhall, Tay, Carrigg and Sawazaki fails to explicitly teach, however, Long teaches wherein the output image represents the 3D scene from the viewing angle (Long, abstract, the invention describes systems and methods for generating a three-dimensional (3D) effect from a two-dimensional (2D) image. The methods may include generating a depth map based on a 2D image, identifying a camera path, generating one or more extremal views based on the 2D image and the camera path, generating a global point cloud by inpainting occlusion gaps in the one or more extremal views, generating one or more intermediate views based on the global point cloud and the camera path, and combining the one or more extremal views and the one or more intermediate views to produce a 3D motion effect.).
Mildenhall, Tay, Carrigg, Sawazaki and Long are analogous art because they all teach methods of image rendering. The combination of Mildenhall, Tay and Carrigg further teaches rendering a 3D scene based on 2D image inputs and 3D information such as depth and/or point cloud. Long further teaches a method of rendering an image based on a detected camera viewing angle. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the 3D scene rendering method based on 2D images and 3D information (taught in Mildenhall, Tay, Carrigg and Sawazaki), to further use the viewing angle image synthesis method (taught in Long), so as to provide a method to generate a realistic 3D motion effect (Long, col 1, line 5-42).
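Claim 3 ties the output image to view coordinates defining a viewing angle. One common way to turn such coordinates into the per-pixel camera rays that a NeRF-style renderer queries is an orbit camera parameterized by azimuth, elevation and radius. The sketch assumes a z-up world, a camera looking at the origin, a vertical field of view in degrees and |elevation| < 90°; the names are hypothetical and the construction is not taken from Long or Mildenhall.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]
Camera = Tuple[Vec, Vec, Vec, Vec]  # origin, forward, right, up


def _normalize(v: Vec) -> Vec:
    n = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / n, v[1] / n, v[2] / n)


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def orbit_camera(azimuth_deg: float, elevation_deg: float, radius: float) -> Camera:
    """Camera on a sphere around the origin, looking at the origin, with z as world up."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    origin = (radius * math.cos(el) * math.cos(az),
              radius * math.cos(el) * math.sin(az),
              radius * math.sin(el))
    forward = _normalize((-origin[0], -origin[1], -origin[2]))
    right = _normalize(_cross(forward, (0.0, 0.0, 1.0)))  # undefined at the poles
    up = _cross(right, forward)
    return origin, forward, right, up


def pixel_ray(i: int, j: int, width: int, height: int,
              fov_deg: float, camera: Camera) -> Tuple[Vec, Vec]:
    """Origin and unit direction of the ray through the centre of pixel (column i, row j)."""
    origin, forward, right, up = camera
    tan_half = math.tan(math.radians(fov_deg) / 2.0)
    x = (2.0 * (i + 0.5) / width - 1.0) * tan_half * (width / height)
    y = (1.0 - 2.0 * (j + 0.5) / height) * tan_half
    d = (forward[0] + x * right[0] + y * up[0],
         forward[1] + x * right[1] + y * up[1],
         forward[2] + x * right[2] + y * up[2])
    return origin, _normalize(d)
```

Changing the azimuth or elevation moves every ray, so the same scene representation is rendered from the new viewing angle.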

Regarding Claim 4. The combination of Mildenhall, Tay, Carrigg, Sawazaki and Long further teaches The method of claim 3, further comprising receiving the view coordinates from a user interface (Tay, [0024] The system can then: define an image plane normal to the ray intersecting the end of the ray (i.e., offset from the camera origin in the horizontal plane by the ray length); and calculate a transform that projects the field of view of the camera onto the image plane based on the known position and orientation of the camera on the autonomous vehicle (e.g., relative to the reference point on the autonomous vehicle) and based on known intrinsic and extrinsic properties of the camera (e.g., focal length, imager dimensions, zoom, inherent optical aberration). The system (e.g., the annotation portal) can later project a sequence of 2D images, recorded by the camera, onto the image plane based on this transform while simultaneously rendering concurrent LIDAR frames in the virtual environment.
It is common in the art for a user to move the viewing position and/or angle of a 3D scene by dragging the image with a mouse.).
The reasoning for combination of Mildenhall, Tay, Carrigg, Sawazaki and Long is the same as described in Claim 3.
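The statement that a user commonly changes the viewing position or angle by dragging with a mouse can be illustrated with a small orbit controller that maps drag distances in pixels to azimuth and elevation changes. The class name, the sensitivity, the sign conventions and the 89° elevation clamp are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class OrbitView:
    """View coordinates (azimuth and elevation, in degrees) driven by mouse drags."""
    azimuth: float = 0.0
    elevation: float = 0.0
    degrees_per_pixel: float = 0.25  # hypothetical drag sensitivity
    max_elevation: float = 89.0      # stay clear of the poles, where 'up' is undefined

    def drag(self, dx_pixels: float, dy_pixels: float) -> None:
        """Horizontal drag spins around the vertical axis; vertical drag tilts the view."""
        self.azimuth = (self.azimuth + dx_pixels * self.degrees_per_pixel) % 360.0
        el = self.elevation + dy_pixels * self.degrees_per_pixel
        self.elevation = max(-self.max_elevation, min(self.max_elevation, el))
```

After each drag event, the updated azimuth and elevation would be handed to the renderer as the new view coordinates.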

Claim 10 is similar in scope to Claim 3, and thus is rejected under the same rationale.
Claim 11 is similar in scope to Claim 4, and thus is rejected under the same rationale.
Claim 17 is similar in scope to Claim 3, and thus is rejected under the same rationale.

Claims 6, 13 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Mildenhall et al in view of Tay et al, Carrigg et al and Sawazaki, and further in view of Wang et al (US20230237728).

Regarding Claim 6. The combination of Mildenhall, Tay, Carrigg and Sawazaki fails to explicitly teach, however, Wang teaches The method of claim 1, wherein determining the color value for each pixel of the output image comprises:
projecting a ray through the pixel into the neural point cloud representing the 3D scene; and
selecting a plurality of shading points along the ray, each of the plurality of shading points being located within a predefined proximity of the one or more neural points of the neural point cloud (Wang, abstract, the invention describes methods of rendering images using explicit object representation via rays tracing volume density aggregation. The techniques comprise reconstructing an object into a plurality of Gaussian ellipsoids; determining a volume density of each of the plurality of Gaussian ellipsoids along each of a plurality of viewing rays; determining a weight of each of the plurality of Gaussian ellipsoids based on the volume density; and synthesizing an image of the object using the determined weight on each pixel of the image to interpolate attributes of each of the plurality of Gaussian ellipsoids.
[0032] VoGE represents the object 302 using explicit object geometries (e.g., Gaussian ellipsoids), while renders via the rays tracing volume density aggregation. During rendering, given the Gaussian ellipsoids, e.g., the set of anisotropic 3D Gaussian reconstruction kernels, VoGE may first sample viewing rays r(t) with a camera configuration. Then for each ray, the volume density may be formulated as a function of location along the ray with respect to each ellipsoid.
[0033] Along each ray, VoGE (volume renderer using neural Gaussian Ellipsoids) may compute the density of each ellipsoid pk(r(t)), respectively. Occupancy along each ray may be computed via an integral of the volume density, and the contribution of each ellipsoid may be reweighted.
[0039] The anisotropic Gaussian ellipsoid kernels may reconstruct any arbitrary 3D shapes, which allows to convert common representations (e.g., meshes and point clouds) into Gaussian ellipsoids. For example, when converting meshes to Gaussian ellipsoids, Lk may be computed based on the distance from k-th vertex to its neighbors. In another example, point clouds may be easily converted via homogeneous isotropic Gaussians.
Therefore, point clouds can be converted to Gaussian ellipsoid kernels. The size of each kernel is related to the distance from its point to neighboring points. When calculating the aggregated density value along the ray, the intersected ellipsoid kernels will be considered. In order to intersect the ray, a kernel will need to be located within a distance of the ray that is less than the distance between neighboring points.).
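Wang [0039] states that point clouds can be converted via homogeneous isotropic Gaussians, and the examiner relates kernel size to neighbor spacing. A minimal sketch under stated assumptions: each point's standard deviation equals its nearest-neighbor distance, kernels are unnormalized, and density is evaluated at a single ray parameter t. VoGE's actual kernel parameterization and its integral along the ray are not reproduced, and the all-pairs neighbor search (which needs at least two points) is for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def point_kernel_scales(points: Sequence[Vec]) -> List[float]:
    """Isotropic kernel std-dev per point = distance to its nearest neighbour (assumption)."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]


def density_along_ray(origin: Vec, direction: Vec, t: float,
                      points: Sequence[Vec], scales: Sequence[float]) -> float:
    """Sum of unnormalized isotropic Gaussians evaluated at r(t) = origin + t * direction."""
    x = tuple(o + t * d for o, d in zip(origin, direction))
    return sum(math.exp(-math.dist(x, p) ** 2 / (2.0 * s * s))
               for p, s in zip(points, scales))
```

Kernels far from the ray contribute almost nothing, which matches the examiner's observation that only kernels near the ray are effectively intersected.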
Mildenhall, Tay, Carrigg, Sawazaki and Wang are analogous art because they all teach methods of image rendering. The combination of Mildenhall, Tay and Carrigg further teaches rendering a 3D scene based on 2D image inputs and 3D information such as depth and/or point cloud. Wang further teaches pixel color calculation using a method of rays tracing volume density aggregation. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the 3D scene rendering method based on 2D images and 3D information (taught in Mildenhall, Tay, Carrigg and Sawazaki), to further use the method of rays tracing volume density aggregation (taught in Wang), so as to provide an image rendering method that computes the density aggregation without computationally heavy operations (Wang, [0023]).
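Claim 6 recites projecting a ray through a pixel and selecting shading points within a predefined proximity of neural points. The sketch uses uniform samples between t_near and t_far and a brute-force radius test; the sampling scheme, the radius value and the linear scan (which a spatial index would normally replace) are assumptions, not necessarily how the application implements the step.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def select_shading_points(origin: Vec, direction: Vec, neural_points: Sequence[Vec],
                          t_near: float, t_far: float, n_samples: int,
                          radius: float) -> List[Tuple[float, Vec]]:
    """March evenly spaced samples along the ray and keep those that have at least
    one neural point within `radius` (the 'predefined proximity')."""
    if n_samples < 2:
        raise ValueError("need at least two samples")
    step = (t_far - t_near) / (n_samples - 1)
    kept = []
    for k in range(n_samples):
        t = t_near + k * step
        x = tuple(o + t * d for o, d in zip(origin, direction))
        if any(math.dist(x, p) <= radius for p in neural_points):
            kept.append((t, x))
    return kept
```

Samples in empty space are discarded before any network is evaluated, which is the main efficiency argument for shading only near neural points.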

Claim 13 is similar in scope to Claim 6, and thus is rejected under the same rationale.
Claim 19 is similar in scope to Claim 6, and thus is rejected under the same rationale.

Claims 7, 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Mildenhall et al in view of Tay et al, Carrigg et al, Sawazaki and Wang et al, and further in view of Hedman et al. ("Baking neural radiance fields for real-time view synthesis.", IEEE, 2021).

Regarding Claim 7. The combination of Mildenhall, Tay, Carrigg, Sawazaki and Wang fails to explicitly teach, however, Hedman teaches The method of claim 6, further comprising:
applying a first multilayer perceptron to determine a point-specific feature vector for each of the plurality of shading points; and
applying a second multilayer perceptron the point-specific feature vector to determine the shading point color value and the density value (Hedman, abstract, the paper describes a method to train a NeRF (Neural Radiance Fields), then precompute and store (i.e. “bake”) it as a novel representation called a Sparse Neural Radiance Grid (SNeRG) that enables real-time rendering on commodity hardware. To achieve this, we introduce 1) a reformulation of NeRF’s architecture, and 2) a sparse voxel grid representation with learned feature vectors. The resulting scene representation retains NeRF’s ability to render fine geometric details and view-dependent appearance, is compact (averaging less than 90 MB per scene), and can be rendered in real-time (higher than 30 frames per second on a laptop GPU). Actual screen captures are shown in our video.
Page 4, col 2, par 2, NeRF’s MLP (multilayer perceptron) can be thought of as predicting a 256-dimensional feature vector for each input 3D location, which is then concatenated with the viewing direction and decoded into an RGB color. NeRF then accumulates these view-dependent colors into a single pixel color. However, evaluating an MLP at every sample along a ray to estimate the view-dependent color is prohibitively expensive for real-time rendering. Instead, we modify NeRF to use a strategy similar to deferred rendering [10, 42]. We restructure NeRF to output a diffuse RGB color cd and a 4-dimensional feature vector vs (which is constrained to [0, 1] via a sigmoid so that it can be compressed, as discussed in Section 5.4) in addition to the volume density σ at each input 3D location:

[Equation image: the restructured MLP outputs σ, cd and vs at each input 3D location]
 
To render a pixel, we accumulate the diffuse colors and feature vectors along each ray and pass the accumulated feature vector and color, concatenated to the ray’s direction, to a very small MLP with parameters ɸ (2 layers with 16 channels each) to produce a view-dependent residual that we add to the accumulated diffuse color:

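[Equation image: accumulation of cd and vs along the ray, and the view-dependent residual from the small MLP with parameters ɸ added to the accumulated diffuse color]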
).
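Hedman's deferred-rendering idea, as described in the quote, evaluates the view-dependent network once per pixel instead of once per sample along the ray. A minimal sketch, assuming the standard NeRF compositing weights for the accumulation and treating the small network as an arbitrary callable; the function name is hypothetical and the exact equations in the omitted images are not reproduced.

```python
import math
from typing import Callable, List, Sequence


def deferred_shade(sigmas: Sequence[float], deltas: Sequence[float],
                   diffuse: Sequence[Sequence[float]], spec_feats: Sequence[Sequence[float]],
                   view_dir: Sequence[float],
                   tiny_mlp: Callable[[List[float]], List[float]]) -> List[float]:
    """Accumulate diffuse colors and feature vectors along one ray, then call a small
    network once on (accumulated color, accumulated feature, view direction) to get a
    view-dependent residual that is added to the accumulated diffuse color."""
    transmittance = 1.0
    acc_c = [0.0] * len(diffuse[0])
    acc_v = [0.0] * len(spec_feats[0])
    for sigma, dt, c, v in zip(sigmas, deltas, diffuse, spec_feats):
        w = transmittance * (1.0 - math.exp(-sigma * dt))  # compositing weight
        acc_c = [a + w * ci for a, ci in zip(acc_c, c)]
        acc_v = [a + w * vi for a, vi in zip(acc_v, v)]
        transmittance *= math.exp(-sigma * dt)
    residual = tiny_mlp(acc_c + acc_v + list(view_dir))  # one evaluation per pixel
    return [a + r for a, r in zip(acc_c, residual)]
```

With a 4-dimensional feature, the small network sees a 10-dimensional input (3 color + 4 feature + 3 direction), regardless of how many samples lie on the ray.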
Mildenhall, Tay, Carrigg, Sawazaki, Wang and Hedman are analogous art because they all teach methods of image rendering. The combination of Mildenhall, Tay and Carrigg further teaches neural rendering of a 3D scene based on 2D image inputs and 3D information such as depth and/or point cloud. Hedman further teaches a Sparse Neural Radiance Grid (SNeRG) that uses two multilayer perceptrons (MLPs) to render a scene from a viewpoint. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the 3D scene rendering method based on 2D images and 3D information (taught in Mildenhall, Tay, Carrigg, Sawazaki and Wang), to further use the Sparse Neural Radiance Grid (SNeRG) method with two multilayer perceptrons (MLPs) (taught in Hedman), so as to render fine geometric details and view-dependent appearance in real time (Hedman, abstract).
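Claim 7 recites a first MLP that produces a point-specific feature vector for each shading point and a second MLP that turns that feature into the shading point color value and density value. The sketch is illustrative only: conditioning the first MLP on the nearest neural point's feature and relative offset, the sigmoid and ReLU output activations, and all function names are assumptions, and the weights are random rather than trained.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float, float]
MLP = Callable[[Sequence[float]], List[float]]


def make_mlp(sizes: Sequence[int], seed: int) -> MLP:
    """Untrained fully-connected network with ReLU on hidden layers only."""
    rng = random.Random(seed)
    layers = [([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
              for n_in, n_out in zip(sizes[:-1], sizes[1:])]

    def forward(x: Sequence[float]) -> List[float]:
        h = list(x)
        for k, (weights, bias) in enumerate(layers):
            h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, bias)]
            if k < len(layers) - 1:  # no activation on the output layer
                h = [max(0.0, v) for v in h]
        return h

    return forward


def _sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a)) if a >= 0.0 else math.exp(a) / (1.0 + math.exp(a))


def shade(shading_points: Sequence[Vec], neural_points: Sequence[Vec],
          neural_features: Sequence[Sequence[float]],
          feature_mlp: MLP, decode_mlp: MLP) -> List[Tuple[List[float], float]]:
    """First MLP -> point-specific feature; second MLP -> (RGB color, density)."""
    results = []
    for x in shading_points:
        # Assumption: condition on the nearest neural point's feature and offset.
        j = min(range(len(neural_points)), key=lambda k: math.dist(x, neural_points[k]))
        offset = [a - b for a, b in zip(x, neural_points[j])]
        point_feature = feature_mlp(list(neural_features[j]) + offset)
        raw = decode_mlp(point_feature)  # 4 outputs: 3 color logits and 1 density
        results.append(([_sigmoid(v) for v in raw[:3]], max(0.0, raw[3])))
    return results
```

The per-shading-point colors and densities produced this way are what a compositing step would then accumulate into the pixel color.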

Claim 14 is similar in scope to Claim 7, and thus is rejected under the same rationale.
Claim 20 is similar in scope to Claim 7, and thus is rejected under the same rationale.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIN SHENG whose telephone number is (571)272-5734. The examiner can normally be reached M-F 9:30AM-3:30PM and 6:00PM-8:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jason Chan, can be reached at 571-272-3022. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Xin Sheng/Primary Examiner, Art Unit 2619
Prosecution Timeline

Aug 09, 2024: Application Filed
Apr 22, 2026: Non-Final Rejection mailed (§103)
May 05, 2026: Interview Requested
Precedent Cases

Applications granted by this same examiner with similar technology

18/173,623, Patent 12626326: IMAGE STITCHING WITH AN ADAPTIVE THREE-DIMENSIONAL BOWL MODEL OF THE SURROUNDING ENVIRONMENT FOR SURROUND VIEW VISUALIZATION (3y 2m to grant; granted May 12, 2026)
18/367,119, Patent 12620165: SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR POPULATING ENVIRONMENT MODELS (2y 7m to grant; granted May 05, 2026)
18/367,115, Patent 12614341: SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR POPULATING ENVIRONMENT MODELS (2y 7m to grant; granted Apr 28, 2026)
18/490,458, Patent 12614337: SYSTEM AND METHODS FOR CUSTOMIZING 3D MODELS (2y 6m to grant; granted Apr 28, 2026)
18/796,576, Patent 12614366: AUTOMATIC POINT CLOUD BUILDING ENVELOPE SEGMENTATION (AUTO-CuBES) USING MACHINE LEARNING (1y 8m to grant; granted Apr 28, 2026)
Based on the 5 most recent grants by this examiner.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 72%
With Interview: 90% (+17.2%)
Median Time to Grant: 2y 4m (~6m remaining)
PTA Risk: Low
Based on 404 resolved cases by this examiner. Grant probability derived from career allowance rate.