Prosecution Insights
Last updated: April 19, 2026
Application No. 18/674,709

METHOD AND SYSTEM FOR NOVEL-VIEW IMAGE SYNTHESIS AND RENDERING, DEVICE AND MEDIUM

Status: Non-Final Office Action (§103)
Filed: May 24, 2024
Examiner: TRUONG, KARL DUC
Art Unit: 2614
Tech Center: 2600 — Communications
Assignee: Sichuan Digital Economy Research Institute (Yibin)
OA Round: 1 (Non-Final)

Grant Probability: 52% (Moderate)
Expected OA Rounds: 1-2
Estimated Time to Grant: 2y 7m
Grant Probability with Interview: 83%

Examiner Intelligence

Career Allow Rate: 52% of resolved cases (15 granted / 29 resolved; -10.3% vs TC average)
Interview Lift: +31.0% for resolved cases with an interview
Typical Timeline: 2y 7m average prosecution; 45 applications currently pending
Career History: 74 total applications across all art units

Statute-Specific Performance

§101: 3.2% (-36.8% vs TC average)
§103: 85.3% (+45.3% vs TC average)
§102: 9.5% (-30.5% vs TC average)
§112: 2.1% (-37.9% vs TC average)

TC average values are estimates; based on career data from 29 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Acknowledgment is made of applicant's claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. CN2024101601447, filed on 4th February, 2024.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term "means" or "step" or a term used as a substitute for "means" that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term "means" or "step" or the generic placeholder is modified by functional language, typically, but not always linked by the transition word "for" (e.g., "means for") or another linking word or phrase, such as "configured to" or "so that"; and

(C) the term "means" or "step" or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word "means" (or "step") in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word "means" (or "step") in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word "means" (or "step") are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word "means" (or "step") are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word "means," but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:

"an acquisition module, configured to acquire initial information of a target model" in Claim 6;
"a processing module, configured to perform neural texturing on the initial information to obtain neural texture (NT) information" in Claim 6;
"a rendering module, configured to input the NT information to a synthesis rendering model to obtain a rendered image" in Claim 6;
"the NT input module is configured to receive the NT information" in Claim 6; and
"the NT learning network module is configured to perform convolution and activation as well as concatenation on the NT information to obtain NT processed information".

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-8, 10-13, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Thies et al. ("Deferred Neural Rendering: Image Synthesis using Neural Textures"), hereinafter referenced as Thies, in view of Wang et al. ("NEMTO: Neural Environment Matting for Novel View and Relighting Synthesis of Transparent Objects"), hereinafter referenced as Wang.
Regarding Claim 1, Thies discloses a method for novel-view image synthesis and rendering (Thies, [Section 2.5 View Synthesis Using Neural Networks]: teaches a process <read on method> of generating <read on rendering> novel views based on a large corpus of posed training images that are synthesized by learned warping or a layered scene representation), comprising:

acquiring initial information of a target model (Thies, FIG. 2 teaches a neural rendering pipeline that takes, as input, an object <read on target model> with valid uv-map parameterization and an associated neural texture map <read on initial information>; Note: the uv-texture parameterization and the associated neural texture map are both being interpreted as initial information), wherein the target model is a physical model of an experimental object (Thies, FIG. 2 teaches an image of an object <read on experimental object>, based on a real-world target <read on physical model> object, containing valid uv-map parameterization and an associated neural texture map); the initial information comprises screen space information and auxiliary input information (Thies, [Section 4 Deferred Neural Rendering]: teaches using additional information <read on auxiliary input information> of a 3D reconstructed object, such as a valid uv-texture parameterization, an associated neural texture map, and a target view, to render a view-dependent screen-space feature map; Note: screen space information includes information pertaining to a camera perspective <read on screen space information>, such as a view or viewing ray); [[the screen space information comprises a diffuse reflection texture, a combined texture, and a normal texture; and]] the auxiliary input information comprises a current view and [[a current light]] (Thies, [Section 4 Deferred Neural Rendering]: teaches using additional information <read on auxiliary input information> of a 3D reconstructed object, such as a valid uv-texture parameterization, an associated neural texture map, and a target view <read on current view>, to render a view-dependent screen-space feature map);

performing neural texturing on the initial information to obtain neural texture (NT) information (Thies, FIG. 2 teaches the neural rendering pipeline taking a neural texture and obtaining a sampled texture <read on NT information>); and

inputting the NT information to a synthesis rendering model to obtain a rendered image (Thies, FIG. 2 teaches the neural rendering pipeline <read on synthesis rendering model>, taking the sampled texture and inputting it into the neural renderer which then renders an image of the target object), the synthesis rendering model comprising an NT input module, an NT learning network module, and [[a differentiable renderer that are connected to each other]] (Thies, [Section 4 Deferred Neural Rendering]: teaches a neural rendering pipeline <read on synthesis rendering model>, taking a valid uv-texture parameterization, an associated neural texture map, and a target view of a given 3D reconstructed object as input; Note: it should be noted that although an NT input module is not expressly stated, one skilled in the art would understand that input component hardware must be present to receive any data; [Section 4 Deferred Neural Rendering]: teaches the deferred neural renderer combining aspects from a traditional graphics pipeline with learnable components <read on NT learning network module>; FIG. 2 teaches the deferred neural renderer being a part of the overall neural rendering pipeline), wherein the NT input module is configured to receive the NT information (Thies, FIG. 2 teaches the neural rendering pipeline taking a neural texture and obtaining a sampled texture <read on NT information>), and transmit the NT information to the NT learning network module (Thies, [Section 4.3 Differentiable Sampling of Neural Textures]: teaches performing bilinear sampling and a standard graphics pipeline to rasterize a coarse proxy geometry and sample a neural texture, which results in obtaining a view-dependent screen space feature map, which is then used by the deferred neural renderer <read on transmit NT information to NT learning network module>); the NT learning network module is configured to perform convolution and activation as well as concatenation on the NT information to obtain NT processed information (Thies, [Appendix A Network Architecture]: teaches the rendering network being an encoder-decoder network with skip connections, where both the encoder and decoder each contain 5 convolutional layers with corresponding instance normalization, and a leaky ReLU activation; [Section 3 Overview]: teaches neural textures and the deferred neural renderer are learned jointly <read on concatenation>, where the neural textures are sampled, resulting in a feature map <read on obtain NT processed information> in target space; Note: it should be noted that one skilled in the art would understand that "concatenation" involves joining the outputs or branches of different networks or layers to create a larger, combined representation); and [[the differentiable renderer is configured to adjust and render the NT processed information in a neural rendering manner to obtain the rendered image.]]

However, Thies does not expressly disclose the screen space information comprises a diffuse reflection texture, a combined texture, and a normal texture; and the auxiliary input information comprises a current view and a current light; the synthesis rendering model comprising an NT input module, an NT learning network module, and a differentiable renderer that are connected to each other, wherein the differentiable renderer is configured to adjust and render the NT processed information in a neural rendering manner to obtain the rendered image.

Wang discloses the screen space information comprises a diffuse reflection texture, a combined texture, and a normal texture (Wang, FIG. 2 teaches a Ray Bending Network (RBN) performing sphere tracing to calculate the surface normal <read on normal texture> of the target object, which is used to calculate the refracted direction ω_t and the refractive index η_t <read on diffuse reflection texture>, where these values are sent to Forward Rendering, which uses a view-dependent reflected radiance L_r <read on combined texture> to calculate the final output; Note: it should be noted that in Paragraph [0071] of the specification, a combined texture has "a reflection coefficient of a specular surface, a roughness and a metallicity"; furthermore, it is common in the art to use opacity alpha values, surface roughness values, and reflectivity values for realistic and accurate light-interaction calculations; in addition, FIG. 6 discloses examples of a surface normal, relight, and novel view of the 3D reconstructed target object); and the auxiliary input information comprises a current view and a current light (Wang, [Section 3.3 Ray Bending Network]: teaches a Neural Environment Matting, where the "ray bending network (RBN) directly estimates ω_t, mapping incident rays <read on current light> intersecting the scene object to the final refracted ray direction, thereby learning how the transparent object refracts environment light and implicitly represents the refractive index through the network"); the synthesis rendering model comprising an NT input module, an NT learning network module, and a differentiable renderer that are connected to each other (Wang, [Section 3.4 Forward Rendering]: teaches a forward rendering module that is differentiable and designed for physical plausibility; FIG. 2 teaches the forward rendering module being connected to the Ray Bending Network (RBN), which takes neural information from the Geometry Network; Note: a differentiable renderer allows for the computation of gradients through the entire rendering process, which establishes a link between 2D image pixels and 3D properties of a scene), wherein the differentiable renderer is configured to adjust and render the NT processed information in a neural rendering manner to obtain the rendered image (Wang, FIG. 2 teaches calculating the reflection direction ω_r through ω_i and n to render the viewing ray and calculated reflection rays <read on adjust NT processed information> using the differentiable forward renderer).

Wang is analogous art with respect to Thies because they are from the same field of endeavor, namely processing neural textures for novel view synthesis. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement the NEMTO framework, which includes a Geometry Network, a Ray Bending Network, and a Differentiable Forward Rendering Network as taught by Wang into the teaching of Thies. The suggestion for doing so would allow for more accurate light refraction and reflection modeling with regard to transparent/translucent objects. Therefore, it would have been obvious to combine Wang with Thies.

Regarding Claim 6, it recites limitations that are similar in scope to Claim 1, but in a system. As shown in the rejection, the combination of Thies and Wang discloses the limitations of Claim 1. Additionally, Thies discloses a system for novel-view image synthesis and rendering (Thies, [Section 5.2 Animation Synthesis]: teaches using an Nvidia 1080 Ti GPU for processing neural textures <read on novel-view image synthesis> and performing neural rendering <read on novel-view image rendering>; Note: it should be noted that one skilled in the art would understand that in order to use a GPU, an electronic device or system, such as a computer, must be used), comprising:… Thus, Claim 6 is met by Thies according to the mapping presented in the rejection of Claim 1, given that the method corresponds to a system.
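For orientation, the deferred-neural-rendering scheme the examiner maps onto Claim 1 (bilinear sampling of a learnable neural texture at rasterized uv coordinates, then decoding with a convolutional network using instance normalization, leaky ReLU activations, and skip-connection concatenation) can be sketched as below. This is an illustrative reconstruction only, not Thies's released code; the module names, channel counts, and layer counts are assumptions chosen for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNetRenderer(nn.Module):
    """Stand-in for the encoder-decoder renderer of Thies, Appendix A:
    convolution + instance norm + leaky ReLU, with one skip connection
    whose features are concatenated on the decoder side."""
    def __init__(self, in_ch=16):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1),
                                  nn.InstanceNorm2d(32), nn.LeakyReLU(0.2))
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1),
                                  nn.InstanceNorm2d(64), nn.LeakyReLU(0.2))
        self.dec1 = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1),
                                  nn.InstanceNorm2d(32), nn.LeakyReLU(0.2))
        self.out = nn.Conv2d(64, 3, 3, padding=1)  # 64 = 32 skip + 32 decoded

    def forward(self, x):
        s = self.enc1(x)                    # skip features, full resolution
        h = self.enc2(s)                    # downsampled encoding
        h = F.interpolate(self.dec1(h), scale_factor=2)     # upsample back
        return self.out(torch.cat([s, h], dim=1))  # concatenation (skip link)

class DeferredNeuralRenderer(nn.Module):
    """Bilinear sampling of a learnable neural texture at rasterized uv
    coordinates, followed by the tiny U-Net above (cf. Thies, FIG. 2)."""
    def __init__(self, tex_res=512, tex_feats=16):
        super().__init__()
        # Learnable neural texture: a feature image instead of RGB texels.
        self.neural_texture = nn.Parameter(
            torch.randn(1, tex_feats, tex_res, tex_res) * 0.01)
        self.renderer = TinyUNetRenderer(tex_feats)

    def forward(self, uv):
        # uv: (1, H, W, 2) uv-map in [-1, 1] from rasterizing the proxy mesh.
        feat = F.grid_sample(self.neural_texture, uv,
                             mode='bilinear', align_corners=False)
        return self.renderer(feat)          # (1, 3, H, W) rendered image
```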
Regarding Claim 7, the combination of Thies and Wang discloses the method of Claim 1. Additionally, Thies further discloses an electronic device (Thies, [Section 5.2 Animation Synthesis]: teaches using an Nvidia 1080 Ti GPU; Note: it should be noted that one skilled in the art would understand that in order to use a GPU, an electronic device, such as a computer, must be used), comprising a memory (Thies, [Section 5.1 Novel View Point Synthesis]: teaches storing a single neural texture <read on memory>, which allows for synthesis of the target object under new views; Note: it should be noted that one skilled in the art would understand that a computer requires some form of storage access in order to function, such as a solid-state drive (SSD)) and a processor (Thies, [Section 5.2 Animation Synthesis]: teaches using an Nvidia 1080 Ti GPU <read on processor>), wherein the memory is configured to store a computer program (Thies, [Section 5.2 Animation Synthesis]: teaches using an Nvidia 1080 Ti GPU for the deferred neural rendering process <read on computer program>; Note: it should be noted that one skilled in the art would understand that rendering tasks and programs are stored in non-volatile memory before being loaded into volatile memory), and the processor runs the computer program to enable the electronic device to execute the method according to claim 1 (Thies, [Section 5.2 Animation Synthesis]: teaches using an Nvidia 1080 Ti GPU to run <read on execute> the deferred neural rendering processes <read on method> for test sequences).

Regarding Claim 12, the combination of Thies and Wang discloses the method of Claim 1. Additionally, Thies further discloses a non-transitory computer-readable storage medium (Thies, [Section 5.2 Animation Synthesis]: teaches using an Nvidia 1080 Ti GPU; Note: it should be noted that one skilled in the art would understand that in order to use a GPU, an electronic device, such as a computer, must be used, where a computer includes at least a processor, memory <read on non-transitory computer-readable storage medium>, and storage), wherein the computer-readable storage medium stores a computer program (Thies, [Section 5.1 Novel View Point Synthesis]: teaches storing a single neural texture, which allows for synthesis of the target object under new views, using deferred neural rendering processes <read on computer program>; Note: it should be noted that one skilled in the art would understand that a computer requires some form of storage access in order to function, such as a solid-state drive (SSD), which is a non-transitory computer-readable storage medium), and the computer program is executed by a processor to implement the method according to claim 1 (Thies, [Section 5.2 Animation Synthesis]: teaches using an Nvidia 1080 Ti GPU <read on processor> to run <read on execute> the deferred neural rendering processes <read on method> for test sequences).

Regarding Claims 2, 8, and 13, the combination of Thies and Wang discloses the method, the electronic device, and the non-transitory computer-readable storage medium of Claims 1, 7, and 12, respectively. Additionally, Thies further discloses wherein the performing neural texturing on the initial information to obtain NT information comprises: sampling the initial information based on an NT structure to obtain a sampled result (Thies, [Section 4.2 Neural Texture Hierarchies]: teaches implementing Neural Texture Hierarchies <read on NT structure> with K levels, where the texture hierarchy is accessed "by sampling values <read on initial information> from all K levels using normalized texture coordinates and bi-linear sampling," which results in a final color estimate being obtained "by adding all per-level sampling results"), the NT structure being a property encoding structure based on a neural network (Thies, [Appendix A Network Architecture]: teaches the neural rendering network being based on a 5-layer U-Net (i.e., an encoder-decoder network with skip connections <read on property encoding structure based on neural network>)); performing red green blue (RGB) color value conversion on the sampled result based on the NT structure to obtain rendering property information (Thies, [Section 4.2 Neural Texture Hierarchies]: teaches accessing the texture hierarchy "by sampling values from all K levels using normalized texture coordinates and bi-linear sampling," where "the final color estimate <read on performing RGB color value conversion on sampled result> is then obtained by adding all per-level sampling results"; [Section 4.2 Neural Texture Hierarchies]: further teaches the neural texture hierarchy storing low frequency information on coarse levels and high frequency detail on finer levels <read on rendering property information>); determining a feature map according to the rendering property information (Thies, [Section 4.3 Differentiable Sampling of Neural Textures]: teaches supporting bilinear interpolation for sampling stored high-dimensional feature maps of neural textures); and decoding the feature map with a U-shaped network (U-Net) to obtain the NT information (Thies, [Appendix A Network Architecture]: teaches the neural rendering network being based on a 5-layer U-Net (i.e., an encoder-decoder network with skip connections), where it takes an image with 16 features per pixel (i.e., the rendered neural texture) as input and uses the feature channels <read on feature map> to decode the neural texture data <read on obtain NT information> as shown in FIG. 18).
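The hierarchy sampling quoted above (Thies, Section 4.2), i.e., bilinear sampling at all K levels followed by summing the per-level results, can be sketched as follows. K, the per-level resolutions, and the feature count below are illustrative assumptions, not values from the record.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralTextureHierarchy(nn.Module):
    """Mip-like stack of learnable feature textures: coarse levels carry
    low-frequency content, finer levels add high-frequency detail."""
    def __init__(self, K=4, base_res=64, feats=16):
        super().__init__()
        self.levels = nn.ParameterList([
            nn.Parameter(torch.randn(1, feats, base_res * 2**k, base_res * 2**k) * 0.01)
            for k in range(K)
        ])

    def forward(self, uv):
        # uv: (1, H, W, 2) normalized texture coordinates in [-1, 1].
        out = 0
        for tex in self.levels:
            # Bilinear sampling at every level, at the same coordinates.
            out = out + F.grid_sample(tex, uv, mode='bilinear',
                                      align_corners=False)
        return out  # summed per-level sampling results (the sampled NT information)
```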
Regarding Claims 4, 10, and 15, the combination of Thies and Wang discloses the method, the electronic device, and the non-transitory computer-readable storage medium of Claims 1, 7, and 12, respectively. Additionally, Thies further discloses wherein adjusting and rendering, by the differentiable renderer, the NT processed information in the neural rendering manner to obtain the rendered image, comprises: performing deferred rendering on the NT processed information in the neural rendering manner to obtain deferred rendering information data (Thies, [Section 4.4 Deferred Neural Renderer]: teaches "the task of the Deferred Neural Renderer is to form a photo-realistic image <read on perform deferred rendering> given a screen space feature map," where based on precomputed mapping, "differentiable sampling can be used to obtain the screen space feature map via a lookup <read on deferred rendering information data>"); determining view rendering image data based on the auxiliary input information according to the deferred rendering information data (Thies, [Section 4.4 Deferred Neural Renderer]: teaches the neural network having additional inputs <read on auxiliary input information>, such as a view-direction <read on determine view rendering image data>, where for view-dependent effects, the view-direction is used as input using the first 3 bands of spherical harmonics resulting in 9 feature maps); and determining the rendered image according to the view rendering image data (Thies, [Section 5 Results]: teaches using training data to optimize the neural texture of the object, which allows for re-rendering of the object <read on determine rendered image> under novel views and/or animation of the object).

Regarding Claims 5, 11, and 16, the combination of Thies and Wang discloses the method, the electronic device, and the non-transitory computer-readable storage medium of Claims 4, 10, and 15, respectively. Additionally, Thies further discloses wherein the deferred rendering comprises rasterization (Thies, [Section 4.4 Deferred Neural Renderer]: teaches using a traditional rasterization pipeline), interpolation calculation (Thies, [Section 4.3 Differentiable Sampling of Neural Textures]: teaches supporting bilinear interpolation for sampling stored high-dimensional feature maps), texture mapping (Thies, [Section 4.1 Neural Textures]: teaches neural textures being an extension of traditional texture maps <read on texture mapping>), and anti-aliasing (Thies, FIG. 13 teaches re-rendering the generated image with a spherical harmonics layer to obtain a sharper and less-noisy image <read on anti-aliasing> result).
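The view-direction input cited for Claims 4, 10, and 15 (first 3 bands of spherical harmonics, yielding 9 feature maps) corresponds to the standard real SH basis for bands l = 0 through 2. A minimal sketch, with shapes chosen for illustration:

```python
import torch

def sh_basis_3band(d):
    """Evaluate the first 3 bands (l = 0, 1, 2) of the real spherical
    harmonics basis on unit direction vectors d of shape (..., 3),
    returning (..., 9) basis values."""
    x, y, z = d[..., 0], d[..., 1], d[..., 2]
    return torch.stack([
        0.282095 * torch.ones_like(x),                      # l = 0
        0.488603 * y, 0.488603 * z, 0.488603 * x,           # l = 1
        1.092548 * x * y, 1.092548 * y * z,                 # l = 2
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ], dim=-1)

# Per-pixel view directions (H, W, 3) become 9 feature maps (H, W, 9)
# that can be concatenated channel-wise with the screen-space feature map.
view_dirs = torch.nn.functional.normalize(torch.randn(32, 32, 3), dim=-1)
sh_maps = sh_basis_3band(view_dirs)   # shape (32, 32, 9)
```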
Claims 3, 9, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Thies et al. ("Deferred Neural Rendering: Image Synthesis using Neural Textures"), hereinafter referenced as Thies, in view of Wang et al. ("NEMTO: Neural Environment Matting for Novel View and Relighting Synthesis of Transparent Objects"), hereinafter referenced as Wang, as applied to Claims 1, 7, and 12 above, respectively, and further in view of Li et al. ("Dilated Fully Convolutional Neural Network for Depth Estimation From a Single Image"), hereinafter referenced as Li, and further in view of Kato et al. ("Neural 3D Mesh Renderer"), hereinafter referenced as Kato.

Regarding Claims 3, 9, and 14, the combination of Thies and Wang discloses the method, the electronic device, and the non-transitory computer-readable storage medium of Claims 1, 7, and 12, respectively.

Additionally, Thies further discloses wherein a method for determining the synthesis rendering model comprises: acquiring training data of the target model at each view (Thies, [Section 5 Results]: teaches creating training data by re-rendering the uv-maps of the mesh that correspond to the observed images, where by using said training data, the neural texture of the object is optimized, which then re-renders the object under novel views), the training data comprising initial information of known rendered images (Thies, [Section 5.1 Novel View Point Synthesis]: teaches storing and using a single rendered neural texture <read on known rendered images> to synthesize the target object under new views <read on initial information>); performing neural texturing on the initial information in the training data to obtain training NT information (Thies, FIG. 2 teaches the neural rendering pipeline taking a neural texture and obtaining a sampled texture <read on training NT information>); constructing a synthesis rendering network (Thies, FIG. 2 teaches using the neural rendering pipeline <read on synthesis rendering network>), the synthesis rendering network comprising the NT input module, the NT learning network module, and a training [[differentiable]] renderer that are connected in sequence (Thies, [Section 4 Deferred Neural Rendering]: teaches a neural rendering pipeline <read on synthesis rendering network>, taking a valid uv-texture parameterization, an associated neural texture map, and a target view of a given 3D reconstructed object as input; Note: it should be noted that although an NT input module is not expressly stated, one skilled in the art would understand that input component hardware must be present to receive any data; FIG. 2 teaches the deferred neural renderer being a part of the overall neural rendering pipeline); transmitting the training NT information to the NT learning network module through the NT input module (Thies, [Section 4.3 Differentiable Sampling of Neural Textures]: teaches performing bilinear sampling and a standard graphics pipeline to rasterize a coarse proxy geometry and sample a neural texture, which results in obtaining a view-dependent screen space feature map, which is then used by the deferred neural renderer <read on transmit training NT information to NT learning network module>); performing [[dilated]] convolution on the training NT information in the NT learning network module to obtain processed training NT information (Thies, FIG. 2 teaches the neural rendering pipeline taking a neural texture and obtaining a sampled texture <read on processed NT information> by processing <read on convolution> a neural texture input <read on training NT information> through the neural renderer), and performing activation and concatenation on the processed training NT information based on a same resolution to obtain concatenated training data (Thies, [Appendix A Network Architecture]: teaches the rendering network being an encoder-decoder network with skip connections, where both the encoder and decoder each contain 5 convolutional layers with corresponding instance normalization, and a leaky ReLU activation; FIG. 18 teaches the U-Net architecture of the neural rendering network, where each layer has the same resolution; [Section 3 Overview]: teaches neural textures and the deferred neural renderer are learned jointly <read on concatenation>, where the neural textures are sampled, resulting in a feature map <read on obtain concatenated training data> in target space); dividing the concatenated training data into a training set and a test set (Thies, [Section 4.6 Training Data]: teaches "the training corpus size <read on concatenated training data> varies between 800 and 1700 frames depending on the sequence and the target application with similar angular differences from the test set to the training set"); setting the training set and corresponding rendered images as an input of the training [[differentiable]] renderer (Thies, [Section 5 Results]: teaches using the generated training data <read on setting training set as input>, the neural texture of the object is optimized, which allows for said object to be re-rendered under novel views <read on corresponding rendered images>), rendering the training set (Thies, [Section 5 Results]: teaches creating <read on rendering> the training data), and [[with a goal of minimizing a value of a loss function, updating parameters of the training differentiable renderer by using a gradient descent method and a back propagation method to obtain a trained differentiable renderer; and]] setting the test set and corresponding rendered images as an input of the trained [[differentiable]] renderer (Thies, FIG. 13 teaches using a test sequence of 250 images <read on setting test set as input> for the neural rendering pipeline), and [[adjusting parameters of the trained differentiable renderer to obtain the differentiable renderer, wherein]] the synthesis rendering model comprises the NT input module, the NT learning network module, and [[the differentiable renderer]] (Thies, [Section 4 Deferred Neural Rendering]: teaches a neural rendering pipeline <read on synthesis rendering model>, taking a valid uv-texture parameterization, an associated neural texture map, and a target view of a given 3D reconstructed object as input; [Section 4 Deferred Neural Rendering]: teaches the deferred neural renderer combining aspects from a traditional graphics pipeline with learnable components <read on NT learning network module>).

However, Thies does not expressly disclose the synthesis rendering network comprising the NT input module, the NT learning network module, and a training differentiable renderer that are connected in sequence; performing dilated convolution on the training NT information in the NT learning network module to obtain processed training NT information; setting the training set and corresponding rendered images as an input of the training differentiable renderer; with a goal of minimizing a value of a loss function, updating parameters of the training differentiable renderer by using a gradient descent method and a back propagation method to obtain a trained differentiable renderer; and setting the test set and corresponding rendered images as an input of the trained differentiable renderer, and adjusting parameters of the trained differentiable renderer to obtain the differentiable renderer, wherein the synthesis rendering model comprises the NT input module, the NT learning network module, and the differentiable renderer.

Wang discloses the synthesis rendering network comprising the NT input module, the NT learning network module, and a training differentiable renderer that are connected in sequence (Wang, [Section 3.4 Forward Rendering]: teaches a forward rendering module that is differentiable and designed <read on trained> for physical plausibility; FIG. 2 teaches the forward rendering module being connected <read on connected in sequence> to the Ray Bending Network (RBN), which takes neural information from the Geometry Network); [[performing dilated convolution on the training NT information in the NT learning network module to obtain processed training NT information, and]] setting the training set and corresponding rendered images as an input of the training differentiable renderer (Wang, [Section 3.4 Forward Rendering]: teaches a forward rendering module <read on training> that is differentiable and designed for physical plausibility), [[with a goal of minimizing a value of a loss function, updating parameters of the training differentiable renderer by using a gradient descent method and a back propagation method to obtain a trained differentiable renderer; and]] setting the test set and corresponding rendered images as an input of the trained differentiable renderer (Wang, [Section 4.3 Real World Data Results]: teaches rendering training data <read on rendered images> following the steps of synthetic datasets generation, which are used as input for the NEMTO framework for a novel differentiable rendering layer <read on trained differentiable renderer>), and adjusting parameters of the trained differentiable renderer to obtain the differentiable renderer (Wang, FIG. 2 teaches calculating the reflection direction ω_r through ω_i and n to render the viewing ray and calculated reflection rays <read on adjust parameters> using the differentiable forward renderer <read on trained>), wherein the synthesis rendering model comprises the NT input module, the NT learning network module, and the differentiable renderer (Wang, [Section 3.4 Forward Rendering]: teaches a forward rendering module that is differentiable and designed for physical plausibility).

Wang is analogous art with respect to Thies because they are from the same field of endeavor, namely processing neural textures for novel view synthesis. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement the NEMTO framework, which includes a Geometry Network, a Ray Bending Network, and a Differentiable Forward Rendering Network as taught by Wang into the teaching of Thies. The suggestion for doing so would allow for more accurate light refraction and reflection modeling with regard to transparent/translucent objects. Therefore, it would have been obvious to combine Wang with Thies.

However, the combination of Thies and Wang does not expressly disclose performing dilated convolution on the training NT information in the NT learning network module to obtain processed training NT information, and, with a goal of minimizing a value of a loss function, updating parameters of the training differentiable renderer by using a gradient descent method and a back propagation method to obtain a trained differentiable renderer.
Li discloses performing dilated convolution on the training NT information in the NT learning network module to obtain processed training NT information (Li, [Section 3.1 Overview of the Method]: teaches dilating a fully convolutional network architecture for depth estimation as shown in FIG. 2; Note: dilated convolution adds gaps between elements of a convolution kernel, which keeps the output resolutions high and avoids the need for up-sampling), and [[with a goal of minimizing a value of a loss function, updating parameters of the training differentiable renderer by using a gradient descent method and a back propagation method to obtain a trained differentiable renderer.]]

Li is analogous art with respect to Thies, in view of Wang, because they are from the same field of endeavor, namely processing training image data via convolutional neural networks. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to replace some or all of the convolutional layers with dilated convolutions as taught by Li into the teaching of Thies, in view of Wang. The suggestion for doing so would allow for better and more accurate depth estimation of target objects, which would enable the neural renderer to better understand 3D spatial information and context, thereby yielding improved novel view synthesis image results. Therefore, it would have been obvious to combine Li with Thies, in view of Wang.
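As the examiner's note describes, a dilated convolution spaces out the kernel taps so the receptive field grows without downsampling. A minimal sketch follows; the channel counts and input size are assumptions for illustration:

```python
import torch
import torch.nn as nn

# A 3x3 kernel with dilation=2 covers a 5x5 window; padding=2 preserves
# the spatial resolution of the input feature map, so no up-sampling is
# needed afterward.
dilated = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3,
                    padding=2, dilation=2)

x = torch.randn(1, 16, 128, 128)
print(dilated(x).shape)  # torch.Size([1, 16, 128, 128]): resolution preserved
```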
However, the combination of Thies, Wang, and Li does not expressly disclose, with a goal of minimizing a value of a loss function, updating parameters of the training differentiable renderer by using a gradient descent method and a back propagation method to obtain a trained differentiable renderer.

Kato discloses, with a goal of minimizing a value of a loss function, updating parameters of the training differentiable renderer by using a gradient descent method and a back propagation method to obtain a trained differentiable renderer (Kato, [Section 2.3 Image Editing Via Gradient Descent]: teaches "using a differentiable feature extractor and loss function, an image that minimizes the loss <read on value of loss function> can be generated via back-propagation and gradient descent," where the Neural Renderer <read on obtain trained differentiable renderer>, which uses Deep-Dream, provides gradients of an image with respect to the vertices and textures of a mesh; [Section 2.3 Image Editing Via Gradient Descent]: further teaches Deep-Dream being a system <read on training differentiable renderer> that uses a differentiable feature extractor and a loss function, where "an initial image is repeatedly updated so that the magnitude of its image feature becomes larger"; Note: it should be noted that one skilled in the art would understand that gradient descent involves initializing parameters, calculating the gradient, and updating the parameters).

Kato is analogous art with respect to the combination of Thies, Wang, and Li because they are from the same field of endeavor, namely incorporating neural networks into 3D graphics rendering. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to incorporate a differentiable feature extractor and loss function to minimize loss within a generated image as taught by Kato into the combined teaching of Thies, Wang, and Li. The suggestion for doing so would allow for manual edits and changes to the image, which is permitted by the gradient descent method, thereby allowing a system, such as novel view synthesis using neural textures, to modify its outputs for better and more accurate data collection for testing and training. Therefore, it would have been obvious to combine Kato with the combination of Thies, Wang, and Li.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Shan et al. (US 20210248811 A1) discloses a novel view synthesis that utilizes neural textures; Liao et al. ("Translucency Perception Emerges in Deep Generative Representations for Natural Image Synthesis") discloses learning a latent space informative of human translucency perception by developing a deep generative network trained to synthesize images of perceptually persuasive material appearances; and Mildenhall et al. ("NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis") discloses an algorithm that represents a scene using a fully-connected (non-convolutional) deep network, whose input is a continuous 3D coordinate and a viewing direction, and whose output is the volume density and view-dependent emitted radiance at that spatial location.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KARL TRUONG, whose telephone number is (703) 756-5915. The examiner can normally be reached 7:30 AM - 5:00 PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kent Chang, can be reached at (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/K.D.T./
Examiner, Art Unit 2614

/KENT W CHANG/
Supervisory Patent Examiner, Art Unit 2614
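The Kato citation above turns on training by minimizing a loss function via back-propagation and gradient descent. A minimal sketch of such a loop, reusing the DeferredNeuralRenderer sketch shown earlier; the optimizer choice, loss function, and synthetic data below are assumptions, not the application's actual training setup:

```python
import torch

model = DeferredNeuralRenderer(tex_res=64)          # sketch defined earlier
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)  # gradient descent
loss_fn = torch.nn.L1Loss()                         # photometric loss

# Synthetic stand-in for (uv-map, ground-truth image) training pairs.
loader = [(torch.rand(1, 32, 32, 2) * 2 - 1, torch.rand(1, 3, 32, 32))]

for uv, target in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(uv), target)   # value of the loss function
    loss.backward()                     # back propagation
    optimizer.step()                    # parameter update via gradient descent
```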

Prosecution Timeline

May 24, 2024: Application Filed
Dec 01, 2025: Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12573149: DATA PROCESSING METHOD AND APPARATUS, DEVICE, COMPUTER-READABLE STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT (2y 5m to grant; granted Mar 10, 2026)
Patent 12561875: ANIMATION FRAME DISPLAY METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM (2y 5m to grant; granted Feb 24, 2026)
Patent 12494013: AUTODECODING LATENT 3D DIFFUSION MODELS (2y 5m to grant; granted Dec 09, 2025)
Patent 12456258: SYSTEMS AND METHODS FOR GENERATING A SHADOW MESH (2y 5m to grant; granted Oct 28, 2025)
Patent 12444020: FLEXIBLE IMAGE ASPECT RATIO USING MACHINE LEARNING (2y 5m to grant; granted Oct 14, 2025)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 52%
With Interview: 83% (52% baseline + 31.0% interview lift)
Median Time to Grant: 2y 7m
PTA Risk: Low
Based on 29 resolved cases by this examiner. Grant probability derived from career allow rate.
