DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This is in response to applicant’s amendment/response filed on 02/24/2026, which has been entered and made of record. Claims 1-20 have been amended and remain pending in the application.
Response to Arguments
Applicant's arguments filed on 02/24/2026 have been fully considered but they are rendered moot in view of the new grounds of rejection presented below (as necessitated by the amendment to claims 1, 8, and 15).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 2024/0161403 to Lin et al. in view of Chinese Patent Publication No. CN 119516145 to Huang et al.
Regarding claim 1, Lin et al. teach one or more processors (par 0017) comprising: circuitry (par 0078) to:
generate one or more three-dimensional (3D) models of one or more objects (par 0007, “A 3D mesh is determined for a scene model generated with a first resolution, wherein the scene model is generated from an input text prompt describing a 3D content”, par 0031-0032, “The low-resolution scene model refers to a scene model that has been generated from an input text prompt describing a 3D content …. The mesh extractor 202 processes the low-resolution scene model to determine a 3D mesh for the low-resolution scene model”);
use one or more neural networks to determine the one or more 3D models represents one or more text descriptions (par 0018-0019, “a 3D mesh is determined for a scene model generated with a first resolution, wherein the scene model is generated from an input text prompt describing a 3D content ….the scene model may be generated by a diffusion model (different from the diffusion model described below with respect to operation 104). In an embodiment, the diffusion model used to generate the scene model from the input text prompt may back-propagate gradients into the scene model via a loss defined on rendered images at the first resolution. In one exemplary embodiment, this diffusion model may be a pre-trained text-to-image diffusion model”, par 0031-0032, “The low-resolution scene model refers to a scene model that has been generated from an input text prompt describing a 3D content ….. the mesh extractor 202 may be a deep 3D conditional generative model that processes the low-resolution scene model to extract the 3D mesh therefrom”, par 0044-0045, generate a low-resolution scene model based on text prompts through a neural network); and
use the one or more neural networks to adjust the one or more 3D models to represent the one or more text descriptions based, at least in part, on the determination by the one or more neural networks (par 0033, “The 3D mesh is output by the mesh extractor 202 to the diffusion model 204. The diffusion model 204 processes the 3D mesh to predict a high-resolution 3D mesh model. While the 3D mesh model is referred to as a “high-resolution” 3D mesh model, it should be noted that this simply refers to a resolution that is higher than a resolution of the input to the system 200, and in particular a resolution that is higher than a resolution of the scene model input to the mesh extractor 202”; i.e., a fine 3D model is generated using second diffusion processing from a coarse 3D model that was generated using first diffusion processing based on the input text).
However, Lin et al. do not explicitly teach using one or more neural networks to determine whether the one or more 3D models represent one or more text descriptions.
In a related endeavor, Huang et al. teach using one or more neural networks to determine whether the one or more 3D models represent one or more text descriptions (abstract, “determining the initial neural radiation field model corresponding to the object to be modeled based on the text embedding; updating the initial neural radiation field model through a two-dimensional guidance model and a three-dimensional guidance model to determine the neural radiation field model corresponding to the object to be modeled”, par 0021, “the three-dimensional priori knowledge provided by the three-dimensional guiding model is used for guiding NeRF the model to learn the three-dimensional model representation of the object to be modeled, so that the three-dimensional object generated based on rendering of the three-dimensional model has good geometric consistency in multiple view angles, and the fidelity and the credibility of the three-dimensional object are improved”, par 0075-0076, “To solve and correct the recognized semantic inconsistencies, an optimization algorithm is employed. The algorithm uses a minimization strategy to accurately and minimally edit the 3D model. By recalibrating the features and elements of the model, the optimization algorithm ensures that the 3D reconstruction remains consistent and consistent across multiple viewpoints, thereby effectively solving any differences detected during the multi-viewpoint analysis and improving the reliability of the rendered generated three-dimensional object ….. the three-dimensional priori knowledge provided by the three-dimensional guiding model is used for guiding NeRF the model to learn the three-dimensional model representation of the object to be modeled, so that the three-dimensional object generated based on rendering of the three-dimensional model has good geometric consistency in multiple view angles, and the fidelity and the credibility of the three-dimensional object are improved”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Lin et al. to include using one or more neural networks to determine whether the one or more 3D models represent one or more text descriptions, as taught by Huang et al., in order to generate a three-dimensional object with good geometric consistency across multiple view angles, thereby improving the fidelity and credibility of the three-dimensional object.
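For illustration only, the following is a minimal sketch of the pipeline relied upon above: a first diffusion stage generating a low-resolution scene model from text, mesh extraction, a second diffusion stage refining the mesh, and a separate network determining whether the resulting 3D model represents the text description. It is not code from Lin et al. or Huang et al.; all class and function names are hypothetical placeholders for the roles described in the cited paragraphs.

# Schematic sketch only; not code from the cited references.
import numpy as np

class CoarseTextTo3D:                        # role of the "first diffusion model" (Lin, par 0037)
    def generate(self, prompt: str) -> np.ndarray:
        rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
        return rng.random((32, 32, 32))      # stand-in low-resolution scene/density grid

def extract_mesh(scene: np.ndarray) -> np.ndarray:
    # stand-in for mesh extraction from the low-resolution scene model (Lin, par 0032)
    return np.argwhere(scene > 0.5).astype(float)   # occupied voxel centers as "vertices"

class RefineDiffusion:                       # role of the "second diffusion model" (Lin, par 0039)
    def refine(self, mesh: np.ndarray) -> np.ndarray:
        return mesh + 0.01 * np.random.standard_normal(mesh.shape)  # higher-fidelity vertices

class TextConsistencyScorer:                 # role of Huang's guidance/consistency check
    def matches(self, mesh: np.ndarray, prompt: str) -> bool:
        # placeholder test of whether rendered views represent the text description
        return mesh.size > 0

def text_to_3d(prompt: str) -> np.ndarray:
    scene = CoarseTextTo3D().generate(prompt)
    mesh = RefineDiffusion().refine(extract_mesh(scene))
    if not TextConsistencyScorer().matches(mesh, prompt):
        mesh = RefineDiffusion().refine(mesh)    # adjust the model when the check fails
    return mesh

if __name__ == "__main__":
    print(text_to_3d("a small wooden chair").shape)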
Regarding claim 2, Lin et al. as modified by Huang et al. teach all the limitations of claim 1, and further teach wherein the one or more neural networks comprise one or more first neural networks and one or more second neural networks (Lin et al.: par 0019, “the scene model may be generated by a diffusion model (different from the diffusion model described below with respect to operation 104). In an embodiment, the diffusion model used to generate the scene model from the input text prompt may back-propagate gradients into the scene model via a loss defined on rendered images at the first resolution”, Fig 4, par 0037-0039, “the input text prompt is processed, using a first diffusion model, to generate a scene model with a first resolution. In an embodiment, the first diffusion model may be a pre-trained text-to-image diffusion model. In an embodiment, the scene model may be an Instant-NGP representation of the 3D content ….3D mesh is extracted from the scene model. In operation 308, the 3D mesh is processed, using a second diffusion model, to predict a 3D mesh model with a second resolution that is greater than the first resolution”), and the one or more first neural networks are trained based, at least in part, on an indication generated by the one or more second neural networks of whether the one or more 3D models adjusted by the one or more neural networks match the one or more text descriptions (Lin et al.: par 0058-0059, “the 3D mesh model is optimized (or refined) based on the modification to the input text prompt. In the example above where the modification is to a texture and/or a geometry of the 3D content, the corresponding texture and/or geometry of the 3D mesh model may be optimized accordingly. In the example above where the modification is the addition of the reference image, the 3D mesh model may be optimized to better match the object(s) in the reference image”, Huang et al.: par 0021, “the three-dimensional priori knowledge provided by the three-dimensional guiding model is used for guiding NeRF the model to learn the three-dimensional model representation of the object to be modeled, so that the three-dimensional object generated based on rendering of the three-dimensional model has good geometric consistency in multiple view angles, and the fidelity and the credibility of the three-dimensional object are improved”, par 0065, “For the training phase, since the network parameters in the initial NeRF model are randomly generated, the network parameters in the initial NeRF model can be updated by performing network training on the initial NeRF model, so that the initial NeRF model continuously learns the three-dimensional representation of the object to be modeled in the training process. In this way, in the rendering stage, the training is completed to obtain the three-dimensional view of the object to be modeled in each view angle by using the neural radiation field model, so as to obtain the three-dimensional object. In this way, the resulting neuro-radiation field model, through training of the two-dimensional and three-dimensional guided models, can provide a powerful representation of the 3D object and can dynamically render the 3D object from almost any perspective, thereby ensuring that realistic, consistent 3D content is created, accurately capturing nuances and details of the geometry and appearance of the object to be modeled”, par 0075-0076, “To solve and correct the recognized semantic inconsistencies, an optimization algorithm is employed. 
The algorithm uses a minimization strategy to accurately and minimally edit the 3D model. By recalibrating the features and elements of the model, the optimization algorithm ensures that the 3D reconstruction remains consistent and consistent across multiple viewpoints, thereby effectively solving any differences detected during the multi-viewpoint analysis and improving the reliability of the rendered generated three-dimensional object ….. the three-dimensional priori knowledge provided by the three-dimensional guiding model is used for guiding NeRF the model to learn the three-dimensional model representation of the object to be modeled, so that the three-dimensional object generated based on rendering of the three-dimensional model has good geometric consistency in multiple view angles, and the fidelity and the credibility of the three-dimensional object are improved”).
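For illustration only, and using hypothetical stand-in functions rather than anything disclosed by Lin et al. or Huang et al., the relationship relied upon for claim 2 (a first network whose 3D-model parameters are adjusted based on an indication from a second, guidance-style network of whether the adjusted model matches the text description) can be sketched as follows.

# Schematic sketch only; hypothetical names, not code from the cited references.
import numpy as np

def second_network_indication(model_params: np.ndarray, target: np.ndarray) -> float:
    # stand-in "guidance" score: approaches 1.0 as the adjusted model matches the description
    return float(np.exp(-np.linalg.norm(model_params - target)))

def train_first_network(target: np.ndarray, steps: int = 200, lr: float = 0.1) -> np.ndarray:
    params = np.zeros_like(target)           # first network's 3D-model parameters
    eps = 1e-3
    for _ in range(steps):
        score = second_network_indication(params, target)
        # finite-difference gradient of the second network's indication w.r.t. the parameters
        grad = np.zeros_like(params)
        for i in range(params.size):
            bumped = params.copy()
            bumped.flat[i] += eps
            grad.flat[i] = (second_network_indication(bumped, target) - score) / eps
        params += lr * grad                   # adjust the first network toward a higher match score
    return params

if __name__ == "__main__":
    target = np.array([0.3, -0.7, 1.2])       # stand-in for "a model that matches the text"
    print(np.round(train_first_network(target), 2))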
Regarding claim 3, Lin et al. as modified by Huang et al. teach all the limitations of claim 1, and Lin et al. further teach wherein the one or more neural networks comprise a diffusion model (par 0019, “the scene model may be generated by a diffusion model (different from the diffusion model described below with respect to operation 104). In an embodiment, the diffusion model used to generate the scene model from the input text prompt may back-propagate gradients into the scene model via a loss defined on rendered images at the first resolution”, Fig 4, par 0037-0039, “3D mesh is extracted from the scene model. In operation 308, the 3D mesh is processed, using a second diffusion model, to predict a 3D mesh model with a second resolution that is greater than the first resolution”).
Regarding claim 4, Lin et al. as modified by Huang et al. teach all the limitations of claim 1, and Lin et al. further teach wherein the one or more neural networks comprise a convolutional neural network (par 0021, “the Instant-NGP may use a hash grid encoding, and includes a first single-layer neural network that predicts albedo and density and a second single-layer neural network that predicts surface normal”, par 0066-0070, “Deep neural networks (DNNs), including deep learning models, developed on processors have been used for diverse use cases, from self-driving cars to faster drug development, from automatic image captioning in online image databases to smart real-time language translation in video chat applications.”, par 0080-0081, “FIG. 7 illustrates another embodiment for training and deployment of a deep neural network. In at least one embodiment, untrained neural network 706 is trained using a training dataset 702.”).
Regarding claim 5, Lin et al. as modified by Huang et al. teach all the limitations of claim 1, and Lin et al. further teach wherein the one or more neural networks are trained to identify features of the one or more 3D models that are adjustable such that the one or more 3D models adjusted by the one or more neural networks match the one or more text descriptions (Fig 4, par 0050, “an MLP is used to predict the normals. Note that this does not violate geometric properties since volume rendering is used instead of surface rendering; as such, the orientation of particles at continuous positions need not be oriented to the level set surface. This helps to significantly reduce the computational cost of optimizing the coarse model. Accurate normals can be obtained in the fine stage of optimization when using a true surface rendering mode”, par 0058-0059, “The modification may received as a new input text prompt that changes one or more parameters of the prior input text prompt, in an embodiment. For example, the modification may be to a texture and/or a geometry of the 3D content. As another example, the modification may be adding a reference image to the input text prompt …. where the modification is to a texture and/or a geometry of the 3D content, the corresponding texture and/or geometry of the 3D mesh model may be optimized accordingly. In the example above where the modification is the addition of the reference image, the 3D mesh model may be optimized to better match the object(s) in the reference image”).
Regarding claim 6, Lin et al. as modified by Huang et al. teach all the limitations of claim 1, and Lin et al. further teach wherein the one or more neural networks are to adjust one or more textures of the one or more 3D models (Fig 4, par 0050, “an MLP is used to predict the normals. Note that this does not violate geometric properties since volume rendering is used instead of surface rendering; as such, the orientation of particles at continuous positions need not be oriented to the level set surface. This helps to significantly reduce the computational cost of optimizing the coarse model. Accurate normals can be obtained in the fine stage of optimization when using a true surface rendering mode”, par 0058-0059, “The modification may received as a new input text prompt that changes one or more parameters of the prior input text prompt, in an embodiment. For example, the modification may be to a texture and/or a geometry of the 3D content. As another example, the modification may be adding a reference image to the input text prompt …. where the modification is to a texture and/or a geometry of the 3D content, the corresponding texture and/or geometry of the 3D mesh model may be optimized accordingly. In the example above where the modification is the addition of the reference image, the 3D mesh model may be optimized to better match the object(s) in the reference image”).
Regarding claim 7, Lin et al. as modified by Huang et al. teach all the limitations of claim 1, and further teach wherein the one or more first neural networks are to adjust one or more meshes of the one or more 3D models (par 0027, “the method 100 may also include presenting the 3D content on a display device, using the 3D mesh model. For example, the 3D mesh model may be rendered (from a defined camera perspective) to an image that is presented on the display device. In an embodiment, a modification to the input text prompt may be received (from the user), and in turn the 3D mesh model may be optimized, or refined, based on the modification to the input text prompt. For example, the modification may be to a texture and/or a geometry, such that the corresponding texture and/or geometry of the 3D mesh model may be optimized accordingly”, Fig 4, par 0050, “an MLP is used to predict the normals. Note that this does not violate geometric properties since volume rendering is used instead of surface rendering; as such, the orientation of particles at continuous positions need not be oriented to the level set surface. This helps to significantly reduce the computational cost of optimizing the coarse model. Accurate normals can be obtained in the fine stage of optimization when using a true surface rendering mode”, par 0058-0059, “The modification may received as a new input text prompt that changes one or more parameters of the prior input text prompt, in an embodiment. For example, the modification may be to a texture and/or a geometry of the 3D content. As another example, the modification may be adding a reference image to the input text prompt …. where the modification is to a texture and/or a geometry of the 3D content, the corresponding texture and/or geometry of the 3D mesh model may be optimized accordingly. In the example above where the modification is the addition of the reference image, the 3D mesh model may be optimized to better match the object(s) in the reference image”).
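For illustration only, a minimal sketch of the texture/geometry refinement described in Lin et al. par 0058-0059 is given below; the data layout and function are hypothetical, not taken from the reference.

# Schematic sketch only; hypothetical data structure, not code from Lin et al.
import numpy as np

def refine_mesh_model(model: dict, modification: dict) -> dict:
    # refine the texture and/or geometry (mesh vertices) that a modified text prompt changes
    refined = {"vertices": model["vertices"].copy(), "texture": model["texture"].copy()}
    if "geometry_offset" in modification:
        refined["vertices"] += modification["geometry_offset"]    # adjust mesh geometry
    if "texture_tint" in modification:
        refined["texture"] = np.clip(refined["texture"] + modification["texture_tint"], 0.0, 1.0)
    return refined

if __name__ == "__main__":
    model = {"vertices": np.zeros((4, 3)), "texture": np.full((4, 3), 0.5)}
    modification = {"texture_tint": np.array([0.2, 0.0, -0.1])}   # e.g., prompt now asks for a redder object
    print(refine_mesh_model(model, modification)["texture"][0])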
Regarding claims 8-14, the method claims 8-14 are similar in scope to claims 1-7 and are rejected under the same rationale.
Regarding claims 15-20, the system claims 15-20 are similar in scope to claims 1-4 and 6-7 and are rejected under the same rationale (par 0007).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jin Ge, whose telephone number is (571)272-5556. The examiner can normally be reached from 8:00 to 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jason Chan can be reached at (571)272-3022. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
JIN GE
Examiner
Art Unit 2619
/JIN GE/Primary Examiner, Art Unit 2619