DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the Amendment filed on 11/10/2025. Claims 1-20 are pending in the case.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 6-7, 9, and 12-20 are rejected under 35 U.S.C. 103 as being unpatentable over Francis (US 20240185498 A1) in view of Wang et al. (US 20240331356 A1, hereinafter Wang).
As to independent claim 1, Francis teaches a computer-implemented method, comprising:
determining a target level of detail for a three-dimensional (3D) volume (“The 3D scenes obtained from the client device 110 or gaming system 120 may have a fairly low resolution. Thus, the first set of 2D images often also have a fairly low resolution.” Paragraph 0027);
generating, using an image generation network and based at least on a current view representing the 3D volume, an updated view representing the 3D volume at the target level of detail (“The graphic generation system 130 extracts features from the first set of 2D images and generates one or more text prompts based on the extracted features. The graphic generation system 130 applies a diffusion model to the features and the one or more text prompts to generate a second set of 2D images having a higher resolution” Paragraph 0027);
providing, responsive to the target level of detail, the updated view (“causing the second set of 2D images to be rendered at the client device 110.” Paragraph 0027).
Francis does not appear to expressly teach adding the updated view to a set of images associated with the 3D volume; and
updating the 3D volume, based at least on the updated view.
Wang teaches adding the updated view to a set of images associated with the 3D volume (“A neural network can be trained to assist in a process of visual localization within a scene. By using a neural radiance field to generate images such neural networks can be trained efficiently whilst being high performing.” Paragraph 0004, “A neural network is trained using the generated images” paragraph 0005); and
updating the 3D volume, based at least on the updated view (“A plurality of training examples is accessed, each training example comprising a color image of a scene, a depth image of the scene, and a pose of a viewpoint from which color image and depth image were captured. A neural radiance field is trained using the training examples. A plurality of generated images is computed, by, for each of a plurality of randomly selected viewpoints, generating a color image and a depth image of the scene from the neural radiance field. A neural network is trained using the generated images.” Paragraph 0005).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Francis to comprise adding the updated view to a set of images associated with the 3D volume; and updating the 3D volume, based at least on the updated view. One would have been motivated to make such a combination to reduce the amount of training and to improve performance of the neural network.
As to dependent claim 2, Francis teaches the computer-implemented method of claim 1. Francis does not appear to expressly teach wherein the 3D volume is represented by a neural radiance field (NeRF).
Wang teaches wherein the 3D volume is represented by a neural radiance field (NeRF) (“The 3D model 110 is stored at a memory, database or other store accessible to the visual localization service. The 3D model 110 is a 3D point cloud, or a mesh model or any other 3D model of the scene…The visual localization service 112 uses a scene coordinate regression model that has been trained for the scene 114 using training images generated by a neural radiance field.” Paragraphs 0037-0038).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Francis to comprise wherein the 3D volume is represented by a neural radiance field (NeRF). One would have been motivated to make such a combination to reduce the amount of training and to improve performance of the neural network.
As to dependent claim 3, Francis teaches the computer-implemented method of claim 1. Francis further teaches wherein the image generation network is a diffusion model conditioned on both text and images (“In some embodiments, the diffusion model is a text-to-image (T2I) model. The T2I model is trained over a dataset comprising images and corresponding text descriptions. Responsive to receiving a text prompt, the T2I model is trained to generate an image consistent with the text prompt.” Paragraph 0022, last sentence).
As to dependent claim 4, Francis teaches the computer-implemented method of claim 1. Francis further teaches the method further comprising:
receiving a prompt for the current view (“The system also generates one or more text prompts based on the extracted features. In some embodiments, the system identifies an object (e.g., a car) based on the extracted features, and generates a text prompt corresponding to the object.” Paragraph 0009, “Alternatively, one or more text prompts 426 may be input by a user.” Paragraph 0118); and
providing, to a language model associated with the image generation network, the prompt (“training the T2I model described herein includes using a language model 220,” paragraph 0037).
As to dependent claim 6, Francis teaches the computer-implemented method of claim 1. Francis further teaches wherein the image generation network is a super-resolution model conditioned on an image having a resolution less than a threshold (“The graphic generation system 130 generates 540 one or more text prompts based on the extracted features. The graphic generation system 130 applies 550 a diffusion model to the features and the one or more text prompts to generate a second set of one or more 2D images having a second resolution greater than the first resolution.” Paragraph 0123; examiner notes the images having a first resolution, i.e., fairly low resolution, are images having a resolution less than a threshold).
As to dependent claim 7, Francis teaches the computer-implemented method of claim 1. Francis does not appear to expressly teach the method further comprising:
removing, upon receiving the updated view, one or more previous images from the set of images; and
updating a network associated with the 3D volume.
Wang teaches removing, upon receiving the updated view, one or more previous images from the set of images; and updating a network associated with the 3D volume (“To make training more effective it is possible to remove images with depth smaller than a specified threshold. This may be done by, for each of the plurality of generated images, inspecting depth values of the generated image; and where the operation of training the neural network comprises omitting one of the generated images according to the depth values being below a threshold. This mitigates effects of NeRF generated image artifacts.” Paragraph 0029).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Francis to comprise removing, upon receiving the updated view, one or more previous images from the set of images; and updating a network associated with the 3D volume. One would have been motivated to make such a combination to reduce the amount of training and to improve performance of the neural network.
As to dependent claim 9, Francis teaches the computer-implemented method of claim 1. Francis does not appear to expressly teach wherein the 3D volume is represented by a neural radiance field (NeRF), the method further comprising:
converting the NeRF to a mesh-based representation.
Wang teaches converting the NeRF to a mesh-based representation (“The 3D model 110 is a 3D point cloud, or a mesh model or any other 3D model of the scene. In some cases the 3D model 110 is created by capturing depth images of the scene and using an iterative closest point algorithm to form a 3D point cloud from the depth images…The visual localization service 112 uses a scene coordinate regression model that has been trained for the scene 114 using training images generated by a neural radiance field.” Paragraphs 0037-0038).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Francis to comprise converting the NeRF to a mesh-based representation. One would have been motivated to make such a combination to improve performance of the neural network.
As to dependent claim 13, Francis teaches the processor of claim 12. Francis further teaches wherein the prompt is a text prompt, and wherein the one or more processing units are further to: provide the text prompt to a large language model (LLM); receive, from the LLM, a command based, at least, on the text prompt (“The language model 220 is an artificial intelligence (AI) model trained to understand, interpret, generate, and respond to natural language.” Paragraph 0031); and provide the command to the one or more diffusion models (“The T2I model 230 is a diffusion model trained to create images based on both an image and a text description.” Paragraph 0032).
As to dependent claim 16, Francis teaches the processor of claim 12. Francis further teaches wherein the processor is comprised in at least one of:
a system for performing simulation operations;
a system for performing simulation operations to test or validate autonomous machine applications;
a system for performing digital twin operations;
a system for performing light transport simulation;
a system for rendering graphical output (a graphic generation system 130);
a system for performing deep learning operations;
a system implemented using an edge device;
a system for generating or presenting virtual reality (VR) content;
a system for generating or presenting augmented reality (AR) content;
a system for generating or presenting mixed reality (MR) content;
a system incorporating one or more Virtual Machines (VMs);
a system for performing operations for a conversational AI application;
a system for performing operations for a generative AI application;
a system for performing operations using a language model (language model 220);
a system for performing one or more generative content operations using a large language model (LLM);
a system implemented at least partially in a data center;
a system for performing hardware testing using simulation;
a system for performing one or more generative content operations using a language model;
a system for synthetic data generation;
a collaborative content creation platform for 3D assets; or
a system implemented at least partially using cloud computing resources.
As to dependent claim 18, Francis teaches the system of claim 17. Francis further teaches wherein the output image is generated by one or more diffusion models responsive to a request (“In some embodiments, the diffusion model is a text-to-image (T2I) model. The T2I model is trained over a dataset comprising images and corresponding text descriptions. Responsive to receiving a text prompt, the T2I model is trained to generate an image consistent with the text prompt.” Paragraph 0022, last sentence).
As to dependent claim 19, Francis teaches the system of claim 17. Francis further teaches wherein the output image is at least one of a higher resolution image relative to the input image, or a hallucinated image (“The graphic generation system 130 extracts features from the first set of 2D images and generates one or more text prompts based on the extracted features. The graphic generation system 130 applies a diffusion model to the features and the one or more text prompts to generate a second set of 2D images having a higher resolution” Paragraph 0027).
Claim 12 is substantially the same as claims 1-2 and is therefore rejected under the same rationale as above.
Claim 14 is substantially the same as claim 3 and is therefore rejected under the same rationale as above.
Claim 15 is substantially the same as claim 6 and is therefore rejected under the same rationale as above.
Claim 17 is substantially the same as claims 1-2 and is therefore rejected under the same rationale as above.
Claim 20 is substantially the same as claim 16 and is therefore rejected under the same rationale as above.
Claims 5 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Francis in view of Wang et al., and Karpman et al. (US 11995803 B1, hereinafter Karpman).
As to dependent claim 5, Francis teaches the computer-implemented method of claim 4. Francis does not appear to expressly teach wherein the language model is a large language model (LLM) configured to generate a hierarchy of information based, at least, on the prompt.
Karpman teaches wherein the language model is a large language model (LLM) configured to generate a hierarchy of information based, at least, on the prompt (“For example, text-to-image diffusion model 112 may define a (set of) pre-trained text encoders 118 (e.g., one or more pre-trained language models), base image diffusion models 120, and high-resolution diffusion models 116. Text encoders 118 interpret a text query and generate an embedding of the text query. Base image diffusion models 120 generate a base image (e.g., an initial, low-resolution image) from the embedding. High-resolution diffusion models 116 are configured to progressively upsample the images to larger sizes and/or resolutions.” Col. 2, lines 56-65).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Francis to comprise wherein the language model is a large language model (LLM) configured to generate a hierarchy of information based, at least, on the prompt. One would have been motivated to make such a combination to improve performance of the neural network.
As to dependent claim 8, Francis teaches the computer-implemented method of claim 1. Francis does not appear to expressly teach wherein the target level of detail is associated with an input command from a user of an interactive environment.
Karpman teaches wherein the target level of detail is associated with an input command from a user of an interactive environment (“the creativity slider enables the user to specify and/or change a creativity level for the text-to-image diffusion model 112 when generating the requested image…” Col. 21 lines 41-44).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Francis to comprise wherein the target level of detail is associated with an input command from a user of an interactive environment. One would have been motivated to make such a combination to improve performance of the neural network.
Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Francis in view of Wang et al., Karpman et al., and Wei (US 20210042991 A1).
As to dependent claim 10, Francis teaches the computer-implemented method of claim 1. Karpman further teaches receiving a hierarchy of information for an object (“For example, text-to-image diffusion model 112 may define a (set of) pre-trained text encoders 118 (e.g., one or more pre-trained language models), base image diffusion models 120, and high-resolution diffusion models 116. Text encoders 118 interpret a text query and generate an embedding of the text query. Base image diffusion models 120 generate a base image (e.g., an initial, low-resolution image) from the embedding. High-resolution diffusion models 116 are configured to progressively upsample the images to larger sizes and/or resolutions.” Col. 2, lines 56-65).
Francis and Karpman do not appear to expressly teach the method further comprising:
determining a plurality of sub-levels for the 3D volume based, at least, on the hierarchy of information; and
establishing an ordering for the plurality of sub-levels associated with a respective level for each sub-level of the plurality of sub-levels.
Wei teaches receiving a hierarchy of information for an object associated with the 3D volume (“A result of the preprocessing of the scene is that each object in the scene is divided to a spatial block of a level L.” Paragraph 0087);
determining a plurality of sub-levels for the 3D volume based, at least, on the hierarchy of information; and establishing an ordering for the plurality of sub-levels associated with a respective level for each sub-level of the plurality of sub-levels (“For example, FIG. 3 to FIG. 6 show scene items included in different levels of grid blocks. Scales of grids shown from FIG. 3 to FIG. 6 gradually decrease. It can be seen that when the scale of the grids is smaller, the sizes of the scene items included in the grid are smaller and the loading distances of the scene items are smaller. A level having a relatively large scale of grids includes more coarse-grained scene items. A level having a relatively small scale of grids includes more detailed scene items.” Paragraph 0087).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Francis to comprise determining a plurality of sub-levels for the 3D volume based, at least, on the hierarchy of information; and establishing an ordering for the plurality of sub-levels associated with a respective level for each sub-level of the plurality of sub-levels. One would have been motivated to make such a combination to improve the loading of a scene.
As to dependent claim 11, Francis teaches the computer-implemented method of claim 10. Francis does not appear to expressly teach the method further comprising:
storing the plurality of sub-levels;
providing, responsive to a first command, the object; and
providing, responsive to a second command, a sub-level of the plurality of sub-levels.
Karpman teaches storing the plurality of sub-levels (“The system can then store the resulting preference training set—including images generated by the initial text-to-image diffusion model 112 labeled with corresponding preference scores in the set of storage devices 110.” Col. 9 lines 9-13);
providing, responsive to a first command (“Generally, the system 100 can leverage internal serving infrastructure to receive and process image generation requests from a large set of client devices 104 (e.g., sequentially or concurrently process tens or hundreds of image generation requests per second,…” Col. 16 lines 9-13), the object (“generate a base image based on the one or more embedding representations and image generation parameters of the base image diffusion model 12” Col. 6 lines 1-4); and
providing, responsive to a second command (“the creativity slider enables the user to specify and/or change a creativity level for the text-to-image diffusion model 112 when generating the requested image…” Col. 21 lines 41-44), a sub-level of the plurality of sub-levels (“sequentially execute the set of high-resolution diffusion models 116 to generate a final image by upsampling the base image to a final resolution; and output the final image (e.g., to the communication interface 122).” Col. 6 lines 4-8).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Francis to comprise storing the plurality of sub-levels; providing, responsive to a first command, the object; and providing, responsive to a second command, a sub-level of the plurality of sub-levels. One would have been motivated to make such a combination to improve the loading of a scene.
Response to Arguments
Applicants’ arguments filed 11/10/2025 have been fully considered but they are not persuasive. Applicants argue that Francis is not prior art because the cited portions of Francis are not supported in its provisional application. Examiner respectfully disagrees. The Examiner has taken steps to ensure that the disclosure filed as the provisional application adequately provides (1) a written description of the subject matter of the claim(s) at issue in the later filed nonprovisional application, and (2) an enabling disclosure to permit one of ordinary skill in the art to make and use the claimed invention in the later filed nonprovisional application without undue experimentation. The disclosure of the provisional application provides sufficient support for the cited portions of the non-provisional application. For example, “2. We can take advantage of current low-quality text-to-3D models to add specificity to the scene but still have a high-quality generated result” (page 1), “2. This can be done by taking a 3D environment, pre-rendering it using existing techniques with minimal detail to maximize speed to produce a sequence of frames for the specified views in the scene. The underlying 3D environment can have minimal detail” (page 1), and “b. The quality of the mesh is not crucial as it is, as previously mentioned, and is simply a suggestive prior to the final rendered output” (page 3). It is noted that many words and terms can be used interchangeably to describe a series of steps or a process. Therefore, Examiner respectfully asserts that the cited portions of Francis [0027] are sufficiently supported in the provisional application.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Kim et al. US 12322068 B1 teaches using neural networks to generate a three-dimensional voxel representation of a scene based, at least in part, upon a plurality of two-dimensional images of the scene.
Kreis US 20240005604 A1 teaches synthesizing three-dimensional shapes using latent diffusion models in content generation systems and applications.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAHELET SHIBEROU whose telephone number is (571)270-7493. The examiner can normally be reached Monday-Friday 9:00 AM-5:00 PM Eastern Time.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Ell can be reached at 571-270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MAHELET SHIBEROU/Primary Examiner, Art Unit 2171