Prosecution Insights
Last updated: April 19, 2026
Application No. 16/922,214

IMAGE GENERATION USING ONE OR MORE NEURAL NETWORKS

Final Rejection §103
Filed: Jul 07, 2020
Examiner: HUA, QUAN M
Art Unit: 2645
Tech Center: 2600 (Communications)
Assignee: Nvidia Corporation
OA Round: 6 (Final)
Grant Probability: 72% (Favorable)
Expected OA Rounds: 7-8
Median Time to Grant: 2y 9m
Grant Probability With Interview: 94%

Examiner Intelligence

Career Allow Rate: 72% (above average; 445 granted / 621 resolved; +9.7% vs TC avg)
Interview Lift: +21.9% across resolved cases with interview (strong)
Typical Timeline: 2y 9m average prosecution; 45 applications currently pending
Career History: 666 total applications across all art units

Statute-Specific Performance

§101: 8.3% (-31.7% vs TC avg)
§103: 48.3% (+8.3% vs TC avg)
§102: 18.4% (-21.6% vs TC avg)
§112: 17.0% (-23.0% vs TC avg)
Tech Center averages are estimates. Based on career data from 621 resolved cases.
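The headline rates in this report are simple ratios over the examiner's resolved cases. A minimal sketch of the arithmetic, using only the career figures quoted above (the per-statute breakdowns would be computed the same way over the subset of cases raising each statute; the implied Tech Center baseline below is an inference from the stated delta, not a published figure):

```python
def allowance_rate(granted: int, resolved: int) -> float:
    """Share of resolved applications that ended in a grant."""
    if resolved == 0:
        raise ValueError("no resolved cases")
    return granted / resolved

# Career figures quoted above: 445 granted out of 621 resolved.
career = allowance_rate(445, 621)

# The report states this is +9.7% vs the Tech Center average,
# which implies a TC 2600 baseline of roughly 62%.
implied_tc_avg = career - 0.097

print(f"career allow rate:  {career:.1%}")    # ~71.7%, shown as 72%
print(f"implied TC average: {implied_tc_avg:.1%}")
```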

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-30 are pending. The IDS of 02/04/2026 has been considered.

Response to Arguments

Arguments presented 01/30/2026 have been considered but are moot in view of a new ground of rejection.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 7, 13, 19 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Wantland et al. (US 2021/0042950) in view of Dixit et al. (US 2020/0380720).

As to claim 1: Wantland discloses: One or more processors, comprising: circuitry to: cause a software application that uses one or more neural networks (¶0106-0107, processor/memory to execute an image application that employs one or more trained neural networks to perform the disclosed process) to generate, for one or more objects depicted with a first pose in one or more first images to one or more second images, a second pose for the one or more objects to be added to one or more second images (¶0079-0092, Fig. 6A-B, generate an object with different pose(s) in a new image), the added one or more objects in the one or more second images having a second pose different from the first pose, generated by the one or more neural networks based, at least in part, on one or more other objects within the one or more second images. (See ¶0080-0092, Fig. 6A-B: an object image depicting a bike (602a) is added to the scene image 601, and the orientation/pose of the bike is adjusted based on the background objects of scene 601, as seen in Fig. 6B, to create a coherent composite scene with the added object (bike). "For example, the pose of the bike may be adjusted to lean at a greater angle than it would be if boots 610 were not present, so that the boots 610 are behind the rendered virtual bike 602b in screen 600b." "Similar metadata may be provided indicating the particular type of object and real-world sizing parameters for other segmentation masks in image 601. By combining this information with a depth map of the image, the editing application may determine what the pose and relative position of the real-world objects captured in image 601".)

Wantland discloses an image processing application that employs one or more neural networks in the process of generating a new image with the object in a new pose, including at least the depth map generation phase, but does not explicitly mention the one or more neural networks generating the second pose to add the object with the new pose to the one or more second images. However, such generative tasks with context awareness are well established in the art as typically being performed by generative neural networks, which generate novel content and output new images containing it.

Indeed, Dixit, in a related field of endeavor, discloses an image generation application that synthesizes a new pose of an object in a reference image and outputs an image containing the object with the new pose (Abstract). Per Fig. 2-4, each part of the software structure is a neural network: for example, the domain transfer model is the neural network structure depicted in Fig. 3, and the identity recovery module is an autoencoder with the encoder-decoder neural network structure depicted in Fig. 4. Per ¶0083-0084, Fig. 13A-13B describe using the architecture above to generate a new pose of the object by synthesizing a new depth map, and Fig. 13C shows outputting, from the intensity model, one or more images containing the object with a new pose from a new viewpoint.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the image generation architecture of Wantland to use neural networks not only to generate the depth maps of objects, but also to generate the context-aware new pose and the new images containing the object, as suggested by Dixit.
Given that Wantland's process involves deep analysis of image context and object pose as well as generation of new creative content, the involvement of a generative model is effectively required: conventional, non-learning software may be able to perform analysis such as depth analysis, but it cannot "learn", and the generation of new poses in Wantland would use an AI model such as a GAN. The generative neural networks of Dixit are shown to perform creative tasks similar to those in Wantland by learning from the original source images while retaining accuracy of features.

As to claim 7: Claim 7 is directed to a system with elements similar to those in claim 1 and is rejected by the same reasoning over Wantland in view of Dixit.

Claim 13 is directed to a method with steps similar to those in claim 1 and is rejected by the same reasoning. Claim 19 is directed to a non-transitory CRM with instructions that, when executed by one or more processors, perform a method with steps similar to those in claim 1, and is rejected by the same reasoning.

As to claim 25: Claim 25 is directed to an image generation system with elements similar to those in claim 1 and is rejected by the same reasoning over Wantland in view of Dixit.

Claims 2, 4-6, 8, 10-12, 14, 16-18, 20, 22-24, 26 and 28-30 are rejected under 35 U.S.C. 103 as being unpatentable over Wantland et al. (US 2021/0042950) in view of Dixit et al. (US 2020/0380720) and further in view of Lee et al.
(US 2020/0074707).

As to claims 2, 8, 14, 20 and 26: Wantland in view of Dixit discloses all limitations of claims 1/7/13/19/25, but does not state that the one or more neural networks include one or more variational autoencoders (VAEs) to determine features for the one or more other objects within the one or more second images and encode those features to a latent space to act as a constraint in adding the one or more first objects to the image.

Lee discloses a system/method for inserting objects into an existing image in which the one or more neural networks include one or more VAEs to determine features for the first objects and the second objects and encode those features to a latent space to act as a constraint in adding the one or more first objects to the image. (See at least ¶0018-0019 and ¶0026-0028: using at least a VAE, features of the object and background are analyzed to generate a vector in a latent space that is used to generate the location/scale of the object to be added to the scene.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for Wantland's neural networks to include one or more VAEs to determine features for the first objects and the second objects and encode those features to a latent space to act as a constraint in adding the one or more first objects to the image. Recall that Wantland employs a neural network that generates/renders the object, as established in ¶0090, which is equivalent to the generator of Lee's GAN. A VAE offers the advantage of accurate insertion by providing a specific location/scale for an object in a scene (Lee, ¶0026).
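The VAE limitation cited from Lee, features of scene objects encoded to a latent vector that constrains where and how an object is inserted, can be sketched as follows. This is an illustrative NumPy toy, not Lee's actual architecture: the dimensions and weights are made up, and in a real system a trained decoder head would map the latent code to a location/scale for the inserted object.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; the actual feature and latent sizes are not in the record.
FEAT_DIM, LATENT_DIM = 16, 4

# Hypothetical learned weights (random stand-ins for trained parameters).
W_mu = rng.normal(size=(LATENT_DIM, FEAT_DIM))
W_logvar = rng.normal(size=(LATENT_DIM, FEAT_DIM))

def encode(features: np.ndarray) -> np.ndarray:
    """Map scene/object features to a latent vector via the VAE
    reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    mu = W_mu @ features
    logvar = W_logvar @ features
    eps = rng.normal(size=LATENT_DIM)
    return mu + np.exp(0.5 * logvar) * eps

scene_features = rng.normal(size=FEAT_DIM)  # features of background objects
z = encode(scene_features)                  # latent code constraining placement
print(z.shape)                              # (4,)
```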
As to claims 4, 10, 16, 22 and 28: Wantland in view of Dixit and Lee discloses all limitations of claims 2/8/14/20/26, including wherein the one or more neural networks include a generative network to determine one or more potential poses for the added one or more objects based at least in part upon object types of the one or more other objects and with respect to features of the one or more second objects, wherein information for the potential poses is to be encoded into the latent space. (Lee, as discussed above, discloses determining potential placements that maintain contextual coherence with the scene's features per ¶0018-0019, which are encoded in the latent space; see also ¶0026-0028: using at least a VAE, features of the object and background are analyzed to generate a vector in a latent space that is used to generate the placement/scale of the object to be added to the scene. Wantland discloses determining potential poses for the inserted object based on object types and with respect to features of the background objects; see at least ¶0089-0093, inserting an object with a pose based on object classes with consideration of background objects.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for Lee's system and method to incorporate the pose determination concept of Wantland as discussed above. Such an implementation satisfies the need for contextual coherence accentuated by Lee's disclosure in at least ¶0018-0019.
As to claims 5, 11, 17, 23 and 29: Wantland in view of Dixit and Lee discloses all limitations of claims 4/10/16/22/28, including wherein the one or more neural networks include a neural network to determine one or more potential positions for the added one or more objects based at least in part upon the object types of the one or more other objects and potential poses of the one or more first objects, and with respect to the features of the one or more other objects, wherein information for the potential positions is to be encoded into the latent space. (Wantland, see ¶0080-0092, Fig. 6A-B: an object image depicting a bike (602a) is added to the scene image 601, and the orientation/pose of the bike is adjusted based on the background objects of scene 601, as seen in Fig. 6B, to create a coherent composite scene with the added object. "For example, the pose of the bike may be adjusted to lean at a greater angle than it would be if boots 610 were not present, so that the boots 610 are behind the rendered virtual bike 602b in screen 600b." "Similar metadata may be provided indicating the particular type of object and real-world sizing parameters for other segmentation masks in image 601. By combining this information with a depth map of the image, the editing application may determine what the pose and relative position of the real-world objects captured in image 601". Lee discloses determining potential placements, which include position and orientation, that maintain contextual coherence with the scene's features per ¶0018-0019, which are encoded in the latent space; see also ¶0026-0028: using at least a VAE, features of the object and background are analyzed to generate a vector in a latent space that is used to generate the placement/scale of the object to be added to the scene.)
As to claims 6, 12, 18, 24 and 30: Wantland in view of Dixit and Lee discloses all limitations of claims 5/11/17/23/29, including wherein the one or more neural networks include a generative adversarial network (GAN) to generate one or more output images including the added one or more objects added to the image, wherein the added one or more objects have different poses or positions in the output images, the poses and positions to be selected from the potential poses and the potential positions determined from the latent space. (Lee, see at least ¶0018-0019: using a neural network (generative adversarial network) model to add a desired object into a desired position of a captured real-world scene image. See also Wantland, ¶0080-0092, Fig. 6A-B: an object image depicting a bike (602a) is added to the scene image 601, and the orientation/pose of the bike is adjusted based on the background objects of scene 601, as seen in Fig. 6B, to create a coherent composite scene with the added object. "For example, the pose of the bike may be adjusted to lean at a greater angle than it would be if boots 610 were not present, so that the boots 610 are behind the rendered virtual bike 602b in screen 600b.")

Claims 3, 9, 15, 21 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Wantland et al. (US 2021/0042950) in view of Dixit et al. (US 2020/0380720), in view of Lee et al. (US 2020/0074707), in view of Kopf ("Mixture-of-Expert Variational Autoencoder for Clustering and Generating from Similarity-based Representation", Jan. 2020), and further in view of Irsoy et al. ("Unsupervised feature extraction with autoencoder trees", 2017) (prior art of record).
As to claims 3, 9, 15, 21 and 27: Wantland in view of Dixit and Lee discloses all limitations of claims 2/8/14/20/26, but is silent on the one or more neural networks including a gating network to select the one or more VAEs from a set of VAEs each trained for a different class of object, the gating network to select the one or more VAEs using a hierarchical mixture-of-experts approach.

Kopf discloses a gating network to select the one or more VAEs from a set of VAEs each trained for a different class of object (see Abstract and page 3: a cluster is gated to a corresponding expert (VAE); note that an expert has sole expertise in a particular class of object). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the system/method of Lee to incorporate the gating-network feature to select VAEs, as such an implementation shows superior clustering performance of the model on real-world data (see page 2 of Kopf).

None of the above further discloses using a hierarchical mixture-of-experts approach. Irsoy, however, in a related field of endeavor, discloses (Abstract; page 64, Section 3 through page 65) a soft decision node that directs an instance to its branches according to different probabilities given by a gating function (gating network) in a hierarchical mixture-of-experts approach; see also Fig. 1 and the left column of page 64, which discusses the gating function. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the system/method of Lee to incorporate a hierarchical mixture-of-experts approach to select VAEs, as such an implementation improves operational accuracy (Irsoy, page 71, Conclusion).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 10102629 - Methods, apparatus, and computer-readable storage media related to defining a planar model that approximates a plurality of surfaces of an object and/or applying the planar model to detect the object and/or to estimate a pose for the object. For example, the planar model may be compared to data points sensed by a three-dimensional vision sensor of a robot to determine whether the object is present in the field of view of the sensor and/or to determine a pose for the object relative to the robot. A planar model comprises a plurality of planar shapes modeled relative to one another in a three-dimensional space and approximates an object by approximating one or more surfaces of the object with the planar shapes.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to QUAN M HUA, whose telephone number is (571) 270-7232. The examiner can normally be reached 10:30-6:30.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Anthony Addy, can be reached at 571-272-7795. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/QUAN M HUA/
Primary Examiner, Art Unit 2645
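The gating-network limitation cited against claims 3, 9, 15, 21 and 27 amounts to a hierarchical mixture-of-experts: a gating function routes each input through a small decision tree to one of several class-specific VAE experts. A minimal two-level sketch follows. It is illustrative only: the weights are random stand-ins, and the leaf "experts" are just indices standing in for per-class VAEs, not the architectures of Kopf or Irsoy.

```python
import numpy as np

rng = np.random.default_rng(42)
FEAT_DIM = 8  # toy feature size; real dimensions are not in the record

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical learned gating weights (random stand-ins for trained params).
W_root = rng.normal(size=(2, FEAT_DIM))                      # root gate: 2 branches
W_leaf = [rng.normal(size=(2, FEAT_DIM)) for _ in range(2)]  # 2 experts per branch

def route(features: np.ndarray) -> int:
    """Two-level hierarchical gating: the root gate picks a coarse branch,
    then a leaf gate picks the expert. Each of the 4 leaves would correspond
    to a VAE trained for one object class."""
    branch = int(np.argmax(softmax(W_root @ features)))
    leaf = int(np.argmax(softmax(W_leaf[branch] @ features)))
    return branch * 2 + leaf  # expert index in 0..3

features = rng.normal(size=FEAT_DIM)
print("selected expert:", route(features))
```

In a soft variant (closer to Irsoy's soft decision nodes), the gate probabilities would weight all experts rather than hard-selecting one.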

Prosecution Timeline

Jul 07, 2020
Application Filed
Aug 26, 2022
Non-Final Rejection — §103
Mar 01, 2023
Response Filed
Mar 23, 2023
Final Rejection — §103
Jul 18, 2023
Interview Requested
Jul 28, 2023
Examiner Interview Summary
Jul 28, 2023
Applicant Interview (Telephonic)
Sep 29, 2023
Response after Non-Final Action
Sep 29, 2023
Notice of Allowance
Oct 11, 2023
Response after Non-Final Action
Dec 28, 2023
Response after Non-Final Action
Jan 05, 2024
Response after Non-Final Action
Mar 21, 2024
Response after Non-Final Action
May 28, 2024
Request for Continued Examination
May 30, 2024
Response after Non-Final Action
Sep 17, 2024
Examiner Interview Summary
Sep 17, 2024
Applicant Interview (Telephonic)
Sep 19, 2024
Response Filed
Nov 02, 2024
Non-Final Rejection — §103
Feb 07, 2025
Response Filed
May 06, 2025
Final Rejection — §103
Jun 03, 2025
Interview Requested
Jun 12, 2025
Applicant Interview (Telephonic)
Jun 12, 2025
Examiner Interview Summary
Sep 09, 2025
Request for Continued Examination
Sep 10, 2025
Response after Non-Final Action
Oct 29, 2025
Non-Final Rejection — §103
Jan 03, 2026
Interview Requested
Jan 08, 2026
Examiner Interview Summary
Jan 08, 2026
Applicant Interview (Telephonic)
Jan 30, 2026
Response Filed
Feb 18, 2026
Final Rejection — §103
Mar 10, 2026
Interview Requested
Mar 18, 2026
Applicant Interview (Telephonic)
Mar 18, 2026
Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602921
TECHNIQUES FOR MODIFYING AND TRAINING A NEURAL NETWORK
2y 5m to grant Granted Apr 14, 2026
Patent 12574761
MULTI-AP ASSOCIATION IDENTIFIERS MANAGEMENT
2y 5m to grant Granted Mar 10, 2026
Patent 12572803
MULTI-AGENT REINFORCEMENT LEARNING WITH MATCHMAKING POLICIES
2y 5m to grant Granted Mar 10, 2026
Patent 12559330
LOADING OPERATION MONITORING APPARATUS AND METHOD OF USING THE SAME
2y 5m to grant Granted Feb 24, 2026
Patent 12556939
FIRST NODE, THIRD NODE, FOURTH NODE AND METHODS PERFORMED THEREBY, FOR HANDLING PARAMETERS TO CONFIGURE A NODE IN A COMMUNICATIONS NETWORK
2y 5m to grant Granted Feb 17, 2026
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

7-8
Expected OA Rounds
72%
Grant Probability
94%
With Interview (+21.9%)
2y 9m
Median Time to Grant
High
PTA Risk
Based on 621 resolved cases by this examiner. Grant probability derived from career allow rate.
