Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-30 are pending.
The information disclosure statement (IDS) filed 02/04/2026 has been considered.
Response to Arguments
Applicant's arguments filed 01/30/2026 have been considered but are moot in view of the new ground(s) of rejection.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 7, 13, 19, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Wantland et al. (US 2021/0042950) in view of Dixit et al. (US 2020/0380720).
As to claim 1:
Wantland discloses:
One or more processors, comprising: circuitry to:
Cause a software application that uses one or more neural networks (¶0106-0107, processor/memory to execute an image application that employs one or more trained neural networks to perform the disclosed process) to generate, for one or more objects depicted with a first pose in one or more first images to one or more second images, a second pose for the one or more objects to be added to one or more second images (¶0079-0092, Figs. 6A-B, generate an object with different pose(s) in a new image), the added one or more objects in the one or more second images having a second pose different from the first pose, generated by the one or more neural networks based, at least in part, on one or more other objects within the one or more second images. (See ¶0080-0092, Figs. 6A-B: an object image depicting a bike (602a) is to be added to the scene image 601. The orientation/pose of the bike is adjusted based on the background objects of scene 601, as seen in Fig. 6B, to create a coherent composite scene with the added object (bike). “For example, the pose of the bike may be adjusted to lean at a greater angle than it would be if boots 610 were not present, so that the boots 610 are behind the rendered virtual bike 602b in screen 600b.”; “Similar metadata may be provided indicating the particular type of object and real-world sizing parameters for other segmentation masks in image 601. By combining this information with a depth map of the image, the editing application may determine what the pose and relative position of the real-world objects captured in image 601”)
Wantland discloses an image processing application that employs one or more neural networks in the process of generating a new image containing the object with a newly generated pose, with at least the depth-map generation phase, but does not explicitly mention the one or more neural networks generating the second pose to add the object with the new pose to the one or more second images.
However, such generative tasks with context awareness are well established in the art as typically being performed by a generative neural network, which generates novel content and outputs new images with said novel content.
Indeed, Dixit, in a related field of endeavor, discloses an image generation application that synthesizes a new pose of an object in a reference image and outputs an image of the object with the new pose (Abstract). Per Figs. 2-4, each part of the software structure is a neural network; for example, the domain transfer model is the neural network structure depicted in Fig. 3, and the identity recovery module is an autoencoder with the encoder-decoder neural network structure depicted in Fig. 4. Per ¶0083-0084, with Figs. 13A-13B, the architecture above generates a new pose of the object by synthesizing a new depth map; per Fig. 13C, the intensity model outputs one or more images containing the object with a new pose from a new viewpoint.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the image generation architecture of Wantland to use neural networks not only to generate the depth maps of objects, but also to use them fully to generate the context-aware new pose as well as the new images containing the object, as similarly suggested by Dixit. Given that Wantland's process involves deep analysis of image context and object pose as well as generation of new creative content, the involvement of a generative model is effectively required: legacy, non-learning software may be able to perform analysis, such as depth analysis, but cannot "learn," and the generation of new poses in Wantland therefore calls for an AI model such as a GAN. The generative neural network of Dixit is shown to perform creative tasks similar to those of Wantland by learning from the original source images while retaining accuracy of features.
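For illustration only, the kind of context-conditioned pose generation described above can be sketched as a small network that maps features of the object to be inserted, together with pooled features of the other objects in the target scene, to a pose vector. All names, sizes, and weights below are hypothetical placeholders; this is not code from Wantland or Dixit.

```python
import numpy as np

# Toy sketch of a context-conditioned pose generator (hypothetical, untrained).
rng = np.random.default_rng(0)

def init_mlp(sizes, rng):
    """Random weights for a small fully connected network."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def generate_pose(object_feat, context_feats, params):
    """Map an object feature vector and the pooled features of the other
    (context) objects in the target image to a pose vector (x, y, rotation)."""
    context = context_feats.mean(axis=0)        # pool features of other objects
    h = np.concatenate([object_feat, context])  # condition on scene context
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)                  # hidden layer(s)
    w, b = params[-1]
    return h @ w + b                            # pose: (x, y, rotation)

params = init_mlp([16, 32, 3], rng)             # 8-d object + 8-d context -> 3-d pose
object_feat = rng.standard_normal(8)            # e.g. the inserted bike of Fig. 6A
context_feats = rng.standard_normal((4, 8))     # e.g. boots and other scene objects
pose = generate_pose(object_feat, context_feats, params)
```

The point of the sketch is only that the generated pose depends on the scene's other objects, mirroring the bike/boots example quoted above.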
As to claim 7:
Wantland discloses:
A system comprising: one or more processors to:
Cause a software application that uses one or more neural networks (¶0106-0107, processor/memory to execute an image application that employs one or more trained neural networks to perform the disclosed process) to generate, for one or more objects depicted with a first pose in one or more first images to one or more second images, a second pose for the one or more objects to be added to one or more second images (¶0079-0092, Figs. 6A-B, generate an object with different pose(s) in a new image), the added one or more objects in the one or more second images having a second pose different from the first pose, generated by the one or more neural networks based, at least in part, on one or more other objects within the one or more second images. (See ¶0080-0092, Figs. 6A-B: an object image depicting a bike (602a) is to be added to the scene image 601. The orientation/pose of the bike is adjusted based on the background objects of scene 601, as seen in Fig. 6B, to create a coherent composite scene with the added object (bike). “For example, the pose of the bike may be adjusted to lean at a greater angle than it would be if boots 610 were not present, so that the boots 610 are behind the rendered virtual bike 602b in screen 600b.”; “Similar metadata may be provided indicating the particular type of object and real-world sizing parameters for other segmentation masks in image 601. By combining this information with a depth map of the image, the editing application may determine what the pose and relative position of the real-world objects captured in image 601”)
Wantland discloses an image processing application that employs one or more neural networks in the process of generating a new image containing the object with a newly generated pose, with at least the depth-map generation phase, but does not explicitly mention the one or more neural networks generating the second pose to add the object with the new pose to the one or more second images.
However, such generative tasks with context awareness are well established in the art as typically being performed by a generative neural network, which generates novel content and outputs new images with said novel content.
Indeed, Dixit, in a related field of endeavor, discloses an image generation application that synthesizes a new pose of an object in a reference image and outputs an image of the object with the new pose (Abstract). Per Figs. 2-4, each part of the software structure is a neural network; for example, the domain transfer model is the neural network structure depicted in Fig. 3, and the identity recovery module is an autoencoder with the encoder-decoder neural network structure depicted in Fig. 4. Per ¶0083-0084, with Figs. 13A-13B, the architecture above generates a new pose of the object by synthesizing a new depth map; per Fig. 13C, the intensity model outputs one or more images containing the object with a new pose from a new viewpoint.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the image generation architecture of Wantland to use neural networks not only to generate the depth maps of objects, but also to use them fully to generate the context-aware new pose as well as the new images containing the object, as similarly suggested by Dixit. Given that Wantland's process involves deep analysis of image context and object pose as well as generation of new creative content, the involvement of a generative model is effectively required: legacy, non-learning software may be able to perform analysis, such as depth analysis, but cannot "learn," and the generation of new poses in Wantland therefore calls for an AI model such as a GAN. The generative neural network of Dixit is shown to perform creative tasks similar to those of Wantland by learning from the original source images while retaining accuracy of features.
Claim 13 is directed to a method with step(s) similar to those in claim 1 and is rejected by the same reasoning.
Claim 19 is directed to a non-transitory computer-readable medium with instructions that, when executed by one or more processors, perform a method with step(s) similar to those in claim 1, and is rejected by the same reasoning.
As to claim 25:
Wantland discloses an image generation system, comprising: one or more processors to cause a software application that uses one or more neural networks (¶0106-0107, processor/memory to execute an image application that employs one or more trained neural networks to perform the disclosed process) to generate, for one or more objects depicted with a first pose in one or more first images to one or more second images, a second pose for the one or more objects to be added to one or more second images (¶0079-0092, Figs. 6A-B, generate an object with different pose(s) in a new image), the added one or more objects in the one or more second images having a second pose different from the first pose, generated by the one or more neural networks based, at least in part, on one or more other objects within the one or more second images. (See ¶0080-0092, Figs. 6A-B: an object image depicting a bike (602a) is to be added to the scene image 601. The orientation/pose of the bike is adjusted based on the background objects of scene 601, as seen in Fig. 6B, to create a coherent composite scene with the added object (bike). “For example, the pose of the bike may be adjusted to lean at a greater angle than it would be if boots 610 were not present, so that the boots 610 are behind the rendered virtual bike 602b in screen 600b.”; “Similar metadata may be provided indicating the particular type of object and real-world sizing parameters for other segmentation masks in image 601. By combining this information with a depth map of the image, the editing application may determine what the pose and relative position of the real-world objects captured in image 601”)
Wantland discloses an image processing application that employs one or more neural networks in the process of generating a new image containing the object with a newly generated pose, with at least the depth-map generation phase, but does not explicitly mention the one or more neural networks generating the second pose to add the object with the new pose to the one or more second images.
However, such generative tasks with context awareness are well established in the art as typically being performed by a generative neural network, which generates novel content and outputs new images with said novel content.
Indeed, Dixit, in a related field of endeavor, discloses an image generation application that synthesizes a new pose of an object in a reference image and outputs an image of the object with the new pose (Abstract). Per Figs. 2-4, each part of the software structure is a neural network; for example, the domain transfer model is the neural network structure depicted in Fig. 3, and the identity recovery module is an autoencoder with the encoder-decoder neural network structure depicted in Fig. 4. Per ¶0083-0084, with Figs. 13A-13B, the architecture above generates a new pose of the object by synthesizing a new depth map; per Fig. 13C, the intensity model outputs one or more images containing the object with a new pose from a new viewpoint.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the image generation architecture of Wantland to use neural networks not only to generate the depth maps of objects, but also to use them fully to generate the context-aware new pose as well as the new images containing the object, as similarly suggested by Dixit. Given that Wantland's process involves deep analysis of image context and object pose as well as generation of new creative content, the involvement of a generative model is effectively required: legacy, non-learning software may be able to perform analysis, such as depth analysis, but cannot "learn," and the generation of new poses in Wantland therefore calls for an AI model such as a GAN. The generative neural network of Dixit is shown to perform creative tasks similar to those of Wantland by learning from the original source images while retaining accuracy of features.
Claims 2, 4-6, 8, 10-12, 14, 16-18, 20, 22-24, 26, and 28-30 are rejected under 35 U.S.C. 103 as being unpatentable over Wantland et al. (US 2021/0042950) in view of Dixit et al. (US 2020/0380720) and further in view of Lee et al. (US 2020/0074707).
As to claims 2, 8, 14, 20 and 26:
Wantland in view of Dixit discloses all limitations of claims 1/7/13/19/25, but does not state that the one or more neural networks include one or more variational autoencoders (VAEs) to determine features for the one or more other objects within the one or more second images and encode those features to a latent space to act as a constraint in adding the one or more first objects to the image.
Lee discloses a system/method for inserting objects into an existing image in which the one or more neural networks include one or more variational autoencoders (VAEs) to determine features for the first objects and the second objects and encode those features to a latent space to act as a constraint in adding the one or more first objects to the image. (See at least ¶0018-0019 and ¶0026-0028: using at least a VAE, features of the object and the background are analyzed to generate a vector in a latent space that is used to generate the location/scale of the object to be added to the scene.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Wantland's neural networks to include one or more variational autoencoders (VAEs) to determine features for the first objects and the second objects and encode those features to a latent space to act as a constraint in adding the one or more first objects to the image. Recall that Wantland employs a neural network that generates/renders the object, as established in ¶0090, which is equivalent to the generator of the GAN of Lee. A VAE provides the advantage of accurate injection by providing a specific location/scale of an object in a scene (Lee, ¶0026).
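For illustration only, the VAE encoding step discussed above can be sketched as follows: scene features are encoded to a latent vector via the reparameterization trick, and the latent vector is decoded into a location/scale constraint for the inserted object. All weights and dimensions are hypothetical placeholders, not taken from Lee.

```python
import numpy as np

# Toy sketch of encoding scene features to a latent placement constraint.
rng = np.random.default_rng(1)

D, LATENT = 8, 4
W_mu = rng.standard_normal((D, LATENT)) * 0.1      # encoder mean head
W_logvar = rng.standard_normal((D, LATENT)) * 0.1  # encoder log-variance head
W_dec = rng.standard_normal((LATENT, 3)) * 0.1     # decoder to (x, y, scale)

def encode(features, rng):
    """VAE encoder: features -> (mu, logvar) -> sampled latent z."""
    mu, logvar = features @ W_mu, features @ W_logvar
    eps = rng.standard_normal(LATENT)
    return mu + np.exp(0.5 * logvar) * eps         # reparameterization trick

def placement_constraint(z):
    """Decode z into a location/scale constraint for the inserted object."""
    return z @ W_dec

scene_features = rng.standard_normal(D)            # pooled features of other objects
z = encode(scene_features, rng)
constraint = placement_constraint(z)               # (x, y, scale)
```

The sketch only illustrates the claimed data flow: scene features → latent space → constraint on where and at what scale the object is added.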
As to claims 4, 10, 16, 22 and 28:
Wantland in view of Dixit and further in view of Lee discloses all limitations of claims 2/8/14/20/26, and regarding:
wherein the one or more neural networks include a generative network to determine one or more potential poses for the added one or more objects based at least in part upon object types of the one or more other objects and with respect to features of the one or more second objects, wherein information for the potential poses is to be encoded into the latent space. (Lee, as discussed above, discloses determining potential placements that maintain contextual coherence with the scene's features per ¶0018-0019, which are encoded in the latent space; see at least ¶0018-0019 and ¶0026-0028: using at least a VAE, features of the object and the background are analyzed to generate a vector in a latent space that is used to generate the placement/scale of the object to be added to the scene. Wantland discloses determining potential poses for the inserted object based on object types and with respect to features of the background objects; see at least ¶0089-0093: inserting an object with a pose based on object classes, with consideration of background objects.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lee's system and method to incorporate the pose-determination concept of Wantland as discussed above. Such an implementation satisfies the need for contextual coherence accentuated by Lee's disclosure in at least ¶0018-0019.
As to claims 5, 11, 17, 23 and 29:
Wantland in view of Dixit and further in view of Lee discloses all limitations of claims 4/10/16/22/28, wherein the one or more neural networks include a neural network to determine one or more potential positions for the added one or more objects based at least in part upon the object types of the one or more other objects and potential poses of the one or more first objects, and with respect to the features of the one or more other objects, wherein information for the potential positions is to be encoded into the latent space. (Wantland, see ¶0080-0092, Figs. 6A-B: an object image depicting a bike (602a) is to be added to the scene image 601. The orientation/pose of the bike is adjusted based on the background objects of scene 601, as seen in Fig. 6B, to create a coherent composite scene with the added object (bike). “For example, the pose of the bike may be adjusted to lean at a greater angle than it would be if boots 610 were not present, so that the boots 610 are behind the rendered virtual bike 602b in screen 600b.”; “Similar metadata may be provided indicating the particular type of object and real-world sizing parameters for other segmentation masks in image 601. By combining this information with a depth map of the image, the editing application may determine what the pose and relative position of the real-world objects captured in image 601”. Lee discloses determining potential placements, which include position and orientation, that maintain contextual coherence with the scene's features per ¶0018-0019, which are encoded in the latent space; see at least ¶0018-0019 and ¶0026-0028: using at least a VAE, features of the object and the background are analyzed to generate a vector in a latent space that is used to generate the placement/scale of the object to be added to the scene.)
As to claims 6, 12, 18, 24 and 30:
Wantland in view of Dixit and further in view of Lee discloses all limitations of claims 5/11/17/23/29, wherein the one or more neural networks include a generative adversarial network (GAN) to generate one or more output images including the added one or more objects added to the image, wherein the added one or more objects have different poses or positions in the output images, the poses and positions to be selected from the potential poses and the potential positions determined from the latent space. (Lee, see at least ¶0018-0019: using a neural network (generative adversarial network) model to add a desired object into a desired position of a captured real-world scene image. See also Wantland, ¶0080-0092, Figs. 6A-B: an object image depicting a bike (602a) is to be added to the scene image 601. The orientation/pose of the bike is adjusted based on the background objects of scene 601, as seen in Fig. 6B, to create a coherent composite scene with the added object (bike). “For example, the pose of the bike may be adjusted to lean at a greater angle than it would be if boots 610 were not present, so that the boots 610 are behind the rendered virtual bike 602b in screen 600b.”)
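For illustration only, the GAN-based selection described above can be sketched as a generator that renders candidate composites for several (latent, pose) samples, with a discriminator score used to pick the most plausible one. This toy code is not the model of Lee or Wantland; both functions are deliberately trivial stand-ins.

```python
import numpy as np

# Toy sketch: generate candidate composites and keep the highest-scoring one.
rng = np.random.default_rng(2)

def generator(z, pose):
    """Stand-in 'renderer': a deterministic array built from latent code and pose."""
    return np.outer(np.tanh(z), np.tanh(pose))

def discriminator(img):
    """Stand-in realism score in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-img.mean()))

# Sample several candidate (latent, pose) pairs from the latent space.
candidates = [(rng.standard_normal(4), rng.standard_normal(3)) for _ in range(5)]
images = [generator(z, p) for z, p in candidates]
best = max(images, key=discriminator)   # select the most plausible composite
```

The design point being illustrated is the claimed selection step: multiple poses/positions come from the latent space, and the adversarially trained score ranks the resulting output images.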
Claims 3, 9, 15, 21, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Wantland et al. (US 2021/0042950) in view of Dixit et al. (US 2020/0380720), in view of Lee et al. (US 2020/0074707), in view of Kopf (“Mixture of Expert Variational Autoencoder for Clustering and Generating from Similarity-based Representation,” 01-2020), and further in view of Irsoy et al. (“Unsupervised feature extraction with autoencoder trees,” 2017) (prior art of record).
As to claims 3, 9, 15, 21 and 27:
Wantland in view of Dixit and further in view of Lee discloses all limitations of claims 2/8/14/20/26, but is silent on the one or more neural networks including a gating network to select the one or more VAEs from a set of VAEs each trained for a different class of object, the gating network to select the one or more VAEs using a hierarchical mixture-of-experts approach.
Kopf discloses a gating network to select the one or more VAEs from a set of VAEs each trained for a different class of object (see Abstract and page 3: a cluster i is gated to a corresponding expert (VAE); note that an expert has sole expertise in a particular class of object).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system/method of Lee to incorporate the feature of a gating network to select VAEs, as such an implementation shows superior clustering performance of the model on real-world data (see page 2 of Kopf).
None of the above further discloses using a hierarchical mixture-of-experts approach.
Irsoy, however, in a related field of endeavor, discloses (Abstract; page 64, Section 3, through page 65) a soft decision node that directs an instance to its branches according to probabilities given by a gating function (gating network) in a hierarchical mixture-of-experts approach. See also Fig. 1 and the left column of page 64, which discuss the gating function.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system/method of Lee to incorporate the feature of using a hierarchical mixture-of-experts approach to select VAEs, as such an implementation improves operational accuracy (Irsoy, page 71, Conclusion).
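For illustration only, the hierarchical mixture-of-experts selection discussed above can be sketched as a depth-2 tree of soft gating nodes routing an input to one of several per-class expert VAEs, in the spirit of Irsoy's soft decision trees. All weights and expert names are hypothetical placeholders, not drawn from the cited references.

```python
import numpy as np

# Toy sketch of a hierarchical mixture-of-experts gate over per-class VAEs.
rng = np.random.default_rng(3)
D = 8

def gate(x, w):
    """Soft binary decision node: probability of taking the left branch."""
    return 1.0 / (1.0 + np.exp(-(x @ w)))

# Root gate plus two child gates form a depth-2 gating tree with four leaves.
w_root, w_left, w_right = (rng.standard_normal(D) for _ in range(3))
experts = ["vae_person", "vae_vehicle", "vae_furniture", "vae_animal"]

def select_expert(x):
    """Select the expert whose root-to-leaf path has the highest probability."""
    p_root = gate(x, w_root)
    p_l, p_r = gate(x, w_left), gate(x, w_right)
    path_probs = [p_root * p_l, p_root * (1 - p_l),
                  (1 - p_root) * p_r, (1 - p_root) * (1 - p_r)]
    return experts[int(np.argmax(path_probs))]

chosen = select_expert(rng.standard_normal(D))
```

Because each node's two branch probabilities sum to 1, the four leaf path probabilities also sum to 1, which is the property that makes the hierarchy a soft gating network rather than a hard classifier.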
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 10102629 - Methods, apparatus, and computer readable storage media related to defining a planar model that approximates a plurality of surfaces of an object and/or applying the planar model to detect the object and/or to estimate a pose for the object. For example, the planar model may be compared to data points sensed by a three-dimensional vision sensor of a robot to determine that the object is present in the field of view of the sensor and/or to determine a pose for the object relative to the robot. A planar model comprises a plurality of planar shapes modeled relative to one another in a three-dimensional space and approximates an object by approximating one or more surfaces of the object with the planar shapes.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to QUAN M HUA whose telephone number is (571)270-7232. The examiner can normally be reached 10:30-6:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Anthony Addy can be reached on 571-272-7795. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/QUAN M HUA/Primary Examiner, Art Unit 2645