Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed on 11/24/25 has been entered and made of record. Claims 1-2, 8-9 and 15-16 are amended. Claims 5, 12 and 19 are cancelled. Claims 1-4, 6-11, 13-18 and 20 are pending.
Response to Arguments
Applicant’s arguments with respect to claims 1, 8 and 15 have been considered but they are not persuasive.
Applicant asserts that Qian does not describe these multiple generators as being "independently optimized for a specific segment of the plurality of segments based on contents of a corresponding segment of the input image," and that the generators in Qian are not each focused on a specific semantic segment (e.g., nose, eyes, ears, etc.). Applicant further asserts that a subpattern cannot be equated to "a full version of the input image," as required by amended claim 1. Accordingly, applicant concludes that even if one were to combine Ling and Qian, the resulting combination would fail to teach or suggest generating full images per generator and selectively compositing segments based on semantic segmentation (Remarks, pp. 8-9).
Examiner notices that applicant discloses “The method may include receiving an input image and a segmentation mask, projecting, using a differentiable machine learning pipeline, a plurality of segments of the input image into a plurality of latent spaces associated with a plurality of generators to obtain a plurality of projected segments, and compositing the plurality of projected segments into an output image” in the Abstract. Here, an input image is first segmented into a plurality of segments, the segments are then projected into a plurality of latent spaces associated with a plurality of generators to obtain a plurality of projected segments, and the projected segments are then combined into an output image. It is well known that the output image of a generative adversarial network is a synthetic image designed to look like a real image (the input image). However, the specification provides no clear definition of the cited language “full version of the input image,” and the phrase has no established meaning to one skilled in the art of neural network applications; for example, a “full version” of an input image could refer to a) size (i.e., the full size of the input image), b) resolution (the same resolution as the input image), or c) content (all of the contents included in the input image). Therefore, the claim language “a full version of the input image” is indefinite.
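For illustration only, the following minimal Python sketch restates the pipeline the Abstract describes: segment the input image, project each segment into the latent space of its associated generator, and composite the projected segments into an output image. All function and attribute names (project, synthesize, etc.) are hypothetical and are not taken from the application or from the cited references.

```python
import numpy as np

def composite_from_segments(input_image, segment_masks, generators):
    """Sketch of the Abstract's pipeline: per-segment projection into each
    generator's latent space, regeneration, and compositing under the mask."""
    output = np.zeros_like(input_image)
    for mask, gen in zip(segment_masks, generators):
        # hypothetical API: project the masked segment into this generator's latent space
        latent = gen.project(input_image * mask[..., None])
        # hypothetical API: synthesize an image from the projected latent code
        generated = gen.synthesize(latent)
        # keep only the pixels belonging to this segment
        output[mask] = generated[mask]
    return output
```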
Examiner also notices that Ling discloses “In practice, an input image may be embedded into the latent space of a GAN, and the GAN may generate two outputs-an image corresponding to the input image and a segmentation mask corresponding to the generated image… The editing vector may thus represent the difference between the original and edited images in the latent space of the GAN, and the GAN may then generate an updated output image corresponding to the point in the latent space identified using the editing vector. In embodiments, the editing vector may be stored for use with other input images that have similar edits performed on their corresponding segmentation masks---e.g., an "enlarged wheels on car" editing vector may be used on any image of a car input to the GAN to generate enlarged wheels” in [0007]. Here, the output image has similar content (a car, in this embodiment) to the input image except for the enlarged wheels, which is interpreted as “a full version of the input image”.
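As a purely illustrative aid, the editing-vector mechanism quoted from Ling [0007] can be summarized in a short sketch; the variable names below are assumptions for illustration, not Ling's notation or implementation.

```python
def learn_editing_vector(z_original, z_edited):
    # the editing vector is the latent-space difference between the edited and
    # original embeddings (e.g., an "enlarged wheels on car" edit)
    return z_edited - z_original

def apply_editing_vector(z_other_image, editing_vector, strength=1.0):
    # reuse the stored edit on any other image embedded in the same latent space
    return z_other_image + strength * editing_vector
```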
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-4, 6-11, 13-18 and 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA the inventor(s), at the time the application was filed, had possession of the claimed invention.
Independent claims 1, 8, and 15 recite the limitation “wherein each generated image from the plurality of generated images is a full version of the input image”. The specification provides no written description of a “full version” of the input image. Claims 2-4, 6-7, 9-11, 13-14, 16-18 and 20 depend on these claims and are rejected under a similar rationale.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-4, 6-11, 13-18 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Independent claims 1, 8, and 15 recite the limitation “wherein each generated image from the plurality of generated images is a full version of the input image”. The phrase “full version of the input image” has no clear definition to one skilled in the art, rendering the scope of the claims indefinite. Claims 2-4, 6-7, 9-11, 13-14, 16-18 and 20 depend on these claims and are rejected under a similar rationale. For purposes of examination, the claim limitation “wherein each generated image from the plurality of generated images is a full version of the input image” is interpreted as requiring an output that has similar content to, and does not omit any portion of, the input image.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 6-11, 13-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ling et al. (US 2022/0383570) in view of Qian et al. (CN 111260652A) and Cherian et al. (US 2022/0309672 A1) further in view of Lee et al. (KR102225753 B1) and Arora et al. (Seam Reconstruct: Dynamic Scene Stitching with Large Exposure Difference, 2009 Second International Conference on the Applications of Digital Information and Web Technologies, Aug. 04-06, 2009).
As to Claim 1, Ling teaches A computer-implemented method comprising:
receiving an input image and a segmentation mask, the segmentation mask dividing an object depicted in the input image into a plurality of segments (Ling discloses part segmentation masks, for example, a mask for a headlight of a car in [0022]; "Further, to train the GAN 102 and to perform segmentation on a new image, an input or original image may be embedded into the latent space 110 of the GAN 112" in [0025]; "As such, annotated images, x, may be embedded from a dataset labeled with semantic segmentations into the latent space 110, and the semantic segmentation branch of the generator of the GAN 112 may be trained using, e.g., standard supervised learning objectives (e.g., cross entropy loss)" in [0026]. Here, the segmentation mask can be used to decompose a car in the image into a plurality of segments. Qian also discloses "The mode decomposer is used to receive the real image and decompose the real image into multiple real image sub-patterns" at p. 2);
determining, by a plurality of separate generator models corresponding to the plurality of segments, a plurality of latent codes, wherein each separate generator model determines a latent code from its separate latent space, and wherein each of the plurality of latent codes corresponding to a separate generator are independently optimized for a specific segment of the plurality of segments based on contents of a corresponding segment of the input image; generating, by the plurality of separate generator models using the plurality of latent codes, a plurality of generated images, wherein each generated image from the plurality of generated images is a full version of the input image, and wherein a generated segment of each generated image from the plurality of generated images matches the corresponding segment of the input image; and compositing the generated segment from each of the plurality of generated images into an output image corresponding to the input image (Ling discloses “When training the GAN 102, the generator of the GAN may map latent codes z ∈ Z, drawn from a multivariate normal distribution, into realistic images… The GAN 112 may thus use the joint distribution p(x, y) to perform high-precision semantic image editing of real and synthesized images. As such, the GAN 112 may model p(x, y) by adding an additional segmentation branch to the image generator” in [0024]; synthesizes segmentation outputs of the GAN 112 in [0029]. Ling is silent on a plurality of separate generator models. Qian further discloses “In the present invention, the real image is decomposed into a plurality of sub-modes by the mode decomposer, the MIMO-GAN module is used to generate the complete sub-modal image of the real image, and finally the output image is obtained by the mode synthesizer. In the present invention, the real image is decomposed into multiple sub-patterns through a mode decomposer, and multiple generators and multiple discriminators in the MIMO-GAN module are used to fight the game to capture the complete modal information of the real image… The MIMO-GAN module is composed of a generator set composed of multiple generators and a judge set composed of multiple judges; the MIMO-GAN module accepts the sub-modes and input signals of the real image from the mode resolver. Through the adversarial game of the discriminator, multiple sub-mode outputs are generated. The mode synthesizer is used to receive the m sub-modes output by the generator of the MIMO-GAN module, merge these sub-modes to generate a complete image, and obtain the final output result. An image generation method based on MIMO-GAN, using an image generation system based on MIMO-GAN, includes the following steps:
Step S1: input the real image into the mode decomposer for mode decomposition, and obtain n sub-patterns of the real image;
Step S2: Input the input signal into m generators respectively to obtain m generated initial sub-patterns;
Step S3: Fix the parameters of all generators in the MIMO-GAN module and read in the sub-patterns of the n real images in step S1 and the initial sub-patterns generated in step S2, train n discriminators in turn to obtain the parameters of all discriminators in the MIMO-GAN module” at p. 2-3; see also Fig 2 below:
[Qian, Fig. 2, media_image1.png (greyscale, image not reproduced): MIMO-GAN system showing the mode decomposer, the generator and discriminator sets, and the mode synthesizer].
Ling also teaches optimizing the latent code for a specific segment, for example: “The image editing may be achieved by using segmentation mask modifications (e.g., provided by a user, or otherwise) to optimize the latent code to be consistent with the updated segmentation, thus effectively changing the original, e.g., RGB image. To improve efficiency of the system, and to not require optimizations for each edit on each image, editing vectors may be learned in latent space that realize the edits and that can be directly applied on other images with or without additional optimizations” in [0006]; “At a high level, the process 100 may include embedding an image into a GAN's 112 latent space 110, and condition latent code optimization may be executed according to an edit(s) to the segmentation mask. As a result, the corresponding image generated by the jointly modeled GAN may also be modified to match the edits to segmentation mask. To memorialize or amortize the optimization for a given edit, one or more editing vectors 122 may be learned in the latent space 110 that realize the edits. This framework allow for learning any number of editing vectors corresponding to any number of different edit types (e.g., enlarge wheels, shrink headlights, remove trunk of sedan, add a smile to a person, change gaze direction of a person, add frown to image depicting a painting or other out-of-domain image type, etc.)” in [0022]; “For example, with respect to FIG. 1B, the semantic segmentations 114A may be modified and the latent code z may be optimized for consistency with the new segmentation 114B within the editing region (e.g., using a first loss function 134A), and with the new image 116B appearance outside of the editing region” in [0033]. See also the “Response to Arguments” section above).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Ling with the teaching of Qian so as to turn a traditional GAN composed of a single generator and a single discriminator into a GAN with a multiple-input, multiple-output structure composed of multiple generators and multiple judges, and to apply the invention not only to simple single-mode images but also to complex multi-mode images and even cross-mode signals, such as image signals, language signals, and text signals (Qian, p. 5).
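For context only, the following PyTorch-style sketch illustrates the kind of latent-code optimization described in Ling [0006] and [0033]: a segmentation loss inside the editing region and an appearance loss outside it, with gradients backpropagated through a shared generator. The gan(z) interface, tensor shapes, and hyperparameters are assumptions made for illustration, not Ling's implementation.

```python
import torch
import torch.nn.functional as F

def optimize_latent_for_edit(gan, z_init, edited_mask, original_image, edit_region,
                             steps=200, lr=0.05):
    """edit_region: boolean [H, W] tensor marking the edited segment;
    edited_mask: [B, H, W] long tensor of target class labels."""
    z = z_init.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        image, seg_logits = gan(z)  # assumed: shared generator with image + segmentation branches
        # first loss: segmentation consistency with the edited mask inside the edit region
        loss_seg = F.cross_entropy(seg_logits[:, :, edit_region], edited_mask[:, edit_region])
        # second loss: appearance consistency with the original image outside the edit region
        loss_img = F.mse_loss(image[:, :, ~edit_region], original_image[:, :, ~edit_region])
        loss = loss_seg + loss_img
        optimizer.zero_grad()
        loss.backward()  # gradients flow back through the generator into the latent code z
        optimizer.step()
    return z.detach()
```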
In response to the argument that Qian fails to describe how the "plurality of latent codes" are selected for a particular GAN or that they are selected with any relation to the received input image, examiner notices that it is inherent that each latent code is extracted from the corresponding segment. For example, Cherian discloses "Some embodiments include the InSeGAN designed with a generator module that, instead of taking a single noise vector as input (as in standard GANs), the generator module takes noise vectors, and each noise vector of d-dimensions from a standard normal distribution, and generates a multiple-instance depth image as output, as noted above" in [0013]; "At least one benefit of taking the multiple random noise vectors (equal to the number of instances) as input to produce instance-specific feature maps, such that when training the model in a generative-adversarial setting, the InSeGAN framework implicitly forces the generator to capture the distribution of single object instances" in [0014]; "The instance encoder produces latent vectors for each 3D object instance. The latent vectors are iteratively used to produce corresponding 3D transformation matrices by the pose encoder, to produce single instance depth images by the generator consisting of only one instance of the 3D object in each depth image" in [0021]; see also [0016, 0018] and Figs. 1 and 4.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Ling and Qian with the teaching of Cherian so as to take the multiple random noise vectors [latent code] as input to produce instance-specific feature maps, and explicitly force the generator to capture the distribution of single object instances when training the model in a generative-adversarial setting (Cherian, [0014]).
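Purely as an illustration of the per-instance noise sampling quoted from Cherian [0013] and [0014], a minimal sketch follows; mapping "instances" to the claimed "segments" is the analogy drawn above, and the code names are hypothetical.

```python
import torch

def sample_per_segment_latents(num_segments, d=128):
    # one independent d-dimensional latent code per segment, each drawn from a
    # standard normal distribution, rather than a single shared code
    return [torch.randn(1, d) for _ in range(num_segments)]
```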
Ling, Qian and Cherian do not directly teach a stitching loss and seam visibility. The combination of Lee and Arora further teaches the following limitations:
wherein during compositing the segmentation mask is dynamically updated to change boundaries of one or more segments based on a stitching loss calculated by a stitching layer, wherein the stitching loss evaluates a visibility of stitching at the boundaries of the one or more segments (Ling discloses “For example, with respect to FIG. 1B, the semantic segmentations 114A may be modified and the latent code z may be optimized for consistency with the new segmentation 114B within the editing region (e.g., using a first loss function 134A), and with the new image 116B appearance outside of the editing region (e.g., using a second loss function 134B). To do this, in embodiments, corresponding gradients may be backpropagated 132 through the shared generator of the GAN 112…” in [0033]; “The processor of claim 2, wherein at least one latent code optimization iteration of the one or more latent code optimization iterations includes backpropagating one or more gradients through the GAN” in claim 3. Lee further discloses “The stitching error segmentation model may be learned considering stitching error classification loss, bounding box regression loss, and mask prediction loss” in [0017]; “By providing a deep learning-based quality evaluation method and device for a stitched panoramic image according to an embodiment of the present invention, the quality of the stitched image can be evaluated based on stitching distortion segmentation, and there is an advantage in that real-time quality evaluation is possible” in [0023]; quality evaluation unit in [0087-0089]. Here, Lee’s quality evaluation unit may have to consider seam visibility. For example, Arora discloses “The Seam Evaluation Approach is a network based algorithm[9] which avoids “cutting through” objects where a seam would look unnatural. For this approach firstly we implemented Graph Cut technique[3] which is based on the concept of maxflow theorem. In this approach the difference of pixel value at the overlapping pixels was used to find an optimal cut so that the seam visibility is least. This cost function is given below:
[Arora, media_image2.png (greyscale, equation image not reproduced): graph-cut cost function based on the difference of pixel values at the overlapping pixels, used to find an optimal cut with least seam visibility].)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Ling, Qian and Cherian with the teaching of Lee so as to provide a deep learning-based real-time quality evaluation method for a stitched image based on stitching distortion segmentation (Lee, [0023]). The motivation for further combining the teaching of Arora is to propose a more efficient cost function that accounts for the edges present in the input images (Arora, p. 576).
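As a non-authoritative illustration of the seam-visibility idea quoted from Arora (a seam is cheap where the overlapping images already agree), consider the following sketch. It is a generic pixel-difference seam cost written for demonstration, not Arora's exact cost function, whose equation image is not reproduced above.

```python
import numpy as np

def seam_visibility_cost(image_a, image_b, seam_pixels):
    """image_a, image_b: overlapping images as [H, W, 3] arrays;
    seam_pixels: iterable of (row, col) coordinates of a candidate seam."""
    cost = 0.0
    for r, c in seam_pixels:
        # a seam is less visible where the two source images agree at the cut
        cost += float(np.linalg.norm(image_a[r, c].astype(float) - image_b[r, c].astype(float)))
    return cost
```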
As to Claim 2, Ling in view of Qian, Cherian, Lee and Arora teaches The computer-implemented method of claim 1, further comprising:
receiving a request to edit a first portion of the input image; determining a segment of the input image corresponding to the first portion of the input image; generating, by a separate generator model corresponding to the segment of the input image, an edited image by exploring a latent space associated with the separate generator model; and generating an edited output image by compositing the edited image with the input image (Ling discloses “For example, the segmentation mask 114A generated by the semantic segmentation branch of the GAN 112 may be edited by, e.g., a user 120, and/or may be edited using an automated editing process to generate the edited segmentation mask 114B… The edit(s) may include adjustments to one or more features of one or more objects represented in the segmentation mask 114A and/or the generated image 116A… The edits may be made using interactive digital painting or labeling tools, in examples, to manually modify the segmentation according to a desired edit” in [0027]; “For example, in FIG. 2A, a vehicle in original image 202A may have its shape changed in image 202B, its wheels enlarged in image 202C, and/or its front light shrunk in image 202D. As another example, in FIG. 2B, a person in original image 204A may have a frown in image 204B, a change in gaze direction to the left in image 204C, and/or a smile in image 204D” in [0028]. Qian further discloses “The MIMO-GAN module consists of two parts, one is the generator set, and the other is the judge set. The generator set consists of m generator networks, which are respectively denoted as generator G1, generator G2,..., generator Gm, which can generate m sub-patterns. These m sub-patterns are composed of m generators. The input signal z is transformed, and the distribution of these signals is Pz…” at p. 5-6, see also Fig 2-3.)
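For illustration only, the claim-2 workflow as mapped above (edit request, segment lookup, latent-space exploration by the generator owning that segment, compositing) might be sketched as follows; every helper name here is hypothetical and not drawn from the application or the cited references.

```python
def edit_one_segment(input_image, segment_masks, generators, segment_index, edit_request):
    gen = generators[segment_index]              # generator associated with the edited segment
    z = gen.project(input_image)                 # hypothetical: embed the current image
    z_edited = gen.explore(z, edit_request)      # hypothetical: explore the latent space per the edit
    edited_image = gen.synthesize(z_edited)
    mask = segment_masks[segment_index]
    output = input_image.copy()
    output[mask] = edited_image[mask]            # composite the edited segment with the input image
    return output
```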
As to Claim 3, Ling in view of Qian, Cherian, Lee and Arora teaches The computer-implemented method of claim 1, wherein the plurality of separate generator models are clones of a single generator model such that the plurality of separate generator models have a same parameters and weight values (Ling discloses “the shared generator of the GAN 112” in [0033]. Qian also discloses “In step S2, m can be set according to actual needs, the minimum can be set to 1, and the maximum does not exceed 1.2n” at p. 3.)
As to Claim 4, Ling in view of Qian, Cherian, Lee and Arora teaches The computer-implemented method of claim 1, wherein the plurality of separate generator models includes two or more different generator models having different parameters or weight values (Qian discloses “The generator set consists of m generator networks, which are respectively denoted as generator G1, generator G2,..., generator Gm, which can generate m sub-patterns. These m sub-patterns are composed of m generators. The input signal z is transformed, and the distribution of these signals is Pz…” at p. 5-6, see also Fig 2-3.)
As to Claim 6, Ling in view of Qian, Cherian, Lee and Arora teaches The computer-implemented method of claim 1, wherein receiving an input image and a segmentation mask further comprises:
processing the input image using a semantic segmentation model to generate the segmentation mask (Ling discloses “For example, with respect to FIG. 1B, the semantic segmentations 114A may be modified and the latent code z may be optimized for consistency with the new segmentation 114B within the editing region” in [0033].)
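One common, purely illustrative way to realize claim 6's "semantic segmentation model" step is an off-the-shelf network such as DeepLabV3; the specific model and input below are assumptions for demonstration only and are not taken from the application or the cited references.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()
image_batch = torch.rand(1, 3, 512, 512)          # placeholder input image tensor
with torch.no_grad():
    logits = model(image_batch)["out"]            # [N, num_classes, H, W]
    segmentation_mask = logits.argmax(dim=1)      # per-pixel class labels form the segmentation mask
```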
As to Claim 7, Ling in view of Qian, Cherian, Lee and Arora teaches The computer-implemented method of claim 1, wherein receiving an input image and a segmentation mask further comprises:
receiving an input identifying at least one segment of the segmentation mask via a user interface, wherein the input includes painting the at least one segment on a representation of the input image (Ling discloses “For example, the segmentation mask 114A generated by the semantic segmentation branch of the GAN 112 may be edited by, e.g., a user 120… The edit(s) may include adjustments to one or more features of one or more objects represented in the segmentation mask 114A and/or the generated image 116A… The edits may be made using interactive digital painting or labeling tools, in examples, to manually modify the segmentation according to a desired edit” in [0027].)
Claim 8 recites similar limitations as claim 1 but in a computer readable medium form. Therefore, the same rationale used for claim 1 is applied.
Claim 9 is rejected based upon similar rationale as Claim 2.
Claim 10 is rejected based upon similar rationale as Claim 3.
Claim 11 is rejected based upon similar rationale as Claim 4.
Claim 13 is rejected based upon similar rationale as Claim 6.
Claim 14 is rejected based upon similar rationale as Claim 7.
Claim 15 recites similar limitations as claim 1 but in a system form. Therefore, the same rationale used for claim 1 is applied.
Claim 16 is rejected based upon similar rationale as Claim 2.
Claim 17 is rejected based upon similar rationale as Claim 3.
Claim 18 is rejected based upon similar rationale as Claim 4.
Claim 20 is rejected based upon similar rationale as Claim 6.
Conclusion
THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEIMING HE whose telephone number is (571)270-1221. The examiner can normally be reached Monday-Friday, 8:30am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tammy Goddard can be reached on 571-272-7773. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Weiming He/
Primary Examiner, Art Unit 2611