Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Applicant’s Response
In the Applicant’s Response dated 12/22/25, the Applicant argued the claims previously rejected in the Office Action dated 10/1/25. Claims 1-3, 5-12, and 14-22 are pending examination.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 7/7/25 has been entered.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 12/22/25 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5, 6, 10-15, and 18-22 are rejected under 35 U.S.C. 103 as being unpatentable over Zhong, United States Patent Publication 2020/0349393, in view of Saruta et al., United States Patent Publication 2020/0250471 (hereinafter “Saruta”), in further view of Lee, United States Patent Publication 2022/0061816.
Claim 1:
Zhong discloses:
A computer-implemented method comprising:
producing an initial latent space representation of an input image by encoding the input image (see paragraph [0088]). Zhong teaches producing an initial first latent space representation of an image by encoding the image;
generating, by a generator neural network, an initial output image by processing the initial latent space representation of the input image (see paragraphs [0017], [0050] and [0089]). Zhong teaches generating an output image by processing the image through the generator network;
extracting a plurality of target perceptual features for the input image by providing the input image to a convolutional neural network trained to classify images and wherein extracting the plurality of perceptual features comprises extracting the plurality of target perceptual features from a plurality of selected different intermediate layers of the convolutional neural network (see paragraphs [0016] and [0054]). Zhong teaches that the convolutional neural network is trained to extract features through many layers. The neural network becomes (e.g., learns) a function that projects (e.g., maps) the image onto the latent space. In other words, the latent space is the space where the features lie. The latent space contains a compressed representation of the image. This compressed representation is then used to reconstruct the input as faithfully as possible. To perform well, a neural network has to learn to extract the most relevant features (e.g., the most relevant latent space);
outputting the optimized latent space representation of the input image for downstream use (see paragraph [0094]). Zhong teaches outputting the optimized representation of the input image such that there are barely any differences between the input and output images.
Zhong fails to teach extracting visually representable properties from selected intermediate layers.
Saruta discloses:
wherein the plurality of target perceptual features comprise visually representable properties of one or more objects in the input image (see paragraph [0040]). Saruta teaches visually representable properties of objects such as detecting persons in a scene.
extracting, corresponding to the plurality of target perceptual features extracted from the input image, a plurality of initial perceptual features from the initial output image by providing the initial output image to the convolutional neural network trained to classify images and wherein extracting the plurality of initial perceptual features comprises extracting the plurality of initial perceptual features from a plurality of selected intermediate layers of the convolutional neural network (see paragraphs [0040], [0056], [0059]). Saruta teaches extracting particular features from selected intermediate layers of the neural network. The neural network is trained to classify images and extract particular features from particular layers.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the method disclosed by Zhong to include extracting particular features from selected intermediate layers for the purpose of efficiently extracting target features from images, as taught by Saruta.
Zhong and Saruta fail to expressly disclose identifying a perceptual loss.
Lee discloses:
identifying a perceptual loss based on a comparison of the plurality of target perceptual features and the plurality of initial perceptual features, wherein the perceptual loss comprises a difference between the target perceptual features extracted from the input image and the initial perceptual features extracted from the initial output image generated by processing the initial and target latent space representations with the generator neural network (see paragraphs [0025] and [0074]). Lee teaches calculating a perceptual loss based on an assessment of differences between the input image and the target image comprised by each image pair of the training image pairs. The image is created using a generative neural network;
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the method disclosed by Zhong and Saruta to include identifying a perceptual loss based on the difference between the initial and target features for the purpose of improving the accuracy of image generation using neural networks, as taught by Lee.
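For context, the perceptual-loss comparison discussed above can be illustrated with a short sketch. This is a purely hypothetical example, not code from any cited reference or from the claimed invention: the "features" are stand-in lists of numbers, and the layer names are placeholders.

```python
# Hypothetical sketch of a perceptual loss: the squared difference between
# target features (extracted from the input image) and initial features
# (extracted from the generator's output image), summed over selected layers.

def perceptual_loss(target_features, initial_features):
    """target_features / initial_features: dicts mapping a layer name
    to that layer's feature vector (a list of floats)."""
    loss = 0.0
    for layer in target_features:
        t = target_features[layer]
        i = initial_features[layer]
        loss += sum((a - b) ** 2 for a, b in zip(t, i))
    return loss

# Toy example with made-up feature vectors for two intermediate layers.
target = {"conv1_1": [0.2, 0.5], "conv3_1": [1.0, 0.0]}
initial = {"conv1_1": [0.2, 0.4], "conv3_1": [0.8, 0.1]}
print(round(perceptual_loss(target, initial), 6))  # 0.06
```

A loss of zero would mean the generated image's features match the input image's features at every selected layer, which is the condition the optimization drives toward.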
Claim 2:
Zhong discloses:
further comprising downsampling the input image before generating the initial latent space representation of the input image (see paragraphs [0024] and [0054]). Zhong teaches downsampling the input image before generating a latent space representation.
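The downsampling step recited above can be sketched as simple 2x2 average pooling. This is a hypothetical illustration only; the cited references do not specify this particular scheme, and the "image" here is a plain list-of-lists of pixel values rather than a real image tensor.

```python
# Hypothetical sketch of downsampling an image before encoding, via 2x2
# average pooling: each output pixel is the mean of a 2x2 block of inputs,
# halving the image's height and width.

def downsample_2x2(image):
    h, w = len(image), len(image[0])
    out = []
    for r in range(0, h, 2):
        row = []
        for c in range(0, w, 2):
            block = [image[r][c], image[r][c + 1],
                     image[r + 1][c], image[r + 1][c + 1]]
            row.append(sum(block) / 4.0)
        out.append(row)
    return out

img = [[0, 2, 4, 6],
       [0, 2, 4, 6],
       [8, 8, 8, 8],
       [8, 8, 8, 8]]
print(downsample_2x2(img))  # [[1.0, 5.0], [8.0, 8.0]]
```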
Claim 3:
Zhong discloses:
further comprising computing the loss by:
downsampling the initial output image (see paragraphs [0024] and [0054]). Zhong teaches downsampling an image;
computing the loss based upon the target perceptual features and the initial perceptual features (see paragraph [0068]). Zhong teaches computing the loss based on the features of the input and features of the target.
Claim 4:
Zhong discloses:
wherein the convolutional neural network is a Visual Geometry Group (VGG) network, and wherein the layers include a conv1_1 layer, a conv1_2 layer, a conv3_1 layer, and a conv4_1 layer of the VGG network (see paragraphs [0064] and [0095]). Zhong teaches a VGG network with a plurality of layers.
Zhong fails to teach extracting visually representable properties from selected intermediate layers.
Saruta discloses:
wherein the selected layers include a conv1_1 layer, a conv1_2 layer, a conv3_1 layer, and a conv4_1 layer of the VGG network (see paragraph [0064]). Saruta teaches a plurality of intermediate layers.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the method disclosed by Zhong to include intermediate layers for the purpose of efficiently extracting target features from images, as taught by Saruta.
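Extracting activations from selected intermediate layers, as recited in the claim, can be sketched as follows. This is a hypothetical toy example: each "layer" is a stand-in function rather than a real convolution, and in practice such activations would be captured with framework facilities such as forward hooks.

```python
# Hypothetical sketch of collecting activations from selected intermediate
# layers of a VGG-style network. The input passes through every layer in
# order, but only the outputs of the selected layers are kept as features.

LAYERS = [
    ("conv1_1", lambda x: [v + 1 for v in x]),
    ("conv1_2", lambda x: [v * 2 for v in x]),
    ("conv3_1", lambda x: [v - 3 for v in x]),
    ("conv4_1", lambda x: [v ** 2 for v in x]),
]
SELECTED = {"conv1_1", "conv3_1"}

def extract_features(x, selected=SELECTED):
    """Run x through the layer stack, keeping outputs of selected layers."""
    features = {}
    for name, layer in LAYERS:
        x = layer(x)
        if name in selected:
            features[name] = x
    return features

feats = extract_features([1, 2])
print(sorted(feats))     # ['conv1_1', 'conv3_1']
print(feats["conv3_1"])  # [1, 3]
```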
Claim 5:
Zhong discloses:
wherein the loss is further based on one or more of: a comparison of pixels of the input image and pixels of the initial output image; or a comparison of the initial latent space representation and a target latent code (see paragraph [0090]). Zhong teaches a comparison of the latent space representation and the target latent representation.
Claim 6:
Zhong discloses:
the downstream use comprising one or more of: applying user-configured edits to the latent space representation of the input image; or generating an output image, by the generator neural network, by processing the optimized latent space representation, wherein the output image is perceptually similar to the input image (see paragraph [0094]). Zhong teaches generating an output image, wherein the output image is very similar to the input image. The output image is generated by the generator.
Claim 9:
Zhong discloses:
outputting the output image for display on a computing device (see paragraph [0099]). Zhong teaches outputting an output image such as super resolution images.
Claims 10-12, 14, 15, 18:
Although Claims 10-12, 14, 15 and 18 are computer system claims, they are interpreted and rejected for the same reasons as the method of Claims 1-3, 5, 6, 9, respectively.
Claims 19-20:
Although Claims 19-20 are medium claims, they are interpreted and rejected for the same reasons as the method of Claims 1 and 6.
Claim 21:
Zhong discloses:
wherein the convolutional neural network is a Visual Geometry Group (VGG) network, and wherein the layers include a conv1_1 layer, a conv1_2 layer, a conv3_1 layer, and a conv4_1 layer of the VGG network (see paragraphs [0064] and [0095]). Zhong teaches a VGG network with a plurality of layers.
Zhong fails to teach extracting visually representable properties from selected intermediate layers.
Saruta discloses:
wherein the selected layers include a conv1_1 layer, a conv1_2 layer, a conv3_1 layer, and a conv4_1 layer of the VGG network (see paragraph [0064]). Saruta teaches a plurality of intermediate layers.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the method disclosed by Zhong to include intermediate layers for the purpose of efficiently extracting target features from images, as taught by Saruta.
Claim 22:
Zhong discloses:
wherein the convolutional neural network is a Visual Geometry Group (VGG) network, and wherein the layers include a conv1_1 layer, a conv1_2 layer, a conv3_1 layer, and a conv4_1 layer of the VGG network (see paragraphs [0064] and [0095]). Zhong teaches a VGG network with a plurality of layers.
Zhong fails to teach extracting visually representable properties from selected intermediate layers.
Saruta discloses:
wherein the selected layers include a conv1_1 layer, a conv1_2 layer, a conv3_1 layer, and a conv4_1 layer of the VGG network (see paragraph [0064]). Saruta teaches a plurality of intermediate layers.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the method disclosed by Zhong to include intermediate layers for the purpose of efficiently extracting target features from images, as taught by Saruta.
Claims 7-8 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Zhong, Saruta and Lee, in view of Adamiak et al., "Facial Appearance Modifications using SKPCA-Derived Features Extracted from Convolutional Autoencoder’s Latent Space" (hereinafter “Adamiak”).
Claim 7:
Zhong, Saruta and Lee fail to expressly disclose the processes being less than 10 seconds.
Adamiak discloses:
wherein the producing the initial latent space representation, optimizing the initial latent space representation, and generating the output image that is perceptually similar to the input image are performed in less than about 10 seconds (see page 6, column 2, Conclusion – page 7, column 1). Adamiak teaches that producing the representations and generating the output are performed in real time, causing real-time changes to the images.
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Zhong, Saruta and Lee to include producing the latent space representation, optimizing the representation, and generating the output image that is similar to the input image in less than about 10 seconds for the purpose of efficiently viewing the modifications to the images in real time, as taught by Adamiak.
Claim 8:
Zhong discloses:
wherein the output image has a resolution of about 1024 x 1024 pixels (see paragraph [0020]). Zhong teaches creating high resolution images.
Claims 16-17:
Although Claims 16-17 are computer system claims, they are interpreted and rejected for the same reasons as the method of Claims 7-8, respectively.
Response to Arguments
Applicant's arguments filed 7/7/25 have been fully considered but they are not persuasive.
Rejections Under 35 USC 103
Applicant argues that the Office Action alleges that Zhong discloses “extracting a plurality of target perceptual features from the input image by providing the input image to a convolutional neural network trained to classify images and extracting features from a plurality of selected different intermediate layers of the convolutional neural network.” (Office Action, p. 3, citing Zhong pars. 16 and 54). As shown in FIG. 2 of the present application, the architecture used in the claimed invention includes an encoder 206, a generator 210, and a third, convolutional, neural network 224. In contrast, Zhong at best describes the concept of extracting features from an encoder. (See Zhong pars. 16 and 54). There is no additional “convolutional neural network trained to classify images,” as is claimed.
The Examiner disagrees.
The claims only require one convolutional neural network used to classify images and extract features. Zhong recites: “before a neural network can be used for a task (e.g., classification, regression, image reconstruction, etc.), the neural network is trained to extract features through many layers (convolutional, recurrent, pooling, etc.). The neural network becomes (e.g., learns) a function that projects (e.g., maps) the image on the latent space. In other words, the latent space is the space where the features lie. The latent space contains a compressed representation of the image. This compressed representation is then used to reconstruct an input, as faithfully as possible. To perform well, a neural network has to learn to extract the most relevant features (e.g., the most relevant latent space).” Therefore, Zhong teaches that one CNN is used to classify images and extract features from the layers.
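The "compressed representation" idea in the quoted passage can be illustrated with a toy encoder/decoder. This is purely a hypothetical sketch of encoding to a smaller latent and reconstructing from it, not code from Zhong or any cited reference.

```python
# Hypothetical sketch of a latent space as a compressed representation:
# the "encoder" averages adjacent pairs of values to halve the size, and
# the "decoder" reconstructs a full-length signal from that latent.

def encode(signal):
    """Map a signal to a latent half its length (pairwise averages)."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]

def decode(latent):
    """Reconstruct a full-length signal by repeating each latent value."""
    out = []
    for v in latent:
        out.extend([v, v])
    return out

signal = [1.0, 3.0, 5.0, 5.0]
latent = encode(signal)  # [2.0, 5.0] -- compressed representation
recon = decode(latent)   # [2.0, 2.0, 5.0, 5.0] -- approximate reconstruction
print(latent, recon)
```

As in the quoted passage, the latent is smaller than the input, and a good encoder is one whose latent preserves enough information for a faithful reconstruction.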
Applicant argues Saruta does not appear to extract any particular visually representable properties or features. Saruta essentially identifies people in an image, so the existence of a person rather than any particular features of that person is at issue. (See, e.g., Saruta FIGS. 17A-17C).
The Examiner disagrees.
Saruta teaches extracting particular features from selected intermediate layers of the neural network. The neural network is trained to classify images and extract particular features from particular layers (see paragraphs [0040], [0056], [0059]). Although Saruta teaches detecting features such as bodies, faces, etc., Saruta also teaches detecting features such as the arms and legs of a human body (see paragraph [0117]). Arms and legs are considered visually representable features. Moreover, the claims do not require specific features to be identified, and Saruta teaches this limitation.
Applicant argues the loss described in Zhong “optimizes the difference between E(Y) (i.e., the latent space presentation of the real sample) and E(G(Z)) (i.e., the latent space representation of the generated, by the generator 502, sample).” (Zhong, par. 73). In contrast, the independent claims as amended specify that “the perceptual loss comprises a difference between the target perceptual features extracted from the input image and the initial perceptual features extracted from the initial output image generated by processing the initial and target latent space representations with the generator neural network,” i.e., it is based on actual images extracted from the GAN rather than the latent space representation E from the encoder as in Zhong. Thus, independent claims 1, 10, and 19 are patentable over the cited art. Claims 2-3, 5-9, 11-12, 14-18, and 20-22 include the above-noted features by way of dependency, and are patentable over the cited art for at least the same reasons.
The Examiner agrees Zhong does not teach the amended limitation regarding the perceptual loss.
The Examiner introduced new art to teach the limitation. See the above rejection of Claims 1, 10 and 19.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TIONNA M BURKE whose telephone number is (571)270-7259. The examiner can normally be reached M-F 8a-4p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong can be reached at (571)272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TIONNA M BURKE/Examiner, Art Unit 2178 3/17/26