Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Preliminary Amendment
The Preliminary Amendment filed April 4, 2024 has been entered. Claims 2-11 and 15 are currently amended. Claims 1-20 are pending.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-9, 16-18, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting to Yi et al., hereinafter, “Yi”.
Claim 1. (Original) Yi teaches A computing system for image modification with improved computational efficiency, the computing system comprising: Fig. 1: Inpainting results on ultra high-resolution images
one or more processors; [4 Experimental Results] …two NVIDIA 1080 Ti GPUs
and one or more non-transitory computer-readable media that collectively store instructions that, [4.3 Comparisons With Learning-based Methods] the proposed model can inpaint 4096×4096 images…GPU memory.
when executed by the one or more processors, cause the computing system to perform operations, Fig. 2: The overall pipeline of the method: (top) CRA mechanism
the operations comprising: obtaining a lower resolution version of an input image, Fig. 2 (top) raw input image downsampled to low resolution input image
wherein the lower resolution version of the input image has a first resolution, [3.1 The Overall Pipeline] we first down-sample the image to 512 × 512 (first resolution)
wherein the lower resolution version of the input image comprises one or more image elements to be modified with predicted image data; [3.3 Architecture of Generator] The generator takes an image and a binary mask indicating the hole regions as input and predicts a completed image…
processing the lower resolution version of the input image with a first machine-learned model to generate an augmented image having the first resolution, [Fig. 2] Coarse Network (first machine-learned model) 512x512 image with mask/hole (augmented image)
wherein the augmented image comprises first predicted image data replacing the one or more image elements; [3.3 Architecture of Generator] The prediction of the coarse network is naively blended with the input image by replacing the hole region
[Fig. 2], under the Coarse Network, “replacing the hole region using the generated image”
extracting a portion of the augmented image, wherein the portion of the augmented image comprises the first predicted image data; [Fig. 2] the Coarse Network outputted 256x256 image (under the broadest reasonable interpretation, the extracted portion of the augmented image may be the entire image)
upscaling the extracted portion of the augmented image to generate an upscaled image portion having an upscaled resolution; [Fig. 2] in the Coarse Network, the 256x256 image is upsampled to 512x512
processing the upscaled image portion with a second machine-learned model to generate a refined portion, [Fig. 2] Refine Network (second machine-learned model); Yi [3.3 Architecture of Generator] the refine network predicts finer results
wherein the refined portion comprises second predicted image data that modifies at least a portion of the first predicted image data; [3.3 Architecture of Generator] the refine network predicts finer results… The generator takes an image and a binary mask indicating the hole regions as input and predicts a completed image.
Fig. 2 outputted refined (modified) 512x512 image.
generating an output image based on the refined portion and a higher resolution version of the input image, [Fig. 2] the Refine Network outputs the 512x512 image (right side of the Refine Network)
wherein both the output image and the higher resolution version of the input image have a second resolution that is greater than the first resolution; [Fig. 2] Overall pipeline (top): the inpainted 512x512 image output by the Generator is upsampled, and both the upsampled image and the refined hole region have a higher resolution than the raw input/low resolution input image.
and providing the output image as an output. [Fig. 2] Overall pipeline (top) – Output image.
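For illustration, the coarse-to-fine flow mapped to claim 1 above can be summarized in a minimal sketch. The helper functions and lambda "networks" below are hypothetical stand-ins (nearest-neighbor resampling in place of Yi's actual convolutional generators), assuming square images:

```python
import numpy as np

def downsample(img, size):
    # Nearest-neighbor downsample (placeholder for bilinear resizing).
    h, w = img.shape[:2]
    ys = np.linspace(0, h - 1, size).astype(int)
    xs = np.linspace(0, w - 1, size).astype(int)
    return img[np.ix_(ys, xs)]

def upsample(img, size):
    # Nearest-neighbor upsample back to the target resolution.
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return img[np.ix_(ys, xs)]

def inpaint_pipeline(raw, mask, coarse_net, refine_net, low_res=512):
    """Coarse-to-fine inpainting as mapped to claim 1.

    raw: higher resolution version of the input image (second resolution).
    mask: boolean hole mask at the first (lower) resolution.
    """
    low = downsample(raw, low_res)            # lower resolution version
    coarse = coarse_net(low, mask)            # augmented image, first predicted data
    portion = coarse                          # BRI: extracted portion = entire image
    up = upsample(portion, raw.shape[0])      # upscaled image portion
    refined = refine_net(up)                  # second predicted image data
    out = raw.copy()
    hole_hi = upsample(mask.astype(np.uint8), raw.shape[0]).astype(bool)
    out[hole_hi] = refined[hole_hi]           # blend refined hole into high-res image
    return out                                # output image at the second resolution
```

This is a sketch of the claim mapping only, not a reproduction of Yi's implementation.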
Claim 2. (Currently Amended) Yi teaches wherein obtaining the lower resolution version of the input image comprises downscaling the higher resolution version of the input image to obtain the lower resolution version of the input image. [Fig. 2] Overall pipeline (top) – raw image downsampled to the low resolution input image
Claim 3. (Currently Amended) Yi teaches wherein: processing the lower resolution version of the input image with the first machine-learned model to generate the augmented image comprises processing the lower resolution version of the input image and a mask that identifies the one or more image elements with a first machine-learned inpainting model to generate the augmented image having first inpainted image data that modifies the one or more image elements; [Fig. 2] the Coarse Network of the Generator – input and mask, [3.3 Architecture of Generator]
and processing the upscaled image portion with the second machine-learned model to generate the refined portion comprises processing the upscaled image portion with a second machine-learned inpainting model to generate the refined portion having second inpainted image data that modifies at least a portion of the first inpainted image data. [Fig. 2] the Refine Network of the Generator –and replacing the hole region, [3.3 Architecture of Generator]
Claim 4. (Currently Amended) Yi teaches wherein upscaling the extracted portion of the augmented image to generate the upscaled image portion having the upscaled resolution [Fig. 2] in the Coarse Network, the 256x256 image is upsampled to 512x512
comprises upscaling the extracted portion of the augmented image such that the upscaled resolution matches a corresponding resolution of a corresponding portion of the higher resolution version of the input image, [Fig. 2] Coarse Network 512x512 upsampled image (upscaled resolution) corresponds to the resolution of the 512x512 input image with mask
wherein the corresponding portion proportionally corresponds to the extracted portion of the augmented image. [Fig. 2] Coarse Network 512x512 upsampled image (upscaled resolution) corresponds to the resolution of the 512x512 input image with mask; the corresponding portions are proportional.
Claim 5. (Currently Amended) Yi teaches wherein generating the output image based on the refined portion and the higher resolution version of the input image comprises inserting the refined portion into the higher resolution version of the input image. [Fig. 2] Refine Network 512x512 input image is the higher resolution version
[3.3 Architecture of Generator] inputs are down-sampled to 256×256 before convolution in the coarse network, different from the refine network who operates on 512×512. The prediction of the coarse network is naively blended with the input image by replacing the hole region of the latter with that of the former as the input to the refine network.
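The “naively blended” step quoted above amounts to copying the coarse prediction into the hole region of the input while keeping known pixels untouched. A one-line sketch (hypothetical array shapes, for illustration only):

```python
import numpy as np

def naive_blend(input_img, coarse_pred, hole_mask):
    # Replace the hole region of the input with the coarse prediction,
    # leaving known pixels unchanged -- this forms the refine network input.
    return np.where(hole_mask, coarse_pred, input_img)
```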
Claim 6. (Currently Amended) Yi teaches wherein the one or more image elements to be replaced comprise one or more user-designated image elements that have been designated based on one or more user inputs. [Fig. 7] The masks for Photoshop and Inpaint are manually drawn
Claim 7. (Currently Amended) Yi teaches wherein the one or more image elements to be replaced are one or more computer-designated image elements, [Fig. 7] Photoshop content-aware fill and an open-source PatchMatch implementation
wherein the one or more computer-designated image elements are designated by processing the input image with one or more classification sub-blocks of at least one of the first machine-learned model or the second machine-learned model. [Fig. 2] Discriminator
Claim 8. (Currently Amended) Yi teaches wherein the first and the second predicted image data correspond to one or more of inpainting, deblurring, recoloring, or smoothing of the one or more image elements. [Introduction] These tasks require automated image inpainting, …High-quality inpainting usually requires generating visually realistic and semantically coherent content to fill the hole regions.
[2.1 Irregular Hole-filling & Modified Convolutions]… visual artifacts such as color inconsistency, blurriness, and boundary artifacts…
Claim 9. (Currently Amended) Yi teaches wherein: the one or more objects comprises a plurality of objects; [Fig. 2] Raw input image
said processing the lower resolution version of the input image with the first machine-learned model to generate the augmented image is performed once; [Fig. 2] Coarse Network…downsampled to Input and Mask
and said extracting, upscaling, and processing the upscaled image portion with the second machine-learned model are performed separately for each object of the plurality of objects. [Fig. 2] Refine Network (extracts patches, upsamples images and processes the upsampled image)
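Under the mapping above, the coarse pass runs once while extraction, upscaling, and refinement repeat per object. A hedged sketch with hypothetical helper names (not Yi's code):

```python
def refine_per_object(augmented, object_boxes, upscale, refine_net):
    # `augmented` is the single coarse-inpainted image (coarse pass runs once);
    # each object region is then extracted, upscaled, and refined separately.
    refined_portions = []
    for (y0, y1, x0, x1) in object_boxes:
        portion = augmented[y0:y1, x0:x1]        # extract
        up = upscale(portion)                    # upscale
        refined_portions.append(refine_net(up))  # refine with second model
    return refined_portions
```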
Claim 16. (Original) Yi teaches One or more non-transitory computer readable media that collectively store instructions that, [4.3 Comparisons With Learning-based Methods] the proposed model can inpaint 4096×4096 images…GPU memory.
when executed by one or more processors, [4 Experimental Results] …two NVIDIA 1080 Ti GPUs
cause a computing system to perform operations, the operations comprising: Fig. 2: The overall pipeline of the method: (top) CRA mechanism
obtaining a lower resolution version of an input image, Fig. 2 (top) raw input image downsampled to low resolution input image
wherein the lower resolution version of the input image has a first resolution; [3.1 The Overall Pipeline] we first down-sample the image to 512 × 512 (first resolution)
processing the lower resolution version of the input image with a first machine-learned model to generate a first predicted image having the first resolution, [Fig. 2] Coarse Network (first machine-learned model) 512x512 image with mask/hole (first predicted image)
[3.3 Architecture of Generator] The prediction of the coarse network is naively blended with the input image by replacing the hole region
wherein the first predicted image comprises first predicted image data; [3.3 Architecture of Generator] The prediction of the coarse network is naively blended with the input image by replacing the hole region
extracting a portion of the first predicted image, [Fig. 2] the Coarse Network outputted 256x256 image (under the broadest reasonable interpretation, the extracted portion of the first predicted image may be the entire image)
wherein the portion of the first predicted image comprises the first predicted image data; [Fig. 2] the Coarse Network outputted 256x256 image (under the broadest reasonable interpretation, the extracted portion of the first predicted image may be the entire image)
upscaling the extracted portion of the first predicted image to generate an upscaled image portion having an upscaled resolution; [Fig. 2] in the Coarse Network, the 256x256 image is upsampled to 512x512
and processing the upscaled image portion with a second machine-learned model to generate a second predicted image, [Fig. 2] Refine Network (second machine-learned model); [3.3 Architecture of Generator] the refine network predicts finer results
[3.3 Architecture of Generator] The generator takes an image and a binary mask indicating the hole regions as input and predicts a completed image.
wherein the second predicted image comprises second predicted image data that modifies at least a portion of the first predicted image data. [3.3 Architecture of Generator] the refine network predicts finer result, Fig. 2 outputted refined (modified) 512x512 image.
Claim 17. (Original) Yi teaches wherein the first predicted image and the second predicted image comprise edge recognition images that indicate recognized edges in the input image. [1 Introduction]…train a convolutional network to model image-wide edge structure or foreground object contours, thus enabling auto-completion of the edge or contours
Claim 18. (Original) Yi teaches wherein the first predicted image and the second predicted image comprise object detection images that indicate objects detected in the input image. [1 Introduction]… inpainting structured images like faces [10, 12, 17, 19, 20, 21], objects [11, 13, 14, 15]
Claim 20. (Original) Yi teaches wherein the first predicted image and the second predicted image comprise face recognition images that indicate recognized faces in the input image. [1 Introduction]… inpainting structured images like faces [10, 12, 17, 19, 20, 21]
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting to Yi et al., hereinafter, “Yi”, in view of US 2017/0132528 A1 to Aslan et al., hereinafter, “Aslan”.
Claim 10. (Currently Amended) Yi fails to explicitly teach passing one or more internal feature vectors from the first machine-learned model to the second machine-learned model. Aslan, in the field of training a plurality of machine learning models, teaches further comprising passing one or more internal feature vectors from the first machine-learned model to the second machine-learned model. [0080] the first machine learning model is trained to learn the first task using a set of features from the training data (e.g., an n-dimensional feature vector of quantifiable information about an attribute of the data); and passing the information comprises providing the second machine learning model access to output from the first machine learning model
Yi and Aslan are both in the same field of training machine learning models to analyze image data. Thus, before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of the machine learning models of Yi with the teachings of Aslan in order to provide more flexibility in model training (Aslan [0009]).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting to Yi et al., hereinafter, “Yi”, in view of US 2022/0284613 A1 to Yin et al., hereinafter, “Yin”.
Claim 11. (Currently Amended) Yi fails to explicitly teach the augmented image further comprises a predicted depth channel output by the first machine-learned model. Yin, in the field of training machine learning models, teaches wherein the augmented image further comprises a predicted depth channel output by the first machine-learned model. [0086] teaches Depth Prediction Machine-Learning Model 300 generating a predicted depth map
Yi and Yin are both in the same field of training machine learning models to analyze image data. Thus, before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of the machine learning models of Yi with the teachings of Yin [0001-0002] to generate a robust, diverse, and accurate monocular depth prediction model.
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting to Yi et al., hereinafter, “Yi”, in view of US 2021/0335004 A1 to Zohar et al., hereinafter, “Zohar”.
Claim 19. (Original) Yi fails to explicitly teach the first predicted image and the second predicted image comprise human keypoint estimation images that indicate human keypoints detected in the input image. Zohar, in the field of training machine learning models to extract features of skeletal joints, teaches wherein the first predicted image and the second predicted image comprise human keypoint estimation images that indicate human keypoints detected in the input image. Zohar [0094] teaches machine learning to extract and predict skeletal joint positions (human keypoints), for one or more frames
Yi and Zohar are both in the same field of training machine learning models to analyze image data. Thus, before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of the machine learning models of Yi with the teachings of Zohar [0018] to effectively detect the pose of an object.
Allowable Subject Matter
Claims 12-15 are allowed.
Regarding Claim 12, the closest prior art, US 2021/0218961 A1 to Kanazawa et al., teaches:
(Original) A computer-implemented method for training machine learning models to perform image modification, the method comprising: [Abstract] A computing system can include a processor, a machine-learned image segmentation model comprising a semantic segmentation neural network and an edge refinement neural network,
FIG. 7, [0006] The semantic segmentation neural network can be trained… The edge refinement neural network can be trained…
receiving, by a computing system comprising one or more processors, [Abstract] A computing system can include a processor
a lower resolution version of an input image and a ground truth image, [0042] inputting a training image into the image segmentation model… Each training image can have, for example, corresponding ground-truth versions
[0031] …inputting the low resolution image into the semantic segmentation neural network.
a loss function that evaluates a difference between the predicted image and the ground truth image; [0044] the first loss function can be determined by, for example, determining a difference between the semantic segmentation mask (understood to be the output (predicted image)) and a ground-truth semantic segmentation mask
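The cited loss can be sketched as a pixel-wise difference between the predicted mask and its ground-truth counterpart. An L1 (mean absolute difference) form is assumed here purely for illustration; Kanazawa's actual loss function may differ:

```python
import numpy as np

def mask_loss(pred_mask, gt_mask):
    # Mean absolute difference between the predicted segmentation mask
    # and the ground-truth mask (L1 form assumed for illustration).
    return np.abs(pred_mask - gt_mask).mean()
```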
Kanazawa and the other prior art of record fail to explicitly teach “wherein the lower resolution version of the input image has a first resolution and the ground truth image has a second resolution that is greater than the first resolution, and wherein the lower resolution version of the input image comprises one or more image elements not present in the ground truth image.”
Likewise, claims 13-15 are allowed because they depend from claim 12.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DELOMIA L GILLIARD whose telephone number is (571)272-1681. The examiner can normally be reached 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Villecco can be reached at (571) 272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DELOMIA L GILLIARD/Primary Examiner, Art Unit 2661