Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Preliminary Amendment
The Preliminary Amendment filed April 4, 2024 has been entered. Claims 2-11 and 15 are currently amended. Claims 1-20 are pending.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-9, 16-18, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting to Yi et al., hereinafter, “Yi”.
Claim 1. (Original) Yi teaches A computing system for image modification with improved computational efficiency, the computing system comprising: Fig. 1: Inpainting results on ultra high-resolution images
one or more processors; [4 Experimental Results] …two NVIDIA 1080 Ti GPUs
and one or more non-transitory computer-readable media that collectively store instructions that, [4.3 Comparisons With Learning-based Methods] the proposed model can inpaint 4096×4096 images…GPU memory.
when executed by the one or more processors, cause the computing system to perform operations, Fig. 2: The overall pipeline of the method: (top) CRA mechanism
the operations comprising: obtaining a lower resolution version of an input image, Fig. 2 (top) raw input image downsampled to low resolution input image
wherein the lower resolution version of the input image has a first resolution, [3.1 The Overall Pipeline] we first down-sample the image to 512 × 512 (first resolution)
wherein the lower resolution version of the input image comprises one or more image elements to be modified with predicted image data; [3.3 Architecture of Generator] The generator takes an image and a binary mask indicating the hole regions as input and predicts a completed image…
processing the lower resolution version of the input image with a first machine-learned model to generate an augmented image having the first resolution, [Fig. 2] Coarse Network (first machine-learned model) 512x512 image with mask/hole (augmented image)
wherein the augmented image comprises first predicted image data replacing the one or more image elements; [3.3 Architecture of Generator] The prediction of the coarse network is naively blended with the input image by replacing the hole region
[Fig. 2], under the Coarse Network, “replacing the hole region using the generated image”
extracting a portion of the augmented image, wherein the portion of the augmented image comprises the first predicted image data; [Fig. 2] the Coarse Network outputted 256x256 image (under the broadest reasonable interpretation, the extracted portion of the augmented image may be the entire image)
upscaling the extracted portion of the augmented image to generate an upscaled image portion having an upscaled resolution; [Fig. 2] in the Coarse Network, the 256x256 image is upsampled to 512x512
processing the upscaled image portion with a second machine-learned model to generate a refined portion, [Fig. 2] Refine Network (second machine-learned model); Yi [3.3 Architecture of Generator] the refine network predicts finer results
wherein the refined portion comprises second predicted image data that modifies at least a portion of the first predicted image data; [3.3 Architecture of Generator] the refine network predicts finer results… The generator takes an image and a binary mask indicating the hole regions as input and predicts a completed image.
Fig. 2 outputted refined (modified) 512x512 image.
generating an output image based on the refined portion and a higher resolution version of the input image, [Fig. 2] the Refine Network outputs the 512x512 image (right side of the Refine Network)
wherein both the output image and the higher resolution version of the input image have a second resolution that is greater than the first resolution; [Fig. 2] Overall pipeline (top): the inpainted 512x512 image output by the Generator is upsampled, and both the upsampled image and the refined hole region have a higher resolution than the raw input/low resolution input image.
and providing the output image as an output. [Fig. 2] Overall pipeline (top) – Output image.
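For illustration, the coarse-to-fine flow mapped to claim 1 above can be summarized in a minimal sketch. The helper functions and lambda "networks" below are hypothetical stand-ins (nearest-neighbor resampling in place of Yi's actual convolutional generators), assuming square images:

```python
import numpy as np

def downsample(img, size):
    # Nearest-neighbor downsample (placeholder for bilinear resizing).
    h, w = img.shape[:2]
    ys = np.linspace(0, h - 1, size).astype(int)
    xs = np.linspace(0, w - 1, size).astype(int)
    return img[np.ix_(ys, xs)]

def upsample(img, size):
    # Nearest-neighbor upsample back to the target resolution.
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return img[np.ix_(ys, xs)]

def inpaint_pipeline(raw, mask, coarse_net, refine_net, low_res=512):
    """Coarse-to-fine inpainting as mapped to claim 1.

    raw: higher resolution version of the input image (second resolution).
    mask: boolean hole mask at the first (lower) resolution.
    """
    low = downsample(raw, low_res)            # lower resolution version
    coarse = coarse_net(low, mask)            # augmented image, first predicted data
    portion = coarse                          # BRI: extracted portion = entire image
    up = upsample(portion, raw.shape[0])      # upscaled image portion
    refined = refine_net(up)                  # second predicted image data
    out = raw.copy()
    hole_hi = upsample(mask.astype(np.uint8), raw.shape[0]).astype(bool)
    out[hole_hi] = refined[hole_hi]           # blend refined hole into high-res image
    return out                                # output image at the second resolution
```

This is a sketch of the claim mapping only, not a reproduction of Yi's implementation.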
Claim 2. (Currently Amended) Yi teaches wherein obtaining the lower resolution version of the input image comprises downscaling the higher resolution version of the input image to obtain the lower resolution version of the input image. [Fig. 2] Overall pipeline (top) – raw image downsampled to the low resolution input image
Claim 3. (Currently Amended) Yi teaches wherein: processing the lower resolution version of the input image with the first machine-learned model to generate the augmented image comprises processing the lower resolution version of the input image and a mask that identifies the one or more image elements with a first machine-learned inpainting model to generate the augmented image having first inpainted image data that modifies the one or more image elements; [Fig. 2] the Coarse Network of the Generator – input and mask, [3.3 Architecture of Generator]
and processing the upscaled image portion with the second machine-learned model to generate the refined portion comprises processing the upscaled image portion with a second machine-learned inpainting model to generate the refined portion having second inpainted image data that modifies at least a portion of the first inpainted image data. [Fig. 2] the Refine Network of the Generator –and replacing the hole region, [3.3 Architecture of Generator]
Claim 4. (Currently Amended) Yi teaches wherein upscaling the extracted portion of the augmented image to generate the upscaled image portion having the upscaled resolution [Fig. 2] in the Coarse Network, the 256x256 image is upsampled to 512x512
comprises upscaling the extracted portion of the augmented image such that the upscaled resolution matches a corresponding resolution of a corresponding portion of the higher resolution version of the input image, [Fig. 2] Coarse Network 512x512 upsampled image (upscaled resolution) corresponds to the resolution of the 512x512 input image with mask
wherein the corresponding portion proportionally corresponds to the extracted portion of the augmented image. [Fig. 2] Coarse Network 512x512 upsampled image (upscaled resolution) corresponds to the resolution of the 512x512 input image with mask; the corresponding portions are proportional.
Claim 5. (Currently Amended) Yi teaches wherein generating the output image based on the refined portion and the higher resolution version of the input image comprises inserting the refined portion into the higher resolution version of the input image. [Fig. 2] Refine Network 512x512 input image is the higher resolution version
[3.3 Architecture of Generator] inputs are down-sampled to 256×256 before convolution in the coarse network, different from the refine network who operates on 512×512. The prediction of the coarse network is naively blended with the input image by replacing the hole region of the latter with that of the former as the input to the refine network.
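The “naively blended” step quoted above amounts to copying the coarse prediction into the hole region of the input while keeping known pixels untouched. A one-line sketch (hypothetical array shapes, for illustration only):

```python
import numpy as np

def naive_blend(input_img, coarse_pred, hole_mask):
    # Replace the hole region of the input with the coarse prediction,
    # leaving known pixels unchanged -- this forms the refine network input.
    return np.where(hole_mask, coarse_pred, input_img)
```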
Claim 6. (Currently Amended) Yi teaches wherein the one or more image elements to be replaced comprise one or more user-designated image elements that have been designated based on one or more user inputs. [Fig. 7] The masks for Photoshop and Inpaint are manually drawn
Claim 7. (Currently Amended) Yi teaches wherein the one or more image elements to be replaced are one or more computer-designated image elements, [Fig. 7] Photoshop content-aware fill and an open-source PatchMatch implementation
wherein the one or more computer-designated image elements are designated by processing the input image with one or more classification sub-blocks of at least one of the first machine-learned model or the second machine-learned model. [Fig. 2] Discriminator
Claim 8. (Currently Amended) Yi teaches wherein the first and the second predicted image data correspond to one or more of inpainting, deblurring, recoloring, or smoothing of the one or more image elements. [Introduction] These tasks require automated image inpainting, …High-quality inpainting usually requires generating visually realistic and semantically coherent content to fill the hole regions.
[2.1 Irregular Hole-filling & Modified Convolutions]… visual artifacts such as color inconsistency, blurriness, and boundary artifacts…
Claim 9. (Currently Amended) Yi teaches wherein: the one or more objects comprises a plurality of objects; [Fig. 2] Raw input image
said processing the lower resolution version of the input image with the first machine-learned model to generate the augmented image is performed once; [Fig. 2] Coarse Network…downsampled to Input and Mask
and said extracting, upscaling, and processing the upscaled image portion with the second machine-learned model are performed separately for each object of the plurality of objects. [Fig. 2] Refine Network (extracts patches, upsamples images and processes the upsampled image)
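Under the mapping above, the coarse pass runs once while extraction, upscaling, and refinement repeat per object. A hedged sketch with hypothetical helper names (not Yi's code):

```python
def refine_per_object(augmented, object_boxes, upscale, refine_net):
    # `augmented` is the single coarse-inpainted image (coarse pass runs once);
    # each object region is then extracted, upscaled, and refined separately.
    refined_portions = []
    for (y0, y1, x0, x1) in object_boxes:
        portion = augmented[y0:y1, x0:x1]        # extract
        up = upscale(portion)                    # upscale
        refined_portions.append(refine_net(up))  # refine with second model
    return refined_portions
```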
Claim 16. (Original) Yi teaches One or more non-transitory computer readable media that collectively store instructions that, [4.3 Comparisons With Learning-based Methods] the proposed model can inpaint 4096×4096 images…GPU memory.
when executed by one or more processors, [4 Experimental Results] …two NVIDIA 1080 Ti GPUs
cause a computing system to perform operations, the operations comprising: Fig. 2: The overall pipeline of the method: (top) CRA mechanism
obtaining a lower resolution version of an input image, Fig. 2 (top) raw input image downsampled to low resolution input image
wherein the lower resolution version of the input image has a first resolution; [3.1 The Overall Pipeline] we first down-sample the image to 512 × 512 (first resolution)
processing the lower resolution version of the input image with a first machine-learned model to generate a first predicted image having the first resolution, [Fig. 2] Coarse Network (first machine-learned model) 512x512 image with mask/hole (first predicted image)
[3.3 Architecture of Generator] The prediction of the coarse network is naively blended with the input image by replacing the hole region
wherein the first predicted image comprises first predicted image data; [3.3 Architecture of Generator] The prediction of the coarse network is naively blended with the input image by replacing the hole region
extracting a portion of the first predicted image, [Fig. 2] the Coarse Network outputted 256x256 image (under the broadest reasonable interpretation, the extracted portion of the first predicted image may be the entire image)
wherein the portion of the first predicted image comprises the first predicted image data; [Fig. 2] the Coarse Network outputted 256x256 image (under the broadest reasonable interpretation, the extracted portion of the first predicted image may be the entire image)
upscaling the extracted portion of the first predicted image to generate an upscaled image portion having an upscaled resolution; [Fig. 2] in the Coarse Network, the 256x256 image is upsampled to 512x512
and processing the upscaled image portion with a second machine-learned model to generate a second predicted image, [Fig. 2] Refine Network (second machine-learned model); [3.3 Architecture of Generator] the refine network predicts finer results
[3.3 Architecture of Generator] The generator takes an image and a binary mask indicating the hole regions as input and predicts a completed image.
wherein the second predicted image comprises second predicted image data that modifies at least a portion of the first predicted image data. [3.3 Architecture of Generator] the refine network predicts finer result, Fig. 2 outputted refined (modified) 512x512 image.
Claim 17. (Original) Yi teaches wherein the first predicted image and the second predicted image comprise edge recognition images that indicate recognized edges in the input image. [1 Introduction]…train a convolutional network to model image-wide edge structure or foreground object contours, thus enabling auto-completion of the edge or contours
Claim 18. (Original) Yi teaches wherein the first predicted image and the second predicted image comprise object detection images that indicate objects detected in the input image. [1 Introduction]… inpainting structured images like faces [10, 12, 17, 19, 20, 21], objects [11, 13, 14, 15]
Claim 20. (Original) Yi teaches wherein the first predicted image and the second predicted image comprise face recognition images that indicate recognized faces in the input image. [1 Introduction]… inpainting structured images like faces [10, 12, 17, 19, 20, 21]
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting to Yi et al., hereinafter, “Yi”, in view of US 2017/0132528 A1 to Aslan et al., hereinafter, “Aslan”.
Claim 10. (Currently Amended) Yi fails to explicitly teach passing one or more internal feature vectors from the first machine-learned model to the second machine-learned model. Aslan, in the field of training a plurality of machine learning models, teaches further comprising passing one or more internal feature vectors from the first machine-learned model to the second machine-learned model. [0080] the first machine learning model is trained to learn the first task using a set of features from the training data (e.g., an n-dimensional feature vector of quantifiable information about an attribute of the data); and passing the information comprises providing the second machine learning model access to output from the first machine learning model
Yi and Aslan are both in the same field of training machine learning models to analyze image data. Thus, before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of the machine learning models of Yi with the teachings of Aslan in order to provide more flexibility in model training (Aslan [0009]).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting to Yi et al., hereinafter, “Yi”, in view of US 2022/0284613 A1 to Yin et al., hereinafter, “Yin”.
Claim 11. (Currently Amended) Yi fails to explicitly teach the augmented image further comprises a predicted depth channel output by the first machine-learned model. Yin, in the field of training machine learning models, teaches wherein the augmented image further comprises a predicted depth channel output by the first machine-learned model. [0086] teaches Depth Prediction Machine-Learning Model 300 generating a predicted depth map
Yi and Yin are both in the same field of training machine learning models to analyze image data. Thus, before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of the machine learning models of Yi with the teachings of Yin [0001-0002] to generate a robust, diverse, and accurate monocular depth prediction model.
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting to Yi et al., hereinafter, “Yi”, in view of US 2021/0335004 A1 to Zohar et al., hereinafter, “Zohar”.
Claim 19. (Original) Yi fails to explicitly teach the first predicted image and the second predicted image comprise human keypoint estimation images that indicate human keypoints detected in the input image. Zohar, in the field of training machine learning models to extract features of skeletal joints, teaches wherein the first predicted image and the second predicted image comprise human keypoint estimation images that indicate human keypoints detected in the input image. Zohar [0094] teaches machine learning to extract and predict skeletal joint positions (human keypoints), for one or more frames
Yi and Zohar are both in the same field of training machine learning models to analyze image data. Thus, before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of the machine learning models of Yi with the teachings of Zohar [0018] to effectively detect the pose of an object.
Allowable Subject Matter
Claims 12-15 are allowed.
Regarding Claim 12, the closest prior art, US 2021/0218961 A1 to Kanazawa et al., teaches:
(Original) A computer-implemented method for training machine learning models to perform image modification, the method comprising: [Abstract] A computing system can include a processor, a machine-learned image segmentation model comprising a semantic segmentation neural network and an edge refinement neural network,
FIG. 7, [0006] The semantic segmentation neural network can be trained… The edge refinement neural network can be trained…
receiving, by a computing system comprising one or more processors, [Abstract] A computing system can include a processor
a lower resolution version of an input image and a ground truth image, [0042] inputting a training image into the image segmentation model… Each training image can have, for example, corresponding ground-truth versions
[0031] …inputting the low resolution image into the semantic segmentation neural network.
a loss function that evaluates a difference between the predicted image and the ground truth image; [0044] the first loss function can be determined by, for example, determining a difference between the semantic segmentation mask (understood to be the output (predicted image)) and a ground-truth semantic segmentation mask
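The cited loss can be sketched as a pixel-wise difference between the predicted mask and its ground-truth counterpart. An L1 (mean absolute difference) form is assumed here purely for illustration; Kanazawa's actual loss function may differ:

```python
import numpy as np

def mask_loss(pred_mask, gt_mask):
    # Mean absolute difference between the predicted segmentation mask
    # and the ground-truth mask (L1 form assumed for illustration).
    return np.abs(pred_mask - gt_mask).mean()
```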
Kanazawa and the other prior art of record fail to explicitly teach “wherein the lower resolution version of the input image has a first resolution and the ground truth image has a second resolution that is greater than the first resolution, and wherein the lower resolution version of the input image comprises one or more image elements not present in the ground truth image.”
Likewise, claims 13-15 are allowed because they depend from claim 12.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DELOMIA L GILLIARD whose telephone number is (571)272-1681. The examiner can normally be reached 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Villecco can be reached at (571) 272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DELOMIA L GILLIARD/Primary Examiner, Art Unit 2661