DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 4-5, 7-8, 11, 14-15, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US 11574392 B2) (hereinafter referred to as Lin) in view of Shah et al. (US 10475222 B2) (hereinafter referred to as Shah), and further in view of Vomweg et al. (US 20080292214 A1) (hereinafter referred to as Vomweg).
Regarding Claim 1, Lin discloses A method of processing images in a device, comprising: (See Col 2 Lines 22-24, “methods for automatically merging people and objects from multiple digital images into a composite group photo.”)
obtaining a set of images including a plurality of target objects; (See Col 2 Lines 24-27, “For instance, the disclosed systems can utilize a number of models and operations to automatically analyze multiple digital images to identify a missing person from a base image . . .” Also see Col 3 Lines 40-43, “To illustrate, in one or more implementations, the image merging system identifies multiple digital images (or simply “images”), where each of the images includes at least one person.” In this case, each person can be considered as “a plurality of target objects”.)
determining a feature value for each target object of the plurality of target objects in each image of the set of images; (See Col 3 Lines 40-46, “To illustrate, in one or more implementations, the image merging system identifies multiple digital images (or simply “images”), where each of the images includes at least one person. The image merging system can then identify faces in the images and compare the faces between the images (e.g., a base image and a second image) to determine a person that is missing from the base image.”
Also see Col 3 Lines 54-56, “As mentioned above, the image merging system can identify faces of persons within a set of images (e.g., a base image and a second image).”
Also see Col 10 Lines 60-65, “For example, the image merging system analyzes the facial features of each detected face and generates a face descriptor based on the result of the analysis. For instance, as described above, the face descriptor for a detected face includes facial feature values from one or more facial features of the face.” Note that a “face descriptor” corresponds to a “feature value”.)
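To illustrate the kind of per-face feature value described in the passage above, the following is a minimal Python sketch, offered purely as an illustration and not as a reproduction of Lin's method: face detection uses OpenCV's stock Haar cascade, and a normalized intensity histogram is a hypothetical stand-in for Lin's facial-feature-based face descriptor.
```python
import cv2
import numpy as np

def face_descriptors(image_bgr):
    """Detect faces and compute a simple per-face feature value.

    A normalized grayscale histogram stands in for Lin's face
    descriptor; any facial-feature embedding could be substituted.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    descriptors = []
    for (x, y, w, h) in faces:
        crop = gray[y:y + h, x:x + w]
        hist = cv2.calcHist([crop], [0], None, [64], [0, 256]).ravel()
        descriptors.append(((x, y, w, h), hist / (hist.sum() + 1e-9)))
    return descriptors
```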
a key image (See Col 3 Lines 43-46, “The image merging system can then identify faces in the images and compare the faces between the images (e.g., a base image and a second image) to determine a person that is missing from the base image.”)
a first auxiliary image (See Col 3 Lines 43-46, “The image merging system can then identify faces in the images and compare the faces between the images (e.g., a base image and a second image) to determine a person that is missing from the base image.”)
generating a synthesized image including a second target object in the key image and the first target object in the first auxiliary image. (See Col 6 Lines 36-42, “In addition, as shown, the series of acts 100 includes an act 110 of the image merging system generating a merged image. For example, in various implementations, the image merging system generates a composite group photo that merges the missing person from the second image into the base image utilizing the base image, the available location, and the segmented image of the missing person.”
In this case, consider the merged image to correspond to a “synthesized image”, the base image to correspond to a “key image”, and the second image to correspond to a “first auxiliary image”; the missing person from the second image corresponds to the “first target object in the first auxiliary image”, and it is implied that a person (second target object) exists within the base image (key image).)
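As a purely illustrative sketch of the pixel-insertion style of merging quoted above (not Lin's actual implementation; the mask, offset, and function names are assumptions):
```python
import numpy as np

def merge_person(base, aux, person_mask, offset_xy):
    """Insert the masked pixels of a person from the auxiliary image
    into the base image at the given (x, y) offset.

    Assumes the offset keeps all masked pixels within bounds.
    """
    composite = base.copy()
    ys, xs = np.nonzero(person_mask)
    ox, oy = offset_xy
    composite[ys + oy, xs + ox] = aux[ys, xs]
    return composite
```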
However, Lin fails to explicitly disclose determining a feature value for each target object of the plurality of target objects in each image of the set of images;
identifying a key image from the set of images based on the feature value for each target object;
identifying a first auxiliary image from the set of images based on the feature value associated with a first target object of the plurality of target objects;
aligning the key image and the first auxiliary image based on optical flow between the key image and the first auxiliary image; and
generating a synthesized image including a second target object in the key image and the first target object in the first auxiliary image.
Shah teaches determining a feature value for each target object of the plurality of target objects in each image of the set of images; (See Col 2 Lines 62-66, “Embodiments of the invention select a best frame from a short video clip to use as a base image. Each frame in the clip is evaluated with respect to whether the faces are aligned towards the camera, whether the faces have features that are aligned emotionally. . .” Here, a video clip can be considered as “the set of images”, and Shah teaches evaluating each frame in the video clip; thus, in combination with Lin teaching facial descriptors (feature values) for each person (target object), the above limitation is taught.)
identifying a key image from the set of images based on the feature value for each target object; (See Abstract, “creating a group shot image by intelligently selecting a best frame of a video clip to use as a base frame and then intelligently merging features of other frames into the base frame.”
Also see Col 2 Line 62 – Col 3 Line 6, “Embodiments of the invention select a best frame from a short video clip to use as a base image. Each frame in the clip is evaluated with respect to whether the faces are aligned towards the camera, whether the faces have features that are aligned emotionally (e.g., happy, sad, neutral, etc.), the quality of the faces with respect to blurriness, lighting, and/or exposure, and/or whether the eyes of the faces are opened or closed. One or more of these evaluations result in scores that are combined or otherwise used to determine a comprehensive score for each of the frames of the clip. The frame having the best score is selected to be used as the base frame for the group shot image.”
Here, Shah teaches evaluating each frame in the video clip and selecting a base frame (key image) based on scores that take the facial features into account; thus, in combination with Lin already teaching a base image (key image) and facial descriptors (feature values) for each person (target object), the above limitation is taught.)
identifying a first auxiliary image from the set of images based on the feature value associated with a first target object of the plurality of target objects; (See Col 2 Lines 13-19, “The feature merging module then determines replacement features in other frames of the video clip, for example, based on proximity of the other frames of the video clip to the base frame and/or detecting visibility of the replacement features in those frames. Once the replacement features are identified, those features are merged into the base frame to create the group shot image.” Here, Shah teaches finding replacement features in other frames (identifying a first auxiliary image from the set of images).
Note that Lin already teaches a second image (a first auxiliary image) and teaches determining facial descriptors (feature values) for each person (target object). Also see Lin Col 3 Line 66 – Col 4 Line 6, “For instance, in various implementations, the image merging system compares the face descriptors from faces in the base image with the face descriptors from faces in the second image. For example, the image merging system determines that a person is in both the base image and the second image based on their face descriptor from the second image matching (e.g., being within a face similarity threshold) their face descriptor in from the base image.” Here, Lin teaches that one can use the facial descriptors for comparison purposes and gives an example of using the face descriptor of a person in the base image (the feature value associated with a first target object of the plurality of target objects) as a similarity reference.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lin with Shah to include determining facial descriptors (feature values) for each target object in each image within the set of images, as well as identifying a key image and a first auxiliary image.
The motivation to combine Lin with Shah would have been obvious, as both Lin and Shah are within the same field of image merging (See Shah Abstract). The benefit of using the feature values within each image to select the best frame (key image) and to merge features from other frames (first auxiliary image) into that image is that it would improve the overall appearance of the final product. See Shah Col 3 Lines 7-10, “Given a base frame selection, embodiments of the invention additionally or alternatively intelligently merge attributes from other frames into the selected base frame to improve its facial feature attributes.”
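As an illustration of the descriptor-comparison step quoted from Lin above (matching “within a face similarity threshold”), a minimal sketch follows; the L2 distance metric and the threshold value are assumptions, not taken from the references.
```python
import numpy as np

def find_matching_frame(key_descriptor, frames_descriptors, threshold=0.25):
    """Return the index of the first frame containing a face whose
    descriptor lies within `threshold` (L2 distance) of the key
    descriptor, i.e. a candidate auxiliary image for that person."""
    for i, descriptors in enumerate(frames_descriptors):
        for _box, desc in descriptors:
            if np.linalg.norm(desc - key_descriptor) < threshold:
                return i
    return None
```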
However, Lin in view of Shah fails to explicitly disclose aligning the key image and the first auxiliary image based on optical flow between the key image and the first auxiliary image.
Vomweg teaches aligning the key image and the first auxiliary image based on optical flow between the key image and the first auxiliary image; (See [0005], “d) Registering the first and the second image or set of images by applying the inverse optical flow to the pixels or voxels of the second image or set of images.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lin in view of Shah with Vomweg to include aligning the key image and the first auxiliary image based on optical flow.
The motivation to combine Lin in view of Shah with Vomweg would have been obvious, as Shah and Vomweg are within the same field of aligning images (See Vomweg [0005]). Vomweg simply teaches the common technique of using optical flow during image alignment; the benefit of using optical flow for image alignment is improved alignment accuracy.
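For illustration only, a minimal sketch of optical-flow-based alignment of the kind claimed, using OpenCV's Farnebäck dense flow; warping the auxiliary image back into key-image coordinates is one way to realize the inverse-flow registration Vomweg describes. The parameter values are assumptions.
```python
import cv2
import numpy as np

def align_to_key(key_bgr, aux_bgr):
    """Warp the auxiliary image onto the key image using dense
    optical flow computed between the two frames."""
    key = cv2.cvtColor(key_bgr, cv2.COLOR_BGR2GRAY)
    aux = cv2.cvtColor(aux_bgr, cv2.COLOR_BGR2GRAY)
    # Flow maps each key pixel to its location in the auxiliary image.
    flow = cv2.calcOpticalFlowFarneback(
        key, aux, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = key.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    # Inverse warp: sample auxiliary pixels back into key coordinates.
    return cv2.remap(aux_bgr, map_x, map_y, cv2.INTER_LINEAR)
```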
Regarding Claim 4, Lin in view of Shah and Vomweg disclose The method of claim 1, wherein identifying the key image comprises: determining a composite score for each image of the set of images based on the feature value of each target object; and selecting the key image based on the composite score. (See Shah Col 2 Line 62 – Col 3 Line 6, “Embodiments of the invention select a best frame from a short video clip to use as a base image. Each frame in the clip is evaluated with respect to whether the faces are aligned towards the camera, whether the faces have features that are aligned emotionally (e.g., happy, sad, neutral, etc.), the quality of the faces with respect to blurriness, lighting, and/or exposure, and/or whether the eyes of the faces are opened or closed. One or more of these evaluations result in scores that are combined or otherwise used to determine a comprehensive score for each of the frames of the clip. The frame having the best score is selected to be used as the base frame for the group shot image.” The motivation to combine would have been similar to that of the Claim 1 rejection.)
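As an illustration of a composite per-frame score of the kind Shah describes, a minimal sketch follows; Laplacian-variance sharpness is a hypothetical stand-in for Shah's combined pose, emotion, quality, and eye-state scores.
```python
import cv2
import numpy as np

def composite_score(frame_bgr, face_boxes):
    """Combine a per-face quality score (here, Laplacian variance as
    a simple sharpness proxy) into one score for the frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    scores = []
    for (x, y, w, h) in face_boxes:
        crop = gray[y:y + h, x:x + w]
        scores.append(cv2.Laplacian(crop, cv2.CV_64F).var())
    return float(np.mean(scores)) if scores else 0.0

def select_key_image(frames, faces_per_frame):
    """Pick the frame with the best composite score as the key image."""
    scores = [composite_score(f, b) for f, b in zip(frames, faces_per_frame)]
    return int(np.argmax(scores))
```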
Regarding Claim 5, Lin in view of Shah and Vomweg disclose The method of claim 1, further comprising: determining the first target object in the key image is to be modified based on the feature value; and selecting the first auxiliary image from the set of images based on the feature value of the first target object in the first auxiliary image. (See Shah Col 2 Line 62 – Col 3 Line 6, teaching selecting a best frame (key image) based on features such as whether the eyes of the faces are open or closed, etc.
See Shah Col 11 Lines 12-19, “The technique 500 next involves identifying features of faces in the base frame for replacement based on the face scores, as shown in block 504. In an embodiment of the invention, this involves comparing the face scores with a threshold such as the average face score of all faces in the base frame, and identifying all of the faces that have scores below the threshold for replacement.” Note that one can consider features of faces in the base frame for replacement to be “the first target object in the key image is to be modified” and face scores to correspond to “feature value”.
Also see Shah Col 11 Lines 20-27, “The technique 500 next involves identifying replacement features in other frames of the video clip, as shown in block 505. These features are identified based on various criteria selected to minimize discontinuities and other undesirable visual attributes.”
See Lin Col 3 Lines 43-46, “The image merging system can then identify faces in the images and compare the faces between the images (e.g., a base image and a second image) to determine a person that is missing from the base image.” Note that one can consider the faces, and thus the facial descriptors, to be the “feature value of the first target object”; thus, in combination with Shah teaching identifying replacement features, the selection of the second image (the first auxiliary image) can be considered as “selecting the first auxiliary image from the set of images based on the feature value of the first target object”. The motivation to combine would have been similar to that of the Claim 1 rejection.)
Regarding Claim 7, Lin in view of Shah and Vomweg disclose The method of claim 1, wherein the feature value is associated with a combination of key features associated with each target object, and wherein the key features of a target object include an orientation of the target object with respect to the device and facial features of the target object. (See Shah Col 2 Line 62 – Col 3 Line 6, “Embodiments of the invention select a best frame from a short video clip to use as a base image. Each frame in the clip is evaluated with respect to whether the faces are aligned towards the camera, whether the faces have features that are aligned emotionally (e.g., happy, sad, neutral, etc.), the quality of the faces with respect to blurriness, lighting, and/or exposure, and/or whether the eyes of the faces are opened or closed. One or more of these evaluations result in scores that are combined or otherwise used to determine a comprehensive score for each of the frames of the clip. The frame having the best score is selected to be used as the base frame for the group shot image.” The motivation to combine would have been similar to that of the Claim 1 rejection.)
Regarding Claim 8, Lin in view of Shah and Vomweg disclose The method of claim 1, wherein the set of images are downscaled. (See Lin Col 20 Lines 49-52, “In some instances, blending includes adjusting the contrast, shading, hue, saturation, sharpness, and/or resolution of the segmented image to match the base image.” Here, Lin teaches the ability to adjust the resolution of the images; thus, one of ordinary skill in the art would have been able to downscale the images, as downscaling is a well-known and basic function.)
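For completeness, a one-line illustrative sketch of the well-known downscaling operation; the choice of area interpolation is an assumption (it is the usual anti-aliasing choice when shrinking):
```python
import cv2

def downscale(image_bgr, factor=0.5):
    """Downscale an image by the given factor; INTER_AREA is the
    usual interpolation choice when shrinking."""
    return cv2.resize(image_bgr, None, fx=factor, fy=factor,
                      interpolation=cv2.INTER_AREA)
```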
Regarding Claim 11, Lin in view of Shah and Vomweg disclose A computing device for processing images, comprising: at least one memory; and at least one processor coupled to the at least one memory and configured to: (See Lin Col 25 Lines 1-5, “one or more instructions stored on a computer-readable storage medium and executable by processors of one or more computing devices, such as a client device (e.g., a mobile client device) or server device.”)
obtain a set of images including a plurality of target objects; determine a feature value for each target object of the plurality of target objects in each image of the set of images; identify a key image from the set of images based on the feature value for each target object; identify a first auxiliary image from the set of images based on the feature value associated with a first target object of the plurality of target objects; align the key image and the first auxiliary image based on optical flow between the key image and the first auxiliary image; and generate a synthesized image including a second target object in the key image and the first target object in the first auxiliary image. (The above limitations are similar to those of Claim 1 and are therefore rejected under a similar rationale as Claim 1.)
Regarding Claim 14, Claim 14 contains limitations similar to those of Claim 4 and is therefore rejected under a similar rationale as Claim 4.
Regarding Claim 15, Claim 15 contains limitations similar to those of Claim 5 and is therefore rejected under a similar rationale as Claim 5.
Regarding Claim 17, Claim 17 contains limitations similar to those of Claim 7 and is therefore rejected under a similar rationale as Claim 7.
Regarding Claim 18, Claim 18 contains limitations similar to those of Claim 8 and is therefore rejected under a similar rationale as Claim 8.
Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Shah and Vomweg, and further in view of Fritz (“Guide to Image Inpainting: Using machine learning to edit and correct defects in photos”).
Regarding Claim 2, Lin in view of Shah and Vomweg disclose The method of claim 1, wherein generating the synthesized image comprises: inserting pixels of the first target object. (See Lin Col 28 Lines 13-16, “For example, act 1450 can involve generating a merged image by inserting the pixels of the second image representing the missing person into the available location of the first image.”)
However, Lin in view of Shah and Vomweg fails to explicitly disclose generating, using a machine learning model, boundary region pixels of the first target object based on hallucination of pixels at edges of the first target object using the set of images and the machine learning model.
Fritz teaches generating, using a machine learning model, boundary region pixels of the first target object based on hallucination of pixels at edges of the first target object using the set of images and the machine learning model. (See Page 5 teaching “deep learning” for inpainting and training on “huge training datasets”.
Also see Page 9 Paragraph 2, “The first is based on a Fast Marching Method, which starts from the boundary of the region to be inpainted and moves towards the epicenter, gradually filling everything in the boundary first. Each pixel is replaced by a normalized weighted sum of all the known pixels in its neighborhood.” In this case, deep learning implies a machine learning model, and gradually filling everything in the boundary corresponds to generating boundary region pixels based on hallucination of pixels at edges.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lin in view of Shah and Vomweg with Fritz to include using a machine learning model to generate boundary region pixels of the first target object based on hallucination of pixels at edges.
The motivation to combine Lin in view of Shah and Vomweg with Fritz would have been obvious, as the benefit of using a machine learning model for image inpainting and merging is improved speed, accuracy, and automation over traditional methods.
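As an illustration of the Fast Marching inpainting that Fritz describes at Page 9, a minimal sketch using OpenCV's Telea implementation (itself a Fast Marching Method); the learning-based variants Fritz also discusses would replace this single call:
```python
import cv2
import numpy as np

def fill_boundary(image_bgr, boundary_mask):
    """Fill the pixels flagged in `boundary_mask` (nonzero = inpaint)
    by marching inward from the region boundary, replacing each pixel
    with a normalized weighted sum of known neighbors (Telea's Fast
    Marching Method)."""
    mask = (boundary_mask > 0).astype(np.uint8)
    return cv2.inpaint(image_bgr, mask, inpaintRadius=3,
                       flags=cv2.INPAINT_TELEA)
```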
Regarding Claim 12, Claim 12 contains limitations similar to those of Claim 2 and is therefore rejected under a similar rationale as Claim 2.
Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Shah and Vomweg, and further in view of Hsieh et al. (US 20230132180 A1) (hereinafter referred to as Hsieh).
Regarding Claim 3, Lin in view of Shah and Vomweg disclose The method of claim 1, further comprising: generating a first mask of the first target object from the first auxiliary image; (See Lin Col 4 Lines 12-23, “Based on identifying a missing person in the second image, the image merging system can create a segmented image of the missing person. . . For example, the segmentation model can create a bounding box around the missing person, generate an object mask of the missing person based on the bounding box, and generate a segmented image of the missing person based on the object mask (e.g., an indication of a plurality of pixels portraying an object such as a binary mask identifying pixels corresponding to an object).”)
However, Lin in view of Shah and Vomweg fails to explicitly disclose upsampling the first mask using a guided upsampling filter for filamentous structures associated with the first target object.
Hsieh teaches upsampling the first mask using a guided upsampling filter for filamentous structures associated with the first target object. (See [0077], “In other words, the segmentation mask refinement and upsampling system 106 utilizes the guided filtering to improve the refined preliminary segmentation mask 504 to recapture details (particularly along borders) from the low-resolution image 500 lost during the generation of the refined preliminary segmentation mask 504.”
Also see [0080], “The segmentation mask refinement and upsampling system 106 then upsamples the refined-filtered preliminary segmentation mask 508 to a higher resolution.” Note that since Lin teaches an object mask of a person, and that mask would include filamentous structures such as hair and clothing, the upsampling of the masks can be considered as being for filamentous structures associated with the first target object.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Lin in view of Shah and Vomweg with Hsieh to include upsampling the mask using a guided upsampling filter for filamentous structures associated with the first target object.
The motivation to combine Lin in view of Shah and Vomweg with Hsieh would have been obvious because upsampling can improve the quality of the final image by providing a higher resolution. Note that Hsieh [0002] teaches that there is a need for high-resolution segmentation masks: “Although conventional segmentation systems generate segmentation masks for digital visual media items, such systems are often inflexibly limited to low-resolutions, are often inaccurate at segmenting fine-grained details in high-resolution images. . .”
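As an illustration of guided mask upsampling of the kind Hsieh describes, a minimal sketch using the guided filter from opencv-contrib (cv2.ximgproc); the radius and eps values are assumptions, and the high-resolution image serves as the guide so that fine (filamentous) structures such as hair are recovered along the mask borders:
```python
import cv2
import numpy as np

def upsample_mask_guided(mask_lowres, image_highres, radius=8, eps=1e-4):
    """Upsample a soft segmentation mask to the guide image's
    resolution, then refine its edges with a guided filter so fine
    structures in the guide are preserved.

    Requires opencv-contrib-python for cv2.ximgproc.
    """
    h, w = image_highres.shape[:2]
    up = cv2.resize(mask_lowres.astype(np.float32), (w, h),
                    interpolation=cv2.INTER_LINEAR)
    guide = image_highres.astype(np.float32) / 255.0
    refined = cv2.ximgproc.guidedFilter(guide, up, radius, eps)
    return np.clip(refined, 0.0, 1.0)
```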
Regarding Claim 13, Claim 13 contains limitations similar to those of Claim 3 and is therefore rejected under a similar rationale as Claim 3.
Allowable Subject Matter
Claims 6, 9-10, 16, and 19-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding Claim 6, the cited prior art does not disclose or render obvious the combination of elements cited in the claims as a whole. Specifically, the cited prior art fails to disclose or render obvious the limitations: extracting a first background from the key image excluding the plurality of target objects; extracting a second background from the first auxiliary image excluding the plurality of target objects; identifying key points within the first background and the second background; and combining the first background and the second background into a combined background based the optical flow between the key points, wherein the combined background is input into a machine learning model. Thus, Claim 6 contains allowable subject matter.
Regarding Claim 9, the cited prior art does not disclose or render obvious the combination of elements cited in the claims as a whole. Specifically, the cited prior art fails to disclose or render obvious the limitations: generating a first mask based on the first target object in the synthesized image at a first resolution and the first auxiliary image; generating a second mask based on the second target object in the synthesized image at the first resolution and the key image at the first resolution; interpolating the first mask and the second mask to a second resolution higher than the first resolution; and generating the synthesized image at the second resolution. Thus, Claim 9 contains allowable subject matter.
Regarding Claim 10, Claim 10 is dependent upon Claim 9 and thus also contains allowable subject matter.
Regarding Claim 16, Claim 16 contains limitations similar to those of Claim 6 and therefore also contains allowable subject matter.
Regarding Claim 19, Claim 19 contains limitations similar to those of Claim 9 and therefore also contains allowable subject matter.
Regarding Claim 20, Claim 20 contains limitations similar to those of Claim 10 and therefore also contains allowable subject matter.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THANG G HUYNH whose telephone number is (571)272-5432. The examiner can normally be reached Mon-Thu 7:30am-4:30pm EST | Fri 7:30am-11:30am EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/T.G.H./Examiner, Art Unit 2611
/KEE M TUNG/Supervisory Patent Examiner, Art Unit 2611