Prosecution Insights
Last updated: April 19, 2026
Application No. 18/527,881

REPEATED DISTRACTOR DETECTION FOR DIGITAL IMAGES

Status: Non-Final OA (§103)
Filed: Dec 04, 2023
Examiner: WANG, JIN CHENG
Art Unit: 2617
Tech Center: 2600 — Communications
Assignee: Adobe Inc.
OA Round: 3 (Non-Final)

Grant Probability: 59% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 3y 7m
Grant Probability With Interview: 69%

Examiner Intelligence

Grants 59% of resolved cases.

Career Allow Rate: 59% (492 granted / 832 resolved; -2.9% vs Tech Center average)
Interview Lift: +10.3% for resolved cases with an interview (a moderate, roughly +10% lift)
Typical Timeline: 3y 7m average prosecution; 40 applications currently pending
Career History: 872 total applications across all art units
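
As a quick sanity check on how these headline figures fit together, here is a minimal sketch (assuming, as the projection footnote below suggests, that grant probability is the simple allow-rate ratio and that the interview figure adds the lift in percentage points):

```python
# Worked check of the dashboard's headline numbers (assumed simple ratios).
granted, resolved = 492, 832
allow_rate = granted / resolved               # 0.5913... -> the 59% shown
interview_lift = 0.103                        # +10.3 percentage points
with_interview = allow_rate + interview_lift  # 0.6943... -> the 69% shown
print(f"allow rate: {allow_rate:.1%}, with interview: {with_interview:.1%}")
```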

Statute-Specific Performance

§101: 11.8% (-28.2% vs TC avg)
§103: 62.7% (+22.7% vs TC avg)
§102: 7.6% (-32.4% vs TC avg)
§112: 15.5% (-24.5% vs TC avg)

Deltas are relative to the Tech Center average estimate. Based on career data from 832 resolved cases.

Office Action — §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed 10/27/2025 has been entered. Claims 1, 11 and 17 have been amended. Claims 1-20 are pending in the current application.

Response to Arguments

Applicant's arguments filed 10/27/2025 have been fully considered but they are moot in view of the new ground(s) of rejection set forth in the current Office Action.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al., US-PGPUB No. 2023/0206586 (hereinafter Lee), in view of Kanazawa et al., US-PGPUB No. 2024/0303788 (hereinafter Kanazawa); Huynh et al., "SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network", Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), June 18-22, 2023, pp. 14518-14527 (hereinafter Huynh); and Liba et al., US-PGPUB No. 2024/0355107 (hereinafter Liba).

Re Claim 1: Lee implicitly teaches a method comprising: receiving, by a processing device, an input specifying a location within a digital image displayed in a user interface (Lee teaches at Paragraph 0085 that a distractor box may be determined through the box selection. Lee teaches at FIG. 5 and Paragraph 0073 that an object tracking apparatus may remove duplicate or overlapping candidate boxes through a candidate box removal operation 510. Lee teaches at Paragraphs 0074-0075 calculating the overlap state (similarity) between the candidate box and the mask of the distractor map, which is determined based on the distractor information via the box selection of Paragraph 0085. Lee teaches at Paragraph 0063 that the target box 202 may be determined according to a user input for selecting the target object, and at Paragraph 0073 selecting a target box and a distractor box from among candidate boxes. Lee teaches that the object tracking apparatus receives an input specifying a location of the target box or a location of a distractor box within a digital image, and shows at FIG. 6 specifying a location of the target box 612 or a location of the distractor box 613.
Lee teaches at Paragraph 0063 that the target object may be determined according to a user input for selecting the target object. Lee teaches at Paragraph 0073 that FIG. 5 illustrates an example of selecting a target box and a distractor box from among candidate boxes. Lee teaches at Paragraph 0077 that an object tracking apparatus may determine a target box 612 and distractor boxes 613 in a search region 611 of a Tth image frame 615 and determine a distractor map 614 based on distractor information according to the distractor boxes 613. Lee teaches at Paragraph 0072 that the object tracking apparatus may determine the target box 421 and a distractor box 422. Lee teaches at Paragraph 0080 that the object tracking apparatus may input at least one of the distractor information of the previous image frame and the distractor information of the current image frame. Lee teaches at Paragraph 0076 that the object tracking apparatus may select the target box and the distractor box from among the candidate boxes through a box selection operation 530 and may select K candidate boxes with a high reliability score from among the candidate boxes);

identifying, by the processing device, an input distractor based on the location, the identifying performed using a machine-learning model (Lee teaches at Paragraph 0085 that a distractor box may be determined through the box selection. Lee teaches at FIG. 5 and Paragraph 0073 that an object tracking apparatus may remove duplicate or overlapping candidate boxes through a candidate box removal operation 510. Lee teaches at Paragraphs 0074-0075 calculating the overlap state (similarity) between the candidate box and the mask of the distractor map, which is determined based on the distractor information via the box selection of Paragraph 0085. Lee teaches at Paragraph 0076 that the distractor boxes are determined based on a box selection operation 530, that a similarity score s1 of a distractor may have the highest value before the score adjustment operation 520, and that the similarity score s1 is reduced through the score adjustment operation. Lee teaches at Paragraph 0075 that a similarity score of a candidate box overlapping with the mask is reduced in proportion to an overlap ratio of each of the candidate boxes with the mask. Lee teaches at FIG. 4 and Paragraph 0071 that, based on the neural network, an image comparison model 410 determines a plurality of candidate boxes 411 based on similarity scores of the candidate boxes; at Paragraph 0074 that the mask may be set corresponding to all the distractor boxes; and at FIG. 6 and Paragraph 0077 that an object tracking apparatus may determine a target box 612 and distractor boxes 613 (based on similarity scores). Lee teaches at FIG. 4 that, based on the location of the target object, the input distractors 411 with their locations are identified. Lee teaches at FIG. 5 that, based on the location of the target object, the input distractors are identified within the score adjustment 520 and the output candidate distractor boxes are identified within the target/distractor box selection 530. Lee teaches at FIG. 6 that the candidate output distractor boxes are identified in the current image frame based on the location of the target box 612 and the location of the input distractor 613, wherein the location of the ROI in the current image frame corresponds to the location of the target box 612 and/or the location of the input distractor box 613. Moreover, the location of the input distractor is also input to the object tracking module as shown in FIG. 6; the location of the input distractor in the score adjustment 520 is also input to the target/distractor box selection 530 as shown in FIG. 5; and the location of the input distractor 411, together with the location of the input distractor in the distractor map 405, is input to the score adjustment 420 to generate the output candidate distractor boxes 421 and 422 as shown in FIG. 4. Lee's segmentation mask is input to the object tracking apparatus based on the input distractor location as shown in FIG. 3, wherein the mask 323, which at least includes an input distractor, includes distractor boxes as input to the object tracking module 330. Lee teaches at Paragraph 0051 that the object tracking model 110 may include an artificial intelligence model based on machine learning; for example, the object tracking model 110 may include a deep neural network (DNN) including a plurality of layers. Lee teaches that the object tracking model 110 identifies the input distractor based on the location input of the target object or the location input of an input distractor. Lee teaches at Paragraph 0077 that an object tracking apparatus may determine a target box 612 and distractor boxes 613 in a search region 611 of a Tth image frame 615 and determine a distractor map 614 based on distractor information according to the distractor boxes 613; the box 613 constitutes an input distractor, and the distractor map 614 may include a mask corresponding to the distractor boxes 613. Lee teaches at Paragraph 0015 adjusting the similarity scores of the candidate boxes using a distractor map and at Paragraph 0068 that an object tracking apparatus may distinguish between a target object and a distractor using a distractor map. Lee teaches at Paragraphs 0069-0070 that the object tracking apparatus may perform the object tracking operation 310 using distractor information of a T-1th image frame at time T-1 and perform an update operation 320 on a distractor map based on distractor information of at least a partial region in the Tth image frame 321. Lee teaches identifying an input distractor of a ROI 322 in the T-1th image frame to extract candidate distractors in the Tth frame, wherein the object tracking apparatus may distinguish between a target object and a distractor by applying the mask 323 of the ROI 322 to the search region 331 and determining the target box 332 of the search region.
Lee teaches at Paragraph 0067 a distractor of the target object; at Paragraph 0068 that the object tracking apparatus may distinguish between a target object and a distractor using a distractor map including distractor information; and at Paragraph 0069 that the distractor map may include distractor information of a region other than the ROI 322 and that the distractor information may include a mask 323 corresponding to a distractor box);

detecting, by the processing device, at least one candidate distractor at another location within the digital image based on the input distractor as being visually similar to the input distractor (Lee teaches at Paragraph 0085 that a distractor box may be determined through the box selection. Lee teaches at FIG. 5 and Paragraph 0073 that an object tracking apparatus may remove duplicate or overlapping candidate boxes through a candidate box removal operation 510. Lee teaches at Paragraphs 0074-0075 calculating the overlap state (similarity) between the candidate box and the mask of the distractor map, which is determined based on the distractor information via the box selection of Paragraph 0085. Lee teaches at Paragraphs 0076-0077 that the object tracking apparatus determines the rest of the K candidate boxes as distractor boxes, wherein the distractor boxes 613 are identified based on the input distractor box selection. Lee teaches the similarity score of a candidate box to the target object based on its visual overlap (similarity) with the distractor map, where a distractor box is selected through the box selection; the claimed visual similarity is mapped to the visual overlap with the distractor map and is inversely proportional to the similarity score of the candidate box. Lee teaches at Paragraph 0074 that the object tracking apparatus may adjust similarity scores sn through a score adjustment operation 520: the object tracking apparatus may determine a mask according to distractor information of a previous image frame and adjust the similarity scores sn based on an overlap state between candidate boxes and the mask, and when there is a candidate box overlapping with the mask, the candidate box may correspond to a distractor, so the object tracking apparatus may reduce the similarity score of that candidate box in such a way that it is not selected as a target box. Lee teaches at FIG. 4 and Paragraph 0071 that, based on the neural network, an image comparison model 410 determines a plurality of candidate boxes 411 based on similarity scores of the candidate boxes. Lee teaches detecting at least one candidate distractor among the candidate boxes at locations in the current image frame based on an input distractor such as the input distractor of FIGS. 4-6: FIG. 4 shows the input distractor boxes 411 with their locations and the identified candidate distractor boxes 421 and 422; FIG. 5 shows the input distractors identified in the score adjustment 520 and the candidate distractor boxes identified based on the input distractors; and FIG. 6 shows the input distractor 613 and the candidate distractor boxes identified based on it in the image frame 615.

Lee teaches at Paragraphs 0078-0080, using at least one of the input distractor boxes 613 of the distractor map of the previous image frame, detecting the candidate distractor boxes in the updated distractor map within the T+1th image frame. Lee teaches at Paragraph 0091 that the processor 810 may determine box information of candidate boxes in a current image frame and similarity scores of the candidate boxes by comparing a search region of the current image frame with a template image corresponding to a target object, adjust the similarity scores of the candidate boxes using a distractor map including distractor information of a previous image frame, determine a target box corresponding to the target object and a distractor box corresponding to a distractor of the target object from the candidate boxes based on the adjusted similarity scores, and update the distractor map based on distractor information of the current image frame according to the distractor box. Lee teaches at Paragraph 0077 that an object tracking apparatus may determine a target box 612 and distractor boxes 613 in a search region 611 of a Tth image frame 615 and determine a distractor map 614 based on distractor information according to the distractor boxes 613; the box 613 constitutes an input distractor, and the distractor map 614 may include a mask corresponding to the distractor boxes 613. Lee teaches at Paragraph 0015 adjusting the similarity scores of the candidate boxes using a distractor map and at Paragraph 0068 that an object tracking apparatus may distinguish between a target object and a distractor using a distractor map. Lee teaches at Paragraph 0074 that the object tracking apparatus may determine a mask according to (input) distractor information of a previous image frame and adjust the similarity scores sn based on an overlap state between candidate boxes and the mask; the input mask at least includes an input distractor box of the previous image frame, and when there is a plurality of distractor boxes in the previous image frame, the mask may be set corresponding to all the distractor boxes. Lee teaches at Paragraph 0076 that the object tracking apparatus may select the target box and the distractor box from among the candidate boxes through a box selection operation 530: the object tracking apparatus may select K candidate boxes with a high reliability score from among the candidate boxes, determine the one with the highest reliability score among the K candidate boxes as the target box, and determine the rest of the K candidate boxes as distractor boxes. In the example of FIG. 5, K may be 3; a similarity score s1 of a distractor may have the highest value before the score adjustment operation 520, but through the score adjustment operation 520 the similarity score s1 may be reduced to a similarity score s1′ and a similarity score s3′ of the target object may have the highest value. Lee teaches at Paragraph 0071 that an object tracking apparatus may determine box information of candidate boxes in a Tth image frame and similarity scores of the candidate boxes by comparing a search region 401 with a template image 402 corresponding to a target object 403 based on a neural network.
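
To make the cited mechanism concrete, the following is a minimal sketch of the overlap-based score adjustment and top-K target/distractor selection that the examiner reads out of Lee's Paragraphs 0074-0076. All names and the exact penalty formula are illustrative assumptions, not Lee's actual implementation:

```python
import numpy as np

def overlap_ratio(box, mask):
    """Fraction of the candidate box covered by the distractor mask.
    box = (x0, y0, x1, y1) in pixel coordinates; mask is a binary HxW array."""
    x0, y0, x1, y1 = box
    region = mask[y0:y1, x0:x1]
    area = max((x1 - x0) * (y1 - y0), 1)
    return float(region.sum()) / area

def adjust_and_select(boxes, scores, distractor_mask, k=3):
    """Reduce each similarity score in proportion to its overlap with the
    previous frame's distractor mask (cf. Lee, Paras. 0074-0075), then pick
    the top-K boxes: the best becomes the target, the rest are distractors
    (cf. Para. 0076, where K = 3 in the FIG. 5 example)."""
    adjusted = [s * (1.0 - overlap_ratio(b, distractor_mask))
                for b, s in zip(boxes, scores)]
    order = np.argsort(adjusted)[::-1][:k]       # K highest adjusted scores
    target = boxes[order[0]]                     # highest score -> target box
    distractors = [boxes[i] for i in order[1:]]  # remainder -> distractor boxes
    return target, distractors, adjusted
```

Under this reading, a repeated object that overlaps the mask is suppressed as a target candidate but survives as a distractor box, which is how the examiner maps Lee onto the claimed candidate distractor detection.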
Lee teaches at Paragraph 0076 that the object tracking apparatus may select the target box and the distractor box from among the candidate boxes through a box selection operation 530 and may select K candidate boxes with a high reliability score from among the candidate boxes);

verifying, by the processing device, visual similarity that the at least one candidate distractor corresponds to the input distractor by comparing candidate distractor image features extracted from the at least one candidate distractor with input distractor image features extracted from the input distractor (Lee teaches at Paragraph 0085 that a distractor box may be determined through the box selection. Lee teaches at FIG. 5 and Paragraph 0073 that an object tracking apparatus may remove duplicate or overlapping candidate boxes through a candidate box removal operation 510. Lee teaches at Paragraphs 0074-0075 calculating the overlap state (similarity) between the candidate box and the mask of the distractor map, which is determined based on the distractor information via the box selection of Paragraph 0085. Lee teaches at Paragraphs 0076-0077 that the object tracking apparatus determines the rest of the K candidate boxes as distractor boxes, wherein the distractor boxes 613 are identified based on the input distractor box selection. Lee teaches the similarity score of a candidate box to the target object based on its visual overlap (similarity) with the distractor map, where a distractor box is selected through the box selection; the claimed visual similarity is mapped to the visual overlap with the distractor map and is inversely proportional to the similarity score of the candidate box. Lee teaches at Paragraph 0074 that the object tracking apparatus may adjust similarity scores sn through a score adjustment operation 520: the object tracking apparatus may determine a mask according to distractor information of a previous image frame and adjust the similarity scores sn based on an overlap state between candidate boxes and the mask, and when there is a candidate box overlapping with the mask, the candidate box may correspond to a distractor, so the object tracking apparatus may reduce the similarity score of that candidate box in such a way that it is not selected as a target box. Lee teaches at FIGS. 4-6, as discussed above for the identifying limitation, that the input distractor locations (the boxes 411, the inputs to the score adjustment 520, the distractor map 405, and the box 613) are identified and fed forward to the score adjustment and target/distractor box selection operations to generate the output candidate distractor boxes 421 and 422. Lee teaches verifying that the at least one candidate distractor within the candidate boxes of the current image frame corresponds to the input distractor 613 of the search region 611 by comparing the similarity scores extracted from the candidate distractors with the input distractor in the distractor map of the previous image frame (see Paragraphs 0015, 0068, 0077, 0078-0080 and 0091, discussed above). Lee teaches at Paragraph 0071 that an object tracking apparatus may determine box information of candidate boxes in a Tth image frame and similarity scores of the candidate boxes by comparing a search region 401 with a template image 402 corresponding to a target object 403 based on a neural network. Lee teaches at Paragraph 0074 that when there is a plurality of distractor boxes in the previous image frame, the mask may be set corresponding to all the distractor boxes. Lee teaches at Paragraph 0076 that the object tracking apparatus may select the target box and the distractor box from among the candidate boxes through a box selection operation 530, select K candidate boxes with a high reliability score from among the candidate boxes, determine the one with the highest reliability score as the target box, and determine the rest of the K candidate boxes as distractor boxes);
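
The examiner maps the verification limitation above onto Lee's overlap-based similarity scores rather than an explicit feature comparison. For readers who want the claimed step in concrete terms, here is a minimal sketch of a feature-level check; the embedding source (for example, a CNN backbone's pooled features), the threshold, and all names are illustrative assumptions, not taken from the cited references:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def verify_candidate(candidate_feat, input_feat, threshold=0.8):
    """Accept a candidate distractor only if its embedding is close enough to
    the user-selected input distractor's embedding. Threshold and embedding
    choice are assumptions for illustration only."""
    return cosine_similarity(candidate_feat, input_feat) >= threshold
```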
displaying, by the processing device in the user interface, indications of the input distractor and the at least one candidate distractor within the digital image as being visually similar (Lee teaches the box selection of Paragraph 0085, the candidate box removal operation 510 of FIG. 5 and Paragraph 0073, the overlap-state calculation of Paragraphs 0074-0075, and the determination at Paragraphs 0076-0077 of the rest of the K candidate boxes as distractor boxes 613 based on the input distractor box selection, wherein, as discussed above, the claimed visual similarity is mapped to the visual overlap with the distractor map and is inversely proportional to the similarity score of the candidate box, and the score adjustment operation 520 of Paragraph 0074 reduces the similarity score of a candidate box overlapping the mask so that it is not selected as a target box);

receiving, by the processing device, authorization to remove the input distractor and the at least one candidate distractor (Lee teaches at Paragraph 0085 that a distractor box may be determined through the box selection; at FIG. 5 and Paragraph 0073 that an object tracking apparatus may remove duplicate or overlapping candidate boxes through a candidate box removal operation 510; and at Paragraphs 0074-0075 calculating the overlap state (similarity) between the candidate box and the mask of the distractor map, which is determined based on the distractor information via the box selection of Paragraph 0085);

displaying, by the processing device, responsive to the receiving of the authorization, an edited digital image having the input distractor and the at least one candidate distractor removed from the digital image (Lee teaches the box selection of Paragraph 0085, the candidate box removal operation 510 of FIG. 5 and Paragraph 0073, and the overlap-state calculation of Paragraphs 0074-0075, as above).

However, Lee does not explicitly teach that the mask of the distractor map is input by the user. Kanazawa teaches at Paragraph 0060 that the mask is manually drawn by a user of a client device. It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have provided the input distractor by user input drawing outlines of the input distractors and to have identified a mask that includes the candidate distractors based on the one or more input distractors according to Kanazawa, so as to identify candidate distractors based on the 2D features or pixel values of the candidate distractors and to output a masked version of the input image that defines one or more regions for inpainting (Kanazawa, Paragraphs 0038-0039), thereby automatically identifying regions to remove from an input image using a neural network. One of ordinary skill in the art would have been motivated to automatically identify candidate distractors in the output mask based on the user's input of one or more distractors (Kanazawa, Paragraph 0079).

Lee also does not explicitly teach receiving, by the processing device, authorization to remove the input distractor and the at least one candidate distractor. Liba explicitly teaches this limitation (Liba teaches at Paragraph 0064 that bounding boxes 432, 434 and 436 may be associated with high distractor scores and may be provided as selectable boxes, and a user may exercise an option to choose whether or not to delete the corresponding objects). It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have provided authorization to remove the distractors using selectable boxes in the user interface of Liba into Lee, to allow authorization to remove the distractors in Lee. One of ordinary skill in the art would have been motivated to allow the user to operate directly in the user interface to remove distractors.

Huynh likewise teaches a method comprising: receiving, by a processing device, an input specifying a location within a digital image displayed in a user interface (Huynh teaches at FIG. 1: "we present a pipeline that enables the automatic segmentation of distractors in photos using a single click. With just one click, our pipeline can detect and mask the distracting object in the photos and identify other similar objects that may also be causing distraction", and that popular photo editing tools can then be used to remove the visual distractions seamlessly);

identifying, by the processing device, an input distractor based on the location, the identifying performed using a machine-learning model (Huynh teaches at Section 1 that its approach is to train an instance segmentation model like Mask-RCNN to detect and segment distractors in a supervised manner; the FIG. 1 single-click pipeline quoted above; and at Section 3 that the 1C-DSN is trained with similar loss functions as in Entity Segmentation);

detecting, by the processing device, at least one candidate distractor based on the input distractor as being visually similar to the input distractor (Huynh teaches the FIG. 1 single-click pipeline and the Section 3 training of the 1C-DSN. Huynh teaches at Section 3.4: "we further run an iterative process to sample more similar distractors to ensure that we entirely select all the distractors similar to the initial click", and in the Abstract: "we also showcase how a transformer-based module can be used to identify more distracting regions similar to the user's click position");

verifying, by the processing device, visual similarity that the at least one candidate distractor corresponds to the input distractor by comparing candidate distractor image features extracted from the at least one candidate distractor with input distractor image features extracted from the input distractor (Huynh teaches the FIG. 1 single-click pipeline, the Section 3 training of the 1C-DSN, the Section 3.4 iterative sampling of similar distractors, and the Abstract's transformer-based module for identifying more distracting regions similar to the user's click position);

displaying, by the processing device in the user interface, indications of the input distractor and the at least one candidate distractor within the digital image as being visually similar; receiving, by the processing device, authorization to remove the input distractor and the at least one candidate distractor; and displaying, by the processing device, responsive to the receiving of the authorization, an edited digital image having the input distractor and the at least one candidate distractor removed from the digital image (Huynh teaches the FIG. 1 single-click pipeline, the Section 3 training of the 1C-DSN, the Section 3.4 iterative sampling of similar distractors, and the Abstract's transformer-based module, as above).

It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have provided the user click on an input distractor to identify candidate distractors in the image according to Huynh, to have identified candidate distractors based on similarities according to Lee FIG. 6, to have selected candidate distractor objects via the user click input, and to have removed both the candidate distractors and the input distractor based on the user click input selecting a distractor object. One of ordinary skill in the art would have been motivated to provide a method of generating a click input for selecting a distractor object and generating candidate distractor objects based on similarities of the candidate distractor objects to the input distractor object.

Re Claim 2: Claim 2 encompasses the same scope of invention as claim 1, except for the additional claim limitation that the identifying of the input distractor includes generating an input distractor segmentation mask based on the input distractor location using the machine-learning model.
Lee/Huynh further teaches this limitation (Lee teaches at Paragraph 0014 that the determining of the box information of the candidate boxes and the similarity scores of the candidate boxes may include inputting the search region and the template image to a neural network-based image comparison model, and determining the box information of the candidate boxes and the similarity scores of the candidate boxes from an output of the image comparison model. Lee teaches at Paragraph 0074 that the object tracking apparatus may determine a mask according to distractor information and at Paragraph 0080 that the object tracking apparatus may estimate the distractor motion 617 using a neural network-based motion estimation model. Nielson teaches at FIG. 2 and Section 3 that image segmentation is computed using a fast iterative statistical region-growing process. Huynh teaches at Section 1 that its approach is to train an instance segmentation model like Mask-RCNN to detect and segment distractors in a supervised manner. Huynh teaches introducing a novel one-click distractor segmentation network that utilizes a single-click-based approach to segment medium to small distracting objects with high accuracy; at Section 3 that for distractor selection tasks many objects of small size should be easier to choose with one click and that the segmentation module finally outputs multiple binary segmentation masks corresponding to the user click positions; and at Section 3.2 that multiple instances of distractors may share similar categories and appearances).

Re Claim 3: Claim 3 encompasses the same scope of invention as claim 1, except for the additional claim limitation that the detecting of the at least one candidate distractor includes identifying a region within the digital image that corresponds to the input distractor using feature matching based on the input distractor and the region. Lee/Huynh further teaches this limitation (Lee teaches at Paragraph 0014 the neural network-based image comparison model, at Paragraph 0074 the mask determined according to distractor information, and at Paragraph 0080 the neural network-based motion estimation model, as cited above. Nielson teaches at FIG. 2 and Section 3 that image segmentation is computed using a fast iterative statistical region-growing process. Huynh teaches the one-click distractor segmentation network, the Section 3 single-click selection and multiple binary segmentation masks corresponding to the user click positions, and the Section 3.2 observation that multiple instances of distractors may share similar categories and appearances).

Re Claim 4: Claim 4 encompasses the same scope of invention as claim 3, except for the additional claim limitation that the feature matching includes cross-scale feature matching. Huynh further teaches this limitation (Huynh teaches at Section 3.2: "we propose this Click Proposal Network to mine similar regions using cross-scale feature matching and regress the click positions from the high-confident regions. Then we can feed those click coordinates back to our 1C-DSN for masking to obtain the masks of all the similar distractors." Huynh further teaches the one-click distractor segmentation network, the Section 3 single-click selection and segmentation masks, and the Section 3.2 observation noted above).

Re Claim 5: Claim 5 encompasses the same scope of invention as claim 3, except for the additional claim limitation of identifying a candidate distractor location within the digital image by a regression operation as applied to the region. Huynh further teaches this limitation (Huynh teaches at Section 3.2 the Click Proposal Network that mines similar regions using cross-scale feature matching and regresses the click positions from the high-confident regions, feeding those click coordinates back to the 1C-DSN for masking to obtain the masks of all the similar distractors, together with the Section 3 and Section 3.2 teachings noted above).

Re Claim 6: Claim 6 encompasses the same scope of invention as claim 5, except for the additional claim limitation that the detecting of the at least one candidate distractor includes generating a candidate distractor segmentation mask as identifying the at least one candidate distractor based on the candidate distractor location. Huynh further teaches this limitation (per the same Section 3.2 Click Proposal Network, Section 3, and Section 3.2 teachings cited for claims 4 and 5).

Re Claim 7: Claim 7 encompasses the same scope of invention as claim 1, except for the additional claim limitation of generating the edited digital image by removing the input distractor and the candidate distractor from the digital image using an object removal technique implemented using machine learning. Huynh further teaches this limitation (Huynh teaches at Section 1 training an instance segmentation model like Mask-RCNN to detect and segment distractors in a supervised manner; at FIG. 1 the single-click pipeline that detects and masks the distracting object, identifies other similar objects that may also be causing distraction, and then uses popular photo editing tools to remove the visual distractions seamlessly; in the Abstract that manually selecting and removing these small and dense distracting regions can be a laborious and time-consuming task, that the proposed interactive distractor selection method achieves the task with just a single click, and that a transformer-based module can be used to identify more distracting regions similar to the user's click position; and at Section 3.3 that this module performs pairwise comparisons between the generated masks and the initial click and removes any click proposals that generate a mask).
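
Claims 4-6 above turn on Huynh's Click Proposal Network, which mines similar regions by cross-scale feature matching. The following is a minimal sketch of that general idea only; the pyramid strides, threshold, and names are assumptions, and this is not Huynh's actual architecture:

```python
import numpy as np

def match_across_scales(feature_maps, click_feat, threshold=0.7):
    """Correlate a click's feature vector against feature maps at several
    scales and collect high-confidence locations as new click proposals.
    feature_maps: list of (C, H, W) arrays from different pyramid levels."""
    proposals = []
    for level, fmap in enumerate(feature_maps):
        c, h, w = fmap.shape
        flat = fmap.reshape(c, -1)
        flat = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-8)
        q = click_feat / (np.linalg.norm(click_feat) + 1e-8)
        sim = (q @ flat).reshape(h, w)       # cosine similarity map
        ys, xs = np.where(sim >= threshold)  # high-confidence regions
        scale = 2 ** level                   # assumed stride per pyramid level
        proposals += [(int(x) * scale, int(y) * scale) for y, x in zip(ys, xs)]
    return proposals
```

In Huynh's described pipeline, such proposals are fed back to the segmentation network to mask all distractors similar to the initial click; the regression of click positions from high-confidence regions (claim 5) is simplified here to taking thresholded peak coordinates.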
Re Claim 8: Claim 8 encompasses the same scope of invention as claim 1, except for the additional claim limitation that the candidate distractor image features and the input distractor image features are extracted using a machine-learning model. Huynh further teaches this limitation (Huynh teaches at Section 1 training an instance segmentation model like Mask-RCNN to detect and segment distractors in a supervised manner; at FIG. 1 the single-click pipeline; in the Abstract the interactive single-click distractor selection method and the transformer-based module used to identify more distracting regions similar to the user's click position; at Section 3.3 the pairwise comparisons between the generated masks and the initial click; at Section 3 the segmentation module that outputs multiple binary segmentation masks corresponding to the user click positions; and at Section 3.2 that multiple instances of distractors may share similar categories and appearances).

Re Claim 9: Claim 9 encompasses the same scope of invention as claim 1, except for the additional claim limitation that the input is a single input specified using a single set of coordinates. Huynh further teaches this limitation (Huynh teaches the FIG. 1 single-click pipeline, the Abstract's single-click interactive selection, and Section 3's segmentation module that outputs multiple binary segmentation masks corresponding to the user click positions, as cited above).

Re Claim 10: Claim 10 encompasses the same scope of invention as claim 9, except for the additional claim limitation that the input is a single click input using a cursor control or single tap as a gesture received via a user interface. Huynh further teaches this limitation (per the same FIG. 1, Abstract, Section 3, Section 3.2, and Section 3.3 teachings cited for claim 9).

Re Claim 11: Claim 11 parallels claim 1 in apparatus form and is subject to the same rationale of rejection as claim 1. Moreover, Lee further teaches a computing device comprising: a processing device (e.g., processor 910 of FIG. 9); and a computer-readable storage medium storing instructions that, responsive to execution by the processing device, cause the processing device to perform operations [of the method of claim 1] (Lee teaches at FIG. 9 and Paragraphs 0096-0098 that a memory 920 storing program instructions is responsive to execution by the processor 910, causing the processor to perform the operations described in FIGS. 1-8).

Re Claim 12: Claim 12 encompasses the same scope of invention as claim 11, except for the additional claim limitation that the candidate distractor image features and the input distractor image features are extracted using a machine-learning model. Claim 12 parallels claim 8 in apparatus form and is subject to the same rationale of rejection as claim 8.

Re Claim 13: Claim 13 encompasses the same scope of invention as claim 11, except for the additional claim limitation that the generating of the candidate distractor segmentation mask includes identifying a region within the digital image that corresponds to the input distractor using feature matching based on the input distractor and the region. Claim 13 parallels claim 3 in apparatus form and is subject to the same rationale of rejection as claim 3.

Re Claim 14: Claim 14 encompasses the same scope of invention as claim 13, except for the additional claim limitation that the feature matching includes cross-scale feature matching. Claim 14 parallels claim 4 in apparatus form and is subject to the same rationale of rejection as claim 4.

Re Claim 15: Claim 15 encompasses the same scope of invention as claim 13, except for the additional claim limitations of identifying a candidate distractor location within the digital image by a regression operation as applied to the region, with the generating of the candidate distractor segmentation mask based on the candidate distractor location. Claim 15 parallels claim 6 in apparatus form and is subject to the same rationale of rejection as claim 6.

Re Claim 16: Claim 16 encompasses the same scope of invention as claim 11, except for the additional claim limitation that the operations further comprise generating an edited digital image by removing the input distractor and the candidate distractor from the digital image using an object removal technique implemented using machine learning. Claim 16 parallels claim 7 in apparatus form and is subject to the same rationale of rejection as claim 7.

Re Claim 17: Claim 17 parallels claim 1 in the form of a computer program product and is subject to the same rationale of rejection as claim 1. Moreover, Lee further teaches one or more computer-readable storage media that are non-transitory and store instructions that, responsive to execution by a processing device, cause the processing device to perform operations [of the method of claim 1] (Lee teaches at FIG. 9 and Paragraphs 0096-0098 that a memory 920 storing program instructions is responsive to execution by the processor 910, causing the processor to perform the operations described in FIGS. 1-8).
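
Claims 7 and 16 recite removing the selected distractors with a machine-learning object removal technique, which the references describe as masking followed by inpainting. Below is a minimal stand-in using OpenCV's classical inpainting; this is a hedged illustration only, since the cited art contemplates neural inpainting, and the usage line is hypothetical:

```python
import cv2
import numpy as np

def remove_distractors(image_bgr, masks):
    """Union the verified distractor masks and fill the masked pixels by
    inpainting. `masks` is an iterable of binary HxW arrays (1 = remove)."""
    combined = np.zeros(image_bgr.shape[:2], dtype=np.uint8)
    for m in masks:
        combined |= (m.astype(np.uint8) > 0).astype(np.uint8)
    combined *= 255  # cv2.inpaint expects an 8-bit mask, nonzero = hole
    return cv2.inpaint(image_bgr, combined, inpaintRadius=3,
                       flags=cv2.INPAINT_TELEA)

# Hypothetical usage:
# edited = remove_distractors(cv2.imread("photo.jpg"), verified_masks)
```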
Re Claim 18: Claim 18 encompasses the same scope of invention as claim 17, except for the additional claim limitation that the operations further comprise verifying that the candidate distractor corresponds to the input distractor by comparing candidate distractor image features extracted based on the candidate distractor segmentation mask with input distractor image features extracted based on the input distractor segmentation mask. Claim 18 parallels claim 1 in the form of a computer program product and is subject to the same rationale of rejection as claim 1.

Re Claim 19: Claim 19 encompasses the same scope of invention as claim 17, except for the additional claim limitation that the candidate distractor location is indicated using a single set of coordinates with respect to the digital image. Claim 19 parallels claim 9 in the form of a computer program product and is subject to the same rationale of rejection as claim 9.

Re Claim 20: Claim 20 encompasses the same scope of invention as claim 17, except for the additional claim limitation that the location of the input distractor is specified responsive to a user input received via a user interface specifying a single set of coordinates with respect to the digital image. Claim 20 parallels claim 9 in the form of a computer program product and is subject to the same rationale of rejection as claim 9.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIN CHENG WANG, whose telephone number is (571) 272-7665. The examiner can normally be reached Mon-Fri 8:00-5:00. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, King Poon, can be reached at 571-270-0728. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JIN CHENG WANG/
Primary Examiner, Art Unit 2617

Prosecution Timeline

Dec 04, 2023: Application Filed
Jun 23, 2025: Non-Final Rejection — §103
Aug 08, 2025: Response Filed
Aug 20, 2025: Final Rejection — §103
Oct 27, 2025: Request for Continued Examination
Nov 05, 2025: Response after Non-Final Action
Feb 17, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12594883: DISPLAY DEVICE FOR DISPLAYING PATHS OF A VEHICLE (granted Apr 07, 2026; 2y 5m to grant)
Patent 12597086: Tile Region Protection in a Graphics Processing System (granted Apr 07, 2026; 2y 5m to grant)
Patent 12592012: METHOD, APPARATUS, ELECTRONIC DEVICE AND READABLE MEDIUM FOR COLLAGE MAKING (granted Mar 31, 2026; 2y 5m to grant)
Patent 12586270: GENERATING AND MODIFYING DIGITAL IMAGES USING A JOINT FEATURE STYLE LATENT SPACE OF A GENERATIVE NEURAL NETWORK (granted Mar 24, 2026; 2y 5m to grant)
Patent 12579709: IMAGE SPECIAL EFFECT PROCESSING METHOD AND APPARATUS (granted Mar 17, 2026; 2y 5m to grant)

Based on the 5 most recent grants; study what changed to get past this examiner.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 59%
With Interview: 69% (+10.3%)
Median Time to Grant: 3y 7m
PTA Risk: High

Based on 832 resolved cases by this examiner. Grant probability is derived from the career allow rate.
