DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/09/2025 has been entered.
Status of Claims
Claims 1-3, 5-13, and 15-21 are pending. Claims 4 and 14 are canceled.
Response to Arguments
Applicant's arguments, see pages 11-15, filed 12/09/2025, with respect to the rejections of Claims 1-3, 5-13, and 15-21 under 35 U.S.C. 103 have been fully considered but are moot because Applicant's amendments have altered the scope of the claims and therefore necessitated the new grounds of rejection presented below. Additionally, Applicant argues that Han is completely silent on "pasting, based on background information around the selected target, a non-target sub-area not including the selected target into the deleted sub-area." Examiner respectfully disagrees. For further clarification, Han, Paras. 87-88, teaches generating a replacement image corresponding to the object-deleted area of the reference image and synthesizing or adding the generated replacement image to the object-deleted area to create a new image, wherein the replacement image synthesized to the object-deleted area is generated according to the corresponding portion of the object and background by using the color of the surroundings, i.e., pasting a non-target sub-area not including the target (the replacement image area) into the deleted sub-area in the first image to generate the sample image, based on background information around the selected target (the color of the surroundings).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US 20230196801 A1) in view of Zhao et al. (US 20220067375 A1), Xin (US 20240029256 A1), and Han (US 20210192751 A1).
Regarding Claim 1, Li teaches "A target detection method, comprising: inputting a to-be-detected image into a backbone included in a target detection model to determine that an output of the backbone is a feature corresponding to the to-be-detected image"; (Li, Abstract, teaches an object detection method wherein features are extracted from generated images by a backbone network, i.e., target detection comprising inputting an image into a backbone model to output features of the image);
"inputting the feature corresponding to the to-be-detected image into a prediction head included in the target detection model to determine that an output of the prediction head is target information included in the to-be-detected image"; (Li, Para. 79, teaches that 3D object prediction is performed by prediction heads after all the features are generated, i.e., output target information of the image from the prediction head after inputting the generated features of the image into a prediction head);
"wherein: the target information included in the to-be-detected image includes at least one of position information, size information, orientation information, depth information, or blocking information of a target in the to-be-detected image"; (Li, Para. 21, teaches determining spatial features of the predicted 3D object comprises determining the depth of the 3D object based on the height and width of the 3D object, i.e., the target information includes depth information of a target in the image).
However, Li does not explicitly teach "and the target detection model is trained and obtained based on a sample image; wherein the sample image is generated by: recognizing one or more targets included in a first image based on an example segmentation method; selecting, according to a predetermined probability, a target of the one or more targets; identifying a target sub-area including the selected target in the first image; deleting the target sub-area including the selected target from the first image to generate a deleted sub-area; and pasting, based on background information around the selected target, a non-target sub-area not including the selected target into the deleted sub-area in the first image to generate the sample image".
In an analogous field of endeavor, Zhao teaches "and the target detection model is trained and obtained based on a sample image"; (Zhao, Para. 36, teaches training an object detection model by using training data sets that include a training image slice containing an object detection box and a training image slice that does not contain an object detection box in order to learn the background areas that do not contain an object, i.e., target detection model is trained on a sample image including a target sub-area being the object detection box and a non-target sub-area being the slice that does not contain the object detection box).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Li by including the training of a target detection model using an image that has a target area and a non-target area, as taught by Zhao. One of ordinary skill in the art would be motivated to combine the references since it reduces false detections (Zhao, Para. 36, teaches the motivation of combination to be to reduce false detections of background areas that do not contain an object detection box).
However, the combination of references of Li in view of Zhao does not explicitly teach “wherein the sample image is generated by: recognizing one or more targets included in a first image based on an example segmentation method; selecting, according to a predetermined probability, a target of the one or more targets; identifying a target sub-area including the selected target in the first image; deleting the target sub-area including the selected target from the first image to generate a deleted sub-area; and pasting, based on background information around the selected target, a non-target sub-area not including the selected target into the deleted sub-area in the first image to generate the sample image".
In an analogous field of endeavor, Xin teaches "wherein the sample image is generated by: recognizing one or more targets included in a first image based on an example segmentation method"; (Xin, Paras. 4, 38, and 47, teaches determining a type and position of at least one target object in the region using an instance segmentation model for processing the image data wherein the instance segmentation model may be capable of target object recognition and tracking when there are different possible views and/or features and wherein the instance segmentation algorithm may identify certain objects in the image and the human trainer may select the type from the predetermined types, i.e., sample image generated by recognizing one or more targets included in an image based on an example segmentation method being the instance segmentation);
"selecting, according to a predetermined probability, a target of the one or more targets"; (Xin, Paras. 7-8, teaches selecting one target object from the plurality of target objects of the same type that has the highest prediction probability according to the instance segmentation model or selecting one target object from the adjacent target objects that has the highest prediction probability of the different types according to the instance segmentation model, i.e., selecting a target of the one or more targets according to a predetermined probability being the probability according to the instance segmentation model).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Li and Zhao such that a new sample image is generated by recognizing targets in an image based on a segmentation method and selecting one of the targets according to a predetermined probability, as taught by Xin. One of ordinary skill in the art would be motivated to combine the references since it improves imaging outcomes (Xin, Para. 3, teaches the motivation of combination to be to improve the outcome of patient imaging).
However, the combination of references of Li in view of Zhao and Xin does not explicitly teach “identifying a target sub-area including the selected target in the first image; deleting the target sub-area including the selected target from the first image to generate a deleted sub-area; and pasting, based on background information around the selected target, a non-target sub-area not including the selected target into the deleted sub-area in the first image to generate the sample image".
In an analogous field of endeavor, Han teaches "identifying a target sub-area including the selected target in the first image"; (Han, Para. 7, teaches a controller generating images and recognizing one or more objects included in the images, i.e., generate sample image by identifying a target sub-area including a target in the image as the recognition of an object in the image);
"deleting the target sub-area including the selected target from the first image to generate a deleted sub-area"; (Han, Para. 88, teaches deleting the moving object from the reference image, i.e., deleting the target sub-area including the target from the first image as the deleted moving object wherein a deleted sub-area would be generated);
"and pasting, based on background information around the selected target, a non-target sub-area not including the selected target into the deleted sub-area in the first image to generate the sample image"; (Han, Paras. 87-88, teaches generating a replacement image corresponding to the object-deleted area of the reference image and synthesizing or adding the generated replacement image to the object-deleted area of the reference image to create a new image wherein the replacement image which is synthesized to the object-deleted area is generated according to the corresponding portion of the object and background by using the color of the surroundings, i.e., pasting a non-target sub-area not including the target being the replacement image area into the deleted sub-area in the first image to generate the sample image based on background information around the selected target being the color of the surroundings).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Li, Zhao, and Xin by including the deletion of a target area in an image and the pasting of a non-target area into the deleted area to generate the sample image, as taught by Han. One of ordinary skill in the art would be motivated to combine the references since it increases user interest (Han, Para. 15, teaches the motivation of combination to be to increase the user's interest by acquiring an image including only desired objects).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
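For illustration of the sample-image generation mapped above, the following is a minimal, hypothetical sketch of the pipeline (instance segmentation, selection according to a predetermined probability, deletion of the target sub-area, and filling from surrounding background). The helper detect_instances and the probability p_select are illustrative assumptions, not the method of Li, Zhao, Xin, or Han; OpenCV inpainting is used only to approximate pasting based on background information around the selected target.

import random
import numpy as np
import cv2

def generate_sample_image(first_image, detect_instances, p_select=0.5):
    # detect_instances: assumed callable returning one binary mask (H x W, uint8,
    # values 0/255) per target recognized by an instance segmentation model.
    masks = detect_instances(first_image)
    delete_mask = np.zeros(first_image.shape[:2], dtype=np.uint8)
    for mask in masks:
        # Select each recognized target according to a predetermined probability.
        if random.random() < p_select:
            delete_mask |= mask  # mark the target sub-area for deletion
    # Fill the deleted sub-area from the colors of the surrounding background,
    # approximating pasting of a non-target sub-area based on background information.
    return cv2.inpaint(first_image, delete_mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)

Under this reading, the unselected targets together with the background-filled region form the sample image used for training.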
Claim 12 recites a system with elements corresponding to the steps recited in Claim 1. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Li, Zhao, Xin, and Han references, presented in the rejection of Claim 1, apply to this claim. Finally, the combination of the Li, Zhao, Xin, and Han references discloses a processor and a memory to execute the method (for example, see Li, Paragraph 30).
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Xin in view of Han and Hassan et al. (US 20230077207 A1).
Regarding Claim 2, Xin teaches "A target detection model training method comprising: recognizing one or more targets included in a first image based on an example segmentation method"; (Xin, Paras. 4, 38, and 47, teaches determining a type and position of at least one target object in the region using an instance segmentation model for processing the image data wherein the instance segmentation model may be capable of target object recognition and tracking when there are different possible views and/or features and wherein the instance segmentation algorithm may identify certain objects in the image and the human trainer may select the type from the predetermined types, i.e., sample image generated by recognizing one or more targets included in an image based on an example segmentation method being the instance segmentation);
"selecting, according to a predetermined probability, a target of the one or more targets"; (Xin, Paras. 7-8, teaches selecting one target object from the plurality of target objects of the same type that has the highest prediction probability according to the instance segmentation model or selecting one target object from the adjacent target objects that has the highest prediction probability of the different types according to the instance segmentation model, i.e., selecting a target of the one or more targets according to a predetermined probability being the probability according to the instance segmentation model).
However, Xin does not explicitly teach "identifying a target sub-area including the selected target in the first image; deleting the target sub-area including the selected target from the first image to generate a deleted sub-area; pasting, based on background information around the selected target, a non-target sub-area not including the selected target into the deleted sub-area in the first image to generate a first sample image; inputting the first sample image into a backbone included in a target detection model to determine that an output of the backbone is a feature corresponding to the first sample image; inputting the feature corresponding to the first sample image into a prediction head included in the target detection model to determine a first sub-loss; updating parameters of the backbone and the prediction head based on the first sub-loss".
In an analogous field of endeavor, Han teaches "identifying a target sub-area including the selected target in the first image"; (Han, Para. 7, teaches a controller generating images and recognizing one or more objects included in the images, i.e., identifying a target sub-area including a target in the image as the recognition of an object in the image);
"deleting the target sub-area including the selected target from the first image to generate a deleted sub-area"; (Han, Para. 88, teaches deleting the moving object from the reference image, i.e., deleting the target sub-area including the target from the first image as the deleted moving object wherein a deleted sub-area would be generated);
"pasting, based on background information around the selected target, a non-target sub-area not including the selected target into the deleted sub-area in the first image to generate a first sample image"; (Han, Paras. 87-88, teaches generating a replacement image corresponding to the object-deleted area of the reference image and synthesizing or adding the generated replacement image to the object-deleted area of the reference image to create a new image wherein the replacement image which is synthesized to the object-deleted area is generated according to the corresponding portion of the object and background by using the color of the surroundings, i.e., pasting a non-target sub-area not including the target being the replacement image area into the deleted sub-area in the first image to generate the sample image based on background information around the selected target being the color of the surroundings).
The proposed combination, as well as the motivation for combining the Xin and Han references presented in the rejection of Claim 1, applies to Claim 2.
However, the combination of references of Xin in view of Han does not explicitly teach "inputting the first sample image into a backbone included in a target detection model to determine that an output of the backbone is a feature corresponding to the first sample image; inputting the feature corresponding to the first sample image into a prediction head included in the target detection model to determine a first sub-loss; updating parameters of the backbone and the prediction head based on the first sub-loss".
In an analogous field of endeavor, Hassan teaches "inputting the first sample image into a backbone included in a target detection model to determine that an output of the backbone is a feature corresponding to the first sample image"; (Hassan, Para. 7, teaches processing an image using a backbone network to output a set of features, i.e., input the first sample image into a backbone of the target detection model to determine an output of the backbone is a feature corresponding to the image);
"inputting the feature corresponding to the first sample image into a prediction head included in the target detection model to determine a first sub-loss"; (Hassan, Para. 7, teaches inputting the set of features to at least one prediction head wherein a loss function is used, i.e., inputting the feature of the image into a prediction head to determine a first sub-loss corresponding to each prediction head);
"updating parameters of the backbone and the prediction head based on the first sub-loss"; (Hassan, Para. 7, teaches adjusting parameters of the backbone network and at least one prediction head using a loss function, i.e., updating parameters of the backbone and prediction head based on the first sub-loss).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Xin and Han by including the use of a backbone network to extract a feature of the image, inputting that feature into a prediction head to determine a loss, and updating the model based on the loss, as taught by Hassan. One of ordinary skill in the art would be motivated to combine the references since it improves accuracy of the model (Hassan, Para. 58, teaches the motivation of combination to be to improve the accuracy of the classifier).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
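For clarity, the training steps recited in claim 2 and mapped to Hassan above can be summarized by the following hedged, PyTorch-style sketch; the module and criterion names are illustrative assumptions, not Hassan's implementation.

import torch

def training_step(backbone, head, criterion, optimizer, first_sample_image, target):
    feature = backbone(first_sample_image)          # output of the backbone is the feature
    prediction = head(feature)                      # prediction head consumes that feature
    first_sub_loss = criterion(prediction, target)  # determine the first sub-loss
    optimizer.zero_grad()
    first_sub_loss.backward()                       # gradients flow to backbone and head
    optimizer.step()                                # update parameters of both
    return first_sub_loss.item()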
Claims 3, 5, 13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Xin in view of Han, Hassan, and Alhaija et al., "Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes."
Regarding Claim 3, the combination of references of Xin in view of Han and Hassan does not explicitly teach "The method of claim 2, wherein pasting the non-target sub-area not including the target is performed based on depth information of the target sub-area and the non-target sub-area".
In an analogous field of endeavor, Alhaija teaches "The method of claim 2, wherein pasting the non-target sub-area not including the target is performed based on depth information of the target sub-area and the non-target sub-area"; (Alhaija, Figure 2 and Section 3, Data Augmentation Pipeline, teaches the overlaying of a 3D target model on top of a real scene to generate an augmented image, i.e., pasting the target sub-area including the target and the non-target sub-area not including the target into a first image to generate a sample image, wherein 3D locations and poses are used to place the car models in the scene, an environment map of the scene is used, and a depth-blur operation is used to match the depth-of-field of the camera, i.e., the target sub-area and the non-target sub-area are pasted using depth information, wherein the sub-areas correspond to the depth information of the image and the pasted image is the augmented image or first sample image).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Xin, Han, and Hassan by basing the pasting of the non-target sub-area on depth information, as taught by Alhaija. One of ordinary skill in the art would be motivated to combine the references since it allows for efficient and accurate augmentation of the image (Alhaija, Section 1, Introduction, teaches the motivation of combination to be to efficiently and highly accurately augment images while keeping the full realism of the background).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
Regarding Claim 5, the combination of references of Xin in view of Han, Hassan, and Alhaija teaches "The method of claim 2, wherein pasting the non-target sub-area not including the target is performed based on a color of the target corresponding to the target sub-area"; (Alhaija, Figure 2 and Section 3 Data Augmentation Pipeline and Section 4 Evaluation, teaches the overlaying of the 3D target model on top of a real scene to generate an augmented image, i.e., pasting the target sub-area including the target and the non-target sub-area not including the target to an image to generate the sample image being the augmented image, wherein car color is chosen randomly to increase variety in the data, i.e., color distribution of the target in the sample set, and color shifts are applied as well as color curve and Gamma transformations are used to better match the color statistics and contrast of the real data, i.e., color of the target corresponding to the target sub-area).
The proposed combination, as well as the motivation for combining the Xin, Han, Hassan, and Alhaija references presented in the rejection of Claim 3, applies to claim 5. Thus, the method recited in claim 5 is met by Xin in view of Han, Hassan, and Alhaija.
Claim 13 recites a system with elements corresponding to the steps recited in Claim 3. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Xin, Han, Hassan, and Alhaija references, presented in the rejection of Claim 3, apply to this claim. Finally, the combination of the Xin, Han, Hassan, and Alhaija references discloses a processor and a memory to execute the method (for example, see Han, Paragraph 46).
Claim 15 recites a system with elements corresponding to the steps recited in Claim 5. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Xin, Han, Hassan, and Alhaija references, presented in the rejection of Claim 3, apply to this claim. Finally, the combination of the Xin, Han, Hassan, and Alhaija references discloses a processor and a memory to execute the method (for example, see Han, Paragraph 46).
Claims 6-11 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Xin in view of Han, Hassan, and Zou et al. (US 20220004801 A1).
Regarding Claim 6, the combination of references of Xin in view of Han and Hassan does not explicitly teach "The method of claim 2, further comprising, after updating the parameters of the backbone and the prediction head based on the first sub-loss: inputting the feature corresponding to the first sample image into a feature discriminator included in the target detection model to determine a second sub-loss value; and updating parameters of the feature discriminator based on the second sub-loss value".
In an analogous field of endeavor, Zou teaches "The method of claim 2, further comprising, after updating the parameters of the backbone and the prediction head based on the first sub-loss: inputting the feature corresponding to the first sample image into a feature discriminator included in the target detection model to determine a second sub-loss value"; (Zou, Para. 9, teaches inputting the feature of a sample image into a discriminator network to calculate a loss value, i.e., input the feature corresponding to the first sample image into a feature discriminator to determine a second sub-loss);
"and updating parameters of the feature discriminator based on the second sub-loss value"; (Zou, Para. 9, teaches adjusting parameters of the discriminator network based on the second loss value).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Xin, Han, and Hassan by including the use of a feature discriminator that determines a loss value from the input feature of the sample image, and updating the model based on that loss, as taught by Zou. One of ordinary skill in the art would be motivated to combine the references since it improves model precision (Zou, Para. 51, teaches the motivation of combination to be to improve the precision of the object pose estimation).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
Regarding Claim 7, the combination of references of Xin in view of Han, Hassan, and Zou teaches "The method of claim 6, further comprising, after updating the parameters of the feature discriminator based on the second sub-loss value: determining a second sample image"; (Zou, Para. 9, teaches obtaining a next sample image in response to the loss values not meeting a threshold and repeating the actions of the training method, i.e., determining a second sample image after updating the parameters of the feature discriminator based on the second sub-loss);
"inputting the second sample image into the backbone with updated parameters to determine that an output of the backbone is a feature corresponding to the second sample image"; (Hassan, Fig. 7B and Paras. 7 and 85, teaches that the method can repeatedly adjust parameters in the network so as to minimize a loss until a stop criterion is met wherein processing an image uses a backbone network to output a set of features, i.e., input the second sample image into a backbone of the target detection model which has updated parameters to determine an output of the backbone is a feature corresponding to the second image);
"inputting the feature corresponding to the second sample image into the prediction head with updated parameters to determine a third sub-loss"; (Hassan, Fig. 7B and Paras. 7 and 85, teaches inputting the feature of a second received image into a prediction head with adjusted parameters wherein a loss function is used, i.e., input the feature of the second received sample image into the prediction head with now updated parameters to determine a third sub-loss);
"and updating parameters of the backbone and the prediction head based on the third sub-loss"; (Hassan, Fig. 7B and Paras. 7 and 85, teaches adjusting parameters of the network using a loss function, i.e., use the third-sub loss from the repeated cycle to update the parameters of the backbone and prediction head).
The proposed combination, as well as the motivation for combining the Xin, Han, Hassan, and Zou references presented in the rejection of claims 2 and 6, applies to claim 7. Thus, the method recited in claim 7 is met by Xin in view of Han, Hassan, and Zou.
Regarding Claim 8, the combination of references of Xin in view of Han, Hassan, and Zou teaches "The method of claim 7, further comprising, after updating the parameters of the backbone and the prediction head based on the third sub-loss: inputting the feature corresponding to the second sample image into the feature discriminator with updated parameters to determine a fourth sub-loss"; (Zou, Para. 9, teaches repeating the actions of the training method on a next image sample wherein a feature corresponding to the image sample is input into the discriminator network that has adjusted parameters to determine a loss value, i.e., determine fourth sub-loss from a second sample feature input into the feature discriminator after updating the parameters from the previous actions);
"updating the parameters of the feature discriminator based on the fourth sub-loss"; (Zou, Para. 9, teaches adjusting parameters of the discriminator network based on the loss values, i.e., update parameters of the discriminator based on the fourth sub-loss);
"and training repeatedly the backbone, the prediction head, and the feature discriminator"; (Zou, Para. 9, teaches repeating the actions of the training method, i.e., repeatedly training the feature discriminator).
The proposed combination, as well as the motivation for combining the Xin, Han, Hassan, and Zou references presented in the rejection of claim 6, applies to claim 8. Thus, the method recited in claim 8 is met by Xin in view of Han, Hassan, and Zou.
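Read together, claims 7 and 8 as mapped above describe a repeated, alternating training scheme. The following sketch is offered only as one plausible reading; the names, the detach call, and the loss choices are assumptions rather than Zou's or Hassan's implementations.

import torch

def train_alternating(backbone, head, discriminator, det_criterion, adv_criterion,
                      opt_model, opt_disc, data_loader):
    for sample_image, target, is_paste in data_loader:
        # (1) Update the backbone and prediction head on the detection sub-loss.
        feature = backbone(sample_image)
        det_loss = det_criterion(head(feature), target)
        opt_model.zero_grad()
        det_loss.backward()
        opt_model.step()
        # (2) Update the feature discriminator on its own sub-loss, using the
        # detached backbone feature so only discriminator parameters change.
        disc_out = discriminator(backbone(sample_image).detach())
        disc_loss = adv_criterion(disc_out, is_paste.float())
        opt_disc.zero_grad()
        disc_loss.backward()
        opt_disc.step()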
Regarding Claim 9, the combination of references of Xin in view of Han, Hassan, and Zou teaches "The method of claim 8, wherein: the first sub-loss includes a minimal loss function corresponding to the backbone and the prediction head"; (Hassan, Para. 85, teaches adjusting the parameters of the network to minimize the measure of difference of the cost function between predicted output and ground truth, i.e., minimal loss function corresponding to the backbone network and prediction head);
"the second sub-loss is determined based on an output of the feature discriminator after a natural pixel is inputted into the backbone, in response to the output of the backbone being used as an input of the feature discriminator, and an output of the feature discriminator after a paste pixel is inputted into the backbone, and in response to the output of the backbone being used as an input of the feature discriminator"; (Zou, Para. 9, teaches inputting a source domain image sample that is a simulated image generated through rendering based on object pose parameters and a target domain image sample that is a real image into a feature extraction network, i.e., input natural pixels and paste pixels into the backbone, wherein the extracted matching feature is input into the discriminator network that calculates a loss value, i.e., the output of the feature discriminator after natural and paste pixels are input into the backbone which has an output that is input to the discriminator is used to determine the second sub-loss);
"and the third sub-loss is determined based on the minimal loss function of the backbone in response to the minimal loss function corresponding to the backbone and the prediction head"; (Hassan, Fig. 7B and Paras. 7 and 85, teaches computing a classification loss to minimize measure of difference between predicted output and ground truth wherein the loss function corresponds to the backbone network and the prediction head, i.e., third sub-loss determined based on minimal loss function of the backbone that corresponds to the backbone and prediction head);
"and the paste pixel being input to the backbone and an output of the feature discriminator after the paste pixel is inputted into the backbone, in response to an output of the backbone being used as an input of the feature discriminator"; (Zou, Para. 9, teaches the feature extraction network outputting a matching feature that is input to the discriminator network wherein the source domain image and target domain image is input to the feature extraction network and output of the discriminator network after being input into the feature extraction network, i.e., the paste pixel is input to the backbone network wherein the output of the backbone network is used as input of the feature discriminator).
The proposed combination, as well as the motivation for combining the Xin, Han, Hassan, and Zou references presented in the rejection of claims 2 and 6, applies to claim 9. Thus, the method recited in claim 9 is met by Xin in view of Han, Hassan, and Zou.
Regarding Claim 10, the combination of references of Xin in view of Han, Hassan, and Zou teaches "The method of claim 9, wherein determining the second sub-loss based on the output of the feature discriminator after the natural pixel is inputted into the backbone, in response to the output of the backbone being used as the input of the feature discriminator, and the output of the feature discriminator after the paste pixel is inputted into the backbone, in response to the output of the backbone being used as the input of the feature discriminator includes: determining a logarithmic value expectation of the output of the feature discriminator after the natural pixel is inputted into the backbone and in response to the output of the backbone being used as the input of the feature discriminator"; (Zou, Paras. 9 and 89-91, teaches the calculation of the loss value uses a logarithmic function wherein the loss value is calculated from the output of the discriminator network after the source domain image and target domain image is input into the feature extraction network that has an output that is input into the discriminator network, i.e., a logarithmic value expectation is determined from the output of the discriminator network after natural pixels are input to the backbone network which has an output that is input to the feature discriminator);
"and performing summation on the logarithmic value expectation and a logarithmic value expectation of the output of the feature discriminator after the paste pixel is inputted into the backbone and in response to the output of the backbone being used as the input of the feature discriminator to determine a summation result as the second sub-loss"; (Zou, Paras. 9 and 89-91, teaches the summation of logarithmic functions of the discrimination result of the matching feature of the target domain image sample and the matching feature of the source domain image sample, i.e., performing summation on the logarithmic value expectation of the natural pixel and the logarithmic value expectation of the paste pixel wherein the output of the feature discriminator is used after the paste pixel is input to the backbone network that has an output that is used as an input to the feature discriminator, wherein the summation of the logarithmic functions is used to calculate the second loss value, i.e., determine a summation result as the second sub-loss).
The proposed combination, as well as the motivation for combining the Xin, Han, Hassan, and Zou references presented in the rejection of claim 6, applies to claim 10. Thus, the method recited in claim 10 is met by Xin in view of Han, Hassan, and Zou.
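The summation of logarithmic value expectations recited in claim 10, as mapped to Zou's summation of logarithmic functions, corresponds to a standard adversarial discriminator objective. A hedged LaTeX reconstruction follows; the symbols F (backbone) and D (feature discriminator) are illustrative, not quoted from the claims or Zou:

\mathcal{L}_{2} = \mathbb{E}_{x \sim p_{\mathrm{natural}}}\left[\log D(F(x))\right] + \mathbb{E}_{\tilde{x} \sim p_{\mathrm{paste}}}\left[\log\left(1 - D(F(\tilde{x}))\right)\right]

where x denotes natural pixels, \tilde{x} denotes paste pixels, and the sum of the two expectations gives the second sub-loss.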
Regarding Claim 11, the combination of references of Xin in view of Han, Hassan, and Zou teaches "The method of claim 9, wherein determining the third sub-loss based on the minimal loss function of the backbone in response to inputting the minimal loss function corresponding to the backbone and the prediction head and the paste pixel into the backbone and the output of the feature discriminator after the paste pixel is inputted into the backbone and in response to the output of the backbone being used as the input of the feature discriminator includes: corresponding to the backbone and the prediction head, and when pasting pixels into the backbone, determining the minimal loss function of the backbone in response to inputting the minimal loss function corresponding to the backbone and the prediction head and the paste pixel into the backbone"; (Zou, Fig. 7 and Para. 9, teaches repeating the actions of the training method if the loss values do not meet the threshold, i.e., inputting the minimal loss function corresponding to the backbone network and prediction head and paste pixel back into the backbone network, wherein another loss value is then calculated for the feature extraction network corresponding to the feature extraction network wherein source domain image and target domain image is input to the network, i.e., determining the minimal loss function of the backbone corresponding to the backbone and prediction head and where paste pixels are input to the backbone network);
"and performing summation on the logarithmic value expectation of the output of the feature discriminator to determine the summation result as the third sub-loss"; (Zou, Paras. 9 and 89-91, teaches the calculation of loss as the summation of logarithmic functions of the discrimination result of the matching feature of the target domain image sample and the matching feature of the source domain image sample, i.e., performing summation on the logarithmic value expectation of the feature discriminator output in order to calculate the now third sub-loss as that summation result).
The proposed combination, as well as the motivation for combining the Xin, Han, Hassan, and Zou references presented in the rejection of claim 6, applies to claim 11. Thus, the method recited in claim 11 is met by Xin in view of Han, Hassan, and Zou.
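On one common adversarial reading of the claim 11 mapping above, offered only as an illustration and not as the claimed or cited formulation, the third sub-loss combines the loss function of the backbone F and prediction head H with a logarithmic value expectation of the discriminator output on paste pixels:

\mathcal{L}_{3} = \mathcal{L}_{\mathrm{det}}(F, H) + \mathbb{E}_{\tilde{x} \sim p_{\mathrm{paste}}}\left[\log D(F(\tilde{x}))\right]

where \mathcal{L}_{\mathrm{det}} denotes the minimal loss function corresponding to the backbone and the prediction head.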
Claim 16 recites a system with elements corresponding to the steps recited in Claim 6. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Xin, Han, Hassan, and Zou references, presented in the rejection of Claim 6, apply to this claim. Finally, the combination of the Xin, Han, Hassan, and Zou references discloses a processor and a memory to execute the method (for example, see Han, Paragraph 46).
Claim 17 recites a system with elements corresponding to the steps recited in Claim 7. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Xin, Han, Hassan, and Zou references, presented in the rejection of Claim 6, apply to this claim. Finally, the combination of the Xin, Han, Hassan, and Zou references discloses a processor and a memory to execute the method (for example, see Han, Paragraph 46).
Claim 18 recites a system with elements corresponding to the steps recited in Claim 8. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Xin, Han, Hassan, and Zou references, presented in the rejection of Claim 6, apply to this claim. Finally, the combination of the Xin, Han, Hassan, and Zou references discloses a processor and a memory to execute the method (for example, see Han, Paragraph 46).
Claim 19 recites a system with elements corresponding to the steps recited in Claim 9. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Xin, Han, Hassan, and Zou references, presented in the rejection of Claim 6, apply to this claim. Finally, the combination of the Xin, Han, Hassan, and Zou references discloses a processor and a memory to execute the method (for example, see Han, Paragraph 46).
Claim 20 recites a system with elements corresponding to the steps recited in Claim 10. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Xin, Han, Hassan, and Zou references, presented in the rejection of Claim 6, apply to this claim. Finally, the combination of the Xin, Han, Hassan, and Zou references discloses a processor and a memory to execute the method (for example, see Han, Paragraph 46).
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Xin in view of Han, Hassan, and Kim et al. (US 20190031105 A1).
Regarding Claim 21, the combination of references of Xin in view of Han and Hassan does not explicitly teach "The method of claim 2, wherein the non-target sub-area includes one or more of a road surface, a vegetation, and a building".
In an analogous field of endeavor, Kim teaches "The method of claim 2, wherein the non-target sub-area includes one or more of a road surface, a vegetation, and a building"; (Kim, FIG. 14 and Paras. 372 and 505, teaches a secondary area of an image in which a tree, building, and an oncoming lane are present, i.e., non-target sub-area including a road surface, vegetation, and a building).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Xin, Han, and Hassan by including the area of the image containing a road, vegetation, and a building, as taught by Kim. One of ordinary skill in the art would be motivated to combine the references since displaying such objects improves safety (Kim, Para. 594, teaches the motivation of combination to be to display objects to improve safety).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW STEVEN BUDISALICH whose telephone number is (703)756-5568. The examiner can normally be reached Monday - Friday 8:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amandeep Saini can be reached on (571) 272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANDREW S BUDISALICH/Examiner, Art Unit 2662
/AMANDEEP SAINI/Supervisory Patent Examiner, Art Unit 2662