Last updated: July 17, 2026
Application No. 18/620,283
TRAINING METHOD OF OBJECT DETECTION MODEL, OBJECT DETECTION METHOD, APPARATUS AND DEVICE

Final Rejection §101§103
Filed
Mar 28, 2024
Priority
Mar 31, 2023 — CN 202310342221.6
Examiner
PARK, EDWARD
Art Unit
2675
Tech Center
2600 — Communications
Assignee
Beijing Zitiao Network Technology Co., Ltd.
OA Round
2 (Final)
Interview Optional

— +18.0% interview lift. Examiner has a relatively high allowance rate (82%); +18.0% interview lift. A written response may suffice.
Based on 717 resolved cases, 2023–2026
Examiner Intelligence

PARK, EDWARD View full profile →
Grants 82% — above average
Career Allowance Rate
589 granted / 717 resolved
+20.1% vs TC avg
Strong +18% interview lift
Without
With
+18.0%
Interview Lift
resolved cases with interview
Typical timeline
2y 8m
Avg Prosecution
29 currently pending
Career history
747
Total Applications
across all art units
Statute-Specific Performance

§101
2.6%
-37.4% vs TC avg
§103
87.3%
+47.3% vs TC avg
§102
4.1%
-35.9% vs TC avg
§112
0.7%
-39.3% vs TC avg
Black line = Tech Center average estimate • Based on career data from 717 resolved cases
Office Action

§101 §103
DETAILED ACTION

Contents
Notice of Pre-AIA  or AIA  Status	2
Response to Arguments	3
Claim Rejections - 35 USC § 101	3
Claim Rejections - 35 USC § 103	4
Allowable Subject Matter	18
Conclusion	20


Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . 


This action is responsive to applicant’s claim set received on 3/28/24.  Claims 1-20 are currently pending.

Response to Amendment
This action is responsive to applicant’s amendment and remarks received on 4/20/26.  Claims 1-20 are currently pending.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 10, 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.




Claim Rejections - 35 USC § 101
Applicant rebuttal is relatively conclusory.  The applicant asserts that claim 1 and the independent claims are eligible subject matter because the limitations are applied to object detection, uses neural networks and cannot be performed mentally (see pg. 11-13).   However, claim 1 still cites generating pseudo labels, calculating losses and updating models.  These are mathematical operations and the newly added limitation does not clearly recite how the residual-network structure improves computer functionality or object detection technology.  Thus, the 101 rejection is maintained.


35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-2, 10-11, 15-16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter as follows.  Regarding claims 1, 10, 15, the claims are directed to an abstract idea, namely mathematical operation and information processing.  The claims are not integrated into a practical application and the claims lack an inventive concept.  Furthermore claims 2, 11, and 16 are also directed to an abstract idea, specifically evaluating confidence data against thresholds to determine label retention.  The dependent claims do not integrate the abstract idea into a practical application and does not recite an inventive concept.  Thus, all of the listed claims are considered non-statutory subject matter.





Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

           The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimedinvention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. 




Claims 1-2 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al (ICLR: “UNBIASED TEACHER FOR SEMI-SUPERVISED OBJECT DETECTION”) in view of Banitalebi-Dehkordi (ICCV: “Knowledge Distillation for Low-Power Object Detection: A Simple Technique and Its Extensions for Training Compact Models Using Unlabeled Data”).
Regarding claim 1, Liu teaches a training method of an object detection model, comprising: acquiring an input image, and determining an object pseudo label of the input image based on an object detection model (see section 3, 3.2; label images …….Figure 3: Overview of Unbiased Teacher. Unbiased Teacher consists of two stages. Burn-In: we first train the object detector using available labeled data. Teacher-Student Mutual Learning consists of two steps. Student Learning: the fixed teacher generates pseudo-labels to train the Student, while Teacher and Student are given weakly and strongly augmented inputs, respectively. Teacher Refinement: the knowledge that the Student learned is then transferred to the slowly progressing Teacher via exponential moving average (EMA) on network weights. When the detector is trained until converge in the Burn-In stage, we switch to the Teacher-Student Mutual Learning stage. the Teacher generates pseudo-labels to train the Student, and the Student updates the knowledge it learned back to the Teacher; hence, the pseudo-labels used to train the Student itself are improved. Lastly, there exists class-imbalance and foreground-background imbalance problems in object detection, which impedes the effectiveness of semi-supervised techniques of image classification (e.g., pseudo-labeling) being used directly on SS-OD. Therefore, in Sec. 3.3, we also discuss how Focal loss (Lin et al., 2017b) and EMA training alleviate the imbalanced pseudo-label issue.); 
calculating a first loss according to the multi-object detection result of the input image and the real label of the input image (see section 3.1; It is important to have a good initialization for both Student and Teacher models, as we will rely on the Teacher to generate pseudo-labels to train the Student in the later stage. To do so, we first use the available supervised data to optimize our model θ with the supervised loss Lsup. With the supervised data Ds = {x s i , y s i } Ns i=1, the supervised loss of object detection consists of four losses: the RPN classification loss L rpn cls , the RPN regression loss L rpn reg , the ROI classification loss L roi cls , and the ROI regression loss L roi reg (Ren et al., 2015), Lsup = X i L rpn cls (x s i , y s i ) + L rpn reg (x s i , y s i ) + L roi cls (x s i , y s i ) + L roi reg(x s i , y s i ). (1) After Burn-In, we duplicate the trained weights θ for both the Teacher and the Student models (θt ← θ, θs ← θ). Starting from this trained detector, we further utilize the unsupervised data to improve the object detector via the following proposed training regimen.), and 
calculating a second loss according to the multi-object detection result of the input image and the object pseudo label of the input image (see 3.2; Student Learning with Pseudo-Labeling. To address the lack of ground-truth labels for unsupervised data, we adapt the pseudo-labeling method to generate labels for training the Student with unsupervised data. This follows the principle of existing successful examples in semi-supervised image classification task (Lee, 2013; Sohn et al., 2020a). Similar to classification-based methods, to prevent the consecutively detrimental effect of noisy pseudo-labels (i.e., confirmation bias or error accumulation), we first set a confidence threshold δ of predicted bounding boxes to filter lowconfidence predicted bounding boxes, which are more likely to be false positive samples. While the confidence threshold method have achieved tremendous success in the image classification, it is however not sufficient for object detection. This is because there also exist duplicated box predictions and imbalanced prediction issues in the SS-OD (we leave the discussion of the imbalanced prediction issue in Sec. 3.3). To address the duplicated boxes prediction issue, we remove the repetitive predictions by applying class-wise non-maximum suppression (NMS) before the use of confidence thresholding as performed in STAC (Sohn et al., 2020b). In addition, noisy pseudo-labels can affect the pseudo-label generation model (Teacher). As a result, we detach the Student and the Teacher. To be more specific, after obtaining the pseudo-labels from the Teacher, only the learnable weights of the Student model is updated via back-propagation. θs ← θs + γ ∂(Lsup + λuLunsup) ∂θs , Lunsup = X i L rpn cls (x u i , yˆ u i ) + L roi cls (x u i , yˆ u i ) (2) Note that we do not apply unsupervised losses for the bounding box regression since the naive confidence thresholding is not able to filter the pseudo-labels that are potentially incorrect for bounding box regression (because the confidence of predicted bounding boxes only indicate the confidence of predicted object categories instead of the quality of bounding box locations (Jiang et al., 2018)).); and 
updating the auxiliary detection model according to the first loss and the second loss, and updating the object detection model based on the auxiliary detection model that has been updated (see 3.2; Teacher Refinement via Exponential Moving Average. To obtain more stable pseudo-labels, we apply EMA to gradually update the Teacher model. The slowly progressing Teacher model can be regarded as the ensemble of the Student models in different training iterations. θt ← αθt + (1 − α)θs. (3) This approach has been shown to be effective in many existing works, e.g., ADAM optimization (Kingma & Ba, 2015), Batch Normalization (Ioffe & Szegedy, 2015), self-supervised learning (He et al., 2020; Grill et al., 2020), and SSL image classification (Tarvainen & Valpola, 2017), while we, for the first time, demonstrate its effectiveness also in alleviating pseudo-labeling bias issue for SS-OD (see next section).); wherein the auxiliary detection model and the object detection model are both residual networks with an identical network structure (see section 3.1, 3.2).  Liu does not teach expressly wherein the input image is labeled with a real label; acquiring a multi-object detection result of the input image based on an auxiliary detection model.
Banitalebi-Dehkordi, in the same field of endeavor, teaches wherein the input image is labeled with a real label (see abstract, section 3); acquiring a multi-object detection result of the input image based on an auxiliary detection model (see section 2.2, section 3).  
It would have been obvious (before the effective filing date of the claimed invention) or (at the time the invention was made) to one of ordinary skill in the art to modify Liu to utilize the cited limitations as suggested by Banitalebi-Dehkordi.  The suggestion/motivation for doing so would have been to enable improvements in object detection (see abstract).  Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and/or programming techniques, without changing a “fundamental” operating principle of Liu, while the teaching of Banitalebi-Dehkordi continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result.    It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.  
Regarding claim 2, Liu discloses determining a preselected pseudo label of the input image based on the object detection model, wherein the preselected pseudo label corresponds to a detection box confidence; determining a first confidence threshold corresponding to each object category in the input image; and in response to the detection box confidence corresponding to the preselected pseudo label being greater than a first confidence threshold of a corresponding object category, retaining the preselected pseudo label, and determining the object pseudo label of the input image (see 3.2, fig. 8, 9).



Claims 10-11, 15-16 is rejected under 35 U.S.C. 103 as being unpatentable over Liu et al (ICLR: “UNBIASED TEACHER FOR SEMI-SUPERVISED OBJECT DETECTION”), with Banitalebi-Dehkordi (ICCV: “Knowledge Distillation for Low-Power Object Detection: A Simple Technique and Its Extensions for Training Compact Models Using Unlabeled Data”), and further in view of Aoki et al (US 2020/0394415 A1).
Regarding claim 10, Liu teaches a method comprising: acquiring an input image, and determining an object pseudo label of the input image based on an object detection model (see section 3, 3.2; label images …….Figure 3: Overview of Unbiased Teacher. Unbiased Teacher consists of two stages. Burn-In: we first train the object detector using available labeled data. Teacher-Student Mutual Learning consists of two steps. Student Learning: the fixed teacher generates pseudo-labels to train the Student, while Teacher and Student are given weakly and strongly augmented inputs, respectively. Teacher Refinement: the knowledge that the Student learned is then transferred to the slowly progressing Teacher via exponential moving average (EMA) on network weights. When the detector is trained until converge in the Burn-In stage, we switch to the Teacher-Student Mutual Learning stage. the Teacher generates pseudo-labels to train the Student, and the Student updates the knowledge it learned back to the Teacher; hence, the pseudo-labels used to train the Student itself are improved. Lastly, there exists class-imbalance and foreground-background imbalance problems in object detection, which impedes the effectiveness of semi-supervised techniques of image classification (e.g., pseudo-labeling) being used directly on SS-OD. Therefore, in Sec. 3.3, we also discuss how Focal loss (Lin et al., 2017b) and EMA training alleviate the imbalanced pseudo-label issue.); 
calculating a first loss according to the multi-object detection result of the input image and the real label of the input image (see section 3.1; It is important to have a good initialization for both Student and Teacher models, as we will rely on the Teacher to generate pseudo-labels to train the Student in the later stage. To do so, we first use the available supervised data to optimize our model θ with the supervised loss Lsup. With the supervised data Ds = {x s i , y s i } Ns i=1, the supervised loss of object detection consists of four losses: the RPN classification loss L rpn cls , the RPN regression loss L rpn reg , the ROI classification loss L roi cls , and the ROI regression loss L roi reg (Ren et al., 2015), Lsup = X i L rpn cls (x s i , y s i ) + L rpn reg (x s i , y s i ) + L roi cls (x s i , y s i ) + L roi reg(x s i , y s i ). (1) After Burn-In, we duplicate the trained weights θ for both the Teacher and the Student models (θt ← θ, θs ← θ). Starting from this trained detector, we further utilize the unsupervised data to improve the object detector via the following proposed training regimen.), and
calculating a second loss according to the multi-object detection result of the input image and the object pseudo label of the input image (see 3.2; Student Learning with Pseudo-Labeling. To address the lack of ground-truth labels for unsupervised data, we adapt the pseudo-labeling method to generate labels for training the Student with unsupervised data. This follows the principle of existing successful examples in semi-supervised image classification task (Lee, 2013; Sohn et al., 2020a). Similar to classification-based methods, to prevent the consecutively detrimental effect of noisy pseudo-labels (i.e., confirmation bias or error accumulation), we first set a confidence threshold δ of predicted bounding boxes to filter lowconfidence predicted bounding boxes, which are more likely to be false positive samples. While the confidence threshold method have achieved tremendous success in the image classification, it is however not sufficient for object detection. This is because there also exist duplicated box predictions and imbalanced prediction issues in the SS-OD (we leave the discussion of the imbalanced prediction issue in Sec. 3.3). To address the duplicated boxes prediction issue, we remove the repetitive predictions by applying class-wise non-maximum suppression (NMS) before the use of confidence thresholding as performed in STAC (Sohn et al., 2020b). In addition, noisy pseudo-labels can affect the pseudo-label generation model (Teacher). As a result, we detach the Student and the Teacher. To be more specific, after obtaining the pseudo-labels from the Teacher, only the learnable weights of the Student model is updated via back-propagation. θs ← θs + γ ∂(Lsup + λuLunsup) ∂θs , Lunsup = X i L rpn cls (x u i , yˆ u i ) + L roi cls (x u i , yˆ u i ) (2) Note that we do not apply unsupervised losses for the bounding box regression since the naive confidence thresholding is not able to filter the pseudo-labels that are potentially incorrect for bounding box regression (because the confidence of predicted bounding boxes only indicate the confidence of predicted object categories instead of the quality of bounding box locations (Jiang et al., 2018)).); and 
updating the auxiliary detection model according to the first loss and the second loss, and updating the object detection model based on the auxiliary detection model that has been updated (see 3.2; Teacher Refinement via Exponential Moving Average. To obtain more stable pseudo-labels, we apply EMA to gradually update the Teacher model. The slowly progressing Teacher model can be regarded as the ensemble of the Student models in different training iterations. θt ← αθt + (1 − α)θs. (3) This approach has been shown to be effective in many existing works, e.g., ADAM optimization (Kingma & Ba, 2015), Batch Normalization (Ioffe & Szegedy, 2015), self-supervised learning (He et al., 2020; Grill et al., 2020), and SSL image classification (Tarvainen & Valpola, 2017), while we, for the first time, demonstrate its effectiveness also in alleviating pseudo-labeling bias issue for SS-OD (see next section).).  Liu does not teach expressly an electronic device, comprising: one or more processors; and a storage apparatus on which one or more programs are stored, wherein the one or more programs, when executed by the one or more processors, enable the one or more processors to implement a training method of an object detection model, and the training method of an object detection model comprises:  wherein the input image is labeled with a real label; acquiring a multi-object detection result of the input image based on an auxiliary detection model.
Banitalebi-Dehkordi, in the same field of endeavor, teaches wherein the input image is labeled with a real label (see abstract, section 3); acquiring a multi-object detection result of the input image based on an auxiliary detection model (see section 2.2, section 3).  
It would have been obvious (before the effective filing date of the claimed invention) or (at the time the invention was made) to one of ordinary skill in the art to modify Liu to utilize the cited limitations as suggested by Banitalebi-Dehkordi.  The suggestion/motivation for doing so would have been to enable improvements in object detection (see abstract).  Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and/or programming techniques, without changing a “fundamental” operating principle of Liu, while the teaching of Banitalebi-Dehkordi continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result.    It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.  
Aoki, in the same field of endeavor, teaches an electronic device, comprising: one or more processors; and a storage apparatus on which one or more programs are stored, wherein the one or more programs, when executed by the one or more processors, enable the one or more processors to implement a training method of an object detection model, and the training method of an object detection model comprises (see 0061, 0062; The procedure described in the above exemplary embodiment can be realized by a program that causes a computer (9000 in FIG. 9) that functions as the prediction model generation apparatus or the index generation apparatus to realize the functions of these apparatus. Such a computer is exemplified by a configuration including a CPU (Central Processing Unit) 9010, a communication interface 9020, a memory 9030, and an auxiliary storage device 9040 as shown in FIG. 9. That is, the CPU 9010 in FIG. 9 may execute a pre-processing program, a machine learning program, and/or a post-processing program, and execute the update processing of the data stored in the auxiliary storage device 9040 or the like. Of course, an image processing processor called GPU (Graphics Processing Unit) may be used instead of the CPU 9010.  [0062] That is, each part (processing means and functions) of the prediction model generation apparatus and the index generation apparatus explained in the above-described exemplary embodiments executes the above-described processes by using a hardware mounted on a processor mounted on these apparatuses can be realized by a computer program).  
It would have been obvious (before the effective filing date of the claimed invention) or (at the time the invention was made) to one of ordinary skill in the art to modify Liu with Banitalebi-Dehkordi to utilize the cited limitations as suggested by Aoki.  The suggestion/motivation for doing so would have been to reduce the burden of inspectors that review the videos (see 0013).  Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and/or programming techniques, without changing a “fundamental” operating principle of Liu with Banitalebi-Dehkordi, while the teaching of Aoki continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result.    It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.  
Regarding claim 11, Liu discloses determining a preselected pseudo label of the input image based on the object detection model, wherein the preselected pseudo label corresponds to a detection box confidence; determining a first confidence threshold corresponding to each object category in the input image; and in response to the detection box confidence corresponding to the preselected pseudo label being greater than a first confidence threshold of a corresponding object category, retaining the preselected pseudo label, and determining the object pseudo label of the input image (see 3.2, fig. 8, 9).

Regarding claim 15, Liu teaches a method comprising: acquiring an input image, and determining an object pseudo label of the input image based on an object detection model, (see section 3, 3.2; label images …….Figure 3: Overview of Unbiased Teacher. Unbiased Teacher consists of two stages. Burn-In: we first train the object detector using available labeled data. Teacher-Student Mutual Learning consists of two steps. Student Learning: the fixed teacher generates pseudo-labels to train the Student, while Teacher and Student are given weakly and strongly augmented inputs, respectively. Teacher Refinement: the knowledge that the Student learned is then transferred to the slowly progressing Teacher via exponential moving average (EMA) on network weights. When the detector is trained until converge in the Burn-In stage, we switch to the Teacher-Student Mutual Learning stage. the Teacher generates pseudo-labels to train the Student, and the Student updates the knowledge it learned back to the Teacher; hence, the pseudo-labels used to train the Student itself are improved. Lastly, there exists class-imbalance and foreground-background imbalance problems in object detection, which impedes the effectiveness of semi-supervised techniques of image classification (e.g., pseudo-labeling) being used directly on SS-OD. Therefore, in Sec. 3.3, we also discuss how Focal loss (Lin et al., 2017b) and EMA training alleviate the imbalanced pseudo-label issue.); 
calculating a first loss according to the multi-object detection result of the input image and the real label of the input image (see section 3.1; It is important to have a good initialization for both Student and Teacher models, as we will rely on the Teacher to generate pseudo-labels to train the Student in the later stage. To do so, we first use the available supervised data to optimize our model θ with the supervised loss Lsup. With the supervised data Ds = {x s i , y s i } Ns i=1, the supervised loss of object detection consists of four losses: the RPN classification loss L rpn cls , the RPN regression loss L rpn reg , the ROI classification loss L roi cls , and the ROI regression loss L roi reg (Ren et al., 2015), Lsup = X i L rpn cls (x s i , y s i ) + L rpn reg (x s i , y s i ) + L roi cls (x s i , y s i ) + L roi reg(x s i , y s i ). (1) After Burn-In, we duplicate the trained weights θ for both the Teacher and the Student models (θt ← θ, θs ← θ). Starting from this trained detector, we further utilize the unsupervised data to improve the object detector via the following proposed training regimen.), and
calculating a second loss according to the multi-object detection result of the input image and the object pseudo label of the input image (see 3.2; Student Learning with Pseudo-Labeling. To address the lack of ground-truth labels for unsupervised data, we adapt the pseudo-labeling method to generate labels for training the Student with unsupervised data. This follows the principle of existing successful examples in semi-supervised image classification task (Lee, 2013; Sohn et al., 2020a). Similar to classification-based methods, to prevent the consecutively detrimental effect of noisy pseudo-labels (i.e., confirmation bias or error accumulation), we first set a confidence threshold δ of predicted bounding boxes to filter lowconfidence predicted bounding boxes, which are more likely to be false positive samples. While the confidence threshold method have achieved tremendous success in the image classification, it is however not sufficient for object detection. This is because there also exist duplicated box predictions and imbalanced prediction issues in the SS-OD (we leave the discussion of the imbalanced prediction issue in Sec. 3.3). To address the duplicated boxes prediction issue, we remove the repetitive predictions by applying class-wise non-maximum suppression (NMS) before the use of confidence thresholding as performed in STAC (Sohn et al., 2020b). In addition, noisy pseudo-labels can affect the pseudo-label generation model (Teacher). As a result, we detach the Student and the Teacher. To be more specific, after obtaining the pseudo-labels from the Teacher, only the learnable weights of the Student model is updated via back-propagation. θs ← θs + γ ∂(Lsup + λuLunsup) ∂θs , Lunsup = X i L rpn cls (x u i , yˆ u i ) + L roi cls (x u i , yˆ u i ) (2) Note that we do not apply unsupervised losses for the bounding box regression since the naive confidence thresholding is not able to filter the pseudo-labels that are potentially incorrect for bounding box regression (because the confidence of predicted bounding boxes only indicate the confidence of predicted object categories instead of the quality of bounding box locations (Jiang et al., 2018)).); and 
updating the auxiliary detection model according to the first loss and the second loss, and updating the object detection model based on the auxiliary detection model that has been updated (see 3.2; Teacher Refinement via Exponential Moving Average. To obtain more stable pseudo-labels, we apply EMA to gradually update the Teacher model. The slowly progressing Teacher model can be regarded as the ensemble of the Student models in different training iterations. θt ← αθt + (1 − α)θs. (3) This approach has been shown to be effective in many existing works, e.g., ADAM optimization (Kingma & Ba, 2015), Batch Normalization (Ioffe & Szegedy, 2015), self-supervised learning (He et al., 2020; Grill et al., 2020), and SSL image classification (Tarvainen & Valpola, 2017), while we, for the first time, demonstrate its effectiveness also in alleviating pseudo-labeling bias issue for SS-OD (see next section).).  Liu does not teach expressly a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, causes the process to perform operations comprising: wherein the input image is labeled with a real label; acquiring a multi-object detection result of the input image based on an auxiliary detection model.
Banitalebi-Dehkordi, in the same field of endeavor, teaches wherein the input image is labeled with a real label (see abstract, section 3); acquiring a multi-object detection result of the input image based on an auxiliary detection model (see section 2.2, section 3).  
It would have been obvious (before the effective filing date of the claimed invention) or (at the time the invention was made) to one of ordinary skill in the art to modify Liu to utilize the cited limitations as suggested by Banitalebi-Dehkordi.  The suggestion/motivation for doing so would have been to enable improvements in object detection (see abstract).  Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and/or programming techniques, without changing a “fundamental” operating principle of Liu, while the teaching of Banitalebi-Dehkordi continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result.    It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.  
Aoki, in the same field of endeavor, teaches a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, causes the process to perform operations comprising (see 0061, 0062; The procedure described in the above exemplary embodiment can be realized by a program that causes a computer (9000 in FIG. 9) that functions as the prediction model generation apparatus or the index generation apparatus to realize the functions of these apparatus. Such a computer is exemplified by a configuration including a CPU (Central Processing Unit) 9010, a communication interface 9020, a memory 9030, and an auxiliary storage device 9040 as shown in FIG. 9. That is, the CPU 9010 in FIG. 9 may execute a pre-processing program, a machine learning program, and/or a post-processing program, and execute the update processing of the data stored in the auxiliary storage device 9040 or the like. Of course, an image processing processor called GPU (Graphics Processing Unit) may be used instead of the CPU 9010.  [0062] That is, each part (processing means and functions) of the prediction model generation apparatus and the index generation apparatus explained in the above-described exemplary embodiments executes the above-described processes by using a hardware mounted on a processor mounted on these apparatuses can be realized by a computer program).  
It would have been obvious (before the effective filing date of the claimed invention) or (at the time the invention was made) to one of ordinary skill in the art to modify Liu to utilize the cited limitations as suggested by Aoki.  The suggestion/motivation for doing so would have been to reduce the burden of inspectors that review the videos (see 0013).  Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and/or programming techniques, without changing a “fundamental” operating principle of Liu, while the teaching of Aoki continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result.    It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.  
Regarding claim 16, Liu discloses determining a preselected pseudo label of the input image based on the object detection model, wherein the preselected pseudo label corresponds to a detection box confidence; determining a first confidence threshold corresponding to each object category in the input image; and in response to the detection box confidence corresponding to the preselected pseudo label being greater than a first confidence threshold of a corresponding object category, retaining the preselected pseudo label, and determining the object pseudo label of the input image (see 3.2, fig. 8, 9)



Allowable Subject Matter
Claims 3-9, 12-14, 17-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Regarding claims 3, 12, 17, none of the references of record alone or in combination suggest or fairly teach determining a second confidence threshold corresponding to each object category in the input image, wherein the second confidence threshold is less than the first confidence threshold of the same object category; in response to the detection box confidence corresponding to the preselected pseudo label being greater than or equal to a second confidence threshold of a corresponding object category and less than or equal to the first confidence threshold of the same object category, taking the preselected pseudo label as an uncertain pseudo label; and in response to the detection box confidence corresponding to the preselected pseudo label being less than the second confidence threshold corresponding to each object category, taking the preselected pseudo label as a background pseudo label.
  Regarding claims 4, 13, 18, none of the references of record alone or in combination suggest or fairly teach a first preselected pseudo label and a second preselected pseudo label; an object category corresponding to the first preselected pseudo label belongs to a first category, and an object category corresponding to the second preselected pseudo label belongs to a second category; a sample proportion of the first category is greater than a sample proportion of the second category; and a first confidence threshold of the object category corresponding to the first preselected pseudo label is greater than a first confidence threshold of the object category corresponding to the second preselected pseudo label.
Regarding claims 5, 14, 19, none of the references of record alone or in combination suggest or fairly teach wherein the determining the first confidence threshold corresponding to each object category in the input image comprises: calculating an entropy of the preselected pseudo label of the input image; calculating an average entropy of each object category in the input image according to the entropy of the preselected pseudo label; and calculating the first confidence threshold corresponding to each object category in the input image according to the average entropy of each object category in the input image.
Regarding claims 6-8, 20, none of the references of record alone or in combination suggest or fairly teach wherein the auxiliary detection model comprises a feature extraction network, and the method further comprises: acquiring a feature map of the input image extracted by the feature extraction network; inputting the feature map into a global classification module to acquire a global classification result of the input image; and acquiring a third loss according to the global classification result and a global classification label; and the updating the auxiliary detection model according to the first loss and the second loss comprises: updating the auxiliary detection model according to the first loss, the second loss and the third loss.
Regarding claim 9, none of the references of record alone or in combination suggest or fairly teach wherein acquiring the multi-object detection result of the input image based on the auxiliary detection model comprises: acquiring at least two datasets, wherein real labels of images in different datasets correspond to different object categories; determining any first image from the at least two datasets, and respectively calculating a similarity between the first image and any of remaining images in the at least two datasets except the first image; determining a preset number of second images satisfying a low similarity condition from the remaining images in the at least two datasets; synthesizing the first image and the preset number of second images to acquire a third image; and determining the first image as the input image, and inputting the third image into the auxiliary detection model to acquire the multi-object detection result of the input image.




Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.


Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD PARK.  The examiner’s contact information is as follows:
Telephone: (571)270-1576 | Fax: 571.270.2576 | Edward.Park@uspto.gov
For email communications, please notate MPEP 502.03, which outlines procedures pertaining to communications via the internet and authorization.  A sample authorization form is cited within MPEP 502.03, section II.
The examiner can normally be reached on M-F 9-6 CST.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Moyer, can be reached on (571) 272-9523.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





	
	/EDWARD PARK/               Primary Examiner, Art Unit 2675
Read full office action
Prosecution Timeline

Mar 28, 2024
Application Filed
Jan 23, 2026
Non-Final Rejection mailed — §101, §103
Apr 20, 2026
Response Filed
Jun 26, 2026
Final Rejection mailed — §101, §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/657,195
Patent 12664816
SYSTEMS AND METHODS FOR FACE ANNOTATION
4y 2m to grant Granted Jun 23, 2026
18/222,321
Patent 12665071
HIPAA PROTECTION FOR MEDICAL IMAGES
2y 11m to grant Granted Jun 23, 2026
18/161,752
Patent 12657881
SYSTEMS AND METHODS FOR PROCESSING ELECTRONIC IMAGES FOR AUTO-LABELING FOR COMPUTATIONAL PATHOLOGY
3y 4m to grant Granted Jun 16, 2026
18/297,811
Patent 12657678
IMAGE PROCESSING APPARATUS AND CONTROL METHOD
3y 2m to grant Granted Jun 16, 2026
18/617,206
Patent 12657877
MACHINE LEARNING USING CATEGORICAL UNCERTAINTY SAMPLING
2y 2m to grant Granted Jun 16, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
82%
Grant Probability
99%
With Interview (+18.0%)
2y 8m (~5m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 717 resolved cases by this examiner. Grant probability derived from career allowance rate.