Prosecution Insights
Last updated: April 18, 2026
Application No. 18/187,436

PRESERVATION OF DEEP LEARNING CLASSIFIER CONFIDENCE DISTRIBUTIONS

Status: Non-Final OA (§102)
Filed: Mar 21, 2023
Examiner: GERMICK, JOHNATHAN R
Art Unit: 2122
Tech Center: 2100 — Computer Architecture & Software
Assignee: International Business Machines Corporation
OA Round: 1 (Non-Final)
Grant Probability: 47% (Moderate); 79% with interview
Expected OA Rounds: 1-2
Estimated Time to Grant: 4y 2m

Examiner Intelligence

Career Allow Rate: 47% (43 granted / 91 resolved; -7.7% vs TC avg)
Interview Lift: +32.1% (strong; measured on resolved cases with an interview)
Typical Timeline: 4y 2m average prosecution; 28 applications currently pending
Career History: 119 total applications across all art units

Statute-Specific Performance

§101: 29.0% (-11.0% vs TC avg)
§103: 38.5% (-1.5% vs TC avg)
§102: 17.3% (-22.7% vs TC avg)
§112: 14.3% (-25.7% vs TC avg)

Deltas are against the Tech Center average estimate. Based on career data from 91 resolved cases.

Office Action

§102
DETAILED ACTION

This action is responsive to the Application filed on 03/21/2023. Claims 1-20 are pending in the case. Claims 1, 8, and 15 are independent claims. Claims 1, 6, 9, 11, 15, 18, 19, and 20 are amended.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Cordeiro et al., "PropMix: Hard Sample Filtering and Proportional MixUp for Learning with Noisy Labels."

Claim 1

Cordeiro teaches a processor that executes computer-executable components stored in a non-transitory computer-readable memory, wherein the computer-executable components comprise: (pg. 6, Section 4.2: "For CIFAR-10 and CIFAR-100 we used a 18-layer PreAct-ResNet-18 (PRN18) [18] as our backbone model… For CIFAR-10 and CIFAR-100, PRN18 is trained with a WarmUp stage of 30 epochs…" One of ordinary skill in the art would understand that such an algorithm and its training is implemented on a computer with memory.)

an access component that accesses a deep learning classifier and a training dataset on which the deep learning classifier was trained; and (pg. 6, Section 4.2: "For CIFAR-10 and CIFAR-100 we used a 18-layer PreAct-ResNet-18 (PRN18) [18] as our backbone model." The model is a deep model because it has at least 18 layers. Pg. 10, Table 3: "Test accuracy (%) for WebVision… by methods trained with 100 epochs." The table shows the test accuracy of the authors' model, indicating it is accessed and trained. Pg. 4: "After the pre-training, we warm-up the classifier by training it for a few epochs on the (noisy) training data." The classifier is trained with training data.)

a re-training component that re-trains the deep learning classifier using a loss function that is based on a Gaussian mixture model constructed from the training dataset. (Figure 1, caption: "Our proposed PropMix has a self-supervised pre-training stage [6, 7, 12, 19], followed by a supervised training stage, where we first warm-up the classifier with a classification loss, using the pre-trained weights. Then, using the classification loss, we train a GMM to separate the samples into clean and noisy. Next, using classification confidence for the noisy set, we train a second GMM to separate the easy and hard noisy samples. The clean and easy noisy samples are proportionally combined in the MixUp for training." Pg. 5: "To optimise the classification term we rely on the regularised CE loss… [equation]." The model is first trained, then retrained via a loss function based on the dataset filtered by the Gaussian mixture model.)

Claim 2

Cordeiro teaches claim 1. Cordeiro teaches wherein the deep learning classifier is configured to receive a data candidate as input… and to produce a classification label and a confidence score as output (pg. 4: "After the pre-training, we warm-up the classifier by training it for a few epochs on the (noisy) training data set with the cross-entropy (CE) loss. The clean and noisy sets, X, U ⊆ D… [equation] … with τ denoting a classification threshold, … being a function that estimates the probability that (xi, yi) is a clean label sample." The probability estimate of a clean label sample is a confidence score, while Figure 1, cited above, indicates the label estimation, corresponding to the classification label, of the candidate data in the dataset D.)

and wherein the computer-executable components further comprise: a data component that generates a set of confidence lists collated according to class, by executing the deep learning classifier on the training dataset. (pg. 4: "After the pre-training, we warm-up the classifier by training it for a few epochs… The clean and noisy sets, X, U ⊆ D, are formed… Next, we obtain the sets of easy and hard noisy samples UE, UH ⊆ U, as follows." Clean/noisy and easy/hard samples are organized into sets or lists according to class, where the attributes of the sample (i.e., easy/hard) correspond to the class. This is accomplished by executing the classifier.)

Claim 3

Cordeiro teaches claim 2. Cordeiro teaches a Gaussian component that generates the Gaussian mixture model based on the set of confidence lists, wherein constituent Gaussian distributions of the Gaussian mixture model respectively correspond to unique classes. (pg. 4: "The function … in Eq. 3 is a bi-modal Gaussian mixture model (GMM)… where γ denotes the GMM parameters and the larger mean component is the noisy component whereas the smaller mean component is the clean component…. The function … in Eq. 4 is a GMM, where γ denotes the GMM parameters and the smaller mean component is the hard noise component whereas the larger mean component is the easy noise component." The Gaussian mixture models are generated based on the set of confidence lists. Each Gaussian component of the models corresponds to a particular unique class (i.e., noisy/clean/hard/easy).)

Claim 4

Cordeiro teaches claim 3. Cordeiro teaches wherein the access component accesses a training data candidate on which the deep learning classifier has not been trained… and wherein the re-training component executes the deep learning classifier on the training data candidate (pg. 2: "To improve the feature representation and model confidence in high noise scenarios, we also add a self-supervised pre-training stage." Pg. 3: "Our method proposes a hybrid approach. We claim that hard noisy samples are unlikely to have their label corrected, mainly in a high noise scenario. On the other hand, we can find easy noisy samples that are likely to be correctly relabelled and used in a supervised training. The main difference of existing filtering methods and our approach is that we filter out hard noisy samples, while keeping easy noisy samples to be relabelled and included in the training process." The process first involves training with initial data. The labels are then revised, i.e., a new dataset is created to retrain the model, thus accessing data on which the classifier has not been trained, which is then used for training.)

thereby yielding a first classification label and a first confidence score. (pg. 2: "PropMix filters out hard noisy samples via a two-stage process, where the first stage classifies samples as clean or noisy using the loss values, and the second stage eliminates hard noisy samples using their classification confidence. Then, by re-labelling the easy noisy samples with the model output, adding these samples to the training set, and running a regular classification training with MixUp." The model extracts a first classification confidence or score, and re-labels samples, thus yielding a first classification label.)

Claim 5

Cordeiro teaches claim 4. Cordeiro teaches wherein the first classification label corresponds to a first constituent Gaussian distribution of the Gaussian mixture model (pg. 4-5: "Next, we obtain the sets of easy and hard noisy samples UE, UH ⊆ U, as follows… [equation] … The function p(hard | pθ(c∗i | fφ(xi)), γ) in Eq. 4 is a GMM." The classification label y corresponds to the constituent Gaussian distribution of the GMM.)

and wherein the re-training component determines, via the Gaussian mixture model, a measure of fit between the first confidence score and the first constituent Gaussian distribution (pg. 5: "To optimise the classification term we rely on the regularised CE loss… [equation]." The retraining according to the loss function is based on the output from the Gaussian mixture model; i.e., via the GMM, it measures the fit between the determined confidence score from the GMM and the Gaussian distribution via the KL loss term with parameters θ.)

Claim 6

Cordeiro teaches claim 5. Cordeiro teaches wherein the loss function comprises a first term that is based on the first classification label, and wherein the loss function comprises a second term that is based on the measure of fit (pg. 5: [equations]. The first term, l_CE, is based on the classification label y. The second term, l_r, is the measure of fit via KL divergence.)

Claim 7

Cordeiro teaches claim 2. Cordeiro teaches wherein the data component generates the set of confidence lists based on applying a drop out technique to the training dataset. (pg. 4, "Our proposed PropMix" [algorithm figure]. Pg. 3, Section 3.2: "Then, we perform a supervised training, with a new filtering step to identify clean samples, easy noisy samples, and hard noisy samples, which are removed from training." Discarding hard samples amounts to applying dropout to the training dataset; the filtering is performed at each training iteration, so the dropout is based on the filtering described in the figure, and the confidence list of filtered samples is based in part on the removed samples.)

Claim 8

Cordeiro teaches a computer-implemented method (pg. 6, Section 4.2: "For CIFAR-10 and CIFAR-100 we used a 18-layer PreAct-ResNet-18 (PRN18) [18] as our backbone model."). The remaining limitations are rejected for the reasons set forth in claim 1.

Claims 9-14

The claims are rejected for the reasons set forth in the rejections of claims 2-7, in connection with claim 1.

Claim 15

Cordeiro teaches a computer program product for facilitating preservation of deep learning classifier confidence distributions, the computer program product comprising a non-transitory computer-readable memory having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: (pg. 6, Section 4.2: "For CIFAR-10 and CIFAR-100 we used a 18-layer PreAct-ResNet-18 (PRN18) [18] as our backbone model… For CIFAR-10 and CIFAR-100, PRN18 is trained with a WarmUp stage of 30 epochs…" One of ordinary skill in the art would understand that such an algorithm and its training is implemented on a computer with memory.) The remaining limitations are rejected for the reasons set forth in claim 1.

Claims 16-20

The claims are rejected for the reasons set forth in the rejections of claims 2-6, in connection with claim 1.

Conclusion

Prior art: Lee et al., "Training Confidence-Calibrated Classifiers for Detecting Out-of-Distribution Samples," describes generating training samples using a Gaussian mixture model tuned to underlying class distributions.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK, whose telephone number is (571) 272-8363. The examiner can normally be reached M-F 9:30-4:30. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kakali Chaki, can be reached at 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.R.G./ Examiner, Art Unit 2122
/KAKALI CHAKI/ Supervisory Patent Examiner, Art Unit 2122
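The two-stage filtering the Office Action leans on fits a bi-modal GMM to a per-sample statistic (loss values in the first stage, classification confidences in the second) and thresholds the posterior of one component. A minimal sketch of the first stage in plain Python, using expectation-maximization on synthetic loss values; all names, constants, and the synthetic data here are illustrative assumptions, not PropMix's actual code:

```python
import math
import random

def fit_gmm_1d(xs, iters=100):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, stds), ordered so component 0 has the
    smaller mean (the low-loss, "clean" mode in this usage).
    """
    xs = sorted(xs)
    n = len(xs)
    # Crude initialisation: split the sorted data at the median.
    mu = [sum(xs[:n // 2]) / (n // 2), sum(xs[n // 2:]) / (n - n // 2)]
    sd = [max(1e-3, (xs[-1] - xs[0]) / 4.0)] * 2
    w = [0.5, 0.5]

    def pdf(x, m, s):
        return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w[k] * pdf(x, mu[k], sd[k]) for k in (0, 1)]
            z = sum(p) or 1e-300
            resp.append([p[0] / z, p[1] / z])
        # M-step: re-estimate weights, means, and standard deviations.
        for k in (0, 1):
            nk = sum(r[k] for r in resp) or 1e-300
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)
        if mu[0] > mu[1]:
            w.reverse(); mu.reverse(); sd.reverse()
    return w, mu, sd

def p_clean(x, w, mu, sd):
    """Posterior probability that loss value x came from the low-mean component."""
    def pdf(x, m, s):
        return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    p0, p1 = w[0] * pdf(x, mu[0], sd[0]), w[1] * pdf(x, mu[1], sd[1])
    return p0 / (p0 + p1)

# Synthetic per-sample losses: a low-loss "clean" mode and a high-loss "noisy" mode.
rng = random.Random(1)
losses = [rng.gauss(0.2, 0.05) for _ in range(300)] + [rng.gauss(1.5, 0.3) for _ in range(100)]

w, mu, sd = fit_gmm_1d(losses)
tau = 0.5  # classification threshold (the role of the paper's τ)
clean = [x for x in losses if p_clean(x, w, mu, sd) > tau]
print(f"clean mode ~{mu[0]:.2f}, noisy mode ~{mu[1]:.2f}, {len(clean)} samples kept")
```

The second stage is the same machinery applied to classification confidences of the noisy set, keeping the larger-mean ("easy") component instead.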

Prosecution Timeline

Mar 21, 2023: Application Filed
Jan 22, 2026: Non-Final Rejection (§102)
Mar 17, 2026: Applicant Interview (Telephonic)
Mar 17, 2026: Examiner Interview Summary
Mar 31, 2026: Response Filed

Precedent Cases

Applications granted by this same examiner with similar technology:

Patent 12566962: DITHERED QUANTIZATION OF PARAMETERS DURING TRAINING WITH A MACHINE LEARNING TOOL (granted Mar 03, 2026; 2y 5m to grant)
Patent 12566983: MACHINE LEARNING CLASSIFIERS PREDICTION CONFIDENCE AND EXPLANATION (granted Mar 03, 2026; 2y 5m to grant)
Patent 12554977: DEEP NEURAL NETWORK FOR MATCHING ENTITIES IN SEMI-STRUCTURED DATA (granted Feb 17, 2026; 2y 5m to grant)
Patent 12443829: NEURAL NETWORK PROCESSING METHOD AND APPARATUS BASED ON NESTED BIT REPRESENTATION (granted Oct 14, 2025; 2y 5m to grant)
Patent 12443868: QUANTUM ERROR MITIGATION USING HARDWARE-FRIENDLY PROBABILISTIC ERROR CORRECTION (granted Oct 14, 2025; 2y 5m to grant)

Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 47% (79% with interview, +32.1%)
Median Time to Grant: 4y 2m
PTA Risk: Low

Based on 91 resolved cases by this examiner. Grant probability derived from career allow rate.
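As a sanity check, the headline figures are internally consistent: 43 grants out of 91 resolved cases gives the 47% career allow rate, and the 79% with-interview figure follows if the +32.1% interview lift is read as additive percentage points (an assumption suggested by the layout, not stated in the report):

```python
# Career allow rate from the examiner's resolved cases (figures from this report)
granted, resolved = 43, 91
base = 100 * granted / resolved    # 47.25... -> reported as 47%

# Interview lift, read as additive percentage points (an assumption)
with_interview = base + 32.1       # 79.35... -> reported as 79%

print(round(base), round(with_interview))  # 47 79
```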
