Prosecution Insights
Last updated: April 18, 2026
Application No. 18/187,436

PRESERVATION OF DEEP LEARNING CLASSIFIER CONFIDENCE DISTRIBUTIONS

Status: Non-Final OA (§102)
Filed: Mar 21, 2023
Examiner: GERMICK, JOHNATHAN R
Art Unit: 2122
Tech Center: 2100 — Computer Architecture & Software
Assignee: International Business Machines Corporation
OA Round: 1 (Non-Final)
Grant Probability: 47% (Moderate); 79% with interview
Expected OA Rounds: 1-2
Estimated Time to Grant: 4y 2m

Examiner Intelligence

Career Allow Rate: 47% (43 granted / 91 resolved; -7.7% vs TC avg)
Interview Lift: +32.1% (strong; measured on resolved cases with an interview)
Typical Timeline: 4y 2m average prosecution; 28 applications currently pending
Career History: 119 total applications across all art units

Statute-Specific Performance

§101: 29.0% (-11.0% vs TC avg)
§103: 38.5% (-1.5% vs TC avg)
§102: 17.3% (-22.7% vs TC avg)
§112: 14.3% (-25.7% vs TC avg)

Deltas are against the Tech Center average estimate. Based on career data from 91 resolved cases.

Office Action

§102
DETAILED ACTION

This action is responsive to the Application filed on 03/21/2023. Claims 1-20 are pending in the case. Claims 1, 8, and 15 are independent claims. Claims 1, 6, 9, 11, 15, 18, 19, and 20 are amended.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Cordeiro et al., "PropMix: Hard Sample Filtering and Proportional MixUp for Learning with Noisy Labels."

Claim 1

Cordeiro teaches a processor that executes computer-executable components stored in a non-transitory computer-readable memory, wherein the computer-executable components comprise: (pg. 6, Section 4.2: "For CIFAR-10 and CIFAR-100 we used a 18-layer PreAct-ResNet-18 (PRN18) [18] as our backbone model… For CIFAR-10 and CIFAR-100, PRN18 is trained with a WarmUp stage of 30 epochs…" One of ordinary skill in the art would understand that such an algorithm and its training is implemented on a computer with memory.)

an access component that accesses a deep learning classifier and a training dataset on which the deep learning classifier was trained; and (pg. 6, Section 4.2: "For CIFAR-10 and CIFAR-100 we used a 18-layer PreAct-ResNet-18 (PRN18) [18] as our backbone model." The model is a deep model because it has at least 18 layers. Pg. 10, Table 3: "Test accuracy (%) for WebVision… by methods trained with 100 epochs." The table shows the test accuracy of the authors' model, indicating it is accessed and trained. Pg. 4: "After the pre-training, we warm-up the classifier by training it for a few epochs on the (noisy) training data." The classifier is trained with training data.)

a re-training component that re-trains the deep learning classifier using a loss function that is based on a Gaussian mixture model constructed from the training dataset. (Figure 1, caption: "Our proposed PropMix has a self-supervised pre-training stage [6, 7, 12, 19], followed by a supervised training stage, where we first warm-up the classifier with a classification loss, using the pre-trained weights. Then, using the classification loss, we train a GMM to separate the samples into clean and noisy. Next, using classification confidence for the noisy set, we train a second GMM to separate the easy and hard noisy samples. The clean and easy noisy samples are proportionally combined in the MixUp for training." Pg. 5: "To optimise the classification term we rely on the regularised CE loss… [equation]." The model is first trained, then retrained via a loss function based on the dataset filtered by the Gaussian mixture model.)

Claim 2

Cordeiro teaches claim 1. Cordeiro teaches wherein the deep learning classifier is configured to receive a data candidate as input… and to produce a classification label and a confidence score as output (pg. 4: "After the pre-training, we warm-up the classifier by training it for a few epochs on the (noisy) training data set with the cross-entropy (CE) loss. The clean and noisy sets, X, U ⊆ D… [equation] … with τ denoting a classification threshold, … being a function that estimates the probability that (xi, yi) is a clean label sample." The probability estimate of a clean label sample is a confidence score, while Figure 1, cited above, indicates the label estimation, corresponding to the classification label, of the candidate data in the dataset D.)

and wherein the computer-executable components further comprise: a data component that generates a set of confidence lists collated according to class, by executing the deep learning classifier on the training dataset. (pg. 4: "After the pre-training, we warm-up the classifier by training it for a few epochs… The clean and noisy sets, X, U ⊆ D, are formed… Next, we obtain the sets of easy and hard noisy samples UE, UH ⊆ U, as follows." Clean/noisy and easy/hard samples are organized into sets or lists according to class, where the attributes of the sample (i.e., easy/hard) correspond to the class. This is accomplished by executing the classifier.)

Claim 3

Cordeiro teaches claim 2. Cordeiro teaches a Gaussian component that generates the Gaussian mixture model based on the set of confidence lists, wherein constituent Gaussian distributions of the Gaussian mixture model respectively correspond to unique classes. (pg. 4: "The function … in Eq. 3 is a bi-modal Gaussian mixture model (GMM)… where γ denotes the GMM parameters and the larger mean component is the noisy component whereas the smaller mean component is the clean component…. The function … in Eq. 4 is a GMM, where γ denotes the GMM parameters and the smaller mean component is the hard noise component whereas the larger mean component is the easy noise component." The Gaussian mixture models are generated based on the set of confidence lists. Each Gaussian component of the models corresponds to a particular unique class (i.e., noisy/clean/hard/easy).)

Claim 4

Cordeiro teaches claim 3. Cordeiro teaches wherein the access component accesses a training data candidate on which the deep learning classifier has not been trained… and wherein the re-training component executes the deep learning classifier on the training data candidate (pg. 2: "To improve the feature representation and model confidence in high noise scenarios, we also add a self-supervised pre-training stage." Pg. 3: "Our method proposes a hybrid approach. We claim that hard noisy samples are unlikely to have their label corrected, mainly in a high noise scenario. On the other hand, we can find easy noisy samples that are likely to be correctly relabelled and used in a supervised training. The main difference of existing filtering methods and our approach is that we filter out hard noisy samples, while keeping easy noisy samples to be relabelled and included in the training process." The process first involves training with initial data. The labels are then revised, i.e., a new dataset is created to retrain the model, thus accessing data on which the classifier has not been trained, which is then used for training.)

thereby yielding a first classification label and a first confidence score. (pg. 2: "PropMix filters out hard noisy samples via a two-stage process, where the first stage classifies samples as clean or noisy using the loss values, and the second stage eliminates hard noisy samples using their classification confidence. Then, by re-labelling the easy noisy samples with the model output, adding these samples to the training set, and running a regular classification training with MixUp." The model extracts a first classification confidence or score, and re-labels samples, thus yielding a first classification label.)

Claim 5

Cordeiro teaches claim 4. Cordeiro teaches wherein the first classification label corresponds to a first constituent Gaussian distribution of the Gaussian mixture model (pg. 4-5: "Next, we obtain the sets of easy and hard noisy samples UE, UH ⊆ U, as follows… [equation] … The function p(hard | pθ(c∗i | fφ(xi)), γ) in Eq. 4 is a GMM." The classification label y corresponds to the constituent Gaussian distribution of the GMM.)

and wherein the re-training component determines, via the Gaussian mixture model, a measure of fit between the first confidence score and the first constituent Gaussian distribution (pg. 5: "To optimise the classification term we rely on the regularised CE loss… [equation]." The retraining according to the loss function is based on the output from the Gaussian mixture model; i.e., via the GMM, it measures the fit between the determined confidence score from the GMM and the Gaussian distribution via the KL loss term with parameters θ.)

Claim 6

Cordeiro teaches claim 5. Cordeiro teaches wherein the loss function comprises a first term that is based on the first classification label, and wherein the loss function comprises a second term that is based on the measure of fit (pg. 5: [equations]. The first term, l_CE, is based on the classification label y. The second term, l_r, is the measure of fit via KL divergence.)

Claim 7

Cordeiro teaches claim 2. Cordeiro teaches wherein the data component generates the set of confidence lists based on applying a drop out technique to the training dataset. (pg. 4, "Our proposed PropMix" [algorithm figure]. Pg. 3, Section 3.2: "Then, we perform a supervised training, with a new filtering step to identify clean samples, easy noisy samples, and hard noisy samples, which are removed from training." Discarding hard samples amounts to applying dropout to the training dataset; the filtering is performed at each training iteration, so the dropout is based on the filtering described in the figure, and the confidence list of filtered samples is based in part on the removed samples.)

Claim 8

Cordeiro teaches a computer-implemented method (pg. 6, Section 4.2: "For CIFAR-10 and CIFAR-100 we used a 18-layer PreAct-ResNet-18 (PRN18) [18] as our backbone model."). The remaining limitations are rejected for the reasons set forth in claim 1.

Claims 9-14

The claims are rejected for the reasons set forth in the rejections of claims 2-7, in connection with claim 1.

Claim 15

Cordeiro teaches a computer program product for facilitating preservation of deep learning classifier confidence distributions, the computer program product comprising a non-transitory computer-readable memory having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: (pg. 6, Section 4.2: "For CIFAR-10 and CIFAR-100 we used a 18-layer PreAct-ResNet-18 (PRN18) [18] as our backbone model… For CIFAR-10 and CIFAR-100, PRN18 is trained with a WarmUp stage of 30 epochs…" One of ordinary skill in the art would understand that such an algorithm and its training is implemented on a computer with memory.) The remaining limitations are rejected for the reasons set forth in claim 1.

Claims 16-20

The claims are rejected for the reasons set forth in the rejections of claims 2-6, in connection with claim 1.

Conclusion

Prior art: Lee et al., "Training Confidence-Calibrated Classifiers for Detecting Out-of-Distribution Samples," describes generating training samples using a Gaussian mixture model tuned to underlying class distributions.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK, whose telephone number is (571) 272-8363. The examiner can normally be reached M-F 9:30-4:30. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kakali Chaki, can be reached at 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.R.G./ Examiner, Art Unit 2122
/KAKALI CHAKI/ Supervisory Patent Examiner, Art Unit 2122
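The two-stage filtering the Office Action leans on fits a bi-modal GMM to a per-sample statistic (loss values in the first stage, classification confidences in the second) and thresholds the posterior of one component. A minimal sketch of the first stage in plain Python, using expectation-maximization on synthetic loss values; all names, constants, and the synthetic data here are illustrative assumptions, not PropMix's actual code:

```python
import math
import random

def fit_gmm_1d(xs, iters=100):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, stds), ordered so component 0 has the
    smaller mean (the low-loss, "clean" mode in this usage).
    """
    xs = sorted(xs)
    n = len(xs)
    # Crude initialisation: split the sorted data at the median.
    mu = [sum(xs[:n // 2]) / (n // 2), sum(xs[n // 2:]) / (n - n // 2)]
    sd = [max(1e-3, (xs[-1] - xs[0]) / 4.0)] * 2
    w = [0.5, 0.5]

    def pdf(x, m, s):
        return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w[k] * pdf(x, mu[k], sd[k]) for k in (0, 1)]
            z = sum(p) or 1e-300
            resp.append([p[0] / z, p[1] / z])
        # M-step: re-estimate weights, means, and standard deviations.
        for k in (0, 1):
            nk = sum(r[k] for r in resp) or 1e-300
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)
        if mu[0] > mu[1]:
            w.reverse(); mu.reverse(); sd.reverse()
    return w, mu, sd

def p_clean(x, w, mu, sd):
    """Posterior probability that loss value x came from the low-mean component."""
    def pdf(x, m, s):
        return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    p0, p1 = w[0] * pdf(x, mu[0], sd[0]), w[1] * pdf(x, mu[1], sd[1])
    return p0 / (p0 + p1)

# Synthetic per-sample losses: a low-loss "clean" mode and a high-loss "noisy" mode.
rng = random.Random(1)
losses = [rng.gauss(0.2, 0.05) for _ in range(300)] + [rng.gauss(1.5, 0.3) for _ in range(100)]

w, mu, sd = fit_gmm_1d(losses)
tau = 0.5  # classification threshold (the role of the paper's τ)
clean = [x for x in losses if p_clean(x, w, mu, sd) > tau]
print(f"clean mode ~{mu[0]:.2f}, noisy mode ~{mu[1]:.2f}, {len(clean)} samples kept")
```

The second stage is the same machinery applied to classification confidences of the noisy set, keeping the larger-mean ("easy") component instead.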

Prosecution Timeline

Mar 21, 2023: Application Filed
Jan 22, 2026: Non-Final Rejection (§102)
Mar 17, 2026: Applicant Interview (Telephonic)
Mar 17, 2026: Examiner Interview Summary
Mar 31, 2026: Response Filed

Precedent Cases

Applications granted by this same examiner with similar technology:

Patent 12566962: DITHERED QUANTIZATION OF PARAMETERS DURING TRAINING WITH A MACHINE LEARNING TOOL (granted Mar 03, 2026; 2y 5m to grant)
Patent 12566983: MACHINE LEARNING CLASSIFIERS PREDICTION CONFIDENCE AND EXPLANATION (granted Mar 03, 2026; 2y 5m to grant)
Patent 12554977: DEEP NEURAL NETWORK FOR MATCHING ENTITIES IN SEMI-STRUCTURED DATA (granted Feb 17, 2026; 2y 5m to grant)
Patent 12443829: NEURAL NETWORK PROCESSING METHOD AND APPARATUS BASED ON NESTED BIT REPRESENTATION (granted Oct 14, 2025; 2y 5m to grant)
Patent 12443868: QUANTUM ERROR MITIGATION USING HARDWARE-FRIENDLY PROBABILISTIC ERROR CORRECTION (granted Oct 14, 2025; 2y 5m to grant)

Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 47% (79% with interview, +32.1%)
Median Time to Grant: 4y 2m
PTA Risk: Low

Based on 91 resolved cases by this examiner. Grant probability derived from career allow rate.
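As a sanity check, the headline figures are internally consistent: 43 grants out of 91 resolved cases gives the 47% career allow rate, and the 79% with-interview figure follows if the +32.1% interview lift is read as additive percentage points (an assumption suggested by the layout, not stated in the report):

```python
# Career allow rate from the examiner's resolved cases (figures from this report)
granted, resolved = 43, 91
base = 100 * granted / resolved    # 47.25... -> reported as 47%

# Interview lift, read as additive percentage points (an assumption)
with_interview = base + 32.1       # 79.35... -> reported as 79%

print(round(base), round(with_interview))  # 47 79
```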
