DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the amendment filed on 01/30/2026. Claims 1-22 are pending in the case. This action is Final.
Applicant Response
In Applicant’s response dated 01/30/2026, Applicant amended Claims 1, 10, and 11-15, and argued against all objections and rejections previously set forth in the Office Action dated 10/22/2025.
Claim Rejections - 35 USC § 101
4. 35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
5. Claims 1-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea, without significantly more.
Step 1
According to the first part of the analysis, in the instant case, the claim is directed to a computer-implemented method, which is a process and falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Regarding Claims 1, 10, and 17,
Step 2A, Prong 1: Does the claim recite a judicial exception?
Claim 1 further recites the steps of:
training a first neural network to generate soft labels for a plurality of partially labeled samples based, at least in part, on credibility vectors for each of the plurality of partially labeled samples indicating certainty of class assignment, … wherein training the first neural network comprises, for each epoch in a plurality of epochs (This limitation recites mathematical relationships and calculations which are directed to the abstract idea of mathematical concepts (see MPEP 2106.04(a)(2)));
applying, for each of the plurality of partially labeled samples, one or more of a plurality of transformations to generate a plurality of transformed samples (This limitation recites mathematical relationships and calculations which are directed to the abstract idea of mathematical concepts (see MPEP 2106.04(a)(2)));
inputting the transformed samples into the first neural network to generate a plurality of representations for the transformed samples (This limitation recites mathematical relationships and calculations which are directed to the abstract idea of mathematical concepts (see MPEP 2106.04(a)(2)));
updating credibility vectors corresponding to unlabeled samples in plurality of partially labeled samples based, at least in part, on the plurality of representations (This limitation is directed to the abstract idea of mathematical concepts (see MPEP 2106.04(a)(2)));
training the first neural network with a loss function applied to the plurality of representations and the updated credibility vectors (This limitation recites mathematical relationships and calculations which are directed to the abstract idea of mathematical concepts (see MPEP 2106.04(a)(2))); and
averaging, for each unlabeled sample of the plurality of partially labeled samples, corresponding updated credibility vectors across the plurality of epochs (This limitation is directed to the abstract idea of mathematical concepts (see MPEP 2106.04(a)(2))); and
determining labels for the unlabeled samples of the plurality of partially labeled samples based, at least in part, on the averaged credibility vectors (This limitation is directed to the abstract idea of mathematical concepts (see MPEP 2106.04(a)(2))).
The claim thus recites abstract ideas in the mathematical concepts grouping: vector similarity computations, data labeling, loss function calculations, and probability calculations. Accordingly, the claims recite an abstract idea.
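For clarity of the record, the mathematical character of the recited training loop can be illustrated with a minimal sketch. This sketch is purely illustrative and non-limiting: the sample data, the identity "network," and the exponential proximity weighting are assumptions of the sketch, not limitations of the claims or teachings of the cited art.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins (assumptions of this sketch, not claim limitations):
# 20 samples with 4 classes; the first 8 samples carry labels.
n_samples, n_classes, n_epochs = 20, 4, 5
labels = np.full(n_samples, -1)
labels[:8] = rng.integers(0, n_classes, 8)

# Credibility vectors: one-hot for labeled samples, zero vectors for unlabeled.
cred = np.zeros((n_samples, n_classes))
cred[np.arange(8), labels[:8]] = 1.0

def transform(x):
    """Toy 'transformation' (stand-in for crop/flip/color distortion)."""
    return x + rng.normal(0.0, 0.05, x.shape)

def network(x):
    """Stand-in for the first neural network's representation output."""
    return x  # identity mapping, purely for illustration

samples = rng.normal(0.0, 1.0, (n_samples, 8))
cred_history = []

for epoch in range(n_epochs):
    # Apply transformations and compute representations of transformed samples.
    reps = network(transform(samples))
    # Update each unlabeled sample's credibility vector from the proximity of
    # its representation to the labeled representations (a soft class vote).
    for i in np.where(labels < 0)[0]:
        d = np.linalg.norm(reps[i] - reps[:8], axis=1)
        w = np.exp(-d)                       # nearer labeled samples weigh more
        vec = np.zeros(n_classes)
        np.add.at(vec, labels[:8], w)        # accumulate votes per class
        cred[i] = vec / vec.sum()
    cred_history.append(cred.copy())
    # (A gradient step on a loss over `reps` and `cred` would occur here.)

# Average the updated credibility vectors across epochs; determine labels.
avg_cred = np.mean(cred_history, axis=0)
pred = np.argmax(avg_cred, axis=1)
```

Each line of the sketch corresponds to a recited step (transforming, representing, updating, averaging, determining), and each is a mathematical calculation, consistent with the characterization under MPEP 2106.04(a)(2).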
Step 2A, Prong 2: Does the claim recite additional elements? Do those additional elements, individually and in combination, integrate the judicial exception into a practical application?
Further, the claim does not recite any additional element which could integrate this abstract idea into a practical application, because the additional elements recited consist of:
“… a non-transitory, machine-readable medium having program code stored thereon, the program code comprising instructions to …” (Claim 10) and “… a non-transitory, machine-readable medium having program code stored thereon, the program code comprising instructions to: …” (Claim 16) (generic computer components on which to implement the mathematical abstract idea (see MPEP 2106.05(f)));
The additional elements are recited at a high level of generality and do not amount to significantly more than the abstract idea (MPEP 2106.05(f)). The steps form a machine learning model but lack a concrete implementation of, or technical improvement to, computer technology, and do not integrate the abstract idea into a practical application. The claim uses a computer to perform mathematics and does not improve the functioning of the computer or any other technology. Accordingly, the claim does not integrate the abstract idea into a practical application.
Thus, the claim is directed to the abstract idea.
Step 2B: Do the additional elements, considered individually and in combination, amount to significantly more than the judicial exception?
“… a non-transitory, machine-readable medium having program code stored thereon, the program code comprising instructions to …” (Claim 10) and “… a non-transitory, machine-readable medium having program code stored thereon, the program code comprising instructions to: …” (Claim 16) (generic computer components on which to implement the mathematical abstract idea (see MPEP 2106.05(f)));
The additional elements are recited at a high level of generality and do not amount to significantly more than the abstract idea (MPEP 2106.05(f)). The steps form a machine learning model but lack a concrete implementation of, or technical improvement to, computer technology. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept, and neither can insignificant extra-solution activity. All of these additional elements as generically claimed are considered well-understood, routine, and conventional. Therefore, these limitations, taken alone or in combination, do not integrate the abstract idea into a practical application or recite significantly more than the abstract idea.
Thus, these independent claims are not patent eligible.
The dependent claims respectively recite a judicial exception in limitations of:
“Subsequent to averaging corresponding updated credibility vectors across the plurality of epochs and prior to determining labels, for each unlabeled sample of the plurality of partially labeled samples: normalizing the averaged credibility vectors by their respective maximal entries; and clipping the normalized credibility vectors to have entry values between one and zero.” (claim 2); “training a second neural network to predict labels for additional samples based, at least in part, on the plurality of partially labeled samples and the clipped credibility vectors.” (claim 3); “wherein the second neural network is trained on a cross-entropy loss function applied to the clipped credibility vectors as soft labels and outputs of the second neural network from inputting the plurality of partially labeled samples.” (claim 4); “subsequent to clipping the normalized credibility vectors and prior to determining labels: determining a percentage of the clipped credibility vectors to subsample; and at least one of setting a subset of the clipped credibility vectors according to the percentage to zero vectors and discarding the subset of the clipped credibility vectors.” (claim 5); “wherein determining the percentage of the clipped credibility vectors to subsample comprises: for each candidate percentage of a set of candidate percentages, setting a subset of the clipped credibility vectors to zero vectors according to the candidate percentage to generate subsampled credibility vectors; converting the clipped credibility vectors and the subsampled credibility vectors into a first probability distribution and a second probability distribution, respectively; and computing a probability distribution distance from the second probability distribution to the first probability distribution; and determining the percentage of the clipped credibility vectors as a candidate percentage in the set of candidate percentages having a maximal corresponding probability distribution distance below a threshold probability distribution distance.” (claim 6); “computing weights for the clipped credibility vectors as maximal entries of corresponding averaged credibility vectors; and determining the subset of the clipped credibility vectors as a subset with lowest computed weights according to the percentage.” (claim 7); “wherein updating the credibility vectors comprises: for each credibility vector and each transformation of the one or more of the plurality of transformations applied to a corresponding sample, updating each entry of the credibility vector according to proximity of a representation of the sample corresponding to the credibility vector with the transformation applied to representations of other samples with transformations in the plurality of transformations applied; and normalizing entries of each updated credibility vector by corresponding maximal entries; and averaging the normalized credibility vectors for each corresponding sample across the one or more of the plurality of transformations.” (claim 8); “wherein training the first neural network comprises backpropagating loss through layers of the first neural network based on a loss function applied to the plurality of representations and the updated credibility vectors.” (claim 9); “wherein the program code further comprises instructions to: train a second model to predict labels for additional samples based, at least in part, on the soft labels, wherein the soft labels are used to compute loss in training the second model.” (claim 11); “wherein the program code further comprises instructions to: initialize the credibility vectors of the labeled samples according to corresponding labels; and initialize the credibility vectors of the unlabeled samples to zero vectors.” (claim 12); “wherein the program code further comprises instructions to, for each training iteration, normalize each of the averaged credibility vectors with respect to maximal certainty values.” (claim 13); “wherein the program code further comprises instructions to, for each training iteration, subsample the normalized credibility vectors and set the subsample of normalized credibility vectors to zero vectors.” (claim 14); “wherein the program code further comprises instructions to, for each training iteration, determine a subsampling rate for a succeeding training iteration, wherein the instructions to determine the subsampling rate comprise instructions to determine a maximum of multiple candidate subsampling rates for zero setting the normalized credibility vectors that yields a greatest overall impact on probability distribution of the normalized credibility vectors below a threshold impact.” (claim 15); “wherein the first neural network comprises third neural network and a first projection head, wherein the second neural network comprises the third neural network and a second projection head.” (claim 17); “wherein the machine-readable medium further has stored thereon instructions executable by the processor to cause the apparatus to initialize internal parameters of the third neural network prior to training the second neural network to generate labels for the plurality of partially labeled samples.” (claim 18); “wherein the machine-readable medium further has stored thereon instructions executable by the processor to cause the apparatus to average credibility vectors updated across the plurality of training epochs.” (claim 19); “wherein the machine-readable medium further has stored thereon instructions executable by the processor to cause the apparatus to at least one of normalize and clip the averaged credibility vectors.” (claim 20); “wherein the machine-readable medium further has stored thereon instructions executable by the processor to cause the apparatus to, prior to generating the soft labels, subsample the credibility vectors.” (claim 21); “wherein the machine-readable medium further has stored thereon instructions executable by the processor to cause the apparatus to determine a subsampling rate for subsampling the credibility vectors based, at least in part, on changes in probability distribution of the credibility vectors by zeroing the credibility vectors at each of one or more candidate subsampling rates.” (claim 22).
These additional limitations (in claims 2-9, 11-15, and 17-22) also fall within the mathematical concepts and mathematical operations groupings of abstract ideas.
This judicial exception is not integrated into a practical application. The additional elements of a “computer readable medium” comprising “computer program code” (in claims 2-9, 11-15, and 17-22) amount to no more than insignificant extra-solution activity related to data gathering, data input, or data transmittal. These additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Nor do the dependent claims include additional elements sufficient to amount to significantly more than the judicial exception: as discussed above with respect to integration into a practical application, the additional elements of a non-transitory computer readable medium comprising computer program code are insignificant extra-solution activity that cannot provide an inventive concept, and as generically claimed they are well-understood, routine, and conventional.
Therefore, these limitations, taken alone or in combination, do not integrate the abstract idea into a practical application or recite significantly more than the abstract idea. Thus, all of the dependent claims are also not patent eligible.
Examiner Comments
5. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 103
6. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
7. Claims 1-22 are rejected under 35 U.S.C. 103 as being unpatentable over Pal (Pub. No. US 20180336457 B1, Pub. Date 2018-11-22) in view of Chen (Pub. No. US 20210319266 A1, Pub. Date 2021-10-14), in further view of Basu (US 20220300557 A1, 2022-09-22), and in further view of Ayvaci (US 20220164585 A1, 2022-05-26).
Pal teaches a method comprising:
training a first neural network to generate soft labels for a plurality of partially labeled samples based, at least in part, on [label distribution information] for each of the plurality of partially labeled samples indicating certainty of class assignment (see Pal: Fig.3, [0043], “At step 340, a machine-learning model may be trained to predict a label for a node based on the first label distribution information of the node. The machine-learning model may be trained using the labels and the first label distribution information of a set of the labeled nodes.”), […]
updating credibility vectors corresponding to unlabeled samples in plurality of partially labeled samples based, at least in part, on the plurality of representations (see Pal: Fig.3, “At step 350, a predicted label may be generated for each of the unlabeled nodes using the trained machine-learning model and the first label distribution information associated with the unlabeled node. At step 360, a convergence condition may be checked. If the condition is not met, then the process may repeat with the generated predicted labels being propagated and used to train the machine-learning model. This process may repeat until the convergence condition is met.”);
averaging, for each unlabeled sample of the plurality of partially labeled samples, corresponding updated credibility vectors across the plurality of epochs (see Pal: Fig.4, [0042], “after the label distribution has been updated, it may again be used to retrain the label classifier model q, which may again be used to predict the unknown labels Y.sub.U. Since the underlying predictors have been updated (i.e., F.sub.L has been updated after the Subsequent LP Step), the model q would consequently be updated when it is retrained in another iteration of the Training Step. Then at the subsequent Labeling Step, the retrained model q may be applied to the updated label distributions F.sub.U of the unlabeled nodes U to predict labels Y.sub.U. If additional iterations are desired, the Subsequent LP Step may again be performed to propagate the labels and the Training Step and Labeling Step may be repeated. The iterative process may terminate upon satisfaction of certain convergence conditions. For example, the convergence condition may be a predetermined number of iterations or upon detection that a benchmark metric no longer changes significantly.”); and
determining labels for the unlabeled samples of the plurality of partially labeled samples based, at least in part, on the averaged credibility vectors (see Pal: Fig.3, [0043], “At step 350, a predicted label may be generated for each of the unlabeled nodes using the trained machine-learning model and the first label distribution information associated with the unlabeled node. At step 360, a convergence condition may be checked. If the condition is not met, then the process may repeat with the generated predicted labels being propagated and used to train the machine-learning model. This process may repeat until the convergence condition is met.”)
As shown above, Pal teaches Deep Label Propagation (DLP), a machine-learning algorithm that combines label propagation with deep neural network techniques in a semi-supervised learning setting.
Pal does not teach the method wherein:
training the first neural network comprises, for each epoch in a plurality of epochs, applying, for each of the plurality of partially labeled samples, one or more of a plurality of transformations to generate a plurality of transformed samples;
inputting the transformed samples into the first neural network to generate a plurality of representations for the transformed samples;
training based on credibility vectors for each of the plurality of partially labeled samples indicating certainty of class assignment; and
training the first neural network with a loss function applied to the plurality of representations and the updated credibility vectors.
However, Chen teaches the method wherein:
training the first neural network comprises, for each epoch in a plurality of epochs, applying, for each of the plurality of partially labeled samples, one or more of a plurality of transformations to generate a plurality of transformed samples (see Chen: Fig.4, [0065], “systematically studies the impact of data augmentation, several different augmentations were considered and can optionally be included in implementations of the present disclosure. One example type of augmentation involves spatial/geometric transformation of data, such as cropping and resizing (with horizontal flipping), rotation, and cutout. Another example type of augmentation involves appearance transformation, such as color distortion (including color dropping, brightness, contrast, saturation, hue), Gaussian blur, and Sobel filtering.”)
inputting the transformed samples into the first neural network to generate a plurality of representations for the transformed samples (see Chen: Fig.2A, [0046], “A base encoder neural network 204 (represented in notation herein as ƒ(⋅)) that extracts intermediate representation vectors from augmented data examples. For example, in the illustration of FIG. 2A, the base encoder neural network 204 has generated intermediate representations 214 and 224 from augmented images 212 and 222, respectively. The example framework 200 allows various choices of the network architecture without any constraints.”);
Because both Pal and Chen are in the similar field of endeavor of semi-supervised contrastive machine learning, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Pal’s probabilistic classification model to incorporate Chen’s transformation and representation learning framework. One would have been motivated to make such a combination in order to provide improved data labeling accuracy and increase the flexibility and applicability of generated models (see Chen: [0004]).
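For clarity, the spatial/geometric and appearance transformations Chen lists (cropping and resizing, flipping, color distortion) lend themselves to a compact illustration. The toy greyscale-image implementations below are assumptions made solely for illustration; they are not Chen’s disclosed implementations.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(img):
    """Horizontal flip with probability 0.5 (a spatial/geometric transform)."""
    return img[:, ::-1] if rng.random() < 0.5 else img

def random_crop_resize(img, out=8):
    """Crop a random square region, then 'resize' by nearest-neighbor sampling."""
    h, w = img.shape
    size = rng.integers(out, min(h, w) + 1)      # random crop size >= output size
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    crop = img[top:top + size, left:left + size]
    idx = np.arange(out) * size // out           # nearest-neighbor index map
    return crop[np.ix_(idx, idx)]

def color_distort(img, strength=0.5):
    """Appearance transform: jitter contrast and brightness on a greyscale image."""
    img = img * (1 + strength * (rng.random() - 0.5))   # contrast-like scaling
    img = img + strength * (rng.random() - 0.5)         # brightness shift
    return np.clip(img, 0.0, 1.0)

img = rng.random((16, 16))
augmented = color_distort(random_crop_resize(random_flip(img)))
```

Composing such transforms yields the “plurality of transformed samples” that are then input to the encoder network to produce representations.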
Pal and Chen do not teach the method wherein:
training based on credibility vectors for each of the plurality of partially labeled samples indicating certainty of class assignment; and
training the first neural network with a loss function applied to the plurality of representations and the updated credibility vectors.
However, Basu teaches the method wherein:
training based on credibility vectors for each of the plurality of partially labeled samples indicating certainty of class assignment (see Basu: Fig.3, [0025], “statistically quantify a classifier's performance for each label by estimating a value of a performance metric and an associated confidence and/or credibility of its estimation (e.g., a credibility or confidence interval for the estimation of the value of the performance metric for each of the classifier's labels). A label that has an unacceptable confidence and/or credibility level (e.g., a label with a substantially large statistical uncertainty) for the estimation of its performance metric (e.g., as determined via a comparison of the size of the confidence and/or credibility interval to an interval threshold) may be referred to as a “violating label.” Whereas, a label with an acceptable confidence and/or credibility in the estimation of its performance metric (e.g., a label with a sufficiently small statistical uncertainty) may be referred to as a “non-violating label.”)
Because Pal, Chen, and Basu are in the same field of endeavor of semi-supervised contrastive machine learning, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the teaching of Pal to include training based on credibility vectors for each of the plurality of partially labeled samples indicating certainty of class assignment, as taught by Basu. One would have been motivated to make such a combination in order to provide improved data labeling accuracy and increase the flexibility and applicability of generated models.
However, Ayvaci teaches the method wherein:
training the first neural network with a loss function applied to the plurality of representations and the updated credibility vectors (see Ayvaci: Fig.2, [0107], “[t]he training system 240 uses the same contrastive loss function to determine both the first contrastive loss values for the first updated embeddings 222 and the second contrastive loss values for the second updated embeddings 232. In some other implementations, the training system 240 uses a first contrastive loss function to determine the first contrastive loss values for the first updated embeddings 222, and a second contrastive loss function to determine the second contrastive loss values for the second updated embeddings 232.”). See also [0074].
[Greyscale image: media_image1.png]
Because Pal, Chen, Basu, and Ayvaci are in the similar field of endeavor of semi-supervised contrastive machine learning, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the teaching of the probabilistic classification model of Pal to include training based on credibility vectors for each of the plurality of partially labeled samples indicating certainty of class assignment, and inputting transformed samples into the first neural network to generate a plurality of representations for the transformed samples, as taught by Ayvaci. One would have been motivated to make such a combination in order to provide improved data labeling accuracy and increase the flexibility and applicability of generated models.
Regarding Claim 2,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 1. Basu further teaches the method comprising:
subsequent to averaging corresponding updated credibility vectors across the plurality of epochs and prior to determining labels, for each unlabeled sample of the plurality of partially labeled samples: normalizing the averaged credibility vectors by their respective maximal entries; and clipping the normalized credibility vectors to have entry values between one and zero (see Basu: Fig.4, [0049], “dataset amplifier 124 may iterate over the scaling factor x. For instance, for each violating label, dataset amplifier 124 may initialize a value of x (e.g., x.fwdarw.1), and determine an updated credibility interval. If the updated credibility interval continues to render the label as a violating label, the value of x may be incremented (with appropriate granularity) and the process is repeated until either the violating label is transitioned to a non-violating label (or until the value of x reaches a predetermined upper bound). The value of x that results in a transition from violating label to non-violating label provides an indication of how many additional “real” (e.g., not “simulated”) datapoints would be required for an adequate reduction in the label's credibility interval.”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the teaching of Pal to include normalizing the averaged credibility vectors by their respective maximal entries and clipping the normalized credibility vectors to have entry values between one and zero, as taught by Basu. One would have been motivated to make such a combination in order to provide improved data labeling accuracy and increase the flexibility and applicability of generated models.
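For clarity of the record, the normalize-by-maximal-entry and clip operations recited in claim 2 can be sketched as follows. The sketch is illustrative only; the zero-vector guard (`eps`) and the example values are assumptions, not claim limitations.

```python
import numpy as np

def normalize_and_clip(avg_cred, eps=1e-12):
    """Normalize each averaged credibility vector by its maximal entry,
    then clip entries into [0, 1] (illustrative sketch of claim 2)."""
    maxes = avg_cred.max(axis=1, keepdims=True)
    normed = avg_cred / np.maximum(maxes, eps)   # eps guards all-zero vectors
    return np.clip(normed, 0.0, 1.0)

# Example averaged credibility vectors (hypothetical values):
avg = np.array([[0.2, 0.6, 0.2],
                [0.0, 0.0, 0.0],    # an unlabeled sample never updated
                [1.5, 0.5, 3.0]])
clipped = normalize_and_clip(avg)
```

After this step each nonzero vector has a maximal entry of exactly one, so the clipped vectors serve directly as soft labels.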
Regarding Claim 3,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 2. Pal further teaches the method comprising:
training a second neural network to predict labels for additional samples based, at least in part, on the plurality of partially labeled samples and the clipped credibility vectors (see Pal: Fig.3, [0043], “At step 340, a machine-learning model may be trained to predict a label for a node based on the first label distribution information of the node. The machine-learning model may be trained using the labels and the first label distribution information of a set of the labeled nodes.”)
Regarding Claim 4,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 3. Pal further teaches the method wherein:
the second neural network is trained on a cross-entropy loss function applied to the clipped credibility vectors as soft labels and outputs of the second neural network from inputting the plurality of partially labeled samples (see Pal: Fig.1, [0024], “[t]he first term is the network-based loss, as in LP (Eq. 2), and the second term is the loss of the node-specific classifier (e.g., loss can be cross-entropy for DNN). However, contrast these two equations: In Eq. 5, ƒ.sub.θ(Z.sub.i) is embedding of the features Z while in LP, ƒ.sub.θ(Z.sub.i)=F.sub.i, which depends only on i and not Z.sub.i. The choice of functional form of ƒ.sub.θ(Z.sub.i) dictates the different methods: logistic regression in Logistic Label Propagation, SVM with Hinge Loss in LapSVM, or a deep neural network in EmbedNN.”)
Regarding Claim 5,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 2. Basu further teaches the method comprising:
subsequent to clipping the normalized credibility vectors and prior to determining labels: determining a percentage of the clipped credibility vectors to subsample; and at least one of setting a subset of the clipped credibility vectors according to the percentage to zero vectors and discarding the subset of the clipped credibility vectors (see Basu: Fig.4A, [0083], “a decoder may reconstruct the sequence of layers from the last hidden state representation. Upon comparison of the reconstructed layer surface data with a layer surface data of the layer being printed, the encoder-decoder based model may identify a deviation between the reconstructed layer surface data and the layer surface data of the layer being printed. Such a deviation is indicated as a predicted anomaly score.”). See the motivation to combine Pal, Chen, Basu, and Ayvaci in claim 1.
Regarding Claim 6,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 5. Basu further teaches the method wherein:
determining the percentage of the clipped credibility vectors to subsample comprises: for each candidate percentage of a set of candidate percentages, setting a subset of the clipped credibility vectors to zero vectors according to the candidate percentage to generate subsampled credibility vectors (see Basu: Fig.3, [0004], “Classifiers are now ubiquitously deployed in many computation-based applications, including but not limited to search engines, medical diagnostic applications, spam filtering, natural language processing, and numerous other applications. Classifiers may be prone to erroneously assigning (or failing to assign) labels to some datapoints, e.g., classifiers may exhibit non-zero Type I and/or Type II error rates. Thus, the issue of quantifying a classifier's performance is a growing concern.”);
converting the clipped credibility vectors and the subsampled credibility vectors into a first probability distribution and a second probability distribution, respectively; and computing a probability distribution distance from the second probability distribution to the first probability distribution (see Basu: Fig.1, [006], “a classifier may include any computer-enabled mapping (f) from a first set or domain (e.g., the domain of datapoints) to a second set or domain, e.g., a domain that includes the set of all possible subsets of L, e.g., L′. That is, a classifier may be represented by the mapping notation: f(x) ∈ L′. For each label in the set of labels, the mapping may be a deterministic mapping. In other embodiments, the mapping may be a probabilistic mapping. In some multi-label deterministic embodiments, the mapping may be indicated by the notation: f:custom-character.sup.d.fwdarw.2.sup.|L|. In probabilistic embodiments, the mapping may be notated as: f:custom-character.sup.d.fwdarw. [0,1].sup.|L|, where each component of [0,1] .sup.|L| indicates a probability for a corresponding element of L (e.g., a label of the set of labels”); and
determining the percentage of the clipped credibility vectors as a candidate percentage in the set of candidate percentages having a maximal corresponding probability distribution distance below a threshold probability distribution distance (see Basu: Fig.1, [0046], “upon determining the credibility interval for each label of the set of possible labels, performance metric estimator 122 may be enabled to identify one or more labels that are violating labels. As a reminder, a violating label is a label, where the size of the credibility interval is larger than a predetermined interval threshold (e.g., Δ(l, m, q)≥δ, where δ is the predetermined interval threshold). In some embodiments, performance metric estimator 122 may subdivide the set of labels into two complementary subsets: a set of violating labels(V) and a set if non-violating labels (V) (e.g., for particular values of m, q, and δ).”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the teaching of Pal to include a system to convert the clipped credibility vectors and the subsampled credibility vectors into a first probability distribution and a second probability distribution, respectively; and computing a probability distribution distance from the second probability distribution to the first probability distribution as taught by Basu. One would have been motivated to make such a combination in order to provide improved data labeling accuracy and increase flexibility and applicability of generated models.
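For illustration of the claimed percentage-selection limitation discussed above, the following is a minimal sketch only; it is not drawn from the applied references. The use of KL divergence as the probability distribution distance and the zeroing of the leading vectors for each candidate percentage are assumptions made solely for illustration.

```python
import math

def to_distribution(vectors):
    # Flatten all credibility-vector entries into one normalized probability distribution.
    flat = [v for vec in vectors for v in vec]
    total = sum(flat)
    if total == 0:
        return [1.0 / len(flat)] * len(flat)  # uniform fallback for all-zero input
    return [v / total for v in flat]

def kl_distance(p, q, eps=1e-12):
    # KL divergence D(q || p): distance from the subsampled distribution (q)
    # to the original distribution (p), as an assumed distance measure.
    return sum(qi * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))

def choose_percentage(clipped, candidates, threshold):
    # For each candidate percentage, zero a subset of the clipped credibility
    # vectors, convert both sets into probability distributions, and keep the
    # candidate with the maximal distance that stays below the threshold.
    best_pct, best_dist = None, -1.0
    for pct in candidates:
        k = int(len(clipped) * pct)          # number of vectors to zero out
        sub = [([0.0] * len(v) if i < k else list(v))
               for i, v in enumerate(clipped)]
        p = to_distribution(clipped)
        q = to_distribution(sub)
        d = kl_distance(p, q)
        if d < threshold and d > best_dist:  # maximal distance still below threshold
            best_pct, best_dist = pct, d
    return best_pct
```

Zeroing a larger fraction of vectors generally moves the subsampled distribution further from the original, so the threshold caps how aggressively the claimed subsampling may distort the distribution.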
Regarding Claim 7,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 5. Basu further teaches the system comprising:
computing weights for the clipped credibility vectors as maximal entries of corresponding averaged credibility vectors (see Basu : Fig.1, [0041] “A credibility interval for a random variable may be indicated as: [r.sub.lo, r.sub.up], where r.sub.lo and r.sub.up are the upper and lower bounds of the interval. The size of the interval may be determined as: r.sub.up−r.sub.lo. For a random variable (R) that is distributed via a beta distribution (e.g., R˜Beta(a, b)), the upper and lower bounds of its credibility interval (for a predetermined confidence value q ∈ [0,1]) may be calculated as: r.sub.lo=F.sub.R.sup.−1(0.5.Math.(1−q)) and r.sub.up=F.sub.R.sup.−1(0.5.Math.(1+q)), where F.sub.R(z)=Pr(R≤z).”); and
determining the subset of the clipped credibility vectors as a subset with lowest computed weights according to the percentage (see Basu: Fig.1, [0042], “For the F1 score performance metric, the upper and lower bounds for the F1 credibility interval [F1.sub.lo, F1.sub.up] may be determined from the harmonic means definition of the F1 score and the upper/lower bounds for the credibility interval for the precision and recall).”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the teaching of Pal to include a system that determines the subset of the clipped credibility vectors as a subset with lowest computed weights according to the percentage as taught by Basu. One would have been motivated to make such a combination in order to provide improved data labeling accuracy and increase flexibility and applicability of generated models.
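The weighting-and-selection limitation of claim 7 may be sketched as follows for illustration only; this sketch is not drawn from the applied references, and the plain argsort-style selection of the lowest-weight vectors is an assumption.

```python
def subset_to_zero(averaged, percentage):
    # Weight each clipped credibility vector by the maximal entry of its
    # corresponding averaged credibility vector, per the claim language.
    weights = [max(vec) for vec in averaged]
    k = int(len(averaged) * percentage)
    # Select the k lowest-weight vectors (least certain samples) for zero-setting.
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    return set(order[:k])
```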
Regarding Claim 8,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 5. Basu further teaches the system comprising:
updating the credibility vectors comprises: for each credibility vector and each transformation of the one or more of the plurality of transformations applied to a corresponding sample, updating each entry of the credibility vector according to proximity of a representation of the sample corresponding to the credibility vector with the transformation applied to representations of other samples with transformations in the plurality of transformations applied; and normalizing entries of each updated credibility vector by corresponding maximal entries (see Basu: Fig.1, [0049], “dataset amplifier 124 may iterate over the scaling factor x. For instance, for each violating label, dataset amplifier 124 may initialize a value of x (e.g., x.fwdarw.1), and determine an updated credibility interval. If the updated credibility interval continues to render the label as a violating label, the value of x may be incremented (with appropriate granularity) and the process is repeated until either the violating label is transitioned to a non-violating label (or until the value of x reaches a predetermined upper bound). The value of x that results in a transition from violating label to non-violating label provides an indication of how many additional “real” (e.g., not “simulated”) datapoints would be required for an adequate reduction in the label's credibility interval.”); and
averaging the normalized credibility vectors for each corresponding sample across the one or more of the plurality of transformations (see Basu: Fig.1, [0044], “The macro-average of the recall (R) may be determined as: R=|L|.sup.−1Σ.sub.l∈Lr(l). The macro-average of the Fl score may be defined as the harmonic mean HM (P, R). The bounds for the credibility interval for the macro-averages may be computed, as described above, from the values of P and R, as described above”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the teaching of Pal to include a system that updates the credibility vectors and averages the normalized credibility vectors for each corresponding sample across the one or more of the plurality of transformations as taught by Basu. One would have been motivated to make such a combination in order to provide smaller, more efficient training datasets and a lag-free and efficient machine learning model training system.
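The update/normalize/average sequence of claim 8 can be sketched as follows, purely for illustration and not drawn from the applied references; cosine similarity as the proximity measure and per-sample tentative class labels are assumptions.

```python
import math

def cosine(u, v):
    # Assumed proximity measure between two sample representations.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def update_credibility(reps, labels, n_classes):
    # reps: representations of transformed samples; labels: tentative class per
    # sample (an assumption for this sketch). Each entry c of a credibility
    # vector accumulates similarity to other samples assigned class c, then the
    # vector is normalized by its maximal entry, per the claim language.
    updated = []
    for i, r in enumerate(reps):
        vec = [0.0] * n_classes
        for j, other in enumerate(reps):
            if i != j:
                vec[labels[j]] += max(cosine(r, other), 0.0)
        m = max(vec)
        updated.append([v / m for v in vec] if m > 0 else vec)
    return updated

def average_over_transforms(per_transform):
    # per_transform: one list of normalized credibility vectors per applied
    # transformation; the result averages entry-wise across transformations.
    n = len(per_transform)
    return [[sum(t[i][c] for t in per_transform) / n
             for c in range(len(per_transform[0][i]))]
            for i in range(len(per_transform[0]))]
```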
Regarding Claim 9,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 5. Pal further teaches the system comprising:
training the first neural network comprises backpropagating loss through layers of the first neural network based on a loss function applied to the plurality of representations and the updated credibility vectors (see Pal: Fig.1, [0037], “DLP may begin with label propagation (hereinafter referred to as the Initial LP Step). Any label propagation (LP) technique may be used to propagate known labels (referred to as Y.sub.L) of labeled nodes L across the network/graph. One result of LP is that each node in the network may have a label distribution. For example, in the simple network shown in FIG. 2, si.”)
Regarding independent claim 10,
Claim 10 is directed to a non-transitory, machine-readable medium claim and has similar/same claim limitations as Claim 1 and is rejected under the same rationale.
Regarding Claim 11,
Claim 11 is directed to a machine-readable medium claim and has similar/same claim limitations as Claim 4 and is rejected under the same rationale.
Regarding Claim 12,
Claim 12 is directed to a machine-readable medium claim and has similar/same claim limitations as Claim 5 and is rejected under the same rationale.
Regarding Claim 13,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 10. Pal further teaches the medium comprising instructions to:
for each training iteration, normalize each of the averaged credibility vectors with respect to maximal certainty values (see Pal: Fig.1, [0042], “after the label distribution has been updated, it may again be used to retrain the label classifier model q, which may again be used to predict the unknown labels Y.sub.U. Since the underlying predictors have been updated (i.e., F.sub.L has been updated after the Subsequent LP Step), the model q would consequently be updated when it is retrained in another iteration of the Training Step. T”)
Regarding Claim 14,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 5. Basu further teaches the medium comprising instructions to:
for each training iteration, subsample the normalized credibility vectors and set the subsample of normalized credibility vectors to zero vectors (see Basu: Fig.4, [0049], “dataset amplifier 124 may iterate over the scaling factor x. For instance, for each violating label, dataset amplifier 124 may initialize a value of x (e.g., x.fwdarw.1), and determine an updated credibility interval. If the updated credibility interval continues to render the label as a violating label, the value of x may be incremented (with appropriate granularity) and the process is repeated until either the violating label is transitioned to a non-violating label (or until the value of x reaches a predetermined upper bound).”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the teaching of Pal to include a system that, for each training iteration, subsamples the normalized credibility vectors and sets the subsample of normalized credibility vectors to zero vectors as taught by Basu. One would have been motivated to make such a combination in order to provide improved data labeling accuracy and increase flexibility and applicability of generated models.
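The per-iteration normalize-then-zero step recited in claims 13 and 14 can be sketched as follows; this is illustrative only and not drawn from the applied references, and the choice to zero the vectors with the lowest maximal certainty values is an assumption.

```python
def normalize_and_zero(avg_vectors, rate):
    # Normalize each averaged credibility vector by its maximal certainty value
    # (claim 13), then zero-set a subsampled fraction of them (claim 14).
    normed = [[v / max(vec) if max(vec) > 0 else 0.0 for v in vec]
              for vec in avg_vectors]
    k = int(len(normed) * rate)
    # Illustrative zeroing policy: drop the vectors with the lowest peak certainty.
    order = sorted(range(len(avg_vectors)), key=lambda i: max(avg_vectors[i]))
    for i in order[:k]:
        normed[i] = [0.0] * len(normed[i])
    return normed
```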
Regarding Claim 15,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 5. Pal further teaches the medium comprising instructions to:
for each training iteration, determine a subsampling rate for a succeeding training iteration (see Pal: Fig.1, [0042], “after the label distribution has been updated, it may again be used to retrain the label classifier model q, which may again be used to predict the unknown labels Y.sub.U. Since the underlying predictors have been updated (i.e., F.sub.L has been updated after the Subsequent LP Step), the model q would consequently be updated when it is retrained in another iteration of the Training Step. T”), wherein the instructions to determine the subsampling rate comprise instructions to determine a maximum of multiple candidate subsampling rates for zero setting the normalized credibility vectors that yields a greatest overall impact on probability distribution of the normalized credibility vectors below a threshold impact (see Basu: Fig.1, “determining the credibility interval for each label of the set of possible labels, performance metric estimator 122 may be enabled to identify one or more labels that are violating labels. As a reminder, a violating label is a label, where the size of the credibility interval is larger than a predetermined interval threshold (e.g., Δ(l, m, q)≥δ, where δ is the predetermined interval threshold). In some embodiments, performance metric estimator 122 may subdivide the set of labels into two complementary subsets: a set of violating labels(V) and a set if non-violating labels (V) (e.g., for particular values of m, q, and δ).”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the teaching of Pal to include a system that determines a maximum of multiple candidate subsampling rates for zero setting the normalized credibility vectors that yields a greatest overall impact on probability distribution of the normalized credibility vectors below a threshold impact as taught by Basu. One would have been motivated to make such a combination in order to provide smaller, more efficient training datasets and a lag-free and efficient machine learning model training system.
Regarding independent Claim 16,
Claim 16 is directed to an apparatus claim and has similar/same claim limitation as Claim 1 and is rejected under same rationale.
Regarding Claim 17,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 5. Pal further teaches the system comprising:
the first neural network comprises a third neural network and a first projection head, wherein the second neural network comprises the third neural network and a second projection head (see Pal: Fig.3, [0043], “At step 340, a machine-learning model may be trained to predict a label for a node based on the first label distribution information of the node. The machine-learning model may be trained using the labels and the first label distribution information of a set of the labeled nodes.”)
Regarding Claim 18,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 5. Pal further teaches the system comprising:
initialize internal parameters of the third neural network prior to training the second neural network to generate labels for the plurality of partially labeled samples (see Pal: Fig.1, [0024], “[t]he first term is the network-based loss, as in LP (Eq. 2), and the second term is the loss of the node-specific classifier (e.g., loss can be cross-entropy for DNN). However, contrast these two equations: In Eq. 5, ƒ.sub.θ(Z.sub.i) is embedding of the features Z while in LP, ƒ.sub.θ(Z.sub.i)=F.sub.i, which depends only on i and not Z.sub.i. The choice of functional form of ƒ.sub.θ(Z.sub.i) dictates the different methods: logistic regression in Logistic Label Propagation, SVM with Hinge Loss in LapSVM, or a deep neural network in EmbedNN.”)
Regarding Claim 19,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 5. Pal further teaches the system comprising:
average credibility vectors updated across the plurality of training epochs (see Pal: Fig.3, [0043], “At step 350, a predicted label may be generated for each of the unlabeled nodes using the trained machine-learning model and the first label distribution information associated with the unlabeled node. At step 360, a convergence condition may be checked. If the condition is not met, then the process may repeat with the generated predicted labels being propagated and used to train the machine-learning model. This process may repeat until the convergence condition is met.”)
Regarding Claim 20,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 5. Pal further teaches the system comprising:
at least one of normalize and clip the averaged credibility vectors (see Pal: Fig.3, [0043], “At step 340, a machine-learning model may be trained to predict a label for a node based on the first label distribution information of the node. The machine-learning model may be trained using the labels and the first label distribution information of a set of the labeled nodes.”)
Regarding Claim 21,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 5. Basu further teaches the system comprising:
prior to generating the soft labels, subsample the credibility vectors (see Basu: Fig.4A, [0083], “a decoder may reconstruct the sequence of layers from the last hidden state representation. Upon comparison of the reconstructed layer surface data with a layer surface data of the layer being printed, the encoder-decoder based model may identify a deviation between the reconstructed layer surface data and the layer surface data of the layer being printed. Such a deviation is indicated as a predicted anomaly score.”);
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the teaching of Pal to include a system that, prior to generating the soft labels, subsamples the credibility vectors as taught by Basu. One would have been motivated to make such a combination in order to provide smaller, more efficient training datasets and a lag-free and efficient machine learning model training system.
Regarding Claim 22,
As shown above, Pal, Chen, Basu, and Ayvaci teach all the limitations of claim 5. Basu further teaches the system comprising:
determine a subsampling rate for subsampling the credibility vectors based, at least in part, on changes in probability distribution of the credibility vectors by zeroing the credibility vectors at each of one or more candidate subsampling rates (see Basu: Fig.1, “determining the credibility interval for each label of the set of possible labels, performance metric estimator 122 may be enabled to identify one or more labels that are violating labels. As a reminder, a violating label is a label, where the size of the credibility interval is larger than a predetermined interval threshold (e.g., Δ(l, m, q)≥δ, where δ is the predetermined interval threshold). In some embodiments, performance metric estimator 122 may subdivide the set of labels into two complementary subsets: a set of violating labels(V) and a set if non-violating labels (V) (e.g., for particular values of m, q, and δ).”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the teaching of Pal to include a system that determines a subsampling rate for subsampling the credibility vectors based, at least in part, on changes in probability distribution of the credibility vectors by zeroing the credibility vectors at each of one or more candidate subsampling rates as taught by Basu. One would have been motivated to make such a combination in order to provide smaller, more efficient training datasets and a lag-free and efficient machine learning model training system.
Response to Arguments
Claim Rejections - 35 U.S.C. § 101,
Regarding the 35 U.S.C. 101 rejection for being directed to non-statutory subject matter: Applicant's amendments do not overcome the rejection. Therefore, the 35 U.S.C. 101 rejection has been sustained.
Claim Rejections - 35 U.S.C. § 103,
Applicant’s arguments with respect to the claim amendments have been considered but are moot in view of the new combination of references applied in the current rejection. The new combination of references was necessitated by Applicant’s claim amendments. Therefore, the claims are rejected under the new combination of references as indicated above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
PGPUB Number / Inventor Information / Title / Description:
US 20190205748 A1; Fukuda; Takashi; Title: SOFT LABEL GENERATION FOR KNOWLEDGE DISTILLATION; Description: The present disclosure, generally, relates to machine learning, and more particularly, to methods, computer program products and computer systems for generating soft labels used for training a model.
US 20210326660 A1; Krishnan; Dilip; Title: Supervised Contrastive Learning With Multiple Positive Examples; Description: The cross-entropy loss is likely the most widely used loss function for supervised learning. It is naturally defined as the KL-divergence between two discrete distributions: the empirical label distribution (a discrete distribution of 1-hot vectors) and the empirical distribution of the logits.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZELALEM W SHALU whose telephone number is (571) 272-3003. The examiner can normally be reached M-F, 8:00 am to 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Cesar Paula can be reached at (571) 272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Zelalem Shalu/Examiner, Art Unit 2145
/CESAR B PAULA/Supervisory Patent Examiner, Art Unit 2145