Prosecution Insights
Last updated: April 19, 2026
Application No. 18/047,335

SYSTEMS AND METHODS FOR DATA CORRECTION

Final Rejection: §101, §103
Filed: Oct 18, 2022
Examiner: PHAM, JESSICA THUY
Art Unit: 2121
Tech Center: 2100 — Computer Architecture & Software
Assignee: Adobe Inc.
OA Round: 2 (Final)
Grant Probability: 33% (At Risk)
OA Rounds: 3-4
To Grant: 3y 3m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 33% (1 granted / 3 resolved; -21.7% vs TC avg). Grants only 33% of cases.
Interview Lift: -33.3% (minimal lift; based on resolved cases with interview)
Avg Prosecution: 3y 3m (typical timeline)
Total Applications: 41 across all art units (38 currently pending)

Statute-Specific Performance

§101: 26.8% (-13.2% vs TC avg)
§103: 35.5% (-4.5% vs TC avg)
§102: 11.0% (-29.0% vs TC avg)
§112: 22.7% (-17.3% vs TC avg)

Black line = Tech Center average estimate • Based on career data from 3 resolved cases
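The per-statute figures above are simple signed differences between the examiner's allowance rate and the Tech Center average. A minimal sketch of that arithmetic, assuming the TC averages are back-computed from the stated deltas (all four work out to 40.0%, which is an inference, not sourced data):

```python
# Sketch: deriving the "vs TC avg" deltas shown in the table. The examiner's
# per-statute allowance rates come from the table; the TC average rates are
# back-computed from the stated deltas and are assumptions, not sourced data.

examiner_rates = {"101": 26.8, "103": 35.5, "102": 11.0, "112": 22.7}
tc_avg_rates = {"101": 40.0, "103": 40.0, "102": 40.0, "112": 40.0}  # assumed

def delta_vs_tc(statute: str) -> float:
    """Signed difference, in percentage points, vs the Tech Center average."""
    return round(examiner_rates[statute] - tc_avg_rates[statute], 1)

for s in ("101", "103", "102", "112"):
    print(f"§{s}: {delta_vs_tc(s):+.1f}% vs TC avg")
```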

Office Action

Rejections: §101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment/Status of Claims

Claims 1-5, 8-9, 11-12, and 14-20 were amended. Claims 6-7, 10, and 13 were cancelled. Claims 1-5, 8-9, 11-12, and 14-20 are pending and examined herein. Claim 18 has a limitation interpreted under 35 U.S.C. 112(f). Claims 1-5, 8-9, 11-12, and 14-20 are rejected under 35 U.S.C. 101. Claims 1-5, 8-9, 11-12, and 14-20 are rejected under 35 U.S.C. 103.

Response to Arguments

Applicant's arguments with respect to the 35 U.S.C. 101 rejection of claims 1-5, 8-9, 11-12, and 14-20, filed 11/28/2025, have been fully considered but they are not persuasive.

Applicant argues, see pages 12-15, "Applicant respectfully submits that claim 1 is comparable to example 39, and not example 47, because it is confined to the creation of training data and the use of the training data to train a machine learning model, and does not require a specific mathematical calculation to train the machine learning model, as "approximating a change in a conditional loss for the neural network" is not a specific algorithm, and a "loss" itself is a value determined by a computation, and not a computation itself. See, e.g., paragraph [0109] of the Specification. Accordingly, because claims 1, 11, and 17 do not recite an abstract idea, Applicant respectfully requests that the rejection of claims 1, 11, and 17, and claims 2-5, 8-9, 12, 14-16, and 18-20 respectively dependent thereon under 35 U.S.C. § 101 be withdrawn."

Examiner respectfully disagrees. Regarding the comparison to Example 39, the fact that the claims are “confined to the creation of training data and the use of the training data to train a machine learning model” is not relevant to the 35 U.S.C. 101 analysis of the claims in either Example 39 or the instant application.
Example 39 is not directed to an abstract idea because the claims do not recite a mathematical concept (even though the limitations may be based on mathematical concepts) or a mental process. Specifically, Example 39 is directed to the transformation of images, which cannot be practically performed in the human mind. In contrast, the instant application recites “computing an influence of each of the second plurality of parts on the false label by approximating a change in a conditional loss for the neural network, wherein the influence for each part of the second plurality of parts quantifies a degree to which the part causes the neural network to make the prediction of the false label;”, which does recite a mathematical concept.

MPEP 2106.04(a)(2)(I)(C) states “A claim that recites a mathematical calculation, when the claim is given its broadest reasonable interpretation in light of the specification, will be considered as falling within the "mathematical concepts" grouping. A mathematical calculation is a mathematical operation (such as multiplication) or an act of calculating using mathematical methods to determine a variable or number, e.g., performing an arithmetic operation such as exponentiation. There is no particular word or set of words that indicates a claim recites a mathematical calculation. That is, a claim does not have to recite the word "calculating" in order to be considered a mathematical calculation. For example, a step of "determining" a variable or number using mathematical methods or "performing" a mathematical operation may also be considered mathematical calculations when the broadest reasonable interpretation of the claim in light of the specification encompasses a mathematical calculation.”

The instant application determines (“computes”) a variable (“an influence”) using a mathematical method (“approximating a change in a conditional loss for the neural network”).
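The "mathematical calculation" at issue, approximating the change in a conditional loss when a training part is perturbed, is commonly performed with influence functions. The application's equation (5) is not reproduced in this Office Action, so the following is only a generic sketch of the classic ε-up-weighting influence approximation (Koh & Liang, 2017), not the claimed method:

```python
import numpy as np

# Generic epsilon-up-weighting influence approximation: the first-order change
# in the loss at a test point, when one training point is up-weighted by
# epsilon, is  -grad_test^T H^{-1} grad_train,  where H is the Hessian of the
# total training loss at the fitted parameters. Illustration of the general
# technique only; the application's equation (5) may differ.

def influence(grad_test: np.ndarray, grad_train: np.ndarray, hessian: np.ndarray) -> float:
    """Approximate d(test loss)/d(epsilon) for up-weighting one training point."""
    return float(-grad_test @ np.linalg.solve(hessian, grad_train))

# Toy example: with an identity Hessian the influence reduces to a negative
# dot product of the two gradients.
g_test = np.array([1.0, 0.0])
g_train = np.array([0.5, 2.0])
print(influence(g_test, g_train, np.eye(2)))  # -0.5
```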
Paragraph [0121] of the specification states "According to some aspects, the influence component uses equation (5) to approximate an influence via ε-weighting." Therefore, the broadest reasonable interpretation in light of the specification of the limitation is a mathematical calculation.

The instant application also recites a mental process, “identifying a false label from among a plurality of predicted labels corresponding to the first plurality of parts of the first training sample, wherein the plurality of predicted labels is generated by the neural network trained based on a training set comprising a plurality of training samples and a plurality of training labels;”. In contrast to Example 39, wherein transforming an image cannot be performed in the human mind, identifying a false label can be practically performed in the human mind, and is therefore a mental process.

Regarding the comparison to Example 47, both examples recite mathematical concepts as both, when given their broadest reasonable interpretation in light of the specification, recite a mathematical calculation. See above for the analysis of “computing an influence of each of the second plurality of parts on the false label by approximating a change in a conditional loss for the neural network, wherein the influence for each part of the second plurality of parts quantifies a degree to which the part causes the neural network to make the prediction of the false label;”.

Applicant further argues, see pages 15-18, "Applicant respectfully submits that at least the additional element of "modifying the training set based on the identified part of the second training sample to obtain a corrected training set" integrates claim 1 into a practical application of the alleged judicial exception.
When claim 1 is read as a whole, the additional element enables the claim to recite an improvement to the accuracy of machine learning technology" and "The additional element of "modifying the training set based on the identified part of the second training sample to obtain a corrected training set" therefore enables claim 1 to recite an improvement to the accuracy of machine learning technology, and accordingly integrates the alleged judicial exception into a practical application."

Examiner respectfully disagrees. MPEP 2106.05(a) states “If it is asserted that the invention improves upon conventional functioning of a computer, or upon conventional technology or technological processes, a technical explanation as to how to implement the invention should be present in the specification. That is, the disclosure must provide sufficient details such that one of ordinary skill in the art would recognize the claimed invention as providing an improvement. The specification need not explicitly set forth the improvement, but it must describe the invention such that the improvement would be apparent to one of ordinary skill in the art. Conversely, if the specification explicitly sets forth an improvement but in a conclusory manner (i.e., a bare assertion of an improvement without the detail necessary to be apparent to a person of ordinary skill in the art), the examiner should not determine the claim improves technology.
An indication that the claimed invention provides an improvement can include a discussion in the specification that identifies a technical problem and explains the details of an unconventional technical solution expressed in the claim, or identifies technical improvements realized by the claim over the prior art.”

MPEP 2106.05(a) also states “An important consideration in determining whether a claim improves technology is the extent to which the claim covers a particular solution to a problem or a particular way to achieve a desired outcome, as opposed to merely claiming the idea of a solution or outcome. McRO, 837 F.3d at 1314-15, 120 USPQ2d at 1102-03; DDR Holdings, 773 F.3d at 1259, 113 USPQ2d at 1107. In this respect, the improvement consideration overlaps with other considerations, specifically the particular machine consideration (see MPEP § 2106.05(b)), and the mere instructions to apply an exception consideration (see MPEP § 2106.05(f)). Thus, evaluation of those other considerations may assist examiners in making a determination of whether a claim satisfies the improvement consideration.”

The additional element of "modifying the training set based on the identified part of the second training sample to obtain a corrected training set" is not a technical improvement as it does not recite a particular solution to the problem. Rather, it falls under the “mere instructions to apply an exception” consideration (see the 35 U.S.C. 101 rejection below), and does not modify the training set in a particular manner that would provide an improvement. Therefore, the claims do not represent an improvement to technology.

Applicant further argues, see pages 18-19, "Even assuming arguendo that claim 1 does not integrate an alleged judicial exception into a practical application, Applicant submits that the above-identified additional element of claim 1 enables the claim when read as whole to amount to significantly more than the alleged judicial exception.
For example, the additional element of "modifying the training set based on the identified part of the second training sample to obtain a corrected training set" enables the computed influence of the identified part to be a factor in assembling a corrected training set, and therefore is a limitation other than what is well-understood, routine, conventional activity in the field, and is an unconventional step that confines claim 1 to the particular useful application of retraining a neural network to generate more accurate predictions of labels for parts of a sample. See MPEP § 2106.05."

Examiner respectfully disagrees. MPEP 2106.05(d) states “When making a determination whether the additional elements in a claim amount to significantly more than a judicial exception, the examiner should evaluate whether the elements define only well-understood, routine, conventional activity. In this respect, the well-understood, routine, conventional consideration overlaps with other Step 2B considerations, particularly the improvement consideration (see MPEP § 2106.05(a)), the mere instructions to apply an exception consideration (see MPEP § 2106.05(f)), and the insignificant extra-solution activity consideration (see MPEP § 2106.05(g)). Thus, evaluation of those other considerations may assist examiners in making a determination of whether a particular element or combination of elements is well-understood, routine, conventional activity.”

As stated above, "modifying the training set based on the identified part of the second training sample to obtain a corrected training set" is not an improvement and rather falls under the “mere instructions to apply an exception” consideration. Therefore, it is a well-understood, routine, conventional activity and does not amount to significantly more than the judicial exception.

Applicant's arguments, see pages 19-20, filed 11/28/2025, with respect to the rejection(s) of claims 1-5, 8-9, 11-12, and 14-20 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Teso (“Interactive Label Cleaning with Example-based Explanations”, 2021) and Qin (“On Sample Based Explanation Methods for Sequence-to-Sequence Applications”, June 2022). However, note that the interpretation of false label has remained the same.

Applicant argues, see pages 21-22, that "there is no evidence that the label ỹ_t is predicted by a neural network." However, Teso, page 3, states “We consider a very general class of probabilistic classifiers f : R^d → [c] of the form f(x; θ) := argmax_{y ∈ [c]} P(y | x; θ), where the conditional distribution P(Y | X; θ) has been fit on training data by minimizing the cross-entropy loss ℓ(x, y, θ) = −Σ_{i ∈ [c]} 1[i = y] log P(i | x; θ). In our implementation, we also assume P to be a neural network with a softmax activation at the top layer, trained using some variant of SGD and possibly early stopping.” Page 4 states "Intuitively, z_k ∈ D_{t−1} is a contrastive counter-example for a suspicious example z̃_t if removing it from the data set and retraining leads to a model with parameters θ_{t−1}^{−k} that assigns higher probability to the suspicious label ỹ_t. The most contrastive counter-example is then the one that maximally affects the change in probability: argmax_{k ∈ [t−1]} { P(ỹ_t | x_t; θ_{t−1}^{−k}) − P(ỹ_t | x_t; θ_{t−1}) } (2)".

Therefore, as P is a neural network with parameters θ_{t−1}, and the conditional distribution P(ỹ_t | x_t; θ_{t−1}) has been fit on training data, P(ỹ_t | x_t; θ_{t−1}) is the probability that the neural network P will predict ỹ_t, and ỹ_t, in the broadest reasonable interpretation, is a predicted label generated by a neural network.
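Teso's equation (2), as quoted above, selects the training point whose removal-and-retrain most increases the model's probability for the suspicious label. A minimal sketch of that argmax, with retraining stubbed out (a real pipeline would refit the classifier for each candidate k; here `loo_probs` merely stands in for those leave-one-out probabilities):

```python
# Sketch of Teso's equation (2): the most contrastive counter-example is
#   argmax_k { P(y~_t | x_t; theta_{t-1}^{-k}) - P(y~_t | x_t; theta_{t-1}) }.
# Retraining is stubbed: loo_probs[k] stands in for the probability of the
# suspicious label under a model retrained without training point k.

def most_contrastive(baseline_prob: float, loo_probs: dict) -> int:
    """Index of the training point whose removal most raises P(suspicious label)."""
    return max(loo_probs, key=lambda k: loo_probs[k] - baseline_prob)

baseline = 0.30                          # P(y~_t | x_t) under the full model (illustrative)
loo_probs = {0: 0.28, 1: 0.55, 2: 0.31}  # illustrative leave-one-out values
print(most_contrastive(baseline, loo_probs))  # index 1 (largest increase, +0.25)
```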
Information Disclosure Statement

The information disclosure statement filed 10/18/2022 fails to comply with 37 CFR 1.98(a)(2), which requires a legible copy of each cited foreign patent document; each non-patent literature publication or that portion which caused it to be listed; and all other information or that portion which caused it to be listed. It has been placed in the application file, but the information referred to therein has not been considered. The unavailable documents are document #14, Hampel, et al., “Robust Statistics: The Approach Based on Influence Functions”, and document #17, Honnibal, et al., “explosion/spaCy: v3.0.0a16”. The references, other than the ones listed above, have been considered. However, a document number appears before each author name in the citations. It is recommended that the applicant file a new IDS with corrected citations.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.
The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and

(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: a modification component configured to identify the false label and to modify the training set in claim 18.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 17-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim(s) does/do not fall within at least one of the four categories of patent eligible subject matter because they are claimed as a machine without any physical/tangible components. See MPEP 2106.03. Claim 17 recites “a memory component” and “a processing device coupled to the memory component, the processing device configured to perform operations comprising”. The broadest reasonable interpretation of “a memory component” includes software memory, such as a database or software-defined storage. The broadest reasonable interpretation of “a processing device” is not limited to a computer processor, but includes software processing devices such as firmware, as mentioned in [0182] of the specification.

Claims 1-5, 8-9, 11-12, and 14-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. MPEP § 2106(III) sets out steps for evaluating whether a claim is drawn to patent-eligible subject matter. The analysis of claims 1-5, 8-9, 11-12, and 14-20, in accordance with these steps, follows.

Step 1 Analysis: Step 1 is to determine whether the claim is directed to a statutory category (process, machine, manufacture, or composition of matter).
Claims 1-5, 8-9, 11-12, and 14-16 are directed to processes, and claims 17-20, if amended to include a tangible component, would be directed to a machine. Though claims 17-20 are not directed to a statutory category, the analysis will proceed.

Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis: Step 2A Prong One asks if the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, analysis proceeds to Step 2A Prong Two, which asks if the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, analysis proceeds to Step 2B, which asks if the claim amounts to significantly more than the judicial exception. If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101. None of the claims represent an improvement to technology.

Regarding claim 1, the following claim elements are abstract ideas:

identifying a false label from among a plurality of predicted labels corresponding to the first plurality of parts of the first training sample, wherein the plurality of predicted labels is generated by the neural network trained based on a training set comprising a plurality of training samples and a plurality of training labels; (Identifying a false label can be practically performed in the human mind. This is a mental process.)

computing an influence of each of the second plurality of parts on the false label by approximating a change in a conditional loss for the neural network, wherein the influence for each part of the second plurality of parts quantifies a degree to which the part causes the neural network to make the prediction of the false label; (Computing an influence by approximating a change in a conditional loss is a mathematical calculation, which is a mathematical concept.)

The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

A method for training a neural network, comprising: (This recites generic machine learning processes and components; this amounts to mere instructions to apply an exception.)

obtaining a training set that has been used for training a neural network, the training set including a first training sample and a second training sample, wherein the first training sample includes a first plurality of parts and the second training sample includes a second plurality of parts, and wherein each of the first plurality of parts and the second plurality of parts comprises an index indicating a location within the first training sample or the second training sample; (Obtaining data is the generic process of receiving data, which amounts to mere instructions to apply an exception.)

wherein the plurality of predicted labels is generated by the neural network; (This limitation recites generic machine learning processes and components, which amounts to mere instructions to apply an exception.)

modifying the training set based on the identified part of the second training sample to obtain a corrected training set; and (This limitation is writing data to memory based on a result, which is storing the result. This amounts to mere instructions to apply an exception.)

retraining the neural network using the corrected training set. (This recites generic machine learning processes and components; this amounts to mere instructions to apply an exception.)

Regarding claim 2, the rejection of claim 1 is incorporated herein.
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

displaying the first training sample and the plurality of predicted labels in a user interface; and (Displaying data is an insignificant extra-solution activity. See MPEP § 2106.05, ‘Mere Data Gathering’, ex. iii.)

receiving a user input identifying the false label from among the plurality of predicted labels via the user interface. (In the art of computing, receiving a user input is merely receiving data from memory, or alternatively, receiving data over a network, both of which are insignificant extra-solution activities. See MPEP § 2106.05(d)(II), list 1, ex. i and iv.)

Regarding claim 3, the rejection of claim 1 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

displaying the identified part of the second training sample and the corresponding source label in a user interface; and (Displaying data is an insignificant extra-solution activity. See MPEP § 2106.05, ‘Mere Data Gathering’, ex. iii.)

receiving a user input corresponding to the identified part of the second training sample or the corresponding source label, wherein the identified part of the training sample is identified based on the user input. (In the art of computing, receiving a user input is merely receiving data from memory, or alternatively, receiving data over a network, both of which are insignificant extra-solution activities. See MPEP § 2106.05(d)(II), list 1, ex. i and iv.)

Regarding claim 4, the rejection of claim 1 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

receiving a corrected label via a user interface; and (In the art of computing, receiving a user input is merely receiving data from memory, or alternatively, receiving data over a network, both of which are insignificant extra-solution activities. See MPEP § 2106.05(d)(II), list 1, ex. i and iv.)

labeling the identified part with the corrected label, wherein the corrected training set includes the corrected label. (This limitation recites generic machine learning processes and components, which amounts to mere instructions to apply an exception.)

Regarding claim 5, the rejection of claim 1 is incorporated herein. Further, the following are abstract ideas:

identifying a ground-truth label for the first training sample; (Identifying a ground-truth label for data can be practically performed in the human mind. This is a mental process.)

comparing the false label to the ground-truth label, wherein the false label is identified based on the comparison. (Comparing labels to identify a false label can be practically performed by the human mind. This is a mental process.)

Claim 5 does not recite any additional elements.

Regarding claim 8, the rejection of claim 1 is incorporated herein. Further, the following are abstract ideas:

identifying a plurality of encoder output weights and a plurality of class transition parameters, wherein the influence is approximated based on the plurality of encoder output weights and is independent of the plurality of class transition parameters. (Identifying a plurality of encoder output weights and class transition parameters can be practically performed in the human mind, and is therefore a mental process. Approximating an influence based on the encoder output weights, independent of the class transition parameters, can also be practically performed in the human mind, and is therefore also a mental process.)

Regarding claim 9, the rejection of claim 1 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

the first training sample comprises a text sample and the false label corresponds to a phrase of the text sample. (This limitation merely limits the abstract ideas to the technological environment of text analytics. This is a field of use limitation. See MPEP § 2106.05(h).)

Regarding claim 11, the following are abstract ideas:

A method for data correction, comprising: (Data correction can be practically performed by the human mind. This is a mental process.)

identifying a false label from among a plurality of predicted labels corresponding to the first plurality of parts of the first training sample, wherein the plurality of predicted labels is generated by the neural network (Identifying a false label can be practically performed in the human mind. This is a mental process.)

computing an influence of each of the second plurality of parts on the false label by approximating a change in a conditional loss for the neural network, wherein the influence for each part of the second plurality of parts quantifies a degree to which the part causes the neural network to make the prediction of the false label; (Computing an influence by approximating a change in a conditional loss is a mathematical calculation, which is a mathematical concept.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

A method for training a neural network, comprising: (This recites generic machine learning processes and components; this amounts to mere instructions to apply an exception.)

obtaining a training set that has been used for training a neural network, the training set including a first training sample and a second training sample, wherein the first training sample includes a first plurality of parts and the second training sample includes a second plurality of parts, and wherein each of the first plurality of parts and the second plurality of parts comprises an index indicating a location within the first training sample or the second training sample; (Obtaining data is the generic process of receiving data, which amounts to mere instructions to apply an exception.)

correcting a label corresponding to a part of the second training sample from the plurality of training samples based on the computed influence of the part of the second training sample; and (This limitation is writing data to memory based on a result, which is simply storing the result. This amounts to mere instructions to apply an exception.)

retraining the neural network using the corrected training set. (This limitation recites generic machine learning components and processes at a high level, which amounts to mere instructions to apply an exception.)

Regarding claim 12, the rejection of claim 11 is incorporated herein. Further, the following are abstract ideas:

identifying a ground-truth label for the first training sample; (Identifying a ground-truth label for data can be practically performed in the human mind. This is a mental process.)
comparing the false label to the ground-truth label, wherein the false label is identified based on the comparison. (Comparing labels to identify a false label can be practically performed by the human mind. This is a mental process.) Claim 12 does not recite any additional elements.

Regarding claim 14, the rejection of claim 11 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: displaying the part of the second training sample and the corresponding source label in a user interface; and (Displaying data is an insignificant extra-solution activity. See MPEP § 2106.05, ‘Mere Data Gathering’, ex. iii.) receiving a user input corresponding to the part of the second training sample or the corresponding source label, wherein the part of the training sample is identified based on the user input. (In the art of computing, receiving a user input is merely receiving data from memory, or alternatively, receiving data over a network, both of which are insignificant extra-solution activities. See MPEP § 2106.05(d)(II), list 1, ex. i and iv.)

Regarding claim 15, the rejection of claim 11 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: displaying the part of the training sample and the corresponding source label in a user interface; and (Displaying data is an insignificant extra-solution activity. See MPEP § 2106.05, ‘Mere Data Gathering’, ex. iii.)
receiving a user input identifying the part of the training sample or the corresponding source label, wherein the part of the training sample and the corresponding source label are identified based on the user input. (In the art of computing, receiving a user input is merely receiving data from memory, or alternatively, receiving data over a network, both of which are insignificant extra-solution activities. See MPEP § 2106.05(d)(II), list 1, ex. i and iv.)

Regarding claim 16, the rejection of claim 11 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: receiving a corrected label via a user interface; and (In the art of computing, receiving a user input is merely receiving data from memory, or alternatively, receiving data over a network, both of which are insignificant extra-solution activities. See MPEP § 2106.05(d)(II), list 1, ex. i and iv.) replacing the label with the corrected label, wherein the corrected training set includes the corrected label. (This limitation is writing data to memory based on a result, which is simply storing the result. This amounts to mere instructions to apply an exception.)

Regarding claim 17, the following are abstract ideas: compute an influence of each of a plurality of training labels on a target label by approximating a change in a conditional loss for the neural network corresponding to each of the plurality of training labels. (Computing an influence by approximating a change in a conditional loss is a mathematical calculation, which is a mathematical concept.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: A system for training a machine learning model, the system comprising: (This recites a generic system; this amounts to mere instructions to apply an exception.) a memory component; and (This recites generic memory; this amounts to mere instructions to apply an exception.) a processing device coupled to the memory component, the processing device configured to perform operations comprising: (This recites generic computer components and processes; this amounts to mere instructions to apply an exception.) The remainder of claim 17 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis.

Regarding claim 18, the rejection of claim 17 is incorporated herein. Further, the following is an abstract idea: identify the false label and to (Identifying a false label can be practically performed in the human mind. This is a mental process.) The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: a modification component configured to ... modify the training set (This refers to generic machine learning components and generic computer components, which amounts to mere instructions to apply an exception. Modifying the training set is the generic computer process of writing data, which also amounts to mere instructions to apply an exception.)

Regarding claim 19, the rejection of claim 18 is incorporated herein.
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: a user interface, configured to receive an input identifying the false label. (In the art of computing, receiving a user input is merely receiving data from memory, or alternatively, receiving data over a network, both of which are insignificant extra-solution activities. See MPEP § 2106.05(d)(II), list 1, ex. i and iv. The user interface itself is a generic computing component recited at a high level of generality and amounts to mere instructions to apply an exception.)

Regarding claim 20, the rejection of claim 19 is incorporated herein. The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: the user interface is further configured to receive a corrected label. (Receiving data is a generic/known computer process. This amounts to mere instructions to apply an exception.)

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim(s) 1, 4, 9, 11, and 16-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Teso (“Interactive Label Cleaning with Example-based Explanations”, 2021) and Qin (“On Sample Based Explanation Methods for Sequence-to-Sequence Applications”, July 2022).
Regarding claim 1, Teso teaches A method for training a neural network, comprising: (The abstract states "We tackle sequential learning under label noise in applications where a human supervisor can be queried to relabel suspicious examples." Page 3 states “We consider a very general class of probabilistic classifiers f : ℝ^d → [c] of the form f(x; θ) ≔ argmax_{y∈[c]} P(y | x; θ), where the conditional distribution P(Y | X; θ) has been fit on training data by minimizing the cross-entropy loss ℓ(x, y, θ) = −∑_{i∈[c]} 𝟙[i = y] log P(i | x, θ). In our implementation, we also assume P to be a neural network with a softmax activation at the top layer, trained using some variant of SGD and possibly early stopping.”) obtaining a training set that has been used for training a neural network, the training set including a first training sample and a second training sample, (Page 3 states “At the beginning of iteration t, the machine has acquired a training set D_{t−1} = {z_1, …, z_{t−1}} and trained a model with parameters θ_{t−1} on it. At this point, the machine receives a new, possibly mislabeled example z̃_t (line 3) and has to decide whether to trust it. Following skeptical learning [5], CINCER does so by computing the margin μ(z̃_t, θ_{t−1}), i.e., the difference in conditional probability between the model’s prediction ŷ_t ≔ argmax_y P(y | x_t, θ_{t−1}) and the annotation ỹ_t.” Page 3 states “If z̃_t is compatible, it is added to the data set as-is (line 5). Otherwise, CINCER computes a counterexample z_k ∈ D_{t−1} that maximally supports the machine’s suspicion.” z̃_t is interpreted as the first training sample and z_k is interpreted as the second training sample.)
identifying a false label from among a plurality of predicted labels …, wherein the plurality of predicted labels is generated by a neural network trained based on a training set comprising a plurality of training samples and a plurality of training labels; (Page 3 states "The pseudo-code of CINCER is listed in Algorithm 1. At the beginning of iteration t, the machine has acquired a training set D_{t−1} = {z_1, …, z_{t−1}} and trained a model with parameters θ_{t−1} on it. At this point, the machine receives a new, possibly mislabeled example z̃_t (line 3) and has to decide whether to trust it. Following skeptical learning [5], CINCER does so by computing the margin μ(z̃_t, θ_{t−1}), i.e., the difference in conditional probability between the model’s prediction ŷ_t ≔ argmax_y P(y | x_t, θ_{t−1}) and the annotation ỹ_t." Page 3 further states "The example z̃_t is deemed compatible if the margin is below a given threshold τ and suspicious otherwise". The suspicious label is interpreted as the false label. Page 3 states “We consider a very general class of probabilistic classifiers f : ℝ^d → [c] of the form f(x; θ) ≔ argmax_{y∈[c]} P(y | x; θ), where the conditional distribution P(Y | X; θ) has been fit on training data by minimizing the cross-entropy loss ℓ(x, y, θ) = −∑_{i∈[c]} 𝟙[i = y] log P(i | x, θ). In our implementation, we also assume P to be a neural network with a softmax activation at the top layer, trained using some variant of SGD and possibly early stopping.” Page 4 states "Intuitively, z_k ∈ D_{t−1} is a contrastive counter-example for a suspicious example z̃_t if removing it from the data set and retraining leads to a model with parameters θ_{t−1}^{−k} that assigns higher probability to the suspicious label ỹ_t. The most contrastive counter-example is then the one that maximally affects the change in probability: argmax_{k∈[t−1]} { P(ỹ | x_t; θ_{t−1}^{−k}) − P(ỹ | x_t; θ_{t−1}) } (2)".
Therefore, as P is a neural network with parameters θ_{t−1}, and the conditional distribution P(ỹ | x_t; θ_{t−1}) has been fit on training data, P(ỹ | x_t; θ_{t−1}) is the probability that the neural network P will predict ỹ, and ỹ, in the broadest reasonable interpretation, is a predicted label generated by a neural network. As the algorithm is repeated for each training sample, ỹ is selected from among a plurality of predicted labels.) computing an influence of each of the second plurality of [labels] on the false label by approximating a change in a conditional loss for the neural network, … (Page 4 states "Intuitively, z_k ∈ D_{t−1} is a contrastive counter-example for a suspicious example z̃_t if removing it from the data set and retraining leads to a model with parameters θ_{t−1}^{−k} that assigns higher probability to the suspicious label ỹ_t." Page 5 states "Contrastive counter-examples are highly influential. Can this algorithm be used for identifying influential counter-examples? It turns out that, as long as the model is obtained by optimizing the cross-entropy loss, the answer is affirmative." Teso provides equations for calculating the influence of the training labels that approximate Eq. 2 (page 4), which is argmax_{k∈[t−1]} { P(ỹ | x_t; θ_{t−1}^{−k}) − P(ỹ | x_t; θ_{t−1}) }. One of ordinary skill in the art would understand, as the false label is ỹ, that the influence would be calculated on the false label. Algorithm 1 (page 4) shows that this calculation happens for each label. Algorithm 1 states that the influence is calculated using Eq. 12 (equivalent to Eq. 2), which is argmax_{k∈[t−1]} ∇_θ ℓ(z̃_t, θ_{t−1})^⊤ H_{θ_{t−1}}^{−1} ∇_θ ℓ(z_k, θ_{t−1}) [page 5]. One of ordinary skill would realize that this is an approximation of the change of the loss, which, as one of ordinary skill in the art would realize, is a loss based on a conditional probability as it approximates Eq. 2.) identifying … the second training sample from among the plurality of training labels based on the computed influence of … the identified second training sample; (Page 6 states "A simple strategy, which we do employ in all of our examples and experiments, is to restrict the search to counter-examples whose label in the training set is the same as the prediction for the suspicious example, i.e., y_k = ŷ_t. This way, the annotator can interpret the counter-example as being in support of the machine’s suspicion." As the counter-example’s label is searched for, it is identified and selected. As explained above, selecting the counter-example is based on the computed influence of the part of the training sample.) modifying the training set based on the identified … second training sample to obtain a corrected training set; and (Page 3 states "Next, CINCER asks the annotator to double-check the pair (z̃_t, z_k) and relabel the suspicious example, the counter-example, or both, thus resolving the potential inconsistency. The data set and model are then updated accordingly (line 9) and the loop repeats." As the counter-example is presented, the modification is based on the identified part of the training sample and the corresponding source label.) retraining the neural network using the corrected training set. (Page 3 states "Next, CINCER asks the annotator to double-check the pair (z̃_t, z_k) and relabel the suspicious example, the counter-example, or both, thus resolving the potential inconsistency. The data set and model are then updated accordingly (line 9) and the loop repeats." Updating the model is interpreted as retraining. Additionally, line 10 of Algorithm 1 on page 4 states "10: fit θ_t on D_t”. As D_t is the updated training set, θ, the neural network, is retrained.)
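For illustration only (this sketch is not part of the record): the influence score discussed above, ∇_θ ℓ(z̃_t, θ)^⊤ H^{−1} ∇_θ ℓ(z_k, θ), can be sketched as a gradient-alignment search over training examples. The toy model, data, and function names below are hypothetical, and the Hessian is approximated by the identity (a common TracIn-style simplification), not the inverse-Hessian product the paper uses.

```python
# Toy sketch of influence-based counterexample selection, assuming a linear
# softmax classifier and an identity-Hessian approximation of the influence
# score grad(suspicious)^T H^{-1} grad(z_k). All names/data are hypothetical.
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def ce_grad(W, x, y):
    """Gradient of the cross-entropy loss w.r.t. weights W (C x d)."""
    err = softmax(W @ x)
    err[y] -= 1.0                     # d(loss)/d(logits) = softmax - one_hot(y)
    return np.outer(err, x).ravel()   # flatten to one parameter vector

def most_influential_counterexample(W, suspicious, train_set):
    """Return the index k maximizing grad(suspicious) . grad(z_k)."""
    g_sus = ce_grad(W, *suspicious)
    scores = [float(g_sus @ ce_grad(W, x_k, y_k)) for x_k, y_k in train_set]
    return int(np.argmax(scores)), scores

# Hypothetical tiny setup: 2 classes, 3-dimensional inputs, 5 training points.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
train = [(rng.normal(size=3), int(rng.integers(0, 2))) for _ in range(5)]
suspicious = (rng.normal(size=3), 1)   # (x_t, suspicious label y~_t)
k, scores = most_influential_counterexample(W, suspicious, train)
```

The returned index k plays the role of the counterexample z_k that "maximally supports the machine's suspicion"; a faithful implementation would replace the identity approximation with an inverse-Hessian-vector product.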
Teso does not appear to explicitly teach wherein the first training sample includes a first plurality of parts and the second training sample includes a second plurality of parts, and wherein each of the first plurality of parts and the second plurality of parts comprises an index indicating a location within the first training sample or the second training sample; [labels] corresponding to the first plurality of parts of the first training sample; [computing an influence of] parts, wherein the influence for each part of the second plurality of parts quantifies a degree to which the part causes the neural network to make the prediction of the [label]; a part of the second training sample. However, Qin—directed to analogous art—teaches wherein the first training sample includes a first plurality of parts and the second training sample includes a second plurality of parts, and wherein each of the first plurality of parts and the second plurality of parts comprises an index indicating a location within the first training sample or the second training sample; (Page 40 states "Similar to TracInS, measures how the model loss for a span on test instance 𝑧′ varies with a span in training instance 𝑧." The test instance z′ is interpreted as the first training sample and the training instance z is interpreted as the second training sample. Page 40 states "First, we learn from the use of span in TracIn+. Then, Set training span z_{ij} = [t_i, …, t_j], where 0 ≤ i < j ≤ m." Page 40 states "Generally, we choose a speaker turn as span z_{ij}. Roll out the instance 𝑧 with K rounds from the first speaker turn to get a total of K different instances z_k from top-1 to top-K, which can be achieved using the work in Section A.” Therefore, i and j indicate an index indicating a location within the second training sample, and t_i, …, t_j is a part among the K parts of the second plurality of parts.
Page 40 states "Similarly, combining the above methods, we add any span test data to achieve a more fine-grained unit-based influence score calculation, as (8)." As the test data is also a span, one of ordinary skill in the art would be able to conclude, from Eq. 8, that test span z_{kl} = [t_k, …, t_l], where 0 ≤ k < l ≤ n. Therefore, k and l indicate an index indicating a location within the first training sample and t_k, …, t_l is a part of the first plurality of parts, which are the test data.) [labels] corresponding to the first plurality of parts of the first training sample (Page 40 states "For text generation, each token has a corresponding label whose number N is the vocabulary size." Page 40 states "𝑡 is each token in the instance." Therefore, t_k, …, t_l, which are the first plurality of parts from the first training sample and are tokens, have corresponding labels.) [computing an influence of] parts, wherein the influence for each part of the second plurality of parts quantifies a degree to which the part causes the neural network to make the prediction of the [label]; (Page 40 states "Similar to TracInS, measures how the model loss for a span on test instance 𝑧′ varies with a span in training instance 𝑧. The more important this training span is, the more impact it has on the test instances, and the greater the impact score should be." Page 40 states "Based on the above method, by subtracting the previous gradient vector from the gradient vector of each top-i, we can get the influence score of spans (the first subtraction minus an empty utterance that only contains start and terminator)." One of ordinary skill in the art would realize that the impact on the test instances is the degree to which the part causes the neural network to make the prediction of the label from the test instance.)
a part of the second training sample (Page 40 states "Similar to TracInS, measures how the model loss for a span on test instance 𝑧′ varies with a span in training instance 𝑧. The more important this training span is, the more impact it has on the test instances, and the greater the impact score should be." The span in the training instance z is interpreted as the part of the second training sample.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Teso and Qin because, as stated by Qin on page 39, "For sequence-to-sequence applications, long natural language texts are common. For sample-based explanation methods, model decisions may actually depend on specific parts of training instances. Unlike classification tasks, the generation of target sequences in such tasks is a multi-classification problem, and the number of labels is the total number of token categories in the vocabulary. Therefore, how to apply influence functions to multi-label tasks is the core problem."

Regarding claim 4, the rejection of claim 1 is incorporated herein. Teso teaches receiving a corrected label via a user interface; and (Page 3 states "Next, CINCER asks the annotator to double-check the pair (z̃_t, z_k) and relabel the suspicious example, the counter-example, or both, thus resolving the potential inconsistency. The data set and model are then updated accordingly (line 9) and the loop repeats." In order for a human user, such as the annotator, to receive the information, a user interface must be used to communicate the information. As the data set and model are updated according to the corrected labels, the corrected labels must have been received.) labeling the identified part with the corrected label, wherein the corrected training set includes the corrected label.
(Page 4, Algorithm 1, lines 7-10 state “find counterexample z_k using Eq. 12 … present z̃_t, z_k to user, receive possibly cleaned labels y_t′, y_k′ … D_t ← (D_{t−1} \ {z_k}) ∪ {(x_t, y_t′), (x_k, y_k′)} … fit θ_t on D_t”. Therefore, the training set is D_t, the parts are identified with the corrected labels, and the corrected data set includes the corrected labels.)

Regarding claim 9, the rejection of claim 1 is incorporated herein. Teso does not appear to explicitly teach the first training sample comprises a text sample and the false label corresponds to a phrase of the text sample. However, Qin—directed to analogous art—teaches the first training sample comprises a text sample and the false label corresponds to a phrase of the text sample. (Page 39 states "In pursuit of interpretability, we choose the daily dialogue generation task for research, and select the dialogue material Daily Dialog as the original dataset. Daily Dialog consists of multiple turns of daily dialogue. On average there are around 8 speaker turns per dialogue with around 15 tokens per turn." Page 40 states "For text generation, each token has a corresponding label whose number N is the vocabulary size." Page 40 states "𝑡 is each token in the instance." Therefore, t_k, …, t_l, which are the first plurality of parts from the first training sample and are tokens, have corresponding labels. The tokens are interpreted as the phrases which have the labels.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Teso with the teachings of Qin for the reasons given above in regards to claim 1.

Regarding claim 11, Teso teaches A method for training a neural network, comprising: (The abstract states "We tackle sequential learning under label noise in applications where a human supervisor can be queried to relabel suspicious examples."
Page 3 states “We consider a very general class of probabilistic classifiers f : ℝ^d → [c] of the form f(x; θ) ≔ argmax_{y∈[c]} P(y | x; θ), where the conditional distribution P(Y | X; θ) has been fit on training data by minimizing the cross-entropy loss ℓ(x, y, θ) = −∑_{i∈[c]} 𝟙[i = y] log P(i | x, θ). In our implementation, we also assume P to be a neural network with a softmax activation at the top layer, trained using some variant of SGD and possibly early stopping.”) obtaining a training set that has been used for training a neural network, the training set including a first training sample and a second training sample, (Page 3 states “At the beginning of iteration t, the machine has acquired a training set D_{t−1} = {z_1, …, z_{t−1}} and trained a model with parameters θ_{t−1} on it. At this point, the machine receives a new, possibly mislabeled example z̃_t (line 3) and has to decide whether to trust it. Following skeptical learning [5], CINCER does so by computing the margin μ(z̃_t, θ_{t−1}), i.e., the difference in conditional probability between the model’s prediction ŷ_t ≔ argmax_y P(y | x_t, θ_{t−1}) and the annotation ỹ_t.” Page 3 states “If z̃_t is compatible, it is added to the data set as-is (line 5). Otherwise, CINCER computes a counterexample z_k ∈ D_{t−1} that maximally supports the machine’s suspicion.” z̃_t is interpreted as the first training sample and z_k is interpreted as the second training sample.) identifying a false label from among a plurality of predicted labels …, wherein the plurality of predicted labels is generated by a neural network trained based on a training set comprising a plurality of training samples and a plurality of training labels; (Page 3 states "The pseudo-code of CINCER is listed in Algorithm 1. At the beginning of iteration t, the machine has acquired a training set D_{t−1} = {z_1, …, z_{t−1}} and trained a model with parameters θ_{t−1} on it.
At this point, the machine receives a new, possibly mislabeled example z̃_t (line 3) and has to decide whether to trust it. Following skeptical learning [5], CINCER does so by computing the margin μ(z̃_t, θ_{t−1}), i.e., the difference in conditional probability between the model’s prediction ŷ_t ≔ argmax_y P(y | x_t, θ_{t−1}) and the annotation ỹ_t." Page 3 further states "The example z̃_t is deemed compatible if the margin is below a given threshold τ and suspicious otherwise". The suspicious label is interpreted as the false label. Page 3 states “We consider a very general class of probabilistic classifiers f : ℝ^d → [c] of the form f(x; θ) ≔ argmax_{y∈[c]} P(y | x; θ), where the conditional distribution P(Y | X; θ) has been fit on training data by minimizing the cross-entropy loss ℓ(x, y, θ) = −∑_{i∈[c]} 𝟙[i = y] log P(i | x, θ). In our implementation, we also assume P to be a neural network with a softmax activation at the top layer, trained using some variant of SGD and possibly early stopping.” Page 4 states "Intuitively, z_k ∈ D_{t−1} is a contrastive counter-example for a suspicious example z̃_t if removing it from the data set and retraining leads to a model with parameters θ_{t−1}^{−k} that assigns higher probability to the suspicious label ỹ_t. The most contrastive counter-example is then the one that maximally affects the change in probability: argmax_{k∈[t−1]} { P(ỹ | x_t; θ_{t−1}^{−k}) − P(ỹ | x_t; θ_{t−1}) } (2)". Therefore, as P is a neural network with parameters θ_{t−1}, and the conditional distribution P(ỹ | x_t; θ_{t−1}) has been fit on training data, P(ỹ | x_t; θ_{t−1}) is the probability that the neural network P will predict ỹ, and ỹ, in the broadest reasonable interpretation, is a predicted label generated by a neural network. As the algorithm is repeated for each training sample, ỹ is selected from among a plurality of predicted labels.)
computing an influence of each of the second plurality of [labels] on the false label by approximating a change in a conditional loss for the neural network, … (Page 4 states "Intuitively, z_k ∈ D_{t−1} is a contrastive counter-example for a suspicious example z̃_t if removing it from the data set and retraining leads to a model with parameters θ_{t−1}^{−k} that assigns higher probability to the suspicious label ỹ_t." Page 5 states "Contrastive counter-examples are highly influential. Can this algorithm be used for identifying influential counter-examples? It turns out that, as long as the model is obtained by optimizing the cross-entropy loss, the answer is affirmative." Teso provides equations for calculating the influence of the training labels that approximate Eq. 2 (page 4), which is argmax_{k∈[t−1]} { P(ỹ | x_t; θ_{t−1}^{−k}) − P(ỹ | x_t; θ_{t−1}) }. One of ordinary skill in the art would understand, as the false label is ỹ, that the influence would be calculated on the false label. Algorithm 1 (page 4) shows that this calculation happens for each label. Algorithm 1 states that the influence is calculated using Eq. 12 (equivalent to Eq. 2), which is argmax_{k∈[t−1]} ∇_θ ℓ(z̃_t, θ_{t−1})^⊤ H_{θ_{t−1}}^{−1} ∇_θ ℓ(z_k, θ_{t−1}) [page 5]. One of ordinary skill would realize that this is an approximation of the change of the loss, which, as one of ordinary skill in the art would realize, is a loss based on a conditional probability as it approximates Eq. 2.) identifying … the second training sample from among the plurality of training labels based on the computed influence of … the identified second training sample; (Page 6 states "A simple strategy, which we do employ in all of our examples and experiments, is to restrict the search to counter-examples whose label in the training set is the same as the prediction for the suspicious example, i.e., y_k = ŷ_t. This way, the annotator can interpret the counter-example as being in support of the machine’s suspicion." As the counter-example’s label is searched for, it is identified and selected. As explained above, selecting the counter-example is based on the computed influence of the part of the training sample.) correcting a label corresponding to a part of the second training sample based on the computed influence of the part of the second training sample to obtain a corrected training set (Page 4, Algorithm 1, lines 7-10 state “find counterexample z_k using Eq. 12 … present z̃_t, z_k to user, receive possibly cleaned labels y_t′, y_k′ … D_t ← (D_{t−1} \ {z_k}) ∪ {(x_t, y_t′), (x_k, y_k′)} … fit θ_t on D_t”. Therefore, the training set is D_t, the parts are identified with the corrected labels, and the corrected data set includes the corrected labels.) retraining the neural network using the corrected training set. (Page 3 states "Next, CINCER asks the annotator to double-check the pair (z̃_t, z_k) and relabel the suspicious example, the counter-example, or both, thus resolving the potential inconsistency. The data set and model are then updated accordingly (line 9) and the loop repeats." Updating the model is interpreted as retraining. Additionally, line 10 of Algorithm 1 on page 4 states "10: fit θ_t on D_t”. As D_t is the updated training set, θ, the neural network, is retrained.)
Teso does not appear to explicitly teach where in the first training sample includes a first plurality of parts and the second training sample includes a second plurality of parts, and wherein each of the first plurality of parts and the second plurality of parts comprises an index indicating a location within the first training sample or the second training sample. However, Qin—directed to analogous art—teaches where in the first training sample includes a first plurality of parts and the second training sample includes a second plurality of parts, and wherein each of the first plurality of parts and the second plurality of parts comprises an index indicating a location within the first training sample or the second training sample. (Page 40 states "Similar to TracInS, measures how the model loss for a span on test instance 𝑧′ varies with a span in training instance 𝑧." The test instance z ' is interpreted as the first training sample and the test instance z ' is interpreted as the second training sample. Page 40 states "First, we learn from the use of span in TracIn+. Then, Set training span z i j = [ t i ,   … ,   t j ] , where 0 ≤ 𝑖 < 𝑗 ≤ 𝑚." Page 40 states "Generally, we choose a speaker turn as span z i j Roll out the instance 𝑧 with K rounds from the first speaker turn to get a total of K different instances z k from top-1 to top-K, which can be achieved using the work in Section A.” Therefore, i and j indicate an index indicating a location within the second training sample and t i ,   … ,   t j is a part in the K second plurality of parts. Page 40 states "Similarly, combining the above methods, we add any span test data to achieve a more fine-grained unit-based influence score calculation, as (8)." As the test data is also a span, one of ordinary skill in the art would be able to conclude, from Eq. 8 that test span z k l = [ t k ,   … ,   t l ] , where 0 ≤ k < l ≤ n. 
Therefore, k and l indicate an index indicating a location within the first training sample, and t_k, …, t_l is a part of the first plurality of parts, which are the test data.) [labels] corresponding to the first plurality of parts of the first training sample (Page 40 states "For text generation, each token has a corresponding label whose number N is the vocabulary size." Page 40 states "𝑡 is each token in the instance." Therefore, t_k, …, t_l, which are the first plurality of parts from the first training sample and are tokens, have corresponding labels.) [computing an influence of] parts, wherein the influence for each part of the second plurality of parts quantifies a degree to which the part causes the neural network to make the prediction of the [label]; (Page 40 states "Similar to TracInS, measures how the model loss for a span on test instance 𝑧′ varies with a span in training instance 𝑧. The more important this training span is, the more impact it has on the test instances, and the greater the impact score should be." Page 40 states "Based on the above method, by subtracting the previous gradient vector from the gradient vector of each top-i, we can get the influence score of spans (the first subtraction minus an empty utterance that only contains start and terminator)." One of ordinary skill in the art would realize that the impact on the test instances is the degree to which the part causes the neural network to make the prediction of the label from the test instance.) a part of the second training sample (Page 40 states "Similar to TracInS, measures how the model loss for a span on test instance 𝑧′ varies with a span in training instance 𝑧. The more important this training span is, the more impact it has on the test instances, and the greater the impact score should be." The span in training instance z is interpreted as the part of the second training sample.)
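As a hedged sketch of the span indexing just discussed (not Qin's code), a "part" of a sample can be represented as a contiguous token span z_ij = [t_i, …, t_j], where the index pair (i, j) indicates the part's location within the sample; the function names below are illustrative assumptions.

```python
# Hedged illustration of Qin's span indexing (p. 40): a part of a sample
# is a contiguous token span identified by the index pair (i, j),
# i.e. z_ij = [t_i, ..., t_j] with 0 <= i < j <= m.

def span(tokens, i, j):
    """Return the part z_ij of a sample; (i, j) is its location index."""
    m = len(tokens)
    assert 0 <= i < j <= m, "span indices must satisfy 0 <= i < j <= m"
    return tokens[i:j]

def all_spans(tokens):
    """Enumerate every (index, part) pair for one sample, so that each
    part carries an index indicating its location within the sample."""
    m = len(tokens)
    return [((i, j), tokens[i:j]) for i in range(m) for j in range(i + 1, m + 1)]
```

Enumerating spans this way makes explicit that both the training sample (indices i, j) and the test sample (indices k, l) decompose into indexed parts, which is the premise of the unit-based influence score in Eq. 8.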
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Teso and Qin because, as stated by Qin on page 39, "For sequence-to-sequence applications, long natural language texts are common. For sample-based explanation methods, model decisions may actually depend on specific parts of training instances. Unlike classification tasks, the generation of target sequences in such tasks is a multi-classification problem, and the number of labels is the total number of token categories in the vocabulary. Therefore, how to apply influence functions to multi-label tasks is the core problem." Regarding claim 16, the rejection of claim 11 is incorporated herein. Teso teaches receiving a corrected label via a user interface; and (Page 3 states "Next, CINCER asks the annotator to double-check the pair (z̃_t, z_k) and relabel the suspicious example, the counter-example, or both, thus resolving the potential inconsistency. The data set and model are then updated accordingly (line 9) and the loop repeats." In order for a human user, such as the annotator, to receive the information, a user interface must be used to communicate the information. As the data set and model are updated according to the corrected labels, the corrected labels must have been received.) replacing the label with the corrected label, wherein the corrected training set includes the corrected label. (Page 4, Algorithm 1, lines 7-10 state “find counterexample z_k using Eq. 12 … present z̃_t, z_k to user, receive possibly cleaned labels y′_t, y′_k … D_t ← (D_{t−1} \ {z_k}) ∪ {(x_t, y′_t), (x_k, y′_k)} … fit θ_t on D_t”. Therefore, the training set is D_t, the labels are replaced with the corrected labels, and the corrected training set includes the corrected labels.)
Regarding claim 17, Teso teaches A system for training a machine learning model, the system comprising: (The abstract states "We tackle sequential learning under label noise in applications where a human supervisor can be queried to relabel suspicious examples." Page 3 states “We consider a very general class of probabilistic classifiers f : R^d → [c] of the form f(x; θ) := argmax_{y∈[c]} P(y | x; θ), where the conditional distribution P(Y | X; θ) has been fit on training data by minimizing the cross-entropy loss ℓ(x, y, θ) = −Σ_{i∈[c]} 1{i = y} log P(i | x; θ). In our implementation, we also assume P to be a neural network with a softmax activation at the top layer, trained using some variant of SGD and possibly early stopping.” Page 6 states "We implemented CINCER using Python and Tensorflow [25] on top of three classifiers and compared different counter-example selection strategies on five data sets. The IF code is adapted from [26]. All experiments were run on a 12-core machine with 16 GiB of RAM and no GPU." The machine is interpreted as the system.) a memory component; and (Page 6 states "We implemented CINCER using Python and Tensorflow [25] on top of three classifiers and compared different counter-example selection strategies on five data sets. The IF code is adapted from [26]. All experiments were run on a 12-core machine with 16 GiB of RAM and no GPU." The RAM is interpreted as the memory component.) a processing device coupled to the memory component, the processing device configured to perform operations comprising: (Page 6 states "We implemented CINCER using Python and Tensorflow [25] on top of three classifiers and compared different counter-example selection strategies on five data sets. The IF code is adapted from [26]. All experiments were run on a 12-core machine with 16 GiB of RAM and no GPU." The cores are interpreted as the processing device, which is coupled to the memory component, the RAM.)
The remainder of claim 17 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis. Regarding claim 18, the rejection of claim 17 is incorporated herein. Teso teaches a modification component configured to identify the false label and to modify the training set. (Page 6 states "A simple strategy, which we do employ in all of our examples and experiments, is to restrict the search to counter-examples whose label in the training set is the same as the prediction for the suspicious example, i.e., y_k = ŷ_t. This way, the annotator can interpret the counter-example as being in support of the machine’s suspicion." As the counter-example’s label is searched for, it is identified and selected. As explained above, selecting the counter-example is based on the computed influence. Page 3 states "Next, CINCER asks the annotator to double-check the pair (z̃_t, z_k) and relabel the suspicious example, the counter-example, or both, thus resolving the potential inconsistency. The data set and model are then updated accordingly (line 9) and the loop repeats." As the counter-example is presented, the modification is based on the identified part of the training sample and the corresponding source label. The part of the computer used to perform these steps is interpreted as the modification component.) Claim(s) 2, 3, 14, 15, 19, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Teso (“Interactive Label Cleaning with Example-based Explanations”, 2021) and Qin (“On Sample Based Explanation Methods for Sequence-to-Sequence Applications”, June 2022) as applied to claim 1 above, and further in view of da Silva (“Improving Named Entity Recognition using Deep Learning with Human in the Loop”, 2019). Regarding claim 2, the rejection of claim 1 is incorporated herein.
Teso does not appear to explicitly teach displaying the first training sample and the plurality of predicted labels in a user interface; and receiving a user input identifying the false label from among the plurality of predicted labels via the user interface. However, da Silva—directed to analogous art—teaches displaying the first training sample and the plurality of predicted labels in a user interface; and (Page 595 states "Human NERD presents to the user an interactive web-based annotation interface used for adding entity annotations or editing automatic pre-annotations in O. As the entities are labeled in O, users (as reviewers) then accept or reject these to indicate which ones are true. Each document t_i ∈ O is presented to one user. Thus no two users labeled the same document at the same instant of time. This step outputs O with its user corrections." The document is interpreted as the input sample. The annotations are interpreted as the plurality of predicted labels, as page 595 states "The deep learning model is initially trained with D = {(x_i, Y_i) : i = 1..m}, a set of labeled training examples x_i, where Y_i ⊆ L the set of labels of the i-th example. At this step, the pre-trained model classifies the entity mentions on T = {t_i, …, t_m} using the labels described on L and outputs O = {(t_i, Y_i) : i = 1…m}, where t_i ∈ T and Y ⊆ L the set of labels of the i-th document.") receiving a user input identifying the false label from among the plurality of predicted labels via the user interface. (Page 595 states “As the entities are labeled in O, users (as reviewers) then accept or reject these to indicate which ones are true. Each document t_i ∈ O is presented to one user. Thus no two users labeled the same document at the same instant of time. This step outputs O with its user corrections.” Rejecting the labels is interpreted as identifying the false label, as it indicates which ones are true, which also indicates which ones are false.
Page 595 further states "Based on the user corrections, the NER model can learn and improve from O." As the system uses the user input to update the NER model, the user input must have been received.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Teso and Qin with the human annotation system of da Silva because, as the abstract states, "Named Entity Recognition (NER) is a challenging problem in Natural Language Processing (NLP). Deep Learning techniques have been extensively applied in NER tasks because they require little feature engineering and are free from language-specific resources, learning important features from word or character embeddings trained on large amounts of data. However, these techniques are data-hungry and require a massive amount of training data. This work proposes Human NERD (stands for Human Named Entity Recognition with Deep learning) which addresses this problem by including humans in the loop." Regarding claim 3, the rejection of claim 1 is incorporated herein. Teso teaches displaying the identified part of the second training sample in a user interface (Page 4, Algorithm 1, line 3 states “receive new example z̃_t = (x_t, ỹ_t)”, meaning that the example includes the sample, in this case x_t, and the label, in this case ỹ_t. Page 4, Algorithm 1, lines 7 and 8 state “find counterexample z_k using Eq. 12 … present z̃_t, z_k to user, receive possibly cleaned labels y′_t, y′_k”. Therefore, the part of the training sample (the counter-example) and the corresponding source label (the original counter-example label) are presented to the user. In order for a human user, such as the annotator, to receive the information, a user interface must be used to display the information.)
Teso does not appear to explicitly teach receiving a user input corresponding to the part of the training sample, wherein the identified part of the training sample is identified based on the user input. However, da Silva teaches receiving a user input corresponding to the part of the training sample, wherein the identified part of the training sample is identified based on the user input (Page 596 states "If the reviewer identifies an entity not annotated by the model, he/she can manually label it. In this case, first, he/she should click on the class label (on top of the Figure 2), then the class will appear in evidence. After that, the reviewer selects the sequence of words in the document to annotate." The sequence of words is interpreted as the part of the training sample, as it is a part of the document. Page 595 states "Based on the user corrections, the NER model can learn and improve from O." Therefore, the training sample is identified based on the input.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Teso and Qin with the human annotation system of da Silva for the reasons given above with regard to claim 3. Regarding claim 14, the rejection of claim 11 is incorporated herein. Teso does not appear to explicitly teach displaying the first training sample and the plurality of predicted labels in a user interface; and receiving a user input identifying the false label from among the plurality of predicted labels via the user interface. However, da Silva—directed to analogous art—teaches displaying the first training sample and the plurality of predicted labels in a user interface; and (Page 595 states "Human NERD presents to the user an interactive web-based annotation interface used for adding entity annotations or editing automatic pre-annotations in O.
As the entities are labeled in O, users (as reviewers) then accept or reject these to indicate which ones are true. Each document t_i ∈ O is presented to one user. Thus no two users labeled the same document at the same instant of time. This step outputs O with its user corrections." The document is interpreted as the input sample. The annotations are interpreted as the plurality of predicted labels, as page 595 states "The deep learning model is initially trained with D = {(x_i, Y_i) : i = 1..m}, a set of labeled training examples x_i, where Y_i ⊆ L the set of labels of the i-th example. At this step, the pre-trained model classifies the entity mentions on T = {t_i, …, t_m} using the labels described on L and outputs O = {(t_i, Y_i) : i = 1…m}, where t_i ∈ T and Y ⊆ L the set of labels of the i-th document.") receiving a user input identifying the false label from among the plurality of predicted labels via the user interface. (Page 595 states “As the entities are labeled in O, users (as reviewers) then accept or reject these to indicate which ones are true. Each document t_i ∈ O is presented to one user. Thus no two users labeled the same document at the same instant of time. This step outputs O with its user corrections.” Rejecting the labels is interpreted as identifying the false label, as it indicates which ones are true, which also indicates which ones are false. Page 595 further states "Based on the user corrections, the NER model can learn and improve from O." As the system uses the user input to update the NER model, the user input must have been received.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Teso and Qin with the human annotation system of da Silva because, as the abstract states, "Named Entity Recognition (NER) is a challenging problem in Natural Language Processing (NLP).
Deep Learning techniques have been extensively applied in NER tasks because they require little feature engineering and are free from language-specific resources, learning important features from word or character embeddings trained on large amounts of data. However, these techniques are data-hungry and require a massive amount of training data. This work proposes Human NERD (stands for Human Named Entity Recognition with Deep learning) which addresses this problem by including humans in the loop." Regarding claim 15, the rejection of claim 11 is incorporated herein. Teso teaches displaying the part of the second training sample in a user interface (Page 4, Algorithm 1, line 3 states “receive new example z̃_t = (x_t, ỹ_t)”, meaning that the example includes the sample, in this case x_t, and the label, in this case ỹ_t. Page 4, Algorithm 1, lines 7 and 8 state “find counterexample z_k using Eq. 12 … present z̃_t, z_k to user, receive possibly cleaned labels y′_t, y′_k”. Therefore, the part of the training sample (the counter-example) and the corresponding source label (the original counter-example label) are presented to the user. In order for a human user, such as the annotator, to receive the information, a user interface must be used to display the information.) Teso does not appear to explicitly teach receiving a user input corresponding to the part of the training sample or the corresponding source label, wherein the part is identified based on the user input. However, da Silva teaches receiving a user input corresponding to the part of the training sample or the corresponding source label, wherein the part is identified based on the user input. (Page 596 states "If the reviewer identifies an entity not annotated by the model, he/she can manually label it. In this case, first, he/she should click on the class label (on top of the Figure 2), then the class will appear in evidence.
After that, the reviewer selects the sequence of words in the document to annotate." The sequence of words is interpreted as the part of the training sample, as it is a part of the document. The class label is interpreted as the corresponding source label. Page 595 states "Based on the user corrections, the NER model can learn and improve from O." Therefore, the training sample and corresponding source label are identified based on the input.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Teso and Qin with the human annotation system of da Silva for the reasons given above with regard to claim 14. Regarding claim 19, the rejection of claim 18 is incorporated herein. Teso does not appear to explicitly teach a user interface configured to receive an input identifying the false label. However, da Silva—directed to analogous art—teaches a user interface configured to receive an input identifying the false label (Page 595 states “As the entities are labeled in O, users (as reviewers) then accept or reject these to indicate which ones are true. Each document t_i ∈ O is presented to one user. Thus no two users labeled the same document at the same instant of time. This step outputs O with its user corrections.” Rejecting the labels is interpreted as identifying the false label, as it indicates which ones are true, which also indicates which ones are false. Page 595 further states "Based on the user corrections, the NER model can learn and improve from O." As the system uses the user input to update the NER model, the user input must have been received.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Teso and Qin with the human annotation system of da Silva because, as the abstract states, "Named Entity Recognition (NER) is a challenging problem in Natural Language Processing (NLP). Deep Learning techniques have been extensively applied in NER tasks because they require little feature engineering and are free from language-specific resources, learning important features from word or character embeddings trained on large amounts of data. However, these techniques are data-hungry and require a massive amount of training data. This work proposes Human NERD (stands for Human Named Entity Recognition with Deep learning) which addresses this problem by including humans in the loop." Regarding claim 20, the rejection of claim 19 is incorporated herein. Teso does not appear to explicitly teach wherein: the user interface is further configured to receive a corrected label. However, da Silva—directed to analogous art—teaches wherein: the user interface is further configured to receive a corrected label. (Page 595 states "Human NERD presents to the user an interactive web-based annotation interface used for adding entity annotations or editing automatic pre-annotations in O. As the entities are labeled in O, users (as reviewers) then accept or reject these to indicate which ones are true. Each document t_i ∈ O is presented to one user. Thus no two users labeled the same document at the same instant of time. This step outputs O with its user corrections." The document is interpreted as the input sample. The user corrections are interpreted as the corrected label, as they include the labeled entities.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Teso and Qin with the human annotation system of da Silva for the reasons given above with regard to claim 19. Claim(s) 5, 8, and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Teso (“Interactive Label Cleaning with Example-based Explanations”, 2021) and Qin (“On Sample Based Explanation Methods for Sequence-to-Sequence Applications”, June 2022) as applied to claim 1 above, and further in view of Kong (“Resolving Training Biases Via Influence-Based Data Relabeling”, 2022). Kong was made available through the IDS. Regarding claim 5, the rejection of claim 1 is incorporated herein. Teso does not appear to explicitly teach identifying a ground-truth label for the first training sample; and comparing the false label to the ground-truth label, wherein the false label is identified based on the comparison. However, Kong—directed to analogous art—teaches identifying a ground-truth label for the first training sample; and (Page 16, Section C.2 states "Specifically, the influence of each training instance is estimated with the validation set using the validation loss and the model’s performance is tested by an additional out-of-sample test set which ensures we do not utilize any information of the test data. When training logistic regression, we randomly pick up 30% samples from the training set as the validation set. For different influence-based approaches, the training/validation/test sets are kept the same for fair comparison." Therefore, the validation set is drawn from the training set. Page 4, Eq. 5 states that the influence is calculated using Eq. 2, which uses the change in loss, which is the validation loss according to Section C.2.
Page 2 states “Since our relabeling function is dependent to the loss function, we focus on the most effective and versatile loss, i.e., Cross Entropy loss for any classification tasks.” One of ordinary skill in the art would realize that the cross-entropy loss requires the ground truth of the data set. Therefore, as the validation set is used for the validation loss, input samples (the validation data) and the ground-truth labels for the input samples are included.) comparing the false label to the ground-truth label, wherein the false label is identified based on the comparison. (As in the above limitation, the influence is calculated using the cross-entropy loss, which, as one of ordinary skill in the art would understand, compares the false label to the ground-truth label. Page 4 states “We denote D_− = {z_i ∈ D | Φ_θ(z_i) > 0} as harmful samples.” Therefore, Eq. 2 (used to compute the influence Φ_θ(·)) is also used to identify the harmful samples. As the harmful samples are relabeled (see pp. 5-6), the harmful samples’ labels, interpreted as the false labels, are identified based on the comparison.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Teso and Qin with the validation set usage of Kong because, as Kong states on page 2, "Since our relabeling function is dependent to the loss function, we focus on the most effective and versatile loss, i.e., Cross Entropy loss for any classification tasks." Additionally, page 9 states, “As the number of validation samples increases, RDIA significantly outperforms ERM, achieving up to 90% relative lower in test loss. The reason is that, as the number of validation sets increases, the validation set can gradually reflect the true distribution of test data.” Regarding claim 8, the rejection of claim 1 is incorporated herein.
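The Kong criterion applied above for claim 5 — a cross-entropy validation loss, with Φ_θ(z_i) > 0 marking a harmful sample — can be illustrated with a hedged sketch. This is not Kong's estimator: influence is approximated here by exact leave-one-out refitting of a deliberately trivial class-frequency model, purely to show the sign convention, and all function names are illustrative assumptions.

```python
import math

# Hedged illustration of Kong's harmful-sample criterion (Phi_theta(z_i) > 0),
# not the paper's influence estimator: influence is approximated by exact
# leave-one-out retraining of a trivial model, with the cross-entropy
# validation loss named on Kong p. 2.

def fit(train):
    """Toy 'model': predicted probability of class 1 is its frequency."""
    p = sum(y for _, y in train) / len(train)
    return min(max(p, 1e-6), 1 - 1e-6)          # clamp away from 0 and 1

def val_loss(p, val):
    """Cross-entropy on the validation set; requires ground-truth labels."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for _, y in val) / len(val)

def harmful_samples(train, val):
    """Phi(z_i) > 0 means removing z_i lowers the validation loss,
    so z_i is flagged as harmful (its label is a candidate false label)."""
    base = val_loss(fit(train), val)
    loo = [val_loss(fit(train[:i] + train[i + 1:]), val)
           for i in range(len(train))]
    return [z for z, l in zip(train, loo) if base - l > 0]
```

On a toy training set where one sample carries a label inconsistent with the validation ground truth, only that sample's removal lowers the validation cross-entropy, so only it is flagged — mirroring the mapping of harmful-sample labels to the claimed "false label".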
Teso teaches a plurality of encoder output weights (Page 7 states that they use the data set "20NG [27]: data set of newsgroup posts categorized in twenty topics. The documents were embedded using a pre-trained SentenceBERT model [28] and compressed to 100 features using PCA." Therefore, as SentenceBERT is an encoder, as one of ordinary skill in the art would understand, the encoder output weights must have been identified to make the predictions that are used for the calculation of the influence.) Teso does not appear to explicitly teach identifying a plurality of [output weights] and a plurality of class transition parameters, wherein the influence is approximated based on the plurality of [output weights] and is independent of the plurality of class transition parameters. However, Kong—directed to analogous art—teaches identifying a plurality of [output weights] and a plurality of class transition parameters, wherein the influence is approximated based on the plurality of [output weights] and is independent of the plurality of class transition parameters. (Page 5 states “Recall that φ(x_i, θ) denotes the model output for x_i”. Eq. 8 and Eq. 9, used for calculating the influence, use the model output, therefore using the model output weights. Page 6 states “Consider a training sample z_i = (x_i, y_i) belonging to the m-th class (m ∈ [1, K]), i.e., y_{i,m} = 1. Let R(x_i, y_i) = y′_i, we propose the following relabeling function R that fulfills the above two principles: y′_{i,k} = 0 if k = m, log φ_k (1 − m)/(K − 1) otherwise, (12) where φ(x_i, θ̂) = (φ_1, …, φ_K) is the probability distribution over K classes produced by the model with parameters θ̂, i.e., φ_i ∈ [0, 1] and Σ_{i=1}^K φ_i = 1.” Therefore, the relabeling function describes a class transition, and the output of this relabeling function is interpreted as the class transition parameters.
As this is repeated for all of the harmful training samples, there is a plurality of class transition parameters. The influence function is calculated before the relabeling step (see page 4) and therefore is independent of the class transition parameters. As the influence function (see page 5, Eq. 8, Eq. 9) includes the loss of the model, the model output weights are used to calculate the influence.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Teso and Qin with the relabeling function of Kong because, as Kong states on page 6, "Theorem 2 shows that using our proposed relabeling R can further reduce the test risk than simply discarding or downweighting D_− from the training set for any classification tasks." Regarding claim 12, the rejection of claim 11 is incorporated herein. Teso does not appear to explicitly teach identifying a ground-truth label for the first training sample; and comparing the false label to the ground-truth label, wherein the false label is identified based on the comparison. However, Kong—directed to analogous art—teaches identifying a ground-truth label for the first training sample; and (Page 16, Section C.2 states "Specifically, the influence of each training instance is estimated with the validation set using the validation loss and the model’s performance is tested by an additional out-of-sample test set which ensures we do not utilize any information of the test data. When training logistic regression, we randomly pick up 30% samples from the training set as the validation set. For different influence-based approaches, the training/validation/test sets are kept the same for fair comparison." Therefore, the validation set is drawn from the training set. Page 4, Eq. 5 states that the influence is calculated using Eq. 2, which uses the change in loss, which is the validation loss according to Section C.2.
Page 2 states “Since our relabeling function is dependent to the loss function, we focus on the most effective and versatile loss, i.e., Cross Entropy loss for any classification tasks.” One of ordinary skill in the art would realize that the cross-entropy loss requires the ground truth of the data set. Therefore, as the validation set is used for the validation loss, input samples (the validation data) and the ground-truth labels for the input samples are included.) comparing the false label to the ground-truth label, wherein the false label is identified based on the comparison. (As in the above limitation, the influence is calculated using the cross-entropy loss, which, as one of ordinary skill in the art would understand, compares the false label to the ground-truth label. Page 4 states “We denote D_− = {z_i ∈ D | Φ_θ(z_i) > 0} as harmful samples.” Therefore, Eq. 2 (used to compute the influence Φ_θ(·)) is also used to identify the harmful samples. As the harmful samples are relabeled (see pp. 5-6), the harmful samples’ labels, interpreted as the false labels, are identified based on the comparison.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Teso and Qin with the validation set usage of Kong because, as Kong states on page 2, "Since our relabeling function is dependent to the loss function, we focus on the most effective and versatile loss, i.e., Cross Entropy loss for any classification tasks." Additionally, page 9 states, “As the number of validation samples increases, RDIA significantly outperforms ERM, achieving up to 90% relative lower in test loss. The reason is that, as the number of validation sets increases, the validation set can gradually reflect the true distribution of test data.” Conclusion Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL.
See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA THUY PHAM whose telephone number is (571) 272-2605. The examiner can normally be reached Monday - Friday, 9:00 A.M. - 5:00 P.M. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /J.T.P./Examiner, Art Unit 2121 /Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121

Prosecution Timeline

Oct 18, 2022
Application Filed
Aug 23, 2025
Non-Final Rejection — §101, §103
Nov 24, 2025
Examiner Interview Summary
Nov 24, 2025
Applicant Interview (Telephonic)
Nov 28, 2025
Response Filed
Feb 18, 2026
Final Rejection — §101, §103 (current)

Prosecution Projections

3-4
Expected OA Rounds
33%
Grant Probability
0%
With Interview (-33.3%)
3y 3m
Median Time to Grant
Moderate
PTA Risk
Based on 3 resolved cases by this examiner. Grant probability derived from career allow rate.
