DETAILED ACTION
This communication is in response to Application No. 18/163,162, filed on February 1, 2023,
in which Claims 1-20 are presented for examination.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is
directed to an abstract idea without significantly more.
Regarding Claim 1:
Step 1: Claim 1 is a method type claim. Therefore, Claims 1-9 fall within one of the four statutory
categories (i.e., a process).
Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components, then it falls within the “Mathematical Concepts” grouping of abstract ideas. Claim 1 recites the following limitations, each of which falls within the “Mental Processes” grouping:
determining, based on one or more datapoints of the training dataset, one or more changes to the weight associated with each node of the plurality of nodes (mental process – determining changes to the weight associated with each node may be performed mentally by a user observing/analyzing the datapoints of the training dataset and accordingly using judgement/evaluation to determine changes to the weight based on said analysis);
identifying, using one or more model explainability techniques and based on the one or more changes to the weight associated with each node of the plurality of nodes, one or more pathways that decrease an accuracy of the machine learning model outputting the label (mental process – identifying pathways that decrease an accuracy of the machine learning model may be performed mentally by a user observing/analyzing the changes to the weight associated with each node and accordingly using judgement/evaluation to identify pathways that decrease an accuracy of the machine learning model based on said analysis);
determining a first set of the one or more datapoints that correlate with the one or more pathways that decrease the accuracy of the machine learning model outputting the label (mental process – determining a set of data points that correlate with the pathways that decrease the accuracy of the machine learning model may be performed mentally by observing/analyzing the pathways and accordingly using judgement/evaluation to determine a first set of datapoints that correlate with the pathways);
and removing the first set of the one or more datapoints from the training dataset to generate a reduced training dataset (mental process – removing the first set of the datapoints from the training dataset may be performed manually by a user by discarding the set of datapoints to generate a reduced training dataset).
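Examiner’s note (illustrative only): the recited sequence of steps can be pictured with the following minimal sketch, which is not drawn from the application or the cited art. It determines per-node weight changes attributable to individual datapoints, treats the most disruptive changes as a stand-in for the recited accuracy-decreasing pathways, and removes the correlated datapoints; the toy single-layer model, the leave-one-out comparison, and the 90th-percentile cutoff are all assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))             # training datapoints
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # labels
    w0 = rng.normal(size=4)                   # one weight per "node"

    def train_epoch(w, X, y, lr=0.1):
        """One logistic-regression-style update; returns the new weights."""
        preds = 1 / (1 + np.exp(-X @ w))
        return w - lr * (X.T @ (preds - y) / len(y))

    # Determine the weight changes each datapoint is responsible for
    # (leave-one-out comparison against training on the full dataset).
    w_all = train_epoch(w0, X, y)
    deltas = np.array([w_all - train_epoch(w0, np.delete(X, i, axis=0), np.delete(y, i))
                       for i in range(len(X))])

    # Treat the most disruptive per-weight changes as accuracy-decreasing
    # "pathways" and take the correlated datapoints as the first set.
    influence = np.abs(deltas).sum(axis=1)
    harmful = np.where(influence > np.percentile(influence, 90))[0]

    # Remove that set to generate a reduced training dataset.
    X_reduced = np.delete(X, harmful, axis=0)
    y_reduced = np.delete(y, harmful)
    print(f"Removed {len(harmful)} of {len(X)} datapoints.")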
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
inputting, by a computing device, a training dataset into a machine learning model to train the machine learning model to output a label, wherein the machine learning model comprises a plurality of nodes and each node, of the plurality of nodes, is associated with a weight (adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea; see MPEP 2106.05(f)). Examiner’s note: this is a high-level recitation of training a machine learning model using a training dataset, without significantly more.
Step 2B: The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
inputting, by a computing device, a training dataset into a machine learning model to train the machine learning model to output a label, wherein the machine learning model comprises a plurality of nodes and each node, of the plurality of nodes, is associated with a weight (adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea; see MPEP 2106.05(f)). Examiner’s note: this is a high-level recitation of training a machine learning model using a training dataset, without significantly more.
For the reasons above, Claim 1 is rejected as being directed to an abstract idea without significantly more. This rejection applies equally to dependent claims 2-9. The additional limitations of the dependent claims are addressed below.
Regarding Claim 2:
Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 2 depends on.
inputting the reduced training dataset into the machine learning model to determine whether the machine learning model outputs the label (mental process – inputting the reduced training dataset into the machine learning model may be performed manually by a user and determining whether the machine learning model outputs the label may be performed mentally by a user observing/analyzing the machine learning model outputs)
comparing a first label outputted by the machine learning model trained on the training dataset to a second label outputted by the machine learning model trained on the reduced training dataset to determine whether the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset (mental process – comparing a first label outputted by the machine learning model to a second label outputted by the machine learning model may be performed mentally by a user observing/analyzing the first and second label and accordingly using judgement/evaluation to determine whether the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset);
and in response to a determination that the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset, determining that the reduced training dataset is valid (mental process – determining that the reduced training dataset is valid may be performed mentally by a user observing/analyzing the determination that the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset and accordingly using judgement/evaluation to determine that the reduced training dataset is valid based on said analysis);
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 3:
Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 3 depends on.
determining that the first set of the one or more datapoints causes the weight associated with each node associated with the one or more pathways to change by more than a threshold amount (mental process – determining that the first set of the datapoints causes the weight associated with each node to change by more than a threshold may be performed mentally by a user observing/analyzing the weight associated with each node and accordingly using judgement/evaluation to determine that the first set of the datapoints causes the weight associated with each node to change by more than a threshold based on said analysis).
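Examiner’s note (illustrative only): the recited thresholding can be pictured as the following sketch, in which the weight values and the threshold amount are hypothetical.

    import numpy as np

    w_before = np.array([0.40, -0.25, 0.10])   # weights before the first set of datapoints is applied
    w_after  = np.array([0.95, -0.30, 0.12])   # weights after
    THRESHOLD = 0.5                            # hypothetical threshold amount

    # Flag nodes whose associated weight changed by more than the threshold.
    exceeds = np.abs(w_after - w_before) > THRESHOLD
    print(exceeds)   # [ True False False]: only the first node exceeds the threshold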
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 4:
Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 4 depends on.
wherein the one or more changes comprise at least one of: a magnitude by which the weight associated with each node of the plurality of nodes changes; or a direction in which the weight associated with each node of the plurality of nodes changes (mental process – determining a magnitude or direction of a weight change may be performed mentally by a user observing/analyzing the weight changes and accordingly using judgement/evaluation to determine the magnitude or direction of any change)
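Examiner’s note (illustrative only): the recited magnitude and direction of a weight change reduce to elementary arithmetic, as in the following sketch with hypothetical values.

    import numpy as np

    w_before = np.array([0.40, -0.25, 0.10])
    w_after  = np.array([0.95, -0.30, 0.10])

    delta = w_after - w_before    # per-node weight change
    magnitude = np.abs(delta)     # magnitude by which each weight changes
    direction = np.sign(delta)    # direction: +1 increase, -1 decrease, 0 unchanged
    print(magnitude, direction)   # [0.55 0.05 0.  ] [ 1. -1.  0.]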
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 5:
Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 5 depends on.
wherein each of the one or more pathways comprises a plurality of nodes that are not used in a determination to output the label based on the training dataset inputted into the machine learning model (elaborating a mental process – reciting pathways comprising a plurality of nodes that are not used in the determination merely adds detail to the mental process identified in Claim 1)
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 6:
Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 6 depends on.
determining the first set of the one or more datapoints that correlate with the one or more pathways that cause the machine learning model to have a net decrease in outputting the label over the plurality of epochs (mental process – determining the first set of the datapoints that correlate with the pathways that cause the machine learning model to have a net decrease in outputting the label may be performed mentally by a user observing/analyzing the datapoints that correlate with the one or more pathways and accordingly using judgement/evaluation to determine the first set of datapoints that correlate with the pathways that cause the machine learning model to have a net decrease in outputting the label)
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 7:
Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 7 depends on.
wherein the accuracy of the machine learning model is based on at least one of: a classification accuracy of the machine learning model; or a logarithmic loss of the machine learning model (mental process – evaluating accuracy using classification accuracy or logarithmic loss involves analyzing numerical performance and comparing values using mathematical reasoning, which may be performed manually by a user)
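Examiner’s note (illustrative only): both recited measures are standard and can be computed directly; the following sketch uses scikit-learn, with the labels and probabilities being hypothetical.

    from sklearn.metrics import accuracy_score, log_loss

    y_true = [1, 0, 1, 1]
    y_pred = [1, 0, 0, 1]                   # hard label predictions
    y_prob = [0.9, 0.2, 0.4, 0.8]           # predicted probability of label 1

    print(accuracy_score(y_true, y_pred))   # classification accuracy: 0.75
    print(log_loss(y_true, y_prob))         # logarithmic (cross-entropy) loss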
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 8:
Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 8 depends on.
identifying, using the one or more model explainability techniques and the one or more changes to the weight associated with each node of the plurality of nodes, one or more second pathways, wherein the one or more second pathways increase the accuracy of the machine learning model outputting the label (mental process – identifying pathways that increase the accuracy of the machine learning model may be performed mentally by a user observing/analyzing the changes to the weight associated with each node and accordingly using judgement/evaluation to identify pathways that increase an accuracy of the machine learning model based on said analysis)
determining a second set of the one or more datapoints that correlate with the one or more second pathways (mental process – determining a second set of data points that correlate with the second pathways may be performed mentally by observing/analyzing the pathways and accordingly using judgement/evaluation to determine a second set of datapoints that correlate with the second pathways)
and generating a second reduced training dataset comprising the second set of the one or more datapoints (mental process – generating a second reduced training dataset may be performed manually by a user observing/analyzing the second set of the datapoints and accordingly using judgement/evaluation to generate a second reduced training dataset)
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 9:
Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 9 depends on.
wherein the one or more model explainability techniques comprise at least one of: a local interpretable model-agnostic explanations technique; or a Shapley additive explanations technique (mental process – using a local interpretable model-agnostic explanations technique or a Shapley additive explanations involves analyzing model inputs, outputs, and feature contributions and applying judgement to attribute influence to features, which may be performed mentally by a user evaluating how different inputs affect an outcome)
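Examiner’s note (illustrative only): both recited techniques have widely used open-source implementations (the lime and shap packages); the following SHAP sketch uses a hypothetical model and dataset.

    import numpy as np
    import shap
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] > 0).astype(int)
    model = RandomForestClassifier(random_state=0).fit(X, y)

    # Shapley additive explanations: per-feature contribution to each prediction.
    explainer = shap.Explainer(model.predict, X)
    values = explainer(X[:10]).values
    print(values.shape)   # (10, 5): one contribution per datapoint and feature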
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 10:
Step 1: Claim 10 is a non-transitory machine-readable medium type claim. Therefore, Claims 10-14 fall within one of the four statutory categories (i.e., a manufacture).
Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components, then it falls within the “Mathematical Concepts” grouping of abstract ideas. Claim 10 recites the following limitations, each of which falls within the “Mental Processes” grouping:
determining, based on one or more datapoints of the training dataset, one or more changes to the weight associated with each node of the plurality of nodes (mental process – determining changes to the weight associated with each node may be performed mentally by a user observing/analyzing the datapoints of the training dataset and accordingly using judgement/evaluation to determine changes to the weight based on said analysis);
identifying, using one or more model explainability techniques and based on the one or more changes to the weight associated with each node of the plurality of nodes, one or more pathways that decrease an accuracy of the machine learning model outputting the label (mental process – identifying pathways that decrease an accuracy of the machine learning model may be performed mentally by a user observing/analyzing the changes to the weight associated with each node and accordingly using judgement/evaluation to identify pathways that decrease an accuracy of the machine learning model based on said analysis);
determining a first set of the one or more datapoints that correlate with the one or more pathways that decrease the accuracy of the machine learning model outputting the label (mental process – determining a set of data points that correlate with the pathways that decrease the accuracy of the machine learning model may be performed mentally by observing/analyzing the pathways and accordingly using judgement/evaluation to determine a first set of datapoints that correlate with the pathways);
and removing the first set of the one or more datapoints from the training dataset to generate a reduced training dataset (mental process – removing the first set of the datapoints from the training dataset may be performed manually by a user by discarding the set of datapoints to generate a reduced training dataset).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
Inputting a training dataset into a machine learning model to train the machine learning model to output a label, wherein the machine learning model comprises a plurality of nodes and each node, of the plurality of nodes, is associated with a weight (adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea; see MPEP 2106.05(f)). Examiner’s note: this is a high-level recitation of training a machine learning model using a training dataset, without significantly more.
Step 2B: The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
inputting, by a computing device, a training dataset into a machine learning model to train the machine learning model to output a label, wherein the machine learning model comprises a plurality of nodes and each node, of the plurality of nodes, is associated with a weight (adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea; see MPEP 2106.05(f)). Examiner’s note: this is a high-level recitation of training a machine learning model using a training dataset, without significantly more.
For the reasons above, Claim 10 is rejected as being directed to an abstract idea without significantly more. This rejection applies equally to dependent claims 11-14. The additional limitations of the dependent claims are addressed below.
Regarding Claim 11:
Step 2A Prong 1: See the rejection of Claim 10 above, which Claim 11 depends on.
inputting the reduced training dataset into the machine learning model to determine whether the machine learning model outputs the label (mental process – inputting the reduced training dataset into the machine learning model may be performed manually by a user and determining whether the machine learning model outputs the label may be performed mentally by a user observing/analyzing the machine learning model outputs)
comparing a first label outputted by the machine learning model trained on the training dataset to a second label outputted by the machine learning model trained on the reduced training dataset to determine whether the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset (mental process – comparing a first label outputted by the machine learning model to a second label outputted by the machine learning model may be performed mentally by a user observing/analyzing the first and second label and accordingly using judgement/evaluation to determine whether the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset);
and in response to a determination that the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset, determining that the reduced training dataset is valid (mental process – determining that the reduced training dataset is valid may be performed mentally by a user observing/analyzing the determination that the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset and accordingly using judgement/evaluation to determine that the reduced training dataset is valid based on said analysis);
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 12:
Step 2A Prong 1: See the rejection of Claim 10 above, which Claim 12 depends on.
determining that the first set of the one or more datapoints causes the weight associated with each node associated with the one or more pathways to change by more than a threshold amount (mental process – determining that the first set of the datapoints causes the weight associated with each node to change by more than a threshold may be performed mentally by a user observing/analyzing the weight associated with each node and accordingly using judgement/evaluation to determine that the first set of the datapoints causes the weight associated with each node to change by more than a threshold based on said analysis).
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 13:
Step 2A Prong 1: See the rejection of Claim 10 above, which Claim 13 depends on.
wherein each of the one or more pathways comprises a plurality of nodes that are not used in a determination to output the label based on the training dataset inputted into the machine learning model (mental process – reciting pathways comprising a plurality of nodes that are not used in the determination merely adds detail to the mental process identified in Claim 10)
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 14:
Step 2A Prong 1: See the rejection of Claim 10 above, which Claim 14 depends on.
determining the first set of the one or more datapoints causes the one or more pathways to have a net decrease in outputting the label over the plurality of epochs (mental process – determining the first set of the datapoints that causes the pathways to have a net decrease in outputting the label may be performed mentally by a user observing/analyzing the datapoints that cause the one or more pathways to have a net decrease and accordingly using judgement/evaluation to determine the first set of datapoints that cause the pathways to have a net decrease in outputting the label)
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 15:
Step 1: Claim 15 is a computing device type claim. Therefore, Claims 15-20 fall within one of the four statutory categories (i.e., a machine).
Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components, then it falls within the “Mathematical Concepts” grouping of abstract ideas. Claim 15 recites the following limitations, each of which falls within the “Mental Processes” grouping:
determine, based on one or more datapoints of the training dataset, one or more changes to the weight associated with each node of the plurality of nodes (mental process – determining changes to the weight associated with each node may be performed mentally by a user observing/analyzing the datapoints of the training dataset and accordingly using judgement/evaluation to determine changes to the weight based on said analysis);
identify, using one or more model explainability techniques and based on the one or more changes to the weight associated with each node of the plurality of nodes, one or more pathways that decrease an accuracy of the machine learning model outputting the label (mental process – identifying pathways that decrease an accuracy of the machine learning model may be performed mentally by a user observing/analyzing the changes to the weight associated with each node and accordingly using judgement/evaluation to identify pathways that decrease an accuracy of the machine learning model based on said analysis);
determine a first set of the one or more datapoints that correlate with the one or more pathways that decrease the accuracy of the machine learning model outputting the label (mental process – determining a set of data points that correlate with the pathways that decrease the accuracy of the machine learning model may be performed mentally by observing/analyzing the pathways and accordingly using judgement/evaluation to determine a first set of datapoints that correlate with the pathways);
and remove the first set of the one or more datapoints from the training dataset to generate a reduced training dataset (mental process – removing the first set of the datapoints from the training dataset may be performed manually by a user by discarding the set of datapoints to generate a reduced training dataset).
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
Input a training dataset into a machine learning model to train the machine learning model to output a label, wherein the machine learning model comprises a plurality of nodes and each node, of the plurality of nodes, is associated with a weight (adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea; see MPEP 2106.05(f)). Examiner’s note: this is a high-level recitation of training a machine learning model using a training dataset, without significantly more.
Step 2B: The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
input, by a computing device, a training dataset into a machine learning model to train the machine learning model to output a label, wherein the machine learning model comprises a plurality of nodes and each node, of the plurality of nodes, is associated with a weight (adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea; see MPEP 2106.05(f)). Examiner’s note: this is a high-level recitation of training a machine learning model using a training dataset, without significantly more.
For the reasons above, Claim 15 is rejected as being directed to an abstract idea without significantly more. This rejection applies equally to dependent claims 16-20. The additional limitations of the dependent claims are addressed below.
Regarding Claim 16:
Step 2A Prong 1: See the rejection of Claim 15 above, which Claim 16 depends on.
input the reduced training dataset into the machine learning model to determine whether the machine learning model outputs the label (mental process – inputting the reduced training dataset into the machine learning model may be performed manually by a user and determining whether the machine learning model outputs the label may be performed mentally by a user observing/analyzing the machine learning model outputs)
compare a first label outputted by the machine learning model trained on the training dataset to a second label outputted by the machine learning model trained on the reduced training dataset to determine whether the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset (mental process – comparing a first label outputted by the machine learning model to a second label outputted by the machine learning model may be performed mentally by a user observing/analyzing the first and second label and accordingly using judgement/evaluation to determine whether the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset);
and in response to a determination that the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset, validate the reduced training dataset (mental process – validating the reduced training dataset may be performed mentally by a user observing/analyzing the determination that the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset and accordingly using judgement/evaluation to determine that the reduced training dataset is valid based on said analysis);
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 17:
Step 2A Prong 1: See the rejection of Claim 15 above, which Claim 17 depends on.
determine that the first set of the one or more datapoints causes the weight associated with each node associated with the one or more pathways to change by more than a threshold amount (mental process – determining that the first set of the datapoints causes the weight associated with each node to change by more than a threshold may be performed mentally by a user observing/analyzing the weight associated with each node and accordingly using judgement/evaluation to determine that the first set of the datapoints causes the weight associated with each node to change by more than a threshold based on said analysis).
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 18:
Step 2A Prong 1: See the rejection of Claim 15 above, which Claim 18 depends on.
wherein each of the one or more pathways comprises a plurality of nodes that are not used in a determination to output the label based on the training dataset inputted into the machine learning model (mental process – reciting pathways comprising a plurality of nodes that are not used in the determination merely adds detail to the mental process identified in Claim 15)
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 19:
Step 2A Prong 1: See the rejection of Claim 15 above, which Claim 19 depends on.
determine the first set of the one or more datapoints causes the one or more pathways to have a net decrease in outputting the label over the plurality of epochs (mental process – determining the first set of the datapoints that causes the pathways to have a net decrease in outputting the label may be performed mentally by a user observing/analyzing the datapoints that cause the one or more pathways to have a net decrease and accordingly using judgement/evaluation to determine the first set of datapoints that cause the pathways to have a net decrease in outputting the label)
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Regarding Claim 20:
Step 2A Prong 1: See the rejection of Claim 15 above, which Claim 20 depends on.
identify, using the one or more model explainability techniques and the one or more changes to the weight associated with each node of the plurality of nodes, one or more second pathways, wherein the one or more second pathways increase the accuracy of the machine learning model outputting the label (mental process – identifying pathways that increase the accuracy of the machine learning model may be performed mentally by a user observing/analyzing the changes to the weight associated with each node and accordingly using judgement/evaluation to identify pathways that increase an accuracy of the machine learning model based on said analysis)
determine a second set of the one or more datapoints that correlate with the one or more second pathways (mental process – determining a second set of data points that correlate with the second pathways may be performed mentally by observing/analyzing the pathways and accordingly using judgement/evaluation to determine a second set of datapoints that correlate with the second pathways)
and generate a second reduced training dataset comprising the second set of the one or more datapoints (mental process – generating a second reduced training dataset may be performed manually by a user observing/analyzing the second set of the datapoints and accordingly using judgement/evaluation to generate a second reduced training dataset)
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, there are no additional elements that integrate the abstract idea into a practical application. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Prendki et al.
(hereinafter Prendki) (US 20220138561), in view of Shachar et al. (hereinafter Shachar) (US 12124933).
Regarding Claim 1, Prendki teaches a method (Prendki, Par. [0018], “A computer-implemented process or method”, thus a method is disclosed) comprising:
inputting, by a computing device, a training dataset into a machine learning model to train the machine learning model to output a label […] (Prendki, Par. [0053], “At block 150, using a hardware processor for example, the method is programmed for executing computer instructions that are programmed to receive an input dataset of training data, the input dataset comprising a plurality of records, the input dataset having been previously used to train the second machine learning model”, thus inputting, by a computing device/a hardware processor, a training dataset into a machine learning model to train the machine learning model to output a label is disclosed)
determining, based on one or more datapoints of the training dataset […] (Prendki, Par. [0022], “In an embodiment, the disclosure provides a computer-implemented process of building a predictive (ML) model to predict the usefulness of a record (data point) in the context of the training process of a machine learning model”, thus Prendki discloses evaluating individual training datapoints by predicting their usefulness with respect to a machine learning model’s training and performance, where determinations about model behavior are made by analyzing how specific datapoints influence training outcomes)
identifying […] one or more pathways that decrease an accuracy of the machine learning model outputting the label (Prendki, Par. [0021], “Training Set Optimization refers to the process of modifying a training set by removing redundant, useless, or harmful data rows; it differs from conventional compression in which each row is compressed by reducing its individual size and is more accurately described as denoising. Filter refers to a classifier (in most cases, binary) that separates a first subset of data having high information value from a second subset of data having less or no information value.”, & Par. [0048], “The process described thus far offers many benefits and improvements over prior approaches. First, the process is agnostic concerning models. Using several models built for the same task, an implementation can build a more robust filter that will work for any model within the same family of tasks. By using models for different tasks on the same dataset, it is possible to build a map of the data in terms of its absolute value; data that is useless across all tasks is useless in the absolute”, thus identifying one or more pathways that decrease an accuracy of the machine learning model outputting the label is disclosed, because, as stated in the applicant’s specification – Par [0071], the one or more pathways may comprise a contiguous plurality of nodes forming a path from a node that receives input based on a datapoint from the training dataset to a node that outputs the label. Prendki identifies datapoints that are harmful to model accuracy by evaluating their effect on model performance, with datapoints classified as harmful corresponding to contiguous computational routes within the machine learning model that lead to inaccurate label outputs)
determining a first set of the one or more datapoints that correlate with the one or more pathways that decrease the accuracy of the machine learning model outputting the label (Prendki, Par. [0021], “Training Set Optimization refers to the process of modifying a training set by removing redundant, useless, or harmful data rows; it differs from conventional compression in which each row is compressed by reducing its individual size and is more accurately described as denoising. Filter refers to a classifier (in most cases, binary) that separates a first subset of data having high information value from a second subset of data having less or no information value”, & Par. [0077], “The goal of the disclosed system is to identify which data records (rows) from the training set are creating such confusion and classify them as “harmful” to the model, in order to eliminate them in future retraining of the model.”, thus Prendki discloses determining a first set of datapoints that correlate with pathways that decrease model accuracy by evaluating individual training records based on their effect on a trained machine learning model’s performance and classifying those records as harmful when they introduce confusion or reduce predictive accuracy. Because, as stated in the applicant’s specification, a pathway may be a contiguous set of nodes from an input node receiving a datapoint to an output node generating a label, Prendki’s identification of harmful datapoints necessarily corresponds to identifying the specific input-to-output computational routes within the model through which those datapoints lead to incorrect label outputs. Thus, Prendki determines a subset of datapoints correlated with accuracy-degrading pathways by selecting and removing training records whose processing through the model results in decreased accuracy)
removing the first set of the one or more datapoints from the training dataset to generate a reduced training dataset (Prendki, Par. [0021], “Training Set Optimization refers to the process of modifying a training set by removing redundant, useless, or harmful data rows; it differs from conventional compression in which each row is compressed by reducing its individual size and is more accurately described as denoising.”, thus removing the first set of the one or more datapoints from the training dataset to generate a reduced training dataset is disclosed)
Prendki does not explicitly teach the machine learning model comprising a plurality of nodes, each node of the plurality of nodes being associated with a weight, one or more changes to the weight associated with each node of the plurality of nodes, or the use of one or more model explainability techniques.
However, Shachar teaches wherein the machine learning model comprises a plurality of nodes and each node, of the plurality of nodes, is associated with a weight (Shachar, Par. [0045], “As shown in FIGS. 3 and 4, models 300 and 400 may include different layers, such as an input layer, a hidden layer, and an output layer, each having one or more nodes, however, different layers may also be utilized. For model 300, layers 1104 are shown, while for model 400, layers 1204 are shown”, & Par. [0046], “By continuously providing different sets of training data and penalizing models 300 and 400 when the output is incorrect, models 300 and 400 (and specifically, the representations of the nodes in the hidden layer) may be trained (adjusted) to improve performance of models 300, 400 in data classification. Adjusting models 300 and 400 may include separately adjusting the weights associated with each node in the hidden layer.”, thus Shachar discloses a machine learning model having multiple layers of nodes in which each node participates in weighted connections that are adjusted during training, such that each node is associated with at least one weight used to compute the model’s output)
Shachar also teaches one or more changes to the weight associated with each node of the plurality of nodes (Shachar, Par. [0046], “By continuously providing different sets of training data and penalizing models 300 and 400 when the output is incorrect, models 300 and 400 (and specifically, the representations of the nodes in the hidden layer) may be trained (adjusted) to improve performance of models 300, 400 in data classification. Adjusting models 300 and 400 may include separately adjusting the weights associated with each node in the hidden layer.”, thus changes/adjustments to the weight associated with each node of plurality of nodes is disclosed)
Additionally, Shachar teaches using one or more model explainability techniques (Shachar, Par. [0039], “After creation of the models, at block 9, model explanation is performed to understand the importance of micromodels and, inside each micromodel, the importance of the features to the micromodels. Thus, after building the models, an ML model explainer, such as an explanation algorithm, may be used to verify the added value of each separate feature. This may include utilizing SHAP or LIME to obtain a measure of importance of each feature in each classification task. Thereafter, an average of those contributions is determined to obtain a total significance level of each feature”, thus using one or more model explainability techniques is disclosed)
It would have been obvious to one of ordinary skill in the art before the effective filing date
of the claimed invention to combine Prendki’s approach of identifying training datapoints whose processing through the machine learning model results in decreased output accuracy, which reads on identifying one or more pathways that decrease an accuracy of the machine learning model outputting the label, with Shachar’s use of ML model explainers to measure the importance of each feature, which corresponds to using one or more model explainability techniques, thereby improving dataset quality and resulting in a more accurate machine learning model (Shachar, Par. [0051], “An ML model explainer may be used to determine an added value of each feature to the ML models' classifications, such as a measure of importance of each feature in the classification tasks. This may be done using SHAP, LIME, or a lift ratio per each feature separately. Thereafter, an administrator may determine whether the risk scores used to enrich the data set provide a more accurate ML model for anomaly detection based on the comparison and feature importance.”). Thus, the combined teachings disclose using explainability feature importance to associate reductions in model accuracy with specific computational pathways activated by particular training datapoints, allowing identification of datapoints that negatively affect model performance and their removal to improve model accuracy.
Regarding Claim 2, Prendki combined with Shachar teaches all of the limitations of claim 1 as cited above and Prendki further teaches:
inputting the reduced training dataset into the machine learning model to determine whether the machine learning model outputs the label (Prendki, Par. [0046], “7: m_select ← train(m, selected) (train a new model on selected); accuracy_selected = test(m_select, S_test)”, & Par. [0058], “At block 160, the process executes computer instructions that are programmed to filter the second dataset of prospective training data using the data filter, and to output a refined training dataset comprising fewer records than the second dataset, the refined training dataset comprising only records of the second dataset having the usefulness value greater than a specified threshold.”, thus inputting the reduced training dataset into the machine learning model to determine whether the machine learning model outputs the label is disclosed, because Prendki retrains a machine learning model using a reduced (refined) training dataset and evaluates the model’s predictive output and accuracy on a test set, which corresponds to determining whether the machine learning model outputs the label when trained on the reduced training dataset)
comparing a first label outputted by the machine learning model trained on the training dataset to a second label outputted by the machine learning model trained on the reduced training dataset to determine whether the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset (Prendki, Par. [0102], “The next step of filter validation 204 (FIG. 2) includes testing if the data filter does not generate biases, and that the accuracy obtained is as expected (function of how the threshold was set). For a more thorough estimation of the filter's efficacy, there can be a held-out training dataset which is filtered down. Two versions of the model are trained, one on the entire dataset and the other on the filtered down version. If the filtered down version achieves a similar accuracy level to the full version, then the data filter is useful.”, thus comparing a first label outputted by a model trained on the full training dataset with a second label outputted by a model trained on a reduced training dataset to evaluate whether the reduced dataset maintains at least comparable accuracy is disclosed, because Prendki trains two separate models on the full and filtered datasets and compares their resulting predictive accuracy to determine whether the reduced training dataset yields determinations that are at least as accurate as those produced using the full training dataset)
in response to a determination that the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset, determining that the reduced training dataset is valid (Prendki, Par. [0102], “Two versions of the model are trained, one on the entire dataset and the other on the filtered down version. If the filtered down version achieves a similar accuracy level to the full version, then the data filter is useful”, & Par. [0103], “For instance, if the filter predicts 10% of the data as “useful”, the future versions of the model will be able to be trained with only 10% of the data (note that the amount of data used when training a model is not necessarily linear with the amount of time it takes to train; the disclosed system also provides customers with the capability to review this relationship)”, thus determining that the reduced training dataset is valid is disclosed, because Prendki evaluates the performance of a machine learning model trained on a reduced training dataset relative to a model trained on the full dataset and, upon determining that comparable accuracy is achieved, deems the reduced dataset useful and acceptable for future training, which corresponds to determining the validity of the reduced training dataset in response to achieving at least equivalent model accuracy)
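Examiner’s note (illustrative only): the filter-validation step quoted above amounts to training two models and comparing their test accuracy; the following sketch uses hypothetical data and a hypothetical filtering mask.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    keep = rng.random(len(X_tr)) > 0.2   # hypothetical filter retaining ~80% of rows

    acc_full = accuracy_score(y_te, LogisticRegression().fit(X_tr, y_tr).predict(X_te))
    acc_reduced = accuracy_score(y_te, LogisticRegression().fit(X_tr[keep], y_tr[keep]).predict(X_te))

    # The reduced training dataset is deemed valid if it is at least as accurate.
    print(acc_reduced >= acc_full)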
Regarding Claim 3, Prendki combined with Shachar teaches all of the limitations of claim 1 as cited above and Prendki further teaches:
determining that the first set of the one or more datapoints causes the weight associated with each node associated with the one or more pathways to change by more than a threshold amount (Prendki, Par. [0022]-[0026], “In an embodiment, the disclosure provides a computer-implemented process of building a predictive (ML) model to predict the usefulness of a record (data point) in the context of the training process of a machine learning model. According to one embodiment, the following algorithmic flow is programmed [0023] 1. Collect/acquire (historical) training data. In the pseudocode algorithm examples set forth below, training data is denoted S_train. [0024] 2. Run process to measure usefulness of records within this training dataset (*); measurement of usefulness can be categorical or a score (number) [0025] 3. Categorize training data into groups of usefulness (*) [0026] This can be binary (useful/not useful), and can use a process to establish a threshold above which data is useful”, & Par. [0073], “All the details computed during the metadata generation phase are referred to as “metadata”—they are not data per se, but by-products of the training of the customer's model using a fraction of the customer's data that the disclosed system will use in the next stages of the process. Examples of metadata include, but are not limited to: Inference, Binary “correctness” (correctly/incorrectly predicted), Unlikelihood of prediction (if a record is predicted to be of a class that is rarely confused with ‘true’ class confusion matrix), Confidence level, First margin (difference between confidence of predicted class and next best class), Subsequent margins, Consensus between multiple models (can be perturbed versions of the same model) “Bayesian” confidence, List of activated neurons (if neural net), Activation functions, Weights and biases in model, and/or their derivatives, “Path length” (if decision tree)”, thus Prendki discloses determining, based on training-generated metadata, that particular datapoints produce changes in model weights and activations along specific computational pathways, and further discloses applying threshold-based criteria to classify those datapoints, which corresponds to determining that the first set of datapoints causes the weight associated with each node associated with the one or more pathways to change by more than a threshold amount.)
Regarding Claim 4, Prendki combined with Shachar teaches all of the limitations of claim 1 as cited above and Shachar further teaches:
wherein the one or more changes comprise at least one of: a magnitude by which the weight associated with each node of the plurality of nodes changes; or a direction in which the weight associated with each node of the plurality of nodes changes (Shachar, Par. [0033], “Normalizing may also occur where data sets are normalized to reduce their means and then scaled by the standard deviation of each feature. Normalizing may be performed due to two main reasons. First, gradient descent-based algorithms introduce exploding gradients if the features are not normalized”, & Par. [0046], “Models 300 and 400 may be separately trained using training data, where the nodes in the hidden layer may be trained (adjusted) such that an optimal output (e.g., a classification) is produced in the output layer based on the training data”, thus the one or more changes comprise at least one of: a magnitude by which the weight associated with each node of the plurality of nodes changes; or a direction in which the weight associated with each node of the plurality of nodes changes is disclosed, because Shachar teaches gradient-descent-based training in which node weights in the hidden layers are iteratively adjusted in response to training error, and such adjustments involve changes in both the magnitude and the direction of the weights as part of improving model output accuracy)
Regarding Claim 5, Prendki combined with Shachar teaches all of the limitations of claim 1 as cited above and Prendki further teaches:
wherein each of the one or more pathways comprises a plurality of nodes that are not used in a determination to output the label based on the training dataset inputted into the machine learning model (Prendki, Par. [0104], “Plugging in Active Learning allows the system to account for mutually contained information; indeed, the scores do not necessarily reflect that the information from record r.sub.i (from the training set) was not already contained in r.sub.j. r.sub.i and r.sub.j, i≠j could have similar scores but be redundant with each other and therefore might not be useful to use concurrently, which Active Learning and other related systems may address”, thus Prendki discloses identifying redundant or mutually contained datapoints whose contributions are determined to be unnecessary for model prediction, which correspond to computational pathways within the model that do not contribute to determining the output label and are therefore effectively not used in the label determination)
Regarding Claim 6, Prendki combined with Shachar teaches all of the limitations of claim 1 as cited above and Shachar further teaches:
determining the first set of the one or more datapoints that correlate with the one or more pathways that cause the machine learning model to have a net decrease in outputting the label over the plurality of epochs (Shachar, Par. [0077], “Other tests with Active Learning (e.g., a process where the model is trained iteratively after gradually incrementing the size of the training set) have shown that, at times, the model oscillates from a state where it seems to have understood a class, back to a state where it is clearly confused”, & Par. [0088], “In some embodiments, this approach is simplistic because whenever a training record ends up helping for one class (typically, the one it belongs to) and hurting another, the formula would annihilate those different effects on different test records; which is why in practice, the system may use other approaches to correlate the absence/presence of a record from the training set to its effect on the training (inferred on the test set). Assuming that the ground truth is available for the training set also, it is possible to correlate those effects with more precision”, & Par. [0098], “The “threshold” can either reflect the maximum amount of the data that is desired to be used when training future versions of the model, or the limit (value) under which data seems to become useless (flat learning curve) or harmful (decreasing learning curve)”, thus determining the first set of the one or more datapoints that correlate with the one or more pathways that cause the machine learning model to have a net decrease in outputting the label over the plurality of epochs is disclosed, because Shachar teaches iterative, epoch-based training in which the presence or absence of individual training records is correlated with model behavior across training iterations, including oscillations, confusion, and decreasing learning curves, and further teaches identifying datapoints whose inclusion results in harmful or degrading effects on model performance over time based on thresholded learning trends, which corresponds to correlating specific datapoints with model pathways that negatively affect label output across epochs, thereby enabling removal of datapoints that degrade training performance to improve model accuracy)
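Examiner’s note (illustrative only): the epoch-wise correlation described above can be pictured as tracking per-epoch accuracy with and without a candidate record and flagging records whose net effect is negative; the accuracy values below are hypothetical.

    # Hypothetical per-epoch validation accuracies with and without record r.
    acc_with_r    = [0.70, 0.74, 0.73, 0.75, 0.74]
    acc_without_r = [0.70, 0.75, 0.76, 0.78, 0.79]

    # Net effect of record r over the plurality of epochs.
    net_effect = sum(a - b for a, b in zip(acc_with_r, acc_without_r))
    print(net_effect < 0)   # True: record r causes a net decrease and is removed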
Regarding Claim 7, Prendki combined with Shachar teaches all of the limitations of claim 1 as cited above and Prendki further teaches:
wherein the accuracy of the machine learning model is based on at least one of: a classification accuracy of the machine learning model; or a logarithmic loss of the machine learning model (Prendki, Claim 6, “determining a first accuracy value representing a first classification accuracy of the another machine learning model that has been trained using the refined training dataset”, thus the accuracy of the machine learning model being based on at least one of: a classification accuracy of the machine learning model; or a logarithmic loss of the machine learning model is disclosed)
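For reference only, both accuracy measures recited in the claim are standard metrics; a minimal sketch (hypothetical labels, assuming scikit-learn is available) follows:

    # Illustrative sketch: classification accuracy and logarithmic loss.
    from sklearn.metrics import accuracy_score, log_loss

    y_true = [1, 0, 1, 1, 0]            # hypothetical ground-truth labels
    y_pred = [1, 0, 0, 1, 0]            # hypothetical hard predictions
    y_prob = [0.9, 0.2, 0.4, 0.8, 0.1]  # hypothetical P(label = 1)

    print(accuracy_score(y_true, y_pred))  # classification accuracy
    print(log_loss(y_true, y_prob))        # logarithmic loss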
Regarding Claim 8, Prendki combined with Shachar teaches all of the limitations of claim 1 as cited above and Prendki further teaches:
identifying, using the one or more model explainability techniques and the one or more changes to the weight associated with each node of the plurality of nodes, one or more second pathways, wherein the one or more second pathways increase the accuracy of the machine learning model outputting the label (Prendki, Par. [0058], “At block 160, the process executes computer instructions that are programmed to filter the second dataset of prospective training data using the data filter, and to output a refined training dataset comprising fewer records than the second dataset, the refined training dataset comprising only records of the second dataset having the usefulness value greater than a specified threshold”, & Par. [0079], “One way to do so is to simply average, for each data record from training set, the confidence level achieved for each data record from within the test set and each sample (run) with a weight of +1 if the prediction for that record is correct, and −1 if it's incorrect, whenever this data record has been used to train the model. The metadata can be used to improve the confidence level. By doing so, the disclosed system will have high scores for each training record if they consistently help the model learn correctly”, thus identifying one or more second pathways that increase model accuracy is disclosed, because Prendki evaluates the contribution of training datapoints to correct predictions using confidence and correctness metadata generated during model training, and selects datapoints that positively influence learned weights and model behavior, which correspond to contiguous computational pathways through nodes whose weight updates improve label output accuracy)
determining a second set of the one or more datapoints that correlate with the one or more second pathways (Prendki, Par. [0058], “At block 160, the process executes computer instructions that are programmed to filter the second dataset of prospective training data using the data filter, and to output a refined training dataset comprising fewer records than the second dataset, the refined training dataset comprising only records of the second dataset having the usefulness value greater than a specified threshold”, thus determining a second set of the one or more datapoints that correlate with the one or more second pathways is disclosed, because Prendki selects and retains training records based on their positive usefulness values derived from training behavior, which correspond to datapoints associated with computational pathways)
generating a second reduced training dataset comprising the second set of the one or more datapoints (Prendki, Par. [0058], “At block 160, the process executes computer instructions that are programmed to filter the second dataset of prospective training data using the data filter, and to output a refined training dataset comprising fewer records than the second dataset, the refined training dataset comprising only records of the second dataset having the usefulness value greater than a specified threshold”, thus generating a second reduced training dataset comprising the second set of the one or more datapoints is disclosed)
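As a toy illustration of the threshold-based filtering described at Prendki's block 160 (hypothetical records, scores, and threshold; not the reference's code):

    # Illustrative sketch: retain only records whose usefulness value
    # exceeds a specified threshold, yielding a reduced training dataset.
    records = [("r1", 0.92), ("r2", 0.15), ("r3", 0.60), ("r4", 0.05)]  # (id, usefulness)
    threshold = 0.5

    reduced_dataset = [rid for rid, score in records if score > threshold]
    removed = [rid for rid, score in records if score <= threshold]
    print(reduced_dataset)  # ['r1', 'r3']
    print(removed)          # ['r2', 'r4']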
Regarding Claim 9, Prendki combined with Shachar teaches all of the limitations of claim 1 as cited above and Shachar further teaches:
wherein the one or more model explainability techniques comprise at least one of: a local interpretable model-agnostic explanations technique; or a Shapley additive explanations technique (Shachar, Par. [0039], “After creation of the models, at block 9, model explanation is performed to understand the importance of micromodels and, inside each micromodel, the importance of the features to the micromodels. Thus, after building the models, an ML model explainer, such as an explanation algorithm, may be used to verify the added value of each separate feature. This may include utilizing SHAP or LIME to obtain a measure of importance of each feature in each classification task. Thereafter, an average of those contributions is determined to obtain a total significance level of each feature”, thus the one or more model explainability techniques comprising at least one of: a local interpretable model-agnostic explanations technique; or a Shapley additive explanations technique is disclosed)
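For background only, a minimal sketch of obtaining per-feature importance with the SHAP package (assumes the shap and scikit-learn libraries are installed; the data and model are hypothetical stand-ins, not those of the references):

    # Illustrative sketch: SHAP-based feature importance for a classifier.
    import numpy as np
    import shap
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))            # hypothetical features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # hypothetical labels

    model = RandomForestClassifier(random_state=0).fit(X, y)
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)   # per-feature contributions
    print(np.shape(shap_values))             # indexed by sample and feature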
Regarding Claim 10, Prendki teaches a non-transitory machine-readable medium (Prendki, Claim 18, “One or more non-transitory computer-readable media”, thus a non-transitory machine-readable medium is disclosed) storing instructions that, when executed by one or more processors, cause the one or more processors to perform steps comprising:
inputting, by a computing device, a training dataset into a machine learning model to train the machine learning model to output a label […] (Prendki, Par. [0053], “At block 150, using a hardware processor for example, the method is programmed for executing computer instructions that are programmed to receive an input dataset of training data, the input dataset comprising a plurality of records, the input dataset having been previously used to train the second machine learning model”, thus inputting, by a computing device/a hardware processor, a training dataset into a machine learning model to train the machine learning model to output a label is disclosed)
determining, based on one or more datapoints of the training dataset […] (Prendki, Par. [0022], “In an embodiment, the disclosure provides a computer-implemented process of building a predictive (ML) model to predict the usefulness of a record (data point) in the context of the training process of a machine learning model”, thus Prendki discloses evaluating individual training datapoints by predicting their usefulness with respect to a machine learning model’s training and performance, where determinations about model behavior are made by analyzing how specific datapoints influence training outcomes)
identifying […] one or more pathways that decrease an accuracy of the machine learning model outputting the label (Prendki, Par. [0021], “Training Set Optimization refers to the process of modifying a training set by removing redundant, useless, or harmful data rows; it differs from conventional compression in which each row is compressed by reducing its individual size and is more accurately described as denoising. Filter refers to a classifier (in most cases, binary) that separates a first subset of data having high information value from a second subset of data having less or no information value.”, & Par. [0048], “The process described thus far offers many benefits and improvements over prior approaches. First, the process is agnostic concerning models. Using several models built for the same task, an implementation can build a more robust filter that will work for any model within the same family of tasks. By using models for different tasks on the same dataset, it is possible to build a map of the data in terms of its absolute value; data that is useless across all tasks is useless in the absolute”, thus identifying one or more pathways that decrease an accuracy of the machine learning model outputting the label is disclosed, because, as stated in the applicant’s specification – Par. [0071], the one or more pathways may comprise a contiguous plurality of nodes forming a path from a node that receives input based on a datapoint from the training dataset to a node that outputs the label. Prendki identifies datapoints that are harmful to model accuracy by evaluating their effect on model performance, with datapoints classified as harmful corresponding to contiguous computational routes within the machine learning model that lead to inaccurate label outputs)
determining a first set of the one or more datapoints that correlate with the one or more pathways that decrease the accuracy of the machine learning model outputting the label (Prendki, Par. [0021], “Training Set Optimization refers to the process of modifying a training set by removing redundant, useless, or harmful data rows; it differs from conventional compression in which each row is compressed by reducing its individual size and is more accurately described as denoising. Filter refers to a classifier (in most cases, binary) that separates a first subset of data having high information value from a second subset of data having less or no information value”, & Par. [0077], “The goal of the disclosed system is to identify which data records (rows) from the training set are creating such confusion and classify them as “harmful” to the model, in order to eliminate them in future retraining of the model.”, thus Prendki discloses determining a first set of datapoints that correlate with pathways that decrease model accuracy by evaluating individual training records based on their effect on a trained machine learning model’s performance and classifying those records as harmful when they introduce confusion or reduce predictive accuracy. Because, as stated in the applicant’s specification, a pathway may be a contiguous set of nodes from an input node receiving a datapoint to an output node generating a label, Prendki’s identification of harmful datapoints necessarily corresponds to identifying the specific input-to-output computational routes within the model through which those datapoints lead to incorrect label outputs. Thus, Prendki determines a subset of datapoints correlated with accuracy-degrading pathways by selecting and removing training records whose processing through the model results in decreased accuracy)
removing the first set of the one or more datapoints from the training dataset to generate a reduced training dataset (Prendki, Par. [0021], “Training Set Optimization refers to the process of modifying a training set by removing redundant, useless, or harmful data rows; it differs from conventional compression in which each row is compressed by reducing its individual size and is more accurately described as denoising.”, thus removing the first set of the one or more datapoints from the training dataset to generate a reduced training dataset is disclosed)
Prendki does not explicitly teach the machine learning model comprising a plurality of nodes and each node, of the plurality of nodes, being associated with a weight, one or more changes to the weight associated with each node of the plurality of nodes, and using one or more model explainability techniques.
However, Shachar teaches wherein the machine learning model comprises a plurality of nodes and each node, of the plurality of nodes, is associated with a weight (Shachar, Par. [0045], “As shown in FIGS. 3 and 4, models 300 and 400 may include different layers, such as an input layer, a hidden layer, and an output layer, each having one or more nodes, however, different layers may also be utilized. For model 300, layers 1104 are shown, while for model 400, layers 1204 are shown”, & Par. [0046], “By continuously providing different sets of training data and penalizing models 300 and 400 when the output is incorrect, models 300 and 400 (and specifically, the representations of the nodes in the hidden layer) may be trained (adjusted) to improve performance of models 300, 400 in data classification. Adjusting models 300 and 400 may include separately adjusting the weights associated with each node in the hidden layer.”, thus Shachar discloses a machine learning model having multiple layers of nodes in which each node participates in weighted connections that are adjusted during training, such that each node is associated with at least one weight used to compute the model’s output)
Shachar also teaches one or more changes to the weight associated with each node of the plurality of nodes (Shachar, Par. [0046], “By continuously providing different sets of training data and penalizing models 300 and 400 when the output is incorrect, models 300 and 400 (and specifically, the representations of the nodes in the hidden layer) may be trained (adjusted) to improve performance of models 300, 400 in data classification. Adjusting models 300 and 400 may include separately adjusting the weights associated with each node in the hidden layer.”, thus changes/adjustments to the weight associated with each node of plurality of nodes is disclosed)
Additionally, Shachar teaches using one or more model explainability techniques (Shachar, Par. [0039], “After creation of the models, at block 9, model explanation is performed to understand the importance of micromodels and, inside each micromodel, the importance of the features to the micromodels. Thus, after building the models, an ML model explainer, such as an explanation algorithm, may be used to verify the added value of each separate feature. This may include utilizing SHAP or LIME to obtain a measure of importance of each feature in each classification task. Thereafter, an average of those contributions is determined to obtain a total significance level of each feature”, thus using one or more model explainability techniques is disclosed)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Prendki’s approach of identifying training datapoints whose processing through the machine learning model results in decreased output accuracy, which reads on identifying one or more pathways that decrease an accuracy of the machine learning model outputting the label, with Shachar’s use of ML model explainers to measure the importance of each feature, which corresponds to using one or more model explainability techniques, thereby improving dataset quality and resulting in a more accurate machine learning model (Shachar, Par. [0051], “An ML model explainer may be used to determine an added value of each feature to the ML models' classifications, such as a measure of importance of each feature in the classification tasks. This may be done using SHAP, LIME, or a lift ratio per each feature separately. Thereafter, an administrator may determine whether the risk scores used to enrich the data set provide a more accurate ML model for anomaly detection based on the comparison and feature importance.”). Thus, the combined teachings disclose using explainability feature importance to associate reductions in model accuracy with specific computational pathways activated by particular training datapoints, allowing identification of datapoints that negatively affect model performance and their removal to improve model accuracy.
Regarding Claim 11, Prendki combined with Shachar teaches all of the limitations of claim 10 as cited above and Prendki further teaches:
inputting the reduced training dataset into the machine learning model to determine whether the machine learning model outputs the label (Prendki, Par. [0046], “7: m.sub.select ← train(m, selected) (train a new model on selected) accuracy.sub.selected = test(m.sub.select, S.sub.test)”, & Par. [0058], “At block 160, the process executes computer instructions that are programmed to filter the second dataset of prospective training data using the data filter, and to output a refined training dataset comprising fewer records than the second dataset, the refined training dataset comprising only records of the second dataset having the usefulness value greater than a specified threshold.”, thus inputting the reduced training dataset into the machine learning model to determine whether the machine learning model outputs the label is disclosed, because Prendki retrains a machine learning model using a reduced (refined) training dataset and evaluates the model’s predictive output and accuracy on a test set, which corresponds to determining whether the machine learning model outputs the label when trained on the reduced training dataset)
comparing a first label outputted by the machine learning model trained on the training dataset to a second label outputted by the machine learning model trained on the reduced training dataset to determine whether the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset (Prendki, Par. [0102], “The next step of filter validation 204 (FIG. 2) includes testing if the data filter does not generate biases, and that the accuracy obtained is as expected (function of how the threshold was set). For a more thorough estimation of the filter's efficacy, there can be a held-out training dataset which is filtered down. Two versions of the model are trained, one on the entire dataset and the other on the filtered down version. If the filtered down version achieves a similar accuracy level to the full version, then the data filter is useful.”, thus comparing a first label outputted by a model trained on the full training dataset with a second label outputted by a model trained on a reduced training dataset to evaluate whether the reduced dataset maintains at least comparable accuracy is disclosed, because Prendki trains two separate models on the full and filtered datasets and compares their resulting predictive accuracy to determine whether the reduced training dataset yields determinations that are at least as accurate as those produced using the full training dataset)
and in response to a determination that the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset, validating the reduced training dataset (Prendki, Par. [0102], “Two versions of the model are trained, one on the entire dataset and the other on the filtered down version. If the filtered down version achieves a similar accuracy level to the full version, then the data filter is useful”, & Par. [0103], “For instance, if the filter predicts 10% of the data as “useful”, the future versions of the model will be able to be trained with only 10% of the data (note that the amount of data used when training a model is not necessarily linear with the amount of time it takes to train; the disclosed system also provides customers with the capability to review this relationship)”, thus validating the reduced training dataset is disclosed, because Prendki evaluates the performance of a machine learning model trained on a reduced training dataset relative to a model trained on the full dataset and, upon determining that comparable accuracy is achieved, deems the reduced dataset useful and acceptable for future training, which corresponds to determining the validity of the reduced training dataset in response to achieving at least equivalent model accuracy)
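As a toy illustration of this full-versus-filtered validation (hypothetical data; scikit-learn is assumed as a stand-in for the references' models):

    # Illustrative sketch: train on the full and the reduced training
    # datasets, then compare held-out accuracy to validate the reduction.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    y = (X[:, 0] - X[:, 1] > 0).astype(int)      # hypothetical labels
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    keep = rng.random(len(X_tr)) > 0.2           # stand-in for the data filter
    full = LogisticRegression().fit(X_tr, y_tr)
    reduced = LogisticRegression().fit(X_tr[keep], y_tr[keep])

    acc_full = accuracy_score(y_te, full.predict(X_te))
    acc_reduced = accuracy_score(y_te, reduced.predict(X_te))
    if acc_reduced >= acc_full:
        print("reduced training dataset validated")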
Regarding Claim 12, Prendki combined with Shachar teaches all of the limitations of claim 10 as cited above and Prendki further teaches:
determining the first set of the one or more datapoints that causes the weight associated with each node associated with the one or more pathways to change by more than a threshold amount (Prendki, Par. [0022 - 0026], “In an embodiment, the disclosure provides a computer-implemented process of building a predictive (ML) model to predict the usefulness of a record (data point) in the context of the training process of a machine learning model. According to one embodiment, the following algorithmic flow is programmed [0023] 1. Collect/acquire (historical) training data. In the pseudocode algorithm examples set forth below, training data is denoted S.sub.train. [0024] 2. Run process to measure usefulness of records within this training dataset (*); measurement of usefulness can be categorical or a score (number) [0025] 3. Categorize training data into groups of usefulness (*) [0026] This can be binary (useful/not useful), and can use a process to establish a threshold above which data is useful”, & Par. [0073], “All the details computed during the metadata generation phase are referred to as “metadata”—they are not data per se, but by-products of the training of the customer's model using a fraction of the customer's data that the disclosed system will use in the next stages of the process. Examples of metadata include, but are not limited to: Inference, Binary “correctness” (correctly/incorrectly predicted), Unlikelihood of prediction (if a record is predicted to be of a class that is rarely confused with ‘true’ class confusion matrix), Confidence level, First margin (difference between confidence of predicted class and next best class), Subsequent margins, Consensus between multiple models (can be perturbed versions of the same model) “Bayesian” confidence, List of activated neurons (if neural net), Activation functions, Weights and biases in model, and/or their derivatives, “Path length” (if decision tree)”, thus Prendki discloses determining, based on training-generated metadata, that particular datapoints produce changes in model weights and activations along specific computational pathways, and further discloses applying threshold-based criteria to classify those datapoints, which corresponds to determining that the first set of datapoints causes the weight associated with each node associated with the one or more pathways to change by more than a threshold amount.)
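Illustratively (hypothetical weight snapshots; neither reference provides code for this step), flagging nodes whose weights change by more than a threshold might look like:

    # Illustrative sketch: compare per-node weights before and after
    # training on a candidate datapoint; flag changes above a threshold.
    import numpy as np

    weights_before = np.array([0.50, -0.20, 0.10, 0.80])  # hypothetical per-node weights
    weights_after = np.array([0.55, -0.90, 0.11, 0.20])
    threshold = 0.3

    change = np.abs(weights_after - weights_before)
    flagged_nodes = np.nonzero(change > threshold)[0]
    print(flagged_nodes)  # nodes whose weight changed by more than the threshold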
Regarding Claim 13, Prendki combined with Shachar teaches all of the limitations of claim 10 as cited above and Prendki further teaches:
wherein each of the one or more pathways comprises a plurality of nodes that are not used in a determination to output the label based on the training dataset inputted into the machine learning model (Prendki, Par. [0104], “Plugging in Active Learning allows the system to account for mutually contained information; indeed, the scores do not necessarily reflect that the information from record r.sub.i (from the training set) was not already contained in r.sub.j. r.sub.i and r.sub.j, i≠j could have similar scores but be redundant with each other and therefore might not be useful to use concurrently, which Active Learning and other related systems may address”, thus Prendki discloses identifying redundant or mutually contained datapoints whose contributions are determined to be unnecessary for model prediction, which correspond to computational pathways within the model that do not contribute to determining the output label and are therefore effectively not used in the label determination)
Regarding Claim 14, Prendki combined with Shachar teaches all of the limitations of claim 10 as cited above and Shachar further teaches:
determining that the first set of the one or more datapoints causes the one or more pathways to have a net decrease in outputting the label over the plurality of epochs (Shachar, Par. [0077], “Other tests with Active Learning (e.g., a process where the model is trained iteratively after gradually incrementing the size of the training set) have shown that, at times, the model oscillates from a state where it seems to have understood a class, back to a state where it is clearly confused”, & Par. [0088], “In some embodiments, this approach is simplistic because whenever a training record ends up helping for one class (typically, the one it belongs to) and hurting another, the formula would annihilate those different effects on different test records; which is why in practice, the system may use other approaches to correlate the absence/presence of a record from the training set to its effect on the training (inferred on the test set). Assuming that the ground truth is available for the training set also, it is possible to correlate those effects with more precision”, & Par. [0098], “The “threshold” can either reflect the maximum amount of the data that is desired to be used when training future versions of the model, or the limit (value) under which data seems to become useless (flat learning curve) or harmful (decreasing learning curve)”, thus determining that the first set of the one or more datapoints causes the one or more pathways to have a net decrease in outputting the label over the plurality of epochs is disclosed, because Shachar teaches iterative, epoch-based training in which the presence or absence of individual training records is correlated with model behavior across training iterations, including oscillations, confusion, and decreasing learning curves, and further teaches identifying datapoints whose inclusion results in harmful or degrading effects on model performance over time based on thresholded learning trends, which corresponds to correlating specific datapoints with model pathways that negatively affect label output across epochs, thereby enabling removal of datapoints that degrade training performance to improve model accuracy)
Regarding Claim 15, Prendki teaches a computing device (Prendki, Par. [0125], “a computing device”, thus a computing device is disclosed) comprising: one or more processors (Prendki, Par. [0127], “a processor”, thus one or more processors is disclosed); and memory (Prendki, Par. [0127], “memory”, thus memory is disclosed) storing instructions that, when executed by the one or more processors, cause the computing device to:
input, by a computing device, a training dataset into a machine learning model to train the machine learning model to output a label […] (Prendki, Par. [0053], “At block 150, using a hardware processor for example, the method is programmed for executing computer instructions that are programmed to receive an input dataset of training data, the input dataset comprising a plurality of records, the input dataset having been previously used to train the second machine learning model”, thus inputting, by a computing device/a hardware processor, a training dataset into a machine learning model to train the machine learning model to output a label is disclosed)
determine, based on one or more datapoints of the training dataset […] (Prendki, Par. [0022], “In an embodiment, the disclosure provides a computer-implemented process of building a predictive (ML) model to predict the usefulness of a record (data point) in the context of the training process of a machine learning model”, thus Prendki discloses evaluating individual training datapoints by predicting their usefulness with respect to a machine learning model’s training and performance, where determinations about model behavior are made by analyzing how specific datapoints influence training outcomes)
identify […] one or more pathways that decrease an accuracy of the machine learning model outputting the label (Prendki, Par. [0021], “Training Set Optimization refers to the process of modifying a training set by removing redundant, useless, or harmful data rows; it differs from conventional compression in which each row is compressed by reducing its individual size and is more accurately described as denoising. Filter refers to a classifier (in most cases, binary) that separates a first subset of data having high information value from a second subset of data having less or no information value.”, & Par. [0048], “The process described thus far offers many benefits and improvements over prior approaches. First, the process is agnostic concerning models. Using several models built for the same task, an implementation can build a more robust filter that will work for any model within the same family of tasks. By using models for different tasks on the same dataset, it is possible to build a map of the data in terms of its absolute value; data that is useless across all tasks is useless in the absolute”, thus identifying one or more pathways that decrease an accuracy of the machine learning model outputting the label is disclosed, because, as stated in the applicant’s specification – Par. [0071], the one or more pathways may comprise a contiguous plurality of nodes forming a path from a node that receives input based on a datapoint from the training dataset to a node that outputs the label. Prendki identifies datapoints that are harmful to model accuracy by evaluating their effect on model performance, with datapoints classified as harmful corresponding to contiguous computational routes within the machine learning model that lead to inaccurate label outputs)
determine a first set of the one or more datapoints that correlate with the one or more pathways that decrease the accuracy of the machine learning model outputting the label (Prendki, Par. [0021], “Training Set Optimization refers to the process of modifying a training set by removing redundant, useless, or harmful data rows; it differs from conventional compression in which each row is compressed by reducing its individual size and is more accurately described as denoising. Filter refers to a classifier (in most cases, binary) that separates a first subset of data having high information value from a second subset of data having less or no information value”, & Par. [0077], “The goal of the disclosed system is to identify which data records (rows) from the training set are creating such confusion and classify them as “harmful” to the model, in order to eliminate them in future retraining of the model.”, thus Prendki discloses determining a first set of datapoints that correlate with pathways that decrease model accuracy by evaluating individual training records based on their effect on a trained machine learning model’s performance and classifying those records as harmful when they introduce confusion or reduce predictive accuracy. Because, as stated in the applicant’s specification, a pathway may be a contiguous set of nodes from an input node receiving a datapoint to an output node generating a label, Prendki’s identification of harmful datapoints necessarily corresponds to identifying the specific input-to-output computational routes within the model through which those datapoints lead to incorrect label outputs. Thus, Prendki determines a subset of datapoints correlated with accuracy-degrading pathways by selecting and removing training records whose processing through the model results in decreased accuracy)
remove the first set of the one or more datapoints from the training dataset to generate a reduced training dataset (Prendki, Par. [0021], “Training Set Optimization refers to the process of modifying a training set by removing redundant, useless, or harmful data rows; it differs from conventional compression in which each row is compressed by reducing its individual size and is more accurately described as denoising.”, thus removing the first set of the one or more datapoints from the training dataset to generate a reduced training dataset is disclosed)
Prendki does not explicitly teach the machine learning model comprising a plurality of nodes and each node, of the plurality of nodes, being associated with a weight, one or more changes to the weight associated with each node of the plurality of nodes, and using one or more model explainability techniques.
However, Shachar teaches wherein the machine learning model comprises a plurality of nodes and each node, of the plurality of nodes, is associated with a weight (Shachar, Par. [0045], “As shown in FIGS. 3 and 4, models 300 and 400 may include different layers, such as an input layer, a hidden layer, and an output layer, each having one or more nodes, however, different layers may also be utilized. For model 300, layers 1104 are shown, while for model 400, layers 1204 are shown”, & Par. [0046], “By continuously providing different sets of training data and penalizing models 300 and 400 when the output is incorrect, models 300 and 400 (and specifically, the representations of the nodes in the hidden layer) may be trained (adjusted) to improve performance of models 300, 400 in data classification. Adjusting models 300 and 400 may include separately adjusting the weights associated with each node in the hidden layer.”, thus Shachar discloses a machine learning model having multiple layers of nodes in which each node participates in weighted connections that are adjusted during training, such that each node is associated with at least one weight used to compute the model’s output)
Shachar also teaches one or more changes to the weight associated with each node of the plurality of nodes (Shachar, Par. [0046], “By continuously providing different sets of training data and penalizing models 300 and 400 when the output is incorrect, models 300 and 400 (and specifically, the representations of the nodes in the hidden layer) may be trained (adjusted) to improve performance of models 300, 400 in data classification. Adjusting models 300 and 400 may include separately adjusting the weights associated with each node in the hidden layer.”, thus changes/adjustments to the weight associated with each node of plurality of nodes is disclosed)
Additionally, Shachar teaches using one or more model explainability techniques (Shachar, Par. [0039], “After creation of the models, at block 9, model explanation is performed to understand the importance of micromodels and, inside each micromodel, the importance of the features to the micromodels. Thus, after building the models, an ML model explainer, such as an explanation algorithm, may be used to verify the added value of each separate feature. This may include utilizing SHAP or LIME to obtain a measure of importance of each feature in each classification task. Thereafter, an average of those contributions is determined to obtain a total significance level of each feature”, thus using one or more model explainability techniques is disclosed)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Prendki’s approach of identifying training datapoints whose processing through the machine learning model results in decreased output accuracy, which reads on identifying one or more pathways that decrease an accuracy of the machine learning model outputting the label, with Shachar’s use of ML model explainers to measure the importance of each feature, which corresponds to using one or more model explainability techniques, thereby improving dataset quality and resulting in a more accurate machine learning model (Shachar, Par. [0051], “An ML model explainer may be used to determine an added value of each feature to the ML models' classifications, such as a measure of importance of each feature in the classification tasks. This may be done using SHAP, LIME, or a lift ratio per each feature separately. Thereafter, an administrator may determine whether the risk scores used to enrich the data set provide a more accurate ML model for anomaly detection based on the comparison and feature importance.”). Thus, the combined teachings disclose using explainability feature importance to associate reductions in model accuracy with specific computational pathways activated by particular training datapoints, allowing identification of datapoints that negatively affect model performance and their removal to improve model accuracy.
Regarding Claim 16, Prendki combined with Shachar teaches all of the limitations of claim 15 as cited above and Prendki further teaches:
input the reduced training dataset into the machine learning model to determine whether the machine learning model outputs the label (Prendki, Par. [0046], “7: m.sub.select ← train(m, selected) (train a new model on selected) accuracy.sub.selected = test(m.sub.select, S.sub.test)”, & Par. [0058], “At block 160, the process executes computer instructions that are programmed to filter the second dataset of prospective training data using the data filter, and to output a refined training dataset comprising fewer records than the second dataset, the refined training dataset comprising only records of the second dataset having the usefulness value greater than a specified threshold.”, thus inputting the reduced training dataset into the machine learning model to determine whether the machine learning model outputs the label is disclosed, because Prendki retrains a machine learning model using a reduced (refined) training dataset and evaluates the model’s predictive output and accuracy on a test set, which corresponds to determining whether the machine learning model outputs the label when trained on the reduced training dataset)
compare a first label outputted by the machine learning model trained on the training dataset to a second label outputted by the machine learning model trained on the reduced training dataset to determine whether the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset (Prendki, Par. [0102], “The next step of filter validation 204 (FIG. 2) includes testing if the data filter does not generate biases, and that the accuracy obtained is as expected (function of how the threshold was set). For a more thorough estimation of the filter's efficacy, there can be a held-out training dataset which is filtered down. Two versions of the model are trained, one on the entire dataset and the other on the filtered down version. If the filtered down version achieves a similar accuracy level to the full version, then the data filter is useful.”, thus comparing a first label outputted by a model trained on the full training dataset with a second label outputted by a model trained on a reduced training dataset to evaluate whether the reduced dataset maintains at least comparable accuracy is disclosed, because Prendki trains two separate models on the full and filtered datasets and compares their resulting predictive accuracy to determine whether the reduced training dataset yields determinations that are at least as accurate as those produced using the full training dataset)
and in response to a determination that the reduced training dataset causes the machine learning model to render a determination at least as accurate as the training dataset, validate the reduced training dataset (Prendki, Par. [0102], “Two versions of the model are trained, one on the entire dataset and the other on the filtered down version. If the filtered down version achieves a similar accuracy level to the full version, then the data filter is useful”, & Par. [0103], “For instance, if the filter predicts 10% of the data as “useful”, the future versions of the model will be able to be trained with only 10% of the data (note that the amount of data used when training a model is not necessarily linear with the amount of time it takes to train; the disclosed system also provides customers with the capability to review this relationship)”, thus validating the reduced training dataset is disclosed, because Prendki evaluates the performance of a machine learning model trained on a reduced training dataset relative to a model trained on the full dataset and, upon determining that comparable accuracy is achieved, deems the reduced dataset useful and acceptable for future training, which corresponds to determining the validity of the reduced training dataset in response to achieving at least equivalent model accuracy)
Regarding Claim 17, Prendki combined with Shachar teaches all of the limitations of claim 15 as cited above and Prendki further teaches:
determine the first set of the one or more datapoints causes the weight associated with each node associated with the one or more pathways to change by more than a threshold amount (Prendki, Par. [0022 - 0026], “In an embodiment, the disclosure provides a computer-implemented process of building a predictive (ML) model to predict the usefulness of a record (data point) in the context of the training process of a machine learning model. According to one embodiment, the following algorithmic flow is programmed [0023] 1. Collect/acquire (historical) training data. In the pseudocode algorithm examples set forth below, training data is denoted S.sub.train. [0024] 2. Run process to measure usefulness of records within this training dataset (*); measurement of usefulness can be categorical or a score (number) [0025] 3. Categorize training data into groups of usefulness (*) [0026] This can be binary (useful/not useful), and can use a process to establish a threshold above which data is useful”, & Par. [0073], “All the details computed during the metadata generation phase are referred to as “metadata”—they are not data per se, but by-products of the training of the customer's model using a fraction of the customer's data that the disclosed system will use in the next stages of the process. Examples of metadata include, but are not limited to: Inference, Binary “correctness” (correctly/incorrectly predicted), Unlikelihood of prediction (if a record is predicted to be of a class that is rarely confused with ‘true’ class confusion matrix), Confidence level, First margin (difference between confidence of predicted class and next best class), Subsequent margins, Consensus between multiple models (can be perturbed versions of the same model) “Bayesian” confidence, List of activated neurons (if neural net), Activation functions, Weights and biases in model, and/or their derivatives, “Path length” (if decision tree)”, thus Prendki discloses determining, based on training-generated metadata, that particular datapoints produce changes in model weights and activations along specific computational pathways, and further discloses applying threshold-based criteria to classify those datapoints, which corresponds to determining that the first set of datapoints causes the weight associated with each node associated with the one or more pathways to change by more than a threshold amount.)
Regarding Claim 18, Prendki combined with Shachar teaches all of the limitations of claim 15 as cited above and Prendki further teaches:
wherein each of the one or more pathways comprises a plurality of nodes that are not used in a determination to output the label based on the training dataset inputted into the machine learning model (Prendki, Par. [0104], “Plugging in Active Learning allows the system to account for mutually contained information; indeed, the scores do not necessarily reflect that the information from record r.sub.i (from the training set) was not already contained in r.sub.j. r.sub.i and r.sub.j, i≠j could have similar scores but be redundant with each other and therefore might not be useful to use concurrently, which Active Learning and other related systems may address”, thus Prendki discloses identifying redundant or mutually contained datapoints whose contributions are determined to be unnecessary for model prediction, which correspond to computational pathways within the model that do not contribute to determining the output label and are therefore effectively not used in the label determination)
Regarding Claim 19, Prendki combined with Shachar teaches all of the limitations of claim 15 as cited above and Shachar further teaches:
determine that the first set of the one or more datapoints causes the one or more pathways to have a net decrease in outputting the label over the plurality of epochs (Shachar, Par. [0077], “Other tests with Active Learning (e.g., a process where the model is trained iteratively after gradually incrementing the size of the training set) have shown that, at times, the model oscillates from a state where it seems to have understood a class, back to a state where it is clearly confused”, & Par. [0088], “In some embodiments, this approach is simplistic because whenever a training record ends up helping for one class (typically, the one it belongs to) and hurting another, the formula would annihilate those different effects on different test records; which is why in practice, the system may use other approaches to correlate the absence/presence of a record from the training set to its effect on the training (inferred on the test set). Assuming that the ground truth is available for the training set also, it is possible to correlate those effects with more precision”, & Par. [0098], “The “threshold” can either reflect the maximum amount of the data that is desired to be used when training future versions of the model, or the limit (value) under which data seems to become useless (flat learning curve) or harmful (decreasing learning curve)”, thus determining that the first set of the one or more datapoints causes the one or more pathways to have a net decrease in outputting the label over the plurality of epochs is disclosed, because Shachar teaches iterative, epoch-based training in which the presence or absence of individual training records is correlated with model behavior across training iterations, including oscillations, confusion, and decreasing learning curves, and further teaches identifying datapoints whose inclusion results in harmful or degrading effects on model performance over time based on thresholded learning trends, which corresponds to correlating specific datapoints with model pathways that negatively affect label output across epochs, thereby enabling removal of datapoints that degrade training performance to improve model accuracy)
Regarding Claim 20, Prendki combined with Shachar teaches all of the limitations of claim 15 as cited above and Prendki further teaches:
identify, using the one or more model explainability techniques and the one or more changes to the weight associated with each node of the plurality of nodes, one or more second pathways, wherein the one or more second pathways increase the accuracy of the machine learning model outputting the label (Prendki, Par. [0058], “At block 160, the process executes computer instructions that are programmed to filter the second dataset of prospective training data using the data filter, and to output a refined training dataset comprising fewer records than the second dataset, the refined training dataset comprising only records of the second dataset having the usefulness value greater than a specified threshold”, & Par. [0079], “One way to do so is to simply average, for each data record from training set, the confidence level achieved for each data record from within the test set and each sample (run) with a weight of +1 if the prediction for that record is correct, and −1 if it's incorrect, whenever this data record has been used to train the model. The metadata can be used to improve the confidence level. By doing so, the disclosed system will have high scores for each training record if they consistently help the model learn correctly”, thus identifying one or more second pathways that increase model accuracy is disclosed, because Prendki evaluates the contribution of training datapoints to correct predictions using confidence and correctness metadata generated during model training, and selects datapoints that positively influence learned weights and model behavior, which correspond to contiguous computational pathways through nodes whose weight updates improve label output accuracy)
determine a second set of the one or more datapoints that correlate with the one or more second pathways (Prendki, Par. [0058], “At block 160, the process executes computer instructions that are programmed to filter the second dataset of prospective training data using the data filter, and to output a refined training dataset comprising fewer records than the second dataset, the refined training dataset comprising only records of the second dataset having the usefulness value greater than a specified threshold”, thus determining a second set of the one or more datapoints that correlate with the one or more second pathways is disclosed, because Prendki selects and retains training records based on their positive usefulness values derived from training behavior, which correspond to datapoints associated with computational pathways)
generate a second reduced training dataset comprising the second set of the one or more datapoints (Prendki, Par. [0058], “At block 160, the process executes computer instructions that are programmed to filter the second dataset of prospective training data using the data filter, and to output a refined training dataset comprising fewer records than the second dataset, the refined training dataset comprising only records of the second dataset having the usefulness value greater than a specified threshold”, thus generating a second reduced training dataset comprising the second set of the one or more datapoints is disclosed)
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US20210295175A1 is pertinent to applicant’s disclosure because it teaches training a neural network by monitoring the model’s internal behavior during training and repeatedly updating weights and biases across many iterations to reach a desired output behavior. It also teaches applying constraint operations during training that identify which connections are conforming versus non-conforming (based on importance measures such as weight magnitude or sensitivity), then progressively shrinking the non-conforming connections toward zero over time to enforce restrictions such as limiting disallowed feature interactions and reducing improper bias effects. Because the applicant likewise focuses on controlling undesirable model behavior (including bias-related behavior) by adjusting internal pathways/parameters during training and through corrective procedures, the reference is relevant to the claimed invention.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAHLIET ADMASU whose telephone number is (571)272-0034. The examiner can normally be reached Mon-Fri, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/M.T.A./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123