Prosecution Insights
Last updated: April 19, 2026
Application No. 17/976,480

UPDATING LABEL PROBABILITY DISTRIBUTIONS OF DATA POINTS

Non-Final OA: §101, §103
Filed: Oct 28, 2022
Examiner: MARU, MATIYAS T
Art Unit: 2148
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Micro Focus LLC
OA Round: 1 (Non-Final)

Grant Probability: 58% (Moderate)
Projected OA Rounds: 1-2
Projected Time to Grant: 4y 6m
Grant Probability with Interview: 70%

Examiner Intelligence

Career Allow Rate: 58% (23 granted / 40 resolved cases; +2.5% vs Tech Center average). Grants 58% of resolved cases.
Interview Lift: +12.5% (moderate lift in resolved cases with interview)
Typical Timeline: 4y 6m average prosecution; 39 applications currently pending
Career History: 79 total applications across all art units

Statute-Specific Performance

§101: 35.9% (-4.1% vs TC avg)
§103: 50.9% (+10.9% vs TC avg)
§102: 1.9% (-38.1% vs TC avg)
§112: 11.3% (-28.7% vs TC avg)

Based on career data from 40 resolved cases; Tech Center averages are estimates.

Office Action

Rejections under §101 and §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Claim Objections

Claims 4, 12, and 19-20 are objected to because of the following informalities. Regarding claim 4, the claim recites: “for each of the smaller number of data points, fixably specifying the label probability distribution of the data point such that the, …” For a clear understanding, this should read “for each of the smaller number of data points, assign a fixed label probability distribution …” Claims 12 and 19-20 recite similar language and require corresponding corrections for clarity and consistency.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without significantly more.

In Step 1 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the claims, under the broadest reasonable interpretation, recite a process that falls within one or more statutory categories (processes).
In Step 2A, Prong 1 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following limitations recite a mental process but for the recitation of generic computer components.

Regarding claim 1, the claim recites: “calculating, for each of a plurality of data points that each have a label probability distribution, a label quality measure based on the label probability distribution of the data point;” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves evaluating probability distributions associated with data points and making a judgment about the quality of the labels based on these distributions. See MPEP 2106.04.)

“updating the label probability distribution of each of at least one of the data points using either or both of a classification technique and a constrained clustering technique based on the data points and the label quality measure of each data point.” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves evaluating data points and their associated quality measures, making a judgment to categorize or group the data, and revising the assigned label probabilities accordingly. See MPEP 2106.04.)

If the claim limitations, under their broadest reasonable interpretation, cover performance as a mental process but for the recitation of generic computer components, then they fall within the mental-process grouping. Accordingly, the claim recites an abstract idea.
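To make the recited quality-measure step concrete, here is a minimal sketch, assuming the entropy-based measure that claim 5 later specifies; the function name and the particular normalization are illustrative choices, not taken from the application:

```python
import math

def label_quality(dist):
    """Label quality measure computed from a label probability distribution.

    Illustrative choice (hedged, not mandated by claim 1): one minus the
    distribution's entropy normalized by the entropy of a uniform
    distribution, so a confident (peaked) distribution scores near 1 and
    a uniform (uninformative) distribution scores 0.
    """
    k = len(dist)
    entropy = -sum(p * math.log(p) for p in dist if p > 0)
    max_entropy = math.log(k)  # entropy of the uniform distribution over k labels
    return 1.0 - entropy / max_entropy

# A peaked distribution has high label quality...
assert label_quality([0.9, 0.05, 0.05]) > 0.5
# ...while a uniform distribution has zero label quality.
assert abs(label_quality([0.25, 0.25, 0.25, 0.25])) < 1e-12
```

Any monotone function of the distribution's concentration would serve the claim language equally well; the normalization here simply bounds the measure in [0, 1].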
In Step 2A, Prong 2 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate the judicial exception into a practical application:

“A non-transitory computer-readable data storage medium storing program code executable by a processor to perform processing comprising performing one or more iterations” (Deemed insufficient to transform the judicial exception into a patentable invention because the limitation amounts to no more than a recitation of the words "apply it" (or an equivalent), i.e., mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f).)

In Step 2B of the 101 analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements sufficient to amount to significantly more than the judicial exception. Limitation (I) recites mere application of the abstract idea, or mere instructions to implement the abstract idea on a computer, which is deemed insufficient to transform the judicial exception into a patentable invention because the limitation generally applies a generic computer and/or process to the judicial exception; see MPEP 2106.05(f). As analyzed above, the additional elements do not integrate the judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.

Claim 2 depends from claim 1 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: “wherein each data point comprises a feature vector having a plurality of values for different features.”
(Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it further describes organizing or characterizing the information of a data point. See MPEP 2106.04.)

Claim 3 depends from claim 1 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: “wherein the label probability distribution of each data point comprises a probability for each of a plurality of labels that the label is correct for the data point.” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it further describes assigning and considering likelihoods for different possible labels associated with a data point. See MPEP 2106.04.)

Claim 4 depends from claim 3 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: “wherein the processing further comprises, before performing the iterations: receiving the plurality of data points, including a smaller number of data points that have each been manually assigned the label that is correct for the data point and a larger number of data points that each have not been manually assigned the label that is correct for the data point;” This additional limitation is directed to mere data gathering and is deemed insufficient to transform the judicial exception, because the claimed elements are considered insignificant extra-solution activity that is well-understood, routine, and conventional (MPEP 2106.05(d)): receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto.
LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network). See MPEP 2106.05(d)(II). As analyzed above, the additional limitation fails to integrate the judicial exception into a practical application at Step 2A and fails to provide an inventive concept at Step 2B.

“for each of the smaller number of data points, fixably specifying the label probability distribution of the data point such that the probability for the label that has been manually assigned is set to a highest value within the label probability distribution and the probability for every other label is set to a lowest value within the label probability distribution;” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves observing the manually assigned label, evaluating which label should be treated as correct, and assigning it the highest probability while assigning the lowest probability to the remaining labels. See MPEP 2106.04.)

“for each of the larger number of data points, initially specifying the label probability distribution of the data point as a uniform probability distribution such that the probability for every label is set to a same value within the label probability distribution.” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves deciding to initialize all labels as equally likely by assigning the same probability value to each label. See MPEP 2106.04.)
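The two initialization rules analyzed above can be sketched directly. The function names and the 0.99 "highest value" below are illustrative assumptions; the claim fixes only the highest-versus-lowest relationship, not the specific values:

```python
def fixed_distribution(assigned_label, labels, high=0.99):
    """Fixably specify a distribution for a manually labeled data point:
    the manually assigned label gets the highest probability, and every
    other label gets the same (lowest) probability. The 0.99/residual
    split is an illustrative choice."""
    low = (1.0 - high) / (len(labels) - 1)
    return {lab: (high if lab == assigned_label else low) for lab in labels}

def uniform_distribution(labels):
    """Initially specify a uniform distribution for an unlabeled data
    point: every label gets the same probability."""
    p = 1.0 / len(labels)
    return {lab: p for lab in labels}

labels = ["benign", "malign", "unknown"]
fixed = fixed_distribution("malign", labels)
assert fixed["malign"] == max(fixed.values())  # assigned label is highest
assert abs(sum(fixed.values()) - 1.0) < 1e-12  # still a valid distribution

uniform = uniform_distribution(labels)
assert len(set(uniform.values())) == 1  # every label has the same probability
```

Both helpers return valid probability distributions, so the iterative quality-measure and update steps recited in claim 1 can consume them without special cases.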
Claim 5 depends from claim 3 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: “wherein, for each data point, the label quality measure is calculated based on an entropy of the label probability distribution of the data point and an entropy of a uniform label probability distribution.” This additional limitation simply links the judicial exception to a field of use and/or technology environment; see MPEP 2106.05(h). Limitations directed to a field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 6 depends from claim 3 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: “training a probabilistic classifier using a training subset of the data points, the probabilistic classifier trained to minimize a loss function weighted by the label quality measure of each data point of the training subset;” This limitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform the abstract idea, amounting to adding the words “apply it” (or an equivalent) to the judicial exception; see MPEP 2106.05(f). Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

“applying the probabilistic classifier to each data point of the at least one of the data points to yield an updated probability for each label that the label is correct for the data point.”
This limitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform the abstract idea, amounting to adding the words “apply it” (or an equivalent) to the judicial exception; see MPEP 2106.05(f). Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 7 depends from claim 6 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: “selecting the training subset of the data points” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves picking a training dataset based on certain criteria. See MPEP 2106.04.)

“wherein in an initial iteration, the training subset of the data points is selected as or from a smaller number of data points that each have been manually assigned the label that is correct for the data point, and not as or from a larger number of data points that each have not been manually assigned the label that is correct for the data point.” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves observing which data points have been manually labeled, evaluating their suitability for training, and making a judgment to select those labeled data points while excluding unlabeled ones. See MPEP 2106.04.)

Claim 8 depends from claim 7 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception.
The claim recites: “wherein in each iteration other than the initial iteration, selecting the training subset of the data points comprises: selecting each data point for which the label quality measure is greater than a threshold or a number of the data points for which the label quality measure is greater than the threshold.” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves comparing measured values to a threshold and choosing data points based on the comparison. See MPEP 2106.04.)

Claim 9 depends from claim 3 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: “clustering the data points over a plurality of clusters corresponding to the labels, such that the data points having a probability of belonging to a same cluster that is greater than a threshold are constrained to a same cluster; and” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves observing the data points, evaluating their likelihoods of belonging to various groups, and organizing them into clusters based on a threshold probability. See MPEP 2106.04.)

“using the clusters to yield an updated probability for each label that the label is correct for each data point of the at least one of the data points.” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves observing how data points are grouped into clusters, evaluating the groupings, and adjusting the likelihood that each label is correct based on the cluster assignments. See MPEP 2106.04.)

Claim 10 depends from claim 9 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception.
The claim recites: “wherein the constrained clustering technique is a soft-constrained clustering technique providing a likelihood that each data point belongs to each cluster, and wherein, for each data point of the at least one of the data points, the updated probability for each label that the label is correct for the data point is the likelihood that the data point belongs to the cluster corresponding to the label.” This additional limitation simply links the judicial exception to a field of use and/or technology environment; see MPEP 2106.05(h). Limitations directed to a field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 11 depends from claim 9 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: “wherein the constrained clustering technique is a hard-constrained clustering technique in which each data point belongs to one of the clusters, and wherein, for each data point of the at least one of the data points, the updated probability for each label that the label is correct for the data point is based on a quality metric that the one of the clusters to which the data point belongs corresponds to the label that is correct for the data point.” This additional limitation simply links the judicial exception to a field of use and/or technology environment; see MPEP 2106.05(h). Limitations directed to a field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 12 depends from claim 3 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception.
The claim recites: “wherein each iteration further comprises: requesting a user to manually assign the label that is correct for each of a number of the data points having lowest label quality measures;” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves identifying data points with low-quality labels and evaluating those data points to assign the correct label. See MPEP 2106.04.)

“for each of the number of the data points having the lowest label quality measures, fixably updating the label probability distribution of the data point such that the probability for the label that has been manually assigned is set to a highest value within the label probability distribution and the probability for every other label is set to a lowest value within the label probability distribution.” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves observing which labels have been manually assigned, evaluating the probabilities, and making a judgment to set the manually assigned labels to the highest value and all others to the lowest. See MPEP 2106.04.)

Claim 13 depends from claim 1 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: “wherein each iteration further comprises: calculating a convergence metric of the label probability distribution of each data point between before having been updated and after having been updated;” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves observing two states of the label probability distributions, comparing them, and evaluating the degree of change to assess convergence. See MPEP 2106.04.)
“in response to the convergence metric being less than a threshold, concluding that a current iteration is a last iteration, such that no further iterations are performed;” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves evaluating a value relative to a threshold and determining whether there will be a further iteration. See MPEP 2106.04.)

“in response to the convergence metric being greater than the threshold and a number of already performed iterations being equal to a maximum number, concluding that the current iteration is the last iteration, such that no further iterations are performed; and” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves comparing numerical values (a convergence metric and an iteration count) against predetermined criteria to determine whether to stop processing. See MPEP 2106.04.)

“in response to the convergence metric being greater than the threshold and the number of already performed iterations being less than the maximum number, concluding that the current iteration is not the last iteration, such that at least one further iteration is performed.” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves assessing multiple conditions and deciding whether additional action should occur based on those assessments. See MPEP 2106.04.)

Claim 14 depends from claim 1 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: “wherein the processing further comprises, after performing the iterations: assigning to each data point a label for which the data point has a highest probability within the label probability distribution of the data point.”
(Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves examining the set of probabilities associated with each data point, comparing those probabilities, and selecting the highest one to make a labeling decision. See MPEP 2106.04.)

Claim 15 depends from claim 14 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: “wherein the processing further comprises, after performing the iterations, either or both of: removing each data point for which the label quality measure is less than a threshold; and removing a number of the data points having lowest label quality measures.” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves evaluating data points against a threshold or ranking them by a quality measure and deciding which ones to discard. See MPEP 2106.04.)

Claim 16 depends from claim 14 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: “wherein the processing further comprises, after performing the iterations: either or both of training and validating a machine learning model that predicts an output label for an input data point, using the data points and the label assigned to each data point.” This limitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform the abstract idea, amounting to adding the words “apply it” (or an equivalent) to the judicial exception; see MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 17 depends from claim 16 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: “wherein the processing further comprises, after performing the iterations: applying the machine learning model as trained and/or validated to the input data point to predict the output label and a probability that the output label is correct for the input data point.” This limitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform the abstract idea, amounting to adding the words “apply it” (or an equivalent) to the judicial exception; see MPEP 2106.05(f). Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim 18 depends from claim 17 and fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: “wherein the input data point comprises one or multiple log events of one or multiple devices, the output label corresponds to an anomaly” This additional limitation simply links the judicial exception to a field of use and/or technology environment; see MPEP 2106.05(h). Limitations directed to a field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.
“the processing further comprises: in response to the probability that the output label is correct for the input data being greater than a threshold, concluding that the one or multiple devices has the anomaly and performing an action to resolve the anomaly.” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves evaluating a probability value against a threshold, making a determination based on that comparison, and deciding to take a responsive action. See MPEP 2106.04.)

Regarding claim 19, the claim recites: “fixably setting a label probability distribution of each data point of the smaller number, and initially setting the label probability distribution of each data point of the larger number;” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves assigning initial confidence levels to labels for different groups of data points. See MPEP 2106.04.)

“for each data point of the smaller number, calculating a label quality measure based on the label probability distribution;” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves evaluating and judging the quality or reliability of a label based on assigned probabilities. See MPEP 2106.04.)

“for each data point of the larger number, iteratively calculating a label quality measure based on the label probability distribution and updating the label probability using either or both of a classification technique and a constrained clustering technique based on the data points and the label quality measure of each data point;” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves evaluating label quality and revising decisions about labels based on observed patterns and constraints. See MPEP 2106.04.)
If the claim limitations, under their broadest reasonable interpretation, cover performance as a mental process but for the recitation of generic computer components, then they fall within the mental-process grouping. Accordingly, the claim recites an abstract idea.

In Step 2A, Prong 2 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate the judicial exception into a practical application, as evaluated below:

The preamble is deemed insufficient to transform the judicial exception into a patentable invention because it generally links the use of the judicial exception to a particular technological environment or field of use; see MPEP 2106.05(h).

“receiving a plurality of data points, including a smaller number of data points that each have been manually assigned a label that is correct for the data point and a larger number of data points that each have not been manually assigned the label that is correct for the data point;” (Deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere data gathering; the claimed elements are considered insignificant extra-solution activity. See MPEP 2106.05(g).)

“either or both of training and validating a machine learning model, using the data points and the label probability distribution of each data point;” (Deemed insufficient to transform the judicial exception into a patentable invention because the limitation amounts to no more than a recitation of the words "apply it" (or an equivalent), i.e., mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f).)
“applying the machine learning model as trained and/or validated to an input data point to predict an output label and a probability that the output label is correct for the input data point.” (Deemed insufficient to transform the judicial exception into a patentable invention because the limitation amounts to no more than a recitation of the words "apply it" (or an equivalent), i.e., mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f).)

In Step 2B of the 101 analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements sufficient to amount to significantly more than the judicial exception. Limitations (II) and (III) recite mere application of the abstract idea, or mere instructions to implement the abstract idea on a computer, which is deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply a generic computer and/or process to the judicial exception; see MPEP 2106.05(f). Regarding limitation (I), the additional elements, considered extra-/post-solution activity as analyzed above, are well-understood, routine, and conventional; the courts have recognized the following computer functions as well-understood, routine, and conventional: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network).
See MPEP 2106.05(d)(II). As analyzed above, the additional elements do not integrate the judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.

Regarding claim 20, the remaining limitations recite similar subject matter as claim 19 and are rejected under the same rationale.

“in response to the probability that the one or multiple devices having the anomaly being greater than a threshold, perform an action to resolve the anomaly.” (Under the broadest reasonable interpretation, this limitation recites an abstract idea, a mental process: it involves evaluating a probability against a threshold and deciding to take corrective action based on that evaluation. See MPEP 2106.04.)

“A system comprising: a processor; and a memory storing program code executable by the processor to:” This limitation is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform the abstract idea, amounting to adding the words “apply it” (or an equivalent) to the judicial exception; see MPEP 2106.05(f). Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.
apply the machine learning model to one or multiple log events of one or multiple devices to output a probability that the one or multiple devices have an anomaly. Deemed insufficient to transform the judicial exception into a patent-eligible invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to amount to adding the words “apply it” (or an equivalent) to the judicial exception, see MPEP 2106.05(f). Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1 – 3, 5 – 6 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Soliman, Pub. No.: US20210084058A1, in view of Wang et al., Pub. No.: US20220303288A1, and Schmidtler et al., Pub. No.: US20100169250A1.
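For orientation before the claim-by-claim mapping, the core step of claim 1 (calculating a label quality measure from each data point's label probability distribution, which claims 3 and 5 tie to the entropy of that distribution relative to a uniform one) can be sketched as follows. This is an illustrative reading only: the normalized-entropy formula and the names below are this page's assumptions, not the applicant's disclosed implementation.

```python
import math

def label_quality(dist):
    """Illustrative quality measure: 1.0 for a one-hot label
    probability distribution, near 0.0 for a near-uniform one.
    Assumes at least two labels (uniform entropy is log(n))."""
    uniform_entropy = math.log(len(dist))
    entropy = -sum(p * math.log(p) for p in dist if p > 0)
    return 1.0 - entropy / uniform_entropy

# Each data point carries a probability distribution over labels.
points = {
    "a": [0.9, 0.05, 0.05],   # confidently labeled
    "b": [0.34, 0.33, 0.33],  # nearly uniform, so low quality
}
qualities = {k: label_quality(v) for k, v in points.items()}
```

Under this reading, a confidently labeled point scores near 1 while a near-uniform distribution scores near 0, which is what would make the measure useful for flagging points whose labels need review in each iteration.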
Regarding claim 1, Soliman teaches: A non-transitory computer-readable data storage medium storing program code executable by a processor to perform processing comprising performing one or more iterations, each iteration comprising: (Soliman, “[0111] … In one embodiment, the computer system includes a processor [a processor] coupled to a bus and memory storage coupled to the bus [A non-transitory computer-readable data storage medium storing program code executable by a processor to perform processing comprising performing one or more iterations]. The memory storage can be volatile or non-volatile (i.e. transitory or non-transitory) and can include removable storage media. The computer can also include a display, provision for data input and output, etc. as will be understood by a person skilled in the relevant art.”) calculating, for each of a plurality of data points that each have a label probability distribution, a label quality measure based on the label probability distribution of the data point; and (Soliman, “[0105] … The goal of this phase is to automatically derive a “model”. A model effectively encodes a mathematical function whose input is the application and whose output is a classification. In the context of using machine learning to detect malware, the output of the model (when applied to a file whose disposition is being sought) might be a binary label of either “benign” or “malign”. Certain machine learning models are also capable of producing a score that reflects the confidence in the label. 
For example, the output might be (“malign”, 0.95) which can be taken to mean that the model believes that the feature vector has a 95% chance of corresponding to a malicious software application [calculating, for each of a plurality of data points that each have a label probability distribution, a label quality measure based on the label probability distribution of the data point] (i.e.: distribution over labels (benign vs malign)).”)

Soliman does not teach: updating the label probability distribution of each of at least one of the data points using either or both of a classification technique or a constrained clustering technique based on the data points and the label quality measure of each data point.

Wang teaches: updating the label probability distribution of each of at least one of the data points using either or both of a classification technique (Wang, “[0050] In another embodiment, based on the result of the anomaly detection 119, the user may determine misclassifications of the input data as anomalous (or non-anomalous). The user may update the anomaly detector 101 regarding the misclassification through the feedback, where the feedback may comprise one or more labels indicating misclassification of the input data 103 [using either or both of a classification technique]. For example, if non-anomalous data is identified as anomalous by the anomaly detector 101, the user may provide label “No” to indicate misclassification of the input data 103.
Accordingly, the tuner module 209, based on the feedback, may update [updating the label probability distribution of each of at least one of the data points] (i.e.: feedback leads to modifying detector weights/thresholds, which changes the probability/confidence assigned to a data point) at least one of: the threshold 117 and weights 203 of at least one (or individual) loss function of the plurality of loss functions 113 in the weighted combination of the loss functions to correct the misclassification.”)

Wang and Soliman are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Wang with the teachings of Soliman to add a classifier that detects intrusion by identifying anomalous network or control system data based on reconstruction loss exceeding a threshold (Wang, Abstract).

Soliman in view of Wang do not teach: a constrained clustering technique based on the data points and the label quality measure of each data point.

Schmidtler teaches: a constrained clustering technique based on the data points and the label quality measure of each data point. (Schmidtler, “[0073] In summary, at the M step of the transductive classification algorithm of Jaakkola, referenced herein, unlabeled data have to fulfill stricter classification constraints than the labeled data and their cumulative weight to the solution is less constrained than for labeled data [a constrained clustering technique based on the data points and the label quality measure of each data point]. In addition, unlabeled data with an expected label close to zero that lie within the margin of the current M step influence the solution the most.
The resulting net effect of formulating the E and M step this way is illustrated by applying this algorithm to the dataset shown in FIG. 2.”)

Schmidtler, Soliman and Wang are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Schmidtler with the teachings of Soliman and Wang to leverage both labeled and unlabeled network or control-system data to more accurately classify events (e.g., intrusion vs non-intrusion) using an iterative approach (Schmidtler, Abstract).

Regarding claim 2, Soliman in view of Wang and Schmidtler teach the method of claim 1. Soliman further teaches: wherein each data point comprises a feature vector having a plurality of values for different features. (Soliman, “… [0105] In the context of using machine learning to detect malware, the output of the model (when applied to a file whose disposition is being sought) might be a binary label of either “benign” or “malign”. Certain machine learning models are also capable of producing a score that reflects the confidence in the label. For example, the output might be (“malign”, 0.95) which can be taken to mean that the model believes that the feature vector [wherein each data point comprises a feature vector having a plurality of values for different features] has a 95% chance of corresponding to a malicious software application.”)

Regarding claim 3, Soliman in view of Wang and Schmidtler teach the method of claim 1. Wang further teaches: wherein the label probability distribution of each data point comprises a probability for each of a plurality of labels that the label is correct for the data point.
(Wang, “[0021] To that end, the reconstruction loss is treated as a score calculated as a function of individual loss-terms and the weights, corresponding to each feature of multiple features of inputted data. With true labels, indicated by the user, for the small set of examples, the weights are adjusted toward improving the detection performance based on the labeled small set of examples. This can be achieved in an online fashion, with only small adjustments done incrementally for each labeled example, as limited feedback from the user is obtained during operation. In an example embodiment, the adjustment can specifically be realized with a gradient descent step on weights for binary classification cross-entropy loss between true labels (provided by user feedback) [wherein the label probability distribution of each data point comprises a probability for each of a plurality of labels that the label is correct for the data point] and classification scores that are computed from the reconstruction loss and threshold.”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wang with the teachings of Soliman and Schmidtler for the same reasons disclosed for claim 1.

Regarding claim 5, Soliman in view of Wang and Schmidtler teach the method of claim 3. Wang further teaches: wherein, for each data point, the label quality measure is calculated based on an entropy of the label probability distribution of the data point and an entropy of a uniform label probability distribution. (Wang, “[0021] To that end, the reconstruction loss is treated as a score calculated as a function of individual loss-terms and the weights, corresponding to each feature of multiple features of inputted data. With true labels, indicated by the user, for the small set of examples, the weights are adjusted toward improving the detection performance based on the labeled small set of examples.
This can be achieved in an online fashion, with only small adjustments done incrementally for each labeled example, as limited feedback from the user is obtained during operation. In an example embodiment, the adjustment can specifically be realized with a gradient descent step on weights for binary classification cross-entropy loss between true labels (provided by user feedback) and classification scores [wherein, for each data point, the label quality measure is calculated based on an entropy of the label probability distribution of the data point and an entropy of a uniform label probability distribution] that are computed from the reconstruction loss and threshold.”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wang with the teachings of Soliman and Schmidtler for the same reasons disclosed for claim 1.

Regarding claim 6, Soliman in view of Wang and Schmidtler teach the method of claim 3. Wang further teaches: wherein updating the label probability distribution comprises using at least the classification technique, the classification technique comprising: training a probabilistic classifier using a training subset of the data points, the probabilistic classifier trained to minimize a loss function weighted by the label quality measure of each data point of the training subset; and (Wang, “[0008] To address this issue, some embodiments instead of using the available anomalous training data for training the multi-class classifier [training a probabilistic classifier using a training subset of the data points], use the anomalous training data for tuning a threshold of anomaly detection of the one-class classifier.
These embodiments alone or in a combination with compound loss function [, the probabilistic classifier trained to minimize a loss function weighted by the label quality measure of each data point of the training subset] can improve the accuracy of anomaly detection without increasing the complexity of its training”) applying the probabilistic classifier to each data point of the at least one of the data points to yield an updated probability for each label that the label is correct for the data point. (Wang, “[0050] In another embodiment, based on the result of the anomaly detection 119, the user may determine misclassifications of the input data as anomalous (or non-anomalous). The user may update the anomaly detector 101 regarding the misclassification through the feedback [applying the probabilistic classifier to each data point of the at least one of the data points to yield an updated probability], where the feedback may comprise one or more labels indicating misclassification of the input data 103. For example, if non-anomalous data is identified as anomalous by the anomaly detector 101, the user may provide label “No” to indicate misclassification of the input data 103. Accordingly, the tuner module 209, based on the feedback, may update at least one of: the threshold 117 and weights 203 of at least one (or individual) loss function of the plurality of loss functions 113 in the weighted combination of the loss functions to correct the misclassification [for each label that the label is correct for the data point].”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wang with the teachings of Soliman and Schmidtler for the same reasons disclosed for claim 1.

Regarding claim 12, Soliman in view of Wang and Schmidtler teach the method of claim 3.
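The claim-12 mechanism mapped next asks a user to manually label the lowest-quality points and then fixes each such point's distribution so the assigned label receives the highest probability and every other label the lowest. A minimal sketch, assuming 1.0 and 0.0 as the highest and lowest values (the claim itself does not fix those numbers, and the function name is this page's invention):

```python
def fix_label(num_labels, assigned, hi=1.0, lo=0.0):
    """Fixably set a label probability distribution: the manually
    assigned label gets the highest value, all other labels the
    lowest. The hi/lo defaults are illustrative assumptions."""
    return [hi if i == assigned else lo for i in range(num_labels)]

# A near-uniform (low-quality) point is manually assigned label 2.
updated = fix_label(num_labels=3, assigned=2)
```

Because the distribution is "fixably" specified, later classification or clustering iterations would leave these user-confirmed points unchanged.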
Wang further teaches: wherein each iteration further comprises: requesting a user to manually assign the label that is correct for each of a number of the data points having lowest label quality measures; and for each of the number of the data points having the lowest label quality measures, (Wang, “[0020] Further, while providing anomaly detection results to the user, the feedback may be provided by the user, where the feedback indicates false alarms and missed detections to the anomaly detector, which provides labels for a small set of informative data examples [requesting a user to manually assign the label that is correct for each of a number of the data points having lowest label quality measures]. With this small set of labeled data flagged by the user, the anomaly detector may retune the loss-term weights to improve detection performance.”) fixably updating the label probability distribution of the data point such that the probability for the label that has been manually assigned is set to a highest value within the label probability distribution and the probability for every other label is set to a lowest value within the label probability distribution. (Wang, “[0021] To that end, the reconstruction loss is treated as a score calculated as a function of individual loss-terms and the weights, corresponding to each feature of multiple features of inputted data. With true labels, indicated by the user, for the small set of examples, the weights are adjusted toward improving the detection performance based on the labeled small set of examples [fixably updating the label probability distribution of the data point such that the probability for the label that has been manually assigned is set to a highest value within the label probability distribution and the probability for every other label is set to a lowest value within the label probability distribution]. 
This can be achieved in an online fashion, with only small adjustments done incrementally for each labeled example, as limited feedback from the user is obtained during operation. In an example embodiment, the adjustment can specifically be realized with a gradient descent step on weights for binary classification cross-entropy loss between true labels (provided by user feedback) and classification scores that are computed from the reconstruction loss and threshold.”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wang with the teachings of Soliman and Schmidtler for the same reasons disclosed for claim 1.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Soliman in view of Wang, Schmidtler and in further view of MENDELOWITZ et al., Pub. No.: US20220400125A1 and Albrecht et al., Pub. No.: US20220309292A1.

Regarding claim 4, Soliman in view of Wang and Schmidtler teach the method of claim 3. Soliman further teaches: wherein the processing further comprises, before performing the iterations: [ ] a larger number of data points that each have not been manually assigned the label that is correct for the data point; (Soliman, “[0105] … Each application in this set is optionally accompanied with a “label” of its disposition, for example “benign”, “malign”, or “unknown”. It is preferable to have fewer unknown samples. Furthermore, it is preferable for the corpus to be representative of the real world scenarios in which the machine learning techniques will ultimately be applied. For example, in the context of classifying software applications, it might be desirable if the applications in the corpus are reflective of what might be found on a typical system. This is followed by a “training phase” in which the applications together with the labels associated with the data, files, etc.
[and a larger number of data points] themselves, are fed into an algorithm that implements the “training phase”. The goal of this phase is to automatically derive a “model” [that each have not been manually assigned the label that is correct for the data point]. A model effectively encodes a mathematical function whose input is the application and whose output is a classification. In the context of using machine learning to detect malware, the output of the model (when applied to a file whose disposition is being sought) might be a binary label of either “benign” or “malign”.”) for each of the smaller number of data points, fixably specifying the label probability distribution of the data point such that the probability for the label that has been manually assigned (Soliman, “[0104] … Such systems “learn” to perform tasks by considering examples, generally without being programmed with any task-specific rules. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually [for the label that has been manually assigned] labeled as “cat” or “no cat” [fixably specifying the label probability distribution of the data point such that the probability] and using the results to identify cats in other images [for each of the smaller number of data points]. 
A person skilled in the relevant art will understand that a convolutional neural network is a class of neural networks that specializes in processing data that has a grid-like topology, such as an image.”)

Soliman in view of Wang and Schmidtler do not teach: receiving the plurality of data points, including a smaller number of data points that have each been manually assigned the label that is correct for the data point is set to a highest value within the label probability distribution and the probability for every other label is set to a lowest value within the label probability distribution; and for each of the larger number of data points, initially specifying the label probability distribution of the data point as a uniform probability distribution such that the probability for every label is set to a same value within the label probability distribution.

MENDELOWITZ teaches: receiving the plurality of data points, including a smaller number of data points that have each been manually assigned the label that is correct for the data point (MENDELOWITZ, “[0062] In contrast, in the two-stage ML pipeline architecture, the supervised ML model of the second phase may be employed to a significantly small group of anomaly events identified as such by the unsupervised ML model(s) of the first phase thus significantly reducing the labeled training dataset which may be needed to effectively train the supervised ML model(s). The labeled training dataset may be significantly reduced since the supervised ML model may be trained and learned to classify and identify accordingly only anomalies which are misidentified by the unsupervised ML model(s) which as stated herein before may constitute only a very small subset of the overall space of events in the vehicle's environment.
The labeled training dataset may be therefore significantly small, i.e., a fraction of the unlabeled training dataset [receiving the plurality of data points, including a smaller number of data points that have each been manually assigned the label that is correct for the data point], thus significantly reducing the time, computing resources, and/or manual effort involved in creating it (the labeled training dataset).”)

MENDELOWITZ, Soliman, Wang and Schmidtler are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of MENDELOWITZ with the teachings of Soliman, Wang and Schmidtler to add a real-time, staged machine learning approach that first identifies patterns in operational data then classifies and generates alerts for potential abnormal events. (MENDELOWITZ, Abstract).
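The other half of the claim-4 initialization, for which the Office Action turns to Albrecht below, gives every not-yet-labeled point a uniform distribution, as in Albrecht's 33 1/3 percent-per-label example for three labels. A sketch under the same illustrative assumptions as above (the function name is hypothetical):

```python
def uniform_distribution(num_labels):
    """Initial label probability distribution for an unlabeled
    data point: every label gets the same probability, and the
    probabilities sum to 1."""
    return [1.0 / num_labels] * num_labels

# Three candidate labels -> each label starts at 33 1/3 percent.
init = uniform_distribution(3)
```

Starting unlabeled points at maximum entropy means they begin with the lowest possible label quality measure, so the iterations (and any user labeling) have the most room to sharpen them.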
Soliman in view of Wang, Schmidtler and MENDELOWITZ do not teach: is set to a highest value within the label probability distribution and the probability for every other label is set to a lowest value within the label probability distribution; and for each of the larger number of data points, initially specifying the label probability distribution of the data point as a uniform probability distribution such that the probability for every label is set to a same value within the label probability distribution Albrecht teaches: is set to a highest value within the label probability distribution and the probability for every other label is set to a lowest value within the label probability distribution; and (Albrecht, “[0061] … The autoencoder architecture can update the respective probability in a peaking probability distribution to a highest probability value in the probability distribution (e.g., a highest probability value up to a maximum probability value of 1.0) [is set to a highest value within the label probability distribution], while the other probabilities in the probability distribution are much lower values than the highest probability value, indicating the unlabeled data item being processed (under examination) by the each autoencoder is more likely (predicted to be) a member of the set of classified labeled data associated with the each autoencoder (associated with the highest probability value). 
The other two autoencoders process poorly the same unlabeled data item and the autoencoder architecture typically updates the respective probabilities in a probability distribution to a much lower probability value [and the probability for every other label is set to a lowest value] that can range down to a minimum probability value approaching 0.0), indicating that the unlabeled data item is less likely (predicted to not be) a member of those other two sets of classified labeled data [within the label probability distribution] respectively associated with the other two autoencoders.”) for each of the larger number of data points, initially specifying the label probability distribution of the data point. (Albrecht, “[0080] The reconstruction optimizer controller 338, 112, conditions (optimizes) the initialized particular prototype autoencoder 202 by causing it to process a large batch of data items [for each of the larger number of data points, initially specifying the label probability distribution of the data point], including labeled data and unlabeled data, that are received at its input 204. The output 206 of the particular prototype autoencoder 202 provides a reconstructed version of the original data item received at its input 204. The reconstructed version of the original data item at the output 206 is compared 208 to the original data item received at the input 204, and the result of the comparison indicates a loss of information value. This loss of information value is then compared 210 to a target zero loss of information.”) as a uniform probability distribution such that the probability for every label is set to a same value within the label probability distribution (Albrecht, “[0044] …. 
For example, if there are three sets of classified labeled data (e.g., three labels that in this example respectively represent either: a satellite image that contains an ocean view, or a satellite image that contains a land rural view, or a satellite image that contains a land city view) then the probability of an unlabeled data item being a member of any one of the three classes (the three sets of classified labeled data) would be 33⅓ percent associated with the unlabeled data item for each of the three sets of classified labeled data. That is, an unlabeled data item initially would be assigned 33⅓% probability that it is a member of any one of the three sets of classified labeled data. The unlabeled data item (which has unknown membership in any of the three sets of classified labeled data in this example), initially is assigned the three probabilities (33⅓%, 33⅓%, and 33⅓%) associated [as a uniform probability distribution such that the probability for every label is set to a same value within the label probability distribution] with the three respective sets of classified labeled data, where the sum of the three probabilities totals 100%.”)

Albrecht, Soliman, Wang, Schmidtler and MENDELOWITZ are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Albrecht with the teachings of Soliman, Wang, Schmidtler and MENDELOWITZ to add an automated, scalable data labeling mechanism that uses an autoencoder to infer and assign probabilistic labels to unlabeled data to improve detection model training with minimal labeling. (Albrecht, Abstract).

Claim(s) 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Soliman in view of Wang, Schmidtler and in further view of MENDELOWITZ.
Regarding claim 7, Soliman in view of Wang and Schmidtler teach the method of claim 6. Soliman in view of Wang and Schmidtler do not teach: selecting the training subset of the data points, wherein in an initial iteration, the training subset of the data points is selected as or from a smaller number of data points that each have been manually assigned the label that is correct for the data point, and not as or from a larger number of data points that each have not been manually assigned the label that is correct for the data point.

MENDELOWITZ teaches: selecting the training subset of the data points, wherein in an initial iteration, the training subset of the data points is selected as or from a smaller number of data points that each have been manually assigned the label that is correct for the data point, (MENDELOWITZ, “[0062] In contrast, in the two-stage ML pipeline architecture, the supervised ML model of the second phase may be employed to a significantly small group of anomaly events identified as such by the unsupervised ML model(s) of the first phase thus significantly reducing the labeled training dataset which may be needed to effectively train the supervised ML model(s). The labeled training dataset may be significantly reduced since the supervised ML model may be trained and learned to classify and identify accordingly only anomalies which are misidentified by the unsupervised ML model(s) which as stated herein before may constitute only a very small subset of the overall space of events in the vehicle's environment.
The labeled training dataset may be therefore significantly small, i.e., a fraction of the unlabeled training dataset [selecting the training subset of the data points, wherein in an initial iteration, the training subset of the data points is selected as or from a smaller number of data points that each have been manually assigned the label that is correct for the data point], thus significantly reducing the time, computing resources, and/or manual effort involved in creating it (the labeled training dataset).”) and not as or from a larger number of data points that each have not been manually assigned the label that is correct for the data point. (MENDELOWITZ, “[0063] Furthermore, since the supervised ML model(s) of the staged ML pipeline is trained with the small subset of anomaly events, the effort, time and/or computing resources invested to train the supervised ML model(s) may be significantly reduced compared to the existing supervised ML based methods using supervised ML model(s) trained with a significantly larger and typically extremely larger dataset [and not as or from a larger number of data points that each have not been manually assigned the label that is correct for the data point].”)

MENDELOWITZ, Soliman, Wang and Schmidtler are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of MENDELOWITZ with the teachings of Soliman, Wang and Schmidtler to add a real-time, staged machine learning approach that first identifies patterns in operational data then classifies and generates alerts for potential abnormal events. (MENDELOWITZ, Abstract).

Regarding claim 8, Soliman in view of Wang, Schmidtler and MENDELOWITZ teach the method of claim 7.
Wang further teaches: wherein in each iteration other than the initial iteration, selecting the training subset of the data points comprises: selecting each data point for which the label quality measure is greater than a threshold or a number of the data points for which the label quality measure is greater than the threshold. (Wang, “[0044] When the determined reconstruction loss 115 is higher than the threshold 117, the classifier 109 determines that a particular input sample comprises anomaly which may be a threat to a user [selecting each data point for which the label quality measure is greater than a threshold]. The classifier 109 may further notify the user regarding the detected anomaly. On the other hand, when the determined reconstruction loss 115 is less than the threshold 117, the classifier 109 determines that a particular input sample as benign data sample. In this way, the proposed anomaly detector 101 uses the reconstruction loss 115, which is the weighted combination of the plurality of loss functions 113, to determine a result of anomaly detection 119 in the input data 103.”)

Wang, Soliman, Schmidtler and MENDELOWITZ are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Wang with the teachings of Soliman, Schmidtler and MENDELOWITZ to add a classifier that detects intrusion by identifying anomalous network or control system data based on reconstruction loss exceeding a threshold (Wang, Abstract).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Soliman in view of Wang, Schmidtler and in further view of Bauer et al., Pub. No.: US20190163806A1.

Regarding claim 9, Soliman in view of Wang and Schmidtler teach the method of claim 3.
Soliman in view of Wang and Schmidtler do not teach: clustering the data points over a plurality of clusters corresponding to the labels, such that the data points having a probability of belonging to a same cluster that is greater than a threshold are constrained to a same cluster; and using the clusters to yield an updated probability for each label that the label is correct for each data point of the at least one of the data points. Bauer teaches: clustering the data points over a plurality of clusters corresponding to the labels, such that the data points having a probability of belonging to a same cluster that is greater than a threshold are constrained to a same cluster; and using the clusters to yield an updated probability for each label that the label is correct for each data point of the at least one of the data points. (Bauer, “[0005] … correlate one or more clusters with corresponding events by: i) calculating, for each cluster-label pair comprising a given cluster in the set of clusters and a given label in a set of candidate labels, a value indicative of a correlation between the cluster and the label in the cluster-label pair; ii) selecting one cluster-label pair for each cluster, the selected cluster-label pair for a given cluster being the pair resulting in the highest value from amongst all other cluster-label pairs comprising the given cluster; iii) for each selected cluster-label pair in respect of which the resulting value is above a predetermined threshold, determining that the cluster is correlated with the event with which the label is associated; [clustering the data points over a plurality of clusters corresponding to the labels, such that the data points having a probability of belonging to a same cluster that is greater than a threshold are constrained to a same cluster] and generate output indicative of at least one such correlation [using the clusters to yield an updated probability for each label that the label is correct for each 
data point of the at least one of the data points].”) Bauer, Soliman, Wang and Schmidtler are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Bauer with teachings of Soliman, Wang and Schmidtler to add a sensor- and event-based correlation capability that labels and clusters observations over time to identify patterns and relationships between operational data and specific events. (Bauer, Abstract). Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Soliman in view of Wang, Schmidtler, Bauer and in further view of Cohen et al., Pub. No.: US7870136B1. Regarding claim 10, Soliman in view of Wang, Schmidtler and Bauer teach the method of claim 9. Soliman in view of Wang, Schmidtler and Bauer do not teach: wherein the constrained clustering technique is a soft-constrained clustering technique providing a likelihood that each data point belongs to each cluster, and wherein, for each data point of the at least one of the data points, the updated probability for each label that the label is correct for the data point is the likelihood that the data point belongs to the cluster corresponding to the label. Cohen teaches: wherein the constrained clustering technique is a soft-constrained clustering technique providing a likelihood that each data point belongs to each cluster, and wherein, for each data point of the at least one of the data points, the updated probability for each label that the label is correct for the data point is the likelihood that the data point belongs to the cluster corresponding to the label. (Cohen, (col.
4 line [41 – 59]), “Accordingly, described herein are methods and systems for providing data clustering by extending the above-described chunklet model to handle user-input soft constraints for soft partitioning of the data of interest. The soft constraints [wherein the constrained clustering technique is a soft-constrained clustering technique providing] are handled by directly sampling constraints to build probabilistic chunklets using weights representing a user's confidence [a likelihood that each data point belongs to each cluster] in each constraint rather than using approximations to the weighted HMRF with arbitrary penalty weights. Furthermore, various embodiments provide data clustering with automatically-generated constraints from the partitioning of feature sets derived from the data (as opposed to user-input constraints) and employ the generated data clusters to diagnose a system state or status [wherein, for each data point of the at least one of the data points, the updated probability for each label that the label is correct for the data point is the likelihood that the data point belongs to the cluster corresponding to the label]. Soft constraints, in the form of must-link and cannot-link constraints extended to include a confidence (or probability) level in a range from 0 to 1 for each constraint, enable probability assignments to the data clustering, which then allow the resulting data clusters to be used in a variety of different automated decision making tasks where alternative choices may be made.”) Cohen, Soliman, Wang, Schmidtler and Bauer are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Cohen with teachings of Soliman, Wang, Schmidtler and Bauer to add a soft-constrained clustering capability that groups operational or network data based on pairwise relationships with associated confidence levels, enabling more flexible, probabilistic data partitioning. (Cohen, Abstract). Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Soliman in view of Wang, Schmidtler, Bauer and in further view of Beck et al., Pub. No.: US20220057901A1. Regarding claim 11, Soliman in view of Wang, Schmidtler and Bauer teach the method of claim 9. Wang further teaches: wherein, for each data point of the at least one of the data points, the updated probability for each label that the label is correct for the data point is based on a quality metric that the one of the clusters to which the data point belongs corresponds to the label that is correct for the data point. (Wang, “[0050] In another embodiment, based on the result of the anomaly detection 119, the user may determine misclassifications of the input data as anomalous (or non-anomalous). The user may update the anomaly detector 101 regarding the misclassification through the feedback, where the feedback may comprise one or more labels indicating misclassification of the input data 103. For example, if non-anomalous data is identified as anomalous by the anomaly detector 101, the user may provide label “No” to indicate misclassification of the input data 103.
Accordingly, the tuner module 209, based on the feedback, may update at least one of: the threshold 117 and weights 203 of at least one (or individual) loss function of the plurality of loss functions 113 in the weighted combination of the loss functions to correct the misclassification [wherein, for each data point of the at least one of the data points, the updated probability for each label that the label is correct for the data point is based on a quality metric that the one of the clusters to which the data point belongs corresponds to the label that is correct for the data point].”) Wang, Soliman, Schmidtler and Bauer are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Wang with teachings of Soliman, Schmidtler and Bauer to add a classifier that detects intrusion by identifying anomalous network or control system data based on reconstruction loss exceeding a threshold (Wang, Abstract). 
Soliman in view of Wang, Schmidtler and Bauer do not teach: wherein the constrained clustering technique is a hard-constrained clustering technique in which each data point belongs to one of the clusters, and Beck teaches: wherein the constrained clustering technique is a hard-constrained clustering technique in which each data point belongs to one of the clusters, and (Beck, “[0005] There lies at least a need for a control that allows an operator to label his data using clustering M.L without being constrained by a fixed number of clusters [clustering technique is a hard-constrained clustering technique in which each data point belongs to one of the clusters].”) Beck, Soliman, Wang, Schmidtler and Bauer are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Beck with teachings of Soliman, Wang, Schmidtler and Bauer to add a user interface for visualizing and interacting with hierarchically clustered data, allowing users to explore clusters, view underlying datasets and examine different representations of the clustered content. (Beck, Abstract). Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Soliman in view of Wang, Schmidtler and in further view of Muddu et al., Pub. No.: US20170063912A1. Regarding claim 13, Soliman in view of Wang and Schmidtler teach the method of claim 1.
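The soft-constrained (claim 10) versus hard-constrained (claim 11) clustering limitations discussed above can be contrasted with a small sketch. This is an illustrative example with hypothetical names, assuming NumPy and a precomputed cluster-membership matrix; the claims and cited references do not prescribe this code:

```python
import numpy as np

def update_label_probabilities(membership, hard=False):
    """Given a (points x clusters) membership matrix from a constrained
    clustering step, return updated per-label probabilities.

    Soft variant (claim 10 style): the likelihood that a point belongs to
    the cluster corresponding to a label becomes its updated label
    probability. Hard variant (claim 11 style): each point belongs to
    exactly one cluster, so the distribution collapses to one-hot."""
    if hard:
        one_hot = np.zeros_like(membership)
        one_hot[np.arange(len(membership)), membership.argmax(axis=1)] = 1.0
        return one_hot
    # normalize rows so each point's label probabilities sum to 1
    return membership / membership.sum(axis=1, keepdims=True)

m = np.array([[0.6, 0.4], [0.1, 0.9]])
print(update_label_probabilities(m))             # soft: rows already sum to 1
print(update_label_probabilities(m, hard=True))  # hard: [[1, 0], [0, 1]]
```

The design difference is simply whether per-cluster likelihoods are retained as a distribution or argmax-collapsed to a single cluster assignment.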
Soliman in view of Wang and Schmidtler do not teach: wherein each iteration further comprises: calculating a convergence metric of the label probability distribution of each data point between before having been updated and after having been updated; in response to the convergence metric being less than a threshold, concluding that a current iteration is a last iteration, such that no further iterations are performed; in response to the convergence metric being greater than the threshold and a number of already performed iterations being equal to a maximum number, concluding that the current iteration is the last iteration, such that no further iterations are performed; and in response to the convergence metric being greater than the threshold and the number of already performed iterations being less than the maximum number, concluding that the current iteration is not the last iteration, such that at least one further iteration is performed. Muddu teaches: calculating a convergence metric of the label probability distribution of each data point between before having been updated and after having been updated; (Muddu, “[0589] The machine learning model 6300 continues the iterative process until the weight values at the network devices D1-D6 converge. At each step of the iterative process, for each node, the machine learning model 6300 keeps 15% of the weight value at the node and then equally distributes the remainder of the weight values along the edges to other nodes. The convergence criterion can be any criterion indicative of this type of convergence [calculating a convergence metric of the label probability distribution]. 
For example, the machine learning model 6300 can determine that the iterative process reaches a convergence when the change of weight values between two consecutive steps at each node is less than a threshold value [of each data point between before having been updated and after having been updated].”) in response to the convergence metric being less than a threshold, concluding that a current iteration is a last iteration, such that no further iterations are performed; (Muddu, “[0590] Step Z of FIG. 65 shows the status of a final step with converged weight values when the iterative process reaches a convergence [in response to the convergence metric being less than a threshold, concluding that a current iteration is a last iteration, such that no further iterations are performed]. The converged weight values at the devices D1-D6 are similarity scores assigned to these devices. The machine learning model 6300 uses the similarity scores to determine whether multiple network devices are similar in terms of associated users that interact with the devices.”) in response to the convergence metric being greater than the threshold and a number of already performed iterations being equal to a maximum number, concluding that the current iteration is the last iteration, such that no further iterations are performed; and (Muddu, “[0315] At step 2006, the model training process thread calls a model readiness logic in the model training process logic 1616 to determine when the model state has sufficient training [concluding that the current iteration is the last iteration, such that no further iterations are performed]. 
The model readiness logic can include measuring how many event feature sets have been used to train the model state; measuring how long the model state has been in training in real-time; whether the model state is converging (i.e., not changing within a threshold percentage despite additional training); [in response to the convergence metric being greater than the threshold and a number of already performed iterations being equal to a maximum number] or any combination thereof. Different model types can have different model readiness logics. At step 2008, when the model readiness logic determines that the model state has sufficient training, the model training process thread marks the model state for deployment.”) in response to the convergence metric being greater than the threshold and (Muddu, “[0315] At step 2006, the model training process thread calls a model readiness logic in the model training process logic 1616 to determine when the model state has sufficient training. The model readiness logic can include measuring how many event feature sets have been used to train the model state; measuring how long the model state has been in training in real-time; whether the model state is converging (i.e., not changing within a threshold percentage despite additional training); [in response to the convergence metric being greater than the threshold and] or any combination thereof. Different model types can have different model readiness logics. 
At step 2008, when the model readiness logic determines that the model state has sufficient training, the model training process thread marks the model state for deployment.”) the number of already performed iterations being less than the maximum number, concluding that the current iteration is not the last iteration, such that at least one further iteration is performed (Muddu, “[0589] The machine learning model 6300 continues the iterative process until the weight values at the network devices D1-D6 converge [the number of already performed iterations being less than the maximum number, concluding that the current iteration is not the last iteration, such that at least one further iteration is performed]. At each step of the iterative process, for each node, the machine learning model 6300 keeps 15% of the weight value at the node and then equally distributes the remainder of the weight values along the edges to other nodes. The convergence criterion can be any criterion indicative of this type of convergence. For example, the machine learning model 6300 can determine that the iterative process reaches a convergence when the change of weight values between two consecutive steps at each node is less than a threshold value.”) Muddu, Soliman, Wang and Schmidtler are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Muddu with teachings of Soliman, Wang and Schmidtler to add an iterative, distributed machine learning process with convergence monitoring, where computations at network nodes are adjusted and propagated until the system reaches a stable state based on a defined convergence criterion. (Muddu, ¶[0585] - [0589]). Claim 14 is rejected under 35 U.S.C.
103 as being unpatentable over Soliman in view of Wang, Schmidtler and in further view of Harris et al., Pub. No.: US20220407878A1. Regarding claim 14, Soliman in view of Wang and Schmidtler teach the method of claim 1. Soliman in view of Wang and Schmidtler do not teach: wherein the processing further comprises, after performing the iterations: assigning to each data point a label for which the data point has a highest probability within the label probability distribution of the data point. Harris teaches: wherein the processing further comprises, after performing the iterations: assigning to each data point a label for which the data point has a highest probability within the label probability distribution of the data point. (Harris, “[0021] Responsively, program 150 calculates a majority class for each cluster utilizing the aforementioned labels. In an embodiment, if program 150 utilizes a pure assignment algorithm, then program 150 skips calculating the majority class of each cluster. In this embodiment, the clustering algorithm itself serves as a predictor, thus no classifier is needed. In the situation where the corresponding clustering algorithm creates overlapping clusters (i.e., clusters with shared datapoints), program 150 utilizes fuzzy labelling in which program 150 assigns a score or probability for each of the true labels identified above [assigning to each data point a label for which the data point has a highest probability within the label probability distribution of the data point]. In an embodiment, program 150 assigns the class or label with the highest associated score or probability.
For example, program 150 labels either an event or a user utilizing the corresponding score, or, alternately, an additional fuzzy label is created and assigned.”) Harris, Soliman, Wang and Schmidtler are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Harris with teachings of Soliman, Wang and Schmidtler to add a cluster-based labeling mechanism with majority-class assignment, enabling the system to assign labels or probabilities to events or data points, even in overlapping clusters, without requiring a separate classifier. (Harris, ¶[0021]). Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Soliman in view of Wang, Schmidtler, Harris and in further view of Qu et al., Pub. No.: US20220318672A1. Regarding claim 15, Soliman in view of Wang, Schmidtler and Harris teach the method of claim 14. Soliman in view of Wang, Schmidtler and Harris do not teach: wherein the processing further comprises, after performing the iterations, either or both of: removing each data point for which the label quality measure is less than a threshold; and removing a number of the data points having lowest label quality measures. Qu teaches: wherein the processing further comprises, after performing the iterations, either or both of: removing each data point for which the label quality measure is less than a threshold; and removing a number of the data points having lowest label quality measures.
(Qu, “[0047] In one embodiment, in response to the confident learning at block 202, the training module 106 identifies the data samples in the training dataset that are predicted to be noisy (i.e.: noisy data is labels with quality lower than a threshold), labels the identified data samples as noisy, and removes these data samples from the training dataset [removing each data point for which the label quality measure is less than a threshold; and removing a number of the data points having lowest label quality measures].”) Qu, Soliman, Wang, Schmidtler and Harris are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Qu with teachings of Soliman, Wang, Schmidtler and Harris to add a confident-learning-based data cleansing capability that identifies and removes potentially noisy or mislabeled samples to improve the quality and reliability of the training data. (Qu, ¶ [0045] – [0047]). Claim(s) 16 – 18 are rejected under 35 U.S.C. 103 as being unpatentable over Soliman in view of Wang, Schmidtler and Harris. Regarding claim 16, Soliman in view of Wang, Schmidtler and Harris teach the method of claim 14. Schmidtler further teaches: wherein the processing further comprises, after performing the iterations: either or both of training and validating a machine learning model that predicts an output label for an input data point, using the data points and the label assigned to each data point.
(Schmidtler, “[0135] Additionally, the program code comprises instructions for applying the trained classifier to classify at least one of the unlabeled data points, the labeled data points, and input data points [using the data points and the label assigned to each data point], as well as instructions for outputting a classification of the classified data points [either or both of training and validating a machine learning model that predicts an output label for an input data point], or derivative thereof, to at least one of a user, another system, and another process. Also, the decision function that minimizes the KL divergence to the prior probability distribution of the decision function parameters given the included and excluded training examples may be determined utilizing the labeled as well as the unlabeled data as learning examples according to their expected label.”) Schmidtler, Soliman, Wang and Harris are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Schmidtler with teachings of Soliman, Wang and Harris to leverage both labeled and unlabeled network or control-system data to more accurately classify events (e.g., intrusion vs. non-intrusion) using iteratively labeled data (Schmidtler, Abstract). Regarding claim 17, Soliman in view of Wang, Schmidtler and Harris teach the method of claim 16. Soliman further teaches: wherein the processing further comprises, after performing the iterations: applying the machine learning model as trained and/or validated to the input data point to predict the output label and a probability that the output label is correct for the input data point. (Soliman, “[0105] … The goal of this phase is to automatically derive a “model”.
A model effectively encodes a mathematical function whose input is the application and whose output is a classification. In the context of using machine learning to detect malware, the output of the model (when applied to a file whose disposition is being sought) might be a binary label of either “benign” or “malign”. [applying the machine learning model as trained and/or validated to the input data point to predict the output label] Certain machine learning models are also capable of producing a score that reflects the confidence in the label. For example, the output might be (“malign”, 0.95) which can be taken to mean that the model believes that the feature vector has a 95% chance of corresponding to a malicious software application [a probability that the output label is correct for the input data point].”) Regarding claim 18, Soliman in view of Wang, Schmidtler and Harris teach the method of claim 17. Soliman further teaches: wherein the input data point comprises one or multiple log events of one or multiple devices, the output label corresponds to an anomaly, and (Soliman, “[0103] A person skilled in the relevant art would understand the term “alert”, “alarm” or “notification” to refer to a message sent by an agent warning of a suspected or actual intrusion, malicious activity [the output label corresponds to an anomaly], violation or other anomaly and calling for some sort of action in response. 
Typically, such alerts, alarms and/or notifications may be sent to a display window in or on a management component and logged as an entry to a log file [wherein the input data point comprises one or multiple log events of one or multiple devices].”) the processing further comprises: in response to the probability that the output label is correct for the input data being greater than a threshold, concluding that the one or multiple devices has the anomaly and (Soliman, “[0101] As used herein, an intrusion detection system (“IDS”) will be understood to refer to a device, system, method, apparatus or software application that monitors one or more networks or systems for malicious activity, policy violations, or anomalous activity [the processing further comprises: in response to the probability that the output label is correct for the input data being greater than a threshold, concluding that the one or multiple devices has the anomaly] (e.g. all outside intrusion). Any malicious activity, violation or anomaly is typically reported either to an administrator or collected centrally using a security information and event management (“SIEM”) system. A SIEM system combines outputs from multiple sources, and uses alarm/alert filtering techniques to distinguish malicious activity from false alarms.”) performing an action to resolve the anomaly. (Soliman, “[0103] A person skilled in the relevant art would understand the term “alert”, “alarm” or “notification” to refer to a message sent by an agent warning of a suspected or actual intrusion, malicious activity, violation or other anomaly and calling for some sort of action in response [performing an action to resolve the anomaly]. Typically, such alerts, alarms and/or notifications may be sent to a display window in or on a management component and logged as an entry to a log file.”) Claim(s) 19 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Soliman in view of Albrecht, Wang and Schmidtler. 
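The three-branch stopping logic recited in claim 13 and mapped to Muddu above (converged below threshold; not converged but iteration cap reached; not converged and under the cap) can be sketched directly. This is an illustrative reduction with hypothetical names, not a statement of how the applicant or Muddu implements the rule:

```python
def should_stop(convergence_metric, iteration, threshold, max_iters):
    """Stopping rule in the shape recited by claim 13:
    - metric below threshold              -> converged; current iteration is last
    - metric above threshold, at the cap  -> stop anyway; current iteration is last
    - metric above threshold, under cap   -> at least one further iteration"""
    if convergence_metric < threshold:
        return True   # convergence reached
    if iteration >= max_iters:
        return True   # iteration budget exhausted
    return False      # keep iterating

print(should_stop(0.001, iteration=3, threshold=0.01, max_iters=10))   # True
print(should_stop(0.5, iteration=10, threshold=0.01, max_iters=10))    # True
print(should_stop(0.5, iteration=3, threshold=0.01, max_iters=10))     # False
```

In this framing, the convergence metric would be some measure of change in each data point's label probability distribution between iterations (e.g., a maximum per-point difference), consistent with Muddu's weight-change criterion quoted above.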
Regarding claim 19, Soliman teaches: A method comprising: receiving the plurality of data points, including a smaller number of data points that have each been manually assigned the label that is correct for the data point (Soliman, “[0104] A person skilled in the art will understand that the present description will reference terminology from the field of artificial intelligence, including machine learning, and may be known to such a person skilled in the relevant art. A person skilled in the relevant art will also understand that artificial neural networks generally refer to computing or computer systems that are design to mimic biological neural networks (e.g. animal brains). Such systems “learn” to perform tasks by considering examples, generally without being programmed with any task-specific rules. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as “cat” or “no cat” and using the results to identify cats in other images [receiving the plurality of data points, including a smaller number of data points that have each been manually assigned the label that is correct for the data point]. A person skilled in the relevant art will understand that a convolutional neural network is a class of neural networks that specializes in processing data that has a grid-like topology, such as an image.”) and a larger number of data points that each have not been manually assigned the label that is correct for the data point; (Soliman, “[0105] … Each application in this set is optionally accompanied with a “label” of it disposition, for example “benign”, “malign”, or “unknown”. It is preferable to have fewer unknown samples. Furthermore, it is preferable for the corpus to be representative of the real world scenarios in which the machine learning techniques will ultimately be applied. 
For example, in the context of classifying software applications, it might be desirable if the applications in the corpus are reflective of what might be found on a typical system. This is followed by a “training phase” in which the applications together with the labels associated with the data, files, etc. [and a larger number of data points] themselves, are fed into an algorithm that implements the “training phase”. The goal of this phase is to automatically derive a “model” [that each have not been manually assigned the label that is correct for the data point]. A model effectively encodes a mathematical function whose input is the application and whose output is a classification. In the context of using machine learning to detect malware, the output of the model (when applied to a file whose disposition is being sought) might be a binary label of either “benign” or “malign”.”) fixably setting a label probability distribution of each data point of the smaller number, and (Soliman, “[0104] … Such systems “learn” to perform tasks by considering examples, generally without being programmed with any task-specific rules. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as “cat” or “no cat” [fixably setting a label probability distribution] and using the results to identify cats in other images [of each data point of the smaller number]. A person skilled in the relevant art will understand that a convolutional neural network is a class of neural networks that specializes in processing data that has a grid-like topology, such as an image.”) for each data point of the smaller number, calculating a label quality measure based on the label probability distribution; (Soliman, “[0105] … The goal of this phase is to automatically derive a “model”. A model effectively encodes a mathematical function whose input is the application and whose output is a classification. 
In the context of using machine learning to detect malware, the output of the model (when applied to a file whose disposition is being sought) might be a binary label of either “benign” or “malign”. Certain machine learning models are also capable of producing a score that reflects the confidence in the label [for each data point of the smaller number, calculating a label quality measure based on the label probability distribution]. For example, the output might be (“malign”, 0.95) which can be taken to mean that the model believes that the feature vector has a 95% chance of corresponding to a malicious software application.”) applying the machine learning model as trained and/or validated to an input data point to predict an output label and a probability that the output label is correct for the input data point. (Soliman, “[0105] … The goal of this phase is to automatically derive a “model”. A model effectively encodes a mathematical function whose input is the application and whose output is a classification. In the context of using machine learning to detect malware, the output of the model (when applied to a file whose disposition is being sought) might be a binary label of either “benign” or “malign”. [applying the machine learning model as trained and/or validated to an input data point to predict an output label] Certain machine learning models are also capable of producing a score that reflects the confidence in the label. 
For example, the output might be (“malign”, 0.95) which can be taken to mean that the model believes that the feature vector has a 95% chance of corresponding to a malicious software application [a probability that the output label is correct for the input data point].”) Soliman does not teach: initially setting the label probability distribution of each data point of the larger number; for each data point of the larger number, iteratively calculating a label quality measure based on the label probability distribution and updating the label probability using either or both of a classification technique and a constrained clustering technique based on the data points and the label quality measure of each data point; either or both of training and validating a machine learning model, using the data points and the label probability distribution of each data point; Albrecht teaches: initially setting the label probability distribution of each data point of the larger number; (Albrecht, “[0080] The reconstruction optimizer controller 338, 112, conditions (optimizes) the initialized particular prototype autoencoder 202 by causing it to process a large batch of data items [initially setting the label probability distribution of each data point of the larger number], including labeled data and unlabeled data, that are received at its input 204. The output 206 of the particular prototype autoencoder 202 provides a reconstructed version of the original data item received at its input 204. The reconstructed version of the original data item at the output 206 is compared 208 to the original data item received at the input 204, and the result of the comparison indicates a loss of information value. 
This loss of information value is then compared 210 to a target zero loss of information.”) for each data point of the larger number, iteratively calculating a label quality measure based on the label probability distribution and (Albrecht, “[0080] The reconstruction optimizer controller 338, 112, conditions (optimizes) the initialized particular prototype autoencoder 202 by causing it to process a large batch of data items [for each data point of the larger number, iteratively calculating a label quality measure based on the label probability distribution], including labeled data and unlabeled data, that are received at its input 204. The output 206 of the particular prototype autoencoder 202 provides a reconstructed version of the original data item received at its input 204. The reconstructed version of the original data item at the output 206 is compared 208 to the original data item received at the input 204, and the result of the comparison indicates a loss of information value. This loss of information value is then compared 210 to a target zero loss of information.”) Albrecht and Soliman are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Albrecht with the teachings of Soliman to add an automated, scalable data labeling mechanism that uses an autoencoder to infer and assign probabilistic labels to unlabeled data to improve detection model training with minimal labeling. (Albrecht, Abstract).
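The quoted passages turn on two computable quantities: a confidence-style label quality measure derived from a label probability distribution (Soliman's ("malign", 0.95) output) and Albrecht's reconstruction loss, which conditioning drives toward a target of zero. Neither formula is reproduced in the Office Action, so the sketch below assumes normalized Shannon entropy for the first and mean squared error for the second; the function names are hypothetical.

```python
import math

def label_quality(dist):
    # Label quality as 1 minus the normalized Shannon entropy of the
    # label probability distribution: a peaked distribution (a confident
    # label) scores near 1.0, a uniform one (maximal uncertainty) 0.0.
    n = len(dist)
    if n < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return 1.0 - entropy / math.log(n)

def reconstruction_loss(original, reconstructed):
    # Mean squared error between a data item and the autoencoder's
    # reconstruction of it; conditioning drives this toward the
    # target of zero information loss.
    pairs = list(zip(original, reconstructed))
    return sum((o - r) ** 2 for o, r in pairs) / len(pairs)

# Soliman's example output ("malign", 0.95) as a two-label distribution:
quality = label_quality({"malign": 0.95, "benign": 0.05})
# A perfect reconstruction has zero loss; a lossy one does not:
zero = reconstruction_loss([0.2, 0.8, 0.5], [0.2, 0.8, 0.5])
```

Under this entropy assumption, the 95%-confidence distribution scores roughly 0.71, well above the 0.0 of a 50/50 split.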
Soliman in view of Albrecht do not teach: updating the label probability using either or both of a classification technique and a constrained clustering technique based on the data points and the label quality measure of each data point; either or both of training and validating a machine learning model, using the data points and the label probability distribution of each data point; Wang teaches: updating the label probability using either or both of a classification technique and a constrained clustering technique based on the data points and the label quality measure of each data point; (Wang, “[0050] In another embodiment, based on the result of the anomaly detection 119, the user may determine misclassifications of the input data as anomalous (or non-anomalous). The user may update the anomaly detector 101 regarding the misclassification through the feedback, where the feedback may comprise one or more labels indicating misclassification of the input data 103 [using either or both of a classification technique and a constrained clustering technique based on the data points and the label quality measure of each data point]. For example, if non-anomalous data is identified as anomalous by the anomaly detector 101, the user may provide label “No” to indicate misclassification of the input data 103. Accordingly, the tuner module 209, based on the feedback, may update [updating the label probability] at least one of: the threshold 117 and weights 203 of at least one (or individual) loss function of the plurality of loss functions 113 in the weighted combination of the loss functions to correct the misclassification.”) Wang, Soliman and Albrecht are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). 
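Wang's tuner module updates the detection threshold (and/or loss-function weights) from user feedback labels that flag misclassifications. A minimal sketch of the threshold half of that idea, with an illustrative margin parameter that is an assumption rather than anything Wang discloses:

```python
def tune_threshold(threshold, anomaly_score, feedback, margin=0.05):
    # Feedback "No": a benign item was flagged anomalous, i.e. a false
    # positive scored above the threshold, so raise the threshold just
    # past the offending score. Feedback "Yes": a true anomaly scored
    # at or below the threshold, so lower it just below that score.
    if feedback == "No" and anomaly_score > threshold:
        return anomaly_score + margin
    if feedback == "Yes" and anomaly_score <= threshold:
        return anomaly_score - margin
    return threshold
```

Each round of feedback nudges the decision boundary so that the reported misclassification would be decided correctly the next time around.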
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Wang with teachings of Soliman and Albrecht to add a classifier that detects intrusion by identifying anomalous network or control system data based on reconstruction loss exceeding a threshold (Wang, Abstract). Soliman in view of Albrecht and Wang do not teach: either or both of training and validating a machine learning model, using the data points and the label probability distribution of each data point; Schmidtler teaches: either or both of training and validating a machine learning model, using the data points and the label probability distribution of each data point; (Schmidtler, “[0135] Additionally, the program code comprises instructions for applying the trained classifier to classify at least one of the unlabeled data points, the labeled data points, and input data points [using the data points and the label probability distribution of each data point], as well as instructions for outputting a classification of the classified data points [either or both of training and validating a machine learning model], or derivative thereof, to at least one of a user, another system, and another process. Also, the decision function that minimizes the KL divergence to the prior probability distribution of the decision function parameters given the included and excluded training examples may be determined utilizing the labeled as well as the unlabeled data as learning examples according to their expected label.”) Schmidtler, Soliman, Albrecht and Wang are related to the same field of endeavor (i.e.: a computer system that is responsible for managing the data and the programs that train and operate the machine learning models). 
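Schmidtler trains on labeled and unlabeled data "according to their expected label". One way to picture training against label probability distributions instead of hard labels is a soft-label nearest-centroid fit; the sketch below uses that stand-in (Schmidtler's actual decision function minimizes a KL divergence and is not reproduced in the quoted passage):

```python
def fit_soft_centroids(points, label_dists):
    # Each data point contributes to every class centroid in proportion
    # to the probability its label distribution assigns that class, so
    # uncertain (partially labeled) points still inform the model.
    labels = sorted({lbl for dist in label_dists for lbl in dist})
    centroids = {}
    for lbl in labels:
        weight = sum(dist.get(lbl, 0.0) for dist in label_dists)
        dim = len(points[0])
        centroids[lbl] = [
            sum(pt[i] * dist.get(lbl, 0.0)
                for pt, dist in zip(points, label_dists)) / weight
            for i in range(dim)
        ]
    return centroids

# Two confidently labeled points plus one uncertain point at 0.5,
# carrying a 50/50 label probability distribution:
pts = [[0.0], [1.0], [0.5]]
dists = [{"benign": 1.0}, {"malign": 1.0}, {"benign": 0.5, "malign": 0.5}]
```

The uncertain point pulls both centroids toward itself, half a vote each, which is the behavior hard labels cannot express.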
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Schmidtler with the teachings of Soliman, Albrecht and Wang to leverage both labeled and unlabeled network or control-system data to more accurately classify events (e.g., intrusion vs non-intrusion) using iteratively labeled data (Schmidtler, Abstract).

Regarding claim 20, Soliman teaches: A system comprising: a processor; and a memory storing program code executable by the processor to: (Soliman, “[0111] … In one embodiment, the computer system includes a processor [a processor] coupled to a bus and memory storage coupled to the bus [a memory storing program code executable by the processor to]. The memory storage can be volatile or non-volatile (i.e. transitory or non-transitory) and can include removable storage media. The computer can also include a display, provision for data input and output, etc. as will be understood by a person skilled in the relevant art.”) apply the machine learning model to one or multiple log events of one or multiple devices to output a probability that the one or multiple devices have an anomaly; (Soliman, “[0105] … The goal of this phase is to automatically derive a “model”. A model effectively encodes a mathematical function whose input is the application and whose output is a classification. In the context of using machine learning to detect malware, the output of the model (when applied to a file whose disposition is being sought) might be a binary label of either “benign” or “malign”. [apply the machine learning model to one or multiple log events of one or multiple devices to output a probability that the one or multiple devices have an anomaly] Certain machine learning models are also capable of producing a score that reflects the confidence in the label.
For example, the output might be (“malign”, 0.95) which can be taken to mean that the model believes that the feature vector has a 95% chance of corresponding to a malicious software application [a probability that the output label is correct for the input data point].”) and in response to the probability that the one or multiple devices having the anomaly being greater than a threshold, perform an action to resolve the anomaly. (Soliman, “[0101] As used herein, an intrusion detection system (“IDS”) will be understood to refer to a device, system, method, apparatus or software application that monitors one or more networks or systems for malicious activity, policy violations, or anomalous activity [in response to the probability that the one or multiple devices having the anomaly being greater than a threshold,] (e.g. all outside intrusion). Any malicious activity, violation or anomaly is typically reported either to an administrator or collected centrally using a security information and event management (“SIEM”) system [perform an action to resolve the anomaly]. A SIEM system combines outputs from multiple sources, and uses alarm/alert filtering techniques to distinguish malicious activity from false alarms.”) The rest of the limitations are analogous to those of claim 19, so they are rejected under a similar rationale.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. SAMEL et al., Pub. No.: US20190354810A1. A technique for processing training data for a machine learning model. The technique includes training the machine learning model using training data comprising a set of features and a set of original labels associated with the set of features. The technique also includes generating multiple groupings of the training data based on internal representations of the training data in the machine learning model. Watkins, Pub. No.: US11037073B1.
A data analysis system utilizing custom unsupervised machine learning processes over a communications network is disclosed, the system comprising a repository of data connected to the communications network, a web application deployed on a web server connected to the communications network, the web application including a data collection interface between the web server and the repository of data, wherein the web application is configured for providing a graphical user interface for modifying, by a user, a plurality of threshold parameters of a clustering algorithm for clustering the data.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATIYAS T MARU whose telephone number is (571)270-0902. The examiner can normally be reached Monday 8:00am - Friday 4:00pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold can be reached on (571)431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /M.T.M./ Examiner, Art Unit 2148 /MICHELLE T BECHTOLD/Supervisory Patent Examiner, Art Unit 2148
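Claim 20's final limitations, applying the model to one or multiple log events, outputting a device-level anomaly probability, and acting when it exceeds a threshold, reduce to a small amount of glue. The sketch below assumes independent per-event probabilities (the claim does not say how event scores combine) and a placeholder remediation action:

```python
def device_anomaly_prob(event_probs):
    # Probability the device is anomalous = 1 minus the probability
    # that every log event is benign (events assumed independent).
    p_all_benign = 1.0
    for p in event_probs:
        p_all_benign *= 1.0 - p
    return 1.0 - p_all_benign

def respond(event_probs, threshold=0.9):
    # Perform a remediation action (here a placeholder string standing
    # in for, e.g., reporting to a SIEM) only when the device-level
    # probability clears the threshold.
    prob = device_anomaly_prob(event_probs)
    action = "report_to_siem" if prob > threshold else "no_action"
    return action, prob
```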

Prosecution Timeline

Oct 28, 2022
Application Filed
Dec 29, 2025
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586114
GENERATING DIGITAL RECOMMENDATIONS UTILIZING COLLABORATIVE FILTERING, REINFORCEMENT LEARNING, AND INCLUSIVE SETS OF NEGATIVE FEEDBACK
2y 5m to grant Granted Mar 24, 2026
Patent 12572796
METHODS AND SYSTEMS FOR GENERATING RECOMMENDATIONS FOR COUNTERFACTUAL EXPLANATIONS OF COMPUTER ALERTS THAT ARE AUTOMATICALLY DETECTED BY A MACHINE LEARNING ALGORITHM
2y 5m to grant Granted Mar 10, 2026
Patent 12567004
METHOD OF MACHINE LEARNING TRAINING FOR DATA AUGMENTATION
2y 5m to grant Granted Mar 03, 2026
Patent 12561588
Methods and Systems for Generating Example-Based Explanations of Link Prediction Models in Knowledge Graphs
2y 5m to grant Granted Feb 24, 2026
Patent 12561584
TEACHING DATA PREPARATION DEVICE, TEACHING DATA PREPARATION METHOD, AND PROGRAM
2y 5m to grant Granted Feb 24, 2026
Based on this examiner's 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
58%
Grant Probability
70%
With Interview (+12.5%)
4y 6m
Median Time to Grant
Low
PTA Risk
Based on 40 resolved cases by this examiner. Grant probability derived from career allow rate.
