Office Action Analysis: 18149451 — SYSTEM AND METHOD FOR TRAINING A MACHINE LEARNING NETWORK

Office Action

§101 §103 §112
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Status of Claims
Claims 1 – 3, 8 – 14, and 17 – 20 were amended. Claims 1 – 20 are pending and examined herein. 
Claims 2 – 3, 13 – 14 are rejected under 35 U.S.C. 112(b).
Claims 1 – 20 are rejected under 35 U.S.C. 101.
Claims 1 – 20 are rejected under 35 U.S.C. 103.

Response to Amendment
	The amendment filed January 12th, 2026 has been entered. Claims 1 – 3, 8 – 14, and 17 – 20 were amended. Claims 1 – 20 are pending and examined herein. Applicant’s amendments to the claims, specification, and drawing have overcome each and every objection and 112(b) rejection previously set forth in the Non-Final Rejection Office Action mailed November 12th, 2025.

Response to Arguments
Applicant's arguments filed January 12th, 2026 regarding the 35 U.S.C. 101 rejection of claims 1-20 have been fully considered but they are not persuasive. Applicant argues, on page 12, that amended claim 1 does not recite a mental process because certain recited operations, including training a machine learning model and updating the trained machine learning model, cannot practically be performed in the human mind. However, the rejection does not rely on those computer-implemented limitations as the recited abstract idea. Rather, under the broadest reasonable interpretation (BRI) of claim 1, the claim still recites a mental process in the limitations of classifying unlabeled data records as having or not having a particular characteristic, and evaluating/selecting at least one data record based on whether it matches a decision criterion, which are observations, evaluations, and judgments recited at a high level of generality. The limitations reciting a machine learning model, a label propagation algorithm, and updating the trained model are treated as additional elements. A claim may still recite a mental process even when claimed as being performed on a computer, and the presence of additional computer-implemented elements does not remove the recited abstract idea from the mental process category. 
Applicant further argues that claim 1 integrates any recited exception into a practical application as the claim recites a particular machine, a machine learning model and a label propagation algorithm. This argument is not persuasive. As claimed, these features are recited at a functional level and are used as tools to carry out the recited classification and selection operations. Claim 1 does not recite a specifically configured machine or a particular technological architecture that meaningfully limits the abstract idea. Accordingly, these additional elements do not integrate the recited abstract idea into a practical application. 
Applicant also argues that claim 1 reflects a technological improvement in the area because the specification describes use of low labeled datasets, reduced human intervention, near real time operation, reduced training time, improved training accuracy, and improved prediction quality. USPTO guidance explains that, even where the specification discusses an improvement, the claim itself must reflect the disclosed improvement by reciting the components or steps that provide it. Claim 1 does not recite a particularized machine learning mechanism, model architecture, parameter updating technique, data structure, or other concrete technological implementation that produces the asserted benefits. Instead, claim 1 recites the result of iteratively selecting records, labeling records, and updating a model at a high functional level. On this record, the alleged improvement is not reflected in the claim as a concrete improvement to computer functionality or to machine learning technology itself. 
Applicant further argues that the claimed combination is not well-understood, routine, or conventional. However, the additional elements, considered individually and in combination, do not amount to significantly more than the recited abstract idea. USPTO guidance explains that invoking computers or other machinery merely as tools to perform an existing process, or claiming the improved speed or efficiency inherent in applying an abstract idea on a computer, does not integrate the exception into a practical application or provide significantly more. The additional elements in the claims are recited at a high level of generality and merely use computer-implemented tools to carry out the recited evaluation/selection concept, rather than reciting a technology-based solution of the type recognized as patent-eligible. 
Applicant’s reliance on SME Examples 2, 23, and 37 is also not persuasive. The USPTO explains that the eligibility examples are teaching tools and fact-specific illustrations of the Office’s guidance, and they do not control the outcome for a different claim set. Applicant’s separate reliance on Example 47 for claims 11 and 20 is likewise unpersuasive. Example 47 found a practical application because the claim reflected a specific improvement in the technical field of network intrusion detection, including detecting a suspicious source address, dropping suspicious packets, and blocking future traffic from that source in real time. By contrast, claims 11 and 20 here recite a fraud-screening context in which the particular characteristic is suspicion that a record is fraudulent, the positive records are associated with known fraud sources, and the method further includes automatically blocking a transaction associated with the record. That limitation is directed to the consequence of a fraud determination, not to a specific improvement in computer or network operation of the type present in Example 47. 
Finally, applicant’s reliance on the later issuance of Lesiecki as a patent is not persuasive. Subject matter eligibility must be determined for the claims presently under examination, based on their broadest reasonable interpretation and the claim as a whole analysis required by current USPTO guidance. The issuance of a different patent does not substitute for that claim specific eligibility analysis. Accordingly, the rejection under 35 U.S.C. § 101 is maintained.

	Applicant's arguments filed January 12th, 2026 regarding the rejections under 35 U.S.C. 103 have been fully considered and are persuasive. The cited references do not fairly teach or suggest the claims as amended. However, new references, Stergioudis (US 2022/0230095 A1), Givental et al. (US 2021/0281592 A1), and Perrizo (US 2008/0040302 A1), are introduced in the 35 U.S.C. 103 rejection below to teach the new features.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2 – 3, 13 – 14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

Claims 2 and 3 recite the limitation "the at least one data record comprises" in line 2. These claims refer back to “at least one data record” in claim 1, but are written such that the selected “at least one data record” somehow “comprises” all of the unlabeled records, or a subset of records. It is unclear whether the dependent claims narrow the number of selected records, the composition of a selected set of records, or the selecting operation itself. Claim 3 is further unclear as to whether the subset selected from the unlabeled data records is to be labeled by the label propagation algorithm, or whether the label propagation algorithm is used to select the subset of data records. As written, it cannot be determined which step of claim 1 each dependent claim narrows. There is insufficient antecedent basis for these limitations in the claim. For examination purposes, claim 2 is interpreted as “wherein selecting the at least one data record comprises selecting all of the data records that are not labeled” and claim 3 is interpreted as “wherein selecting the at least one data record comprises selecting, from the data records that are not labeled, a subset of data records to be labeled by the label propagation algorithm.” 

Claims 13 – 14 recite substantially similar subject matter to claims 2 – 3 respectively and are rejected with the same rationale, mutatis mutandis.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1 - 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 

MPEP § 2106(III) sets out steps for evaluating whether a claim is drawn to patent-eligible subject matter. The analysis of claims 1-20, in accordance with these steps, follows. 

Step 1 Analysis:
Step 1 is to determine whether the claim is directed to a statutory category (process, machine, manufacture, or composition of matter).
Claims 1 – 11 are directed to a system, which falls within the statutory category of machine. Claims 12 – 20 are directed to a computer-implemented method, which falls within the statutory category of process.

Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis:
Step 2A Prong One asks if the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, analysis proceeds to Step 2A Prong Two, which asks if the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, analysis proceeds to Step 2B, which asks if the claim amounts to significantly more than the judicial exception. If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101.

Regarding claim 1, the following claim elements are abstract ideas:
wherein each data record of the set of data records comprises a set of features that have been matched against a list of data records having a particular characteristic, (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components or by a human using a pen and paper.)
wherein a first fraction of the data records of the set of data records are labeled as having the particular characteristic, (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components or by a human using a pen and paper.)
wherein a second fraction of the data records of the set of data records are labeled as not having the particular characteristic, (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components or by a human using a pen and paper.)
wherein a majority of the data records of the set of data records are not labeled; (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components or by a human using a pen and paper.)
to classify the data records that are not labeled as either having the particular characteristic or not having the particular characteristic (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components or by a human using a pen and paper.)
selecting at least one data record that matches a decision criterion; (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components or by a human using a pen and paper.)
labeling the selected at least one data record as either having the particular characteristic or not having the particular characteristic; (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components or by a human using a pen and paper.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
receiving a set of data records, (This is mere data gathering, an insignificant extra-solution activity, which is a well-understood, routine, conventional activity. It does not integrate the judicial exception into a practical application. See MPEP § 2106.05(d). Therefore, this does not amount to significantly more than the judicial exception.)
with the data records labeled as having the particular characteristic and the data records labeled as not having the particular characteristic, training a machine learning model; (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
using the trained machine learning model, … (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
from the data records that are not labeled, iteratively: (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
with a label propagation algorithm, based on the classifications, (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
and updating the trained machine learning model to include the selected at least one data record and the labeling of the selected at least one data record. (This is mere data gathering, an insignificant extra-solution activity, which is a well-understood, routine, conventional activity. It does not integrate the judicial exception into a practical application. See MPEP § 2106.05(d). Therefore, this does not amount to significantly more than the judicial exception.)
wherein the iteration continues until all of the data records are labeled, such that the updated trained machine learning model is a final model. (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
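
For illustration only, the iterative sequence recited in claim 1 (train, classify, select by a decision criterion, label, update, repeat until no record is unlabeled) can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation or that of any cited reference: the nearest-centroid model, the `min_margin` criterion, and all function names are hypothetical, and the labeling step simply adopts the model's classification as a stand-in for a label propagation algorithm.

```python
from math import dist

def train_centroids(labeled):
    """Toy 'model' (assumption): one centroid per class, True or False."""
    centroids = {}
    for cls in (True, False):
        pts = [x for x, y in labeled if y is cls]
        centroids[cls] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return centroids

def classify(model, x):
    """Return (predicted label, margin between the two centroid distances)."""
    d_pos, d_neg = dist(x, model[True]), dist(x, model[False])
    return d_pos < d_neg, abs(d_pos - d_neg)

def iterate_until_labeled(labeled, unlabeled, min_margin=1.0):
    """Repeat classify/select/label/update until no record is unlabeled.

    Assumes `labeled` initially holds at least one record of each class.
    """
    labeled, unlabeled = list(labeled), list(unlabeled)
    model = train_centroids(labeled)
    while unlabeled:
        scored = [(x, *classify(model, x)) for x in unlabeled]
        # Hypothetical decision criterion: margin at or above a threshold.
        selected = [s for s in scored if s[2] >= min_margin]
        if not selected:
            # Guarantee progress by taking the single most confident record.
            selected = [max(scored, key=lambda s: s[2])]
        for x, label, _ in selected:
            labeled.append((x, label))   # label the selected record
            unlabeled.remove(x)
        model = train_centroids(labeled)  # update the model
    return model, labeled
```

Called with two labeled records and four unlabeled records, `iterate_until_labeled` returns a model together with a list in which every record carries a label.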

Regarding claim 2, the rejection of claim 1 is incorporated herein. Further, claim 2 recites the following additional element:
wherein the at least one data record comprises all of the data records that are not labeled. (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 3, the rejection of claim 1 is incorporated herein. Further, claim 3 recites the following additional element:
wherein the at least one data records comprises, of the data records that are not labeled, a subset of data records labeled by the label propagation algorithm. (This is mere data gathering, an insignificant extra solution activity, which is a well-understood, routine conventional activity. It does not integrate the judicial exception into a practical application. See MPEP § 2106.05(d). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 4, the rejection of claim 1 is incorporated herein. Further, claim 4 recites the following abstract idea:
wherein the label propagation algorithm employs least confidence sampling, margin sampling, or entropy-based sampling, or a combination thereof. (This merely applies mathematical equations, which is a mathematical concept.)
Claim 4 does not recite additional elements.
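
As an illustrative aid, the three sampling strategies named in claim 4 are commonly expressed as simple functions of a model's predicted class probabilities. The sketch below uses the usual textbook definitions from the active learning literature; these formulas and function names are assumptions and are not taken from the application or the cited references.

```python
from math import log

def least_confidence(probs):
    """Uncertainty as 1 minus the highest class probability."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the two highest class probabilities (small gap = uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * log(p) for p in probs if p > 0)
```

Each function reduces to arithmetic on a short list of numbers, for example `least_confidence([0.9, 0.1])` or `entropy([0.5, 0.5])`.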

Regarding claim 5, the rejection of claim 1 is incorporated herein. Further, claim 5 recites the following abstract idea:
wherein the data records of the set of data records are mapped into a multidimensional space wherein each axis of the space represents a feature of the set of features. (This is merely a mathematical relationship of mapping data records in a multidimensional space, which is a mathematical concept.)
Claim 5 does not recite additional elements.

Regarding claim 6, the rejection of claim 5 is incorporated herein. Further, claim 6 recites the following abstract idea:
wherein a decision boundary in the multidimensional space divides data records of the set of data records having the particular characteristic from data records of the set of data records not having the particular characteristic. (This is merely a mathematical relationship in which a decision boundary divides data records, which is a mathematical concept.)
Claim 6 does not recite additional elements.

Regarding claim 7, the rejection of claim 6 is incorporated herein. Further, claim 7 recites the following abstract idea:
wherein the decision criterion comprises determining on which side of the decision boundary the selected data record falls. (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components or by a human using a pen and paper.)
Claim 7 does not recite additional elements.
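
For illustration, the determination recited in claims 5 through 7 (placing a record in a feature space and deciding on which side of a decision boundary it falls) can be sketched with a linear boundary. The linear form, the weight vector `w`, the offset `b`, and the function name are assumptions made for this example; the claims do not limit the boundary to a hyperplane.

```python
def side_of_boundary(x, w, b):
    """Return True if feature vector x lies on the positive side of w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return score > 0
```

With `w = (1, -1)` and `b = 0`, a record at `(3, 1)` falls on the positive side and a record at `(1, 3)` on the negative side.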

Regarding claim 8, the rejection of claim 1 is incorporated herein. Further, claim 8 recites the following abstract idea:
Identifying … labeled neighbors within a given search radius from the data records labeled as having the particular characteristic and the data records labeled as not having the particular characteristic. (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components or by a human using a pen and paper.)
Claim 8 further recites the following additional elements:
, for the selected at least one data record, (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
, in a multidimensional space wherein each axis of the space represents a feature of the set of features, (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Regarding claim 9, the rejection of claim 8 is incorporated herein. Further, claim 9 recites the following abstract idea:
wherein the decision criterion comprises determining whether a majority of the labeled neighbors within the given search radius are labeled as having the particular characteristic. (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components or by a human using a pen and paper.)
Claim 9 does not recite additional elements.

Regarding claim 10, the rejection of claim 8 is incorporated herein. Further, claim 10 recites the following abstract idea:
wherein the decision criterion comprises determining whether a weighted majority of the labeled neighbors within the given search radius are labeled as having the particular characteristic, (This is practical to perform in the human mind under its broadest reasonable interpretation aside from the recitation of generic computer components or by a human using a pen and paper.)
wherein the weighting is based on a distance between the selected record and each respective labeled neighbor within the multidimensional space. (This merely recites a mathematical relationship, which is a mathematical concept.)
Claim 10 does not recite additional elements.
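
The neighbor-based criteria of claims 8 through 10 can be illustrated with a short sketch: find the labeled neighbors within a search radius, then take either a simple majority or a distance-weighted majority. The Euclidean metric, the inverse-distance weighting, and the function names are assumptions for this example; claim 10 states only that the weighting is based on a distance.

```python
from math import dist

def labeled_neighbors(x, labeled, radius):
    """Labeled records (point, label) within `radius` of x (Euclidean, assumed)."""
    return [(p, y) for p, y in labeled if dist(x, p) <= radius]

def majority_positive(neighbors):
    """True if more than half of the neighbors carry the positive label."""
    pos = sum(1 for _, y in neighbors if y)
    return pos > len(neighbors) / 2

def weighted_majority_positive(x, neighbors, eps=1e-9):
    """Inverse-distance weighted vote (assumed weighting); closer counts more."""
    pos = neg = 0.0
    for p, y in neighbors:
        w = 1.0 / (dist(x, p) + eps)
        if y:
            pos += w
        else:
            neg += w
    return pos > neg
```

The two criteria can disagree: one close positive neighbor can outweigh two more distant negative neighbors under the weighted vote while losing the simple majority.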

Regarding claim 11, the rejection of claim 1 is incorporated herein. Further, claim 11 recites the following additional element:
wherein the particular characteristic comprises a suspicion that the data record is fraudulent, wherein the list of data records having the particular characteristic comprises a list of known fraud sources. (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
and wherein the operations further comprise automatically blocking a transaction associated with the data record. (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)

Claims 12 – 20 recite substantially similar subject matter to claims 1 – 11 respectively and are rejected with the same rationale, mutatis mutandis.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1 – 4 and 11 – 15 are rejected under 35 U.S.C. 103 as being unpatentable over Stergioudis (U.S. Pub. 2022/0230095 A1) in view of Givental et al. (U.S. Pub. 2021/0281592 A1), further in view of Lesiecki et al. (U.S. Pub. 2013/0204880 A1).
	 Regarding claim 1, Stergioudis teaches 
wherein a majority of the data records of the set of data records are not labeled; ([0007] of Stergioudis states “The method can include the one or more processors inputting the unlabeled data set into the surrogate model trained with the labeled data set to generate the predictions for the unlabeled data set.” [0029] of Stergioudis states “For example, active learning can include selecting data samples from an unlabeled pool of data. The system can provide the selected data samples to an oracle, such as a human, in order to get ground truth data. However, due to the large amount of data in the unlabeled pool of data, it can be challenging to efficiently select one or more data samples to present to the oracle. Furthermore, since systems can leverage a third-party machine learning model to perform certain aspects, such as a third-party natural language processor, the system facilitating the active learning or generating the training data set may not have access to particular mechanisms, functions, or intermediary outputs of the third-party machine learning model in order to facilitate efficient selection of data samples from the unlabeled pool of data.” [0030] of Stergioudis states “To improve active learning by efficiently selecting data samples from the unlabeled pool of data, this technical solution can generate a surrogate model. The generated surrogate model can facilitate or allow for the improvement of active learning.” Stergioudis explicitly maintains and uses a large unlabeled data set that is separate from the labeled training set. It would have been obvious that, in such active learning, the labeled records are a minority subset and the unlabeled records are the majority pool.)
training a machine learning model; ([0007] of Stergioudis states “The method can include the one or more processors training the surrogate model with a labeled data set. The labeled data set can include phrases configured for input into the virtual assistant and indications of corresponding functions to be executed by the one or more virtual applications. The method can include the one or more processors inputting the unlabeled data set into the surrogate model trained with the labeled data set to generate the predictions for the unlabeled data set.”)
using the trained machine learning model to classify the data records that are not labeled ([0007] of Stergioudis states “The method can include the one or more processors inputting the unlabeled data set into the surrogate model trained with the labeled data set to generate the predictions for the unlabeled data set.” Stergioudis inputs the unlabeled data into the trained model to generate the classification output.)
from the data records that are not labeled, iteratively: selecting a data record of the at least some data records that matches a decision criterion; ([0005] of Stergioudis states “The method can include the one or more processors querying the unlabeled data set to select a first set of phrases from the plurality of phrases. The method can include the one or more processors selecting the first set of phrases based at least on one or more confidence scores output by a surrogate model that corresponds to a third-party model maintained by a third-party system.” [0107] of Stergioudis states “The data processing system 202 can continue to query the unlabeled data set 218 to perform active learning until a condition is met. For example, the condition can be a number of iterations or queries, until the unlabeled data set 218 is empty (e.g., after labeling a data sample, it is removed from the unlabeled data set 218 and moved to the labeled data set 220), or based on a level of performance of the surrogate model 222 or the 3P model 238.” [0095] of Stergioudis states “The query generator 206 can query the unlabeled data set 218 to select a first set of samples (e.g., a first set of phrases) from the unlabeled data samples stored in the unlabeled data set 218. The query generator 206 can input the unlabeled data set 218 into the surrogate model 222 trained with the labeled data set 220 or knowledge distillation from the 3P model 238 to generate the predictions, along with probabilities or confidence scores, for the unlabeled data set. In some cases, the predictions can include a category or class and a corresponding probability or confidence score. The query generator 206 can select the first set of samples based at least on one or more confidence scores output by a surrogate model 222 that corresponds to a third-party model 238 maintained by a third-party system 228.” Under BRI, selecting a record “based on confidence scores output by the trained model” is selecting a record that “matches a decision criterion” because the confidence score threshold is that criterion.)
updating the trained machine learning model to include the selected at least one data record and the labeling of the selected at least one data record, ([0015] of Stergioudis states “The one or more processors can adjust the soft targets using a smoothing technique including at least one of probability clipping, a probability assignment, or a softmax temperature. The one or more processors can train the surrogate model based on the adjusted soft targets and the indications of functions received via the user interface.” [0030] of Stergioudis states “For example, the third-party model may provide a single output, whereas the surrogate model can provide multiple candidate outputs along with confidence scores for each candidate output. The system can leverage the confidence scores from the surrogate model to select data samples from the unlabeled pool of data to present to the oracle. Upon receiving input from the oracle, the system can provide the training data set to the third-party model to improve the performance of the third-party model, thereby resulting in improve natural language processing or other machine learning task.” Stergioudis states that, after each selection and labeling, the model is retrained using the updated training data set, and the adjusted soft targets are the labels assigned to the selected records. When combined with Givental’s label propagation, the labels originate from propagation instead of from a human oracle.)
such that the updated trained machine learning model is a final model. ([0122] of Stergioudis states “At ACT 438, the data processing system can provide the labeled data set 436 to the 3P model 238 to cause the 3P system 228 to update the 3P model 238. At ACT 440, the data processing system can also use the labeled data set 436 to update the surrogate model 222 using knowledge distillation such that the surrogate model 222 can mimic aspects of the 3P model 238. The data processing system can repeat ACTs 406-440 until an acceptable performance has been achieved for the 3P model 238, there is no unlabeled data remaining, or the incremental performance improvements are very small, such as below a threshold.” [0133] of Stergioudis states “If the active learning is complete, the data processing system can proceed to 612 to end active learning. The data processing system can determine that the 3P model performance in a satisfactory manner and that it may not be necessary to further update the 3P model based on a training set.”)
However, Stergioudis does not explicitly teach 
receiving a set of data records, wherein each data record of the set of data records comprises a set of features that have been matched against a list of data records having a particular characteristic
wherein a first fraction of the data records of the set of data records are labeled as having the particular characteristic, wherein a second fraction of the data records of the set of data records are labeled as not having the particular characteristic, 
with the data records labeled as having the particular characteristic and the data records labeled as not having the particular characteristic, 
as either having the particular characteristic or not having the particular characteristic	
with a label propagation algorithm, based on the classifications, labeling the selected at least one data record as either having the particular characteristic or not having the particular characteristic;
wherein the iteration continues until all of the data records are labeled, 
Givental teaches
wherein a first fraction of the data records of the set of data records are labeled as having the particular characteristic, wherein a second fraction of the data records of the set of data records are labeled as not having the particular characteristic, ([0005] of Givental states “The method comprises executing, by the hybrid ML anomaly detector, the ensemble of unsupervised machine learning models on log data to generate, for each entry in the log data, a predicted anomaly score and corresponding anomaly classification label of the entry as to whether the entry represents an anomalous event. The method also comprises generating, by the hybrid ML anomaly detector, a partially labeled dataset based on a selected subset of entries in the log data and other unlabeled log data in the log data. The method further comprises performing, by the semi-supervised machine learning model, a similarity analysis of the unlabeled log data in the partially labeled dataset with entries in the selected subset of entries.” Givental’s anomaly classification label is a binary label: an entry labeled anomalous is labeled as having the particular characteristic, and an entry labeled non-anomalous is labeled as not having it. The selected subset of entries in Givental that forms the partially labeled dataset corresponds to the claimed first and second labeled fractions.)
with the data records labeled as having the particular characteristic and the data records labeled as not having the particular characteristic, ([0005] of Givental states “executing, by the hybrid ML anomaly detector, the ensemble of unsupervised machine learning models on log data to generate, for each entry in the log data, a predicted anomaly score and corresponding anomaly classification label of the entry as to whether the entry represents an anomalous event.”)
as either having the particular characteristic or not having the particular characteristic ([0005] of Givental states “executing, by the hybrid ML anomaly detector, the ensemble of unsupervised machine learning models on log data to generate, for each entry in the log data, a predicted anomaly score and corresponding anomaly classification label of the entry as to whether the entry represents an anomalous event.” The output is an anomaly classification label indicating whether an anomalous event is present, which corresponds to whether the particular characteristic is present.)
with a label propagation algorithm, based on the classifications, labeling the selected at least one data record as either having the particular characteristic or not having the particular characteristic; ([0061] of  Givental states “Based on the similarities of the unlabeled entry to the labeled entries, label propagation and label spreading operations are performed such that a labeled entry having a highest similarity measure is selected for the unlabeled entry and the corresponding label of the selected labeled entry is attributed to the unlabeled entry, e.g., log entry in the log data. Thus, if the unlabeled entry is most similar to a labeled entry with the label “anomalous” then the unlabeled entry will likewise be labeled “anomalous” when the semi-supervised learning algorithms 170 update the partially labeled data 162. Similarly, if the unlabeled entry is most similar to a labeled entry with the label of “non-anomalous” then the unlabeled entry will likewise be labeled “non-anomalous.” [0078] of Givental states “The label of the selected labeled portion of data is propagated to the selected unlabeled portion of data (step 412). Steps 406-412 are then repeated for each subsequent portion of unlabeled data in the partially labeled dataset until all portions of data in the partially labeled dataset are labeled to generate a fully labeled dataset (step 414).”)
wherein the iteration continues until all of the data records are labeled, ([0078] of Givental states ”The label of the selected labeled portion of data is propagated to the selected unlabeled portion of data (step 412). Steps 406-412 are then repeated for each subsequent portion of unlabeled data in the partially labeled dataset until all portions of data in the partially labeled dataset are labeled to generate a fully labeled dataset (step 414).”)
	Lesiecki teaches
receiving a set of data records, wherein each data record of the set of data records comprises a set of features that have been matched against a list of data records having a particular characteristic ([0002] of Lesiecki states “In certain embodiments, received records may be matched against a list of entity records using attributes of the received and entity records and matching techniques applied with configurable weights.” [0020] of Lesiecki states “Attributes of the known entity or corpus of known entities may include names, addresses, date of births, social security numbers, passport numbers, driver's license numbers, business registration information, business functions, geographical locations, name origins, or the like.” [0030] of Lesiecki states “In some embodiments, the list may be a list of entities that may be associated with particular accounts (e.g., financial, service-based, etc.), events, companies (e.g., customers of a company), etc. Certain embodiments may use such lists of entities to verify or authorize transactions or related operations (e.g., where a match does not indicate a prohibition). For example, a healthcare or insurance company may attempt to match an insurance claim against records of insured individuals, in order to verify that a claim from the individual is valid. Other institutions may also utilize disclosed embodiments to match persons for other purposes, such as marketing, fraud, compliance, due diligence, identity verification, or the like. “ [0037] of Lesiecki states “In one aspect, database(s) 109 may be used by components of network layout 100 to perform one or more operations consistent with the disclosed embodiments. In one embodiment, database(s) 109 may comprise storage containing a variety of data sets consistent with disclosed embodiments. 
For example, database(s) 109 may include, for example, lists of politically/financially notable or exposed individuals, watch lists, fraud lists, a list of sanctioned or embargoed entities such as an OFAC (Office of Foreign Assets Control) list, RFC (Risk, Fraud & Compliance) lists, or the like.” One of ordinary skill will understand that the particularly disclosed embodiments are not limiting and can be used for a variety of purposes. Lesiecki compares a received data record with entity records to check whether particular characteristics exist, thereby determining whether the record represents a suspect entity.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Stergioudis, Givental, and Lesiecki. Stergioudis teaches an iterative active learning workflow in which unlabeled records are selected based on model output confidence scores and used to retrain a machine learning model. Givental teaches a semi-supervised label propagation mechanism in which classification labels are propagated to unlabeled entries until a fully labeled dataset is obtained. Lesiecki teaches matching record attributes against fraud entity lists to identify suspect transactions in a fraud detection environment. One of ordinary skill in the art would have been motivated to incorporate the teachings of Lesiecki and Givental into that of Stergioudis in order to apply a combined label propagation and iterative active learning framework to the fraud record matching domain of Lesiecki, as all three references address the problem of classifying large pools of partially labeled or unlabeled records while reducing manual review burden. It would have been a predictable combination to improve labeling efficiency and model training in fraud-related record classification systems by using a known label propagation algorithm.

Regarding claim 2, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Stergioudis, Givental, and Lesiecki teaches
wherein the at least one data record comprises all of the data records that are not labeled. ([0078] of Givental states “The label of the selected labeled portion of data is propagated to the selected unlabeled portion of data (step 412). Steps 406-412 are then repeated for each subsequent portion of unlabeled data in the partially labeled dataset until all portions of data in the partially labeled dataset are labeled to generate a fully labeled dataset (step 414).” The at least one data record selected at each iteration, taken across all iterations, comprises all of the data records that are not labeled.)

Regarding claim 3, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Stergioudis, Givental, and Lesiecki teaches
wherein the at least one data record comprises, of the data records that are not labeled, a subset of data records labeled by the label propagation algorithm. ([0077] of Givental states “A partially labeled dataset is generated based on the labeled subset of data and the unlabeled data in the input dataset (step 404). A next unlabeled portion of data in the partially labeled dataset is selected (step 406). Then, for each labeled portion of data in the partially labeled dataset, a similarity measure is generated indicating a similarity between the labeled portion of data and the selected unlabeled portion of data (step 408). A labeled portion of data having a highest similarity measure, indicating a highest similarity with the unlabeled portion of data is selected (step 410).” [0078] of Givental states “The label of the selected labeled portion of data is propagated to the selected unlabeled portion of data (step 412). Steps 406-412 are then repeated for each subsequent portion of unlabeled data in the partially labeled dataset until all portions of data in the partially labeled dataset are labeled to generate a fully labeled dataset (step 414).” At each iteration of Givental, step 406 selects one unlabeled portion and step 412 labels that selected subset via the label propagation algorithm.)
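For orientation, the loop that Givental describes in steps 404-414 (select an unlabeled portion, find the most similar labeled portion, copy its label, repeat until everything is labeled) can be sketched as follows. This is a minimal illustration under stated assumptions, not Givental's implementation: the function name `propagate_labels`, the use of negative Euclidean distance as the similarity measure, and the choice to let newly labeled records join the labeled pool are all hypothetical.

```python
import math

def propagate_labels(features, labels):
    """Copy to each unlabeled record the label of its most similar labeled record.

    features: list of equal-length numeric tuples, one per record.
    labels:   list of "anomalous", "non-anomalous", or None (unlabeled).
    Returns a new, fully labeled list; the inputs are not modified.
    """
    result = list(labels)
    if all(label is None for label in result):
        raise ValueError("at least one labeled record is required")
    for i, label in enumerate(result):        # step 406: next unlabeled portion
        if label is not None:
            continue
        labeled = [j for j, known in enumerate(result) if known is not None]
        # Steps 408-410: score every labeled record and keep the most similar.
        # Assumption: similarity is the negative Euclidean distance.
        best = max(labeled, key=lambda j: -math.dist(features[i], features[j]))
        result[i] = result[best]              # step 412: propagate the label
    return result                             # step 414: fully labeled dataset

records = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (4.8, 5.1)]
seed = ["non-anomalous", None, "anomalous", None]
print(propagate_labels(records, seed))
```

In this toy data set, the second record sits next to the non-anomalous seed and the fourth next to the anomalous seed, so each inherits the label of its nearest labeled neighbor.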

Regarding claim 4, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Stergioudis, Givental, and Lesiecki teaches
wherein the label propagation algorithm employs least confidence sampling, margin sampling, or entropy-based sampling, or a combination thereof. ([0097] of Stergioudis states “The query generator 206 can be configured with one or more types of uncertainty sampling techniques including, for example, a least confident technique, a margin sampling technique, or an entropy sampling technique. With the least confident uncertainty sampling technique, the query generator 206 can query instances or data samples in the unlabeled data set 218 about which the surrogate model 222 is least certain how to label. This approach can be configured for probabilistic learning models. For example, when using a probabilistic model for binary classification, the query generator 206 using the least confident uncertainty sampling technique can query the instance whose posterior probability of being positive is nearest to 0.5.” [0098] of Stergioudis states “With the margin sampling uncertainty sampling technique, the query generator 206 can select the instances in the unlabeled data set 218 where the difference between the first most likely and second most likely classes are the smallest. The margin sampling technique can correct for a shortcoming in the least confident technique by incorporating the posterior of the second most likely label. For instances with small margins, knowing the true label can facilitate the model discriminating more effectively between them.” [0099] of Stergioudis states “With the entropy sampling uncertainty technique, the query generator 206 can select the instances where the class probabilities have the largest entropy. Entropy can be an information-theoretic measure that represents the amount of information used to “encode” a distribution. As such, it is often thought of as a measure of uncertainty or impurity in machine learning.” [0100] of Stergioudis states “The query generator 206 can use one or more of the uncertainty sampling techniques. 
The query generator 206 can select an uncertainty sampling technique to use based on one or more factors or criteria. For example, the query generator 206 can use the entropy sampling technique if the objective function is to minimize log-loss, while the query generator 206 can select least confident or margin sampling in order to reduce classification error.” Stergioudis teaches these strategies in the record selection step. Givental’s label propagation algorithm operates on selected records. Under BRI, a label propagation algorithm that operates on records selected using these sampling techniques would have been an obvious choice.)
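The three uncertainty sampling techniques quoted from Stergioudis can be illustrated with a short sketch. It assumes each unlabeled record comes with a list of class probabilities from some probabilistic model; the function names and the sample `pool` are hypothetical and are not taken from Stergioudis.

```python
import math

def least_confidence(probs):
    """1 minus the top class probability; larger means less confident."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the two most likely classes; smaller means more uncertain."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def entropy(probs):
    """Shannon entropy in nats; larger means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_most_uncertain(pool, strategy="entropy"):
    """Return the index of the record the chosen strategy finds most uncertain."""
    indices = range(len(pool))
    if strategy == "least_confidence":
        return max(indices, key=lambda i: least_confidence(pool[i]))
    if strategy == "margin":
        return min(indices, key=lambda i: margin(pool[i]))
    if strategy == "entropy":
        return max(indices, key=lambda i: entropy(pool[i]))
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical posterior probabilities for three unlabeled records.
pool = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20]]
for name in ("least_confidence", "margin", "entropy"):
    print(name, select_most_uncertain(pool, name))
```

For binary classification all three criteria pick the record whose positive-class probability is nearest 0.5, consistent with the quoted passage; they can diverge once there are three or more classes.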

Regarding claim 11, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Stergioudis, Givental, and Lesiecki teaches
wherein the particular characteristic comprises a suspicion that the data record is fraudulent, wherein the list of data records having the particular characteristic comprises a list of known fraud sources, and wherein the operations further comprise automatically blocking a transaction associated with the data record. ([0019] of Lesiecki states “In one embodiment, list filtering enables transactions to be monitored and flagged for a variety of reasons. For example, if a participant in a financial transaction is known to have committed fraud in the past, a watch list filter system consistent with disclosed embodiments may flag the transaction as potentially fraudulent. As another example, if a participant in a financial transaction is known or suspected to be involved with a terrorist group, narcotic group, or is otherwise on a list of individuals known to be involved in illegal acts, a watch list filter system consistent with disclosed embodiments may flag the transaction in order to stop or pause it and/or alert appropriate parties (such as a government agency)” [0037] of Lesiecki states “For example, database(s) 109 may include, for example, lists of politically/financially notable or exposed individuals, watch lists, fraud lists, a list of sanctioned or embargoed entities such as an OFAC (Office of Foreign Assets Control) list, RFC (Risk, Fraud & Compliance) lists, or the like.” [0041] of Lesiecki states “For instance, while a financial transaction is occurring, invoking system 101 may send one or more names of one or more participants associated with the transaction to server 103 for processing in a manner consistent with the disclosed embodiments, such as for checking against one or more lists, and for pausing or denying the transaction if one or more participants is found on the list(s).” Lesiecki’s system flags records associated with fraud, so the characteristic of interest is a suspicion of fraud. 
It also uses a database containing fraud lists, i.e., lists of entities known to be associated with fraud and therefore having the particular characteristic. Because Lesiecki’s system stops, pauses, or denies inappropriate transactions, it corresponds to automatically blocking a transaction associated with the fraudulent data record.)

Claims 12 – 15 recite substantially similar subject matter to claims 1 – 4 respectively and are rejected with the same rationale, mutatis mutandis.

Claims 5 – 10, 16 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Stergioudis (U.S. Pub. 2022/0230095 A1) in view of Givental et al. (U.S. Pub. 2021/0281592 A1), Lesiecki et al. (U.S. Pub. 2013/0204880 A1), further in view of Perrizo (U.S. Pub. 2008/0040302 A1).
Regarding claim 5, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Stergioudis, Givental, and Lesiecki does not teach
wherein the data records of the set of data records are mapped into a multidimensional space wherein each axis of the space represents a feature of the set of features. 
	However, Perrizo teaches 
wherein the data records of the set of data records are mapped into a multidimensional space wherein each axis of the space represents a feature of the set of features. ([0007] of Perrizo states “Support Vector Machine (SVM) classification is generally regarded as a technique that produces high-accuracy classification. In classification, a data item to be classified may be represented by a number of features. If, for example, the data item to be classified is represented by two features, it may be represented by a point in 2-dimensional space. Similarly, if the data item to be classified is represented by n features, also referred to as the “feature vector”, it may be represented by a point in n-dimensional space. The training set points to be used to classify that data item are points in n+1 dimensional space (the n feature space dimensions plus the one additional class label dimension). SVM uses a kernel to translate that n+1 dimensional space to another space, usually much higher dimensional, in which the entire global boundary (or the global boundary, once a few “error” training points are removed). This linear boundary (also referred to as a hyperplane), which separates feature vector points associated with data items “in a class” and feature vector points associated with data items “not in the class.” The underlying premise behind SVM is that, for any feature vector space, a higher-dimensional hyperplane exists that defines this boundary. A number of classes can be defined by defining a number of hyperplanes. The hyperplane defined by a trained SVM maximizes a distance (also referred to as an Euclidean distance) from it to the closest points (also referred to as “support vectors”) “in the class” and “not in the class” so that the SVM defined by the hyperplane is robust to input noise”)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Stergioudis, Givental, Lesiecki, and Perrizo. Stergioudis teaches an iterative active learning workflow in which unlabeled records are selected based on model output confidence scores and used to retrain a machine learning model. Givental teaches a semi-supervised label propagation mechanism in which classification labels are propagated to unlabeled entries until a fully labeled dataset is obtained. Lesiecki teaches matching record attributes against fraud entity lists to identify suspect transactions in a fraud detection environment. Perrizo teaches nearest-neighbor and boundary-based classification techniques in n-dimensional feature space, including local decision boundaries, epsilon-distance neighbor sets, predominant class voting, and distance-weighted voting. One of ordinary skill in the art would have been motivated to incorporate the teachings of Perrizo into the combination of Lesiecki, Givental, and Stergioudis in order to implement the resulting classifier using Perrizo’s well-known feature space, boundary, and neighbor-based decision techniques, because each reference contributes a complementary part of the same overall classification task and their combination would have predictably improved the accuracy and robustness of the system. It would have been a predictable combination to enable semi-supervised fraud-related record classification using iterative selection, label propagation, and neighborhood-based decision criteria within a single integrated system.

Regarding claim 6, the rejection of claim 5 is incorporated herein. Furthermore, the combination of Stergioudis, Givental, Lesiecki, and Perrizo teaches
wherein a decision boundary in the multidimensional space divides data records of the set of data records having the particular characteristic from data records of the set of data records not having the particular characteristic. ([0009] of Perrizo states ”One aspect of the invention is directed to classifying a subject data item based on a training set of pre-classified data items. Any smooth boundary can be piecewise-linearly approximated. Therefore, if the set of near neighbors chosen is small enough the boundary (hereafter called the local boundary) between different classes will be linear. This local boundary is automatically computed. The local boundary is approximated by a neighborhood set of data items selected from the training set that have been pre-classified into different classes and have feature points similar to the points of the subject data item. A class is automatically assigned to the subject data item in accordance with a side of the local boundary on which the subject data item resides.” The local boundary divides records with the characteristic and records without in the multidimensional feature space.)

Regarding claim 7, the rejection of claim 6 is incorporated herein. Furthermore, the combination of Stergioudis, Givental, Lesiecki, and Perrizo teaches
wherein the decision criterion comprises determining on which side of the decision boundary the selected data record falls. ([0009] of Perrizo states ”One aspect of the invention is directed to classifying a subject data item based on a training set of pre-classified data items. Any smooth boundary can be piecewise-linearly approximated. Therefore, if the set of near neighbors chosen is small enough the boundary (hereafter called the local boundary) between different classes will be linear. This local boundary is automatically computed. The local boundary is approximated by a neighborhood set of data items selected from the training set that have been pre-classified into different classes and have feature points similar to the points of the subject data item. A class is automatically assigned to the subject data item in accordance with a side of the local boundary on which the subject data item resides.” Perrizo’s assignment of a class “in accordance with a side of the local boundary on which the subject data item resides” corresponds to determining on which side of the decision boundary the selected data record falls. The decision criterion is therefore the side-of-boundary determination.)
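A side-of-boundary test of the kind discussed for claims 6 and 7 can be illustrated for a linear boundary. The sketch below assumes the boundary is already available as a hyperplane w.x + b = 0; how Perrizo computes the local boundary is not shown, and the names `side_of_boundary` and `has_characteristic`, along with the convention that the positive side means "has the characteristic", are hypothetical.

```python
def side_of_boundary(point, weights, bias):
    """Return +1, -1, or 0 for the side of the hyperplane w.x + b = 0."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return (score > 0) - (score < 0)

def has_characteristic(point, weights, bias):
    """Decision criterion: True only when the point lies on the positive side.

    Assumption: a point exactly on the boundary is treated as not having
    the characteristic.
    """
    return side_of_boundary(point, weights, bias) > 0

# Hypothetical boundary x + y = 1 in a two-feature space.
weights, bias = (1.0, 1.0), -1.0
print(has_characteristic((2.0, 2.0), weights, bias))
print(has_characteristic((0.0, 0.0), weights, bias))
```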

Regarding claim 8, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Stergioudis, Givental, Lesiecki, and Perrizo teaches
which further comprises, for the selected data record, identifying, in a multidimensional space wherein each axis of the space represents a feature of the set of features, labeled neighbors from the data records labeled as having the particular characteristic and the data records labeled as not having the particular characteristic. ([0031] of Perrizo states “Another known method of nearest neighbor classification is the so-called epsilon nearest neighbor classification. Unlike K nearest neighbor classification, epsilon nearest neighbor classification uses the degree of similarity as the criterion which defines the nearest neighbor set. In this method, all data items of a specific degree of similarity to the reference data item are included in the nearest neighbor set. Because the degree of similarity is the key criterion here, it must be specified in advance of running the classification routine.” [0034] of Perrizo states “In one embodiment, all neighbors within a certain given Euclidian distance (the “closed k or epsilon nearest neighbor set) are used. Attribute-weighted Euclidian distances are used. Additionally, a Gaussian weighting of the vote is applied based on distance to the subject.” [0005] of Perrizo states “Nearest Neighbor Vote classification and Full Decision Boundary Based (e.g., Support Vector Machine) classification are popular approaches to real life data classification applications. In Nearest Neighbor Vote classification, the neighbors (i.e. the data items in the training set that are sufficiently similar or close to the data item to be classified), are found by scanning the entire data set. The predominant class in that neighbor set is assigned to the subject.” Perrizo teaches that the degree of similarity, i.e., a certain given Euclidean distance, must be specified in advance, which corresponds to the search radius being given before classification runs. 
Because Perrizo scans the entire data set, which consists of labeled training data, for neighbors, all identified neighbors within the epsilon distance are labeled records from the training set that fall within the given search radius.)

Regarding claim 9, the rejection of claim 8 is incorporated herein. Furthermore, the combination of Stergioudis, Givental, Lesiecki, and Perrizo teaches
wherein the decision criterion comprises determining whether a majority of the labeled neighbors within a given search radius are labeled as having the particular characteristic. ([0005] of Perrizo states “Nearest Neighbor Vote classification and Full Decision Boundary Based (e.g., Support Vector Machine) classification are popular approaches to real life data classification applications. In Nearest Neighbor Vote classification, the neighbors (i.e. the data items in the training set that are sufficiently similar or close to the data item to be classified), are found by scanning the entire data set. The predominant class in that neighbor set is assigned to the subject.” Under BRI, assigning the predominant class is determining the majority-voted class.)
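The neighbor vote within a given search radius can be sketched as follows. This is a minimal illustration under stated assumptions rather than Perrizo's implementation: the function name `majority_within_radius`, the plain (unweighted) Euclidean distance, the strict-majority rule, and the `None` result for an empty neighborhood are all hypothetical choices.

```python
import math

def majority_within_radius(subject, labeled, radius):
    """Vote among labeled neighbors no farther than `radius` from `subject`.

    labeled: list of (features, has_characteristic) pairs.
    Returns True if a strict majority of the neighbors have the characteristic,
    False otherwise, and None if no labeled record lies within the radius.
    """
    votes = [has for feats, has in labeled if math.dist(subject, feats) <= radius]
    if not votes:
        return None
    return sum(votes) > len(votes) / 2

# Hypothetical labeled records in a two-feature space.
labeled = [((0.0, 0.0), True), ((0.0, 1.0), True),
           ((1.0, 0.0), False), ((5.0, 5.0), False)]
print(majority_within_radius((0.2, 0.2), labeled, radius=1.5))
```

With a radius of 1.5 around the subject, three labeled records fall inside the neighborhood and two of them carry the characteristic, so the vote is True; the distant fourth record does not participate.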

Regarding claim 10, the rejection of claim 8 is incorporated herein. Furthermore, the combination of Stergioudis, Givental, Lesiecki, and Perrizo teaches
wherein the decision criterion comprises determining whether a weighted majority of the labeled neighbors are labeled as having the particular characteristic, wherein the weighting is based on a distance between the selected record and each respective labeled neighbor within the multidimensional space. ([0034] of Perrizo states “In one embodiment, all neighbors within a certain given Euclidian distance (the “closed k or epsilon nearest neighbor set) are used. Attribute-weighted Euclidian distances are used. Additionally, a Gaussian weighting of the vote is applied based on distance to the subject. The contribution to each class vote from ‘Xth’ epsilon neighbor in class ‘c’ for subject ‘s’ is calculated as follows.” [0035] of Perrizo states “
[Equation (1): image media_image1.png, not reproduced in this text]
In equation (1) above, d(x,s) indicates the weighted Euclidian distance from subject s to neighbor x. The parameters ‘VoteFact’, ‘sigma’ and the epsilon can be optimized by evolutionary algorithms.” Perrizo’s Gaussian weighting of the vote, applied based on distance to the subject, maps under BRI to the “weighting based on a distance between the selected record and each respective labeled neighbor.”)
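A distance-weighted vote of the general kind discussed for claim 10 can be sketched as follows. Equation (1) of Perrizo is not reproduced in this text, so the weight used here, exp(-d^2 / (2 * sigma^2)), is a generic Gaussian kernel chosen for illustration and should not be read as Perrizo's formula. The function name `gaussian_weighted_vote`, the unweighted Euclidean distance, and the default `sigma` are likewise assumptions.

```python
import math

def gaussian_weighted_vote(subject, labeled, radius, sigma=1.0):
    """Distance-weighted vote among labeled neighbors within `radius`.

    labeled: list of (features, has_characteristic) pairs.
    Each neighbor contributes exp(-d^2 / (2 * sigma^2)) to its own class.
    Returns True if the weighted vote favors having the characteristic,
    False otherwise, and None if no labeled record lies within the radius.
    """
    totals = {True: 0.0, False: 0.0}
    neighbors = 0
    for feats, has in labeled:
        d = math.dist(subject, feats)
        if d <= radius:
            totals[has] += math.exp(-(d * d) / (2.0 * sigma * sigma))
            neighbors += 1
    if neighbors == 0:
        return None
    return totals[True] > totals[False]

# One close neighbor with the characteristic, two distant ones without it.
labeled = [((0.0, 0.0), True), ((3.0, 0.0), False), ((0.0, 3.0), False)]
print(gaussian_weighted_vote((0.5, 0.0), labeled, radius=4.0))
```

In this toy example an unweighted majority would go against the characteristic (one vote to two), but the single nearby neighbor outweighs the two distant ones once the Gaussian weighting is applied.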

Claims 16 and 17 – 20 recite substantially similar subject matter to claims 5 – 7 combined and 8 – 11 respectively and are rejected with the same rationale, mutatis mutandis.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BYUNGKWON HAN whose telephone number is (571)272-5294. The examiner can normally be reached M-F: 9:00AM-6PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BYUNGKWON HAN/Examiner, Art Unit 2121
/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121