DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Election/Restrictions
Applicant’s election without traverse of Group I, Claims 1-13 and 21, in the reply filed on 10/30/2025 is acknowledged.
Claims 14-20 (Group II) have been withdrawn as being directed to a non-elected invention.
Claim Objections
Claim 13 is objected to because of the following informalities: “one or more storage device having stored computer-executable instructions which are executable by the one or more processors to configure the system to implement a method for modifying the trained classification model by at least configuring the system to perform a following”. The words “a following” should be “the following”. Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 4-5 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Dependent claims 4 and 5 recite:
“4. The computer-implemented method of claim 1, wherein the text featurization module implements a bag of word model”;
“5. The computer-implemented method of claim 1, wherein the text featurization module implements a principal component analysis (PCA) method”.
There is no previous recitation of a “text featurization module” in Claim 1; the limitation therefore lacks proper antecedent basis, rendering the claims unclear and indefinite. Clarification is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 5-6, 8-9, 11, 13, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Sampaio et al. (US 2021/0374614, hereinafter Sampaio) in view of Goodwin (US 2021/0192318, hereinafter Goodwin).
Referring to Claim 1, Sampaio teaches a computer-implemented method for modifying a trained classification model, comprising:
identifying a trained classification model (see Sampaio at [0015]: “Embodiments of the present disclosure provide a ML system based on Active Learning (AL) that integrates the selection and annotation of small samples of unlabeled data, as well as continuing ML model training and evaluation” and “The system can be extended with complementary functionality (e.g., architecture search, hyper-parameter tuning, pre-trained models)”. Therefore, this provided ML system corresponds to the identified trained classification model);
accessing a dataset of labeled data and unlabeled data configured for being classified by the trained classification model (see Sampaio at [0027]: “When the system starts up for the first time, some of the unlabeled data in the data stream are selected to be labeled. In this example, the portion of the unlabeled data to label include event e2 and event e4, collectively referred to as the first group of selected data within the dashed box as shown”, and “The unlabeled data is then labeled by an annotator, e.g., a human analyst, or another labeling service. The unlabeled data is now labeled and is stored in a pool of labeled data as shown in state 1 of the labeled pool. In this example, event e2 is labeled as “fraud” and event e4 is labeled as “not fraud”);
generating a plurality of feature vector values by converting the dataset of labeled data and unlabeled data to a feature space (see Sampaio at [0037]: “Though this process only directly exploits bivariate correlations, it offers an advantage that it removes features in the original feature space, so the feature plan can be reduced to a smaller size while keeping features that are more human interpretable”. Further at [0038]: “For example, Principal Component Analysis (PCA) can be applied to reduce the dimensionality of the feature space obtained through automatic feature engineering”. Therefore, the reduction to smaller size in the original feature space corresponds to the conversion of the dataset to a feature space);
generating a plurality of transformed vector values based on at least the plurality of feature vector values (see Sampaio at [0049]: “The Process Startup module 320 is configured to perform automatic feature engineering and feature filtering by pre-processing raw data when the system starts for the first time. For example, it can (optionally) contain a preprocessing pipeline responsible for transforming the raw data to enrich it with further features to be used for machine learning (configurable, e.g., through domain knowledge), or it can (also optionally) produce an automatic feature preprocessing pipeline to enrich the raw fields”. Therefore, the transformation of the raw data to enrich it with features corresponds to the transformed values);
generating a set of rules based on the plurality of transformed vector values, the set of rules configured to classify new unlabeled data based on at least the transformed vector values (see [0076]: “A Warm up Policy uses both the unlabeled pool and labeled pool distributions (regardless of the label values). The system switches to the next policy after a minimum number of labels is collected, as specified by the switching criterion, to represent sufficiently well the distribution of the target variable for the next policy to be able to act. An example is binary classification for fraud detection, where a common criterion would be to require that at least one fraud event is detected. A Hot Policy uses the available labels and collects new labels with a goal of improving the ML model's performance, which is unlike the Cold and Warmup policies, whose goal is to represent well the unlabeled pool regardless of the labels”. Therefore, these policies based on the labels are interpreted as the set of rules);
generating a modified classification model by at least applying the set of rules to the trained classification model, and such that the modified classification model is configured to classify a dataset of new unlabeled data at least partially based on the set of rules (see [0076]: “A Warm up Policy uses both the unlabeled pool and labeled pool distributions (regardless of the label values). The system switches to the next policy after a minimum number of labels is collected, as specified by the switching criterion, to represent sufficiently well the distribution of the target variable for the next policy to be able to act. An example is binary classification for fraud detection, where a common criterion would be to require that at least one fraud event is detected. A Hot Policy uses the available labels and collects new labels with a goal of improving the ML model's performance, which is unlike the Cold and Warmup policies, whose goal is to represent well the unlabeled pool regardless of the labels”. Since the ML model’s performance is improved according to the policies, this is interpreted as a modified classification model).
However, Sampaio fails to explicitly teach:
generating a plurality of feature vector values by converting the dataset of labeled data and unlabeled data to a feature space;
generating a plurality of transformed vector values based on at least the plurality of feature vector values;
generating a set of rules based on the plurality of transformed vector values, the set of rules configured to classify new unlabeled data based on at least the transformed vector values.
Goodwin teaches, in an analogous system,
generating a plurality of feature vector values by converting the dataset of labeled data and unlabeled data to a feature space (see Goodwin at [0032]: “The classification inference process 200 continues by mapping the feature vector formed in box 205 into a new Q-dimensional real-valued feature space, which may be referred to as an embedding space (box 207)”);
generating a plurality of transformed vector values based on at least the plurality of feature vector values (see Goodwin at [0043]: “The process 400 continues by transforming batch of labeled raw feature vectors respectively into labeled embedding vectors by a feature-space transformation process (box 407)”);
generating a set of rules based on the plurality of transformed vector values, the set of rules configured to classify new unlabeled data based on at least the transformed vector values (see Goodwin at [0005]: “current classifiers use separate processes for classifier training and determine of classification rules for inference. The feature-space conditioning, however, may be suboptimal for inference using the determined classification rules”. Further at [0007]: “In other words, the rules for inference are incorporated in the feature-space conditioning transformation derived by the training process”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sampaio with the above teachings of Goodwin by creating a machine learning classifier using rules, as taught by Sampaio, wherein the classifier is generated using feature vector values, as taught by Goodwin. The modification would have been obvious because one of ordinary skill in the art would be motivated to improve classification performance, as suggested by Goodwin at [0007]: “In embodiments based on deep learning, novel objective functions are constructed based on inference rules and the feature-space transformation is learned via backpropagation. Embodiments of the system and method exhibit improved classification performance with respect to several other existing approaches”.
Referring to Claim 5, the combination of Sampaio and Goodwin teaches the computer-implemented method of claim 1, wherein the text featurization module implements a principal component analysis (PCA) method (see Sampaio at [0038]: “For example, Principal Component Analysis (PCA) can be applied to reduce the dimensionality of the feature space obtained through automatic feature engineering”).
Referring to Claim 6, the combination of Sampaio and Goodwin teaches the computer-implemented method of claim 1; however, the combination fails to teach wherein the data value transformer reduces the plurality of feature vector values to the plurality of transformed vector values by 10 to 90 percent, or 20 to 70 percent, or 25 to 50 percent.
Nevertheless, it has been held that where the general conditions of a claim are disclosed in the prior art, discovering the optimum or workable ranges by routine experimentation does not make a claim patentably distinct from the prior art (In re Aller, 220 F.2d 454, 456, 105 USPQ 233, 235 (CCPA 1955); see MPEP § 2144.05, II). Moreover, the examiner has carefully reviewed the specification of the instant application and has not found any disclosure or evidence of any criticality to establish a reduction of the feature vector values by either 10 to 90 percent, or 20 to 70 percent, or 25 to 50 percent.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Sampaio and Goodwin by incorporating the reduction of the feature vector values by either 10 to 90 percent, or 20 to 70 percent, or 25 to 50 percent in order to improve classification performance.
Referring to Claim 8, the combination of Sampaio and Goodwin teaches the computer-implemented method of claim 1, wherein the rule generator generates rules using a random forest model performed on the plurality of transformed vector values (see Sampaio at [0090]: “In various embodiments, the disclosed techniques use a random forest model, which is non-differentiable but offers a convenient way of controlling regularization (by using a large number of shallow trees) while providing good generalization (this can be especially important to train on small data samples such as the labeled pool)”).
Referring to Claim 9, the combination of Sampaio and Goodwin teaches the computer-implemented method of claim 1, further comprising:
applying the plurality of transformed vector values to the data value transformer to generate a plurality of newly transformed vector values one or more times (see Goodwin at [0040]: “In some embodiments the batch processing is iterated for multiple batches of labeled inputs to progressively reduce the loss function as batches are sequentially processed. In some embodiments, iterating over multiple batches of labeled inputs progressively improves the feature-space transformation for classification”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sampaio with the above teachings of Goodwin by creating a machine learning classifier using rules, as taught by Sampaio, and generating a plurality of newly transformed vector values one or more times, as taught by Goodwin. The modification would have been obvious because one of ordinary skill in the art would be motivated to improve the feature-space transformation for classification, as suggested by Goodwin at [0040]: “In some embodiments the batch processing is iterated for multiple batches of labeled inputs to progressively reduce the loss function as batches are sequentially processed. In some embodiments, iterating over multiple batches of labeled inputs progressively improves the feature-space transformation for classification”.
Referring to Claim 11, the combination of Sampaio and Goodwin teaches the computer-implemented method of claim 1, further comprising:
updating the set of rules using a training accuracy method (see Sampaio at [0046]: “The process outputs labels of the labeled data (210). The labels can be used for a variety of purposes such as updating a rules system or performing supervised machine learning training using the labeled data. ML model performance metrics can be estimated online with the available labels, e.g., including using online cross validation to tune model parameters”. Therefore, this updating of the rules using cross validation is interpreted as the ‘training accuracy method’).
Referring to independent Claim 13, it is rejected on the same basis as independent claim 1, mutatis mutandis, since both are analogous claims.
Referring to Claim 21, Sampaio teaches a system for applying unlabeled data to a tuned classification model to classify the unlabeled data, comprising:
one or more processors (see Sampaio at [0101]: “Processor 502 is coupled bi-directionally with memory 510”); and
one or more storage device having stored computer-executable instructions which are executable by the one or more processors (see Sampaio at [0101]: “Processor 502 is coupled bi-directionally with memory 510”) to configure the system to implement a method for modifying a trained classification model by at least configuring the system to perform a following:
identify the tuned classification model, wherein the tuned classification model is a modified trained classification model generated from a process that includes:
identify a trained classification model (see Sampaio at [0015]: “Embodiments of the present disclosure provide a ML system based on Active Learning (AL) that integrates the selection and annotation of small samples of unlabeled data, as well as continuing ML model training and evaluation” and “The system can be extended with complementary functionality (e.g., architecture search, hyper-parameter tuning, pre-trained models)”. Therefore, this ML system corresponds to the identified trained classification model),
access a dataset of labeled data and unlabeled data configured for being classified by the trained classification model (see Sampaio at [0027]: “When the system starts up for the first time, some of the unlabeled data in the data stream are selected to be labeled. In this example, the portion of the unlabeled data to label include event e2 and event e4, collectively referred to as the first group of selected data within the dashed box as shown”, and “The unlabeled data is then labeled by an annotator, e.g., a human analyst, or another labeling service. The unlabeled data is now labeled and is stored in a pool of labeled data as shown in state 1 of the labeled pool. In this example, event e2 is labeled as “fraud” and event e4 is labeled as “not fraud”),
apply the dataset of labeled data and unlabeled data to a text featurization module to generate a plurality of feature vector values (see Sampaio at [0037]: “Though this process only directly exploits bivariate correlations, it offers an advantage that it removes features in the original feature space, so the feature plan can be reduced to a smaller size while keeping features that are more human interpretable”. Further at [0038]: “For example, Principal Component Analysis (PCA) can be applied to reduce the dimensionality of the feature space obtained through automatic feature engineering”),
apply the plurality of feature vector values to a data value transformer to generate a plurality of transformed vector values (see Sampaio at [0049]: “The Process Startup module 320 is configured to perform automatic feature engineering and feature filtering by pre-processing raw data when the system starts for the first time. For example, it can (optionally) contain a preprocessing pipeline responsible for transforming the raw data to enrich it with further features to be used for machine learning (configurable, e.g., through domain knowledge), or it can (also optionally) produce an automatic feature preprocessing pipeline to enrich the raw fields”),
apply the plurality of transformed vector values to a rule generator to generate a set of rules for classifying new unlabeled data (see Sampaio at [0076]: “A Warm up Policy uses both the unlabeled pool and labeled pool distributions (regardless of the label values). The system switches to the next policy after a minimum number of labels is collected, as specified by the switching criterion, to represent sufficiently well the distribution of the target variable for the next policy to be able to act. An example is binary classification for fraud detection, where a common criterion would be to require that at least one fraud event is detected. A Hot Policy uses the available labels and collects new labels with a goal of improving the ML model's performance, which is unlike the Cold and Warmup policies, whose goal is to represent well the unlabeled pool regardless of the labels”. Therefore, these policies are interpreted as the set of rules), and
generate a modified classification model by at least applying the set of rules to the trained classification model, and such that the modified classification model is configured to classify new unlabeled data at least partially based on the set of rules (see Sampaio at [0076]: “A Warm up Policy uses both the unlabeled pool and labeled pool distributions (regardless of the label values). The system switches to the next policy after a minimum number of labels is collected, as specified by the switching criterion, to represent sufficiently well the distribution of the target variable for the next policy to be able to act. An example is binary classification for fraud detection, where a common criterion would be to require that at least one fraud event is detected. A Hot Policy uses the available labels and collects new labels with a goal of improving the ML model's performance, which is unlike the Cold and Warmup policies, whose goal is to represent well the unlabeled pool regardless of the labels”. Since the ML model’s performance is improved according to the policies, this is interpreted as a modified classification model);
access a dataset of unlabeled data configured for being classified by the tuned classification model (see Sampaio at [0027]: “When the system starts up for the first time, some of the unlabeled data in the data stream are selected to be labeled. In this example, the portion of the unlabeled data to label include event e2 and event e4, collectively referred to as the first group of selected data within the dashed box as shown”, and “The unlabeled data is then labeled by an annotator, e.g., a human analyst, or another labeling service. The unlabeled data is now labeled and is stored in a pool of labeled data as shown in state 1 of the labeled pool. In this example, event e2 is labeled as “fraud” and event e4 is labeled as “not fraud”); and
classify the dataset of unlabeled data with the tuned classification model by (i) applying the dataset of unlabeled data as input to the tuned classification model and (ii) obtaining classification labels for the dataset of unlabeled data as output from the tuned classification model (see Sampaio at [0076]: “A Warm up Policy uses both the unlabeled pool and labeled pool distributions (regardless of the label values). The system switches to the next policy after a minimum number of labels is collected, as specified by the switching criterion, to represent sufficiently well the distribution of the target variable for the next policy to be able to act. An example is binary classification for fraud detection, where a common criterion would be to require that at least one fraud event is detected. A Hot Policy uses the available labels and collects new labels with a goal of improving the ML model's performance, which is unlike the Cold and Warmup policies, whose goal is to represent well the unlabeled pool regardless of the labels”. Since the ML model’s performance is improved according to the policies, this is interpreted as a tuned classification model).
However, Sampaio fails to explicitly teach:
apply the dataset of labeled data and unlabeled data to a text featurization module to generate a plurality of feature vector values;
apply the plurality of feature vector values to a data value transformer to generate a plurality of transformed vector values;
apply the plurality of transformed vector values to a rule generator to generate a set of rules for classifying new unlabeled data.
Goodwin teaches, in an analogous system,
apply the dataset of labeled data and unlabeled data to a text featurization module to generate a plurality of feature vector values (see Goodwin at [0032]: “The classification inference process 200 continues by mapping the feature vector formed in box 205 into a new Q-dimensional real-valued feature space, which may be referred to as an embedding space (box 207)”);
apply the plurality of feature vector values to a data value transformer to generate a plurality of transformed vector values (see Goodwin at [0043]: “The process 400 continues by transforming batch of labeled raw feature vectors respectively into labeled embedding vectors by a feature-space transformation process (box 407)”);
apply the plurality of transformed vector values to a rule generator to generate a set of rules for classifying new unlabeled data (see Goodwin at [0005]: “current classifiers use separate processes for classifier training and determine of classification rules for inference. The feature-space conditioning, however, may be suboptimal for inference using the determined classification rules”. Further at [0007]: “In other words, the rules for inference are incorporated in the feature-space conditioning transformation derived by the training process”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Sampaio with the above teachings of Goodwin by creating a machine learning classifier using rules, as taught by Sampaio, wherein the classifier is generated using feature vector values, as taught by Goodwin. The modification would have been obvious because one of ordinary skill in the art would be motivated to improve classification performance, as suggested by Goodwin at [0007]: “In embodiments based on deep learning, novel objective functions are constructed based on inference rules and the feature-space transformation is learned via backpropagation. Embodiments of the system and method exhibit improved classification performance with respect to several other existing approaches”.
Claims 2-4 are rejected under 35 U.S.C. 103 as being unpatentable over Sampaio et al. (US 2021/0374614, hereinafter Sampaio) in view of Goodwin (US 2021/0192318, hereinafter Goodwin), and further in view of Ormerod (US 2022/0245350, hereinafter Ormerod).
Referring to Claim 2, the combination of Sampaio and Goodwin teaches the computer-implemented method of claim 1; however, the combination fails to teach wherein the trained classification model is a language model.
Ormerod teaches, in an analogous system, wherein the trained classification model is a language model (see Ormerod at [0014]: “A list of incorporated text-classifiers available for use can include the bag-of-words model, recurrent neural network models, and/or a wide range of the transformer and/or reformer-based models. Available preprocessing techniques can use a mix of heuristic approaches involving proper noun detection, written number detection, beam searches, language models”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Sampaio and Goodwin with the above teachings of Ormerod by creating a machine learning classifier using rules wherein the classifier is generated using feature vector values, as taught by Sampaio and Goodwin, wherein the model is a language model, as taught by Ormerod. The modification would have been obvious because one of ordinary skill in the art would be motivated to generate a framework that uniquely deals with text classification, as suggested by Ormerod at [0029]: “Challenges faced in natural language processing (NLP) are amongst the most difficult for machine learning. There is a need for a framework that uniquely dealt with text classification”.
Referring to Claim 3, the combination of Sampaio and Goodwin teaches the computer-implemented method of claim 1; however, the combination fails to teach wherein the dataset of labeled and unlabeled data are text data.
Ormerod teaches, in an analogous system, wherein the dataset of labeled and unlabeled data are text data (see Ormerod at [0014]: “A list of incorporated text-classifiers available for use can include the bag-of-words model, recurrent neural network models, and/or a wide range of the transformer and/or reformer-based models. Available preprocessing techniques can use a mix of heuristic approaches involving proper noun detection, written number detection, beam searches, language models”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Sampaio and Goodwin with the above teachings of Ormerod by creating a machine learning classifier using rules wherein the classifier is generated using feature vector values, as taught by Sampaio and Goodwin, wherein the dataset of labeled and unlabeled data are text data, as taught by Ormerod. The modification would have been obvious because one of ordinary skill in the art would be motivated to generate a framework that uniquely deals with text classification, as suggested by Ormerod at [0029]: “Challenges faced in natural language processing (NLP) are amongst the most difficult for machine learning. There is a need for a framework that uniquely dealt with text classification”.
Referring to Claim 4, the combination of Sampaio and Goodwin teaches the computer-implemented method of claim 1; however, the combination fails to teach wherein the text featurization module implements a bag of word model.
Ormerod teaches, in an analogous system, wherein the text featurization module implements a bag of word model (see Ormerod at [0014]: “A list of incorporated text-classifiers available for use can include the bag-of-words model, recurrent neural network models, and/or a wide range of the transformer and/or reformer-based models. Available preprocessing techniques can use a mix of heuristic approaches involving proper noun detection, written number detection, beam searches, language models”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Sampaio and Goodwin with the above teachings of Ormerod by creating a machine learning classifier using rules wherein the classifier is generated using feature vector values, as taught by Sampaio and Goodwin, wherein the text featurization module implements a bag of word model, as taught by Ormerod. The modification would have been obvious because one of ordinary skill in the art would be motivated to generate a framework that uniquely deals with text classification, as suggested by Ormerod at [0029]: “Challenges faced in natural language processing (NLP) are amongst the most difficult for machine learning. There is a need for a framework that uniquely dealt with text classification”.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Sampaio et al. (US 2021/0374614, hereinafter Sampaio) in view of Goodwin (US 2021/0192318, hereinafter Goodwin), and further in view of Xiao et al. (US 2022/0318499, hereinafter Xiao).
Referring to Claim 7, the combination of Sampaio and Goodwin teaches the computer-implemented method of claim 1; however, the combination fails to teach wherein the rule generator generates rules using a linear model performed on the plurality of transformed vector values.
Xiao teaches, in an analogous system, wherein the rule generator generates rules using a linear model performed on the plurality of transformed vector values (see Xiao at [0081]: “Wide component 464 may encompass a generalized linear model (GLM). The GLM may compute a prediction as a function of a vector of weighted contextual metadata features 462 about a message in standard messages 216 and a bias. The contextual metadata features 462 may include raw features and transformed features”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Sampaio and Goodwin with the above teachings of Xiao by creating a machine learning classifier using rules wherein the classifier is generated using feature vector values, as taught by Sampaio and Goodwin, wherein the rule generator generates rules using a linear model performed on the plurality of transformed vector values, as taught by Xiao. The modification would have been obvious because one of ordinary skill in the art would be motivated to compute a prediction as a function of a vector of weighted contextual metadata features about a message in standard messages and a bias, thereby improving text and message classification, as suggested by Xiao at [0081]: “The GLM may compute a prediction as a function of a vector of weighted contextual metadata features 462 about a message in standard messages 216 and a bias…During training, embedding vectors can be initialized randomly and then the values of the embedding vectors are trained to minimize a loss function”.
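For illustration only, and not as a characterization of Xiao's actual implementation: a generalized linear model of the kind quoted above computes its prediction as a weighted sum of feature values plus a bias, optionally passed through a link function. The following minimal Python sketch (all identifiers hypothetical; a logistic link is assumed here) shows the form of such a computation:

```python
import math

def glm_predict(features, weights, bias):
    """Generalized linear model: weighted sum of feature values plus a bias,
    passed through a logistic link to yield a probability in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Three (raw or transformed) feature values, one weight per feature, and a bias.
p = glm_predict([1.0, 0.0, 2.0], [0.5, -1.2, 0.3], bias=-0.1)
```

Here z = 0.5·1.0 + (-1.2)·0.0 + 0.3·2.0 - 0.1 = 1.0, so the prediction is the logistic of 1.0, approximately 0.731.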
Claims 10 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Sampaio et al. (US 2021/0374614; hereinafter Sampaio) in view of Goodwin (US 2021/0192318; hereinafter Goodwin), and further in view of Zhao (US 2022/0269939; hereinafter Zhao).
Referring to Claim 10, the combination of Sampaio and Goodwin teaches the computer-implemented method of claim 1 but fails to teach further comprising: generating one or more weak labels by applying the set of rules created by the rule generator to unlabeled data.
Zhao teaches, in an analogous system, generating one or more weak labels by applying the set of rules created by the rule generator to unlabeled data (see Zhao at [0004]: “A new graph neural network then augments the labelling rules by exploring semantic relations between rules. Finally, the augmented rules are applied to the unlabeled data to generate weak labels that are then used to train a NER model”, and [0049]: “Each “weak” label is a probability distribution over all entity classes, which can be used to train a discriminative NER model. One advantage of training a discriminative NER model is that it can use other token features while the generative model can only use the outputs of the labeling rules as inputs”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Sampaio and Goodwin with the above teachings of Zhao by creating a machine learning classifier using rules wherein the classifier is generated using feature vector values, as taught by Sampaio and Goodwin, and generating one or more weak labels by applying the set of rules created by the rule generator to unlabeled data, as taught by Zhao. The modification would have been obvious because one of ordinary skill in the art would be motivated to train a discriminative model that can use other token features, whereas the generative model can only use the outputs of the labeling rules as inputs, thereby improving the classification model (as suggested by Zhao at [0049]).
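For illustration only, and not as a characterization of Zhao's actual implementation: weak labeling of the kind quoted above applies a set of labeling rules to unlabeled examples, with each matching rule contributing a label and non-matching examples left unlabeled (abstained). The following minimal Python sketch (all rules, labels, and texts hypothetical) shows the basic mechanism:

```python
def weak_label(rules, unlabeled):
    """Apply labeling rules (pattern, label) to unlabeled text examples.
    Each example receives the set of labels whose rule patterns match it;
    an empty set means every rule abstained on that example."""
    labels = []
    for text in unlabeled:
        fired = {label for pattern, label in rules if pattern in text.lower()}
        labels.append(fired)
    return labels

rules = [("refund", "BILLING"), ("crash", "BUG"), ("error", "BUG")]
weak = weak_label(rules, ["App crash on login",
                          "Please refund my order",
                          "Great app!"])
```

The resulting weak labels can then serve as (noisy) training signal for a downstream discriminative model, which, as Zhao notes, is free to use additional token features beyond the rule outputs.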
Referring to Claim 12, the combination of Sampaio and Goodwin teaches the computer-implemented method of claim 1 but fails to teach further comprising: updating the set of rules using a semantic coverage method.
Zhao teaches, in an analogous system, updating the set of rules using a semantic coverage method (see Zhao at [0003]: “The framework is designed around a concept of sematic similarity in which two rule candidates that can accurately label the same type of entities are semantically related via the entities matched by them. Accordingly, new labeling rules are acquired based on their semantic relatedness with a relatively small set of “seeding” rules”. The use of semantic similarity to relate and acquire rule candidates is therefore interpreted as the claimed semantic coverage method).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Sampaio and Goodwin with the above teachings of Zhao by creating a machine learning classifier using rules wherein the classifier is generated using feature vector values, as taught by Sampaio and Goodwin, and updating the set of rules using a semantic coverage method, as taught by Zhao. The modification would have been obvious because one of ordinary skill in the art would be motivated to acquire new labeling rules based on their semantic relatedness with a relatively small set of seeding rules, such that rules that accurately label the same type of entities are related via the entities matched by them, thereby improving the rule set (as suggested by Zhao at [0003]).
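For illustration only, and not as a characterization of Zhao's actual framework (which uses a graph neural network): one simple way to relate rule candidates "via the entities matched by them" is to compare the sets of entities each rule matches and accept candidates whose matched-entity sets overlap sufficiently with those of the seeding rules. The following minimal Python sketch (all rule names, entity sets, and the Jaccard threshold hypothetical) shows that overlap-based expansion:

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of matched entities."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def expand_rules(seed_rules, candidates, matches, threshold=0.5):
    """Accept candidate rules whose matched-entity sets are semantically
    close (overlap >= threshold) to those of at least one seeding rule."""
    accepted = list(seed_rules)
    for cand in candidates:
        if any(jaccard(matches[cand], matches[seed]) >= threshold
               for seed in seed_rules):
            accepted.append(cand)
    return accepted

# Hypothetical rules and the entity sets each one matched on unlabeled data.
matches = {
    "rule_capital": {"Paris", "Tokyo", "Berlin"},
    "rule_city":    {"Paris", "Tokyo", "Osaka"},
    "rule_person":  {"Alice", "Bob"},
}
out = expand_rules(["rule_capital"], ["rule_city", "rule_person"], matches)
```

Here "rule_city" shares two of four distinct entities with the seed (Jaccard 0.5) and is accepted, while "rule_person" shares none and is rejected.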
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LUIS A SITIRICHE whose telephone number is (571)270-1316. The examiner can normally be reached M-F 9am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126