Prosecution Insights
Last updated: April 19, 2026
Application No. 17/500,721

Classification Evaluation and Improvement in Machine Learning Models

Non-Final OA (§103, §112)

Filed: Oct 13, 2021
Examiner: THAI, JASMINE THANH
Art Unit: 2129
Tech Center: 2100 — Computer Architecture & Software
Assignee: ServiceNow Inc.
OA Round: 3 (Non-Final)

Grant Probability: 25% (At Risk)
OA Rounds: 3-4
To Grant: 4y 0m
With Interview: 81%

Examiner Intelligence

Grants only 25% of cases
Career Allow Rate: 25% (6 granted / 24 resolved; -30.0% vs TC avg)

Strong +56% interview lift
Interview Lift: +56.3% (resolved cases with interview vs. without)

Typical timeline
Avg Prosecution: 4y 0m (30 applications currently pending)

Career history
Total Applications: 54 (across all art units)

Statute-Specific Performance

§101: 23.6% (-16.4% vs TC avg)
§103: 37.2% (-2.8% vs TC avg)
§102: 14.6% (-25.4% vs TC avg)
§112: 21.8% (-18.2% vs TC avg)

Tech Center averages are estimates. Based on career data from 24 resolved cases.

Office Action

Rejections: §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/03/2025 has been entered.

Response to Arguments

Applicant's arguments filed 11/03/2025 have been fully considered and they are persuasive. Regarding applicant's remarks directed to the rejection of claims under 35 USC § 101, the applicant argues that the amended claims are directed to a technical improvement. Examiner respectfully agrees and withdraws the rejection of claims under 35 USC § 101. Regarding applicant's remarks directed to the rejection of claims under 35 USC § 103, the arguments are directed to newly amended limitations that were not previously examined by the examiner. Therefore, applicant's arguments are rendered moot. The examiner refers to the rejection under 35 USC § 103 in the current Office action for more details.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 2-3 and 7 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claims 2-3 and 7 recite the limitation "a predetermined threshold." There is insufficient antecedent basis for this limitation, as it is unclear whether each predetermined threshold recited in these claims is the same as "a predetermined threshold" in claim 1.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness. Claim(s) 1-3, 7-9, 14-16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. No. US20200234162A1 Jayaraman et al. (“Jayaraman”) in view of Yuan, et al. "Interpreting Deep Models for Text Analysis via Optimization and Regularization Methods." (“Yuan”) in further view of Jin, Di, et al. "Is bert really robust? a strong baseline for natural language attack on text classification and entailment." Proceedings of the AAAI conference on artificial intelligence. Vol. 34. No. 05. 2020. (“Jin”) In regards to claim 1, Jayaraman teaches A system comprising: persistent storage (Jayaraman, “[0171] The computer readable medium can also include non-transitory computer readable media such as computer readable media that store data for short periods of time like register memory and processor cache. The computer readable media can further include non-transitory computer readable media that store program code and/or data for longer periods of time. Thus, the computer readable media may include secondary or persistent long term storage, like ROM, optical or magnetic disks, solid state drives, compact-disc read only memory (CD-ROM), for example. The computer readable media can also be any other volatile or non-volatile storage systems. A computer readable medium can be considered a computer readable storage medium, for example, or a tangible storage device.”) Jayaraman teaches containing a training dataset and a test dataset, each with multi-token units of text that are labelled with one or more of a plurality of categories; (Jayaraman, “[0112] FIG. 6B depicts a different software engineering process. An ML trainer 610 is a program that takes in training input 612 and training output 614. There often is a one-to-one mapping between each unit of training input 612 and a unit of training output 614, though more complex mappings are possible. Further, it is assumed that training input 612 and training output 614, which are usually combined into a single training data set, is quite large with a significant number of such mappings (e.g., hundreds, thousands, or even millions). This training data set [a training dataset] may be referred to as having labeled data [a test dataset], in that each input is labeled with its respective ground-truth output value [each with units of text that are labelled with one or more of a plurality of categories].”) Jayaraman further teaches multi-token units of text (Jayaraman, “[0128] In some examples, one or more of the fields may contain word vectors and/or paragraph vectors representing semantic content of, e.g., one or more of the other field(s). Additionally or alternatively, such word vectors and/or paragraphs vectors could be generated, as part of a data pre-processing phase (e.g., 620) of a machine learning pipeline. Word vectors and paragraph vectors represent the overall “meaning” of corresponding words or collections of words (e.g., phrases, sentences, paragraphs), respectively. 
They do this by projecting into a semantically-encoded multidimensional vector space such that words and/or paragraphs having similar “meaning” are proximate to each other in the semantically-encoded multidimensional vector space (e.g., with respect to a Euclidean distance in the multidimensional vector space) while words and/or paragraphs having dissimilar “meanings” are distant from each other in the semantically-encoded multidimensional vector space. So, for example, word vectors for the words “uncle” and “aunt,” or for the words “car” and “automobile,” would be closer to each other, within the multidimensional vector space, than the word vectors for “duck” and “hypotenuse,” or for “flange” and “blancmange.” [0129] A word vector for a particular word (e.g., a word present in one of the fields of the incident report 700) [multi-token units of text; wherein a word is a token (with a corresponding word vector ie numerical representation) and multiple words can be in a field and thus, Jayaraman teaches multi-token units of text] could be determined by looking up the word vector in a lookup table or other index mapping words to word vectors. Such a mapping may be generated in a variety of ways, e.g., by training a multi-layer perceptron or other ML model architecture using samples of natural language. The trained ML model includes an input layer that represents the mapping between input words and corresponding word vectors. The ML model could be configured and trained to predict words based on their context, e.g., to predict the next word in a sequence of words (e.g., the next word in a sentence) based on a number of prior words.”) Jayaraman teaches a machine learning model, trained with the training dataset to classify input units of text into the plurality of categories, and tested with the test dataset; (Jayaraman, “[0112] FIG. 6B depicts a different software engineering process. An ML trainer 610 is a program that takes in training input 612 and training output 614. There often is a one-to-one mapping between each unit of training input 612 and a unit of training output 614, though more complex mappings are possible. Further, it is assumed that training input 612 and training output 614, which are usually combined into a single training data set, is quite large with a significant number of such mappings (e.g., hundreds, thousands, or even millions). This training data set may be referred to as having labeled data, in that each input is labeled with its respective ground-truth output value. [0113] The goal of ML trainer 610 is to iteratively analyze the mappings to build a computational ML model 616 (e.g., an algorithm) that can, with high probability, produce the training output 614 from training input 612. In other words, for each unit of training input 612, the associated unit of training output 614 will be produced in the vast majority of instances. Furthermore, ML model 616 may be able to produce desirable output even from input that was not used during its training. [a machine learning model, trained with the training dataset to classify input units of text into the plurality of categories] [0114] The types of models and methods through which these models can be trained vary dramatically. For instance, ML model 616 could be an artificial neural network, decision tree, random forest, support vector machine, Bayes classifier, k-means clusterer, linear regression predictor, and so on. But the embodiments herein may be operable with any type of machine learning technique. 
[0115] Once tested, ML model 616 may be placed into production [tested with the test dataset]. Thus, like program 600, ML model 616 may receive production input 602. However, ML model 616 may produce production output 618 that is different from production output 604. As alluded to above, a well-trained ML model can often produce production output that is superior to that of a traditionally-developed algorithm.”) Jayaraman teaches and one or more processors configured to: read, from the persistent storage, the training dataset or the test dataset; (Jayaraman, “[0047] Memory 104 may store program instructions and/or data on which program instructions may operate. By way of example, memory 104 may store these program instructions on a non-transitory, computer-readable medium, such that the instructions are executable by processor 102 to carry out any of the methods, processes, or operations disclosed in this specification or the accompanying drawings.”) Jayaraman teaches determine, by way of statistical metrics, distributional properties of the training dataset or the test dataset; (Jayaraman, “[0144] The data pre-processing phase 1020 depicted in FIG. 10 also includes transforming the input data set (transformation 1026). This can include scaling, shifting, or applying some other equation to the values in one or more fields/columns of the input data set. For example, the values of one or more columns could be scaled and shifted to lie within the range of zero to one, e.g., to make the data compatible with the inputs of a neural network of the ML model to be generated. The information in one or more fields/columns of the input data set could be counted, indexed, sorted, binned, or otherwise manipulated in order to determine a probability distribution or other statistical information [determine, by way of statistical metrics, distributional properties ie statistical information of the training dataset or the test dataset] for the fields/columns. Additionally or alternatively, the values or other data in the fields/columns could be scaled, shifted, nonlinearly modified, or otherwise transformed such that the values comport with a specified distribution. For example, the values in a column of the input data set could be scaled and shifted such that the values correspond to a normal distribution having a specified mean and/or variance.”) Jayaraman teaches perturb, by way of token insertion, token deletion, or token replacement, the training dataset or the test dataset into an expanded dataset; (Jayaraman, “[0143] The data pre-processing [perturb] phase 1020 depicted in FIG. 10 also includes removing repeated entries from the selected input data set (de-duplication 1024). This can include removing [token deletion], for entries that are identical to each other, all but one of the identical entries from the training data set. In some examples, the entries being “identical” could be determined based only on a subset of the fields/columns of the input data set (e.g., to disregard entries in the data set that differ from each other with respect to fields that are not going to be analyzed). [0144] The data pre-processing phase 1020 depicted in FIG. 10 also includes transforming the input data set (transformation 1026). This can include scaling, shifting, or applying some other equation to the values in one or more fields/columns of the input data set. 
For example, the values of one or more columns could be scaled and shifted to lie within the range of zero to one, e.g., to make the data compatible with the inputs of a neural network of the ML model to be generated. The information in one or more fields/columns of the input data set could be counted, indexed, sorted, binned, or otherwise manipulated in order to determine a probability distribution or other statistical information for the fields/columns. Additionally or alternatively, the values or other data in the fields/columns could be scaled, shifted, nonlinearly modified, or otherwise transformed such that the values comport with a specified distribution. For example, the values in a column of the input data set could be scaled and shifted such that the values correspond to a normal distribution having a specified mean and/or variance. [0145] In some examples, transforming the input data set 1026 as part of generating a conditioned data set [an expanded dataset] can include transforming the values or other information in multiple fields/columns of the input data set together with each other [token replacement]. For example, the values in multiple fields of the data set could be considered as scalars within a single vector for each of the entries in the input data set, and the data could be ‘rotated’ in order to, e.g., orthogonalize the data in the multiple fields. Such orthogonalization could be performed in order to speed the ML model building process by providing fields of data (and thus respective ML model input variables) that are more independent from each other statistically. Additionally or alternatively, the values in values in multiple fields of the data set could be considered as scalars within a single vector for each of the entries in the input data set, and the data could be subjected to a dimensionality reduction process. Such dimensionality reduction could be performed in order to speed the ML model building process by providing fewer fields of data (and thus respective ML model input variables) that may provide a substantial fraction of the predictive information that was present in the multiple fields of information to which the dimensionality reduction process was applied.”) Jayaraman teaches obtain, by way of the machine learning model, classifications into the plurality of categories for the expanded dataset; (Jayaraman, “[0119] The conditioned data set [expanded dataset] generated by the data pre-processing phase 620 is then provided to the build model phase 622. The build model phase attempts to generate an ML model [by way of the machine learning model] based on the conditioned data set that reflects the information present in the input data set (e.g., the structure of the inter-relationships between the columns or other variables of the conditioned data set and one or more output variables which may be, themselves, columns of the conditioned data set). The build model phase 624 may include one or more of a variety of methods used to train ML models, e.g., reinforcement learning, gradient descent, backpropagation, genetic algorithms, dynamic programming, simulated annealing, model hyperparameter estimation, pruning, or other methods [obtain, …, classifications ie train the model]].”) Jayaraman teaches based on the distributional properties, [the saliency maps], and the classifications, identify one or more potential causes of failure for the machine learning model. 
(Jayaraman, “[0152] In order to determine whether the ML model building pipeline 1010 has failed, is likely to fail, and/or in order to determine one or more factors (e.g., properties of the input dataset, settings of the ML pipeline, and/or characteristics of the ML model) that likely contributed to the failure of the ML model pipeline, failure metrics can be determined at one or more points along the ML model building pipeline 1010 [identify one or more potential causes of failure for the machine learning model]. For example, one or more failure metrics could be determined after the dataset selection sub-phase (1023), after the de-duplication sub-phase (1025), after the transformation sub-phase (1027), after the indexing sub-phase (1029), after the model utility validation phase (1031), after the ML model build phase (1041), and/or at some other point in the ML model building process. A failure metric can be determined, at a particular phase of the ML pipeline, based on one or more properties of the data or other information (e.g., ML model parameters) input to the phase and/or the data or other information output from the phase that is related to adequacy of the data set and/or the overall likelihood that the ML pipeline will generate a non-deficient ML model [based on the distributional properties, the saliency maps, and the classifications ie properties of the data wherein the saliency map of Li is used as a property of the data in the model building phase wherein saliency is the contribution of each token to the final classification].”) Jayaraman teaches based on the one or more potential causes of failure for the machine learning model, modify the training dataset so that: (i) the training dataset includes more tokens that are synonymous or similar to tokens with a saliency in the saliency maps that is above a predetermined threshold, (Jayaraman, “[0002] The embodiments herein provide methods for predicting that generation of an ML model is likely to fail and/or providing data-based context for the reason(s) that such generation failed. The pipeline prepares and analyzes the training data before attempting to build an ML model using this data. At multiple points along the pipeline, descriptive statistics are determined and compared to known baselines. If the statistics deviate from the baseline by more than a threshold amount and/or if the statistics conform to a “failure” baseline by more than a threshold amount, ML model generation may be terminated. Additionally or alternatively, an indication of the failure-related statistics may be provided to a user, such that the user may modify the ML model building process and/or the data set used to build the ML model in order to facilitate successful model creation [based on the one or more potential causes of failure for the machine learning model, modify the training dataset]. The statistics may include a variety of information about the structure and content of data used to generate an ML model, e.g., the density of the data, the proportion of the data that is unique (i.e., non-repeated entries), the distribution of to-be-predicted columns of the data, or other information.”) Jayaraman teaches and automatically retrain the machine learning model with the training dataset as modified. 
(Jayaraman, “[0113] The goal of ML trainer 610 is to iteratively [automatically retrain] analyze the mappings to build a computational ML model 616 (e.g., an algorithm) that can, with high probability, produce the training output 614 from training input 612 [with the training dataset as modified]. In other words, for each unit of training input 612, the associated unit of training output 614 will be produced in the vast majority of instances. Furthermore, ML model 616 may be able to produce desirable output even from input that was not used during its training.”) However, Jayaraman does not explicitly teach determine, by way of the machine learning model, saliency maps for tokens in the training dataset or the test dataset; saliency maps; so that, the training dataset includes more tokens that are synonymous or similar to tokens with a saliency in the saliency maps that is above a predetermined threshold Yuan teaches determine, by way of the machine learning model, saliency maps for tokens in the training dataset or the test dataset; (Yuan, “Figure 1: Illustration of the overall pipeline of our approach. Part 1 shows the general structure of the CNN model that we try to investigate. After training, we first build saliency maps [saliency maps for tokens in the training dataset or the test dataset] for different hidden spatial locations, where saliency scores reflect contributions to the final decision. As the example shown in Part 2, the CNN model [by way of the machine learning model] classifies the test sentence to class c (shown in green). For the conv1 layer, the saliency score is computed for each spatial location, and three spatial locations are selected (highlighted in yellow). Next, for each selected location, optimization technique is employed to determine what is detected from the test sentence. As shown in Part 3, for the spatial location k, a randomly initialized input X0 is fed to the network and we iteratively update X0 towards the objective function shown in Equation 6. Finally, based on the receptive field of location k (shown in blue with red bounding box), we obtain an overall representation for this receptive field. We search the vocabulary and select word representations with high similarity to the overall representation. Then, the t-SNE is employed to visualize these representations, as shown in Part 4.”) However, Yuan does not explicitly teach so that: the training dataset includes more tokens that are synonymous or similar to tokens with a saliency in the saliency maps that is above a predetermined threshold Jin teaches so that: the training dataset includes more tokens that are synonymous or similar to tokens with a saliency in the saliency maps that is above a predetermined threshold, (Jin, pg. 3 para. 1-6, “Step 2: Word Transformer (line 7-30) For a given word wi ∈ X with a high importance score obtained in Step 1 [tokens with a saliency in the saliency maps that is above a predetermined threshold; wherein the predetermined threshold is the label for a given word (token) with a “high” importance score; Examiner interprets the saliency score of Yuan to be substitutable to the importance score of Jin as Yuan discloses in Figure 1 “saliency scores reflect contributions to the final decision;” and Jin discloses in pg. 2 para.
5, the importance score as “score Iwi to measure the influence of a word wi ∈ X towards the classification result F(X) = Y;” Thus, the methods of Yuan are used to determine saliency while Jin is relied upon to determine a threshold (as an indication of a “high” score)], we need to design a word replacement mechanism. A suitable replacement word needs to fulfill the following criteria: it should (1) have similar semantic meaning with the original one, (2) fit within the surrounding context, and (3) force the target model to make wrong predictions. In order to select replacement words that meet such criteria, we propose the following workflow. Synonym Extraction: We gather a candidate set CANDIDATES for all possible replacements of the selected word wi. CANDIDATES is initiated with N closest synonyms according to the cosine similarity between wi and every other word in the vocabulary. To represent the words, we use word embeddings from (Mrksiˇ c et al. 2016). These word vectors are specially curated for finding synonyms, as they achieve the state-of-the-art performance on SimLex-999, a dataset designed to measure how well different models judge semantic similarity between words (Hill, Reichart, and Korhonen 2015). Using this set of embedding vectors, we identify top N synonyms whose cosine similarity with w are greater than δ [so that: the training dataset includes more tokens that are synonymous or similar (as determined by cosine similarity) to tokens with a saliency in the saliency maps that is above a predetermined threshold]. Note that enlarging N or lowering δ would both generate more diverse synonym candidates; however, the semantic similarity between the adversary and the original sentence would decrease. In our experiments, empirically setting N to be 50 and δ to be 0.7 strikes a balance between diversity and semantic similarity control.”) Jayaraman and Yuan are both considered to be analogous to the claimed invention because they are in the same field of NLP. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jayaraman to incorporate the teachings of Yuan in order to provide an approach to sentence classification utilizing CNN and saliency maps that derives meaningful interpretation for the hidden neurons of NLP models and explains the underlying decision process (Yuan, Introduction, “In this paper, we propose an approach to interpret and understand deep NLP models. Specifically, we focus on convolutional neural networks (CNN) (Krizhevsky, Sutskever, and Hinton 2012) for sentence classification tasks. Our approach employs gradient-based approaches (Simonyan, Vedaldi, and Zisserman 2013) and optimization techniques (Erhan et al. 2009) to select spatial locations with high contribution to the decision from hidden layers and study what is detected by these locations. We propose to approximately interpret the meaning of detected information using the nearest neighbors of the optimized representation based on the special property of word representations, which imply that words with semantically similar meanings are embedded to nearby points (Mikolov et al. 2013). Experimental results demonstrate that our approach can obtain reasonable and meaningful interpretation for hidden units. It is shown that our approach can explain the decision process in NLP models.”) Jin is considered to be analogous to the claimed invention because they are in the same field of NLP. 
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jayaraman and Yuan to incorporate the teachings of Jin in order to provide a method of augmenting the training data with adversarial examples of important tokens/words replaced with synonyms as doing so would improve robustness of the model (Jin, Abstract, “Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models. It is helpful to evaluate or even improve the robustness of these models by exposing the maliciously crafted adversarial examples. In this paper, we present TEXTFOOLER, a simple but strong baseline to generate adversarial text. By applying it to two fundamental natural language tasks, text classification and textual entailment, we successfully attacked three target models, including the powerful pre-trained BERT, and the widely used convolutional and recurrent neural networks. We demonstrate three advantages of this framework: (1) effective—it outperforms previous attacks by success rate and perturbation rate, (2) utility-preserving—it preserves semantic content, grammaticality, and correct types classified by humans, and (3) efficient—it generates adversarial text with computational complexity linear to the text length.”) In regards to claim 2, Jayaraman and Yuan and Jin teaches The system of claim 1, Jayaraman teaches wherein the distributional properties exhibit a divergence between the categories associated with the training dataset and the test dataset that exceeds a predetermined threshold, and wherein the one or more potential causes of failure for the machine learning model include the divergence exceeding the predetermined threshold. (Jayaraman, “[0155] In yet another example, determining the failure metric [the one or more potential causes of failure for the machine learning model] could include determining a distribution of values, categories, or other contents of the entries in a field/column of the data set [distributional properties exhibit]. For example, determining a failure metric could include determining whether a particular field/column contains only a single unique value. Additionally or alternatively, determining a failure metric could include determining whether a particular field/column contains values [between the categories associated with the training dataset and the test dataset] that are skewed [divergence] beyond a threshold amount [predetermined threshold], e.g., that a majority category or value represents more than 60% of the entries, more than 80% of the entries, or more than 90% of the entries.”) In regards to claim 3, Jayaraman and Yuan and Jin teaches The system of claim 1, Jayaraman teaches wherein the distributional properties exhibit, per the units of text, a difference in mean token count or a variability of token count between the training dataset and the test dataset that exceeds a predetermined threshold, and wherein the one or more potential causes of failure for the machine learning model include the difference exceeding the predetermined threshold. (Jayaraman, “[0155] In yet another example, determining the failure metric [the one or more potential causes of failure for the machine learning model] could include determining a distribution of values, categories, or other contents of the entries in a field/column of the data set. 
For example, determining a failure metric could include determining whether a particular field/column contains only a single unique value. Additionally or alternatively, determining a failure metric could include determining whether a particular field/column contains values that are skewed beyond a threshold amount [the difference exceeding the predetermined threshold], e.g., that a majority category or value represents more than 60% of the entries, more than 80% of the entries, or more than 90% of the entries [distributional properties exhibit, per the units of text, a difference in mean token count or a variability of token count ie a value (token) representing more of the data (thus varying in count from other tokens)].”) In regards to claim 7, Jayaraman and Yuan and Jin teaches The system of claim 1, Yuan teaches wherein the saliency maps indicate that a stop word in one or more of the units of text has a saliency (Yuan, “Figure 1: Illustration of the overall pipeline of our approach. Part 1 shows the general structure of the CNN model that we try to investigate. After training, we first build saliency maps [saliency map] for different hidden spatial locations, where saliency scores reflect contributions to the final decision. As the example shown in Part 2, the CNN model classifies the test sentence to class c (shown in green). For the conv1 layer, the saliency score is computed for each spatial location [a stop word in one or more of the units of text ie each spatial location has a saliency ie saliency score; wherein each token in the units of text has a saliency score], and three spatial locations are selected (highlighted in yellow). Next, for each selected location, optimization technique is employed to determine what is detected from the test sentence. As shown in Part 3, for the spatial location k, a randomly initialized input X0 is fed to the network and we iteratively update X0 towards the objective function shown in Equation 6. Finally, based on the receptive field of location k (shown in blue with red bounding box), we obtain an overall representation for this receptive field. We search the vocabulary and select word representations with high similarity to the overall representation. Then, the t-SNE is employed to visualize these representations, as shown in Part 4.”) However, Yuan does not explicitly teach above a predetermined threshold, and wherein the one or more potential causes of failure for the machine learning model include the saliency being above the predetermined threshold in the one or more of the units of text. Jayaraman teaches above a predetermined threshold, and wherein the one or more potential causes of failure for the machine learning model include the saliency being above the predetermined threshold in the one or more of the units of text. (Jayaraman, “[0002] The embodiments herein provide methods for predicting that generation of an ML model is likely to fail and/or providing data-based context for the reason(s) that such generation failed. The pipeline prepares and analyzes the training data before attempting to build an ML model using this data. At multiple points along the pipeline, descriptive statistics are determined and compared to known baselines.
If the statistics deviate from the baseline by more than a threshold amount and/or if the statistics conform to a “failure” baseline by more than a threshold amount [a stop word in one or more of the units of text has a saliency ie statistics above a predetermined threshold,], ML model generation may be terminated [wherein the one or more potential causes of failure for the machine learning model include the saliency ie statistics being above the predetermined threshold ie baseline in the one or more of the units of text].”) In regards to claim 8, Jayaraman and Yuan and Jin teaches The system of claim 1, Jayaraman teaches wherein the one or more processors are further configured to: embed the units of text into n-dimensional vectors based on semantic content, based on the n-dimensional vectors, determine similarity metrics between pairs of the units of text; (Jayaraman, “[0128] In some examples, one or more of the fields may contain word vectors and/or paragraph vectors representing semantic content of, e.g., one or more of the other field(s) [embed the units of text into n-dimensional vectors based on semantic content]. Additionally or alternatively, such word vectors and/or paragraphs vectors could be generated, as part of a data pre-processing phase (e.g., 620) of a machine learning pipeline. Word vectors and paragraph vectors represent the overall “meaning” of corresponding words or collections of words (e.g., phrases, sentences, paragraphs), respectively. They do this by projecting into a semantically-encoded multidimensional vector space such that words and/or paragraphs having similar “meaning” are proximate to each other in the semantically-encoded multidimensional vector space (e.g., with respect to a Euclidean distance in the multidimensional vector space) [based on the n-dimensional vectors, determine similarity metrics between pairs of the units of text] while words and/or paragraphs having dissimilar “meanings” are distant from each other in the semantically-encoded multidimensional vector space. So, for example, word vectors for the words “uncle” and “aunt,” or for the words “car” and “automobile,” would be closer to each other, within the multidimensional vector space, than the word vectors for “duck” and “hypotenuse,” or for “flange” and “blancmange.””) Jayaraman teaches and based on the similarity metrics, identify a particular unit of text of the units of text for which a predetermined number of most similar instances of the units of text have been labeled with different instances of the categories, (Jayaraman, “[0113] The goal of ML trainer 610 is to iteratively analyze the mappings to build a computational ML model 616 (e.g., an algorithm) that can, with high probability, produce the training output 614 from training input 612. In other words, for each unit of training input 612 [based on the similarity metrics; wherein the given input is determined in the data pre-processing phase that builds the word vectors based on the similarity metrics], the associated unit of training output 614 will be produced in the vast majority of instances. Furthermore, ML model 616 may be able to produce desirable output even from input that was not used during its training. [0114] The types of models and methods through which these models can be trained vary dramatically. For instance, ML model 616 could be an artificial neural network, decision tree, random forest, support vector machine, Bayes classifier, k-means clusterer, linear regression predictor, and so on. 
But the embodiments herein may be operable with any type of machine learning technique. [0115] Once tested, ML model 616 may be placed into production. Thus, like program 600, ML model 616 may receive production input 602. However, ML model 616 may produce production output 618 that is different from production output 604 [a predetermined number of most similar instances of the units of text ie production input 602 have been labeled with different instances of the categories ie different classification between output 618 and output 604]. As alluded to above, a well-trained ML model can often produce production output that is superior to that of a traditionally-developed algorithm.”) Jayaraman teaches wherein identifying the one or more potential causes of failure for the machine learning model comprises identifying that the particular unit of text was mislabeled or that there is an overlap between the categories or that other similar units of text are mislabeled. (Jayaraman, “[0150] The ML model building pipeline 1010 additionally includes an ML model building phase (build model phase 1040). This phase operates to generate the full ML model based on the conditioned data set generated by the data pre-processing phase 1020. The execution of the model building phase 1040 may be contingent on the outcome of the model utility validation phase 1030, on an analysis of one or more model failure metrics [identifying the one or more potential causes of failure for the machine learning model] generated based on the output(s) of previous phases or sub-phases of the ML model building pipeline 1010 [particular unit of text was mislabeled], or may be contingent on some other circumstance.”) (Jayaraman, “[0117] Thus, ML trainer 610 may be implemented as a multi-stage pipeline as depicted in FIG. 6C. As shown, ML training includes two phases: data pre-processing phase 620 and build model phase 624. But in general, ML training may contain more or fewer phases. For example, the ML trainer 610 may include a model utility validation phase to determine whether a chosen ML model architecture is likely to provide benefits (e.g., predictive accuracy) relative to a model having fewer trainable parameters or that is otherwise less complex than the chosen ML model architecture.”; wherein the predictive accuracy is indicative of the mislabeling of inputs) In regards to claim 9, Jayaraman and Yuan and Jin teaches The system of claim 1, Jayaraman teaches wherein identifying the one or more potential causes of failure for the machine learning model comprises associating at least some of the units of text with smart tags that specify an abnormality found in the training dataset or the test dataset. (Jayaraman, “[0153] Thus, a failure metric as described herein may take many forms. In some examples, determining a failure metric could include determining statistics for fields/columns of an input data set and/or of a data set that has been generated from an input data set (e.g., a conditioned data set). For example, a number or percentage of entries in a column/field of a data set that are empty (e.g., that include a value indicative of the lack of available data for that entry in that field/column) [associating at least some of the units of text ie a number of entries in a column/field with smart tags that specify an abnormality ie empty found in the training dataset or the test dataset]. 
If all of the field/column's entries are empty, or more than a threshold amount (e.g., a threshold percentage) of the field/column's entries are empty, and/or fewer than a threshold amount are not empty, it could be more likely that a sufficient ML model (that is, a non-deficient ML model) could be determined based on that field/column (e.g., due to insufficient data upon which to build the ML model).”) Claims 14 and 20 are rejected on the same grounds under 35 U.S.C. 103 as they are substantially similar to claim 1 Claim 15 is rejected on the same grounds under 35 U.S.C. 103 as they are substantially similar to claim 8 Claim 16 is rejected on the same grounds under 35 U.S.C. 103 as they are substantially similar to claim 9 Claim(s) 4 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Jayaraman in view of Yuan and Jin in further view of C.N. Pub. No. CN113392331A Liu, Gang (“Liu”) In regards to claim 4, Jayaraman and Yuan and Jin teaches The system of claim 1, Liu teaches wherein the token insertion includes inserting stop words or punctuation into the units of text of the training dataset or the test dataset, (Liu, Detailed Description, “The process of performing the enhanced processing on the original service data may include, but is not limited to, at least one of the following processing manners: disorganizing the sequence among sentences: i.e. the sequence of sentences in the original service data is disturbed, and the content between any two punctuations can be regarded as a sentence. Adding stop words: i.e. stop words are added in some sentences [the token insertion includes inserting stop words or punctuation into the units of text of the training dataset or the test dataset].”) However, Liu does not explicitly teach wherein identifying the one or more potential causes of failure for the machine learning model comprises determining that a subset of the units of text of the training dataset or the test dataset were classified differently than their corresponding units of text in the expanded dataset Jayaraman teaches and wherein identifying the one or more potential causes of failure for the machine learning model comprises determining that a subset of the units of text of the training dataset or the test dataset were classified differently than their corresponding units of text in the expanded dataset. (Jayaraman, “[0150] The ML model building pipeline 1010 additionally includes an ML model building phase (build model phase 1040). This phase operates to generate the full ML model based on the conditioned data set generated by the data pre-processing phase 1020. The execution of the model building phase 1040 may be contingent on the outcome of the model utility validation phase 1030, on an analysis of one or more model failure metrics [identifying the one or more potential causes of failure for the machine learning model] generated based on the output(s) of previous phases or sub-phases of the ML model building pipeline 1010 [determining that a subset of the units of text of the training dataset or the test dataset were classified differently than their corresponding units of text in the expanded dataset], or may be contingent on some other circumstance.”) (Jayaraman, “[0117] Thus, ML trainer 610 may be implemented as a multi-stage pipeline as depicted in FIG. 6C. As shown, ML training includes two phases: data pre-processing phase 620 and build model phase 624. But in general, ML training may contain more or fewer phases. 
For example, the ML trainer 610 may include a model utility validation phase to determine whether a chosen ML model architecture is likely to provide benefits (e.g., predictive accuracy) relative to a model having fewer trainable parameters or that is otherwise less complex than the chosen ML model architecture.”; wherein the predictive accuracy is indicative of the mislabeling of inputs) Jayaraman and Yuan and Jin and Liu are both considered to be analogous to the claimed invention because they are in the same field of NLP. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jayaraman and Yuan and Jin to incorporate the teachings of Liu in order to provide a method to automatically generate a large number of samples to quickly train a classification model and improve prediction accuracy through the varying samples. (Liu, Abstract, “The embodiment of the application provides a text processing method and text processing equipment, which relate to the technical field of data processing, and the method comprises the following steps: when the quantity of the labeled texts in the labeled text set does not meet the preset condition, the semi-supervised classification model obtained based on the labeled text set training is used for predicting the prediction result of each type of the to-be-labeled texts in the to-be-labeled text set, a new labeled text is automatically generated based on the prediction result and added into the labeled text set, so that the automatic generation of the labeled text is realized, the labeling efficiency of the labeled text and the quality and coverage rate of the labeled text are improved, the purpose of quickly and effectively obtaining a large number of samples required by machine learning is achieved, a high-accuracy text classification model can be obtained based on the obtained labeled text set training, and the classification model can use a simple model due to the fact that the quantity of the labeled texts is large enough, and the training speed and the prediction efficiency of the classification model can be improved.”) In regards to claim 6, Jayaraman and Yuan and Jin teaches The system of claim 1, Liu teaches wherein the token replacement includes contracting tokens, expanding contracted tokens, or replacing tokens with synonyms in the units of text of the training dataset or the test dataset, (Liu, Detailed Description, “The process of performing the enhanced processing on the original service data may include, but is not limited to, at least one of the following processing manners: … And replacing part of words in the original service data with similar words. Words may be randomly determined in the original business data and replaced with corresponding synonyms [replacing tokens with synonyms in the units of text of the training dataset or the test dataset]. 
Or, according to a certain rule (for example, selecting a preset word), selecting some words in the original service data and replacing the words with corresponding similar words.”) However, Liu does not explicitly teach wherein identifying the one or more potential causes of failure for the machine learning model comprises determining that a subset of the units of text of the training dataset or the test dataset were classified differently than their corresponding units of text in the expanded dataset Jayaraman teaches and wherein identifying the one or more potential causes of failure for the machine learning model comprises determining that a subset of the units of text of the training dataset or the test dataset were classified differently than their corresponding units of text in the expanded dataset. (Jayaraman, “[0150] The ML model building pipeline 1010 additionally includes an ML model building phase (build model phase 1040). This phase operates to generate the full ML model based on the conditioned data set generated by the data pre-processing phase 1020. The execution of the model building phase 1040 may be contingent on the outcome of the model utility validation phase 1030, on an analysis of one or more model failure metrics [identifying the one or more potential causes of failure for the machine learning model] generated based on the output(s) of previous phases or sub-phases of the ML model building pipeline 1010 [determining that a subset of the units of text of the training dataset or the test dataset were classified differently than their corresponding units of text in the expanded dataset], or may be contingent on some other circumstance.”) (Jayaraman, “[0117] Thus, ML trainer 610 may be implemented as a multi-stage pipeline as depicted in FIG. 6C. As shown, ML training includes two phases: data pre-processing phase 620 and build model phase 624. But in general, ML training may contain more or fewer phases. For example, the ML trainer 610 may include a model utility validation phase to determine whether a chosen ML model architecture is likely to provide benefits (e.g., predictive accuracy) relative to a model having fewer trainable parameters or that is otherwise less complex than the chosen ML model architecture.”; wherein the predictive accuracy is indicative of the mislabeling of inputs) Claim(s) 5 is rejected under 35 U.S.C. 103 as being unpatentable over Jayaraman in view of Yuan and Jin in further view of U.S. Pub. No. US20180341839A1 Malak et al. (“Malak”) In regards to claim 5, Jayaraman and Yuan and Jin teaches The system of claim 1, Malak teaches wherein the token deletion includes deleting stop words or punctuation from the units of text of the training dataset or the test dataset, (Malak, “[0150] Therefore, an example embodiment provides a more powerful technique for performing sentiment analysis by using a combination of a CNN and a word embedding model. [0151] FIG. 10 illustrates a diagram of a co-occurrence graph implemented for process 800 in accordance with some example embodiments. A co-occurrence graph may be implemented using techniques disclosed in “Choosing the Word Most Typical in Context: Using a Lexical Co-occurrence Network” (1997) Philip Edmonds [https://arxiv.org/pdf/cs/9811009.pdf], which is incorporated herein by reference for all purposes. In accordance with some example embodiments, a co-occurrence graph may be generated to identify words having a relationship for purposes of sentiment analysis. 
The connection of occurrence between words may be used to identify words having a more frequent occurrence for purposes of determining popular bigrams. [0152] Prior to preparing the co-occurrence graph, process 800 may include eliminating some words (e.g., stop words) from the data set [wherein the token deletion includes deleting stop words or punctuation from the units of text of the training dataset or the test dataset]. For example, a toolkit such as NLTK stop words from http://www.nltk.org/nltk_data/, may be implemented to remove stop words. The toolkit described at http://www.nltk.org/nltk_data/ is incorporated by reference for all purposes.”) Jayaraman teaches and wherein identifying the one or more potential causes of failure for the machine learning model comprises determining that a subset of the units of text of the training dataset or the test dataset were classified differently than their corresponding units of text in the expanded dataset. (Jayaraman, “[0150] The ML model building pipeline 1010 additionally includes an ML model building phase (build model phase 1040). This phase operates to generate the full ML model based on the conditioned data set generated by the data pre-processing phase 1020. The execution of the model building phase 1040 may be contingent on the outcome of the model utility validation phase 1030, on an analysis of one or more model failure metrics [identifying the one or more potential causes of failure for the machine learning model] generated based on the output(s) of previous phases or sub-phases of the ML model building pipeline 1010 [determining that a subset of the units of text of the training dataset or the test dataset were classified differently than their corresponding units of text in the expanded dataset], or may be contingent on some other circumstance.”) (Jayaraman, “[0117] Thus, ML trainer 610 may be implemented as a multi-stage pipeline as depicted in FIG. 6C. As shown, ML training includes two phases: data pre-processing phase 620 and build model phase 624. But in general, ML training may contain more or fewer phases. For example, the ML trainer 610 may include a model utility validation phase to determine whether a chosen ML model architecture is likely to provide benefits (e.g., predictive accuracy) relative to a model having fewer trainable parameters or that is otherwise less complex than the chosen ML model architecture.”; wherein the predictive accuracy is indicative of the mislabeling of inputs) Jayaraman and Yuan and Jin and Malak are both considered to be analogous to the claimed invention because they are in the same field of NLP. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jayaraman and Yuan and Jin to incorporate the teachings of Malak in order to provide an automated process to enrich/repair data to augment resulting analyses to produce useful results. (Malak, “[0042] In certain embodiments, prior to loading data into a data warehouse (or other data target) the data is processed through a pipeline (also referred to herein as a semantic pipeline) which includes various processing stages. In some embodiments, the pipeline can include an ingest stage, prepare stage, profile stage, transform stage, and publish stage. During processing, the data can be analyzed, prepared, and enriched. 
The resulting data can then be published (e.g., provided to a downstream process) into one or more data targets (such as local storage systems, cloud-based storage services, web services, data warehouses, etc.) where various data analytics can be performed on the data. Because of the repairs and enrichments made to the data, the resulting analyses produce useful results. Additionally, because the data onboarding process is automated, it can be scaled to process very large data sets that cannot be manually processed due to volume.”) Claim(s) 11-13 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Jayaraman in view of Yuan and Jin in further view of Wu et al. “Errudite: Scalable, Reproducible, and Testable Error Analysis” (“Wu”) In regards to claim 11, Jayaraman and Yuan and Jin teaches The system of claim 1, Wu teaches wherein the one or more processors are further configured to: obtain representations of classification errors in the expanded dataset; (Wu, Section I., “We present an error analysis methodology grounded in three principles: hypothesized error causes [obtain representations of classification errors] should be (1) formalized in a precise and reproducible manner, (2) applied to all instances [in the expanded dataset] rather than a small sample of errors, and (3) tested explicitly via counterfactual analysis. We instantiate these principles in the design of an interactive system called Errudite. At the core of Errudite is an expressive domain-specific language (DSL) for precisely querying instances based on linguistic features. The DSL concretizes unambiguous error hypotheses, allows grouping to scale to all instances, and enables rewriting for counterfactual testing. For example, it makes it easy to create a precise group containing all instances where the ground truth and the prediction share entity type (which would include the example in Figure 2), verify how often the model gets distracted, and check if the model turns to the correct entity when the distractor is removed. This sequence is precisely what we use to illustrate the design of Errudite (§3)”) Wu teaches obtain, by way of the machine learning model, measures of classification confidence for each of the units of text; (Wu, Section 3.2 Table 1 teaches exact_match, f1, is_correct_sent as measures of classification confidence (performance on correctness) for the model ie by way of the machine learning model wherein the measure is obtained for all instances ie for each of the units of text) Wu teaches generate a graphical user interface, wherein the graphical user interface includes indications of the distributional properties, indications of the classification errors, (Wu, Fig. 1 teaches the Errudite interface ie a graphical user interface, Section 4, “The UI (Figure 1) [generate a graphical user interface] contains three main components. The central component contains a filter panel (C) and an instance browser (D), which help examine the results of data groupings or rewrite rules for iterative refinement. The collapsible sidebar on the left contains a list of different models being analyzed with summary statistics (A) [indications of the distributional properties] and customizable attribute histograms (B) [indications of the classification errors].
Wu teaches generate a graphical user interface, wherein the graphical user interface includes indications of the distributional properties, indications of the classification errors; (Wu, Fig. 1 teaches the Errudite interface, i.e., a graphical user interface; Section 4, "The UI (Figure 1) [generate a graphical user interface] contains three main components. The central component contains a filter panel (C) and an instance browser (D), which help examine the results of data groupings or rewrite rules for iterative refinement. The collapsible sidebar on the left contains a list of different models being analyzed with summary statistics (A) [indications of the distributional properties] and customizable attribute histograms (B) [indications of the classification errors]. The one on the right contains a list of saved data groups (E) and rewrite rules (F); these can be loaded into the central component via mouse click. All groups and rewrite rules can be saved and loaded through the interface, so the analysis can be easily shared and reproduced") (Wu, Fig. 3 teaches (c) indications of the classification errors (orange indicates errors, blue indicates correct predictions))

[Image: media_image3.png, reproducing Wu, Fig. 1 (the Errudite interface)]
[Image: media_image4.png, reproducing Wu, Fig. 3]

Wu teaches and a histogram, wherein the histogram comprises a plurality of bins each representing respective ranges of the measures of classification confidence, and wherein each bin is associated with a first bar representing a first count of correct classifications and a second bar representing a second count of incorrect classifications; and (Wu, Fig. 5(a), Section 4, "To guide the exploration, group creation and refinement, Errudite supports defining complex attributes and inspecting their distributions. An example in Figure 5 shows the histogram of ground truth entity types [a histogram, wherein the histogram comprises a plurality of bins each representing respective ranges of the measures of classification confidence]. It displays the relative frequency of different entity answers, as well as the proportion of incorrect predictions [wherein each bin is associated with a first bar representing a first count of correct classifications, i.e., the blue bar, and a second bar representing a second count of incorrect classifications, i.e., the orange bar]. The histograms are updated to show conditional distributions when a user selects a group. Figure 5(a) shows histograms for the ground truth entity type in the group is entity: when the answer is an entity, it is most often a DATE, PERSON, ORG, or CARDINAL. Figure 5(b) displays the same histogram for the group is distracted. We note that the frequency of "distraction" mistakes for PERSON and CARDINAL are higher, while lower for ORG, relative to the base frequencies in Figure 5(a), an insight that may warrant further investigation.")

[Image: media_image5.png, reproducing Wu, Fig. 5]
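To make the histogram limitation concrete, here is a small, hypothetical sketch that bins a confidence measure and counts correct and incorrect classifications per bin, i.e., the data behind a two-bar-per-bin histogram. It is not Errudite's implementation and not the applicant's claimed interface; the function name, bin count, and sample values are assumptions for illustration.

```python
# Hypothetical sketch: count correct vs. incorrect classifications per
# confidence bin (one pair of bars per bin). Not from the cited references.
import numpy as np

def binned_counts(confidence: np.ndarray, is_correct: np.ndarray, n_bins: int = 5):
    """Return (bin_edges, correct_counts, incorrect_counts) over [0, 1]."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    correct_counts = np.histogram(confidence[is_correct], bins=edges)[0]
    incorrect_counts = np.histogram(confidence[~is_correct], bins=edges)[0]
    return edges, correct_counts, incorrect_counts

confidence = np.array([0.95, 0.91, 0.62, 0.58, 0.33, 0.97])
is_correct = np.array([True, True, False, True, False, True])

edges, correct, incorrect = binned_counts(confidence, is_correct)
for lo, hi, c, i in zip(edges[:-1], edges[1:], correct, incorrect):
    print(f"[{lo:.1f}, {hi:.1f})  correct={c}  incorrect={i}")
# Each printed row corresponds to one histogram bin with its two bar heights.
```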
Wu teaches provide, to a client device, a representation of the graphical user interface. (Wu, Section 4, "these can be loaded into the central component via mouse click [a representation of the graphical user interface]. All groups and rewrite rules can be saved and loaded through the interface, so the analysis can be easily shared and reproduced [provide, to a client device].")

Jayaraman and Yuan and Jin and Wu are all considered to be analogous to the claimed invention because they are in the same field of NLP. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Jayaraman and Yuan and Jin to incorporate the teachings of Wu in order to provide a robust and customizable graphical interface to users to perform high quality and reproducible error analysis. (Wu, Abstract, "Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions. This paper codifies model and task agnostic principles for informative error analysis, and presents Errudite, an interactive tool for better supporting this process. First, error groups should be precisely defined for reproducibility; Errudite supports this with an expressive domain specific language. Second, to avoid spurious conclusions, a large set of instances should be analyzed, including both positive and negative examples; Errudite enables systematic grouping of relevant instances with filtering queries. Third, hypotheses about the cause of errors should be explicitly tested; Errudite supports this via automated counterfactual rewriting. We validate our approach with a user study, finding that Errudite (1) enables users to perform high quality and reproducible error analyses with less effort, (2) reveals substantial ambiguities in prior published error analyses practices, and (3) enhances the error analysis experience by allowing users to test and revise prior beliefs.")

In regards to claim 12, Jayaraman and Yuan and Jin and Wu teach The system of claim 11. Yuan teaches wherein the graphical user interface also includes [two sets of] aggregated word clouds, one for tokens in the units of text that were most salient [for correct predictions of category labels, and another for] tokens in the units of text that were most salient [for incorrect predictions of category labels]. (Yuan, "Figure 2: The visualization interpretation result [aggregated word cloud] for the first example for the MR dataset. The middle part of the figure shows the contribution of different spatial locations in hidden layers, where red color means highest contribution to the final decision; blue color refers to the second highest contribution; and green means the third highest contribution [tokens in the units of text that were most salient]. The bounding boxes in different colors correspond to the receptive field of different spatial location. The top part shows the t-SNE visualization of the interpretation obtained by our approach. The interpretations of target spatial locations are marked as "targetword" and connected to the corresponding spatial locations by dash lines.")

[Image: media_image6.png, reproducing Yuan, Figure 2]

However, Yuan does not explicitly teach wherein the graphical user interface also includes two sets of [aggregated word clouds] visualizations, one [for tokens in the units of text that were most salient] for correct predictions of category labels, and another [for tokens in the units of text that were most salient] for incorrect predictions of category labels.
Wu teaches wherein the graphical user interface also includes two sets of [aggregated word clouds] visualizations, one [for tokens in the units of text that were most salient] for correct predictions of category labels, and another [for tokens in the units of text that were most salient] for incorrect predictions of category labels, i.e., a graphical user interface with a filter panel that can filter data groups based on given code such that there can be a data group for correct predictions and a data group for incorrect predictions. (Wu, Fig. 1 teaches the Errudite interface, i.e., a graphical user interface; Section 4, "The UI (Figure 1) contains three main components. The central component contains a filter panel (C) and an instance browser (D), which help examine the results of data groupings or rewrite rules for iterative refinement. The collapsible sidebar on the left contains a list of different models being analyzed with summary statistics (A) and customizable attribute histograms (B). The one on the right contains a list of saved data groups (E) and rewrite rules (F); these can be loaded into the central component via mouse click. All groups and rewrite rules can be saved and loaded through the interface, so the analysis can be easily shared and reproduced") (Wu, Section 3.1, "Finally, extractors are composable through standard logical and numerical operators, serving as building blocks for more complex attributes. For example, to create a boolean attribute that checks if the ground truth span contains an entity, the != operator is used, yielding ENT(g)!="". A more complex example is counting the number of times the ground truth entity appears in the passage context: count(token(c, pattern=ENT(g))). Being reusable and composable makes extractors much more expressive than predefined attributes, and helps formulate much richer hypotheses.") (Wu, Section 3.2, "[query listing reproduced as media_image7.png] The query can be broken down into the following conditions: the ground truth is an entity (line 1); there are potential distractors – i.e., there are more tokens matching the ground truth entity type (ENT(g)) in the whole context than in the ground truth (lines 2-3); the prediction entity type matches the ground truth one (line 4) [for correct predictions of category labels, and another for incorrect predictions of category labels; wherein the filter panel (C) takes code and can be customized to generate visualizations based on whether the prediction is correct/incorrect (emphasis on Fig. 1 (C) wherein a user is able to press Record the Group after inputting specific code in the Filter CMD)]; and the prediction is incorrect (line 5). Starting from all instances, we can subset groups by applying these conditions successively in order. Errudite conveys useful statistics about the groups via visualizations, as in Figure 3.")

[Image: media_image8.png]
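For added context on the claim 12 limitation, the following hypothetical sketch aggregates salient tokens into two frequency tables, one over correctly classified units of text and one over incorrectly classified ones; each table could then be rendered as a separate word cloud. The records, field names, and tokens are invented for illustration and are not taken from Yuan, Wu, or the application.

```python
# Hypothetical illustration: build the frequency data behind two aggregated
# word clouds, split by whether the prediction was correct. Made-up data.
from collections import Counter

records = [
    {"salient_tokens": ["refund", "charge"], "correct": True},
    {"salient_tokens": ["refund", "cancel"], "correct": False},
    {"salient_tokens": ["login", "password"], "correct": True},
    {"salient_tokens": ["cancel", "charge"], "correct": False},
]

correct_cloud, incorrect_cloud = Counter(), Counter()
for rec in records:
    target = correct_cloud if rec["correct"] else incorrect_cloud
    target.update(rec["salient_tokens"])

print(correct_cloud.most_common())    # tokens most salient in correct predictions
print(incorrect_cloud.most_common())  # tokens most salient in incorrect predictions
# Either Counter could be drawn as a word cloud, for example with
# wordcloud.WordCloud().generate_from_frequencies(dict(incorrect_cloud)).
```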
In regards to claim 13, Jayaraman and Yuan and Jin and Wu teach The system of claim 11. Wu teaches wherein the graphical user interface also includes a filter menu that allows filtering of the histogram based on the categories as labelled or the categories as predicted. (Wu, Fig. 1 teaches the Errudite interface, i.e., a graphical user interface; Section 4, "The UI (Figure 1) contains three main components. The central component contains a filter panel (C) [a filter menu that allows filtering of the histogram] and an instance browser (D), which help examine the results of data groupings or rewrite rules for iterative refinement. The collapsible sidebar on the left contains a list of different models being analyzed with summary statistics (A) and customizable attribute histograms (B). The one on the right contains a list of saved data groups (E) and rewrite rules (F); these can be loaded into the central component via mouse click. All groups and rewrite rules can be saved and loaded through the interface, so the analysis can be easily shared and reproduced") (Wu, Section 3.1, "Finally, extractors are composable through standard logical and numerical operators, serving as building blocks for more complex attributes. For example, to create a boolean attribute that checks if the ground truth span contains an entity, the != operator is used, yielding ENT(g)!="". A more complex example is counting the number of times the ground truth entity appears in the passage context: count(token(c, pattern=ENT(g))). Being reusable and composable makes extractors much more expressive than predefined attributes, and helps formulate much richer hypotheses.") (Wu, Section 3.2, "[query listing reproduced as media_image7.png] The query can be broken down into the following conditions: the ground truth is an entity (line 1); there are potential distractors – i.e., there are more tokens matching the ground truth entity type (ENT(g)) in the whole context than in the ground truth (lines 2-3); the prediction entity type matches the ground truth one (line 4) [based on the categories as labelled or the categories as predicted; wherein the filter panel (C) takes code and can be customized to generate visualizations (histograms)]; and the prediction is incorrect (line 5). Starting from all instances, we can subset groups by applying these conditions successively in order. Errudite conveys useful statistics about the groups via visualizations, as in Figure 3.")

[Image: media_image8.png]

Claim 18 is rejected on the same grounds under 35 U.S.C. 103 as it is substantially similar to claim 11. Claim 19 is rejected on the same grounds under 35 U.S.C. 103 as it is substantially similar to claim 12.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US Pub No. US20210150379A1 (Wang et al.) teaches systems and methods for alerting to model degradation based on distribution analysis. US Pub No. US20210173736A1 (Duesterwald et al.) teaches detection of failure conditions and restoration of deployed models in a computing environment.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASMINE THAI, whose telephone number is (703) 756-5904. The examiner can normally be reached M-F 8-4. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Michael Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.T.T./
Examiner, Art Unit 2129

/MICHAEL J HUNTLEY/
Supervisory Patent Examiner, Art Unit 2129

Prosecution Timeline

Oct 13, 2021
Application Filed
Mar 10, 2025
Non-Final Rejection — §103, §112
Jul 09, 2025
Response Filed
Aug 10, 2025
Final Rejection — §103, §112
Nov 03, 2025
Examiner Interview Summary
Nov 03, 2025
Applicant Interview (Telephonic)
Nov 03, 2025
Request for Continued Examination
Nov 07, 2025
Response after Non-Final Action
Feb 08, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561603
SYSTEM FOR TIME BASED MONITORING AND IMPROVED INTEGRITY OF MACHINE LEARNING MODEL INPUT DATA
2y 5m to grant Granted Feb 24, 2026
Patent 12555000
GENERATION OF CONVERSATIONAL TASK COMPLETION STRUCTURE
2y 5m to grant Granted Feb 17, 2026
Patent 12462154
METHOD AND SYSTEM FOR ASPECT-LEVEL SENTIMENT CLASSIFICATION BY MERGING GRAPHS
2y 5m to grant Granted Nov 04, 2025
Patent 12395590
REDUCTION AND GEO-SPATIAL DISTRIBUTION OF TRAINING DATA FOR GEOLOCATION PREDICTION USING MACHINE LEARNING
2y 5m to grant Granted Aug 19, 2025
Patent 12380361
Federated Machine Learning Management
2y 5m to grant Granted Aug 05, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

3-4
Expected OA Rounds
25%
Grant Probability
81%
With Interview (+56.3%)
4y 0m
Median Time to Grant
High
PTA Risk
Based on 24 resolved cases by this examiner. Grant probability derived from career allow rate.
