Last updated: May 29, 2026

Application No. 18/066,321

SYSTEMS AND METHODS FOR LABEL GENERATION FOR UNLABELLED MACHINE LEARNING MODEL TRAINING DATA

Non-Final OA §101

Filed

Dec 15, 2022

Examiner

DORVIL, RICHEMOND

Art Unit

2658

Tech Center

2600 — Communications

Assignee

Capital One Services LLC

OA Round

3 (Non-Final)

Interview Optional

+22.5% interview lift. An interview has already been conducted in this application's prosecution history. This examiner has a 28% grant rate. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.

Based on 53 resolved cases, 2023–2026

Examiner Intelligence

DORVIL, RICHEMOND

Grants only 28% of cases

Career Allowance Rate

15 granted / 53 resolved

-33.7% vs TC avg

Strong +22.5% interview lift

Interview Lift: +22.5% (resolved cases with interview vs. without)

Typical timeline

3y 6m

Avg Prosecution

3 currently pending

Career history

Total Applications

across all art units

Statute-Specific Performance

§101: 3.4% (-36.6% vs TC avg)

§103: 90.4% (+50.4% vs TC avg)

§102: 3.4% (-36.6% vs TC avg)

§112: 0.7% (-39.3% vs TC avg)

Based on career data from 53 resolved cases

Office Action

§101

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 06/11/2025 have been fully considered but they are not persuasive. 
Applicant’s response to the 101 rejection (abstract idea) argued that “the claims integrate any alleged abstract idea into a practical application and therefore are patent eligible” (see pages 12 & 13 of applicant’s remarks, “The Step 2A, prong two, rejections”). The examiner respectfully disagrees because the applicant relied only on sections of the specification not related to any of the language recited in the claims. In addition to the identified abstract idea (see 101 rejection below), the claimed invention only includes one or more processors and a computer-readable medium, which amount to no more than mere instructions to implement an otherwise abstract idea using generic computer components. The claimed invention (see independent claims 1, 2 and 16) does not include limitations related to a practical application remaining after the abstract idea has been extracted, nor does the claimed invention relate to the improvement of a computer as a tool. The use of generic computer components only serves to use the computer as a tool for executing an otherwise abstract idea under the BRI.
It should also be noted that the features in the specification upon which applicant relies (¶137, ¶1151) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). While the examiner is required to consult the specification to determine whether the disclosed invention improves technology or a technical field, the claims should also be evaluated to ensure they reflect the disclosed improvement. In this case, the claimed invention does not reflect the technical improvement described in the disclosure. The claims are deemed ineligible.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
The independent Claim 1 recites “a system for generating recommendations for unlabeled chatbot data for use in an artificial neural network model using natural language processing, the system comprising: one or more processors; and a non-transitory, computer-readable medium comprising instructions that, when executed by the one or more processors cause operations comprising: 
receiving, at a device on a computer network, first text data for a first unlabeled training datum, corresponding to chatbot messages from a chatbot user, in a first training dataset for a machine learning model, wherein the first text data comprises language-based data with syntax information; 
retrieving a plurality of textual datasets corresponding to a plurality of label records from a label record database, wherein the plurality of label records comprises a plurality of dataset identifiers and corresponding labels, and wherein the plurality of textual datasets comprises chatbot text data previously processed through machine learning models; 
generating a vector representation of the first text data and a plurality of vector representations of the plurality of textual datasets for use in a natural language processing model, wherein the vector representation preserves syntax; 
in response to inputting the vector representation and the plurality of vector representations into a natural language processing-based neural network model, determining a plurality of average similarity metrics between the first text data and each textual dataset in the plurality of textual datasets; 
determining that a first dataset of the plurality of textual datasets has a highest average similarity metric of the plurality of average similarity metrics, wherein the plurality of average similarity metrics measures lexical and syntactic similarity between text; 
based on determining that the first dataset of the plurality of textual datasets has the highest average similarity metric of the plurality of average similarity metrics, determining a label record for the first dataset, wherein the label record comprises a label name, a modification timestamp, and a first dataset identifier; 
generating a first recommendation, for display on a user interface, for a first label for the first text data based on the label record for the first dataset; 
generating a first feature input for the machine learning model based on the first label and the first unlabeled training datum; and 
generating a first output for the machine learning model based on the first label and the first unlabeled training datum, wherein the first output comprises sentiment analysis relating to satisfaction of the chatbot user.”
The limitations of “receiving…”, “retrieving…”, “generating a vector representation…”, “determining a plurality of average similarity metrics between the first text data and each textual dataset in the plurality of textual datasets”, “determining that a first dataset…”, “based on determining…”, “generating a first recommendation…”, “generating a first feature input…”, and “generating a first output…” as drafted cover mental activities which can be performed by a human using a pen and paper. For example, a human could: receive a piece of paper with text data on it; retrieve multiple boxes with descriptive labels, each containing multiple pieces of paper with text data; write down a vector representation of the text data from each paper in a way that preserves syntax, e.g., the text “Hello, world” could become the vector <0, 235>, where each unique word is assigned a unique number; perform an inner product to determine cosine similarity between two vectors; repeat this process between the received text data and every text data in all of the boxes; find the average cosine similarity for each box; read the label of the box with the highest average similarity; recommend that the received text data be labeled with the same label as the box; combine the received text data and predicted label into a vector, e.g., <“Hello, world”, Greeting>; and create a prediction of user satisfaction for the predicted label of the received text data, e.g., <Satisfied>.
This judicial exception is not integrated into a practical application. Claim 1 recites the additional elements of “one or more processors”, “a non-transitory, computer-readable medium”, “in response to inputting the vector representation and the plurality of vector representations into a natural language processing-based neural network model”, and “display on a user interface”. These additional elements do not integrate the abstract idea into a practical application because they are all considered mere instructions to apply the abstract idea on a generic computer. Regarding the “natural language processing-based neural network model”, there is no description of how the model performs “determining a plurality of average similarity metrics…”. According to MPEP 2106.05(f): “The recitation of claim limitations that attempt to cover any solution to an identified problem with no restriction on how the result is accomplished and no description of the mechanism for accomplishing the result, does not integrate a judicial exception into a practical application or provide significantly more because this type of recitation is equivalent to the words "apply it".”
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements using a general computer are noted. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
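The pen-and-paper procedure the examiner describes for Claim 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the applicant's claimed model or the examiner's exact method: the function names (`to_vector`, `cosine`, `recommend_label`) and the sample boxes are hypothetical, and a simple word-count vector is swapped in for the examiner's word-to-number example, so word order (syntax) is not preserved here.

```python
import math
from collections import Counter


def to_vector(text, vocab):
    """Count how often each vocabulary word appears in the text (assumed encoding)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def cosine(u, v):
    """Cosine similarity: inner product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_label(text, boxes):
    """boxes maps a label to the list of texts stored under that label."""
    # Build one shared vocabulary from the received text and every stored text.
    words = set(text.lower().split())
    for texts in boxes.values():
        for stored in texts:
            words.update(stored.lower().split())
    vocab = sorted(words)

    query = to_vector(text, vocab)
    # Average cosine similarity between the received text and each box.
    averages = {
        label: sum(cosine(query, to_vector(t, vocab)) for t in texts) / len(texts)
        for label, texts in boxes.items()
        if texts
    }
    # Recommend the label of the box with the highest average similarity.
    best = max(averages, key=averages.get)
    return best, averages


if __name__ == "__main__":
    sample_boxes = {  # hypothetical data for illustration only
        "Greeting": ["hello there", "hello world"],
        "Complaint": ["this is broken", "not working at all"],
    }
    label, scores = recommend_label("hello world", sample_boxes)
    print(label, scores)
```

Each step maps to one item in the examiner's list: vectorize, take inner products, average per box, and read off the label of the best box.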

	The independent Claim 2 recites “a method for generating recommendations for unlabeled data using natural language processing, the method comprising: 
receiving, at a device on a computer network, first text data for a first unlabeled training datum in a first training dataset for a machine learning model; 
retrieving a plurality of textual datasets corresponding to a plurality of label records from a label record database; 
determining a first dataset that corresponds to a first label record within the plurality of label records; 
determining a first plurality of textual data corresponding to the first dataset; 
comparing the first text data and the first plurality of textual data to determine a first plurality of similarity metrics between the first text data and respective textual data in the first plurality of textual data; 
based on the first plurality of similarity metrics, determining the first label record for the first dataset; and 
generating a first recommendation for a first label for the first text data based on the first label record.”
The limitations of “receiving…”, “retrieving…”, “determining a first dataset…”, “determining a first plurality…”, “comparing the first text data…”, “based on the first plurality…”, and “generating a first recommendation…” as drafted cover mental activities which can be performed by a human using a pen and paper. For example, a human could: receive a piece of paper with text data on it; retrieve multiple boxes with descriptive labels, each containing multiple pieces of paper with text data; pick a first box; take multiple papers out of the first box; compare the received text data to each paper and assign each paper a score based on perceived similarity; based on the scores, determine whether to read the label of the first box; and recommend that the received text data be labeled with the same label as the first box.
This judicial exception is not integrated into a practical application. Claim 2 does not recite any additional elements.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

Claims 3-15 are rejected as being directed to an abstract idea without significantly more under a similar rationale as Claim 2. These claims depend from Claim 2 and inherit its limitations.
Claim 3 discusses evaluating the similarity scores for a second dataset and suggesting a label based on the first dataset (the human could perform the same steps for a second box and decide to suggest the label of the first box).
Claim 4 discusses evaluating the similarity scores for a second dataset and suggesting a label based on the second dataset (the human could perform the same steps for a second box and decide to suggest the label of the second box).
Claim 5 discusses evaluating the similarity scores for a second dataset, comparing the scores of the second dataset to those of the first dataset, and suggesting a label based on that comparison (the human could perform the same steps for a second box, compare the results to the results of the first box, and suggest the label of the more similar box).
Claim 6 discusses comparing the average of the first similarity metrics to the average of the second similarity metrics (the human compares the average cosine similarities of the first and second box and determines which average is greater).
Claim 7 discusses retrieving text-based datasets based on dataset identifiers from label records found in a label record database (the human could have a dataset spreadsheet with box ID numbers and information regarding the type of information contained in the boxes that they use to determine which boxes they can select).
Claim 8 discusses updating the received text data with real-time data and repeating the labeling procedure on the updated text data (the human could be handed another piece of paper with additional text on it and recalculate all the cosine similarities).
Claim 9 discusses using a vector representation of the text data and inputting the vector representations of the received text data and the plurality of text data in each dataset into a natural language processing model which determines a similarity metric (the human can create a vector representation: “Hello, world” becomes <0, 235> and calculate cosine similarities, which can be considered distance metrics). Regarding the “natural language processing model”, there is no description of how the model performs “determining the first plurality of similarity metrics…”. According to MPEP 2106.05(f): “The recitation of claim limitations that attempt to cover any solution to an identified problem with no restriction on how the result is accomplished and no description of the mechanism for accomplishing the result, does not integrate a judicial exception into a practical application or provide significantly more because this type of recitation is equivalent to the words "apply it".”
Claim 10 discusses using a vector representation of the text data and performing an inner product to measure vector similarity (the human can calculate the inner product manually).
Claim 11 discusses using a natural language processing model to calculate text distances between the received text data and the text data stored in the datasets (the human can calculate cosine similarities, which can be considered distance values). There is no description of how the model performs “determining the first plurality of similarity metrics…”. According to MPEP 2106.05(f): “The recitation of claim limitations that attempt to cover any solution to an identified problem with no restriction on how the result is accomplished and no description of the mechanism for accomplishing the result, does not integrate a judicial exception into a practical application or provide significantly more because this type of recitation is equivalent to the words "apply it".”
Claim 12 discusses generating a machine learning model input from the unlabeled training datum and its new label and a machine learning model output based on the datum and its label (the human can generate an input of the form <“Hello, world”, Greeting> and an output <Satisfied>).
Claim 13 discusses generating a label for the first text data based on a comparison between model error indicators, which are present in their corresponding label records, for two datasets (the human can look at two box labels to see a parameter equivalent to the model error indicator and compare the two values to influence their label suggestion).
Claim 14 discusses generating a new label record for the first text data and its label (the human can update the dataset spreadsheet by writing in an identifier for the text data and its label).
Claim 15 discusses adding a new dataset to the system and reclassifying an already labeled text (a new box is brought in, and the human retrieves a classified text data and performs all the comparison steps again for the new box).

The independent Claim 16 recites “a non-transitory, computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: 
receiving, at a device on a computer network, first text data for a first unlabeled training datum in a first training dataset for a machine learning model; 
retrieving a plurality of textual datasets corresponding to a plurality of label records from a label record database; 
determining a first dataset that corresponds to a first label record within the plurality of label records; 
determining a first plurality of textual data corresponding to the first dataset; 
comparing the first text data and the first plurality of textual data to determine a first plurality of similarity metrics between the first text data and respective textual data in the first plurality of textual data; 
based on the first plurality of similarity metrics, determining the first label record for the first dataset; and 
generating a first recommendation for a first label for the first text data based on the first label record.”
The limitations of “receiving…”, “retrieving…”, “determining a first dataset…”, “determining a first plurality…”, “comparing the first text data…”, “based on the first plurality…”, and “generating a first recommendation…” as drafted cover mental activities which can be performed by a human using a pen and paper. For example, a human could: receive a piece of paper with text data on it; retrieve multiple boxes with descriptive labels, each containing multiple pieces of paper with text data; pick a first box; take multiple papers out of the first box; compare the received text data to each paper and assign each paper a score based on perceived similarity; based on the scores, determine whether to read the label of the first box; and recommend that the received text data be labeled with the same label as the first box.
This judicial exception is not integrated into a practical application. Claim 16 does not recite any additional elements.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.

Claims 17-20 are rejected as being directed to an abstract idea without significantly more under a similar rationale as Claim 16. These claims depend from Claim 16 and inherit its limitations.
Claim 17 discusses evaluating the similarity scores for a second dataset and suggesting a label based on the first dataset (the human could perform the same steps for a second box and decide to suggest the label of the first box).
Claim 18 discusses evaluating the similarity scores for a second dataset and suggesting a label based on the second dataset (the human could perform the same steps for a second box and decide to suggest the label of the second box).
Claim 19 discusses updating the received text data with real-time data and repeating the labeling procedure on the updated text data (the human could be handed another piece of paper with additional text on it and recalculate all the cosine similarities).
Claim 20 discusses generating a machine learning model input from the unlabeled training datum and its new label and a machine learning model output based on the datum and its label (the human can generate an input of the form <“Hello, world”, Greeting> and an output <Satisfied>).

Allowable Subject Matter
Claims 1-20 would be allowable if rewritten to overcome their respective 101 rejections.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.
He et al. US 11416904 B1 teaches “a method for machine learning-based account manager virtual assistant staging includes receiving a message and a classification, generating a staging record, generating a status using staging rules, generating an order when the message classification is order and the status is complete, and transmitting the order. An account manager virtual assistant staging system includes a processor and a memory storing instructions that cause the system to receive a message and a classification, generate a staging record, generate a status using staging rules, generate an order when the message classification is order and the status is complete, and transmit the order. A non-transitory computer readable medium contains program instructions that when executed, cause a computer to receive a message and a classification, generate a staging record, generate a status using staging rules, generate an order when the message classification is order and the status is complete, and transmit the order.”
Velagapudi et al. US 20210232911 A1 teaches “Techniques performed by a data processing system for analyzing training data for a machine learning model and identifying outliers in the training data herein include obtaining training data for the model from a memory of the data processing system; analyzing the training data using a Siamese Neural Network to determine within-label similarities and cross-label similarities associated with a plurality of data elements within the training data, the within-label representing similarities between a respective data element and a first set of data elements similarly labeled in the training data, the cross-label similarities representing similarities between the respective data element and a second set of data elements dissimilarly labeled in the training data; identifying outlier data elements in the plurality of data elements based on the within-label and cross-label similarities; and processing the training data comprising the outlier data elements. Processing may include deleting the outlier data elements or generating a report.”

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to RICHEMOND DORVIL whose telephone number is (571)272-7602. The examiner can normally be reached 8:30 - 5:30 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658


Prosecution Timeline


Jun 11, 2025

Response Filed

Sep 18, 2025

Final Rejection mailed — §101

Oct 07, 2025

Applicant Interview (Telephonic)

Oct 07, 2025

Examiner Interview Summary

Nov 14, 2025

Response after Non-Final Action

Dec 19, 2025

Request for Continued Examination

Jan 16, 2026

Response after Non-Final Action

May 26, 2026

Non-Final Rejection mailed — §101 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/173,402

Patent 12591738

Autocorrect Candidate Selection

3y 1m to grant Granted Mar 31, 2026

18/461,095

Patent 12573397

ELECTRONIC APPARATUS AND CONTROLLING METHOD THEREOF

2y 6m to grant Granted Mar 10, 2026

18/301,064

Patent 12567401

EVALUATING RELIABILITY OF AUDIO DATA FOR USE IN SPEECH PROCESSING

2y 10m to grant Granted Mar 03, 2026

18/447,506

Patent 12547849

ABSTRACTIVE SUMMARIZATION OF INFORMATION TECHNOLOGY ISSUES USING A METHOD OF GENERATING COMPARATIVES

2y 6m to grant Granted Feb 10, 2026

18/005,801

Patent 12505853

SIGNAL PROCESSING DEVICE AND METHOD

2y 11m to grant Granted Dec 23, 2025

Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4

Expected OA Rounds

28%

Grant Probability

51%

With Interview (+22.5%)

3y 6m (~0m remaining)

Median Time to Grant

High

PTA Risk

Based on 53 resolved cases by this examiner. Grant probability derived from career allowance rate.
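
The projection figures above appear to follow from simple arithmetic on the career counts. The sketch below reproduces them under the assumption that the interview lift is added to the allowance rate in percentage points; the page does not state its exact formula, and the function name `projected_grant_rates` is hypothetical.

```python
def projected_grant_rates(granted, resolved, lift_points):
    """Return (base rate, rate with interview) as fractions.

    Assumes the lift is additive in percentage points, which is an
    interpretation of the page's figures, not a documented formula.
    """
    base = granted / resolved
    with_interview = base + lift_points / 100
    return base, with_interview


if __name__ == "__main__":
    base, with_interview = projected_grant_rates(15, 53, 22.5)
    print(f"Grant probability: {base:.0%}")          # 15 / 53 rounds to 28%
    print(f"With interview: {with_interview:.0%}")   # 28.3% + 22.5% rounds to 51%
```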