Prosecution Insights
Last updated: April 19, 2026
Application No. 18/748,865

METHOD AND SYSTEM FOR TRAINING A NEURAL LANGUAGE-BASED MODEL FOR DATA ANNOTATION

Non-Final OA — §101, §102
Filed
Jun 20, 2024
Examiner
KIM, ETHAN DANIEL
Art Unit
2658
Tech Center
2600 — Communications
Assignee
JPMorgan Chase Bank, N.A.
OA Round
1 (Non-Final)
78%
Grant Probability
Favorable
1-2
OA Rounds
2y 11m
To Grant
99%
With Interview

Examiner Intelligence

Grants 78% — above average
78%
Career Allow Rate
83 granted / 107 resolved
+15.6% vs TC avg
Strong +30% interview lift
+29.5%
Interview Lift
resolved cases with interview
Typical timeline
2y 11m
Avg Prosecution
13 currently pending
Career history
120
Total Applications
across all art units

Statute-Specific Performance

§101
7.4%
-32.6% vs TC avg
§103
48.0%
+8.0% vs TC avg
§102
38.1%
-1.9% vs TC avg
§112
1.7%
-38.3% vs TC avg
Black line = Tech Center average estimate • Based on career data from 107 resolved cases

Office Action

§101 §102
Notice of Pre-AIA or AIA Status

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

2. 35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

3. Claims 1-6 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Independent claim 1 recites “A method for training a neural language-based model for data annotation, the method comprising: receiving, by at least one processor via a communication interface, a first set of data from a plurality of sources, each item of the first set of data being associated with a pre-defined data class; generating, by the at least one processor, at least one category vocabulary for the pre-defined data class; identifying, by the at least one processor, at least one token in the first set of data based on an analysis of the first set of data, wherein the at least one token corresponds to a category indicator of the pre-defined data class; masking, by the at least one processor, the at least one token; feeding, by the at least one processor, the masked at least one token together with a corresponding contextual vector to the neural language-based model; and predicting, by the at least one processor using the neural language-based model, a class of the masked at least one token using the corresponding contextual vector”.
The limitations “receiving, by at least one processor via a communication interface, a first set of data from a plurality of sources, each item of the first set of data being associated with a pre-defined data class; generating, by the at least one processor, at least one category vocabulary for the pre-defined data class; identifying, by the at least one processor, at least one token in the first set of data based on an analysis of the first set of data, wherein the at least one token corresponds to a category indicator of the pre-defined data class; masking, by the at least one processor, the at least one token; feeding, by the at least one processor, the masked at least one token together with a corresponding contextual vector to the neural language-based model; and predicting, by the at least one processor using the neural language-based model, a class of the masked at least one token using the corresponding contextual vector”, as drafted, cover a mental process, as these steps could be performed mentally or by hand with pen and paper. This judicial exception is not integrated into a practical application. Claim 1 recites “A method for training a neural language-based model for data annotation, the method comprising:…”; this limitation merely directs that a computer be used for the method and does not impose any meaningful limits on practicing the abstract idea. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The generic computer components recited above with regard to claim 1 do not amount to more than mere instructions to apply the exception using a generic computer component, and mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Claim 1 does not recite any additional limitations. The claim, as drafted, is not patent eligible.

Claim Rejections - 35 USC § 102

4.
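For orientation, the claimed pipeline (build a category vocabulary, mask category-indicator tokens, and predict the masked token's class from its context) can be sketched in miniature. All helper names below are invented for illustration, and a simple co-occurrence counter stands in for the neural language-based model; this is a sketch of the idea, not the applicant's implementation.

```python
# Toy sketch of the claimed flow: build a category vocabulary, mask the
# category-indicator tokens, and predict the masked token's class from its
# context. All names are hypothetical, and a count-based voter stands in
# for the neural language-based model.
from collections import defaultdict

MASK = "[MASK]"

def mask_tokens(text, indicators):
    """Replace every token that matches a known category indicator."""
    return [MASK if tok in indicators else tok for tok in text.split()]

def train(records, category_vocab):
    """'Train' by counting which context words co-occur with each class."""
    indicators = set().union(*category_vocab.values())
    votes = defaultdict(lambda: defaultdict(int))
    for text, data_class in records:
        for tok in mask_tokens(text, indicators):
            if tok != MASK:                      # only context words vote
                votes[tok][data_class] += 1
    return votes, indicators

def predict(text, votes, indicators):
    """Predict the class of the masked token(s) from surrounding context."""
    scores = defaultdict(int)
    for tok in mask_tokens(text, indicators):
        if tok != MASK:
            for data_class, n in votes[tok].items():
                scores[data_class] += n
    return max(scores, key=scores.get)

category_vocab = {
    "positive": {"delicious", "flavorful", "enjoyable"},
    "negative": {"bland", "disgusting", "terrible"},
}
records = [
    ("the soup was delicious and flavorful", "positive"),
    ("the bread was bland and disgusting", "negative"),
]
votes, indicators = train(records, category_vocab)
print(predict("the soup was delicious", votes, indicators))  # prints: positive
```

The point of the sketch is structural: the class of a masked indicator is recovered from its unmasked context, which is the behavior the §101 rejection characterizes as performable mentally.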
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

5. Claims 1-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Reza (U.S. Publication No. 20230237277).

Regarding claim 1, Reza discloses a method for training a neural language-based model for data annotation, the method comprising: receiving, by at least one processor via a communication interface, a first set of data from a plurality of sources, each item of the first set of data being associated with a pre-defined data class ([0036] - The labels may be provided by a user (e.g., a customer) and may be particular to a domain that the user intends to train the model within. For example, the text labels may be words such as terrible, bland, flavorful, delicious, disgusting, sour, sweet, poison, enjoyable, spicy, etc. that relate to various semantic classes (e.g., positive, negative, neutral, or the like) to be predicted for each text example within the domain of food); generating, by the at least one processor, at least one category vocabulary for the pre-defined data class ([0038] - a prompt template 115 is generated for each relevant feature (a subset of features) selected from the extracted relevant features 110. Each dynamic prompt template 115 comprises a prompt for a text example from the training data, a relevant feature 110 extracted from the text example, and a blank or open field. The prompts are in natural language and are composed of discrete tokens from the vocabulary); identifying, by the at least one processor, at least one token in the first set of data based on an analysis of the first set of data, wherein the at least one token corresponds to a category indicator of the pre-defined data class ([0038] - The prompts are in natural language and are composed of discrete tokens from the vocabulary); masking, by the at least one processor, the at least one token ([0041] - The labels within the prompting templates are then masked with a masking token); feeding, by the at least one processor, the masked at least one token together with a corresponding contextual vector to the neural language-based model ([0041] - The prompting functions 120 with the masked prompting templates are then input into the model; [0046] - The training techniques may depend on the type of model that is being trained. For example, there are different types of supervised learning models, such as different types of neural network models, support vector machine (SVM) models, and others); and predicting, by the at least one processor using the neural language-based model, a class of the masked at least one token using the corresponding contextual vector ([0064] - The model learns statistical properties of word sequences and linguistic patterns given the words in the prompting functions (i.e., the words in the text example and the prompting template) and uses those properties and patterns to predict text for the masked labels. The conditional probability of predicting the text for the mask labels provided the set of prompting functions is evaluated and combined together to predict a joint probability value for the solution such as a class (e.g., a sentiment class)).
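The Reza passages cited above ([0038], [0041]) describe composing natural-language prompt templates whose label slot is then masked before the prompting function is fed to the model. A minimal illustration of that pattern, with invented names and template wording not taken from the reference, might look like:

```python
# Hypothetical illustration of the masked prompt-template pattern described
# in the cited Reza passages ([0038], [0041]); the template wording and the
# function name are invented here, not taken from the reference.
MASK = "[MASK]"

def make_prompt(text_example, relevant_feature):
    """Compose a natural-language prompt whose label slot is left masked."""
    return f"{text_example} The {relevant_feature} was {MASK}."

prompt = make_prompt("I loved this restaurant.", "food")
print(prompt)  # prints: I loved this restaurant. The food was [MASK].
```

In a prompt-based setup of this kind, the model's prediction for the masked slot (e.g., "delicious" vs. "terrible") is what gets mapped back to a semantic class.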
Regarding claim 2, Reza discloses the method, wherein the identifying of the at least one token in the first set of data comprises: identifying, by the at least one processor, contextually similar words in the first set of data that represent the pre-defined data class ([0036] - The labels may be provided by a user (e.g., a customer) and may be particular to a domain that the user intends to train the model within. For example, the text labels may be words such as terrible, bland, flavorful, delicious, disgusting, sour, sweet, poison, enjoyable, spicy, etc. that relate to various semantic classes (e.g., positive, negative, neutral, or the like) to be predicted for each text example within the domain of food); retrieving, by the at least one processor from a word repository, at least one replacement word for the contextually similar words ([0057] - This paraphrasing can be done in a number of ways, including using round-trip translation of the prompt into another language then back, using replacement of phrases from a thesaurus, or using a neural prompt rewriter specifically optimized to improve accuracy of systems using the prompt); checking, by the at least one processor, an occurrence of the at least one replacement word in the at least one category vocabulary ([0056] - The prompt mining approach is a mining-based method to automatically find templates given a set of training inputs x and outputs y. This method scrapes a large text corpus (e.g., Wikipedia) for strings containing x and y, and finds either the middle words or dependency paths between the inputs and outputs. Frequent middle words or dependency paths can serve as a template as in “[X] middle words [Z].” The middle words may be searched based on the extracted features or a middle word may be replaced with an extracted feature); and tagging, by the at least one processor, the contextually similar words as the at least one token in an event that the occurrence of the at least one replacement word in the at least one category vocabulary exceeds a threshold number ([0076] - modifying the model parameters with the goal being to minimize the difference between the text labels and the text predicted for the mask tokens and minimize the difference between the specified solution label and the joint probability value predicted for the solution. The training may be performed iteratively (steps a-h) for each prompting function and/or until a specified condition is met, e.g., the model achieves an accuracy above a given threshold. The first cost function and the second cost function may be the same cost function or different cost functions).

Regarding claim 3, Reza discloses the method, further comprising implementing a self-training process for the neural language-based model on an unlabeled second set of data ([0033] - obtaining a set of training data comprising text examples and associated labels, where the labels comprise: (i) text labels that relate to possible solutions for a task to be learned by a machine learning language model, and (ii) specified solution labels for the task).

Regarding claim 4, Reza discloses the method, further comprising: updating, by the at least one processor, the at least one category vocabulary with the identified at least one token for the pre-defined data class ([0033] - where the training learns or updates model parameters of the machine learning language model for performing the task; and providing the machine learning language model with the learned or updated model parameters.
[0038] - a prompt template 115 is generated for each relevant feature (a subset of features) selected from the extracted relevant features 110. Each dynamic prompt template 115 comprises a prompt for a text example from the training data, a relevant feature 110 extracted from the text example, and a blank or open field. The prompts are in natural language and are composed of discrete tokens from the vocabulary).

Regarding claim 5, Reza discloses the method, wherein the first set of data comprises domain-specific data ([0036] - The labels may be provided by a user (e.g., a customer) and may be particular to a domain that the user intends to train the model within).

Regarding claim 6, Reza discloses a computing device configured to implement an execution of a method for training a neural language-based model for data annotation, the computing device comprising: a processor ([0070] - …executed by one or more processing units…); a memory ([0070] – The software may be stored in a memory…); and a communication interface coupled to each of the processor and the memory, wherein the processor is configured to: receive, via the communication interface, a first set of data from a plurality of sources, each item of the first set of data being associated with a pre-defined data class ([0036] - The labels may be provided by a user (e.g., a customer) and may be particular to a domain that the user intends to train the model within. For example, the text labels may be words such as terrible, bland, flavorful, delicious, disgusting, sour, sweet, poison, enjoyable, spicy, etc. that relate to various semantic classes (e.g., positive, negative, neutral, or the like) to be predicted for each text example within the domain of food); generate at least one category vocabulary for the pre-defined data class ([0038] - a prompt template 115 is generated for each relevant feature (a subset of features) selected from the extracted relevant features 110. Each dynamic prompt template 115 comprises a prompt for a text example from the training data, a relevant feature 110 extracted from the text example, and a blank or open field. The prompts are in natural language and are composed of discrete tokens from the vocabulary); identify at least one token in the first set of data based on an analysis of the first set of data, wherein the at least one token corresponds to a category indicator of the pre-defined data class ([0038] - The prompts are in natural language and are composed of discrete tokens from the vocabulary); mask the at least one token ([0041] - The labels within the prompting templates are then masked with a masking token); feed the masked at least one token together with a corresponding contextual vector to the neural language-based model ([0041] - The prompting functions 120 with the masked prompting templates are then input into the model; [0046] - The training techniques may depend on the type of model that is being trained. For example, there are different types of supervised learning models, such as different types of neural network models, support vector machine (SVM) models, and others); and predict, using the neural language-based model, a class of the masked at least one token using the corresponding contextual vector ([0064] - The model learns statistical properties of word sequences and linguistic patterns given the words in the prompting functions (i.e., the words in the text example and the prompting template) and uses those properties and patterns to predict text for the masked labels. The conditional probability of predicting the text for the mask labels provided the set of prompting functions is evaluated and combined together to predict a joint probability value for the solution such as a class (e.g., a sentiment class)).

Dependent claims 7-10 are analogous in scope to claims 2-5, and are rejected according to the same reasoning.
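The claim 2 tagging logic (retrieve replacement words for a candidate from a word repository, count how many appear in the category vocabulary, and tag the candidate as a token when the count exceeds a threshold) can be sketched as follows; the thesaurus, vocabulary, and threshold here are illustrative placeholders, not taken from the application or from Reza:

```python
# Hedged sketch of the claim-2 tagging step: a candidate word is tagged as a
# category token when enough of its replacement words (from a word repository,
# e.g. a thesaurus) already occur in the category vocabulary. The thesaurus,
# vocabulary, and threshold below are illustrative placeholders.
def tag_tokens(candidates, thesaurus, category_vocab, threshold=1):
    """Tag candidates whose replacement-word hits exceed the threshold."""
    tagged = []
    for word in candidates:
        replacements = thesaurus.get(word, [])
        hits = sum(rep in category_vocab for rep in replacements)
        if hits > threshold:
            tagged.append(word)
    return tagged

category_vocab = {"delicious", "flavorful", "enjoyable"}
thesaurus = {
    "tasty": ["delicious", "flavorful", "savory"],
    "stale": ["old", "musty"],
}
print(tag_tokens(["tasty", "stale"], thesaurus, category_vocab))  # prints: ['tasty']
```

A response distinguishing over Reza could emphasize that this vocabulary-membership threshold test is a distinct mechanism from the cost-function training threshold the examiner cites at [0076].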
Regarding claim 11, Reza discloses a non-transitory computer readable storage medium storing instructions for training a neural language-based model for data annotation, the storage medium comprising executable code which, when executed by a processor, causes the processor to: receive, via a communication interface, a first set of data from a plurality of sources, each item of the first set of data being associated with a pre-defined data class ([0036] - The labels may be provided by a user (e.g., a customer) and may be particular to a domain that the user intends to train the model within. For example, the text labels may be words such as terrible, bland, flavorful, delicious, disgusting, sour, sweet, poison, enjoyable, spicy, etc. that relate to various semantic classes (e.g., positive, negative, neutral, or the like) to be predicted for each text example within the domain of food); generate at least one category vocabulary for the pre-defined data class ([0038] - a prompt template 115 is generated for each relevant feature (a subset of features) selected from the extracted relevant features 110. Each dynamic prompt template 115 comprises a prompt for a text example from the training data, a relevant feature 110 extracted from the text example, and a blank or open field. The prompts are in natural language and are composed of discrete tokens from the vocabulary); identify at least one token in the first set of data based on an analysis of the first set of data, wherein the at least one token corresponds to a category indicator of the pre-defined data class ([0038] - The prompts are in natural language and are composed of discrete tokens from the vocabulary); mask the at least one token ([0041] - The labels within the prompting templates are then masked with a masking token); feed the masked at least one token together with a corresponding contextual vector to the neural language-based model ([0041] - The prompting functions 120 with the masked prompting templates are then input into the model; [0046] - The training techniques may depend on the type of model that is being trained. For example, there are different types of supervised learning models, such as different types of neural network models, support vector machine (SVM) models, and others); and predict, using the neural language-based model, a class of the masked at least one token using the corresponding contextual vector ([0064] - The model learns statistical properties of word sequences and linguistic patterns given the words in the prompting functions (i.e., the words in the text example and the prompting template) and uses those properties and patterns to predict text for the masked labels. The conditional probability of predicting the text for the mask labels provided the set of prompting functions is evaluated and combined together to predict a joint probability value for the solution such as a class (e.g., a sentiment class)).

Dependent claims 12-15 are analogous in scope to claims 2-5, and are rejected according to the same reasoning.

Conclusion

6. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Girardi (U.S. Publication No. 20210286831) teaches query expansion in information retrieval systems. Nguyen (U.S. Publication No. 20230044266) teaches a machine learning method and named entity recognition apparatus.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ETHAN DANIEL KIM, whose telephone number is (571) 272-1405. The examiner can normally be reached Monday - Friday, 9:00 - 5:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ETHAN DANIEL KIM/
Examiner, Art Unit 2658

/RICHEMOND DORVIL/
Supervisory Patent Examiner, Art Unit 2658

Prosecution Timeline

Jun 20, 2024
Application Filed
Jan 05, 2026
Non-Final Rejection — §101, §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597414
GENERATION OF TRAINING EXAMPLES FOR TRAINING AUTOMATIC SPEECH RECOGNIZERS
2y 5m to grant Granted Apr 07, 2026
Patent 12596874
OPERATION ERROR DETECTION
2y 5m to grant Granted Apr 07, 2026
Patent 12573384
DEVICE CONTROL SYSTEM
2y 5m to grant Granted Mar 10, 2026
Patent 12566922
KNOWLEDGE ACCELERATOR PLATFORM WITH SEMANTIC LABELING ACROSS DIFFERENT ASSETS
2y 5m to grant Granted Mar 03, 2026
Patent 12562183
DEEP LEARNING FOR JOINT ACOUSTIC ECHO AND ACOUSTIC HOWLING SUPPRESSION IN HYBRID MEETINGS
2y 5m to grant Granted Feb 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

1-2
Expected OA Rounds
78%
Grant Probability
99%
With Interview (+29.5%)
2y 11m
Median Time to Grant
Low
PTA Risk
Based on 107 resolved cases by this examiner. Grant probability derived from career allow rate.
