Last updated: May 29, 2026
Application No. 18/777,830
AUTOMATIC LABELING OF TEXT DATA

Non-Final OA §101 §103 §112
Filed
Jul 19, 2024
Priority
Jun 29, 2021 — IN 202141029147 +1 more
Examiner
DWIVEDI, MAHESH H
Art Unit
2168
Tech Center
2100 — Computer Architecture & Software
Assignee
Microsoft Technology Licensing, LLC
OA Round
1 (Non-Final)
Interview Optional

Interview lift (+4.5%) is below the 15.0% threshold. A written response is recommended.
Based on 754 resolved cases, 2023–2026
Examiner Intelligence

DWIVEDI, MAHESH H
Grants 69% — above average
Career Allowance Rate
523 granted / 754 resolved
+14.4% vs TC avg
+4.5% (minimal lift)
Interview Lift
resolved cases with interview vs. without
Typical timeline
3y 7m
Avg Prosecution
20 currently pending
Career history
774
Total Applications
across all art units
Statute-Specific Performance

§101: 5.9% (-34.1% vs TC avg)
§103: 76.0% (+36.0% vs TC avg)
§102: 11.2% (-28.8% vs TC avg)
§112: 4.4% (-35.6% vs TC avg)
Based on career data from 754 resolved cases
Office Action

§101 §103 §112
DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
2.	Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
3.         The information disclosure statements (IDS) submitted on 10/30/2025 and 03/11/2026 have been received, entered into the record, and considered.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.
Claim Objections
4.	Claim 1 is objected to because of the following informalities:  There should be a comma between the words “model” and “the” in the limitation “providing as an input to a generative model the positive example and the negative example”.  Appropriate correction is required.
	Dependent claims 2-7 are objected to for incorporating the deficiencies of independent claim 1.
Claim 8 is objected to because of the following informalities:  The limitation “generating an prompt for a generative model based on the negative example and the positive example” is grammatically incoherent and should be replaced with “generating a prompt for a generative model based on the negative example and the positive example”.  Appropriate correction is required.
	Dependent claims 9-12 are objected to for incorporating the deficiencies of independent claim 8.
	Claim 11 is objected to because of the following informalities:  The limitation “wherein the first ranked score of the positive example and the second ranked score of the negative example are generate based on a response from submitting a candidate result generate by the generative model to a search engine as a query over a corpus comprising the positive example and the negative example” is grammatically incoherent and should be replaced with “wherein the first ranked score of the positive example and the second ranked score of the negative example are generated based on a response from submitting a candidate result generated by the generative model to a search engine as a query over a corpus comprising the positive example and the negative example”.  Appropriate correction is required.
	Dependent claim 12 is objected to for incorporating the deficiencies of dependent claim 11.
	Claim 13 is objected to because of the following informalities:  The limitation “outputting an indication that the candidate text corresponds to the label based an a result of classifying the candidate text using the machine learning model” is grammatically incoherent and should be replaced with “outputting an indication that the candidate text corresponds to the label based on a result of classifying the candidate text using the machine learning model”.  Appropriate correction is required.
	Dependent claims 14-20 are objected to for incorporating the deficiencies of independent claim 13.
Claim Rejections - 35 USC § 112
5.	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
6.	The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
7.	Claim 8 recites the limitation “determining a label probability estimate by comparing a first ranked score of the positive example generated by the generative model based on the prompt to a second ranked score of the negative example result generated by the generative model based on the prompt" in Page 02.  There is insufficient antecedent basis for this limitation in the claim as no “negative example result” is claimed earlier in the claim.
	Dependent claims 9-12 are rejected for incorporating the deficiencies of independent claim 8.
Claim 10 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Specifically, it is unclear as to whether the later claimed “negative example” in the limitation “causing the generative model to generate the negative example from a second input including a negative example, wherein the negative example text is generated using semantic language processing and embodying a concept opposite to the label description” refers to the earlier claimed “negative example” in the same claim and in parent independent claim 8.
Dependent claims 11-12 are rejected for incorporating the deficiencies of dependent claim 10.
	Claim 10 recites the limitation “causing the generative model to generate the negative example from a second input including a negative example, wherein the negative example text is generated using semantic language processing and embodying a concept opposite to the label description" in Page 03.  There is insufficient antecedent basis for this limitation in the claim as no “negative example text” is claimed earlier in the claim or in parent independent claim 8.
	Dependent claims 11-12 are rejected for incorporating the deficiencies of dependent claim 10.
Claim Rejections - 35 USC § 101
8.         35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
9.	Independent claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to the non-statutory subject area of electro-magnetic signals and carrier waves. Claim 8 is directed towards “computer-readable media”. The examiner interprets “computer-readable media” as media defined by the characteristics in paragraphs 194 and 196 of the applicant’s specification. According to paragraphs 194 and 196 of the applicant’s specification, “Computing device 800 typically includes a variety of computer-readable media. Computer-readable media may be any available media that may be accessed by computing device 800 and includes both volatile and nonvolatile, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media” (Paragraph 194) and “Communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media” (Paragraph 196). Thus, independent claim 8 is rejected for containing non-statutory subject matter of carrier/propagated signals/waves because the specification states that computer-readable media can include communication media. The examiner therefore interprets the claimed computer-readable media as encompassing the non-statutory subject matter of carrier waves.
            Dependent Claims 9-12 are rejected for incorporating the deficiencies of independent claim 8.
10.	Claims (1-7), (8-12), and (13-20) are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more, and is therefore directed to non-statutory subject matter.
Under the 2019 PEG, when considering subject matter eligibility under 35 U.S.C. § 101, it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter (step 1).  If the claim does fall within one of the statutory categories, it must then be determined whether the claim is directed to a judicial exception (i.e., law of nature, natural phenomenon, and abstract idea) (step 2A prong 1), and if so, it must additionally be determined whether the claim is integrated into a practical application (step 2A prong 2).  If an abstract idea is present in the claim without integration into a practical application, any element or combination of elements in the claim must be sufficient to ensure that the claim amounts to significantly more than the abstract idea itself (step 2B).  
In the instant case, claims (1-7), (8-12), and (13-20) are directed to a method, a computer-readable media, and a system respectively. Thus, each of the claims falls within one of the four statutory categories. However, the claims also fall within the judicial exception of an abstract idea. 
Under Step 2A Prong 1, the test is to identify whether the claims recite a judicial exception. The examiner notes that the claimed invention recites an abstract idea in that the instant application recites a mental process, specifically labeling data.
The examiner further notes that claims (1-7), (8-12), and (13-20) recite a method, a computer-readable media, and a system for labeling data, which corresponds to the abstract idea identified in the 2019 PEG in grouping “c”, in that the claims recite mental processes such as the labeling of data. The limitations, substantially comprising the body of the claim, recite a process of labeling data. Because the limitations closely follow the steps in labeling data, and the steps of the claims involve mental processes, the claim recites an abstract idea consistent with the “mental processes” grouping set forth in the 2019 PEG.
Claim 1:
A method comprising: receiving a candidate text; 
receiving a label description; 
obtaining a positive example and a negative example associated with the label description; 
providing as an input to a generative model the positive example and the negative example; 
determining a label probability estimate based on an output of the generative model; and 
outputting an indication whether the candidate text corresponds to the label description based on the label probability estimate.
These limitations, as drafted, under their broadest reasonable interpretation, cover the performance of mental processes, specifically labeling data. Labeling data was practiced long before the modern computer was invented, and continues to be predominantly a product of human endeavor. The instant application recites labeling data. Moreover, the obtaining of a positive example and negative example can be performed by a human via their mind and/or pen & paper. Furthermore, the providing of an input into a model can be performed by a human via their mind and/or pen & paper. Additionally, the claimed determining of a label probability estimate can be performed by a human via their mind and/or pen & paper. Furthermore, the outputting of an indication on whether a candidate text corresponds to a label description based on a determined label probability estimate can be performed by a human via their mind and/or pen & paper. Because the limitations above closely follow the steps of labeling data, and the steps involve human judgments, observations and evaluations that can be practically or reasonably performed in the human mind and/or pen & paper, the claim recites an abstract idea consistent with the “mental process” grouping set forth in the 2019 PEG.
If the claims recite the judicial exception of an abstract idea, it must then be determined under Step 2A Prong 2 whether the judicial exception is integrated into a practical application. The Examiner notes that the considerations under Step 2A Prong 2 comprise most of the considerations previously evaluated in the context of Step 2B. The Examiner submits that the considerations that previously supported a determination that a claim does not recite “significantly more” at Step 2B are evaluated the same way under Step 2A Prong 2 and result in the determination that the claim does not integrate the abstract idea into a practical application.
The instant application fails to integrate the judicial exception into a practical application because it merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea. The claim instructs the reader to implement the identified mental processes of labeling data. The elements of the claim do not themselves amount to an improvement to the computer, to a technology, or to another technical field. Moreover, the receiving of candidate text is an insignificant data gathering operation that does not integrate the abstract idea into a practical application. Furthermore, the receiving of a label description is an insignificant data gathering operation that does not integrate the abstract idea into a practical application. Additionally, the output from a model is an insignificant data output operation that does not integrate the abstract idea into a practical application.
Here, the claim elements entirely comprise the abstract idea, leaving little if any aspects of the claim for further consideration under Step 2A Prong 2. In short, the claims have failed to integrate the abstract idea into a practical application (see at least 84 Fed. Reg. (4) at 55). Under the 2019 PEG, this supports the conclusion that the claim is directed to an abstract idea, and the analysis proceeds to Step 2B.
Many considerations in Step 2A need not be reevaluated in Step 2B because the outcome will be the same. Here, on the basis of the additional elements other than the abstract idea, considered individually and in combination as discussed above, the Examiner respectfully submits that claim 1 does not contain any additional elements that individually or as an ordered combination amount to an inventive concept, and the claims are ineligible.
The dependent claims do not recite anything that is found to transform the abstract idea into a patent-eligible invention. The dependent claims merely recite further embellishments of the abstract idea and do not claim anything that amounts to significantly more than the abstract idea itself.
With respect to the dependent claims, they have been considered and are not found to recite anything that amounts to significantly more than the abstract idea. Claims 2-7 are directed to further embellishments of the central theme of the abstract idea, namely the labeling of data in the steps of claim 1, and do not amount to significantly more.
Specifically, claim 2 is directed to the searching of documents via a generated defined query which can be performed by a human via their mind and/or pen & paper and does not amount to significantly more.
Additionally, claim 3 is directed to the defining of the positive example that is based on a text and a label which can be performed by a human via their mind and/or pen & paper and does not amount to significantly more. 
Furthermore, claim 4 is directed to the determining of a label probability estimate which can be performed by a human via their mind and/or pen & paper and does not amount to significantly more.
Moreover, claim 5 recites the use of first and second weights on first and second label scores to determine a label probability which can be performed by the human mind and/or pen & paper and does not amount to significantly more.
Furthermore, claim 6 recites the defining of the first and second weights which can be performed by the human mind and/or pen & paper and does not amount to significantly more.
Additionally, claim 7 recites the vectorization of graph terms which can be performed by the human mind and/or pen & paper and does not amount to significantly more.
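For orientation only, the steps recited in claim 1, together with the weighted label scores the examiner attributes to claim 5, can be sketched roughly as follows. This is a minimal illustration under stated assumptions and is not taken from the specification: the names `word_overlap_model`, `label_probability`, `label_candidate`, the example store, and the 0.5 threshold are all hypothetical, and the word-overlap scorer is a toy stand-in for the claimed generative model.

```python
from typing import Callable, Dict, Tuple

# Hypothetical type: a stand-in for the claimed "generative model". It takes the
# candidate text plus a positive and a negative example and returns two
# non-negative scores (positive_score, negative_score).
ScoreModel = Callable[[str, str, str], Tuple[float, float]]


def word_overlap_model(candidate: str, positive: str, negative: str) -> Tuple[float, float]:
    """Toy stand-in, NOT a generative model: counts words shared with each example."""
    words = set(candidate.lower().split())
    pos_score = len(words & set(positive.lower().split()))
    neg_score = len(words & set(negative.lower().split()))
    return float(pos_score), float(neg_score)


def label_probability(pos_score: float, neg_score: float,
                      w_pos: float = 1.0, w_neg: float = 1.0) -> float:
    """Combine two weighted scores into a value in [0, 1] (assumed formula)."""
    weighted_pos = w_pos * pos_score
    weighted_neg = w_neg * neg_score
    total = weighted_pos + weighted_neg
    if total == 0:
        return 0.5  # no evidence for either example
    return weighted_pos / total


def label_candidate(candidate: str, label_description: str,
                    examples: Dict[str, Tuple[str, str]],
                    model: ScoreModel, threshold: float = 0.5) -> Tuple[bool, float]:
    """Receive text and label description, obtain examples, score, and output an indication."""
    positive, negative = examples[label_description]
    pos_score, neg_score = model(candidate, positive, negative)
    probability = label_probability(pos_score, neg_score)
    return probability >= threshold, probability


# Hypothetical example store keyed by label description.
examples = {"refund request": ("I want a refund for my order", "great product thanks")}
matches, probability = label_candidate(
    "refund my order please", "refund request", examples, word_overlap_model
)
```

The sketch only mirrors the order of the recited steps; how the actual model produces its output, and how the estimate is derived from it, is not specified in the claim language quoted above.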
Claim 8:
A computer-readable media comprising instructions that when executed by a computing device cause the computing device to perform a method comprising: receiving a candidate text; 
receiving a label description; 
obtaining a negative example and a positive example associated with the label description; 
generating an prompt for a generative model based on the negative example and the positive example; 
determining a label probability estimate by comparing a first ranked score of the positive example generated by the generative model based on the prompt to a second ranked score of the negative example result generated by the generative model based on the prompt; and 
determining that the candidate text corresponds to the label description based on the label probability estimate.
These limitations, as drafted, under their broadest reasonable interpretation, cover the performance of mental processes, specifically labeling data. Labeling data was practiced long before the modern computer was invented, and continues to be predominantly a product of human endeavor. The instant application recites labeling data. Moreover, the obtaining of a positive example and negative example that are both associated with a label description can be performed by a human via their mind and/or pen & paper. Additionally, the claimed generation of a prompt can be performed by a human via their mind and/or pen & paper. Moreover, the claimed determining of a label probability estimate can be performed by a human via their mind and/or pen & paper. Furthermore, the determining on whether a candidate text corresponds to a label based on a label probability estimate can be performed by a human via their mind and/or pen & paper. Because the limitations above closely follow the steps of labeling data, and the steps involve human judgments, observations and evaluations that can be practically or reasonably performed in the human mind and/or pen & paper, the claim recites an abstract idea consistent with the “mental process” grouping set forth in the 2019 PEG.
The mere nominal recitation of generic computing components such as computer-readable media and a computing device does not take the claim out of the mental processes grouping. Therefore, the limitation recites an abstract idea.
If the claims recite the judicial exception of an abstract idea, it must then be determined under Step 2A Prong 2 whether the judicial exception is integrated into a practical application. The Examiner notes that the considerations under Step 2A Prong 2 comprise most of the considerations previously evaluated in the context of Step 2B. The Examiner submits that the considerations that previously supported a determination that a claim does not recite “significantly more” at Step 2B are evaluated the same way under Step 2A Prong 2 and result in the determination that the claim does not integrate the abstract idea into a practical application.
The instant application fails to integrate the judicial exception into a practical application because it merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea. The claim instructs the reader to implement the identified mental processes of labeling data. The elements of the claim do not themselves amount to an improvement to the computer, to a technology, or to another technical field. Moreover, the receiving of candidate text is an insignificant data gathering operation that does not integrate the abstract idea into a practical application. Furthermore, the receiving of a label description is an insignificant data gathering operation that does not integrate the abstract idea into a practical application.
Here, the claim elements entirely comprise the abstract idea, leaving little if any aspects of the claim for further consideration under Step 2A Prong 2. In short, the claims have failed to integrate the abstract idea into a practical application (see at least 84 Fed. Reg. (4) at 55). Under the 2019 PEG, this supports the conclusion that the claim is directed to an abstract idea, and the analysis proceeds to Step 2B.
Many considerations in Step 2A need not be reevaluated in Step 2B because the outcome will be the same. Here, on the basis of the additional elements other than the abstract idea, considered individually and in combination as discussed above, the Examiner respectfully submits that claim 8 does not contain any additional elements that individually or as an ordered combination amount to an inventive concept, and the claims are ineligible.
The dependent claims do not recite anything that is found to transform the abstract idea into a patent-eligible invention. The dependent claims merely recite further embellishments of the abstract idea and do not claim anything that amounts to significantly more than the abstract idea itself.
With respect to the dependent claims, they have been considered and are not found to recite anything that amounts to significantly more than the abstract idea. Claims 9-12 are directed to further embellishments of the central theme of the abstract idea, namely the labeling of data in the steps of claim 8, and do not amount to significantly more.
	Specifically, claim 9 is directed to the defining of a first ranked score which can be performed by a human via their mind and/or pen & paper and does not amount to significantly more.
	Additionally, claim 10 is directed to the generation of a defined positive example and defined negative example which can be performed by a human via their mind and/or pen & paper and does not amount to significantly more.
	Furthermore, claim 11 is directed to the generation of a defined first score and defined second score which can be performed by a human via their mind and/or pen & paper and does not amount to significantly more. Furthermore, the submission of data to a search engine is an insignificant data transmission operation that does not integrate the abstract idea into a practical application.
	Moreover, claim 12 is directed to the defining of a corpus which can be performed by a human via their mind and/or pen & paper and does not amount to significantly more.
Claim 13:
A system comprising: one or more processors; and 
one or more computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to perform a method, the method comprising: obtaining a candidate text and a label associated with the candidate text; 
generating an augmented training data including a positive example and a negative example; 
classifying the candidate text using a machine learning model trained with the augmented training data; and 
outputting an indication that the candidate text corresponds to the label based an a result of classifying the candidate text using the machine learning model.
These limitations, as drafted, under their broadest reasonable interpretation, cover the performance of mental processes, specifically labeling data. Labeling data was practiced long before the modern computer was invented, and continues to be predominantly a product of human endeavor. The instant application recites labeling data. Moreover, the obtaining of a candidate text and label associated with the candidate text can be performed by a human via their mind and/or pen & paper. Additionally, the claimed generation of augmented training data can be performed by a human via their mind and/or pen & paper. Moreover, the claimed classifying of candidate text can be performed by a human via their mind and/or pen & paper. Furthermore, the outputting of an indication on whether a candidate text corresponds to a label based on a classification can be performed by a human via their mind and/or pen & paper. Because the limitations above closely follow the steps of labeling data, and the steps involve human judgments, observations and evaluations that can be practically or reasonably performed in the human mind and/or pen & paper, the claim recites an abstract idea consistent with the “mental process” grouping set forth in the 2019 PEG.
The mere nominal recitation of generic computing components such as one or more processors and one or more computer storage media does not take the claim out of the mental processes grouping. Therefore, the limitation recites an abstract idea.
If the claims recite the judicial exception of an abstract idea, it must then be determined under Step 2A Prong 2 whether the judicial exception is integrated into a practical application. The Examiner notes that the considerations under Step 2A Prong 2 comprise most of the considerations previously evaluated in the context of Step 2B. The Examiner submits that the considerations that previously supported a determination that a claim does not recite “significantly more” at Step 2B are evaluated the same way under Step 2A Prong 2 and result in the determination that the claim does not integrate the abstract idea into a practical application.
The instant application fails to integrate the judicial exception into a practical application because it merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea. The claim instructs the reader to implement the identified mental processes of labeling data. The elements of the claim do not themselves amount to an improvement to the computer, to a technology, or to another technical field.
Here, the claim elements entirely comprise the abstract idea, leaving little if any aspects of the claim for further consideration under Step 2A Prong 2. In short, the claims have failed to integrate the abstract idea into a practical application (see at least 84 Fed. Reg. (4) at 55). Under the 2019 PEG, this supports the conclusion that the claim is directed to an abstract idea, and the analysis proceeds to Step 2B.
Many considerations in Step 2A need not be reevaluated in Step 2B because the outcome will be the same. Here, on the basis of the additional elements other than the abstract idea, considered individually and in combination as discussed above, the Examiner respectfully submits that claim 13 does not contain any additional elements that individually or as an ordered combination amount to an inventive concept, and the claims are ineligible.
The dependent claims do not recite anything that is found to transform the abstract idea into a patent-eligible invention. The dependent claims merely recite further embellishments of the abstract idea and do not claim anything that amounts to significantly more than the abstract idea itself.
With respect to the dependent claims, they have been considered and are not found to recite anything that amounts to significantly more than the abstract idea. Claims 14-20 are directed to further embellishments of the central theme of the abstract idea, namely the labeling of data in the steps of claim 13, and do not amount to significantly more.
	Specifically, claim 14 is directed to the determination of different keywords which can be performed by a human via their mind and/or pen & paper and does not amount to significantly more. Additionally, the submission of a query to a search engine and the receiving of a response from the search engine are insignificant data transmission operations that do not integrate the abstract idea into a practical application.
	Additionally, claim 15 is directed to the defining of a received response from a search engine, which is an insignificant data transmission operation that does not integrate the abstract idea into a practical application.
	Moreover, claim 16 is directed to the storage of data, which is an insignificant data storage operation that does not integrate the abstract idea into a practical application.
	Furthermore, claim 17 is directed to the vectorization and subsequent comparison of different vectors which can be performed by a human via their mind and/or pen & paper and does not amount to significantly more.
	Additionally, claim 18 is directed to the use of a cosine computation between vectors which can be performed by a human via their mind and/or pen & paper and does not amount to significantly more.
	Moreover, claim 19 is directed to the defining of a negative example which can be performed by a human via their mind and/or pen & paper and does not amount to significantly more.
	Furthermore, claim 20 is directed to the obtaining of an indication that a probability is above a threshold which can be performed by a human via their mind and/or pen & paper and does not amount to significantly more.
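As background for the vectorization and cosine computation that the examiner attributes to claims 17 and 18, the standard cosine similarity between two vectors a and b is (a · b) / (|a| |b|). The following is a minimal sketch, assuming a simple bag-of-words vectorization; the function names and the choice of word counts as the vector representation are illustrative assumptions and do not come from the claims or the specification.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Assumed vectorization: lowercase word counts (bag of words)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


same = cosine_similarity(vectorize("refund my order"), vectorize("my order refund"))
disjoint = cosine_similarity(vectorize("refund my order"), vectorize("great product"))
```

With this representation, texts containing the same words in any order score close to 1, and texts sharing no words score 0.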
Double Patenting
11.	The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
12.	A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159.  See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or PTO/AIA /26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
13.	Claims 1 and 8 are rejected on the ground of nonstatutory double patenting as being unpatentable over claim 8 of U.S. Patent No. 12,197,486 (herein referred to as Sewak 486). 
14.	Although the claims at issue are not identical, they are not patentably distinct from each other because of the following reasons:  Claim 1 of the instant application substantially recites the limitations of claim 8 of Sewak 486.  Both claims recite substantially similar limitations regarding the labeling of data.
Application Claim 1:
1.  A method comprising:
A)  receiving a candidate text; 
B)  receiving a label description; 
C)  obtaining a positive example and a negative example associated with the label description; 
D)  providing as an input to a generative model the positive example and the negative example; 
E)  determining a label probability estimate based on an output of the generative model; and 
F)  outputting an indication whether the candidate text corresponds to the label description based on the label probability estimate.

U.S. Patent 12,197,486 Claim 8:
8.  A computer storage media comprising: A)  instructions that when executed by a computing device cause the computing device to perform a method for determining a correspondence between a class label and a text comprising: receiving a candidate text (Corresponds to Limitation A); 
B)  receiving a label description (Corresponds to Limitation B); 
C)  generating a candidate result from a generative model with the candidate text as input to the generative model; 
D)  generating a positive example result from the generative model with a positive example text as input to the generative model, the positive example text embodying the label description (Corresponds to Limitations C & D); 
F)  generating a negative example result from a generative model with a negative example text as input to the generative model, the negative example text embodying a concept opposite to the label description (Corresponds to Limitations C & D); 
G)  determining a first ranked score of the positive example result based on a response from submitting the candidate result to a search engine as a query over a corpus comprising the positive example result and the negative example result; 
H)  determining a second ranked score of the negative example result based on the response from submitting the candidate result to the search engine as the query over a corpus comprising the positive example result and the negative example result; 
I)  determining a label probability estimate by comparing the first ranked score of the positive example result to the second ranked score of the negative example result (Corresponds to Limitation E); and 
J)  outputting an indication whether the candidate text corresponds to the label description based on the label probability estimate (Corresponds to Limitation F).
However, the cited patent of Sewak 486 also labels data.  It would have been obvious to one of ordinary skill in the art before the effective filing date of instant invention to claim a different statutory class.
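Limitation I of claim 8 of Sewak 486 determines a label probability estimate by comparing the first ranked score of the positive example result to the second ranked score of the negative example result. The claim does not state how that comparison is computed, so the normalized ratio below, the function names, the sample scores, and the 0.5 decision threshold are assumptions made purely for illustration.

```python
def label_probability(positive_score, negative_score):
    """Assumed comparison: the positive score's share of the total score."""
    total = positive_score + negative_score
    if total == 0:
        return 0.5  # assumed convention when neither example scores
    return positive_score / total

def corresponds_to_label(probability, threshold=0.5):
    """Indication of whether the candidate text corresponds to the label."""
    return probability > threshold

# Hypothetical ranked scores returned for the two example results.
p = label_probability(3.0, 1.0)
print(p, corresponds_to_label(p))
```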
Although the claims at issue are not identical, they are not patentably distinct from each other because of the following reasons:  Claim 8 of the instant application substantially recites the limitations of claim 8 of Sewak 486.  Both claims recite substantially similar limitations regarding the labeling of data.
Application Claim 8:
8.  A computer-readable media comprising:
A)  instructions that when executed by a computing device cause the computing device to perform a method comprising: receiving a candidate text; 
B)  receiving a label description; 
C)  obtaining a negative example and a positive example associated with the label description; 
D)  generating an prompt for a generative model based on the negative example and the positive example; 
E)  determining a label probability estimate by comparing a first ranked score of the positive example generated by the generative model based on the prompt to a second ranked score of the negative example result generated by the generative model based on the prompt; and 
F)  determining that the candidate text corresponds to the label description based on the label probability estimate.

U.S. Patent 12,197,486 Claim 8:
8.  A computer storage media comprising: A)  instructions that when executed by a computing device cause the computing device to perform a method for determining a correspondence between a class label and a text comprising: receiving a candidate text (Corresponds to Limitation A); 
B)  receiving a label description (Corresponds to Limitation B); 
C)  generating a candidate result from a generative model with the candidate text as input to the generative model; 
D)  generating a positive example result from the generative model with a positive example text as input to the generative model, the positive example text embodying the label description (Corresponds to Limitations C & D); 
F)  generating a negative example result from a generative model with a negative example text as input to the generative model, the negative example text embodying a concept opposite to the label description (Corresponds to Limitations C & D); 
G)  determining a first ranked score of the positive example result based on a response from submitting the candidate result to a search engine as a query over a corpus comprising the positive example result and the negative example result; 
H)  determining a second ranked score of the negative example result based on the response from submitting the candidate result to the search engine as the query over a corpus comprising the positive example result and the negative example result; 
I)  determining a label probability estimate by comparing the first ranked score of the positive example result to the second ranked score of the negative example result (Corresponds to Limitation E); and 
J)  outputting an indication whether the candidate text corresponds to the label description based on the label probability estimate (Corresponds to Limitation F).
However, the cited patent of Sewak 486 also labels data.
Claim Rejections - 35 USC § 103
15.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
16.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
17.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
18.	Claims 1, 13, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Walters et al. (U.S. PGPUB 2022/0284280), in view of Amrite et al. (U.S. Patent 10,853,580).
19.	Regarding claim 1, Walters teaches a method comprising:
A)  receiving a candidate text (Paragraphs 30, 34, 40, and 44); 
B)  receiving a label description (Paragraphs 30, 34, 40, and 44); 
E)  determining a label probability estimate based on an output of the generative model (Paragraphs 34, 37, 40-41, 44, and 47); and 
F)  outputting an indication whether the candidate text corresponds to the label description based on the label probability estimate (Paragraphs 34, 37, 40-41, 44, and 47).
	The examiner notes that Walters teaches “receiving a candidate text” as “Once the user 105 has finished inputting labels for the example data sample 205, the user 105 may, after moving pointer 307 to a location over the button 303, press the button 303 to indicate that labeling of the example data sample 205 is complete. Based on pressing the button 303, the labels for the example data sample 205 may be stored for later access (e.g., by the server 110 receiving and storing the labels for the example data sample 205)” (Paragraph 30), “The labels themselves are included in a set of labels 230 for the additional example modified data sample 225. As depicted in FIG. 2B, the set of labels 230 includes an ordered list of labels that corresponds to each encoding inserted into the additional modified data sample 225” (Paragraph 34), “Once one or more iterations of the iterative training process is complete, the model 115 may be used to generate predicted labels 117. Predicted labels 117 may include data that indicates potential labels and confidence values associated with the potential labels. For example, the model 115 may receive, as input, a data sample and may generate, as output, one or more predictions for the data sample. A prediction may be for a specific location in the data sample. Further, the prediction may include one or more potential labels for the specific location and one or more confidence values for the one or more potential labels. An illustration of using the model 115 to generate predictions is provided in FIG. 2C. As depicted in FIG. 2C, the model 115 receives, as input, a data sample 235 formatted as email data. FIG. 2C shows an example portion of the data sample 235. Specifically, the example portion includes the text “My name is Smith. I am located in New York”. The model 115 may process the received data sample and generate predicted labels 240 for the received data sample. FIG. 
2C shows, as part of the predicted labels 240, example predictions for the example portion of the data sample 235” (Paragraph 40), and “The determination whether to continue or stop the iterative training process for the model 115 may be performed on a per-set basis. As the user 105, via the user-based labeling process, is providing labels for additional sets of the data samples 103, the server 110 may be performing a process for predicting labels for the additional sets of the data samples 103. These additional sets, for example, may be the second set 103-2 and the third set 103-3 of the data samples 103. Once the user 105 completes labeling an additional set of the data samples 103 and the model 115 outputs predicted labels for the additional set of the data samples 103, the server 110 may determine whether to continue or stop the iterative training process for the model 115 based on a comparison between the labels provided by the user 105 for the additional set and the predicted labels generated by the model 115 for the additional set” (Paragraph 44).  The examiner further notes that data samples 103 and data sample 235 are examples of the claimed candidate text.  The examiner further notes that Walters teaches “receiving a label description” as “Once the user 105 has finished inputting labels for the example data sample 205, the user 105 may, after moving pointer 307 to a location over the button 303, press the button 303 to indicate that labeling of the example data sample 205 is complete. Based on pressing the button 303, the labels for the example data sample 205 may be stored for later access (e.g., by the server 110 receiving and storing the labels for the example data sample 205)” (Paragraph 30), “The labels themselves are included in a set of labels 230 for the additional example modified data sample 225. As depicted in FIG. 
2B, the set of labels 230 includes an ordered list of labels that corresponds to each encoding inserted into the additional modified data sample 225” (Paragraph 34), “Once one or more iterations of the iterative training process is complete, the model 115 may be used to generate predicted labels 117. Predicted labels 117 may include data that indicates potential labels and confidence values associated with the potential labels. For example, the model 115 may receive, as input, a data sample and may generate, as output, one or more predictions for the data sample. A prediction may be for a specific location in the data sample. Further, the prediction may include one or more potential labels for the specific location and one or more confidence values for the one or more potential labels. An illustration of using the model 115 to generate predictions is provided in FIG. 2C. As depicted in FIG. 2C, the model 115 receives, as input, a data sample 235 formatted as email data. FIG. 2C shows an example portion of the data sample 235. Specifically, the example portion includes the text “My name is Smith. I am located in New York”. The model 115 may process the received data sample and generate predicted labels 240 for the received data sample. FIG. 2C shows, as part of the predicted labels 240, example predictions for the example portion of the data sample 235” (Paragraph 40), and “The determination whether to continue or stop the iterative training process for the model 115 may be performed on a per-set basis. As the user 105, via the user-based labeling process, is providing labels for additional sets of the data samples 103, the server 110 may be performing a process for predicting labels for the additional sets of the data samples 103. These additional sets, for example, may be the second set 103-2 and the third set 103-3 of the data samples 103. 
Once the user 105 completes labeling an additional set of the data samples 103 and the model 115 outputs predicted labels for the additional set of the data samples 103, the server 110 may determine whether to continue or stop the iterative training process for the model 115 based on a comparison between the labels provided by the user 105 for the additional set and the predicted labels generated by the model 115 for the additional set” (Paragraph 44).  The examiner further notes that manually input labels (i.e. an example of the claimed label description in the broadest reasonable interpretation) are received.  The examiner further notes that Walters teaches “determining a label probability estimate based on an output of the generative model” as “The labels themselves are included in a set of labels 230 for the additional example modified data sample 225. As depicted in FIG. 2B, the set of labels 230 includes an ordered list of labels that corresponds to each encoding inserted into the additional modified data sample 225” (Paragraph 34), “The iterative training process may include training the model 115 using the first modified set of the data samples. For example, the model 115 may be trained based on the example modified data sample 210, and/or the additional example modified data sample 225 and the set of labels 230. In this way, the model 115 is trained based on the first set 103-1 of the data samples 103, but in a way that does not directly use the first set 103-1 of the data samples 103 in the training process. The model 115 may be configured to predict labels associated with particular formats of data” (Paragraph 37), “Once one or more iterations of the iterative training process is complete, the model 115 may be used to generate predicted labels 117. Predicted labels 117 may include data that indicates potential labels and confidence values associated with the potential labels. 
For example, the model 115 may receive, as input, a data sample and may generate, as output, one or more predictions for the data sample. A prediction may be for a specific location in the data sample. Further, the prediction may include one or more potential labels for the specific location and one or more confidence values for the one or more potential labels. An illustration of using the model 115 to generate predictions is provided in FIG. 2C. As depicted in FIG. 2C, the model 115 receives, as input, a data sample 235 formatted as email data. FIG. 2C shows an example portion of the data sample 235. Specifically, the example portion includes the text “My name is Smith. I am located in New York”. The model 115 may process the received data sample and generate predicted labels 240 for the received data sample. FIG. 2C shows, as part of the predicted labels 240, example predictions for the example portion of the data sample 235” (Paragraph 40), “Specifically, the examples of predicted labels 240 include a first example prediction of “Location (a, b): [first name, 66], [last name, 25]” for the text “Smith” and a second example prediction of “Location (c, d): [city, 50], [state, 50]” for the text “New York”. For the first example prediction, “Location (a, b)” may indicate the location of “Smith” in the data sample (e.g., a may indicate the location of the first character from “Smith”, S; and b may indicate the location of the last character from “Smith”, h). Continuing with the first example prediction, [first name, 66] may indicate a potential label of “first name” for “Smith” with a confidence value of 66” (Paragraph 41), “The determination whether to continue or stop the iterative training process for the model 115 may be performed on a per-set basis. 
As the user 105, via the user-based labeling process, is providing labels for additional sets of the data samples 103, the server 110 may be performing a process for predicting labels for the additional sets of the data samples 103. These additional sets, for example, may be the second set 103-2 and the third set 103-3 of the data samples 103. Once the user 105 completes labeling an additional set of the data samples 103 and the model 115 outputs predicted labels for the additional set of the data samples 103, the server 110 may determine whether to continue or stop the iterative training process for the model 115 based on a comparison between the labels provided by the user 105 for the additional set and the predicted labels generated by the model 115 for the additional set” (Paragraph 44), and “Additionally, the comparison may result in determining whether the label provided by the user 105 differs or matches the potential labels of the first example prediction based on the first example prediction's confidence values. For example, the comparison may be performed based on matching the label provided by the user 105 to the potential label having the highest confidence value. In this way, if the user 105 provided a label of “first name” for “Smith”, then the comparison may result in determining that the label provided by the user 105 matches the potential labels of the first example prediction (e.g., based on the first example prediction including “[first name, 66]” and that potential label having the highest confidence value for the first example prediction). If the user 105 provided a label of “last name” for “Smith”, then the comparison may result in determining that the label provided by the user 105 differs from the potential labels of the first example prediction (e.g., based on “[last name, 25]” not having the highest confidence value for the first example prediction). 
An additional example of performing the comparison based on the first example prediction's confidence value may include applying a confidence threshold to the potential labels. Under this additional example, a label provided by the user 105 matches only if the label is found in the prediction and has a confidence value greater than a threshold confidence value” (Paragraph 47).  The examiner further notes that calculated confidence values (i.e. the claimed label probability estimate in the broadest reasonable interpretation) are based off of output from a trained model (i.e. the claimed generative model in the broadest reasonable interpretation).  The examiner further notes that Walters teaches “outputting an indication whether the candidate text corresponds to the label description based on the label probability estimate” as “The labels themselves are included in a set of labels 230 for the additional example modified data sample 225. As depicted in FIG. 2B, the set of labels 230 includes an ordered list of labels that corresponds to each encoding inserted into the additional modified data sample 225” (Paragraph 34), “The iterative training process may include training the model 115 using the first modified set of the data samples. For example, the model 115 may be trained based on the example modified data sample 210, and/or the additional example modified data sample 225 and the set of labels 230. In this way, the model 115 is trained based on the first set 103-1 of the data samples 103, but in a way that does not directly use the first set 103-1 of the data samples 103 in the training process. The model 115 may be configured to predict labels associated with particular formats of data” (Paragraph 37), “Once one or more iterations of the iterative training process is complete, the model 115 may be used to generate predicted labels 117. Predicted labels 117 may include data that indicates potential labels and confidence values associated with the potential labels. 
For example, the model 115 may receive, as input, a data sample and may generate, as output, one or more predictions for the data sample. A prediction may be for a specific location in the data sample. Further, the prediction may include one or more potential labels for the specific location and one or more confidence values for the one or more potential labels. An illustration of using the model 115 to generate predictions is provided in FIG. 2C. As depicted in FIG. 2C, the model 115 receives, as input, a data sample 235 formatted as email data. FIG. 2C shows an example portion of the data sample 235. Specifically, the example portion includes the text “My name is Smith. I am located in New York”. The model 115 may process the received data sample and generate predicted labels 240 for the received data sample. FIG. 2C shows, as part of the predicted labels 240, example predictions for the example portion of the data sample 235” (Paragraph 40), “Specifically, the examples of predicted labels 240 include a first example prediction of “Location (a, b): [first name, 66], [last name, 25]” for the text “Smith” and a second example prediction of “Location (c, d): [city, 50], [state, 50]” for the text “New York”. For the first example prediction, “Location (a, b)” may indicate the location of “Smith” in the data sample (e.g., a may indicate the location of the first character from “Smith”, S; and b may indicate the location of the last character from “Smith”, h). Continuing with the first example prediction, [first name, 66] may indicate a potential label of “first name” for “Smith” with a confidence value of 66” (Paragraph 41), “The determination whether to continue or stop the iterative training process for the model 115 may be performed on a per-set basis. 
As the user 105, via the user-based labeling process, is providing labels for additional sets of the data samples 103, the server 110 may be performing a process for predicting labels for the additional sets of the data samples 103. These additional sets, for example, may be the second set 103-2 and the third set 103-3 of the data samples 103. Once the user 105 completes labeling an additional set of the data samples 103 and the model 115 outputs predicted labels for the additional set of the data samples 103, the server 110 may determine whether to continue or stop the iterative training process for the model 115 based on a comparison between the labels provided by the user 105 for the additional set and the predicted labels generated by the model 115 for the additional set” (Paragraph 44), and “Additionally, the comparison may result in determining whether the label provided by the user 105 differs or matches the potential labels of the first example prediction based on the first example prediction's confidence values. For example, the comparison may be performed based on matching the label provided by the user 105 to the potential label having the highest confidence value. In this way, if the user 105 provided a label of “first name” for “Smith”, then the comparison may result in determining that the label provided by the user 105 matches the potential labels of the first example prediction (e.g., based on the first example prediction including “[first name, 66]” and that potential label having the highest confidence value for the first example prediction). If the user 105 provided a label of “last name” for “Smith”, then the comparison may result in determining that the label provided by the user 105 differs from the potential labels of the first example prediction (e.g., based on “[last name, 25]” not having the highest confidence value for the first example prediction). 
An additional example of performing the comparison based on the first example prediction's confidence value may include applying a confidence threshold to the potential labels. Under this additional example, a label provided by the user 105 matches only if the label is found in the prediction and has a confidence value greater than a threshold confidence value” (Paragraph 47).  The examiner further notes that calculated confidence values (i.e. the claimed label probability estimate in the broadest reasonable interpretation) are output and are indicative of whether the candidate text corresponds to label descriptions.
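The two comparisons quoted from Paragraph 47 of Walters (matching the user's label to the potential label with the highest confidence value, and matching only when the confidence value exceeds a threshold) can be sketched as follows. The prediction for “Smith” uses the values quoted from Paragraph 41; the function names, the list-of-pairs representation, and the threshold of 50 are assumptions for illustration and do not appear in Walters.

```python
def matches_highest(user_label, prediction):
    """Match if the user's label is the potential label with the highest confidence."""
    best_label, _ = max(prediction, key=lambda pair: pair[1])
    return user_label == best_label

def matches_with_threshold(user_label, prediction, threshold):
    """Match only if the label is in the prediction with confidence above the threshold."""
    return any(label == user_label and confidence > threshold
               for label, confidence in prediction)

# Prediction for "Smith" as quoted from Paragraph 41.
smith = [("first name", 66), ("last name", 25)]

print(matches_highest("first name", smith))
print(matches_highest("last name", smith))
print(matches_with_threshold("first name", smith, 50))  # threshold of 50 is assumed
```

The quoted passages do not say how a tie is resolved (for example “[city, 50], [state, 50]” for “New York”); in this sketch `max` simply returns the first of the tied labels.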
	Walters does not explicitly teach:
C)  obtaining a positive example and a negative example associated with the label description; 
D)  providing as an input to a generative model the positive example and the negative example.
	Amrite, however, teaches “obtaining a positive example and a negative example associated with the label description” as “The training data generator 172 generates the labeled training data 180. For example, the training data generator 172 stores or modifies one or more data structures (e.g., data tables) to indicate which word groups of the first and second candidate word groups 212, 214 are assigned as positive or negative examples of each of one or more labels (or text classification categories)” (Column 14, lines 48-54) and “providing as an input to a generative model the positive example and the negative example” as “FIG. 1 illustrates a particular example of a system 100 that is operable to generate labeled training data 180 to train a text classifier 178” (Column 4, lines 59-61) and “The training data generator 172 generates the labeled training data 180. For example, the training data generator 172 stores or modifies one or more data structures (e.g., data tables) to indicate which word groups of the first and second candidate word groups 212, 214 are assigned as positive or negative examples of each of one or more labels (or text classification categories)” (Column 14, lines 48-54).
	The examiner further notes that the secondary reference of Amrite teaches the concept of using positive and negative examples that are associated with labels (i.e. label descriptions in the broadest reasonable interpretation) in training data that is input into a classifier (i.e. the claimed generative model in the broadest reasonable interpretation) for training that classifier.  The combination would result in the use of such positive and negative examples to train the classifier of Walters.
It would have been obvious to one of ordinary skill in the art before the effective filing date of instant invention to combine the teachings of the cited references because Amrite’s teachings would have allowed Walters’ method to improve the accuracy of classifiers, as noted by Amrite (Column 8, lines 39-41).
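The data structure quoted from Amrite (Column 14, lines 48-54), which indicates which word groups are assigned as positive or negative examples of each label, can be pictured with a minimal sketch. The dictionary layout, the function name, and the sample word groups and label are hypothetical; Amrite describes data tables only in general terms.

```python
from collections import defaultdict

def build_labeled_training_data(assignments):
    """Group word groups by label into positive and negative examples.

    assignments: iterable of (word_group, label, is_positive) tuples.
    """
    table = defaultdict(lambda: {"positive": [], "negative": []})
    for word_group, label, is_positive in assignments:
        key = "positive" if is_positive else "negative"
        table[label][key].append(word_group)
    return dict(table)

# Hypothetical word groups and label, for illustration only.
training_data = build_labeled_training_data([
    ("bank of the river", "geography", True),
    ("bank account balance", "geography", False),
])
print(training_data)
```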

Regarding claim 13, Walters teaches a system comprising:
A)  one or more processors (Paragraph 83); and 
B)  one or more computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to perform a method, the method comprising: obtaining a candidate text and a label associated with the candidate text (Paragraphs 30, 34, 40, 44, and 83); 
C)  generating an augmented training data (Paragraph 37); 
D)  classifying the candidate text using a machine learning model trained with the augmented training data (Paragraphs 37, 40, and 44); and 
E)  outputting an indication that the candidate text corresponds to the label based an a result of classifying the candidate text using the machine learning model (Paragraphs 34, 37, 40-41, and 44).
	The examiner notes that Walters teaches “one or more processors” as “As seen in FIG. 6, the computing device 601 may include a processor 611, RAM 613, ROM 615, network interface 617, input/output interfaces 619 (e.g., keyboard, mouse, display, printer, etc.), and memory 621. Processor 611 may include one or more computer processing units (CPUs), graphical processing units (GPUs), and/or other processing units such as a processor adapted to perform computations associated with speech processing or other forms of machine learning” (Paragraph 83).  The examiner further notes that processor 611 teaches the claimed one or more processors.  The examiner further notes that Walters teaches “one or more computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to perform a method, the method comprising: obtaining a candidate text and a label associated with the candidate text” as “Once the user 105 has finished inputting labels for the example data sample 205, the user 105 may, after moving pointer 307 to a location over the button 303, press the button 303 to indicate that labeling of the example data sample 205 is complete. Based on pressing the button 303, the labels for the example data sample 205 may be stored for later access (e.g., by the server 110 receiving and storing the labels for the example data sample 205)” (Paragraph 30), “The labels themselves are included in a set of labels 230 for the additional example modified data sample 225. As depicted in FIG. 2B, the set of labels 230 includes an ordered list of labels that corresponds to each encoding inserted into the additional modified data sample 225” (Paragraph 34), “Once one or more iterations of the iterative training process is complete, the model 115 may be used to generate predicted labels 117. Predicted labels 117 may include data that indicates potential labels and confidence values associated with the potential labels. 
For example, the model 115 may receive, as input, a data sample and may generate, as output, one or more predictions for the data sample. A prediction may be for a specific location in the data sample. Further, the prediction may include one or more potential labels for the specific location and one or more confidence values for the one or more potential labels. An illustration of using the model 115 to generate predictions is provided in FIG. 2C. As depicted in FIG. 2C, the model 115 receives, as input, a data sample 235 formatted as email data. FIG. 2C shows an example portion of the data sample 235. Specifically, the example portion includes the text “My name is Smith. I am located in New York”. The model 115 may process the received data sample and generate predicted labels 240 for the received data sample. FIG. 2C shows, as part of the predicted labels 240, example predictions for the example portion of the data sample 235” (Paragraph 40), “The determination whether to continue or stop the iterative training process for the model 115 may be performed on a per-set basis. As the user 105, via the user-based labeling process, is providing labels for additional sets of the data samples 103, the server 110 may be performing a process for predicting labels for the additional sets of the data samples 103. These additional sets, for example, may be the second set 103-2 and the third set 103-3 of the data samples 103. Once the user 105 completes labeling an additional set of the data samples 103 and the model 115 outputs predicted labels for the additional set of the data samples 103, the server 110 may determine whether to continue or stop the iterative training process for the model 115 based on a comparison between the labels provided by the user 105 for the additional set and the predicted labels generated by the model 115 for the additional set” (Paragraph 44), and “As seen in FIG. 
6, the computing device 601 may include a processor 611, RAM 613, ROM 615, network interface 617, input/output interfaces 619 (e.g., keyboard, mouse, display, printer, etc.), and memory 621” (Paragraph 83).  The examiner further notes that a data sample 235 (i.e. an example of the claimed candidate text) and stored labels are obtained.  Such stored labels are “associated” with the data sample 235 (i.e. the claimed candidate text) in the broadest reasonable interpretation.  Alternatively, the data samples 103 and corresponding manually input labels also teach the claimed candidate text and associated label respectively.  The examiner further notes that Walters teaches “generating an augmented training data” as “The iterative training process may include training the model 115 using the first modified set of the data samples. For example, the model 115 may be trained based on the example modified data sample 210, and/or the additional example modified data sample 225 and the set of labels 230. In this way, the model 115 is trained based on the first set 103-1 of the data samples 103, but in a way that does not directly use the first set 103-1 of the data samples 103 in the training process. The model 115 may be configured to predict labels associated with particular formats of data” (Paragraph 37).  The examiner further notes that a model 115 is trained via a modified set of data samples (i.e. the claimed augmented training data).  The examiner further notes that Walters teaches “classifying the candidate text using a machine learning model trained with the augmented training data” as “The iterative training process may include training the model 115 using the first modified set of the data samples. For example, the model 115 may be trained based on the example modified data sample 210, and/or the additional example modified data sample 225 and the set of labels 230. 
In this way, the model 115 is trained based on the first set 103-1 of the data samples 103, but in a way that does not directly use the first set 103-1 of the data samples 103 in the training process. The model 115 may be configured to predict labels associated with particular formats of data” (Paragraph 37), “Once one or more iterations of the iterative training process is complete, the model 115 may be used to generate predicted labels 117. Predicted labels 117 may include data that indicates potential labels and confidence values associated with the potential labels. For example, the model 115 may receive, as input, a data sample and may generate, as output, one or more predictions for the data sample. A prediction may be for a specific location in the data sample. Further, the prediction may include one or more potential labels for the specific location and one or more confidence values for the one or more potential labels. An illustration of using the model 115 to generate predictions is provided in FIG. 2C. As depicted in FIG. 2C, the model 115 receives, as input, a data sample 235 formatted as email data. FIG. 2C shows an example portion of the data sample 235. Specifically, the example portion includes the text “My name is Smith. I am located in New York”. The model 115 may process the received data sample and generate predicted labels 240 for the received data sample. FIG. 2C shows, as part of the predicted labels 240, example predictions for the example portion of the data sample 235” (Paragraph 40), and “The determination whether to continue or stop the iterative training process for the model 115 may be performed on a per-set basis. As the user 105, via the user-based labeling process, is providing labels for additional sets of the data samples 103, the server 110 may be performing a process for predicting labels for the additional sets of the data samples 103. 
These additional sets, for example, may be the second set 103-2 and the third set 103-3 of the data samples 103. Once the user 105 completes labeling an additional set of the data samples 103 and the model 115 outputs predicted labels for the additional set of the data samples 103, the server 110 may determine whether to continue or stop the iterative training process for the model 115 based on a comparison between the labels provided by the user 105 for the additional set and the predicted labels generated by the model 115 for the additional set” (Paragraph 44).  The examiner further notes that the example portion of data sample 235 (i.e. an example of candidate text) is labeled (i.e. classified) via the use of trained model 115 (i.e. the claimed machine learning model that is trained with augmented training data).  Alternatively, the iterative training process includes using a trained model (which has been trained in previous iterations) to output a classification (i.e. label) that is compared to the manually entered label.  The examiner further notes that Walters teaches “outputting an indication that the candidate text corresponds to the label based on a result of classifying the candidate text using the machine learning model” as “The labels themselves are included in a set of labels 230 for the additional example modified data sample 225. As depicted in FIG. 2B, the set of labels 230 includes an ordered list of labels that corresponds to each encoding inserted into the additional modified data sample 225” (Paragraph 34), “The iterative training process may include training the model 115 using the first modified set of the data samples. For example, the model 115 may be trained based on the example modified data sample 210, and/or the additional example modified data sample 225 and the set of labels 230. 
In this way, the model 115 is trained based on the first set 103-1 of the data samples 103, but in a way that does not directly use the first set 103-1 of the data samples 103 in the training process. The model 115 may be configured to predict labels associated with particular formats of data” (Paragraph 37), “Once one or more iterations of the iterative training process is complete, the model 115 may be used to generate predicted labels 117. Predicted labels 117 may include data that indicates potential labels and confidence values associated with the potential labels. For example, the model 115 may receive, as input, a data sample and may generate, as output, one or more predictions for the data sample. A prediction may be for a specific location in the data sample. Further, the prediction may include one or more potential labels for the specific location and one or more confidence values for the one or more potential labels. An illustration of using the model 115 to generate predictions is provided in FIG. 2C. As depicted in FIG. 2C, the model 115 receives, as input, a data sample 235 formatted as email data. FIG. 2C shows an example portion of the data sample 235. Specifically, the example portion includes the text “My name is Smith. I am located in New York”. The model 115 may process the received data sample and generate predicted labels 240 for the received data sample. FIG. 2C shows, as part of the predicted labels 240, example predictions for the example portion of the data sample 235” (Paragraph 40), “Specifically, the examples of predicted labels 240 include a first example prediction of “Location (a, b): [first name, 66], [last name, 25]” for the text “Smith” and a second example prediction of “Location (c, d): [city, 50], [state, 50]” for the text “New York”. 
For the first example prediction, “Location (a, b)” may indicate the location of “Smith” in the data sample (e.g., a may indicate the location of the first character from “Smith”, S; and b may indicate the location of the last character from “Smith”, h). Continuing with the first example prediction, [first name, 66] may indicate a potential label of “first name” for “Smith” with a confidence value of 66” (Paragraph 41), and “The determination whether to continue or stop the iterative training process for the model 115 may be performed on a per-set basis. As the user 105, via the user-based labeling process, is providing labels for additional sets of the data samples 103, the server 110 may be performing a process for predicting labels for the additional sets of the data samples 103. These additional sets, for example, may be the second set 103-2 and the third set 103-3 of the data samples 103. Once the user 105 completes labeling an additional set of the data samples 103 and the model 115 outputs predicted labels for the additional set of the data samples 103, the server 110 may determine whether to continue or stop the iterative training process for the model 115 based on a comparison between the labels provided by the user 105 for the additional set and the predicted labels generated by the model 115 for the additional set” (Paragraph 44).  The examiner further notes that output from trained model 115 (i.e. the claimed machine learning model) includes an indication that the data sample 235 (i.e. the claimed candidate text) corresponds to the label.  Alternatively, the iterative training process includes using a trained model (which has been trained in previous iterations) to output a classification (i.e. label) that is compared to the manually entered label.
	Walters does not explicitly teach:
C)  including a positive example and a negative example.
	Amrite, however, teaches “including a positive example and a negative example” as “The training data generator 172 generates the labeled training data 180. For example, the training data generator 172 stores or modifies one or more data structures (e.g., data tables) to indicate which word groups of the first and second candidate word groups 212, 214 are assigned as positive or negative examples of each of one or more labels (or text classification categories)” (Column 14, lines 48-54).
	The examiner further notes that the secondary reference of Amrite teaches the concept of using positive and negative examples in training data that is used to train a classifier.  The combination would result in the use of such positive and negative examples to train the classifier of Walters.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Amrite’s teachings would have allowed Walters’ system to provide a method for improving the accuracy of classifiers, as noted by Amrite (Column 8, lines 39-41).

	Regarding claim 19, Walters does not explicitly teach a system comprising:
A)  wherein the negative example corresponds to a concept opposite to the label.
	Amrite, however, teaches “wherein the negative example corresponds to a concept opposite to the label” as “the second user interface 140 enables the user to designate both positive and negative examples of particular text classification categories based on the MLT search. For example, the second user interface 140 includes at least some of the MLT search results and visually distinguishes the TOI 126 or terms semantically similar to the TOI 126 in the document samples 142. The user can therefore see what term in each document sample 142 caused the document sample 142 to be listed in the MLT search results. As explained above, the MLT search results can list one or more dissimilar word groups, which correspond to one or more of the document samples 142, such as the second document sample 146. The dissimilar word groups are identified in the MLT search results to improve the text classifier's ability to distinguish uses of the TOI 126 (or semantically similar terms) that are associated with the text classification category from uses of the TOI 126 (or semantically similar terms) that are not associated with the text classification category. To illustrate, after reviewing the second document sample 146, the user may determine that the TOI 126, as used in the second document sample 146, does not have the meaning associated with the text classification category. For example, if the TOI 126 is “Queen” and the user is labeling document samples 142 associated with a “music groups” text classification category, the user may review the second document sample 146 and determine that the use of the term “Queen” in the second document sample 146 is a reference to a monarch rather than to the band Queen. Accordingly, the second document sample 146 can be designated a negative example of the text classification category” (Column 7, lines 30-60) and “The training data generator 172 generates the labeled training data 180. 
For example, the training data generator 172 stores or modifies one or more data structures (e.g., data tables) to indicate which word groups of the first and second candidate word groups 212, 214 are assigned as positive or negative examples of each of one or more labels (or text classification categories)” (Column 14, lines 48-54).
	The examiner further notes that the secondary reference of Amrite teaches the concept of using negative examples (each of which is a “concept” opposite to a label) in training data that is used to train a classifier.  The combination would result in the use of such negative examples to train the classifier of Walters.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Amrite’s teachings would have allowed Walters’ system to provide a method for improving the accuracy of classifiers, as noted by Amrite (Column 8, lines 39-41).
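
As a reading aid, the data structure described in the quoted Amrite passage (Column 14, lines 48-54), which records the word groups assigned as positive or negative examples of each label, can be sketched roughly as follows. This is a minimal illustration under stated assumptions and is not code from Amrite or Walters: the function name `build_labeled_training_data`, the tuple layout, and the two sample sentences are hypothetical. Only the “Queen” term and the “music groups” category come from the quoted passage.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (word group, label / text classification category, is_positive_example)
# The tuple layout is an assumption made for this sketch.
Assignment = Tuple[str, str, bool]


def build_labeled_training_data(
    assignments: Iterable[Assignment],
) -> Dict[str, Dict[str, List[str]]]:
    """Group word groups by label and by positive/negative polarity."""
    table = defaultdict(lambda: {"positive": [], "negative": []})
    for word_group, label, is_positive in assignments:
        polarity = "positive" if is_positive else "negative"
        table[label][polarity].append(word_group)
    return dict(table)


# Hypothetical sample sentences built around the "Queen" example.
assignments = [
    ("Queen released a new album", "music groups", True),      # the band
    ("The Queen addressed parliament", "music groups", False),  # a monarch
]
labeled = build_labeled_training_data(assignments)
```

In this sketch the negative entry uses the same term of interest in a sense that falls outside the category, which is the situation the examiner maps to the claimed negative example.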

	Regarding claim 20, Walters further teaches a system comprising:
A)  wherein classifying the candidate text further comprises obtaining, from the machine learning model, an indication that a probability that the candidate text corresponds to the label is above a threshold (Paragraph 47).
	The examiner notes that Walters teaches “wherein classifying the candidate text further comprises obtaining, from the machine learning model, an indication that a probability that the candidate text corresponds to the label is above a threshold” as “Additionally, the comparison may result in determining whether the label provided by the user 105 differs or matches the potential labels of the first example prediction based on the first example prediction's confidence values. For example, the comparison may be performed based on matching the label provided by the user 105 to the potential label having the highest confidence value. In this way, if the user 105 provided a label of “first name” for “Smith”, then the comparison may result in determining that the label provided by the user 105 matches the potential labels of the first example prediction (e.g., based on the first example prediction including “[first name, 66]” and that potential label having the highest confidence value for the first example prediction). If the user 105 provided a label of “last name” for “Smith”, then the comparison may result in determining that the label provided by the user 105 differs from the potential labels of the first example prediction (e.g., based on “[last name, 25]” not having the highest confidence value for the first example prediction). An additional example of performing the comparison based on the first example prediction's confidence value may include applying a confidence threshold to the potential labels. Under this additional example, a label provided by the user 105 matches only if the label is found in the prediction and has a confidence value greater than a threshold confidence value” (Paragraph 47).  The examiner further notes that a confidence threshold (i.e. the claimed threshold) is used in the classification process for the model.
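
For illustration, the two comparison modes quoted from Walters (Paragraph 47) can be sketched as follows. This is a minimal sketch and not code from Walters: the function name `label_matches`, the list-of-pairs representation of a prediction, and the threshold values 50 and 70 are assumptions. The “[first name, 66], [last name, 25]” prediction is taken from the quoted text.

```python
from typing import List, Optional, Tuple

# A prediction is a list of (potential label, confidence value) pairs.
# This representation is an assumption made for the sketch.
Prediction = List[Tuple[str, float]]


def label_matches(
    user_label: str,
    prediction: Prediction,
    threshold: Optional[float] = None,
) -> bool:
    """Compare a user-provided label against a model prediction.

    Without a threshold, the user label matches only if it equals the
    potential label with the highest confidence value. With a threshold,
    it matches only if it appears in the prediction with a confidence
    value greater than the threshold.
    """
    if not prediction:
        return False
    if threshold is None:
        best_label, _ = max(prediction, key=lambda pair: pair[1])
        return user_label == best_label
    return any(
        label == user_label and confidence > threshold
        for label, confidence in prediction
    )


# First example prediction quoted from Walters for the text "Smith".
first_prediction = [("first name", 66), ("last name", 25)]

highest_match = label_matches("first name", first_prediction)
highest_mismatch = label_matches("last name", first_prediction)
# Threshold values below are hypothetical; Walters does not state one.
above_threshold = label_matches("first name", first_prediction, threshold=50)
below_threshold = label_matches("first name", first_prediction, threshold=70)
```

The second mode is the one the examiner maps to the claimed threshold: a label counts as matching only when its confidence value exceeds the threshold confidence value.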
20.	Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Walters et al. (U.S. PGPUB 2022/0284280), in view of Amrite et al. (U.S. Patent 10,853,580) as applied to claims 1, 13, and 19-20 above, and further in view of Kumbhar et al. (U.S. PGPUB 2020/0110836).
21.	Regarding claim 2, Walters and Amrite do not explicitly teach a method comprising:
A)  wherein obtaining the positive example further comprises obtaining the positive example by at least searching a corpus of documents based on a query generated using the candidate text and the label description.
	Kumbhar, however, teaches “wherein obtaining the positive example further comprises obtaining the positive example by at least searching a corpus of documents based on a query generated using the candidate text and the label description” as “Context module 160 may collect and filter contexts for a query phrase. Context module 160 may receive a query phrase, along with the context of the query phrase as it appears in a phrase corpus, which is stored in database 180 and is distinct from the user's corpus. The phrase corpus may include a large sample of written texts in the same language as the phrase corpus” (Paragraph 29), “Candidate module 165 may identify and rank candidate phrases by searching the phrase corpus for all instances of the n-gram tokens representing preceding contexts, succeeding contexts, and/or cradle contexts” (Paragraph 31), and “Once candidate module 165 has filtered out candidate phrases, candidate module 165 may rank the phrases. In some embodiments, candidate module 165 ranks the phrases according to a shared feature gain algorithm, and further refines the ranking using a ranking function based on Kullback-Leibler divergence” (Paragraph 32).
	The examiner further notes that the secondary reference of Kumbhar teaches the concept of obtaining similar phrases (i.e. “positive example(s)”) by searching a corpus of different texts (i.e. documents in the broadest reasonable interpretation) using a query that includes a phrase (i.e. candidate text) and context (i.e. a label description in the broadest reasonable interpretation).  The combination would result in obtaining the positive examples of Amrite by querying a corpus.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Kumbhar’s teachings would have allowed Walters’ and Amrite’s systems to provide a method for improving data discovery, as noted by Kumbhar (Paragraph 13).
22.	Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Walters et al. (U.S. PGPUB 2022/0284280), in view of Amrite et al. (U.S. Patent 10,853,580) as applied to claims 1, 13, and 19-20 above, and further in view of Ackerman et al. (U.S. PGPUB 2021/0335334).
23.	Regarding claim 3, Walters and Amrite do not explicitly teach a method comprising:
A)  wherein the positive example further comprises a second output generated by the generative model based on a second input including the candidate text and the label description.
	Ackerman, however, teaches “wherein the positive example further comprises a second output generated by the generative model based on a second input including the candidate text and the label description” as “The Lyric Engine receives a second set of suggested song lyrics that correspond to the selected song criteria and the selected song lyrics (Act 250). For example, the second set of suggest song lyrics may be generated by and received from the remote neural network based, at least in part, on recent activity and selections by the user that occurred after the Lyric Engine displayed the first set of suggested song lyrics” (Paragraph 45).
	The examiner further notes that the secondary reference of Ackerman teaches the concept of generating a second set of lyrics (i.e. a positive example in the broadest reasonable interpretation) based on one or more selected song lyrics (i.e. an example of candidate text) and song criteria (i.e. a label description).  The combination would result in the positive examples of Amrite being generated based on the candidate text and the label description of Walters.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Ackerman’s teachings would have allowed Walters’ and Amrite’s systems to provide a method for generating musical lyrics based on user criteria, as noted by Ackerman (Paragraph 4).
22.	Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Walters et al. (U.S. PGPUB 2022/0284280), in view of Amrite et al. (U.S. Patent 10,853,580) as applied to claims 1, 13, and 19-20 above, and further in view of AAPA (Applicant’s Admitted Prior Art).
23.	Regarding claim 4, Walters and Amrite do not explicitly teach a method comprising:
A)  wherein the label probability estimate is determined from a token probability of text included in the output of the generative model that corresponds to a first keyword associated with the label description or a second keyword associated with an anti-label description.
	AAPA, however, teaches “wherein the label probability estimate is determined from a token probability of text included in the output of the generative model that corresponds to a first keyword associated with the label description or a second keyword associated with an anti-label description” as “This label prediction and scoring systems behaves like any standard binary/multinominal classification system in the terms of its output. The output of the system is a Boolean/Multinominal class indicator (for Boolean positive class is indicated as 1 and negative as 0), and associated probability/likelihood” (Paragraph 99).
	The examiner further notes that the instant specification states that the output of the system is just like that of any other standard scoring system, such that a class indicator (i.e. a token in the broadest reasonable interpretation) and an associated likelihood are output.  The combination would result in the label probability estimate of Walters being based on the output token probability of AAPA.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because AAPA’s teachings would have allowed Walters’ and Amrite’s systems to provide a binary/multinominal classification, as noted by AAPA (Paragraph 99).
24.	Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Walters et al. (U.S. PGPUB 2022/0284280), in view of Amrite et al. (U.S. Patent 10,853,580) as applied to claims 1, 13, and 19-20 above, and further in view of Gong et al. (U.S. Patent 9,552,549).
25.	Regarding claim 8, Walters teaches a computer-readable media comprising:
A)  instructions that when executed by a computing device cause the computing device to perform a method comprising: receiving a candidate text (Paragraphs 30, 34, 40, and 44);
B)  receiving a label description (Paragraphs 30, 34, 40, and 44);
E)  determining a label probability estimate (Paragraphs 34, 37, 40-41, 44, and 47); and 
F)  determining that the candidate text corresponds to the label description based on the label probability estimate (Paragraphs 34, 37, 40-41, 44, and 47).
The examiner notes that Walters teaches “instructions that when executed by a computing device cause the computing device to perform a method comprising: receiving a candidate text” as “Once the user 105 has finished inputting labels for the example data sample 205, the user 105 may, after moving pointer 307 to a location over the button 303, press the button 303 to indicate that labeling of the example data sample 205 is complete. Based on pressing the button 303, the labels for the example data sample 205 may be stored for later access (e.g., by the server 110 receiving and storing the labels for the example data sample 205)” (Paragraph 30), “The labels themselves are included in a set of labels 230 for the additional example modified data sample 225. As depicted in FIG. 2B, the set of labels 230 includes an ordered list of labels that corresponds to each encoding inserted into the additional modified data sample 225” (Paragraph 34), “Once one or more iterations of the iterative training process is complete, the model 115 may be used to generate predicted labels 117. Predicted labels 117 may include data that indicates potential labels and confidence values associated with the potential labels. For example, the model 115 may receive, as input, a data sample and may generate, as output, one or more predictions for the data sample. A prediction may be for a specific location in the data sample. Further, the prediction may include one or more potential labels for the specific location and one or more confidence values for the one or more potential labels. An illustration of using the model 115 to generate predictions is provided in FIG. 2C. As depicted in FIG. 2C, the model 115 receives, as input, a data sample 235 formatted as email data. FIG. 2C shows an example portion of the data sample 235. Specifically, the example portion includes the text “My name is Smith. I am located in New York”. 
The model 115 may process the received data sample and generate predicted labels 240 for the received data sample. FIG. 2C shows, as part of the predicted labels 240, example predictions for the example portion of the data sample 235” (Paragraph 40), and “The determination whether to continue or stop the iterative training process for the model 115 may be performed on a per-set basis. As the user 105, via the user-based labeling process, is providing labels for additional sets of the data samples 103, the server 110 may be performing a process for predicting labels for the additional sets of the data samples 103. These additional sets, for example, may be the second set 103-2 and the third set 103-3 of the data samples 103. Once the user 105 completes labeling an additional set of the data samples 103 and the model 115 outputs predicted labels for the additional set of the data samples 103, the server 110 may determine whether to continue or stop the iterative training process for the model 115 based on a comparison between the labels provided by the user 105 for the additional set and the predicted labels generated by the model 115 for the additional set” (Paragraph 44).  The examiner further notes that data samples 103 and data sample 235 are examples of the claimed candidate text.  The examiner further notes that Walters teaches “receiving a label description” as “Once the user 105 has finished inputting labels for the example data sample 205, the user 105 may, after moving pointer 307 to a location over the button 303, press the button 303 to indicate that labeling of the example data sample 205 is complete. Based on pressing the button 303, the labels for the example data sample 205 may be stored for later access (e.g., by the server 110 receiving and storing the labels for the example data sample 205)” (Paragraph 30), “The labels themselves are included in a set of labels 230 for the additional example modified data sample 225. As depicted in FIG. 
2B, the set of labels 230 includes an ordered list of labels that corresponds to each encoding inserted into the additional modified data sample 225” (Paragraph 34), “Once one or more iterations of the iterative training process is complete, the model 115 may be used to generate predicted labels 117. Predicted labels 117 may include data that indicates potential labels and confidence values associated with the potential labels. For example, the model 115 may receive, as input, a data sample and may generate, as output, one or more predictions for the data sample. A prediction may be for a specific location in the data sample. Further, the prediction may include one or more potential labels for the specific location and one or more confidence values for the one or more potential labels. An illustration of using the model 115 to generate predictions is provided in FIG. 2C. As depicted in FIG. 2C, the model 115 receives, as input, a data sample 235 formatted as email data. FIG. 2C shows an example portion of the data sample 235. Specifically, the example portion includes the text “My name is Smith. I am located in New York”. The model 115 may process the received data sample and generate predicted labels 240 for the received data sample. FIG. 2C shows, as part of the predicted labels 240, example predictions for the example portion of the data sample 235” (Paragraph 40), and “The determination whether to continue or stop the iterative training process for the model 115 may be performed on a per-set basis. As the user 105, via the user-based labeling process, is providing labels for additional sets of the data samples 103, the server 110 may be performing a process for predicting labels for the additional sets of the data samples 103. These additional sets, for example, may be the second set 103-2 and the third set 103-3 of the data samples 103. 
Once the user 105 completes labeling an additional set of the data samples 103 and the model 115 outputs predicted labels for the additional set of the data samples 103, the server 110 may determine whether to continue or stop the iterative training process for the model 115 based on a comparison between the labels provided by the user 105 for the additional set and the predicted labels generated by the model 115 for the additional set” (Paragraph 44).  The examiner further notes that manually input labels (i.e. an example of the claimed label description in the broadest reasonable interpretation) are received.  The examiner further notes that Walters teaches “determining a label probability estimate” as “The labels themselves are included in a set of labels 230 for the additional example modified data sample 225. As depicted in FIG. 2B, the set of labels 230 includes an ordered list of labels that corresponds to each encoding inserted into the additional modified data sample 225” (Paragraph 34), “The iterative training process may include training the model 115 using the first modified set of the data samples. For example, the model 115 may be trained based on the example modified data sample 210, and/or the additional example modified data sample 225 and the set of labels 230. In this way, the model 115 is trained based on the first set 103-1 of the data samples 103, but in a way that does not directly use the first set 103-1 of the data samples 103 in the training process. The model 115 may be configured to predict labels associated with particular formats of data” (Paragraph 37), “Once one or more iterations of the iterative training process is complete, the model 115 may be used to generate predicted labels 117. Predicted labels 117 may include data that indicates potential labels and confidence values associated with the potential labels. 
For example, the model 115 may receive, as input, a data sample and may generate, as output, one or more predictions for the data sample. A prediction may be for a specific location in the data sample. Further, the prediction may include one or more potential labels for the specific location and one or more confidence values for the one or more potential labels. An illustration of using the model 115 to generate predictions is provided in FIG. 2C. As depicted in FIG. 2C, the model 115 receives, as input, a data sample 235 formatted as email data. FIG. 2C shows an example portion of the data sample 235. Specifically, the example portion includes the text “My name is Smith. I am located in New York”. The model 115 may process the received data sample and generate predicted labels 240 for the received data sample. FIG. 2C shows, as part of the predicted labels 240, example predictions for the example portion of the data sample 235” (Paragraph 40), “Specifically, the examples of predicted labels 240 include a first example prediction of “Location (a, b): [first name, 66], [last name, 25]” for the text “Smith” and a second example prediction of “Location (c, d): [city, 50], [state, 50]” for the text “New York”. For the first example prediction, “Location (a, b)” may indicate the location of “Smith” in the data sample (e.g., a may indicate the location of the first character from “Smith”, S; and b may indicate the location of the last character from “Smith”, h). Continuing with the first example prediction, [first name, 66] may indicate a potential label of “first name” for “Smith” with a confidence value of 66” (Paragraph 41), “The determination whether to continue or stop the iterative training process for the model 115 may be performed on a per-set basis. 
As the user 105, via the user-based labeling process, is providing labels for additional sets of the data samples 103, the server 110 may be performing a process for predicting labels for the additional sets of the data samples 103. These additional sets, for example, may be the second set 103-2 and the third set 103-3 of the data samples 103. Once the user 105 completes labeling an additional set of the data samples 103 and the model 115 outputs predicted labels for the additional set of the data samples 103, the server 110 may determine whether to continue or stop the iterative training process for the model 115 based on a comparison between the labels provided by the user 105 for the additional set and the predicted labels generated by the model 115 for the additional set” (Paragraph 44), and “Additionally, the comparison may result in determining whether the label provided by the user 105 differs or matches the potential labels of the first example prediction based on the first example prediction's confidence values. For example, the comparison may be performed based on matching the label provided by the user 105 to the potential label having the highest confidence value. In this way, if the user 105 provided a label of “first name” for “Smith”, then the comparison may result in determining that the label provided by the user 105 matches the potential labels of the first example prediction (e.g., based on the first example prediction including “[first name, 66]” and that potential label having the highest confidence value for the first example prediction). If the user 105 provided a label of “last name” for “Smith”, then the comparison may result in determining that the label provided by the user 105 differs from the potential labels of the first example prediction (e.g., based on “[last name, 25]” not having the highest confidence value for the first example prediction). 
An additional example of performing the comparison based on the first example prediction's confidence value may include applying a confidence threshold to the potential labels. Under this additional example, a label provided by the user 105 matches only if the label is found in the prediction and has a confidence value greater than a threshold confidence value” (Paragraph 47).  The examiner further notes that calculated confidence values (i.e. the claimed label probability estimate in the broadest reasonable interpretation) are determined.  The examiner further notes that Walters teaches “determining that the candidate text corresponds to the label description based on the label probability estimate” as “The labels themselves are included in a set of labels 230 for the additional example modified data sample 225. As depicted in FIG. 2B, the set of labels 230 includes an ordered list of labels that corresponds to each encoding inserted into the additional modified data sample 225” (Paragraph 34), “The iterative training process may include training the model 115 using the first modified set of the data samples. For example, the model 115 may be trained based on the example modified data sample 210, and/or the additional example modified data sample 225 and the set of labels 230. In this way, the model 115 is trained based on the first set 103-1 of the data samples 103, but in a way that does not directly use the first set 103-1 of the data samples 103 in the training process. The model 115 may be configured to predict labels associated with particular formats of data” (Paragraph 37), “Once one or more iterations of the iterative training process is complete, the model 115 may be used to generate predicted labels 117. Predicted labels 117 may include data that indicates potential labels and confidence values associated with the potential labels. 
For example, the model 115 may receive, as input, a data sample and may generate, as output, one or more predictions for the data sample. A prediction may be for a specific location in the data sample. Further, the prediction may include one or more potential labels for the specific location and one or more confidence values for the one or more potential labels. An illustration of using the model 115 to generate predictions is provided in FIG. 2C. As depicted in FIG. 2C, the model 115 receives, as input, a data sample 235 formatted as email data. FIG. 2C shows an example portion of the data sample 235. Specifically, the example portion includes the text “My name is Smith. I am located in New York”. The model 115 may process the received data sample and generate predicted labels 240 for the received data sample. FIG. 2C shows, as part of the predicted labels 240, example predictions for the example portion of the data sample 235” (Paragraph 40), “Specifically, the examples of predicted labels 240 include a first example prediction of “Location (a, b): [first name, 66], [last name, 25]” for the text “Smith” and a second example prediction of “Location (c, d): [city, 50], [state, 50]” for the text “New York”. For the first example prediction, “Location (a, b)” may indicate the location of “Smith” in the data sample (e.g., a may indicate the location of the first character from “Smith”, S; and b may indicate the location of the last character from “Smith”, h). Continuing with the first example prediction, [first name, 66] may indicate a potential label of “first name” for “Smith” with a confidence value of 66” (Paragraph 41), “The determination whether to continue or stop the iterative training process for the model 115 may be performed on a per-set basis. 
As the user 105, via the user-based labeling process, is providing labels for additional sets of the data samples 103, the server 110 may be performing a process for predicting labels for the additional sets of the data samples 103. These additional sets, for example, may be the second set 103-2 and the third set 103-3 of the data samples 103. Once the user 105 completes labeling an additional set of the data samples 103 and the model 115 outputs predicted labels for the additional set of the data samples 103, the server 110 may determine whether to continue or stop the iterative training process for the model 115 based on a comparison between the labels provided by the user 105 for the additional set and the predicted labels generated by the model 115 for the additional set” (Paragraph 44), and “Additionally, the comparison may result in determining whether the label provided by the user 105 differs or matches the potential labels of the first example prediction based on the first example prediction's confidence values. For example, the comparison may be performed based on matching the label provided by the user 105 to the potential label having the highest confidence value. In this way, if the user 105 provided a label of “first name” for “Smith”, then the comparison may result in determining that the label provided by the user 105 matches the potential labels of the first example prediction (e.g., based on the first example prediction including “[first name, 66]” and that potential label having the highest confidence value for the first example prediction). If the user 105 provided a label of “last name” for “Smith”, then the comparison may result in determining that the label provided by the user 105 differs from the potential labels of the first example prediction (e.g., based on “[last name, 25]” not having the highest confidence value for the first example prediction). 
An additional example of performing the comparison based on the first example prediction's confidence value may include applying a confidence threshold to the potential labels. Under this additional example, a label provided by the user 105 matches only if the label is found in the prediction and has a confidence value greater than a threshold confidence value” (Paragraph 47).  The examiner further notes that calculated confidence values (i.e. the claimed label probability estimate in the broadest reasonable interpretation) are output and indicate whether the candidate text corresponds to the label descriptions.
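For illustration only, the two comparison modes described in the quoted Paragraph 47 of Walters (matching against the highest-confidence potential label, and matching against a confidence threshold) can be sketched as below. The function names, the list-of-pairs data layout, and the threshold values are hypothetical and are not taken from Walters; only the example values [first name, 66] and [last name, 25] come from the quoted Paragraph 41.

```python
from typing import List, Tuple

# A prediction is a list of (potential label, confidence value) pairs.
# This layout is an assumption made for the sketch.
Prediction = List[Tuple[str, float]]


def matches_top_label(user_label: str, prediction: Prediction) -> bool:
    """Match only if the user's label is the potential label with the highest confidence."""
    if not prediction:
        return False
    top_label, _ = max(prediction, key=lambda pair: pair[1])
    return user_label == top_label


def matches_with_threshold(user_label: str, prediction: Prediction, threshold: float) -> bool:
    """Match only if the user's label is in the prediction with confidence above the threshold."""
    return any(label == user_label and conf > threshold for label, conf in prediction)


# Example prediction for the text "Smith" from the quoted Paragraph 41.
smith = [("first name", 66), ("last name", 25)]

print(matches_top_label("first name", smith))                     # True
print(matches_top_label("last name", smith))                      # False
print(matches_with_threshold("last name", smith, threshold=50))   # False
```

How ties between equal confidence values (such as [city, 50], [state, 50]) would be resolved is not stated in the quoted passages, so the sketch does not attempt to model it.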
Walters does not explicitly teach:
C)  obtaining a negative example and a positive example associated with the label description; 
D)  generating an prompt for a generative model based on the negative example and the positive example.
	Amrite, however, teaches “obtaining a positive example and a negative example associated with the label description” as “The training data generator 172 generates the labeled training data 180. For example, the training data generator 172 stores or modifies one or more data structures (e.g., data tables) to indicate which word groups of the first and second candidate word groups 212, 214 are assigned as positive or negative examples of each of one or more labels (or text classification categories)” (Column 14, lines 48-54) and “generating an prompt for a generative model based on the negative example and the positive example” as “FIG. 1 illustrates a particular example of a system 100 that is operable to generate labeled training data 180 to train a text classifier 178” (Column 4, lines 59-61) and “The training data generator 172 generates the labeled training data 180. For example, the training data generator 172 stores or modifies one or more data structures (e.g., data tables) to indicate which word groups of the first and second candidate word groups 212, 214 are assigned as positive or negative examples of each of one or more labels (or text classification categories)” (Column 14, lines 48-54).
	The examiner further notes that the secondary reference of Amrite teaches the concept of using positive and negative examples that are associated with labels (i.e. label descriptions in the broadest reasonable interpretation) in training data that is input (i.e. a “prompt” in the broadest reasonable interpretation) into a classifier (i.e. the claimed generative model in the broadest reasonable interpretation) for training that classifier.  The combination would result in the use of such positive and negative examples to train the classifier of Walters.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Amrite’s teachings would have allowed Walters’ system to provide a method for improving the accuracy of classifiers, as noted by Amrite (Column 8, lines 39-41).
Walters and Amrite do not explicitly teach:
E)  by comparing a first ranked score of the positive example generated by the generative model based on the prompt to a second ranked score of the negative example result generated by the generative model based on the prompt.
	Gong, however, teaches “by comparing a first ranked score of the positive example generated by the generative model based on the prompt to a second ranked score of the negative example result generated by the generative model based on the prompt” as “An error of the neural network may be determined based on a comparison, for each of the training examples, of the label scores for positive labels and negative labels for the training example and a semantic distance between each positive label and each negative label for the training example” (Abstract) and “At 506, semantic ranking loss may be determined for the label scores using the semantic structure and training example labels. For example, the semantic ranking loss, which may be the magnitude of the error of the neural network 110, may be determined by the neural network trainer 120 using the semantic ranking loss function 125, for example, according to (1). The semantic structure 180 may be used to determine the semantic distance between two labels. The semantic ranking loss function 125 may compare the label scores given to each of the positive labels with the label scores given to each of the negative labels by the neural network 110, weighting any errors based on the semantic distance between the labels being compared” (Column 10, lines 15-28).
	The examiner further notes that although Amrite uses positive and negative examples, there is no explicit teaching of comparing scores between the positive and negative examples.  Nevertheless, Gong teaches the concept of comparing scores of positive labels with scores of negative labels.  The combination would result in comparing scores of Amrite’s positive and negative examples.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because Gong’s teachings would have allowed the combination of Walters and Amrite to provide a method for improving the prediction output of models, as noted by Gong (Column 5, lines 53-54).
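As a rough illustration of the concept cited from Gong (comparing the score of each positive label with the score of each negative label, and weighting errors by the semantic distance between the two labels), a generic pairwise ranking loss might look like the sketch below. This is not Gong's equation (1), which the action cites but does not reproduce; the hinge form, the margin, the label names, the scores, and the distance values are all assumptions chosen for the example.

```python
from typing import Callable, Dict, Iterable


def semantic_ranking_loss(
    scores: Dict[str, float],
    positive_labels: Iterable[str],
    negative_labels: Iterable[str],
    semantic_distance: Callable[[str, str], float],
    margin: float = 1.0,
) -> float:
    """Sum a distance-weighted hinge penalty over every (positive, negative) label pair.

    A pair is penalized when the positive label's score does not exceed the
    negative label's score by at least `margin` (an assumed hyperparameter).
    """
    negatives = list(negative_labels)
    loss = 0.0
    for pos in positive_labels:
        for neg in negatives:
            violation = max(0.0, margin - (scores[pos] - scores[neg]))
            loss += semantic_distance(pos, neg) * violation
    return loss


# Hypothetical label scores and semantic distances.
scores = {"first name": 2.0, "last name": 1.5, "city": 1.8}
distances = {("first name", "last name"): 0.1, ("first name", "city"): 1.0}

loss = semantic_ranking_loss(
    scores,
    positive_labels=["first name"],
    negative_labels=["last name", "city"],
    semantic_distance=lambda a, b: distances[(a, b)],
)
print(round(loss, 2))  # 0.85
```

In this sketch, confusing "first name" with the semantically close "last name" costs less than confusing it with the more distant "city", which is the weighting behavior the quoted passage describes.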
26.	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Walters et al. (U.S. PGPUB 2022/0284280), in view of Amrite et al. (U.S. Patent 10,853,580) as applied to claims 1, 13, and 19-20 above, and further in view of Gong et al. (U.S. Patent 9,552,549) as applied to claim 8 above, and further in view of AAPA (Applicant’s Admitted Prior Art).
27.	Regarding claim 9, Walters, Amrite, and Gong do not explicitly teach a computer-readable media comprising:
A)  wherein the first ranked score indicates a token generated by the generative model and a token probability associated with the token.
	AAPA, however, teaches “wherein the first ranked score indicates a token generated by the generative model and a token probability associated with the token” as “This label prediction and scoring systems behaves like any standard binary/multinominal classification system in the terms of its output. The output of the system is a Boolean/Multinominal class indicator (for Boolean positive class is indicated as 1 and negative as 0), and associated probability/likelihood” (Paragraph 99).
	The examiner further notes that the instant specification states that the output of the system is just like that of any other standard scoring system, such that a class indicator (i.e. a token in the broadest reasonable interpretation) and an associated likelihood are output.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant invention to combine the teachings of the cited references because AAPA’s teachings would have allowed the combination of Walters, Amrite, and Gong to provide a binary/multinominal classification, as noted by AAPA (Paragraph 99).
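The output format described in the quoted Paragraph 99 (a Boolean class indicator, 1 for positive and 0 for negative, together with an associated probability) can be sketched as follows. The class name, the function name, the 0.5 decision threshold, and the choice to report the probability of the selected class are assumptions made for this sketch; the quoted passage does not specify them.

```python
from dataclasses import dataclass


@dataclass
class LabelPrediction:
    class_indicator: int  # 1 = positive class, 0 = negative class
    probability: float    # probability/likelihood associated with the indicated class


def to_binary_prediction(positive_probability: float, threshold: float = 0.5) -> LabelPrediction:
    """Turn a positive-class probability into a (class indicator, probability) pair."""
    if not 0.0 <= positive_probability <= 1.0:
        raise ValueError("positive_probability must be between 0 and 1")
    is_positive = positive_probability >= threshold
    return LabelPrediction(
        class_indicator=1 if is_positive else 0,
        # Report the probability of whichever class was selected (an assumption).
        probability=positive_probability if is_positive else 1.0 - positive_probability,
    )


print(to_binary_prediction(0.8))
print(to_binary_prediction(0.2))
```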
Conclusion
28.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. PGPUB 2022/0019730 issued to Sharma et al. on 20 January 2022.  The subject matter disclosed therein is pertinent to that of claims 1-20 (e.g., methods to generate labels).
U.S. PGPUB 2021/0240781 issued to Horesh et al. on 05 August 2021.  The subject matter disclosed therein is pertinent to that of claims 1-20 (e.g., methods to generate labels).
Contact Information
29.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Mahesh Dwivedi whose telephone number is (571) 272-2731.  The examiner can normally be reached on Monday to Friday 8:20 am – 4:40 pm.
	If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones, can be reached at (571) 272-4085.  The fax number for the organization where this application or proceeding is assigned is (571) 273-8300.
	Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).


Mahesh Dwivedi
Primary Examiner
Art Unit 2168

April 20, 2026
/MAHESH H DWIVEDI/Primary Examiner, Art Unit 2168
Prosecution Timeline

Jul 19, 2024: Application Filed
Apr 24, 2026: Non-Final Rejection mailed (§101, §103, §112; current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/795,142 (Patent 12639257): FILE SYSTEM CONTENT ARCHIVING BASED ON THIRD-PARTY APPLICATION ARCHIVING RULES AND METADATA. 1y 9m to grant; granted May 26, 2026.
18/459,144 (Patent 12626160): MANAGING IMPACT OF POISONED INFERENCES ON DEPLOYMENTS OF HARDWARE TO DOWNSTREAM CONSUMERS. 2y 8m to grant; granted May 12, 2026.
18/809,378 (Patent 12613837): SYSTEM AND METHOD FOR CLOUD-BASED READ-ONLY FOLDER SYNCHRONIZATION. 1y 8m to grant; granted Apr 28, 2026.
18/539,424 (Patent 12608403): EXTRACTION MACHINE LEARNING FRAMEWORK. 2y 4m to grant; granted Apr 21, 2026.
18/171,704 (Patent 12591818): FORECASTING AND MITIGATING CONCEPT DRIFT USING NATURAL LANGUAGE PROCESSING. 3y 1m to grant; granted Mar 31, 2026.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 69%
With Interview (+4.5%): 74%
Median Time to Grant: 3y 7m (~1y 9m remaining)
PTA Risk: Low
Based on 754 resolved cases by this examiner. Grant probability derived from career allowance rate.
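The projected percentages can be reproduced with simple arithmetic, assuming the figures shown in the examiner summary at the top of this page (523 granted out of 754 resolved) and assuming the interview lift is added in percentage points. The additive treatment is an assumption that happens to match the displayed values; the page does not state how the "With Interview" figure is computed.

```python
granted = 523              # from the examiner summary: "523 granted / 754 resolved"
resolved = 754
interview_lift_pts = 4.5   # percentage points, as displayed

allowance_rate_pct = 100 * granted / resolved
with_interview_pct = allowance_rate_pct + interview_lift_pts  # assumed additive

print(round(allowance_rate_pct))   # 69
print(round(with_interview_pct))   # 74
```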