Prosecution Insights
Last updated: April 19, 2026
Application No. 18/623,064

TEXT AUGMENTATION USING DATASET RECONSTRUCTION

Status: Non-Final OA (§112)
Filed: Apr 01, 2024
Examiner: NEWAY, SAMUEL G
Art Unit: 2657
Tech Center: 2600 — Communications
Assignee: International Business Machines Corporation
OA Round: 1 (Non-Final)

Grant Probability: 75% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 0m
Grant Probability With Interview: 83%

Examiner Intelligence

Career Allow Rate: 75% (517 granted / 686 resolved), +13.4% vs TC avg (above average)
Interview Lift: +7.6% (moderate), comparing resolved cases with vs. without an interview
Typical Timeline: 3y 0m average prosecution; 29 applications currently pending
Career History: 715 total applications across all art units
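The card figures above are internally consistent, as a quick check shows (a sketch with our own variable names, not output from the tool):

```python
# Consistency check of the examiner's headline statistics.
granted, resolved, pending = 517, 686, 29

print(f"career allow rate: {granted / resolved:.1%}")  # 75.4%, shown as 75%
print(f"total applications: {resolved + pending}")     # 715, matching career history
```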

Statute-Specific Performance

Statute   Rate    vs TC avg
§101      16.6%   -23.4%
§103      34.5%   -5.5%
§102      17.1%   -22.9%
§112      20.1%   -19.9%

Tech Center average is an estimate; figures based on career data from 686 resolved cases.
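One detail the flattened chart obscured: each statute's rate and its delta imply the same Tech Center baseline, so the "vs TC avg" comparisons all appear to use a single 40% estimate. A quick check (variable names are ours):

```python
# Recover the implied Tech Center average from each statute's rate and delta.
performance = {"§101": (16.6, -23.4), "§103": (34.5, -5.5),
               "§102": (17.1, -22.9), "§112": (20.1, -19.9)}
for statute, (rate, delta) in performance.items():
    # rate = tc_avg + delta, so tc_avg = rate - delta
    print(statute, round(rate - delta, 1))  # every row yields 40.0
```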

Office Action

Rejection type: §112

DETAILED ACTION

This is responsive to the application filed 01 April 2024. Claims 1-20 are pending and considered below.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claims 8-20 are objected to because of the following informalities: in line 13 of claim 8, the limitation “soft-prompts” should be ‘soft prompts’ for consistency. Claim 15 suffers from a similar deficiency. The remaining claims are objected to for depending upon an objected-to claim without providing a remedy.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 5, 12, 18 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 5, in line 2, recites the limitation “a last hidden representation ([cls] token)”. It is unclear whether the [cls] token is a necessary limitation or merely an example. Claims 12 and 18 suffer from similar deficiencies.

Claim 20, in line 1, recites the limitation “The computer program product of claim 8”, which lacks proper antecedent basis because claim 8 is not directed to a computer program product. It is believed the limitation should be ‘The computer program product of claim [[8]] 15’ and will be interpreted as such herein below.

Allowable Subject Matter

Claims 1-4 and 6-7 are allowed. Claims 5 and 8-20 would be allowable if rewritten to overcome the objections and rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, set forth in this Office action.

The following is a statement of reasons for the indication of allowable subject matter: the closest prior art of record, Anaby-Tavor et al. ("Do not have enough data? Deep learning to the rescue!", Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, No. 05, 2020), discloses a computer-implemented method comprising: receiving a source dataset comprising a plurality of textual data instances and corresponding labels in two or more classes (“The main input to LAMBADA is a training dataset Dtrain, which we would like to augment with synthesized data. Dtrain contains a set of sentences, each labeled with a class”, section 4, paragraph 2); training a machine learning classifier on the source dataset (“train a baseline classifier h = A(Dtrain) using the existing data Dtrain”, section 4, Step 1); and feeding data as prompts for a trained language model, to tune said trained language model to reconstruct said data instances in said subset (“fine-tune the language model G to the task of synthesizing labeled sentences, to obtain the fine-tuned language model Gtuned. Here, G is specifically fine-tuned to the linguistic domain of Dtrain”, section 4, Step 1).

Anaby-Tavor, individually or in combination, does not disclose: performing inference by the trained machine learning classifier over a subset of said data instances in the source dataset, to extract a hidden representation for each of said data instances in said subset; applying a trained multilayer perceptron (MLP) network to the extracted hidden representations, to generate a set of corresponding soft prompts; and feeding the generated set of soft prompts as prompts for a trained language model, to tune said trained language model to reconstruct said data instances in said subset.
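Those distinguishing limitations describe a concrete pipeline: classifier hidden states are mapped by an MLP into soft prompts that condition a language model tuned to reconstruct the original instances. Below is a minimal sketch of that pipeline, assuming a BERT-style classifier and a GPT-2 decoder; the model names, prompt length, and MLP shape are our illustrative assumptions, not details from the claims.

```python
# A minimal sketch of the distinguished pipeline, assuming a BERT-style
# classifier and a GPT-2 decoder. Model names, the prompt length, and the
# MLP shape are illustrative assumptions, not details from the claims.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

cls_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
classifier = AutoModel.from_pretrained("bert-base-uncased").eval()

lm_tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

n_prompt = 8  # soft-prompt vectors per instance (assumed)
mlp = nn.Sequential(  # the claimed MLP: hidden representation -> soft prompts
    nn.Linear(classifier.config.hidden_size, classifier.config.hidden_size),
    nn.Tanh(),
    nn.Linear(classifier.config.hidden_size, n_prompt * lm.config.n_embd),
)

def reconstruction_loss(sentence: str) -> torch.Tensor:
    # 1. Inference by the trained classifier: extract the last hidden
    #    representation at the [CLS] position for the instance.
    enc = cls_tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = classifier(**enc).last_hidden_state[:, 0]
    # 2. Apply the MLP to generate the corresponding soft prompts.
    prompts = mlp(hidden).view(1, n_prompt, lm.config.n_embd)
    # 3. Feed the soft prompts ahead of the instance and tune the language
    #    model to reconstruct it; prompt positions are masked out of the
    #    loss with the ignore index -100.
    ids = lm_tok(sentence, return_tensors="pt").input_ids
    embeds = torch.cat([prompts, lm.get_input_embeddings()(ids)], dim=1)
    labels = torch.cat([torch.full((1, n_prompt), -100), ids], dim=1)
    return lm(inputs_embeds=embeds, labels=labels).loss
```

In a full training loop, this loss would be backpropagated into the MLP and, per the claim language, the language model itself, so that the tuned model can later regenerate instances of the subset for augmentation.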
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Kantor et al. (US 2021/0350076) discloses techniques for augmenting textual data that may be used for textual classification tasks. Embodiments of such techniques may provide the capability to synthesize labeled data to improve text classification tasks, and may be specifically useful when only a small amount of data is available, providing improved performance in such cases. For example, in an embodiment, a method may be implemented in a computer system comprising a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor, and the method may comprise fine-tuning a language model using a training dataset, synthesizing a plurality of samples using the fine-tuned language model, filtering the plurality of synthesized samples, and generating an augmented training dataset comprising the training dataset and the filtered plurality of synthesized sentences.

Chen et al. (US 2023/0419049) discloses training a prompt generator for text classification. A first training dataset associated with a first plurality of class labels is received for a first training process. For a first instance of the first training dataset, a set of labels of interest is generated by sampling from a set of possible class labels including the first plurality of class labels. The prompt generator generates a first prompt based on the set of labels of interest. A pretrained language model generates a task output in response to an input of the first instance prepended with the first prompt. A loss objective is generated based on the task output and the set of labels of interest. Parameters of the prompt generator are updated via backpropagation based on the computed loss objective while the pretrained language model is frozen.

Ding et al. ("DAGA: Data Augmentation with a Generation Approach for Low-Resource Tagging Tasks", arXiv preprint arXiv:2011.01549, 2020) discloses a novel augmentation method with language models trained on the linearized labeled sentences in order to generate high-quality synthetic data for low-resource tagging tasks.

Yang et al. ("Generative Data Augmentation for Commonsense Reasoning", Findings of the Association for Computational Linguistics: EMNLP 2020) discloses a generative data augmentation technique that aims to achieve more accurate and robust learning in a low-resource setting. The approach generates synthetic examples using pretrained language models and selects the most informative and diverse set of examples for data augmentation.
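As a companion to the pipeline sketch above, here is a minimal sketch of the frozen-PLM prompt-generator loop summarized in the Chen et al. paragraph; the generator architecture, the multi-hot label encoding, and the verbalized target are our illustrative assumptions, not details from the publication.

```python
# A minimal sketch of the frozen-PLM prompt-generator loop from the Chen
# et al. summary. Generator architecture, label encoding, and the target
# format are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
plm = AutoModelForCausalLM.from_pretrained("gpt2")
for p in plm.parameters():
    p.requires_grad_(False)  # the pretrained language model stays frozen

n_labels, n_prompt, d = 4, 8, plm.config.n_embd
generator = nn.Sequential(  # maps a sampled set of labels of interest to a prompt
    nn.Linear(n_labels, d), nn.Tanh(), nn.Linear(d, n_prompt * d)
)
opt = torch.optim.AdamW(generator.parameters(), lr=1e-4)

def train_step(instance: str, labels_of_interest: torch.Tensor, target: str):
    # The generator produces a soft prompt from the sampled labels of interest.
    prompt = generator(labels_of_interest).view(1, n_prompt, d)
    # The frozen PLM consumes the instance prepended with the prompt and is
    # scored on the task output; only generator parameters receive gradients.
    ids = tok(f"{instance} => {target}", return_tensors="pt").input_ids
    embeds = torch.cat([prompt, plm.get_input_embeddings()(ids)], dim=1)
    labels = torch.cat([torch.full((1, n_prompt), -100), ids], dim=1)
    loss = plm(inputs_embeds=embeds, labels=labels).loss
    loss.backward(); opt.step(); opt.zero_grad()

# e.g. train_step("great movie, would watch again",
#                 torch.tensor([[1.0, 0.0, 1.0, 0.0]]), "positive")
```

Freezing the PLM and backpropagating only into the generator is what the quoted abstract describes; gradients still flow to the prompt because it enters the model through inputs_embeds.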
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAMUEL G NEWAY, whose telephone number is (571) 270-1058. The examiner can normally be reached Monday-Friday, 9:00am-5:00pm EST.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/SAMUEL G NEWAY/
Primary Examiner, Art Unit 2657

Prosecution Timeline

Apr 01, 2024: Application Filed
Jan 31, 2026: Non-Final Rejection under §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602538: METHOD AND SYSTEM FOR EXEMPLAR LEARNING FOR TEMPLATIZING DOCUMENTS ACROSS DATA SOURCES (granted Apr 14, 2026; 2y 5m to grant)
Patent 12603177: INTERACTIVE CONVERSATIONAL SYMPTOM CHECKER (granted Apr 14, 2026; 2y 5m to grant)
Patent 12603092: AUTOMATED ASSISTANT CONTROL OF NON-ASSISTANT APPLICATIONS VIA IDENTIFICATION OF SYNONYMOUS TERM AND/OR SPEECH PROCESSING BIASING (granted Apr 14, 2026; 2y 5m to grant)
Patent 12596734: PARSE ARBITRATOR FOR ARBITRATING BETWEEN CANDIDATE DESCRIPTIVE PARSES GENERATED FROM DESCRIPTIVE QUERIES (granted Apr 07, 2026; 2y 5m to grant)
Patent 12596892: MACHINE TRANSLATION SYSTEM FOR ENTERTAINMENT AND MEDIA (granted Apr 07, 2026; 2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 75%
With Interview: 83% (+7.6%)
Median Time to Grant: 3y 0m
PTA Risk: Low
Based on 686 resolved cases by this examiner. Grant probability derived from career allow rate.
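For transparency, the projection figures compose directly from the career statistics; a quick sketch (our own variable names, and the tool's exact rounding is an assumption):

```python
# Compose the projection figures from the examiner's career statistics.
base = 517 / 686   # career allow rate ~= 0.754 -> shown as 75%
lift = 0.076       # interview lift from the resolved-case comparison
print(f"base: {base:.0%}, with interview: {base + lift:.0%}")  # base: 75%, with interview: 83%
```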
