Last updated: May 29, 2026

Application No. 18/634,828

DEVICE AND METHOD FOR TRAINING A LANGUAGE MODEL

Non-Final OA §102 §103 §112

Filed

Apr 12, 2024

Priority

Apr 14, 2023 — SG 10202301040X +1 more

Examiner

ZHANG, LESHUI

Art Unit

2695

Tech Center

2600 — Communications

Assignee

Shopee IP Singapore Private Limited

OA Round

1 (Non-Final)

Interview Optional

Examiner has a relatively high allowance rate (78%) and a +35.5% interview lift. A written response may suffice.

Based on 937 resolved cases, 2023–2026

Examiner Intelligence

ZHANG, LESHUI

Grants 78% — above average

Career Allowance Rate

728 granted / 937 resolved

+15.7% vs TC avg

Strong +36% interview lift

+35.5%

Interview Lift

resolved cases with interview

Typical timeline

2y 9m

Avg Prosecution

26 currently pending

Career history

980

Total Applications

across all art units

Statute-Specific Performance

§101

1.0%

-39.0% vs TC avg

§103

83.0%

+43.0% vs TC avg

§102

5.7%

-34.3% vs TC avg

§112

8.8%

-31.2% vs TC avg

Based on career data from 937 resolved cases

Office Action

§102 §103 §112

DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the response to this office action, the Examiner respectfully requests that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line numbers in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.

Specification
The application specification failed to disclose the claimed features: “preforming instruction tunning of a language model … ” as recited in claims 1, 13, 20 and the application specification merely disclosed “performing instruction tunning … (para 5, 36)” and “perform the method .. (para 17-19)”, etc.
Appropriate correction is required.

Drawings
The drawings are objected to under 37 CFR 1.83(a). The drawings must show every feature of the invention specified in the claims. Therefore, the “preforming instruction tunning of a language model based on the training data” as recited in claims 1, 13, 20 must be shown or the feature(s) canceled from the claim(s). No new matter should be entered.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 4-7, 9-12 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention.
Claim 4 recites “The method of any one of claim 1”, which is confusing because it is unclear whether “any one of claim 1” refers to the method step “generating training data”, the method step “preforming instruction tuning of …”, or the preamble “training a language model”, and thus renders the claim indefinite. Claim 5 is rejected due to its dependency on claim 4.
Claims 6, 9, 11, 12 are rejected for at least the same reason as claim 4 above, since claims 6, 9, 11, 12 recite the same deficient feature. Claims 7, 10 are rejected due to their dependency on claims 6, 9, respectively.

Examiner Comment
Claims 2, 14 recite “an instruction to perform the symbolic task” and “an instruction to perform the natural language task”. The two instructions appear to be different, because one is dedicated “to perform the symbolic task” and the other “to perform the natural language task”, yet both use the same wording. To clarify the claimed term, it is recommended to use --a first instruction to perform the symbolic task-- and --a second instruction to perform the natural language task--, or similar. Otherwise, if “an instruction” to perform the symbolic task and “an instruction” to perform the natural language task are the same instruction(s), the second and any later usage of the word “instruction” should be preceded by “the” or “said”, such as --the instruction to perform the natural language task--, or similar.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 8-14, 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wei et al. (“Finetuned Language Models are Zero-Shot Learners”, ICLR Feb 8, 2022, p.1-46, arXiv:2109.01652v5 [cs.CL], URL: 2109.01652).
Claim 1:  Wei teaches a method (title and abstract, ln 1-13, fig. 1) for training a language model (fine tuning language models), comprising: 
generating training data (left side of fig. 1, as input tasks from 12 categories or tasks in fig. 3, and for fine tuning a 137B parameter pretrained LM, abstract, and seven clusters for instruction tuning, Section 4.1 NUMBER OF INSTRUCTION TUNING CLUSTERS, para 3 of p.6), the training data including symbolic tasks (tasks including natural language generation (NLG), teal colored, categories in fig. 3, e.g., the structure-to-text task with four datasets, etc., teal colored, in fig. 3) and natural language tasks (tasks including natural language understanding (NLU), blue colored, categories in fig. 3, e.g., the task of Reading Comprehension with five datasets, blue colored, in fig. 3, and generally, the input sequences have length 1024, Section 2.4 TRAINING DETAILS, para 4 of p.4) and target outputs associated with the symbolic tasks and the natural language tasks (e.g., a target instruction for the “Commonsense Reasoning” task and a target instruction for the “Translation” task in fig. 1; generally, the target sequence has length 256, Section 2.4 TRAINING DETAILS, para 4 of p.4); and
preforming instruction tuning of a language model (a 137B parameter pretrained LM, Section 1. INTRODUCTION, para 2 of p.2) based on the training data (performing instruction tuning, i.e., finetuning the model on a mixture of more than 60 NLP datasets expressed via natural language instructions, Section 1. INTRODUCTION, para 2 of p.2).
Claim 13 has been analyzed and rejected according to claim 1 above, wherein Wei further teaches a data processing system (a computer having a TPUv3 with 128 cores, Section 2.4 TRAINING DETAILS, para 4 of p.4) comprising:
a memory storing instructions (using TPUv3 with 128 cores in LLM training and TensorFlow™ code of Google™, the last paragraph of p.37, and as source code, Section 1. INTRODUCTION, para 5 of p.2, and thus, memory for storing the code is inherent); and
at least one processor (LLM training in improving the zero-shot performance of LLM, Section 1. INTRODUCTION, para 2 of p.2, by using TPUv3 with 128 cores above) coupled to the memory, the processor being configured to execute the instructions to perform the method of claim 1 (coupling of the memory to the TPUv3 with 128 cores and execution of the software code for fine tuning the pretrained model are inherent).
Claim 20 has been analyzed and rejected according to claims 1, 13 above.
Claim 2: Wei further teaches, according to claim 1 above, wherein the training data further includes training inputs (datasets as input tasks with datasets and instructions in fig. 3, and details in Section G. TASKS AND DATASETS, paragraphs of p.36-37), wherein each of the training inputs includes a symbolic task and an instruction to perform the symbolic task (e.g., the translation task as input for instruction tuning the pretrained LM in fig. 1, the last paragraph of p.1) or a natural language task and an instruction to perform the natural language task (e.g., the commonsense reasoning task as input for instruction tuning above, the last paragraph of p.1).
Claim 8: Wei further teaches, according to claim 1 above, the method further comprising: training the language model based on a training data set that includes first training data elements (teal colored tasks or categories in fig. 3), and second training data elements (blue colored tasks or categories in fig. 3), wherein each of the first training data elements includes a specification of a respective symbolic task (e.g., task translation as input, specification in Section G.7, p.46) and a target output (Target in G.7, p.46) for the symbolic task (for translation task as the input) and each of the second training data elements includes a specification of a respective natural language task (e.g., reading comprehension with commonsense task as the input, and specification with Options, Section G.6, p.46) and a target output for the natural language task (Target of G.6, p.46).
Claim 9: Wei further teaches, according to any one of claim 1 above, the method further comprising: training the language model based on a training data set (tasks or categories in fig. 3) that includes the training data (training datasets in each of categories or tasks in fig. 3), wherein each training data element of the training data includes, as training input (input in G.7, p.46), a specification of a respective symbolic task (“translate to German”, etc., in G.7), a target output for the symbolic task (the Target of G.7, p.46), a natural language task (e.g., Reading Comprehension with Commonsense in G.6, p.46) and a target output for the natural language task (Target of G.6, p.46).
Claim 10: Wei further teaches, according to claim 9 above, wherein the specification of the respective symbolic task includes a database table (each dataset uses ten unique templates as the database table, and the templates use natural language instructions to describe the task for that dataset, e.g., in fig. 4) and the respective symbolic task is a query of the database table (hypothesis as query to the templates with options yes or no in fig. 4).
Claim 11: Wei further teaches, according to any one of claim 1 above, wherein the language model is a large language model (the simple method to improve the zero-shot performance of the large language model LLM, Section 1. INTRODUCTION, para 2 of p.2).
Claim 12: Wei further teaches, according to any one of claim 1 above, wherein the language model is a pre-trained language model (a 137B parameter pretrained language model that is instruction tuned on over 60 NLP datasets verbalized via natural language instruction templates, abstract) and wherein the method further comprises: fine-tuning the pre-trained language model based on the generated training data (the training datasets in fig. 3, and applied for instruction tuning the pretrained language model, abstract).
Claim 14 has been analyzed and rejected according to claims 13, 2 above.
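The structure recited in claims 1 and 2 (training inputs that pair a task with an instruction, plus an associated target output, for both symbolic and natural language tasks) can be illustrated with a minimal Python sketch. It is illustrative only: the `TrainingExample` class, the field names, and the two sample tasks are hypothetical and are not taken from the application or from Wei.

```python
from dataclasses import dataclass


@dataclass
class TrainingExample:
    """One training data element (hypothetical layout, for illustration only)."""
    task_type: str      # "symbolic" or "natural_language"
    instruction: str    # instruction to perform the task
    task_input: str     # the task itself
    target_output: str  # target output associated with the task


def build_training_data():
    """Return a tiny mixed data set with one example of each task type."""
    symbolic = TrainingExample(
        task_type="symbolic",
        instruction="Evaluate the arithmetic expression.",
        task_input="(3 + 4) * 2",
        target_output="14",
    )
    natural = TrainingExample(
        task_type="natural_language",
        instruction="Translate to German.",
        task_input="Good morning.",
        target_output="Guten Morgen.",
    )
    return [symbolic, natural]


def to_prompt(example):
    """Concatenate the instruction and the task into a single model input."""
    return f"{example.instruction}\n{example.task_input}"


training_data = build_training_data()
# (model input, target output) pairs, the usual shape for instruction tuning
pairs = [(to_prompt(e), e.target_output) for e in training_data]
```

Each pair holds the text a model would receive and the text it would be trained to produce; a real instruction-tuning pipeline would tokenize these pairs and feed them to a fine-tuning loop.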

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 3-5, 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Wei (above) and in view of reference Chen et al. (“Meta-learning via Language Model In-context Tuning”, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Vol 1, pp. 719-730, May 22-27, 2022, hereinafter Chen).
Claim 3: Wei further teaches, according to claim 1 above, target outputs (expected Target for each dataset in G1-G7, p.38-46) and outputs of the language model (one of Options in G1-G7, or output from FLAN associated with inputs in figs. 13-22), but does not explicitly teach that the method further comprises: determining a loss between outputs of the language model and the target outputs associated with the natural language tasks; and adapting parameters of the language model to reduce the loss.
Chen teaches an analogous field of endeavor by disclosing a method for training a language model (title and abstract, ln 1-29 and fig. 1), wherein outputs of the language model are disclosed (represented by θ as initialized and then updated with gradient descent using task examples ₸, para Figure 1, p.720), target outputs associated with the natural language tasks are also disclosed (a task-specific model from task examples ₸ and represented as θ’, para Figure 1, p.720), and a loss between outputs of the language model and the target outputs associated with the natural language tasks is determined (few-shot prediction loss ∆ of θ’ using task examples from ₸, para Figure 1, p.720); and adapting parameters of the language model to reduce the loss (by using few-shot in-context tuning objective functions 1-2, p.721, where L(θ) is the sum of all few-shot in-context tuning objective functions L_T(θ) in equation 2, p.721, and by using gradient-based task adaptation, Section 2.3, p.721, and minimizing few-shot prediction loss of θ’ on task T, para Figure 1, p.720) for benefits of simplifying adaptation (by simplifying the two-stage process of few-shot task adaptation and task-specific prediction as one sequence prediction problem by concatenating model inputs, Section 2.3, p.721).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied determining the loss between the outputs of the language model and the target outputs associated with the natural language tasks; and adapting parameters of the language model to reduce the loss, as taught by Chen, to the target outputs and outputs of the language model, as taught by Wei, for the benefits discussed above.
Claim 4: the combination of Wei and Chen further teaches, according to claim 1 above, the method further comprising: determining a loss between outputs of the language model and the target outputs for the symbolic tasks (Chen, the loss for individual task T, i.e., individual few-shot in-context tuning objective function L_T(θ) of individual task T in equation 2, p.721); and adapting parameters of the language model to reduce the loss (Chen, minimize few-shot prediction loss of θ’ on task T, para Figure 1, p.720).
Claim 5: the combination of Wei and Chen further teaches, according to claim 4 above, the language model (Wei, pretrained language model, as discussed in claim 1 above, and Chen, using in-context tuning ICT to meta train the language model, abstract), and the language model comprises parameters (Wei, 137B parameter pretrained language model, abstract and Chen, model parameters are fixed during task adaptation, para Figure 1, p.720 and adapting the model parameters to new tasks by gradient descent on few-shot examples, Section 2.3 Gradient-based Task Adaptation, p.721), but does not explicitly teach that the language model comprises a neural network and the parameters include neural network weights.
Official Notice is taken that a language model comprising a neural network, and parameters of the language model including neural network weights, are well known in the art for benefits of high accuracy in prediction, capability of handling large datasets, automatic feature extraction, and demonstrating fault tolerance and parallel processing.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the language model comprising the neural network and the parameters of the language model including neural network weights, as well known per the Official Notice above, to the language model with the parameters in the method for training the language model, as taught by the combination of Wei and Chen, for the benefits discussed above.
Claim 15 has been analyzed and rejected according to claims 13, 3 above.
Claim 16 has been analyzed and rejected according to claims 13, 4 above.
Claim 17 has been analyzed and rejected according to claims 16, 5 above.
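The limitation at issue in claims 3-4 (determining a loss between model outputs and target outputs, then adapting parameters to reduce that loss) can be shown with a toy Python sketch. It makes several assumptions: the "model" is just one logit per candidate output, the loss is softmax cross-entropy, and the update is plain gradient descent. It is not Chen's in-context tuning objective and not the applicant's implementation.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Loss between the model output distribution and the target output."""
    return -math.log(softmax(logits)[target_index])


def gradient_step(logits, target_index, lr=0.5):
    """One gradient-descent update of the toy parameters."""
    probs = softmax(logits)
    # For softmax cross-entropy: d(loss)/d(logit_i) = p_i - 1[i == target]
    grads = [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]
    return [z - lr * g for z, g in zip(logits, grads)]


params = [0.0, 0.0, 0.0]  # toy parameters: one logit per candidate output
target = 2                # index of the target output
losses = []
for _ in range(20):
    losses.append(cross_entropy(params, target))  # determine the loss
    params = gradient_step(params, target)        # adapt parameters to reduce it
```

In a neural language model the parameters would be network weights and the gradient would come from backpropagation, but the two recited steps have the same shape: compute a loss against the target, then move the parameters in the direction that lowers it.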

Claims 6-7, 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Wei (above) and in view of reference Snell et al. (“LEARNING BY DISTILLING CONTEXT”, University of California, Berkeley, EECS Dept., arXiv:2209.15189v1 [cs.CL], URL: 2209.15189, September 30, 2022, hereinafter Snell).
Claim 6: Wei further teaches, according to claim 1 above, wherein each of at least some of the symbolic tasks is a query (query: translate this sentence to Spanish, as the input task in fig. 1, p.1), but does not explicitly teach that the query is in a database query language.
Snell teaches an analogous field of endeavor by disclosing a method for training a language model (title and abstract, ln 1-21 and with context distillation in fig. 2) and wherein a query of task example is disclosed (SQL query annotated in English questions of a list in the SPIDER database, Section 3.2 LEARNING FROM CONCRETE EXAMPLES, p.7) to be in a database query language (SQL query above) for benefits of improving training performance (by improving context distillation while internalizing natural language explanation, para 5 of p.6, and internalizing skills generated by reasoning, Section 3.3 LEARNING FROM STEP-BY-STEP REASONING, para 4 of p.8).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein the query is in the database query language, as taught by Snell, to the query in each of the at least some of the symbolic tasks in the method for training the language model, as taught by Wei, for the benefits discussed above.
Claim 7: the combination of Wei and Snell further teaches, according to claim 6 above, wherein the database query language is Structured Query Language (Snell, SQL by using database in SPIDER, Section 3.2 LEARNING FROM CONCRETE EXAMPLES, Dataset and Language Models, para 6 of p.7).
Claim 18 has been analyzed and rejected according to claims 13, 6 above.
Claim 19 has been analyzed and rejected according to claims 18, 7 above.
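For orientation, the kind of symbolic task recited in claims 6-7 and 10 (a database table plus a query in SQL, with the query result serving as the target output) might look like the following Python sketch. The `products` table, its rows, and the `make_sql_task` helper are hypothetical and chosen only for illustration; the sketch uses Python's standard `sqlite3` module to compute the target output by actually executing the query.

```python
import sqlite3


def make_sql_task(rows, query):
    """Build one symbolic training element: table, SQL query, and target output."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE products (name TEXT, price INTEGER)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    # The target output is obtained by executing the query, not written by hand.
    result = conn.execute(query).fetchall()
    conn.close()
    return {
        "instruction": "Execute the SQL query on the table below.",
        "table": rows,
        "query": query,
        "target_output": result,
    }


task = make_sql_task(
    rows=[("pen", 2), ("book", 9), ("lamp", 15)],
    query="SELECT name FROM products WHERE price > 5 ORDER BY price",
)
print(task["target_output"])  # [('book',), ('lamp',)]
```

Because the target is produced by a symbolic executor, such examples can be generated in bulk without human annotation, which is one way to read the distinction the claims draw between symbolic tasks and natural language tasks.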

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG whose telephone number is (571)270-5589. The examiner can normally be reached Monday-Friday 6:30am-4:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin, can be reached at 571-272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LESHUI ZHANG/
Primary Examiner, 
Art Unit 2695

Prosecution Timeline

Apr 12, 2024

Application Filed

Dec 22, 2025

Non-Final Rejection mailed — §102, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/504,102

Patent 12640156

AUDIO ENCODING METHOD AND APPARATUS, AND AUDIO DECODING METHOD AND APPARATUS

2y 6m to grant Granted May 26, 2026

17/992,473

Patent 12633301

METHOD AND SYSTEM FOR PERFORMING DATA AUGMENTATION BASED ON MODIFIED SURROGATES, AND, NON-TRANSITORY COMPUTER READABLE MEDIUM

3y 5m to grant Granted May 19, 2026

18/262,169

Patent 12620401

ACOUSTIC PATTERN DETERMINATION

2y 9m to grant Granted May 05, 2026

18/685,762

Patent 12621620

SOUND SIGNAL DOWNMIX METHOD, SOUND SIGNAL CODING METHOD, SOUND SIGNAL DOWNMIX APPARATUS, SOUND SIGNAL CODING APPARATUS, PROGRAM

2y 2m to grant Granted May 05, 2026

18/476,267

Patent 12614555

Method and System for Producing an Augmented Ambisonic Format

2y 7m to grant Granted Apr 28, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

78%

Grant Probability

99%

With Interview (+35.5%)

2y 9m (~7m remaining)

Median Time to Grant

Low

PTA Risk

Based on 937 resolved cases by this examiner. Grant probability derived from career allowance rate.