Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-6, 8-13, and 15-20 are pending. Claims 1, 8 and 15 are independent.
This application was published as US 20240037345. The apparent priority date is 28 July 2022.
Response to Amendments
The amendments introduce new matter that is not described in the originally filed application. See the 35 U.S.C. 112(a) rejection below.
Response to Arguments
35 USC 103
Applicant’s arguments with respect to claim(s) 1-6, 8-13, and 15-20 have been considered but are not persuasive.
Applicant argues on pg. 8-9 that "executing, via the at least one processor, natural language processing (NLP) on the prompt, resulting in parsed text of the prompt" and "training a machine learning model using the prompt, wherein the training comprises changing one or more parameters of the machine learning model based on the parsed text of the prompt" is not disclosed or adequately suggested by the cited combination.
MPEP 2111.01.I states that "Under a broadest reasonable interpretation (BRI), words of the claim must be given their plain meaning, unless such meaning is inconsistent with the specification. The plain meaning of a term means the ordinary and customary meaning given to the term by those of ordinary skill in the art at the relevant time." The instant specification does not provide a clear meaning of "parsed text." Paragraph [0036], in combination with Figure 1, uses the term "parsing" to describe the processing done by the machine learning model, while [0053] describes "parsed text" as part of the prompt which is later input to a machine learning model. IBM defines parsing as: "Parsing is separating data and assigning parts of it into one or more variables. Parsing can assign each word in the data into a variable or can divide the data into smaller parts. Parsing is also useful to format data into columns." (IBM, para. 1).
Under the BRI, parsing includes any separating of data into smaller parts. Brown discloses that BPE encoding is used with their model (pg. 24, para 5). This encoding breaks the input to the model (the prompt) into smaller tokens by processing natural language. Under the BRI, this reads on parsing the input.
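For illustration only (not part of the record), the following sketch shows how a byte-pair-style encoder of the kind Brown describes separates input text into sub-word tokens, i.e., "divide[s] the data into smaller parts" in the sense of the IBM definition quoted above. The merge table and function are hypothetical, not Brown's implementation.

```python
# Minimal BPE-style tokenizer sketch (hypothetical merge table; illustration only).
# It repeatedly merges adjacent symbol pairs, separating a word into sub-word units.

MERGES = [("l", "o"), ("lo", "w"), ("e", "r")]  # hypothetical learned merges

def bpe_tokenize(word: str) -> list[str]:
    tokens = list(word)  # start from individual characters
    for left, right in MERGES:  # apply merges in learned order
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == left and tokens[i + 1] == right:
                tokens[i:i + 2] = [left + right]  # merge the pair in place
            else:
                i += 1
    return tokens

print(bpe_tokenize("lower"))  # ['low', 'er']
```

The word is thus "pulled apart" into sub-word substructure, which is the sense in which BPE encoding reads on parsing under the BRI.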
Applicant also argues on page 9 that the proposed modification would change the principle of operation of Brown's few-shot approach. MPEP 2143.01.VI notes that a proposed combination is improper where, as the court held, the "suggested combination of references would require a substantial reconstruction and redesign of the elements shown in [the primary reference] as well as a change in the basic principle under which the [primary reference] construction was designed to operate." That is not the case here. Selectively updating weights in a few-shot training method does not change the basic principle under which the machine learning model operates, and does not require any reconstruction of the model itself; it simply alters which parameters are updated while retaining all of the same parameters in the original model. Rather, this combination falls under at least "C. Use of known technique to improve similar devices (methods, or products) in the same way" (MPEP 2143.I.C), "D. Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results" (MPEP 2143.I.D), and "G. Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention" (MPEP 2143.I.G).
Applicant argues (bottom of page 9) that there is no motivation for a POSITA to combine the references because Brown does not describe a problem of continual learning or catastrophic forgetting/overfitting, and because Mazumder discloses an issue with fine-tuning where all weights are adjusted.
Examiner disagrees for multiple reasons. First, Brown does disclose a problem of continual learning and overfitting ("…However, a major limitation to this approach is that while the architecture is task-agnostic, there is still a need for task-specific datasets and task-specific fine-tuning: to achieve strong performance on a desired task typically requires fine-tuning on a dataset of thousands to hundreds of thousands of examples specific to that task. Removing this limitation would be desirable, for several reasons. First, from a practical perspective, the need for a large dataset of labeled examples for every new task limits the applicability of language models…" pg. 3, paras. 2-3). Brown additionally discloses that their method is competitive with fine-tuned models, but that fine-tuned models remain state of the art ("Broadly, on NLP tasks GPT-3 achieves promising results in the zero-shot and one-shot settings, and in the few-shot setting is sometimes competitive with or even occasionally surpasses state-of-the-art (despite state-of-the-art being held by fine-tuned models)…" pg. 5, para 4). Brown thus discloses that their method has advantages over traditional fine-tuning, but that a performance gap remains ("…By presenting a broad characterization of GPT-3's strengths and weaknesses, including these limitations, we hope to stimulate study of few-shot learning in language models and draw attention to where progress is most needed." pg. 5, para 6).
Mazumder is directed to solving the same issues: performing few-shot learning and preventing overfitting (abstract). While Brown occasionally surpasses state of the art (pg. 5 para 4), Mazumder surpasses state of the art by a 19.27% margin (abstract). Therefore, a POSITA would have been motivated to use Mazumder’s method to improve Brown’s method.
Second, MPEP 2143.I.G quotes the court: "Because the desire to enhance commercial opportunities by improving a product or process is universal—and even common-sensical—we have held that there exists in these situations a motivation to combine prior art references even absent any hint of suggestion in the references themselves. In such situations, the proper question is whether the ordinary artisan possesses knowledge and skills rendering him capable of combining the prior art references." Id. at 1368, 80 USPQ2d at 1651. There is no requirement that the primary reference describe a problem which is solved by the secondary reference; a teaching of any improvement is considered adequate motivation to combine references.
Therefore, the rejection is maintained.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-6, 8-13, and 15-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 1, lines 12-13 add the limitation “executing, via the at least one processor, natural language processing (NLP) on the prompt, resulting in parsed text of the prompt”.
[0036] of the instant specification states: “…In other configurations, the created prompt 106 is reduced or formatted prior to aggregation. The resulting prompt 106 is input into a machine learning model 108, which performs natural language processing on the prompt 106, parsing out the task description 102 and the examples 104, with the output being a task algorithm 110.”
[Fig. 1, Instant Application]
In this example in [0036], the output of the NLP "parsing" by the machine learning model is the task algorithm, which appears to be a trained machine learning model. However, in the claims, the prompt is parsed and then further used to train a machine learning model, which is a materially different operation.
[0053] describes parsing the task description, wherein the prompt further comprises the parsed text. However, in the claims, the entire aggregated prompt, including task description, example transformations, and input and output labels, is parsed. This is not supported by the originally filed application, and therefore the claim is rejected.
Claims 8 and 15 have corresponding limitations and are rejected for the same reasons.
Claims 2-6, 9-13, and 16-20 are rejected as depending on the rejected claims.
Claims 5, 12, and 19 repeat the new matter found in the independent claims and are additionally rejected for the same reasons as the independent claims.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-6, 8-13, and 15-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Brown et al. ("Language Models Are Few-Shot Learners") in view of Pfitzmann et al. (US 20230132061 A1) and Mazumder et al. ("Few-Shot Lifelong Learning").
Regarding claim 1, Brown discloses:
A method comprising: receiving data at a source system comprising a source-specific configuration, ("For each task we generate a dataset of 2,000 random instances of the task and evaluate all models on those instances."[pg. 22, paragraph 2]) wherein the data has a plurality of rows and an initial format (Figure G.21 shows the context data is in multiple rows. Brown also discloses use of massive datasets which could typically be arranged in rows. Brown also discloses data (example transformations) in rows in Fig. G.3.)
receiving, from a user at the source system, a description of a task associated with the data; ("task description"[pg. 7, Figure 2.1, Few-shot section])
receiving, from the user at the source system, a plurality of example transformations; ("examples"[pg. 7, Figure 2.1, Few-shot section])
receiving, from the user at the source system, input and output labels; (Figure G.18 shows input labels “Q:” and output labels “A:”.)
combining, via at least one processor of the source system, the task description together with the plurality of example transformations and input and output labels, resulting in a prompt; (Lines 1-5 are entered to the model at the same time and can be considered a single prompt.[pg. 7, Figure 2.1, Few-shot section])
wherein the combining further comprises performing a string aggregation of the task description together with the plurality of example transformations and input and output labels; (Fig. G.33 shows a task description (“Instructions”) with input and output labels (“Question:” and “Answer:”). These are aggregated in a string format, with different portions separated by equal signs (“=”) which would be valid in a string prompt. Fig. G.1 shows a string aggregation of example transformations with input and output labels. )
(It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the task description with the example transformations and labels in order to improve model performance. (“Model performance improves with the addition of a natural language task description, and with the number of examples in the model’s context, K.” Brown, Pg. 5 para 2))
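As a non-record illustration of the kind of string aggregation mapped above (the function, field ordering, and separators are hypothetical assumptions, not taken from Brown or the instant application), a task description, example transformations, and input/output labels can be concatenated into a single prompt string:

```python
# Hypothetical sketch of string-aggregating a prompt from a task description,
# example transformations, and input/output labels (illustration only).

def build_prompt(task_description, examples, input_label="Q:", output_label="A:"):
    parts = [task_description]
    for example_input, example_output in examples:
        parts.append(f"{input_label} {example_input}")
        parts.append(f"{output_label} {example_output}")
    parts.append(input_label)  # final input slot left for the model to complete
    return "\n".join(parts)

prompt = build_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("peppermint", "menthe poivree")],
)
print(prompt)
```

The result is a single string in which the task description, the examples, and the labels are aggregated, analogous to the few-shot prompts shown in Brown's Figure 2.1.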
executing, via the at least one processor, natural language processing (NLP) on the prompt, resulting in parsed text of the prompt; (“…our BPE encoding operates on significant fractions of a word (on average 0.7 words per token), so from the LM’s perspective succeeding at these tasks involves not just manipulating BPE tokens but understanding and pulling apart their substructure…” pg. 24, para 5 – BPE encoding includes parsing all input to the model under the BRI.)
training a machine learning model using the prompt (“As shown in Figure 2.1, for a typical dataset an example has a context and a desired completion (for example an English sentence and the French translation), and few-shot works by giving K examples of context and completion, and then one final example of context, with the model expected to provide the completion. … As indicated by the name, few-shot learning as described here for language models is related to few-shot learning as used in other contexts in ML [HYC01, VBL+16] – both involve learning based on a broad distribution of tasks (in this case implicit in the pre-training data) and then rapidly adapting to a new task.” Pg. 6, para 7 – providing examples for the machine learning model to learn would be considered training.)
wherein the training comprises changing one or more parameters of the machine learning model (not explicitly disclosed by Brown)
based on the parsed text of the prompt; (the BPE encoding would be applied to all inputs)
executing, via the at least one processor, the machine learning model, wherein the prompt is an input to the machine learning model, and wherein output of the machine learning model comprises computer-executable instructions for executing the task; ("the model is given a few demonstrations of the task at inference time as conditioning". [pg. 6, paragraph 7] The examples are input to the model to train it in the few shot setting. The trained model contains computer-executable instructions which can process further data)
executing, via the at least one processor, the task to transform a row of the plurality of rows from the initial format to a transformed format using the computer-executable instructions; ("For each task we generate a dataset of 2,000 random instances of the task and evaluate all models on those instances."[pg. 22, paragraph 2])
presenting, by a user interface, the transformed format of the row; (Fig. 3.17 shows outputs in bold.)
validating, by the user, the transformed format of the row; (Figure G.26 shows an example of using the method to format text input in a Symbol Insertion task to produce a clean output. (“Context -> Please unscramble the letters into a word, and write that word: r e!c.i p r o.c a/l = Target Completion -> reciprocal”.) Pg. 8, Section 2.2 clearly indicates that part of the data is held out to validate the result. For the Symbol Insertion task, this would be a validation of the format. Additionally, Table H.1 shows the scores, including for “Symbol Insertion.” See also Figures 3.1 and 4.1)
responsive to the validating, automatically applying the computer-executable instructions to each of the plurality of rows; and (Fig. 3.1 shows performance based on validation. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to automatically apply the best performing model.)
loading the each of the plurality of rows, after the validation, to a knowledge graph via a connector interface. (not explicitly disclosed by Brown)
Brown does not disclose sending the output, after the validation, to a knowledge graph, or changing parameters of the model during the (few-shot) training.
Pfitzmann discloses: loading the each of the plurality of rows, after the validation, to a knowledge graph via a connector interface. (“Knowledge graphs are well-known data structures for representing information derived from a large corpus of documents. A knowledge graph essentially comprises nodes, which represent particular entities about which associated information is stored, interconnected by edges which represent defined relations between entities.” [0003]; See Abstract, which discloses that information is extracted from a corpus of documents and used to generate a knowledge graph. An interface to extract the information is implicitly disclosed.)
Brown and Pfitzmann are considered analogous art to the claimed invention because they disclose methods of extracting information from text. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Brown with a knowledge graph as disclosed by Pfitzmann. Doing so would have been beneficial to produce a searchable representation of the extracted information. (Pfitzmann, Abstract). Additionally, this combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
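For illustration only, the following is a minimal sketch of loading transformed rows into a node/edge knowledge graph through a connector interface. The class and method names are the examiner's hypothetical constructions, not Pfitzmann's implementation, and the triple-per-row assumption is likewise illustrative.

```python
# Hypothetical connector that loads transformed rows into a simple
# knowledge graph of nodes and (subject, relation, object) edges.

class KnowledgeGraph:
    def __init__(self):
        self.nodes = set()
        self.edges = []  # (subject, relation, object) triples

    def add_triple(self, subject, relation, obj):
        self.nodes.update({subject, obj})
        self.edges.append((subject, relation, obj))

class GraphConnector:
    """Interface through which transformed rows are loaded into the graph."""

    def __init__(self, graph):
        self.graph = graph

    def load_rows(self, rows):
        # each transformed row is assumed to carry a (subject, relation, object) triple
        for subject, relation, obj in rows:
            self.graph.add_triple(subject, relation, obj)

kg = KnowledgeGraph()
GraphConnector(kg).load_rows([("sea otter", "translates_to", "loutre de mer")])
print(len(kg.nodes), len(kg.edges))  # 2 1
```

This reflects the general structure Pfitzmann describes in [0003]: entities as interconnected nodes with edges representing defined relations.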
Pfitzmann does not disclose changing parameters of the model during the (few-shot) training.
Mazumder discloses: wherein the training comprises changing one or more parameters of the machine learning model (“For training on the few-shot training set D(t>1), we select a few unimportant parameters as the session trainable parameters (marked in green). After completing the training on each few-shot training set D(t>1), we re-identify the important and unimportant parameters and select the few session trainable parameters for the next session. By preserving the important parameters in the model, the model can preserve the old knowledge. Further, by training only a few session trainable parameters for each few-shot training set, overfitting is also reduced.” pg. 2338, Figure 1)
Brown, Pfitzmann, and Mazumder are considered analogous art to the claimed invention because they disclose methods of extracting information. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Brown in view of Pfitzmann to update model parameters while performing few shot learning. Doing so would have been beneficial to allow for continual learning while reducing overfitting. (Mazumder Fig. 1). Mazumder further discloses that this method improves on the state of the art performance (abstract). Additionally, this combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
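For illustration only (a toy sketch, not Mazumder's actual algorithm), selectively updating "unimportant" parameters while preserving "important" ones can be expressed as follows. The importance scores, threshold, learning rate, and gradients are hypothetical values chosen for the example.

```python
# Toy sketch of selective parameter updating: only parameters deemed
# "unimportant" receive the gradient step; the rest are frozen, which
# preserves old knowledge while reducing overfitting (per the rationale above).

def selective_update(params, importance, grads, lr=0.1, threshold=0.5):
    updated = {}
    for name, value in params.items():
        if importance[name] < threshold:      # unimportant -> session-trainable
            updated[name] = value - lr * grads[name]
        else:                                 # important -> frozen
            updated[name] = value
    return updated

params = {"w1": 1.0, "w2": 2.0}
importance = {"w1": 0.9, "w2": 0.1}  # w1 important, w2 unimportant
grads = {"w1": 0.5, "w2": 0.5}
new_params = selective_update(params, importance, grads)
print(new_params)  # w1 preserved; only w2 is stepped
```

Note that the full parameter set is retained in both cases; only which parameters receive updates changes, consistent with the finding that the combination does not alter Brown's basic principle of operation.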
Regarding claim 2, in addition to the limitations of claim 1, Brown discloses: The method of claim 1, wherein the plurality of example transformations comprise: an input for a transformation; and an output for the transformation. ("sea otter => loutre de mer" [pg. 7, Figure 2.1, Few-shot section])
Regarding claim 3, in addition to the limitations of claim 1, Brown discloses: The method of claim 1, wherein the plurality of example transformations number three. (There are three examples (sea otter, peppermint, and plush girafe) [pg. 7, Figure 2.1, Few-shot section])
Regarding claim 4, in addition to the limitations of claim 1, Brown discloses: The method of claim 1, wherein the description of the task is prose. (“For some tasks (see Appendix G) we also use a natural language prompt in addition to (or for K = 0, instead of) demonstrations.” [pg. 10, paragraph 2])
Regarding claim 5, in addition to the limitations of claim 4, Brown discloses: The method of claim 4, further comprising: executing, via the at least one processor, natural language processing (NLP) on the description of the task, the plurality of example transformations and input and output labels, resulting in parsed text, wherein the prompt further comprises the parsed text. ("larger models are able to make increasingly effective use of in-context information, including both task examples and natural language task descriptions." [pg. 24, paragraph 4] The model uses the natural language task description to more effectively perform the task. It would be obvious to one of ordinary skill in the art that in order to use the natural language description, the model must use NLP and use the processed text as an input. Additionally, the BPE encoding mapped in claim 1 reads on parsing all input text.)
Regarding claim 6, in addition to the limitations of claim 1, Brown discloses: The method of claim 1, further comprising: receiving, at the source system, feedback regarding accuracy of the execution of the task on the data using the computer-executable instruction; and retraining, via the at least one processor, the machine learning model using the feedback. ("we run evaluation on the clean-only examples and report the relative percent change between the clean score and the original score." "retrain the model on a corrected version of the training dataset." [pg. 44, paragraphs 4-5] Brown describes the evaluation of the model, specifically with regard to the training data. In this case, the authors did not retrain the model due to cost constraints, but a method wherein the model is retrained is disclosed.)
Claim 8 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale. Additionally, the “at least one processor” and “a non-transitory computer-readable storage medium having instructions stored” of the claim are taught by Brown (Brown discloses training a GPT-3 model, which requires a processor and instructions stored in memory).
Claim 9 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 10 is a system claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 11 is a system claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.
Claim 12 is a system claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.
Claim 13 is a system claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.
Claim 15 is a computer-readable storage medium claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.
Claim 16 is a computer-readable storage medium claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 17 is a computer-readable storage medium claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 18 is a computer-readable storage medium claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.
Claim 19 is a computer-readable storage medium claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.
Claim 20 is a computer-readable storage medium claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JON C MEIS whose telephone number is (703)756-1566. The examiner can normally be reached Monday - Thursday, 8:30 am - 5:30 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan can be reached on 571-272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JON CHRISTOPHER MEIS/Examiner, Art Unit 2654
/HAI PHAN/Supervisory Patent Examiner, Art Unit 2654