DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on October 6, 2023, and on September 26, 2024, are in compliance with the provisions of 37 CFR 1.97 and have been considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea (mental process) without significantly more.
Claim 1:
Regarding claim 1, in step 1 of the 101-analysis set forth in MPEP 2106, the claim recites “a computer-implemented method for improved prompting of a machine-learned model, the method comprising: obtaining, by a computing system comprising one or more processors, an instructive sequence descriptive of an instructive query, an instructive response, and an instructive trace of intermediate states from the instructive query to the instructive response;
inputting, by the computing system and to the machine-learned model, the instructive sequence and an operative query, wherein the machine-learned model has been pre-trained using a plurality of diversified objectives; and generating, by the computing system, using the machine-learned model and responsive to the operative query, an operative response,”
and a method is one of the four statutory categories of invention.
In step 2A prong 1 of the 101-analysis set forth in the MPEP 2106, the examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, covers a mental process but for recitation of generic computer components:
obtaining,… an instructive sequence descriptive of an instructive query, an instructive response, (This is considered a mental process: a person can mentally evaluate and obtain an instructive sequence that describes an instructive query, and a person can mentally evaluate and generate an instructive response, see MPEP 2106.04(a)(2)(III)),
and an instructive trace of intermediate states from the instructive query to the instructive response; (This is considered a mental process: a person can mentally evaluate and obtain a series of steps to solve a problem (i.e., obtain an instructive trace of intermediate states from the instructive query to the instructive response), see MPEP 2106.04(a)(2)(III)),
generating, … and responsive to the operative query, an operative response, (This is considered a mental process: a person can mentally evaluate the operative query and generate an operative response, see MPEP 2106.04(a)(2)(III)),
If claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process but for the recitation of generic computer components, then they fall within the mental process grouping of abstract ideas. Accordingly, the claim “recites” an abstract idea.
In step 2A prong 2 of the 101-analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application:
A computer-implemented method for improved prompting of a machine-learned model, the method comprising: obtaining, by a computing system comprising one or more processors, (This is considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
inputting, by the computing system and to the machine-learned model, the instructive sequence and an operative query, (In step 2A, prong 2, inputting an instructive sequence and an operative query to the computing system and machine-learned model recites mere data gathering, which is considered insignificant extra-solution activity – see MPEP 2106.05(g))
wherein the machine-learned model has been pre-trained using a plurality of diversified objectives, (The pre-training process with diversified objectives or various tasks is considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
generating, by the computing system, using the machine-learned model…, (This is considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is “directed” to an abstract idea.
In step 2B of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional elements reciting the generic computing system, the pre-training using a plurality of diversified objectives, and the use of the machine-learned model to generate the operative response recite mere instructions to apply the judicial exception using a generic computer, which are not indicative of significantly more. The additional element of inputting the instructive sequence and the operative query recites mere data gathering and is considered insignificant extra-solution activity. In step 2B, this insignificant extra-solution activity is well-understood, routine, and conventional activity, which includes receiving or transmitting data over a network. See Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016); see MPEP 2106.05(d)(II)(i).
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Claim 2:
Regarding claim 2, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 2 recites the following additional element:
The computer-implemented method of claim 1, wherein the machine-learned model is configured to process the operative query with attention over the instructive sequence…, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, with the generation of an operative trace of intermediate states from the operative query to the operative response being an operation able to be performed by any generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Further, claim 2 recites the following abstract idea:
to generate an operative trace of intermediate states from the operative query to the operative response, (This is considered a mental process, a person can mentally evaluate and generate an operative trace of intermediate states from the operative query to the operative response, see MPEP 2106.04(a)(2)(III)),
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is “directed” to an abstract idea.
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 3:
Regarding claim 3, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 3 recites the following additional element:
The computer-implemented method of claim 1, wherein: the instructive sequence is prepended to the operative query; and the instructive trace comprises a chain of intermediate responses to intermediate queries, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, where prepending the instructive sequence to the operative query is able to be performed by any generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f))
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 4:
Regarding claim 4, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 4 recites the following additional element:
The computer-implemented method of claim 1, wherein the instructive sequence comprises a tokenized representation of a natural language, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, with the instructive sequence comprising a tokenized representation of a natural language able to be processed by any generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 5:
Regarding claim 5, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 5 recites the following additional elements:
The computer-implemented method of claim 1, wherein generating the operative response comprises … by the computing system and using the machine-learned model, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
and … by the computing system,…(In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, with determining the operative response based on a sample of operative responses able to be performed by any generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Further, claim 5 recites the following abstract ideas.
comprises: generating, … a plurality of operative responses; (this is considered a mental process, since a person can mentally evaluate and generate a plurality of operative responses, see MPEP 2106.04(a)(2)(III)),
determining, …the operative response based on a sample of the plurality of operative responses, (this is considered a mental process, since a person can mentally evaluate the operative response from a sample of operative responses, see MPEP 2106.04(a)(2)(III)),
If claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process but for the recitation of generic computer components, then they fall within the mental process grouping of abstract ideas. Accordingly, the claim “recites” an abstract idea.
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 6:
Regarding claim 6, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 6 recites the following additional elements:
The computer-implemented method of claim 1, wherein the operative query is a first query component and the operative response is a first response component, and wherein the method comprises: inputting, by the computing system and to the machine-learned model, the instructive sequence, the first query component, the first response component, and a second query component; (In step 2A, prong 2, this recites mere data gathering, which is considered insignificant extra-solution activity – see MPEP 2106.05(g)). In step 2B, this insignificant extra-solution activity is well-understood, routine, and conventional activity, which includes receiving or transmitting data over a network. See Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016); see MPEP 2106.05(d)(II)(i),
and generating, by the computing system, using the machine-learned model and responsive to the second query component, a second response component, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, with generating a second response component able to be performed by any generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 7:
Regarding claim 7, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 7 recites the following additional element:
The computer-implemented method of claim 1, wherein to pre-train the machine-learned model using the plurality of diversified objectives the machine-learned model has been pre-trained using a plurality of different combinations of configuration parameters of a pretraining objective framework, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, where pre-training the model using different combinations of configuration parameters of a pretraining objective framework is able to be performed by any generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 8:
Regarding claim 8, it is dependent upon claim 7, and thereby incorporates the limitations of, and corresponding analysis applied to claim 7. Further, claim 8 recites the following additional element:
The computer-implemented method of claim 7, wherein the machine-learned model has been pre-trained on a plurality of corrupted training examples that were generated from one or more training examples, wherein the plurality of corrupted training examples were respectively generated according to the plurality of different combinations of configuration parameters, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, with pre-training a model and generating corrupted training examples according to different combinations of configuration parameters able to be performed by any generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 9:
Regarding claim 9, it is dependent upon claim 8, and thereby incorporates the limitations of, and corresponding analysis applied to claim 8. Further, claim 9 recites the following additional element:
The computer-implemented method of claim 8, wherein the pre-training objectives required the machine-learned model to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, with the use of pre-training objectives requiring the model to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples able to be performed by any generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 10:
Regarding claim 10, it is dependent upon claim 7, and thereby incorporates the limitations of, and corresponding analysis applied to claim 7. Further, claim 10 recites the following additional element:
The computer-implemented method of claim 7, wherein the configuration parameters comprise two or more different parameters of: a subportion length parameter, a subportion quantity parameter, or a corruption rate parameter, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 11:
Regarding claim 11, it is dependent upon claim 7, and thereby incorporates the limitations of, and corresponding analysis applied to claim 7. Further, claim 11 recites the following additional elements:
The computer-implemented method of claim 7, wherein the plurality of different combinations of configuration parameters comprise: a distributed configuration configured for generating a plurality of corrupted subportions distributed over a training example; and a sequential configuration configured for generating a corrupted subportion corresponding to a terminus of the training example, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 12:
Regarding claim 12, it is dependent upon claim 7, and thereby incorporates the limitations of, and corresponding analysis applied to claim 7. Further, claim 12 recites the following additional elements:
The computer-implemented method of claim 7, wherein the plurality of different combinations of configuration parameters comprise: a first distributed configuration configured for generating a first plurality of corrupted subportions distributed over a training example; (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
a second distributed configuration configured for generating a second plurality of corrupted subportions distributed over the training example, wherein the second distributed configuration is configured to cause greater corruption of the training example than the first distributed configuration; (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
and a sequential configuration configured for generating a corrupted subportion corresponding to a terminus of the training example, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 13:
Regarding claim 13, it is dependent upon claim 1, and thereby incorporates the limitations of, and corresponding analysis applied to claim 1. Further, claim 13 recites the following additional element:
The computer-implemented method of claim 1, wherein at least one of the plurality of diversified objectives comprises a bidirectional masked language modeling objective, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 14:
Regarding claim 14, in step 1 of the 101-analysis set forth in MPEP 2106, the claim recites
“one or more memory devices storing non-transitory computer-readable instructions for improved prompting of a machine-learned model, the instructions executable to cause one or more processors to perform operations, the operations comprising: obtaining an instructive sequence descriptive of an instructive query, an instructive response, and an instructive trace of intermediate states from the instructive query to the instructive response; inputting, to a machine-learned model, the instructive sequence and an operative query, wherein the machine-learned model is configured to process the operative query with attention over the instructive sequence, and wherein the machine-learned model has been pre-trained using a plurality of diversified objectives; and generating using the machine-learned model and responsive to the operative query, an operative response,” and a device is considered a machine, which is one of the four statutory categories of invention.
In step 2A prong 1 of the 101-analysis set forth in the MPEP 2106, the examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, covers a mental process but for recitation of generic computer components:
…the operations comprising: obtaining an instructive sequence descriptive of an instructive query, an instructive response, (This is considered a mental process: a person can mentally evaluate and obtain an instructive sequence that describes an instructive query, and a person can mentally evaluate and generate an instructive response, see MPEP 2106.04(a)(2)(III)),
and an instructive trace of intermediate states from the instructive query to the instructive response; (This is considered a mental process: a person can mentally evaluate and obtain a series of steps to solve a problem (i.e., obtain an instructive trace of intermediate states from the instructive query to the instructive response), see MPEP 2106.04(a)(2)(III)),
and generating … and responsive to the operative query, an operative response, (this is considered a mental process, since a person can mentally evaluate and generate an operative response from an operative query, see MPEP 2106.04(a)(2)(III)),
If claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process but for the recitation of generic computer components, then they fall within the mental process grouping of abstract ideas. Accordingly, the claim “recites” an abstract idea.
In step 2A prong 2 of the 101-analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application:
one or more memory devices storing non-transitory computer-readable instructions for improved prompting of a machine-learned model, the instructions executable to cause one or more processors to perform operations, the operations comprising… (This is considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
inputting, to a machine-learned model, the instructive sequence and an operative query, wherein the machine-learned model is configured to process the operative query with attention over the instructive sequence, (In step 2A, prong 2, inputting to a model the instructive sequence and an operative query recites mere data gathering, which is considered insignificant extra-solution activity – see MPEP 2106.05(g))
and wherein the machine-learned model has been pre-trained using a plurality of diversified objectives; (The pre-training process with diversified objectives or various tasks is considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
and… using the machine-learned model, (This is considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is “directed” to an abstract idea.
In step 2B of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional elements reciting the memory devices and processors, the pre-training using a plurality of diversified objectives, and the use of the machine-learned model to generate the operative response recite mere instructions to apply the judicial exception using a generic computer, which are not indicative of significantly more. The additional element of inputting the instructive sequence and the operative query recites mere data gathering and is considered insignificant extra-solution activity. In step 2B, this insignificant extra-solution activity is well-understood, routine, and conventional activity, which includes receiving or transmitting data over a network. See Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016); see MPEP 2106.05(d)(II)(i).
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Claim 15:
Regarding claim 15, it is dependent upon claim 14, and thereby incorporates the limitations of, and corresponding analysis applied to claim 14. Further, claim 15 recites the following additional element:
The one or more memory devices of claim 14, wherein the machine-learned model is configured to process the operative query with attention over the instructive sequence…, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, with the generation of an operative trace of intermediate states able to be performed by any generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Further, claim 15 recites the following abstract idea:
…to generate an operative trace of intermediate states from the operative query to the operative response, (this is considered a mental process, since a person can mentally evaluate and generate an operative trace of intermediate states from the operative query to the operative response, see MPEP 2106.04(a)(2)(III)),
If claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process but for the recitation of generic computer components, then they fall within the mental process grouping of abstract ideas. Accordingly, the claim “recites” an abstract idea.
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 16:
Regarding claim 16, it is dependent upon claim 14, and thereby incorporates the limitations of, and corresponding analysis applied to claim 14. Further, claim 16 recites the following additional element:
The one or more memory devices of claim 14, wherein: the instructive sequence is prepended to the operative query; and the instructive trace comprises a chain of intermediate responses to intermediate queries, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 17:
Regarding claim 17, it is dependent upon claim 14, and thereby incorporates the limitations of, and corresponding analysis applied to claim 14. Further, claim 17 recites the following additional element:
The one or more memory devices of claim 14, wherein to pre-train the machine-learned model using the plurality of diversified objectives the machine-learned model has been pre-trained using a plurality of different combinations of configuration parameters of a pretraining objective framework, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 18:
Regarding claim 18, it is dependent upon claim 17, and thereby incorporates the limitations of, and corresponding analysis applied to claim 17. Further, claim 18 recites the following additional element:
The one or more memory devices of claim 17, wherein the machine-learned model has been pre-trained on a plurality of corrupted training examples that were generated from one or more training examples, wherein the plurality of corrupted training examples were respectively generated according to the plurality of different combinations of configuration parameters, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 19:
Regarding claim 19, it is dependent upon claim 14, and thereby incorporates the limitations of, and corresponding analysis applied to claim 14. Further, claim 19 recites the following additional element:
The one or more memory devices of claim 14, wherein at least one of the plurality of diversified objectives comprises a bidirectional masked language modeling objective, (In step 2A, prong 2, this is considered mere instructions to apply an exception using a generic computer, see MPEP 2106.05(f)). (In step 2B, this is also considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
Since the claim does not recite additional elements that integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 20:
Regarding claim 20, in step 1 of the 101-analysis set forth in MPEP 2106, the claim recites “a computing system for improved prompting of a machine-learned model, the system comprising: one or more processors; and one or more memory devices storing non-transitory computer-readable instructions that are executable to cause the one or more processors to perform operations, the operations comprising: obtaining a chain of thought prompt comprising an instructive trace through a series of intermediate states; inputting, to a machine-learned model, the chain of thought prompt, wherein the machine-learned model has been pre-trained using a plurality of diversified objectives; and generating using the machine-learned model and responsive to the chain of thought prompt, an operative response,” and a system or machine is one of the four statutory categories of invention.
In step 2A prong 1 of the 101-analysis set forth in the MPEP 2106, the examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, covers a mental process but for recitation of generic computer components:
the operations comprising: obtaining a chain of thought prompt comprising an instructive trace through a series of intermediate states; (this is considered a mental process, a person can mentally evaluate and obtain a series of steps to solve a problem (i.e. obtain a chain of thought prompt comprising an instructive trace of intermediate states), see MPEP 2106.04(a)(2)(III)),
and generating… and responsive to the chain of thought prompt, an operative response, (this is considered a mental process, a person can mentally evaluate and generate an operative response from the chain of thought prompt, see MPEP 2106.04(a)(2)(III)),
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation as a mental process but for the recitation of generic computer components, then it falls within the mental process grouping of abstract ideas. Accordingly, the claim “recites” an abstract idea.
In step 2A prong 2 of the 101-analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application:
a computing system for improved prompting of a machine-learned model, the system comprising: one or more processors; and one or more memory devices storing non-transitory computer-readable instructions that are executable to cause the one or more processors to perform operations, (This is considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
inputting, to a machine-learned model, the chain of thought prompt, (In step 2A, prong 2, inputting a prompt (or a record of data) recites mere data gathering, which is considered insignificant extra-solution activity – see MPEP 2106.05(g)),
wherein the machine-learned model has been pre-trained using a plurality of diversified objectives; (This is considered mere instructions to apply an exception using a generic computer – see MPEP 2106.05(f)),
and … using the machine-learned model …, (This is considered mere instructions to apply an exception using a generic computer, since the computer system merely generates an operative response responsive to the chain of thought prompt – see MPEP 2106.05(f)),
Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is “directed” to an abstract idea.
In step 2B of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional elements reciting the generic computing system, the pre-training of the machine-learned model, and the use of the machine-learned model amount to mere instructions to apply the judicial exception using generic computer components, which are not indicative of significantly more. The additional element of inputting the chain of thought prompt recites mere data gathering and is considered insignificant extra-solution activity. In step 2B, this insignificant extra-solution activity is well-understood, routine, and conventional activity, such as receiving or transmitting data over a network. See Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Automotive, LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016); see MPEP 2106.05(d)(II)(i).
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-6, 14, 20:
Claims 1, 2, 3, 4, 5, 6, 14, and 20 of the instant application are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, 3 and 4, 6, 8, 13, 19, and 20, respectively, of U.S. Patent No. 12346828 in view of Reynolds, L. et al., “Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm,” available in the October 6, 2023 IDS and at https://arxiv.org/pdf/2102.07350, published on February 15, 2021 (hereafter, REYNOLDS). See the table map below for more information.
Table map:
Application 18/160,776
Patent Reference: US 12346828
1. A computer-implemented method for improved prompting of a machine-learned model, the method comprising:
obtaining, by a computing system comprising one or more processors, an instructive sequence descriptive of an instructive query, an instructive response, and an instructive trace of intermediate states from the instructive query to the instructive response;
inputting, by the computing system and to the machine-learned model, the instructive sequence and an operative query, wherein the machine-learned model has been pre-trained using a plurality of diversified objectives; and
generating, by the computing system, using the machine-learned model and responsive to the operative query, an operative response.
1. A computer-implemented method for performing image analysis, the method comprising:
obtaining, by a computing system comprising one or more processors, an instructive sequence descriptive of an instructive query, an instructive response, and an instructive trace of intermediate states from the instructive query to the instructive response;
inputting, by the computing system and to a machine-learned model, the instructive sequence and an operative image processing query comprising image data, wherein the machine-learned model is configured to process the operative image processing query with attention over the instructive sequence; and
generating, by the computing system, using the machine-learned model and responsive to the operative image processing query, an operative image processing response.
2. The computer-implemented method of claim 1, wherein the machine-learned model is configured to process the operative query with attention over the instructive sequence to generate an operative trace of intermediate states from the operative query to the operative response.
2. The computer-implemented method of claim 1, comprising:
generating, by the computing system, using the machine-learned model and responsive to the operative image processing query, an operative trace of intermediate states from the operative query to the operative image processing response. /
1…inputting, by the computing system and to a machine-learned model, the instructive sequence and an operative image processing query comprising image data, wherein the machine-learned model is configured to process the operative image processing query with attention over the instructive sequence; …
3. The computer-implemented method of claim 1, wherein:
the instructive sequence is prepended to the operative query; and
the instructive trace comprises a chain of intermediate responses to intermediate queries.
3. The computer-implemented method of claim 1, wherein the instructive sequence is prepended to the operative image processing query. /
4. The computer-implemented method of claim 2, wherein the instructive trace comprises a chain of intermediate responses to intermediate queries,
4. The computer-implemented method of claim 1, wherein the instructive sequence comprises a tokenized representation of a natural language.
6. The computer-implemented method of claim 1, wherein the instructive sequence comprises a tokenized representation of a natural language.
5. The computer-implemented method of claim 1, wherein generating the operative response comprises:
generating, by the computing system and using the machine-learned model, a plurality of operative responses; and
determining, by the computing system, the operative response based on a sample of the plurality of operative responses.
8. The computer-implemented method of claim 1, wherein generating the operative response comprises:
generating, by the computing system and using the machine-learned model, a plurality of operative responses; and
determining, by the computing system, the operative image processing response based on a sample of the plurality of operative responses.
6. The computer-implemented method of claim 1, wherein the operative query is a first query component and the operative response is a first response component, and wherein the method comprises:
inputting, by the computing system and to the machine-learned model, the instructive sequence, the first query component, the first response component, and a second query component;
and generating, by the computing system, using the machine-learned model and responsive to the second query component, a second response component.
13. The computer-implemented method of claim 1, wherein the operative image processing query is a first query component and the operative image processing response is a first response component, and wherein the method comprises:
inputting, by the computing system and to the machine-learned model, the instructive sequence, the first query component, the first response component, and a second query component; and
generating, by the computing system, using the machine-learned model and responsive to the second query component, a second response component.
14. One or more memory devices storing non-transitory computer-readable instructions for improved prompting of a machine-learned model, the instructions executable to cause one or more processors to perform operations, the operations comprising:
obtaining an instructive sequence descriptive of an instructive query, an instructive response, and an instructive trace of intermediate states from the instructive query to the instructive response;
inputting, to a machine-learned model, the instructive sequence and an operative query, wherein the machine-learned model is configured to process the operative query with attention over the instructive sequence, and wherein the machine-learned model has been pre-trained using a plurality of diversified objectives; and
generating using the machine-learned model and responsive to the operative query, an operative response.
19. One or more memory devices storing non-transitory computer-readable instructions executable to cause one or more processors to perform operations for performing image analysis, the operations comprising:
obtaining an instructive sequence descriptive of an instructive query, an instructive response, and an instructive trace of intermediate states from the instructive query to the instructive response;
inputting, to a machine-learned model, the instructive sequence and an operative image processing query comprising image data, wherein the machine-learned model is configured to process the operative image processing query with attention over the instructive sequence; and
generating, using the machine-learned model and responsive to the operative image processing query, an operative image processing response.
20. A computing system for improved prompting of a machine-learned model, the system comprising:
one or more processors; and
one or more memory devices storing non-transitory computer-readable instructions that are executable to cause the one or more processors to perform operations, the operations comprising:
obtaining a chain of thought prompt comprising an instructive trace through a series of intermediate states;
inputting, to a machine-learned model, the chain of thought prompt, wherein the machine-learned model has been pre-trained using a plurality of diversified objectives; and
generating using the machine-learned model and responsive to the chain of thought prompt, an operative response.
20. A computing system for performing image analysis, the system comprising:
one or more processors; and
one or more memory devices storing non-transitory computer-readable instructions that are executable to cause the one or more processors to perform operations, the operations comprising:
obtaining an instructive sequence descriptive of an instructive query, an instructive response, and an instructive trace of intermediate states from the instructive query to the instructive response;
inputting, to a machine-learned model, the instructive sequence and an operative image processing query comprising image data, wherein the machine-learned model is configured to process the operative image processing query with attention over the instructive sequence; and
generating, using the machine-learned model and responsive to the operative image processing query, an operative image processing response.
However, U.S. Patent No. 12346828 cited above fails to explicitly teach the limitations shown in italics in the table.
As shown in the table map, the methods of instant claim 1 are largely identical to the methods of claim 1 of the reference patent, except that claim 1 of the instant application recites “improved prompting of a machine-learned model” and “wherein the machine-learned model has been pre-trained using a plurality of diversified objectives;”
As shown in the table map, the system and components of instant claim 14 are largely similar to the system and components of claim 19 of the reference patent, except that claim 14 of the instant application recites “improved prompting of a machine-learned model” and “wherein the machine-learned model has been pre-trained using a plurality of diversified objectives;”
As shown in the table map, the system components and operations of instant claim 20 are largely similar to those of claim 20 of the reference patent, except that claim 20 of the instant application recites “inputting, to a machine-learned model, the chain of thought prompt, wherein the machine-learned model has been pre-trained using a plurality of diversified objectives,” and similar elements regarding “the chain of thought”.
In particular, in the same field, analogous art REYNOLDS teaches “improved prompting of a machine-learned model”, “chain of thought”, and “wherein the machine-learned model has been pre-trained using a plurality of diversified objectives”.
See REYNOLDS on page 7 and in figure 3, that a "metaprompt may be something as short as a phrase such as 'This problem asks us to,' a seemingly innocuous fragment which, by prompting for a statement of the problem's intention, sets the stage for a serial explanation of a procedure to solve the problem."
[Image: media_image1.png (greyscale) – REYNOLDS, figure 3]
Here, the operative query is “What is f(f(3))?”, and the instructive sequence is the steps the model takes to solve the problem, which breaks the problem into steps: f(3*3) = 3*3*3 = 27. In figure 3 and on page 7, REYNOLDS illustrates the model using a chain of thought prompt to solve a math problem.
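For clarity, the arithmetic in this worked example can be checked directly. A minimal sketch in Python, assuming (as the worked steps f(3*3) = 3*3*3 imply) that the function in the REYNOLDS figure is f(x) = x * 3:

```python
# Hypothetical reconstruction of the REYNOLDS figure-3 problem,
# assuming f(x) = x * 3 as the worked steps imply.
def f(x):
    return x * 3

# Chain-of-thought steps: f(f(3)) = f(3*3) = 3*3*3 = 27
inner = f(3)       # first intermediate state: 3 * 3 = 9
result = f(inner)  # final state: 3 * 3 * 3 = 27
```

The two intermediate assignments mirror the "instructive trace of intermediate states" recited in the claims: each line corresponds to one step between the query and the response.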
Further, REYNOLDS discusses, on page 2, section 2 (Related work), the model “GPT-3’s capabilities through demonstrations of it writing fiction, poetry, navy seals copypasta parodies, and performing tasks like PDF cleaning… using dialogues to prompt GPT-3 (via AI Dungeon) to break problems into steps and follow procedures such as brute force checking [13, 14], achieving impressive results on math problems.” Here, REYNOLDS teaches that the machine learning model is set up to develop a broad set of skills and pattern recognition abilities, from math problems on page 7 to word problems and text generation on page 2, for improved prompting, which relates to pre-training with diversified objectives.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the claims of U.S. Patent No. 12346828 with the above teachings of REYNOLDS in order to provide prompts that are “more efficient and informative in context, both from the perspective of a human and a language model” (REYNOLDS, page 5, section 4.3, Task specification by demonstration).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 3, 4, 7, 14, 15, 16, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Reynolds, L. et al., “Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm,” available in the October 6, 2023 IDS and at https://arxiv.org/pdf/2102.07350, published on February 15, 2021 (hereafter, REYNOLDS), in view of Reza, S. et al. (US PG Pub. No. US 2023/0237277 A1), filed on January 25, 2022 (hereafter, REZA).
Claim 1:
Referring to claim 1, REYNOLDS teaches “a computer-implemented method for improved prompting of a machine-learned model, the method comprising: obtaining, by a computing system comprising one or more processors, an instructive sequence descriptive of an instructive query, an instructive response, and an instructive trace of intermediate states from the instructive query to the instructive response;”
See REYNOLDS on page 7, section 4.7 (Metaprompt programming), and in figure 3, describing that a "metaprompt may be something as short as a phrase such as 'This problem asks us to,' a seemingly innocuous fragment which, by prompting for a statement of the problem's intention, sets the stage for a serial explanation of a procedure to solve the problem." Here, the instructive query is “What is f(f(3))?”, and the instructive trace of intermediate states is part of the steps the model takes to solve the problem, which breaks the problem into steps with f(3*3) = 3*3*3.
In figure 3 and on page 7, REYNOLDS describes that a metaprompt is similar to an instructive trace of intermediate states followed by an instructive response in a model.
Further, REYNOLDS teaches “inputting, by the computing system and to the machine-learned model, the instructive sequence and an operative query, wherein the machine-learned model has been pre-trained using a plurality of diversified objectives;”
See REYNOLDS on page 7 and in figure 3, that a "metaprompt may be something as short as a phrase such as 'This problem asks us to,' a seemingly innocuous fragment which, by prompting for a statement of the problem's intention, sets the stage for a serial explanation of a procedure to solve the problem."
Here, the operative query is “What is f(f(3))?”, and the instructive sequence is the steps the model takes to solve the problem, which breaks the problem into f(3*3) = 3*3*3. In figure 3 and on page 7, REYNOLDS illustrates the instructive sequence and serial explanation, where the model uses a metaprompt to demonstrate the task (i.e., relating to an operative trace of intermediate states from the operative query) to predict a response to the problem (i.e., an operative response).
Further, REYNOLDS discusses, on page 2, section 2 (Related work), the model “GPT-3’s capabilities through demonstrations of it writing fiction, poetry, navy seals copypasta parodies, and performing tasks like PDF cleaning… using dialogues to prompt GPT-3 (via AI Dungeon) to break problems into steps and follow procedures such as brute force checking [13, 14], achieving impressive results on math problems.” Here, REYNOLDS teaches that the machine learning model is set up to develop a broad set of skills and pattern recognition abilities, from math problems on page 7 to word problems and text generation on page 2, which relates to handling diversified objectives.
Further, REYNOLDS teaches “generating, by the computing system, using the machine-learned model and responsive to the operative query, an operative response,”
See REYNOLDS in figures 3-5 and on page 7, describing that a “metaprompt may be something as short as a phrase such as 'This problem asks us to,' a seemingly innocuous fragment which, by prompting for a statement of the problem's intention, sets the stage for a serial explanation of a procedure to solve the problem. Metaprompt examples (Figs 3-5) were generated with GPT-3 using OpenAI’s API.”
[Image: media_image2.png (greyscale) – REYNOLDS, metaprompt examples from Figs. 3-5]
Here, REYNOLDS illustrates examples, from arithmetic calculations to word problems, of an operative response that the model generates upon receiving a math question, which relates to an operative query.
However, REYNOLDS fails to teach “…obtaining, by a computing system comprising one or more processors…”
In an analogous system, REZA teaches “…obtaining, by a computing system comprising one or more processors…”
See REZA in paragraphs [0004 – 0005] describing "techniques are provided (e.g., a method, a system, non-transitory computer-readable medium storing code or instructions executable by one or more processors) for dynamically developing a contextual set of prompts based on relevant aspects extracted from a set of training data. In various embodiments, a method is provided for that comprises: obtaining, by a computing system, a set of training data comprising text examples and associated labels, wherein the labels comprise: (i) text labels that relate to possible solutions for a task to be learned by a machine learning language model, and (ii) specified solution labels for the task; extracting, by the computing system, aspects from the set of training data… " Here, REZA shows that the operations such as improved prompting of a machine-learned model can be run by a computing system and a non-transitory computer-readable medium that stores instructions and is implemented by one or more processors.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify the base reference of REYNOLDS by incorporating the teachings of REZA, because both references teach methods of improving prompting in machine learning models.
One of ordinary skill in the art would be motivated to do so because "these approaches provide an enormous benefit for custom models where developing the custom models with the prompting framework can provide near state of the art performance with as low as sixteen data points. Advantageously, this is close to a 100× reduction in training time." (REZA, paragraph [0028]), “suggesting prompt-based learning enables a significant leap in training efficiency” (REZA, paragraph [0003]).
Claim 2:
Regarding claim 2, REYNOLDS in view of REZA teaches the limitations in claim 1.
Further, REYNOLDS teaches “the computer-implemented method of claim 1, wherein the machine-learned model is configured to process the operative query with attention over the instructive sequence to generate an operative trace of intermediate states from the operative query to the operative response,”
See REYNOLDS on page 7, section 4.7 (Metaprompt programming), and in figure 3, describing that a "metaprompt may be something as short as a phrase such as 'This problem asks us to,' a seemingly innocuous fragment which, by prompting for a statement of the problem's intention, sets the stage for a serial explanation of a procedure to solve the problem."
Here, the instructive query is “What is f(f(3))?”, and the instructive trace of intermediate states is part of the steps the model takes to solve the problem, which breaks the problem into steps with f(3*3) = 3*3*3.
In figure 3 and on page 7, REYNOLDS describes that a metaprompt is similar to an instructive trace of intermediate states followed by an instructive response in a model.
Here, REYNOLDS teaches that the machine learning model is set up to develop a broad set of skills and pattern recognition abilities, from math problems to word problems, which relates to handling diversified objectives. In figure 3 and on page 7, REYNOLDS illustrates the instructive sequence and serial explanation, where the model uses a metaprompt to demonstrate the task (i.e., relating to an operative trace of intermediate states from the operative query) to predict a response to the problem (i.e., an operative response).
Claim 3:
Regarding claim 3, REYNOLDS in view of REZA, teaches the limitations in claim 1.
Further, REYNOLDS teaches "the instructive trace comprises a chain of intermediate responses to intermediate queries,"
See REYNOLDS on page 7 and in figure 3, that a "metaprompt may be something as short as a phrase such as 'This problem asks us to,' a seemingly innocuous fragment which, by prompting for a statement of the problem's intention, sets the stage for a serial explanation of a procedure to solve the problem."
Here, the operative query is “What is f(f(3))?”, and the instructive sequence is the steps the model takes to solve the problem, which breaks the problem into f(3*3) = 3*3*3. In figure 3 and on page 7, REYNOLDS describes that a metaprompt is similar to an instructive sequence followed by an operative query existing simultaneously within the model. Here, REYNOLDS teaches that, before answering the query for the user, the model explains the steps of its thinking process with reasoning before generating the response, similar to a chain-of-thought process (i.e., the instructive trace comprises a chain of intermediate responses to intermediate queries).
Further, REYNOLDS does not explicitly disclose, however REZA teaches, "the computer-implemented method of claim 1, wherein: the instructive sequence is prepended to the operative query;"
See paragraphs [0039-0040], where Reza describes "the original input 105 is then concatenated with the respective generated prompting template 115 to create a prompting function 120. For example, the original input 105 and the prompting template 115 may be used as input for a concatenate function configured to join the two text strings into a single text string: the original input 105; the prompting template 115, such that the two text strings are now linked or associated with one another. The model is then trained on the prompting functions 120 to predict a solution for the given task. The training comprises formulating the task as a masked language modeling problem with the prompting templates 115 and the expected output for the blanks being set as the labels. More specifically, the blank in each of the prompting functions 120 is populated with the labels to create a set of prompting functions for each of the text examples. For example, for the original input 105—the labels may comprise: terrible, bland, flavorful, delicious, disgusting, and sour; where terrible and disgusting are associate with a negative sentiment class, flavorful and delicious are associated with a positive sentiment class, and bland and sour are associate with a neutral sentiment class. The blanks in the prompting template 110 may be populated with the labels such that the following prompting templates are generated for the original input 105: The food was terrible, The food was bland, The food was flavorful, The food was delicious, The food was disgusting, and The food was sour. As should be understood, a similar populating process may be performed for the service-related template shown in FIG. 1." Here, REZA shows an input of a prompting template used to concatenate to the original input (i.e. 
instructive sequence is prepended to the operative query) and joins the two text strings into a single text string, which then becomes the input on which the model is trained to predict a solution to the task.
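The concatenation REZA describes, where a prompting template is joined to the original input to form a single prompting function, can be sketched as follows. The function name and sample strings are illustrative stand-ins, not REZA's actual implementation:

```python
# Illustrative sketch of REZA's concatenate step: the prompting template
# (instructive sequence) is joined with the original input (operative
# query) to form one text string used as the model's input.
def make_prompting_function(original_input: str, template: str) -> str:
    # Join the two text strings into a single text string.
    return f"{original_input} {template}"

original_input = "No reason to watch."
template = "It was <mask>."
prompt = make_prompting_function(original_input, template)
print(prompt)  # No reason to watch. It was <mask>.
```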
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the base reference REYNOLDS to incorporate the teachings of REZA because both references teach methods of improving prompting in machine learning models.
One of ordinary skill in the art would be motivated to do so because "these approaches provide an enormous benefit for custom models where developing the custom models with the prompting framework can provide near state of the art performance with as low as sixteen data points. Advantageously, this is close to a 100× reduction in training time." (REZA, paragraph [0028]), “suggesting prompt-based learning enables a significant leap in training efficiency” (REZA, paragraph [0003]).
Claim 4:
Regarding claim 4, REYNOLDS in view of REZA teaches the limitations in claim 1.
Further, REYNOLDS teaches "the computer-implemented method of claim 1, wherein the instructive sequence comprises a tokenized representation of a natural language,"
See REYNOLDS in page 4, section 4.1 The dynamics of language, where the model "GPT-3 was trained in a self-supervised setting on hundreds of gigabytes of natural language [3]. Self-supervision is a form of unsupervised learning in which ground truth labels are derived from the data itself. In the case of GPT-3, the ground truth label assigned to each example was simply the token that came next in the original source. The ground truth function which GPT-3 approximates, then, is the underlying dynamic that determined what tokens came next in the original source." Further, see REYNOLDS in page 7 and in figure 3, that a "metaprompt may be something as short as a phrase such as 'This problem asks us to,' a seemingly innocuous fragment which, by prompting for a statement of the problem's intention, sets the stage for a serial explanation of a procedure to solve the problem."
[Image: media_image1.png (greyscale)]
Here, the operative query is What is f(f(3))?, and the instructive sequence is the steps the model takes to solve the problem, which breaks the problem into f(3*3) = 3*3*3.
Overall, REYNOLDS teaches using the model GPT-3, which generates an instructive sequence in figure 3 and also mentions tokens that came next in the original source (i.e. a tokenized representation of natural language) in page 4 of REYNOLDS.
Claim 7:
Regarding claim 7, REYNOLDS in view of REZA teaches the limitations in claim 1.
Further, REZA teaches “the computer-implemented method of claim 1, wherein to pre-train the machine-learned model using the plurality of diversified objectives the machine-learned model has been pre-trained using a plurality of different combinations of configuration parameters of a pretraining objective framework,”
See REZA in paragraph [0051], where REZA mentions that "the training process for the aspect model 245 includes selecting hyperparameters for the aspect model 245 and performing iterative operations of inputting aspect examples from the subset of aspect examples 250 into the aspect model 245 to find a set of model parameters (e.g., weights and/or biases) that minimizes a cost function(s) such as loss or error function for the aspect model 245 . The hyperparameters are settings that can be tuned or optimized to control the behavior of the aspect model 245 . Most models explicitly define hyperparameters that control different features of the models such as memory or cost of execution. However, additional hyperparameters may be defined to adapt the aspect model 245 to a specific scenario. For example, the hyperparameters may include the number of hidden units of a model, the learning rate of a model, the convolution kernel width, or the number of kernels for a model." Here, REZA talks about using hyperparameters, which are part of a set of model parameters and can be tuned or optimized to control model behavior. The hyperparameters here relate to configuration parameters and are part of training the model.
Further, see REZA describe in paragraph [0062] that “the fine-tuning system 215 comprises an inference framework 260. The inference framework 260 executes a fine-tuning process on a pre-trained model. The fine-tuning process includes re-training a pre-trained language model using a user's own custom data. As a result of the fine-tuning process, the model parameters of the original pre-trained language model are updated to account for the characteristics of the domain data and the task the user is interested in. For example, a user's own custom data can be stored in the natural language text store 240 as sets of data and associated labels. The sets of data may be associated with one or more domains such as financial, health care, food and service, transportation, education, and the like. In some instances, text examples within a set of training data…” This shows REZA teaches applying the model to various domains of sets of data such as financial, education, food and service, and etc., which corresponds to the plurality of diversified objectives the machine-learned model has been pre-trained on.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the base reference REYNOLDS to incorporate the teachings of REZA because both references teach methods of improving prompting in machine learning models.
One of ordinary skill in the art would be motivated to do so because "these approaches provide an enormous benefit for custom models where developing the custom models with the prompting framework can provide near state of the art performance with as low as sixteen data points. Advantageously, this is close to a 100× reduction in training time." (REZA, paragraph [0028]), “suggesting prompt-based learning enables a significant leap in training efficiency” (REZA, paragraph [0003]).
Claim 14:
Regarding claim 14, REYNOLDS teaches “…the operations comprising: obtaining an instructive sequence descriptive of an instructive query, an instructive response, and an instructive trace of intermediate states from the instructive query to the instructive response;”
See REYNOLDS in page 7, section 4.7 Metaprompt programming, and in figure 3, describing that a "metaprompt may be something as short as a phrase such as 'This problem asks us to,' a seemingly innocuous fragment which, by prompting for a statement of the problem's intention, sets the stage for a serial explanation of a procedure to solve the problem."
[Image: media_image1.png (greyscale)]
Here, the instructive query is What is f(f(3))?, and the instructive trace of intermediate states is part of the steps the model takes to solve the problem, which breaks the problem into steps with f(3*3) = 3*3*3.
In figure 3 and page 7, REYNOLDS describes a metaprompt as similar to an instructive trace of intermediate states followed by an instructive response in a model.
Further, REYNOLDS teaches “inputting, to a machine-learned model, the instructive sequence and an operative query, wherein the machine-learned model is configured to process the operative query with attention over the instructive sequence, and wherein the machine-learned model has been pre-trained using a plurality of diversified objectives,”
See REYNOLDS in page 7 and figure 3, that a "metaprompt may be something as short as a phrase such as 'This problem asks us to,' a seemingly innocuous fragment which, by prompting for a statement of the problem's intention, sets the stage for a serial explanation of a procedure to solve the problem."
[Image: media_image1.png (greyscale)]
Here, the operative query is What is f(f(3))?, and the instructive sequence is the steps the model takes to solve the problem, which breaks the problem into f(3*3) = 3*3*3. In figure 3 and page 7, REYNOLDS describes a metaprompt as similar to an instructive sequence followed by an operative query existing simultaneously within the model. In figures 3, 4, and 5, REYNOLDS illustrates examples of pre-training with a plurality of diversified objectives, where problems can range from math problems to word analogy problems, indicating various tasks that the model performs.
[Image: media_image2.png (greyscale)]
Further, REYNOLDS talks in page 2, section 2 Related work, about the model “GPT-3’s capabilities through demonstrations of it writing fiction, poetry, navy seals copypasta parodies, and performing tasks like PDF cleaning… using dialogues to prompt GPT-3 (via AI Dungeon) to break problems into steps and follow procedures such as brute force checking [13, 14], achieving impressive results on math problems.” Here, REYNOLDS teaches that the machine learning model is set up to develop a broad set of skills and pattern recognition abilities, from math problems and word problems to generating text, which relates to handling diversified objectives.
Further, REYNOLDS teaches “and generating using the machine-learned model and responsive to the operative query, an operative response.”
See REYNOLDS in figures 3-5, page 7, describing that a “metaprompt may be something as short as a phrase such as 'This problem asks us to,' a seemingly innocuous fragment which, by prompting for a statement of the problem's intention, sets the stage for a serial explanation of a procedure to solve the problem. Metaprompt examples (Figs 3-5) were generated with GPT-3 using OpenAI’s API.” Here, REYNOLDS illustrates examples, from arithmetic calculations to word problems, of an operative response that the model generates upon receiving a math question, which relates to an operative query.
However, REYNOLDS did not explicitly teach “one or more memory devices storing non-transitory computer-readable instructions for improved prompting of a machine-learned model, the instructions executable to cause one or more processors to perform operations, …”
In an analogous art, REZA teaches “One or more memory devices storing non-transitory computer-readable instructions for improved prompting of a machine-learned model, the instructions executable to cause one or more processors to perform operations, …”
See REZA in paragraph [0004] describing that “techniques are provided (e.g., a method, a system, non-transitory computer-readable medium storing code or instructions executable by one or more processors) for dynamically developing a contextual set of prompts based on relevant aspects extracted from s set of training data.” REZA shows that the operations such as improved prompting of a machine-learned model can be run by non-transitory computer-readable medium that stores instructions and is implemented by one or more processors.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the base reference REYNOLDS to incorporate the teachings of REZA because both references teach methods of improving prompting in machine learning models.
One of ordinary skill in the art would be motivated to do so because "these approaches provide an enormous benefit for custom models where developing the custom models with the prompting framework can provide near state of the art performance with as low as sixteen data points. Advantageously, this is close to a 100× reduction in training time." (REZA, paragraph [0028]), “suggesting prompt-based learning enables a significant leap in training efficiency” (REZA, paragraph [0003]).
Claim 15:
Regarding claim 15, REYNOLDS in view of REZA teaches the limitations in claim 14.
Referring to claim 15, the claim recites similar limitations as corresponding claim 2 and is rejected for similar reasons as claim 2 using similar teachings and rationale.
Claim 16:
Regarding claim 16, REYNOLDS in view of REZA teaches the limitations in claim 14.
Referring to claim 16, the claim recites similar limitations as corresponding claim 3 and is rejected for similar reasons as claim 3 using similar teachings and rationale.
Claim 17:
Regarding claim 17, REYNOLDS in view of REZA teaches the limitations in claim 14.
Referring to claim 17, the claim recites similar limitations as corresponding claim 7 and is rejected for similar reasons as claim 7 using similar teachings and rationale.
Claim 20:
Regarding claim 20, REYNOLDS teaches “a computing system for improved prompting of a machine-learned model, the system comprising: one or more processors; and one or more memory devices storing non-transitory computer-readable instructions that are executable to cause the one or more processors to perform operations, the operations comprising: obtaining a chain of thought prompt comprising an instructive trace through a series of intermediate states;”
See REYNOLDS in page 7, section 4.7 Metaprompt programming, and in figure 3, describing that a "metaprompt may be something as short as a phrase such as 'This problem asks us to,' a seemingly innocuous fragment which, by prompting for a statement of the problem's intention, sets the stage for a serial explanation of a procedure to solve the problem."
[Image: media_image1.png (greyscale)]
Here, the instructive query is What is f(f(3))?, and the instructive trace of intermediate states is part of the steps the model takes to solve the problem, which breaks the problem into steps with f(3*3) = 3*3*3, relating to a series of intermediate states. In figure 3 and page 7, REYNOLDS describes a metaprompt as similar to a chain of thought prompt that comprises an instructive trace through a series of intermediate states in a model.
Further, REYNOLDS teaches “inputting, to a machine-learned model, the chain of thought prompt, wherein the machine-learned model has been pre-trained using a plurality of diversified objectives;”
See REYNOLDS in page 7 and in figure 3, that a "metaprompt may be something as short as a phrase such as 'This problem asks us to,' a seemingly innocuous fragment which, by prompting for a statement of the problem's intention, sets the stage for a serial explanation of a procedure to solve the problem."
[Image: media_image1.png (greyscale)]
Here, the operative query is What is f(f(3))?, and the instructive sequence is the steps the model takes to solve the problem, which breaks the problem into the steps f(3*3) = 3*3*3 = 27. In figure 3 and page 7, REYNOLDS illustrates that the model uses a chain of thought prompt to solve a math problem.
Further, REYNOLDS talks in page 2, section 2 Related work, about the model “GPT-3’s capabilities through demonstrations of it writing fiction, poetry, navy seals copypasta parodies, and performing tasks like PDF cleaning… using dialogues to prompt GPT-3 (via AI Dungeon) to break problems into steps and follow procedures such as brute force checking [13, 14], achieving impressive results on math problems.” Here, REYNOLDS teaches that the machine learning model is set up to develop a broad set of skills and pattern recognition abilities, from math problems in page 7 and word problems to generating text in page 2, which relates to pre-training with diversified objectives.
Further, REYNOLDS teaches “and generating using the machine-learned model and responsive to the chain of thought prompt, an operative response,”
See REYNOLDS in figures 3-5, page 7, describing that a “metaprompt may be something as short as a phrase such as 'This problem asks us to,' a seemingly innocuous fragment which, by prompting for a statement of the problem's intention, sets the stage for a serial explanation of a procedure to solve the problem. Metaprompt examples (Figs 3-5) were generated with GPT-3 using OpenAI’s API.”
[Image: media_image2.png (greyscale)]
Here, REYNOLDS illustrates examples, from arithmetic calculations to word problems, of an operative response that the model generates upon receiving a math question or word problem, which relates to an operative query. REYNOLDS teaches a chain of thought prompt and an operative response, where the model states to the user ‘Let’s solve this problem by breaking it into steps’ and later illustrates the steps in detail along with the correct answer selection, which corresponds to generating, responsive to the chain of thought prompt, an operative response.
However, REYNOLDS fails to teach “a computing system for improved prompting of a machine-learned model, the system comprising: one or more processors; and one or more memory devices storing non-transitory computer-readable instructions that are executable to cause the one or more processors to perform operations,”
In an analogous system, REZA teaches “a computing system for improved prompting of a machine-learned model, the system comprising: one or more processors; and one or more memory devices storing non-transitory computer-readable instructions that are executable to cause the one or more processors to perform operations,”
See REZA in paragraphs [0004 – 0005] describing "techniques are provided (e.g., a method, a system, non-transitory computer-readable medium storing code or instructions executable by one or more processors) for dynamically developing a contextual set of prompts based on relevant aspects extracted from s set of training data. In various embodiments, a method is provided for that comprises: obtaining, by a computing system, a set of training data comprising text examples and associated labels, wherein the labels comprise: (i) text labels that relate to possible solutions for a task to be learned by a machine learning language model, and (ii) specified solution labels for the task; extracting, by the computing system, aspects from the set of training data… " Further, REZA describes in paragraph [0135] that “in some implementations, system memory 810 may include multiple different types of memory, such as static random access memory (SRAM) or dynamic random access memory (DRAM). In some implementations, a basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within computer system 800 , such as during start-up, may typically be stored in the ROM. By way of example, and not limitation, system memory 810 also illustrates application programs 812, which may include client applications, Web browsers, mid-tier applications, relational database management systems (RDBMS), etc., program data 814 , and an operating system 816.” Here, REZA shows that operations such as improved prompting of a machine-learned model can be run by a computing system and a non-transitory computer-readable medium that stores instructions and is implemented by one or more processors and one or more memory devices.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the base reference REYNOLDS to incorporate the teachings of REZA because both references teach methods of improving prompting in machine learning models.
One of ordinary skill in the art would be motivated to do so because "these approaches provide an enormous benefit for custom models where developing the custom models with the prompting framework can provide near state of the art performance with as low as sixteen data points. Advantageously, this is close to a 100× reduction in training time." (REZA, paragraph [0028]), “suggesting prompt-based learning enables a significant leap in training efficiency” (REZA, paragraph [0003]).
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over REYNOLDS in view of REZA, further in view of Brown T. et al., “Language Models are Few-Shot Learners”, provided in the October 6, 2023 IDS, and available at https://arxiv.org/pdf/2005.14165, published on July 22, 2020 (hereafter, BROWN).
Claim 5:
Regarding claim 5, REYNOLDS in view of REZA, teaches the limitations in claim 1.
Further, REYNOLDS teaches " the computer-implemented method of claim 1, wherein generating the operative response comprises: … determining, by the computing system, the operative response based on a sample of the plurality of operative responses, "
See REYNOLDS in Figure 5, and in page 2, section 3, where REYNOLDS describes the model "GPT-3 must rely primarily, if not entirely, on the knowledge of vocabulary and grammar of both the source and target languages embedded in its trained weights. Rather than viewing these tasks as few-shot-learning, we will explicitly show that these prompts primarily direct the model to access existing knowledge. We do so by investigating whether examples (training samples) are even necessary." Here, REYNOLDS mentions using samples of the training examples of various model responses. In figure 5, REYNOLDS determines a correct answer (i.e. operative response) based on a sample of answer selections (i.e. operative responses).
[Image: media_image3.png (greyscale)]
However, REYNOLDS in view of REZA did not explicitly teach "generating, by the computing system and using the machine-learned model, a plurality of operative responses." In an analogous field, BROWN teaches "generating, by the computing system and using the machine-learned model, a plurality of operative responses,"
See BROWN in page 25, where BROWN describes "the dataset used to train GPT-3 is much less weighted towards news articles, so trying to generate news articles via raw unconditional samples is less effective – for example GPT-3 often interprets the proposed first sentence of a “news article” as a tweet and then posts synthetic responses or follow-up tweets. To solve this problem we employed GPT-3’s few-shot learning abilities by providing three previous news articles in the model’s context to condition it. With the title and subtitle of a proposed next article, the model is able to reliably generate short articles in the 'news' genre." Here, BROWN talks about using a machine-learning model GPT-3 to generate responses or follow up tweets (i.e. plurality of operative responses).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of REYNOLDS and REZA and incorporate the teachings of BROWN because all references teach using methods to improve prompting for machine learning models.
One of ordinary skill in the art would be motivated to do so because integrating BROWN’s framework into the methods of REYNOLDS and REZA incorporates the teaching that “models like GPT-3 …can be surprisingly efficient once trained: even with the full GPT-3 175B, generating 100 pages of content from a trained model can cost on the order of 0.4 kW-hr, or only a few cents in energy costs…” (BROWN, page 39, section 6.3 Energy usage). This combination would also bring users to “adopt a paradigm of training single, large-scale models, then creating more efficient versions of them for use in appropriate contexts” (BROWN, page 39, section 6.3 Energy usage).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over REYNOLDS in view of REZA, further in view of Magliozzi, A. et al. (US PG Pub No. US11128579), published on September 21, 2021, (hereafter, MAGLIOZZI).
Claim 6:
Regarding claim 6, REYNOLDS in view of REZA teaches the limitations in claim 1.
However, REYNOLDS in view of REZA fails to teach
“The computer-implemented method of claim 1, wherein the operative query is a first query component and the operative response is a first response component, and wherein the method comprises: inputting, by the computing system and to the machine-learned model, the instructive sequence, the first query component, the first response component, and a second query component,”
“generating, by the computing system, using the machine-learned model and responsive to the second query component, a second response component,”
In an analogous system, MAGLIOZZI teaches “the computer-implemented method of claim 1, wherein the operative query is a first query component and the operative response is a first response component, and wherein the method comprises: inputting, by the computing system and to the machine-learned model, the instructive sequence, the first query component, the first response component, and a second query component,”
See MAGLIOZZI in column 1, lines 35-59 describe “in response to the first query, the processor transmits a question included in a workflow to the user via the user interface. The workflow can be generated, e.g., by the processor, based at least in part on the model. In response to the question, the processor obtains at least a first response from the user via the user interface. The method also includes storing, in a memory coupled to the processor, at least the first response as a second set of data to train the chatbot. The processor can update the model based on the second set of data. The method also includes training the chatbot based at least in part on the model via the processor. In some cases, training the chatbot further comprises obtaining a second query from the user via the user interface. In response to the second query, the processor generates a second response to the user based on the model. In some instances, the method can further include generating a confidence value for the second response via the processor. The confidence value indicates a probability with which the second response is an accurate response to the second query.” Here, MAGLIOZZI teaches inputting by the computing system a question included in a workflow (i.e. the instructive sequence), as well as the first query component, the first response component, and a second query component.
Further, MAGLIOZZI teaches “generating, by the computing system, using the machine-learned model and responsive to the second query component, a second response component,”
See MAGLIOZZI describe in column 9, lines 52-67, "In step 214, the chatbot updates the language model based on the answer obtained from the user. For example, in step 208, if the chatbot asked the user a question regarding the user's children and in step 210, the user responds to the question stating that the user has three children, then the processor can analyze the language model to determine if the language model includes the number of children for that user. If the language model does not include this information, the processor can update the language model accordingly. In step 216, the chatbot is trained based on the language model. For example, the chatbot, can be trained based on supervised learning or unsupervised learning. As the chatbot obtains more information from the user, the language model is updated and trained, thereby making the model more comprehensive." Note the examiner construes the process of query and response to mean question and answer, and construes the term ‘second’ to mean updated or additional information the model gathers. See column 1, lines 35-59 in MAGLIOZZI for more information. Here, MAGLIOZZI teaches the language model is updated based on the answer obtained from the user, and in column 1, lines 35-59, MAGLIOZZI mentions a second query and second response, (i.e. generating, by the computing system, using the machine-learned model and responsive to the second query component, a second response component).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the references REYNOLDS and REZA and incorporate the teachings of MAGLIOZZI because all references teach using methods to improve prompting for machine learning models.
One of ordinary skill in the art would be motivated to do so because integrating MAGLIOZZI’s framework into the methods of REYNOLDS and REZA would achieve the goal of a model that “can generalize categories from the categories it was trained on so the category encodings improve continuously as knowledge is added. And the entire encoding process in the neural network can also be improved with periodic retraining.” (MAGLIOZZI, column 14, lines 43-47).
Claims 8, 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over REYNOLDS, further in view of REZA, and further in view of Ren S. et al. (US PG Pub No. US 20220335216 A1), filed on April 7, 2021, (hereafter, REN).
Claim 8:
Regarding claim 8, REYNOLDS in view of REZA teaches the limitations in claim 7.
However, REYNOLDS in view of REZA, fail to teach the limitation, “the computer-implemented method of claim 7, wherein the machine-learned model has been pre-trained on a plurality of corrupted training examples that were generated from one or more training examples, wherein the plurality of corrupted training examples were respectively generated according to the plurality of different combinations of configuration parameters.”
In an analogous system, REN teaches “the computer-implemented method of claim 7, wherein the machine-learned model has been pre-trained on a plurality of corrupted training examples that were generated from one or more training examples, wherein the plurality of corrupted training examples were respectively generated according to the plurality of different combinations of configuration parameters,”
See REN describe in paragraphs [0033-0035] about "the phrase 'altered' shall be understood to cover embodiments of one or more words in the altered sequence being corrupted, masked, or changed from corresponding words in a text sequence, one or more words missing in the altered sequence, or any combination thereof. In one or more embodiments, the altered sequence 105 is obtained from a text sequence (e.g., w1 , w2 , w3 , w4 , and w5 ) with binary masks (e.g., m) applied on at least one word (e.g., w3 ). In one or more embodiments, the word covered with the binary mask is absent in the altered sequence 105 ....At least one of the prior networks, the encoder and the decoder may be trained ( 220 ) using an objective involving at least the decoder output and a ground-truth label constructed from the text sequence and the binary masks." Here, REN teaches the process of training using text sequence and binary masks, where the masks relate to the corrupted training examples and were generated according to the plurality of different combinations of configuration parameters from the model.
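The binary-mask alteration REN describes, where a mask applied to a word in the text sequence corrupts or removes it from the altered sequence, can be sketched as follows. The function name and the [MASK] token are illustrative choices, not REN's actual implementation:

```python
# Illustrative sketch of REN's altered sequence: binary masks m are
# applied to a text sequence (w1..w5), corrupting the masked positions.
MASK_TOKEN = "[MASK]"

def apply_binary_masks(words, masks):
    """Replace each word whose mask bit is 1 with a mask token."""
    return [MASK_TOKEN if m else w for w, m in zip(words, masks)]

words = ["w1", "w2", "w3", "w4", "w5"]
masks = [0, 0, 1, 0, 0]          # w3 is masked, as in REN's example
altered = apply_binary_masks(words, masks)
print(altered)  # ['w1', 'w2', '[MASK]', 'w4', 'w5']
```

The mask pattern (which positions, how many) functions as a configuration choice when generating such corrupted training examples.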
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the references of REYNOLDS and REZA with the teachings of REN by using REYNOLDS and REZA’s teachings of methods of improved prompting for machine learning models, with REN’s teaching of using corrupted training examples generated from different combinations of configuration parameters.
One of ordinary skill in the art would be motivated to do so because integrating REN’s framework into the methods of REYNOLDS and REZA would achieve the goal of providing a method “flexible to aggregate structural knowledge among the words, and may improve the representation of learned word representations.” (REN, paragraph [0036]) and would help “regularize the dynamic of embedding vectors and improve model's training stability” (REN, paragraph [0075]).
Claim 9:
Regarding claim 9, REYNOLDS in view of REZA, further in view of REN teaches the limitations in claim 8.
Further, REZA teaches “the computer-implemented method of claim 8, wherein the pre-training objectives required the machine-learned model to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples,”
See REZA in paragraph [0026] that "prompt-based learning generally makes use of a template (discretized) or a vector based approach (continuous) to perform recall better during inference. However, the number of templates or vectors that can be created are finite and not context dependent. For example, consider the review “I went to the movies yesterday. No reason to watch”, one possible (conventional) prompt template is “It was <mask>”. Thereafter, the prompt template can be appended to the review and formulated as a language modeling tasks: “I went to the movies yesterday. No reason to watch. It was <mask>”, where the <mask>(or blank) can be filled-in by words like terrible, horrible, terrifying, excellent, exciting, disturbing, sad, funny, etc. The finite nature of the templates (or vectors) and the absence of context in the prompt: It was <mask> limit the model's ability to capture the domain specific knowledge for a given training example." Note, the examiner construes corrupted subportions to mean any occurrence when a word is altered, blank, missing, or left out of a sentence. Here, REZA teaches using blank or missing sections of text shown as <mask> as corrupted subportions that limit the model's ability to capture the domain specific knowledge for a given training example. After letting the model view missing or masked corrupted sections of text from training examples, the model fills in the corrupted sections with uncorrupted words such as an adjective like excellent, exciting, sad, funny, etc. By using the model to predict and fill in regular words or uncorrupted sections of text relevant to the context of the paragraph or sentence that correspond to previous corrupted sections of text, REZA teaches using the machine-learned model to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples.
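The label-population step REZA describes, where the <mask> blank in a prompting function is filled with each candidate label to build training examples, can be sketched as follows. The function name is an illustrative stand-in, not REZA's actual implementation:

```python
# Illustrative sketch of populating the <mask> blank in a prompting
# function with each candidate label to create a set of training examples.
def populate_mask(prompting_function, labels):
    """Return one filled-in example per candidate label."""
    return [prompting_function.replace("<mask>", label) for label in labels]

prompting_function = "I went to the movies yesterday. No reason to watch. It was <mask>."
labels = ["terrible", "excellent", "funny"]
for example in populate_mask(prompting_function, labels):
    print(example)
```

Each filled-in string is the uncorrupted counterpart of the masked (corrupted) prompting function, which is the correspondence the examiner maps to claim 9.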
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to modify the base reference of REYNOLDS to incorporate the teachings of REZA because both references teach methods of improving prompting in machine learning models.
One of ordinary skill in the art would be motivated to do so because "these approaches provide an enormous benefit for custom models where developing the custom models with the prompting framework can provide near state of the art performance with as low as sixteen data points. Advantageously, this is close to a 100× reduction in training time." (REZA, paragraph [0028]), “suggesting prompt-based learning enables a significant leap in training efficiency” (REZA, paragraph [0003]).
Claim 18:
Regarding claim 18, REYNOLDS in view of REZA teaches the limitations in claim 17.
Referring to claim 18, the claim recites similar limitations as corresponding claim 8 and is rejected for similar reasons as claim 8 using similar teachings and rationale.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over REYNOLDS, in view of REZA, and further in view of RAFFEL C. et al., “Exploring the limits of transfer learning with a unified text-to-text transformer”, published on July 28, 2020, available at https://arxiv.org/pdf/1910.10683v3, and provided in the October 6, 2023 IDS (hereafter, RAFFEL).
Claim 10:
Regarding claim 10, REYNOLDS in view of REZA teaches the limitations in claim 7.
However, REYNOLDS in view of REZA fails to teach “the computer-implemented method of claim 7, wherein the configuration parameters comprise two or more different parameters of: a subportion length parameter, a subportion quantity parameter, or a corruption rate parameter,”
In an analogous art, RAFFEL teaches “the computer-implemented method of claim 7, wherein the configuration parameters comprise two or more different parameters of: a subportion length parameter, a subportion quantity parameter, or a corruption rate parameter,”
See RAFFEL on page 23, section 3.3.4, where RAFFEL mentions "when multiple consecutive tokens have been corrupted, they are treated as a “span” and a single unique mask token is used to replace the entire span. Replacing entire spans with a single token results in unlabeled text data being processed into shorter sequences. Since we are using an i.i.d. corruption strategy, it is not always the case that a significant number of corrupted tokens appear consecutively. As a result, we might obtain additional speedup by specifically corrupting spans of tokens rather than corrupting individual tokens in an i.i.d. manner. Corrupting spans was also previously considered as a pre-training objective for BERT, where it was found to improve performance (Joshi et al., 2019). To test this idea, we consider an objective that specifically corrupts contiguous, randomly-spaced spans of tokens. This objective can be parametrized by the proportion of tokens to be corrupted and the total number of corrupted spans. The span lengths are then chosen randomly to satisfy these specified parameters. For example, if we are processing a sequence of 500 tokens and we have specified that 15% of tokens should be corrupted and that there should be 25 total spans, then the total number of corrupted tokens would be 500×0.15 = 75 and the average span length would be 75/25 = 3. Note that given the original sequence length and corruption rate, we can equivalently parametrize this objective either by the average span length or the total number of spans." Here, RAFFEL teaches configuration parameters including the proportion of tokens to be corrupted (i.e. a corruption rate parameter), the average span length (i.e. a subportion length parameter), and the total number of corrupted spans (i.e. a subportion quantity parameter).
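RAFFEL’s parametrization is simple arithmetic, sketched below (illustrative only; the function name is an assumption, not RAFFEL’s code):

```python
# Illustrative sketch of the span-corruption parametrization RAFFEL
# describes: given a sequence length, a corruption rate, and a total
# number of spans, the remaining quantities follow arithmetically.
def span_corruption_params(seq_len: int, corruption_rate: float, num_spans: int):
    """Derive the corrupted-token count and average span length."""
    corrupted_tokens = round(seq_len * corruption_rate)  # 500 * 0.15 = 75
    avg_span_length = corrupted_tokens / num_spans       # 75 / 25 = 3
    return corrupted_tokens, avg_span_length

print(span_corruption_params(500, 0.15, 25))  # prints (75, 3.0)
```

As RAFFEL notes, the objective can equivalently be parametrized by the average span length or by the total number of spans, since either determines the other once the sequence length and corruption rate are fixed.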
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to combine the references of REYNOLDS and REZA with the teachings of RAFFEL because all references teach methods of improving prompting in machine learning models.
One of ordinary skill in the art would be motivated to do so because the method “produced marginally better performance (Table 7) while being slightly more computationally efficient due to shorter target sequence lengths” (RAFFEL, page 36, section 3.7, ‘Putting it all together, objective’).
Claims 11, 12, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over REYNOLDS, in view of REZA, and further in view of Mim F. et al., “Corruption Is Not All Bad: Incorporating Discourse Structure Into Pre-Training via Corruption for Essay Scoring”, available at https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9451631, published on June 10, 2021 (hereafter, MIM).
Claim 11:
Regarding claim 11, REYNOLDS in view of REZA teaches the limitations in claim 7.
However, REYNOLDS in view of REZA fails to teach the following limitations:
“the computer-implemented method of claim 7, wherein the plurality of different combinations of configuration parameters comprise: a distributed configuration configured for generating a plurality of corrupted subportions distributed over a training example,”
“and a sequential configuration configured for generating a corrupted subportion corresponding to a terminus of the training example,”
In an analogous system, MIM teaches “the computer-implemented method of claim 7, wherein the plurality of different combinations of configuration parameters comprise: a distributed configuration configured for generating a plurality of corrupted subportions distributed over a training example,”
See MIM’s abstract on page 2202, which mentions "in this paper, we propose an unsupervised pre-training approach to capture discourse structure of essays in terms of coherence and cohesion that does not require any discourse parser or annotation. We introduce several types of token, sentence and paragraph-level corruption techniques for our proposed pre-training approach and augment masked language modeling pre-training with our pre-training method to leverage both contextualized and discourse information." MIM also, on page 2207, illustrates corruption examples used for training models,
[Image: media_image4.png (greyscale)]
and figure 3 shows various types of corruption examples within different essays. Here, MIM introduces different types of corruption at the sentence and paragraph levels, including a distributed configuration for generating corrupted subportions over a training example.
Further, MIM teaches “and a sequential configuration configured for generating a corrupted subportion corresponding to a terminus of the training example,”
See MIM, pages 2202-2203, section I. Introduction, and figure 1, where "an example of the relation between coherence, cohesion, and an essay’s Organization is shown in Fig. 1. The high-scored essay (i.e., Essay (a) with an Organization score of 4) first states its position regarding the prompt and then provides several reasons to strengthen the claim. The essay is considered coherent because it follows a logical order that makes the writer’s position and arguments very clear...Furthermore, Essay (a) has cohesive markers (e.g., “in connection with,” “as a conclusion”) at the beginning of paragraphs which helps the reader understand the flow of ideas throughout the essay. Thus, it is considered as a cohesive essay. However, Essay (c) should have some cohesive markers at the beginning of fifth paragraph (e.g., “moreover,” “besides”) and sixth paragraph (e.g., “therefore,” “hence”) to connect the ideas between paragraphs, but it doesn’t have such cohesive markers. In addition, there is no cohesive marker at the beginning of the last paragraph (e.g., “in conclusion”) to indicate that the author is summing up their opinions which makes the last paragraph slightly disconnected from former paragraphs. Due to the absence of these cohesive markers, it is difficult to understand the arguments of the essay and connections between them. Therefore, Essay (c) is considered as an incohesive essay." Note, the examiner construes the term ‘terminus’ to mean the end of a section of text.
[Image: media_image5.png (greyscale)]
Here, MIM mentions that cohesive markers help make essays comprehensible, while removing these markers renders the essays less coherent. MIM shows altering the ending phrases of an essay to be blank or missing in order to train the model to predict the missing text at the end of an essay as part of the masked language modeling objective. The examiner construes a sequential configuration, per paragraph [0161] of the application’s specification, as a sequence of uncorrupted input followed by a single span of corrupted input. In figure 1, MIM teaches a sequential configuration in essays (b) and (c): the absence of cohesive markers in the last few sections of text in figure 1 correlates with the terminus, or end, of the essay, which relates to a corrupted subportion at the end (i.e., the terminus) of the training example.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the references of REYNOLDS and REZA with the teachings of MIM by using the teachings of REYNOLDS and REZA of methods of improved prompting for machine learning models, with MIM’s teaching of sequential and distributed configurations for generating corrupted sub-portions of training examples.
One of ordinary skill in the art would be motivated to do so because by integrating MIM’s framework into the methods of REYNOLDS and REZA, one of ordinary skill in the art would achieve the goal of providing a method where “results also show that the combination of MLM pre-trained document encoder and paragraph level discourse corruption pre-training is effective for capturing the discourse of essay Organization. The combination of these two can handle both global and local coherence” (MIM, page 2213, section Conclusion).
Claim 12:
Regarding claim 12, REYNOLDS in view of REZA teaches the limitations in claim 7.
However, REYNOLDS in view of REZA fails to teach the following limitations:
“a first distributed configuration configured for generating a first plurality of corrupted subportions distributed over a training example;”
“a second distributed configuration configured for generating a second plurality of corrupted subportions distributed over the training example, wherein the second distributed configuration is configured to cause greater corruption of the training example than the first distributed configuration;”
“and a sequential configuration configured for generating a corrupted subportion corresponding to a terminus of the training example.”
In an analogous art, MIM teaches “a first distributed configuration configured for generating a first plurality of corrupted subportions distributed over a training example,”
See MIM on page 2207, describing “1) Sentence Corruption (SC): This group has 2 different types of corruption. In Complete Sentence Shuffle (C-Sent), all the sentences of a document are shuffled. In Moderate Sentence Shuffle (M-Sent), only a subset of the sentences of a document are shuffled.” Figure 3 shows examples as follows:
[Image: media_image4.png (greyscale)]
MIM on page 2206 mentions “The pre-training is done in two steps. First, the document encoder is pre-trained with large-scale, unlabeled essays from various corpora. Second, the encoder is fine-tuned on the unlabeled essays of the target corpus (essay Organization scoring corpus). We expect that this fine-tuning alleviates the domain mismatch between the large-scale essays and target essays (e.g., essay length).” Here, pages 2206-2207 and figure 3 show a sentence corruption arrangement, which relates to a first distributed configuration for generating corrupted subportions distributed over a training example.
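The two sentence-corruption schemes MIM names can be sketched as follows (an illustrative sketch, not MIM’s code; the function names c_sent and m_sent are hypothetical labels for the C-Sent and M-Sent schemes):

```python
import random

# Illustrative sketch (not MIM's code) of MIM's two sentence-corruption
# schemes: C-Sent shuffles all sentences of a document, while M-Sent
# shuffles only a randomly chosen subset, leaving the rest in place.
def c_sent(sentences, seed=0):
    """Complete Sentence Shuffle: permute every sentence."""
    out = list(sentences)
    random.Random(seed).shuffle(out)
    return out

def m_sent(sentences, subset_size, seed=0):
    """Moderate Sentence Shuffle: permute only `subset_size` sentences."""
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(sentences)), subset_size))
    picked = [sentences[i] for i in idx]
    rng.shuffle(picked)               # shuffle only the chosen subset
    out = list(sentences)
    for i, s in zip(idx, picked):     # write the subset back in place
        out[i] = s
    return out

doc = ["S1", "S2", "S3", "S4", "S5"]
corrupted = m_sent(doc, subset_size=2, seed=1)
```

Either scheme preserves the document’s sentences as a multiset; only their ordering is corrupted.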
Further, MIM teaches “a second distributed configuration configured for generating a second plurality of corrupted subportions distributed over the training example, wherein the second distributed configuration is configured to cause greater corruption of the training example than the first distributed configuration,”
See MIM on page 2207, which describes using a "Discourse Indicator Corruption (DIC): We corrupt DIs since they represent the logical connection between sentences. For example, “Mary did well although she was ill” is logically connected, but “Mary did well but she was ill.” and “Mary did well. She was ill.” lack logical sequencing because of improper and lack of DI usage, respectively. We perform two types of DI corruption. In Complete Discourse Indicator Shuffle (C-DI), we shuffle all the discourse indicators of a document. In Moderate Discourse Indicator Shuffle (M-DI), we first select 50% of unique DIs in a document and randomly shuffle each of their instances in a document." Here, MIM describes the discourse indicator corruption method with examples of shuffling sentence ordering. This produces one configuration of a sentence that is logically connected, as seen in “Mary did well although she was ill,” and a second configuration that lacks logical sequencing within the sentence, which results in greater corruption of the training example than the first configuration.
Further, MIM teaches “and a sequential configuration configured for generating a corrupted subportion corresponding to a terminus of the training example,”
See MIM, pages 2202-2203, section I. Introduction, and figure 1, where "An example of the relation between coherence, cohesion, and an essay’s Organization is shown in Fig. 1. The high-scored essay (i.e., Essay (a) with an Organization score of 4) first states its position regarding the prompt and then provides several reasons to strengthen the claim. The essay is considered coherent because it follows a logical order that makes the writer’s position and arguments very clear...Furthermore, Essay (a) has cohesive markers (e.g., “in connection with,” “as a conclusion”) at the beginning of paragraphs which helps the reader understand the flow of ideas throughout the essay. Thus, it is considered as a cohesive essay. However, Essay (c) should have some cohesive markers at the beginning of fifth paragraph (e.g., “moreover,” “besides”) and sixth paragraph (e.g., “therefore,” “hence”) to connect the ideas between paragraphs, but it doesn’t have such cohesive markers. In addition, there is no cohesive marker at the beginning of the last paragraph (e.g., “in conclusion”) to indicate that the author is summing up their opinions which makes the last paragraph slightly disconnected from former paragraphs. Due to the absence of these cohesive markers, it is difficult to understand the arguments of the essay and connections between them. Therefore, Essay (c) is considered as an incohesive essay." Here, MIM mentions altering the ending phrases of an essay to be blank or missing in order to train the model to predict the missing text at the end of an essay as part of the masked language modeling objective, which relates to a corrupted subportion at the end of the essay (i.e., the terminus) of the training example.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the references of REYNOLDS and REZA with the teachings of MIM by using the teachings of REYNOLDS and REZA of methods of improved prompting for machine learning models, with MIM’s teaching of first and second distributed configurations and sequential configurations for generating corrupted sub-portions of training examples.
One of ordinary skill in the art would be motivated to do so because by integrating MIM’s framework into the methods of REYNOLDS and REZA, one of ordinary skill in the art would achieve the goal of providing a method where “results also show that the combination of MLM pre-trained document encoder and paragraph level discourse corruption pre-training is effective for capturing the discourse of essay Organization. The combination of these two can handle both global and local coherence” (MIM, page 2213, section Conclusion).
Claim 13:
Regarding claim 13, REYNOLDS in view of REZA teaches the limitations in claim 1.
However, REYNOLDS in view of REZA fails to teach “the computer-implemented method of claim 1, wherein at least one of the plurality of diversified objectives comprises a bidirectional masked language modeling objective,”
In an analogous system, MIM teaches “the computer-implemented method of claim 1, wherein at least one of the plurality of diversified objectives comprises a bidirectional masked language modeling objective,”
See MIM on page 2205, section C. Pre-Trained Language Models and Document Representation Learning, where MIM mentions that "lately, transformer-based pre-trained models have achieved significant performance gain in different document-level downstream tasks of NLP. ... fine-tuned BERT [25] for several document classification tasks and demonstrated that knowledge can be distilled from BERT to small bidirectional LSTMs which provides competitive results at a low computational expense." MIM also describes an approach that "presented a strategy to pre-train hierarchical bidirectional transformer encoders for document representation. They randomly masked sentences of documents and predicted those masked sentences with their proposed architecture, a hierarchical fusion of Transformer-based [24] sentence and document encoders." Here, MIM teaches different document-level downstream tasks of NLP (i.e. a plurality of diversified objectives), which include bidirectional transformer models that randomly mask sentences of documents and predict those masked sentences with the proposed architecture (i.e. a bidirectional masked language modeling objective).
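The masked language modeling objective discussed above can be sketched minimally (an illustrative sketch, not MIM’s code; the function name and token-level masking are assumptions for demonstration):

```python
import random

# Illustrative sketch (not MIM's code) of preparing a bidirectional masked
# language modeling example: tokens are masked at random, and the model is
# trained to predict each original token from context on both sides.
def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=1):
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            inputs.append(mask_token)   # hide the token from the model
            targets.append(tok)         # ...but keep it as the label
        else:
            inputs.append(tok)
            targets.append(None)        # no prediction needed here
    return inputs, targets

inputs, targets = mask_tokens("the essay follows a logical order".split())
```

The objective is bidirectional because each masked position is predicted from the unmasked tokens on both its left and its right.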
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the references of REYNOLDS and REZA with the teachings of MIM by using the teachings of REYNOLDS and REZA of methods of improved prompting for machine learning models, with MIM’s teaching of using diversified objectives that comprises a bidirectional masked language modeling objective.
One of ordinary skill in the art would be motivated to do so because by integrating MIM’s framework into the methods of REYNOLDS and REZA, one of ordinary skill in the art would achieve the goal of providing a method where “results also show that the combination of MLM pre-trained document encoder and paragraph level discourse corruption pre-training is effective for capturing the discourse of essay Organization. The combination of these two can handle both global and local coherence” (MIM, page 2213, section Conclusion), and “provides competitive results at a low computational expense” (MIM, page 2205, section C. Pre-Trained Language Models and Document Representation Learning).
Claim 19:
Regarding claim 19, REYNOLDS in view of REZA teaches the limitations in claim 14.
Referring to claim 19, the claim recites similar limitations as corresponding claim 13 and is rejected for similar reasons as claim 13 using similar teachings and rationale.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WENWEI ZENG whose telephone number is (571)272-7111. The examiner can normally be reached Monday-Friday, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached at (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/WenWei Zeng/Examiner, Art Unit 2146
/USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2146