DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/14/2026 has been entered.
Response to Arguments
Applicant’s arguments with respect to claims 1, 2, 4-9, 11-16 and 18-20 have been considered but are moot in view of the new ground of rejection over Lester, Sewak, and Agarwal.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 4-9, 11-16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lester (US PG Pub 20230325725) in view of Sewak (US PG Pub 20220414137) and further in view of Agarwal (US PG Pub 20210097339).
As per claims 1, 8 and 15, Lester discloses:
A method, a non-transitory machine-readable medium comprising a plurality of machine-readable instructions, and a system, comprising: a non-transitory memory (Lester; p. 0081 - The model trainer 160 includes computer logic utilized to provide desired functionality. The model trainer 160 can be implemented in hardware, firmware, and/or software controlling a general purpose processor. For example, in some implementations, the model trainer 160 includes program files stored on a storage device, loaded into a memory and executed by one or more processors. In other implementations, the model trainer 160 includes one or more sets of computer-executable instructions that are stored in a tangible computer-readable storage medium such as RAM, hard disk, or optical or magnetic media); and one or more hardware processors coupled to the non-transitory memory and configured to read instructions from the non-transitory memory (Lester; p. 0081) to cause the system to perform a method comprising: receiving, via a data interface, an input text sequence (Lester; p. 0085 – input text; see also p. 0108); generating, by the label modular prompt generator, a plurality of label prompts based on the set of class labels of interest, wherein the generating the plurality of label prompts includes: concatenating a first class label and a first sequence of soft tokens that are generated based on representations associated with the first class label into a first label prompt (Lester; p. 0136 - Given a series of n tokens, {x_0, x_1, . . . , x_n}, the T5 model can embed the tokens, forming a matrix X_e ∈ ℝ^(n×e), where e is the dimension of the embedding space. The soft-prompts can be represented as a parameter P_e ∈ ℝ^(p×e), where p is the length of the prompt. In some implementations, the prompt can then be concatenated to the embedded input, forming a single matrix [P_e; X_e] ∈ ℝ^((p+n)×e), which then flows through the encoder-decoder as normal. The models can be trained to maximize the probability of Y, but only the prompt parameters P_e may be updated; see also p. 0135), wherein the soft tokens are tunable using the plurality of parameters of the label modular prompt generator (Lester; p. 0252-0256 - prompt tuning multi-task training… combining a general prompt and task specific prompt… Moreover, the combination can include a shared prompt that is the same value for each task and a set of N task prompts, one for each task…); providing, by the label modular prompt generator, an input of the input text sequence prepended with the plurality of label prompts to a pretrained language model (Lester; p. 0135 - Prompting can be the approach of adding extra information for the model to condition on during generation of Y. Normally, prompting can be completed by prepending a series of tokens, P, to the input X, such that the model maximizes the likelihood of the correct Y, p_θ(Y|[P; X]), while keeping the model parameters, θ, fixed; see also p. 0154-0155 - Prompt tuning can use a single prompt representation that is prepended to the embedded input); generating, by the pretrained language model, a task output in response to the input text sequence (Lester; p. 0108 - …the prompt ensembling 500 can include a pre-trained machine-learned model 510 that is operable to process the input data 502 and the prompts 504, 506, & 508 to generate the output data 512, 514, & 516; see also p. 0115 - The particular task can be a natural language processing task); and performing a first training process to the label modular prompt generator while the pretrained language model is frozen (Lester; p. 0116 - The plurality of pre-trained parameters for the pre-trained machine-learned model can be fixed during prompt tuning (e.g., the pre-trained machine-learned model can be frozen such that the parameters are not adjusted during training of the prompt parameters); see also p. 0102 & p. 0105 - prompt tuning 304 can retain the strong task performance of model tuning 302, while keeping the pre-trained model frozen, enabling efficient multitask serving).
Lester does not explicitly disclose determining a set of class labels of interest from a set of possible class labels. However, Sewak does teach determining, by a label modular prompt generator having a plurality of parameters, a set of class labels of interest from a set of possible class labels associated with the input text sequence (Sewak; p. 0061 - The disclosure describes systems and methods to train text classification models without the need of representative labelled data or human grader's assistance to create representations that could otherwise be used directly or indirectly to create representative data conducive to train a Natural Language Processing (NLP) or a text classification model that could map/classify a candidate input text across one or more classes (class-labels) of interest). Therefore, it would have been obvious to one of ordinary skill in the art to modify the method, non-transitory machine-readable medium, and system of Lester to include determining, by a label modular prompt generator having a plurality of parameters, a set of class labels of interest from a set of possible class labels associated with the input text sequence, as taught by Sewak, in order to provide good performance while, for example, making use of a model trained primarily in an unsupervised mode, without requiring a large number of manual user-input examples of a label class. Because the input and computer training requirements of the labelling service are far less resource intensive than typical, the computerized system provides a technical improvement of requiring less computer processing to render a result (Sewak; p. 0007).
Furthermore, Lester in view of Sewak fails to disclose wherein for a first instance of the first training dataset, a first plurality of class labels of interest are randomly sampled from the set of possible class labels and used in a first forward pass for the first instance, wherein for a second instance of the first training dataset, a second plurality of class labels of interest are randomly sampled from the set of possible class labels and used in a second forward pass for the second instance, wherein the first plurality of class labels of interest and the second plurality of class labels of interest are different. Agarwal does teach wherein for a first instance of the first training dataset, a first plurality of class labels of interest are randomly sampled from the set of possible class labels and used in a first forward pass for the first instance, wherein for a second instance of the first training dataset, a second plurality of class labels of interest are randomly sampled from the set of possible class labels and used in a second forward pass for the second instance, wherein the first plurality of class labels of interest and the second plurality of class labels of interest are different (Agarwal; p. 0046 - In another example, the PU learning technique creates a training dataset (first training dataset) that includes all labels 220 and a random sample of unlabeled edges 218 and/or node pairs 212 (first plurality of class labels of interest… randomly sampled). The PU learning technique trains machine learning model 208 using the training dataset and subsequently applies the trained machine learning model 208 to remaining unlabeled edges 218 and/or node pairs 212 to generate predictions 226 of confidence in the relationship for the remaining samples (first forward pass). The PU learning technique repeats the process of training machine learning model 208 using samples with positive labels 220 and randomly selected samples assigned negative labels (second plurality of class labels different from the first plurality of class labels), followed by using the trained machine learning model 208 to generate predictions 226 for the remaining samples (second forward pass). After a certain number of predictions 226 are generated by multiple versions of machine learning model 208 for a given unlabeled sample, an edge score for the sample is obtained as the average of all predictions 226 produced for the sample by the versions). Therefore, it would have been obvious to one of ordinary skill in the art to modify the method, non-transitory machine-readable medium, and system of Lester and Sewak to include wherein for a first instance of the first training dataset, a first plurality of class labels of interest are randomly sampled from the set of possible class labels and used in a first forward pass for the first instance, wherein for a second instance of the first training dataset, a second plurality of class labels of interest are randomly sampled from the set of possible class labels and used in a second forward pass for the second instance, wherein the first plurality of class labels of interest and the second plurality of class labels of interest are different, as taught by Agarwal, because by propagating labels 220 for relationships (or other attributes) across edges 218 in a network based on machine learning predictions of confidences in the relationships and the structure of the network, the disclosed embodiments combine attributes of edges 218 and topological features of the network to infer likelihoods of the relationships for non-labeled edges 218 in the network. The inferred likelihoods are thus more accurate than conventional machine learning techniques that do not account for topological features, network effect, and/or homophily among users or entities within a given “neighborhood” in the network (e.g., entities that are within a certain number of hops from one another in the network) (Agarwal; p. 0062).
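For illustration only, the combined teaching mapped above can be sketched in code: per-label soft-token prompts are concatenated to the embedded input in the manner of Lester's [P_e; X_e] concatenation (p. 0136), and each training instance's forward pass uses a different randomly sampled subset of the possible class labels, as in Agarwal (p. 0046). All names, dimensions, and labels below are hypothetical; none of the cited references publish code.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8        # e: embedding dimension (illustrative)
PROMPT_LEN = 2       # p: soft tokens per class label (illustrative)
ALL_LABELS = ["sports", "politics", "science", "health", "finance"]

# Tunable soft-token parameters: one (PROMPT_LEN x EMBED_DIM) block per
# possible class label, playing the role of Lester's P_e.
label_prompts = {lbl: rng.normal(size=(PROMPT_LEN, EMBED_DIM)) for lbl in ALL_LABELS}

def embed(tokens):
    """Stand-in for the frozen model's token embedding (fixed lookup table)."""
    table = np.random.default_rng(42).normal(size=(1000, EMBED_DIM))
    return table[[hash(t) % 1000 for t in tokens]]

def build_input(tokens, labels_of_interest):
    """Prepend one soft-token block per label of interest to the embedded
    input, forming [P_e; X_e] with shape ((k*p + n) x e)."""
    X_e = embed(tokens)                                          # (n x e)
    P_e = np.concatenate([label_prompts[l] for l in labels_of_interest])
    return np.concatenate([P_e, X_e])                            # rows stack

# Two training instances, each with its own randomly sampled label subset.
tokens = ["the", "match", "ended", "in", "a", "draw"]
subset_1 = list(rng.choice(ALL_LABELS, size=3, replace=False))
subset_2 = list(rng.choice(ALL_LABELS, size=3, replace=False))
inp_1 = build_input(tokens, subset_1)
inp_2 = build_input(tokens, subset_2)
```

In a full system, only the `label_prompts` values would be updated during training; the embedding table and any downstream network stand in for the frozen pretrained parameters.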
As per claims 2, 9 and 16, Lester in view of Sewak and Agarwal disclose: The method, non-transitory machine-readable medium, and system of claims 1, 8 and 15, further comprising: performing an inference process using the label modular prompt generator and the pretrained language model (Lester; p. 0103 - Prompt tuning 204 may only rely on storing a small task-specific prompt for each task and may enable mixed-task inference using the original pre-trained model). Further, Sewak discloses wherein the set of class labels of interest includes the set of possible class labels (Sewak; p. 0061 - The disclosure describes systems and methods to train text classification models without the need of representative labelled data or human grader's assistance to create representations that could otherwise be used directly or indirectly to create representative data conducive to train a Natural Language Processing (NLP) or a text classification model that could map/classify a candidate input text across one or more classes (class-labels) of interest). Therefore, it would have been obvious to one of ordinary skill in the art to modify the method, non-transitory machine-readable medium, and system of Lester to include wherein the set of class labels of interest includes the set of possible class labels, as taught by Sewak, in order to provide good performance while, for example, making use of a model trained primarily in an unsupervised mode, without requiring a large number of manual user-input examples of a label class. Because the input and computer training requirements of the labelling service are far less resource intensive than typical, the computerized system provides a technical improvement of requiring less computer processing to render a result (Sewak; p. 0007).
As per claims 4, 11 and 18, Lester in view of Sewak and Agarwal disclose: The method, non-transitory machine-readable medium, and system of claims 1, 8 and 15, wherein the performing the first training process includes: computing a loss objective based on the task output and the set of class labels of interest (Lester; p. 0117 - In some implementations, the prompt gradient can be determined by evaluating a loss function that is evaluated based on a difference between the training output and the one or more training labels. The loss function can include a perceptual loss or another loss function. In some implementations, the labels can include ground truth outputs for the respective training examples; see also p. 0077); and updating the plurality of parameters of the label modular prompt generator based on the computed loss objective via backpropagation (Lester; p. 0116 - The plurality of pre-trained parameters for the pre-trained machine-learned model can be fixed during prompt tuning (e.g., the pre-trained machine-learned model can be frozen such that the parameters are not adjusted during training of the prompt parameters); see also p. 0102 & p. 0105 - prompt tuning 304 can retain the strong task performance of model tuning 302, while keeping the pre-trained model frozen, enabling efficient multitask serving).
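The training process mapped above (a loss computed on the task output, with backpropagation updating only the prompt generator's parameters while the pretrained model stays frozen) can be sketched with a toy example. The fixed linear map below is a hypothetical stand-in for the frozen pretrained model and is not drawn from any cited reference.

```python
import numpy as np

rng = np.random.default_rng(1)

W = rng.normal(size=(3, 4))     # frozen "pretrained model" parameters
W0 = W.copy()                   # kept only to verify W is never updated
x = rng.normal(size=4)          # embedded input (one token, dim 4)
p = np.zeros(4)                 # tunable soft-prompt parameters
y = np.array([1.0, 0.0, 0.0])   # target encoding of the class label

def forward(p):
    # The model conditions on the prompt by consuming p and x jointly
    # (here, their mean, purely for simplicity).
    return W @ ((p + x) / 2.0)

def loss(p):
    d = forward(p) - y
    return float(d @ d)

before = loss(p)
for _ in range(200):
    # Gradient of ||W(p + x)/2 - y||^2 with respect to p only;
    # the frozen parameters W receive no update.
    grad_p = W.T @ (forward(p) - y)
    p -= 0.05 * grad_p
after = loss(p)
```

The loss decreases while the model weights remain bit-for-bit identical, mirroring the claimed division between the tunable prompt generator and the frozen pretrained language model.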
As per claims 5, 12 and 19, Lester in view of Sewak and Agarwal disclose: The method, non-transitory machine-readable medium, and system of claims 4, 11 and 18, wherein the loss objective includes a subset invariant loss that accommodates the set of class labels of interest that varies during training (Lester; p. 0229 - In some implementations, meta-prompt creation may involve joint training. In some implementations, the systems and methods for meta-prompt tuning can involve inputting a meta-prompt and one of the aggregated datasets into the model. A single meta-prompt variable may be initialized. The variable can be unique, dataset-independent (subset invariant), and may be updated as we train. The prompt variable and one of several aggregated datasets may be fed into our prompt generation model. The model can either be shared or disjointed from the pretrained frozen model that may take the output of this model (e.g., the generated prompt) as input. The model and the frozen model may be initialized to match in order to have a shared language understanding; see also p. 0231 - A loss can then be calculated, and the error can be backpropagated all the way back to the meta-prompt producing a gradient. The labels from the example batch are used to calculate a loss and do backpropagation through both networks, all the way back to the meta-prompt).
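Neither the claim language nor the cited passage gives a formula for the subset invariant loss. One plausible reading, sketched below purely for illustration, is a cross-entropy normalized only over the labels sampled for the current forward pass, so the objective remains well defined however the label subset varies.

```python
import numpy as np

def subset_loss(scores, labels_of_interest, gold_label):
    """Cross-entropy restricted to the labels sampled for this forward pass.
    `scores` maps every possible label to a model score; only the sampled
    subset enters the softmax, so the loss is defined for any subset."""
    z = np.array([scores[l] for l in labels_of_interest])
    z = z - z.max()                          # stabilize the softmax
    probs = np.exp(z) / np.exp(z).sum()
    return -float(np.log(probs[labels_of_interest.index(gold_label)]))

# Hypothetical per-label scores from the language model's output.
scores = {"sports": 2.0, "politics": 0.5, "science": -1.0, "health": 0.0}
loss_a = subset_loss(scores, ["sports", "politics"], "sports")
loss_b = subset_loss(scores, ["sports", "politics", "science"], "sports")
```

Adding more competing labels to the sampled subset raises the loss for the same gold label, but the objective itself needs no change as the subset varies between training instances.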
As per claims 6, 13 and 20, Lester in view of Sewak and Agarwal disclose: The method, non-transitory machine-readable medium, and system of claims 1, 8 and 15, wherein the first training process uses a first training dataset associated with a first plurality of class labels (Lester; p. 0229), further comprising: after the first training process, performing a second training process using a second training dataset associated with a second plurality of class labels, wherein representations for the second plurality of class labels are initialized using representations for the first plurality of class labels (Lester; p. 0232 - The meta-prompt can then be updated based on its gradient, and the process can be repeated again. Each iteration can use a different dataset, a different batch of examples, a different result of any sampling aggregation, etc.).
As per claims 7 and 14, Lester in view of Sewak and Agarwal disclose: The method and non-transitory machine-readable medium of claims 6 and 13, wherein the set of possible class labels of the second training process includes the first plurality of class labels and the second plurality of class labels (Lester; p. 0234 - The generated meta-prompt can then be utilized for model inference and prompt generation. For example, a few-shot dataset can be input as multiple (example, label) pairs into a model with a meta-prompt in order to generate a prompt, which can be used to solve a task defined by the (example, label) pairs).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. This art includes:
Vu (US PG Pub 20230115321) discloses techniques for customizing or fine-tuning a pre-trained version of a machine-learning model that includes multiple layers and is configured to process audio or textual language input. Each of the multiple layers is configured with a plurality of layer-specific pre-trained parameter values corresponding to a plurality of parameters, and each of the multiple layers is configured to implement multi-head attention. An incomplete subset of the multiple layers is identified for which corresponding layer-specific pre-trained parameter values are to be fine-tuned using a client data set. The machine-learning model is fine-tuned using the client data set to generate an updated version of the machine-learning model, where the layer-specific pre-trained parameter values configured for each layer of one of more of the multiple layers not included in the incomplete subset are frozen during the fine-tuning. Use of the updated version of the machine-learning model is facilitated (Vu; Abstract).
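Vu's scheme of fine-tuning only an identified incomplete subset of layers while freezing the rest can be illustrated schematically. The layer names, sizes, and update rule below are hypothetical and do not come from Vu's disclosure.

```python
import numpy as np

# Parameters grouped per layer; only an identified "incomplete subset"
# of layers is trainable, and the remaining layers stay frozen.
layers = {f"layer_{i}": np.ones(4) for i in range(4)}
trainable = {"layer_2", "layer_3"}          # the subset to fine-tune

def apply_updates(layers, grads, lr=0.1):
    """One gradient step that touches only the trainable subset."""
    updated = {}
    for name, w in layers.items():
        if name in trainable:
            updated[name] = w - lr * grads[name]   # fine-tuned on client data
        else:
            updated[name] = w                      # frozen pre-trained values
    return updated

grads = {name: np.full(4, 2.0) for name in layers}
new_layers = apply_updates(layers, grads)
```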
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rodrigo A Chavez whose telephone number is (571)270-0139. The examiner can normally be reached Monday - Friday 9-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RODRIGO A CHAVEZ/Examiner, Art Unit 2658
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658