Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/02/2025 has been entered.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yu et al (20220343205) in view of Barkan et al (20210182489).
As per claim 1, Yu et al (20220343205) teaches a system comprising:
a processor; and a computer-readable medium storing instructions that are operative upon execution by the processor to (as processor and computer readable medium storing the executable instructions – para 0074 – 0076):
identify training samples from a dataset via active learning using a teacher model (as teacher version of the machine learning model – para 0015); generate a first group of soft labels for the training samples using a large language machine learning model (LLM – using the teacher model as a larger heavier network – para 0060), the LLM being independent of
training the student model using the training samples, with the first group of soft labels, the student model being configured to output class membership probabilities for input samples (as, using probabilities to compare for results of the language model – para 0086, as applied to training the student model – para 0070, 0068) ;
evaluate a performance metric of the trained student model based on human-annotated ground truth samples (as, conventional systems perform ground-truth labels – para 0064); upon determining that the performance metric is below a threshold level, identify additional training samples from the dataset using the teacher model (and using the human annotated ground truth samples in training the teacher model – para 0064);
generate, using the LLM, a second group of soft labels for the identified additional training samples (as, using a smaller subgroup to re-prompt the model – para 0085); and retrain the student model using at least the training samples and the additional training samples with the second group of soft labels (as using user input – para 0096, and using human supervision to update the teacher version as well as the distillation of the student model – para 0020 – examiner notes that the "distillation process" includes labeling and human input into the training of the student model – see para 0020 – "incrementally finetuning the student version" – and the student version is trained using labels – see para 0015, 0025). Barkan et al (20210182489) teaches a BERT LLM (see para 0053), as well as two teacher models (para 0047) and a student model (see Fig. 3). Therefore, it would have been obvious to one of ordinary skill in the art of distributed language models to modify the model structure of Yu et al (20220343205) to an LLM, multiple teacher models, and a teacher-student model structure, as noted above in Barkan et al (20210182489), because it would advantageously separate the sentence embedding from the sentence similarity score, so that the similarity-score calculation is more efficient/faster (see Barkan et al (20210182489), para 0019, 0073).
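Editor's illustrative note (not part of either reference): the workflow recited in claim 1 – distilling LLM-generated soft labels into a student model, selecting additional training samples by an uncertainty heuristic, and retraining – can be sketched as follows. All names, the toy data, the softmax-regression "student," and the entropy-based selection rule are hypothetical stand-ins chosen for illustration; neither Yu nor Barkan discloses this code.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy 2-D dataset with 2 classes (stand-in for the claimed "dataset").
X = rng.normal(size=(200, 2))
true_w = np.array([[2.0, -1.0], [-2.0, 1.0]]).T  # (2 features, 2 classes)

# Hypothetical "LLM" soft labels: full class-membership probability
# vectors rather than hard 0/1 labels.
soft_labels = softmax(X @ true_w + 0.3 * rng.normal(size=(200, 2)))

def train_student(X, soft_labels, lr=0.5, epochs=200):
    """Distill soft labels into a small softmax-regression 'student'."""
    W = np.zeros((X.shape[1], soft_labels.shape[1]))
    for _ in range(epochs):
        P = softmax(X @ W)  # student class-membership probabilities
        # Gradient of cross-entropy against the soft targets.
        W -= lr * X.T @ (P - soft_labels) / len(X)
    return W

W = train_student(X, soft_labels)

# Active-learning step: pick the samples the current model is least
# certain about (predictive entropy; the references' actual selection
# rule may differ).
P = softmax(X @ W)
uncertainty = -np.sum(P * np.log(P + 1e-12), axis=1)
additional_idx = np.argsort(uncertainty)[-20:]  # 20 hardest samples

# Retrain on the original samples plus the newly selected ones,
# paired with a second group of soft labels.
X2 = np.vstack([X, X[additional_idx]])
S2 = np.vstack([soft_labels, soft_labels[additional_idx]])
W2 = train_student(X2, S2)
```

The point of the sketch is the data flow only: soft (probabilistic) labels drive the student's training loss, and model uncertainty drives which additional samples are labeled and folded back in.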
As per claim 2, the combination of Yu et al (20220343205) in view of Barkan et al (20210182489) teaches the system of claim 1, wherein the instructions are further operative to: cause a user interface (UI) to be displayed on a display device (Yu et al (20220343205), as GUI presenting a visual representation of the data – para 0095),
the UI including a graph comprising data points, each of the data points representing a training sample from the training samples (Yu et al (20220343205), as, showing the change in data –para 0095; wherein the data refers to monitoring of certain data to identify potential bias, errors, and unintended outcomes – para 0094);
receive second user input indicating selection of a first data point; cause to be displayed sample data associated with the first data point; and receive third user input identifying a label for the first data point, thereby causing the first data point to become a human-annotated training sample of the additional training samples used to retrain the student model (Yu et al (20220343205), as, using human supervision to update the teacher version as well as the distillation of the student model – para 0020 – examiner notes that the "distillation process" includes labeling and human input into the training of the student model – see para 0020 – "incrementally finetuning the student version" – and the student version is trained using labels – see para 0015, 0025).
As per claim 3, the combination of Yu et al (20220343205) in view of Barkan et al (20210182489) teaches the system of claim 2, wherein the instructions are further operative to: in response to receiving the second user input indicating selection of the first data point, prompt the LLM to generate a label recommendation for the first data point (Yu et al (20220343205), LLM – using the teacher model as a larger heavier network – para 0060) (Yu et al (20220343205), deriving labels for the training samples – as, the student version generates programmed labels for data snippets – para 0015), wherein causing to be displayed sample data associated with the first data point includes causing the label recommendation to be displayed (Yu et al (20220343205), as GUI presenting a visual representation of the data – para 0095) .
As per claim 4, the combination of Yu et al (20220343205) in view of Barkan et al (20210182489) teaches the system of claim 1, wherein the instructions are further operative to: cause a user interface (UI) to be displayed on a display device, the UI including a graph comprising data points, each of the data points representing a training sample from the training samples (Yu et al (20220343205), as, showing the change in data –para 0095; wherein the data refers to monitoring of certain data to identify potential bias, errors, and unintended outcomes – para 0094);
receive second user input indicating selection of a region of the graph; identify data points occurring within the region; cause the UI to display sample data for each of the data points occurring within the region; and receive additional user input identifying a label for each of the data points (Yu et al (20220343205), as, when the user constantly watches the display – para 0095, monitoring the constant updated display – para 0094; and choosing which sections, based on the display, need user/human input to increase the accuracy of the models – see para 0100 – human intervention, and human updating of the labels on the student models – para 0015).
As per claim 5, the combination of Yu et al (20220343205) in view of Barkan et al (20210182489) teaches the system of claim 1, wherein the instructions are further operative to: perform iterations of student model retraining;
at each of the iterations of student model retraining: compare a current performance metric of a current student model to a previous performance metric of a prior student model, thereby identifying a performance differential (Yu et al (20220343205), as, measuring performance and improving performance by increasing accuracy in the programmed labels compared with pseudo labels – para 0038);
and based on the comparison, add an additional soft labeled training sample to the training samples when the performance differential is above a threshold, and add an additional human-labeled training sample to the training samples when the performance differential is below the threshold (Yu et al (20220343205), as, using human supervision to update the teacher version as well as the distillation of the student model – para 0020 – examiner notes that the "distillation process" includes labeling and human input into the training of the student model – see para 0020 – "incrementally finetuning the student version" – and the student version is trained using labels – see para 0015, 0025).
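Editor's illustrative note (not from either reference): the branching recited in claim 5 can be sketched as a one-line policy – while retraining still yields large gains, cheap soft-labeled samples suffice; once gains flatten, escalate to human-labeled samples. The function name, metric semantics, and threshold value below are hypothetical.

```python
def select_label_source(current_metric, previous_metric, threshold=0.01):
    """Pick the label source for the next batch of training samples.

    If the performance differential between the current and prior
    student model is above the threshold, keep adding soft-labeled
    samples; otherwise fall back to human-labeled samples.
    """
    differential = current_metric - previous_metric
    return "soft_labeled" if differential > threshold else "human_labeled"
```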
As per claim 6, the combination of Yu et al (20220343205) in view of Barkan et al (20210182489) teaches the system of claim 1, wherein the instructions are further operative to: determine, using the student model, a class membership probability for a first sample belonging to a first class (Yu et al (20220343205), as, using probabilities to compare for results of the language model – para 0086, as applied to training the student model – para 0070, 0068; wherein the classification can be based on classes – para 0085); and assign the first class as a soft label to the first sample when the class membership probability is above a threshold (Yu et al (20220343205), as labeling, and additional labeling, according to threshold confidence levels – para 0040).
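Editor's illustrative note (not from either reference): claim 6's thresholded soft-label assignment amounts to accepting the model's top class only when its probability clears a confidence level. The function, class names, and 0.9 threshold below are hypothetical stand-ins.

```python
def assign_soft_label(class_probs, classes, threshold=0.9):
    """Assign the highest-probability class as a soft label only when
    the class-membership probability is above the threshold; otherwise
    return None (e.g., to route the sample to human review)."""
    best = max(range(len(class_probs)), key=lambda i: class_probs[i])
    return classes[best] if class_probs[best] >= threshold else None
```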
As per claim 7, the combination of Yu et al (20220343205) in view of Barkan et al (20210182489) teaches the system of claim 1, wherein the LLM generates semantic embeddings for the additional training samples, and the student model is retrained using at least the additional training samples and the generated semantic embeddings (see Yu et al (20220343205), as using language models – para 0088, 0089, and para 0041 showing similarity in content features to the label; Barkan et al (20210182489) teaching deep contextual word processing – para 0021, and common context meanings – para 0043).
Claims 8-14 are method claims whose steps are performed by the system of claims 1-7 above and, as such, claims 8-14 are similar in scope and content to claims 1-7 above; therefore, claims 8-14 are rejected under similar rationale as presented against claims 1-7 above. Furthermore, as to claims 9 and 10, Examiner notes that the list of categories is recited in the alternative ("or"), and Yu et al (20220343205) teaches operating on image information (see para 0099).
Claims 15-20 are device claims containing steps that are performed by the systems of claims 1-7 above and, as such, claims 15-20 are similar in scope and content to claims 1-7 above; therefore, claims 15-20 are rejected under similar rationale as presented against claims 1-7 above. Furthermore, Yu et al (20220343205) teaches a processor/memory performing stored steps (para 0074).
Response to Arguments
Applicant's arguments filed 2/2/2026 have been fully considered but are not persuasive. In general, the Yu et al reference teaches the claimed soft labels, with the Barkan et al reference teaching the separation of the LLM, multiple teacher models, and a student model; Examiner notes that the purpose of the Barkan et al (20210182489) reference is to address the claim amendments directed to the distribution of language models and the relationships between the LLM, teacher, and student models. As to applicant's arguments at the bottom of p. 8 of the response, that "the Office interprets the teacher model as the LLM", examiner strongly disagrees and notes that the Yu et al reference teaches an LLM generating soft labels; however, Yu et al teaches a student/teacher model structure, and the Barkan et al reference is introduced to teach the concept of separate LLM, teacher, and student models (whereas the Yu et al reference shows a teacher/student model relationship alone). The Barkan et al reference is used to teach a modification of replacing a teacher model with an LLM and two teacher models interfacing with a student model, for the benefit of faster scoring (as noted in the Office Action rejection above). Clearly, a prima facie case has been established by 1) pointing to the teachings of the primary reference, 2) showing the shortfalls of the primary reference, 3) pointing to a secondary reference to teach the shortcomings of the primary reference, and 4) providing a motivation to combine the references (under the well-known TSM technique – see MPEP 2141). As to applicant's arguments on p. 9 of the response, examiner disagrees and argues that the fundamental process of Yu et al is not altered, in the sense that the modified Yu et al reference (via the teachings of Barkan et al) still generates/trains a student model, with a modified structure of an LLM and two teacher models, for the benefit of faster scoring.
In other words, the additional model structure of Barkan et al allows for more detailed labeling/scoring, in parallel, so as to score faster and more accurately. Regarding the arguments on p. 10 of the response, directed to "using at least the training samples and the additional training samples with the second group of soft labels", examiner notes that applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references. Applicant's arguments repeat the claim limitations and repeat the rationale presented by the examiner, but fail to provide any reasoned differentiation between the elements, other than "is patentably distinct from using a smaller subgroup to re-prompt the model. Further, words like 'subgoup' and 're-prompt' are not even present in Yu. Applicant respectfully requests the Office to quote from the references rather than make inferences in hindsight"; applicant fails to present a qualified rebuttal to the detailed explanation as provided by the examiner. Examiner has provided an explanation of the analysis using terms that are quite familiar to one of ordinary skill in the art of training/retraining/updating language models (synonyms are not precluded in providing a detailed explanation of the applied prior art; "Under no circumstance should an examiner accept as persuasive a bare statement or opinion that the element shown in the prior art is not an equivalent embraced by the claim limitation" – see MPEP). On pp. 10-11 of the response, applicant's arguments omit the statement in para 0047 of Barkan et al that "…two teacher models T and R are used…". Furthermore, Barkan emphasizes that the teacher models do not have to be a BERT-Large model – hence, T and R can be a BERT-Large model or a transformer language model teacher.
Lastly, as to soft labeling, examiner notes that Fukuda et al (20220414448) teaches a separate teacher model, student model, and language model (see paragraphs 0048, 0017), with soft labeling (para 0003, 0027).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Please see related art listed on the PTO-892 form.
Furthermore, the following references were found:
Fukuda et al (20220414448) teaches soft labeling by the teacher model (see para 0003, 0027).
Bui et al (20220114476) teaches pseudo labeling in addition to an initial round of labeling (para 0020).
Kim et al (20230143721) teaches natural language processing (NLP) with few-shot processing (para 0021), with label sequencing (para 0030-0032), differentiating between teacher and student models (para 0039).
Luong et al (20220383206) teaches pretrained language models, improving accuracy with few-shot benchmarks (para 0034), using ground truth labels (para 0043), operating on teacher/student models (para 0056-0060).
Balasubramanian et al (20230368786) teaches student/teacher models operating on speech (para 0032), with displays to monitor/modify the models (para 0036), using modified labeling (para 0061).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/Michael N Opsasnick/Primary Examiner, Art Unit 2658 02/17/2026