Prosecution Insights
Last updated: April 19, 2026
Application No. 19/271,920

Method and System for Multi-Level Artificial Intelligence Supercomputer Design Featuring Sequencing of Large Language Models

Non-Final OA: §112 & Double Patenting
Filed: Jul 17, 2025
Examiner: YEN, ERIC L
Art Unit: 2658
Tech Center: 2600 — Communications
Assignee: Vijay Madisetti
OA Round: 1 (Non-Final)
Grant Probability: 85% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 8m
Grant Probability with Interview: 97%

Examiner Intelligence

Career Allow Rate: 85% (above average): 650 granted / 765 resolved, +23.0% vs TC avg
Interview Lift: +11.7% (moderate), comparing allowance outcomes for resolved cases with vs. without an interview
Avg Prosecution: 2y 8m (typical timeline); 11 applications currently pending
Total Applications: 776 (career history, across all art units)
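The headline figures above are simple ratios; a short sketch (assuming the dashboard computes the allowance rate as granted/resolved and reports deltas against an estimated Tech Center average — an assumption about the derivation, not documented by the source) reproduces them:

```python
# Sanity check of the dashboard arithmetic (assumed derivation:
# allow rate = granted / resolved; delta = examiner rate - TC average).
granted, resolved = 650, 765
allow_rate = 100 * granted / resolved
print(round(allow_rate, 1))        # 85.0 -> the 85% career allow rate

# The stated +23.0% delta implies an estimated Tech Center average:
tc_average = allow_rate - 23.0
print(round(tc_average, 1))        # 62.0
```

The same subtraction applied to the statute-specific deltas below recovers the implied per-statute Tech Center averages.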

Statute-Specific Performance

§101: 18.1% (-21.9% vs TC avg)
§103: 29.8% (-10.2% vs TC avg)
§102: 3.5% (-36.5% vs TC avg)
§112: 35.1% (-4.9% vs TC avg)
Based on career data from 765 resolved cases; Tech Center averages are estimates.

Office Action

§112 & Double Patenting
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation

As per Claim 3: “each h-LLM” in line 2 of claim 3 is interpreted as referring to “each h-LLM of the plurality of h-LLMs”.

As per Claim 15: “the creation process” in line 2 of claim 15 is interpreted as referring to the process involved in “creating the specialized h-LLM” in line 1 of claim 15.

Claim Objections

Claim 5 is not formally objected to, but Applicant can, at Applicant’s discretion, amend claim 5 to depend on claim 1 and not on claim 4 (there does not appear to be any particular reason why claim 5 must depend on claim 4).

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 7-11 and 18 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement.
The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

As per Claim 7: The original Specification (i.e. the original Specification of Parent Application 18/348,692, hereafter original Specification, where this application is one of a sequence of continuations and not a continuation-in-part) does not have written description for assigning one or more weights to the original input data based on errors in the first output results (Applicant’s Specification’s description of the “boosting approach” is limited to “Referring now to FIG. 5 is an illustration a boosting approach, that has some similarities to that originally used in the context of machine learning models in a different way (for analytics as opposed to generative AI applications used in this invention) where multiple h-LLMs of increasing precision and accuracy are created in a sequential manner and then merged/fused to create a merged h-LLM, is described in more detail. Boosting is a machine learning technique that involves creating a stronger and more accurate model from a number of weaker models. The original data 400 is used to train an h-LLM 402. The h-LLM 402 is tested and the output 404 is assigned weights to generate weighted data 406. The weighted data 406 is then used to train h-LLM 408. The same process is then repeated and h-LLMs 414 and 420 are generated in a sequence. The h-LLMs 402, 408, 414 and 420 are then combined in a process called merging or fusing 424 to create a merged h-LLM 426”, which does not describe the specifics/criteria of/for how data is weighted [and thus does not describe where weights have higher values when data portions resulted in higher error rates], and also does not describe where weight[s] are assigned to the input data [weights appear to be assigned to h-LLM outputs, not to the input data upon which an h-LLM is trained]).

As per Claim 8: The original Specification does not have written description for applying performance-based weights to each h-LLM in the sequence (the original Specification [see paragraph cited in the written description rejection of claim 7] does not describe the specifics/criteria of/for how data is weighted [and thus does not describe where weights have higher values when data portions resulted in higher error rates]).

As per Claim 10: The original Specification does not have written description for applying performance-based weights to each h-LLM in the sequence (the original Specification [see paragraph cited in the written description rejection of claim 7] describes where outputs of h-LLMs are assigned weights, not where h-LLMs themselves are assigned weights, and not where the weights are performance-based).

As per Claim 11: The original Specification does not have written description for evaluating performance on a validation dataset separate from the original input data (the original Specification does not describe a validation dataset or evaluating performance).

As per Claim 18: The original Specification does not have written description for extract task-specific knowledge from the general purpose h-LLM through at least one of parameter selection and optimization (the original Specification recites “Referring now to FIG. 6 is an illustration of creating a smaller and more specialized h-LLM through extraction/specialization process from a larger h-LLM, is described in more detail. The extraction/specialization process 502 extracts the specific knowledge required for a task from a big, general-purpose model, and creates a smaller h-LLM 506. For example, a specific task can be sentiment analysis of input text, for which a smaller model 506 is more efficient as compared to a large, general-purpose model” which describes extracting task-specific knowledge from the general purpose h-LLM but does not describe parameter selection and optimization as processes used to perform the extracting of the task-specific knowledge).

The dependent claims include the issues of their respective parent claims.

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

As per Claim 1: It is not clear if Applicant meant for “large language model (h-LLM)” in line 1 of claim 1 to establish “h-LLM” as an abbreviation of “large language model” (typically abbreviated LLM, not h-LLM). Applicant’s Specification recites “An h-LLM is a family of models”, which seems to suggest that h-LLM inherently includes plural models, and Applicant has consistently used h-LLM as an abbreviation for a family of LLMs in other applications. Claims 7 and 12 include the same issue as Claim 1 (see line 1 of claim 7 and line 1 of claim 12). As a consequence of the issue discussed in the 112(b) rejection of claim 1, it is also not clear if “h-LLM” in claims 1, 3, 4, 5, 6, 7, 9, 10, 11, 12, 14, 15, 16, and 18 (where claim 18 does not actually define the acronym/abbreviation “h-LLM”) is supposed to refer to one large language model (as suggested/defined by line 1 of claims 1, 7, and 12) or to a family of large language models (consistent with the Specification).

As per Claims 8-11: It seems like Applicant may have meant for claims 8-11 to depend on claim 7, and not on claim 6 (considering claim 7 is the independent claim that mentions a sequence of h-LLMs, a first h-LLM, weighting, and subsequent h-LLMs).

As per Claim 7: “the subsequent outputs results” in the 4th to last line of claim 7 lacks antecedent basis (iteratively testing the subsequent h-LLMs is not specifically recited to generate “subsequent output results”).
“the enhanced h-LLM” in the last line of claim 7 is ambiguous (the 2nd to last line of claim 7 recites “an enhanced h-LLM” but line 1 of claim 7 also recites “an enhanced large language model [h-LLM]” which could be interpreted, by abbreviation, as another “an enhanced h-LLM”).

As per Claim 9: “each subsequent h-LLM” in line 2 of claim 9 is ambiguous (the 5th to last line of claim 7 recites “subsequent h-LLMs” [which are iteratively tested] and the 3rd to last line of claim 7 recites “subsequent h-LLMs” [which are trained] and these two sets of “subsequent h-LLMs” do not necessarily refer to the same subsequent h-LLMs, as claimed). “the previous h-LLM” in lines 2-3 of claim 9 lacks antecedent basis and is unclear (neither recitation of “subsequent h-LLMs” in claim 7 is, as claimed, part of “a/the sequence of h-LLMs” [such that there is not necessarily an inherent “previous h-LLM” for each subsequent h-LLM], and even assuming “each subsequent h-LLM” inherently has a previous h-LLM [due to the word “subsequent”], it is not clear which subsequent h-LLM’s “previous h-LLM” is the one that “the previous h-LLM” in lines 2-3 of claim 9 is supposed to refer to, and also “each subsequent h-LLM” can follow multiple previous h-LLMs such that even if “the previous h-LLM” is interpreted as a respective previous h-LLM of “each subsequent h-LLM”, and when subsequent h-LLMs follow multiple h-LLMs, then it would not be clear which one of multiple respective previous h-LLMs for a particular subsequent h-LLM is the one that “the previous h-LLM” in lines 2-3 of claim 9 is supposed to refer to [Applicant may have intended to refer to the immediately preceding h-LLM]).

As per Claim 12: “the specialized h-LLM” in the 3rd to last line of claim 12 is ambiguous (it can refer to “a specialized h-LLM” in the 5th to last line of claim 12 [probably what Applicant meant to claim] or to “a specialized large language model [h-LLM]” in line 1 of claim 12).

As per Claims 14-15: “the specialized h-LLM” is ambiguous (same issue as in claim 12).

The dependent claims include the issues of their respective parent claims.

Allowable Subject Matter

Claim 12 would be allowable if rewritten or amended to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action.

Claims 4 and 13-17 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter:

As per Claim(s) 1 (and consequently claim[s] 2-6 which depend on claim[s] 1), the prior art of record does not teach or suggest the combination of all limitations in claim(s) 1, including (i.e. in combination with the remaining limitations in claim[s] 1) A method for creating a merged large language model (h-LLM) using a bagging approach comprising: receiving input data at a computer system comprising a processor, non-transitory storage medium, and software stored on the non-transitory storage medium; creating a plurality of data subsets from the input data; training a plurality of h-LLMs, each h-LLM of the plurality of h-LLMs being trained on a respective data subset of the plurality of data subsets; creating a merged h-LLM by merging the plurality of h-LLMs; and outputting the merged h-LLM.

2008/0133434 teaches “Bagging was proposed by Breiman [4], and is based on bootstrapping [7] and aggregating concepts, so it incorporates the benefits of both approaches. Bootstrapping is based on random sampling with replacement. Therefore, taking a bootstrap replicate X=(X1, X2, . . . , Xn) (random selection with replacement) of the training set (X1, X2, . . . , Xn), one can sometimes avoid or get less misleading training objects in the bootstrap training set.
Consequently, a classifier constructed on such a training set may have a better performance. Aggregating actually means combining classifiers. Often a combined classifier gives better results than individual classifiers, because of combining the advantages of the individual classifiers in the final solution. Therefore, bagging might be helpful to build a better classifier on training sample sets with misleaders. In bagging, bootstrapping and aggregating techniques are implemented in the following way: Classification: 1. The same split percentages is used for randomly creating multiple (training and validation) datasets. 1. For each dataset (training and validation), the best model is produced. 2. The models are aggregated by a simple majority rule. The models that produce the majority classification for a molecule are aggregated to produce the bagged model. 1. The same split percentages is used for randomly creating multiple (training and validation) datasets. 2. For each dataset (training and validation), the best model is produced. 3. The models are simply aggregated by averaging the models” (paragraphs 110-116).

2017/0124074 teaches “Ensemble algorithms are models composed of multiple weaker models that are independently trained and whose predictions are combined in some way to make the overall prediction. Determining which models to combine and the ways to combine them may vary depending on the type of information provided to the analytic engine, however popular ensemble algorithms that may be used for determining an appropriate music recommendation may include Boosting, Bootstrapped Aggregation (Bagging), AdaBoost, Stacked Generalization (blending), Gradient Boosting Machines (GBM), Gradient Boosted Regression Trees (GBRT) and Random Forest” (paragraph 44). This reference describes an alternative name for “bagging” which is “bootstrapped aggregation”.
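For orientation, the bagging scheme described in the quotations (bootstrap sampling with replacement, then aggregating the individual predictions by majority vote) can be sketched minimally. The toy threshold classifier below is a hypothetical stand-in for any base learner (or, by loose analogy, an individual h-LLM); nothing here reflects the claimed implementation:

```python
# Minimal sketch of bagging (bootstrap aggregation): bootstrap-sample
# the training set, fit one weak learner per sample, aggregate by
# majority vote. The threshold "learner" is purely illustrative.
import random
from collections import Counter

def bootstrap(data, rng):
    # Bootstrap replicate: random sampling with replacement, same size.
    return [rng.choice(data) for _ in data]

def fit_threshold(sample):
    # "Train" a weak learner: threshold at the midpoint of class means.
    # (Fallback values guard the rare bootstrap that misses a class.)
    zeros = [x for x, y in sample if y == 0] or [0.0]
    ones = [x for x, y in sample if y == 1] or [2.0]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def bagged_predict(models, x):
    # Aggregate the individual predictions by simple majority rule.
    votes = Counter(1 if x > t else 0 for t in models)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
data = [(x / 10, 0) for x in range(10)] + [(1 + x / 10, 1) for x in range(10)]
models = [fit_threshold(bootstrap(data, rng)) for _ in range(11)]
print(bagged_predict(models, 0.2), bagged_predict(models, 1.7))  # 0 1
```

Note the aggregation step combines outputs, not models, which is exactly the distinction the examiner draws from 2018/0308003 and 2022/0269986 below.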
2018/0308003 teaches “The term bagging may refer to a process by which the outputs of machine learning models are combined in order to produce a single output. Typically this takes the form of a weighted average of the outputs. In one example of a bagging scheme for a set of machine learning classifiers, the single output is the majority classification. Effectively the machine learning classifiers perform a majority vote to determine what classification to output” (paragraph 41). This reference describes where bagging does not necessarily refer to model merging/combination but can also refer to combining outputs.

2022/0269986 teaches “BAGGING. Bagging stands for bootstrap aggregation. In order to reduce the variance of an estimate bagging averages together multiple estimates. Bagging uses bootstrap sampling to obtain the data subsets for training the base learners. For aggregating the outputs of base learners, bagging uses voting for classification and averaging for regression” (paragraph 26). This reference describes where bagging is not necessarily combining models, but can also combine outputs.

2023/0351203 teaches “ensemble learning algorithms (such as Boosting and variations (e.g. Adaboost, Xgboost etc.), Bagging, stacked generalization, decision trees etc.)” (paragraph 67). This reference describes bagging and boosting as examples of “ensemble learning”.

2024/0420491 (provisional 63/508,650 does NOT appear to support the cited paragraph) teaches “Multiple parallel sessions may be used to combine the capabilities of multiple LLMs to optimize for the user's experience rather than the LLMs own considerations. In other words, the user experiences fast responses for simple questions, while also benefitting from in-depth answers where necessary. Selection may be based on a variety of criteria e.g., response time, response length, response quality, etc. As but one such example, multiple queries may be launched to models of different complexity; while a simple model can answer more quickly, the complex model may answer more accurately. Here, the first response that sufficiently answers the query is used. As another such example, multiple queries may be launched to LLMs with access to different libraries of information. The most comprehensive response (that is not a hallucination) may be used” (paragraph 300). This reference does not appear to merge LLM families and also does not appear to qualify as prior art.

As per Claim(s) 7 (and consequently claim[s] 8-11 which depend on claim[s] 7), the prior art of record does not teach or suggest the combination of all limitations in claim(s) 7, including (i.e. in combination with the remaining limitations in claim[s] 7) A method for creating an enhanced large language model (h-LLM) using a boosting approach comprising: training a first h-LLM using original input data; testing the first h-LLM to generate first output results; generating a first weighted data by assigning one or more weights to the original input data based on errors in the first output results; training a second h-LLM using the first weighted data; generating a sequence of h-LLMs with increasing levels of at least one of precision and accuracy by iteratively testing the second h-LLM and subsequent h-LLMs, generating subsequent weighted data from the subsequent output results, and training subsequent h-LLMs on the subsequent weighted data; merging the sequence of h-LLMs to create an enhanced h-LLM; and outputting the enhanced h-LLM for use in processing language tasks.
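The boosting loop recited in claim 7 (train, test, up-weight the inputs associated with errors, train the next model on the re-weighted data, then merge the sequence) can be illustrated with a minimal numeric sketch. The threshold "learners" here are hypothetical stand-ins for h-LLMs, not anything disclosed in the application:

```python
# Minimal sketch of the claimed boosting flow with toy threshold
# classifiers standing in for h-LLMs (an assumption for illustration).
def fit_stump(data, weights):
    # Train a weak learner: the threshold with minimum weighted error.
    best_t, best_err = None, float("inf")
    for t in sorted({x for x, _ in data}):
        err = sum(w for (x, y), w in zip(data, weights)
                  if (1 if x > t else 0) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def boost(data, rounds=3):
    n = len(data)
    weights = [1.0 / n] * n          # start with uniform input weights
    sequence = []
    for _ in range(rounds):
        t = fit_stump(data, weights)
        sequence.append(t)
        # Up-weight the inputs the current model got wrong (the
        # "errors in the output results"), then renormalize.
        weights = [w * (2.0 if (1 if x > t else 0) != y else 1.0)
                   for (x, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return sequence

def merged_predict(sequence, x):
    # "Merge" the sequence of models by majority vote over its members.
    votes = sum(1 if x > t else 0 for t in sequence)
    return 1 if votes > len(sequence) / 2 else 0

data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
seq = boost(data)
print([merged_predict(seq, x) for x in (0.2, 0.8)])  # [0, 1]
```

As in the bagging sketch, the final "merge" here combines predictions rather than model parameters, mirroring the distinction the examiner notes in the references that follow.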
2017/0262770 teaches “Boosting methods are another approach in which a sequence of models (again, typically decisions trees) are iteratively trained (forward, stagewise training) using a common set of features to classify an instance or predict a value, but the data input into each model is weighted to emphasize the incorrectly classified examples and deemphasize the correctly weighted ones. The final model is the additive combination of the multiple models. Gradient boosting methods also use a forward, stagewise additive approach, but here each model in the sequence learns to predict the residuals of the prior model” (paragraph 4).

2022/0269986 teaches “BOOSTING. Boosting refers to a family of algorithms that are able to convert weak learners to strong learners. The main principle of boosting is to fit a sequence of weak learners—models that are only slightly better than random guessing, such as small decision trees—to weighted versions of the data. More weight is given to examples that were misclassified by earlier rounds. The predictions are then combined through a weighted majority vote (classification) or a weighted sum (regression) to produce the final prediction. The principal difference between boosting and the committee methods, such as bagging, is that base learners are trained in sequence on a weighted version of the data” (paragraph 27). This reference describes where boosting is not necessarily combining models, but can also combine predictions made by models.

2008/0133434 teaches “Boosting.sup.4 .sup.4Meir, Ron; Ratsch, Gunnar. An Introduction to Boosting and Leveraging Boosting is based on the observation that finding many not-so-accurate models can be a lot easier than finding a single, highly accurate prediction model. To apply the boosting approach, we start with a method or algorithm for finding moderately accurate models. The boosting algorithm calls this "weak" or "base" learning algorithm repeatedly, each time feeding it a different subset of the training examples (or, to be more precise, a different distribution or weighting over the training examples 1). Each time it is called, the base learning algorithm generates a new weak model, and after many rounds, the boosting algorithm must combine these weak models into a single model that, hopefully, will be much more accurate than any one of the weak models. To make this approach work, there are two fundamental questions that must be answered: first, how should each distribution be chosen on each round, and second, how should the weak rules be combined into a single rule? Regarding the choice of distribution, the technique that is advocated by Robert Schapire is to place the most weight on the examples most often misclassified by the preceding weak rules; this has the effect of forcing the base learner to focus its attention on the "hardest" examples. As for combining the weak rules, simply taking a (weighted) majority vote of their predictions is natural and effective for classification. A weighted average of the predictions is used for regression. An actual training set is selected from the available training patterns for T different classifiers. However, the general idea in Boosting is that which patterns are selected for the I-th training set, is dependent on the performance of the earlier classifiers. Examples that are incorrectly predicted (more often) by previous classifiers are chosen more often for subsequent classifiers. A probability pj of being selected for the next training set is associated with each pattern j, j belonging to {0, 1, . . . , 1train-1}. Initially, of course, pj=1/train. To construct an actual training set, repeat 1 train times: Choose pattern j with probability pj. For subsequent classifiers, the pj are changes. The way in which pj are changed depends on which variant of Boosting is used” (paragraphs 107-109).

2023/0351203 teaches “ensemble learning algorithms (such as Boosting and variations (e.g. Adaboost, Xgboost etc.), Bagging, stacked generalization, decision trees etc.)” (paragraph 67). This reference describes bagging and boosting as examples of “ensemble learning”.

As per Claim(s) 12 (and consequently claim[s] 13-17 which depend on claim[s] 12), the prior art of record does not teach or suggest the combination of all limitations in claim(s) 12, including (i.e. in combination with the remaining limitations in claim[s] 12) A method for creating a specialized large language model (h-LLM) through extraction comprising: receiving a general purpose h-LLM; identifying a specialized task from a group of tasks; extracting task-specific knowledge from the general purpose h-LLM corresponding to the specialized task; creating a specialized h-LLM having reduced computational requirements compared to the general purpose h-LLM while maintaining performance for the specialized task; and configuring the specialized h-LLM to process prompts related to the specialized task.

As per Claim(s) 18, the prior art of record does not teach or suggest the combination of all limitations in claim(s) 18, including (i.e.
in combination with the remaining limitations in claim[s] 18) A system for creating specialized large language models comprising: a processor; a non-transitory computer-readable storage medium positioned in communication with the processor; and software stored on the storage medium that, when executed by the processor, is operable to: receive a general purpose h-LLM; identify a specialized task; extract task-specific knowledge from the general purpose h-LLM through at least one of parameter selection and optimization; generate a specialized h-LLM with reduced computational requirements while maintaining task-specific performance related to the specialized task; and deploy the specialized h-LLM for processing prompts related to the specialized task.

Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin, “Distilling Task-Specific Knowledge from BERT into Simple Neural Networks”, 2019, https://arxiv.org/abs/1903.12136 teaches distilling/transferring task-specific knowledge from BERT to a shallow neural architecture (particularly a BiLSTM, see 1 Introduction), and where the distilled model uses fewer parameters and less inference time (see 6 Conclusion).

2021/0182662 teaches “One possible approach for a knowledge distillation method for BERT involves distilling task-specific knowledge from BERT into simple neural networks. However, this approach is not task-independent. In other words, whenever a new target down-stream task is presented, a new student model would need to be distilled from the original cumbersome BERT model. In addition, such an approach requires many heuristics to construct training examples for the distillation process” (paragraph 12) and “Techniques are disclosed for training a reduced scale neural network based natural language processing (NLP) model using a full-scale NN based NLP model. The techniques are particularly well-suited for training transformer-based neural network models, such as BERT.
In an embodiment, a dense knowledge distillation approach is used to train the reduced scale model. In this manner, the dense knowledge distillation can be used to effectively transfer knowledge acquired in the full-scale model to the reduced scale model. The full-scale model acts as a teacher model, and the reduced scale model acts as a student model” (paragraph 10).

2022/0245343 teaches “Examples of pre-trained transformer models include but are not limited to BERT (Bidirectional Encoder Representations from Transformers) models and GPT (Generative Pre-trained Transformer) models, which have been pre-trained with large general language datasets, such as Wikipedia Corpus, and can be fine-tuned to specific language tasks” (paragraph 79).

2024/0054338 teaches “General purpose pretrained generative models, such as but not limited to the GPT Family, T5, CLIP, Codex, etc., can be trained in a self-supervised manner on large amounts of uncurated data and can then be adapted to specific downstream tasks or control objectives. A nonlimiting example downstream task may include generating Python code, while an example control objective may be controlling the style of the generated code” (paragraph 23).

2023/0418694 teaches “analyzing a task currently being performed in the electronic device based on the detection of the second user input, calling a clipboard based on the detection of the second user input, extracting clip data corresponding to the task from among a plurality of pieces of clip data of the clipboard, and providing a clipboard based on the clip data corresponding to the task through a display module” (paragraph 379).

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 1, 2, 3, 5, and 6 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 4, 7, and 8 of U.S. Patent No. 12,321,371, hereafter Parent Patent 1. Although the claims at issue are not identical, they are not patentably distinct from each other because claims of this application are rendered obvious by the claims of Parent Patent 1.
As per Claim 1: Claim 8 of Parent Patent 1 (interpreted as incorporating the limitations of Claim 1 of Parent Patent 1) suggests A method for creating a merged large language model (h-LLM) using a bagging approach (lines 1-8 of claim 1 of Parent Patent 1 and claim 8 of Parent Patent 1, where lines 1-8 of claim 1 of Parent Patent 1 describe training LLMs based on respective task-specific subsets of input data and merging the trained LLMs [which is described in Applicant's Specification as "bagging" in the description of Figure 4], and claim 8 of Parent Patent 1 teaches where h-LLMs are trained and merged) comprising:

receiving input data at a computer system comprising a processor, non-transitory storage medium, and software stored on the non-transitory storage medium (lines 1-8 of Claim 1 of Parent Patent 1; the steps of Claim 1 of Parent Patent 1 are obviously performed by a computer [which conventionally includes a processor, hard drive, and software], and a computer logically cannot operate based on data that it does not have/receive);

creating a plurality of data subsets from the input data (lines 1-8 of Claim 1 of Parent Patent 1, where, in order to train the LLMs based on subsets, the subsets are logically/obviously "created");

training a plurality of h-LLMs, each h-LLM of the plurality of h-LLMs being trained on a respective data subset of the plurality of data subsets; creating a merged h-LLM by merging the plurality of h-LLMs; and outputting the merged h-LLM (lines 1-8 of claim 1 of Parent Patent 1 and claim 8 of Parent Patent 1, where lines 1-8 of claim 1 of Parent Patent 1 describe training two LLMs based on respective task-specific subsets of input data and merging the trained LLMs [which logically creates and outputs the merged LLM], and claim 8 of Parent Patent 1 teaches where h-LLMs are trained and merged).

As per Claim 2: Claim 8 of Parent Patent 1 (interpreted as incorporating the limitations of Claim 1 of Parent Patent 1) suggests wherein creating the plurality of data subsets comprises dividing the input data to create multiple data subsets (lines 1-8 of claim 1 of Parent Patent 1; in order for the two subsets of the input data upon which the two LLMs are trained to exist, they are logically "created", and subsets are portions/"divisions").

As per Claim 3: Claim 8 of Parent Patent 1 (interpreted as incorporating the limitations of Claim 1 of Parent Patent 1) suggests wherein merging the plurality of h-LLMs comprises combining model parameters from each h-LLM through a merging or fusing process (lines 1-8 of claim 1 of Parent Patent 1 and claim 8 of Parent Patent 1, where merging LLMs obviously/logically combines features/parameters of the LLMs being combined/merged/fused, and claim 8 of Parent Patent 1 teaches where h-LLMs are trained and merged).

As per Claim 5: Claim 8 of Parent Patent 1 does not, but Claim 7 of Parent Patent 1 suggests wherein the h-LLMs of the plurality of h-LLMs are trained concurrently (Claims 1 and 8 of Parent Patent 1 teach where h-LLMs are trained, and Claim 7 of Parent Patent 1 teaches where LLMs are trained [where claim 8 of Parent Patent 1 describes where the trained LLMs can be h-LLMs] in parallel [i.e., "concurrently"]).

Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of training of multiple models with another, because Claim 8 of Parent Patent 1 teaches the claimed invention except for the substitution of training of multiple models which does not necessarily train the multiple models concurrently with training of multiple models which does. Claim 7 of Parent Patent 1 teaches that training of multiple models which trains the multiple models concurrently was known in the claims.
One of ordinary skill in the art could have substituted one type of training of multiple models with another to obtain the predictable results of Claim 8 of Parent Patent 1, where the h-LLMs are trained in parallel (as per Claim 7 of Parent Patent 1).

As per Claim 6: Claim 8 of Parent Patent 1 does not, but Claim 4 of Parent Patent 1 suggests wherein the merged h-LLM has at least one of a higher precision, a higher accuracy, and an improved stability than each h-LLM of the plurality of h-LLMs (Claim 4 of Parent Patent 1 teaches where the merged LLM has improved precision/accuracy relative to each of the trained first and second LLMs [where claim 8 describes where the trained LLMs can be h-LLMs]).

Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of merged model with another, because Claim 8 of Parent Patent 1 teaches the claimed invention except for the substitution of a merged model which does not necessarily have at least one of a higher precision, a higher accuracy, and an improved stability relative to the models merged to produce the merged model with a merged model which does. Claim 4 of Parent Patent 1 teaches that a merged model which has at least one of a higher precision, a higher accuracy, and an improved stability relative to the models merged to produce the merged model was known in the claims. One of ordinary skill in the art could have substituted one type of merged model with another to obtain the predictable results of Claim 8 of Parent Patent 1, where the merged model has an improved precision and/or accuracy relative to each of the models that were merged to produce the merged model (as per Claim 4 of Parent Patent 1).

Claims 1, 2, 3, 5, and 6 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 3, 5, 7, and 8 of U.S. Patent No. 12,430,370, hereafter Parent Patent 2.
Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of this application are rendered obvious by the claims of Parent Patent 2.

As per Claim 1: Claim 8 of Parent Patent 2 (interpreted as incorporating the limitations of Claims 1 and 3 of Parent Patent 2) suggests A method for creating a merged large language model (h-LLM) using a bagging approach (Claim 3 of Parent Patent 2 and claim 8 of Parent Patent 2, where Claim 3 of Parent Patent 2 describes training LLMs based on respective task-specific subsets of input data and merging the trained LLMs [which is described in Applicant's Specification as "bagging" in the description of Figure 4], and claim 8 of Parent Patent 2 teaches where h-LLMs are trained and merged) comprising:

receiving input data at a computer system comprising a processor, non-transitory storage medium, and software stored on the non-transitory storage medium (Claim 3 of Parent Patent 2; the steps of Claim 3 of Parent Patent 2 are obviously performed by a computer [which conventionally includes a processor, hard drive, and software], and a computer logically cannot operate based on data that it does not have/receive);

creating a plurality of data subsets from the input data (Claim 3 of Parent Patent 2, where, in order to train the LLMs based on subsets, the subsets are logically/obviously "created");

training a plurality of h-LLMs, each h-LLM of the plurality of h-LLMs being trained on a respective data subset of the plurality of data subsets; creating a merged h-LLM by merging the plurality of h-LLMs; and outputting the merged h-LLM (Claim 3 of Parent Patent 2 and claim 8 of Parent Patent 2, where Claim 3 of Parent Patent 2 describes training two LLMs based on respective task-specific subsets of input data and merging the trained LLMs [which logically creates and outputs the merged LLM], and claim 8 of Parent Patent 2 teaches where h-LLMs are trained and merged).

As per Claim 2: Claim 8 of Parent Patent 2 (interpreted as incorporating the limitations of Claims 1 and 3 of Parent Patent 2) suggests wherein creating the plurality of data subsets comprises dividing the input data to create multiple data subsets (Claim 3 of Parent Patent 2; in order for the two subsets of the input data upon which the two LLMs are trained to exist, they are logically "created", and subsets are portions/"divisions").

As per Claim 3: Claim 8 of Parent Patent 2 (interpreted as incorporating the limitations of Claims 1 and 3 of Parent Patent 2) suggests wherein merging the plurality of h-LLMs comprises combining model parameters from each h-LLM through a merging or fusing process (Claim 3 of Parent Patent 2 and claim 8 of Parent Patent 2, where merging LLMs obviously/logically combines features/parameters of the LLMs being combined/merged/fused, and claim 8 of Parent Patent 2 teaches where h-LLMs are trained and merged).

As per Claim 5: Claim 8 of Parent Patent 2 does not, but Claim 7 of Parent Patent 2 suggests wherein the h-LLMs of the plurality of h-LLMs are trained concurrently (Claims 3 and 8 of Parent Patent 2 teach where h-LLMs are trained, and Claim 7 of Parent Patent 2 teaches where LLMs are trained [where claim 8 of Parent Patent 2 describes where the trained LLMs can be h-LLMs] in parallel [i.e., "concurrently"]).

Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of training of multiple models with another, because Claim 8 of Parent Patent 2 teaches the claimed invention except for the substitution of training of multiple models which does not necessarily train the multiple models concurrently with training of multiple models which does. Claim 7 of Parent Patent 2 teaches that training of multiple models which trains the multiple models concurrently was known in the claims.
One of ordinary skill in the art could have substituted one type of training of multiple models with another to obtain the predictable results of Claim 8 of Parent Patent 2, where the h-LLMs are trained in parallel (as per Claim 7 of Parent Patent 2).

As per Claim 6: Claim 8 of Parent Patent 2 does not, but Claim 5 of Parent Patent 2 suggests wherein the merged h-LLM has at least one of a higher precision, a higher accuracy, and an improved stability than each h-LLM of the plurality of h-LLMs (Claim 5 of Parent Patent 2 teaches where the merged LLM has improved precision/accuracy relative to each of the trained first and second LLMs [where claim 8 of Parent Patent 2 describes where the trained LLMs can be h-LLMs]).

Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of merged model with another, because Claim 8 of Parent Patent 2 teaches the claimed invention except for the substitution of a merged model which does not necessarily have at least one of a higher precision, a higher accuracy, and an improved stability relative to the models merged to produce the merged model with a merged model which does. Claim 5 of Parent Patent 2 teaches that a merged model which has at least one of a higher precision, a higher accuracy, and an improved stability relative to the models merged to produce the merged model was known in the claims. One of ordinary skill in the art could have substituted one type of merged model with another to obtain the predictable results of Claim 8 of Parent Patent 2, where the merged model has an improved precision and/or accuracy relative to each of the models that were merged to produce the merged model (as per Claim 5 of Parent Patent 2).

Claims 1, 2, 3, 5, and 6 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 and 4 of copending Application No. 19/273,406 (reference application), hereafter Sibling Application 3 (this is the first Sibling Application, but it is the third application/patent cited in the Double Patenting rejections). Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of this application are rendered obvious by the claims of Sibling Application 3. This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

As per Claim 1: Claim 4 of Sibling Application 3 (interpreted as incorporating the limitations of Claim 1 of Sibling Application 3) suggests A method for creating a merged large language model (h-LLM) using a bagging approach (Claim 4 of Sibling Application 3) comprising:

receiving input data at a computer system comprising a processor, non-transitory storage medium, and software stored on the non-transitory storage medium (Claim 4 of Sibling Application 3; the steps of Claim 4 of Sibling Application 3 are obviously performed by a computer [which conventionally includes a processor, hard drive, and software], and a computer logically cannot operate based on data that it does not have/receive);

creating a plurality of data subsets from the input data (Claim 4 of Sibling Application 3, where, in order to train the LLMs based on subsets, the subsets are logically/obviously "created");

training a plurality of h-LLMs, each h-LLM of the plurality of h-LLMs being trained on a respective data subset of the plurality of data subsets; creating a merged h-LLM by merging the plurality of h-LLMs; and outputting the merged h-LLM (Claim 4 of Sibling Application 3 [which logically creates and outputs the merged h-LLM]).

As per Claim 2: Claim 4 of Sibling Application 3 (interpreted as incorporating the limitations of Claim 1 of Sibling Application 3) suggests wherein creating the plurality of data subsets comprises dividing the input data to create multiple data subsets (Claim 4 of Sibling
Application 3; in order for the two subsets of the input data upon which the two LLMs are trained to exist, they are logically "created", and subsets are portions/"divisions").

As per Claim 3: Claim 4 of Sibling Application 3 (interpreted as incorporating the limitations of Claim 1 of Sibling Application 3) suggests wherein merging the plurality of h-LLMs comprises combining model parameters from each h-LLM through a merging or fusing process (Claim 4 of Sibling Application 3, where merging LLMs obviously/logically combines features/parameters of the LLMs being combined/merged/fused).

As per Claim 5: Claim 4 of Sibling Application 3 suggests wherein the h-LLMs of the plurality of h-LLMs are trained concurrently (Claim 4 of Sibling Application 3, where "in parallel" is synonymous with "concurrently").

As per Claim 6: Claim 4 of Sibling Application 3 suggests wherein the merged h-LLM has at least one of a higher precision, a higher accuracy, and an improved stability than each h-LLM of the plurality of h-LLMs (Claim 4 of Sibling Application 3).

For clarity of the record, Claim 7 of this application is not rejected based on Claim 5 of Sibling Application 3 because Claim 5 of Sibling Application 3 does not teach where weights are assigned to the original input data; however, if this claim is amended to recite where weights are assigned to h-LLM outputs (to address the new matter issue discussed above in the 112(a) rejections), then Claim 7 would be rejected based on Claim 5 of Sibling Application 3.

For clarity of the record, Claim 12 of this application is not rejected based on Claim 6 of Sibling Application 3 because Claim 6 of Sibling Application 3 does not specifically teach creating a specialized h-LLM having reduced computational requirements compared to the general purpose h-LLM while maintaining performance for the specialized task, and configuring the specialized h-LLM to process prompts related to the specialized task.
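Regarding the "reduced computational requirements" limitation of Claim 12 discussed above: one common way a specialized model becomes cheaper to run than a general-purpose one is by pruning parameters. The sketch below is purely illustrative; the parameter names, threshold, and magnitude-based pruning criterion are assumptions of this example, not the claimed h-LLM creation process:

```python
# Illustrative parameter pruning: keep only weights whose magnitude clears a
# threshold, yielding a smaller (cheaper-to-run) "specialized" model.

def prune(params, threshold):
    """Drop parameters with magnitude at or below the threshold."""
    return {k: v for k, v in params.items() if abs(v) > threshold}

general = {"w1": 0.9, "w2": 0.01, "w3": -0.7, "w4": 0.002}  # "general purpose" model
specialized = prune(general, 0.05)                           # reduced parameter set
print(specialized)                                           # {'w1': 0.9, 'w3': -0.7}
```

The specialized model here retains the dominant parameters while dropping the near-zero ones, which is the intuition behind "reduced computational requirements ... while maintaining performance."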
For clarity of the record, Claim 18 of this application is not rejected based on Claim 6 of Sibling Application 3 because Claim 6 of Sibling Application 3 does not teach where task-specific knowledge is extracted from the general purpose h-LLM through at least one of parameter selection and optimization; however, if this claim is amended to delete "through at least one of parameter selection and optimization" (to address the new matter issue discussed above in the 112(a) rejections), then Claim 18 would be rejected based on Claim 6 of Sibling Application 3.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC YEN, whose telephone number is (571) 272-4249. The examiner can normally be reached M-F, 12:00 PM - 8:30 PM EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, RICHEMOND DORVIL, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

EY
2/11/2026

/ERIC YEN/
Primary Examiner, Art Unit 2658

Prosecution Timeline

Jul 17, 2025
Application Filed
Feb 11, 2026
Non-Final Rejection — §112, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602541
MINIMIZING LARGE LANGUAGE MODEL HALLUCINATIONS IN GENERATED SUMMARIES
2y 5m to grant Granted Apr 14, 2026
Patent 12585880
SCALABLE CONSISTENCY ENSEMBLE FOR MACHINE LEARNING MODELS
2y 5m to grant Granted Mar 24, 2026
Patent 12585886
CONVERSATION METHODS, APPARATUS, ELECTRONIC DEVICES, STORAGE MEDIA, AND PRODUCTS
2y 5m to grant Granted Mar 24, 2026
Patent 12547651
SYSTEMS AND METHOD FOR DYNAMICALLY UPDATING MATERIALITY DISTRIBUTIONS AND CLASSIFICATIONS IN MULTIPLE DIMENSIONS
2y 5m to grant Granted Feb 10, 2026
Patent 12524617
SYSTEM AND METHOD FOR VISUAL REPRESENTATION OF DOCUMENT TOPICS
2y 5m to grant Granted Jan 13, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 85%
With Interview: 97% (+11.7%)
Median Time to Grant: 2y 8m
PTA Risk: Low
Based on 765 resolved cases by this examiner. Grant probability derived from career allow rate.
