Response to Amendment
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the amendment and RCE filed on 3/5/26. Claims 1-10 and 12-21 are now pending.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-10 and 12-21 are rejected under 35 U.S.C. 103 as being unpatentable over Klafter et al. (USPN 2025/0156634) in view of Solomon et al. (USPN 2025/0272504).
1. Klafter teaches a method, comprising (figs. 1 and 7, par. 40, computer systems):
dividing, by a computing system comprising one or more processors configured to perform one or more processes, a corpus of one or more documents into a first plurality of fragments based on a first large language model (figs. 7, 11 and 14, LLMs 702, pars. 79-83, relevant trained LLMs are selected for the topics/concepts of a query/prompt; note that different LLMs can be used and are selected based on language, items 1112 and 1118; par. 127 specifically teaches splitting a query into different parts and using a specific LLM, such as for generating a graph based on the input). Klafter does not explicitly teach that the division is based on a first threshold size corresponding to a token limit of the first large language model. However, Solomon teaches a threshold corresponding to a token limit of an LLM or a maximum threshold (fig. 1, par. 38, content of a selected message responsive to a query does not exceed the token limit of the LLM or a maximum threshold; see also par. 73, “language model 106 having a token limit 115”, and par. 76, “determining that a token count for content… is below a maximum threshold”, Solomon). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate Klafter's multiple-LLM selection (fig. 7, item 702, Klafter) with Solomon's LLM token limit requirements (par. 38, Solomon) to provide effective query responses. One would have been motivated to do so to provide efficient query/prompt responses without overwhelming the AI/LLM systems.
Klafter in view of Solomon teach,
computing, by the computing system comprising one or more processors configured to perform one or more processes, a first set of embeddings using the first large language model to analyze the first plurality of fragments (pars. 135-142, embeddings created for relevant documents retrieved, LLM1, Klafter);
dividing, by the computing system comprising one or more processors configured to perform one or more processes, the same corpus of one or more documents into a second plurality of fragments at a different granularity based on a second threshold size corresponding to a token limit of a second large language model, wherein the second threshold size is different from the first threshold size (fig. 7, par. 127, the plurality of different LLMs have different token limits, modified Klafter; each LLM receives the same corpus, pars. 113 and 115, manual input of data sources or web-generated sources relevant to the input prompt, Klafter; and fig. 1, pars. 30 and 32, chat thread 104 may be the relevant document responsive to the prompt, Solomon. Each LLM receives a respective number of segments from chat thread 104 based on the token limit of that LLM.
Note: applicant's specification confirms that every LLM has a different token limit, par. 74, Publication);
computing, by the computing system comprising one or more processors configured to perform one or more processes, a second set of embeddings using the second large language model to analyze the second plurality of fragments (pars. 135-142, embeddings created for relevant documents retrieved, LLM3, batteries industry, modified Klafter); and,
comparing a vector associated with queries (par. 123, input question split into vector embeddings, Klafter) to the first plurality of fragments and to the second plurality of fragments to determine vector similarity matches as part of providing responses to the queries (pars. 135 and 141, compared to embeddings stored in a vector database, Klafter. For additional support, the instant application as provisionally filed states, on page 6, “analyzes user’s question to identify the key concepts and topics… looks up the scores of the LLMs on these topics… selects the LLM with highest score”, and “each of these sub-questions could be sent to a different LLM to get a variety of perspectives and insights. The model could then combine the answers from the LLMs to generate a more comprehensive and informative response to the original question”. Note that analyzing the sub-questions with different LLMs for a variety of perspectives and insights, and combining the answers from the LLMs to generate a more informative response, uses fragments of the questions in vector form, as an LLM interprets and associates data using vector embeddings. For example, Solomon teaches and supports this concept of Klafter by using embedding models to obtain vector representations, see par. 31, Solomon. Content segments and query parameters are similarly used to analyze data, see pars. 32 and 33, Solomon).
wherein responses to queries according to the corpus of the one or more documents are based on an aggregation of i) the first plurality of fragments and the first set of embeddings, and ii) the second plurality of fragments and the second set of embeddings, wherein the aggregation utilizes both the first plurality of fragments and the second plurality of fragments and both embedding sets to provide the responses (fig. 14, step 1430, pars. 135-142, responses to the query/prompt are based on the embeddings for relevant documents, as modified in view of Solomon's token limit, Klafter).
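For illustration only and not as part of the claim mapping, the following is a minimal sketch of the concept at issue in claim 1: dividing the same corpus into fragment sets at different granularities, each bounded by a different model's token limit. All names, limit values, and the whitespace tokenizer are hypothetical and are not drawn from Klafter, Solomon, or the instant application.

```python
# Minimal illustrative sketch only; hypothetical names and token limits.
def split_corpus(corpus: str, token_limit: int) -> list[str]:
    """Divide text into fragments whose token count stays within token_limit.

    A whitespace split stands in for a real tokenizer here.
    """
    tokens = corpus.split()
    return [" ".join(tokens[i:i + token_limit])
            for i in range(0, len(tokens), token_limit)]

corpus = "example corpus text ..."           # the corpus of one or more documents
fragments_llm1 = split_corpus(corpus, 512)   # first threshold size (first LLM's token limit)
fragments_llm2 = split_corpus(corpus, 2048)  # different, second threshold size (second LLM's token limit)
# Each fragment set would then be embedded by its respective model, and responses to
# queries would aggregate matches drawn from both fragment/embedding sets.
```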
2. Klafter in view of Solomon teach,
wherein the responses to queries based on the corpus of one or more documents aggregate results from i) the first large language model using the first plurality of fragments and the first set of embeddings, and ii) the second large language model using the second plurality of fragments and the second set of embeddings (fig. 14, steps 1410- 1430, selected LLMs provide responses, Klafter).
3. Klafter in view of Solomon teach,
further comprising: dividing the corpus of the one or more documents into the first plurality of fragments based further on semantic content associated with the corpus of the one or more documents and dividing the corpus of the one or more documents into the second plurality of fragments based further on the semantic content associated with the corpus of the one or more documents (figs. 7, 11 and 14, LLMs 702, par. 79-83, relevant trained LLMs to topics/concepts are selected in response to query/prompt based on similarity, note different LLMs are aggregated for response, see Step 1430, Klafter).
4. Klafter in view of Solomon teach,
further comprising: computing additional pluralities of fragments and additional sets of embeddings for additional large language models based on additional threshold sizes for the additional large language models (figs. 7, 11 and 14, LLMs 702, par. 79-83, relevant trained LLMs to topics/concepts are selected in response to query/prompt based on similarity, Klafter).
5. Klafter in view of Solomon teach, further comprising: storing the first plurality of fragments, the first set of embeddings, the second plurality of fragments, and the second set of embeddings in a persistent storage (fig. 2, items 218 and 228, pars. 138 and 139, store embeddings, Klafter).
6. Klafter in view of Solomon teach, further comprising: retrieving the first plurality of fragments, the first set of embeddings, the second plurality of fragments, and the second set of embeddings from the persistent storage for later use by the first large language model or the second large language model (fig. 2, items 218 and 228, pars. 138 and 139, store embeddings in storage for fast retrieval, Klafter).
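For illustration only, a minimal sketch of persisting and later retrieving both fragment sets and both embedding sets, as recited in claims 5 and 6, might look as follows; the file name and JSON layout are hypothetical and are not drawn from either reference.

```python
# Minimal illustrative sketch only; hypothetical persistent-storage layout.
import json
import pathlib

STORE = pathlib.Path("fragment_store.json")  # stand-in for any persistent storage

def persist(fragments_1, embeddings_1, fragments_2, embeddings_2):
    """Write both fragment sets and both embedding sets to persistent storage."""
    STORE.write_text(json.dumps({
        "llm1": {"fragments": fragments_1, "embeddings": embeddings_1},
        "llm2": {"fragments": fragments_2, "embeddings": embeddings_2},
    }))

def retrieve(model_key: str):
    """Load the fragments and embeddings previously stored for one model."""
    data = json.loads(STORE.read_text())
    return data[model_key]["fragments"], data[model_key]["embeddings"]
```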
7. Klafter in view of Solomon teach, wherein the aggregation for the responses to queries is performed by a third large language model (figs. 7, 11 and 14, pars. 132 and 146, results from the plurality of LLMs are aggregated by a search engine or an LLM, Klafter).
8. Klafter in view of Solomon teach, wherein: one or more of the first plurality of fragments are stored in a file system, a relational database management system, or a NoSQL database, and one or more of the second plurality of fragments are stored in the file system, the relational database management system, or the NoSQL database (pars. 136-139, relational databases, Klafter).
9. Klafter in view of Solomon teach, wherein: one or more of the first set of embeddings are stored in a vector database, a file system, a relational database management system, or a NoSQL database, and one or more of the second set of embeddings are stored in the vector database, the file system, the relational database management system, or the NoSQL database (par. 141, vector database, Klafter).
10. Klafter in view of Solomon teach, further comprising: dividing the corpus of the one or more documents into the first plurality of fragments based further on a syntax associated with the corpus of the one or more documents; and dividing the corpus of the one or more documents into the second plurality of fragments based further on the syntax associated with the corpus of the one or more documents (par. 43, syntax trained LLMs, see further figs. 7, 11 and 14, LLMs 702, par. 79-83, relevant trained LLMs to topics/concepts are selected in response to query/prompt, Klafter).
12. Klafter teaches a method, comprising (figs. 1 and 7, par. 40, computer systems):
determining, by a computing system comprising one or more processors configured to perform one or more processes, a first threshold size for a first large language model (figs. 7, 11 and 14, LLMs 702, pars. 79-83, relevant trained LLMs are selected for the topics/concepts of a query/prompt; note that different LLMs can be used and are selected based on language, items 1112 and 1118; par. 127 specifically teaches splitting a query into different parts and using a specific LLM, such as for generating a graph based on the input). Klafter does not explicitly teach that the first threshold size is based on a maximum size corresponding to a token limit that the first large language model can accommodate. However, Solomon teaches a threshold corresponding to a token limit of an LLM or a maximum threshold (fig. 1, par. 38, content of a selected message responsive to a query does not exceed the token limit of the LLM or a maximum threshold; see also par. 73, “language model 106 having a token limit 115”, and par. 76, “determining that a token count for content… is below a maximum threshold”, Solomon). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate Klafter's multiple-LLM selection (fig. 7, item 702, Klafter) with Solomon's LLM token limit requirements (par. 38, Solomon) to provide effective query responses. One would have been motivated to do so to provide efficient query/prompt responses without overwhelming the AI/LLM systems.
Klafter in view of Solomon teach,
determining, by the computing system comprising one or more processors configured to perform one or more processes, a second threshold size for a second large language model based on a maximum size corresponding to a token limit that the second large language model can accommodate, wherein the second threshold size is different from the first threshold size (fig. 7, item 702, multiple LLMs, Klafter, in view of fig. 1, pars. 30, 32 and 38, content of a selected message/document responsive to a query does not exceed the token limit of the LLM or a maximum threshold, Solomon);
dividing a plurality of documents into a first plurality of fragments at a first granularity based on semantic content of the plurality of documents, a syntax associated with the plurality of documents and the first threshold size (figs. 7, 11 and 14, LLMs 702, pars. 79-83, relevant trained LLMs are selected for the topics/concepts of a query/prompt; note that different LLMs can be used and are selected based on language, items 1112 and 1118; par. 127 specifically teaches splitting a query into different parts and using a specific LLM, such as one with information about the topic/industry of the input, modified Klafter. For additional support, the instant application as provisionally filed states, on page 6, “analyzes user’s question to identify the key concepts and topics… looks up the scores of the LLMs on these topics… selects the LLM with highest score”, and “each of these sub-questions could be sent to a different LLM to get a variety of perspectives and insights. The model could then combine the answers from the LLMs to generate a more comprehensive and informative response to the original question”. Note that analyzing the sub-questions with different LLMs for a variety of perspectives and insights, and combining the answers from the LLMs to generate a more informative response, uses fragments of the questions in vector form, as an LLM interprets and associates data using vector embeddings); and
dividing the same plurality of documents into a second plurality of fragments at a second granularity different from the first granularity based on semantic content of the plurality of documents, the syntax associated with the plurality of documents and the second threshold size (figs. 7, 11 and 14, LLMs 702, pars. 79-83, relevant trained LLMs are selected for the topics/concepts of a query/prompt; note that different LLMs can be used and are selected based on language, items 1112 and 1118; par. 127, the plurality of different LLMs have different token limits, modified Klafter; each LLM receives the same corpus, pars. 113 and 115, manual input of data sources or web-generated sources relevant to the input prompt, Klafter; and fig. 1, pars. 30 and 32, chat thread 104 may be the relevant document responsive to the prompt, Solomon. Each LLM receives a respective number of segments from chat thread 104 based on the token limit of that LLM. Note: applicant's specification confirms that every LLM has a different token limit, par. 74, Publication);
computing, by the computing system comprising one or more processors configured to perform one or more processes, a first set of embeddings using the first large language model to analyze the first plurality of fragments (pars. 135-142, embeddings created for relevant documents retrieved, LLM1, Klafter);
computing, by the computing system comprising one or more processors configured to perform one or more processes, a second set of embeddings using the second large language model to analyze the second plurality of fragments (pars. 135-142, embeddings created for relevant documents retrieved, LLM3, batteries industry, modified Klafter);
storing the first plurality of fragments, the first set of embeddings, the second plurality of fragments, and the second set of embeddings in a persistent storage (fig. 2, items 218 and 228, pars. 138 and 139, store embeddings, Klafter).
13. Klafter in view of Solomon teach further comprising: computing additional pluralities of fragments and additional sets of embeddings for additional large language models based on maximum sizes that the additional large language models can accommodate (figs. 7, 11 and 14, LLMs 702, par. 79-83, relevant trained LLMs to topics/concepts are selected in response to query/prompt based on similarity, Klafter).
14. Klafter in view of Solomon teach further comprising: wherein at least some of the first plurality of fragments or the second plurality of fragments are stored in one of a file system, a relational database management system, or a NoSQL database (fig. 2, items 218 and 228, pars. 138 and 139, store embeddings, Klafter).
15. Klafter in view of Solomon teach wherein at least some of the first set of embeddings or the second set of embeddings are stored in one of a vector database, a file system, a relational database management system, or a NoSQL database (fig. 2, items 218 and 228, pars. 138 and 139, store embeddings for retrieval, Klafter).
16. Klafter in view of Solomon teach wherein at least one of the maximum size that the first large language model can accommodate and the maximum size that the second large language model can accommodate comprises a token limit (fig. 1, par. 38, content of selected message responsive to a query does not exceed token limit of the LLM or a maximum threshold, Solomon).
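For illustration only, the token-limit check attributed to Solomon in claim 16 (par. 76, a content token count below a maximum threshold) can be sketched as follows; the limit value and the whitespace tokenizer are hypothetical.

```python
# Minimal illustrative sketch only; hypothetical token limit.
def fits_token_limit(content: str, token_limit: int = 4096) -> bool:
    """Return True if the content's token count does not exceed the model's limit."""
    token_count = len(content.split())  # whitespace split stands in for a real tokenizer
    return token_count <= token_limit
```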
17. Klafter teaches an apparatus, comprising (figs. 1 and 7, par. 40, computer systems):
one or more network interfaces to communicate with a network (figs. 1 and 7, par. 40, network):
a processor coupled to the one or more network interfaces and configured to execute one or more processes and a memory configured to store a process that is executable by the processor, the process comprising (figs. 1 and 7, par. 40, processor):
dividing, by a computing system comprising one or more processors configured to perform one or more processes, a corpus of one or more documents into a first plurality of fragments based on a first large language model (figs. 7, 11 and 14, LLMs 702, pars. 79-83, relevant trained LLMs are selected for the topics/concepts of a query/prompt; note that different LLMs can be used and are selected based on language, items 1112 and 1118; par. 127 specifically teaches splitting a query into different parts and using a specific LLM, such as for generating a graph based on the input). Klafter does not explicitly teach that the division is based on a first threshold size corresponding to a token limit of the first large language model. However, Solomon teaches a threshold corresponding to a token limit of an LLM or a maximum threshold (fig. 1, par. 38, content of a selected message responsive to a query does not exceed the token limit of the LLM or a maximum threshold; see also par. 73, “language model 106 having a token limit 115”, and par. 76, “determining that a token count for content… is below a maximum threshold”, Solomon). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to integrate Klafter's multiple-LLM selection (fig. 7, item 702, Klafter) with Solomon's LLM token limit requirements (par. 38, Solomon) to provide effective query responses. One would have been motivated to do so to provide efficient query/prompt responses without overwhelming the AI/LLM systems.
Klafter in view of Solomon teach,
computing, by the computing system comprising one or more processors configured to perform one or more processes, a first set of embeddings using the first large language model to analyze the first plurality of fragments (pars. 135-142, embeddings created for relevant documents retrieved, LLM1, Klafter);
dividing, by the computing system comprising one or more processors configured to perform one or more processes, the same corpus of one or more documents into a second plurality of fragments at a different granularity based on a second threshold size corresponding to a token limit of a second large language model, wherein the second threshold size is different from the first threshold size (fig. 7, par. 127, the plurality of different LLMs have different token limits, modified Klafter; each LLM receives the same corpus, pars. 113 and 115, manual input of data sources or web-generated sources relevant to the input prompt, Klafter; and fig. 1, pars. 30 and 32, chat thread 104 may be the relevant document responsive to the prompt, Solomon. Each LLM receives a respective number of segments from chat thread 104 based on the token limit of that LLM.
Note: applicant's specification confirms that every LLM has a different token limit, par. 74, Publication);
computing, by the computing system comprising one or more processors configured to perform one or more processes, a second set of embeddings using the second large language model to analyze the second plurality of fragments (pars. 135-142, embeddings created for relevant documents retrieved, LLM3, batteries industry, modified Klafter); and,
comparing a vector associated with queries (par. 123, input question split into vector embeddings, Klafter) to the first plurality of fragments and to the second plurality of fragments to determine vector similarity matches as part of providing responses to the queries (pars. 135 and 141, compared to embeddings stored in a vector database, Klafter. For additional support, the instant application as provisionally filed states, on page 6, “analyzes user’s question to identify the key concepts and topics… looks up the scores of the LLMs on these topics… selects the LLM with highest score”, and “each of these sub-questions could be sent to a different LLM to get a variety of perspectives and insights. The model could then combine the answers from the LLMs to generate a more comprehensive and informative response to the original question”. Note that analyzing the sub-questions with different LLMs for a variety of perspectives and insights, and combining the answers from the LLMs to generate a more informative response, uses fragments of the questions in vector form, as an LLM interprets and associates data using vector embeddings. For example, Solomon teaches and supports this concept of Klafter by using embedding models to obtain vector representations, see par. 31, Solomon. Content segments and query parameters are similarly used to analyze data, see pars. 32 and 33, Solomon).
wherein responses to queries according to the corpus of the one or more documents are based on an aggregation of i) the first plurality of fragments and the first set of embeddings, and ii) the second plurality of fragments and the second set of embeddings, wherein the aggregation utilizes both the first plurality of fragments and the second plurality of fragments and both embedding sets to provide the responses (fig. 14, step 1430, pars. 135-142, responses to the query/prompt are based on the embeddings for relevant documents, as modified in view of Solomon's token limit, Klafter).
18. Klafter in view of Solomon teach,
wherein the responses to queries based on the corpus of one or more documents aggregate results from i) the first large language model using the first plurality of fragments and the first set of embeddings, and ii) the second large language model using the second plurality of fragments and the second set of embeddings (fig. 14, steps 1410- 1430, selected LLMs provide responses, Klafter).
19. Klafter in view of Solomon teach,
further comprising: dividing the corpus of the one or more documents into the first plurality of fragments based further on semantic content associated with the corpus of the one or more documents and dividing the corpus of the one or more documents into the second plurality of fragments based further on the semantic content associated with the corpus of the one or more documents (figs. 7, 11 and 14, LLMs 702, par. 79-83, relevant trained LLMs to topics/concepts are selected in response to query/prompt based on similarity, note different LLMs are aggregated for response, see Step 1430, Klafter).
20. Klafter in view of Solomon teach, further comprising: dividing the corpus of the one or more documents into the first plurality of fragments based further on a syntax associated with the corpus of the one or more documents and dividing the corpus of the one or more documents into the second plurality of fragments based further on the syntax associated with the corpus of the one or more documents (par. 43, syntax trained LLMs, see further figs. 7, 11 and 14, LLMs 702, par. 79-83, relevant trained LLMs to topics/concepts are selected in response to query/prompt, Klafter).
21. Klafter in view of Solomon teach, wherein the determining the vector similarity matches includes utilizing at least one of cosine similarity, dot product, Euclidean distance, Manhattan distance, or Minkowski distance (fig. 1, LLM, see par. 141, cosine similarity and dot product, Klafter; note that LLM embedding models commonly use these similarity measures to compare data).
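For illustration only, the similarity and distance measures recited in claim 21 can be computed over two embedding vectors as follows; the vector values are arbitrary examples and are not drawn from either reference.

```python
# Minimal illustrative sketch only; arbitrary example embedding vectors.
import math

a = [0.1, 0.8, 0.3]
b = [0.2, 0.7, 0.4]

dot_product = sum(x * y for x, y in zip(a, b))
cosine_sim = dot_product / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
manhattan = sum(abs(x - y) for x, y in zip(a, b))
minkowski_p3 = sum(abs(x - y) ** 3 for x, y in zip(a, b)) ** (1 / 3)  # Minkowski distance with p = 3
```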
Response to Arguments
Applicant's arguments filed on 3/5/26 have been fully considered but they are not persuasive. See remarks below.
Applicant argues that splitting a query is different from the claimed splitting of a document corpus.
Examiner agrees. The rejection has been updated to clarify that the mapping relies on splitting a document inputted by a user and/or found by an LLM, not on splitting a query. Please refer to the updated rejection for the detailed mapping.
Applicant argues that there is no motivation to combine the references.
Examiner disagrees. Klafter in view of Solomon teaches converting/extracting a document responsive to a query/input based on an LLM token limit: figure 7 of Klafter illustrates a plurality of LLMs used to parse an inputted document, as is done in Solomon and illustrated in its figure 1. The Office also points to applicant's own specification, which confirms that every LLM has a different token limit (par. 74, Publication). There is no logical reason why the prior art references are not combinable. As such, applicant's arguments are not persuasive.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure in the field of large language models: USPN 11,769,017 (fig. 1, item 150, LLM selected).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARCIN R FILIPCZYK whose telephone number is (571)272-4019. The examiner can normally be reached M-F 7-4 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley can be reached at 571-272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
March 21, 2026
/MARCIN R FILIPCZYK/Primary Examiner, Art Unit 2153