Office Action Analysis: 18651241 — Section-based chunking technique for Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs)

Office Action

§101 §102 §103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The disclosure is objected to because of the following informalities: 
Paragraph 0079 refers to “vector store 22”; to match the submitted drawings and the rest of the specification, it should read “vector store 20”.
Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1 to 20 are rejected under 35 U.S.C. 101 as being directed to patent-ineligible subject matter.
Independent claim 1 is directed to a device (a non-transitory computer-readable medium), which is a statutory category of invention. The only stated function of the device is to perform the steps of “in response to receiving a user query directed to subject information retrievable from documentation stored in a private database, using a section-based chunking procedure to obtain, from the private database, a relevant section of the documentation as context; and feeding the user query and the relevant section as context to a Large Language Model (LLM).” Those steps constitute an abstract idea directed to a mental process that can be executed by a human using pen and paper or by a human using a generic computer, and such an abstract idea is a judicial exception to patent eligibility. A human can read a query that will be sent to a generic LLM, determine whether external information is needed, search for that information, and modify the query to include a section of the text as context.
The claim recites additional elements, namely a “non-transitory computer-readable medium configured to store a computer program”, “one or more processing devices” and a “Large Language Model (LLM)”. These can be seen as a generic computer-readable medium, one or more generic processing devices, and generic Large Language Model software. They serve only as tools for performing the listed steps, so they do not amount to significantly more than the abstract idea and do not integrate the mental process into a practical application.
Dependent claims 2 to 10 are rejected under 35 USC § 101 because they include all the limitations of claim 1. The limitations recited in claims 2 to 5 and 9 to 10 can also be considered abstract ideas that could be executed by a human. Examples are “parse the user query and retrieve the relevant section…” (Claim 2), “use the inherent structure of the documentation…” (Claim 3), “separate the document into sections…” (Claim 4), “dividing the content of each section…” (Claim 5), “obtains the relevant section of the documentation” (Claim 9) and adjusting “the size of the user query and relevant section” “to fall within an input token limit of the LLM” (Claim 10). A human can read a query that will be sent to a generic LLM, determine whether external information is needed, search for that information, divide the document into sections, select the section that contains the information, and modify the query to include a section of the text as context within the input limit. Claims 6 to 8 recite limitations related to vectors and their operation in the application, for example “embedding content value of each section as vectors” (Claim 6), “searching the private database for vectors semantically closest to the query vector” (Claim 7) and “detecting the header of the vector” (Claim 8). A human can, to a certain extent, index information in the form of a vector, search through a series of documents with the index system, and compare the header of the indexed section against the query. Dependent claims 2 to 10 do not recite any additional elements; therefore, they do not describe a practical application or significantly more.
Independent claim 12 is directed to a process, which is a statutory category of invention. However, claim 12 recites “in response to receiving a user query directed to subject information retrievable from documentation stored in a private database, using a section-based chunking procedure to obtain, from the private database, a relevant section of the documentation as context; and feeding the user query and the relevant section as context to a Large Language Model (LLM).” The limitations of “receiving a user query…”, “obtain, from a private database, a section of documentation” and “feeding the user query and the relevant section as context to a Large Language Model (LLM)” constitute an abstract idea directed to a mental process that can be executed by a human using pen and paper or by a human using a generic computer, and such an abstract idea is a judicial exception to patent eligibility. A human can read a query that will be sent to a generic LLM, determine whether external information is needed, search for that information, and modify the query to include a section of the text as context.
The claim recites elements such as a “private database” and a “Large Language Model (LLM)” that can be seen as a generic private database and a generic large language model to which the procedure is applied. Those elements are considered conventional tools to which the process is applied; they do not integrate the process into a practical application and do not add significantly more to the method described in the claim.
With respect to claims 13, 14 and 15, they are rejected under 35 USC § 101 because they include all the limitations of claim 12. Although the listed claims fall under the process category, which is a statutory category, they recite steps that include but are not limited to “parse the user query and retrieve relevant section” (Claim 13), “uses an inherent structure of the documentation” (Claim 14) and “separate the documentation into sections” (Claim 15). These steps are considered mental processes that can be executed by a human using pen and paper or a generic computer, so the claims retain the inherited patent-ineligible status. The dependent claims do not recite any additional elements; therefore, they do not describe a practical application or significantly more.
Independent claim 16 is directed to a machine, which is a statutory category of invention. However, claim 16 recites steps such as “receiving a user query…”, “obtain, from a private database, relevant sections of documentation…” and “feeding the user query and the relevant section to a Large Language Model (LLM)” that constitute an abstract idea directed to a mental process that can be executed by a human using pen and paper or by a human using a generic computer, and such an abstract idea is a judicial exception to patent eligibility. A human can read a query that will be sent to a generic LLM, determine whether external information is needed, search for that information, and modify the query to include a section of the text as context.
The claim recites elements such as a “processor”, “memory”, “private database” and “Large Language Model (LLM)” that can be seen as generic elements to which the procedure is applied. They do not represent significantly more than tools used by the listed steps, and they do not integrate the mental process into a practical application.
With respect to claims 17 to 20, they are rejected under 35 USC § 101 because they include all the limitations of claim 16. Furthermore, these claims recite limitations that include “to embed the user query as a query vector, wherein obtaining the relevant section of the documentation as context…” (Claim 17), “obtains relevant sections of the documentation…” (Claim 18) and adjusting “the size of the user query and relevant section…” “to fall within an input token limit of the LLM…” (Claim 19). These limitations are considered mental processes that can be executed by a human using pen and paper or a generic computer, so the claims retain the inherited patent-ineligible status. A human can read a query that will be sent to a generic LLM, assign an index value to the query, look in a database for documents matching that index value, extract the relevant text information, and modify the query to include a section of the text as context within the input limit of the LLM. The dependent claims do not recite any additional elements; therefore, they do not describe a practical application or significantly more.
Dependent claim 20 does recite additional elements, namely “the private database” that “is a vector store” and “a server and a retriever”. These can be seen as generic network components: a generic database operated by a private entity, a generic server and a generic retriever. Therefore, they do not describe a practical application or significantly more than tools for completing the steps listed in the independent claim.
The claims may be amended in accordance with MPEP § 2106 to overcome the 35 USC § 101 rejection. 

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1 to 3, 7, 9 to 14, 16 and 18 to 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Mondlock (US 12020140 B1), hereinafter Mondlock.

Regarding claim 1, Mondlock teaches:
A non-transitory computer-readable medium configured to store a computer program having logical instructions for enabling one or more processing devices to perform the steps of: in response to receiving a user query directed to subject information retrievable from documentation stored in a private database, using a section-based chunking procedure to obtain, from the private database, a relevant section of the documentation as context; 
Shown in “FIG. 7B illustrates aspects of the computer-implemented method 700 involving splitting documents into text chunks, selecting relevant text chunks, and concurrently sending augmented text chunks to an LLM to extract relevant information. In some aspects, the computer-implemented method 700 may include at block 722 causing the plurality of documents to be split into a text chunks set. The plurality of documents may be split by the document/asset/expert module 122 or the LLM service 170. Splitting the plurality of documents may be performed at block 380A of the RAG pipeline 300.” (Mondlock Column 23 lines 10 to 20). 
Where the private database is discussed in “The internal data store 140 may be owned or operated by the same organization that owns or operates the generative AI pipeline. The internal data store 140 may include a relational database (e.g., a PostgreSQL database), a non-relational datastore (e.g., a NoSQL database), a vector database (e.g., Pinecone), a web server, file server, and/or application server. In some aspects, the internal data store 140 may be located remotely from the server 110, such as in a public cloud environment. The internal data store 140 may store one or more data sources, such as a chat history 142, document collections 144, asset collections 146, and/or expert collections 148. Chat history 142 may include one or more records of the queries submitted by users and the responses output by server 110. Document collections 144 may include one or more sets of documents, such as web pages, PDFs, Word documents, text files, or any other suitable file containing text. Asset collections 146 may include one or more sets of databases, data sets, applications, models, knowledge graphs, or any other suitable sources of data. Expert collections 148 may include or more sets of identifying information for experts, corpora of experts' works, and/or experts' biographies for one or more subject matter experts. For example, any of the document collections 144, asset collections 146, or expert collections 148 may include data from a catalogue of documents, assets, or experts, such as a data set of assets and descriptions of the respective assets (e.g., applications or models for generating predictions or other data).” (Mondlock Column 4 line 65 to column 5 line 25).
and feeding the user query and the relevant section as context to a Large Language Model (LLM).  
“The computer-implemented method 600 may continue at block 622 by sending an augmented user query to an LLM to cause the LLM to obtain an answer from the LLM, such as an LLM service 170. The augmented user query may be sent by the LLM interface module 132. Sending the augmented user query may occur at block 388 of the RAG pipeline 300. The augmented user query may include the relevant information responses, the user query, and a prompt to cause the LLM to generate an answer.” (Mondlock Column 20 lines 13 to 21).
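For illustration only, the step of feeding the user query together with the retrieved section to an LLM can be sketched as simple prompt assembly. This is a minimal sketch and is not taken from Mondlock or from the application: the function name `build_augmented_query` and the instruction wording are hypothetical, and the call to an actual LLM service is deliberately left out.

```python
def build_augmented_query(user_query: str, relevant_sections: list[str]) -> str:
    """Combine retrieved context, the user query, and an instruction into one prompt.

    Hypothetical helper: the instruction text and the layout are assumptions.
    """
    context = "\n\n".join(relevant_sections)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}"
    )


prompt = build_augmented_query(
    "How do I reset the device?",
    ["Reset: hold the power button for ten seconds."],
)
print(prompt)
```

The resulting string would then be passed to whichever LLM interface the system uses.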


Regarding Claim 2, Mondlock teaches:
The non-transitory computer-readable medium of claim 1, wherein the section-based chunking procedure uses Retrieval-Augmented Generation (RAG) to parse the user query and retrieve the relevant section. In the following sections:
“FIG. 3B illustrates an aspect of the RAG pipeline 300 involving receiving and processing a user query. In some aspects, the RAG pipeline 300 may include at block 320 receiving a user query. The user query may be received by the input/output module 120 or any other suitable program. The user query may be received from the user device 180 or from a generative AI pipeline, such as RAG pipeline 300. User queries may be distributed among a plurality of servers 110 by the load balancer 190. The user query may comprise a question or a request. The user query may comprise a selection or deselection of one or more document collections, expert collections, or asset collections. The user query may comprise a selection of whether to provide relevant information to the LLM to assist in answering the query (i.e., use RAG) or submit the query without providing relevant information (i.e., do not use RAG).” (Mondlock Column 11 lines 23 to 38).
“FIG. 3E illustrates an aspect of the RAG pipeline 300 involving splitting documents or assets into text or data chunks and determining the most relevant text or data chunks. In some aspects, the RAG pipeline 300 may include at block 380A splitting each selected document into a plurality of text chunks and/or splitting each selected asset into a plurality of data chunks at block 380B. The selected documents and/or selected assets may be split by the document/asset/expert module 122 or any other suitable program.” (Mondlock Column 15 line 63 to column 16 line 4).


Regarding Claim 3, Mondlock teaches:
The non-transitory computer-readable medium of claim 1, wherein the section-based chunking procedure uses an inherent structure of the documentation to select, for the relevant section, one or more of subsections, paragraphs, bullet point lists, and tables.
“In some aspects, the document/asset/expert module 122 may include instructions for splitting documents or assets into chunks and generating embeddings of those chunks. The document/asset/expert module 122 may split each document of document collections 144 and document collections 162 into a plurality of text chunks and split each asset of asset collections 146 and asset collections 164 into a plurality of text chunks and/or data chunks. The text chunks may be paragraph-sized, sentence-sized, fixed-sized (e.g., 50 words) or any other appropriate size. The document/asset/expert module 122 may use a tool, such as Natural Language Toolkit (NLTK) or Sentence Splitter, to perform the splitting. In some aspects, document/asset/expert module 122 may transmit the documents and/or assets, via the LLM interface module 132, to the LLM service 170 and receive text chunks and/or asset chunks from the LLM service 170.” (Mondlock Column 7 lines 4 to 18).
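For illustration only, the following minimal sketch contrasts the fixed-size splitting mentioned in the quoted passage (e.g., 50-word chunks) with a split that follows a document's own headings. It is not taken from Mondlock or from the application. The function names `fixed_size_chunks` and `heading_chunks` are hypothetical, and the sketch assumes that headings are lines beginning with "#".

```python
def fixed_size_chunks(text: str, size: int = 50) -> list[str]:
    """Split text into chunks of at most `size` words, ignoring structure."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def heading_chunks(text: str, marker: str = "#") -> list[tuple[str, str]]:
    """Split text into (header, body) sections at lines starting with `marker`.

    Assumption: headings are marked with a leading "#"; real documentation
    may signal its structure differently.
    """
    sections: list[tuple[str, str]] = []
    header, body = "", []
    for line in text.splitlines():
        if line.startswith(marker):
            # Close the section collected so far before starting a new one.
            if header or body:
                sections.append((header, "\n".join(body).strip()))
            header, body = line.lstrip(marker).strip(), []
        else:
            body.append(line)
    if header or body:
        sections.append((header, "\n".join(body).strip()))
    return sections


doc = "# Install\nRun the installer.\n# Configure\nEdit the settings file.\nRestart."
print(fixed_size_chunks(doc, size=4))
print(heading_chunks(doc))
```

In this toy input the fixed-size split cuts across the heading boundaries, while the heading-based split keeps each heading with its own body.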

Regarding Claim 7, Mondlock teaches:
The non-transitory computer-readable medium of claim 1, wherein the logical instructions further enable the one or more processing devices to embed the user query as a query vector, wherein obtaining the relevant section of the documentation as context includes searching the private database for vectors semantically closest to the query vector. In the following sections: 
“In some aspects, the document/asset/expert module 122 may include instructions for generating embeddings from each text chunk and/or data chunk. The embeddings represent the text chunks and data chunks as multi-dimensional (e.g., 768 or 1,536 dimension) vectors of numerical values. The document/asset/expert module 122 may use Word2Vec, Bidirectional Encoder Representations from Transformers (BERT), or other suitable algorithms to generate the embeddings. Alternatively, the document/asset/expert module 122 may transmit the text chunks and/or data chunks, via the LLM interface module 132, to the LLM service 170 (e.g., using the text-embedding-ada-002 model) and receive embeddings from the LLM service 170. The document/asset/expert module 122 may save the embeddings into embeddings 168 in the external data sources 160. The embeddings 168 may comprise a vector database, such as ChromaDB, Pinecone, or Milvus.” (Mondlock Column 7 lines 19 to 35).
“In some aspects, the relevant information identification module 130 may include instructions for identifying relevant text chunks and/or data chunks from the relevant documents and/or relevant assets. The relevant information identification module 130 may use a semantic search to compare the embedding of the user query to embeddings of the text chunks and/or data chunks to identify relevant text chunks and/or relevant data chunks” (Mondlock Column 9 lines 7 to 14).
“The computer-implemented method 600 may continue at block 614 by causing chunk similarity scores to be calculated. The chunk similarity scores may be calculated by the relevant information identification module 130 or the LLM service 170. Chunk similarity scores may be calculated at blocks 382A and/or 384A of the RAG pipeline 300. Chunk similarity scores may indicate semantic similarity of the user query to each text chunk of the plurality of documents. The chunk similarity scores may be calculated using various techniques, such as cosine similarity between the vectors representing the chunks.” (Mondlock Column 20 lines 5 to 12).
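For illustration only, the cosine similarity mentioned in the quoted passage can be computed as the dot product of two vectors divided by the product of their lengths. The sketch below is a plain-Python rendering under stated assumptions: the function names `cosine_similarity` and `score_chunks` are hypothetical, and the two-dimensional vectors are invented toy values, far smaller than the 768- or 1,536-dimension embeddings the reference describes.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_chunks(query_vec: list[float], chunk_vecs: dict[str, list[float]]):
    """Return (chunk id, similarity) pairs, highest similarity first."""
    scores = {cid: cosine_similarity(query_vec, vec) for cid, vec in chunk_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy two-dimensional embeddings, invented for illustration.
chunks = {"billing": [1.0, 0.0], "shipping": [0.0, 1.0], "refunds": [1.0, 1.0]}
for chunk_id, score in score_chunks([1.0, 0.0], chunks):
    print(chunk_id, round(score, 3))
```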


Regarding Claim 9, Mondlock teaches:
The non-transitory computer-readable medium of claim 1, wherein the section-based chunking procedure obtains the relevant section of the documentation in a manner unrelated to a sliding window procedure.
“In some aspects, the RAG pipeline 300 may include at block 384A identifying the top relevant text chunks and/or identifying the top relevant data chunks at block 384B. The top relevant text chunks and/or data chunks may be identified by the relevant information identification module 130 or any other suitable program. The top relevant text and data chunks may be identified by a semantic search. The semantic search may comprise performing a KNN search of the user query embedding and text chunk and/or data chunk embeddings.” (Mondlock Column 16 lines 14 to 23).
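For illustration only, a KNN search over chunk embeddings, as named in the quoted passage, can be sketched as a brute-force scan that keeps the k nearest vectors. This is a minimal sketch, not Mondlock's implementation: the function name `knn_search` is hypothetical, the use of Euclidean distance is an assumption (the passage does not name a distance metric), and the embeddings are invented toy values.

```python
import heapq
import math


def knn_search(query: list[float], chunk_embeddings: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks whose embeddings lie closest to the query.

    Assumption: closeness is measured by Euclidean distance.
    """
    return heapq.nsmallest(
        k,
        chunk_embeddings,  # iterates over chunk ids
        key=lambda cid: math.dist(query, chunk_embeddings[cid]),
    )


# Toy two-dimensional embeddings, invented for illustration.
embeddings = {"intro": [0.9, 0.1], "setup": [0.2, 0.8], "faq": [0.7, 0.3]}
print(knn_search([1.0, 0.0], embeddings, k=2))
```

A production vector database would typically replace the linear scan with an approximate index, but the ranking idea is the same.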

Regarding Claim 10, Mondlock teaches:
The non-transitory computer-readable medium of claim 1, wherein a size of the user query and relevant section is configured to fall within an input token limit of the LLM. 
“In some aspects, the query module 128 may include instructions for generating an augmented user query from the rephrased user query. The query module 128 may supplement the rephrased user query with information obtained (via the document/asset/expert module 122 and the relevant information identification module 130) from document collections 144, document collections 162, asset collections 146, asset collections 164, and/or other suitable sources to generate a prompt. For example, a rephrased user query may ask a question regarding Acme Corp.'s most recent earnings report. The query module 128 may append the contents of Acme Corp.'s earnings report when generating the augmented user query. The query module 128 may summarize an augmented user query in order to satisfy a maximum word or token limit of the LLM service 170. For example, the query module 128 may implement map reduce functionality to split Acme Corp.'s earnings report document into a plurality of text chunks and summarize each text chunk to generate a summarized output text suitable for submission to the LLM service 170.” (Mondlock Column 8 lines 19 to 38).
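For illustration only, keeping the combined size of the query and the retrieved context within an input token limit can be sketched as a simple budget check. Note that the quoted passage describes summarizing text to meet the limit; the sketch below instead drops lower-ranked sections, which is a different and simpler technique chosen here for brevity. The function names `count_tokens` and `fit_to_limit` are hypothetical, and counting one token per whitespace-separated word is an assumption that real tokenizers do not follow.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_limit(query: str, sections: list[str], limit: int) -> list[str]:
    """Keep sections, in ranked order, while query plus context stays within `limit`."""
    budget = limit - count_tokens(query)
    kept: list[str] = []
    for section in sections:
        cost = count_tokens(section)
        if cost > budget:
            break  # the next section would exceed the limit
        kept.append(section)
        budget -= cost
    return kept


query = "How do I reset the device?"
ranked = [
    "Hold the power button for ten seconds.",
    "Warranty terms apply to all resets performed by users.",
]
print(fit_to_limit(query, ranked, limit=15))
```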

Regarding claim 11, Mondlock teaches:
The non-transitory computer-readable medium of claim 1, wherein the private database is a vector store.
“The internal data store 140 may be owned or operated by the same organization that owns or operates the generative AI pipeline. The internal data store 140 may include a relational database (e.g., a PostgreSQL database), a non-relational datastore (e.g., a NoSQL database), a vector database (e.g., Pinecone), a web server, file server, and/or application server.” (Mondlock Column 4 line 65 to column 5 line 4).
“In some aspects, the computer-implemented method 700 may include at block 726 saving the text chunks set and text embeddings into a data store. The text chunks set and text embeddings may be saved by the document/asset/expert module 122. The text chunks set and text embeddings may be saved at block 382A of the RAG pipeline 300. The text chunks set and text embeddings may be saved into internal data store 140.” (Mondlock Column 23 lines 28 to 35).

Regarding claim 12, Mondlock teaches:
A method comprising the steps of: in response to receiving a user query directed to subject information retrievable from documentation stored in a private database, using a section-based chunking procedure to obtain, from the private database, a relevant section of the documentation as context; 
“FIG. 7B illustrates aspects of the computer-implemented method 700 involving splitting documents into text chunks, selecting relevant text chunks, and concurrently sending augmented text chunks to an LLM to extract relevant information. In some aspects, the computer-implemented method 700 may include at block 722 causing the plurality of documents to be split into a text chunks set. The plurality of documents may be split by the document/asset/expert module 122 or the LLM service 170. Splitting the plurality of documents may be performed at block 380A of the RAG pipeline 300.” (Mondlock Column 23 lines 10 to 20).
“The internal data store 140 may be owned or operated by the same organization that owns or operates the generative AI pipeline.” (Mondlock Column 4 lines 65 to 67).
and feeding the user query and the relevant section as context to a Large Language Model (LLM).
“The computer-implemented method 600 may continue at block 622 by sending an augmented user query to an LLM to cause the LLM to obtain an answer from the LLM, such as an LLM service 170. The augmented user query may be sent by the LLM interface module 132. Sending the augmented user query may occur at block 388 of the RAG pipeline 300. The augmented user query may include the relevant information responses, the user query, and a prompt to cause the LLM to generate an answer.” (Mondlock Column 20 lines 13 to 21).

Regarding claim 13, Mondlock teaches:
The method of claim 12, wherein the section-based chunking procedure uses Retrieval-Augmented Generation (RAG) to parse the user query and retrieve the relevant section.  
“FIG. 3B illustrates an aspect of the RAG pipeline 300 involving receiving and processing a user query. In some aspects, the RAG pipeline 300 may include at block 320 receiving a user query.” (Mondlock Column 11 lines 23 to 26).
“FIG. 3E illustrates an aspect of the RAG pipeline 300 involving splitting documents or assets into text or data chunks and determining the most relevant text or data chunks.” (Mondlock Column 15 lines 63 to 66).

Regarding claim 14, Mondlock teaches:
The method of claim 12, wherein the section-based chunking procedure uses an inherent structure of the documentation including, for the relevant section, one or more subsections, paragraphs, bullet point lists, and tables.
“In some aspects, the document/asset/expert module 122 may include instructions for splitting documents or assets into chunks and generating embeddings of those chunks. The document/asset/expert module 122 may split each document of document collections 144 and document collections 162 into a plurality of text chunks and split each asset of asset collections 146 and asset collections 164 into a plurality of text chunks and/or data chunks. The text chunks may be paragraph-sized, sentence-sized, fixed-sized (e.g., 50 words) or any other appropriate size. The document/asset/expert module 122 may use a tool, such as Natural Language Toolkit (NLTK) or Sentence Splitter, to perform the splitting. In some aspects, document/asset/expert module 122 may transmit the documents and/or assets, via the LLM interface module 132, to the LLM service 170 and receive text chunks and/or asset chunks from the LLM service 170.” (Mondlock Column 7 lines 4 to 18).


Regarding claim 16, Mondlock teaches: 
A system comprising: a processing device; and memory configured to store computer logic having instructions enabling the processing device to perform the steps of: in response to receiving a user query directed to subject information retrievable from documentation stored in a private database, using a section-based chunking procedure to obtain, from the private database, a relevant section of the documentation as context; 
“FIG. 7B illustrates aspects of the computer-implemented method 700 involving splitting documents into text chunks, selecting relevant text chunks, and concurrently sending augmented text chunks to an LLM to extract relevant information. In some aspects, the computer-implemented method 700 may include at block 722 causing the plurality of documents to be split into a text chunks set. The plurality of documents may be split by the document/asset/expert module 122 or the LLM service 170. Splitting the plurality of documents may be performed at block 380A of the RAG pipeline 300.” (Mondlock Column 23 lines 10 to 20).
“The internal data store 140 may be owned or operated by the same organization that owns or operates the generative AI pipeline.” (Mondlock Column 4 lines 65 to 67).
and feeding the user query and the relevant section as context to a Large Language Model (LLM).
“The computer-implemented method 600 may continue at block 622 by sending an augmented user query to an LLM to cause the LLM to obtain an answer from the LLM, such as an LLM service 170. The augmented user query may be sent by the LLM interface module 132. Sending the augmented user query may occur at block 388 of the RAG pipeline 300. The augmented user query may include the relevant information responses, the user query, and a prompt to cause the LLM to generate an answer.” (Mondlock Column 20 lines 13 to 21).

Regarding claim 18, Mondlock teaches:
The system of claim 16, wherein the section-based chunking procedure obtains the relevant section of the documentation in a manner unrelated to a sliding window procedure.
“In some aspects, the RAG pipeline 300 may include at block 384A identifying the top relevant text chunks and/or identifying the top relevant data chunks at block 384B. The top relevant text chunks and/or data chunks may be identified by the relevant information identification module 130 or any other suitable program. The top relevant text and data chunks may be identified by a semantic search. The semantic search may comprise performing a KNN search of the user query embedding and text chunk and/or data chunk embeddings.” (Mondlock Column 16 lines 14 to 23).

Regarding claim 19, Mondlock teaches:
The system of claim 16, wherein a size of the user query and relevant section is configured to fall within an input token limit of the LLM.
“In some aspects, the query module 128 may include instructions for generating an augmented user query from the rephrased user query. The query module 128 may supplement the rephrased user query with information obtained (via the document/asset/expert module 122 and the relevant information identification module 130) from document collections 144, document collections 162, asset collections 146, asset collections 164, and/or other suitable sources to generate a prompt. For example, a rephrased user query may ask a question regarding Acme Corp.'s most recent earnings report. The query module 128 may append the contents of Acme Corp.'s earnings report when generating the augmented user query. The query module 128 may summarize an augmented user query in order to satisfy a maximum word or token limit of the LLM service 170. For example, the query module 128 may implement map reduce functionality to split Acme Corp.'s earnings report document into a plurality of text chunks and summarize each text chunk to generate a summarized output text suitable for submission to the LLM service 170.” (Mondlock Column 8 lines 19 to 38).

Regarding claim 20, Mondlock teaches:
The system of claim 16, wherein the private database is a vector store, and wherein the system includes one or more of a server and a retriever.
“The internal data store 140 may be owned or operated by the same organization that owns or operates the generative AI pipeline. The internal data store 140 may include a relational database (e.g., a PostgreSQL database), a non-relational datastore (e.g., a NoSQL database), a vector database (e.g., Pinecone), a web server, file server, and/or application server.” (Mondlock Column 4 line 65 to column 5 line 4).
“In some aspects, the computer-implemented method 700 may include at block 726 saving the text chunks set and text embeddings into a data store. The text chunks set and text embeddings may be saved by the document/asset/expert module 122. The text chunks set and text embeddings may be saved at block 382A of the RAG pipeline 300. The text chunks set and text embeddings may be saved into internal data store 140.” (Mondlock Column 23 lines 28 to 35).
“The server 110 may be an individual server, a group (e.g., cluster) of multiple servers, or another suitable type of computing device or system (e.g., a collection of computing resources). The server 110 may be located within the enterprise network of an organization that owns or operates the generative AI pipeline or hosted by a third-party provider.” (Mondlock Column 4 lines 31 to 36).
“In some aspects, if the intent is to query one or more document collections, then the RAG pipeline 300 may include at block 354A retrieving the one or more documents from the one or more selected document collections. The documents may be retrieved by the document/asset/expert module 122 or any other suitable program. The retrieved documents may be stored, short term or long term, in document collections 144.” (Mondlock Column 14 lines 1 to 8).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 4, 5, 6, 8, 15 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Mondlock in view of Arunachalam; Elavarasi et al. (US 20250211549 A1) hereinafter Arunachalam.


Regarding claim 4, the rejection of claim 1 is incorporated, furthermore Mondlock teaches:
The non-transitory computer-readable medium of claim 1, wherein, before receiving the user query, the logical instructions further enable the one or more processing devices to perform a data preparation procedure to separate the documentation into sections,
“In some aspects, the document/asset/expert module 122 may include instructions for splitting documents or assets into chunks and generating embeddings of those chunks. The document/asset/expert module 122 may split each document of document collections 144 and document collections 162 into a plurality of text chunks and split each asset of asset collections 146 and asset collections 164 into a plurality of text chunks and/or data chunks. The text chunks may be paragraph-sized, sentence-sized, fixed-sized (e.g., 50 words) or any other appropriate size. The document/asset/expert module 122 may use a tool, such as Natural Language Toolkit (NLTK) or Sentence Splitter, to perform the splitting. In some aspects, document/asset/expert module 122 may transmit the documents and/or assets, via the LLM interface module 132, to the LLM service 170 and receive text chunks and/or asset chunks from the LLM service 170.” (Mondlock Column 7 lines 4 to 18).
Mondlock does not teach: each section including content under a respective section header. On the other hand, Arunachalam teaches: “More specifically, during an ingestion stage, a generative AI system may receive data from various sources. The data may correspond to a project and may be included in documents, transcripts, and the like. The generative AI system may convert the data into a uniform format and may also generate an identifier, such as a project identifier, for the data. Next, the generative AI system may divide the data in the uniform format into chunks having a predefined length. The generative AI system may also associate metadata tags with the chunks. There may be one metadata tag for one chunk in some embodiments. The metadata tag may include the identifier, the title of the chunk, the hierarchy of the chunk compared to other chunks, the data source (e.g., the document or transcript from where the chunk came from), and the like. From the chunks, an embedding large language model (LLM) may generate embedding vectors (or simply vectors). The vectors may include numeric embeddings that represent information in the chunks. The generative AI system may generate a dictionary for the data, where the dictionary includes or points to an identifier, chunks, metadata tags, and vectors associated with the data. The generative AI system may store the dictionary, including the identifier, chunks, metadata tags, and vectors in a vector storage, or a combination of various storage devices.” (Arunachalam [0015]). 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mondlock to incorporate the teachings of Arunachalam to include the header of the text chunk in the embeddings of the text chunk. The motivation to include the respective header of each text chunk in the text chunk embedding is discussed by Arunachalam: “If the request for information is a request for a summary, the generative AI system may use the identifier to identify the dictionary. From the dictionary, the generative AI system may use the metadata tags to identify a subset of chunks that may include information that may be used to generate the summary. Next, the generative AI system may use the subset of chunks to generate a summary. Notably, the generative AI system may refine the summary using the subset of chunks or retrieving additional chunks from the vectors.” (Arunachalam [0018]).
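For illustration only, the combination discussed for claim 4 (splitting documentation into chunks while recording each chunk's section header as a metadata tag) might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Mondlock, Arunachalam, or the applicant: the names `Chunk` and `split_by_section` are hypothetical, and headers are assumed to be lines beginning with `#`.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    header: str  # section header kept as a metadata tag (hypothetical field)
    text: str    # paragraph-sized content found under that header


def split_by_section(document: str) -> list[Chunk]:
    """Split a document into paragraph chunks tagged with their section header.

    Assumption: a header is any line starting with '#'. Real documentation
    may mark headers differently.
    """
    chunks: list[Chunk] = []
    header = ""
    body: list[str] = []

    def flush() -> None:
        # Within a section, paragraphs are separated by blank lines.
        for para in re.split(r"\n\s*\n", "\n".join(body)):
            if para.strip():
                chunks.append(Chunk(header=header, text=para.strip()))

    for line in document.splitlines():
        if line.startswith("#"):
            flush()  # close the previous section before starting a new one
            header = line.lstrip("#").strip()
            body = []
        else:
            body.append(line)
    flush()  # close the final section
    return chunks


doc = "# Install\nRun setup.\n\nThen reboot.\n# Usage\nCall run()."
chunks = split_by_section(doc)
```

In this sketch every paragraph carries the header of the section it came from, so a later search step can group chunks by section.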

Regarding claim 5, the rejection of claim 4 is incorporated, furthermore Mondlock teaches: 
The non-transitory computer-readable medium of claim 4, wherein the data preparation procedure further includes dividing the content of each section into one or more of paragraphs, table entries, and subsections.
“In some aspects, the document/asset/expert module 122 may include instructions for splitting documents or assets into chunks and generating embeddings of those chunks. The document/asset/expert module 122 may split each document of document collections 144 and document collections 162 into a plurality of text chunks and split each asset of asset collections 146 and asset collections 164 into a plurality of text chunks and/or data chunks. The text chunks may be paragraph-sized, sentence-sized, fixed-sized (e.g., 50 words) or any other appropriate size. The document/asset/expert module 122 may use a tool, such as Natural Language Toolkit (NLTK) or Sentence Splitter, to perform the splitting. In some aspects, document/asset/expert module 122 may transmit the documents and/or assets, via the LLM interface module 132, to the LLM service 170 and receive text chunks and/or asset chunks from the LLM service 170.” (Mondlock Column 7 lines 4 to 18).

Regarding claim 6, the rejection of claim 4 is incorporated, furthermore Mondlock teaches: 
The non-transitory computer-readable medium of claim 4, wherein the data preparation procedure further includes embedding a content value of each section as vectors in the private database to enable the documentation to be searched by section.
“In some aspects, the document/asset/expert module 122 may include instructions for generating embeddings from each text chunk and/or data chunk. The embeddings represent the text chunks and data chunks as multi-dimensional (e.g., 768 or 1,536 dimension) vectors of numerical values. The document/asset/expert module 122 may use Word2Vec, Bidirectional Encoder Representations from Transformers (BERT), or other suitable algorithms to generate the embeddings. Alternatively, the document/asset/expert module 122 may transmit the text chunks and/or data chunks, via the LLM interface module 132, to the LLM service 170 (e.g., using the text-embedding-ada-002 model) and receive embeddings from the LLM service 170. The document/asset/expert module 122 may save the embeddings into embeddings 168 in the external data sources 160. The embeddings 168 may comprise a vector database, such as ChromaDB, Pinecone, or Milvus.” (Mondlock Column 7 lines 19 to 35).
“In some aspects, the relevant information identification module 130 may include instructions for identifying relevant text chunks and/or data chunks from the relevant documents and/or relevant assets. The relevant information identification module 130 may use a semantic search to compare the embedding of the user query to embeddings of the text chunks and/or data chunks to identify relevant text chunks and/or relevant data chunks” (Mondlock Column 9 lines 7 to 14).
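As a hedged illustration of the semantic search described in the passages quoted above, the comparison of a query embedding against stored chunk embeddings might look like the sketch below. It assumes cosine similarity as the similarity measure and uses toy two-dimensional vectors. The function names `cosine_similarity` and `top_k` are hypothetical and are not taken from either reference.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          chunk_vecs: dict[str, list[float]],
          k: int = 2) -> list[str]:
    """Return the ids of the k chunks most similar to the query embedding."""
    ranked = sorted(
        chunk_vecs,
        key=lambda cid: cosine_similarity(query_vec, chunk_vecs[cid]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings standing in for real 768- or 1,536-dimension vectors.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
nearest = top_k([1.0, 0.0], vectors, k=2)
```

A production vector database would replace the linear scan with an index, but the ranking principle is the same.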

Regarding claim 8, the rejection of claim 7 is incorporated, furthermore Mondlock teaches:
The non-transitory computer-readable medium of claim 7, wherein obtaining the relevant section further includes a) detecting a header of the vectors semantically closest to the query vector and b) searching the private database for subsections having headers that match the header of the vectors semantically closest to the query vector.
“In some aspects, the relevant information identification module 130 may include instructions for identifying relevant text chunks and/or data chunks from the relevant documents and/or relevant assets. The relevant information identification module 130 may use a semantic search to compare the embedding of the user query to embeddings of the text chunks and/or data chunks to identify relevant text chunks and/or relevant data chunks” (Mondlock Column 9 lines 7 to 14).
“In some aspects, the RAG pipeline 300 may include at block 384A identifying the top relevant text chunks and/or identifying the top relevant data chunks at block 384B. The top relevant text chunks and/or data chunks may be identified by the relevant information identification module 130 or any other suitable program. The top relevant text and data chunks may be identified by a semantic search. The semantic search may comprise performing a KNN search of the user query embedding and text chunk and/or data chunk embeddings.” (Mondlock Column 16 lines 14 to 23).
“In some aspects, the RAG pipeline 300 may include at block 366B performing a semantic search of the asset metadata from the one or more asset collections selected at block 338. The semantic search may be performed by the relevant information identification module 130 or any other suitable program. The semantic search may comprise performing a KNN search of the user query embedding and asset metadata embeddings. The asset metadata embeddings may be stored in embeddings 168. The semantic search may select one or more assets whose metadata have the highest semantic similarity to the user query.” (Mondlock Column 15 lines 1 to 13).
Mondlock does not teach that the header of the text chunk is embedded with the text chunk as a searchable tag of the text chunk, although it discusses having searchable metadata embedded with the text chunk and using it to reference an asset or a section of a document. On the other hand, Arunachalam defines what can be considered metadata that could be embedded with a text chunk that is being prepared for an LLM: “The generative AI system may also associate metadata tags with the chunks. There may be one metadata tag for one chunk in some embodiments. The metadata tag may include the identifier, the title of the chunk, the hierarchy of the chunk compared to other chunks, the data source (e.g., the document or transcript from where the chunk came from), and the like.” (Arunachalam [0015] lines 9 to 15).
Similar to claim 4, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mondlock to incorporate the teachings of Arunachalam to include the header of the text chunk in the embeddings of the text chunk. The motivation to include the respective header of each text chunk in the text chunk embedding is discussed by Arunachalam: “If the request for information is a request for a summary, the generative AI system may use the identifier to identify the dictionary. From the dictionary, the generative AI system may use the metadata tags to identify a subset of chunks that may include information that may be used to generate the summary. Next, the generative AI system may use the subset of chunks to generate a summary. Notably, the generative AI system may refine the summary using the subset of chunks or retrieving additional chunks from the vectors.” (Arunachalam [0018]).
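To make the two-step limitation of claim 8 concrete, the sketch below shows one possible reading: (a) find the stored vector semantically closest to the query vector and detect its header, then (b) collect every stored chunk whose header matches. This is an assumption-laden illustration, not a statement of how either reference or the application implements the step. `Record`, `retrieve_section`, and the toy vectors are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    header: str          # section header stored as a metadata tag
    text: str            # chunk content
    vector: list[float]  # embedding of the chunk (toy values here)


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_section(query_vec: list[float], store: list[Record]) -> list[str]:
    """Return all chunk texts sharing the header of the closest chunk."""
    if not store:
        return []
    # Step (a): the closest vector determines the header of interest.
    best = max(store, key=lambda r: _cosine(query_vec, r.vector))
    # Step (b): gather every chunk stored under the same header.
    return [r.text for r in store if r.header == best.header]


store = [
    Record("Install", "Run setup.", [1.0, 0.0]),
    Record("Install", "Then reboot.", [0.2, 0.8]),
    Record("Usage", "Call run().", [0.0, 1.0]),
]
context = retrieve_section([0.9, 0.1], store)
```

Note that the second "Install" chunk is returned even though its own vector is far from the query. The header match, not vector proximity, pulls it into the context.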


Regarding claim 15, the rejection of claim 12 is incorporated, furthermore Mondlock teaches: 
dividing the content of each section into one or more of paragraphs, table entries, and subsections; 
“In some aspects, the document/asset/expert module 122 may include instructions for splitting documents or assets into chunks and generating embeddings of those chunks. The document/asset/expert module 122 may split each document of document collections 144 and document collections 162 into a plurality of text chunks and split each asset of asset collections 146 and asset collections 164 into a plurality of text chunks and/or data chunks. The text chunks may be paragraph-sized, sentence-sized, fixed-sized (e.g., 50 words) or any other appropriate size. The document/asset/expert module 122 may use a tool, such as Natural Language Toolkit (NLTK) or Sentence Splitter, to perform the splitting. In some aspects, document/asset/expert module 122 may transmit the documents and/or assets, via the LLM interface module 132, to the LLM service 170 and receive text chunks and/or asset chunks from the LLM service 170.” (Mondlock Column 7 lines 4 to 18).
and embedding a content value of each section as vectors in the private database to enable the documentation to be searched by section.
“In some aspects, the document/asset/expert module 122 may include instructions for generating embeddings from each text chunk and/or data chunk. The embeddings represent the text chunks and data chunks as multi-dimensional (e.g., 768 or 1,536 dimension) vectors of numerical values. The document/asset/expert module 122 may use Word2Vec, Bidirectional Encoder Representations from Transformers (BERT), or other suitable algorithms to generate the embeddings. Alternatively, the document/asset/expert module 122 may transmit the text chunks and/or data chunks, via the LLM interface module 132, to the LLM service 170 (e.g., using the text-embedding-ada-002 model) and receive embeddings from the LLM service 170. The document/asset/expert module 122 may save the embeddings into embeddings 168 in the external data sources 160. The embeddings 168 may comprise a vector database, such as ChromaDB, Pinecone, or Milvus.” (Mondlock Column 7 lines 19 to 35).
“In some aspects, the relevant information identification module 130 may include instructions for identifying relevant text chunks and/or data chunks from the relevant documents and/or relevant assets. The relevant information identification module 130 may use a semantic search to compare the embedding of the user query to embeddings of the text chunks and/or data chunks to identify relevant text chunks and/or relevant data chunks” (Mondlock Column 9 lines 7 to 14).
Mondlock does not teach: The method of claim 12, wherein, before receiving the user query, the process further comprises the steps of: performing a data preparation procedure to separate the documentation into sections, each section including content under a respective section header;
In Mondlock there is no specific mention that the header of the text chunk is embedded with the text chunk as a searchable tag of the text chunk. 
On the other hand, Arunachalam defines portions of metadata that could be embedded in a vector with a text chunk that is being prepared for an LLM: “The generative AI system may also associate metadata tags with the chunks. There may be one metadata tag for one chunk in some embodiments. The metadata tag may include the identifier, the title of the chunk, the hierarchy of the chunk compared to other chunks, the data source (e.g., the document or transcript from where the chunk came from), and the like.” (Arunachalam [0015] lines 9 to 15).
Such metadata portions in the embeddings of the text chunks could modify the data preparation procedure disclosed by Mondlock in: “In some aspects, the document/asset/expert module 122 may include instructions for splitting documents or assets into chunks and generating embeddings of those chunks. The document/asset/expert module 122 may split each document of document collections 144 and document collections 162 into a plurality of text chunks and split each asset of asset collections 146 and asset collections 164 into a plurality of text chunks and/or data chunks. The text chunks may be paragraph-sized, sentence-sized, fixed-sized (e.g., 50 words) or any other appropriate size. The document/asset/expert module 122 may use a tool, such as Natural Language Toolkit (NLTK) or Sentence Splitter, to perform the splitting. In some aspects, document/asset/expert module 122 may transmit the documents and/or assets, via the LLM interface module 132, to the LLM service 170 and receive text chunks and/or asset chunks from the LLM service 170.” (Mondlock Column 7 lines 4 to 18).
Similar to claims 4 and 8, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mondlock to incorporate the teachings of Arunachalam to include the header of the text chunk in the embeddings of the text chunk. The motivation to include the respective header of each text chunk in the text chunk embedding is discussed by Arunachalam: “If the request for information is a request for a summary, the generative AI system may use the identifier to identify the dictionary. From the dictionary, the generative AI system may use the metadata tags to identify a subset of chunks that may include information that may be used to generate the summary. Next, the generative AI system may use the subset of chunks to generate a summary. Notably, the generative AI system may refine the summary using the subset of chunks or retrieving additional chunks from the vectors.” (Arunachalam [0018]).

Regarding claim 17, the rejection of claim 16 is incorporated, furthermore Mondlock teaches: 
The system of claim 16, wherein the instructions further enable the processing device to embed the user query as a query vector, wherein obtaining the relevant section of the documentation as context includes: searching the private database for vectors semantically closest to the query vector, 
“In some aspects, the document/asset/expert module 122 may include instructions for generating embeddings from each text chunk and/or data chunk. The embeddings represent the text chunks and data chunks as multi-dimensional (e.g., 768 or 1,536 dimension) vectors of numerical values. The document/asset/expert module 122 may use Word2Vec, Bidirectional Encoder Representations from Transformers (BERT), or other suitable algorithms to generate the embeddings. Alternatively, the document/asset/expert module 122 may transmit the text chunks and/or data chunks, via the LLM interface module 132, to the LLM service 170 (e.g., using the text-embedding-ada-002 model) and receive embeddings from the LLM service 170. The document/asset/expert module 122 may save the embeddings into embeddings 168 in the external data sources 160. The embeddings 168 may comprise a vector database, such as ChromaDB, Pinecone, or Milvus.” (Mondlock Column 7 lines 19 to 35).
“In some aspects, the relevant information identification module 130 may include instructions for identifying relevant text chunks and/or data chunks from the relevant documents and/or relevant assets. The relevant information identification module 130 may use a semantic search to compare the embedding of the user query to embeddings of the text chunks and/or data chunks to identify relevant text chunks and/or relevant data chunks” (Mondlock Column 9 lines 7 to 14).
Mondlock does not explicitly teach: detecting a header of the vectors semantically closest to the query vector, and searching the private database for subsections having headers that match the header of the vectors semantically closest to the query vector, since Mondlock makes no specific mention of the header of the text chunk being embedded with the text chunk as a searchable tag of the text chunk, or of any metadata embedded in a chunk.
Regarding detecting a header of the vectors semantically closest to the query vector, Mondlock discloses the use of semantic search to compare the embeddings of a user query vector and the embeddings of a text chunk. “The computer-implemented method 600 may continue at block 614 by causing chunk similarity scores to be calculated. The chunk similarity scores may be calculated by the relevant information identification module 130 or the LLM service 170. Chunk similarity scores may be calculated at blocks 382A and/or 384A of the RAG pipeline 300. Chunk similarity scores may indicate semantic similarity of the user query to each text chunk of the plurality of documents. The chunk similarity scores may be calculated using various techniques, such as cosine similarity between the vectors representing the chunks.” (Mondlock Column 19 lines 45 to 55).
On the other hand, Arunachalam describes the inclusion of such information in the embeddings of a text chunk that is being prepared for an LLM: “The generative AI system may also associate metadata tags with the chunks. There may be one metadata tag for one chunk in some embodiments. The metadata tag may include the identifier, the title of the chunk, the hierarchy of the chunk compared to other chunks, the data source (e.g., the document or transcript from where the chunk came from), and the like.” (Arunachalam [0015] lines 9 to 15).
Regarding searching the private database for subsections having headers that match the header of the vectors semantically closest to the query vector, Mondlock discloses the use of semantic search to search the database by comparing the embeddings of a user query vector and the embeddings of a text chunk.
“The internal data store 140 may be owned or operated by the same organization that owns or operates the generative AI pipeline. The internal data store 140 may include a relational database (e.g., a PostgreSQL database), a non-relational datastore (e.g., a NoSQL database), a vector database (e.g., Pinecone), a web server, file server, and/or application server. In some aspects, the internal data store 140 may be located remotely from the server 110, such as in a public cloud environment. The internal data store 140 may store one or more data sources, such as a chat history 142, document collections 144, asset collections 146, and/or expert collections 148.” (Mondlock Column 4 line 65 to column 5 line 9).
“In some aspects, the relevant information identification module 130 may include instructions for identifying one or more documents, assets, and/or experts that are relevant to the user query. The relevant information identification module 130 may use a semantic search to identify the relevant documents, assets, and experts. The semantic search may include (1) generating an embedding of the user query; and (2) compare the user query embedding to the document and asset embeddings in embeddings 168 using clustering techniques, such as k-means clustering, to identify relevant documents and/or assets. Alternatively, the relevant information identification module 130 may transmit the query embedding, the document embeddings, and/or the data embeddings, via the LLM interface module 132, to the LLM service 170 and receive semantic search scores from the LLM service 170.” (Mondlock Column 8 lines 44 to 59).
“In some aspects, the relevant information identification module 130 may include instructions for identifying relevant text chunks and/or data chunks from the relevant documents and/or relevant assets. The relevant information identification module 130 may use a semantic search to compare the embedding of the user query to embeddings of the text chunks and/or data chunks to identify relevant text chunks and/or relevant data chunks. Alternatively, the relevant information identification module 130 may transmit the query embedding, the text chunk embeddings, and/or the data chunk embeddings, via the LLM interface module 132, to the LLM service 170 and receive semantic search scores from the LLM service 170.” (Mondlock Column 9 lines 7 to 19).
On the other hand, Arunachalam describes the inclusion of such information in the embeddings of a text chunk that is being prepared for an LLM: “The generative AI system may also associate metadata tags with the chunks. There may be one metadata tag for one chunk in some embodiments. The metadata tag may include the identifier, the title of the chunk, the hierarchy of the chunk compared to other chunks, the data source (e.g., the document or transcript from where the chunk came from), and the like.” (Arunachalam [0015] lines 9 to 15).
Similar to claims 4, 8 and 15, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mondlock to incorporate the teachings of Arunachalam to include the header of the text chunk in the embedding information and use it in the comparison with the query vector to determine which text chunks are semantically closer. The motivation to include the respective header of each text chunk in the text chunk embedding is discussed by Arunachalam: “If the request for information is a request for a summary, the generative AI system may use the identifier to identify the dictionary. From the dictionary, the generative AI system may use the metadata tags to identify a subset of chunks that may include information that may be used to generate the summary. Next, the generative AI system may use the subset of chunks to generate a summary. Notably, the generative AI system may refine the summary using the subset of chunks or retrieving additional chunks from the vectors.” (Arunachalam [0018]).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HECTOR J. CRESPO FEBLES whose telephone number is (571)272-4512. The examiner can normally be reached Mon - Fri 7:30 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/H.J.C./Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657