Last updated: May 29, 2026
Application No. 18/652,839
RANKING-AUGMENTED GENERATION FOR LONG DOCUMENTS

Non-Final OA: §101, §103
Filed: May 02, 2024
Examiner: SIRJANI, FARIBA
Art Unit: 2659
Tech Center: 2600 (Communications)
Assignee: International Business Machines Corporation
OA Round: 2 (Non-Final)
Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 76% grant rate with a +31.5% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 554 resolved cases, 2023–2026
Examiner Intelligence

SIRJANI, FARIBA
Career Allowance Rate: 76% (above average); 419 granted / 554 resolved; +13.6% vs TC avg
Interview Lift: +31.5% (resolved cases with interview)
Avg Prosecution (typical timeline): 2y 9m; 19 currently pending
Total Applications (career history): 580, across all art units
Statute-Specific Performance

§101: 1.5% (-38.5% vs TC avg)
§103: 91.0% (+51.0% vs TC avg)
§102: 3.9% (-36.1% vs TC avg)
§112: 1.3% (-38.7% vs TC avg)
Based on career data from 554 resolved cases
Office Action

§101 §103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-20 are pending. Claims 1, 8, and 15 are independent. Claims are not amended.
This Application was published as U.S. 20250342181.
Apparent priority: 2 May 2024.
Applicant’s amendments and arguments are considered but are either unpersuasive or moot in view of the new grounds of rejection that, if presented, were necessitated by the amendments to the Claims.
This action is Final.	
Response to Arguments
Claims have not been amended and the Applicant has chosen to traverse.
The following presents the Examiner’s Reply to the Applicant’s Arguments in the Response submitted by the Applicant.
101 Rejection
Applicant’s arguments on pages 7-11 of the Response traverse the 35 U.S.C. 101 rejection.
Each heading is addressed in turn.

[Image: media_image1.png]

Applicant’s arguments in this area, similar to those presented with respect to the 35 USC 103 rejection, far exceed the substance presented in the Claim language or even supported by the Disclosure that could have been brought in by amendment.
For example, to the mapping of the Claim to the example of a student preparing material for an open book exam that is provided in the rejection, the Applicant responds:

[Image: media_image2.png]

Response at 8.
In Reply, a student using a computer is still human mental activity.  It is not an invention, an integration into a practical application, or significantly more than the abstract idea.  The word processor is an invention; using it is not.  LLMs are inventions; sitting in front of a computer and asking the LLM questions is not.
Additionally, the analogy presented by the rejection included both with and without computer versions to show that even a student sitting at a computer and making his cheat sheets falls within the realm of human mental activity.  The cheat sheet can of course be prepared with the computer as well.
The highfalutin technological terms are present in the Claims either in the alternative, such that they are not limiting the claim, or are mentioned as black boxes for implementation lacking any specificity or nexus to the Claim.  As such they represent the epitome of well-understood, routine, and conventional components.

[Image: media_image3.png]

The goal is to input a query into an LLM and get a response.  Using an LLM like ChatGPT or Claude is not an invention or an integration of an abstract idea into a practical application.
LLMs operate better when the query is accompanied by context.  So far, no technological practical application unless the use of an LLM is considered to be an invention.
One source document is provided to the LLM together with the query in order to provide context for the query and hopefully lead to a better response.  LLMs operate better with context and users are encouraged to add context.  Still no invention/practical application/significantly more.
LLMs have token/window size limits and the context document cannot be too large.  This is an expression of their construct and the Applicant did not make them so.  We are still at parameters of using an LLM and the use of an LLM is not an invention.
The highlighted portions of the Claim below correspond to the steps discussed above which are not technological and rather consist of using an LLM while providing context to it:
1. A computer-implemented method comprising: 
receiving, as input, a query and a source document intended for a content-grounded question-answering or multi-turn conversation task by a specified large language model (LLM) which has a context window size limit, wherein said source document has a size which exceeds said context window size limit;
dividing said source document into a plurality of segments; 
applying a language model to each of said segments, to assign to each of said segments a relevance score; 
selecting the k-top segments having the highest said relevance scores; 
combining said selected k-top segments into a virtual document having a size which complies with said context window size limit; and 
feeding said virtual document as input to the specified LLM, to generate a response that is grounded in the content of said virtual document. 

Then, the Claim tries to address the restricted window size nature of the LLMs with a process of segmenting the context document and feeding only parts of it that are considered more relevant as context and this is where “Integrating into a practical (technological) application” could enter.  The highlighted portion of the Claim below pertains to this portion:
1. A computer-implemented method comprising: 
receiving, as input, a query and a source document intended for a content-grounded question-answering or multi-turn conversation task by a specified large language model (LLM) which has a context window size limit, wherein said source document has a size which exceeds said context window size limit;
dividing said source document into a plurality of segments; 
applying a language model to each of said segments, to assign to each of said segments a relevance score; 
selecting the k-top segments having the highest said relevance scores; 
combining said selected k-top segments into a virtual document having a size which complies with said context window size limit; and 
feeding said virtual document as input to the specified LLM, to generate a response that is grounded in the content of said virtual document. 

The Claim is abstract because the portion where integration into a practical application could have been expounded is left devoid of detail that is required by a machine to perform the steps.  Only a human can perform the above steps because each requires decision making which in turn requires specific criteria if a machine were to perform the steps.  

See the following questions raised by each of the limitations:
dividing said source document into a plurality of segments; [A machine needs to know the criteria for segmentation.  How many segments?  Or how large are the segments going to be?  Some level of detail is required to guide the machine.]
applying a language model to each of said segments, to assign to each of said segments a relevance score; [This is a black-box invocation of a “Language Model” for performing a task.  Is this another LLM?  The relevance score is the relevance of the segment to what?  How are these scores assigned?  A machine requires details while a person can make a decision mentally on his own.]
selecting the k-top segments having the highest said relevance scores; [This step is the exception in clarity; a machine can select the n-best, only the step is standing on the unstable shoulders of its predecessor limitations.]
combining said selected k-top segments into a virtual document having a size which complies with said context window size limit; and [This is another vague step.  A machine must make a lot of unspecified decisions to come up with the combination.  What if the k-top segments exceed the size limit?]

The Claim needs more detail to become machine-worthy and cross the threshold from an abstract idea that requires tons of human decision making to a practical technological application of that abstract idea.
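
By way of illustration only, the kind of detail the questions above call for can be sketched as follows. This is a minimal sketch and not the Applicant's method: the word-based fixed segment size, the scoring of each segment against the query, the rule of skipping segments that would overflow the budget, the restoring of source order, and the names build_virtual_document and overlap_score are all assumptions introduced here and are not found in the Claims or the Specification.

```python
from typing import Callable, List


def build_virtual_document(
    query: str,
    source: str,
    window_limit: int,
    segment_size: int,
    k: int,
    score: Callable[[str, str], float],
) -> str:
    """Chunk, score, pick top-k, and pack under a size budget (all choices hypothetical)."""
    words = source.split()
    # Assumption: fixed-size segmentation measured in whitespace-separated words.
    segments = [
        " ".join(words[i:i + segment_size])
        for i in range(0, len(words), segment_size)
    ]
    # Assumption: relevance is scored against the input query.
    ranked = sorted(segments, key=lambda s: score(query, s), reverse=True)[:k]
    # Assumption: a top-k segment that would overflow the budget is skipped.
    kept: List[str] = []
    used = 0
    for seg in ranked:
        size = len(seg.split())
        if used + size > window_limit:
            continue
        kept.append(seg)
        used += size
    # Assumption: kept segments are restored to source order before joining.
    kept.sort(key=segments.index)
    return " ".join(kept)


def overlap_score(query: str, segment: str) -> float:
    """Toy stand-in for a language model: count of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(segment.lower().split())))
```

Each commented assumption corresponds to one of the bracketed questions: a different segmentation rule, scoring target, or overflow policy would yield a different virtual document from the same inputs.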


[Image: media_image4.png]

[Image: media_image5.png]

Response at 9.
In Reply:  perhaps if the computer could run Claim 1 it would include an improved function by including a new application but it cannot; the Claim is not providing the level of specificity that a computer needs.  It does not know the relevance score is relevance of segment to “what” which means the k-top segments cannot be selected properly and it does not know how to combine the k-top segments if their combination exceeds the window size.  A lot is left to the imagination of a machine that has none.
Applicant refers to the Specification:

[Image: media_image6.png]

Response at 10.
In Reply: whether for art rejections or for abstract idea rejections, it is good that the Specification includes substance but the substance is considered only after it is presented in the Claim.  The Claims are examined.  The Specification is there for support.  Once sufficient technological detail is included in the Claim the rejection will be withdrawn.

[Image: media_image7.png]

Response at 10.
In Reply, as provided above:  how is the relevance score defined?  How is the language model coming up with the relevance score?  Neither the Claim nor the Specification sheds any light on how Language Models determine the relevance score, particularly when it is not clear what the relevance is between.  In the art, language models may be developed to represent the query and the documents to be searched, and the information retrieval is based on similarity of the query and document language models.  This is again a black box use.
Regarding the dependents, Applicant provides:

[Image: media_image8.png]

Response at 10.
In Reply, the dependent Claims as provided in the rejection add a black-box implementation without nexus to the rest of the Claim and amount to no more than name dropping.  The key inventive features should be in the independent Claim and then the dependents can be as sparse as they wish.

[Image: media_image9.png]

In Reply, none of the argued steps are unconventional as evidenced by the art that has been applied.  Applicant may have intended the use of a second LLM, in which case it would be conventional.

[Image: media_image10.png]

Response at 11.
In Reply, the insertion of well-understood, routine and conventional components that are engaged in their well-understood, routine and conventional functions into the Claim does not elevate the Claim from abstract to not abstract.  The components argued by the Applicant are well-understood, routine and conventional components that are engaged in their well-understood, routine and conventional functions and are most importantly used in a disconnected manner lacking nexus to the remainder of the Claim.
As provided above, sitting in front of an LLM and asking questions is not an invention and does not amount to significantly more.
Applicant continues:

[Image: media_image11.png]

Response at 11.
Claim 4 refers to “query” where Claim 1 already included “a query” and therefore the “a query” of Claim 4 was interpreted as the combining of the query with the context before feeding the LLM.  The remaining dependent Claims add a black-box well-known implementation option to the Claim in a disconnected fashion lacking the requisite nexus to avert the abstract nature of Claim.  They do not include the types of components that can cause the Claim as a whole to amount to significantly more than the underlying abstract idea.

103 Rejections
Claim 1 is argued and is provided below.
1. A computer-implemented method comprising: 
receiving, as input, a query and a source document intended for a content-grounded question-answering or multi-turn conversation task by a specified large language model (LLM) which has a context window size limit, wherein said source document has a size which exceeds said context window size limit; 
dividing said source document into a plurality of segments; 
applying a language model to each of said segments, to assign to each of said segments a relevance score; 
selecting the k-top segments having the highest said relevance scores; 
combining said selected k-top segments into a virtual document having a size which complies with said context window size limit; and 
feeding said virtual document as input to the specified LLM, to generate a response that is grounded in the content of said virtual document. 

Applicant’s various arguments are addressed.
First, the Applicant presents its strongest argument:

[Image: media_image12.png]

[Image: media_image13.png]

[Image: media_image14.png]

Response 12-13 (emphasis added).

The above arguments regarding the “virtual document” pertain to the following Claim limitation:
combining said selected k-top segments into a virtual document having a size which complies with said context window size limit; and 
This limitation is supported by Figures 2 and 3 of the instant Application.  This is Figure 3 of the instant Application which includes Chunk (302), Rank (304), and Generate new document (306).

[Image: media_image15.png]

The mapping of the Office action referred to various parts of Vouitsis including Figure 4, 450.  Figure 4 of Vouitsis is shown below.
Vouitsis chunks the input document, ranks the chunks according to their relevance to an input query, reorders the chunks in the order of relevance, and picks a select number of the chunks until the sum of relevance scores satisfies a threshold to generate its output at 450.  The final result at 450 of Figure 4 may or may not have all of the chunks of the original input of 410 and the ordering of the chunks is likely different even if all are present.  This is a new document and it is precisely what the Applicant argues it is not: “a new artifact obtained by combining selected chunks into a single synthesized text object.”  What is lacking is only the size limitation of the Claim, which is not vociferously argued by the Applicant and was combined from Vinodkumar.  The final document at 450 of Figure 4 of Vouitsis complies with a different type of limit.
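
The difference between the two kinds of limit can be shown with a short sketch. It assumes that chunks arrive as (text, score) pairs and that size is measured in characters; the function names and both simplifications are hypothetical and are not taken from Vouitsis, Vinodkumar, or the instant Application.

```python
from typing import List, Tuple


def select_until_score_threshold(
    scored_chunks: List[Tuple[str, float]], threshold: float
) -> List[str]:
    """Take chunks in descending score order until the running score sum reaches threshold."""
    selected: List[str] = []
    total = 0.0
    for text, score in sorted(scored_chunks, key=lambda c: c[1], reverse=True):
        if total >= threshold:
            break
        selected.append(text)
        total += score
    return selected


def select_within_size_limit(
    scored_chunks: List[Tuple[str, float]], max_chars: int
) -> List[str]:
    """Take chunks in descending score order while the total length stays within max_chars."""
    selected: List[str] = []
    used = 0
    for text, _ in sorted(scored_chunks, key=lambda c: c[1], reverse=True):
        if used + len(text) > max_chars:
            break
        selected.append(text)
        used += len(text)
    return selected
```

Both functions walk the same relevance ordering; only the stopping condition differs, which is the point made about the output at 450 complying with a different type of limit.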


[Image: media_image16.png]

The very same Construction Step that is argued as the core of the invention of the instant Application is present in Vouitsis.

Next Applicant moves to the “Context Window Size Limit” and argues:


[Image: media_image17.png]

[Image: media_image18.png]

Response 13.

This argument pertains to the following Claim language:
receiving, as input, a query and a source document intended for a content-grounded question-answering or multi-turn conversation task by a specified large language model (LLM) which has a context window size limit, wherein said source document has a size which exceeds said context window size limit;

This entire limitation provides a setup/scene that paints the situation for the steps that follow in the Claim.  As provided below, the input of query and source are taught by Vouitsis, all LLMs have a context window size limit inherently, and the Claim is addressing a situation where the input document is too big for the window size limit, thus setting the scene for the chunking that follows in the next limitation.
In Reply, first, the “receiving, as input, a query and a source document intended for a content-grounded question-answering or multi-turn conversation task by a specified large language model (LLM)” portion of the limitation is clearly taught by Figure 1B of Vouitsis that shows the documents 162 coming in through the data ingestor 132 and the query 138 being input from the client device 190.
[Image: media_image19.png]

In Reply, next: any LLM has a “context window size limit.”  This is inherent to the LLM structure as also acknowledged by the Applicant’s arguments:  “The context window token limit is a specific hardware/software constraint for the downstream LLM ….”  See the following evidence of inherency from Wikipedia:  “The context window is the maximum length of input a large language model (LLM) can consider at once. In the development and maturation of LLM technology expanding the context window has been a major goal.[1][2] The length of a context window is measured in tokens. In 2025, the Gemini LLM had the largest context window with two million tokens.[3]”  Context window - Wikipedia.   Accordingly, the next part of the limitation is merely stating a situation that is known as a precursor for further acts by the claim and is not a limiting feature of the Claim: “a specified large language model (LLM) which has a context window size limit.”
In Reply, further, regarding “wherein said source document has a size which exceeds said context window size limit;” portion of the limitation, it makes perfect sense (while not taught expressly) that considering the limited context window size of LLMs, that the inputs to the LLM are broken up (chunked) into pieces that are not too big and fit the limited window size.  This follows from the inherent context window size limit of LLMs.  Vouitsis presents several methods of chunking one of which is a fixed-size chunking which when taken together with the fixed window size of LLMs suggests that the fixed size is set according to the LLM window size and that some documents are too large for this window size.
In Reply, finally, Vinodkumar teaches:  “[0065] In some examples, the chunks can correspond with a particular size. The size of each chunk may be determined based on an application's file size requirements, available memory, network bandwidth, or other considerations….”  Here Vinodkumar is effectively saying the same thing that the Applicant referenced as “The context window token limit is a specific hardware/software constraint for the downstream LLM ….”  In other words, the software/LLM has an input size restriction and the size of the chunk is determined in consideration of this limitation.
Accordingly, while Vouitsis suggests the limitation, the combination with Vinodkumar teaches that if a document is too large for the size limitation of the LLM, the document is chunked into pieces that can fit the size restrictions of the software/LLM.
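
Fixed-size chunking against a size restriction, as discussed above, can be sketched as follows. The sketch assumes the document is already tokenized into a list and that the chunk size is chosen by the caller; the function names are hypothetical and nothing here is drawn from either reference's actual implementation.

```python
from typing import List


def exceeds_window(tokens: List[str], window_limit: int) -> bool:
    """True when the document has more tokens than the model's input limit."""
    return len(tokens) > window_limit


def fixed_size_chunks(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split tokens into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
```

Setting chunk_size no larger than the window limit guarantees that every chunk individually fits the restriction; the last chunk may be shorter than the others.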

Next Applicant moves to the “Relevance Score” of the Claim and argues:

[Image: media_image20.png]

[Image: media_image21.png]

Response at 13-14.

This argument pertains to the following Claim language in Claim 1:
dividing said source document into a plurality of segments; 
applying a language model to each of said segments, to assign to each of said segments a relevance score;

In Reply, first, the above portion of the Response includes misplaced reliance on material not currently inside the Claim and further on a dependent Claim.  A dependent claim must, under 35 U.S.C. 112(d), further limit, which means it must have a scope that is narrower than the scope of the independent claim from which it depends.  The arguments of the Applicant are therefore replied to as two sets of arguments: those directed to Claim 1 and those directed to Claim 4, which is being mixed in with Claim 1.

Claim 1:

Note that Claim 1 begins with an LLM: “receiving, as input, a query and a source document intended for a content-grounded question-answering or multi-turn conversation task by a specified large language model (LLM)” for receiving the query and documents but moves to a different “Language Model” which is signaled by “a language model” in the limitation at issue for calculating the “relevance score.”   The entity (“a language model”) that calculates the “relevance score” is not “the LLM” that received the query and the document inputs at the beginning of the Claim.

Further, it is not clear what the Applicant means by:  “This is a special language model application – not a generic similarity calculation.”  There are 2 sources of interpretation:
1) Language of the Claim on its face is referring to “applying a language model to each of said segments, to assign to each of said segments a relevance score;” which is quite broad in not specifying how the “language model” assigns the “relevance scores.”
2) Supporting Disclosure including the Specification and the Drawings which are both quite scant and unspecific: “[0025] In some embodiments, the present technique provides for dividing the document into a plurality of segments, and using a language model to rank the plurality of segments by the likelihood to contain answers to the user question. …”  Note that not all of the above is in the Claim and the Claim does not specify that the “language model” is obtaining the likelihood that a chunk/segment contains the answer to the query.

Figures 2 and 3 of the instant Application show that the Relevance Score (Figure 2, 206 shown below) is obtained by the Ranking Module 304 of Figure 3 that is separate and distinct from the Large Language Model at the end that answers the query.

[Image: media_image22.png]

[Image: media_image23.png]

This setup of the instant Application is uncannily similar to that of Figure 1B of Vouitsis where the “Scoring Model/Module 147” of Vouitsis corresponds to the “Ranking Module 304” of the instant Application and therefore teaches the “language model” of the Claim.  Vouitsis defines it as:  “[0048] … In some cases, the labelling module 146 includes a scoring module 147 that scores the relevance of text in a chunk with respect to a given query.”  This teaching of Vouitsis is perfectly adequate for teaching “applying a language model to each of said segments, to assign to each of said segments a relevance score;” of the Claim.  The Claim has no language specifying the “relevance score” of the “segment” to what, and thus falls quite short of the particularity needed to circumscribe the potential invention, not to mention to overcome the cited art.

[Image: media_image24.png]

Applicant’s general allegation of “mischaracterization” falls short on specifics and resorts to Claim 4 which is addressed separately.  The heading begins with a limitation of Claim 1 and the arguments of the Applicant drift to Claim 4.  
If the Disclosure of the instant Application includes specifics regarding the obtaining of the “relevance score” that cause the “language model” of the Claim, which is shown in the drawings as “Ranking Module 304,” to be distinguished from the “Scoring Module 147” of Vouitsis, such language must be included in Claim 1 before it can be argued as a distinguishing feature.  While on the subject of specifics, it would be helpful for the Claim to state what the “relevance score” of the “segment” is “assigned” with respect to, that is, the relevance of the segment to what.  While new matter may not be introduced and while the Disclosure does not include a description of the “specific language model application,” an explanation of what this “specific application” may be and how it is achieved may also be helpful.
Claim 4:
Claim 4 provides:
4. The computer-implemented method of claim 1, 
wherein said relevance score for each of said segments is computed by applying said language model to said segment, with a prompt instructing said language model to generate a query based on said segment.

The “a query” in Claim 4 was interpreted as the final compound query that includes the question and context.  Claim 1 includes “a query” as input and the “query” of Claim 4 was interpreted as the expanded query that includes the input query and the context provided by the relevant chunks/segment.

The remainder of the Applicant’s assertions have no basis in the language of Claim 1 or its supporting Disclosure and appear to be directed to Claim 4:

[Image: media_image25.png]

[Image: media_image26.png]

Response at 14.

The above is divided into portions considering that the correlation between the portions is unclear to the Examiner and to leave no stone of the arguments unturned.

[Image: media_image27.png]


In Reply, this is not true of Claim 1.  Claim 1 operates like Vouitsis:  chunks/segments the documents; ranks the chunks/segments according to their relevance (Claim 1 does not say relevance to what and Vouitsis determines relevance to the query); packs the more relevant chunks/segments together and sends them in to the LLM as context for the query in an expanded query.
So, in Claim 1 there is no “generating a query based on that segment.”  The query is an input to Claim 1.  Please refer to the language of the Claim above.
Claim 4 includes “with a prompt instructing said language model to generate a query based on said segment,” where this second “query” was interpreted as the expanded query which combines the input query with the context obtained from the combined relevant chunks/segments of the various documents.

Moving to the next segment:

[Image: media_image28.png]

This characterization has no basis in the text of either Claim 1 or Claim 4 or in the Disclosure of the instant Application.
First, please note the various ambiguity issues regarding the LLM being different from “a language model” in Claim 1 and two occurrences of “a query” in Claims 1 and 4 the second of which was interpreted as the expanded query+context of Vouitsis which conforms to the language of Claim 4 and the operation of the Disclosure of this Application.
Second, the “relevance score” was already obtained in Claim 1 as follows:
dividing said source document into a plurality of segments; 
applying a language model to each of said segments, to assign to each of said segments a relevance score; 
As discussed in the section that addresses the “relevance score” arguments above, there is no mention of “query” in this definition that is claimed.  There should be but there isn’t.  
Once Claim 1 has defined the term, Claim 4 cannot supplant the definition; it can only “further limit” it under 35 U.S.C. 112(d).  If Claim 1 says A is green, a dependent Claim 4 cannot say, you know what, A is no longer green but is now red.  All that Claim 4 can do is to say A is a dark shade of green:  further limit the green; not supplant it.
At any rate, the “relevance score” is not defined by Claim 4 either as being “derived from the relationship between the generated query and the user’s input query.”  Claim 4 merely says:  “wherein said relevance score for each of said segments is computed … with a prompt instructing said language model to generate a query based on said segment.”  Aside from the grammatical and statutory issues with this Claim, it does not state that the relevance score is derived from the relationship between the generated query and the user’s input query; there is no talk of the “user’s input query” only of the segment.

The next segment of the Applicant’s arguments is moot in view of the interpretation of the second query in Claim 4 as the expanded query that is input to the LLM:

[Image: media_image26.png]

The mapping provided for Claim 4 maps this second “a query” of Claim 4 to the expanded query of Vouitsis that combines the user input query with the relevant chunks, as context, to be provided for the LLM.  This interpretation is consistent with the Disclosure of the instant Application which is doing precisely the same thing by using different terminology.  The relevance score was mapped to the scoring and ranking of Vouitsis in the rejection of Claim 1 and there was no need to address it in the rejection of Claim 4 and therefore the single sentence of the mapping is taken out of context.  The remainder of the mapping is comprehensive, clear, and consistent with the interpretation of “a query” in Claim 4 as the expanded query+context that is taught by Vouitsis in the mentioned sections.

Next Applicant moves to Motivation to Combine and argues:

[Image: media_image29.png]

[Image: media_image30.png]

[Image: media_image31.png]

Response 14-15.
In Reply:  Applicant admits that both references involve chunking and LLMs and are thus in the same field of endeavor.  That is sufficient basis for the combination.  In addition, they are both directed to or include the solution of the problem of chunking in the context of LLMs.  They satisfy both conditions of 103 combination when one would suffice.  Further, as replied above, the limited chunk size is inherent to LLMs and strongly suggested by the primary reference, and the combination with the secondary reference was for one minor incidental point that would not implicate the rest of the Claim.  The primary reference mentions and conducts chunking, and the secondary reference, in the same field of query/response with LLMs and for the purpose of satisfying the chunk size of an LLM, goes into further detail of chunking.  One would be perfectly motivated to combine, and the combination satisfies several others of the KSR rationales such as substitution and improvement.

103 Rejections of the Other Dependent Claims

Claims are grouped by the Applicant according to similarity and therefore only the representative Claim is presented and discussed by both the Applicant and the Examiner.

Claims 2, 9, and 16
Claim 2 says conduct the query/response process twice or more and use the previous rounds as context.  This practice of using history as context has been prevalent in natural language IVR systems for decades.
2. The computer-implemented method of claim 1, 
wherein said steps of receiving, dividing, applying, selecting, combining, and feeding are iterated two or more times, and
wherein, in each current one of said iterations, a current said query comprises said queries and said responses from at least one previous said iteration.
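
The iteration recited in Claim 2, in which the current query comprises earlier queries and responses, can be pictured with a minimal sketch. The "User:" and "Assistant:" labels, the newline-joined format, and the name compose_query are assumptions made here for illustration; neither the Claim nor the cited art prescribes this layout.

```python
from typing import List, Tuple


def compose_query(history: List[Tuple[str, str]], new_question: str) -> str:
    """Build the current query from prior (query, response) pairs plus the new question."""
    lines: List[str] = []
    for prior_query, prior_response in history:
        lines.append(f"User: {prior_query}")
        lines.append(f"Assistant: {prior_response}")
    lines.append(f"User: {new_question}")
    return "\n".join(lines)
```

With an empty history the function returns only the new question, so the first iteration reduces to the single-turn case of Claim 1.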

Applicant’s characterization of the added reference Lo provides a reply to the Applicant’s arguments:

[Image: media_image32.png]

Response at 15.
Basically, Applicant is arguing that Lo’s sophisticated system of using conversation history does not teach our simple Claim.  This type of argument is hardly ever a good argument.  Lo has the claimed method and more which means that it still teaches the Claim.
The remainder of the argument, pertaining to the motivation to combine, is of the same type: that the more sophisticated reference does not teach a claim that is a special, simplified case of its teachings:

    [Image: media_image33.png]

Response at 16.
In Reply, the continuity requirement can be set to 0 or turned off. When a reference includes more than what is claimed, a pared-down version of its teachings still teaches the simplified method.
Regarding motivation, considering all references pertain to natural language query and response and are in the same field of endeavor, combination is proper either under TSM to let history become context or under several of the KSR rationales.

Claims 5-6, 12-13, and 19
5. The computer-implemented method of claim 1, wherein said relevance score is a cross-entropy loss for each of said generated responses. 

6. The computer-implemented method of claim 5, wherein: 
said language model is an encoder-based language model, and 
wherein said loss is computed based, at least in part, on a similarity of the representation of each of said generated responses to the representation of said input query; 
or 
said language model is an encoder-decoder or decoder-only language model, and wherein said loss is computed based, at least in part, on the inverse of perplexity of the representation of each of said generated responses to the representation of the input query.

Regarding Claim 5, Applicant argues:

    [Image: media_image34.png]


    [Image: media_image35.png]

Response at 16-17.
As characterized by the Applicant, the Claim asks for relevance score to be cross-entropy loss and the reference teaches that the relevance score is a cross-entropy loss function.   Applicant takes issue that the cross-entropy loss of the reference is calculated during training whereas the relevance score/cross-entropy loss of the Claim is obtained during the inference phase.
Cross-entropy is one method of obtaining loss; loss is difference; difference is related to relevance.  Reference teaches that this one method (cross-entropy) is used in ranking documents for their relevance.
In Reply, the reference, based on the characterization by the Applicant, is obtaining relevance scores by a particular method that is consistent with that which is claimed.  This teaching means that the particular score can be and is in the art obtained by this particular method.  That is all that the Claim requires and is satisfied by the teachings of the reference.  Neither the Claim nor the Specification elaborate why it is important to obtain the score by this particular method or even mention the distinction between training time and inference time and it is not clear why the training or inference phases should even enter the discussion of the rejection.  
All of the two instances where cross-entropy is mentioned in the instant Application are provided below and indicate that the “cross-entropy loss” is obtained for the responses and then taken as the relevance score which neatly conforms to the teachings of the reference:
[0014] In some embodiments, the relevance score is the cross-entropy loss for each of the generated responses.
[0050] In some embodiments, the instructions of ranking module 304 may cause VDOC model 300 to apply a language model to each segment created in step 204, with the instruction to generate a conversation (response) between a user and an agent based on the input segment. The instructions of ranking module 304 may then cause VDOC model 300 to compute the cross-entropy loss for each of the generated responses, and assign a relevance score defined as by (−1)*loss to each of the segments based on the computed loss. The instructions of ranking module 304 may then cause VDOC model 300 to rank the segments based on the assigned relevance score by ascending order.
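The scoring described in paragraph [0050] can be illustrated with a minimal Python sketch. It assumes that the per-token probabilities a language model assigned to each generated response are already available. The function names and sample numbers are hypothetical, and taking the loss as the mean negative log-probability is an assumption rather than something the Specification states.

```python
import math

def cross_entropy_loss(token_probs):
    """Mean negative log-probability of the tokens (assumed loss definition)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def relevance_scores(probs_by_segment):
    """Relevance score per segment, defined as (-1) * loss as in [0050]."""
    return {seg: -cross_entropy_loss(p) for seg, p in probs_by_segment.items()}

# Hypothetical token probabilities for responses generated from three segments.
probs_by_segment = {
    "segment_a": [0.9, 0.8, 0.9],
    "segment_b": [0.2, 0.3, 0.1],
    "segment_c": [0.6, 0.5, 0.7],
}
scores = relevance_scores(probs_by_segment)

# Order segments from highest to lowest score (ordering direction is an assumption).
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because the score is the negated loss, the segment whose response the model found most probable receives the highest score.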

Regarding Claim 6, Applicant argues:

    [Image: media_image36.png]

Response at 17.
In Reply, Applicant’s arguments are not conclusory.
Claim 6 does specify two technical implementations, but does so in the alternative, using the connector “or,” which indicates that the intent is to limit only to one of the two alternatives.
The cited reference, as admitted by the Applicant, teaches a “Multimodal Encoder 670,” which clearly teaches the “said language model is an encoder-based language model, and” of Claim 6.
Next Applicant falls back on a distinction between the training vs inference phases which is not persuasive.  None of the Claim or the Application or the Reference are focused on training/inference phases and their distinctions.  The Claim is simply asking for “wherein said loss is computed based, at least in part, on a similarity of the representation of each of said generated responses to the representation of said input query;” which frankly is just a definition of loss; loss is the difference between what we want and what we get and at least this limitation does not include any “precise technical specifications of how different language model architectures compute the relevance signal” as touted by the Applicant.  If anything, the limitation is an expression of the well-known definition of loss.  Any mention of loss in the reference would have been sufficient for this level of claimed language.
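For illustration only, the two alternatives recited in Claim 6 can be sketched in Python. Cosine similarity is an assumed choice of similarity measure, since the Claim does not name one. Perplexity is taken as the exponential of the mean negative log-probability. All vectors, probabilities, and function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Similarity of two representation vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def inverse_perplexity(token_probs):
    """1 / perplexity, where perplexity = exp(mean negative log-probability)."""
    loss = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return 1.0 / math.exp(loss)

# Hypothetical representations of an input query and two generated responses.
query_vec = [1.0, 0.0, 1.0]
response_vec = [1.0, 0.0, 1.0]
unrelated_vec = [0.0, 1.0, 0.0]

similar = cosine_similarity(query_vec, response_vec)
dissimilar = cosine_similarity(query_vec, unrelated_vec)
```

In this sketch a response whose representation matches the query scores higher under the first alternative, and a response the model finds more probable scores higher under the second.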

Claims 7, 14, and 20
7. The computer-implemented method of claim 1, 
wherein said task comprises one of: 
(i) end-to-end inference based on said query and single said source document; 
(ii) end-to-end training based on said query, a single said source document, and a gold response; or 
(iii) a plurality of instances of in-context learning, each comprising a said query, a said source document, and a gold response, and using only a portion of said context window.
Applicant argues:

    [Image: media_image37.png]


    [Image: media_image38.png]

Response at 17.
In Reply:
First, the Claim is another example of claiming in the alternative where it limits itself to only one of the options and a reference that teaches any of the options teaches it.
Second, while “end-to-end” systems may be “specific machine learning operational paradigms with precise technical meanings in the field,” nevertheless: 1) they are quite common and 2) the Claim is simply asking for an “end-to-end inference based on the query and source document.”  This is the entire extent of the sophistication:  use an end-to-end system that is known in the art to implement the method set forth in Claim 1.  It does not provide any detail or sophistication.
Vouitsis as provided in the rejection is an end-to-end system which is not addressed by the Applicant’s arguments.
Birru was cited because it includes the keyword “end-to-end” and also because this Claim is asking for a “single … source document.”
The remainder of the Applicant’s arguments have no basis in the Claim or even the supporting Specification.  This Application is not about the virtues or the specificities of end-to-end systems.  Rather it uses a known industry method as a black box to implement its invention that lies elsewhere.

Note all occurrences of end-to-end in the Specification and the dearth of detail when such systems are mentioned that indicates they are used as mere black boxes:
[0016] In some embodiments, the task comprises one of: (i) end-to-end inference based on the query and single the source document; (ii) end-to-end training based on the query, a single the source document, and a gold response; or (iii) a plurality of instances of in-context learning, each comprising a query, a source document, and a gold response, and using only a portion of the context window.
End-To-End Evaluation (this section discusses the inputs vs. outputs using the method of the Application.)

Summary of Reply to the Arguments regarding Art:
Applicant’s arguments tend to rely on material not in the Claim and not supported by the Specification.  Any material that is argued must first be claimed with particularity and have support in the Disclosure.  Touting the sophistication of systems that are merely mentioned by name in both the Claim and even the Specification as black boxes for implementation does not help overcome the cited art.  

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Step 1: The independent Claims are directed to statutory categories: 
Claim 1 is a method claim and directed to the process category of patentable subject matter.
Claim 8 is a system claim and directed to the machine or manufacture category of patentable subject matter.
Claim 15 is a computer-readable-storage device claim and is directed to the machine or manufacture category of patentable subject matter.

Step 2A, Prong One: Does the Claim recite a Judicially Recognized Exception? Abstract Idea? Are these Claims nevertheless considered Abstract as a Mathematical Concept (mathematical relationships, mathematical formulas or equations, mathematical calculations), Mental Process (concepts performed in the human mind (including an observation, evaluation, judgment, opinion), or Certain Methods of Organizing Human Activity (1-fundamental economic principles or practices (including hedging, insurance, mitigating risk), 2-commercial or legal interactions (including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; business relations), 3- managing personal behavior or relationships or interactions between people (including social activities, teaching, and following rules or instructions) and fall under the judicial exception to patentable subject matter?)
The rejected Claims’ core steps—dividing a document into segments, scoring segments for relevance, selecting top-k segments, combining them into a virtual document that fits a context window, and feeding that virtual document to an LLM—are directed to organizing, processing, and presenting information for question answering. Federal courts and the USPTO commonly treat “collecting/organizing/manipulating information” and “using a generic computer to perform conventional tasks” as abstract ideas.
Accordingly, the Claim is directed to an abstract idea absent additional technical detail showing a specific improvement to computer functionality or a particular technical implementation.
Step 2A, Prong Two: Additional Elements that Integrate the Judicial Exception into a Practical Application? Identifying whether there are any additional elements recited in the claim beyond the judicial exception(s), and evaluating those additional elements to determine whether they integrate the exception into a practical application of the exception. “Integration into a practical application” requires an additional element(s) or a combination of additional elements in the claim to apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, such that the claim is more than a drafting effort designed to monopolize the exception. Uses the considerations laid out by the Supreme Court and the Federal Circuit to evaluate whether the judicial exception is integrated into a practical application.
The rejected Claims do not include additional limitations that point to integration of the abstract idea into a practical application and are therefore directed to the abstract idea.
To be eligible, the claim must contain additional elements that amount to significantly more than the abstract idea, e.g., a non-conventional and non-generic arrangement of components, a specific technical improvement to computer functioning, or novel data structures/algorithms that solve a technical problem.
As drafted, the claim recites high-level steps and the use of a “specified large language model” without specifying:
o	how segmentation is performed (specific algorithm, data structures, heuristics),
o	how relevance scoring is computed (specific model, embeddings, similarity function, weighting, thresholds),
o	how the virtual document is constructed to preserve context or reduce hallucination (ordering rules, summaries, compression),
o	how these steps produce a technical improvement (reduced latency, memory use, improved accuracy measured by specific metrics), or
o	any non-generic hardware or system architecture or components and how they improve the underlying computer or network. 
Therefore, as drafted, the Claims fail step 2.
1. A computer-implemented method comprising: [As provided below, the additional hardware components in counterpart Claims 8 and 15 are processors, memory and computer readable media which are generic and considered well-understood, routine, and conventional.]
receiving, as input, a query and a source document intended for a content-grounded question-answering or multi-turn conversation task by a specified large language model (LLM) [Student is taking an open book exam and receives the question and the material intended to be used to answer the question.  In the alternative, the student is permitted to take a cheat sheet to the exam that summarizes the key points.] 
which has a context window size limit, wherein said source document has a size which exceeds said context window size limit; [A certain volume, number of pages, of open book material is provided to the student that must not be exceeded.]
dividing said source document into a plurality of segments; [The source material is divided into chapters.]
applying a language model to each of said segments, to assign to each of said segments a relevance score; [The student is permitted to extract the key parts of each chapter.]
selecting the k-top segments having the highest said relevance scores; [Student selects those parts, e.g. formulas, that the student finds most important for answering the exam questions.]
combining said selected k-top segments into a virtual document having a size which complies with said context window size limit; and [Student forms a cheat sheet from the collected important sections.]
feeding said virtual document as input to the specified LLM, to generate a response that is grounded in the content of said virtual document. [Student uses the cheat sheet thus created to answer the exam questions.]
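The sequence of steps recited in Claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: a toy shared-word count stands in for the language-model relevance score, word counts stand in for tokens, and the final step of feeding the result to an LLM is omitted. All function and parameter names are hypothetical.

```python
def divide(document, segment_words):
    """Divide the source document into segments of a fixed number of words."""
    words = document.split()
    return [" ".join(words[i:i + segment_words])
            for i in range(0, len(words), segment_words)]

def toy_score(query, segment):
    """Stand-in for the language-model relevance score: shared-word count."""
    return len(set(query.lower().split()) & set(segment.lower().split()))

def build_virtual_document(query, document, segment_words, k, limit_words):
    """Select the top-k segments and combine them without exceeding the limit."""
    segments = divide(document, segment_words)
    ranked = sorted(segments, key=lambda s: toy_score(query, s), reverse=True)
    selected = []
    used = 0
    for seg in ranked[:k]:
        n = len(seg.split())
        if used + n > limit_words:
            break  # adding this segment would exceed the assumed size limit
        selected.append(seg)
        used += n
    return " ".join(selected)

# Hypothetical 16-word document, divided into four 4-word segments.
document = ("alpha beta gamma delta cat food is tasty "
            "red green blue yellow cat naps are long")
virtual = build_virtual_document("cat food", document,
                                 segment_words=4, k=2, limit_words=8)
```

With a limit of 8 words both selected segments fit; with a smaller limit only the highest-scoring segment is kept.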

Step 2B: Search for Inventive Concept: Additional Elements Do not amount to Significantly More: The limitation of LLM in Claims 1, 8, and 15, and the limitations of processors, memory, and programs, in the counterpart Claims 8 and 15, are well-understood, routine, and conventional machine components that are being used for their well-understood, routine, conventional and rather generic functions. Additionally, these limitations are expressed parenthetically and lack nexus to the Claim language and as such are a separable and divisible mention to a machine. Accordingly, they are not sufficient to cause the Claim as a whole to amount to significantly more than the underlying abstract idea. 

The Dependent Claims do not add limitations that could integrate the abstract idea into a practical technological application or could help the Claim as a whole to amount to significantly more than the Abstract idea identified for the Independent Claim:

2. The computer-implemented method of claim 1, 
wherein said steps of receiving, dividing, applying, selecting, combining, and feeding are iterated two or more times, and [Student takes the test several times a year and every time the student is permitted to collate a cheat sheet for a semi-open book exam.]
wherein, in each current one of said iterations, a current said query comprises said queries and said responses from at least one previous said iteration. [Student decides to use the cheat sheet from the previous exam and add to it.]
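The iteration recited in Claim 2, in which the current query carries the queries and responses of earlier rounds, can be sketched as below. The "User:" and "Agent:" labels, the function name, and the sample dialogue are assumptions made for illustration.

```python
def build_query(history, new_question):
    """Build the current query from earlier (question, response) pairs."""
    lines = []
    for question, response in history:
        lines.append("User: " + question)
        lines.append("Agent: " + response)
    lines.append("User: " + new_question)
    return "\n".join(lines)

history = []

# First iteration: no earlier rounds, so the query is just the question.
q1 = build_query(history, "What is the warranty period?")
history.append(("What is the warranty period?", "Two years."))

# Second iteration: the query now includes the first question and response.
q2 = build_query(history, "Does it cover batteries?")
```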

3. The computer-implemented method of claim 1, 
wherein said dividing is performed using a sliding window operation or using semantic chunking. [Student makes sure that the portions he is taking make sense.]

4. The computer-implemented method of claim 1, 
wherein said relevance score for each of said segments is computed by applying said language model to said segment, with a prompt instructing said language model to generate a query based on said segment. [Student can use ChatGPT to see if the segments he is selecting from a book are good and relevant segments or he can evaluate it by looking at the material that he has selected.]

5. The computer-implemented method of claim 1, wherein said relevance score is a cross-entropy loss for each of said generated responses. [Student can use the mathematical formula of cross-entropy for evaluating relevance.]

6. The computer-implemented method of claim 5, wherein: 
said language model is an encoder-based language model, and [This limitation refers to the language model used for assigning relevance scores, which is a tool used by the Student and is considered a well-understood, routine, and conventional component performing its well-understood, routine, and conventional task.  The Student makes an input and receives an output.]
wherein said loss is computed based, at least in part, on a similarity of the representation of each of said generated responses to the representation of said input query; [This is how loss is computed by encoders or decoders.  It is an expression of generic operation of a generic tool.]
or 
said language model is an encoder-decoder or decoder-only language model, and wherein said loss is computed based, at least in part, on the inverse of perplexity of the representation of each of said generated responses to the representation of the input query.

7. The computer-implemented method of claim 1, 
wherein said task comprises one of: 
(i) end-to-end inference based on said query and single said source document; [A person is, in general, an end-to-end system because a person does not dissect the steps that his brain takes.]
(ii) end-to-end training based on said query, a single said source document, and a gold response; or 
(iii) a plurality of instances of in-context learning, each comprising a said query, a said source document, and a gold response, and using only a portion of said context window.


With respect to Independent Claim 8 and independent Claim 15, which have limitations similar to the limitations of Claim 1, the limitations of “processor,” “memory,” or “computer storage medium” are expressed parenthetically and lack nexus to the Claim language and as such are a separable and divisible mention to a machine.  Accordingly, they do not include additional limitations that can 1) integrate the Abstract Idea into a practical application or 2) cause the Claim as a whole to amount to more than the underlying abstract idea. 
The remaining dependent Claims are parallel to Claims 2-7 and are rejected under similar rationale.  
Claim 9 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 10 is a system claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 11 is a system claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.
Claim 12 is a system claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.
Claim 13 is a system claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.
Claim 14 is a system claim with limitations corresponding to the limitations of Claim 7 and is rejected under similar rationale.
Claim 16 is a computer program product system claim with limitations corresponding to the limitations of method Claim 2 and is rejected under similar rationale.
Claim 17 is a computer program product system claim with limitations corresponding to the limitations of method Claim 3 and is rejected under similar rationale.
Claim 18 is a computer program product system claim with limitations corresponding to the limitations of method Claim 4 or 5 and is rejected under similar rationale.
Claim 19 is a computer program product system claim with limitations corresponding to the limitations of method Claim 6 and is rejected under similar rationale.
Claim 20 is a computer program product system claim with limitations corresponding to the limitations of method Claim 7 and is rejected under similar rationale.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3-4, 8, 10-11, 15, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Vouitsis (US 20250335492) in view of Vinodkumar (U.S. 20250291768).
Regarding Claim 1, Vouitsis teaches: 
1. A computer-implemented method comprising: [Vouitsis, Figure 2 shows the hardware including “processor 210,” “memory 220” and other “CPUS 212 and GPUs 214.”]
receiving, as input, a query and a source document intended for a content-grounded question-answering or multi-turn conversation task by a specified large language model (LLM) [Vouitsis, Figure 1B, the “pipeline 140” is receiving a “query 138” from one side and “documents 162” from a “document repository 160.”  The “query 138” receives a “response 139” generated by the “generator LLM 150” of the “pipeline 140.”] [Figure 6, 602. “[0094] Block 604: The computing system generates an expanded prompt using the one or more retrieved documents (e.g., from block 512) and the query.”] 
which has a context window size limit, wherein said source document has a size which exceeds said context window size limit; [Vouitsis teaches fixed-sized chunking which implies a window size limit and documents that exceed that limit: “[0045] In some cases, the chunking module 142 obtains multiple documents 162 through a document loader, which may be part of or in addition to a data ingestor 132. In some cases, for each given document, the chunking module 142 segments the text in the given document into portions of text. In some cases, semantic chunking is used to segment the text. In some other cases, document-based chunking is used to segment the text, which identifies and uses a structure of a document. Other examples of chunking computations include recursive chunking and fixed-sized chunking. Other currently known and future known chunking computations can be used by the chunking module 142.”]
dividing said source document into a plurality of segments; [Vouitsis, Figure 1B, the “Document 162” are fed to “Chunking Module 142.” “[0045] In some cases, the chunking module 142 obtains multiple documents 162 through a document loader, which may be part of or in addition to a data ingestor 132. In some cases, for each given document, the chunking module 142 segments the text in the given document into portions of text….”  “[0052] In some cases, text data (e.g., in the form of documents or other files) are obtained via the data ingestor 132 and are transmitted into the chunking module 142 or the document repository 160, or both. The chunking module 142 generates chunks 164 from the multiple documents, which are processed by the embedding LLM 144 to generate embeddings that are stored in the vector database 166, or a graph database, or both….”] [Figure 4, 410, Figure 5, 502.]
applying a language model to each of said segments, to assign to each of said segments a relevance score; [Vouitsis, Figure 1B, “Scoring Model 147” which is being fed by the “Embedding Vector Database 166” that is formed from embeddings of “passages/chunks 164.”  “[0048] … In some cases, the labelling module 146 includes a scoring module 147 that scores the relevance of text in a chunk with respect to a given query.”  See also Figure 3: “[0065] Block 320: The computing system computes a score for each chunk in relation to a given query.”] [Figure 4, 420, Figure 5, 504.]
selecting the k-top segments having the highest said relevance scores; [Vouitsis, Figure 1B, the “retriever module 148” orders the chunks p in the order of relevance score f(q,p) to the query q from high to low  such that a k-top is taught by n-top.  See Figure 3, 330: “Reorder Scores from Highest to Lowest” and 331 which shows the ordering of the chunks D1-3, D1-2 … according to their relevance scores 0.9, 0.8, etc such that the n highest scoring chunks can be selected.  “[0072] In an example illustration of obtaining the sum 341, the n number of highest scores 346 is summed until a threshold is reached….”    “[0053] In some cases, the computations by the pipeline 140 includes obtaining k chunks in a search index, such as the vector database 166, and executing a computation f(q,p) that returns a search score between a query q and a chunk p. In some cases, the computation f(q,p) is executed by the retriever module 148. It is assumed also that for each query q there is a required set of documents D. The expression π(q) is the permutation of {1, . . . ,k} that sorts results of f(q,p) from a highest search score to a lowest search score for each of the k chunks p in the search index….”] [Figure 4, 430, Figure 5, 506.]
combining said selected k-top segments into a virtual document having a size which complies with said context window size limit; and [Vouitsis, Figure 3, 340:  “Starting with the highest score and working down the reordered list, sum the scores until a desired threshold is reached (or exceeded) which produces a subset of chunks.”   See 331 and 346 and how the top 4 chunks are obtained while the condition in 343 is satisfied. “[0080] Block 450: The computing system identifies the subset of chunks as relevant to the given query.” What is shown in Figure 3, 346 and Figure 4, 450 teaches the “virtual document” of the Claim.]  [Figure 4, 450, Figure 5, 508, 510, 512.]
feeding said virtual document as input to the specified LLM, to generate a response that is grounded in the content of said virtual document. [Vouitsis, Figure 1B, the “generator LLM 150” that generates the “response 139” is operating on the output of the “retriever module 148” that combines the k-top/n-top most relevant. ] [Figure 6, 604, 606, 608.  “[0095] Block 606: The computing system inputs the expanded prompt into a generator LLM, which outputs a response.”]
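The selection cited from Vouitsis, Figure 3, blocks 330 and 340 (reorder the scores from highest to lowest, then sum them until a threshold is reached or exceeded) can be sketched as follows. The chunk labels D1-3 and D1-2 and the scores 0.9 and 0.8 follow the cited figure. The remaining chunks, their scores, the threshold value, and the function name are hypothetical.

```python
def select_by_score_sum(scored_chunks, threshold):
    """Return the highest-scoring chunks whose summed scores reach the threshold."""
    # Reorder from highest to lowest score.
    ordered = sorted(scored_chunks, key=lambda item: item[1], reverse=True)
    subset = []
    total = 0.0
    # Work down the reordered list, summing scores until the threshold is met.
    for chunk_id, score in ordered:
        subset.append(chunk_id)
        total += score
        if total >= threshold:
            break
    return subset

scored_chunks = [("D1-2", 0.8), ("D2-1", 0.3), ("D1-3", 0.9), ("D3-1", 0.5)]
subset = select_by_score_sum(scored_chunks, threshold=2.0)
```

In this sketch the number of selected chunks is not fixed in advance; it depends on how quickly the summed scores reach the threshold.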
While the “fixed-size chunking” of Vouitsis teaches a limited window size and documents that need to be chunked because they are larger than this window size, an express reference is added. 
Vinodkumar is elaborate with respect to chunking and teaches:
which has a context window size limit, wherein said source document has a size which exceeds said context window size limit; [Vinodkumar, Figure 3, Chunking with hierarchy 340 and Figure 4, chunking 410. “[0064] At block 340, chunking is initiated with the hierarchy to create data object chunks that are smaller than the original data objects. For example, a process may separate the data/document object into chunks of data that are smaller file sizes than the original objects (e.g., object 1/chunk 1, object 1/chunk 2, object 2/chunk 1, object 2/chunk 2, etc.). The smaller pieces/chunks may comprise the hierarchy data dictionary embedded as metadata with each chunk.”  “[0065] In some examples, the chunks can correspond with a particular size. The size of each chunk may be determined based on an application's file size requirements, available memory, network bandwidth, or other considerations. In some examples, the destination/device that receives the vector embedding at the end of the process may restrict data to a particular token size limit. In instances where the log files are larger than the token size limit, the chunking process may help reduce the size of the data in order to transmit the context vectors to the large language models. Although two chunks are illustrated for the container provided in example 300, more than two or less than two chunks may be implemented without diverting from the essence of the disclosure.”]
Vouitsis and Vinodkumar pertain to query/response systems with the use of LLMs, and it would have been obvious to buttress the teachings of Vouitsis, which mention fixed-size chunking, with the elaborate discussion of Vinodkumar regarding chunking into chunks of a particular size in order to accommodate the context vector size of the recipient LLM, for completeness.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 3, Vouitsis teaches: 
3. The computer-implemented method of claim 1, 
wherein said dividing is performed using a sliding window operation or using semantic chunking. [Vouitsis: “[0045] In some cases, the chunking module 142 obtains multiple documents 162 through a document loader, which may be part of or in addition to a data ingestor 132. In some cases, for each given document, the chunking module 142 segments the text in the given document into portions of text. In some cases, semantic chunking is used to segment the text. In some other cases, document-based chunking is used to segment the text, which identifies and uses a structure of a document. Other examples of chunking computations include recursive chunking and fixed-sized chunking. Other currently known and future known chunking computations can be used by the chunking module 142.”]
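As an illustration of the sliding window operation named in Claim 3, the following minimal Python sketch divides a token list into overlapping windows. The window and stride values and the function name are hypothetical, and neither the Claim nor the cited paragraph prescribes this particular procedure.

```python
def sliding_window_chunks(tokens, window, stride):
    """Divide tokens into windows of at most `window` items, `stride` apart."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the current window already reaches the end of the input
        start += stride
    return chunks

tokens = "a b c d e f g".split()

# Overlapping windows: stride smaller than window.
overlapping = sliding_window_chunks(tokens, window=4, stride=2)

# Stride equal to window gives non-overlapping, fixed-size chunks.
fixed = sliding_window_chunks(tokens, window=3, stride=3)
```

Setting the stride equal to the window size reduces the sketch to the fixed-sized chunking that the cited paragraph also mentions.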

Regarding Claim 4, Vouitsis teaches:
4. The computer-implemented method of claim 1, 
wherein said relevance score for each of said segments is computed by applying said language model to said segment, with a prompt instructing said language model to generate a query based on said segment. [Vouitsis: All of the processes of the “pipeline 140” which include the “scoring model 147” are triggered by the input of the “query 138” and responsive to that query.  Figure 1B:  “[0052] … The vector representation of the query is also herein referred to as “query”. In some cases, the vector representation of the query is used for computations in the pipeline 140. The labelling module 146 identifies and labels one or more documents that are considered relevant to the query….In some cases, an expanded prompt is generated using the relevant information outputted by the retriever module 148 and the query. In some cases, the expanded prompt is a vector representation. The expanded prompt is inputted into the generator LLM 150, and the generator LLM 150 outputs a response 139….”   “[0053] In some cases, the computations by the pipeline 140 includes obtaining k chunks in a search index, such as the vector database 166, and executing a computation f(q,p) that returns a search score between a query q and a chunk p….”  Figure 6, 602.  The expanded prompt of Vouitsis, which includes the query and the retrieved documents, is an indirect instruction to generate the relevance scores.   “[0094] Block 604: The computing system generates an expanded prompt using the one or more retrieved documents (e.g., from block 512) and the query.”]

Claim 8 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.
Claim 10 is a system claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 11 is a system claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.

Claim 15 is a computer program product system claim with limitations corresponding to the limitations of method Claim 1 and is rejected under similar rationale.
Claim 17 is a computer program product system claim with limitations corresponding to the limitations of method Claim 3 and is rejected under similar rationale.
Claim 18 is a computer program product system claim with limitations corresponding to the limitations of method Claim 4 or 5 and is rejected under similar rationale.

Claims 2, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Vouitsis and Vinodkumar in view of Lo (U.S. 12061970).
Regarding Claim 2, Vouitsis teaches:
2. The computer-implemented method of claim 1, 
wherein said steps of receiving, dividing, applying, selecting, combining, and feeding are iterated two or more times, and  [Vouitsis places no limit on the number of times its system is used and, generally, query/response systems are used multiple times.]
wherein, in each current one of said iterations, a current said query comprises said queries and said responses from at least one previous said iteration.
Vouitsis and Vinodkumar do not teach that the expanded query includes query/response pair of the previous iterations.
Lo teaches:
wherein, in each current one of said iterations, a current said query comprises said queries and said responses from at least one previous said iteration. [ Lo teaches using the previous rounds of query/response as context for the new query.  Figure 2, “context retrieval component 204” and “continuity metric 206” as parts of the “context engine 112” which provides input to the “model orchestration LLM 114” which generates the “response 102” to the “natural language request 142.”  Lo uses a sophisticated method of balancing continuity with attention to determine how much of the previous conversation to use as context.   “… The natural language request and the context attributes are input into the model orchestration large language model (LLM) to output instructions to machine learning (ML) agents based on the context attributes. The ML agents output responses associated with the user-provided data record query based on the instructions, and the responses are input into the model orchestration LLM to output to the user computing device a natural language response based on the context attributes.”  Abstract.  “In some embodiments, the context engine 112 may dynamically control the context kept in the runtime. Thus, in some embodiments, the context engine 112 may systematically reduce the context input into the model orchestration LLM 114 while preserving the meaning of context. In some embodiments, the model orchestration LLM 114 may have a limited context size, including both hard limits (there is a hard ceiling), and also semantic cognitive limits (e.g., the more in the context, the less attention is direct to the context). In some embodiments, to optimize the resource use and attention of the model orchestration LLM 114, the context engine 112 may dynamically reduce context size to balance continuity for the user with attention in the model orchestration LLM 114. 
To do so, the context engine 112 may review the conversation occurring via the GUI and determine how much conversation history is required to maintain the conversation. The context engine 112 performs this review dynamically and in conjunction with the injection of context data described above to enable seamless conversation across hundreds of messages or more.”  10:48-67.  “In some embodiments, the balancing is configured to optimize based on one or more parameters, such as, e.g., recency (memory depth), stated importance, model capability, cost, attribute(s) of the conversation (e.g., industry, sector, investment grade versus high yield, or other attributes of the financial investment related conversations or any combination thereof) among others or any combination thereof. In some embodiments, recency refers to how recent the data is to be kept in the runtime. For example if the user asked for high yield bonds, and then next question ‘just above 5% yield’, it should know high yield bonds just above 5%. But if the user asked high yield bonds last week, it should not maintain that query in memory. …” 11:1-25.  “In some embodiments, the context engine 112 may also implement a continuity metric 206 and an attention metric 208 to optimize continuity and attention of the model orchestration LLM 114 based on optimization parameters 210. In some embodiments, the optimization parameters 210 may include, e.g., recency, stated importance, model capability, resource user, among others or any combination thereof. Based on the optimization parameters 210 the context engine 112 may generate a continuity metric 206 balanced with an attention metric 208 to determine an amount of data to maintain in the runtime of the model orchestration LLM 114.”  20: 1-12 and see the examples provided at 20:13-50 of when the previous conversation is kept and when it is discarded.]
Vouitsis/Vinodkumar and Lo pertain to document-based query/response systems, and it would have been obvious to use the continuity measure of Lo, which evaluates how many of the previous turns of query/response to include as context, with the system of the combination to provide more context for the search.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
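The limitation that a current query "comprises said queries and said responses from at least one previous said iteration" can be sketched as follows. This is an illustrative assumption, not Lo's method: the fixed `max_turns` recency cap is a simplification standing in for Lo's continuity and attention metrics, and `build_query` and the "User:"/"Assistant:" labels are hypothetical.

```python
from typing import List, Tuple


def build_query(history: List[Tuple[str, str]], new_query: str, max_turns: int = 2) -> str:
    """Prepend the most recent (query, response) turns to the new query."""
    # Guard max_turns == 0 explicitly: history[-0:] would return the whole list.
    recent = history[-max_turns:] if max_turns > 0 else []
    lines: List[str] = []
    for q, r in recent:
        lines.append(f"User: {q}")
        lines.append(f"Assistant: {r}")
    lines.append(f"User: {new_query}")
    return "\n".join(lines)
```

With the example from Lo at 11:1-25, a recent "high yield bonds" turn would be carried into the follow-up "just above 5% yield", while a turn older than the cap would be dropped.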

Claim 9 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 16 is a computer program product system claim with limitations corresponding to the limitations of method Claim 2 and is rejected under similar rationale.

Claims 5-6, 12-13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Vouitsis and Vinodkumar in view of Shankdhar (U.S. 20240394942).
Regarding Claim 5, Vouitsis uses f(q, p): “[0053] In some cases, the computations by the pipeline 140 includes obtaining k chunks in a search index, such as the vector database 166, and executing a computation f(q,p) that returns a search score between a query q and a chunk p. In some cases, the computation f(q,p) is executed by the retriever module 148….”   Vouitsis uses f(q, p) in its other calculations but does not include a formula for how f (q,p) is obtained.  
Neither does Vinodkumar.
Shankdhar teaches:
5. The computer-implemented method of claim 1, wherein said relevance score is a cross-entropy loss for each of said generated responses. [Shankdhar teaches that relevance score and cross-entropy loss are related: “[0058] In some embodiments, the document expansion system 106 optimizes/trains the model parameters using backpropagation and stochastic gradient descent (SGD) or other variant. During training, the document expansion system 106 presents the model with a batch of queries and documents and calculates a loss function (e.g., cross-entropy loss or mean squared error) based on the predicted relevance scores and the true relevance labels….”]
Vouitsis/Vinodkumar and Shankdhar pertain to document-based query/response systems, and the relationship between cross-entropy loss and relevance score is apparently quite common in such references. It would have been obvious to use this method for obtaining the relevance score.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
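The loss described in Shankdhar [0058], computed from "predicted relevance scores and the true relevance labels", can be illustrated with a binary cross-entropy. The sketch below is an assumption about form only: Shankdhar does not specify binary labels or this exact formula, and `binary_cross_entropy` and `eps` are hypothetical names.

```python
import math
from typing import Sequence


def binary_cross_entropy(predicted: Sequence[float], labels: Sequence[int], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between predicted relevance probabilities and 0/1 labels."""
    if len(predicted) != len(labels) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for p, y in zip(predicted, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(predicted)
```

A lower loss means the predicted scores agree more closely with the labels, which is the sense in which a loss value can be read as a relevance signal.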

Regarding Claim 6, Vouitsis teaches “[0046] In some cases, the embedding LLM 144 encodes the chunks into embeddings (also called vectors) and stores and indexes the embeddings into a vector database 166….”  But it does not discuss the remaining step pertaining to the loss which is well-known in classification.  
Shankdhar teaches:
6. The computer-implemented method of claim 5, wherein: 
said language model is an encoder-based language model, and [Shankdhar, Figure 6B, “Multimodal Encoder 670” is an encoder-based ML.]
wherein said loss is computed based, at least in part, on a similarity of the representation of each of said generated responses to the representation of said input query; [Shankdhar, “[0058] In some embodiments, the document expansion system 106 optimizes/trains the model parameters using backpropagation and stochastic gradient descent (SGD) or other variant. During training, the document expansion system 106 presents the model with a batch of queries and documents and calculates a loss function (e.g., cross-entropy loss or mean squared error) based on the predicted relevance scores and the true relevance labels. The document expansion system 106 determines gradients of the loss function with respect to the model parameters for use in updating the parameters….”]
or 
said language model is an encoder-decoder or decoder-only language model, and wherein said loss is computed based, at least in part, on the inverse of perplexity of the representation of each of said generated responses to the representation of the input query.
Vouitsis/Vinodkumar and Shankdhar pertain to document-based query/response systems, and it would have been obvious to use an encoder-only model with a loss function as defined by Shankdhar to implement the ML model of the combination.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
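The two alternatives recited in Claim 6 rest on two quantities: a similarity between representations (encoder-based branch) and the inverse of perplexity (encoder-decoder or decoder-only branch). The sketch below shows textbook forms of both; it is an assumption for illustration and is not drawn from the application or from Shankdhar. The function names are hypothetical, and natural-log token probabilities are assumed.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length representation vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def inverse_perplexity(token_logprobs: Sequence[float]) -> float:
    """exp(mean log-probability), which equals 1 / perplexity."""
    if not token_logprobs:
        raise ValueError("at least one token log-probability is required")
    return math.exp(sum(token_logprobs) / len(token_logprobs))
```

Both values grow as the response and query agree more closely, so either could serve as the basis for a relevance-style score.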

Claim 12 is a system claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.
Claim 13 is a system claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.
Claim 19 is a computer program product system claim with limitations corresponding to the limitations of method Claim 6 and is rejected under similar rationale.

Claims 7, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Vouitsis and Vinodkumar in view of Birru (U.S. 20250200290).
Regarding Claim 7, Vouitsis teaches:
7. The computer-implemented method of claim 1, 
wherein said task comprises one of: 
(i) end-to-end inference based on said query and single said source document; [Vouitsis uses a pretrained model to arrive at an inference in an end-to-end fashion, and the relevant document that is used to generate the “expanded prompt” of Figure 6, 604, may be a single document: “[0051] … The generator LLM 150 is configured to synthesize the retrieved information (e.g., provided by the retriever module 148) with its pre-trained configuration to generate a contextually relevant response.”  “[0049] In some cases, the retriever module 148 is configured to retrieve the one or more documents labelled as relevant to the given query. In some cases, the retriever module 148 ignores other documents in the set of available documents (e.g., the document set). In this way, the retriever module 148 does not need to process documents that are considered superfluous, which could cause hallucinations….”]
(ii) end-to-end training based on said query, a single said source document, and a gold response; or 
(iii) a plurality of instances of in-context learning, each comprising a said query, a said source document, and a gold response, and using only a portion of said context window.
The references do not include the phrase end-to-end expressly.
Birru teaches:
wherein said task comprises one of: 
(i) end-to-end inference based on said query and single said source document; [Birru, “[0033] At step 116 the LLM is prompted, using the constructed optimal prompt for generating a response as a recommendation of the identified predefined number of similar assets as the suitable assets for solving the user query. In this regard, the solution to the enterprise problem is provided in form of the recommendation asking the user to make use of the predefined number of similar assets that are identified as the suitable assets for the enterprise problem in the user query. The method is able to achieve successful utilization of the constructed optimal prompt and provide accurate recommendations for the enterprise problem in the user query. For example, the response that is generated for the user query “recognize a given action in a given sport” is “The proprietary Object Detection Training asset, the Object Detection Model Deployment and the proprietary Object Detection Inference asset can be used to build a solution for player recognition in sports. These assets provide the necessary tools for training an object detection model on the proprietary DGX platform using proprietary TLT, and for running an end-to-end object detection inference using proprietary Deepstream on the DGX platform. Additionally, the Dockerfile included in these modules contains all the necessary dependencies for launching a container to run the model training end-to-end”.”]
Vouitsis/Vinodkumar and Birru pertain to query/response systems, and it would have been obvious to use the express mention of end-to-end inference from Birru with the system of the combination, which appears to be an end-to-end inference system at any rate.  This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claim 14 is a system claim with limitations corresponding to the limitations of Claim 7 and is rejected under similar rationale.
Claim 20 is a computer program product system claim with limitations corresponding to the limitations of method Claim 7 and is rejected under similar rationale.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Eshghi (U.S. 7,797,323) and K. Eshghi et al., "A Framework for Analyzing and Improving Content-Based Chunking Algorithms," pp. 1-10 (2005).
Eshghi, Figure 2, “receive a file 202” and “divide window into chunks 204.”  “… The term "chunk" refers to a segment of the file, where the chunk is produced by chunking the file based on the content of the file (which is contrasted with segmenting the file according to a size of the file, or segmenting the file into segments of a target size). Because chunks of a file are identified based on their content, the chunks of the file can have varying sizes.”  1:35-48.  Eshghi uses a sliding window method across each chunk for fingerprinting the chunk by a hash algorithm.  Figure 2, 210 and 212.
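Content-based chunking of the kind Eshghi describes, where boundaries depend on the data rather than on a target size, can be pictured with the toy sketch below. It is not Eshghi's algorithm: the byte-sum fingerprint, the `window` and `divisor` parameters, and the name `content_defined_chunks` are assumptions chosen for brevity.

```python
from typing import List


def content_defined_chunks(data: bytes, window: int = 4, divisor: int = 8) -> List[bytes]:
    """Cut after any position where the fingerprint of the last `window` bytes is 0 mod `divisor`."""
    chunks: List[bytes] = []
    start = 0
    for i in range(window, len(data) + 1):
        fingerprint = sum(data[i - window:i])  # toy hash: byte sum, for illustration only
        if fingerprint % divisor == 0 and i > start:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks
```

Because each boundary is decided by the bytes inside the window, the resulting chunks have varying sizes, matching the quoted passage at 1:35-48.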

Yuan (U.S. 20250278564):
“[0073] In this regard, the semantic integrity-driven sliding window is different from a traditional length-based sliding window. For example, in FIG. 4, the sentence “Semantic integrity-driven sliding window' is different from the traditional length-based sliding window, which is composed of a lightweight pointer neural network,” illustrated at 402, is split, at 404, by a traditional sliding window technique into text chunks 406 and 408, and at 410, by a semantic-integrity driven sliding window into text chunks 412 and 414. It is evident from FIG. 4 that the combination of text chunks 412 and 414 has greater semantic integrity than the combination of text chunks 406 and 408. For example, text chunk 406 appears to be truncated after the word “composed,” which causes text chunk 408 to have less meaning by itself. Additional aspects of the lightweight pointer neural network are described in greater detail with reference to FIG. 5.”

Papangelis (U.S. 12293758) 
“The response generation component 170 may generate the output data 175 using a ML model. For example, the ML model may take as input the dialog context data 135 and the opinion-based information data 165, and may generate the output data 175 therefrom. …. In some embodiments, the ML model may be a decoder-only model (e.g., GPT-2) or an encoder-decoder model (e.g., a BART-based model).”  15:11-31. 
“In some embodiments, the opinion component 147 may implement a machine learning (ML) model configured to determine whether generation of a response, to a user input, requires use of opinion-based information. For example, the ML model may be configured to take as input the dialog context data 135 and output an indication of whether generation of a response to the present user input, represented in the input data 127, requires use of opinion-based information. In some embodiments, the indication may be a score. For example, an output of less than 0.5 may indicates that opinion-based information is not required, whereas an output of 0.5 or higher may indicate opinion-based information is required. The ML model may be trained using supervised learning methods. For example, during training, the ML model may receive a training input pair including natural language data (e.g., text or tokens) and an indication of whether generation of a response, to the natural language data, requires use of opinion-based information, and the ML model may be tasked with properly predicting whether generation of a response, to natural language data, requires uses of opinion-based information. Based on whether the ML model's prediction matches the indication included in the training input pair, the ML model may be configured (or reconfigured) accordingly (e.g., based on a cross-entropy loss). In some embodiments, the ML model may take as input an encoded representation of the dialog context.” 7:44-8:2.

Bhowal (U.S. 20200327197):
Figures 1 and 4, the “response generation component 128” is a “machine learning component” that digests “documents” and generates sets of questions and answers from the document and stores them in the  “data storage device 124.”  “[0042] The response generation component autonomously generates the set of generated utterances. A human user is not required to manually generate utterances and responses for the chatbot. Instead, the user simply inputs a policy document containing the relevant information into the system 100 and the response generation component automatically analyzes the document for facts associated with various intents and entities. The response generation component applies machine learning (pattern recognition) to anticipate what questions may be asked by users based on information/facts in the policy document. The response generation component generates all possible questions (utterances) which may be asked by users and maps those questions to answers extracted from the document. The system converts this information into the set of generated utterances and answers for use by the chatbot system 100.”  “[0056] … A policy document in some examples is a document in a PDF format or a word document.”
“[0059] In other examples, the utterance generator breaks a document down into sub-documents by the process of moving cumulative sentences of sizes up to 5. Keywords are obtained from these sub-documents via intent identification, in which each sub document is treated as an utterance. Keyword extraction is performed on each sub-document to obtain one or more keywords. Also, sub-documents are trained with different word-embedding techniques.”
“[0070] In other non-limiting examples, similarity scores for each document or sub-document associated with an utterance is also computed. A response is created based on the aggregation of all the above outputs, coupled with intelligent summarization to make the response short and crisp.”

Clinchant (U.S. 20100082615)
“[0003] Many information retrieval systems are text-based. That is, the information retrieval system receives a textual query and searches textual content of documents for similarities with the textual query, such as the same or similar words or terms, common semantic content (based, for example, on derivation of semantically related words determined using an on-line thesaurus), or so forth. In a more complex approach, language models may be developed to represent the query and documents to be searched, and the information retrieval is based on similarity of query and document language models.”

Ko (U.S. 20250335445):
[0040] In other embodiments, the relevant data string may be found through Language Model for Information Retrieval (LMIR), but the present disclosure is not limited thereto. Language Model for Information Retrieval (LMIR) may use language models to calculate relevance scores between documents and queries. The most common formula may be to use KL (Kullback-Leibler) divergence to measure the similarity between documents and queries. Here is the KL divergence formula used in LMIR:

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Fariba Sirjani/
Primary Examiner, Art Unit 2659
Prosecution Timeline

Mar 17, 2026: Response Filed
Apr 22, 2026: Final Rejection mailed (§101, §103)
Apr 24, 2026: Interview Requested
Apr 30, 2026: Examiner Interview Summary
Apr 30, 2026: Applicant Interview (Telephonic)
May 08, 2026: Response after Non-Final Action
May 20, 2026: Request for Continued Examination
May 22, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

18/390,830 (Patent 12640143): UTILIZING GENERATIVE MODEL IN GENERATING SUMMARY OF LONG-FORM CONTENT. 2y 5m to grant; granted May 26, 2026.
18/788,501 (Patent 12640159): VOICE SIGNAL PROCESSING DEVICE, VOICE SIGNAL PROCESSING METHOD, AND NON-TRANSITORY COMPUTER READABLE RECORDING MEDIUM STORING VOICE SIGNAL PROCESSING PROGRAM. 1y 10m to grant; granted May 26, 2026.
18/457,121 (Patent 12614558): Method and Apparatus for Detecting Correctness of Pitch Period. 2y 8m to grant; granted Apr 28, 2026.
18/406,418 (Patent 12605109): APPARATUS AND METHOD FOR DETERMINING BRAIN LANGUAGE AREA INVASION BASED ON SPEECH DATA. 2y 3m to grant; granted Apr 21, 2026.
18/454,031 (Patent 12603099): SELF-ADJUSTING ASSISTANT LLMS ENABLING ROBUST INTERACTION WITH BUSINESS LLMS. 2y 7m to grant; granted Apr 14, 2026.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 76%
With Interview (+31.5%): 99%
Median Time to Grant: 2y 9m (~8m remaining)
PTA Risk: Moderate
Based on 554 resolved cases by this examiner. Grant probability derived from career allowance rate.