Prosecution Insights
Last updated: April 19, 2026
Application No. 18/427,014

SEMANTIC SEARCHING OF STRUCTURED DATA USING GENERATED QUERY SPACES

Final Rejection (§101, §103)
Filed: Jan 30, 2024
Examiner: VOGT, JACOB BUI
Art Unit: 2653
Tech Center: 2600 — Communications
Assignee: Salesforce Inc.
OA Round: 2 (Final)
Grant Probability: 57% (Moderate)
OA Rounds: 3-4
To Grant: 2y 10m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 57% (grants 4 of 7 resolved cases; -4.9% vs TC avg)
Interview Lift: +100.0% (strong; resolved cases with interview vs. without)
Avg Prosecution: 2y 10m (33 applications currently pending)
Total Applications: 40 (across all art units)

Statute-Specific Performance

§101: 35.1% (-4.9% vs TC avg)
§103: 43.8% (+3.8% vs TC avg)
§102: 8.7% (-31.3% vs TC avg)
§112: 10.6% (-29.4% vs TC avg)
Tech Center averages are estimates. Based on career data from 7 resolved cases.
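As a rough illustration of how the headline figures above relate, the career allow rate can be recomputed from the granted/resolved counts shown on this page. The function names below are illustrative, not the analytics tool's API, and the interview-lift helper only defines the metric (relative change in allowance rate); the underlying with/without-interview rates are not given above, so none are assumed here.

```python
# Hypothetical sketch of the dashboard's headline statistics.
# Counts (4 granted / 7 resolved) are taken from the page above.

def allow_rate(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage."""
    return 100.0 * granted / resolved

def interview_lift(rate_with: float, rate_without: float) -> float:
    """Relative change in allowance rate when an interview is held, in percent."""
    return 100.0 * (rate_with - rate_without) / rate_without

rate = allow_rate(4, 7)   # 4 granted of 7 resolved
print(f"{rate:.0f}%")     # prints "57%"
```

A "+100% interview lift" under this definition means the allowance rate doubles for cases where an interview was held.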

Office Action

§101, §103
DETAILED ACTION

This communication is in response to the Amendments and Arguments filed on 12/17/2025. Claims 1-8, 10-17, 19, and 20 are pending and have been examined. Hence, this action has been made FINAL.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

The reply filed on 12/17/2025 has been entered. Applicant's arguments with respect to claims 1-8, 10-17, 19, and 20 have been considered but are not persuasive, or are moot in view of the new ground(s) of rejection caused by the amendments.

With respect to the applicant's arguments against the claim rejections under 35 U.S.C. § 101, Applicant has amended each of the independent claims and asserts that "the human mind is not equipped for 'training a large language model,' let alone using that large language model for 'generating ... a plurality of candidate natural language queries ... in accordance with the training of the large language model,' as recited in amended independent claim 1. Specifically, the human mind cannot practically process the volume and complexity of data typically associated with AI model training and execution across multiple entity datasets. For example, 'the human mind is not equipped to detect suspicious activity by using network monitors and analyzing network packets.' See SRI Int'l, Inc. v. Cisco Systems, Inc., 930 F.3d 1295, 1304 (Fed. Cir. 2019)." The examiner respectfully disagrees with these assertions.
While training a large language model may include steps and processes that cannot be practically performed entirely within a human mind, the amended claim language fails to claim these specific steps and processes, instead only generally claiming the "training [of] a large language model using one or more serialized sets of data." The additional element of a "large language model" in the amended claim language is recited at a high level of generality (¶ [0035]) and merely equates to "apply it," or otherwise merely uses a generic computer as a tool to perform an abstract idea, which is not indicative of integration into a practical application as per MPEP 2106.05(f). Further details may be found below with respect to claim rejections under 35 USC § 101.

Applicant further asserts that "the human mind is not equipped for 'automatically converting a structured set of metadata associated with a structured data object to a serialized set of metadata,' at least because the human mind cannot convert data from a 'structured' format to a 'serialized' format automatically, as recited in amended independent claim 1." The examiner respectfully disagrees with these assertions. The term "automatically" does not in and of itself tie the claim limitation to a computing environment. In fact, a human can "automatically convert" structured metadata into serialized metadata mentally by simply looking at the structured metadata and interpreting it mentally as a single long sentence.

Applicant further asserts that utilizing metadata "rather than an entire document itself" reduces "memory and power consumption," and that generating candidate natural language queries allows users to "engage with a search or discovery process in a natural way without having to rely on specific and accurate keywords in a query." The examiner respectfully disagrees with these assertions.
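The examiner's "single long sentence" reading of serialization can be illustrated with a minimal sketch. The dict-based record and the "column: value" join format below are assumptions for illustration only; they are not drawn from the claims or the record.

```python
# Minimal sketch of converting a structured set of metadata (a dict of
# column -> value pairs) into a serialized string, in the spirit of the
# "single long sentence" reading above. The example record and the
# "column: value" join format are illustrative assumptions.

def serialize_metadata(structured: dict) -> str:
    # Concatenate every metadata column with its value, separated by ", ".
    return ", ".join(f"{column}: {value}" for column, value in structured.items())

record = {"title": "Q3 sales report", "owner": "jdoe", "type": "spreadsheet"}
print(serialize_metadata(record))
# → "title: Q3 sales report, owner: jdoe, type: spreadsheet"
```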
While the use of metadata and query generation are noted, these traits do not bring inherent benefit to the field of large language models as a whole. First, utilizing metadata instead of an entire document as input to an LLM cannot provide inherent benefit to the field of LLMs because there are obvious drawbacks that could come with the disclosed method. For example, there is additional overhead in ensuring that the databases that store structured documents also support compatible metadata formats for each of the structured documents. Further, the resources required to convert structured metadata to serialized metadata before querying the LLM could potentially offset the claimed benefit of reducing "memory and power consumption." As such, simply utilizing metadata instead of entire documents cannot be considered an inherent benefit of the invention of the instant application.

Second, generating candidate natural language queries cannot provide inherent benefit to the field of LLMs as claimed because the technology of LLMs already provides the claimed benefit. For example, LLMs allow users to perform an information retrieval task using natural language queries. The ability to interact with LLMs using natural language queries inherently allows a user to "engage with a search or discovery process in a natural way without having to rely on specific and accurate keywords in a query." Thus, because the technological field of LLMs already provides this inherent benefit, there is no specific claimed benefit that the invention of the instant application actually provides.

Applicant further asserts, with respect to Ex Parte Desjardins, that "The current § 101 rejection in this case falls into the 'high level of generality' trap that the ARP urges be avoided. As explained above, the use of the large language model based on the training recited in claim 1 is non-conventional, and is not routine." The examiner respectfully disagrees with these assertions.
As amended, the claim language states "training a large language model using one or more serialized sets of data" and "generating, via the large language model, a plurality of candidate natural language queries that correspond to the structured data object by inputting a set of the serialized metadata associated with the data object into the large language model in accordance with the training of the large language model." The examiner fails to see any component of these claim limitations that could be construed as "non-conventional" or "not routine." In fact, generating natural language queries using a serialized input is well-known in the field of LLM inferencing, and is not a feature unique to the invention of the instant application.

Further, the ARP decision in Ex Parte Desjardins was based on the invention dynamically updating the internal weights/parameters of a machine learning model, which provided an inherent benefit to improving LLMs as a whole. See pg. 9, paragraph 1 of the Ex Parte Desjardins ARP decision, which states, "When evaluating the claim as a whole, we discern at least the following limitation of independent claim 1 that reflects the improvement: 'adjust the first values of the plurality of parameters to optimize performance of the machine learning model on the second machine learning task while protecting performance of the machine learning model on the first machine learning task.' We are persuaded that constitutes an improvement to how the machine learning model itself operates, and not, for example, the identified mathematical calculation." The claims as a whole in Ex Parte Desjardins show a specific improvement in the training of a machine learning model, that is, by dynamically updating and improving the internal weights/parameters of the machine learning model.
Importantly, the invention in Ex Parte Desjardins did not simply apply machine learning or LLMs to a known field such as information retrieval, but instead improved the inherent technology behind machine learning as a whole. As amended, there is no language in the independent claims that provides a similar benefit to the one recited by the ARP board in Ex Parte Desjardins. Further details can be found below with respect to claim rejections under 35 USC § 101.

With respect to the applicant's arguments against the claim rejections under 35 U.S.C. §§ 102 and 103, the applicant's arguments with respect to claims 1-8, 10-17, 19, and 20 have been considered but are not persuasive.

With respect to Penha et al. and its relevance to claim rejections under 35 USC § 103, Applicant asserts that "Penha describes generating queries using three datasets comprising entities, queries, and judgments, which does not teach or suggest 'training a large language model using one or more serialized sets of data,' as recited in amended independent claim 1. Rather, Penha describes datasets being used to generate queries using machine learning models in which Penha is silent as to the datasets comprising 'one or more serialized sets of data.'" The examiner respectfully disagrees with these assertions.

Penha et al. disclose training a large language model (Penha et al. pg. 3187, Section 4.2.1, Paragraph 1, "For the CtrlQGen implementation we also rely on the T5 (t5-base) model." T5 is considered analogous to a large language model) using one or more serialized sets of data (Penha et al. pg. 3185, Section 3.1.3, Paragraph 1, "We train the model with the following prompt: 'Generate a query with narrow/broad intent from: <serialized_entity>' and its respective query as the output."). Accordingly, Penha et al. specifically state that their large language model, CtrlQGen, is trained using <serialized_entity> as input.
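The CtrlQGen-style training prompt quoted from Penha et al. can be assembled mechanically from a serialized entity. A sketch, where the example entity and target query are invented for illustration, and the space-joined serialization is only an approximation of the paper's column/value concatenation:

```python
# Sketch of building the training prompt quoted above
# ("Generate a query with narrow/broad intent from: <serialized_entity>").
# The serialization loosely follows Penha et al.'s description of
# concatenating metadata columns with their values; the example entity
# and the target query are made-up placeholders.

def serialize(entity: dict) -> str:
    return " ".join(f"{col} {val}" for col, val in entity.items())

def training_example(entity: dict, intent: str, query: str) -> tuple[str, str]:
    prompt = f"Generate a query with {intent} intent from: {serialize(entity)}"
    return prompt, query  # (model input, expected model output)

prompt, target = training_example(
    {"name": "morning jazz", "genre": "jazz"}, "broad", "relaxing music"
)
print(prompt)
# → "Generate a query with broad intent from: name morning jazz genre jazz"
```

During inference, the same prompt template would be filled with a new serialized entity and the trained model would emit the candidate query.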
Applicant further asserts that "Penha does not indicate that 'one or more serialized sets of data' are 'used to train the large language model,' let alone that 'a structured set of metadata' is automatically converted 'to a serialized set of metadata ... that corresponds to a same format as the one or more serialized sets of data used to train the large language model.'" The examiner respectfully disagrees with these assertions. Penha et al. disclose training a large language model using a serialized entity as input; see above. Further, pg. 3186, Section 3.2.1, Paragraph 1 of Penha et al. discloses, "For a randomly sampled set of entities E' from the collection E we apply CtrlQGen with both desired intents … for each e in E'. After that, given a desired weight proportion of broad queries and narrow queries… we can sample training instances from the synthetic generated queries Q' for training the Bi-Encoder." Accordingly, Penha et al. disclose automatically converting a structured set of metadata to a serialized set of metadata that corresponds to a same format as the one or more serialized sets of data used to train the large language model. The term "automatically" holds no patentable weight or distinction from Penha et al. because the claim language fails to specify how "automatically" converting metadata differs from simply converting metadata. As such, under the broadest reasonable interpretation, Penha et al. disclose all of the deficiencies of Dicklin et al., as described further below with respect to claim rejections under 35 USC § 103.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-8, 10-17, 19, and 20 are rejected under 35 U.S.C.
101 because the claimed invention is directed to an abstract idea without significantly more.

All of the claims are method claims (1-8), apparatus/machine claims (10-17, 19, 20), or manufacture claims (Step 1), but under Step 2A all of these claims recite abstract ideas, and specifically mental processes. These mental processes are more particularly recited in claims 1, 10, and 19 as: training a large language model using one or more serialized sets of data… automatically converting a structured set of metadata associated with a structured data object to a serialized set of metadata associated with the structured data object… inputting, into the large language model, the serialized set of metadata associated with the structured data object… generating, via the large language model, a plurality of candidate natural language queries that correspond to the structured data object by inputting the serialized set of metadata associated with the structured data object into a large language model in accordance with the training of the large language model… embedding the plurality of candidate natural language queries into a first set of vectors and a natural language query received from a user into a second vector… a vector-space comparison of the second vector to the first set of vectors…

Under Step 2A Prong One, claims 1, 10, and 19 are directed to an abstract idea and specifically a mental process. As detailed above, the steps of training, converting, inputting, generating, embedding, etc. may be practically performed in the human mind with the use of a physical aid such as a pen and paper. For example, a human could familiarize themselves with how to convert structured metadata to serialized metadata and receive a structured document, a corresponding set of structured metadata, and instruction to find similar documents from their boss.
The human could then rely on their familiarity with metadata conversion to convert the structured set of metadata corresponding to the structured document to a serialized set of metadata, generate candidate queries based on the serialized set of metadata, calculate vector embeddings for both the candidate queries and their boss's instruction, and send their boss the most relevant documents based on a vector-space cosine similarity comparison between the candidate query embeddings and the boss's query embedding.

Under Step 2A Prong Two, this judicial exception is not integrated into a practical application because claims 1-20 do not recite additional elements that integrate the exception into a practical application. In particular, claims 1, 10, and 19 recite the additional elements of a large language model (¶ [0035]), memory storing code (¶ [0072]), and a processor (¶ [0073]). These additional elements are recited at a high level of generality and merely equate to "apply it," or otherwise merely use a generic computer as a tool to perform an abstract idea, which is not indicative of integration into a practical application as per MPEP 2106.05(f). Further, claims 1, 10, and 19 recite the additional element of "causing for display…", which amounts to insignificant extra-solution activity that is not indicative of integration into a practical application as per MPEP 2106.05(g). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Under Step 2B, the claims do not recite additional elements that are sufficient to amount to significantly more than the judicial exception.
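The embed-and-compare steps walked through in the rejection can be sketched end to end. The character-count embed() below is a deliberately trivial stand-in for a real embedding model, and the candidate queries are invented examples; cosine similarity is the comparison the hypothetical names.

```python
# Sketch of the embed-and-compare steps described above: candidate
# queries are embedded into a first set of vectors, the user's query
# into a second vector, and the best match is found by cosine
# similarity. The character-frequency embed() is a toy stand-in for a
# real embedding model; the example queries are made up.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": character-frequency counts.
    return Counter(text.lower())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

candidates = ["quarterly sales figures", "employee onboarding checklist"]
first_set = [embed(q) for q in candidates]        # first set of vectors
second = embed("show me last quarter's sales")    # second vector
best = max(range(len(candidates)), key=lambda i: cosine(second, first_set[i]))
print(candidates[best])
```

A production system would swap embed() for a learned embedding model and typically use an approximate nearest neighbor index rather than an exhaustive scan.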
As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements of using a computer are noted as a general computer {large language model (¶ [0035]); memory storing code (¶ [0072]); processor (¶ [0073])}. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the additional limitations in the claims noted above are directed towards insignificant extra-solution activities. The claims are not patent eligible.

With respect to claims 2, 11, and 20, the claim relates to generating vectors using an embedding model. The additional element of an "embedding model" is recited at a high level of generality (¶ [0063]) and merely equates to "apply it," or otherwise merely uses a generic computer as a tool to perform an abstract idea, which is not indicative of integration into a practical application as per MPEP 2106.05(f). No additional limitations are present. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

With respect to claims 3 and 12, the claim relates to specifying the vector-space comparison to a nearest neighbor approach. This relates to a human performing a vector-space comparison of the candidate queries and the boss's query by utilizing a nearest neighbor algorithm. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

With respect to claims 4 and 13, the claim relates to basing the indication of a data object on the vector-space comparison. This relates to a human sending their boss the most relevant document based on a vector-space comparison between the candidate query embeddings and the boss's query embedding. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to claims 5 and 14, the claim relates to removing candidate queries that are irrelevant to the data object. This relates to a human reviewing their created candidate queries for a given document, and removing ones that in hindsight do not relate or that hallucinate information. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

With respect to claims 6 and 15, the claim relates to generating a prompt that indicates a set of metadata to the LLM. This relates to a human using metadata to generate a prompt derived from features of a document and creating queries based on the prompt. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

With respect to claims 7 and 16, the claim relates to storing generated vectors in a vector database. This relates to a human writing down and filing each vector they generate from the set of candidate queries they created from the set of metadata. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

With respect to claims 8 and 17, the claim relates to the plurality of candidate queries comprising fragments of each candidate query. This relates to a human only generating partial candidate queries responsive to receiving the set of metadata. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

For all of the above reasons, taken alone or in combination, claims 1-8, 10-17, 19, and 20 recite a non-statutory mental process.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-8, 10-13, 15-17, 19, and 20 are rejected under 35 U.S.C. 103 as obvious over US Patent Publication 20250103826 A1 (Dicklin et al.) in view of "Improving Content Retrievability in Search with Controllable Query Generation" (Penha et al.).

Claim 1

Regarding claim 1, Dicklin et al. disclose training a large language model (Dicklin et al. ¶ [0054], "the generative MLM 120 may be based on or include one or more deep learning machine learning models. One or more of those deep learning models may include a transformer-based large language model (LLM)") using one or more [serialized] sets of data (Dicklin et al. ¶ [0026], "the generative MLM is trained to gather and analyze numerous documents stored at the cloud-based data storage system, derive meaning of content in the documents, and/or generate relevant information needed by a user to make a decision or perform an action" Numerous documents are considered analogous to sets of data); generating, via the large language model, a plurality of candidate natural language queries that correspond to the structured data object (Dicklin et al. ¶ [0026], "The different documents may be stored in different locations ...
in documents with different formats (e.g., word processing documents, emails, spreadsheets, and image files)" Documents in different formats, and the listed examples thereof, are considered analogous to a structured data object) by inputting a set of [the serialized] metadata associated with the structured data object into the large language model in accordance with the training of the large language model (Dicklin et al. ¶ [0032], "The document pre-processing subsystem can cause the generative MLM to use a document of the cloud-based data storage system, or portions of the document, as input and generate generative MLM prompts about the document or the document portions" Document portions are considered analogous to metadata. Generative MLM prompts are considered analogous to natural language queries.); embedding the plurality of candidate natural language queries into a first set of vectors (Dicklin et al. ¶ [0032], "The document pre-processing subsystem may then input these generated prompts into the embedding model to generate query embeddings that are digital representations of the prompts.") and a natural language query received from a user into a second vector (Dicklin et al. ¶ [0031], "a user of the platform may submit a text prompt to the system, and the system may input the text prompt into the embedding model to obtain a query embedding for the text prompt."); and causing for display (Dicklin et al. ¶ [0131], "The cloud content management system 110 may send the generative MLM response to the client 140-1, which may display the generative MLM response in a text box 804-1.") an indication of the structured data object as being related to the natural language query received from the user (Dicklin et al. ¶ [0035], "The generative response subsystem may input the generative MLM prompt and the documents or document portions from the search results into the generative MLM to generate a generative MLM response to the user's prompt.
The response may include citations to the documents or document portions from the search results." A citation is considered analogous to an indication.) based at least in part on a vector-space comparison of the second vector to the first set of vectors (Dicklin et al. ¶ [0031], "The text prompt's query embedding can be compared to the query embeddings associated with a document or portions of that document. If the text prompt's query embedding is sufficiently similar to a document's or a document portion's query embedding, then the platform can determine that the document or document portion is likely relevant to the user's text prompt." ¶ [0078], "The query embedding may include a vector."). Dicklin et al. do not explicitly disclose converting a structured set of metadata into a serialized set of metadata associated with a structured data object. However, Penha et al. disclose training a large language model (Penha et al. pg. 3187, Section 4.2.1, Paragraph 1, "For the CtrlQGen implementation we also rely on the T5 (t5-base) model." T5 is considered analogous to a large language model) (Dicklin et al. ¶ [0026], "the generative MLM is trained to gather and analyze numerous documents stored at the cloud-based data storage system, derive meaning of content in the documents, and/or generate relevant information needed by a user to make a decision or perform an action" ¶ [0054], "the generative MLM 120 may be based on or include one or more deep learning machine learning models. One or more of those deep learning models may include a transformer-based large language model (LLM)") using one or more serialized sets of data (Penha et al. pg.
3185, Section 3.1.3, Paragraph 1, "We train the model with the following prompt: 'Generate a query with narrow/broad intent from: <serialized_entity>' and its respective query as the output."); automatically converting a structured set of metadata associated with a structured data object to a serialized set of metadata associated with the structured data object (Penha et al. pg. 3184-3185, Section 3, Paragraph 1, "The serialization module is required to obtain a text representation for a given entity so that text-based models can use that as input." pg. 3185, Section 3.1.1, Paragraph 1, "[the serialization module] takes as input an entity e and outputs a string representation of the entity: e_serialized = s(e). The serialization function s concatenates every metadata column of the entity with their respective values" Entity e is considered analogous to a structured set of metadata. e_serialized is considered analogous to a serialized set of metadata. See the figure that follows Section 3.1.1, Paragraph 1.) that corresponds to a same format as the one or more serialized sets of data used to train the large language model (Penha et al. pg. 3185, Section 3.1.3, Paragraph 1, "We train the model with the following prompt: 'Generate a query with narrow/broad intent from: <serialized_entity>' and its respective query as the output." A serialized entity is considered analogous to a set of metadata using the same format as the set of data used to train an LLM); inputting, into the large language model, the serialized set of metadata associated with the structured data object based at least in part on converting the structured set of metadata (Penha et al. pg.
3185, Section 3.1.3, Paragraph 1, "We train the model with the following prompt: 'Generate a query with narrow/broad intent from: <serialized_entity>' and its respective query as the output."); and generating, via the large language model, a plurality of candidate natural language queries that correspond to the structured data object by inputting a set of the serialized metadata associated with the structured data object into the large language model (Penha et al. pg. 3186, Section 3.2.2, Paragraph 1, "for a given input query q, we can obtain a list R_q with the top-k entities ranked for it using a ranking model. For each entity in the top-k ranked list, we apply CtrlQGen to generate a set of broad queries Q' to recommend: Q' = {G_(e_i, broad)} for e_i in R_q.") in accordance with the training of the large language model (Penha et al. pg. 3185, Section 3.1.3, Paragraph 1, "we train an encoder-decoder model G that receives as input the entity and the underlying intent to control for, and it outputs the query: G_(e, i) = q. ... We train the model with the following prompt: 'Generate a query with narrow/broad intent from: <serialized_entity>' and its respective query as the output." Input e of encoder-decoder model G, otherwise referred to as <serialized_entity>, is considered analogous to a set of serialized metadata).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to modify Dicklin et al.'s information retrieval system to include Penha et al.'s serialized metadata because such a modification is the result of combining prior art elements according to known methods to yield predictable results.
More specifically, Dicklin et al.'s document input as modified by Penha et al.'s serialized metadata can yield a predictable result of expanding system functionality, since Penha et al.'s metadata-based conversion and training would allow additional forms of input to the LLM by a user. Thus, a person of ordinary skill would have appreciated including in Dicklin et al.'s information retrieval system the ability to use Penha et al.'s serialized metadata, since the claimed invention is merely a combination of old elements, and in the combination each element merely would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.

Claim 2

Regarding claim 2, the rejection of claim 1 is incorporated. Dicklin et al. further disclose generating the first set of vectors and the second vector using an embedding model (Dicklin et al. ¶ [0031], "The preparation can also include inputting a portion of the document (e.g., a sentence, paragraph, or section of the document) into the embedding model to generate a query embedding for the portion of the document ... a user of the platform may submit a text prompt to the system, and the system may input the text prompt into the embedding model to obtain a query embedding for the text prompt.").

Claim 3

Regarding claim 3, the rejection of claim 1 is incorporated. Dicklin et al. further disclose wherein causing for display the indication of the structured data object further comprises: performing the vector-space comparison based at least in part on measuring a distance between the second vector to a nearest neighbor among the first set of vectors (Dicklin et al.
¶ [0126], "In some embodiments, determining whether two query embeddings are within a threshold similarity may include performing a nearest neighbor search, determining a cosine similarity between the query embeddings, or some other calculation that can determine a distance between multiple query embeddings."), measuring a distance between the second vector and a hyperplane associated with the first set of vectors, or both.

Claim 4

Regarding claim 4, the rejection of claim 1 is incorporated. Dicklin et al. further disclose wherein causing for display the indication of the structured data object further comprises: performing a comparison between the second vector and one or more vectors of the first set of vectors, wherein the indication of the structured data object is based at least in part on the comparison (Dicklin et al. ¶ [0035], "The generative response subsystem can re-rank or refine the search results based on ... a query embedding of the prompt as compared to a query embedding associated with a document or document portion, or other search criteria. The generative response subsystem may input the generative MLM prompt and the documents or document portions from the search results into the generative MLM to generate a generative MLM response to the user's prompt. The response may include citations to the documents or document portions from the search results.").

Claim 6

Regarding claim 6, the rejection of claim 1 is incorporated. Penha et al. further disclose generating a prompt indicating the structured set of metadata to the large language model, wherein the plurality of candidate natural language queries is generated in accordance with the prompt (Penha et al. pg. 3185, Section 3.1.3, Paragraph 1, "We train the model with the following prompt: 'Generate a query with narrow/broad intent from: <serialized_entity>' and its respective query as the output." See the input section within the figure that follows Section 3.1.3, Paragraph 1.
See Figure 2, left (CtrlQGen pipeline). Generating a prompt that includes both the serialized entities and intent in order to generate a natural language query output is considered analogous to generating a prompt indicating a set of metadata to an LLM.)

Claim 7

Regarding claim 7, the rejection of claim 1 is incorporated. Dicklin et al. further disclose storing the first set of vectors in a vector database (Dicklin et al. ¶ [0090], "the document pre-processing subsystem 112 may store the query embeddings generated in each iteration of block 326 on the cloud content management system 110. This may include storing the query embeddings in a memory of the cloud content management system 110." ¶ [0078], "The query embedding may include a vector.").

Claim 8

Regarding claim 8, the rejection of claim 1 is incorporated. Dicklin et al. further disclose wherein the plurality of candidate natural language queries comprises one or more fragments of each candidate natural language query (Dicklin et al. ¶ [0032], "The document pre-processing subsystem can cause the generative MLM to use a document of the cloud-based data storage system, or portions of the document, as input and generate generative MLM prompts about the document or the document portions" One or more fragments could include the whole prompt if all fragments are included; thus generating MLM prompts is considered analogous to generating one or more fragments of each candidate NL query).

Claim 10

Regarding claim 10, Dicklin et al. disclose an apparatus for data processing, comprising: one or more memories storing processor-executable code (Dicklin et al. ¶ [0195], "The instructions 1426 can also reside, completely or at least partially, within the volatile memory 1404 and/or within the processing device 1402 during execution thereof by the computer system 1400"); and one or more processors coupled with the one or more memories and individually or collectively operable to execute the code (Dicklin et al.
¶ [0193], "The processing device 1402 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, graphics processing unit, or the like."). The remaining limitations of claim 10 are similar to the limitations of claim 1 and therefore are rejected for similar reasons as described above.

Claim 11

Regarding claim 11, the rejection of claim 10 is incorporated. The limitations of claim 11 are similar to those of claim 2 and therefore are rejected for similar reasons as described above.

Claim 12

Regarding claim 12, the rejection of claim 10 is incorporated. The limitations of claim 12 are similar to those of claim 3 and therefore are rejected for similar reasons as described above.

Claim 13

Regarding claim 13, the rejection of claim 10 is incorporated. The limitations of claim 13 are similar to those of claim 4 and therefore are rejected for similar reasons as described above.

Claim 15

Regarding claim 15, the rejection of claim 10 is incorporated. The limitations of claim 15 are similar to those of claim 6 and therefore are rejected for similar reasons as described above.

Claim 16

Regarding claim 16, the rejection of claim 10 is incorporated. The limitations of claim 16 are similar to those of claim 7 and therefore are rejected for similar reasons as described above.

Claim 17

Regarding claim 17, the rejection of claim 10 is incorporated. The limitations of claim 17 are similar to those of claim 8 and therefore are rejected for similar reasons as described above.

Claim 19

Regarding claim 19, Dicklin et al. disclose a non-transitory computer-readable medium storing code for data processing, the code comprising instructions executable by one or more processors (Dicklin et al. ¶ [0195], "The instructions 1426 can also reside, completely or at least partially, within the volatile memory 1404 and/or within the processing device 1402 during execution thereof by the computer system 1400").
The remaining limitations of claim 19 are similar to the limitations of claim 1 and therefore are rejected for similar reasons as described above.

Claim 20

Regarding claim 20, the rejection of claim 19 is incorporated. The limitations of claim 20 are similar to those of claim 2 and therefore are rejected for similar reasons as described above.

Claims 5 and 14 are rejected under 35 U.S.C. 103 as obvious over Dicklin et al. in view of Penha et al. as applied to claims 1 and 10 above, and further in view of "Doc2Query--: When Less is More" (Gospodinov et al.).

Claim 5

Regarding claim 5, the rejection of claim 1 is incorporated. Dicklin et al. disclose all the elements of the claimed invention as stated above. Dicklin et al. do not explicitly disclose removing candidate queries based on consistency filtering. However, Gospodinov et al. disclose removing one or more candidate natural language queries from the plurality of candidate natural language queries based at least in part on applying consistency filtering, wherein the consistency filtering comprises matching each candidate natural language query of the plurality of candidate natural language queries to the structured data object (Gospodinov et al. pg. 3, Section 3, Paragraph 1, "Doc2Query-- consists of two phases: a generation phase and a filtering phase. In the generation phase, a Doc2Query model generates a set of n queries that each document might be able to answer. However, as shown in Figure 1, not all of the queries are necessarily relevant to the document. To mitigate this problem, Doc2Query-- then proceeds to a filtering phase, which is responsible for eliminating the generated queries that are least relevant to the source document.").
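The generate-then-filter pipeline quoted from Gospodinov et al. can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation or the claimed method: the function names, the embedding vectors, and the `keep_fraction` parameter are all hypothetical, and the filtering criterion (cosine similarity between a query embedding and the source-document embedding) is one plausible instance of "eliminating the generated queries that are least relevant to the source document."

```python
# Hypothetical sketch of a Doc2Query-- style filtering phase: score each
# generated query against the source document's embedding and keep only
# the most relevant fraction.

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def filter_candidate_queries(doc_embedding, candidates, keep_fraction=0.7):
    """Drop the generated queries least relevant to the source document.

    candidates: list of (query_text, query_embedding) pairs.
    Returns the query texts ranked most-relevant-first, truncated to
    roughly keep_fraction of the original set.
    """
    scored = [(cosine_similarity(doc_embedding, emb), text)
              for text, emb in candidates]
    scored.sort(reverse=True)  # highest similarity first
    keep = max(1, int(len(scored) * keep_fraction))
    return [text for _, text in scored[:keep]]
```

In this sketch, a query whose embedding is orthogonal to the document embedding (a "hallucinated" query in the paper's terms) scores 0 and is the first to be discarded.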
It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention of the instant application, to modify Dicklin et al.'s information retrieval system to incorporate Gospodinov et al.'s removal of irrelevant queries. The suggestion/motivation for doing so would have been that, "Because hallucinated queries contain details not present in the original text (by definition), we argue that hallucinated queries are less useful for retrieval than non-hallucinated ones," as noted by Gospodinov et al. at pg. 3, Section 3, Paragraph 1.

Claim 14

Regarding claim 14, the rejection of claim 10 is incorporated. The limitations of claim 14 are similar to those of claim 5 and therefore are rejected for similar reasons as described above.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB B VOGT, whose telephone number is (571) 272-7028.
The examiner can normally be reached Monday - Friday, 9:30am - 7pm EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Paras D Shah, can be reached at (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB B VOGT/
Examiner, Art Unit 2653

/Paras D Shah/
Supervisory Patent Examiner, Art Unit 2653

02/12/2026
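For reference, the threshold-similarity check quoted from Dicklin et al. ¶ [0126] in the claim 3 analysis earlier in this action (determining a cosine similarity between query embeddings and comparing it to a threshold) might look like the following minimal sketch. The function names and the 0.8 threshold are illustrative assumptions, not anything taken from the references or the claims.

```python
import math

# Illustrative sketch of a threshold-similarity check between two query
# embeddings, per the cosine-similarity option described in ¶ [0126].

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def within_threshold_similarity(emb_a, emb_b, threshold=0.8):
    """Return True when two query embeddings meet the similarity threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold
```

A nearest neighbor search, the other option ¶ [0126] names, would instead compare each embedding against a set of stored embeddings and keep the closest matches.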

Prosecution Timeline

Jan 30, 2024
Application Filed
Sep 12, 2025
Non-Final Rejection — §101, §103
Dec 04, 2025
Applicant Interview (Telephonic)
Dec 04, 2025
Examiner Interview Summary
Dec 17, 2025
Response Filed
Feb 12, 2026
Final Rejection — §101, §103
Apr 08, 2026
Applicant Interview (Telephonic)
Apr 08, 2026
Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12505279
METHOD AND SYSTEM FOR DOMAIN ADAPTATION OF SOCIAL MEDIA TEXT USING LEXICAL DATA TRANSFORMATIONS
2y 5m to grant Granted Dec 23, 2025
Study what changed to get past this examiner. Based on the 1 most recent grant.


Prosecution Projections

3-4
Expected OA Rounds
57%
Grant Probability
99%
With Interview (+100.0%)
2y 10m
Median Time to Grant
Moderate
PTA Risk
Based on 7 resolved cases by this examiner. Grant probability derived from career allow rate.
