DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on February 5, 2026 has been entered.
Response to Amendment
In response to the amendment filed on January 12, 2026:
Claims 1, 6, 9, and 17 are amended.
Claims 1-20 are pending.
Response to Arguments
In response to the remarks filed on January 26, 2026:
Applicant’s remarks directed to the 35 U.S.C. 102(a)(1) and 103 rejections have been fully considered but are moot in view of the new grounds of rejection presented herein.
Claim Objections
Claims 1, 9, and 17 are objected to for reciting the indefinite term “usable” in the limitation “…wherein the plurality of valid queries are known to be usable for generating…” of each claim, since there is no guarantee that the plurality of valid queries would be used to generate the claimed structured queries. One example of a correction is as follows: “…wherein the plurality of valid queries are known ” Correction and/or revision is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are directed to a judicial exception (i.e., an abstract idea) without significantly more.
Claims 1-20 pass step 1 of the 35 U.S.C. 101 analysis since each claim is directed to either a method or a computing system comprising at least one memory component and at least one processor (e.g., hardware components per Figure 3 and the supporting text).
Claim 1 recites, in part, elements that are directed to an abstract idea (“Courts have examined claims that required the use of a computer and still found that the underlying, patent-ineligible invention could be performed via pen and paper or in a person’s mind.” Versata Dev. Group v. SAP Am., Inc., 793 F.3d 1306, 1335, 115 USPQ2d 1681, 1702 (Fed. Cir. 2015)). The claim recites the limitations of generating…a natural language query embedding…; determining…a validity of the natural language query by comparing…; and converting…the natural language query into a structured query… The limitations, as drafted, describe a process that, under its broadest reasonable interpretation, falls into the following judicial exception categories under step 2A – prong 1 of the abstract idea analysis:
Mathematical Concepts: the use of “embeddings” and “vector space” represents the application of mathematical algorithms to data.
Mental Processes: the steps of “determining validity” by comparing a request to known valid examples mirror the mental process that a human librarian or data analyst would perform when evaluating whether a request is clear enough to be fulfilled.
Certain Methods of Organizing Human Activity: the gating logic of approval or denial based on a threshold is a fundamental organizational practice (e.g., screening or filtering requests).
Under step 2A – prong 2 of the abstract idea analysis, the claim further recites extra-solution activities (i.e., receiving, generating, converting, and retrieving), and nothing in the limitations indicates a specific improvement in the functioning of the computer itself (e.g., new memory structures, reduced latency mechanisms, new database architecture) or in how databases execute queries. The cited DBMS and language generation model are recited as generic computer components performing their well-known functions. The claim merely uses the computer as a tool to automate the abstract idea of filtering, since the claim focuses on the logic of the data flow rather than a technical solution to a technical problem in computer architecture.
Under step 2B of the abstract idea analysis, the additional elements in the claim, considered both individually and as an ordered combination, do not amount to an inventive concept sufficient to transform the abstract idea into a patent-eligible invention. The concepts of converting natural language to SQL and retrieving data are well-understood, routine, and conventional (WURC) activities in the field of database management. The use of embeddings for similarity matching is a standard NLP technique. The extra-solution activities (i.e., receiving, generating, converting, and retrieving) identified in step 2A – prong 2 are reevaluated in step 2B to determine whether each limitation is more than what is well-understood, routine, and conventional activity in the field. The background of the limitations does not provide any indication that the computer components (e.g., a computing system, an embedding model of the computing system, or a language generation model of the computing system) are anything other than off-the-shelf computer components (e.g., generic hardware or software modules for implementing such functions). The Symantec, TLI, and OIP Techs. court decisions cited in MPEP 2106.05(d)(II) indicate that mere receiving, generating, storing, determining, identifying, and transmitting of data over a network are WURC functions when claimed in a merely generic manner (as they are here). Accordingly, a conclusion that the claims recite well-understood, routine, and conventional activity is supported under Berkheimer Option 2. For these reasons, there is no inventive concept in each claim; thus, the claims are ineligible.
Claim 2 further recites limitations of tokenizing…to obtain a sequence of tokens; and computing…a vector representing…based on the sequence of tokens…, which can be implemented in a human mind and/or with the aid of pen/paper (e.g., writing down the sequence of tokens based on the natural language query; and constructing on paper a vector representing the natural language query based on the sequence of tokens). The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 3 merely recites that the structured query comprises a database query format. The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 4 further recites a limitation of computing a distance between the natural language query embedding and the valid query embedding… which can be implemented in a human mind and/or with the aid of pen/paper (e.g., mentally determining the distance). The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 5 further recites a limitation of determining that the distance is less than a threshold… which can be implemented in a human mind and/or with the aid of pen/paper (e.g., mentally determining the threshold and comparing with the distance). The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 6 further recites limitations of computing a quantile of nearest neighbor distances… which can be implemented in a human mind and/or with the aid of pen/paper (e.g., mentally computing the quantile value). Such computation is a conventional statistical technique for outlier detection. Applying a statistical formula to a set of distances is the mere application of an abstract mathematical concept and does not provide a technical improvement. Thus, the claim is ineligible.
Claim 7 further recites a limitation of normalizing the natural language query embedding… which can be implemented in a human mind and/or with the aid of pen/paper (e.g., writing down a compact representation of the original embedding, and mentally determining the distance based on the normalized embedding). The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 8 further recites a limitation of computing a distance… which can be implemented in a human mind and/or with the aid of pen/paper (e.g., mentally determining the distances based on a plurality of valid embeddings). The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 9 recites limitations similar to those of claim 1, with the additional elements of “computing…a distance,” “determining…that the distance is greater than a threshold distance,” and “generating…an invalidity response based on the determination.” The computation of a distance and the evaluation of the distance against a threshold can be categorized as mental processes and mathematical concepts, as presented for claim 1. Further, providing an error message after a validation failure is a WURC software response and does not solve a technological problem. Thus, the claim is ineligible.
Claim 10 further recites a limitation of refraining from generating a structured query… which can be implemented in a human mind and/or with the aid of pen/paper (e.g., mentally not constructing the structured query). The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 11 further recites a limitation of generating…a suggested query… which can be implemented in a human mind and/or with the aid of pen/paper (e.g., writing down a different query based on the invalidity response). The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 12 further recites limitations of obtaining a plurality of valid query embeddings; and comparing the natural language query embedding to a plurality of valid query embeddings to identify a nearest neighbor… which can be implemented in a human mind and/or with the aid of pen/paper (e.g., writing down the valid query embeddings; and visually comparing the embeddings to find nearest neighbors). The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 13 further recites a limitation of receiving a modified natural language query… which can be implemented in a human mind and/or with the aid of pen/paper (e.g., writing down a different query based on the invalidity response). The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 14 further recites limitations of determining a validity of the modified natural language query; and converting…the modified natural language query into a structured query… which can be implemented in a human mind and/or with the aid of pen/paper (e.g., mentally determining the validity; and writing down a structured query based on the modified query). The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 15 further recites a limitation of retrieving data from a database, which is an extra-solution activity similar to the above analysis. The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 16 merely recites that the structured query comprises a database query format. The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 17 is ineligible for reasons similar to those presented for claims 1 and 9.
Claim 18 further recites a limitation of a database storing data retrievable based on the structured query, which is an extra-solution activity similar to the above analysis. The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 19 further recites a limitation of a retrieval component configured to retrieve data…, which is an extra-solution activity similar to the above analysis. The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim 20 further recites limitations of a user interface configured to receive the natural language query and to display a result… which are extra-solution activities similar to the above analysis. The claim does not have any additional elements for further analysis. Thus, the claim is ineligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 8, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (Pub. No. US 2020/0334233, published on October 22, 2020; hereinafter Lee) in view of Xu et al. (Pat. No. US 12450272, filed on May 21, 2024; hereinafter Xu).
Regarding claim 1, Lee clearly shows and discloses a method for data processing (Abstract), comprising:
receiving, by a database management system, a natural language query comprising a request for data from a database (a user 210 may input natural language (spoken content) to a database 202. In response, the database 202 may determine a SQL command/query that can be used to access the data necessary to answer the input from the user 201. For example, in a first input 210, the user 201 asks for the population of Hawaii, [0032]);
generating, using an embedding model of the database management system, a natural language query embedding representing the natural language query in a vector space (the deep learning neural network also converts the user input 302 into a vector format, [0043]);
determining, by the database management system, a validity of the natural language query based on comparing the natural language query embedding and a plurality of valid query embeddings in the vector space (The candidate search network 310 identifies a subset (reduced set 322) of candidates from a larger candidate set 312 using a deep learning neural network. Each candidate included in the candidate set 312 includes a SQL template paired with a natural language text corresponding thereto. In other words, each unique SQL template is paired with a corresponding unique natural language input that triggers the SQL template, [0042]), wherein the plurality of valid queries are known to be usable for generating structured queries that result in data being accurately retrieved from the database (the architecture 300 extends a template-based model with one-shot learning. The architecture 300 is not limited to any format of SQL, and it is free of SQL syntax error, [0041], [0043]. Figure 3D shows the query embedding is usable to accurately retrieve results similar to the data requested in the input natural language query);
converting, using a language generation model of the database management system, the natural language query into a structured query based on the validity of the natural language query (an SQL template 332 from the selected candidate set is retrieved and forwarded to the pointer network 330, as the most appropriate SQL template for the user input 302. In other words, the result of the matching network 320 is a most appropriate matching template 332 from among all possible SQL templates, [0048]. In 440, the method may include determining a SQL command that corresponds to the natural language input based on the selected SQL template and content from the natural language input, [0060]); and
retrieving the data from the database using the structured query (generating a response to the determined SQL command, and outputting the response to at least one of a user interface and a software program, [0060]).
Xu then discloses:
wherein the plurality of valid query embeddings are generated based on a plurality of valid queries, respectively (The knowledge embedding service 225 may generate a vector embedding of the user table retrieval text query 220. The knowledge embedding service 225 may compare the generated vector embedding of the user table retrieval text query 220 to respective vector embeddings of anchor queries. For example, the knowledge embedding service 225 may compare the user table retrieval text query 220 to an anchor query 230-a, an anchor query 230-b, and an anchor query 230-c to determine which anchor query has a highest degree of similarity (e.g., according to comparing the vector embeddings) to the user table retrieval text query 220, [Column 7, Line 58 – Column 8, Line 15], [Column 14, Lines 52-67]);
wherein the plurality of valid queries are known to be usable for generating structured queries that result in data being accurately retrieved from the database (the service may compare vector embeddings of the natural language query to vector embeddings of the multiple anchor queries, where each database table is associated with one or more of the multiple anchor queries. The anchor queries may serve as reference points or “anchors” in the semantic search and may be more representative of metadata associated with the corresponding database table. In other words, the anchor queries may be representative of the nuanced differences between different database tables, which may otherwise have similar vector embeddings, [Column 2, Lines 27-42]. It is clear that the anchor queries are validated based on their embeddings determined to be similar to one or more respective embeddings of database tables); and
wherein the structured queries are generated based on the plurality of valid queries (To generate the anchor queries, the service may perform an automated anchor query generation process. The anchor query generation process may involve initial generation, back-translation, contrastive learning, and validation. For example, the service may generate, for a database table and using a language model, candidate SQL queries and candidate query “intents” using a set of SQL query templates and table metadata for the database table, where each candidate SQL query may be paired with a respective candidate query intent. The candidate query intents may be representative of a specific question or the task indicated by a natural language query. For example, a query intent may define information or action sought by a user providing the natural language query. The service may validate the respective candidate query intents by translating (e.g., back-translating) the candidate query intent into a test SQL query and determining whether the test SQL query corresponds to a candidate SQL query associated with the candidate query intent, [Column 2, Line 43 – Column 3, Line 7]).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Xu with the teachings of Lee for the purpose of improving the accuracy, reliability, and semantic alignment of natural language inferences for databases by ensuring that the translation of a natural language query to a corresponding structured query is anchored to a known-valid semantic space.
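For illustration only, and not as a characterization of Lee or Xu: the validity-gated flow discussed above, in which a natural language query embedding is compared against embeddings of known-valid queries before any structured query is generated, could be sketched roughly as follows. The embed() and to_sql() helpers are hypothetical stand-ins for the embedding model and the language generation model, and the threshold value is arbitrary.

    import math

    def cosine_distance(a, b):
        # 1 minus cosine similarity between two equal-length vectors.
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norm

    def handle_query(nl_query, valid_query_embeddings, embed, to_sql, threshold=0.3):
        # Embed the incoming natural language query (hypothetical embed()).
        q_vec = embed(nl_query)
        # Distance to the closest known-valid query embedding.
        nearest = min(cosine_distance(q_vec, v) for v in valid_query_embeddings)
        if nearest > threshold:
            # Gate: refuse to generate SQL for queries unlike any known-valid query.
            return {"valid": False, "message": "query could not be validated"}
        # Convert to a structured query only after validity is established (hypothetical to_sql()).
        return {"valid": True, "sql": to_sql(nl_query)}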
Regarding claim 2, Lee further discloses generating the natural language query embedding comprises:
tokenizing the natural language query to obtain a sequence of tokens (using tokens (e.g., words, text, etc.) from the natural language input sentence, [0036], [0051]-[0052]); and
computing, using the embedding model, a vector representing the natural language query based on the sequence of tokens, wherein the natural language query embedding comprises the vector (converting the natural language input into a natural language vector using a second function, [0059]).
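As an illustrative sketch only, and not drawn from Lee: tokenizing a natural language query and computing a vector from the resulting token sequence might be approximated as below, where the bag-of-words vocabulary is a toy stand-in for a trained embedding model that would output dense learned vectors.

    def tokenize(query):
        # Split the natural language query into a sequence of lowercase tokens.
        return query.lower().split()

    def embed(tokens, vocabulary):
        # Toy stand-in for an embedding model: a count vector over a fixed vocabulary.
        return [tokens.count(word) for word in vocabulary]

    vocab = ["population", "of", "hawaii", "revenue", "by", "state"]
    vector = embed(tokenize("What is the population of Hawaii"), vocab)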
Regarding claim 3, Lee further discloses the structured query comprises a database query format (In 440, the method may include determining a SQL command that corresponds to the natural language input based on the selected SQL template and content from the natural language input, [0060]).
Regarding claim 4, Lee further discloses determining the validity of the natural language query comprises: computing a distance between the natural language query embedding and the plurality of valid query embeddings (the deep learning neural network converts the natural language text (paired with each SQL template) into a vector format (number) using a vector function (g) 314. Likewise, the deep learning neural network also converts the user input 302 into a vector format. Furthermore, the deep learning neural network compares the vectors 316 of the candidate set to the vectorized format of the user input 320, and chooses the top-n most relevant vectors 316. The comparison may be performed using a cosine similarity function, etc, [0043]).
Regarding claim 8, Lee further discloses computing a distance between the natural language query embedding and each of a plurality of valid query embeddings (The deep learning neural network converts the natural language text (paired with each SQL template) into a vector format (number) using a vector function (g) 314. Likewise, the deep learning neural network also converts the user input 302 into a vector format. Furthermore, the deep learning neural network compares the vectors 316 of the candidate set to the vectorized format of the user input 320, and chooses the top-n most relevant vectors 316. The comparison may be performed using a cosine similarity function, etc, [0043]).
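For illustration only, and not attributed to Lee: computing cosine similarities between a query embedding and each of a plurality of valid query embeddings, and keeping the top-n most relevant candidates, could be sketched as follows.

    import math

    def cosine_similarity(a, b):
        # Cosine similarity between two equal-length vectors.
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    def top_n(query_vec, candidate_vecs, n=3):
        # Rank candidate embeddings by similarity to the query embedding and keep the n best.
        ranked = sorted(candidate_vecs, key=lambda v: cosine_similarity(query_vec, v), reverse=True)
        return ranked[:n]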
Regarding claim 17, Lee clearly shows and discloses a database management system, comprising: at least one memory component; at least one processor executing instructions stored in the at least one memory component (Figure 5);
an embedding model comprising embedding parameters stored in the at least one memory component, the embedding model trained to generate a natural language query embedding of a natural language query (the deep learning neural network also converts the user input 302 into a vector format, [0043]) comprising a request for data from the database (a user 210 may input natural language (spoken content) to a database 202. In response, the database 202 may determine a SQL command/query that can be used to access the data necessary to answer the input from the user 201. For example, in a first input 210, the user 201 asks for the population of Hawaii, [0032]);
a validation component configured to determine a validity of the natural language query by comparing the natural language query embedding to a plurality of valid query embeddings (The candidate search network 310 identifies a subset (reduced set 322) of candidates from a larger candidate set 312 using a deep learning neural network. Each candidate included in the candidate set 312 includes a SQL template paired with a natural language text corresponding thereto. In other words, each unique SQL template is paired with a corresponding unique natural language input that triggers the SQL template, [0042]), wherein the plurality of valid queries are known to be usable for generating structured queries that result in data being accurately retrieved from the database (the architecture 300 extends a template-based model with one-shot learning. The architecture 300 is not limited to any format of SQL, and it is free of SQL syntax error, [0041]. Figure 3D shows the query embedding is usable to accurately retrieve results similar to the data requested in the input natural language query); and
a language generation model comprising text generation parameters stored in the at least one memory component, the language generation model trained to convert the natural language query into a structured query based on the validity of the natural language query (an SQL template 332 from the selected candidate set is retrieved and forwarded to the pointer network 330, as the most appropriate SQL template for the user input 302. In other words, the result of the matching network 320 is a most appropriate matching template 332 from among all possible SQL templates, [0048]. In 440, the method may include determining a SQL command that corresponds to the natural language input based on the selected SQL template and content from the natural language input, [0060]).
Xu then discloses:
wherein the plurality of valid query embeddings are generated based on a plurality of valid queries, respectively (The knowledge embedding service 225 may generate a vector embedding of the user table retrieval text query 220. The knowledge embedding service 225 may compare the generated vector embedding of the user table retrieval text query 220 to respective vector embeddings of anchor queries. For example, the knowledge embedding service 225 may compare the user table retrieval text query 220 to an anchor query 230-a, an anchor query 230-b, and an anchor query 230-c to determine which anchor query has a highest degree of similarity (e.g., according to comparing the vector embeddings) to the user table retrieval text query 220, [Column 7, Line 58 – Column 8, Line 15], [Column 14, Lines 52-67]);
wherein the plurality of valid queries are known to be usable for generating structured queries that result in data being accurately retrieved from the database (the service may compare vector embeddings of the natural language query to vector embeddings of the multiple anchor queries, where each database table is associated with one or more of the multiple anchor queries. The anchor queries may serve as reference points or “anchors” in the semantic search and may be more representative of metadata associated with the corresponding database table. In other words, the anchor queries may be representative of the nuanced differences between different database tables, which may otherwise have similar vector embeddings, [Column 2, Lines 27-42]. It is clear that the anchor queries are validated based on their embeddings determined to be similar to one or more respective embeddings of database tables); and
wherein the structured queries are generated based on the plurality of valid queries (To generate the anchor queries, the service may perform an automated anchor query generation process. The anchor query generation process may involve initial generation, back-translation, contrastive learning, and validation. For example, the service may generate, for a database table and using a language model, candidate SQL queries and candidate query “intents” using a set of SQL query templates and table metadata for the database table, where each candidate SQL query may be paired with a respective candidate query intent. The candidate query intents may be representative of a specific question or the task indicated by a natural language query. For example, a query intent may define information or action sought by a user providing the natural language query. The service may validate the respective candidate query intents by translating (e.g., back-translating) the candidate query intent into a test SQL query and determining whether the test SQL query corresponds to a candidate SQL query associated with the candidate query intent, [Column 2, Line 43 – Column 3, Line 7]).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Xu with the teachings of Lee for the purpose of improving the accuracy, reliability, and semantic alignment of natural language inferences for databases by ensuring that the translation of a natural language query to a corresponding structured query is anchored to a known-valid semantic space.
Regarding claim 18, Lee further discloses a database storing data retrievable based on the structured query (a client 140 may execute one or more of the applications 145 to perform visual analysis via a user interface displayed on the client 140 to view analytical information such as charts, graphs, tables, and the like, based on the underlying data stored in the data store 110. The applications 145 may pass analytic information to one of services 135 based on input received via the client 140. A structured query language (SQL) query may be generated based on the request and forwarded to DBMS 120. DBMS 120 may execute the SQL query to return a result set based on data of data store 110, [0021]).
Regarding claim 19, Lee further discloses a retrieval component configured to retrieve data from a database using the structured query (generating a response to the determined SQL command, and outputting the response to at least one of a user interface and a software program, [0060]).
Regarding claim 20, Lee further discloses a user interface configured to receive the natural language query and to display a result based on the structured query (Each of clients 140 may include one or more devices executing program code of the applications 145 for presenting user interfaces to allow interaction with application server 130. The user interfaces of applications 145 may comprise user interfaces suited for reporting, data analysis, and/or any other functions based on the data of data store 110, [0030]).
Claims 5 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Xu and further in view of Pryzant et al. (Pub. No. US 2025/0111147, filed on September 29, 2023; hereinafter Pryzant).
Regarding claim 5, Pryzant then discloses determining the validity of the natural language query comprises: determining that the distance is less than a threshold distance (The distance between the embeddings may be calculated in the vector space, such as through the use of a cosine similarity analysis or similar analysis. Prompts corresponding to embeddings that have distances below the first threshold embedding distance are defined as being too close (or duplicative or excess) and are thus not sufficiently diverse. Non-diverse prompts are consolidated or sampled to remove excess or duplicative prompts. Prompts corresponding to embeddings that have distances above the second threshold embedding distance are defined as being too far apart and are thus potentially not relevant or in a different class, [0046]).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Pryzant with the teachings of Lee, as modified by Xu, for the purpose of improving the performance of the prompt so that the task is completed more accurately and/or more efficiently with respect to computing resource utilization, based on analysis of related embeddings associated with an initial natural language query.
Regarding claim 7, Pryzant further discloses normalizing the natural language query embedding to obtain a normalized embedding, wherein the distance is computed based on the normalized embedding (the editing prompt stage produces a plurality of optimized prompts 240 based on the plurality of gradients 230 that is output by the feedback prompt stage. The editing prompt stage allows for increasing prompt diversity while mitigating, minimizing, or avoiding significant changes or deviations in terms of classifications or domains of the prompts as compared with the initial prompt or current prompt(s), [0044]-[0046]).
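As a minimal illustrative sketch, not attributed to Pryzant: normalizing the query embedding and testing whether its distance to a valid query embedding falls below a threshold (claims 5 and 7) could be written as follows; the threshold value is arbitrary.

    import math

    def normalize(vec):
        # Scale the embedding to unit length so distance comparisons are scale-invariant.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    def is_valid(query_vec, valid_vec, threshold=0.3):
        # Euclidean distance between normalized embeddings, compared against a threshold;
        # a distance below the threshold is treated as a validity match.
        q, v = normalize(query_vec), normalize(valid_vec)
        dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(q, v)))
        return dist < threshold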
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Xu, in view of Pryzant, and further in view of Vargas et al. (Pub. No. US 2024/0037326, filed on July 27, 2022; hereinafter Vargas).
Regarding claim 6, Vargas then discloses computing a quantile of nearest neighbor distances of the plurality of valid query embeddings, wherein the threshold distance comprises the quantile (the similarity engine 150 may use the aggregated input embedding i and the aggregated stored embedding s of the pair of the input document 201 and the stored document 202 to determine a similarity measure using the similarity model 251. In some embodiments, the similarity measure may include, e.g., Approximate Nearest Neighbors, K-Nearest Neighbors, among other similarity measure, [0071]. The set of the highest scoring stored documents may be defined by, e.g., a percentile threshold, a score threshold, a rank threshold, or any other suitable threshold. For example, a percentile threshold may include a suitable percentile to indicate a similar or relevant relationship between the input document and a particular stored document, such as, e.g., 75th percentile, or other suitable percentile threshold, [0120]).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Vargas with the teachings of Lee, as modified by Xu and Pryzant, for the purpose of matching input features with a set of stored features, determining a similarity metric measuring the similarity between the input features and each stored feature based on their respective embeddings, to return the best results matching the input features.
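For illustration only, and not attributed to Vargas: deriving the threshold from a quantile of nearest-neighbor distances among the valid query embeddings (claim 6) might be sketched as follows; the 75th-percentile choice is arbitrary, and the sketch assumes at least two valid embeddings.

    import math

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def quantile_threshold(valid_embeddings, q=0.75):
        # For each valid query embedding, compute its distance to its nearest neighbor
        # among the other valid embeddings; take an empirical quantile of those
        # distances as the validity threshold (a common outlier-detection heuristic).
        nn_dists = sorted(
            min(euclidean(a, b) for j, b in enumerate(valid_embeddings) if j != i)
            for i, a in enumerate(valid_embeddings)
        )
        idx = min(int(q * len(nn_dists)), len(nn_dists) - 1)
        return nn_dists[idx]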
Claims 9, 10, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Xu and further in view of Betteridge et al. (Pub. No. US 2024/0143946, filed on June 5, 2023; hereinafter Betteridge).
Regarding claim 9, Lee clearly shows and discloses a method for data processing (Abstract), comprising:
receiving, by a database management system, a natural language query comprising a request for data from a database (a user 210 may input natural language (spoken content) to a database 202. In response, the database 202 may determine a SQL command/query that can be used to access the data necessary to answer the input from the user 201. For example, in a first input 210, the user 201 asks for the population of Hawaii, [0032]);
generating, using an embedding model of the database management system, a natural language query embedding of the natural language query (the deep learning neural network also converts the user input 302 into a vector format, [0043]);
computing, by the database management system, a distance between the natural language query embedding and a plurality of valid query embeddings (the deep learning neural network converts the natural language text (paired with each SQL template) into a vector format (number) using a vector function (g) 314. Likewise, the deep learning neural network also converts the user input 302 into a vector format. Furthermore, the deep learning neural network compares the vectors 316 of the candidate set to the vectorized format of the user input 320, and chooses the top-n most relevant vectors 316. The comparison may be performed using a cosine similarity function, etc, [0043]), wherein the plurality of valid queries are known to be usable for generating structured queries that result in data being accurately retrieved from the database (the architecture 300 extends a template-based model with one-shot learning. The architecture 300 is not limited to any format of SQL, and it is free of SQL syntax error, [0041]. Figure 3D shows the query embedding is usable to accurately retrieve results similar to the data requested in the input natural language query).
Xu then discloses:
wherein the plurality of valid query embeddings are generated based on a plurality of valid queries, respectively (The knowledge embedding service 225 may generate a vector embedding of the user table retrieval text query 220. The knowledge embedding service 225 may compare the generated vector embedding of the user table retrieval text query 220 to respective vector embeddings of anchor queries. For example, the knowledge embedding service 225 may compare the user table retrieval text query 220 to an anchor query 230-a, an anchor query 230-b, and an anchor query 230-c to determine which anchor query has a highest degree of similarity (e.g., according to comparing the vector embeddings) to the user table retrieval text query 220, [Column 7, Line 58 – Column 8, Line 15], [Column 14, Lines 52-67]);
wherein the plurality of valid queries are known to be usable for generating structured queries that result in data being accurately retrieved from the database (the service may compare vector embeddings of the natural language query to vector embeddings of the multiple anchor queries, where each database table is associated with one or more of the multiple anchor queries. The anchor queries may serve as reference points or “anchors” in the semantic search and may be more representative of metadata associated with the corresponding database table. In other words, the anchor queries may be representative of the nuanced differences between different database tables, which may otherwise have similar vector embeddings, [Column 2, Lines 27-42]. It is clear that the anchor queries are validated based on their embeddings determined to be similar to one or more respective embeddings of database tables); and
wherein the structured queries are generated based on the plurality of valid queries (To generate the anchor queries, the service may perform an automated anchor query generation process. The anchor query generation process may involve initial generation, back-translation, contrastive learning, and validation. For example, the service may generate, for a database table and using a language model, candidate SQL queries and candidate query “intents” using a set of SQL query templates and table metadata for the database table, where each candidate SQL query may be paired with a respective candidate query intent. The candidate query intents may be representative of a specific question or the task indicated by a natural language query. For example, a query intent may define information or action sought by a user providing the natural language query. The service may validate the respective candidate query intents by translating (e.g., back-translating) the candidate query intent into a test SQL query and determining whether the test SQL query corresponds to a candidate SQL query associated with the candidate query intent, [Column 2, Line 43 – Column 3, Line 7]).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Xu with the teachings of Lee for the purpose of improving the accuracy, reliability, and semantic alignment of natural language inferences for databases by ensuring that the translation of a natural language query to a corresponding structured query is anchored to a known-valid semantic space.
Betteridge then discloses:
determining, by the database management system, that the distance is greater than a threshold distance (The numeric vector 904A (associated with the query 902A and the query 902C) is matched (e.g., using the vector-to-intent engine 612) to an intent 906A—“get a ride.” Similarly, the vector 904B (associated with the query 904A) is matched to an intent 906B—“add or edit payment method.” Each numeric vector may be matched to an intent from a set of intents using artificial intelligence technology. According to some examples, the artificial intelligence technology computes a probability that a given numeric vector corresponds to each intent from the set of intents. The given numeric vector is assigned to the intent with the highest probability if that probability exceeds a preset threshold (e.g., 0.9 or 0.85), [0094]); and
generating, by the database management system, an invalidity response based on the determination (If the probability does not exceed the threshold, the query is not assigned to an intent and another processing technique (e.g., assigning to a human customer service agent or asking the user to repeat or rephrase the query) may be used, [0094]. Whether the comparison is framed as a similarity or probability that must exceed a threshold, or as a distance that must not exceed a threshold, is an obvious design choice for determining whether the query matches the intent; e.g., a distance greater than the threshold distance indicates that the query does not match the intent).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Betteridge with the teachings of Lee, as modified by Xu, for the purpose of enhancing result retrieval associated with a query by analyzing a vectorized embedding of the query against a plurality of related embeddings of similar query intents to identify users’ goals and provide appropriate responses.
Regarding claim 10, Betteridge further discloses refraining from generating a structured query based on the natural language query in response to the determination (If the probability does not exceed the threshold, the query is not assigned to an intent and another processing technique (e.g., assigning to a human customer service agent or asking the user to repeat or rephrase the query) may be used, [0094]).
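As an illustrative sketch only, and not attributed to Betteridge: the claimed behavior of returning an invalidity response and refraining from structured query generation when the computed distance exceeds the threshold (claims 9 and 10) could be approximated as follows.

    def validate_or_refuse(distance_to_nearest_valid, threshold):
        # When the query embedding is farther from every known-valid query embedding
        # than the threshold allows, return an invalidity response and do not generate
        # any structured query; otherwise allow structured query generation to proceed.
        if distance_to_nearest_valid > threshold:
            return {"generate_sql": False,
                    "response": "The request could not be validated; please rephrase the query."}
        return {"generate_sql": True, "response": None}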
Regarding claim 12, Lee further discloses comparing the natural language query embedding to the plurality of valid query embeddings to identify a nearest neighbor, wherein the determination is based on the nearest neighbor (A matching network may train an end-to-end k-nearest neighbor (kNN) network by combining feature extraction and a differentiable distance metric with cosine similarity, [0035]. The deep learning neural network converts the natural language text (paired with each SQL template) into a vector format (number) using a vector function (g) 314. Likewise, the deep learning neural network also converts the user input 302 into a vector format. Furthermore, the deep learning neural network compares the vectors 316 of the candidate set to the vectorized format of the user input 320, and chooses the top-n most relevant vectors 316. The comparison may be performed using a cosine similarity function, etc, [0043]).
Claims 11 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Lee in view of Xu, in view of Betteridge, and further in view of Pryzant.
Regarding claim 11, Pryzant then discloses generating, using a language generation model of the database management system, a suggested query based on the invalidity response (the editing prompt stage includes history context stage that tracks a history of a prompt as it is iterated (e.g., through the operations in FIG. 2A), and includes the tracked history in the prompts. In an example, in a current prompt(s) that requests a better prompt, the history context stage includes a whole history of the prompt optimization process (e.g., “the first time, the prompt changed from [XXXX] to [YYYY] and the difference was [AAAA], then the prompt changed from [YYYY] to [ZZZZ] and the difference was [BBBB],” and so on), [0047]).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Pryzant with the teachings of Lee, as modified by Xu and Betteridge, for the purpose of improving the performance of the prompt so that the task is completed more accurately and/or more efficiently with respect to computing resource utilization, based on analysis of related embeddings associated with an initial natural language query.
Regarding claim 13, Pryzant further discloses receiving a modified natural language query following the invalidity response (The set of operations includes providing, as input to a large language model (“LLM”), a first feedback prompt requesting one or more first textual gradients, each first textual gradient including a description of one or more first flaws in the initial prompt resulting in errors in LLM predictions. Providing, as input to the LLM, a first editing prompt requesting a first set of optimized prompts, based on the initial prompt and the one or more first textual gradients; and receiving, from output of the LLM, the first set of optimized prompts, [0112]).
Regarding claim 14, Pryzant further discloses:
determining a validity of the modified natural language query (selecting the one or more optimized prompts includes one of selecting a first number of the one or more optimized prompts that have scores above the average score or selecting a remaining number of the one or more optimized prompts after removing a second number of the one or more optimized prompts that have scores below the average score, [0118]. The editing prompt stage includes an embedding-based classification process that converts the optimized prompts into embeddings that are mapped to a prompt embedding space and measures distances between the embeddings of the optimized prompts within the embedding space to identify prompts that are below a first threshold embedding distance and prompts that are above a second threshold embedding distance, [0046]).
Lee then discloses:
converting, using a language generation model of the database management system, the modified natural language query into a structured query based on the validity of the modified natural language query (an SQL template 332 from the selected candidate set is retrieved and forwarded to the pointer network 330, as the most appropriate SQL template for the user input 302. In other words, the result of the matching network 320 is a most appropriate matching template 332 from among all possible SQL templates, [0048]. In 440, the method may include determining a SQL command that corresponds to the natural language input based on the selected SQL template and content from the natural language input, [0060]).
Regarding claim 15, Lee further discloses retrieving data from a database using the structured query (generating a response to the determined SQL command, and outputting the response to at least one of a user interface and a software program, [0060]).
Regarding claim 16, Lee further discloses the structured query comprises a database query format (In 440, the method may include determining a SQL command that corresponds to the natural language input based on the selected SQL template and content from the natural language input, [0060]).
Related Prior Art
The following reference is considered related to the claims:
Sheinin et al. (Pub. No. US 2020/0257679) teaches machine translation for processing input questions that includes receiving, in a processor on a computer, an input question presented in a natural language. The input question is preprocessed to find one or more condition values for possible Structured Query Language (SQL) queries. One or more possible SQL queries are enumerated based on the one or more found condition values, and a paraphrasing model is used to rank the enumerated SQL queries. The highest ranked SQL query is executed against a relational database to search for a response to the input question.
Contact Information
Any inquiry concerning this communication or earlier communications from the Examiner should be directed to Son Hoang whose telephone number is (571) 270-1752. The Examiner can normally be reached on Monday – Friday (7:00 AM – 4:00 PM).
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, Sherief Badawi can be reached on (571) 272-9782. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SON T HOANG/Primary Examiner, Art Unit 2169 March 6, 2026