Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination
2. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 10/17/2025 has been entered.
DETAILED ACTION
3. This Office Action is in response to Applicant's filing dated 10/17/2025.
Claims 1 and 11 have been amended. Claims 1 and 11 are independent claims. Claims 1-20 are presented in this office action.
Response to Amendment/Arguments
4. Applicant’s arguments with respect to the rejection of claims under 35 U.S.C. § 102(a)(1) and § 103 have been fully considered but are not persuasive, thus necessitating the rejection as presented in this Office action. Please see the response to arguments below.
Response to 103 Arguments
5. Applicant’s arguments on page 14 regarding claim 1 state: “Accordingly, Applicant submits that claim 1 as amended is patentably distinguishable over Rafidi, Zaremoodi, Belcher, and Shahriar, alone or in combination, do not teach, suggest, or motivate "mapping the condensed feature set to a database-specific query syntax using the medical database query map, and formatting the query into a structure executable by the medical database”.
Examiner respectfully disagrees and maintains the rejection, as Belcher et al teaches "mapping the condensed feature set to a database-specific query syntax using the medical database query map, and formatting the query into a structure executable by the medical database." (Paragraph [0100] discloses mapping the condensed feature set/ filters/ features from the query into a database query syntax/ SQL query to output the results from the medical database. Also see Abstract (i.e., condensing/ reducing/ filtering the collection of data into a feature set to extract the data by converting the filters into a domain-specific query syntax using the medical database, and formatting the query into an SQL query to output the results from the database)).
Therefore, Belcher teaches the argued limitation.
Response to 101 Arguments
6. Applicant’s arguments regarding the 101 rejection on pages 8-9 recite: “The amended claim recites additional limitations that clearly integrate the alleged abstract idea into a practical application. Specifically, the claim now requires that "generating the first medical database query further comprises mapping the feature set to a database-specific query syntax using the medical database query map, and formatting the query into a structure executable by the medical database." This is not a mental process. Rather, it recites a specific technical solution implemented by a computing device to overcome the problem that medical databases require requests to be submitted in structured, machine-readable syntax.”
Examiner respectfully disagrees. The amended limitation, "generating the first medical database query further comprises mapping the feature set to a database-specific query syntax using the medical database query map, and formatting the query into a structure executable by the medical database," covers performance of the limitation in the mind but for the recitation of generic computer components. The limitations are steps involving processes that can be practically performed by a human with the aid of pen and paper or, as explained above, using a computer as a tool to perform the concept. At the high level of generality as drafted, the limitation would encompass a user receiving a query related to the medical field, mapping the features to the database, and retrieving the results. For example, when a user searches for information in the form of text/ natural language, translating the query into a domain-specific language and providing the results using a programmed computer is well understood and conventional in the field of computer technology. These limitations do not improve the functioning of a computer, improve another technology, apply the abstract idea to a particular machine, effect a transformation, or provide meaningful limitations beyond linking the abstract idea to computer technology. They do not recite specific details that amount to significantly more than the abstract idea or that provide meaningful limits on the abstract idea. For at least these reasons, claims 1 and 11 are nonstatutory because they are directed to a judicial exception without significantly more.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
7. Determining whether claims are statutory under 35 U.S.C. 101 involves a two-step analysis. Step 1 requires a determination of whether the claims are directed to a statutory category of invention. Step 2 requires a determination of whether the claims are directed to a judicial exception without significantly more. Step 2 is divided into Step 2A and Step 2B, with Step 2A having a Part 1 and a Part 2. See MPEP 2106; see also the 2019 Revised Patent Subject Matter Eligibility Guidance (2019 PEG).
Step 1, claims 1-11 recite an apparatus, which is directed to the statutory category of a machine.
Regarding independent claims 1 and 11
Step 2A, Part 1, claims are analyzed to determine whether they are directed to an abstract idea. Under the 2019 PEG, claims are deemed to be directed to an abstract idea if they fall within one of the enumerated categories of (a) mathematical concepts, (b) certain methods of organizing human activity, and (c) mental processes. Here, claims 1 and 11 are directed to an abstract idea categorized under mental processes. Courts consider a limitation a mental process if it “can be performed in the human mind, or by a human using a pen and paper.” MPEP 2106.04(a)(2)(III). Courts also consider a mental process as one that can be performed in the human mind while merely using a computer as a tool to perform the concept. MPEP 2106.04(a)(2)(III)(C). Claims 1 and 11 recite a mental process because the recited steps recite “receiving a natural language query”, “receive feature set…”, “generate a query to access a database using a database query map, and output the result”, and “mapping….”. The processor and memory are recited at a high level of generality and do not place meaningful limits on the abstract idea. Inputting the query into a model, receiving a feature set, and generating a query to access a database are tasks that can be performed by a human using the computer as a tool. These limitations are essentially steps of generating and manipulating data at a high level of generality, which can be performed by a person using a computer as a tool.
Step 2A, Part 2, claims are analyzed to determine whether the recited abstract idea is integrated into a practical application. In this case, as explained above, claims 1 and 11 merely recite a mental process. These limitations describe receiving a natural language query, inputting the query into a model, receiving a feature set, generating a query to access a database using a database query map, and outputting the result. While claims 1 and 11 recite additional components in the form of processors, a memory, and a storage, these components are recited at a high level of generality and do not add meaningful limits on the recited abstract idea to integrate it into a practical application by providing an improvement to the functioning of a computer or another technology, implementing the abstract idea with a particular machine or manufacture that is integral to the claim, effecting a transformation or reduction of a particular article to a different state or thing, or applying the abstract idea in some meaningful way beyond linking its use to computer technology. See 2019 PEG. The additional limitations of receiving a natural language query, inputting the query into a model, and outputting the result are insufficient to integrate the abstract idea into a practical application. Receiving a natural language query and inputting the query into a model are insignificant extra-solution activity in the form of data gathering; receiving a feature set and generating a query to access a database using a database query map are mental processes; and outputting the results is insignificant post-solution activity as mere data outputting.
Since claims 1 and 11 are directed to an abstract idea categorized as a mental process and do not integrate the judicial exception into a practical application, claims 1 and 11 are directed to a judicial exception.
Step 2B, claims are analyzed to determine whether they recite significantly more than the abstract idea; in other words, whether the claims provide an inventive concept. In this case, claims 1 and 11 do not recite limitations that amount to significantly more than the abstract idea. The limitations are steps involving processes that can be practically performed by a human with the aid of pen and paper or, as explained above, using a computer as a tool to perform the concept. At the high level of generality as drafted, these limitations would encompass a user receiving a query and analyzing it to uncover patterns, structures, or relationships within the data without predefined outputs, so that inputting the query, mapping the extracted features to a medical concept, and outputting the data is mentally performable as an evaluation or judgment. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claims recite an abstract idea. These limitations do not improve the functioning of a computer, improve another technology, apply the abstract idea to a particular machine, effect a transformation, or provide meaningful limitations beyond linking the abstract idea to computer technology. They do not recite specific details that amount to significantly more than the abstract idea or that provide meaningful limits on the abstract idea. For at least these reasons, claims 1 and 11 are nonstatutory because they are directed to a judicial exception without significantly more.
Regarding dependent claims 2 and 12: these claims depend on independent claims 1 and 11 and therefore recite the same abstract idea. Claims 2 and 12 recite the additional limitation output the aggregated output to a user as a function of a medical database response subject count. This limitation seems to recite the feature of counting the number of results in the output. See specification Paragraph [0054]. These limitations do not integrate the abstract idea into a practical application because the additional limitation merely further describes the recited abstract idea. Pursuant to Step 2B, the additional limitations do not amount to significantly more than the abstract idea because the limitations are not recited in a manner that provides improvements to the functioning of a computer or any other technology or technical field.
Regarding dependent claims 3 and 13: these claims depend on independent claims 1 and 11 and therefore recite the same abstract idea. Claims 3 and 13 recite the additional limitation receive the first natural language database query as a function of a user input of a user. This limitation seems to recite the feature of receiving a natural language query. These limitations do not integrate the abstract idea into a practical application because the additional limitation merely further describes the recited abstract idea. Pursuant to Step 2B, the additional limitations do not amount to significantly more than the abstract idea because the limitations are not recited in a manner that provides improvements to the functioning of a computer or any other technology or technical field.
Regarding dependent claims 4 and 14: these claims depend on independent claims 1 and 11 and therefore recite the same abstract idea. Claims 4 and 14 recite the additional limitation creating a condensed feature set containing at least one combination feature as a function of the feature set. This limitation seems to recite the feature of having certain constraints on the desired data. See specification Paragraph [0050]. These limitations do not integrate the abstract idea into a practical application because the additional limitation merely further describes the recited abstract idea. Pursuant to Step 2B, the additional limitations do not amount to significantly more than the abstract idea because the limitations are not recited in a manner that provides improvements to the functioning of a computer or any other technology or technical field.
Regarding dependent claims 5 and 15: these claims depend on independent claims 1 and 11 and therefore recite the same abstract idea. Claims 5 and 15 recite the additional limitation inputting the at least one combination feature of the condensed feature set into a template. This limitation seems to recite the feature of storing frequently requested data searches in the form of a template. These limitations do not integrate the abstract idea into a practical application because the additional limitation merely further describes the recited abstract idea. Pursuant to Step 2B, the additional limitations do not amount to significantly more than the abstract idea because the limitations are not recited in a manner that provides improvements to the functioning of a computer or any other technology or technical field.
Regarding dependent claims 6 and 16: these claims depend on independent claims 1 and 11 and therefore recite the same abstract idea. Claims 6 and 16 recite the additional limitation output the feature set to a user. This limitation seems to recite the feature of presenting/ outputting the data to the user. These limitations do not integrate the abstract idea into a practical application because the additional limitation merely further describes the recited abstract idea. Pursuant to Step 2B, the additional limitations do not amount to significantly more than the abstract idea because the limitations are not recited in a manner that provides improvements to the functioning of a computer or any other technology or technical field.
Regarding dependent claims 7 and 17: these claims depend on independent claims 1 and 11 and therefore recite the same abstract idea. Claims 7 and 17 recite the additional limitations receive a second natural language database query; and modify the feature set as a function of the second natural language database query. This limitation seems to recite the feature of receiving another query from the user and modifying the first query based on the second query, which is a mental process. These limitations do not integrate the abstract idea into a practical application because the additional limitation merely further describes the recited abstract idea. Pursuant to Step 2B, the additional limitations do not amount to significantly more than the abstract idea because the limitations are not recited in a manner that provides improvements to the functioning of a computer or any other technology or technical field.
Regarding dependent claims 8 and 18: these claims depend on independent claims 1 and 11 and therefore recite the same abstract idea. Claims 8 and 18 recite the additional limitation train the LLM on a training dataset including a plurality of example natural language database queries as inputs correlated to a plurality of example feature sets as outputs. This limitation seems to recite the feature of training the LLM on a dataset using a computer as a tool. These limitations do not integrate the abstract idea into a practical application because the additional limitation merely further describes the recited abstract idea. Pursuant to Step 2B, the additional limitations do not amount to significantly more than the abstract idea because the limitations are not recited in a manner that provides improvements to the functioning of a computer or any other technology or technical field.
Regarding dependent claims 9 and 19: these claims depend on independent claims 1 and 11 and therefore recite the same abstract idea. Claims 9 and 19 recite the additional limitation generate a second medical database query as a function of the feature set. This limitation seems to recite the feature of generating a query, which is insignificant post-solution activity. These limitations do not integrate the abstract idea into a practical application because the additional limitation merely further describes the recited abstract idea. Pursuant to Step 2B, the additional limitations do not amount to significantly more than the abstract idea because the limitations are not recited in a manner that provides improvements to the functioning of a computer or any other technology or technical field.
Regarding dependent claims 10 and 20: these claims depend on independent claims 1 and 11 and therefore recite the same abstract idea. Claims 10 and 20 recite the additional limitation generate the aggregated output as a function of a first medical database response responsive to the first medical database query and a second medical database response responsive to the second medical database query. This limitation seems to recite the feature of generating an aggregated output, which is insignificant post-solution activity. These limitations do not integrate the abstract idea into a practical application because the additional limitation merely further describes the recited abstract idea. Pursuant to Step 2B, the additional limitations do not amount to significantly more than the abstract idea because the limitations are not recited in a manner that provides improvements to the functioning of a computer or any other technology or technical field.
Claim Rejections - 35 U.S.C. § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
8. Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Rafidi; Joseph (US 20240045863 A1) in view of Belcher; Thomas (US 20180137177 A1), and further in view of SHAHRIAR; Muneem (US 20230161763 A1).
Regarding independent claim 1, Rafidi; Joseph (US 20240045863 A1) teaches, an apparatus for generating a medical database query (Paragraph [0047] In some embodiments, the process 120 includes generating a model query based on the NL query. Also see Paragraph [0055]), the apparatus comprising: at least a processor; and a memory communicatively connected to the at least processor, wherein the memory contains instructions configuring the at least processor (Fig. 4 elements 612, 604, Paragraph [0061]) to: receive a first natural language database query (Paragraph [0025] In certain embodiments, the process 110 includes receiving an NL query, one or more input datasets (e.g., including one or more tables), and optionally one or more target datasets (e.g., including one or more tables). The NL query may be a query indicating some desired information, or one or more desired datasets. The NL query may include one or more strings. The NL query may include language that indicates certain constraints on the desired data (e.g., may include language specifying a date range, or an age range));
input the first natural language database query into a large language model (LLM) (Paragraphs [0047], [0048] In some embodiments, the process 130 may include selecting a model and implementing the model to determine a model result. The model may, but need not, be selected from a set of models based on the NL query, the one or more input data sets, and optionally the one or more target datasets. In some embodiments, the model may be a predetermined model. The model may be an NL processing model, such as a machine-learning NL processing model. For example, the model may be an autoregressive language model, such as a Generative Pre-trained Transformer 3 (GPT-3) model (i.e., Examiner interprets a large language model (LLM) as a machine learning model, and GPT-3 is one such model). Implementing the model may provide a query as an output (e.g., a structured query language (SQL) query). Also see Paragraph [0025]);
receive from the LLM a feature set (Paragraph [0056] In some embodiments, the process 230 includes applying the model to the first query, thus generating a NL description of the input data pipeline. The NL description of the input data pipeline can be in any appropriate format (e.g., textual or audio). The NL description may include an explanation of one or more metrics or parameters that the pipeline can be used to determine, and may include a description of conditions defined by the pipeline (i.e., Examiner interprets receiving a feature set as receiving/ extracting a plurality of parameters/ concepts from the query). For example, one NL description may be a string that states “This pipeline is counting the number of patients who have recovered from COVID-19 and were in critical condition in Seattle”, where the number of patients is a metric that the pipeline can be used to determine, and the conditions include having recovered from covid, having been in critical condition, and having been a patient in Seattle. The model may translate conditions defined in the pipeline (e.g., by referencing and translating corresponding conditions defined in the SQL query) into NL (e.g., into at least a portion of the NL description). Also see Paragraph [0071]), wherein the feature set comprises at least a combination feature (Paragraph [0025] The NL query may include language that indicates certain constraints on the desired data (e.g., may include language specifying a date range, or an age range). (As best understood by the examiner, with instant specification Paragraph [0050], “As used herein, a ‘condensed feature set’ is a feature set including a combination feature. As used herein, a ‘combination feature’ is a data structure including 2 or more features and a logical operator…. a combination feature may indicate that subjects must be above 40 and under 50. In another non-limiting example, a combination feature may indicate that subjects must be on a first drug but not on a second drug. A combination feature may describe a range of valid values (such as age between 40 and 50), a list of valid categorical items (such as on a first drug or on a second drug), or the like,” as guidance, Examiner interprets a condensed feature set containing at least one combination feature as a function of the feature set as certain constraints on the desired data, which include an age range));
using a medical database query map, generate a first medical database query as a function of the feature set (Paragraph [0048] Implementing the model may provide a query as an output (e.g., a structured query language (SQL) query. Also see Paragraph [0068]. Paragraph [0087], [0109] the data pipeline in one or more platform-specific expressions of a first platform (e.g., a domain). Here the domain could be a medical database), wherein generating the first medical database query comprises inputting the combination feature of a condensed feature set (Paragraph [0055], [0056] discloses, generating medical database query (SQL) comprising combination feature of a condensed feature set (Examiner interprets combination feature of a condensed feature set as including two or more features/ conditions from the query as explained above));
and generate, using the LLM, an aggregated output by querying a medical database interfaced with the LLM using the first medical database query (Paragraph [0060] FIG. 3A displays an example of input datasets (labeled “patients,” “hospitalization objects,” and “hospital objects” in the depicted image) and an example target dataset (labeled “hospitals_with_num_crit” in the depicted image) displayed via the GUI 300. The GUI 300 can provide for a user selecting the input datasets and, optionally, the target dataset, and selecting a button or other input mechanism to generate a pipeline based on those inputs. Responsive to the button or other input mechanism being activated, the GUI 300 may prompt the user to input an NL query (e.g., in textual format via a textbox, or in audio format). The computing system 600 may then use these inputs to implement process 100, thus generating a pipeline that may optionally be displayed or otherwise presented (e.g., in an audio format) by the GUI 300. FIG. 3B shows an example of such a pipeline. The depicted pipeline includes the three original input datasets, various transformations including two joins, a filter, and an aggregation, and an output dataset that matches certain parameters of the target dataset (e.g., matches the schema of the target dataset) (i.e., generating an aggregated output based on the input data sets/ feature sets/ input parameters). Also see Paragraph [0049], [0056]),
Rafidi et al fails to explicitly teach wherein generating the first medical database query further comprises: mapping the condensed feature set to a database-specific query syntax using the medical database query map; and formatting the query into a structure executable by the medical database.
Belcher; Thomas (US 20180137177 A1) teaches, using a medical database query map, generate a first medical database query as a function of the feature set (Paragraphs [0094], [0095] In some embodiments, SQL builder 600 comprises one or more SQL filters 400 and an SQL query constructor 500. Each SQL filter 400 is associated with a filter 300 in some embodiments. One or more filters 300 and/or SQL filters 400 may be used by SQL builder 600 via SQL query constructor 500 to generate a query for the medical database or information retrieval system. That is, an SQL query constructor 500 may receive data from one or more filters 300, SQL filters 400 and create an SQL query that may be used to query one or more databases and return the queried data. Filters 300 find data that matches the DSL definition. In some embodiments, each result filter 300 maps to a corresponding SQL filter 400. Also see Paragraph [0197] In some embodiments, system 700 generates SQL queries using a template-driven approach, where each SQL filter 400 is generated based on the corresponding filter 300 and instantiated with properties identifying the corresponding database table and columns);
and generate, using the LLM, an aggregated output by querying a medical database interfaced with the LLM using the first medical database query (Paragraphs [0111]-[0115] the SQL builder 600 can generate an SQL query that, when executed over a dataset in a data repository, returns a larger dataset than the one indicated by the RLQL query 100. Subsequent filtering can be performed on the larger dataset to extract only the data corresponding to data indicated by the RLQL query 100 (i.e., an aggregated output is returned to the user));
wherein generating the first medical database query further comprises: mapping the condensed feature set to a database-specific query syntax using the medical database query map; and formatting the query into a structure executable by the medical database (Paragraph [0100] discloses mapping the condensed feature set/ filters/ features from the query into a database query syntax/ SQL query to output the results from the medical database. Also see Abstract).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Rafidi et al by providing wherein generating the first medical database query further comprises: mapping the condensed feature set to a database-specific query syntax using the medical database query map; and formatting the query into a structure executable by the medical database, as taught by Belcher et al (Paragraph [0100]).
One of ordinary skill in the art would have been motivated to make this modification because doing so improves computer performance; for example, it can be more efficient to execute a single SQL query rather than execute multiple queries and then chain the results together, as taught by Belcher et al (Paragraph [0100]).
Rafidi et al and Belcher et al fail to explicitly teach wherein generating the first medical database query comprises inputting the combination feature of a condensed feature set into a template.
SHAHRIAR et al further teaches, wherein generating the first medical database query comprises inputting the combination feature of a condensed feature set into a template (Paragraph [0022], [0023] A data store, such as a database, typically stores specific information related to a specialized field. A user that wants to retrieve data in the specialized field may ask questions for very specific types of data from such a data store. A medical researcher may want to retrieve blood pressure data from patients within a certain age range or a sales person may want to determine the amount of sales for one or more products over the last year. In such cases, conventional algorithms (e.g., WikiSQL based algorithms) for translating natural language queries into database queries fail to fulfill the exclusive needs of those specialists. The disclosed techniques addresses this problem at least in part by using a supervised template-driven process that can be trained and validated for specialized scenarios. The disclosed approach may be generally referred to herein as Text2SQL, which may comprise a natural language to database query translation process that may be trained for specific use cases. [0023] The disclosed techniques may comprise the use of templates. A template may comprise a sample question (e.g., in a natural language) associated with a corresponding database query (e.g., SQL queries). Templates may be determined and/or stored for frequently requested data searches. Based on the question templates, the disclosed techniques may adjust to an individual use case by learning the domain questions and filtering out those questions that are outside of its domain of knowledge. The table schema as well as the stored data types and values may be determined. Using the stored data information, the text of a question may be preprocessed by utilizing a heuristic search process to extract detectable conditions from the question and to reduce the question to a basic form. 
The basic form may be later input into multiple machine learning models configured to determine query language, such as query modifiers. The results of the heuristic search process may be combined with the results of the multiple machine learning models to form a complete query. Also see Paragraph [0109]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Rafidi et al and Belcher et al by providing wherein generating the first medical database query comprises inputting the combination feature of a condensed feature set into a template, as taught by SHAHRIAR et al (Paragraphs [0022], [0023]).
One of ordinary skill in the art would have been motivated to make this modification because, by doing so, the base question may be input into one or more machine learning models. Since the base question is a simplified version of the original question, the machine learning models may yield more accurate results, be more efficient, and otherwise improve upon traditional techniques, as taught by SHAHRIAR et al (Paragraph [0004]).
Regarding dependent claim 2, Rafidi et al, Belcher et al and SHAHRIAR et al teach, the apparatus of claim 1.
Rafidi et al further teaches, wherein the memory contains instructions configuring the at least processor (Fig. 4 elements 612, 604, Paragraph [0061]) to output the aggregated output to a user as a function of a medical database response subject count (Paragraph [0056] In some embodiments, the process 230 includes applying the model to the first query, thus generating a NL description of the input data pipeline. The NL description of the input data pipeline can be in any appropriate format (e.g., textual or audio). The NL description may include an explanation of one or more metrics or parameters that the pipeline can be used to determine, and may include a description of conditions defined by the pipeline. For example, one NL description may be a string that states “This pipeline is counting the number of patients who have recovered from COVID-19 and were in critical condition in Seattle” (As best understood by the examiner, with instant specification Paragraph [0054] (“As used herein, a ‘medical database response subject count’ is the number of subjects whose data is included in medical database response”) as guidance, Examiner interprets counting the number of patients included in the response as the subject count), where the number of patients is a metric that the pipeline can be used to determine, and the conditions include having recovered from COVID-19, having been in critical condition, and having been a patient in Seattle. The model may translate conditions defined in the pipeline (e.g., by referencing and translating corresponding conditions defined in the SQL query) into NL (e.g., into at least a portion of the NL description)).
Belcher et al also teaches, wherein the memory contains instructions configuring the at least processor to output the aggregated output to a user as a function of a medical database response subject count (Paragraphs [0111]-[0115] the SQL builder 600 can generate an SQL query that, when executed over a dataset in a data repository, returns a larger dataset than the one indicated by the RLQL query 100. Subsequent filtering can be performed on the larger dataset to extract only the data corresponding to data indicated by the RLQL query 100 (i.e., an aggregated output is returned to the user based on the response of the query)).
Regarding dependent claim 3, Rafidi et al, Belcher et al and SHAHRIAR et al teach, the apparatus of claim 1.
Rafidi et al further teaches, wherein the memory contains instructions configuring the at least processor (Fig. 4 elements 612, 604, Paragraph [0061]) to receive the first natural language database query as a function of a user input of a user (Paragraph [0025] the process 110 includes receiving an NL query, one or more input datasets. Also see Paragraph [0065]).
Regarding dependent claim 4, Rafidi et al, Belcher et al and SHAHRIAR et al teach, the apparatus of claim 1.
Rafidi et al further teaches, wherein generating the medical database query comprises creating a condensed feature set containing at least one combination feature as a function of the feature set (Paragraph [0025] The NL query may include language that indicates certain constraints on the desired data (e.g., may include language specifying a date range, or an age range). (As best understood by the examiner, with instant specification Paragraph [0050] (As used herein, a “condensed feature set” is a feature set including a combination feature. As used herein, a “combination feature” is a data structure including 2 or more features and a logical operator….a combination feature may indicate that subjects must be above 40 and under 50. In another non-limiting example, a combination feature may indicate that subjects must be on a first drug but not on a second drug. A combination feature may describe a range of valid values (such as age between 40 and 50), a list of valid categorical items (such as on a first drug or on a second drug), or the like) as guidance, Examiner interprets a condensed feature set containing at least one combination feature as a function of the feature set as certain constraints on the desired data, which includes an age range)).
Regarding dependent claim 5, Rafidi et al, Belcher et al and SHAHRIAR et al teach, the apparatus of claim 4.
SHAHRIAR et al further teaches, wherein generating the medical database query comprises inputting the at least one combination feature of the condensed feature set into a template (Paragraph [0022], [0023] A data store, such as a database, typically stores specific information related to a specialized field. A user that wants to retrieve data in the specialized field may ask questions for very specific types of data from such a data store. A medical researcher may want to retrieve blood pressure data from patients within a certain age range or a sales person may want to determine the amount of sales for one or more products over the last year. In such cases, conventional algorithms (e.g., WikiSQL based algorithms) for translating natural language queries into database queries fail to fulfill the exclusive needs of those specialists. The disclosed techniques addresses this problem at least in part by using a supervised template-driven process that can be trained and validated for specialized scenarios. The disclosed approach may be generally referred to herein as Text2SQL, which may comprise a natural language to database query translation process that may be trained for specific use cases. [0023] The disclosed techniques may comprise the use of templates. A template may comprise a sample question (e.g., in a natural language) associated with a corresponding database query (e.g., SQL queries). Templates may be determined and/or stored for frequently requested data searches. Based on the question templates, the disclosed techniques may adjust to an individual use case by learning the domain questions and filtering out those questions that are outside of its domain of knowledge. The table schema as well as the stored data types and values may be determined. Using the stored data information, the text of a question may be preprocessed by utilizing a heuristic search process to extract detectable conditions from the question and to reduce the question to a basic form. 
The basic form may be later input into multiple machine learning models configured to determine query language, such as query modifiers. The results of the heuristic search process may be combined with the results of the multiple machine learning models to form a complete query. Also see Paragraph [0109]).
Regarding dependent claim 6, Rafidi et al, Belcher et al and SHAHRIAR et al teach, the apparatus of claim 1.
Rafidi et al further teaches, wherein the memory contains instructions configuring the at least processor (Fig. 4 elements 612, 604, Paragraph [0061]) to output the feature set to a user (Fig. 5 Paragraphs [0070]-[0083] In certain embodiments, the computing system is configured to make sure data pipelines are accurate. In some embodiments, the computing system interacts with the model solution to figure out if, given the NL query, the model solution has the right level of understanding of the concepts in the NL query. If not, the model solution, via the computing system, prompts the user for an explanation (i.e., Examiner interprets outputting the feature set as prompting the user for an explanation/unmatched column regarding the dataset), and the computing system can feed the explanation back to the model to ensure the most accurate pipeline is generated. In certain embodiments, the computing system can tie the explanation back to the datasets (e.g., the input datasets, the target dataset), to make sure that the explanation (e.g., context) is stored for the pipelining attempts. [0072] “Who is the CTO?” can be generated).
Regarding dependent claim 7, Rafidi et al, Belcher et al and SHAHRIAR et al teach, the apparatus of claim 1.
Rafidi et al further teaches, wherein the memory contains instructions configuring the at least processor (Fig. 4 elements 612, 604, Paragraph [0061]) to: receive a second natural language database query (Fig. 5 Paragraphs [0070]-[0083] In certain embodiments, the computing system is configured to make sure data pipelines are accurate. In some embodiments, the computing system interacts with the model solution to figure out if, given the NL query, the model solution has the right level of understanding of the concepts in the NL query. If not, the model solution, via the computing system, prompts the user for an explanation, and the computing system can feed the explanation back to the model to ensure the most accurate pipeline is generated (i.e., Examiner interprets the explanation provided by the user as a second query). In certain embodiments, the computing system can tie the explanation back to the datasets (e.g., the input datasets, the target dataset), to make sure that the explanation (e.g., context) is stored for the pipelining attempts. [0073] According to certain embodiments, at process 540, the computing system presents or transmits (e.g., to another computing device) the one or more additional NL queries. In some embodiments, at process 545, the computing system receives one or more explanations corresponding to the one or more additional NL queries. In certain embodiments, at process 515, the computing system can incorporate the one or more explanations to the model query. In some embodiments, the computing system can incorporate the one or more explanations into the one or more input datasets and/or the target dataset. In the previous example, the computing system may receive an explanation of “CTO is Joe Doe” and incorporate it to the model query (i.e., receiving a second natural language query associated with the first natural language query)).
and modify the feature set as a function of the second natural language database query (Paragraph [0073] In some embodiments, at process 545, the computing system receives one or more explanations corresponding to the one or more additional NL queries. In certain embodiments, at process 515, the computing system can incorporate the one or more explanations to the model query. In some embodiments, the computing system can incorporate the one or more explanations into the one or more input datasets and/or the target dataset. In the previous example, the computing system may receive an explanation of “CTO is Joe Doe” and incorporate it to the model query (i.e., modifying the feature set by incorporating the second natural language query/explanation of a feature/term)).
Regarding dependent claim 8, Rafidi et al, Belcher et al and SHAHRIAR et al teach, the apparatus of claim 1.
SHAHRIAR et al further teaches, wherein the memory contains instructions configuring the at least processor to train the LLM on a training dataset including a plurality of example natural language database queries as inputs correlated to a plurality of example feature sets as outputs (Paragraph [0133] Data Preparation for the example Text2SQL engine is described as follows. The example Text2SQL engine was trained with question-query templates based on the use case. Several base questions (e.g., basic questions) which are frequently asked on the supply chain dataset were determined. By determining the common traits among the questions, question templates were determined).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Rafidi et al and Belcher et al by providing wherein the memory contains instructions configuring the at least processor to train the LLM on a training dataset including a plurality of example natural language database queries as inputs correlated to a plurality of example feature sets as outputs, as taught by SHAHRIAR et al (Paragraph [0133]).
One of ordinary skill in the art would have been motivated to make this modification because, by doing so, the base question may be input into one or more machine learning models. Since the base question is a simplified version of the original question, the machine learning models may yield more accurate results, be more efficient, and otherwise improve upon traditional techniques, as taught by SHAHRIAR et al (Paragraphs [0004], [0022]).
Regarding dependent claim 9, Rafidi et al, Belcher et al and SHAHRIAR et al teach, the apparatus of claim 1.
Rafidi et al further teaches, wherein the memory contains instructions configuring the at least processor (Fig. 4 elements 612, 604, Paragraph [0061]) to generate a second medical database query as a function of the feature set (Paragraph [0072] According to some embodiments, at process 530, the computing system can determine whether the confidence score associated with the model result and/or the query in the standard language is higher than a predetermined threshold. In certain embodiments, if the confidence score is lower than a predetermined threshold, at process 535, the computing system and/or the model solution can generate one or more additional NL queries. In the previous example, the additional NL query of “Who is the CTO?” can be generated (i.e., generating the second database query)).
Regarding dependent claim 10, Rafidi et al, Belcher et al and SHAHRIAR et al teach, the apparatus of claim 9.
Rafidi et al further teaches, wherein the memory contains instructions configuring the at least processor (Fig. 4 elements 612, 604, Paragraph [0061]) to generate the aggregated output as a function of a first medical database response responsive to the first medical database query and a second medical database response responsive to the second medical database query (Paragraph [0074] According to some embodiments, the computing system may receive or generate the model result including an SQL query, and optionally a confidence score. In the previous example, the generated SQL query can be: [0075] SELECT first_name, last_name, salary_payment_in_us_dollars [0076] FROM Employees [0077] JOIN Payments ON Employees.employee_id=Payments.employee_id [0078] WHERE salary_payment_in_us_dollars >(SELECT salary_payment_in_us_dollars [0079] FROM Employees [0080] JOIN Payments ON Employees.employee_id=Payments.employee_id [0081] WHERE first_name=‘John’ AND last_name=‘Doe’ AND payment_year=2020. Paragraph [0088] According to certain embodiments, at process 560, the computing system can apply the data pipeline to the one or more input datasets to generate an output dataset (i.e., an aggregated output is generated based on the first and the second natural language queries)).
Regarding independent claim 11, Rafidi; Joseph (US 20240045863 A1) teaches, a method of generating a medical database query (Paragraph [0047] In some embodiments, the process 120 includes generating a model query based on the NL query. Also see Paragraph [0055]), the method comprising: using at least a processor (Fig. 4 elements 612, 604, Paragraph [0061]), receiving a first natural language database query (Paragraph [0025] In certain embodiments, the process 110 includes receiving an NL query, one or more input datasets (e.g., including one or more tables), and optionally one or more target datasets (e.g., including one or more tables). The NL query may be a query indicating some desired information, or one or more desired datasets. The NL query may include one or more strings. The NL query may include language that indicates certain constraints on the desired data (e.g., may include language specifying a date range, or an age range));
using the at least a processor, inputting the first natural language database query into a large language model (LLM) (Paragraphs [0047], [0048] In some embodiments, the process 130 may include selecting a model and implementing the model to determine a model result. The model may, but need not, be selected from a set of models based on the NL query, the one or more input data sets, and optionally the one or more target datasets. In some embodiments, the model may be a predetermined model. The model may be an NL processing model, such as a machine-lea