DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Allowable Subject Matter
Claims 24-43 are allowable over the prior art. However, the claims remain rejected under 35 U.S.C. § 101.
Reasons For Allowance
The cited references do not disclose generating a prompt for an explanation associated with the NL query and the first model query on the user interface; generating a second model query based at least in part on the NL query and the explanation; receiving a second model result generated by the one or more computing models applying to the second model query, the second model result including a generated query in a standard query language; and generating the data pipeline based at least in part on the generated query in the standard query language, the data pipeline comprising one or more data pipeline elements, at least one data pipeline element of the one or more data pipeline elements corresponding to a query component of the generated query in the standard query language.
Claim Rejections – 35 U.S.C. § 101
35 U.S.C. § 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 24-43 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to non-statutory subject matter.
These claims are rejected under 35 U.S.C. § 101 because the claimed invention is directed to an abstract idea without significantly more. Each claim recites, at a very high level, changing the format of an augmented query and formulating an abstract representation (i.e., a pipeline) of the steps/elements that would be needed to execute that query. Thus, the claims encompass performance of the limitations via a mental process or using pencil and paper, without being tied to a practical application.
Regarding the independent claims:
Step 1: Yes. Claim 24 is directed to a method (therefore a process), claim 35 is directed to a system (therefore a product/machine), and claim 43 recites a storage medium for performing a series of steps executed on a processor (therefore a process embodied in a product/machine). Thus, each of these claims is directed to a statutory category.
Step 2A, Prong 1 (Judicial Exception Recited?): Yes. Claims 24, 35 and 43 recite limitations directed to an abstract idea: “generating a first model query based on the NL query; … generating a prompt for an explanation associated with the NL query and the first model query on the user interface; generating a second model query based at least in part on the NL query and the explanation; … and generating the data pipeline based at least in part on the generated query in the standard query language, the data pipeline comprising one or more data pipeline elements, at least one data pipeline element of the one or more data pipeline elements corresponding to a query component of the generated query in the standard query language;”. As drafted, each of these limitations recites a mentally performable process, because one can reformat and augment query text and create an abstract representation of each step/element of the query mentally or using pencil and paper.
Step 2A, Prong 2 (Integrated into a Practical Application?): No. Claim 24 recites the following additional elements: “user interface”, “computing models” and “one or more processors”. Claim 35 recites “one or more processors”, “one or more memories”, “user interface” and “computing models”. Claim 43 recites “computer-readable storage medium”, “one or more processors”, “user interface” and “computing models”. Each of these is merely a high-level recitation of generic computer components and represents mere instructions to apply the exception on a computer, as discussed in MPEP 2106.05(f); such elements do not integrate the exception into a practical application.
Additionally, claims 24, 35 and 43 each recite “receiving a natural language (NL) query from a user interface; … receiving a first model result generated by one or more computing models applying to the first model query; … receiving a second model result generated by the one or more computing models applying to the second model query, the second model result including a generated query in a standard query language;”. These steps represent insignificant extra-solution activity: the “receiving” of data (i.e., mere data gathering), such as 'obtaining information', is identified in MPEP 2106.05(g) as insignificant extra-solution activity, and therefore does not integrate the abstract idea into a practical application.
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose meaningful limits on practicing the abstract idea. Viewing the additional limitations together and the claims as a whole, nothing provides integration into a practical application. Therefore, each claim is directed to an abstract idea.
Step 2B (Inventive Concept Provided?): No. As discussed with respect to Step 2A, the generic computer components recited in each claim (i.e., the user interface, computing models, processors, memories and storage medium) amount to no more than mere instructions to apply the exception. Mere instructions to apply an exception using generic computer components (e.g., hardware/software elements such as processors, storage/memory, interfaces and computing models) cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B.
When the multiple “receiving” limitations discussed above are re-evaluated at Step 2B, these elements are well-understood, routine, and conventional, as evidenced by the court cases cited in MPEP 2106.05(d)(II): "i. Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); … OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network);". These limitations thus remain insignificant extra-solution activity that does not provide “significantly more”.
Considering the additional elements in combination, and each claim as a whole, does not change this conclusion; therefore, each claim is ineligible.
Claims 25-34 and 36-42 depend from claims 24 and 35, respectively, and do not cure the deficiencies set forth above. These claims essentially recite further actions, such as displaying and receiving data/inputs, and additional abstract mental concepts, such as further generation steps. Therefore, these claims are likewise rejected.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Relevance is provided in at least the Abstract of each cited document.
Non-Patent Literature
Soysal, Ergin, et al., “CLAMP – a toolkit for efficiently building customized clinical natural language processing pipelines”, Journal of the American Medical Informatics Ass’n, Vol. 25, Issue 3, March 2018, pp. 331-336.
Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components, but also a user-friendly graphic user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphic user interface in building customized, high-performance NLP pipelines with 2 use cases, extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community. (page 331, Abstract). CLAMP’s GUI was built on top of the Eclipse Framework, which provides built-in components for developing interactive interfaces. Figure 1 shows a screenshot of the main interface of CLAMP for building an NLP pipeline. Built-in NLP components are listed in the top-left palette, and the corpus management palette is in the left-middle area. User-defined NLP pipelines are displayed in the left-bottom palette. The details of each pipeline are displayed in the center area after users click a pipeline. A pipeline can be visually created by dragging and dropping components into the middle window, following specific orders (e.g., tokenizer should be before NER). After selecting the components of a pipeline, users can click each component to customize its settings. 
For example, for regular expression-based or dictionary-based NER components, users can specify their own regular expression or dictionary files. For machine learning–based NER, users can swap the default machine learning model with models trained on local data. To facilitate building machine learning–based NER modules on local data, CLAMP provides interfaces for corpus annotation and model training. We developed a fully functional annotation interface (by leveraging the brat annotation tool³¹), which allows users to define types of entity of interest and annotate them following guidelines (see Figure 2 for the annotation interface). After finishing annotation, users can click the training icon to build CRF models using the annotated corpus. (page 332, section entitled “GUI development”).
Mellah, Youssef, et al., “COMBINE: A Pipeline for SQL Generation from Natural Language”, ICACDS 2021, CCIS 1441, © Springer Nature Switzerland AG, 21 October 2021, pp. 97-106.
Accessing data stored in relational databases requires an understanding of the database schema and mainly a query language such as SQL, which, while powerful, is difficult to master. In this sense, recent researches try to approach systems to facilitate this task, in particular by making Text-to-SQL models that attempt to map a question in Natural Language (NL) to the corresponding SQL query. In this paper, we present COMBINE, a pipeline for SQL generation from NL, in which we combine two existing models, RATSQL (We used the version RATSQL v3+BERT; paper’s url: arxiv.org/abs/1911.04942.) and BRIDGE (We used the version BRIDGE v1+BERT; paper’s url: aclweb.org/anthology/2020.findingsemnlp.438/.), that are based on recent advances in Deep Learning (DL) for Natural Language Processing (NLP). (page 97, Abstract).
Joshi, Salil Rajeev, et al., “A Natural Language and Interactive End-to-End Querying and Reporting System”, CoDS COMAD 2020, Hyderabad, India, January 5-7, 2020, pp. 261-267.
Natural language query understanding for unstructured textual sources has seen significant progress over the last couple of decades. For structured data, while the ecosystem has evolved with regard to data storage and retrieval mechanisms, the query language has remained predominantly SQL (or SQL-like). Towards making the latter more natural there has been recent research emphasis on Natural Language Interface to DataBases (NLIDB) systems. Piggy backing on the rise of ‘deep learning’ systems, the state-of-the-art NLIDB solutions over large parallel and standard benchmarks (viz, WikiSQL and Spider) primarily rely on attention based sequence-to-sequence models. (page 261, Abstract).
Song, Yuanfeng, et al., “Speech-to-SQL: Towards Speech-driven SQL Query Generation From Natural Language Question”, arXiv, Cornell University Document Archive, January 4, 2022, pp. 1-14.
Speech-based inputs have been gaining significant momentum with the popularity of smartphones and tablets in our daily lives, since voice is the most easiest and efficient way for human-computer interaction. This paper works towards designing more effective speech-based interfaces to query the structured data in relational databases. We first identify a new task named Speech-to-SQL, which aims to understand the information conveyed by human speech and directly translate it into structured query language (SQL) statements. A naive solution to this problem can work in a cascaded manner, that is, an automatic speech recognition (ASR) component followed by a text-to-SQL component. However, it requires a high-quality ASR system and also suffers from the error compounding problem between the two components, resulting in limited performance. To handle these challenges, we further propose a novel end-to-end neural architecture named SpeechSQLNet to directly translate human speech into SQL queries without an external ASR step. SpeechSQLNet has the advantage of making full use of the rich linguistic information presented in speech. To the best of our knowledge, this is the first attempt to directly synthesize SQL based on arbitrary natural language questions, rather than a natural language-based version of SQL or its variants with a limited SQL grammar. To validate the effectiveness of the proposed problem and model, we further construct a dataset named SpeechQL, by piggybacking the widely-used text-to-SQL datasets. Extensive experimental evaluations on this dataset show that SpeechSQLNet can directly synthesize high-quality SQL queries from human speech, outperforming various competitive counterparts as well as the cascaded methods in terms of exact match accuracies. (page 1, Abstract).
US Patent Application Publications
Wang 2023/0078177
Multiple stage filtering may be implemented for natural language query processing pipelines. Natural language queries may be received at a natural language query processing system and processed through a query language processing pipeline. The query language processing pipeline may filter candidate linkages for a natural language query before performing further filtering of the candidate linkages in the natural language query processing pipeline as part of generating an intermediate representation used to execute the natural language query. (Abstract). As indicated at 730, a result for the natural language query determined using the intermediate representation of the natural language query may be returned via the interface, in some embodiments. In some embodiments, if a confidence value for none of the intermediate representations is above a minimum threshold, then an error or prompt to specify the natural language query according to an interface, protocol, or query language may be returned (e.g., a prompt to rewrite the natural language query as a SQL query). Confidence values generated at other stages in a natural language query processing pipeline (e.g., entity recognition 410, entity linking 510, or data set selection 520) may also trigger an error or prompt to specify the natural language query according to an interface, protocol, or query language if minimum confidence values are not met, in some embodiments. (para 0072).
Beller 2021/0149936
A computer implemented method, in a data processing system comprising a processor and a memory comprising instructions which are executed by the processor to cause the processor to implement an improved search query generation system, the method comprising inputting a natural language question; parsing the natural language question into a parse tree; identifying argument positions comprising one or more argument position terms, wherein each argument position term is a single word; for each argument position: comparing a head term's discriminator score against a threshold discriminator score; and if the head term surpasses the threshold discriminator score, adding the head term as a required term to an improved search query; and outputting the improved search query. (Abstract). Referring again to FIG. 3, the identified major features are then used during the question decomposition stage 330 to decompose the question into one or more queries that are applied to the corpora of data/information in order to generate one or more hypotheses. The queries are generated in any known or later developed query language, such as the Structure Query Language (SQL), or the like. The queries are applied to one or more databases storing information about the electronic texts, documents, articles, websites, and the like, that make up the corpora of data/information. That is, these various sources themselves, different collections of sources, and the like, represent a different corpus within the corpora. There may be different corpora defined for different collections of documents based on various criteria depending upon the particular implementation. For example, different corpora may be established for different topics, subject matter categories, sources of information, or the like. As one example, a first corpus may be associated with healthcare documents while a second corpus may be associated with financial documents. 
Alternatively, one corpus may be documents published by the U.S. Department of Energy while another corpus may be IBM Redbooks documents. Any collection of content having some similar attribute may be considered to be a corpus within the corpora. (para 0067).
MacLean 2020/0026790
In an embodiment, either the front-end interface or the CLI can be used to input SQL queries, statements or transforms to SQL interface 150, or to make inspection requests 136 to the column provenance instructions 134, optionally to cause displaying a graphical display of column provenance relationships using a computer display device. (para 0040).
US Patents
Dole 10,339,168
In one aspect, cognitive systems provide mechanisms for answering questions posed to these cognitive systems using a Question Answering pipeline or system (QA system). The QA pipeline or system is an artificial intelligence application executing on data processing hardware that answers questions pertaining to a given subject-matter domain presented in natural language. The QA pipeline receives inputs from various sources including input over a network, a corpus of electronic documents or other data, data from a content creator, information from one or more content users, and other such inputs from other possible sources of input. Data storage devices store the corpus of data. A content creator creates content in a document for use as part of a corpus of data with the QA pipeline. The document may include any file, text, article, or source of data for use in the QA system. For example, a QA pipeline accesses a body of knowledge about the domain, or subject matter area (e.g., financial domain, medical domain, legal domain, etc.) where the body of knowledge (knowledgebase) can be organized in a variety of configurations, e.g., a structured repository of domain-specific information, such as ontologies, or unstructured data related to the domain, or a collection of natural language documents about the domain. User queries and questions entered into a search engine generally follow a keyword-based, “question-intent” syntax. In order to generate fully formed questions from these question-intent queries, regular expressions can be used for natural language processing (NLP). To efficiently do this, a full question generation system can group parsed queries by interrogative words. Syntactically correct, fully-formed questions can be generated based on these parsed queries by inserting other relevant interrogative words or verbs. (col. 5 line 47 – col. 6 line 11).
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner ROBERT STEVENS whose telephone number is (571) 272-4102. The examiner can normally be reached Mon - Fri 6:00 - 2:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amy Ng can be reached on (571) 270-1698. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ROBERT STEVENS/Primary Examiner, Art Unit 2164
April 17, 2026