Prosecution Insights
Last updated: April 19, 2026
Application No. 19/094,596

Natural Language-Based Data Integration

Non-Final OA: §102, §103, Double Patenting
Filed: Mar 28, 2025
Examiner: LE, HUNG D
Art Unit: 2161
Tech Center: 2100 — Computer Architecture & Software
Assignee: Microsoft Technology Licensing, LLC
OA Round: 1 (Non-Final)
Grant Probability: 90% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 6m
Grant Probability With Interview: 97%

Examiner Intelligence

Career Allow Rate: 90% (969 granted / 1073 resolved; +35.3% vs TC avg; above average)
Interview Lift: +6.4% for resolved cases with interview (moderate lift)
Avg Prosecution: 2y 6m typical; 33 applications currently pending
Total Applications: 1106 across all art units
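The headline figures above can be reproduced from the raw counts. A quick sanity check, assuming the interview lift is simply additive on the base allow rate (which is how the 97% "with interview" figure appears to be derived):

```python
granted, resolved = 969, 1073

# Career allow rate: grants as a share of resolved cases.
allow_rate = round(granted / resolved * 100, 1)
print(allow_rate)               # 90.3 -- displayed as "90%"

# "With interview" figure: base rate plus the stated +6.4% lift.
print(round(allow_rate + 6.4))  # 97
```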

Statute-Specific Performance

§101: 12.3% (-27.7% vs TC avg)
§103: 39.2% (-0.8% vs TC avg)
§102: 20.6% (-19.4% vs TC avg)
§112: 9.2% (-30.8% vs TC avg)
Tech Center averages are estimates • Based on career data from 1073 resolved cases

Office Action

Grounds of rejection: §102, §103, nonstatutory double patenting
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

1. This Office Action is in response to the application filed on 03/28/2025. Claims 1-20 are pending.

Priority

2. This application is a Continuation of 18/241,028 (Patent US 12,287,804), which was filed on 08/31/2023; the priority claim is acknowledged and considered.

Double Patenting

3. The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory obviousness-type double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the conflicting application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. Effective January 1, 1994, a registered attorney or agent of record may sign a terminal disclaimer.
A terminal disclaimer signed by the assignee must fully comply with 37 CFR 3.73(b).

4. Claims 1-20 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 12,287,804. Although the conflicting claims are not identical, they are not patentably distinct from each other.

Claim 1 (Instant Application 19/094,596): A method for performing natural language-based data integration using a data integration application on at least one computing device, the method comprising: obtaining, based on a natural language input to a large language model (LLM), an output of the LLM comprising a set of ordered activities corresponding to a data integration task represented by the natural language input that is provided for the data integration task; selecting at least one application programming interface (API) for performing each activity within the set of ordered activities; generating a data pipeline based on the set of ordered activities and the at least one API for performing each activity; and back-translating the data pipeline by converting an intermediate language in which each activity of the set of ordered activities is expressed to a desired data format for execution by the data integration application.
Claim 1 (Patent US 12,287,804): A method for performing natural language-based data integration, wherein the method is implemented via a service provider device comprising a processor, and wherein the method comprises: causing execution of a data integration application on a remote device via a network; causing surfacing of a graphical user interface (GUI) corresponding to the data integration application on a display of the remote device; receiving, via the GUI, a natural language input representing a data integration task; generating, via a large language model (LLM), a set of ordered activities corresponding to the data integration task represented by the natural language input; selecting, via the LLM, at least one application programming interface (API) for performing each activity within the set of ordered activities; generating a data pipeline based on the set of ordered activities and the at least one API for performing each activity; and back-translating the data pipeline by converting an intermediate language in which each activity of the set of ordered activities is expressed to a desired data format for execution by the data integration application.
Claim 14 (Instant Application 19/094,596): A system for performing natural language-based data integration using a data integration application on at least one computing device, the system comprising: at least one processor; memory in electronic communication with the at least one processor; and instructions stored in the memory, the instructions being executable by the at least one processor to: obtain, based on a natural language input to a large language model (LLM), an output of the LLM comprising a set of ordered activities corresponding to a data integration task represented by the natural language input that is provided for the data integration task; select at least one application programming interface (API) for performing each activity within the set of ordered activities; generate a data pipeline based on the set of ordered activities and the at least one API for performing each activity; and back-translate the data pipeline by converting an intermediate language in which each activity of the set of ordered activities is expressed to a desired data format for execution by the data integration application.
Claim 12 (Patent US 12,287,804): A service provider device, comprising: a processor; a communication connection for connecting a remote device to the service provider device via a network; a data integration application; and a computer-readable storage medium operatively coupled to the processor, the computer-readable storage medium comprising computer-executable instructions that, when executed by the processor, cause the processor to: cause execution of the data integration application on the remote device via the network; cause surfacing of a graphical user interface (GUI) corresponding to the data integration application on a display of the remote device; receive, via the GUI, a natural language input representing a data integration task; generate, via a large language model (LLM), a set of ordered activities corresponding to the data integration task represented by the natural language input; select, via the LLM, at least one application programming interface (API) for performing each activity within the set of ordered activities; execute the at least one API for performing each activity to generate a context for each activity; generate a data pipeline based on the set of ordered activities, the at least one API for performing each activity, and the corresponding context for each activity; back-translate the data pipeline by converting an intermediate language in which each activity of the set of ordered activities is expressed to a desired data format for execution by the data integration application; and cause surfacing of a representation of the data pipeline via the GUI.
Claim 18 (Instant Application 19/094,596): A non-transitory computer readable medium storing instructions thereon that, when executed by at least one processor, cause a computing device to: obtain, based on a natural language input to a large language model (LLM), an output of the LLM comprising a set of ordered activities corresponding to a data integration task represented by the natural language input that is provided for the data integration task; select at least one application programming interface (API) for performing each activity within the set of ordered activities; generate a data pipeline based on the set of ordered activities and the at least one API for performing each activity; and back-translate the data pipeline by converting an intermediate language in which each activity of the set of ordered activities is expressed to a desired data format for execution by the data integration application.

Claim 18 (Patent US 12,287,804): A computer-readable storage medium comprising computer-executable instructions that, when executed by a processor, cause the processor to: execute a data integration application; surface a graphical user interface (GUI) corresponding to the data integration application; receive, via the GUI, a natural language input representing a data integration task; generate, via a large language model (LLM), a set of ordered activities corresponding to the data integration task represented by the natural language input; select, via the LLM, at least one application programming interface (API) for performing each activity within the set of ordered activities; generate a data pipeline based on the set of ordered activities and the at least one API for performing each activity; and back-translate the data pipeline by converting a language in which each activity of the set of ordered activities is expressed to a desired data format for execution by the data integration application.

Examiner's Note

5.
A Large Language Model (According to Google): "A Large Language Model (LLM) is a type of AI designed to understand, generate, and process human language by analyzing massive datasets using deep learning, specifically neural networks called transformers. LLMs predict the most likely next word or sequence in a text, allowing them to summarize, translate, and answer questions. Key examples include ChatGPT, Gemini, and Claude."

Back translation (According to Google): "Back translation is a quality control process where a translator translates a text from a target language back into the original source language to check for accuracy, nuances, and potential misinterpretations, ensuring the meaning remains consistent across languages, especially for critical documents like clinical trial forms, legal contracts, or marketing slogans. It involves a separate, often blind, translator re-translating the already translated text, and the resulting back-translated text is then compared to the original to identify discrepancies and refine the translation."

Does ETL use data pipelining? (According to Google): "Yes, Extract, Transform, Load (ETL) is a specific type of data pipeline.
It acts as an automated, ordered set of processes that pulls raw data from source systems, modifies it in a staging area, and loads the structured data into a target repository like a data warehouse for analysis."

Guo et al, US 12,368,745: [Guo: Abstract and column 71, lines 59-67 through column 72, lines 1-4 ("receiving a natural language input; generating, by a large language model and based on the natural language input, a first query directed to one or more tables of a plurality of tables", i.e., 'obtaining …input to a …LLM … an output … for data integration task.')] [Guo: Column 5, lines 59-67 through column 6, lines 1-5 ("Agents can be implemented in any appropriate programming language, such as C or Golang, using applicable kernel APIs", i.e., 'selecting … one application programming interface (API) for performing each activity within the set of ordered activities')] [Guo: Column 24, lines 23-35 ("A variety of components can also be used, such as open source scheduler frameworks (e.g., Airflow), or AWS services (e.g., the AWS Data pipeline) which can be used for managing schedules", i.e., 'generating a data pipeline based on the set of ordered activities')] [Guo: Column 24, lines 23-35 ("Scheduler 152 is a microservice that may act as a scheduler and that may run arbitrary jobs organized as a directed graph", i.e., 'ordered activities')].

O'Kelly et al, US 20250005300: [O'Kelly: Paragraphs 3 and 36 ("A large language model may be provided with input text, such as a question. The model may then provide output text in response, such as an answer to the question")].

Ghoche et al, US 20240386214: [Ghoche: Paragraphs 101 and 103 ("which may include back translation. In back translation, new examples may be generated by translating back and forth between languages. For example, an English language example may be translated into Chinese and then translated back into English to create a new example.
The new example is basically a paraphrasing, and would have the same label. The back translation can be performed more than once. It may also be performed through multiple languages (e.g., English-French-English, English-German-English", i.e., "French" (of "English-French-English") or "German" (of "English-German-English") is considered as 'intermediate language')] [Ghoche: Paragraphs 59 and 175 ("prompts applied to a large language model used to support an autonomous AI chatbot in accordance with an implementation", i.e., 'obtaining …input to a …LLM … an output … for data integration task.')] [Ghoche: Abstract and paragraphs 177 and 187 ("A natural language workflow policy is generated for a workflow to solve customer support tickets is automatically generated from representative tickets. Tools, such as API calls", i.e., 'selecting … one application programming interface (API) for performing each activity within the set of ordered activities', "workflow" is considered as 'ordered activities')] [Ghoche: Paragraph 218 ("the AI chat widget identifies the location of the bike using an API and confirms the location with the client", i.e., selecting API based on location with the client)] [Ghoche: Paragraphs 15 and 133 ("As illustrated in FIG. 6, in one implementation, the ML pipeline may include a question text embedder, a candidate answer text embedder, and an answer classifier", i.e., 'generating a data pipeline')].

Claim Rejections - 35 USC § 102

6. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

7. The following is a quotation of the appropriate paragraphs of 35 U.S.C.
102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

8. Claims 1-2, 4-6, 9-10, 14-16 and 18-19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Ghoche et al (US 20240386214).

Claim 1: Ghoche suggests a method for performing natural language-based data integration using a data integration application on at least one computing device, the method comprising: obtaining, based on a natural language input to a large language model (LLM), an output of the LLM comprising a set of ordered activities corresponding to a data integration task represented by the natural language input that is provided for the data integration task [Ghoche: Paragraphs 59 and 175 ("prompts applied to a large language model used to support an autonomous AI chatbot in accordance with an implementation", i.e., 'obtaining …input to a …LLM … an output … for data integration task.')] [Ghoche: Abstract and paragraphs 177 and 187 ("A natural language workflow policy is generated for a workflow to solve customer support tickets is automatically generated from representative tickets. Tools, such as API calls", i.e., 'selecting … one application programming interface (API) for performing each activity within the set of ordered activities', "workflow" is considered as 'ordered activities')].
Ghoche suggests selecting at least one application programming interface (API) for performing each activity within the set of ordered activities [Ghoche: Abstract and paragraphs 177 and 187 ("A natural language workflow policy is generated for a workflow to solve customer support tickets is automatically generated from representative tickets. Tools, such as API calls", i.e., 'selecting … one application programming interface (API) for performing each activity within the set of ordered activities', "workflow" is considered as 'ordered activities')].

Ghoche suggests generating a data pipeline based on the set of ordered activities and the at least one API for performing each activity [Ghoche: Paragraphs 15 and 133 ("As illustrated in FIG. 6, in one implementation, the ML pipeline may include a question text embedder, a candidate answer text embedder, and an answer classifier", i.e., 'generating a data pipeline')].

Ghoche suggests back-translating the data pipeline by converting an intermediate language in which each activity of the set of ordered activities is expressed to a desired data format for execution by the data integration application [Ghoche: Paragraphs 101 and 103 ("which may include back translation. In back translation, new examples may be generated by translating back and forth between languages. For example, an English language example may be translated into Chinese and then translated back into English to create a new example. The new example is basically a paraphrasing, and would have the same label. The back translation can be performed more than once. It may also be performed through multiple languages (e.g., English-French-English, English-German-English", i.e., "French" (of "English-French-English") or "German" (of "English-German-English") is considered as 'intermediate language')].
Claim 2: Ghoche suggests executing the at least one API for performing each activity to generate a context for each activity [Ghoche: Abstract and paragraphs 177 and 187 ("A natural language workflow policy is generated for a workflow to solve customer support tickets is automatically generated from representative tickets. Tools, such as API calls", i.e., 'selecting … one application programming interface (API) for performing each activity within the set of ordered activities', "workflow" is considered as 'ordered activities')] [Ghoche: Paragraph 175 ("understanding context about an inquiry", i.e., 'a context for each activity')]; and generating the data pipeline based on the set of ordered activities and the at least one API for performing each activity, in combination with corresponding context for each activity [Ghoche: Paragraphs 15 and 133 ("As illustrated in FIG. 6, in one implementation, the ML pipeline may include a question text embedder, a candidate answer text embedder, and an answer classifier", i.e., 'generating a data pipeline')].

Claim 4: Ghoche suggests wherein the at least one API is selected from a pre-generated list of APIs [Ghoche: Paragraph 235 ("As examples of actions, FIG. 51 shows a handoff to agent action and an API call (e.g., check order status) in block 5104, although more general a UI may provide a list of all available actions from which to select from")].

Claim 5: Ghoche suggests wherein the at least one API is selected from a plurality of APIs that are exposed by the data integration application [Ghoche: Paragraph 187 ("and API calls to answer a customer question for a particular topic")].

Claim 6: Ghoche suggests wherein selecting the at least one API is performed by the LLM [Ghoche: Paragraph 187 ("and API calls to answer a customer question for a particular topic")] [Ghoche: Abstract ("The generated workflows may be used by a large language model to generate answers for customer questions for an autonomous AI chatbot agent")].
Claim 9: Ghoche suggests causing a graphical user interface (GUI) of a client device to display a representation of the data pipeline [Ghoche: Paragraph 22 ("user interface to define a custom intent")] [Ghoche: Paragraphs 15 and 133 ("As illustrated in FIG. 6, in one implementation, the ML pipeline may include a question text embedder, a candidate answer text embedder, and an answer classifier", i.e., 'generating a data pipeline')].

Claim 10: Ghoche suggests wherein generating the set of ordered activities includes: parsing the natural language input into multiple activities corresponding to the data integration task represented by the natural language input [Ghoche: Paragraphs 59 and 175 ("prompts applied to a large language model used to support an autonomous AI chatbot in accordance with an implementation", i.e., 'obtaining …input to a …LLM … an output … for data integration task.')]; determining an execution order for the activities [Ghoche: Abstract ("workflow")]; and determining dependencies among the activities [Ghoche: Abstract ("workflow")].

Claim 14: Claim 14 is essentially the same as claim 1 except that it sets forth the claimed invention as a system rather than a method, and is rejected for the same reasons as applied above.

Claim 15: Claim 15 is essentially the same as claim 2 except that it sets forth the claimed invention as a system rather than a method, and is rejected for the same reasons as applied above.

Claim 16: Claim 16 is essentially the same as claim 4 except that it sets forth the claimed invention as a system rather than a method, and is rejected for the same reasons as applied above.

Claim 18: Claim 18 is essentially the same as claim 1 except that it sets forth the claimed invention as a program product rather than a method, and is rejected for the same reasons as applied above.
Claim 19: Claim 19 is essentially the same as claim 2 except that it sets forth the claimed invention as a program product rather than a method, and is rejected for the same reasons as applied above.

Claim Rejections - 35 USC § 103

9. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

10. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

11. Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Ghoche et al (US 20240386214), in view of Guo et al (US 12,368,745).

Claim 12: The combined teachings of Ghoche and Guo suggest wherein the data format for execution by the data integration application is a JavaScript Object Notation (JSON) data format [Guo: Column 26, lines 30-41 ("and a combination of JSON/HTTP to manage the service. Example ways the client elements can be implemented are using frameworks such as React, Angular, or Backbone. JSON, jQuery, and JavaScript libraries (e.g., underscore) can also be used")].
Both references (Ghoche and Guo) taught features that were directed to analogous art and to the same field of endeavor, such as data processing. It would have been obvious to one of ordinary skill in the art at the time the invention was made, having the teachings of Ghoche and Guo before him/her, to modify the system of Ghoche with the teaching of Guo in order to implement JSON as a data format [Guo: Column 26, lines 30-41].

12. Claims 13, 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ghoche et al (US 20240386214), in view of Corredor Ortega et al (US 20250013888).

Claim 13: The combined teachings of Ghoche and Corredor Ortega suggest wherein the data pipeline comprises an Extract, Transform, and Load (ETL) data pipeline or an Extract, Load, and Transform (ELT) data pipeline [Corredor Ortega: Paragraph 92 ("for integrating with external data sources, traditional ETL (Extract, Transform, Load) techniques are used")].

Both references (Ghoche and Corredor Ortega) taught features that were directed to analogous art and to the same field of endeavor, such as data processing. It would have been obvious to one of ordinary skill in the art at the time the invention was made, having the teachings of Ghoche and Corredor Ortega before him/her, to modify the system of Ghoche with the teaching of Corredor Ortega in order to implement ETL in data processing [Corredor Ortega: Paragraph 92].
Claim 17: The combined teachings of Ghoche and Corredor Ortega suggest wherein the data format for execution by the data integration application is a JavaScript Object Notation (JSON) data format pipeline [Corredor Ortega: Paragraph 38 ("machine-readable file formats, such as extensible markup language (XML) and JavaScript object notation (JSON).")], and wherein the data pipeline comprises an Extract, Transform, and Load (ETL) data pipeline or an Extract, Load, and Transform (ELT) data pipeline [Corredor Ortega: Paragraph 92 ("for integrating with external data sources, traditional ETL (Extract, Transform, Load) techniques are used")].

Both references (Ghoche and Corredor Ortega) taught features that were directed to analogous art and to the same field of endeavor, such as data processing. It would have been obvious to one of ordinary skill in the art at the time the invention was made, having the teachings of Ghoche and Corredor Ortega before him/her, to modify the system of Ghoche with the teaching of Corredor Ortega in order to implement ETL in data processing [Corredor Ortega: Paragraph 92].

Claim 20: Claim 20 is essentially the same as claim 17 except that it sets forth the claimed invention as a program product rather than a system, and is rejected for the same reasons as applied above.

Allowable Subject Matter

13. Claims 3, 7-8 and 11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

14. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hung D. Le, whose telephone number is 571-270-1404. The examiner can normally be reached on Monday to Friday, 9:00 A.M. to 5:00 P.M. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Apu Mofiz, can be reached on 571-272-4080.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, contact 800-786-9199 (in USA or Canada) or 571-272-1000.

/HUNG D LE/
Primary Examiner, Art Unit 2161
02/04/2026
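Read as an algorithm, claim 1 of the instant application describes a four-step flow: plan a set of ordered activities from a natural language input via an LLM, select an API for each activity, assemble the activities and APIs into a data pipeline, and back-translate the pipeline's intermediate representation into an executable format (JSON, per dependent claims 12 and 17). A minimal sketch of that flow, with the LLM planner stubbed out and every function and catalog name hypothetical:

```python
import json

def plan_activities(nl_input: str) -> list[str]:
    """Stand-in for the LLM call that turns a natural language request
    into a set of ordered activities (claim 1, first limitation)."""
    # A real system would prompt an LLM here; the plan is canned.
    return ["extract", "transform", "load"]

# Hypothetical pre-generated list of APIs to select from (cf. claim 4).
API_CATALOG = {
    "extract": "copy_activity_api",
    "transform": "dataflow_api",
    "load": "sink_activity_api",
}

def build_pipeline(nl_input: str) -> list[dict]:
    """Second and third limitations: select an API for each activity and
    assemble the pipeline in an intermediate representation."""
    activities = plan_activities(nl_input)
    return [
        {"order": i, "activity": act, "api": API_CATALOG[act]}
        for i, act in enumerate(activities)
    ]

def back_translate(pipeline: list[dict]) -> str:
    """Fourth limitation: convert the intermediate representation into
    the desired data format for execution -- JSON, per claims 12/17."""
    return json.dumps({"pipeline": pipeline}, indent=2)

spec = back_translate(build_pipeline("Copy sales data into the warehouse nightly"))
print(spec)
```

Note the terminology gap visible in the rejection: the claim uses "back-translating" for a format conversion out of an intermediate representation, while the cited Ghoche passages describe linguistic back translation (English-French-English paraphrasing for data augmentation).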

Prosecution Timeline

Mar 28, 2025
Application Filed
Feb 04, 2026
Non-Final Rejection — §102, §103, Double Patenting (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596684
SYSTEMS AND METHODS FOR SEARCHING DEDUPLICATED DATA
2y 5m to grant Granted Apr 07, 2026
Patent 12596724
SYSTEMS AND METHODS FOR USE IN REPLICATING DATA
2y 5m to grant Granted Apr 07, 2026
Patent 12596736
SYSTEMS AND METHODS FOR USING PROMPT DISSECTION FOR LARGE LANGUAGE MODELS
2y 5m to grant Granted Apr 07, 2026
Patent 12591489
POINT-IN-TIME DATA COPY IN A DISTRIBUTED SYSTEM
2y 5m to grant Granted Mar 31, 2026
Patent 12585625
SYSTEM AND METHOD FOR IMPLEMENTING A DATA QUALITY FRAMEWORK AND ENGINE
2y 5m to grant Granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
90%
Grant Probability
97%
With Interview (+6.4%)
2y 6m
Median Time to Grant
Low
PTA Risk
Based on 1073 resolved cases by this examiner. Grant probability derived from career allow rate.
