DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments with respect to the rejection of claims 1-20 under 35 U.S.C. 101 (abstract idea) have been considered but are not persuasive for the following reasons. Examiner respectfully disagrees because the claims are directed to an abstract information-handling process: receiving a question, deciding that outside information should be requested, translating that request into the format an API requires, retrieving data, adding that data to the context, and then answering the user. In everyday terms, this closely resembles what a human assistant could do with notes and a lookup source. Abstract ideas include “mental processes,” such as observations, evaluations, and judgments, and a claim does not become eligible merely because a generic computer is used as the tool for carrying them out. The claims also do not recite a specific technical improvement that would take them out of the abstract idea category. They do not describe a new language model design, a new API-conversion technique, a new memory structure for context, or any other concrete improvement in how the computer itself operates. Instead, they recite broad components (“data processing hardware,” “language model,” “external data source,” and “API”) and use them at a high level to achieve a result. Therefore, the rejection is maintained.
Applicant's arguments with respect to the rejections of claims 1-20 under 35 U.S.C. 102 and 103 have been considered but are moot in view of the new grounds of rejection necessitated by Applicant's amendments. See the detailed rejection below.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 4-11 and 14-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.
Claims 1 and 11 are directed to an abstract idea. The claims recite a series of steps for receiving a user query, analyzing the query using a language model, determining whether to invoke an external data source, retrieving data, updating a conversational context, and generating a response. These steps collectively describe the abstract concept of processing information to respond to a request, including decision-making, information retrieval, and presentation of results. Such operations fall within recognized abstract idea groupings, including mental processes (e.g., evaluating a query and determining whether to call an external source) and methods of organizing and using information. The claims are drafted at a high level of abstraction and focus on the result of responding to a user query rather than on a specific technical means for achieving that result.
The claims do not integrate the abstract idea into a practical application and do not recite additional elements that amount to significantly more than the abstract idea itself. The recited “data processing hardware,” “language model,” “external data source,” and “conversational context” are generic computing components used in their ordinary capacities to automate the abstract information-processing workflow. The claims do not specify any particular model architecture, data structure, system configuration, or technical improvement to computer functionality, nor do they impose meaningful constraints on how the operations are performed. Instead, they merely use a computer as a tool to carry out the abstract idea more efficiently. Accordingly, because claims 1 and 11 are directed to an abstract idea and lack an inventive concept sufficient to transform the abstract idea into patent-eligible subject matter, claims 1 and 11 are rejected under 35 U.S.C. 101.
The claims do not include additional elements sufficient to amount to significantly more than the judicial exception because the additional elements are (i) mere instructions to implement the idea on a computer, and/or (ii) recitations of generic computer structure performing generic computer functions that are well-understood, routine, and conventional activities previously known to the pertinent industry. Viewed as a whole, these additional claim elements do not provide meaningful limitations that transform the abstract idea into a patent-eligible application such that the claims amount to significantly more than the abstract idea itself. Further, the claims recite no improvement to the functioning of the computing device itself. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.
Dependent claims 4-10 and 14-20 further recite aspects of the abstract idea that are performable by a human and do not amount to significantly more than the abstract idea, because they add no steps beyond what is conventionally known in dialog systems.
Claims 4 and 14: streaming speech transcription input: adds receiving a query via speech transcription from streaming audio, which is a conventional input modality for a digital assistant and still part of collecting/processing information.
Claims 5 and 15: assistant-generated transcription via ASR: specifies that the assistant performs speech recognition—a conventional preprocessing step for converting audio to text—without reciting a technical improvement to ASR.
Claims 6 and 16: receive audio + perform ASR: similarly recites generic ASR on received audio, which is routine and does not add an inventive concept.
Claims 7 and 17: text input: adds that the query is received as text input, a generic input mechanism that does not meaningfully limit the abstract idea.
Claims 8 and 18: synthesized speech output: adds audible text-to-speech (TTS) output, a conventional output mechanism that merely presents information.
Claims 9 and 19: GUI display output: adds displaying text on a GUI, another conventional presentation of results.
Claims 10 and 20: pre-trained + fine-tuned model: specifies a pre-trained language model fine-tuned on labeled samples, which is a conventional ML training approach and does not, as claimed, recite a specific technical improvement.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 7-11 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Byrne et al. (“TicketTalk”; Aug. 1-6, 2021; pgs. 671-680) in view of Mazumder et al. (“Building an Application Independent Natural Language Interface”; Oct. 30, 2019; Dept. of Computer Science, University of Illinois at Chicago, USA).
Claims 1 and 11,
Byrne teaches a computer-implemented method that, when executed by data processing hardware, causes the data processing hardware to perform operations comprising: receiving a digital representation of a user query directed toward a digital assistant ([pg. 675] [Left col. Step 1-2] obtain user utterance and format it by adding the speaker token; the system obtains the user’s query, in digital text form, directed to the conversational agent (receives the user utterance in digital/text form for use by the dialog agent/system));
processing, using a language model, the digital representation of the user query to generate an output that includes a reference to a function call to invoke an external data source ([Abstract] [Section 4.1] [pgs. 671 and 674] uses a neural language model (T5-based text-to-text transformer) to process the user’s utterance and produce an output that contains an API call (a function call to an external knowledge base); the model generates both verbal responses and API call predictions; given a user query, the model’s output can include a reference to an external API function (the <PN> token in the output) to fetch needed data (the T5 language model processes the user query and generates an output that may be an API/program call used to fetch external data));
determining the output generated by the language model includes the reference to the function call ([pg. 675] [Left col. Step 3] checking the model’s output for an API call indicator; the system determines whether the model’s generated output contains a function/API call reference by detecting the special token for a program call; if the model prediction contains the program (<PN>) token, then it is processed to invoke the API (detects whether the model output includes a program/API call reference)); and
calling, using the function call, the external data source to receive data responsive to the user query ([pg. 675] [Left col. Step 3] [Section 4.4] once an API/function call is detected in the model’s output, invoking the external data source; issue the API call by providing it to the API adapter; the system uses the function call predicted by the model to call the external API through an adapter, thereby retrieving the relevant information (uses the API call to invoke an external data source));
receiving, from the external data source, the data responsive to the user query ([Section 4.4] after calling the API, the system obtains the returned data; when we detect an API call in the output, we invoke the API, retrieve the results, and embed the responses in the next model input; retrieving the results indicates that the system receives data from the external source that is responsive to the user’s query (receives the returned data/results from the API/external source));
updating a conversational context by appending the digital representation of the user query and the data responsive to the user query ([pg. 675] [Left col. Step 3] updating the conversation context with the newly received data while the user’s query remains part of the context; after the API returns the results, the system formats the API results and provides them to the model along with the conversation context (appending both the user utterance and the returned program/API response to the conversational context/model input)); and
processing, using the language model, the updated conversational context to generate a response to the user query ([Section 4.4] [pg. 676] the end-to-end system is able to interact with the user to solicit details relevant to the task, generate API calls to fetch data from external knowledge sources, and use the responses provided by the API call to construct natural language responses (then feeds the updated context back into the model to generate the natural language response)).
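For illustration only, the runtime loop relied upon in Byrne (pg. 675, Steps 1-3; Section 4.4), as mapped above, can be sketched as follows. This is a hypothetical sketch, not Byrne's code: `model` stands in for the fine-tuned T5 model, `api_adapter` stands in for the API adapter, and the `<API_RESULT>` marker, the `find_showtimes(...)` call syntax, and the toy stubs are assumptions. Only the `<U>` speaker token and the `<PN>` program token come from the record.

```python
from typing import Callable, List

USER_TOKEN = "<U>"             # speaker token, as described in Byrne
PROGRAM_TOKEN = "<PN>"         # program (API call) token, as described in Byrne
RESULT_TOKEN = "<API_RESULT>"  # hypothetical marker for returned API data (assumption)


def respond(
    utterance: str,
    context: List[str],
    model: Callable[[str], str],
    api_adapter: Callable[[str], str],
) -> str:
    """Hypothetical sketch: query -> model -> (optional) API call -> updated context -> response."""
    # Format the user utterance with the speaker token and add it to the context.
    context.append(f"{USER_TOKEN} {utterance}")
    output = model(" ".join(context))

    # If the model output begins with the program token, treat it as a function call.
    if output.startswith(PROGRAM_TOKEN):
        call = output[len(PROGRAM_TOKEN):].strip()
        data = api_adapter(call)                  # invoke the external data source
        context.append(f"{PROGRAM_TOKEN} {call}")
        context.append(f"{RESULT_TOKEN} {data}")  # append the returned data to the context
        output = model(" ".join(context))         # re-run the model on the updated context

    context.append(output)
    return output


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in model: requests an API call until a result is present, then answers."""
    if RESULT_TOKEN in prompt:
        return "It is playing at 7pm."
    return f"{PROGRAM_TOKEN} find_showtimes(movie=Dune)"


def toy_adapter(call: str) -> str:
    """Hypothetical stand-in API adapter returning a canned result for any call."""
    return "showtime=7pm"


if __name__ == "__main__":
    history: List[str] = []
    print(respond("When is Dune playing?", history, toy_model, toy_adapter))
    print(history)
```

The sketch handles at most one API call per turn and detects the call only when the output begins with the program token; both are simplifications chosen for brevity.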
The difference between the prior art and the claimed invention is that Byrne does not explicitly teach the function call comprising a natural language format; based on determining the output generated by the language model includes the reference to the function call: translating the natural language format for the function call to an application programming interface (API) call that conforms to requirements of an API to invoke the external data source.
Mazumder teaches the function call comprising a natural language format ([Introduction] for each API, the proposed approach attaches a natural language representation of it, which is a set of one or more API seed commands (ASCs) written in natural language (API/function call representations written in natural language));
based on determining the output generated by the language model includes the reference to the function call: translating the natural language format for the function call to an application programming interface (API) call that conforms to requirements of an API to invoke the external data source ([Introduction] [Intro CML] when the user gives a natural language command, the system simply matches the command with one of the ASCs and in doing so also instantiates the objects for the associated API to be executed; CML simply maps a user command to a correct ASC and the system executes the API attached to the ASC (grounding/mapping the natural-language command to the API and instantiating arguments so the API can be executed)).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Byrne with the teachings of Mazumder by modifying the end-to-end, transaction-based dialog system of Byrne (TicketTalk) to include the function call comprising a natural language format and, based on determining the output generated by the language model includes the reference to the function call, translating the natural language format for the function call to an application programming interface (API) call that conforms to requirements of an API to invoke the external data source, as taught by Mazumder, for the benefit of correctly parsing natural language (Mazumder [Abstract]).
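As a hypothetical illustration of the mapping relied upon in Mazumder (matching a natural-language command to an API seed command and executing the attached API), the sketch below translates a natural-language function call into a structured call that satisfies an API's required parameters. Mazumder uses a learned matcher (CML); the token-overlap (Jaccard) score here is a deliberately simple stand-in, and the `APIS` table, API names, and parameter names are assumptions for illustration only.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ApiSpec:
    name: str                  # hypothetical API name
    required: List[str]        # parameters the API requires
    seed_commands: List[str]   # natural-language seed commands (ASCs) attached to the API


# Hypothetical API table; Mazumder attaches one or more ASCs to each API.
APIS = [
    ApiSpec("find_showtimes", ["movie"], ["find showtimes for movie", "when is movie playing"]),
    ApiSpec("buy_tickets", ["movie", "count"], ["buy count tickets for movie"]),
]


def tokens(text: str) -> Set[str]:
    """Lower-cased alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-overlap similarity (stand-in for a learned matcher)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def translate(nl_call: str, args: Dict[str, str]) -> Optional[Dict[str, object]]:
    """Match an NL function call to the closest ASC, then build a structured API call."""
    query = tokens(nl_call)
    score, api = max(
        ((jaccard(query, tokens(asc)), api) for api in APIS for asc in api.seed_commands),
        key=lambda pair: pair[0],
    )
    if score == 0.0:
        return None  # no seed command matched
    missing = [p for p in api.required if p not in args]
    if missing:
        raise ValueError(f"{api.name} requires {missing}")
    # The structured call carries exactly the parameters the API requires.
    return {"api": api.name, "params": {p: args[p] for p in api.required}}


if __name__ == "__main__":
    print(translate("find showtimes for Dune", {"movie": "Dune"}))
```

Raising an error when a required parameter is absent is one way to model "conforms to requirements of an API"; a dialog system might instead prompt the user for the missing value.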
Claims 7 and 17,
Byrne further teaches the method of claim 1, wherein receiving the digital representation of the user query comprises receiving, from the digital assistant, a textual representation of the user query, the textual representation of the user query input by the user via the digital assistant ([Section 4] [pg. 675] [Table 4] text-to-text; a unified text-to-text format where the input and output are always text strings; the digital representation of the user query is a text string; the user’s input is a user utterance marked with a user token <U> and fed into the model; the first runtime step is: obtain user utterance and format it by adding the speaker token; identifies a user string and shows example user inputs as text lines beginning with <U>).
Claims 8 and 18,
Byrne further teaches the method of claim 1, wherein the response to the user query comprises an audible representation including synthesized speech audibly output from the digital assistant ([Summary] [Section 4] a model that generates both verbal responses and API call predictions and an interaction step where when the model predicts an agent response, the system will format it and show it to the user).
Claims 9 and 19,
Byrne further teaches the method of claim 1, wherein the response to the user query comprises a textual representation displayed on a graphical user interface (GUI) executing on the digital assistant ([pg. 677] interactive UI).
Claims 10 and 20,
Byrne further teaches the method of claim 1, wherein the language model comprises a pre-trained language model that is fine-tuned using labeled training samples ([Introduction] [Section 4] pre-trained transformer language model; use T5-base as pre-trained model; fine-tuning the pre-trained model on a labeled dialog dataset; fine-tune this model on the Taskmaster-3 dataset; the approach relies on a sufficiently large, in-domain, labeled dataset).
Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Byrne et al. (“TicketTalk”; Aug. 1-6, 2021; pgs. 671-680) in view of Mazumder et al. (“Building an Application Independent Natural Language Interface”; Oct. 30, 2019; Dept. of Computer Science, University of Illinois at Chicago, USA) and further in view of Shires et al. (US 8,612,211).
Claims 4 and 14,
Byrne and Mazumder teach all the limitations of claim 1. The difference between the prior art and the claimed invention is that neither Byrne nor Mazumder explicitly teaches wherein receiving the digital representation of the user query comprises obtaining a transcription of the user query spoken by the user and captured by the digital assistant in streaming audio.
Shires teaches wherein receiving the digital representation of the user query comprises obtaining a transcription of the user query spoken by the user and captured by the digital assistant in streaming audio ([Summary] speech recognition producing textual outputs including “partial transcriptions” as the user speaks; streaming recognition: partial results can be streamed out from a recognizer while the user is speaking).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the combined teachings of Byrne and Mazumder with the teachings of Shires by modifying the end-to-end, transaction-based dialog system of Byrne (TicketTalk) to include wherein receiving the digital representation of the user query comprises obtaining a transcription of the user query spoken by the user and captured by the digital assistant in streaming audio, as taught by Shires, for the benefit of enhancing the user’s speech (Shires [Summary]).
Claims 5-6 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Byrne et al. (“TicketTalk”; Aug. 1-6, 2021; pgs. 671-680) in view of Mazumder et al. (“Building an Application Independent Natural Language Interface”; Oct. 30, 2019; Dept. of Computer Science, University of Illinois at Chicago, USA) in view of Shires et al. (US 8,612,211) and further in view of Cerra et al. (US 8,886,540).
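For illustration, obtaining a transcription from streaming audio, in the sense relied upon from Shires (partial transcriptions streamed out of a recognizer while the user is speaking), might look like the hypothetical sketch below. The `PartialResult` type and the shape of the recognizer stream are assumptions, not Shires's interface; the final transcription becomes the digital representation of the user query.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PartialResult:
    text: str        # transcription hypothesis so far (hypothetical field)
    is_final: bool   # True once the recognizer finalizes the utterance (hypothetical field)


def transcribe_stream(results: Iterable[PartialResult]) -> Optional[str]:
    """Consume streamed recognizer results and return the transcription of the query."""
    latest: Optional[str] = None
    for result in results:
        latest = result.text          # partial hypotheses arrive while the user speaks
        if result.is_final:
            return result.text        # the final transcription is used as the user query
    return latest                     # stream ended without a final flag: use the last partial


if __name__ == "__main__":
    stream = [
        PartialResult("when", False),
        PartialResult("when is dune", False),
        PartialResult("when is Dune playing", True),
    ]
    print(transcribe_stream(stream))
```

Falling back to the last partial hypothesis when no final result arrives is a design assumption; a real assistant might instead discard an unfinished utterance.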
Claims 5 and 15,
Byrne, Mazumder, and Shires teach all the limitations of claim 4. The difference between the prior art and the claimed invention is that none of Byrne, Mazumder, or Shires explicitly teaches wherein obtaining the transcription comprises receiving the transcription of the user query from the digital assistant, the digital assistant generating the transcription by performing speech recognition on audio data characterizing the user query.
Cerra teaches wherein obtaining the transcription comprises receiving the transcription of the user query from the digital assistant, the digital assistant generating the transcription by performing speech recognition on audio data characterizing the user query ([Summary] [col. 2 lines 9-31] recording speech on a device and then receiving speech recognition results back at the device; recording speech, transmitting the recording, generating results and transmitting the results to the device).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the combined teachings of Byrne, Mazumder, and Shires with the teachings of Cerra by modifying the end-to-end, transaction-based dialog system of Byrne (TicketTalk) to include wherein obtaining the transcription comprises receiving the transcription of the user query from the digital assistant, the digital assistant generating the transcription by performing speech recognition on audio data characterizing the user query, as taught by Cerra, for the benefit of providing real-time speech recognition (Cerra [col. 1 line 55]).
Claims 6 and 16,
Byrne, Mazumder, and Shires teach all the limitations of claim 4. The difference between the prior art and the claimed invention is that none of Byrne, Mazumder, or Shires explicitly teaches wherein obtaining the transcription comprises: receiving audio data characterizing the user query; and performing speech recognition on the audio data characterizing the user query to generate the transcription.
Cerra teaches wherein obtaining the transcription comprises: receiving audio data characterizing the user query; and performing speech recognition on the audio data characterizing the user query to generate the transcription ([Summary] [Claims] receiving a recording of speech at a speech recognition facility; generating results utilizing the speech recognition facility from that recording (producing recognized word results (transcription))).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the combined teachings of Byrne, Mazumder, and Shires with the teachings of Cerra by modifying the end-to-end, transaction-based dialog system of Byrne (TicketTalk) to include wherein obtaining the transcription comprises: receiving audio data characterizing the user query; and performing speech recognition on the audio data characterizing the user query to generate the transcription, as taught by Cerra, for the benefit of providing real-time speech recognition (Cerra [col. 1 line 55]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHREYANS A PATEL whose telephone number is (571)270-0689. The examiner can normally be reached Monday-Friday 8am-5pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
SHREYANS A. PATEL
Primary Examiner
Art Unit 2653
/SHREYANS A PATEL/ Examiner, Art Unit 2659