DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This Office action is in response to the amendment filed on 01/20/2026. Claims 1-20 remain pending in the application. Claims 1, 13, and 20 are independent.
Claim Objections
Applicant's amendment to the claims corrects the previous objections; therefore, the previous objections are withdrawn.
Claim Rejections - 35 USC § 112
Applicant's amendment to the claims does not correct the previous rejections; therefore, the rejections are maintained as set forth below.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1, 13, and 20 recite the limitation "... identifying/identify one or more missing entity arguments from at least a portion of the one or more partial query signals; updating/update at least a portion of the one or more missing entity arguments by inferring data from at least a portion of the one or more partial query signals using ..." in lines 6-9, 8-11, and 8-11, respectively, which renders these claims indefinite because it is unclear whether the two instances of "at least a portion of the one or more partial query signals" are the same or different. Clarification is required.
Claims 2-12 and 14-19 are rejected for fully incorporating the deficiencies of their respective base claims.
Claims 2-3 and 14-15 recite the limitation "... wherein inferring the data from at least a portion of the one or more partial query signals using … comprises inferring the data from at least a portion of the one or more partial query signals by processing the at least a portion of the one or more partial query signals using ..." in lines 1-5, which renders these claims indefinite because "... identifying/identify one or more missing entity arguments from at least a portion of the one or more partial query signals; updating/update at least a portion of the one or more missing entity arguments by inferring data from at least a portion of the one or more partial query signals using ..." is also recited in their respective base claims, and it is unclear (a) whether these instances of "at least a portion of the one or more partial query signals" are the same or different; and (b) if the two instances of "at least a portion of the one or more partial query signals" recited here and the two instances of "at least a portion of the one or more partial query signals" recited in the respective base claims are different, which instance is referred to by "the at least a portion of the one or more partial query signals" recited here. Clarification is required.
Claims 6 and 18 recite the limitation "... wherein inferring the data from at least a portion of the one or more partial query signals using … comprises dynamically identifying one or more synonyms for one or more words within the at least a portion of the one or more partial query signals" in lines 1-4, which renders these claims indefinite because "... identifying/identify one or more missing entity arguments from at least a portion of the one or more partial query signals; updating/update at least a portion of the one or more missing entity arguments by inferring data from at least a portion of the one or more partial query signals using ..." is also recited in their respective base claims, and it is unclear (a) whether these instances of "at least a portion of the one or more partial query signals" are the same or different; and (b) if the one instance of "at least a portion of the one or more partial query signals" recited here and the two instances of "at least a portion of the one or more partial query signals" recited in the respective base claims are different, which instance is referred to by "the at least a portion of the one or more partial query signals" recited here. Clarification is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-4, 6-16, and 18-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by PAGALLO et al. (US 2017/0357632 A1, pub. date: 12/14/2017), hereinafter PAGALLO.
Independent Claims 1, 13, and 20
PAGALLO discloses a computer-implemented method comprising: detecting two or more languages in an input query to an artificial intelligence-based question answering system (PAGALLO, ¶¶ [0030]-[0033] with FIG. 1: the terms "digital assistant," "virtual assistant," "intelligent automated assistant," or "automatic digital assistant" can refer to any information processing system that interprets natural language input in spoken and/or textual form to infer user intent, and performs actions based on the inferred user intent; a digital assistant can be capable of accepting a user request at least partially in the form of a natural language command, request, statement, narrative, and/or inquiry; typically, the user request can seek either an informational answer or performance of a task by the digital assistant; a satisfactory response to the user request can be a provision of the requested informational answer, a performance of the requested task, or a combination of the two; during performance of a requested task, the digital assistant can sometimes interact with the user in a continuous dialogue involving multiple exchanges of information over an extended period of time; in addition to providing verbal responses and taking programmed actions, the digital assistant can also provide responses in other visual or audio forms, e.g., as text, alerts, music, videos, animations, etc.; a digital assistant can be implemented according to a client-server model; the digital assistant can include client-side portion 102 (hereafter "DA client 102") executed on user device 104 and server-side portion 106 (hereafter "DA server 106") executed on server system 108; one or more processing modules 114 can utilize data and models 116 to process speech input and determine the user's intent based on natural language input; one or more processing modules 114 perform task execution based on inferred user intent; DA server 106 can communicate with external services 120 through network(s) 110 
for task completion or information acquisition; ¶¶ [0075]-[0079] with FIGS. 1 and 2A: digital assistant client module 229 can be capable of accepting voice input (e.g., speech input), text input, touch input, and/or gestural input through various user interfaces (e.g., microphone 213, accelerometer(s) 268, touch-sensitive display system 212, optical sensor(s) 229, other input control devices 216, etc.) of portable multifunction device 200; user data and models 231 can include various data associated with the user (e.g., user-specific vocabulary data, user preference data, user-specified name pronunciations, data from the user's electronic address book, to-do lists, shopping lists, etc.) to provide the client-side functionalities of the digital assistant; further, user data and models 231 can includes various models (e.g., speech recognition models, statistical language models, natural language processing models, ontology, task flow models, service models, etc.) for processing user input and determining user intent; digital assistant client module 229 can utilize the various sensors, subsystems, and peripheral devices of portable multifunction device 200 to gather additional information from the surrounding environment of the portable multifunction device 200 to establish a context associated with a user, the current user interaction, and/or the current user input; digital assistant client module 229 can provide the contextual information or a subset thereof with the user input to DA server 106 to help infer the user's intent; the digital assistant can also use the contextual information to determine how to prepare and deliver outputs to the user; digital assistant client module 229 can also elicit additional input from the user via a natural language dialogue or other user interfaces upon request by DA server 106; ¶¶ [0195]-[0196] and [0205]-[0207] with FIGS. 7A-B: I/O processing module 728 can interact with the user through I/O devices 716 in FIG. 
7A or with a user device through network communications interface 708 in FIG. 7A to obtain user input (e.g., a speech input) and to provide responses (e.g., as speech outputs) to the user input; I/O processing module 728 can optionally obtain contextual information associated with the user input from the user device, along with or shortly after the receipt of the user input; I/O processing module 728 can also send follow-up questions to, and receive answers from, the user regarding the user request; ¶¶ [0237]-[0251] with FIG. 8: the multilingual language modeling system 800 serves to provide candidate words based on context information of an electronic device; the multilingual language modeling system 800 may include a plurality of language models 810, each of which may be a monolingual language model (i.e., a language model corresponding to a respective language); the multilingual language modeling system 800 includes a language identification module 815; the language identification module 815 receives context information of the current word wq, and in response, identifies a language of each word of the context information, respectively; the context information may include any number of words of a sentence preceding the current word wq (i.e., left context) and/or following the current word wq (i.e., right context); in particular, the language identification module may identify a language of each word of the context information W to generate a language identification string L; the language identification string L includes a numeric language identifier for each word of the context information W; the multilingual combination module 820 receives the monolingual probabilities P from each of the language models 810, respectively, and the language identification string L from the language identification module 815; ¶¶ [0252]-[0263]: FIG. 
9: identify (e.g., receive) context information corresponding to the messaging application; context information of the messaging application may include words of a message transcript 902, such as those included in message 904 (e.g., "Bonjour. How are you?"), provided to the recipient 915 (e.g., Eliza – a digital assistant) and message 906 (i.e., response message; e.g., "Good. How are you?"), received from the recipient 915, wherein "Bonjour" is a French word and the remaining words are English in an input query; ¶¶ [0264]-[0270] with FIGS. 10A-D: assign a language to the keyboard 1010 based on context information; identify a most probable language of a current input (e.g., current word) and assign the identified language (e.g., English) to the keyboard 1010; identifying the most probable language of a current input may include providing one or more candidate words, identifying a language of the candidate word having the highest weighted probability, and assigning the identified language to the keyboard 1010; during operation of an application, determine that a language of a current input is most likely a different language than the assigned language of keyboard 1010; as a result, assign different language (e.g., French) to keyboard 1010; assigning a language in this manner may cause a keyboard layout of keyboard 1010 to switch, e.g., from an English keyboard layout (FIG. 10A) to a French keyboard layout (FIG. 10B); a keyboard of the electronic device 900 in FIGS. 
10C-D is a multilingual keyboard, where the first language of the plurality of languages is assigned as a primary language of the multilingual keyboard 1030 and the second language of the plurality of languages is assigned as a secondary language of the multilingual keyboard 1030; identify a most likely language of a current input and assign the identified language as a primary language of the keyboard 1030; further, identify a second most likely language of a current input and assign the identified language as a secondary language of the keyboard 1030; because keyboard 1030 is associated with multiple languages, keyboard 1030 may simultaneously display (e.g., overlay) multiple keyboard layouts; e.g., as illustrated in FIG. 10C, a keyboard layout for the primary language of keyboard 1030, English, is displayed using relatively large characters, and a keyboard layout for the secondary language of the keyboard 1030, French, is displayed using relatively small characters; with reference to FIG. 10D, French may be assigned as the primary language and a French keyboard layout may be displayed using relatively large characters, and English may be assigned as a secondary language and an English keyboard layout may be displayed using relatively small characters);
determining, in the input query, one or more partial query signals associated with each of the two or more languages (PAGALLO, ¶¶ [0208]-[0226] with FIGS. 7B-C: STT processing module 730 can include one or more automatic speech recognition (ASR) systems which can process the speech input that is received through I/O processing module 728 to produce a recognition result; the front-end speech pre-processor can extract representative features from the speech input; each ASR system can include one or more speech recognition models (e.g., acoustic models and/or language models) and can implement one or more speech recognition engines; the one or more speech recognition models and the one or more speech recognition engines can be used to process the extracted representative features of the front-end speech pre-processor to produce intermediate recognitions results ( e.g., phonemes, phonemic strings, and sub-words), and ultimately, text recognition results ( e.g., words, word strings, or sequence of tokens); once STT processing module 730 produces recognition results containing a text string (e.g., words, or sequence of words, or sequence of tokens), the recognition result can be passed to natural language processing module 732 for intent deduction; STT processing module 730 can include and/or access a vocabulary of recognizable words via phonetic alphabet conversion module 731; each vocabulary word can be associated with one or more candidate pronunciations of the word represented in a speech recognition phonetic alphabet; when a speech input is received, STT processing module 730 can be used to determine the phonemes corresponding to the speech input (e.g., using an acoustic model), and then attempt to determine words that match the phonemes (e.g., using a language model); Natural language processing module 732 ("natural language processor") of the digital assistant can take the sequence of words or tokens ("token sequence") generated by STT processing module 730, and 
attempt to associate the token sequence with one or more "actionable intents" recognized by the digital assistant; an "actionable intent" can represent a task that can be performed by the digital assistant, and can have an associated task flow implemented in task flow models 754; the effectiveness of the digital assistant can be dependent on the assistant's ability to infer the correct "actionable intent(s)" from the user request expressed in natural language; in addition to the sequence of words or tokens obtained from STT processing module 730, natural language processing module 732 can also receive contextual information associated with the user request, e.g., from I/O processing module 728; the natural language processing module 732 can optionally use the contextual information to clarify, supplement, and/or further define the information contained in the token sequence received from STT processing module 730; the natural language processing can be based on, e.g., ontology 760 which can be a hierarchical structure containing many nodes, each node representing either an "actionable intent" or a "property" relevant to one or more of the "actionable intents" or other "properties"; an "actionable intent" can represent a task that the digital assistant is capable of performing, i.e., it is "actionable" or can be acted on; a "property" can represent a parameter associated with an actionable intent or a sub-aspect of another property; a linkage between an actionable intent node and a property node in ontology 760 can define how a parameter represented by the property node pertains to the task represented by the actionable intent node; ontology 760 can be made up of actionable intent nodes (e.g., "restaurant reservation" and "set reminder") and property nodes (e.g., "restaurant", "date/time," and "party size" ); within ontology 760, each actionable intent node can be linked to one or more property nodes either directly or through one or more intermediate property 
nodes; similarly, each property node can be linked to one or more actionable intent nodes either directly or through one or more intermediate property nodes; an actionable intent node, along with its linked concept nodes, can be described as a "domain" (e.g., restaurant reservation domain 762 and reminder domain 764); nodes associated with multiple related actionable intents can be clustered under a "super domain" in ontology 760; each node in ontology 760 can be associated with a set of words and/or phrases that are relevant to the property or actionable intent represented by the node; the respective set of words and/or phrases associated with each node can be the so-called "vocabulary" associated with the node; the respective set of words and/or phrases associated with each node can be stored in vocabulary index 744 in association with the property or actionable intent represented by the node; the vocabulary index 744 can optionally include words and phrases in different languages; Natural language processing module 732 can receive the token sequence (e.g., a text string) from STT processing module 730, and determine what nodes are implicated by the words in the token sequence; if a word or phrase in the token sequence is found to be associated with one or more nodes in ontology 760 (via vocabulary index 744), the word or phrase can "trigger" or "activate" those nodes; based on the quantity and/or relative importance of the activated nodes, natural language processing module 732 can select one of the actionable intents as the task that the user intended the digital assistant to perform; the domain that has the most "triggered" nodes can be selected; the domain having the highest confidence value (e.g., based on the relative importance of its various triggered nodes) can be selected; additional factors are considered in selecting the node as well, such as whether the digital assistant has previously correctly interpreted a similar request from a user; natural
language processing module 732 can use the user-specific information to supplement the information contained in the user input to further define the user intent; ¶¶ [0237]-[0251] with FIG. 8: candidate words are word predictions for a current word and in some examples, may be auto-completion and/or auto-correction word predictions; providing candidate words includes determining monolingual probabilities, i.e., determining words using a monolingual language model, and adjusting (e.g., weighting) each of the monolingual probabilities to provide multilingual probabilities; monolingual probabilities are adjusted based on the context information of the electronic device; candidate words are thereafter provided based on the multilingual probabilities; each of the language models 810 may identify (e.g., receive) context information; based on the context information, each language model 810 provides a respective set of monolingual probabilities; probabilities provided are determined ( e.g., generated) according to any known text prediction methodologies, including but not limited to, n-gram (e.g., word n-gram and/or character n-gram) language model word prediction and continuous space language model word prediction; based on each of the monolingual probabilities P and the language weights L, the multilingual combination module 820 provides a set of multilingual probabilities J; because determining multilingual probabilities J may be computationally demanding, the expression for determining multilingual probabilities J may be simplified by way of approximation; as a result of this approximation, each set of monolingual probabilities P no longer depends on languages specified by language identification string L, and the multilingual probabilities J may be represented as a weighted form of monolingual probabilities P; the approximation of the multilingual probabilities J represents a mixture of the monolingual probabilities weighted by the relative importance of the current 
word wq for each language; once the multilingual combination module 820 has provided (e.g., generated) multilingual probabilities J, the multilingual combination module 820 may provide one or more candidate words (e.g., word predictions) based on the multilingual probabilities J; the multilingual combination module 820 may identify one or more words corresponding to the highest probabilities of the multilingual probabilities and provide the identified words as candidate words; candidate words may be provided during use of any application allowing for text entry);
identifying one or more missing entity arguments from at least a portion of the one or more partial query signals; updating at least a portion of the one or more missing entity arguments by inferring data from at least a portion of the one or more partial query signals using at least one artificial intelligence technique (PAGALLO, ¶¶ [0227]-[0229] with FIGS. 7B-C: once natural language processing module 732 identifies an actionable intent (or domain) based on the user request, natural language processing module 732 can generate a structured query to represent the identified actionable intent; the structured query can include parameters for one or more nodes within the domain for the actionable intent, and at least some of the parameters are populated with the specific information and requirements specified in the user request; e.g., the user may say "Make me a dinner reservation at a sushi place at 7"; according to the ontology, a structured query for a "restaurant reservation" domain may include parameters such as {Cuisine}, {Time}, {Date}, {Party Size}, and the like; based on the speech input and the text derived from the speech input using STT processing module 730, natural language processing module 732 can generate a partial structured query for the restaurant reservation domain, where the partial structured query includes the parameters {Cuisine="Sushi"} and {Time="7 pm"}; however, in this example, the user's utterance contains insufficient information to complete the structured query associated with the domain; therefore, other necessary parameters such as {Party Size} and {Date} may not be specified in the structured query based on the information currently available; natural language processing module 732 can populate some parameters of the structured query with received contextual information; e.g., if the user requested a sushi restaurant "near me"; natural language processing module 732 can populate a {location} parameter in the structured query with 
GPS coordinates from the user device; natural language processing module 732 can pass the generated structured query (including any completed parameters) to task flow processing module 736 ("task flow processor"); task flow processing module 736 can be configured to receive the structured query from natural language processing module 732, complete the structured query, if necessary, and perform the actions required to "complete" the user's ultimate request; task flow models 754 can include procedures for obtaining additional information from the user and task flows for performing actions associated with the actionable intent; in order to complete a structured query, task flow processing module 736 may need to initiate additional dialogue with the user in order to obtain additional information, and/or disambiguate potentially ambiguous utterances; when such interactions are necessary, task flow processing module 736 can invoke dialogue flow processing module 734 to engage in a dialogue with the user; dialogue flow processing module 734 can determine how (and/or when) to ask the user for the additional information and receives and processes the user responses; the questions can be provided to and answers can be received from the users through I/O processing module 728; dialogue flow processing module 734 can present dialogue output to the user via audio and/or visual output, and receives input from the user via spoken or physical (e.g., clicking) responses; when task flow processing module 736 invokes dialogue flow processing module 734 to determine the "party size" and "date" information for the structured query associated with the domain "restaurant reservation," dialogue flow processing module 734 can generate questions such as "For how many people?" and "On which day?" 
to pass to the user; once answers are received from the user, dialogue flow processing module 734 can then populate the structured query with the missing information, or pass the information to task flow processing module 736 to complete the missing information from the structured query; i.e., updating at least a portion of the one or more missing entity arguments (e.g., "location" and "party size") by inferring data from at least a portion of the one or more partial query signals (e.g., "location" is inferred from "a sushi place near me" using contextual information and "party size" is inferred from additional dialogue with the user) using at least one artificial intelligence technique (e.g., "Natural language processing module" 732, "Ontology" 760 (see also ¶¶ [0030] and [0216]-[0225] described in the previous limitations), "Task Flow Processing Module" 736 and "Dialogue Flow Processing Module" 734 to identify intent, generate/construct "a partial structured query" (see arrow between "natural language processing module" 732 and "task flow processing module" 736 in FIG. 7B), and derive/request missing parameters to complete the "structured query")); and
performing one or more automated actions based at least in part on the updating of the at least a portion of the one or more missing entity arguments (PAGALLO, ¶¶ [0230]-[0235] with FIG. 7B: once task flow processing module 736 has completed the structured query for an actionable intent, task flow processing module 736 can proceed to perform the ultimate task associated with the actionable intent; task flow processing module 736 can execute the steps and instructions in the task flow model according to the specific parameters contained in the structured query; e.g., using a structured query such as: { restaurant reservation, restaurant=ABC Cafe, date=3/12/2012, time=7 pm, party size=5}, task flow processing module 736 can perform the steps of: (1) logging onto a server of the ABC Cafe or a restaurant reservation system such as OPENTABLE®, (2) entering the date, time, and party size information in a form on the website, (3) submitting the form, and (4) making a calendar entry for the reservation in the user's calendar; task flow processing module 736 can employ the assistance of service processing module 738 ("service processing module") to complete a task requested in the user input or to provide an informational answer requested in the user input; e.g., service processing module 738 can act on behalf of task flow processing module 736 to make a phone call, set a calendar entry, invoke a map search, invoke or interact with other user applications installed on the user device, and invoke or interact with third-party services (e.g., a restaurant reservation portal, a social networking website, a banking portal, etc.); the protocols and application programming interfaces (API) required by each service can be specified by a respective service model among service models 756; service processing module 738 can access the appropriate service model for a service and generate requests for the service in accordance with the protocols and APIs required by the service according 
to the service model; natural language processing module 732, dialogue flow processing module 734, and task flow processing module 736 can be used collectively and iteratively to infer and define the user's intent, obtain information to further clarify and refine the user intent, and finally generate a response (i.e., an output to the user, or the completion of a task) to fulfill the user's intent; the generated response can be a dialogue response to the speech input that at least partially fulfills the user's intent; further, the generated response can be output as a speech output, and in these examples, the generated response can be sent to speech synthesis module 740 (e.g., speech synthesizer) where it can be processed to synthesize the dialogue response in speech form; the generated response can be data content relevant to satisfying a user request in the speech input), wherein performing the one or more automated actions comprises generating at least one backend query to be processed by the artificial intelligence-based question answering system by integrating the input query and the at least a portion of the one or more missing entity arguments updated with the inferred data (PAGALLO, ¶¶ [0032]-[0039], [0195], and [0205]-[0207] with FIGS. 
1 and 7A-B: a digital assistant can be implemented according to a client-server model; the digital assistant can include client-side portion 102 (hereafter "DA client 102") executed on user device 104 and server-side portion 106 (hereafter "DA server 106") executed on server system 108; DA client 102 can provide client-side functionalities such as user-facing input and output processing and communication with DA server 106; DA server 106 can include client-facing I/O interface 112, one or more processing modules 114, data and models 116, and I/O interface to external services 118; the client-facing I/O interface 112 can facilitate the client-facing input and output processing for DA server 106; one or more processing modules 114 can utilize data and models 116 to process speech input and determine the user's intent based on natural language input; the divisions of functionalities between the client and server portions of the digital assistant can vary in different implementations; e.g., the DA client can be a thin-client that provides only user-facing input and output processing functions, and delegates all other functionalities of the digital assistant to a backend server; i.e., server system 108 (and/or DA server 106) is a backend server including one or more processing modules 114 to process speech input and determine the user's intent based on natural language input; digital assistant system 700 can be implemented on a standalone computer system; digital assistant system 700 can be distributed across multiple computers; some of the modules and functions of the digital assistant can be divided into a server portion and a client portion, where the client portion resides on one or more user devices (e.g., devices 104, 122, 200, 400, or 600) and communicates with the server portion (e.g., server system 108) through one or more networks, e.g., as shown in FIG. 
1; digital assistant system 700 can be an implementation of server system 108 (and/or DA server 106) shown in FIG. 1; memory 702 can also store digital assistant module 726 (or the server portion of a digital assistant), which can include input/output processing module 728, speech-to-text (STT) processing module 730, natural language processing module 732, dialogue flow processing module 734, task flow processing module 736, service processing module 738, and speech synthesis module 740; each of these modules can have access to one or more of the following systems or data and models of the digital assistant module 726: ontology 760, vocabulary index 744, user data 748, task flow models 754, service models 756, and ASR systems; using the processing modules, data, and models implemented in digital assistant module 726, the digital assistant can perform at least some of the following: (a) converting speech input into text; (b) identifying a user's intent expressed in a natural language input received from the user; (c) actively eliciting and obtaining information needed to fully infer the user's intent (e.g., by disambiguating words, names, intentions, etc.); (d) determining the task flow for fulfilling the inferred intent; and (e) executing the task flow to fulfill the inferred intent; I/O processing module 728 can interact with the user through I/O devices 716 in FIG. 7A or with a user device (e.g., devices 104, 200, 400, or 600) through network communications interface 708 in FIG. 7A to obtain user input (e.g., a speech input) and to provide responses (e.g., as speech outputs) to the user input. 
I/O processing module 728 can optionally obtain contextual information associated with the user input from the user device, along with or shortly after the receipt of the user input; I/O processing module 728 can also send follow-up questions to, and receive answers from, the user; i.e., digital assistant system 700 is equivalent to a backend server (e.g., server system 108 (and/or DA server 106)), wherein I/O processing module is equivalent to the client-facing I/O interface 112 and rest of modules (e.g., 730, 731, 732, 734, 736, 738, and 740) in FIG. 7B are equivalent to one or more processing modules 114; ¶¶ [0030] and [0216]-[0235] with FIGS. 7B-C: the terms "digital assistant," "virtual assistant," "intelligent automated assistant," or "automatic digital assistant" can refer to any information processing system that interprets natural language input in spoken and/or textual form to infer user intent, and performs actions based on the inferred user intent; the natural language processing can be based on, e.g., ontology 760 which can be a hierarchical structure containing many nodes, each node representing either an "actionable intent" or a "property" relevant to one or more of the "actionable intents" or other "properties"; an "actionable intent" can represent a task that the digital assistant is capable of performing, i.e., it is "actionable" or can be acted on; a "property" can represent a parameter associated with an actionable intent or a sub-aspect of another property; a linkage between an actionable intent node and a property node in ontology 760 can define how a parameter represented by the property node pertains to the task represented by the actionable intent node; ontology 760 can be made up of actionable intent nodes and property nodes; within ontology 760, each actionable intent node can be linked to one or more property nodes either directly or through one or more intermediate property nodes; similarly, each property node can be linked to one or 
more actionable intent nodes either directly or through one or more intermediate property nodes; e.g., ontology 760 can include a "restaurant reservation" node (i.e., an actionable intent node); property nodes "restaurant," "date/time" (for the reservation), and "party size" can each be directly linked to the actionable intent node (i.e., the "restaurant reservation" node); in addition, property nodes "cuisine," "price range," "phone number," and "location" can be sub-nodes of the property node "restaurant," and can each be linked to the "restaurant reservation" node (i.e., the actionable intent node) through the intermediate property node "restaurant"; ontology 760 can also include a "set reminder" node (i.e., another actionable intent node); property nodes "date/time" (for setting the reminder) and "subject" (for the reminder) can each be linked to the "set reminder" node; since the property "date/time" can be relevant to both the task of making a restaurant reservation and the task of setting a reminder, the property node "date/time" can be linked to both the "restaurant reservation" node and the "set reminder" node in ontology 760; while FIG. 
7C illustrates two example domains within ontology 760, other domains can include, e.g., "find a movie," "initiate a phone call," "find directions," "schedule a meeting," "send a message," "provide an answer to a question," "read a list," "provide navigation instructions," "provide instructions for a task," and so on; nodes associated with multiple related actionable intents can be clustered under a "super domain" in ontology 760; e.g., a "travel" super-domain can include a cluster of property nodes and actionable intent nodes related to travel; the actionable intent nodes related to travel can include "airline reservation," "hotel reservation," "car rental," "get directions," "find points of interest," and so on; each node in ontology 760 can be associated with a set of words and/or phrases that are relevant to the property or actionable intent represented by the node; based on the quantity and/or relative importance of the activated nodes, natural language processing module 732 can select one of the actionable intents as the task that the user intended the digital assistant to perform; once natural language processing module 732 identifies an actionable intent (or domain) based on the user request, natural language processing module 732 can generate a structured query to represent the identified actionable intent; the structured query can include parameters for one or more nodes within the domain for the actionable intent, and at least some of the parameters are populated with the specific information and requirements specified in the user request; e.g., the user may say "Make me a dinner reservation at a sushi place at 7"; according to the ontology, a structured query for a "restaurant reservation" domain may include parameters such as {Cuisine}, {Time}, {Date}, {Party Size}, and the like; based on the speech input and the text derived from the speech input using STT processing module 730, natural language processing module 732 can generate a partial 
structured query for the restaurant reservation domain, where the partial structured query includes the parameters {Cuisine="Sushi"} and {Time="7 pm"}; however, in this example, the user's utterance contains insufficient information to complete the structured query associated with the domain; therefore, other necessary parameters such as {Party Size} and {Date} may not be specified in the structured query based on the information currently available; natural language processing module 732 can populate some parameters of the structured query with received contextual information; e.g., if the user requested a sushi restaurant "near me"; natural language processing module 732 can populate a {location} parameter in the structured query with GPS coordinates from the user device; natural language processing module 732 can pass the generated structured query (including any completed parameters) to task flow processing module 736 ("task flow processor"); task flow processing module 736 can be configured to receive the structured query from natural language processing module 732, complete the structured query, if necessary, and perform the actions required to "complete" the user's ultimate request; task flow models 754 can include procedures for obtaining additional information from the user and task flows for performing actions associated with the actionable intent; in order to complete a structured query, task flow processing module 736 may need to initiate additional dialogue with the user in order to obtain additional information, and/or disambiguate potentially ambiguous utterances; when such interactions are necessary, task flow processing module 736 can invoke dialogue flow processing module 734 to engage in a dialogue with the user; dialogue flow processing module 734 can determine how (and/or when) to ask the user for the 
additional information and receives and processes the user responses; the questions can be provided to and answers can be received from the users through I/O processing module 728; dialogue flow processing module 734 can present dialogue output to the user via audio and/or visual output, and receives input from the user via spoken or physical (e.g., clicking) responses; when task flow processing module 736 invokes dialogue flow processing module 734 to determine the "party size" and "date" information for the structured query associated with the domain "restaurant reservation," dialogue flow processing module 734 can generate questions such as "For how many people?" and "On which day?" to pass to the user; once answers are received from the user, dialogue flow processing module 734 can then populate the structured query with the missing information, or pass the information to task flow processing module 736 to complete the missing information from the structured query; once task flow processing module 736 has completed the structured query for an actionable intent, task flow processing module 736 can proceed to perform the ultimate task associated with the actionable intent; task flow processing module 736 can execute the steps and instructions in the task flow model according to the specific parameters contained in the structured query; e.g., using a structured query such as: { restaurant reservation, restaurant=ABC Cafe, date=3/12/2012, time=7 pm, party size=5}, task flow processing module 736 can perform the steps of: (1) logging onto a server of the ABC Cafe or a restaurant reservation system such as OPENTABLE®, (2) entering the date, time, and party size information in a form on the website, (3) submitting the form, and (4) making a calendar entry for the reservation in the user's calendar; task flow processing module 736 can employ the assistance of service processing module 738 ("service processing module") to complete a task requested in the user input or 
to provide an informational answer requested in the user input; e.g., service processing module 738 can act on behalf of task flow processing module 736 to make a phone call, set a calendar entry, invoke a map search, invoke or interact with other user applications installed on the user device, and invoke or interact with third-party services (e.g., a restaurant reservation portal, a social networking website, a banking portal, etc.); the protocols and application programming interfaces (API) required by each service can be specified by a respective service model among service models 756; service processing module 738 can access the appropriate service model for a service and generate requests for the service in accordance with the protocols and APIs required by the service according to the service model; natural language processing module 732, dialogue flow processing module 734, and task flow processing module 736 can be used collectively and iteratively to infer and define the user's intent, obtain information to further clarify and refine the user intent, and finally generate a response (i.e., an output to the user, or the completion of a task) to fulfill the user's intent; the generated response can be a dialogue response to the speech input that at least partially fulfills the user's intent; further, the generated response can be output as a speech output, and in these examples, the generated response can be sent to speech synthesis module 740 (e.g., speech synthesizer) where it can be processed to synthesize the dialogue response in speech form; the generated response can be data content relevant to satisfying a user request in the speech input; i.e., generating at least one backend query (e.g., "a structured query" initially generated by "Natural Language processing Module" 732 and completed by "Task Flow Processing Module" 736 of the backend server 106/108/700 as in FIGS. 
1 and 7A-B) to be processed by the artificial intelligence-based question answering system (e.g., "Task Flow Processing Module" 736 and "Service Processing Module" 738 of the "digital assistant"/"intelligent automated assistant" 700) by integrating the input query (e.g., "Make me a dinner reservation at a sushi place at 7") and the at least a portion of the one or more missing entity arguments (e.g., "location" and "party size") updated with the inferred data (e.g., "location" is inferred from "a sushi place near me" using contextual information and "party size" is inferred from additional dialogue with the user))
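The slot-filling flow cited above (a partial structured query generated from the utterance, missing parameters inferred from context or elicited by follow-up dialogue) can be sketched as follows. This is purely an illustrative sketch by the examiner; all names and values are hypothetical and do not appear in PAGALLO:

```python
# Hypothetical sketch of the structured-query completion flow described in
# PAGALLO ¶¶ [0225]-[0229]; identifiers and values are illustrative only.

REQUIRED_PARAMS = {"restaurant reservation": ["cuisine", "time", "date", "party_size"]}

def build_partial_query(domain, parsed_slots, context):
    """Populate a structured query from the utterance, then from context."""
    query = {"domain": domain, **parsed_slots}
    # Contextual inference: e.g., "near me" resolved to device GPS coordinates.
    if context.get("gps") and "location" not in query:
        query["location"] = context["gps"]
    return query

def missing_params(query):
    """Parameters the dialogue flow module would still need to elicit."""
    return [p for p in REQUIRED_PARAMS[query["domain"]] if p not in query]

# "Make me a dinner reservation at a sushi place at 7" near the user:
q = build_partial_query(
    "restaurant reservation",
    {"cuisine": "Sushi", "time": "7 pm"},
    {"gps": (37.33, -122.03)},
)
print(missing_params(q))  # → ['date', 'party_size']
```

In this sketch, `missing_params` plays the role attributed to task flow processing module 736 invoking dialogue flow processing module 734: the remaining {Date} and {Party Size} slots would be filled through follow-up questions such as "For how many people?" and "On which day?".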
wherein the method is carried out by at least one computing device (PAGALLO, ¶¶ [0032]-[0033] with FIG. 1: a digital assistant can be implemented according to a client-server model; the digital assistant can include client-side portion 102 (hereafter "DA client 102") executed on user device 104 and server-side portion 106 (hereafter "DA server 106") executed on server system 108; one or more processing modules 114 can utilize data and models 116 to process speech input and determine the user's intent based on natural language input; one or more processing modules 114 perform task execution based on inferred user intent; DA server 106 can communicate with external services 120 through network(s) 110 for task completion or information acquisition; ¶ [0075] with FIG. 2A: digital assistant client module 229 can be capable of accepting voice input (e.g., speech input), text input, touch input, and/or gestural input through various user interfaces (e.g., microphone 213, accelerometer(s) 268, touch-sensitive display system 212, optical sensor(s) 229, other input control devices 216, etc.) of portable multifunction device 200; digital assistant client module 229 can also be capable of providing output in audio (e.g., speech output), visual, and/or tactile forms through various output interfaces (e.g., speaker 211, touch-sensitive display system 212, tactile output generator(s) 267, etc.) of portable multifunction device 200; ¶¶ [0195]-[0196] and [0205]-[0206] with FIGS. 
7A-7B: digital assistant system 700 can be distributed across multiple computers, which can include memory 702, one or more processors 704, input/output (I/O) interface 706, and network communications interface 708; memory 702 can also store digital assistant module 726 which can include the following sub-modules, or a subset or superset thereof: input/output processing module 728, speech-to-text (STT) processing module 730, natural language processing module 732, dialogue flow processing module 734, task flow processing module 736, service processing module 738, and speech synthesis module 740; the digital assistant can perform at least some of the following: converting speech input into text; identifying a user's intent expressed in a natural language input received from the user; actively eliciting and obtaining information needed to fully infer the user's intent (e.g., by disambiguating words, names, intentions, etc.); determining the task flow for fulfilling the inferred intent; and executing the task flow to fulfill the inferred intent; ¶ [0237] with FIGS. 8, 1, 2A, 4, 6A-6B, 9, 10A-D, and 12: the multilingual language modeling system 800 may be implemented using any device, including any of devices 104, 200, 400, 600, 900, or 1200).
PAGALLO further discloses a computer program product comprising a computer readable storage medium (PAGALLO, ¶¶ [0040] and [0044] with 202 in FIG. 2A: memory 202 may include one or more computer-readable storage mediums; ¶¶ [0196]-[0197] with 702 in FIG. 7A: memory 702 can include a non-transitory computer-readable medium) having program instructions embodied therewith (PAGALLO, ¶ [0045] with FIGS. 1 and 2A: a non-transitory computer-readable storage medium of memory 202 can be used to store instructions; the instructions can be stored on a non-transitory computer-readable storage medium of the server system 108 or can be divided between the non-transitory computer-readable storage medium of memory 202 and the non-transitory computer-readable storage medium of server system 108; ¶¶ [0200] and [0205] with FIG. 7A: memory 702, or the computer readable storage media of memory 702, can store programs, modules (e.g., input/output processing module 728, speech-to-text (STT) processing module 730, natural language processing module 732, dialogue flow processing module 734, task flow processing module 736, service processing module 738, and speech synthesis module 740), instructions, and data structures), the program instructions executable by a computing device (PAGALLO, ¶ [0032] with 104 and 108 in FIG. 1: user device 104 and server system 108; ¶¶ [0039]-[0040] with 200 in FIG. 2: the functions of a digital assistant can be implemented as a standalone application installed on a user device (e.g., portable multifunction device 200); ¶ [0196] with 700 in FIG. 7A: digital assistant system 700) to cause the computing device to perform the method described above (PAGALLO, ¶ [0046] with FIG. 2A: the one or more processors 220 run or execute various software programs and/or sets of instructions stored in memory 202 to perform various functions for device 200 and to process data; ¶¶ [0200] and [0204] with FIGS. 
7A and 11: memory 702, or the computer-readable storage media of memory 702, can store instructions for performing process 1100; one or more processors 704 can execute these programs, modules, and instructions, and reads/writes from/to the data structures; applications 724 can include programs and/or modules that are configured to be executed by one or more processors 704).
PAGALLO also discloses a system (PAGALLO, ¶ [0032] with 104 and 108 in FIG. 1: user device 104 and server system 108; ¶¶ [0039]-[0040] with 200 in FIG. 2: the functions of a digital assistant can be implemented as a standalone application installed on a user device (e.g., portable multifunction device 200); ¶ [0196] with 700 in FIG. 7A: digital assistant system 700) comprising: a memory (PAGALLO, ¶¶ [0040] and [0044] with 202 in FIG. 2A: memory 202 may include one or more computer-readable storage mediums; ¶¶ [0196]-[0197] with 702 in FIG. 7A: memory 702 can include a non-transitory computer-readable medium) configured to store program instructions (PAGALLO, ¶ [0045] with FIGS. 1 and 2A: a non-transitory computer-readable storage medium of memory 202 can be used to store instructions; the instructions can be stored on a non-transitory computer-readable storage medium of the server system 108 or can be divided between the non-transitory computer-readable storage medium of memory 202 and the non-transitory computer-readable storage medium of server system 108; ¶¶ [0200] and [0205] with FIG. 7A: memory 702, or the computer readable storage media of memory 702, can store programs, modules (e.g., input/output processing module 728, speech-to-text (STT) processing module 730, natural language processing module 732, dialogue flow processing module 734, task flow processing module 736, service processing module 738, and speech synthesis module 740), instructions, and data structures); and a processor (PAGALLO, ¶ [0040] with 220 in FIG. 2A: one or more processing units (CPUs) 220; ¶ [0196] with 704 in FIG. 7A: one or more processors 704) operatively coupled to the memory to execute the program instructions to perform the method described above (PAGALLO, ¶ [0046] with FIG. 
2A: the one or more processors 220 run or execute various software programs and/or sets of instructions stored in memory 202 to perform various functions for device 200 and to process data; ¶¶ [0200] and [0204] with FIGS. 7A and 11: memory 702, or the computer-readable storage media of memory 702, can store instructions for performing process 1100; one or more processors 704 can execute these programs, modules, and instructions, and reads/writes from/to the data structures; applications 724 can include programs and/or modules that are configured to be executed by one or more processors 704).
Claims 2 and 14
PAGALLO discloses all the elements as stated in Claims 1 and 13 respectively and further discloses wherein inferring the data from at least a portion of the one or more partial query signals using the at least one artificial intelligence technique comprises inferring the data from at least a portion of the one or more partial query signals by processing the at least a portion of the one or more partial query signals using one or more semantic reasoning techniques (PAGALLO, ¶¶ [0075]-[0079] with FIG. 2A: user data and models 231 can include various models (e.g., speech recognition models, statistical language models, natural language processing models, ontology, task flow models, service models, etc.) for processing user input and determining user intent; digital assistant client module 229 can also elicit additional input from the user via a natural language dialogue or other user interfaces upon request by DA server 106; digital assistant client module 229 can pass the additional input to DA server 106 to help DA server 106 in intent deduction and/or fulfillment of the user's intent expressed in the user request; ¶¶ [0206]-[0226] and [0229] with FIGS. 
7A-C: identifying a user's intent expressed in a natural language input received from the user; actively eliciting and obtaining information needed to fully infer the user's intent (e.g., by disambiguating words, names, intentions, etc.); each ASR system can include one or more speech recognition models (e.g., acoustic models and/or language models) and can implement one or more speech recognition engines; examples of speech recognition models can include Hidden Markov Models, Gaussian-Mixture Models, Deep Neural Network Models, n-gram language models, and other statistical models; examples of speech recognition engines can include the dynamic time warping based engines and weighted finite-state transducers (WFST) based engines; the one or more speech recognition models and the one or more speech recognition engines can be used to process the extracted representative features of the front-end speech pre-processor to produce intermediate recognition results (e.g., phonemes, phonemic strings, and sub-words), and ultimately, text recognition results (e.g., words, word strings, or sequence of tokens); Natural language processing module 732 ("natural language processor") of the digital assistant can take the sequence of words or tokens ("token sequence") generated by STT processing module 730, and attempt to associate the token sequence with one or more "actionable intents" recognized by the digital assistant; in addition to the sequence of words or tokens obtained from STT processing module 730, natural language processing module 732 can also receive contextual information associated with the user request, e.g., from I/O processing module 728; the natural language processing module 732 can optionally use the contextual information to clarify, supplement, and/or further define the information contained in the token sequence received from STT processing module 730; the natural language processing can be based on, e.g., ontology 760 which can be a hierarchical structure 
containing many nodes, each node representing either an "actionable intent" or a "property" relevant to one or more of the "actionable intents" or other "properties"; each node in ontology 760 can be associated with a set of words and/or phrases that are relevant to the property or actionable intent represented by the node; the respective set of words and/or phrases associated with each node can be the so-called "vocabulary" associated with the node; the respective set of words and/or phrases associated with each node can be stored in vocabulary index 744 in association with the property or actionable intent represented by the node; the vocabulary index 744 can optionally include words and phrases in different languages; Natural language processing module 732 can receive the token sequence (e.g., a text string) from STT processing module 730, and determine what nodes are implicated by the words in the token sequence; if a word or phrase in the token sequence is found to be associated with one or more nodes in ontology 760 (via vocabulary index 744), the word or phrase can "trigger" or "activate" those nodes; based on the quantity and/or relative importance of the activated nodes, natural language processing module 732 can select one of the actionable intents as the task that the user intended the digital assistant to perform; the domain that has the most "triggered" nodes can be selected; the domain having the highest confidence value (e.g., based on the relative importance of its various triggered nodes) can be selected; additional factors are considered in selecting the node as well, such as whether the digital assistant has previously correctly interpreted a similar request from a user; natural language processing module 732 can use the user-specific information to supplement the information contained in the user input to further define the user intent; in order to complete a structured query, task flow processing module 736 may need to initiate additional 
dialogue with the user in order to obtain additional information, and/or disambiguate potentially ambiguous utterances; dialogue flow processing module 734 can determine how (and/or when) to ask the user for the additional information and receives and processes the user responses; ¶¶ [0237]-[0251] with FIG. 8: the multilingual language modeling system 800 serves to provide candidate words based on context information of an electronic device; candidate words are word predictions for a current word and in some examples, may be auto-completion and/or auto-correction word predictions; providing candidate words includes determining monolingual probabilities, i.e., determining words using a monolingual language model, and adjusting (e.g., weighting) each of the monolingual probabilities to provide multilingual probabilities; monolingual probabilities are adjusted based on the context information of the electronic device; candidate words are thereafter provided based on the multilingual probabilities; the multilingual language modeling system 800 may include a plurality of language models 810, each of which may be a monolingual language model (i.e., a language model corresponding to a respective language); each of the language models 810 may identify (e.g., receive) context information; context information identified in this manner may include one or more words relevant to predicting a current word wq; one or more of the language models 810 operate at a sentence-level; the context information may include any number of words of a sentence preceding the current word wq (i.e., left context) and/or following the current word wq (i.e., right context); based on the context information, each language model 810 provides a respective set of monolingual probabilities; probabilities provided in this manner are determined (e.g., generated) according to any known text prediction methodologies, including but not limited to, n-gram (e.g., word n-gram 
and/or character n-gram) language model word prediction and continuous space language model word prediction; the multilingual combination module 820 receives the monolingual probabilities P from each of the language models 810, respectively, and the language identification string L from the language identification module 815; based on each of the monolingual probabilities P and the language weights L, the multilingual combination module 820 provides a set of multilingual probabilities J; because determining multilingual probabilities J may be computationally demanding, the expression for determining multilingual probabilities J may be simplified by way of approximation; as a result of this approximation, each set of monolingual probabilities P no longer depends on languages specified by language identification string L, and the multilingual probabilities J may be represented as a weighted form of monolingual probabilities P; the approximation of the multilingual probabilities J represents a mixture of the monolingual probabilities weighted by the relative importance of the current word wq for each language; once the multilingual combination module 820 has provided (e.g., generated) multilingual probabilities J, the multilingual combination module 820 may provide one or more candidate words (e.g., word predictions) based on the multilingual probabilities J; the multilingual combination module 820 may identify one or more words corresponding to the highest probabilities of the multilingual probabilities and provide the identified words as candidate words; candidate words may be provided during use of any application allowing for text entry).
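The weighted-mixture approximation cited above (multilingual probabilities J represented as monolingual probabilities P weighted per language) can be illustrated numerically. This is an examiner's sketch only; the probabilities and weights below are made up and do not appear in PAGALLO:

```python
# Illustrative sketch of the approximation described in PAGALLO ¶¶ [0243]-[0249]:
# J(w) ≈ sum over languages of L[lang] * P[lang](w). All numbers are made up.

def multilingual_probabilities(monolingual, weights):
    """Mix per-language (monolingual) word probabilities using language weights."""
    candidates = set().union(*(p.keys() for p in monolingual.values()))
    return {
        w: sum(weights[lang] * monolingual[lang].get(w, 0.0)
               for lang in monolingual)
        for w in candidates
    }

# Monolingual probabilities P from two language models 810 (hypothetical):
P = {
    "en": {"hello": 0.6, "hola": 0.1},
    "es": {"hello": 0.1, "hola": 0.7},
}
# Language weights L from the language identification module 815 (hypothetical):
L = {"en": 0.8, "es": 0.2}

J = multilingual_probabilities(P, L)
best = max(J, key=J.get)  # candidate word with the highest multilingual probability
print(best, J[best])      # → hello 0.5
```

Here the English-dominant context weights push "hello" (0.8·0.6 + 0.2·0.1 = 0.50) above "hola" (0.8·0.1 + 0.2·0.7 = 0.22), matching the reference's description of selecting candidate words from the highest multilingual probabilities.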
Claims 3 and 15
PAGALLO discloses all the elements as stated in Claims 1 and 13 respectively and further discloses wherein inferring the data from at least a portion of the one or more partial query signals using the at least one artificial intelligence technique comprises inferring the data from at least a portion of the one or more partial query signals by processing the at least a portion of the one or more partial query signals using one or more sequence-to-sequence learning techniques (PAGALLO, ¶ [0030]: the terms "digital assistant," "virtual assistant," "intelligent automated assistant," or "automatic digital assistant" can refer to any information processing system that interprets natural language input in spoken and/or textual form to infer user intent, and performs actions based on the inferred user intent; ¶¶ [0075]-[0079] with FIG. 2A: user data and models 231 can include various models (e.g., speech recognition models, statistical language models, natural language processing models, ontology, task flow models, service models, etc.) for processing user input and determining user intent; digital assistant client module 229 can also elicit additional input from the user via a natural language dialogue or other user interfaces upon request by DA server 106; digital assistant client module 229 can pass the additional input to DA server 106 to help DA server 106 in intent deduction and/or fulfillment of the user's intent expressed in the user request; ¶¶ [0206]-[0226] and [0229] with FIGS. 
7A-C: identifying a user's intent expressed in a natural language input received from the user; actively eliciting and obtaining information needed to fully infer the user's intent (e.g., by disambiguating words, names, intentions, etc.); each ASR system can include one or more speech recognition models (e.g., acoustic models and/or language models) and can implement one or more speech recognition engines; examples of speech recognition models can include Hidden Markov Models, Gaussian-Mixture Models, Deep Neural Network Models, n-gram language models, and other statistical models; examples of speech recognition engines can include the dynamic time warping based engines and weighted finite-state transducers (WFST) based engines; the one or more speech recognition models and the one or more speech recognition engines can be used to process the extracted representative features of the front-end speech pre-processor to produce intermediate recognition results (e.g., phonemes, phonemic strings, and sub-words), and ultimately, text recognition results (e.g., words, word strings, or sequence of tokens); Natural language processing module 732 ("natural language processor") of the digital assistant can take the sequence of words or tokens ("token sequence") generated by STT processing module 730, and attempt to associate the token sequence with one or more "actionable intents" recognized by the digital assistant; in addition to the sequence of words or tokens obtained from STT processing module 730, natural language processing module 732 can also receive contextual information associated with the user request, e.g., from I/O processing module 728; the natural language processing module 732 can optionally use the contextual information to clarify, supplement, and/or further define the information contained in the token sequence received from STT processing module 730; the natural language processing can be based on, e.g., ontology 760 which can be a hierarchical structure 
containing many nodes, each node representing either an "actionable intent" or a "property" relevant to one or more of the "actionable intents" or other "properties"; each node in ontology 760 can be associated with a set of words and/or phrases that are relevant to the property or actionable intent represented by the node; the respective set of words and/or phrases associated with each node can be the so-called "vocabulary" associated with the node; the respective set of words and/or phrases associated with each node can be stored in vocabulary index 744 in association with the property or actionable intent represented by the node; the vocabulary index 744 can optionally include words and phrases in different languages; Natural language processing module 732 can receive the token sequence (e.g., a text string) from STT processing module 730, and determine what nodes are implicated by the words in the token sequence; if a word or phrase in the token sequence is found to be associated with one or more nodes in ontology 760 (via vocabulary index 744), the word or phrase can "trigger" or "activate" those nodes; based on the quantity and/or relative importance of the activated nodes, natural language processing module 732 can select one of the actionable intents as the task that the user intended the digital assistant to perform; the domain that has the most "triggered" nodes can be selected; the domain having the highest confidence value (e.g., based on the relative importance of its various triggered nodes) can be selected; additional factors are considered in selecting the node as well, such as whether the digital assistant has previously correctly interpreted a similar request from a user; natural language processing module 732 can use the user-specific information to supplement the information contained in the user input to further define the user intent; in order to complete a structured query, task flow processing module 736 may need to initiate additional 
dialogue with the user in order to obtain additional information, and/or disambiguate potentially ambiguous utterances; dialogue flow processing module 734 can determine how (and/or when) to ask the user for the additional information and receives and processes the user responses; ¶¶ [0237]-[0251] with FIG. 8: the multilingual language modeling system 800 serves to provide candidate words based on context information of an electronic device; candidate words are word predictions for a current word and, in some examples, may be auto-completion and/or auto-correction word predictions; providing candidate words includes determining monolingual probabilities, i.e., determining words using a monolingual language model, and adjusting (e.g., weighting) each of the monolingual probabilities to provide multilingual probabilities; monolingual probabilities are adjusted based on the context information of the electronic device; candidate words are thereafter provided based on the multilingual probabilities; the multilingual language modeling system 800 may include a plurality of language models 810, each of which may be a monolingual language model (i.e., a language model corresponding to a respective language); each of the language models 810 may identify (e.g., receive) context information; context information identified in this manner may include one or more words relevant to predicting a current word wq; one or more of the language models 810 operate at a sentence level; the context information may include any number of words of a sentence preceding the current word wq (i.e., left context) and/or following the current word wq (i.e., right context); based on the context information, each language model 810 provides a respective set of monolingual probabilities; probabilities provided in this manner are determined (e.g., generated) according to any known text prediction methodologies, including but not limited to, n-gram (e.g., word n-gram 
and/or character n-gram) language model word prediction and continuous space language model word prediction; the multilingual combination module 820 receives the monolingual probabilities P from each of the language models 810, respectively, and the language identification string L from the language identification module 815; based on each of the monolingual probabilities P and the language weights L, the multilingual combination module 820 provides a set of multilingual probabilities J; because determining multilingual probabilities J may be computationally demanding, the expression for determining multilingual probabilities J may be simplified by way of approximation; as a result of this approximation, each set of monolingual probabilities P no longer depends on languages specified by language identification string L, and the multilingual probabilities J may be represented as a weighted form of monolingual probabilities P; the approximation of the multilingual probabilities J represents a mixture of the monolingual probabilities weighted by the relative importance of the current word wq for each language; once the multilingual combination module 820 has provided (e.g., generated) multilingual probabilities J, the multilingual combination module 820 may provide one or more candidate words (e.g., word predictions) based on the multilingual probabilities J; the multilingual combination module 820 may identify one or more words corresponding to the highest probabilities of the multilingual probabilities and provide the identified words as candidate words; candidate words may be provided during use of any application allowing for text entry).
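For illustration, the weighted-mixture approximation in the passage above (multilingual probabilities J formed as monolingual probabilities P weighted by per-language weights, with the highest-scoring words returned as candidates) can be sketched as follows. This is an editorial sketch; all identifiers are hypothetical and do not appear in PAGALLO:

```python
def multilingual_candidates(monolingual_probs, language_weights, k=3):
    """Combine per-language word probabilities into multilingual scores.

    monolingual_probs: dict of language -> {word: P(word | context)}
    language_weights:  dict of language -> relative importance weight
    Returns the k words with the highest combined probability J.
    """
    combined = {}
    for lang, probs in monolingual_probs.items():
        weight = language_weights.get(lang, 0.0)
        for word, p in probs.items():
            # J(word) is approximated as a weighted sum over languages
            combined[word] = combined.get(word, 0.0) + weight * p
    return sorted(combined, key=combined.get, reverse=True)[:k]

# e.g., a bilingual user typing after a Spanish-leaning context
candidates = multilingual_candidates(
    {"en": {"hello": 0.6, "hola": 0.1},
     "es": {"hola": 0.7, "hello": 0.05}},
    {"en": 0.5, "es": 0.5},
)
```

Under equal language weights, "hola" outranks "hello" here because its combined probability across the two monolingual models is higher, mirroring the mixture weighted by per-language relevance that the citation describes.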
Claims 4 and 16
PAGALLO discloses all the elements as stated in Claims 1 and 13 respectively and further discloses wherein performing the one or more automated actions comprises generating an answer to the input query using the artificial intelligence-based question answering system (PAGALLO, ¶ [0031]: a digital assistant can be capable of accepting a user request at least partially in the form of a natural language command, request, statement, narrative, and/or inquiry; typically, the user request can seek either an informational answer or performance of a task by the digital assistant; a satisfactory response to the user request can be a provision of the requested informational answer, a performance of the requested task, or a combination of the two; during performance of a requested task, the digital assistant can sometimes interact with the user in a continuous dialogue involving multiple exchanges of information over an extended period of time; in addition to providing verbal responses and taking programmed actions, the digital assistant can also provide responses in other visual or audio forms, e.g., as text, alerts, music, videos, animations, etc.; ¶ [0207] with FIGS. 7A-B: I/O processing module 728 can interact with the user through I/O devices 716 in FIG. 7A or with a user device through network communications interface 708 in FIG. 7A to obtain user input (e.g., a speech input) and to provide responses (e.g., as speech outputs) to the user input; I/O processing module 728 can optionally obtain contextual information associated with the user input from the user device, along with or shortly after the receipt of the user input; I/O processing module 728 can also send follow-up questions to, and receive answers from, the user regarding the user request; ¶ [0220]: other domains can include "provide an answer to a question"; ¶¶ [0230]-[0235] with FIG. 
7B: once task flow processing module 736 has completed the structured query for an actionable intent, task flow processing module 736 can proceed to perform the ultimate task associated with the actionable intent; task flow processing module 736 can execute the steps and instructions in the task flow model according to the specific parameters contained in the structured query; e.g., using a structured query such as: { restaurant reservation, restaurant=ABC Cafe, date=3/12/2012, time=7 pm, party size=5}, task flow processing module 736 can perform the steps of: (1) logging onto a server of the ABC Cafe or a restaurant reservation system such as OPENTABLE®, (2) entering the date, time, and party size information in a form on the website, (3) submitting the form, and (4) making a calendar entry for the reservation in the user's calendar; task flow processing module 736 can employ the assistance of service processing module 738 ("service processing module") to complete a task requested in the user input or to provide an informational answer requested in the user input; e.g., service processing module 738 can act on behalf of task flow processing module 736 to make a phone call, set a calendar entry, invoke a map search, invoke or interact with other user applications installed on the user device, and invoke or interact with third-party services (e.g., a restaurant reservation portal, a social networking website, a banking portal, etc.); the protocols and application programming interfaces (API) required by each service can be specified by a respective service model among service models 756; service processing module 738 can access the appropriate service model for a service and generate requests for the service in accordance with the protocols and APIs required by the service according to the service model; natural language processing module 732, dialogue flow processing module 734, and task flow processing module 736 can be used collectively and iteratively to infer 
and define the user's intent, obtain information to further clarify and refine the user intent, and finally generate a response (i.e., an output to the user, or the completion of a task) to fulfill the user's intent; the generated response can be a dialogue response to the speech input that at least partially fulfills the user's intent; further, the generated response can be output as a speech output, and in these examples, the generated response can be sent to speech synthesis module 740 (e.g., speech synthesizer) where it can be processed to synthesize the dialogue response in speech form; the generated response can be data content relevant to satisfying a user request in the speech input).
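As a concrete illustration of the task-flow execution quoted above (executing steps according to the specific parameters of a completed structured query), the following sketch uses hypothetical step functions in place of the actual service interactions PAGALLO enumerates (logging onto OPENTABLE®, form entry, calendar creation); none of these identifiers come from PAGALLO:

```python
def perform_task(structured_query):
    """Execute task-flow steps using parameters from a completed
    structured query, as in the restaurant-reservation example."""
    q = structured_query
    assert q["intent"] == "restaurant reservation"
    # hypothetical stand-ins for the four steps the citation lists
    steps = [
        lambda: f"log onto a reservation system for {q['restaurant']}",
        lambda: f"enter {q['date']}, {q['time']}, party of {q['party_size']}",
        lambda: "submit the reservation form",
        lambda: f"add a calendar entry for {q['date']} at {q['time']}",
    ]
    return [step() for step in steps]

log = perform_task({
    "intent": "restaurant reservation",
    "restaurant": "ABC Cafe",
    "date": "3/12/2012",
    "time": "7 pm",
    "party_size": 5,
})
```

The point of the sketch is that once every parameter of the structured query is populated, task execution is a mechanical walk through the task-flow model's steps.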
Claims 6 and 18
PAGALLO discloses all the elements as stated in Claims 1 and 13 respectively and further discloses wherein inferring the data from at least a portion of the one or more partial query signals using the at least one artificial intelligence technique comprises dynamically identifying one or more synonyms for one or more words within the at least a portion of the one or more partial query signals (PAGALLO, ¶ [0079] with FIG. 2A: digital assistant client module 229 can also elicit additional input from the user via a natural language dialogue or other user interfaces upon request by DA server 106; digital assistant client module 229 can pass the additional input to DA server 106 to help DA server 106 in intent deduction and/or fulfillment of the user's intent expressed in the user request; in order to complete a structured query, task flow processing module 736 may need to initiate additional dialogue with the user in order to obtain additional information, and/or disambiguate potentially ambiguous utterances; dialogue flow processing module 734 can determine how (and/or when) to ask the user for the additional information and receives and processes the user responses; ¶¶ [0228]-[0229] with FIG. 7B: task flow models 754 can include procedures for obtaining additional information from the user and task flows for performing actions associated with the actionable intent; in order to complete a structured query, task flow processing module 736 may need to initiate additional dialogue with the user in order to obtain additional information, and/or disambiguate potentially ambiguous utterances; dialogue flow processing module 734 can determine how (and/or when) to ask the user for the additional information and receives and processes the user responses; ¶¶ [0215]-[0216] with FIG. 
7B: in addition to the sequence of words or tokens obtained from STT processing module 730, natural language processing module 732 can also receive contextual information associated with the user request, e.g., from I/O processing module 728; the natural language processing module 732 can optionally use the contextual information to clarify, supplement, and/or further define the information contained in the token sequence received from STT processing module 730; contextual information can be dynamic, and can change with time, location, content of the dialogue, and other factors; the natural language processing can be based on, e.g., ontology 760 which can be a hierarchical structure containing many nodes, each node representing either an "actionable intent" or a "property" relevant to one or more of the "actionable intents" or other "properties"; ¶ [0223] with FIG. 7B: each node in ontology 760 can be associated with a set of words and/or phrases that are relevant to the property or actionable intent represented by the node; the respective set of words and/or phrases associated with each node can be the so-called "vocabulary" associated with the node; the respective set of words and/or phrases associated with each node can be stored in vocabulary index 744 in association with the property or actionable intent represented by the node; the vocabulary index 744 can optionally include words and phrases in different languages; ¶ [0240]: each of the language models 810 may identify (e.g., receive) context information; context information identified in this manner may include one or more words relevant to predicting a current word wq; ¶ [0251] with FIG. 8: candidate words may be provided, for instance by the multilingual language modeling system 800 of FIG. 8, during use of any application allowing for text entry; ¶¶ [0252]-[0263] with FIG. 
9: inserting the candidate word includes replacing an at least partial word input with the selected candidate word 920; a candidate word 920 may be identical to an at least partial word input, allowing the user to confirm correct entry of the at least partial word input; user confirmation of correct entry may result in a language corresponding to the confirmed at least partial word input being given greater weight, for instance, in the recipient constraint model; deleting the originally selected candidate word may result in a language corresponding to the originally selected candidate word 920 being given lesser weight in the recipient constraint model).
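The weight adjustment described in this citation (greater weight for a language whose candidate word the user confirms, lesser weight for a language whose candidate word the user deletes) could be modeled as a simple multiplicative update; the update rule and factor below are editorial assumptions, not taken from PAGALLO:

```python
def adjust_language_weight(weights, lang, confirmed, factor=1.2):
    """Scale one language's weight up (confirmation) or down (deletion),
    then renormalize so the weights again sum to 1.

    The multiplicative factor of 1.2 is an assumed value for illustration.
    """
    updated = dict(weights)
    updated[lang] *= factor if confirmed else 1.0 / factor
    total = sum(updated.values())
    return {language: w / total for language, w in updated.items()}

# confirming a Spanish candidate word raises the Spanish weight
after_confirm = adjust_language_weight({"en": 0.5, "es": 0.5}, "es", True)
# deleting a Spanish candidate word lowers it
after_delete = adjust_language_weight({"en": 0.5, "es": 0.5}, "es", False)
```

Renormalizing keeps the weights interpretable as relative language importance, so a confirmation in one language implicitly demotes the others.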
Claims 7 and 19
PAGALLO discloses all the elements as stated in Claims 1 and 13 respectively and further discloses wherein updating the at least a portion of the one or more missing entity arguments comprises determining one or more entity relations for one or more concepts comprising one or more of aggregation, order by, and temporal phrase (PAGALLO, ¶¶ [0227]-[0230] with FIGS. 2B-C: once natural language processing module 732 identifies an actionable intent (or domain) based on the user request, natural language processing module 732 can generate a structured query to represent the identified actionable intent; the structured query can include parameters for one or more nodes within the domain for the actionable intent, and at least some of the parameters are populated with the specific information and requirements specified in the user request; e.g., the user may say "Make me a dinner reservation at a sushi place at 7"; according to the ontology, a structured query for a "restaurant reservation" domain may include parameters such as {Cuisine}, {Time}, {Date}, {Party Size} (i.e., total number of persons in a party), and the like; based on the speech input and the text derived from the speech input using STT processing module 730, natural language processing module 732 can generate a partial structured query for the restaurant reservation domain, where the partial structured query includes the parameters {Cuisine="Sushi"} and {Time="7 pm"}; however, in this example, the user's utterance contains insufficient information to complete the structured query associated with the domain; therefore, other necessary parameters such as {Party Size} and {Date} may not be specified in the structured query based on the information currently available; natural language processing module 732 can populate some parameters of the structured query with received contextual information; e.g., if the user requested a sushi restaurant "near me"; natural language processing module 732 can populate a 
{location} parameter in the structured query with GPS coordinates from the user device; natural language processing module 732 can pass the generated structured query (including any completed parameters) to task flow processing module 736 ("task flow processor"); task flow models 754 can include procedures for obtaining additional information from the user and task flows for performing actions associated with the actionable intent; in order to complete a structured query, task flow processing module 736 may need to initiate additional dialogue with the user in order to obtain additional information, and/or disambiguate potentially ambiguous utterances; when such interactions are necessary, task flow processing module 736 can invoke dialogue flow processing module 734 to engage in a dialogue with the user; dialogue flow processing module 734 can determine how (and/or when) to ask the user for the additional information and receives and processes the user responses; e.g., when task flow processing module 736 invokes dialogue flow processing module 734 to determine the "party size" and "date" information for the structured query associated with the domain "restaurant reservation," dialogue flow processing module 734 can generate questions such as "For how many people?" and "On which day?" 
to pass to the user; once answers are received from the user, dialogue flow processing module 734 can then populate the structured query with the missing information, or pass the information to task flow processing module 736 to complete the missing information from the structured query; once task flow processing module 736 has completed the structured query for an actionable intent, task flow processing module 736 can proceed to perform the ultimate task associated with the actionable intent; accordingly, task flow processing module 736 can execute the steps and instructions in the task flow model according to the specific parameters contained in the structured query; e.g., using a structured query such as: { restaurant reservation, restaurant=ABC Cafe, date=3/12/2012, time=7 pm, party size=5}, task flow processing module 736 can perform the steps of: (1) logging onto a server of the ABC Cafe or a restaurant reservation system such as OPENTABLE®, (2) entering the date, time, and party size information in a form on the website, (3) submitting the form, and (4) making a calendar entry for the reservation in the user's calendar; ¶¶ [0216]-[0226] with FIGS. 
7B-C: the natural language processing can be based on, e.g., ontology 760 which can be a hierarchical structure containing many nodes, each node representing either an "actionable intent" or a "property" relevant to one or more of the "actionable intents" or other "properties"; a linkage between an actionable intent node and a property node in ontology 760 can define how a parameter represented by the property node pertains to the task represented by the actionable intent node; ontology 760 can be made up of actionable intent nodes and property nodes; within ontology 760, each actionable intent node can be linked to one or more property nodes either directly or through one or more intermediate property nodes; similarly, each property node can be linked to one or more actionable intent nodes either directly or through one or more intermediate property nodes; an actionable intent node, along with its linked concept nodes, can be described as a "domain"; ontology 760 can be made up of many domains; each domain can share one or more property nodes with one or more other domains; while FIG. 
7C illustrates two example domains (e.g., restaurant reservation domain 762 and reminder domain 764) within ontology 760, other domains can include, for example, "find a movie," "initiate a phone call," "find directions," "schedule a meeting," "send a message," "provide an answer to a question," "read a list," "provide navigation instructions," "provide instructions for a task," and so on; each node in ontology 760 can be associated with a set of words and/or phrases that are relevant to the property or actionable intent represented by the node; the respective set of words and/or phrases associated with each node can be the so-called "vocabulary" associated with the node; the respective set of words and/or phrases associated with each node can be stored in vocabulary index 744 in association with the property or actionable intent represented by the node; the vocabulary index 744 can optionally include words and phrases in different languages; Natural language processing module 732 can receive the token sequence (e.g., a text string) from STT processing module 730, and determine what nodes are implicated by the words in the token sequence; if a word or phrase in the token sequence is found to be associated with one or more nodes in ontology 760 (via vocabulary index 744), the word or phrase can "trigger" or "activate" those nodes; based on the quantity and/or relative importance of the activated nodes, natural language processing module 732 can select one of the actionable intents as the task that the user intended the digital assistant to perform; the domain that has the most "triggered" nodes can be selected; the domain having the highest confidence value (e.g., based on the relative importance of its various triggered nodes) can be selected; additional factors are considered in selecting the node as well, such as whether the digital assistant has previously correctly interpreted a similar request from a user; natural language processing module 732 can use 
the user-specific information to supplement the information contained in the user input to further define the user intent).
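The node-triggering and domain-selection logic quoted above (words activate ontology nodes via a vocabulary index, and the domain with the most, or most important, triggered nodes is selected) can be sketched as follows; the data structures are hypothetical simplifications of ontology 760 and vocabulary index 744, not PAGALLO's actual implementation:

```python
def select_domain(tokens, vocabulary_index, node_domains, node_importance):
    """Score each domain by the weighted count of its triggered nodes
    and return the highest-scoring domain (None if nothing triggers)."""
    scores = {}
    for token in tokens:
        # a token "triggers" every ontology node it is associated with
        for node in vocabulary_index.get(token, []):
            domain = node_domains[node]
            scores[domain] = scores.get(domain, 0.0) + node_importance.get(node, 1.0)
    return max(scores, key=scores.get) if scores else None

domain = select_domain(
    ["sushi", "reservation", "7"],
    {"sushi": ["cuisine"], "reservation": ["restaurant_reservation"]},
    {"cuisine": "restaurant reservation",
     "restaurant_reservation": "restaurant reservation"},
    {"restaurant_reservation": 2.0},  # actionable-intent node weighted higher
)
```

Weighting nodes by importance rather than raw count corresponds to the citation's "highest confidence value" alternative to simply counting triggered nodes.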
Claim 8
PAGALLO discloses all the elements as stated in Claim 1 and further discloses wherein determining the one or more partial query signals associated with said each of the two or more languages comprises identifying, for said each of the two or more languages, one or more language-specific phrases in the input query (PAGALLO, ¶¶ [0216]-[0226] with FIGS. 7B-C: the natural language processing can be based on, e.g., ontology 760 which can be a hierarchical structure containing many nodes, each node representing either an "actionable intent" or a "property" relevant to one or more of the "actionable intents" or other "properties"; each node in ontology 760 can be associated with a set of words and/or phrases that are relevant to the property or actionable intent represented by the node; the respective set of words and/or phrases associated with each node can be the so-called "vocabulary" associated with the node; the respective set of words and/or phrases associated with each node can be stored in vocabulary index 744 in association with the property or actionable intent represented by the node; the vocabulary index 744 can optionally include words and phrases in different languages; Natural language processing module 732 can receive the token sequence (e.g., a text string) from STT processing module 730, and determine what nodes are implicated by the words in the token sequence; if a word or phrase in the token sequence is found to be associated with one or more nodes in ontology 760 (via vocabulary index 744), the word or phrase can "trigger" or "activate" those nodes; based on the quantity and/or relative importance of the activated nodes, natural language processing module 732 can select one of the actionable intents as the task that the user intended the digital assistant to perform; the domain that has the most "triggered" nodes can be selected; the domain having the highest confidence value (e.g., based on the relative importance of its various triggered nodes) 
can be selected; additional factors are considered in selecting the node as well, such as whether the digital assistant has previously correctly interpreted a similar request from a user; natural language processing module 732 can use the user-specific information to supplement the information contained in the user input to further define the user intent; ¶¶ [0237]-[0251] and [0255] with FIG. 8: the multilingual language modeling system 800 serves to provide candidate words based on context information of an electronic device; candidate words are word predictions for a current word and, in some examples, may be auto-completion and/or auto-correction word predictions; providing candidate words includes determining monolingual probabilities, i.e., determining words using a monolingual language model, and adjusting (e.g., weighting) each of the monolingual probabilities to provide multilingual probabilities; monolingual probabilities are adjusted based on the context information of the electronic device; candidate words are thereafter provided based on the multilingual probabilities; the multilingual language modeling system 800 may include a plurality of language models 810, each of which may be a monolingual language model (i.e., a language model corresponding to a respective language); each of the language models 810 may identify (e.g., receive) context information; context information identified in this manner may include one or more words relevant to predicting a current word wq; one or more of the language models 810 operate at a sentence level; the context information may include any number of words of a sentence preceding the current word wq (i.e., left context) and/or following the current word wq (i.e., right context); based on the context information, each language model 810 provides a respective set of monolingual probabilities; probabilities provided in this manner are determined (e.g., generated) according to any known 
text prediction methodologies, including but not limited to, n-gram (e.g., word n-gram and/or character n-gram) language model word prediction and continuous space language model word prediction; the multilingual combination module 820 receives the monolingual probabilities P from each of the language models 810, respectively, and the language identification string L from the language identification module 815; based on each of the monolingual probabilities P and the language weights L, the multilingual combination module 820 provides a set of multilingual probabilities J; because determining multilingual probabilities J may be computationally demanding, the expression for determining multilingual probabilities J may be simplified by way of approximation; as a result of this approximation, each set of monolingual probabilities P no longer depends on languages specified by language identification string L, and the multilingual probabilities J may be represented as a weighted form of monolingual probabilities P; the approximation of the multilingual probabilities J represents a mixture of the monolingual probabilities weighted by the relative importance of the current word wq for each language; once the multilingual combination module 820 has provided (e.g., generated) multilingual probabilities J, the multilingual combination module 820 may provide one or more candidate words (e.g., word predictions) based on the multilingual probabilities J; the multilingual combination module 820 may identify one or more words corresponding to the highest probabilities of the multilingual probabilities and provide the identified words as candidate words; candidate words may be provided during use of any application allowing for text entry; each of the language models may correspond to a respective language; one or more language models correspond to a multilingual lexicon including words of multiple languages (e.g., the 5,000 most commonly used words in each of the plurality of languages); 
one or more language models correspond to a regional dialect (e.g., British English, US English); one or more language models correspond to a topical lexicon including words for one or more specific subjects, such as medicine or law).
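The lexicon-scoped language models described above (per-language, regional-dialect, and topical lexicons such as medicine or law) can be sketched as predictors restricted to a vocabulary; the unigram-frequency ranking below is an editorial simplification of the n-gram and continuous space models PAGALLO names, and every identifier is hypothetical:

```python
def make_lexicon_model(lexicon, unigram_counts):
    """Return a predictor that only proposes words inside the lexicon,
    ranked by (assumed) unigram frequency."""
    def predict(prefix):
        # candidate words must match the partial input and the lexicon
        matches = [w for w in lexicon if w.startswith(prefix)]
        return sorted(matches, key=lambda w: unigram_counts.get(w, 0),
                      reverse=True)
    return predict

# a hypothetical topical lexicon for medicine with assumed frequencies
medical = make_lexicon_model({"stent", "statin", "staph"},
                             {"statin": 30, "stent": 20, "staph": 5})
```

Several such models, one per lexicon, could then be combined exactly as in the multilingual mixture above, with each lexicon treated as a "language" with its own weight.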
Claim 9
PAGALLO discloses all the elements as stated in Claim 1 and further discloses wherein identifying the one or more missing entity arguments comprises detecting a presence of one or more aggregation keywords without at least one corresponding argument (PAGALLO, ¶¶ [0227]-[0230] with FIGS. 2B-C: once natural language processing module 732 identifies an actionable intent (or domain) based on the user request, natural language processing module 732 can generate a structured query to represent the identified actionable intent; the structured query can include parameters for one or more nodes within the domain for the actionable intent, and at least some of the parameters are populated with the specific information and requirements specified in the user request; e.g., the user may say "Make me a dinner reservation at a sushi place at 7"; according to the ontology, a structured query for a "restaurant reservation" domain may include parameters such as {Cuisine}, {Time}, {Date}, {Party Size} (i.e., total number of persons in a party), and the like; based on the speech input and the text derived from the speech input using STT processing module 730, natural language processing module 732 can generate a partial structured query for the restaurant reservation domain, where the partial structured query includes the parameters {Cuisine="Sushi"} and {Time="7 pm"}; however, in this example, the user's utterance contains insufficient information to complete the structured query associated with the domain; therefore, other necessary parameters such as {Party Size} and {Date} may not be specified in the structured query based on the information currently available; natural language processing module 732 can populate some parameters of the structured query with received contextual information; e.g., if the user requested a sushi restaurant "near me"; natural language processing module 732 can populate a {location} parameter in the structured query with GPS coordinates from the user 
device; natural language processing module 732 can pass the generated structured query (including any completed parameters) to task flow processing module 736 ("task flow processor"); task flow models 754 can include procedures for obtaining additional information from the user and task flows for performing actions associated with the actionable intent; in order to complete a structured query, task flow processing module 736 may need to initiate additional dialogue with the user in order to obtain additional information, and/or disambiguate potentially ambiguous utterances; when such interactions are necessary, task flow processing module 736 can invoke dialogue flow processing module 734 to engage in a dialogue with the user; dialogue flow processing module 734 can determine how (and/or when) to ask the user for the additional information and receives and processes the user responses; e.g., when task flow processing module 736 invokes dialogue flow processing module 734 to determine the "party size" and "date" information for the structured query associated with the domain "restaurant reservation," dialogue flow processing module 734 can generate questions such as "For how many people?" and "On which day?" 
to pass to the user; once answers are received from the user, dialogue flow processing module 734 can then populate the structured query with the missing information, or pass the information to task flow processing module 736 to complete the missing information from the structured query; once task flow processing module 736 has completed the structured query for an actionable intent, task flow processing module 736 can proceed to perform the ultimate task associated with the actionable intent; accordingly, task flow processing module 736 can execute the steps and instructions in the task flow model according to the specific parameters contained in the structured query; e.g., using a structured query such as: { restaurant reservation, restaurant=ABC Cafe, date=3/12/2012, time=7 pm, party size=5}, task flow processing module 736 can perform the steps of: (1) logging onto a server of the ABC Cafe or a restaurant reservation system such as OPENTABLE®, (2) entering the date, time, and party size information in a form on the website, (3) submitting the form, and (4) making a calendar entry for the reservation in the user's calendar; ¶¶ [0216]-[0226] with FIGS. 
7B-C: the natural language processing can be based on, e.g., ontology 760 which can be a hierarchical structure containing many nodes, each node representing either an "actionable intent" or a "property" relevant to one or more of the "actionable intents" or other "properties"; a linkage between an actionable intent node and a property node in ontology 760 can define how a parameter represented by the property node pertains to the task represented by the actionable intent node; ontology 760 can be made up of actionable intent nodes and property nodes; within ontology 760, each actionable intent node can be linked to one or more property nodes either directly or through one or more intermediate property nodes; similarly, each property node can be linked to one or more actionable intent nodes either directly or through one or more intermediate property nodes; an actionable intent node, along with its linked concept nodes, can be described as a "domain"; ontology 760 can be made up of many domains; each domain can share one or more property nodes with one or more other domains; while FIG. 
7C illustrates two example domains (e.g., restaurant reservation domain 762 and reminder domain 764) within ontology 760, other domains can include, for example, "find a movie," "initiate a phone call," "find directions," "schedule a meeting," "send a message," and "provide an answer to a question," "read a list," "providing navigation instructions," "provide instructions for a task" and so on; each node in ontology 760 can be associated with a set of words and/or phrases that are relevant to the property or actionable intent represented by the node; the respective set of words and/or phrases associated with each node can be the so-called "vocabulary" associated with the node; the respective set of words and/or phrases associated with each node can be stored in vocabulary index 744 in association with the property or actionable intent represented by the node; the vocabulary index 744 can optionally include words and phrases in different languages; Natural language processing module 732 can receive the token sequence (e.g., a text string) from STT processing module 730, and determine what nodes are implicated by the words in the token sequence; if a word or phrase in the token sequence is found to be associated with one or more nodes in ontology 760 (via vocabulary index 744), the word or phrase can "trigger" or "activate" those nodes; based on the quantity and/or relative importance of the activated nodes, natural language processing module 732 can select one of the actionable intents as the task that the user intended the digital assistant to perform; the domain that has the most "triggered" nodes can be selected; the domain having the highest confidence value ( e.g., based on the relative importance of its various triggered nodes) can be selected; additional factors are considered in selecting the node as well, such as whether the digital assistant has previously correctly interpreted a similar request from a user; natural language processing module 732 can use 
the user-specific information to supplement the information contained in the user input to further define the user intent).
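For illustration only, the structured-query completion flow cited above (a partial structured query whose missing parameters are obtained through dialogue and then populated) can be sketched as follows; all names, prompts, and data here are hypothetical and are not drawn from the PAGALLO disclosure itself:

```python
# Illustrative sketch of a structured-query completion flow.
# All names, prompts, and values are hypothetical, not from the reference.

REQUIRED = {"restaurant reservation": ["restaurant", "date", "time", "party size"]}
PROMPTS = {"party size": "For how many people?", "date": "On which day?"}

def missing_parameters(domain, query):
    # Parameters the domain requires that the partial query does not supply.
    return [p for p in REQUIRED[domain] if p not in query]

def complete_query(domain, query, ask):
    # Engage in "dialogue" (here, the ask callback) for each missing
    # parameter, then populate the structured query with the answers.
    for param in missing_parameters(domain, query):
        query[param] = ask(PROMPTS.get(param, f"What is the {param}?"))
    return query

# Partial structured query derived from "Make me a dinner reservation
# at a sushi place at 7", with the restaurant resolved from context.
partial = {"restaurant": "ABC Cafe", "time": "7 pm"}
answers = {"On which day?": "3/12/2012", "For how many people?": "5"}
completed = complete_query("restaurant reservation", partial, answers.get)
# completed now also carries date=3/12/2012 and party size=5
```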
Claim 10
PAGALLO discloses all the elements as stated in Claim 1 and further discloses wherein identifying the one or more missing entity arguments comprises detecting a presence of one or more numeric comparisons without one or more of at least one argument and at least one numeric value (PAGALLO, ¶¶ [0227]-[0230] with FIGS. 7B-C: once natural language processing module 732 identifies an actionable intent (or domain) based on the user request, natural language processing module 732 can generate a structured query to represent the identified actionable intent; the structured query can include parameters for one or more nodes within the domain for the actionable intent, and at least some of the parameters are populated with the specific information and requirements specified in the user request; e.g., the user may say "Make me a dinner reservation at a sushi place at 7"; according to the ontology, a structured query for a "restaurant reservation" domain may include parameters such as {Cuisine}, {Time}, {Date}, {Party Size} (i.e., the total number of persons in a party or the maximum number of persons to be accommodated for the restaurant reservation), and the like; based on the speech input and the text derived from the speech input using STT processing module 730, natural language processing module 732 can generate a partial structured query for the restaurant reservation domain, where the partial structured query includes the parameters {Cuisine="Sushi"} and {Time="7 pm"}; however, in this example, the user's utterance contains insufficient information to complete the structured query associated with the domain; therefore, other necessary parameters such as {Party Size} and {Date} may not be specified in the structured query based on the information currently available; natural language processing module 732 can populate some parameters of the structured query with received contextual information; e.g., if the user requested a sushi restaurant "near me"; natural language processing module 
732 can populate a {location} parameter in the structured query with GPS coordinates from the user device; natural language processing module 732 can pass the generated structured query (including any completed parameters) to task flow processing module 736 ("task flow processor"); task flow models 754 can include procedures for obtaining additional information from the user and task flows for performing actions associated with the actionable intent; in order to complete a structured query, task flow processing module 736 may need to initiate additional dialogue with the user in order to obtain additional information, and/or disambiguate potentially ambiguous utterances; when such interactions are necessary, task flow processing module 736 can invoke dialogue flow processing module 734 to engage in a dialogue with the user; dialogue flow processing module 734 can determine how (and/or when) to ask the user for the additional information and receives and processes the user responses; e.g., when task flow processing module 736 invokes dialogue flow processing module 734 to determine the "party size" and "date" information for the structured query associated with the domain "restaurant reservation," dialogue flow processing module 734 can generate questions such as "For how many people?" and "On which day?" 
to pass to the user; once answers are received from the user, dialogue flow processing module 734 can then populate the structured query with the missing information, or pass the information to task flow processing module 736 to complete the missing information from the structured query; once task flow processing module 736 has completed the structured query for an actionable intent, task flow processing module 736 can proceed to perform the ultimate task associated with the actionable intent; accordingly, task flow processing module 736 can execute the steps and instructions in the task flow model according to the specific parameters contained in the structured query; e.g., using a structured query such as: { restaurant reservation, restaurant=ABC Cafe, date=3/12/2012, time=7 pm, party size=5}, task flow processing module 736 can perform the steps of: (1) logging onto a server of the ABC Cafe or a restaurant reservation system such as OPENTABLE®, (2) entering the date, time, and party size information in a form on the website, (3) submitting the form, and (4) making a calendar entry for the reservation in the user's calendar; ¶¶ [0216]-[0226] with FIGS. 
7B-C: the natural language processing can be based on, e.g., ontology 760 which can be a hierarchical structure containing many nodes, each node representing either an "actionable intent" or a "property" relevant to one or more of the "actionable intents" or other "properties"; a linkage between an actionable intent node and a property node in ontology 760 can define how a parameter represented by the property node pertains to the task represented by the actionable intent node; ontology 760 can be made up of actionable intent nodes and property nodes; within ontology 760, each actionable intent node can be linked to one or more property nodes either directly or through one or more intermediate property nodes; similarly, each property node can be linked to one or more actionable intent nodes either directly or through one or more intermediate property nodes; an actionable intent node, along with its linked concept nodes, can be described as a "domain"; ontology 760 can be made up of many domains; each domain can share one or more property nodes with one or more other domains; while FIG. 
7C illustrates two example domains (e.g., restaurant reservation domain 762 and reminder domain 764) within ontology 760, other domains can include, for example, "find a movie," "initiate a phone call," "find directions," "schedule a meeting," "send a message," and "provide an answer to a question," "read a list," "providing navigation instructions," "provide instructions for a task" and so on; each node in ontology 760 can be associated with a set of words and/or phrases that are relevant to the property or actionable intent represented by the node; the respective set of words and/or phrases associated with each node can be the so-called "vocabulary" associated with the node; the respective set of words and/or phrases associated with each node can be stored in vocabulary index 744 in association with the property or actionable intent represented by the node; the vocabulary index 744 can optionally include words and phrases in different languages; Natural language processing module 732 can receive the token sequence (e.g., a text string) from STT processing module 730, and determine what nodes are implicated by the words in the token sequence; if a word or phrase in the token sequence is found to be associated with one or more nodes in ontology 760 (via vocabulary index 744), the word or phrase can "trigger" or "activate" those nodes; based on the quantity and/or relative importance of the activated nodes, natural language processing module 732 can select one of the actionable intents as the task that the user intended the digital assistant to perform; the domain that has the most "triggered" nodes can be selected; the domain having the highest confidence value ( e.g., based on the relative importance of its various triggered nodes) can be selected; additional factors are considered in selecting the node as well, such as whether the digital assistant has previously correctly interpreted a similar request from a user; natural language processing module 732 can use 
the user-specific information to supplement the information contained in the user input to further define the user intent).
Claim 11
PAGALLO discloses all the elements as stated in Claim 1 and further discloses wherein identifying the one or more missing entity arguments comprises detecting a presence of one or more co-references without at least one corresponding referred entity (PAGALLO, ¶¶ [0227]-[0230] with FIGS. 7B-C: once natural language processing module 732 identifies an actionable intent (or domain) based on the user request, natural language processing module 732 can generate a structured query to represent the identified actionable intent; the structured query can include parameters for one or more nodes within the domain for the actionable intent, and at least some of the parameters are populated with the specific information and requirements specified in the user request; e.g., the user may say "Make me a dinner reservation at a sushi place at 7"; according to the ontology, a structured query for a "restaurant reservation" domain may include parameters such as {Cuisine}, {Time}, {Date}, {Party Size} (i.e., total number of persons in a party), and the like; based on the speech input and the text derived from the speech input using STT processing module 730, natural language processing module 732 can generate a partial structured query for the restaurant reservation domain, where the partial structured query includes the parameters {Cuisine="Sushi"} and {Time="7 pm"}; however, in this example, the user's utterance contains insufficient information to complete the structured query associated with the domain; therefore, other necessary parameters such as {Party Size} and {Date} may not be specified in the structured query based on the information currently available; natural language processing module 732 can populate some parameters of the structured query with received contextual information; e.g., if the user requested a sushi restaurant "near me"; natural language processing module 732 can populate a {location} parameter in the structured query with GPS coordinates from the user 
device; natural language processing module 732 can pass the generated structured query (including any completed parameters) to task flow processing module 736 ("task flow processor"); task flow models 754 can include procedures for obtaining additional information from the user and task flows for performing actions associated with the actionable intent; in order to complete a structured query, task flow processing module 736 may need to initiate additional dialogue with the user in order to obtain additional information, and/or disambiguate potentially ambiguous utterances; when such interactions are necessary, task flow processing module 736 can invoke dialogue flow processing module 734 to engage in a dialogue with the user; dialogue flow processing module 734 can determine how (and/or when) to ask the user for the additional information and receives and processes the user responses; e.g., when task flow processing module 736 invokes dialogue flow processing module 734 to determine the "party size" and "date" information for the structured query associated with the domain "restaurant reservation," dialogue flow processing module 734 can generate questions such as "For how many people?" and "On which day?" 
to pass to the user; once answers are received from the user, dialogue flow processing module 734 can then populate the structured query with the missing information, or pass the information to task flow processing module 736 to complete the missing information from the structured query; once task flow processing module 736 has completed the structured query for an actionable intent, task flow processing module 736 can proceed to perform the ultimate task associated with the actionable intent; accordingly, task flow processing module 736 can execute the steps and instructions in the task flow model according to the specific parameters contained in the structured query; e.g., using a structured query such as: { restaurant reservation, restaurant=ABC Cafe, date=3/12/2012, time=7 pm, party size=5}, task flow processing module 736 can perform the steps of: (1) logging onto a server of the ABC Cafe or a restaurant reservation system such as OPENTABLE®, (2) entering the date, time, and party size information in a form on the website, (3) submitting the form, and (4) making a calendar entry for the reservation in the user's calendar; ¶¶ [0216]-[0226] with FIGS. 
7B-C: the natural language processing can be based on, e.g., ontology 760 which can be a hierarchical structure containing many nodes, each node representing either an "actionable intent" or a "property" relevant to one or more of the "actionable intents" or other "properties"; a linkage between an actionable intent node and a property node in ontology 760 can define how a parameter represented by the property node pertains to the task represented by the actionable intent node; ontology 760 can be made up of actionable intent nodes and property nodes; within ontology 760, each actionable intent node can be linked to one or more property nodes either directly or through one or more intermediate property nodes; similarly, each property node can be linked to one or more actionable intent nodes either directly or through one or more intermediate property nodes; e.g., ontology 760 can include a "restaurant reservation" node (i.e., an actionable intent node) and property nodes "restaurant," "date/time" (for the reservation), and "party size" can each be directly linked to the actionable intent node (i.e., the "restaurant reservation" node); in addition, property nodes "cuisine," "price range," "phone number," and "location" can be sub-nodes of the property node "restaurant," and can each be linked to the "restaurant reservation" node (i.e., the actionable intent node) through the intermediate property node "restaurant"; ontology 760 can also include a "set reminder" node (i.e., another actionable intent node) and property nodes "date/time" (for setting the reminder) and "subject" (for the reminder) can each be linked to the "set reminder" node; an actionable intent node, along with its linked concept nodes, can be described as a "domain"; ontology 760 can be made up of many domains; each domain can share one or more property nodes with one or more other domains; while FIG. 
7C illustrates two example domains (e.g., restaurant reservation domain 762 and reminder domain 764) within ontology 760, other domains can include, for example, "find a movie," "initiate a phone call," "find directions," "schedule a meeting," "send a message," and "provide an answer to a question," "read a list," "providing navigation instructions," "provide instructions for a task" and so on; each node in ontology 760 can be associated with a set of words and/or phrases that are relevant to the property or actionable intent represented by the node; the respective set of words and/or phrases associated with each node can be the so-called "vocabulary" associated with the node; the respective set of words and/or phrases associated with each node can be stored in vocabulary index 744 in association with the property or actionable intent represented by the node; the vocabulary index 744 can optionally include words and phrases in different languages; Natural language processing module 732 can receive the token sequence (e.g., a text string) from STT processing module 730, and determine what nodes are implicated by the words in the token sequence; if a word or phrase in the token sequence is found to be associated with one or more nodes in ontology 760 (via vocabulary index 744), the word or phrase can "trigger" or "activate" those nodes; based on the quantity and/or relative importance of the activated nodes, natural language processing module 732 can select one of the actionable intents as the task that the user intended the digital assistant to perform; the domain that has the most "triggered" nodes can be selected; the domain having the highest confidence value ( e.g., based on the relative importance of its various triggered nodes) can be selected; additional factors are considered in selecting the node as well, such as whether the digital assistant has previously correctly interpreted a similar request from a user; natural language processing module 732 can use 
the user-specific information to supplement the information contained in the user input to further define the user intent).
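For context only, the vocabulary-triggered domain selection described in the ontology passages above (tokens "trigger" nodes via a vocabulary index, and the domain with the most triggered nodes is selected) can be sketched as follows; the vocabulary table and token handling here are hypothetical and are not taken from PAGALLO:

```python
# Illustrative sketch of domain selection over an ontology's vocabulary
# index. The vocabulary and domains are hypothetical, not from the reference.

VOCABULARY = {
    "reservation": "restaurant reservation",
    "dinner": "restaurant reservation",
    "sushi": "restaurant reservation",
    "remind": "set reminder",
    "reminder": "set reminder",
}

def select_domain(tokens):
    # Each token found in the vocabulary index "triggers" a node; the
    # domain with the most triggered nodes is selected (no confidence
    # weighting in this simplified sketch).
    counts = {}
    for token in tokens:
        domain = VOCABULARY.get(token.lower().strip(".,?!"))
        if domain is not None:
            counts[domain] = counts.get(domain, 0) + 1
    return max(counts, key=counts.get) if counts else None

tokens = "Make me a dinner reservation at a sushi place at 7".split()
selected = select_domain(tokens)
# selected == "restaurant reservation" (three triggered nodes)
```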
Claim 12
PAGALLO discloses all the elements as stated in Claim 1 and further discloses wherein software implementing the method is provided as a service in a cloud environment (PAGALLO, ¶ [0036]: server system 108 can also employ various virtual devices and/or services of third-party service providers (e.g., third-party cloud service providers) to provide the underlying computing resources and/or infrastructure resources of server system 108; ¶ [0064]: the software components stored in memory 202 include Digital Assistant Client Module 229; ¶ [0195] with FIG. 7A: the various components shown in FIG. 7A can be implemented in hardware, software instructions for execution by one or more processors, firmware, including one or more signal processing and/or application specific integrated circuits, or a combination thereof).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 5 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over PAGALLO in view of Ross et al. (US 2019/0179940 A1, published on 06/13/2019), hereinafter Ross.
Claims 5 and 17
PAGALLO discloses all the elements as stated in Claims 1 and 13 respectively and further discloses wherein performing the one or more automated actions comprises performing a task (PAGALLO, ¶¶ [0230]-[0235] with FIG. 7B: once task flow processing module 736 has completed the structured query for an actionable intent, task flow processing module 736 can proceed to perform the ultimate task associated with the actionable intent; task flow processing module 736 can execute the steps and instructions in the task flow model according to the specific parameters contained in the structured query; e.g., using a structured query such as: { restaurant reservation, restaurant=ABC Cafe, date=3/12/2012, time=7 pm, party size=5}, task flow processing module 736 can perform the steps of: (1) logging onto a server of the ABC Cafe or a restaurant reservation system such as OPENTABLE®, (2) entering the date, time, and party size information in a form on the website, (3) submitting the form, and (4) making a calendar entry for the reservation in the user's calendar; task flow processing module 736 can employ the assistance of service processing module 738 ("service processing module") to complete a task requested in the user input or to provide an informational answer requested in the user input; e.g., service processing module 738 can act on behalf of task flow processing module 736 to make a phone call, set a calendar entry, invoke a map search, invoke or interact with other user applications installed on the user device, and invoke or interact with third-party services (e.g., a restaurant reservation portal, a social networking website, a banking portal, etc.); the protocols and application programming interfaces (API) required by each service can be specified by a respective service model among service models 756; service processing module 738 can access the appropriate service model for a service and generate requests for the service in accordance with the protocols and APIs 
required by the service according to the service model; natural language processing module 732, dialogue flow processing module 734, and task flow processing module 736 can be used collectively and iteratively to infer and define the user's intent, obtain information to further clarify and refine the user intent, and finally generate a response (i.e., an output to the user, or the completion of a task) to fulfill the user's intent; the generated response can be a dialogue response to the speech input that at least partially fulfills the user's intent; further, the generated response can be output as a speech output, and in these examples, the generated response can be sent to speech synthesis module 740 (e.g., speech synthesizer) where it can be processed to synthesize the dialogue response in speech form; the generated response can be data content relevant to satisfying a user request in the speech input).
PAGALLO fails to explicitly disclose wherein performing the one or more automated actions comprises training the at least one artificial intelligence technique using at least a portion of the input query.
Ross teaches a system and a method for performing a query (Ross, ¶¶ [0001]-[0003]), wherein performing the one or more automated actions comprises training the at least one artificial intelligence technique using at least a portion of the input query (Ross, ¶¶ [0005]-[0015]: generating a trained machine learning model "on the fly"
based on a search query; determining training instances based on a received search query, training a machine learning model based on the determined training instances, and providing machine learning model output in response to the received search query-where the machine learning model output is generated based on the trained machine learning model; assume a user interacts with a client device to submit a query of "How many doctors will there be in China in 2050?" to a search engine; the query can be parsed to determine one or more entities referenced in the search query, and at least one particular parameter, of the one or more entities, that is sought by the search query; if it is determined that no entry is available, or there is no known value defined for the entry, variations of the parameter can be generated, and the structured database(s) queried based on the variations and the one or more entities to determine variation values for the variations; training instances for a machine learning model can then be generated based on the variation parameters and their corresponding values, and the machine learning model trained utilizing the training instances; accordingly, the machine learning model can be trained to enable processing, using the trained machine learning model, of a provided year to predict a quantity of doctors in China based on the provided year; after the machine learning model is trained, machine learning model output that is based on the trained machine learning model can be provided in response to the search query; the prediction provided as machine learning model output for presentation to the user is based on one or more predicted values generated over the machine learning model; the prediction provided as machine learning model output for presentation to the user is (or indicates) multiple values, such as a predicted range of the quantity of doctors in China; e.g., a machine learning model can be trained to enable prediction of probabilities 
for each of X classes, and the machine learning model output can include indications of multiple of those classes along with an indication of the corresponding probability for each of the classes; as another working example, a user can be an employee of an amusement park and submit a query of "How many snowcones will we sell tomorrow," which cannot be known with certainty; the private database can be queried based on the "snowcone sales" entity and variations of the parameter "tomorrow"; e.g., the private database can be queried to identify snowcone sale values (e.g., an actual quantity sold) for multiple preceding days; additional parameters and corresponding values can be identified based on the querying, such as additional parameters that are associated with the snowcone sales for the preceding days, which could include, e.g., weather information for the previous days, attendance information for the previous days, etc.; training instances can then be generated based on the identified values, and the predictive model trained based on the training instances; when machine learning model output is provided in response to a query after training of the machine learning model "on the fly", it is understood that a delay may be present between the submission of the query and the provision of the machine learning model output; nonetheless, the machine learning model output is still provided for presentation at a computing device of a user that submitted the search query, and is provided based on the user having submitted the search query; ¶ [0020]: generating training instances and/or training a machine learning model "on the fly", and optionally without any human intervention; ¶ [0036]: the machine learning model is one previously automatically trained in response to a previous search query, the previous automatic training of the machine learning model can include training the machine learning model based on training instances automatically generated in response to the 
previous search query; ¶¶ [0074]-[0081] with FIGS. 1 and 5: at step 505, a submitted search query is parsed to determine one or more entities and one or more parameters being sought by the search query; the search query can be parsed into one or more tokens, which then can be utilized to identify one or more entities with aliases of the tokens and/or that are otherwise associated with entities in a knowledge graph or entity database; parsing engine 112 identifies one or more tokens and/or entities as a parameter that is being sought by the query; at step 510, one or more structured databases are queried to identify documents and/or values responsive to the parameter sought by the search query; at step 515, search engine 114 determines that a known value for the sought parameter is not defined and/or is not locatable; at step 520, variations of the parameter are determined and utilized to search for values in the resources; at step 525, training instances for a machine learning model are generated based on the variations, entities, and values; at step 530, the training instances are utilized to train the model; training instance input of a training instance can be processed using the machine learning model to generate a training prediction, and the training prediction compared to training instance output of the training instance to determine an error, and the error can be back propagated over the machine learning model to update weights of the machine learning model (e.g., learned weights of perceptrons in a neural network model); this can be performed for each of the training instances (or until other training criterion/criteria are satisfied) to iteratively update the weights of the machine learning model; at step 535, output from the trained machine learning model is provided to the client device).
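As a sketch of the "on the fly" training loop Ross describes, the snowcone example can be approximated by fitting a simple model to values retrieved for variations of the sought parameter; the data and the choice of a least-squares line below are hypothetical and are not drawn from the reference:

```python
# Illustrative sketch of "on the fly" training: values retrieved for
# variations of the sought parameter ("tomorrow" -> preceding days)
# become training instances for a simple predictive model.
# All data are hypothetical, not from the reference.

def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    return slope, mean_y - slope * mean_x

# Training instances queried from the (hypothetical) private database:
# (day index, snowcones sold) for the five preceding days.
days = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]

slope, intercept = fit_line(days, sales)
# Answer "How many snowcones will we sell tomorrow?" (day 6).
prediction = slope * 6 + intercept
# prediction == 150.0
```

A production system would, as Ross describes, instead train a machine learning model (e.g., a neural network) over many generated training instances; the closed-form fit above is only the smallest runnable stand-in for that step.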
PAGALLO and Ross are analogous art because they are from the same field of endeavor, a system and a method for performing a query. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Ross to PAGALLO. Motivation for doing so would be to reduce the use/consumption of various resources, such as resources that would otherwise be consumed in association with submission of the additional varied queries (Ross, ¶¶ [0002] and [0019]-[0020]).
Response to Arguments
Applicant's arguments filed 01/20/2026 have been fully considered but they are not persuasive.
Applicant argues on Pages 8-10 of the Remarks that (1) Pagallo fails to teach the limitations of updating at least a portion of the one or more missing entity arguments by inferring data from at least a portion of the one or more partial query signals using at least one artificial intelligence technique, arranged as recited in the independent claims; (2) Pagallo fails to teach or disclose a "backend query" in any capacity, let alone an active step of generating a backend query to be processed by an artificial intelligence-based question answering system, and further let alone performing such a generation step explicitly by "integrating the input query and the at least a portion of the one or more missing entity arguments updated with the inferred data," as required by the amended independent claims.
In response, examiner respectfully disagrees. Pagallo discloses in ¶¶ [0030] and [0216]-[0235] with FIGS. 7B-C that (1) the user may say "Make me a dinner reservation at a sushi place at 7"; (2) according to the ontology, a structured query for a "restaurant reservation" domain may include parameters such as {Cuisine}, {Time}, {Date}, {Party Size}, and the like; (3) based on the speech input and the text derived from the speech input using STT processing module 730, natural language processing module 732 can generate a partial structured query for the restaurant reservation domain, where the partial structured query includes the parameters {Cuisine="Sushi"} and {Time="7 pm"}; (4) natural language processing module 732 can populate some parameters of the structured query with received contextual information; e.g., if the user requested a sushi restaurant "near me"; natural language processing module 732 can populate a {location} parameter in the structured query with GPS coordinates from the user device; (5) natural language processing module 732 can pass the generated structured query (including any completed parameters) to task flow processing module 736 ("task flow processor"); (6) task flow processing module 736 can be configured to receive the structured query from natural language processing module 732, complete the structured query, if necessary, and perform the actions required to "complete" the user's ultimate request; (7) in order to complete a structured query, task flow processing module 736 may need to initiate additional dialogue with the user in order to obtain additional information, and/or disambiguate potentially ambiguous utterances; (8) when such interactions are necessary, task flow processing module 736 can invoke dialogue flow processing module 734 to engage in a dialogue with the user, wherein the dialogue flow processing module 734 can determine how (and/or when) to ask the user for the additional information and receives and processes 
the user responses; e.g., dialogue flow processing module 734 can generate questions such as "For how many people?" and "On which day?" to pass to the user; (9) once answers are received from the user, dialogue flow processing module 734 can then populate the structured query with the missing information, or pass the information to task flow processing module 736 to complete the missing information from the structured query; (10) once task flow processing module 736 has completed the structured query for an actionable intent, task flow processing module 736 can execute the steps and instructions in the task flow model according to the specific parameters contained in the structured query; e.g., using a structured query such as {restaurant reservation, restaurant=ABC Cafe, date=3/12/2012, time=7 pm, party size=5}, task flow processing module 736 can perform the steps of: (a) logging onto a server of the ABC Cafe or a restaurant reservation system such as OPENTABLE®, (b) entering the date, time, and party size information in a form on the website, (c) submitting the form, and (d) making a calendar entry for the reservation in the user's calendar; and (11) task flow processing module 736 can employ the assistance of service processing module 738 ("service processing module") to complete a task requested in the user input or to provide an informational answer requested in the user input; e.g., service processing module 738 can act on behalf of task flow processing module 736 to make a phone call, set a calendar entry, invoke a map search, invoke or interact with other user applications installed on the user device, and invoke or interact with third-party services (e.g., a restaurant reservation portal, a social networking website, a banking portal, etc.).
Therefore, Pagallo teaches the following: (A) updating at least a portion of the one or more missing entity arguments (e.g., "location", "party size", and "date") by inferring data from at least a portion of the one or more partial query signals (e.g., "location" is inferred from "a sushi place near me" using contextual information, and "party size" and "date" are inferred from additional dialogue with the user) using at least one artificial intelligence technique (e.g., "Natural Language Processing Module" 732, "Ontology" 760, "Task Flow Processing Module" 736, and "Dialogue Flow Processing Module" 734 to identify intent, generate/construct "a partial structured query" (see arrow between "natural language processing module" 732 and "task flow processing module" 736 in FIG. 7B), and derive/request missing parameters to complete the "structured query"); and (B) generating at least one query (e.g., "a structured query" initially generated by "Natural Language Processing Module" 732 and completed by "Task Flow Processing Module" 736 as {restaurant reservation, restaurant=ABC Cafe, date=3/12/2012, time=7 pm, party size=5}, wherein "ABC Cafe" is inferred from "a sushi place near me", and "3/12/2012" and "5" are inferred from the responses to the questions "On which day?" and "For how many people?" generated by "Dialogue Flow Processing Module" 734) to be processed by the artificial intelligence-based question answering system (e.g., "Task Flow Processing Module" 736 and "Service Processing Module" 738 of the "Digital Assistant"/"Intelligent Automated Assistant" 700) by integrating the input query (e.g., "Make me a dinner reservation at a sushi place at 7") and the at least a portion of the one or more missing entity arguments (e.g., "location", "party size", and "date") updated with the inferred data (e.g., "location" is inferred from "a sushi place near me" using contextual information, and "party size" and "date" are inferred from additional dialogue with the user).
Pagallo further discloses in ¶¶ [0032]-[0039] and [0195] with FIGS. 1 and 7A-B that (1) the digital assistant can include client-side portion 102 (hereafter "DA client 102") executed on user device 104 and server-side portion 106 (hereafter "DA server 106") executed on server system 108; (2) DA client 102 can provide client-side functionalities such as user-facing input and output processing and communication with DA server 106; (3) DA server 106 can include client-facing I/O interface 112, one or more processing modules 114, data and models 116, and I/O interface to external services 118, wherein (a) the client-facing I/O interface 112 can facilitate the client-facing input and output processing for DA server 106; and (b) one or more processing modules 114 can utilize data and models 116 to process speech input and determine the user's intent based on natural language input; (4) the divisions of functionalities between the client and server portions of the digital assistant can vary in different implementations; e.g., the DA client can be a thin-client that provides only user-facing input and output processing functions, and delegates all other functionalities of the digital assistant to a backend server; (5) digital assistant system 700 can be an implementation of server system 108 (and/or DA server 106) shown in FIG. 
1; (6) memory 702 of digital assistant system 700 can store digital assistant module 726 (or the server portion of a digital assistant), which can include input/output processing module 728, speech-to-text (STT) processing module 730, natural language processing module 732, dialogue flow processing module 734, task flow processing module 736, service processing module 738, and speech synthesis module 740, wherein each of these modules can have access to one or more of the following systems or data and models of the digital assistant module 726: ontology 760, vocabulary index 744, user data 748, task flow models 754, service models 756, and ASR systems; (7) I/O processing module 728 can interact with the user through I/O devices 716 in FIG. 7A, or with a user device (e.g., devices 104, 200, 400, or 600) through network communications interface 708 in FIG. 7A, to obtain user input (e.g., a speech input) and to provide responses (e.g., as speech outputs) to the user input, and I/O processing module 728 can optionally obtain contextual information associated with the user input from the user device, along with or shortly after receipt of the user input; and (8) I/O processing module 728 can also send follow-up questions to, and receive answers from, the user. In other words, (a) server system 108 (and/or DA server 106) is a backend server including one or more processing modules 114 to process speech input and determine the user's intent based on natural language input; and (b) digital assistant system 700 is equivalent to a backend server – server system 108 (and/or DA server 106), wherein I/O processing module 728 is equivalent to the client-facing I/O interface 112 and the rest of the modules (e.g., 730, 731, 732, 734, 736, 738, and 740) in FIG. 7B are equivalent to the one or more processing modules 114. 
Since the input query (e.g., speech "Make me a dinner reservation at a sushi place at 7" obtained from DA client 102 by I/O processing module 728) is first processed by "STT Processing Module" 730 to convert "Speech" to "Text", then processed by "Natural Language Processing Module" 732, "Task Flow Processing Module" 736, and "Dialogue Flow Processing Module" 734 to generate a completed structured query (e.g., {restaurant reservation, restaurant=ABC Cafe, date=3/12/2012, time=7 pm, party size=5}) from a partial structured query (e.g., {restaurant reservation, restaurant=sushi place, date=?, time=7 pm, party size=?}), and finally the completed structured query is processed by "Task Flow Processing Module" 736 and "Service Processing Module" 738 in the Digital Assistant 700 (i.e., a backend server – server system 108 (and/or DA server 106)), "the completed structured query" generated by "Natural Language Processing Module" 732, "Task Flow Processing Module" 736, and "Dialogue Flow Processing Module" 734 is indeed a "backend query". Therefore, Pagallo DOES disclose "updating at least a portion of the one or more missing entity arguments by inferring data from at least a portion of the one or more partial query signals using at least one artificial intelligence technique" and "generating at least one backend query to be processed by the artificial intelligence-based question answering system by integrating the input query and the at least a portion of the one or more missing entity arguments updated with the inferred data" as recited in the independent claims.
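For clarity of the record, the structured-query completion flow mapped above may be sketched as follows. This is a purely illustrative Python sketch of the cited behavior (partial query parameters filled first from contextual information, then from dialogue with the user); the identifiers `complete_query`, `REQUIRED`, and `ask_user` are hypothetical and do not appear in Pagallo.

```python
# Ontology parameters for the "restaurant reservation" domain (illustrative).
REQUIRED = ["restaurant", "date", "time", "party size"]

def complete_query(partial, context, ask_user):
    """Fill missing parameters: try contextual inference first (e.g., GPS
    for "near me"), otherwise invoke a dialogue turn to ask the user."""
    query = dict(partial)
    for param in REQUIRED:
        if query.get(param) is None:
            if param in context:
                query[param] = context[param]   # inferred from context
            else:
                query[param] = ask_user(param)  # dialogue flow module role
    return query

# Partial structured query derived from "a sushi place at 7":
partial = {"restaurant": None, "date": None, "time": "7 pm", "party size": None}
context = {"restaurant": "ABC Cafe"}            # e.g., resolved from "near me"
dialogue_answers = {"date": "3/12/2012", "party size": "5"}
completed = complete_query(partial, context, lambda p: dialogue_answers[p])
# completed now holds the fully populated structured query.
```

The sketch mirrors the cited division of labor: context-based population (natural language processing module 732) and dialogue-based completion (dialogue flow processing module 734) both feed the same structured query.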
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Gragnani et al. (US 2021/0382923 A1, pub. date: 12/09/2021) discloses in ABSTRACT that (1) systems and methods for natural language interpretation, wherein a user's ambiguous natural language question or command is transformed into the most relevant understood query or list of queries; (2) the query may be executed against a system of record to retrieve the answer to the user's question or command; (3) systems and methods for natural language generation, wherein abstract query expressions may be transformed into either question texts or answer texts or both; (4) systems and methods for procedural generation of training data, wherein configurations defined by data elements provided by the system of record are transformed into a large enough number of question/answer examples necessary to train the query model; and (5) systems and methods allow an interpretation system and method to produce useful results even if authorized users have added no explicit examples of question/answer pairs. Gragnani further discloses in ¶¶ [0031]-[0033] and [0038]-[0059] with FIGS. 1-5 that (1) an example system (operating environment) 100 includes a frontend system 102 and a backend system 104, wherein the frontend system 102 can comprise a computing device 101, such as a frontend computer 106, and the backend system 104 can also comprise a computing device 101, such as a backend computer 110; (2) the computer 110/106 can operate in a networked environment 140 using logical connections between the frontend 102, backend system 104, and data resource 126; (3) the queries input by the user on the user interface 112 may use services and native calls of backend system 104; (4) in initiating a query from the user 14, the frontend 102 components process a query in either a discovery flow or a search flow; (5) as shown in FIG. 2, the user interface 112 depicts a Search flow paradigm 200, wherein, using the search box 202 on the user interface 112, the user can enter data into the search box, similar to a search engine; (6) the backend server 104 may assist the user by predictively providing potential queries for which the user may seek an answer; (7) after generating training questions, the training questions can be indexed into a storage of the backend of the system or in the front end, wherein the indexed questions can be included in an autocomplete functionality of an API hosted on the backend 106; (8) upon the user providing a partial question request, the autocomplete API can search from questions indexed in storage; (9) the autocomplete API can provide a plurality of possible questions via a dropdown menu beneath the search box; (10) the configuration API 120 can receive a user's partial question, wherein the partial question in the form of text data can also include metadata objects which can include previously completed phrase metadata; (11) the configuration API 120 can provide a list of sentence fragments containing the original question text along with a predicted phrase; (12) based on the total question phrase, the user can choose to select the predicted phrase or continue to provide further text for a potential query; (13) the query can be processed in a discovery flow as depicted in the user interface of FIG. 3, wherein the discovery flow can be used when users do not already have in mind a current line of questioning; (14) the discovery flow can provide a method of asynchronous query recommendation or a method of discovering trending topics; (15) a secondary chatbot can be operating in the backend system 104 and allows the administrator 16 to control activities between the backend software modules; (16) the question application 108 can transmit the user question 15 via the network to a request handler 113 in backend system 104; (17) at the conclusion of the overall search, the results from the backend 104 to the frontend 102 can be displayed on the display device 119, and the question application 108 can provide the compiled query; (18) at step 502, the user 14 enters their question into user interface 112; (19) at step 504, control flow of the request data passes from the client-side user application 108 to the backend (server-side) request handler 113, through submission and handling of question/answer request 105; (20) at step 506, the request data passes to Interpreter 130, which uses the query model 1002 to generate multiple interpretations; (21) at step 508, Executor 132 builds and executes a query for each interpretation; (22) at step 510, Formatter 134 expresses each query result as a natural language answer; and (23) at step 512, the request handler 113 returns results for the user question 15.
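The request flow of Gragnani's FIG. 5 (user question, interpretation, execution, formatting, and return of results) may be illustrated by the following sketch. This is not code from the reference; the function names are hypothetical stand-ins for the Interpreter 130, Executor 132, and Formatter 134 roles.

```python
# Illustrative sketch of the FIG. 5 request flow (steps 502-512), under the
# assumption that each stage is a simple function; all names are hypothetical.

def interpret(question):
    # Step 506: the query model generates multiple candidate interpretations.
    return [f"interpretation #{i} of '{question}'" for i in (1, 2)]

def execute(interpretation):
    # Step 508: build and execute a query against the system of record.
    return f"result for {interpretation}"

def format_answer(result):
    # Step 510: express each query result as a natural-language answer.
    return f"Answer: {result}"

def handle_request(question):
    # Steps 504 and 512: the request handler orchestrates the stages and
    # returns the results for the user's question.
    return [format_answer(execute(i)) for i in interpret(question)]

answers = handle_request("total sales last month")
```

One answer is produced per interpretation, reflecting that an ambiguous question yields a list of candidate queries rather than a single one.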
Bostick et al. (US 2022/0067766 A1, filed on 11/12/2021) discloses in ¶¶ [0091]-[0092] and [0095]-[0112] with FIGS. 3-7 that (1) component 328 performs the evaluation of whether a given product or product combination satisfies the T&C of a given offer, and is therefore a valid combination; (2) component 328 identifies a product that is missing from a combination and is therefore preventing the combination from being a valid combination, or is otherwise needed to satisfy the T&C of an offer; (3) component 330 locates a missing or needed product using the product information supplied by retailing backend system 316 in response 4; (4) the product location information may be supplied automatically with a product identifier, or component 330 may query retailing backend system 316 with a product identifier to obtain the product's location in the store; (5) retailing backend system 316 also provides a store layout in a suitable form and format relative to which component 330 resolves the product location data into a physical location of the product in the store (block 408); (6) the application queries a retailing backend system for products that satisfy the T&C, and the locations of those products; (7) the application determines whether an offer is found to be applicable to the product (block 508); (8) if an offer is found ("Yes" path of block 508), the application queries a retailing backend system for other products that can be combined with the product to satisfy the T&C of the found offer, and the locations of such other products (block 510); (9) the application queries a retailing backend system for other products that can be combined with the product to satisfy the T&C of the found offer, and the locations of such other products (block 618); (10) the application obtains the T&C of the current offers available in the store, such as by querying a retailing backend system (block 704); and (11) the application queries the retailing backend system for products that 
satisfy the T&C of one or more offers, and the locations of those products (block 706).
Coffman et al. (US 2007/0043574 A1, pub. date: 02/22/2007) discloses in ¶ [0187] with FIG. 6 that (1) a conversational input interface can process multi-modal input, that is, files/streams/resources, speech via a phone 600, keyboard 601, pointing devices 602, and handwriting devices 603, including natural interfaces; i.e., all the input and output events across all the modalities are caught and transferred to the dialog manager (which also stores them appropriately in the context stack); (2) spoken input from a speech client (e.g., telephone 600) is subject to a speech recognition process 604 and other input (e.g., keyboard, mouse clicks, etc.) is subject to NLU processing 605; (3) each input is subject to attribute acquisition (401a) whereby the attribute value n-uples are acquired from the input; (4) a summarization process 401b is performed whereby the attribute value n-uples are added to the context, and the process then verifies with the syntax of the back-end application 608 whether the query is complete, incomplete, or ambiguous; (5) the backend accesses are also tracked by the dialog manager and the context manager; (6) it is sometimes possible to distribute some of the "intelligence" to the backend by loading some disambiguation capabilities (a feature of the dialog manager) to the backend; (7) individually, each input stream behaves conventionally; (8) the key conversational aspect is in the input procedure wherein commands can be entered in NLU mode (to provide natural language understanding of input queries) or in FSG mode (for constrained input according to rules: grammar and vocabulary, as opposed to free natural input); (9) commands or queries can be completed or corrected by completing missing fields or by correcting incorrect fields for the active task; (10) as such, the CVM introduces new issues not met with a conventional OS: simultaneous input streams to be merged, which creates input ambiguity; e.g., input may now combine input keyed on the keyboard, handwritten input 
and speech input, not to mention possible input from re-directed streams; and (11) therefore, a mechanism is provided to resolve any such ambiguity.
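The summarization step cited from Coffman (merging attribute-value n-uples into the context and then classifying the query against the backend application's syntax) may be illustrated as follows. This sketch is illustrative only; the function and field names are hypothetical and do not appear in the reference.

```python
# Illustrative sketch: attribute-value pairs from each input modality are
# merged into the dialogue context, then the query is classified as
# complete, incomplete, or ambiguous against a set of required fields.

def classify_query(context, new_attrs, required_fields):
    """Merge newly acquired attribute-value pairs, then classify."""
    context.update(new_attrs)  # summarization: add n-uples to the context
    # Model an ambiguous field as one holding several candidate values.
    if any(isinstance(v, list) for v in context.values()):
        return "ambiguous"
    # Any missing required field leaves the query incomplete.
    if any(f not in context for f in required_fields):
        return "incomplete"
    return "complete"

ctx = {}
state1 = classify_query(ctx, {"account": "checking"}, ["account", "amount"])
state2 = classify_query(ctx, {"amount": ["$50", "$500"]}, ["account", "amount"])
state3 = classify_query(ctx, {"amount": "$50"}, ["account", "amount"])
```

Each successive input narrows the classification: first incomplete (amount missing), then ambiguous (two candidate amounts), then complete once a single value is fixed.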
BADE et al. (US 2022/0051665 A1, filed on 08/11/2020) discloses in ABSTRACT and ¶¶ [0018]-[0020], [0023]-[0030], and [0038]-[0041] with FIGS. 1 and 5A-B that (1) an Artificial Intelligence (AI) based user intent analyzer system analyzes a received user query to determine an intent of the user query and enables execution of processes on a backend system based on the user intent; (2) the user query is divided into a plurality of portions and the portions with extraneous information are discarded; (3) the remaining portions are parsed with a plurality of parsers; (4) the output from the plurality of parsers is processed for extraction of the entities, entity attributes, verbs, and verb arguments; (5) the entities and verbs are further filtered using knowledge graphs, and the remaining entities and verbs are mapped to sub-intents using intent mapping rules retrieved from the knowledge graphs; (6) the sub-intents are mapped to the final intent using process rules associated with the process to be executed by the backend system in response to the user query; (7) the intent can pertain to a keyword or other trigger that initiates the process within the backend system; (8) an AI-based user query intent analysis system 100 receives a user query 150 pertaining to a process that can be executed by a backend system 160, wherein the user query 150 can be received via various modalities including a voice input, a question put to a chat bot, a request provided through a graphical user interface (GUI), an email, a video input, etc.; (9) the intent analysis system 100 analyzes the user query 150 to identify an intent 166 (i.e.
, the final intent) of the user query 150, wherein the intent 166 of the user query 150 is used to determine whether the process can be executed on the backend system 160 and the output to be produced upon the execution of the process on the backend system 160; (10) the intent analysis system 100 includes a query preprocessor 102, a user query parser 104, a facts-promise resolution (FPR) identifier 106, an entity analyzer 108, a verb argument analyzer 112, a knowledge processor 114, and an intent extractor 116; (11) when the user query 150 is received in modalities other than textual input, e.g., as a voice input or as a video, the received input can be initially converted into textual format by the query preprocessor 102 using voice-to-text Application Programming Interfaces (APIs); (12) the textual format of the user query 150 is then provided to the user query parser 104, wherein the user query parser 104 generates multiple outputs corresponding to different parses generated by a plurality of parsers, which can include a constituency parser 142, a dependency parser 144, and an SRL parser 146; (13) the constituency parser 142 produces a syntactic parse which includes phrases in the user query 150; (14) the dependency parser 144 produces an output that includes the relationships between the individual words in the user query 150; (15) the SRL parser 146 identifies the action words in the user query 150 and produces an output identifying the subject and the object of each of the verbs, whether the verb is negated, and other information related to the verb; (16) the output from the SRL parser 146 is provided to the FPR identifier 106, which identifies a plurality of portions from the user query 150, wherein the plurality of portions include at least a fact portion, a promise portion, a resolution portion, and a redundant information portion, wherein the facts portion pertains to one or more facts conveyed in the user query 150, the promise portion pertains to one or more 
promises made to the user or one or more expectations of the user based on data conveyed to the user while the resolution portion can pertain to the resolution expected by the user or the outcome desired by the user in response to the user query 150; (17) certain portions of the user query 150 may be identified as redundant information which includes information that is not required for intent identification or for the execution of the backend process; (18) hence, a subset of the plurality of portions that includes only the facts portion, the promise portion and the resolution portion of the user query 150 can be selected for further analysis while the redundant information portion of the user query 150 is discarded from further processing/consideration; (19) the plurality of portions which are identified as one of facts, promise and request are further processed by an entity analyzer 108 for identification and processing of the entities present therein; (20) the output of the FPR identifier 106 is additionally passed on to a verb argument analyzer 112 which is configured to identify the verbs in the user query 150 and the various arguments or properties of the verbs in the user query 150; (21) the outputs from the entity analyzer 108 and the verb argument analyzer 112 are provided to a knowledge processor 114 which can be configured to determine if the entities and/or the entity attributes, the verbs 188 and the verb arguments are included in a knowledge base 152 which contains the domain knowledge associated with the processes executed by the backend system 160; (22) the intent analysis system 100 is further configured to filter entities or verbs that are irrelevant to the backend system processes via the knowledge processor 114; (23) the intent extractor 116 identifies one or more of the intent mapping rules 164 to be applied to the received data in order to extract at least an intent 166 and one or more sub-intents 168, wherein the intent 166 can enable 
identifying the process to be executed by the backend system 160 while the sub-intents 168 can identify one or more sub-processes or provide arguments/values to the process to be executed by the backend system 160; and (24) the backend system 160 may approve or reject the request or resolution in the user query 150 based on the validation of the entities 184 and the verbs 188 from the facts portion and the promise portion based on corresponding process rules.
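The intent-extraction flow cited from BADE (portions of the query labeled fact/promise/resolution/redundant, the redundant portion discarded, and the remainder mapped to an intent via mapping rules) may be illustrated with the following sketch. All identifiers here are hypothetical and are not taken from the reference; the trigger-word matching is a simplified stand-in for the knowledge-graph-based mapping the reference describes.

```python
# Illustrative-only sketch of the described flow: discard the redundant
# portion, pool the remaining portions, and apply intent mapping rules.

def extract_intent(portions, mapping_rules):
    # Keep only fact, promise, and resolution portions (discard redundant).
    kept = {k: v for k, v in portions.items() if k != "redundant"}
    tokens = set()
    for text in kept.values():
        tokens.update(text.lower().split())
    # Apply the first mapping rule whose trigger words all appear.
    for triggers, intent in mapping_rules:
        if set(triggers) <= tokens:
            return intent
    return None

portions = {"fact": "my order 123 arrived damaged",
            "resolution": "please refund the order",
            "redundant": "hello I hope you are well"}
rules = [({"refund", "order"}, "initiate_refund"),
         ({"cancel", "order"}, "cancel_order")]
intent = extract_intent(portions, rules)  # "initiate_refund"
```

The extracted intent would then identify the process to be executed by the backend system, with sub-intents (not modeled here) supplying its arguments.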
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HWEI-MIN LU whose telephone number is (313) 446-4913. The examiner can normally be reached Mon - Fri: 9:00 AM - 6:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela D. Reyes can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HWEI-MIN LU/Primary Examiner, Art Unit 2142