DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Amendment filed December 29, 2025, has been entered. Claims 1 – 6, 8 – 16 and 18 – 21 are pending in the application.
Response to Arguments
Applicant’s arguments, filed December 29, 2025, regarding the 35 U.S.C. 103 rejections of claims 1 – 6, 8 – 16 and 18 – 21 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4, 6, 9 – 11, 14, 16 and 19 – 21 are rejected under 35 U.S.C. 103 as being unpatentable over Gruber et al. (US Patent No. 10,706,841), hereinafter Gruber, in view of Aly et al. (US Patent No. 11,314,941), hereinafter Aly, Al Hasan et al. (US Patent No. 11,068,660), hereinafter Al Hasan, and Gadde et al. (US Patent No. 11,804,219), hereinafter Gadde.
Regarding claim 1, Gruber discloses an electronic device, comprising:
at least one processor including processing circuitry (Column 9, line 23, "CPU 62 may include one or more processor(s) 63"),
memory storing instructions (Column 9, lines 35-37, "Memory block 61 may be used for a variety of purposes such as, for example, caching and/or storing data, programming instructions") that, when executed by the at least one processor individually or collectively, cause the electronic device to:
obtain, via natural language understanding logic of the electronic device, intent information corresponding to an utterance of a user (Column 55, lines 26-29, "In step 710, a speech input utterance is obtained and a speech-to-text component (such as component described in connection with FIG. 22) interprets the speech to produce a set of candidate speech interpretations 712."; Column 55, lines 41-44, "In step 714, the candidate speech interpretations 712 are sent to a language interpreter 1070, which may produce representations of user intent 716 for at least one candidate speech interpretation 712."; Candidate speech interpretations representing user intents reads on intent information.),
obtain information associated with the utterance (Column 39, lines 7-31, "In at least one embodiment, language interpreter component(s) 1070 of assistant 1002 may be operable to perform and/or implement various types of functions, operations, actions, and/or other features such as, for example, one or more of the following (or combinations thereof): Analyze user input and identify a set of parse results. User input can include any information from the user and his/her device context that can contribute to understanding the user's intent, which can include, for example one or more of the following (or combinations thereof): sequences of words, the identity of gestures or GUI elements involved in eliciting the input, current context of the dialog, current device application and its current data objects, and/or any other personal dynamic data obtained about the user such as location, time, and the like. For example, in one embodiment, user input is in the form of the uniform annotated input format 2690 resulting from active input elicitation 1094. Parse results are associations of data in the user input with concepts, relationships, properties, instances, and/or other nodes and/or data structures in models, databases, and/or other representations of user intent and/context."; Analyzing user input, where the user input can include any information from the user and device context that can contribute to understanding the user's intent, reads on obtain information associated with the utterance.),
preprocess the utterance obtained through a conversion of the utterance into a form of text based on the information (Column 77, lines 26-51, "In one embodiment, paraphrase and prompt are generated using any relevant context data. For example, any of the following data items can be used, alone or in combination: The parse—a tree of ontology nodes bound to their matching input tokens, with annotations and exceptions. For each node in the parse, this may include the node's metadata and/or any tokens in the input that provide evidence for the node's value. The task, if known The selection class. The location constraint, independent of selection class. Which required parameters are unknown for the given selection class (e.g., location is a required constraint on restaurants). The name of a named entity in the parse that is an instance of the selection class, if there is one (e.g., a specific restaurant or movie name.) Is this a follow-up refinement or the beginning of a conversation? (Reset starts new conversation.) Which constraints in the parse are bound to values in the input that changed their values? In other words, which constraints were just changed by the latest input? Is the selection class inferred or directly stated? Sorted by quality, relevance, or proximity? For each constraint specified, how well was it matched? Was refinement entered as text or clicking?"; Generating a paraphrase and prompt using any relevant context data reads on preprocessing the utterance obtained through a conversion of the utterance into a form of text based on the information.),
calculate an intent reliability [probability value] of the intent information (Column 29, line 59 - Column 30, line 20, "A ranking component analyzes the candidate interpretations 124 and ranks 126 them according to how well they fit syntactic and/or semantic models of intelligent automated assistant 1002. Any sources of constraints on user input may be used. For example, in one embodiment, assistant 1002 may rank the output of the speech-to-text interpreter according to how well the interpretations parse in a syntactic and/or semantic sense, a domain model, task flow model, and/or dialog model, and/or the like: it evaluates how well various combinations of words in the text interpretations 124 would fit the concepts, relations, entities, and properties of active ontology 1050 and its associated models. For example, if speech-to-text service 122 generates the two candidate interpretations “italian food for lunch” and “italian shoes for lunch”, the ranking by semantic relevance 126 might rank “italian food for lunch” higher if it better matches the nodes assistant's 1002 active ontology 1050 (e.g., the words “italian”, “food” and “lunch” all match nodes in ontology 1050 and they are all connected by relationships in ontology 1050, whereas the word “shoes” does not match ontology 1050 or matches a node that is not part of the dining out domain network). In various embodiments, algorithms or procedures used by assistant 1002 for interpretation of text inputs, including any embodiment of the natural language processing procedure shown in FIG. 28, can be used to rank and score candidate text interpretations 124 generated by speech-to-text service 122."; Ranking candidate interpretations by semantic relevance reads on calculate an intent reliability of the intent information.),
in response to an unclear intent of the utterance of the user, determined by the intent reliability probability value being less than a threshold value, generate a plurality of augmented sentence candidates comprising at least a portion of the utterance, each of the plurality of augmented sentence candidates generated based on a result of the preprocessing and [a different one of ] a plurality of language models (Column 11, lines 59-65, "If the user's requests are ambiguous or need further clarification, assistant 1002 can use the various techniques described herein, including active elicitation, paraphrasing, suggestions, and the like, to obtain the needed information so that the correct services 1340 are called and the intended action taken."; Column 40, lines 21-28, "In one embodiment, if ranking component 126 determines 128 that the highest-ranking speech interpretation from interpretations 124 ranks above a specified threshold, the highest-ranking interpretation may be automatically selected 130. If no interpretation ranks above a specified threshold, possible candidate interpretations of speech 134 are presented 132 to the user. The user can then select 136 among the displayed choices."; Column 30, line 61 - Column 31, line 2, "In at least one embodiment, candidate text interpretations 124 are generated by paraphrasing speech interpretations in terms of their semantic meaning. In some embodiments, there can be multiple paraphrases of the same speech interpretation, offering different word sense or homonym alternatives. For example, if speech-to-text service 122 indicates “place for meet”, the candidate interpretations presented to the user could be paraphrased as “place to meet (local businesses)” and “place for meat (restaurants)”."; Column 29, lines 33-51, "In one embodiment, assistant 1002 employs statistical language models to generate candidate text interpretations 124 of speech input 121. 
In addition, in one embodiment, the statistical language models are tuned to look for words, names, and phrases that occur in the various models of assistant 1002 shown in FIG. 8. For example, in at least one embodiment the statistical language models are given words, names, and phrases from some or all of: domain models 1056 (e.g., words and phrases relating to restaurant and meal events), task flow models 1086 (e.g., words and phrases relating to planning an event), dialog flow models 1087 (e.g., words and phrases related to the constraints that are needed to gather the inputs for a restaurant reservation), domain entity databases 1072 (e.g., names of restaurants), vocabulary databases 1058 (e.g., names of cuisines), service models 1088 (e.g., names of service provides such as OpenTable), and/or any words, names, or phrases associated with any node of active ontology 1050."; The user's request being ambiguous or needing further clarification reads on an unclear intent of the utterance of the user, determining that no candidate interpretation ranks above a specified threshold reads on the intent reliability probability value being less than a threshold value, presenting possible candidate interpretations of speech to the user, where the candidate interpretations are generated by paraphrasing speech interpretations, reads on generating a plurality of augmented sentence candidates comprising at least a portion of the utterance based on a result of the preprocessing, and employing statistical language models reads on generating augmented sentence candidates based on a plurality of language models.),
and provide a response to the user based on the augmented sentence candidates (Column 40, lines 21-28, "In one embodiment, if ranking component 126 determines 128 that the highest-ranking speech interpretation from interpretations 124 ranks above a specified threshold, the highest-ranking interpretation may be automatically selected 130. If no interpretation ranks above a specified threshold, possible candidate interpretations of speech 134 are presented 132 to the user. The user can then select 136 among the displayed choices."; Presenting possible candidate interpretations of speech to the user reads on providing a response to the user based on the augmented sentence candidates.),
wherein the preprocessing of the utterance comprises performing named entity recognition on the utterance and matching recognized named entities with the associated information (Column 77, lines 26-51, "In one embodiment, paraphrase and prompt are generated using any relevant context data. For example, any of the following data items can be used, alone or in combination: The parse—a tree of ontology nodes bound to their matching input tokens, with annotations and exceptions. For each node in the parse, this may include the node's metadata and/or any tokens in the input that provide evidence for the node's value. The task, if known The selection class. The location constraint, independent of selection class. Which required parameters are unknown for the given selection class (e.g., location is a required constraint on restaurants). The name of a named entity in the parse that is an instance of the selection class, if there is one (e.g., a specific restaurant or movie name.) Is this a follow-up refinement or the beginning of a conversation? (Reset starts new conversation.) Which constraints in the parse are bound to values in the input that changed their values? In other words, which constraints were just changed by the latest input? Is the selection class inferred or directly stated? Sorted by quality, relevance, or proximity? For each constraint specified, how well was it matched? Was refinement entered as text or clicking?"; Determining the name of named entities in the parse reads on performing named entity recognition on the utterance, and matching a tree of ontology nodes to input tokens with annotations and exceptions reads on matching named entities with the associated information.).
Gruber does not specifically disclose: calculate an intent reliability probability value of the intent information.
Aly teaches:
calculate an intent reliability probability value of the intent information (Column 33, lines 12-32, "In the intent-classification submodule 520, using convolutions and pooling, the assistant system 140 executing on the client system 130 may generate one representation for the user input 510 and this representation may be projected into the intents space. In particular embodiments, the assistant system 140 may determine one or more intents associated with the user input 510 by analyzing the one or more word-embeddings based on the CNN model. Determining the one or more intents may comprise the following steps. The assistant system 140 may first generate, by the one or more convolutional layers 502 and one or more pooling layers 504 of the CNN model, a feature representation for the user input 510 based on the one or more word-embeddings. The assistant system 140 may then calculate, by one or more linear layers 508 of the CNN model, a plurality of probabilities corresponding to a plurality of intents based on the feature representation. In particular embodiments, the plurality of intents may be considered as candidate intents in the intents space. Each probability may indicate a likelihood that a corresponding intent is associated with the user input 510."; Calculating a plurality of probabilities indicating a likelihood that a corresponding intent is associated with the user input reads on calculating an intent reliability probability value of the intent information.).
Aly is considered to be analogous to the claimed invention because it is in the same field of dialog systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruber to incorporate the teachings of Aly to calculate intent reliability of intent information as disclosed by Gruber, where the calculated intent reliability is an intent reliability probability value as taught by Aly. Doing so would allow for assisting a user to obtain information or services (Aly; Column 1, line 66 - Column 2, line 7).
Gruber in view of Aly does not specifically disclose: each of the plurality of augmented sentence candidates generated based on a result of the preprocessing and a different one of a plurality of language models.
Al Hasan teaches:
each of the plurality of augmented sentence candidates generated based on a result of the preprocessing and a different one of a plurality of language models (Abstract, lines 8-11, "The system is configured to provide one or more candidate paraphrases of a natural language input based on both the word-level and character-level attention-based models."; Column 11, lines 4-20, "Ensemble component 36 is configured to determine a plurality of candidate paraphrases based on both the word-level LSTM model and the character-LSTM model. In some embodiments, the word-level and character-level models may individually generate two or more sets of candidate paraphrases. For example, word-level and character-level candidate paraphrase determinations may be generated by the models where the model can take word-level/character-level inputs at the input layer and generate word-level/character-level outputs at the prediction/output layer (which combined together comprise four sets of candidate paraphrases). Similarly, multiple approaches (e.g. bidirectional encoder-decoder, attention-based soft-search, stacked residual LSTM networks etc.) for generating the models themselves may be combined to produce multiple learning models from the same training corpus, which may contribute multiple different sets of candidate clinical paraphrases."; Column 14, lines 36-40, "Embedding component 32 is configured to determine word-level, character-level, and sentence-level embeddings using the training corpus, and update the word-level, character-level, and sentence-level embeddings based on semantic relationships known from existing knowledge bases."; Determining word-level, character-level, and sentence-level embeddings reads on preprocessing, and the word-level and character-level models individually generating two or more sets of candidate paraphrases reads on each of the plurality of augmented sentence candidates being generated based on a different one of a plurality of language models.).
Al Hasan is considered to be analogous to the claimed invention because it is in the same field of dialog systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruber in view of Aly to incorporate the teachings of Al Hasan to provide word-level and character-level models individually generating two or more sets of candidate paraphrases. Doing so would allow for performing paraphrase generation to transform a text to improve readability while keeping the overall meaning intact (Al Hasan; Column 4, lines 38-51).
Gruber in view of Aly and Al Hasan does not specifically disclose: wherein the performing of the named entity recognition comprises recognizing and extracting persons, locations, organizations, and times from the utterance.
Gadde teaches:
wherein the performing of the named entity recognition comprises recognizing and extracting persons, locations, organizations, and times from the utterance (Column 5, lines 21-27, "As part of the NLP processing for a utterance, the digital assistant is trained to understand the meaning of the utterance, which involves identifying one or more intents and one or more entities corresponding to the utterance. The entity extraction in the digital assistant has two phases: named entity recognition and entity resolution via the named entity recognizer."; Column 17, lines 45-49, "Named entity recognizer 216 identifies and classifies named entities in text (e.g., utterances) into pre-defined categories such as the person, organization, location, expressions of time, currency, universal resource language address, etc."; Identifying and classifying named entities in utterances into pre-defined categories such as the person, organization, location, expressions of time, currency, and universal resource language address reads on the named entity recognition comprising recognizing and extracting persons, locations, organizations, and times from the utterance.).
Gadde is considered to be analogous to the claimed invention because it is in the same field of dialog systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruber in view of Aly and Al Hasan to incorporate the teachings of Gadde to identify and classify named entities in utterances into pre-defined categories such as the person, organization, location, expressions of time, currency, and universal resource language address. Doing so would allow for implementing a chatbot that interprets ambiguous user input (Gadde; Column 4, lines 40-54).
Regarding claim 4, Gruber in view of Aly, Al Hasan, and Gadde discloses the electronic device as claimed in claim 1.
Gruber further discloses:
wherein the plurality of language models comprise at least one of: a first language model which is a basic language model, a second language model generated based on learning data for each domain, a third language model generated based on a user history, and a fourth language model generated with a social trend reflected therein (Column 29, lines 33-51, "In one embodiment, assistant 1002 employs statistical language models to generate candidate text interpretations 124 of speech input 121. In addition, in one embodiment, the statistical language models are tuned to look for words, names, and phrases that occur in the various models of assistant 1002 shown in FIG. 8. For example, in at least one embodiment the statistical language models are given words, names, and phrases from some or all of: domain models 1056 (e.g., words and phrases relating to restaurant and meal events), task flow models 1086 (e.g., words and phrases relating to planning an event), dialog flow models 1087 (e.g., words and phrases related to the constraints that are needed to gather the inputs for a restaurant reservation), domain entity databases 1072 (e.g., names of restaurants), vocabulary databases 1058 (e.g., names of cuisines), service models 1088 (e.g., names of service provides such as OpenTable), and/or any words, names, or phrases associated with any node of active ontology 1050."; Statistical language models read on a plurality of language models, and domain models read on a language model generated based on learning data for each domain.).
Regarding claim 6, Gruber in view of Aly, Al Hasan, and Gadde discloses the electronic device as claimed in claim 1.
Gruber further discloses:
wherein the intent reliability probability value of intent information corresponding to each of the one or more augmented sentence candidates is greater than the threshold value (Column 40, lines 21-28, "In one embodiment, if ranking component 126 determines 128 that the highest-ranking speech interpretation from interpretations 124 ranks above a specified threshold, the highest-ranking interpretation may be automatically selected 130. If no interpretation ranks above a specified threshold, possible candidate interpretations of speech 134 are presented 132 to the user. The user can then select 136 among the displayed choices."; Determining a speech interpretation ranks above a specified threshold reads on the intent reliability probability value of intent information corresponding to an augmented sentence candidate being greater than the threshold.).
Regarding claim 9, Gruber in view of Aly, Al Hasan, and Gadde discloses the electronic device as claimed in claim 1.
Gruber further discloses:
wherein the instructions, when executed by the at least one processor individually or collectively, further cause the electronic device to: select at least one augmented sentence from among the one or more augmented sentence candidates, and output the response generated based on the at least one augmented sentence (Column 40, lines 21-28, "In one embodiment, if ranking component 126 determines 128 that the highest-ranking speech interpretation from interpretations 124 ranks above a specified threshold, the highest-ranking interpretation may be automatically selected 130. If no interpretation ranks above a specified threshold, possible candidate interpretations of speech 134 are presented 132 to the user. The user can then select 136 among the displayed choices."; Presenting possible candidate interpretations of speech to the user reads on selecting at least one augmented sentence from among the one or more augmented sentence candidates and outputting the response generated based on the at least one augmented sentence.).
Regarding claim 10, Gruber in view of Aly, Al Hasan, and Gadde discloses the electronic device as claimed in claim 9.
Gruber further discloses:
wherein the instructions, when executed by the at least one processor individually or collectively, further cause the electronic device to: for each of the one or more augmented sentence candidates, select the at least one augmented sentence by verifying a confidence value, verifying an uncertainty value, verifying an augmented history, or verifying a checklist (Column 40, lines 21-28, "In one embodiment, if ranking component 126 determines 128 that the highest-ranking speech interpretation from interpretations 124 ranks above a specified threshold, the highest-ranking interpretation may be automatically selected 130. If no interpretation ranks above a specified threshold, possible candidate interpretations of speech 134 are presented 132 to the user. The user can then select 136 among the displayed choices."; Selecting the highest-ranking speech interpretation from interpretations that rank above a specified threshold reads on selecting an augmented sentence by verifying a confidence value.).
Regarding claim 11, arguments analogous to claim 1 are applicable.
Regarding claim 14, arguments analogous to claim 4 are applicable.
Regarding claim 16, arguments analogous to claim 6 are applicable.
Regarding claim 19, arguments analogous to claim 9 are applicable.
Regarding claim 20, arguments analogous to claim 10 are applicable.
Regarding claim 21, Gruber in view of Aly, Al Hasan, and Gadde discloses the electronic device as claimed in claim 10.
Gruber further discloses:
wherein the instructions, when executed by the at least one processor individually or collectively, further cause the electronic device to: provide the response to the user based on a first at least one augmented sentence candidate of the plurality of augmented sentence candidates with a highest confidence value or a lowest uncertainty value (Column 40, lines 21-28, "In one embodiment, if ranking component 126 determines 128 that the highest-ranking speech interpretation from interpretations 124 ranks above a specified threshold, the highest-ranking interpretation may be automatically selected 130. If no interpretation ranks above a specified threshold, possible candidate interpretations of speech 134 are presented 132 to the user. The user can then select 136 among the displayed choices."; Selecting the highest-ranking speech interpretation from interpretations that rank above a specified threshold reads on providing the response to the user based on an augmented sentence candidate with a highest confidence value.).
Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Gruber in view of Aly, Al Hasan, and Gadde, and further in view of Hu et al. ("Interactive Question Clarification in Dialogue via Reinforcement Learning"), hereinafter Hu.
Regarding claim 2, Gruber in view of Aly, Al Hasan, and Gadde discloses the electronic device as claimed in claim 1, but does not specifically disclose: wherein the intent reliability probability value is less than the threshold value when the utterance includes only a noun or a predicate.
Hu teaches:
wherein the intent reliability probability value is less than the threshold value when the utterance includes only a noun or a predicate (Abstract, lines 3-5, "In this work, we propose a reinforcement model to clarify ambiguous questions by suggesting refinements of the original query."; Section 1, lines 1-6, "In real-world dialogue systems, a substantial portion of all user queries are ambiguous ones for which the system is unable to precisely identify the underlying intent. We observed that many such queries in our question answering (QA) system exhibited one of the following two characteristics. 1. Lack of semantic elements such as subject, object, or predicate, e.g. “How to apply”, “Credit card”. 2. Ambiguous entities, e.g. “My health insurance” (because health insurance consists of numerous sub-categories)."; Section 3, lines 35-36, "At the first stage, we collect ambiguous questions by annotating online query logs. If a query lacks a predicate or the object of the predicate, it is annotated as ambiguous."; Determining a query to be ambiguous if it lacks a predicate or the object of the predicate reads on the reliability value being less than the threshold value when the utterance includes only a noun or a predicate.).
Hu is considered to be analogous to the claimed invention because it is in the same field of dialog systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruber in view of Aly, Al Hasan, and Gadde to incorporate the teachings of Hu to determine a query to be ambiguous if it lacks a predicate or the object of the predicate. Doing so would allow for clarifying ambiguous questions by suggesting refinements of the original query (Hu; Abstract, lines 1-10).
Regarding claim 12, arguments analogous to claim 2 are applicable.
Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Gruber in view of Aly, Al Hasan, and Gadde, and further in view of Lee et al. (US Patent Application Publication No. 2013/0238624), hereinafter Lee.
Regarding claim 3, Gruber in view of Aly, Al Hasan, and Gadde discloses the electronic device as claimed in claim 1, but does not specifically disclose: wherein the intent reliability probability value is less than the threshold value when the utterance includes a homonym.
Lee teaches:
wherein the intent reliability probability value is less than the threshold value when the utterance includes a homonym (Paragraph 0010, lines 1-10, "According to one aspect of the present disclosure, a method for conducting a search using a user device includes receiving a search word through the user device. The method also includes displaying a homonym list indicating homonyms of the input search word on a display screen of the user device. The method further includes receiving a selection of one homonym item of the homonym list through the user device. The method still further includes conducting the search through an information search system using the selected homonym item as a search word."; Paragraph 0035, lines 1-19, "The information search system 110 searches for the search word received from the user device 100, and then sends the search result to the user device 100. In particular, the information search system 110 retrieves the homonymous search words of the search word through the knowledge management system 120, generates the homonym list including the searched homonymous search words, and provides the homonym list to the user device 100. When receiving a signal indicating a particular homonymous search word determined from the user device 100, the information search system 110 searches for the determined search word. In so doing, the information search system 110 obtains category information of the searched homonymous search words from the knowledge management system 120 and provides a category list to the user device 100. 
When receiving a signal indicating a particular category determined from the user device 100, the information search system 110 may send to the user device 100 a list including only the homonymous search words corresponding to the determined category."; Providing a homonym list or a category list to the user when a search word is a homonym reads on the reliability value being less than the threshold value when the utterance includes a homonym.).
Lee is considered to be analogous to the claimed invention because it is in the same field of search systems. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruber in view of Aly, Al Hasan, and Gadde to incorporate the teachings of Lee '624 to provide a homonym list or a category list to the user when a search word is a homonym. Doing so would allow for determining a search intention of a user by classifying homonyms of a search word based on a category and determining the category of the search word under the user's control (Lee; Paragraph 0008, lines 1-5).
Regarding claim 13, arguments analogous to claim 3 are applicable.
Claims 5, 8, 15 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Gruber in view of Aly, Al Hasan, and Gadde, and further in view of Garcia et al. (US Patent No. 11,710,482), hereinafter Garcia.
Regarding claim 5, Gruber in view of Aly, Al Hasan, and Gadde discloses the electronic device as claimed in claim 4, but does not specifically disclose: wherein the one or more language models comprise: a fifth language model generated by a combination of two or more language models among the first language model, the second language model, the third language model, and the fourth language model.
Garcia teaches:
wherein the one or more language models comprise: a fifth language model generated by a combination of two or more language models among the first language model, the second language model, the third language model, and the fourth language model (Column 1, lines 59 - Column 2, line 6, "In accordance with a determination that the first audio stream includes the lexical trigger, the method further includes generating one or more candidate text representations of the one or more utterances and determining whether at least one candidate text representation of the one or more candidate text representations is to be disregarded by the virtual assistant. In accordance with a determination that at least one candidate text representation is to be disregarded by the virtual assistant, the method further includes generating one or more candidate intents based on candidate text representations of the one or more candidate text representations other than the to be disregarded at least one candidate text representation. The method further includes determining whether the one or more candidate intents include at least one actionable intent."; Column 16, lines 33-42, "User data and models 231 include various data associated with the user (e.g., user-specific vocabulary data, user preference data, user-specified name pronunciations, data from the user's electronic address book, to-do lists, shopping lists, etc.) to provide the client-side functionalities of the digital assistant. Further, user data and models 231 include various models (e.g., speech recognition models, statistical language models, natural language processing models, ontology, task flow models, service models, etc.) for processing user input and determining user intent."; Column 39, lines 33-43, "It should be recognized that in some examples, natural language processing module 732 is implemented using one or more machine learning mechanisms (e.g., neural networks). 
In particular, the one or more machine learning mechanisms are configured to receive a candidate text representation and contextual information associated with the candidate text representation. Based on the candidate text representation and the associated contextual information, the one or more machine learning mechanism are configured to determine intent confidence scores over a set of candidate actionable intents."; A natural language processing module implemented using one or more machine learning mechanisms reads on a language model generated by a combination of two or more language models, and a machine learning mechanism configured to receive a candidate text representation and contextual information associated with the candidate text representation reads on a language model generated based on a user history.).
Garcia is considered to be analogous to the claimed invention because it is in the same field of dialog systems. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruber in view of Aly, Al Hasan, and Gadde to incorporate the teachings of Garcia to implement natural language processing using multiple machine learning mechanisms. Doing so would allow for reducing the need for leading a user utterance with a trigger phrase and improving the accuracy associated with determining whether a user utterance is directed to a virtual assistant (Garcia; Column 4, lines 49-60).
Regarding claim 8, Gruber in view of Aly, Al Hasan, and Gadde discloses the electronic device as claimed in claim 1, but does not specifically disclose: wherein the instructions, when executed by the at least one processor individually or collectively, further cause the electronic device to: output a question generated based on a plurality of augmented sentence candidates.
Garcia teaches:
output a question generated based on a plurality of augmented sentence candidates (Column 1, lines 59 - Column 2, line 6, "In accordance with a determination that the first audio stream includes the lexical trigger, the method further includes generating one or more candidate text representations of the one or more utterances and determining whether at least one candidate text representation of the one or more candidate text representations is to be disregarded by the virtual assistant. In accordance with a determination that at least one candidate text representation is to be disregarded by the virtual assistant, the method further includes generating one or more candidate intents based on candidate text representations of the one or more candidate text representations other than the to be disregarded at least one candidate text representation. The method further includes determining whether the one or more candidate intents include at least one actionable intent."; Column 40, lines 58-67, "As described above, in order to complete a structured query, task flow processing module 736 needs to initiate additional dialogue with the user in order to obtain additional information, and/or disambiguate potentially ambiguous utterances. When such interactions are necessary, task flow processing module 736 invokes dialogue flow processing module 734 to engage in a dialogue with the user. In some examples, dialogue flow processing module 734 determines how (and/or when) to ask the user for the additional information and receives and processes the user responses."; Generating one or more candidate intents and asking the user for additional information to disambiguate potentially ambiguous utterances reads on outputting a question generated based on a plurality of augmented sentence candidates.).
Garcia is considered to be analogous to the claimed invention because it is in the same field of dialog systems. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruber in view of Aly, Al Hasan, and Gadde to incorporate the teachings of Garcia to generate one or more candidate intents and ask the user for additional information to disambiguate potentially ambiguous utterances. Doing so would allow for reducing the need for leading a user utterance with a trigger phrase and improving the accuracy associated with determining whether a user utterance is directed to a virtual assistant (Garcia; Column 4, lines 49-60).
Regarding claim 15, arguments analogous to claim 5 are applicable.
Regarding claim 18, arguments analogous to claim 8 are applicable.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs, whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JAMES BOGGS/Examiner, Art Unit 2657
/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657