Last updated: May 29, 2026
Application No. 18/094,280
TASK-ORIENTED DIALOG MODELING AND ACTION DETERMINATION

Final Rejection §101 §103
Filed: Jan 06, 2023
Examiner: LEE, MICHAEL CHRISTOPHER
Art Unit: 2128
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Toyota Connected North America Inc.
OA Round: 2 (Final)
This examiner grants 61% of cases after interview

+26.0% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 144 resolved cases, 2023–2026
Examiner Intelligence

LEE, MICHAEL CHRISTOPHER
Career Allowance Rate: 61% of resolved cases (88 granted / 144 resolved), +6.1% vs TC avg
Interview Lift: +26.0% (strong; resolved cases with interview vs. without)
Typical timeline: 3y 3m average prosecution; 19 currently pending
Career history: 195 total applications across all art units
Statute-Specific Performance

§101: 15.9% (-24.1% vs TC avg)
§103: 79.7% (+39.7% vs TC avg)
§102: 0.8% (-39.2% vs TC avg)
§112: 3.3% (-36.7% vs TC avg)
Based on career data from 144 resolved cases
Office Action

§101 §103
DETAILED ACTION
Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant’s Amendment and remarks submitted on 3/5/2026 have been considered.  Claims 1-20 are pending.
Drawing Objections.  The objections to the drawings are withdrawn in view of the amendments to the specification to add references to reference characters 151-153 and 164.
35 U.S.C. 112(b) Rejections.  The rejections of claims 7-8 under 35 U.S.C. 112(b) are withdrawn in view of Applicant’s amendments to such claims.
35 U.S.C. 101 Rejections.  The rejections of claims 17-20 for not pertaining to an eligible statutory category are withdrawn in view of Applicant’s amendments to recite a “non-transitory computer-readable medium.”  However, the other rejections under 35 U.S.C. 101 are maintained as explained in the detailed rejections below.

Response to Arguments
On page 10 of Applicant’s 3/5/2026 Amendment and remarks, Applicant asserts that at least paras. 0056-0058 and 0072 of the instant specification provide written description support for the claim amendments.
The examiner agrees that the portions of the disclosure identified by Applicant provide sufficient written description support for the claim amendments.

On page 12 of Applicant’s 3/5/2026 Amendment and remarks, with respect to the rejections of independent claims 1, 9, and 17 under 35 U.S.C. 103, Applicant asserts that the amendments to the claim, in their totality, are not taught by the BALIGAR and PANDEY references.
The examiner agrees that BALIGAR and PANDEY do not recite a “vehicle.”  The previous rejections under 35 U.S.C. 103 are withdrawn; however, new rejections under 35 U.S.C. 103 necessitated by Applicant’s amendments are provided herein.

On pages 12-14 of Applicant’s 3/5/2026 Amendment and remarks, with respect to the rejections of the dependent claims under 35 U.S.C. 103, Applicant asserts that the rejections of such dependent claims should be withdrawn in view of Applicant’s amendments and arguments to the independent claims.
The examiner respectfully disagrees for the same reasons explained with respect to the independent claims.

Claim Objections
Claims 2-8, 10, and 18 are objected to because of the following informalities: 
The preambles of each of dependent claims 2-8 recite “The apparatus of claim 1”, and “the apparatus” now lacks antecedent basis because claim 1 was amended to delete “the apparatus.”  The examiner suggests amending the preambles of each of these claims to recite “The vehicle of claim 1” to match the amendments made to claim 1 and to provide explicit antecedent basis.
In claim 2, line 4, “recently received” should read “recently received utterance”
In claim 4, line 4, “previously received” should read “previously received utterance”
In claim 10, line 4, “recently received” should read “recently received utterance”
In claim 18, line 5, “recently received” should read “recently received utterance”
Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 

Regarding Step 1 of the Alice/Mayo framework, Claims 1-8 are directed to an apparatus (a machine), Claims 9-16 are directed to a method (a process), and Claims 17-20 are directed to a “non-transitory computer-readable medium” (an article of manufacture), each of which falls within one of the four statutory categories of invention.

Regarding Claim 1
Step 2A, prong 1 (Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea?).
Claim 1 recites the following mental processes, each of which, under the broadest reasonable interpretation, covers performance of the limitation in the mind (including an observation, evaluation, judgment, opinion) or with the aid of pencil and paper but for the recitation of generic computer components (e.g., “audio sensor”, “processor”, “machine learning model”, and “virtual assistant”).
determine, ... that the utterance does not contain sufficient information to determine an action of the system that causes performance of a task by the system (under the broadest reasonable interpretation, a human can perform this limitation mentally, e.g., a human such as a human assistant can listen/read an utterance mentally and then mentally determine that there is not enough information available to determine an action, e.g., the human assistant mentally determines that additional information is required in order to understand and carry out the user’s request)
determine, ... that sufficient information is available to determine the action based on the additional utterance (under the broadest reasonable interpretation, a human can perform this limitation mentally, e.g., a human such as a human assistant can listen/read additional utterances mentally and then mentally determine that there is now enough information available to determine an action, e.g., the human assistant mentally determines that the human assistant has enough information in order to understand and carry out the user’s request)
determine, ... the task based on the utterance and the additional utterances (under the broadest reasonable interpretation, a human can perform this limitation mentally, e.g., a human such as a human assistant can determine the task to be performed based on the utterances, e.g., can predict that the user wants the human assistant to perform a task such as researching a question and returning with an answer)

	Claim 1 further recites the following limitation that is NOT a mental process, but that falls under the “managing personal behavior or relationships or interactions between people” category of abstract ideas as explained by MPEP 2106.04(a)(2) II.C:
in response to the utterance not containing the sufficient information, initiate, ... a dialog with the user and receiving one or more additional utterances from the user ... (under the broadest reasonable interpretation, this limitation merely relates to the social activity of having a conversation with another person, in response to determining that additional information is required)
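
The limitations quoted above describe a clarification loop: check whether the utterance is sufficient, prompt for more if it is not, and determine the task once it is. A minimal sketch of that control flow follows, for illustration only. The keyword spotting stands in for the claimed first and second ML models, and `REQUIRED_SLOTS`, `extract_slots`, `has_sufficient_info`, `next_prompt`, `run_dialog`, and the "navigate" task are hypothetical names that do not come from the application or the Office action.

```python
from typing import Callable, Dict, List

# Hypothetical: the pieces of information a task needs before it can run.
REQUIRED_SLOTS = {"destination", "time"}

def extract_slots(utterances: List[str]) -> Dict[str, str]:
    """Stand-in for the first model: naive keyword spotting over all utterances."""
    slots: Dict[str, str] = {}
    for text in utterances:
        words = text.lower().split()
        if "to" in words and words.index("to") + 1 < len(words):
            slots["destination"] = words[words.index("to") + 1]
        if "at" in words and words.index("at") + 1 < len(words):
            slots["time"] = words[words.index("at") + 1]
    return slots

def has_sufficient_info(utterances: List[str]) -> bool:
    """True once every required slot has been filled."""
    return REQUIRED_SLOTS.issubset(extract_slots(utterances))

def next_prompt(utterances: List[str]) -> str:
    """Stand-in for the second model: ask for the first missing slot."""
    missing = sorted(REQUIRED_SLOTS.difference(extract_slots(utterances)))
    return f"Please tell me the {missing[0]}."

def run_dialog(first_utterance: str,
               ask_user: Callable[[str], str],
               max_turns: int = 5) -> Dict[str, str]:
    """Collect additional utterances until the task can be determined."""
    utterances = [first_utterance]
    turns = 0
    while not has_sufficient_info(utterances) and turns < max_turns:
        utterances.append(ask_user(next_prompt(utterances)))
        turns += 1
    if not has_sufficient_info(utterances):
        raise ValueError("not enough information to determine a task")
    return {"task": "navigate", **extract_slots(utterances)}
```

In this sketch, `ask_user` is any callable that returns the user's next utterance, so the loop can be exercised with scripted answers instead of an audio sensor.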

Step 2A, prong 2 (Does the claim recite additional elements that integrate the judicial exception into a practical application?).
The judicial exception is not integrated into a practical application.  In particular, the claim recites the additional elements (e.g., “receiver”, “processor”, “machine learning model”, and “virtual assistant”) which are recited at a high-level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component (See MPEP 2106.05(f)). 
	Regarding the “A vehicle” limitation, such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use (e.g., the mental processes are now limited to being performed within a vehicle). As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not integrate a judicial exception into a practical application.
Regarding the “a system comprising an audio sensor to receive an utterance from a user” limitation, such additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)).  
	Regarding the “a processor in communication with the audio sensor, wherein the processor is configured to” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception.  In particular, the claim only recites the additional element of a processor.  This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a processor).  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
	Regarding the “via a first machine learning (ML) model” and “via the first ML model” limitations, such limitations are recited at a high-level of generality and amount to no more than adding the words “apply it” (or an equivalent) with the judicial exception.  In particular, the claim only recites the additional element of a machine learning model.  This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a machine learning model).  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
	Regarding the “via a virtual assistant” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception.  In particular, the claim only recites the additional element of a virtual assistant.  This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a virtual assistant).  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
Regarding the “controlled by a second ML model, different than the first ML model” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception.  In particular, the claim only recites the additional element of a machine learning model.  This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a machine learning model).  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
Regarding the “receive, via the audio sensor, an additional utterance” limitation, such additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)).  
Regarding the “cause the system to perform the task” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation attempts to cover a solution to an identified problem with no restriction on how the result is accomplished, or provides no description of the mechanism for accomplishing the result.  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).

Step 2B (Does the claim recite additional elements that amount to significantly more than the judicial exception?)
	In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.  As discussed above, the additional elements (e.g., “receiver”, “processor”, “machine learning model”, and “virtual assistant”) are recited at a high-level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component (See MPEP 2106.05(f)).
Regarding the “A vehicle” limitation, such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which does not amount to significantly more than the judicial exception.  MPEP 2106.05(h).
Regarding the “a system comprising an audio sensor to receive an utterance from a user” limitation, as discussed above, the additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process.  The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
Regarding the “a processor in communication with the audio sensor, wherein the processor is configured to” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea.  Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
	Regarding the “via a first machine learning (ML) model” and “via the first ML model” limitations, such limitations are recited at a high-level of generality and amount to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitations merely provide instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea.  Accordingly, these additional elements do not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
	Regarding the “via a virtual assistant” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea.  Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
Regarding the “controlled by a second ML model, different than the first ML model” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea.  Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
Regarding the “receive, via the audio sensor, an additional utterance” limitation, as discussed above, the additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process.  The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
Regarding the “cause the system to perform the task” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation attempts to cover a solution to an identified problem with no restriction on how the result is accomplished, or provides no description of the mechanism for accomplishing the result.  Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
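
The claim 1 analysis above walks through the eligibility questions in a fixed order (Step 1, Step 2A prong 1, Step 2A prong 2, Step 2B). The sketch below is a simplified illustration of that ordering only, assuming each question reduces to a yes/no finding. `ClaimFindings` and `eligibility` are hypothetical names, and the boolean values for claim 1 merely restate the examiner's conclusions in this action.

```python
from dataclasses import dataclass

@dataclass
class ClaimFindings:
    statutory_category: bool     # Step 1: process, machine, manufacture, or composition?
    recites_exception: bool      # Step 2A, prong 1: recites a judicial exception?
    practical_application: bool  # Step 2A, prong 2: exception integrated into a practical application?
    significantly_more: bool     # Step 2B: additional elements amount to significantly more?

def eligibility(f: ClaimFindings) -> str:
    """Return where the analysis ends for a given set of findings."""
    if not f.statutory_category:
        return "ineligible at Step 1"
    if not f.recites_exception:
        return "eligible at Step 2A, prong 1"
    if f.practical_application:
        return "eligible at Step 2A, prong 2"
    if f.significantly_more:
        return "eligible at Step 2B"
    return "ineligible at Step 2B"

# Findings as stated in this action for claim 1 (examiner's position, not a legal conclusion).
claim_1 = ClaimFindings(
    statutory_category=True,
    recites_exception=True,
    practical_application=False,
    significantly_more=False,
)
print(eligibility(claim_1))
```

Read this way, a response could change the outcome by flipping either the prong 2 finding or the Step 2B finding.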

Regarding Claim 2
Step 2A, Prong 1
prompt the user for the additional utterance based on content included in a most recently received utterance (under the broadest reasonable interpretation, a human can perform this limitation mentally, e.g., a human such as a human assistant can listen/read additional utterances mentally and then mentally determine that additional information is needed, and therefore make a mental decision to prompt the user for an additional utterance)

Regarding Step 2A, Prong 2, the claim does not include any additional elements that integrate the judicial exception into a practical application and regarding Step 2B, there are no additional elements recited that amount to significantly more than the judicial exception.

Regarding Claim 3
Step 2A, Prong 2
Regarding the “wherein the virtual assistant is embedded within the system, and the user is a passenger within the vehicle” limitation, such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use (automotive assistants). As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not integrate a judicial exception into a practical application.

Step 2B
Regarding the “wherein the virtual assistant is embedded within the system, and the user is a passenger within the vehicle” limitation, such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which does not amount to significantly more than the judicial exception.  MPEP 2106.05(h).

Regarding Claim 4
Step 2A, Prong 1
request the user to confirm content included in a previously received (under the broadest reasonable interpretation, this limitation merely relates to the social activity of having a conversation with another person, and requesting the user to confirm content in a previously received utterance)

Regarding Step 2A, Prong 2, the claim does not include any additional elements that integrate the judicial exception into a practical application and regarding Step 2B, there are no additional elements recited that amount to significantly more than the judicial exception.

Regarding Claim 5
Step 2A, Prong 1
predict an intent of the user after each utterance based on an aggregation of utterances during a conversation with the user (under the broadest reasonable interpretation, a human can perform this limitation mentally, e.g., a human such as a human assistant can review the aggregation of utterances and then mentally predict an intent of the user, e.g., the intent of the user is to receive information)
when ... determines that the sufficient information is available ... determine that the sufficient information is available based on the aggregation of utterances  (under the broadest reasonable interpretation, a human can perform this limitation mentally, e.g., a human such as a human assistant can mentally determine that there is enough information available in order to understand and carry-out the user’s request)
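
The two claim 5 limitations quoted above turn on re-evaluating the whole conversation after each new utterance. A minimal sketch of that aggregation pattern follows, for illustration only. The keyword table stands in for a trained model, and `INTENT_KEYWORDS`, `predict_intent`, and `track_intents` are hypothetical names not taken from the application.

```python
from typing import Dict, List, Set

# Hypothetical keyword-to-intent table standing in for a trained intent model.
INTENT_KEYWORDS: Dict[str, Set[str]] = {
    "navigate": {"drive", "route", "directions"},
    "play_media": {"play", "song", "music"},
}

def predict_intent(aggregated: List[str]) -> str:
    """Score each intent by keyword overlap with all utterances received so far."""
    words = {w for text in aggregated for w in text.lower().split()}
    scores = {intent: len(words & keys) for intent, keys in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def track_intents(conversation: List[str]) -> List[str]:
    """Re-predict the intent after every utterance, using the aggregation to date."""
    return [predict_intent(conversation[: i + 1]) for i in range(len(conversation))]
```

For example, `track_intents(["hello there", "play a song", "some music please"])` yields one prediction per turn, each based on every utterance received up to that point.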

Step 2A, Prong 2
	Regarding the “the processor” and “the processor is configured to” limitations, such limitations are recited at a high-level of generality and amount to no more than adding the words “apply it” (or an equivalent) with the judicial exception.  In particular, the claim only recites the additional elements of a processor.  These additional elements are recited at a high-level of generality and amount to no more than mere instructions to apply the exception using generic computer components (a processor).  Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).

Step 2B
Regarding the “the processor” and “the processor is configured to” limitations, such limitations are recited at a high-level of generality and amount to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitations merely provide instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea.  Accordingly, these additional elements do not add significantly more than the judicial exception. (See MPEP 2106.05(f)).

Regarding Claim 6
Step 2A, Prong 1
determine, ... that a sequence of utterances is required to make a prediction based on the utterance.  (under the broadest reasonable interpretation, a human can perform this limitation mentally, e.g., a human such as a human assistant can mentally determine that multiple utterances will be needed to collect sufficient information from a user)

Step 2A, Prong 2
	Regarding the “via the first ML model” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception.  In particular, the claim only recites the additional element of a machine learning model.  This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a machine learning model).  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).

Step 2B
Regarding the “via the first ML model” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea.  Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).

Regarding Claim 7
Step 2A, Prong 1
predict, ... a next action to be taken ... based on the utterance and a previously received utterance associated with a same task (under the broadest reasonable interpretation, a human can perform this limitation mentally, e.g., a human such as a human assistant can mentally predict a next action to be taken based on the received utterances, e.g., can mentally predict that the next action will be to do research to find an answer to the user’s question)

Step 2A, Prong 2
	Regarding the “via the second ML model” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception.  In particular, the claim only recites the additional element of a machine learning model.  This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a machine learning model).  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
	Regarding the “by the virtual assistant” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception.  In particular, the claim only recites the additional element of a virtual assistant.  This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a virtual assistant).  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).

Step 2B
Regarding the “via the second ML model” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea.  Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
Regarding the “by the virtual assistant” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea.  Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).

Regarding Claim 8
Step 2A, Prong 1
identify, ... a word of interest within the additional utterance and an additional question ... to ask the user based on the word of interest (under the broadest reasonable interpretation, a human can perform this limitation mentally, e.g., a human such as a human assistant can identify a word of interest within an utterance and then based on such word of interest, form an additional question mentally to ask the user)

Step 2A, Prong 2
	Regarding the “via the second ML model” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception.  In particular, the claim only recites the additional element of a machine learning model.  This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a machine learning model).  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
Regarding the “for the virtual assistant” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception.  In particular, the claim only recites the additional element of a virtual assistant.  This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a virtual assistant).  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).

Step 2B
Regarding the “via the second ML model” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea.  Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
	Regarding the “for the virtual assistant” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea.  Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).

Regarding Claim 9
Step 2A, Prong 1
	Claim 9 recites a method that corresponds to the apparatus of claim 1, and therefore the analysis under Step 2A, Prong 1 with respect to claim 1 also applies to claim 9.  While claim 9 recites additional generic computing components (e.g., “machine learning model”, “virtual assistant”, and “audio sensor”), such additional generic computing components do not change the analysis under Step 2A, Prong 1.

Step 2A, Prong 2
	Claim 9 recites a method that corresponds to the apparatus of claim 1, and therefore the analysis under Step 2A, Prong 2 with respect to claim 1 also applies to claim 9.  While claim 9 recites additional generic computing components (e.g., “machine learning model”, “virtual assistant”, and “audio sensor”), such additional generic computing components do not change the analysis under Step 2A, Prong 2.

Step 2B
	Claim 9 recites a method that corresponds to the apparatus of claim 1, and therefore the analysis under Step 2B with respect to claim 1 also applies to claim 9.  While claim 9 recites additional generic computing components (e.g., “machine learning model”, “virtual assistant”, and “audio sensor”), such additional generic computing components do not change the analysis under Step 2B.

	Claims 10-16 depend from claim 9, and correspond to the apparatuses of claims 2-8, and are therefore rejected for the same reasons explained above with respect to claim 9 and claims 2-8, respectively.

Regarding Claim 17
Step 2A, Prong 1
	Claim 17 recites a non-transitory computer-readable medium that corresponds to the apparatus of claim 1, and therefore the analysis under Step 2A, Prong 1 with respect to claim 1 also applies to claim 17.  While claim 17 recites additional generic computing components (e.g., “non-transitory computer-readable medium”, “processor”, “machine learning model”, “virtual assistant”, and “audio sensor”), such additional generic computing components do not change the analysis under Step 2A, Prong 1.

Step 2A, Prong 2
	Claim 17 recites a non-transitory computer-readable medium that corresponds to the apparatus of claim 1, and therefore the analysis under Step 2A, Prong 2 with respect to claim 1 also applies to claim 17.  While claim 17 recites additional generic computing components (e.g., “non-transitory computer-readable medium”, “processor”, “machine learning model”, “virtual assistant”, and “audio sensor”), such additional generic computing components do not change the analysis under Step 2A, Prong 2.

Step 2B
	Claim 17 recites a non-transitory computer-readable medium that corresponds to the apparatus of claim 1, and therefore the analysis under Step 2B with respect to claim 1 also applies to claim 17.  While claim 17 recites additional generic computing components (e.g., “non-transitory computer-readable medium”, “processor”, “machine learning model”, “virtual assistant”, and “audio sensor”), such additional generic computing components do not change the analysis under Step 2B.

Claims 18-20 depend from claim 17, and correspond to the apparatuses of claims 2-4, and are therefore rejected for the same reasons explained above with respect to claim 17 and claims 2-4, respectively.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-5, 7, 9-13, 15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over US 10997963 B1, hereinafter referenced as BALIGAR in view of US 11289075 B1, hereinafter referenced as PANDEY, and further in view of US 20200211387 A1, hereinafter referenced as NOY.

Regarding Claim 1
	BALIGAR teaches:
a system comprising an audio sensor to receive an utterance from a user; and (BALIGAR, col. 4, lines 1-5: “Instead, the voice assistant device 110 may receive input from users by receiving spoken commands, which are converted to signals by the voice assistant device 110 and/or by a cloud service, and then processed, such as by an exchange of data with voice assistant service 102.”;
BALIGAR, col. 2, lines 11-13: “The voice assistant system may include a user device that typically includes at least a network interface, a microphone, and a speaker.”;
BALIGAR, col. 5, lines 10-13: “Meanwhile, the voice assistant service 102 may receive a message 122 of the audible request of “buy this”, which was received via a microphone of the voice assistance device 110.”;
Examiner’s Note: voice assistant device 110 has a microphone (corresponding to recited “audio sensor”) that receives speech inputs from users)
a processor in communication with the audio sensor, wherein the processor is configured to (BALIGAR, col. 2, lines 1-13: “The voice assistant system may include any system and/or device that receives audio commands from a user, processes the audio, possibly using speech to text algorithms and/or natural language processing (NLP) algorithms, to determine text, returns a reply based on the text, converts the reply to an audio output using text to speech algorithms, and causes a speaker to output the audio output. ... The voice assistant system may include a user device that typically includes at least a network interface, a microphone, and a speaker.”
BALIGAR, col. 5, lines 48-57: “FIG. 2 is a block diagram of an illustrative computing architecture 200 of a voice assistant service, such as the voice assistant service 102. The computing architecture 200 may be implemented in a distributed or non-distributed computing environment. The computing architecture 200 may include one or more processors 202 and one or more computer-readable media 204 that stores various modules, applications, programs, or other data. The computer-readable media 204 may include instructions that, when executed by the one or more processors 202, cause the processors to perform the operations described herein.”;
Examiner’s Note: the processor of the voice assistant service is in communication with the microphone in order to receive the utterance for processing using NLP algorithms)
determine, ... that the utterance does not contain sufficient information to determine an action of the system that causes performance of a task by the system, (BALIGAR, col. 8, line 60 – col. 9, line 5: “At 408, the voice assistant service 102 may determine whether to request more information to determine a context of an audio request. For example, when the voice assistant service 102 includes enough information to understand and respond to an audio request that is supplemented by context information derived from the context queue, then the voice assistant service 102 may not request additional information from the user. When the voice assistant service 102 does not include enough information to understand and respond to the audio request that is supplemented by context information derived from the context queue (following the “yes” route from the decision operation 408), then the process 400 may advance to an operation 410.”;
Examiner’s Note: the voice assistant determines if enough information is available to understand and respond to the user’s audio request in order for the assistant to “respond to an audio request” (where such response corresponds to the recited “performance of a task by the system”))
in response to the utterance not containing the sufficient information, initiate, via a virtual assistant ..., a dialog with the user, (BALIGAR, col. 9, lines 6-13: “At 410, the voice assistant service 102 may request additional information from the user by sending an audio request to the user via the voice assistant device or voice assistant application. The request may be a choice, such as “did you mean “buy a toaster or diapers”. In some instances, the response may be a request for more specific information, such as “Please provide more information so I can fulfill your request”.”;	
BALIGAR, col. 9, lines 14-23: “When the voice assistant service 102 does include enough information to understand and respond to the audio request that is supplemented by context information derived from the context queue (following the “no” route from the decision operation 408), then the process may advance to an operation 412. At 412, the voice assistant service 102 may implement supplemented audio request. For example, the voice assistant service 102 may provide an audio response to the user via the voice assistant device or voice assistant application.”;
Examiner’s Note: BALIGAR discloses requesting additional information from the user, which is received via user speech input, when additional information is required to fulfill the user’s request)
receive, via the audio sensor, an additional utterance from the user, (BALIGAR, col. 4, lines 1-5: “Instead, the voice assistant device 110 may receive input from users by receiving spoken commands, which are converted to signals by the voice assistant device 110 and/or by a cloud service, and then processed, such as by an exchange of data with voice assistant service 102.”;
BALIGAR, col. 2, lines 11-13: “The voice assistant system may include a user device that typically includes at least a network interface, a microphone, and a speaker.”;
BALIGAR, col. 5, lines 10-13: “Meanwhile, the voice assistant service 102 may receive a message 122 of the audible request of “buy this”, which was received via a microphone of the voice assistance device 110.”;
BALIGAR, col. 8, line 60 – col. 9, line 5: “At 408, the voice assistant service 102 may determine whether to request more information to determine a context of an audio request. For example, when the voice assistant service 102 includes enough information to understand and respond to an audio request that is supplemented by context information derived from the context queue, then the voice assistant service 102 may not request additional information from the user. When the voice assistant service 102 does not include enough information to understand and respond to the audio request that is supplemented by context information derived from the context queue (following the “yes” route from the decision operation 408), then the process 400 may advance to an operation 410.”;
Examiner’s Note: voice assistant device 110 has a microphone (corresponding to recited “audio sensor”) that receives additional speech inputs from users in response to the request for more information in operation 410)
determine, ...  that sufficient information is available to determine the action based on the additional utterance, (BALIGAR, col. 8, lines 60-67: “At 408, the voice assistant service 102 may determine whether to request more information to determine a context of an audio request. For example, when the voice assistant service 102 includes enough information to understand and respond to an audio request that is supplemented by context information derived from the context queue, then the voice assistant service 102 may not request additional information from the user”;
Examiner’s Note: As shown in Fig. 4, after step 410 (request for more information via audio in instances where not enough information is available), the flow returns to step 406 and then step 408 to analyze whether the additional information provided by the user is sufficient to understand and respond to the user’s request)
determine, ... the task based on the utterance and the additional utterance, and (BALIGAR, col. 9, lines 19-23: “At 412, the voice assistant service 102 may implement supplemented audio request. For example, the voice assistant service 102 may provide an audio response to the user via the voice assistant device or voice assistant application.”;	
Examiner’s Note: for example, a task may be to provide the user with information in the form of an audio response)
cause the system to perform the task.  (BALIGAR, col. 12, lines 13-18: “At 618, the voice assistant service 102 may transmit a request to the content provider to cause a refresh of information served to a device associated with the user who made the audio request, to cause output of more books, such as by graphically outputting recommended books similar to the book referenced via the contextual information.”;
Examiner’s Note: BALIGAR provides several examples of a voice assistant transmitting requests to other systems (such as a content provider) to perform a task requested by the user, such as causing the system to output an audio book)
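For illustration only, the Fig. 4 flow relied upon above (decide at operation 408 whether more information is needed, request it at operation 410, and carry out the request at operation 412) can be sketched as follows. This is a minimal sketch under stated assumptions and is not BALIGAR's implementation: the slot names, the keyword parser, the prompt limit, and the function names `extract_slots`, `needs_more_information`, and `run_dialog` are all hypothetical, since the cited passages do not specify how sufficiency is evaluated.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical: the cited passages do not define what counts as "enough information".
REQUIRED_SLOTS = ("action", "item")
KNOWN_ITEMS = ("toaster", "diapers")
PROMPT = "Please provide more information so I can fulfill your request"


def extract_slots(utterances: List[str]) -> Dict[str, str]:
    """Toy keyword parser run over every utterance received so far."""
    slots: Dict[str, str] = {}
    for text in utterances:
        for word in text.lower().split():
            if word == "buy":
                slots["action"] = word
            elif word in KNOWN_ITEMS:
                slots["item"] = word
    return slots


def needs_more_information(slots: Dict[str, str]) -> bool:
    """Decision analogous to operation 408: is any required slot still missing?"""
    return any(name not in slots for name in REQUIRED_SLOTS)


def run_dialog(first_utterance: str,
               ask_user: Callable[[str], str],
               max_prompts: int = 3) -> Optional[str]:
    """Loop until the accumulated utterances are sufficient or prompts run out."""
    utterances = [first_utterance]
    prompts_sent = 0
    while True:
        slots = extract_slots(utterances)
        if not needs_more_information(slots):
            # Analogous to operation 412: carry out the supplemented request.
            return f"{slots['action']} {slots['item']}"
        if prompts_sent >= max_prompts:
            return None  # give up; this limit is an assumption of the sketch
        # Analogous to operation 410: request an additional utterance.
        utterances.append(ask_user(PROMPT))
        prompts_sent += 1
```

In this sketch, `run_dialog("buy this", lambda prompt: "toaster")` returns `"buy toaster"` after a single prompt, because the first utterance supplies only the action and the additional utterance supplies the item.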

However, BALIGAR fails to explicitly teach:
A vehicle comprising:
... via a first machine learning (ML) model ...
... controlled by a second ML model, different than the first ML model ...
... via the first ML model ...

	However, in a related field of endeavor (spoken language understanding techniques, see col. 2, lines 25-34), PANDEY teaches:
... via a first ML model ... (PANDEY, col. 4, lines 21-61: “Explicit feedback may be elicited for user interactions with the new skill. Additionally, over time, a user feedback prediction component may include a machine learning model (e.g., a feedback prediction machine learning model) that may be trained and/or updated based on user interactions with the new skill. Eventually, the user feedback prediction model may predict user feedback for the new skill without requesting explicit user feedback. ... In at least some examples, feedback may be requested and/or predicted based on a determination that a skill has received (and/or is associated with) less than a threshold amount of feedback data. In at least some further examples, an exploration policy may determine an amount of feedback data associated with a particular skill and may route request data to one or more skills based on a determination that the skill has not received enough feedback data (e.g., by comparing a current amount of feedback data to a threshold and/or by using a machine learning model to determine whether or not the skill needs additional feedback data in order to optimize ranking/routing to the skill).”)
... controlled by a second ML model, different than the first ML model ... (PANDEY, col. 3, lines 48-53: “The predicted user feedback data may be used to retrain the ranking component and/or other machine learning models of a speech processing system that are used to dynamically route speech processing requests (e.g., input utterances) to a particular skill for processing.”; 
PANDEY, col. 5, lines 24-26: “For example, the NLU system may use one or more machine learning models to determine a semantic interpretation of user request data.”)
... via the first ML model ... (PANDEY, col. 4, lines 21-61: “Explicit feedback may be elicited for user interactions with the new skill. Additionally, over time, a user feedback prediction component may include a machine learning model (e.g., a feedback prediction machine learning model) that may be trained and/or updated based on user interactions with the new skill. Eventually, the user feedback prediction model may predict user feedback for the new skill without requesting explicit user feedback. ... In at least some examples, feedback may be requested and/or predicted based on a determination that a skill has received (and/or is associated with) less than a threshold amount of feedback data. In at least some further examples, an exploration policy may determine an amount of feedback data associated with a particular skill and may route request data to one or more skills based on a determination that the skill has not received enough feedback data (e.g., by comparing a current amount of feedback data to a threshold and/or by using a machine learning model to determine whether or not the skill needs additional feedback data in order to optimize ranking/routing to the skill).”)

	The combination of BALIGAR and PANDEY makes obvious:
determine, via a first machine learning (ML) model, that the utterance does not contain sufficient information to determine an action of the system that causes performance of a task by the system (Examiner’s Note: PANDEY discloses using a machine learning model to determine if additional feedback data is required from a user in order to optimize ranking/routing to a particular skill as shown above; the BALIGAR-PANDEY combination now modifies the voice assistant system of BALIGAR to utilize the machine learning models of PANDEY to determine if sufficient information is available to respond to a user’s speech request as in BALIGAR)
in response to the utterance not containing the sufficient information, initiate, via a virtual assistant controlled by a second ML model, different than the first ML model, a dialog with the user (Examiner’s Note: PANDEY discloses that the machine learning model used for speech recognition is different than the machine learning model used to determine if additional feedback data is required from a user in order to optimize ranking/routing to a particular skill as shown above; the BALIGAR-PANDEY combination now modifies the voice assistant device 110 of BALIGAR to use a different machine learning model to initiate dialog with a user as in PANDEY)
determine, via the first ML model, that sufficient information is available to determine the action based on the additional utterance (Examiner’s Note: PANDEY discloses using a machine learning model to determine if additional feedback data is required from a user in order to optimize ranking/routing to a particular skill as explained above; the BALIGAR-PANDEY combination now modifies the voice assistant system of BALIGAR to utilize the machine learning models of PANDEY to determine that sufficient information is available to respond to a user’s speech request as in BALIGAR based on the additional user feedback solicited in BALIGAR)
determine, via the first ML model, the task based on the utterance and the additional utterance (Examiner’s Note: the BALIGAR-PANDEY combination now modifies the voice assistant system of BALIGAR to utilize the machine learning model of PANDEY with respect to providing a user with an audio response, such as an output of an audio book, as disclosed by BALIGAR)

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of BALIGAR with the teachings of PANDEY as explained above.  As disclosed by PANDEY, one of ordinary skill would have been motivated to do so because “using machine learning models to dynamically learn routing may improve the accuracy of the routing of speech processing requests, resulting in improved user experiences and/or more pertinent responses to user request data. For example, a machine learning system may dynamically learn from contextual data and/or user feedback data to provide routing exceptions and/or routing flexibility, in contrast to a deterministic routing system.”  (PANDEY, col. 5, lines 9-16). One of ordinary skill would further understand the benefit of using a machine learning model to assist an automated assistant device to better understand user requests, and that such machine learning models could understand correlations that a statistical or rules-based system cannot comprehend.

	However, BALIGAR and PANDEY fail to explicitly teach:
	A vehicle

However, in a related field of endeavor (“establishing hands-free communications between an electronic device and a user via a software assistant” see para. 0002), NOY teaches:
A vehicle (NOY, para. 0016: “Referring to FIG. 1, an example environment 100 in which the techniques outlined above can be implemented includes an electronic device 102 and a vehicle 104 with a head unit 106.”; the BALIGAR-PANDEY-NOY combination now implements the voice assistant of BALIGAR (as modified by PANDEY) into the vehicle navigation system of the vehicle of NOY)

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of BALIGAR with the teachings of PANDEY and NOY as explained above.  As disclosed by NOY, one of ordinary skill would have been motivated to do so in order to “communication navigation-related information with the user.” (para. 0019).  One of ordinary skill would further be motivated to do so in order to audibly issue directions to a user so that the driver can pay attention to the road and is not looking at a separate map.

Regarding Claim 2
	BALIGAR and PANDEY and NOY disclose the apparatus of claim 1 as explained above.  BALIGAR further teaches:
prompt the user for the additional utterance based on content included in a most recently received utterance.  (BALIGAR, col. 8, lines 60-67: “At 408, the voice assistant service 102 may determine whether to request more information to determine a context of an audio request. For example, when the voice assistant service 102 includes enough information to understand and respond to an audio request that is supplemented by context information derived from the context queue, then the voice assistant service 102 may not request additional information from the user”;
Examiner’s Note: As shown in Fig. 4, in the first iteration the request for information is based on the initial utterance (corresponding to recited “most recently received utterance”) and can request more information at step 408 for an additional audio input from the user)

Regarding Claim 3
	BALIGAR and PANDEY and NOY disclose the apparatus of claim 1 as explained above.  BALIGAR further teaches:
wherein the virtual assistant is embedded within the system (BALIGAR, col. 2, lines 1-21: “The voice assistant system may include any system and/or device that receives audio commands from a user, processes the audio, possibly using speech to text algorithms and/or natural language processing (NLP) algorithms, to determine text, returns a reply based on the text, converts the reply to an audio output using text to speech algorithms, and causes a speaker to output the audio output. Examples of voice assistant systems include Alexa® provided by Amazon.com® of Seattle, Wash., Siri® provided by Apple Corp.® of Cupertino, Calif., and Cortana® provided by Microsoft Corp.® of Redmond, Wash. The voice assistant system may include a user device that typically includes at least a network interface, a microphone, and a speaker. The user device may be a smart phone, a dedicated device, and/or other devices controlled by users and located proximate to the users. The voice assistant system may include a service engine, which may be stored in a remote location (e.g., via remote computing devices such as in a cloud computing configuration, etc.), stored in a local device (e.g., a smartphone, a dedicated voice assistant device, etc.) and/or a combination of both.”)

However, BALIGAR and PANDEY fail to explicitly teach:
 the user is a passenger within vehicle

However, in a related field of endeavor (“establishing hands-free communications between an electronic device and a user via a software assistant” see para. 0002), NOY teaches:
the user is a passenger within vehicle (NOY, para. 0031: “Accordingly, the navigation application 160 may generate a trigger event to establish a communication session with a user, who may be a driver or passenger of the vehicle 104”; 
Examiner’s Note: the BALIGAR-PANDEY-NOY combination now implements the voice assistant of BALIGAR (as modified by PANDEY) into the vehicle navigation system of NOY, such that the user of the system is a passenger)

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of BALIGAR with the teachings of PANDEY and NOY as explained above.  As disclosed by NOY, one of ordinary skill would have been motivated to do so in order to “communication navigation-related information with the user.” (para. 0019).  One of ordinary skill would further be motivated to do so in order to audibly issue directions to a user so that the driver can pay attention to the road and is not looking at a separate map.

Regarding Claim 4
	BALIGAR and PANDEY and NOY disclose the apparatus of claim 1 as explained above.  BALIGAR further teaches:
request the user to confirm content included in a previously received utterance.  (BALIGAR, col. 4, lines 16-21: “As discussed herein, the voice assistant service 102 may be configured to engage in a dialog to receive an order of one or more items from the user, including facilitating selection and confirmation of items, and cause those items to be fulfilled and delivered to the use, or for other tasks or fulfillment of audible requests.”;
BALIGAR, col. 11, lines 14-16: “A confirmation page or reply (possibly via audio) may confirm completion of this action.”;
Examiner’s Note: BALIGAR discloses asking the user to confirm ordered items via dialog)

Regarding Claim 5
	BALIGAR and PANDEY and NOY disclose the apparatus of claim 1 as explained above.  BALIGAR further teaches:
wherein the processor is configured to predict an intent of the user after each utterance based on an aggregation of utterances during a conversation with the user, and (BALIGAR, col. 9, lines 19-23: “At 412, the voice assistant service 102 may implement supplemented audio request. For example, the voice assistant service 102 may provide an audio response to the user via the voice assistant device or voice assistant application.”;	
Examiner’s Note: for example, the voice assistant can predict that the user has asked a question that needs to be responded to (e.g., the “intent” is the request for information), and as shown by Fig. 4, this can be after a number of iterations for additional information, where the cumulative user speech inputs correspond to the recited “aggregation of utterances during a conversation”)
when the processor determines that the sufficient information is available, the processor is configured to determine that sufficient information is available based on the aggregation of utterances.  (BALIGAR, col. 8, lines 60-67: “At 408, the voice assistant service 102 may determine whether to request more information to determine a context of an audio request. For example, when the voice assistant service 102 includes enough information to understand and respond to an audio request that is supplemented by context information derived from the context queue, then the voice assistant service 102 may not request additional information from the user”;
Examiner’s Note: As shown in Fig. 4, after step 410 (request for more information via audio in instances where not enough information is available), the flow returns to step 406 and then step 408 to analyze whether the additional information provided by the user is sufficient to understand and respond to the user’s request, where the cumulative user speech inputs resulting from such repeated iterations correspond to the recited “aggregation of utterances”)

Regarding Claim 7
	BALIGAR and PANDEY and NOY disclose the apparatus of claim 1 as explained above.  BALIGAR further teaches:
predict, ... a next action to be taken by the virtual assistant based on the utterance and a previously received utterance associated with a same task.  (BALIGAR, col. 9, lines 19-23: “At 412, the voice assistant service 102 may implement supplemented audio request. For example, the voice assistant service 102 may provide an audio response to the user via the voice assistant device or voice assistant application.”;	
Examiner’s Note: for example, the voice assistant can predict that the user has asked a question that needs to be responded to and then predict the response, and as shown by Fig. 4, this can be after a number of iterations for additional information, where the cumulative user speech inputs correspond to the recited “previously received utterance associated with a same task”)

However, BALIGAR fails to explicitly teach:
via the second ML model, ... 

However, in a related field of endeavor (spoken language understanding techniques, see col. 2, lines 25-34), PANDEY teaches:
via the machine learning model, ... (PANDEY, col. 3, lines 48-53: “The predicted user feedback data may be used to retrain the ranking component and/or other machine learning models of a speech processing system that are used to dynamically route speech processing requests (e.g., input utterances) to a particular skill for processing.”; 
PANDEY, col. 5, lines 24-26: “For example, the NLU system may use one or more machine learning models to determine a semantic interpretation of user request data.”)

The combination of BALIGAR and PANDEY makes obvious:
predict, via the second ML model, a next action to be taken by the virtual assistant based on the utterance and a previously received utterance associated with a same task.  (PANDEY, col. 5, lines 24-26: “In various further examples, SLU may include TTS where a machine learning model may receive input audio data (e.g., a user utterance) and may generate output audio data in response to the utterance”;
Examiner’s Note: PANDEY discloses using a machine learning model to determine if additional feedback data is required from a user in order to optimize ranking/routing to a particular skill; the BALIGAR-PANDEY-NOY combination now modifies the voice assistant of BALIGAR to utilize the other machine learning model of PANDEY for spoken language understanding as in PANDEY in order to predict a responsive utterance to a user input speech request)

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of BALIGAR with the teachings of PANDEY and NOY as explained above.  As disclosed by PANDEY, one of ordinary skill would have been motivated to do so because “using machine learning models to dynamically learn routing may improve the accuracy of the routing of speech processing requests, resulting in improved user experiences and/or more pertinent responses to user request data. For example, a machine learning system may dynamically learn from contextual data and/or user feedback data to provide routing exceptions and/or routing flexibility, in contrast to a deterministic routing system.”  (PANDEY, col. 5, lines 9-16).  One of ordinary skill would further understand the benefit of using a machine learning model to assist an automated assistant device to better understand user requests, and that such machine learning models could understand correlations that a statistical or rules-based system cannot comprehend.

	Claim 9 recites a method that corresponds to the vehicle of claim 1 and is therefore rejected for the same reasons explained above with respect to claim 1.
Claim 10 depends from claim 9 and claims a method that corresponds to the apparatus of claim 2, and is therefore rejected for the same reasons explained above with respect to claims 2 and 9.
Claim 11 depends from claim 9 and claims a method that corresponds to the apparatus of claim 3, and is therefore rejected for the same reasons explained above with respect to claims 3 and 9.
Claim 12 depends from claim 9 and claims a method that corresponds to the apparatus of claim 4, and is therefore rejected for the same reasons explained above with respect to claims 4 and 9.
Claim 13 depends from claim 9 and claims a method that corresponds to the apparatus of claim 5, and is therefore rejected for the same reasons explained above with respect to claims 5 and 9.
Claim 15 depends from claim 9 and claims a method that corresponds to the apparatus of claim 7, and is therefore rejected for the same reasons explained above with respect to claims 7 and 9.
Claim 17 recites a computer-readable storage medium that corresponds to the vehicle of claim 1 and is therefore rejected for the same reasons explained above with respect to claim 1.
Claim 18 depends from claim 17 and claims a computer-readable storage medium that corresponds to the apparatus of claim 2, and is therefore rejected for the same reasons explained above with respect to claims 2 and 17.
Claim 19 depends from claim 17 and claims a computer-readable storage medium that corresponds to the apparatus of claim 3, and is therefore rejected for the same reasons explained above with respect to claims 3 and 17.
Claim 20 depends from claim 17 and claims a computer-readable storage medium that corresponds to the apparatus of claim 4, and is therefore rejected for the same reasons explained above with respect to claims 4 and 17.

Claims 6 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over BALIGAR in view of PANDEY and NOY and further in view of US 20170132199 A1, hereinafter referenced as VESCOVI.

Regarding Claim 6
	BALIGAR and PANDEY and NOY disclose the apparatus of claim 1 as explained above.  However, BALIGAR and PANDEY and NOY fail to explicitly teach:
wherein the processor is configured to determine, via the first ML model, that a sequence of utterances is required to make a prediction based on the utterance.  

	However, in a related field of endeavor (virtual assistants, see para. 0003), VESCOVI teaches:
wherein the processor is configured to determine, via the first ML model, that a sequence of utterances is required to make a prediction based on the utterance.  (VESCOVI, para. 0277: “As illustrated in the example of FIG. 8A, the user requests that the digital assistant let a visitor (in this example, Tomas) into his apartment when that visitor arrives. The digital assistant determines whether the user request corresponds to at least one of a plurality of plan templates 802, as described below in greater detail relative to FIGS. 9A-9F. A plan template 802 includes a set of instructions 804 and corresponding inputs/outputs 806. As illustrated in the example of FIG. 8B, a generic plan template 802 includes a set of ordered instructions 804, beginning with one or more instructions 804 to gather information.”;
VESCOVI, para. 0283: “Referring to FIGS. 8E-8H, the inputs 806 associated with the “time expected” and “date expected” instructions in this particular example are not optional. Because those inputs 806 are not optional, the digital assistant has insufficient information to generate a plan with this plan template 802 if no input 806 is received in association with either of the “time expected” or “date expected” instructions 804. “Sufficient information” is the minimum information with which the digital assistant can generate a plan. Because the digital assistant cannot generate a plan based on the plan template 802 if it does not receive inputs associated with both the “time expected” and “date expected” instructions 804, the digital assistant initiates communication with the user to request sufficient information to generate a plan based on the plan template. As shown in FIG. 8E, the digital assistant requests 810 from the user: “on what date is he coming?” As shown in FIG. 8F, the user replies 812 with “tonight.” The digital assistant recognizes that the word “tonight” is associated with the same date on which the user spoke the reply 812, and as a result obtains today's date from the calendar module 248 or other suitable source. The time at which the visitor is to arrive is still required, so as shown in FIG. 8G, the digital assistant requests 814 “what time is he coming?” As shown in FIG. 8H, the user replies with “about 8:00 p.m.” Having received information associated with both the “time expected” and “date expected” instructions 804, the digital assistant now has sufficient information to generate a plan based on the plan template 802.”;
Examiner’s Note: VESCOVI teaches a virtual assistant that uses plan templates 802, including plan templates with multiple required inputs, in order to perform a task. The multiple required inputs can be gathered through a sequence of utterances (see Fig. 8H, requiring input utterances 812 and 816). The BALIGAR-PANDEY-NOY-VESCOVI combination modifies the voice assistant of BALIGAR (as modified by the machine learning models of PANDEY) to use the plan templates of VESCOVI to solicit multiple input utterances from users in order to carry out a prediction (e.g., answering a query as in BALIGAR).
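For illustration only, the required-input behavior quoted from VESCOVI para. 0283 can be sketched as a small slot-filling loop. This is a minimal sketch, not VESCOVI's implementation and not the claimed ML model: the `PlanTemplate` class, the `gather` function, and the dictionary layout are hypothetical names introduced here. Only the two required inputs, the questions, and the replies come from the quoted passage.

```python
from dataclasses import dataclass, field


@dataclass
class PlanTemplate:
    # Hypothetical structure: maps each required input to the question
    # the assistant asks when that input is missing.
    required_inputs: dict
    values: dict = field(default_factory=dict)

    def missing(self):
        """Return the required inputs that have not been supplied yet."""
        return [name for name in self.required_inputs if name not in self.values]

    def has_sufficient_information(self):
        """True once every required input has a value."""
        return not self.missing()


def gather(template, replies):
    """Ask one question per missing input, consuming one user reply each."""
    transcript = []
    pending = iter(replies)
    for name in template.missing():
        reply = next(pending, None)
        if reply is None:
            break  # no further reply; the plan stays incomplete
        template.values[name] = reply
        transcript.append((template.required_inputs[name], reply))
    return transcript


# Questions and replies taken from the quoted FIGS. 8E-8H example.
template = PlanTemplate({
    "date expected": "on what date is he coming?",
    "time expected": "what time is he coming?",
})
transcript = gather(template, ["tonight", "about 8:00 p.m."])
print(template.has_sufficient_information())
```

The sketch shows only why a single utterance is insufficient when two inputs are mandatory: the template reports sufficient information only after both replies have been received.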

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of BALIGAR with the teachings of PANDEY, NOY, and VESCOVI as explained above.  As disclosed by VESCOVI, one of ordinary skill would have been motivated to do so because such plan templates of VESCOVI enable “more complex actions, and actions that rely upon contingent inputs.” (para. 0302).

Claim 14 depends from claim 9 and claims a method that corresponds to the apparatus of claim 6, and is therefore rejected for the same reasons explained above with respect to claims 6 and 9.

Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over BALIGAR in view of PANDEY and NOY and further in view of US 20230353516 A1, hereinafter referenced as BANERJEE.

Regarding Claim 8
	BALIGAR and PANDEY and NOY disclose the apparatus of claim 1 as explained above.  However, BALIGAR and PANDEY and NOY fail to explicitly teach:
identify, via the second ML model, a word of interest within the additional utterance and an additional question for the virtual assistant to ask the user based on the word of interest.

	However, in a related field of endeavor (AI voice systems for interacting with users, see para. 0001), BANERJEE teaches:
identify, via the second ML model, a word of interest within the additional utterance and an additional question for the virtual assistant to ask the user based on the word of interest. (BANERJEE, para. 0092: “In either case, the machine learning algorithm determines the next question or action based on the user response. In the case of a first question in the form of a menu selection, the next question or action from the system is likely predetermined based on the selection. In the case of a first question asking for a free-form spoken or text input by the user, the system may analyze the response for keywords or keyword patterns to determine the next question or action.”);
Examiner’s Note: BANERJEE teaches using a machine learning algorithm to analyze a user speech input for keywords or keyword patterns (corresponding to recited “word of interest within an additional utterance”) and using such keywords to determine a next question or action (corresponding to recited “additional question for the virtual assistant to ask the user based on the identified word of interest”). The BALIGAR-PANDEY-NOY-BANERJEE combination modifies the voice assistant of BALIGAR (implemented using the “other” ML model of PANDEY) to use the machine learning algorithms of BANERJEE to analyze user speech for keywords and then to determine the next question for the voice assistant of BALIGAR to ask based on such identified keyword of BANERJEE.
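For illustration only, the keyword-to-next-question mapping described in BANERJEE para. 0092 can be sketched with a simple lookup. This is a rule-based stand-in, not a machine learning algorithm and not BANERJEE's implementation: the keywords, the questions, and the function names `word_of_interest` and `next_question` are hypothetical and appear in none of the cited references.

```python
import re

# Hypothetical keyword table; neither the keywords nor the questions
# come from BANERJEE or from the application.
NEXT_QUESTION = {
    "appointment": "What day works best for you?",
    "billing": "Which invoice are you asking about?",
}


def word_of_interest(utterance):
    """Return the first known keyword found in the utterance, or None."""
    for token in re.findall(r"[a-z']+", utterance.lower()):
        if token in NEXT_QUESTION:
            return token
    return None


def next_question(utterance):
    """Pick the follow-up question based on the identified keyword."""
    word = word_of_interest(utterance)
    return NEXT_QUESTION[word] if word is not None else None


print(next_question("I have a billing problem"))
```

A trained model would replace the table lookup. The sketch only makes concrete the two-step mapping the Examiner's Note relies on: identify a word in the utterance, then select the question from it.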

Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of BALIGAR with the teachings of PANDEY, NOY, and BANERJEE as explained above.  As disclosed by BANERJEE, one of ordinary skill would have been motivated to do so in order to use an “AI algorithm [that] adaptively guides the dialog to achieve the most favorable outcome based on the current status of the dialog.” (para. 0005).

Claim 16 depends from claim 9 and claims a method that corresponds to the apparatus of claim 8, and is therefore rejected for the same reasons explained above with respect to claims 8 and 9.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
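For illustration only, the date arithmetic in the preceding paragraph can be sketched as follows. This is a minimal sketch under stated assumptions, not legal advice: the function names are hypothetical, "within TWO MONTHS" is read as on or before the two-month date, and weekend or holiday adjustments and extension fees are not modeled. The mailing date is the May 06, 2026 date of this action; the reply and advisory action dates are invented for the example.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Add calendar months, clamping to the last day of the target month."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_deadlines(mailed, first_reply=None, advisory_mailed=None):
    """Return the end of the shortened period and the six-month limit."""
    two = add_months(mailed, 2)
    three = add_months(mailed, 3)
    six = add_months(mailed, 6)
    shortened_end = three
    # Early first reply plus a late advisory action moves the end of the
    # shortened period to the advisory action date, capped at six months.
    if (first_reply is not None and advisory_mailed is not None
            and first_reply <= two and advisory_mailed > three):
        shortened_end = min(advisory_mailed, six)
    return {"shortened_period_ends": shortened_end, "absolute_limit": six}


mailed = date(2026, 5, 6)
base = reply_deadlines(mailed)
# Hypothetical scenario: reply filed Jun 30, advisory action mailed Aug 20.
shifted = reply_deadlines(mailed, date(2026, 6, 30), date(2026, 8, 20))
print(base, shifted)
```

Any actual deadline should be confirmed against the mailing date on the official record rather than computed from this sketch.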
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20210064624 A1 (Carbune).  “The one or more client devices 106 may include, for example, one or more of: a desktop computing device, a laptop computing device, a tablet computing device, a mobile phone computing device, a computing device of a vehicle of the user (e.g., an in-vehicle communications system, an in-vehicle entertainment system, an in-vehicle navigation system).” (para. 0031).  “In some implementations, multiple machine learning models may be employed to predict the probability P described above.” (para. 0074).
US 20230155707 A1 (Adyanthaya).  “The MapView system may store the machine-learning model locally or remotely (in a remotely accessible database, server, etc.). Since the machine-learning model is trained using data associated with a particular user, the MapView system may store multiple machine-learning models (locally or remotely) for each user configured to use the MapView system.” (para. 0043).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C LEE whose telephone number is (571)272-4933. The examiner can normally be reached M-F 12:00 pm - 8:00 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached at 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MICHAEL C. LEE/
Examiner, Art Unit 2128
Prosecution Timeline

Jan 06, 2023: Application Filed
Jan 08, 2026: Non-Final Rejection mailed, §101, §103
Mar 05, 2026: Response Filed
May 06, 2026: Final Rejection mailed, §101, §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/475,724; Patent 12603081; METHOD AND SERVER FOR A TEXT-TO-SPEECH PROCESSING; 4y 7m to grant; Granted Apr 14, 2026
17/732,871; Patent 12602605; QUANTUM COMPUTER ARCHITECTURE BASED ON MULTI-QUBIT GATES; 3y 11m to grant; Granted Apr 14, 2026
17/207,554; Patent 12591915; METHODS AND SYSTEMS FOR DETERMINING RECOMMENDATIONS BASED ON REAL-TIME OPTIMIZATION OF MACHINE LEARNING MODELS; 5y 0m to grant; Granted Mar 31, 2026
18/885,396; Patent 12585743; INTERFACE ACCESS PROCESSING METHOD, COMPUTER DEVICE AND STORAGE MEDIUM; 1y 6m to grant; Granted Mar 24, 2026
17/486,877; Patent 12568935; AI-BASED LIVESTOCK MANAGEMENT SYSTEM AND LIVESTOCK MANAGEMENT METHOD THEREOF; 4y 5m to grant; Granted Mar 10, 2026
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 61%
With Interview (+26.0%): 87%
Median Time to Grant: 3y 3m (~0m remaining)
PTA Risk: Moderate
Based on 144 resolved cases by this examiner. Grant probability derived from career allowance rate.