Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This Office action is in response to Application No. 18/425,795, filed on 01/29/2024. Claims 1-20 are pending in the application and have been considered.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-3, 10-13, and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim recites “…upon receiving, from a computer network, a conversational input from a user device for a user, determining a context based on one or more contextual units, wherein the one or more contextual units are associated with immediate prior one or more conversational inputs relative to the conversational input; determining an intent associated with the conversational input based on the context; determining one or more entities associated with the conversational input based on the context and one or more expected entities determined based on one or more predefined conversation flows; determining an output based on the intent and the one or more entities; and transmitting, via the computer network, the output to be displayed on the user device”.
The limitation of upon receiving, from a computer network, a conversational input from a user device for a user, determining a context based on one or more contextual units, wherein the one or more contextual units are associated with immediate prior one or more conversational inputs relative to the conversational input, as drafted, is a process that, under its broadest reasonable interpretation, but for “from a computer network” and “from a user device”, covers performance of the limitation in the mind. For example, “upon receiving, …, a conversational input … for a user, determining a context based on one or more contextual units, wherein the one or more contextual units are associated with immediate prior one or more conversational inputs relative to the conversational input” in the context of this claim encompasses mentally listening to a conversation and recalling the speaker's immediately prior utterance.
Similarly, the limitation of “determining an intent associated with the conversational input based on the context”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “determining an intent associated with the conversational input based on the context” in the context of this claim encompasses mentally determining an intent of the conversation utterance.
Similarly, the limitation of “determining one or more entities associated with the conversational input based on the context and one or more expected entities determined based on one or more predefined conversation flows”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “determining one or more entities associated with the conversational input based on the context and one or more expected entities determined based on one or more predefined conversation flows” in the context of this claim encompasses mentally determining a mentioned entity from the conversation based on a set of expected entities associated with a conversation flow.
Similarly, the limitation of “determining an output based on the intent and the one or more entities”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “determining an output based on the intent and the one or more entities” in the context of this claim encompasses mentally determining an output to write on a sheet of paper based on the intent and the one or more entities.
Finally, the limitation of “transmitting, via the computer network, the output to be displayed on the user device”, as drafted, but for “transmitting, via the computer network” and “the user device”, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “…, the output to be displayed …” in the context of this claim encompasses writing down the output on a sheet of paper for display.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

This judicial exception is not integrated into a practical application. In particular, the claim only recites four additional elements – “one or more processors”, “one or more non-transitory computer-readable media storing computing instructions”, “a computer network”, and “a user device”. These computing elements are recited at a high level of generality (i.e., as a generic one or more processors, a generic one or more non-transitory computer-readable media storing computing instructions, a generic computer network, and a generic user device) such that they amount to no more than mere instructions to apply the exception using generic computer elements. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a computing device to perform the determining amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Specifically with respect to Step 2A, Prong Two, of the Alice/Mayo test, the judicial exception is not integrated into a practical application. Claim 1 does not recite any limitations that are not mental steps.
Specifically with respect to Step 2B of the Alice/Mayo test, “the claim as a whole does not amount to significantly more than the exception itself (there is no inventive concept in the claim)”. MPEP 2106.05 II. There are no limitations in claim 1 outside of the judicial exception. Viewed as a whole, the claim does not appear to contain any inventive concept. As discussed above, claim 1 is a mental process that pertains to determining a response to a conversation, which can be performed entirely by a human with physical aids.
Dependent claims 2, 3, and 10 depend from claim 1, do not remedy any of the deficiencies of claim 1, and therefore are rejected on the same grounds as claim 1 above.
Generally, claims 2, 3, and 10 merely recite additional steps and details for determining a response to a conversation, all of which could be performed mentally or by writing down relationships with a pen and paper, and do not amount to anything more than substantially the same abstract idea as explained with respect to claim 1.
Specifically:
Claim 2 recites “wherein: each of the one or more contextual units comprises: a respective context conversational input for each of the immediate prior one or more conversational inputs; a respective context intent vector for a respective context intent associated with the respective context conversational input; and a respective context entities vector for one or more respective context entities associated with the respective context conversational input” which could be performed by writing down a respective context conversational input for each of the immediate prior one or more conversational inputs; a respective context intent vector for a respective context intent associated with the respective context conversational input; and a respective context entities vector for one or more respective context entities associated with the respective context conversational input on a sheet of paper.
Claim 3 recites “wherein: the respective context intent vector is encoded based on the respective context intent and predefined intent vector values; and the respective context entities vector is encoded based on the one or more respective context entities and predefined entity tags”, which could be performed by writing down on a sheet of paper the respective context intent vector encoded based on the respective context intent and predefined intent vector values, and the respective context entities vector encoded based on the one or more respective context entities and predefined entity tags.
Claim 10 recites “wherein: the immediate prior one or more conversational inputs and the conversational input occur in a time session of a conversation” which could be performed by conducting a spoken conversation as a time session.
In sum, claims 2, 3, and 10 depend from claim 1 and further recite mental processes as explained above. None of the additional limitations recited in claims 2, 3, and 10 amount to anything more than the same or a similar abstract idea as recited in claim 1. Nor do any limitations in claims 2, 3, and 10 (a) integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea or (b) amount to significantly more than the judicial exception. Claims 2, 3, and 10 are not patent eligible.
Claim 11 is directed to a method that corresponds to the system of claim 1 and is therefore rejected for the same reasons set forth above with respect to claim 1. While claim 11 recites generic computer components (computing instructions, one or more processors, one or more non-transitory computer-readable media), such generic computing components are recited at a high level of generality (i.e., as a generic processor performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Claim 11 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional limitations of using generic computer components amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. Claim 11 is not patent eligible.
Claims 12, 13, and 20 depend from claim 11, do not remedy any of the deficiencies of claim 11, and correspond to the subject matter of claims 2, 3, and 10 discussed above. These claims are therefore rejected on the same grounds as claims 2, 3, 10, and 11 above.
Eligible Claims
Claim 4 recites “wherein: determining the context further comprises: generating, by an embedding layer, a respective context token vector for each of the one or more contextual units based on the respective context conversational input of the each of the one or more contextual units; generating, by a feedforward layer, a respective consolidated vector for each of the one or more contextual units based on the respective context token vector, the respective context intent vector, and the respective context entities vector for the each of the one or more contextual units; and concatenating, by an attention layer, the respective consolidated vector for each of the one or more contextual units into a single multi-dimensional context vector”, which cannot be practically performed as a mental process.
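For illustration only, the following is a minimal hypothetical PyTorch sketch of an architecture of the kind claim 4 recites; the layer sizes, pooling choice, and wiring are assumptions, not the applicant's disclosed implementation:

    import torch
    import torch.nn as nn

    class ContextEncoder(nn.Module):
        # Hypothetical: embedding layer -> feedforward layer -> attention layer,
        # yielding a single multi-dimensional context vector, as in claim 4.
        def __init__(self, vocab_size=30000, dim=128, n_intents=16, n_tags=32):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, dim)
            self.feedforward = nn.Linear(dim + n_intents + n_tags, dim)
            self.attention = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

        def forward(self, token_ids, intent_vecs, entity_vecs):
            # One context token vector per contextual unit (mean-pooled here).
            token_vecs = self.embedding(token_ids).mean(dim=1)            # (units, dim)
            # One consolidated vector per unit from its token, intent, and
            # entities vectors.
            consolidated = self.feedforward(
                torch.cat([token_vecs, intent_vecs, entity_vecs], dim=-1)
            ).unsqueeze(0)                                                # (1, units, dim)
            # Attention over the units, then flatten the per-unit outputs
            # into one multi-dimensional context vector.
            attended, _ = self.attention(consolidated, consolidated, consolidated)
            return attended.flatten(start_dim=1)                          # (1, units * dim)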
Claim 5 depends on and includes the eligible subject matter of intervening claim 4.
Claim 6 recites “wherein: determining the intent associated with the conversational input based on the context further comprises: generating, by an embedding layer, a token vector for the conversational input; and determining, by an intent classification layer, the intent based on the token vector and a single multi-dimensional context vector for the context”, which cannot be practically performed as a mental process.
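Again for illustration only, a hypothetical continuation of the sketch above showing an intent classification layer of the kind claim 6 recites; all dimensions are assumptions:

    import torch
    import torch.nn as nn

    class IntentClassifier(nn.Module):
        # Hypothetical intent classification layer over the utterance token
        # vector and the single multi-dimensional context vector, per claim 6.
        def __init__(self, dim=128, context_dim=512, n_intents=16):
            super().__init__()
            self.feedforward = nn.Linear(dim + context_dim, n_intents)

        def forward(self, token_vec, context_vec):
            logits = self.feedforward(torch.cat([token_vec, context_vec], dim=-1))
            return torch.softmax(logits, dim=-1)   # probability over intents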
Claim 7 depends on and includes the eligible subject matter of intervening claim 6.
Claim 8 recites “wherein: determining the one or more entities associated with the conversational input further comprises: generating, by an embedding layer, a token vector for the conversational input; concatenating the token vector, a single multi-dimensional context vector for the context, and an expected entities vector for the one or more expected entities into a consolidated entity vector; and determining, by an entity recognizing layer, a respective entity tag for each of the one or more entities based on the consolidated entity vector”, which cannot be practically performed as a mental process.
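Again for illustration only, a hypothetical sketch of a consolidated entity vector and entity recognizing layer of the kind claim 8 recites; dimensions and tag set are assumptions:

    import torch
    import torch.nn as nn

    class EntityRecognizer(nn.Module):
        # Hypothetical entity recognizing layer per claim 8: concatenate the
        # token vector, context vector, and expected-entities vector, then tag.
        def __init__(self, dim=128, context_dim=512, n_expected=32, n_tags=32):
            super().__init__()
            self.tagger = nn.Linear(dim + context_dim + n_expected, n_tags)

        def forward(self, token_vec, context_vec, expected_vec):
            # The consolidated entity vector recited in the claim.
            consolidated = torch.cat([token_vec, context_vec, expected_vec], dim=-1)
            return torch.softmax(self.tagger(consolidated), dim=-1)  # tag distribution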
Claim 9 depends on and includes the eligible subject matter of intervening claim 8.
Claims 14-19 recite limitations similar to those in claims 4-9 discussed above, and are eligible for similar reasons.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mathias et al. (US 20210142794) in view of Suwandy et al. (US 20210327413).
Consider claim 1, Mathias discloses a system comprising: one or more processors (devices and servers having processors, [0128]); and
one or more non-transitory computer-readable media storing computing instructions configured to, when run on the one or more processors (non-transitory computer readable storage medium with instructions executed by a processor, [0133]), cause the one or more processors to perform:
upon receiving, from a computer network, a conversational input from a user device for a user, determining a context based on one or more contextual units, wherein the one or more contextual units are associated with immediate prior one or more conversational inputs relative to the conversational input (system receives utterance 524 “any Mexican restaurants there”, [0116], over network 199 from smart device, [0130], during conversation between user and system, [0112], and determines user utterance history, including previous utterance “what is the weather in san Francisco?”, [0113]-[0115]);
determining an intent associated with the conversational input based on the context (determining that the user wishes to perform a Local Search for San Francisco based on the previous utterance from the history, i.e. context, [0116]);
determining one or more entities associated with the conversational input based on the context and one or more expected entities determined based on one or more predefined conversation flows (scoring entities for slots that need to be filled for the current intent, e.g. <placetype> and <city>, a predefined conversation flow, [0116]);
determining an output based on the intent and the one or more entities (“La Taqueria is a mile away”, [0117], [0116]); and
transmitting, via the computer network, the output to the user device (results are transmitted from server over network to devices for output, [0114], [0130], [0117]).
Mathias does not specifically mention output to be displayed on the user device.
Suwandy discloses output to be displayed on the user device (chat window 608 displays output from conversational bot on web browser 604 of computing device 602, [0071], Fig. 6).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Mathias by including output to be displayed on the user device in order to address the increasing need for conversational bots or assistants to handle requests and commands, predictably resulting in helping entities assist their customers with goods and services, as suggested by Suwandy ([0001]). The references cited are analogous art in the same field of natural language understanding.
Consider claim 11, Mathias discloses a method being implemented via execution of computing instructions configured to run at one or more processors and stored at one or more non-transitory computer-readable media (non-transitory computer readable storage medium with instructions to perform a method executed by a processor, [0133]), the method comprising:
upon receiving, from a computer network, a conversational input from a user device for a user, determining a context based on one or more contextual units, wherein the one or more contextual units are associated with immediate prior one or more conversational inputs relative to the conversational input (system receives utterance 524 “any Mexican restaurants there”, [0116], over network 199 from smart device, [0130], during conversation between user and system, [0112], and determines user utterance history, including previous utterance “what is the weather in san Francisco?”, [0113]-[0115]);
determining an intent associated with the conversational input based on the context (determining that the user wishes to perform a Local Search for San Francisco based on the previous utterance from the history, i.e. context, [0116]);
determining one or more entities associated with the conversational input based on the context and one or more expected entities determined based on one or more predefined conversation flows (scoring entities for slots that need to be filled for the current intent, e.g. <placetype> and <city>, a predefined conversation flow, [0116]);
determining an output based on the intent and the one or more entities (“La Taqueria is a mile away”, [0117], [0116]); and
transmitting, via the computer network, the output to the user device (results are transmitted from server over network to devices for output, [0114], [0130], [0117]).
Mathias does not specifically mention output to be displayed on the user device.
Suwandy discloses output to be displayed on the user device (chat window 608 displays output from conversational bot on web browser 604 of computing device 602, [0071], Fig. 6).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Mathias by including output to be displayed on the user device for reasons similar to those for claim 1.
Consider claim 2, Mathias discloses: each of the one or more contextual units comprises: a respective context conversational input for each of the immediate prior one or more conversational inputs (e.g. previous utterance “what is the weather in san Francisco?”, which generates <getweather> intent which requires a <location> slot, [0113]-[0115]); a respective context intent vector for a respective context intent associated with the respective context conversational input (e.g. <getweather> and <location>, the pair considered a context intent vector, [0113]; these are embedded in vector form by Dialog Tracker 590, Fig 5); and a respective context entities vector for one or more respective context entities associated with the respective context conversational input (entity-key pair embeddings, [0013], Fig. 5, element 590).
Consider claim 3, Mathias discloses: the respective context intent vector is encoded based on the respective context intent and predefined intent vector values (e.g. the slots belonging to <getweather> and <localsearch>, [0113]-[0115], Fig 5); and the respective context entities vector is encoded based on the one or more respective context entities and predefined entity tags (e.g. the entity candidates to fill the <placetype>, <city> slots, etc., [0113]-[0115], Fig 5).
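For illustration only, a minimal hypothetical sketch (not drawn from the application or from Mathias; all intent names, tag names, and values are invented) of how a context intent vector and a context entities vector could be encoded against predefined values of this kind:

    # Hypothetical encoding of a context intent vector and a context entities
    # vector against predefined values, per the language of claims 2-3.
    INTENTS = ["get_weather", "local_search", "book_table"]   # predefined intent vector values (invented)
    ENTITY_TAGS = ["city", "place_type", "date"]              # predefined entity tags (invented)

    def encode_intent(intent):
        # One-hot encode the context intent against the predefined intents.
        return [1.0 if intent == name else 0.0 for name in INTENTS]

    def encode_entities(entities):
        # Multi-hot encode the context entities against the predefined tags.
        return [1.0 if tag in entities else 0.0 for tag in ENTITY_TAGS]

    # Example for a prior turn "what is the weather in San Francisco?":
    context_intent_vector = encode_intent("get_weather")   # [1.0, 0.0, 0.0]
    context_entities_vector = encode_entities(["city"])    # [1.0, 0.0, 0.0]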
Consider claim 4, Mathias discloses: determining the context further comprises: generating, by an embedding layer, a respective context token vector for each of the one or more contextual units based on the respective context conversational input of the each of the one or more contextual units (Encoder 550 generates context embeddings, Fig 5, [0113]-[0115]); generating, by a feedforward layer, a respective consolidated vector for each of the one or more contextual units based on the respective context token vector, the respective context intent vector, and the respective context entities vector for the each of the one or more contextual units (e.g. <getweather> and <location>, the pair considered a context intent vector, [0113]; these are embedded in vector form by Dialog Tracker 590, Fig 5, for each utterance turn); and concatenating, by an attention layer, the respective consolidated vector for each of the one or more contextual units into a single multi-dimensional context vector (word attention vectors concatenated into per-stream context vector computed by attention models, [0104], Fig 5).
Consider claim 5, Mathias does not, but Suwandy discloses wherein one or more of:
the embedding layer comprises a pre-trained BERT model (embeddings encoded via BERT, a well-known pre-trained model, [0026]); or the respective context token vector for each of the one or more contextual units further comprises one or more CLS tokens (noting that this limitation is not required by the claim language, which only requires “one or more of”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Mathias such that the embedding layer comprises a pre-trained BERT model for reasons similar to those for claim 1.
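For illustration only, a hypothetical example (not code from either reference) of an embedding layer comprising a pre-trained BERT model whose [CLS] position supplies a sentence-level token vector, using the Hugging Face transformers library:

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("any Mexican restaurants there", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Position 0 is the [CLS] token; its hidden state is commonly used as a
    # sentence-level token vector.
    cls_vector = outputs.last_hidden_state[:, 0, :]   # shape (1, 768)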
Consider claim 6, Mathias discloses: determining the intent associated with the conversational input based on the context further comprises: generating, by an embedding layer, a token vector for the conversational input (e.g. user utterance embeddings, [0104], Fig 5); and determining, by an intent classification layer, the intent based on the token vector and a single multi-dimensional context vector for the context (decoder 570 determines a score that corresponds to whether a particular candidate value pair corresponds to slot needed to execute the intent, [0080], performing intent classification, [0057]).
Consider claim 7, Mathias discloses one or more of:
the embedding layer comprises a pre-trained BERT model (noting that this limitation is not required by the claim language, which only requires “one or more of”);
the token vector for the conversational input further comprises one or more CLS tokens (noting that this limitation is not required by the claim language, which only requires “one or more of”);
the intent classification layer comprises a first feedforward layer and a softmax layer (Dense Layer 574 and Softmax 576, Fig 5, [0108]); or
the single multi-dimensional context vector for the context is determined by: generating, by the embedding layer, a respective context token vector for each of the one or more contextual units based on a respective context conversational input of the each of the one or more contextual units; generating, by a second feedforward layer, a respective consolidated vector for each of the one or more contextual units based on: (a) the respective context token vector for the each of the one or more contextual units, (b) a respective context intent vector for a respective context intent associated with the respective context conversational input of the each of the one or more contextual units, and (c) a respective context entities vector for one or more respective context entities associated with the respective context conversational input of the each of the one or more contextual units; and concatenating, by an attention layer, the respective consolidated vector for each of the one or more contextual units into the single multi-dimensional context vector (noting that this limitation is not required by the claim language, which only requires “one or more of”).
Consider claim 8, Mathias discloses: determining the one or more entities associated with the conversational input further comprises: generating, by an embedding layer, a token vector for the conversational input (e.g. user utterance embeddings, [0104], Fig 5); concatenating the token vector, a single multi-dimensional context vector for the context, and an expected entities vector for the one or more expected entities into a consolidated entity vector (vector generated by encoder 550, which concatenates vectors, including slot embedding i.e. expected entities vector, and utterance history embeddings, [0108]-[0109], Fig 5); and determining, by an entity recognizing layer, a respective entity tag for each of the one or more entities based on the consolidated entity vector (score for particular candidate key-value pair under consideration for the particular slot needed to operate the current intent, [0109], e.g. “San Francisco”, [0113]-[0115]).
Consider claim 9, Mathias discloses wherein one or more of:
the single multi-dimensional context vector for the context is determined by: generating, by the embedding layer, a respective context token vector for each of the one or more contextual units based on a respective context conversational input of the each of the one or more contextual units (noting that this limitation is not required by the claim language, which only requires “one or more of”); generating, by a third feedforward layer, a respective consolidated vector for each of the one or more contextual units based on: (a) the respective context token vector for the each of the one or more contextual units, (b) a respective context intent vector for a respective context intent associated with the respective context conversational input of the each of the one or more contextual units, and (c) a respective context entities vector for one or more respective context entities associated with the respective context conversational input of the each of the one or more contextual units; and concatenating, by an attention layer, the respective consolidated vector for each of the one or more contextual units into the single multi-dimensional context vector (noting that this limitation is not required by the claim language, which only requires “one or more of”);
the expected entities vector is encoded based on the one or more expected entities and predefined entity tags (e.g. the entity candidates to fill the <placetype>, <city> slots, etc., [0113]-[0115], Fig 5);
the embedding layer comprises a pre-trained BERT model (noting that this limitation is not required by the claim language, which only requires “one or more of”);
the token vector for the conversational input further comprises one or more CLS tokens (noting that this limitation is not required by the claim language, which only requires “one or more of”); or
the entity recognizing layer comprises a fourth feedforward layer and a softmax layer (noting that this limitation is not required by the claim language, which only requires “one or more of”).
Consider claim 10, Mathias discloses: the immediate prior one or more conversational inputs and the conversational input occur in a time session of a conversation (the context data includes time data, such as time of receipt of the audio data, [0063], for a dialog session with multiple utterances, [0070]).
Consider claim 12, Mathias discloses: each of the one or more contextual units comprises: a respective context conversational input for each of the immediate prior one or more conversational inputs (e.g. previous utterance “what is the weather in san Francisco?”, which generates <getweather> intent which requires a <location> slot, [0113]-[0115]); a respective context intent vector for a respective context intent associated with the respective context conversational input (e.g. <getweather> and <location>, the pair considered a context intent vector, [0113]; these are embedded in vector form by Dialog Tracker 590, Fig 5); and a respective context entities vector for one or more respective context entities associated with the respective context conversational input (entity-key pair embeddings, [0013], Fig. 5, element 590).
Consider claim 13, Mathias discloses: the respective context intent vector is encoded based on the respective context intent and predefined intent vector values (e.g. the slots belonging to <getweather> and <localsearch>, [0113]-[0115], Fig 5); and the respective context entities vector is encoded based on the one or more respective context entities and predefined entity tags (e.g. the entity candidates to fill the <placetype>, <city> slots, etc., [0113]-[0115], Fig 5).
Consider claim 14, Mathias discloses: determining the context further comprises: generating, by an embedding layer, a respective context token vector for each of the one or more contextual units based on the respective context conversational input of the each of the one or more contextual units (Encoder 550 generates context embeddings, Fig 5, [0113]-[0115]); generating, by a feedforward layer, a respective consolidated vector for each of the one or more contextual units based on the respective context token vector, the respective context intent vector, and the respective context entities vector for the each of the one or more contextual units (e.g. <getweather> and <location>, the pair considered a context intent vector, [0113]; these are embedded in vector form by Dialog Tracker 590, Fig 5, for each utterance turn); and concatenating, by an attention layer, the respective consolidated vector for each of the one or more contextual units into a single multi-dimensional context vector (word attention vectors concatenated into per-stream context vector computed by attention models, [0104], Fig 5).
Consider claim 15, Mathias does not, but Suwandy discloses wherein one or more of:
the embedding layer comprises a pre-trained BERT model (embeddings encoded via BERT, a well-known pre-trained model, [0026]); or the respective context token vector for each of the one or more contextual units further comprises one or more CLS tokens (noting that this limitation is not required by the claim language, which only requires “one or more of”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Mathias such that the embedding layer comprises a pre-trained BERT model for reasons similar to those for claim 1.
Consider claim 16, Mathias discloses: determining the intent associated with the conversational input based on the context further comprises: generating, by an embedding layer, a token vector for the conversational input (e.g. user utterance embeddings, [0104], Fig 5); and determining, by an intent classification layer, the intent based on the token vector and a single multi-dimensional context vector for the context (decoder 570 determines a score that corresponds to whether a particular candidate value pair corresponds to slot needed to execute the intent, [0080], performing intent classification, [0057]).
Consider claim 17, Mathias discloses one or more of:
the embedding layer comprises a pre-trained BERT model (noting that this limitation is not required by the claim language, which only requires “one or more of”);
the token vector for the conversational input further comprises one or more CLS tokens (noting that this limitation is not required by the claim language, which only requires “one or more of”);
the intent classification layer comprises a first feedforward layer and a softmax layer (Dense Layer 574 and Softmax 576, Fig 5, [0108]); or
the single multi-dimensional context vector for the context is determined by: generating, by the embedding layer, a respective context token vector for each of the one or more contextual units based on a respective context conversational input of the each of the one or more contextual units; generating, by a second feedforward layer, a respective consolidated vector for each of the one or more contextual units based on: (a) the respective context token vector for the each of the one or more contextual units, (b) a respective context intent vector for a respective context intent associated with the respective context conversational input of the each of the one or more contextual units, and (c) a respective context entities vector for one or more respective context entities associated with the respective context conversational input of the each of the one or more contextual units; and concatenating, by an attention layer, the respective consolidated vector for each of the one or more contextual units into the single multi-dimensional context vector (noting that this limitation is not required by the claim language, which only requires “one or more of”).
Consider claim 18, Mathias discloses: determining the one or more entities associated with the conversational input further comprises: generating, by an embedding layer, a token vector for the conversational input (e.g. user utterance embeddings, [0104], Fig 5); concatenating the token vector, a single multi-dimensional context vector for the context, and an expected entities vector for the one or more expected entities into a consolidated entity vector (vector generated by encoder 550, which concatenates vectors, including slot embedding i.e. expected entities vector, and utterance history embeddings, [0108]-[0109], Fig 5); and determining, by an entity recognizing layer, a respective entity tag for each of the one or more entities based on the consolidated entity vector (score for particular candidate key-value pair under consideration for the particular slot needed to operate the current intent, [0109], e.g. “San Francisco”, [0113]-[0115]).
Consider claim 19, Mathias discloses wherein one or more of:
the single multi-dimensional context vector for the context is determined by: generating, by the embedding layer, a respective context token vector for each of the one or more contextual units based on a respective context conversational input of the each of the one or more contextual units (noting that this limitation is not required by the claim language, which only requires “one or more of”); generating, by a third feedforward layer, a respective consolidated vector for each of the one or more contextual units based on: (a) the respective context token vector for the each of the one or more contextual units, (b) a respective context intent vector for a respective context intent associated with the respective context conversational input of the each of the one or more contextual units, and (c) a respective context entities vector for one or more respective context entities associated with the respective context conversational input of the each of the one or more contextual units; and concatenating, by an attention layer, the respective consolidated vector for each of the one or more contextual units into the single multi-dimensional context vector (noting that this limitation is not required by the claim language, which only requires “one or more of”);
the expected entities vector is encoded based on the one or more expected entities and predefined entity tags (e.g. the entity candidates to fill the <placetype>, <city> slots, etc., [0113]-[0115], Fig 5);
the embedding layer comprises a pre-trained BERT model (noting that this limitation is not required by the claim language, which only requires “one or more of”);
the token vector for the conversational input further comprises one or more CLS tokens (noting that this limitation is not required by the claim language, which only requires “one or more of”); or
the entity recognizing layer comprises a fourth feedforward layer and a softmax layer (noting that this limitation is not required by the claim language, which only requires “one or more of”).
Consider claim 20, Mathias discloses: the immediate prior one or more conversational inputs and the conversational input occur in a time session of a conversation (the context data includes time data, such as time of receipt of the audio data, [0063], for a dialog session with multiple utterances, [0070]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 20240037339 (Chen) discloses domain-specific named entity recognition via graph neural networks.
US 20210350209 (Wang) discloses intent and context-aware dialogue based virtual assistance.
US 11580968 (Gupta) discloses contextual natural language understanding for conversational agents.
US 20200143247 (Jonnalagadda) discloses automated conversations with intent and action response generation.
US 12136414 (Thomas) discloses integrating dialog history into end-to-end spoken language understanding systems.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 8:00 AM - 4:30 PM. The examiner’s fax number is 571/270-6135.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders can be reached on 571/272-7516.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Jesse S Pullias/
Primary Examiner, Art Unit 2655
09/02/2025