Last updated: May 29, 2026
Application No. 18/304,341
RUNTIME ALIGNMENT OF LANGUAGE MODELS IN CONVERSATIONAL AI SYSTEMS AND APPLICATIONS

Non-Final OA §103
Filed
Apr 20, 2023
Examiner
WITHEY, THEODORE JOHN
Art Unit
2655
Tech Center
2600 — Communications
Assignee
Nvidia Corporation
OA Round
3 (Non-Final)
Interview Optional

— +51.3% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 44% grant rate with +51.3% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 25 resolved cases, 2023–2026
Examiner Intelligence

WITHEY, THEODORE JOHN View full profile →
Grants 44% of resolved cases
Career Allowance Rate
11 granted / 25 resolved
-18.0% vs TC avg
Strong +51% interview lift
Without
With
+51.3%
Interview Lift
resolved cases with interview
Typical timeline
2y 11m
Avg Prosecution
17 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101
0.6%
-39.4% vs TC avg
§103
99.4%
+59.4% vs TC avg
Black line = Tech Center average estimate • Based on career data from 25 resolved cases
Office Action

§103
DETAILED ACTION
	This office action is in response to Applicant’s Request for Continued Examination (RCE), received on 12/30/2025. Claims 1, 11, and 19 have been amended (as entered on 12/01/25). Claims 1-20 are pending and have been considered.

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 12/30/2025 has been entered.
 
Response to Arguments
Applicant’s arguments, see pgs. 9-15, filed 12/01/2025, with respect to “Claim Rejections under 35 U.S.C. 101” have been fully considered and are persuasive.  The rejections of claims 1-20 have been withdrawn. The examiner would like to note that the claims have been deemed to be containing eligible subject matter under 35 U.S.C. 101 due to the inclusion of defining the language model to be implemented as an artificial neural network (ANN), wherein the model only has the constraints for generation applied at a runtime, indicating a zero-shot operation for generating specific natural language responses, implementing the claimed improvement of dynamically adjusting the constraints at a runtime, removing the need to re-train the model for each context (see [0002]-[0004] of instant application).
Applicant’s arguments, see pgs. 15-18, filed 12/01/2025, with respect to the rejection(s) of claim(s) 1, 11, and 19 under 35 U.S.C. 102(a)(1) have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Moon et al. (US-11442992-B1), hereinafter Moon. Moon discloses training of an ANN ([Col. 46, Lines 15-45]), wherein the training is in the context of managing a conversation flow with a user (see [Fig. 4A]). See updated rejections below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim(s) 1-3, 5, 7, 9-13, 16-17, 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lam et al. (“Zero and Few-Shot Localization of Task-Oriented Dialogue Agents with a Distilled Representation”), hereinafter Lam, in view of Moon et al. (US-11442992-B1), hereinafter Moon.

	Regarding claim 1, Lam discloses: a method comprising:
	generating, based at least on a user input ([pg. 14, Appendix, Table 3, Turn 1, Input] I’d like hotel recommendations), a canonical form that comprises a constrained semantic representation of the user input ([pg. 14, Appendix, Table 3, Turn 1, DST Target] ( hotels search ) [Concatenating the request to provide a recommendation to “hotel search” is a constrained semantic representation of input, i.e. a canonical form]);
determining, based at least on the canonical form, a dialog flow that controls output of a language model ([pg. 14, Appendix, Table 3, Turn 1, DAG Target] ( hotels search ) request rating , request stars [Requesting a rating or stars given to a hotel from a user (In view of response generation “Do you have any requirements for the hotel’s rating or the number of stars of the hotel?”) before making a final recommendation is indicative of a dialog flow that controls the output of a language model, i.e. generated response, based on the canonical form “hotels search”. Further consider Section 5.2 where the used model is claimed to be mBART, a well-known language model]), the dialog flow defining a sequence of past and future dialog between a user and outputs of the language model ([pg. 14, Appendix, Table 3, Turn 1, RG Target, Turn 2, DST Input Agent Acts and User response], [In view of the above dialog flow “(hotels search) request rating, request stars”, wherein each request defines a sequence of dialog to be performed between a user and outputs of language model, i.e. the requests, in later turns such as rating and stars in turn 2, wherein at the DST stage, the “(hotels search)” is defining a past dialog request to be clarified with the future “request rating” and “request stars” operations]); and,
performing one or more operations to execute the dialog flow to generate an output using the language model at runtime ([pg. 15, Appendix, Table 3, Turn 1, RG Prediction] Do you have a preference on how many stars and what rating the hotel should have? [Sending a response to a user is an operation to execute the steps of asking for stars and rating, i.e. dialog flow, based on the previously determined dialog flow DAG Prediction to generate an output, i.e. the response. Further, see section 9, “Ethical Considerations”, which disclose runs indicating the operation to be performed at runtime]), the dialog flow configuring the language model to generate the output according to constraints defined in the dialog flow based on (1) a match between the canonical form and a user input defined in the dialog flow ([In view of the dialog flow being generated based upon the canonical form, which itself is based upon user input, it is unclear to the examiner how there would not inherently always be a match between the canonical form and user input defined in the dialog flow for generating output as the dialog flow is generated based on the canonical form, and, therefore, the user input]) and (2) a corresponding canonical form of a language model output defined in the dialog flow ([pg. 14, Appendix, Table 3, Turn 2, DAG prediction “request location/price_level”], [“request location” is the canonical form of the generated language model output “And what about location?” defined in the multi-turn dialog flow]).
Lam does not disclose:
the language model being implemented as an artificial neural network; and,
the constraints being applied at runtime to the language model, and training data sets used to train the language model excluding feedback related to the constraints such that the language model is not pre-trained with the constraints.
Moon discloses:
the language model being implemented as an artificial neural network ([Col. 23, Lines 50-60] The assistant system 140 may then select, by a conversational reasoning model, one or more candidate nodes from the knowledge graph corresponding to one or more candidate entities, respectively. Each candidate node may be selected based on the nodes corresponding to the initial entities, one or more dialog states associated with the query, and a context associated with the query, [Col. 24, Lines 54-66] FIG. 10 illustrates an example artificial neural network (“ANN”) 1000. In particular embodiments, an ANN may refer to a computational model comprising one or more nodes. Example ANN 1000 may comprise an input layer 1010, hidden layers 1020, 1030, 1040, and an output layer 1050. Each layer of the ANN 1000 may comprise one or more nodes, such as a node 1005 or a node 1015. In particular embodiments, each node of an ANN may be connected to another node of the ANN. As an example and not by way of limitation, each node of the input layer 1010 may be connected to one of more nodes of the hidden layer 1020. In particular embodiments, one or more nodes may be a bias node, [A conversational reasoning, i.e. language, model for selecting nodes, wherein an ANN is defined as the model performing the operations, indicates the conversational reasoning model to be implemented using the ANN]); and,
the constraints being applied at runtime to the language model ([Col. 25, Lines 65-67]-[Col. 26, Lines 1-5] (2) a zero-shot learning model that leverages previous sentence, dialog, and KG contexts to re-rank candidates from pruned decoder graph output based on their relevance and path scores, which allows for generalizable and robust classification with a large number of candidate classes, [A zero-shot learning model indicates the constraints of a previous sentence/dialog/context are used for generating/ranking candidates without prior training]), and training data sets used to train the language model excluding feedback related to the constraints such that the language model is not pre-trained with the constraints ([Col. 28, Lines 45-50] The embodiments disclosed herein compute zero-shot relevance score in the KG embeddings space, thus allowing for robust prediction for KG entities and domains unseen during training as well, [An entity/domain unseen during training indicates there is no pre-training operation associated with the entity/domain, necessarily having constraints related to the entity/domain]).
 Lam and Moon are considered analogous art within conversational reasoning within knowledge bases. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lam to incorporate the teachings of Moon, because of the novel way to associate walk paths of a knowledge graph with input contexts including dialog state, sentence, and initial entities mentioned in the conversation for ranking candidate entities using a zero-shot relevance learning model which results in more accurate and relevant entities in generated responses within multi-turn dialogs (Moon, [Col. 3, Lines 15-40]).
	
	
Regarding claim 2, Lam in view of Moon discloses: the method of claim 1.
Lam further discloses:
wherein the performing the one or more operations to execute the dialog flow comprises using at least the language model to generate the output ([pg. 5, Section 5.2, Par. 1] All models use a standard Seq2Seq architecture with a bidirectional encoder and left-to-right autoregressive decoder. mBART is pre-trained to denoise text in 50 languages, while mT5 is trained on 101 languages [mBART can reasonably be classified as a language model]).

Regarding claim 3, Lam in view of Moon discloses: the method of claim 1.
Lam further discloses:
generating a second canonical form based at least on the output ([pg. 14, Appendix, Table 3, Turn 2, DST Prediction] ( hotels search ) rating equal_to " don’t care " , stars at_least " 5 " [In view of the previously disclosed ( hotels search ) canonical form of Turn 1, it can be seen that the addition of rating equal_to and stars at_least is an second, updated canonical form based on the output from the first turn, i.e. asking for rating and stars]);
determining a second dialog flow based at least on the second canonical form ([pg. 14, Appendix, Table 3, Turn 2, DAG Prediction] ( hotels search ) request location , request price_level [In view of turn 1, it can be seen that a second dialog flow, i.e. determining location and price_level requests, in view of the second canonical form (see above element) indicating a hotel search with rating and stars already decided (tracking to output from a first turn), indicating the flow should ask other questions based on the elements of a second canonical form, which is based on output from a first turn]); and,
performing one or more second operations to execute the second dialog flow to generate a second output ([pg. 14, Appendix, Table 3, Turn 2, RG Prediction] And what about location? Do you have a price range for the hotel? [Asking a user for location a pricing is executing the steps of the dialog flow determined in the DAG prediction to generate a second output, i.e. the response, in view of the first output RG prediction of turn 1]).

Regarding claim 5, Lam in view of Moon discloses: the method of claim 1.
Lam further discloses:
wherein the generating the canonical form comprises processing the user input using a trained machine learning model ([pg. 5, Section 5.2, Par. 1] We use mbart-large-50 as the neural model for our agent in all our experiments. All models use a standard Seq2Seq architecture with a bidirectial encoder and left-to-right autoregressive decoder. mBART is pre-trained to denoise text in 50 languages [In view of the previously generated canonical forms of Lam, disclosed in claim 1 rejection]).

Regarding claim 7, Lam in view of Moon discloses: the method of claim 1.
Lam further discloses:
wherein the determining the dialog flow comprises generating the dialog flow based at least on the canonical form ([pg. 14, Table 3, Turn 1, DST Prediction] ( hotels search ) [Based on an input “I’d like hotel recommendations”, indicating “hotels search” is a canonical form], [pg. 14, Table 3, Turn 1, DAG Prediction]) ( hotels search ) request rating , request stars [Determining to ask for a rating and/or stars for the hotel indicates a generated dialog flow based on the canonical form “hotels search”]).

Regarding claim 9, Lam in view of Moon discloses: the method of claim 1.
Lam further discloses:
wherein the performing the one or more operations to execute the dialog flow comprises:
generating an embedding of a second canonical form associated with the dialog flow in a semantic or latent space ([pg. 14, Table 3, Turns 1, 2 DST Prediction] [In view of the sentence embedding of Lam ([pg. 5, section 4.2, Par. 1]), indicating the canonical forms can also be embedded as they are in some form of sentence, further in view of the addition of rating equal_to “don’t care” and stars at_least “5” to the turn 2 DST prediction indicating a second canonical form associated with the dialogue flow in view of the canonical form “(hotels search)” of turn 1]);
determining one or more canonical forms based at least on the embedding of the second canonical form and one or more embeddings of one or more predefined canonical forms in the semantic or latent space ([Fig. 1, History], [pg. 14, Table 3, Turn 2, ACD, DAG], [Determining to add request location and request price_level canonical forms to the canonical form DAG prediction which comes after the rating and star determinations of the earlier DST section of turn 2 indicates determination of the canonical form “(hotels search) request location request price_level” is based on the second canonical form, i.e. [pg. 14, Appendix, Table 3, Turn 2, DST Prediction] ( hotels search ) rating equal_to " don’t care " , stars at_least " 5 ", e.g. not needing to include these pieces of information again, and a predefined canonical form in a semantic or latent space in view of the previous dialogue acts and retrieved results of Fig. 1 of Lam indicating predefined, i.e. historical canonical forms, further in view of the API and ACTS calls of the turns indicating embedding to transmit information and perform those calls in a semantic or latent space]);
generating a prompt that includes the one or more canonical forms ([pgs. 14-15, Turns 1-3, DAG Predictions]), one or more example outputs associated with the canonical forms ([pgs. 14-15, Turns 1-2, RG Predictions]), and at least a portion of a current conversation ([pgs. 14-15, Turns 1-3, DST Inputs]); and,
processing the prompt using the language model to generate the output ([pgs. 14-15, Turn 3, RG Prediction] “There are 4 available hotels. I recommend Royal Plaza Hotel. Its rating is 9.” [In view of the mBART, i.e. language model, used to perform the operations of Lam as disclosed in Section 5.2]).

Regarding claim 10, Lam in view of Moon discloses: the method of claim 1.
Lam further discloses:
wherein the canonical form and the dialog flow are specified in a formal modeling language ([pg. 15, Turn 3, ACD Input], [In view of [0035] of the instant application which defines modeling language as a programming language that “requires a particular syntax defining combinations of symbols that are considered to be correctly structured statements” indicating that the labels <state>…<endofstate> and <history>…<endofhistory>, which respectively correspond to dialogue flow and canonical forms, place dialog flow and canonical form information in a formal modeling language format]).

Regarding claim 11, Lam discloses: a processor comprising:
one or more processing units to perform operations ([Section 5.2, Par. 1] We also use the Dialogues4 library for data preprocessing and evaluation [Preprocessing data indicates a processor to perform that action]) comprising:
	generating, based at least on a user input ([pg. 14, Appendix, Table 3, Turn 1, Input] I’d like hotel recommendations), a canonical form that comprises a constrained semantic representation of the user input ([pg. 14, Appendix, Table 3, Turn 1, DST Target] ( hotels search ) [Concatenating the request to provide a recommendation to “hotel search” is a constrained semantic representation of input, i.e. a canonical form]);
determining, based at least on the canonical form, a dialog flow that controls output of a language model ([pg. 14, Appendix, Table 3, Turn 1, DAG Target] ( hotels search ) request rating , request stars [Requesting a rating or stars given to a hotel from a user (In view of response generation “Do you have any requirements for the hotel’s rating or the number of stars of the hotel?”) before making a final recommendation is indicative of a dialog flow that controls the output of a language model, i.e. generated response, based on the canonical form “hotels search”. Further consider Section 5.2 where the used model is claimed to be mBART, a well-known language model]), the dialog flow defining a sequence of past and future dialog between a user and outputs of the language model ([pg. 14, Appendix, Table 3, Turn 1, RG Target, Turn 2, DST Input Agent Acts and User response], [In view of the above dialog flow “(hotels search) request rating, request stars”, wherein each request defines a sequence of dialog to be performed between a user and outputs of language model, i.e. the requests, in later turns such as rating and stars in turn 2, wherein at the DST stage, the “(hotels search)” is defining a past dialog request to be clarified with the future “request rating” and “request stars” operations]);; and,
performing one or more operations to execute the dialog flow to generate an output using the language model at runtime ([pg. 15, Appendix, Table 3, Turn 1, RG Prediction] Do you have a preference on how many stars and what rating the hotel should have? [Sending a response to a user is an operation to execute the steps of asking for stars and rating, i.e. dialog flow, based on the previously determined dialog flow DAG Prediction to generate an output, i.e. the response. Further, see section 9, “Ethical Considerations”, which disclose runs indicating the operation to be performed at runtime]), the dialog flow configuring the language model to generate the output according to constraints defined in the dialog flow based on (1) a match between the canonical form and a user input defined in the dialog flow ([In view of the dialog flow being generated based upon the canonical form, which itself is based upon user input, it is unclear to the examiner how there would not inherently always be a match between the canonical form and user input defined in the dialog flow for generating output as the dialog flow is generated based on the canonical form, and, therefore, the user input]) and (2) a corresponding canonical form of a language model output defined in the dialog flow ([pg. 14, Appendix, Table 3, Turn 2, DAG prediction “request location/price_level”], [“request location” is the canonical form of the generated language model output “And what about location?” defined in the multi-turn dialog flow]).
Lam does not disclose:
the language model being implemented as an artificial neural network; and,
the constraints being applied at runtime to the language model, and training data sets used to train the language model excluding feedback related to the constraints such that the language model is not pre-trained with the constraints.
Moon discloses:
the language model being implemented as an artificial neural network ([Col. 23, Lines 50-60] The assistant system 140 may then select, by a conversational reasoning model, one or more candidate nodes from the knowledge graph corresponding to one or more candidate entities, respectively. Each candidate node may be selected based on the nodes corresponding to the initial entities, one or more dialog states associated with the query, and a context associated with the query, [Col. 24, Lines 54-66] FIG. 10 illustrates an example artificial neural network (“ANN”) 1000. In particular embodiments, an ANN may refer to a computational model comprising one or more nodes. Example ANN 1000 may comprise an input layer 1010, hidden layers 1020, 1030, 1040, and an output layer 1050. Each layer of the ANN 1000 may comprise one or more nodes, such as a node 1005 or a node 1015. In particular embodiments, each node of an ANN may be connected to another node of the ANN. As an example and not by way of limitation, each node of the input layer 1010 may be connected to one of more nodes of the hidden layer 1020. In particular embodiments, one or more nodes may be a bias node, [A conversational reasoning, i.e. language, model for selecting nodes, wherein an ANN is defined as the model performing the operations, indicates the conversational reasoning model to be implemented using the ANN]); and,
the constraints being applied at runtime to the language model ([Col. 25, Lines 65-67]-[Col. 26, Lines 1-5] (2) a zero-shot learning model that leverages previous sentence, dialog, and KG contexts to re-rank candidates from pruned decoder graph output based on their relevance and path scores, which allows for generalizable and robust classification with a large number of candidate classes, [A zero-shot learning model indicates the constraints of a previous sentence/dialog/context are used for generating/ranking candidates without prior training]), and training data sets used to train the language model excluding feedback related to the constraints such that the language model is not pre-trained with the constraints ([Col. 28, Lines 45-50] The embodiments disclosed herein compute zero-shot relevance score in the KG embeddings space, thus allowing for robust prediction for KG entities and domains unseen during training as well, [An entity/domain unseen during training indicates there is no pre-training operation associated with the entity/domain, necessarily having constraints related to the entity/domain]).
 Lam and Moon are considered analogous art within conversational reasoning within knowledge bases. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lam to incorporate the teachings of Moon, because of the novel way to associate walk paths of a knowledge graph with input contexts including dialog state, sentence, and initial entities mentioned in the conversation for ranking candidate entities using a zero-shot relevance learning model which results in more accurate and relevant entities in generated responses within multi-turn dialogs (Moon, [Col. 3, Lines 15-40]).

Regarding claim 12, Lam in view of Moon discloses: the processor of claim 11.
Lam further discloses:
wherein the performing the one or more operations to execute the dialog flow comprises using at least the language model to generate the output ([pg. 5, Section 5.2, Par. 1] All models use a standard Seq2Seq architecture with a bidirectional encoder and left-to-right autoregressive decoder. mBART is pre-trained to denoise text in 50 languages, while mT5 is trained on 101 languages [mBART can reasonably be classified as a language model]).

Regarding claim 13, Lam in view of Moon discloses: the processor of claim 11.
Lam further discloses:
wherein the one or more processing units further perform operations comprising:
generating a second canonical form based at least on the output ([pg. 14, Appendix, Table 3, Turn 2, DST Prediction] ( hotels search ) rating equal_to " don’t care " , stars at_least " 5 " [In view of the previously disclosed ( hotels search ) canonical form of Turn 1, it can be seen that the addition of rating equal_to and stars at_least is an second, updated canonical form based on the output from the first turn, i.e. asking for rating and stars]);
determining a second dialog flow based at least on the second canonical form ([pg. 14, Appendix, Table 3, Turn 2, DAG Prediction] ( hotels search ) request location , request price_level [In view of turn 1, it can be seen that a second dialog flow, i.e. determining location and price_level requests, in view of the second canonical form (see above element) indicating a hotel search with rating and stars already decided (tracking to output from a first turn), indicating the flow should ask other questions based on the elements of a second canonical form, which is based on output from a first turn]); and,
performing one or more second operations to execute the second dialog flow to generate a second output ([pg. 14, Appendix, Table 3, Turn 2, RG Prediction] And what about location? Do you have a price range for the hotel? [Asking a user for location a pricing is executing the steps of the dialog flow determined in the DAG prediction to generate a second output, i.e. the response, in view of the first output RG prediction of turn 1]).

Regarding claim 16, Lam in view of Moon discloses: the processor of claim 11.
Lam further discloses:
wherein the performing the one or more operations to execute the dialog flow comprises:
generating an embedding of a second canonical form associated with the dialog flow in a semantic or latent space ([pg. 14, Table 3, Turns 1, 2 DST Prediction] [In view of the sentence embedding of Lam ([pg. 5, section 4.2, Par. 1]), indicating the canonical forms can also be embedded as they are in some form of sentence, further in view of the addition of rating equal_to “don’t care” and stars at_least “5” to the turn 2 DST prediction indicating a second canonical form associated with the dialogue flow in view of the canonical form “(hotels search)” of turn 1]);
determining one or more canonical forms based at least on the embedding of the second canonical form and one or more embeddings of one or more predefined canonical forms in the semantic or latent space ([Fig. 1, History], [pg. 14, Table 3, Turn 2, ACD, DAG], [Determining to add request location and request price_level canonical forms to the canonical form DAG prediction which comes after the rating and star determinations of the earlier DST section of turn 2 indicates determination of the canonical form “(hotels search) request location request price_level” is based on the second canonical form, i.e. [pg. 14, Appendix, Table 3, Turn 2, DST Prediction] ( hotels search ) rating equal_to " don’t care " , stars at_least " 5 ", e.g. not needing to include these pieces of information again, and a predefined canonical form in a semantic or latent space in view of the previous dialogue acts and retrieved results of Fig. 1 of Lam indicating predefined, i.e. historical canonical forms, further in view of the API and ACTS calls of the turns indicating embedding to transmit information and perform those calls in a semantic or latent space]);
generating a prompt that includes the one or more canonical forms ([pgs. 14-15, Turns 1-3, DAG Predictions]), one or more example outputs associated with the canonical forms ([pgs. 14-15, Turns 1-2, RG Predictions]), and at least a portion of a current conversation ([pgs. 14-15, Turns 1-3, DST Inputs]); and,
processing the prompt using the language model to generate the output ([pgs. 14-15, Turn 3, RG Prediction] “There are 4 available hotels. I recommend Royal Plaza Hotel. Its rating is 9.” [In view of the mBART, i.e. language model, used to perform the operations of Lam as disclosed in Section 5.2]).

Regarding claim 17, Lam in view of Moon discloses: the processor of claim 16.
Lam further discloses:
wherein the performing the one or more operations to execute the dialog flow comprises:
accessing at least one of a knowledge base, a computational knowledge engine, a search engines, or an automation service to generate a second output ([Fig. 1, “Knowledge Base”], [Following the flow of Fig. 1, using output from a knowledge base to generate dialogue acts (see dialogue act generation DAG) to be then used for response generation (RG) indicates accessing the knowledge base to generate a second output, i.e. response, in view of the plurality of responses of Table 3]), wherein the prompt is further generated to include at least a portion of the second output ([pgs. 14-15, Table 3, Turns 1-3], [The hotel is recommended based on price, rating, stars, etc. all of which could reasonably be considered to be portions of a second output, i.e. any step of the hotel recommendation]).

Regarding claim 19, Lam discloses: a system comprising:
one or more processors to:
execute a dialog engine to manage an interplay between a large language model (LLM) and one or more user inputs ([pgs. 14-15, Table 3, Turns 1-3], [representing a multi-step interplay between a large language model (mBART) and user inputs, i.e. DST inputs]), the dialog engine dynamically generating a prompt for the LLM including one or more example dialog flows associated with one or more predefined user inputs that are within a threshold similarity to the one or more user inputs ([pg. 5, Section 4.2, Par. 2] In creating the RG training set, we first translate the source agent utterances to the target language and use LaBSE to remove pairs whose similarity score is below a threshold. We found a threshold of 0.8 to work best empirically. Higher thresholds would inadvertently filter correctly translated utterances. We construct the final training data by pairing aligned translated utterances that pass the filter with their corresponding translated agent dialogue acts [Pairing aligned utterances with agent dialogue acts indicates a prompt for action response for the LLM including a dialog flow, in view of the dialog flow of Table 3, wherein the predefined inputs, i.e. source utterances, are within a threshold similarity to one or more user inputs, i.e. translated versions. The examiner would like to note that though related to translations, the concepts of comparing words between languages to determine accuracies of translations is equivalent to the word comparison for task determination between tasks in the same language, i.e. both word embedding comparisons]), the one or more example dialog flows comprising one or more sequences of past and future dialog between a user and the LLM and configuring the LLM at runtime ([pg. 10, Ethical Considerations, Par. 2] …performing one run instead of averaging multiple runs…”, [Multiple and/or one run indicates a runtime operation. Further, in view of the previously cited dialog flow “(hotels search) request rating, request stars”, wherein each request defines a sequence of dialog to be performed between a user and outputs of language model, i.e. the requests, in later turns such as rating and stars in turn 2, wherein at the DST stage, the “(hotels search)” is defining a past dialog request to be clarified with the future “request location” and “request price level” operations]) to generate the output according to constraints defined in the one or more example dialog flows based on (1) a match between a canonical form and a user input defined in at least one dialog flow ([In view of the dialog flow being generated based upon the canonical form, which itself is based upon user input, it is unclear to the examiner how there would not inherently always be a match between the canonical form and user input defined in the dialog flow for generating output as the dialog flow is generated based on the canonical form, and, therefore, the user input]) and (2) a corresponding canonical form of an LLM output defined in the at least one dialog flow ([pg. 14, Appendix, Table 3, Turn 2, DAG prediction “request location/price_level”], [“request location” is the canonical form of the generated language model output “And what about location?” defined in the multi-turn dialog flow.]).
Lam does not disclose:
the language model being implemented as an artificial neural network; and,
the constraints being applied at runtime to the language model, and training data sets used to train the language model excluding feedback related to the constraints such that the language model is not pre-trained with the constraints.
Moon discloses:
the language model being implemented as an artificial neural network ([Col. 23, Lines 50-60] The assistant system 140 may then select, by a conversational reasoning model, one or more candidate nodes from the knowledge graph corresponding to one or more candidate entities, respectively. Each candidate node may be selected based on the nodes corresponding to the initial entities, one or more dialog states associated with the query, and a context associated with the query, [Col. 24, Lines 54-66] FIG. 10 illustrates an example artificial neural network (“ANN”) 1000. In particular embodiments, an ANN may refer to a computational model comprising one or more nodes. Example ANN 1000 may comprise an input layer 1010, hidden layers 1020, 1030, 1040, and an output layer 1050. Each layer of the ANN 1000 may comprise one or more nodes, such as a node 1005 or a node 1015. In particular embodiments, each node of an ANN may be connected to another node of the ANN. As an example and not by way of limitation, each node of the input layer 1010 may be connected to one of more nodes of the hidden layer 1020. In particular embodiments, one or more nodes may be a bias node, [A conversational reasoning, i.e. language, model for selecting nodes, wherein an ANN is defined as the model performing the operations, indicates the conversational reasoning model to be implemented using the ANN]); and,
the constraints being applied at runtime to the language model ([Col. 25, Lines 65-67]-[Col. 26, Lines 1-5] (2) a zero-shot learning model that leverages previous sentence, dialog, and KG contexts to re-rank candidates from pruned decoder graph output based on their relevance and path scores, which allows for generalizable and robust classification with a large number of candidate classes, [A zero-shot learning model indicates the constraints of a previous sentence/dialog/context are used for generating/ranking candidates without prior training]), and training data sets used to train the language model excluding feedback related to the constraints such that the language model is not pre-trained with the constraints ([Col. 28, Lines 45-50] The embodiments disclosed herein compute zero-shot relevance score in the KG embeddings space, thus allowing for robust prediction for KG entities and domains unseen during training as well, [An entity/domain unseen during training indicates there is no pre-training operation associated with the entity/domain, necessarily having constraints related to the entity/domain]).
 Lam and Moon are considered analogous art within conversational reasoning within knowledge bases. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lam to incorporate the teachings of Moon, because of the novel way to associate walk paths of a knowledge graph with input contexts including dialog state, sentence, and initial entities mentioned in the conversation for ranking candidate entities using a zero-shot relevance learning model which results in more accurate and relevant entities in generated responses within multi-turn dialogs (Moon, [Col. 3, Lines 15-40]).

Claim(s) 4, 6, 8, 14, 15, 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lam in view of Moon, further in view of Hall et al. (US-20210406718-A1), hereinafter Hall.

Regarding claim 4, Lam in view of Moon discloses: the method of claim 1.
Lam further discloses:
generating an embedding of the user input in a semantic or latent space ([pg. 5, section 4.2, Par. 1] To score a pair of sentences, the model first calculates an embedding for each sentence and computes the cosine distance between those vectors [Computing a distance between embedding vectors, wherein the vectors are encoded using BERT (indicating semantic analysis), indicates the vectors are embeddings in a semantic or latent space]).
Lam in view of Moon does not disclose:
wherein the generating the canonical form comprises:
determining one or more example user inputs that are associated with one or more predefined canonical forms based at least on the embedding of the user input and one or more embeddings of the one or more example user inputs in the semantic or latent space; and,
generating a prompt that includes the one or more example user inputs, the one or more predefined canonical forms, and at least a portion of a current conversation.
Hall discloses:
wherein the generating the canonical form comprises:
determining one or more example user inputs that are associated with one or more predefined canonical forms based at least on the embedding of the user input and one or more embeddings of the one or more example user inputs in the semantic or latent space ([0022] As shown in FIG. 1C, in a new conversational event 105C, the user 110 may request that the conversational computing interface 100 “order another pepperoni pizza.” Accordingly, the conversational computing interface 100 may be configured to repeat the previously-executed action 106B in a replicated planned action 107B, e.g., so as to order another pepperoni pizza , and, [0017] the conversational computing interface may be trained to generate an action that is similar to an action that was taken for some other event in an annotated dialogue [In view of the embedding similarity comparison of Lam, Hall’s system determines an example input “Order a pepperoni pizza” (See Fig. 1B) with associated canonical form “order_pizza” (See Fig. 1B, 107B) based on similarity of requests of past orders (cheese pizza of training Fig. 1A and first pepperoni pizza of Fig. 1B]);
generating a prompt that includes the one or more example user inputs, the one or more predefined canonical forms, and at least a portion of a current conversation ([Fig. 3A-3B, Annotated Dialogue History 300A, Updated Dialogue Plan 300B, traced/executable steps], [Updating a dialogue plan, i.e. prompt(s), based on a dialogue history, i.e. example user input(s), in view of the API invocations of the dialogue history “Create User Account”, “Get Menu”, “Add [type-of] pizza”, etc., i.e. predefined canonical forms, to represent a current conversation, i.e. cheese pizza instead of pepperoni, tracks to generating a new prompt featuring the example inputs and predefined canonical forms of the dialogue history with the current conversation, substituting pepperoni for cheese]); and,
processing the prompt using the language model to generate the canonical form ([Fig. 3B, Conversation Event 312B], [Processing the prompts of “Which would you like?” and “Cheese!” in reference to kinds of pizza resulting in an API invocation “Add Cheese Pizza” indicates a generated canonical form, i.e. “Add Cheese Pizza”, based on processing of the prompt “Cheese!”, in view of the language model of Lam]).
 Lam are considered analogous art within task-oriented dialogue. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lam in view of Moon to incorporate the teachings of Hall, because of the novel way to adjust previous dialogue conversations to a current situation allowing for the previous actions to be directly executable in a new context, reducing required training samples (Hall, [0001]).

Regarding claim 6, Lam in view of Moon discloses: the method of claim 1.
Lam in view of Moon does not disclose:
wherein the determining the dialog flow comprises matching the canonical form to a predefined canonical form associated with the dialog flow.
Hall discloses:
wherein the determining the dialog flow comprises matching the canonical form to a predefined canonical form associated with the dialog flow ([Figs. 3A-3B], [0047] Updated dialogue plan 300B includes an updated conversational event 312B in which a user asks for a cheese pizza instead of a pepperoni pizza, and subsequent updated executable steps including updated executable step 314B asking for user confirmation and updated executable step 318B in which the conversational computing interface orders the cheese pizza instead of the pepperoni pizza…The updated dialogue plan 300B may be derived by automatically and/or manually editing the annotated dialogue history 300A [In view of Figs. 3A-3B, performing an updated dialogue plan that contains all the same steps as the dialogue history event indicates that based on the canonical form of “Order a pizza!”, in view of the canonical form generation for tasks of Lam, a dialogue flow is determined by matching canonical forms to predefined, i.e. historical, canonical forms in view of the embedding similarity matching of Lam]).
Lam, Moon, and Hall are considered analogous art within task-oriented dialogue response. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lam in view of Moon to incorporate the teachings of Hall, because of the novel way to adjust previous dialogue conversations to a current situation allowing for the previous actions to be directly executable in a new context, reducing required training samples (Hall, [0001]).

Regarding claim 8, Lam in view of Moon discloses: the method of claim 1.
Lam further discloses:
generating an embedding of the canonical form in a semantic or latent space ([pg. 5, section 4.2, Par. 1] To score a pair of sentences, the model first calculates an embedding for each sentence and computes the cosine distance between those vectors [Computing a distance between embedding vectors, wherein the vectors are encoded using BERT (indicating semantic analysis), indicates the vectors are embeddings in a semantic or latent space]).
Lam in view of Moon does not disclose:
wherein the determining the dialog flow comprises:
determining one or more canonical forms that are associated with one or more predefined dialog flows based at least on the embedding of the canonical form and one or more embeddings of the one or more canonical forms in the semantic or latent space;
generating a prompt that includes the one or more canonical forms, the one or more predefined dialog flows, and at least a portion of a current conversation; and
processing the prompt using the language model to generate the dialog flow.
Hall discloses:
wherein the determining the dialog flow comprises:
determining one or more canonical forms that are associated with one or more predefined dialog flows based at least on the embedding of the canonical form and one or more embeddings of the one or more canonical forms in the semantic or latent space ([Figs. 3A-3B, Conversation Events 302A, 302B], [In view of the canonical form generation based on input of Lam, determining that an updated dialogue plan should have the same conversation pattern as a dialogue from history based on the original event “Order a pizza!” indicates a determination that the canonical forms of “Order a pizza!” or other conversational events, as would be generated in Lam, between the historical and current dialogues are associated based on canonical form embeddings, in view of the embeddings of Lam, associated with predefined dialogue flows, i.e. for ordering a pizza]);
generating a prompt that includes the one or more canonical forms, the one or more predefined dialog flows, and at least a portion of a current conversation ([Fig. 3A-3B, Annotated Dialogue History 300A, Updated Dialogue Plan 300B, traced/executable steps], [Updating a dialogue plan, i.e. prompt(s), based on a dialogue history, i.e. predefined dialogue flow, in view of the API invocations of the dialogue history “Create User Account”, “Get Menu”, “Add [type-of] pizza”, etc., i.e. canonical forms, to represent a current conversation, i.e. cheese pizza instead of pepperoni, tracks to generating a new prompt featuring one or more canonical forms (as would be determined in Lam) and predefined dialogue flows of the dialogue history with the current conversation, i.e. for ordering a pizza with different toppings]); and
processing the prompt using the language model to generate the dialog flow ([Fig. 3B, Conversation Event 312B], [Processing the prompts of “Cheese!” and “The total for your order is $8…” in reference to kinds of pizza resulting in an API invocation “Add Cheese Pizza” indicates a generated dialogue flow, i.e. “Add Cheese Pizza”, based on processing of the prompt “Cheese!”, in view of the language model of Lam and the previously defined prompts of cheese pizza and associated price]).
Lam, Moon, and Hall are considered analogous art within task-oriented dialogue response. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lam in view of Moon to incorporate the teachings of Hall, because of the novel way to adjust previous dialogue conversations to a current situation allowing for the previous actions to be directly executable in a new context, reducing required training samples (Hall, [0001]).

Regarding claim 14, Lam in view of Moon discloses: the processor of claim 11.
Lam further discloses:
generating an embedding of the user input in a semantic or latent space ([pg. 5, section 4.2, Par. 1] To score a pair of sentences, the model first calculates an embedding for each sentence and computes the cosine distance between those vectors [Computing a distance between embedding vectors, wherein the vectors are encoded using BERT (indicating semantic analysis), indicates the vectors are embeddings in a semantic or latent space]).
Lam in view of Moon does not disclose:
wherein the generating the canonical form comprises:
determining one or more example user inputs that are associated with one or more predefined canonical forms based at least on the embedding of the user input and one or more embeddings of the one or more example user inputs in the semantic or latent space; and,
generating a prompt that includes the one or more example user inputs, the one or more predefined canonical forms, and at least a portion of a current conversation.
Hall discloses:
wherein the generating the canonical form comprises:
determining one or more example user inputs that are associated with one or more predefined canonical forms based at least on the embedding of the user input and one or more embeddings of the one or more example user inputs in the semantic or latent space ([0022] As shown in FIG. 1C, in a new conversational event 105C, the user 110 may request that the conversational computing interface 100 “order another pepperoni pizza.” Accordingly, the conversational computing interface 100 may be configured to repeat the previously-executed action 106B in a replicated planned action 107B, e.g., so as to order another pepperoni pizza , and, [0017] the conversational computing interface may be trained to generate an action that is similar to an action that was taken for some other event in an annotated dialogue [In view of the embedding similarity comparison of Lam, Hall’s system determines an example input “Order a pepperoni pizza” (See Fig. 1B) with associated canonical form “order_pizza” (See Fig. 1B, 107B) based on similarity of requests of past orders (cheese pizza of training Fig. 1A and first pepperoni pizza of Fig. 1B]);
generating a prompt that includes the one or more example user inputs, the one or more predefined canonical forms, and at least a portion of a current conversation ([Fig. 3A-3B, Annotated Dialogue History 300A, Updated Dialogue Plan 300B, traced/executable steps], [Updating a dialogue plan, i.e. prompt(s), based on a dialogue history, i.e. example user input(s), in view of the API invocations of the dialogue history “Create User Account”, “Get Menu”, “Add [type-of] pizza”, etc., i.e. predefined canonical forms, to represent a current conversation, i.e. cheese pizza instead of pepperoni, tracks to generating a new prompt featuring the example inputs and predefined canonical forms of the dialogue history with the current conversation, substituting pepperoni for cheese]); and,
processing the prompt using the language model to generate the canonical form ([Fig. 3B, Conversation Event 312B], [Processing the prompts of “Which would you like?” and “Cheese!” in reference to kinds of pizza resulting in an API invocation “Add Cheese Pizza” indicates a generated canonical form, i.e. “Add Cheese Pizza”, based on processing of the prompt “Cheese!”, in view of the language model of Lam]).
 Lam, Moon, and Hall are considered analogous art within task-oriented dialogue response. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lam in view of Moon to incorporate the teachings of Hall, because of the novel way to adjust previous dialogue conversations to a current situation allowing for the previous actions to be directly executable in a new context, reducing required training samples (Hall, [0001]).

Regarding claim 15, Lam in view of Moon discloses: the processor of claim 11.
Lam further discloses:
generating an embedding of the canonical form in a semantic or latent space ([pg. 5, section 4.2, Par. 1] To score a pair of sentences, the model first calculates an embedding for each sentence and computes the cosine distance between those vectors [Computing a distance between embedding vectors, wherein the vectors are encoded using BERT (indicating semantic analysis), indicates the vectors are embeddings in a semantic or latent space]).
Lam in view of Moon does not disclose:
wherein the determining the dialog flow comprises:
determining one or more canonical forms that are associated with one or more predefined dialog flows based at least on the embedding of the canonical form and one or more embeddings of the one or more canonical forms in the semantic or latent space;
generating a prompt that includes the one or more canonical forms, the one or more predefined dialog flows, and at least a portion of a current conversation; and
processing the prompt using the language model to generate the dialog flow.
Hall discloses:
wherein the determining the dialog flow comprises:
determining one or more canonical forms that are associated with one or more predefined dialog flows based at least on the embedding of the canonical form and one or more embeddings of the one or more canonical forms in the semantic or latent space ([Figs. 3A-3B, Conversation Events 302A, 302B], [In view of the canonical form generation based on input of Lam, determining that an updated dialogue plan should have the same conversation pattern as a dialogue from history based on the original event “Order a pizza!” indicates a determination that the canonical forms of “Order a pizza!” or other conversational events, as would be generated in Lam, between the historical and current dialogues are associated based on canonical form embeddings, in view of the embeddings of Lam, associated with predefined dialogue flows, i.e. for ordering a pizza]);
generating a prompt that includes the one or more canonical forms, the one or more predefined dialog flows, and at least a portion of a current conversation ([Fig. 3A-3B, Annotated Dialogue History 300A, Updated Dialogue Plan 300B, traced/executable steps], [Updating a dialogue plan, i.e. prompt(s), based on a dialogue history, i.e. predefined dialogue flow, in view of the API invocations of the dialogue history “Create User Account”, “Get Menu”, “Add [type-of] pizza”, etc., i.e. canonical forms, to represent a current conversation, i.e. cheese pizza instead of pepperoni, tracks to generating a new prompt featuring one or more canonical forms (as would be determined in Lam) and predefined dialogue flows of the dialogue history with the current conversation, i.e. for ordering a pizza with different toppings]); and
processing the prompt using the language model to generate the dialog flow ([Fig. 3B, Conversation Event 312B], [Processing the prompts of “Cheese!” and “The total for your order is $8…” in reference to kinds of pizza resulting in an API invocation “Add Cheese Pizza” indicates a generated dialogue flow, i.e. “Add Cheese Pizza”, based on processing of the prompt “Cheese!”, in view of the language model of Lam and the previously defined prompts of cheese pizza and associated price]).
Lam, Moon, and Hall are considered analogous art within task-oriented dialogue response. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lam in view of Moon to incorporate the teachings of Hall, because of the novel way to adjust previous dialogue conversations to a current situation allowing for the previous actions to be directly executable in a new context, reducing required training samples (Hall, [0001]).

Regarding claim 18, Lam in view of Moon discloses: the processor of claim 11.
Lam further discloses:
wherein the processor is comprised in at least one of:
a system incorporating one or more virtual machines (VMs) ([Section 5.2, Par. 4] Our models were trained on virtual machines with a single NVIDIA V100 (16GB memory) GPU on the AWS platform); and,
a system implementing one or more large language models (LLMs) ([Section 5.2, Par. 1] We use mbart-large-50 as the neural model for our agent in all our experiments. [BART is a well-known language model, adapted to “large”]). 
Lam in view of Moon does not disclose:
wherein the processor is comprised in at least one of:
an infotainment system for an autonomous or semi-autonomous machine; 
a system for performing simulation operations; 
a system for performing digital twin operations; 
a system for performing light transport simulation; 
a system for performing collaborative content creation for 3D assets; 
a system for performing deep learning operations; 
a system implemented using an edge device; 
a system implemented using a robot; 
a system for generating or presenting virtual reality, augmented reality, or mixed reality content; 
a system for performing conversational AI operations; 
a system for generating synthetic data; 
a system implemented at least partially in a data center; or 
a system implemented at least partially using cloud computing resources.
Hall discloses:
a system for performing deep learning operations ([0084] Non-limiting examples of training procedures for adjusting trainable parameters include… reinforcement learning (e.g., deep Q learning based on feedback)); and,
a system for generating or presenting virtual reality, augmented reality, or mixed reality content ([0079] In some implementations, display subsystem may include one or more virtual-, augmented-, or mixed reality displays [The examiner would like to note that due to the current disjunctive nature of the claims, not all elements require a mapping]).
 Lam, Moon, and Hall are considered analogous art within task-oriented dialogue response. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lam in view of Moon to incorporate the teachings of Hall, because of the novel way to adjust previous dialogue conversations to a current situation allowing for the previous actions to be directly executable in a new context, reducing required training samples (Hall, [0001]).

Claim(s) 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lam in view of Moon, further in view of Andreas et al. (US-11410643-B2), hereinafter Andreas.

Regarding claim 20, Lam in view of Moon discloses: the system of claim 19.
Lam in view of Moon does not disclose:
wherein the prompt is dynamically generated based at least on the one or more user inputs being dissimilar from the one or more predefined user inputs by more than a threshold amount.
Andreas discloses:
wherein the prompt is dynamically generated based at least on the one or more user inputs being dissimilar from the one or more predefined user inputs by more than a threshold amount ([Col. 4, Lines 45-50] For example, the conversational computing interface may be configured to generate computer-executable plans for events that did not occur in the training data, so as to generalize the training from the exemplary annotated dialogues provided in training data to other, similar and/or dissimilar situations [In view of the similarity calculations of Lam, indicating a threshold similarity, in view of the similarities of Andreas, in which new prompts are dynamically generated based on dissimilar user inputs indicating a threshold in order to be considered “dissimilar”]).
 Lam are considered analogous art within task. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Lam in view of Moon to incorporate the teachings of Andreas, because of the novel way to adapt existing executable plans for events that have not occuand ability to respond to diverse situations (Andreas, [Col. 4, Lines 30-50]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Danila et al. (US-20190378510-A1) discloses “Techniques described herein relate to allowing users to employ voice-based human-to-computer dialog to program automated assistants with customized routines, or “dialog routines,” that can later be invoked to accomplish task(s). In various implementations, a first free form natural language input—that identifies a command to be mapped to a task and slot(s) required to be filled with values to fulfill the task—may be received from a user. A dialog routine may be stored that includes a mapping between the command and the task, and which accepts, as input, value(s) to fill the slot(s). Subsequent free form natural language input may be received from the user to (i) invoke the dialog routine based on the mapping, and/or (ii) to identify value(s) to fill the slot(s). Data indicative of at least the value(s) may be transmitted to a remote computing device for fulfillment of the task” (abstract). See entire document.
He et al. (“SPACE-3: Unified Dialog Model Pre-training for Task-Oriented Dialog Understanding and Generation”) discloses “a novel unified semi-supervised pre-trained conversation model learning from large-scale dialog corpora with limited annotations, which can be effectively fine-tuned on a wide range of downstream dialog tasks. Specifically, SPACE-3 consists of four successive components in a single transformer to maintain a task-flow in TOD systems: (i) a dialog encoding module to encode dialog history, (ii) a dialog understanding module to extract semantic vectors from either user queries or system responses, (iii) a dialog policy module to generate a policy vector that contains high-level semantics of the response, and (iv) a dialog generation module to produce appropriate responses. We design a dedicated pre-training objective for each component. Concretely, we pre-train the dialog encoding module with span mask language modeling to learn contextualized dialog information” (abstract). See entire document.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to THEODORE JOHN WITHEY whose telephone number is (703)756-1754. The examiner can normally be reached Monday - Friday, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached on (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



                                                                                                                                                                                                   /THEODORE WITHEY/Examiner, Art Unit 2655  
                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                  /ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655
Read full office action
Prosecution Timeline

Show 7 earlier events
Dec 01, 2025
Response after Non-Final Action
Dec 09, 2025
Applicant Interview (Telephonic)
Dec 09, 2025
Examiner Interview Summary
Dec 30, 2025
Request for Continued Examination
Jan 17, 2026
Response after Non-Final Action
Feb 13, 2026
Non-Final Rejection mailed — §103
May 06, 2026
Examiner Interview Summary
May 06, 2026
Applicant Interview (Telephonic)
Precedent Cases

Applications granted by this same examiner with similar technology

18/179,842
Patent 12632670
Natural Language Processing for Identifying Bias in a Span of Text
3y 2m to grant Granted May 19, 2026
17/655,770
Patent 12591744
METHOD FOR TRAINING SEMANTIC REPRESENTATION MODEL, DEVICE AND STORAGE MEDIUM
4y 0m to grant Granted Mar 31, 2026
18/113,192
Patent 12536994
APPARATUS FOR CLASSIFYING SOUNDS BASED ON NEURAL CODE IN SPIKING NEURAL NETWORK AND METHOD THEREOF
2y 9m to grant Granted Jan 27, 2026
17/956,558
Patent 12475330
METHOD FOR IDENTIFYING NOISE SAMPLES, ELECTRONIC DEVICE, AND STORAGE MEDIUM
3y 1m to grant Granted Nov 18, 2025
17/813,944
Patent 12417759
SPEECH RECOGNITION USING CADENCE PATTERNS
3y 1m to grant Granted Sep 16, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
44%
Grant Probability
95%
With Interview (+51.3%)
2y 11m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 25 resolved cases by this examiner. Grant probability derived from career allowance rate.