DETAILED ACTION
This office action is in response to Applicant’s Amendment/Request for Reconsideration, received on 11/12/2025. Claims 1, 10-16, and 18 have been amended. Claims 3-4 and 17 have been cancelled. Claims 1-2, 5-16, and 18-20 are pending and have been considered.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, see pgs. 9-11, filed 11/12/2025, with respect to the rejection(s) of claim(s) 1-20 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Olabiyi et al. (US-20220058444-A1). Olabiyi will be introduced to replace Norton. Olabiyi discloses generation of candidate response lists to received questions, wherein the candidates are ranked based on syntactic differences between ground truth and the candidates. See updated rejections below.
Claim Objections
Claim 10 is objected to because of the following informalities: line 7 of the amended claim reads “wherein the one or more neural network trained by at least”. The examiner believes “are” or a similar addition should be made between “network” and “trained” to fix the grammatical issue. The examiner’s interpretation including “are” will be applied below. Appropriate correction is required.
Claim 14 is objected to because of the following informalities: line 14 of the claims reads “output having has a first syntax and a second candidate output having has a…” (emphasis added to underlined portion). The examiner believes there is a redundancy in the inclusion of both “having” and “has”. For further analysis of the claims, “has” will be removed. Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Sarikaya et al. (US-20240347050-A1), hereinafter Sarikaya, in view of Olabiyi et al. (US-20220058444-A1), hereinafter Olabiyi, further in view of Challa et al. (US-10978056-B1), hereinafter Challa.
Regarding claim 1, Sarikaya discloses: One or more processors ([0168] one or more controllers/processors (1104/1204)) comprising circuitry ([Circuits are an inherent part of a physical processor structure]) to:
determine, responsive to receiving a query, one or more values for one or more fields corresponding to a domain associated with the query ([0039] For example, if the second skill is an event booking skill and the context data included the entity type “location” and the entity value “Seattle, Washington,” the skill output data may correspond to names of events in Seattle, Washington [In this example, the domain represents booking/travel, Seattle represents the value of the “location” field]);
generate, using a neural network and based at least on the query or the one or more values, a response to have syntax variation and accuracy ([0153] The orchestrator component 230/LRO 528 determines (908) output data based at least in part on the prompt text data. For example, the orchestrator component 230/LRO 528 may determine the output data to include text data corresponding to the prompt text data [Output data including text data corresponding to the prompt text tracks to a response to the prompt. Further, it is unclear to the examiner with respect to what body of text the response’s “syntax variation and accuracy” are to be determined; therefore, it will be interpreted that any generated response necessarily has syntax variation and accuracy with regard to some other body of text]); and,
cause, using at least one of a display or an audio speaker device, a presentation of the response ([0154] The device 110 presents (912) content corresponding to the output data. For example, if the output data includes text data, the device 110 displays text corresponding to the text data.).
Sarikaya does not disclose:
the neural network is trained by at least:
applying ground truth data to a set of input data to cause the neural network to generate a plurality of candidate outputs, the ground truth data representative of one or more variational responses having different syntactic structures for semantically equivalent content, the set of input data comprising one or more training queries and one or more training values corresponding to one or more training fields; and,
generating, according to one or more functions assigning higher multi-metric values to one or more candidate outputs of the plurality of candidate outputs based at least on comparisons between the one or more candidate outputs and the one or more variational responses of the ground truth data,
a plurality of multi-metric values comprising:
one or more first metric values indicating accuracy of the one or more candidate outputs in representing the one or more training values, and
one or more second metric values indicating variation in syntax between the one or more candidate outputs and the one or more variational responses.
calculating loss values by applying one or more loss functions that assign higher loss values to candidate outputs having lower accuracy or lower syntactic variation; and,
updating, based at least on the loss values, one or more weights of the neural network.
Olabiyi discloses:
wherein the neural network is trained by at least:
applying ground truth data to a set of input data to cause the neural network to generate a plurality of candidate outputs ([Fig. 3C, applying Ground Truth Data 366 to Discriminator 362 resulting in Ranked Response List 368], [0050] Discriminator 362 can generate a probability that each candidate response corresponds to ground truth data 366. The probability generated by discriminator 362 can be used to rank the candidate responses to generate ranked response list 368, [Wherein the ground truth data is applied to output from the generator (input to the discriminator) indicating applying ground truth data to a set of input data, wherein a ranked list represents a plurality of candidate outputs]), the ground truth data representative of one or more variational responses having different syntactic structures for semantically equivalent content ([0053] The discriminator 362 can share context and word embedding with the generator 360 and can discriminate at the word level. Word-level discrimination can be achieved through a bidirectional RNN and is able to capture both syntactic and conceptual differences between the generator output and the ground truth, [Defining syntactic and conceptual differences to be distinct types indicates a situation in which there is no conceptual difference, i.e. semantically equivalent, though there are syntactic differences, i.e. different syntactic structures]), the set of input data comprising one or more training queries and one or more training values corresponding to one or more training fields ([0099] The training data can include a number of dialog sequences in a multi-turn dialog. Each dialog sequence can include one or more word tokens. The training data can also include a current prompt to which the machine classifier is being trained to generate a response. The current prompt can include one or more word tokens indicating a statement, question, or other dialog step, [0104] the algorithm 450 shown in FIG. 4B can be used to train a machine classifier having a generator G with parameters θ.sub.G and a discriminator D with parameters θ.sub.D, [Training using data comprised of words indicates each word/token to be a field of a sentence/larger text with values of the words themselves]);
generating, according to one or more functions assigning higher multi-metric values to one or more candidate outputs of the plurality of candidate outputs based at least on comparisons between the one or more candidate outputs and the one or more variational responses of the ground truth data ([0115] The response samples L can be ranked using the discriminator score [0116] {D*(X.sub.i, Y.sub.i,l)]}.sub.l=1.sup.L, [0122] the selected response is the candidate response having the highest (or lowest) discriminator score. That is, the selected response is the candidate response that is indicated by the machine classifier to be closest to a ground truth response, [The discriminator score is reasonably considered to be a higher multi-metric as the score is dependent upon discriminator output D, Generator output Y, and input X (see Figs. 3A/B), wherein each component of the discriminator function can be represented as its own metric (as would be required for processing numerically represented computer encodings)]),
calculating loss values by applying one or more loss functions that assign higher loss values to candidate outputs having lower accuracy or lower syntactic variation ([0031] A machine classifier being trained in an asymmetric mode may minimize the discriminator loss of the generator G based on the autoregressive outputs of the generator, [Applying a loss gathered through a discriminator (whose job is to compare input to ground truth as previously disclosed), wherein the determination is made on a word/token level, indicates the loss to be based on a comparison of syntax and/or accuracy (see previously cited [0053], which discloses generating a probability of correspondence between input and ground truth). Further, minimizing loss indicates reducing overall discrepancy between ground truth and input, indicating that a comparison of two pieces of text based on syntactic differences will yield a higher loss value for lower accuracy/variation, which is then minimized]); and,
updating, based at least on the loss values, one or more weights of the neural network ([0020] The machine classifiers may be trained in both auto-regressive and traditional teacher-forcing modes, with the generator including a hierarchical recurrent encoder-decoder network and the discriminator including a bi-directional recurrent neural network, [0103] The model parameters can be updated based on the context, the current prompt, the generated response, the ground truth response, and the discriminator accuracy. By updating the model parameters, the generator can be trained to generate responses that are more accurate. In several embodiments, the model parameters are updated using an autoregression weighted by the discriminator accuracy, [Wherein the discriminator accuracy is directly used for loss, indicating an updating of the model based on loss values as would be required to produce more accurate results. Further, wherein the context of [0103] is with regard to Fig. 4A, defined to be a flow chart of a process for training a machine classifier which contains a neural network]).
Sarikaya and Olabiyi are considered analogous art within speech response generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Sarikaya to incorporate the teachings of Olabiyi, because of the novel way to employ conditional GANs for multi-turn dialogue models with a HRED generator and discriminator, combining generative and retrieval-based multi-turn dialogue systems, improving their individual performances through sharing context and word embeddings between the generator and discriminator (Olabiyi, [0018]).
Sarikaya in view of Olabiyi does not disclose:
a plurality of multi-metric values comprising:
one or more first metric values indicating accuracy of the one or more candidate outputs in representing the one or more training values, and
one or more second metric values indicating variation in syntax between the one or more candidate outputs and the one or more variational responses.
Challa discloses:
a plurality of multi-metric values comprising:
one or more first metric values indicating accuracy of the one or more candidate outputs in representing the one or more training values ([Col. 26, Lines 9-10] the assistant system 140 may use methods to implement a grammaticality “filter”, [Col. 26, Lines 15-20] assistant system 140 may then determine, by the one or more classification models, a threshold score for determining a quality-indication for a candidate response based on the plurality of confidence scores associated with the plurality of candidate responses, [Col. 27, Lines 29-36] the assistant system 140 may determine, by a natural-language understanding module 220, one or more slots and one or more intents associated with the user input. Correspondingly, the one or more classification rules may be based on the slots and intents. As an example and not by way of limitation, the classification rules for an intent of getting updates about a friend may be different from those for an intent of getting biographical introduction of a famous politician [Determining a classification based on slots, i.e. entity tags/values, indicates a required value calculation to perform the classification and filtering, wherein the filtering operation indicates a required metric determination for setting the filter]), and
one or more second metric values indicating variation in syntax between the one or more candidate outputs and the one or more variational responses ([Col. 23, Lines 1-5] a grammatical classification and semantic correctness classification dataset for the weather domain that consists of responses generated by 3 data-driven NLG systems, [Col. 27, Lines 10-20] classification rules may be based on one or more of metadata associated with the user input, a language-structure of the user input, user profile data associated with the user, or historical user input from the user and corresponding candidate responses presented to the user, [Col. 26, Lines 15-20] The assistant system 140 may then determine, by the one or more classification models, a threshold score for determining a quality-indication for a candidate response based on the plurality of confidence scores, [Col. 27, Lines 19-23] As another example and not by way of limitation, the filtering module may analyze the language-structure of the user input and select candidate responses with language-structures that match the language-structure of the user input [Determining a classification based on grammatical/semantic correctness and/or overall language structure, tracking to a syntax-matching operation, wherein the classification can be based on historical candidate responses, i.e. sample responses, indicating a required comparison of the old responses to the current response for the classification, wherein the filtering operation indicates a required metric determination for setting the filter. Further, the classifications of Challa could be used for determining syntactic and/or conceptual differences of Olabiyi without a change in functionality to Olabiyi. Performing a plurality of classifications on the same response indicates the collective quality indication to be based upon a multi-metric]).
Sarikaya, Olabiyi, and Challa are considered analogous art within automated speech response generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Sarikaya in view of Olabiyi to incorporate the teachings of Challa, because of the novel way to have an assistant system interact with different agents to obtain information or services to be used in generated responses, wherein those generated responses with ungrammatical results are filtered out, reducing processing time/power for response decisions (Challa, [Col. 2, Lines 1-67]).
Regarding claim 2, Sarikaya in view of Olabiyi, further in view of Challa discloses: the one or more processors of claim 1.
Sarikaya further discloses:
wherein the one or more values are determined based at least on accessing one or more application programming interfaces (APIs) associated with the domain ([0103] In at least some embodiments, a device-determined directive may be formatted as a programmatic API call with a same logical operation as a remotely-determined directive [Sarikaya defines a directive as something which may include a description of the intent [0103], i.e. an API call to gather the temperature based on a weather directive]).
Challa further discloses:
wherein the variation in syntax corresponds to an arrangement of the one or more values ([Table 3, “# grammatically/semantically correct”, “# ungrammatical/semantically incorrect”], [Col. 22, Lines 60-65] the assistant system 140 may use a generate, filter, and rank framework, in which candidate responses are first filtered to eliminate unacceptable responses, and then ranked to select the best response. In particular embodiments, acceptability may include grammatical correctness and semantic correctness, [Eliminating unacceptable responses, wherein this determination is made based upon grammar and semantics, i.e. either or both of which could represent syntax, indicating at least two responses with different syntax, i.e. correct and incorrect, based upon the position of values, i.e. entities/slots/labels. See Table 3]), a first value of the one or more values arranged in a first position relative to a second value of the one or more values based at least on
the variation in syntax ([A variation in syntax will inherently change the position of a first value relative to a second value. Similarly, changing the positions of values will inherently change the syntax of the sentence]), and
the positioning of at least one of the first value or the second value in the query ([Col. 27, Lines 30-40] the assistant system 140 may determine, by a natural-language understanding module 220, one or more slots and one or more intents associated with the user input. Correspondingly, the one or more classification rules may be based on the slots and intents. As an example and not by way of limitation, the classification rules for an intent of getting updates about a friend may be different from those for an intent of getting biographical introduction of a famous politician (e.g., favoring naturalness v.s. favoring grammaticality), [A syntactical classification based on slots and intents, i.e. the query’s objective, indicates different syntaxes for the intents (disclosed as natural vs. grammatically correct). Further, “a first value… arranged in a first position relative to a second values…based at least on…the positioning of at least one of the first… or second value” tracks to determining positioning based on the existing positioning, i.e. no operation to be performed. It is unclear to the examiner what this step represents]).
Regarding claim 6, Sarikaya in view of Olabiyi, further in view of Challa discloses: the one or more processors of claim 1.
Olabiyi further discloses:
wherein the neural network comprises at least one of (i) an autoregressive model ([0019] embodiments of the invention may employ autoregressive sampling, [A model performing autoregressive operations indicates that model to be autoregressive]) or (ii) a model having an encoder and a decoder ([0020] the generator including a hierarchical recurrent encoder-decoder network).
Claims 10-16 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Barkan et al. (US-20210182489-A1), hereinafter Barkan, in view of Olabiyi, further in view of Challa.
Regarding claim 10, Barkan discloses: A method comprising:
determining one or more responses to one or more queries based at least on an output of one or more neural networks ([0037] The query 202 in some examples is a request including a sentence which is to be compared to one or more candidate sentences [Comparison of a query to one or more candidate sentences indicates determining if those one or more candidates are appropriate responses], where, [0035] The DSE model 200 is a trained language model for performing sentence similarity comparisons, and, [0021] The pretrained teacher language model 102 can include, without limitation, a BERT language model, ELMo deep contextualized word representations model, XLNET, ALBERT or any other natural language processing (NLP) transformer-type of language model. The pretrained teacher language model 102 is utilized to train a distilled sentence embedding (DSE) student model 104 [Output from the DSE model tracks to output of one or more neural networks]), the output including a response ([0057] The plurality of candidate sentences 402 includes a set of one or more similar sentences 420 which may be output in response 418 to a query).
Barkan does not disclose:
wherein the one or more neural networks are trained by at least:
applying ground truth data to a set of input data to cause the one or more neural networks to generate a plurality of candidate outputs, the ground truth data representative of one or more variational responses having different syntactic structures for semantically equivalent content, the set of input data comprising one or more training queries and one or more training values corresponding to one or more training fields;
generating variational outputs from a same set of inputs using a function assigning higher multi-metric values to one or more candidate outputs of the plurality of candidate outputs based at least on comparisons between the one or more candidate outputs and the one or more variational responses of ground truth data,
a plurality of multi-metric values comprising:
one or more first metric values indicating accuracy of the one or more candidate outputs in representing the one or more training values and
one or more second metric values indicating variation in syntax between the one or more candidate outputs and the one or more variational responses.
Olabiyi discloses:
wherein the one or more neural networks are trained by at least:
applying ground truth data to a set of input data to cause the neural network to generate a plurality of candidate outputs ([Fig. 3C, applying Ground Truth Data 366 to Discriminator 362 resulting in Ranked Response List 368], [0050] Discriminator 362 can generate a probability that each candidate response corresponds to ground truth data 366. The probability generated by discriminator 362 can be used to rank the candidate responses to generate ranked response list 368, [Wherein the ground truth data is applied to output from the generator (input to the discriminator) indicating applying ground truth data to a set of input data, wherein a ranked list represents a plurality of candidate outputs]), the ground truth data representative of one or more variational responses having different syntactic structures for semantically equivalent content ([0053] The discriminator 362 can share context and word embedding with the generator 360 and can discriminate at the word level. Word-level discrimination can be achieved through a bidirectional RNN and is able to capture both syntactic and conceptual differences between the generator output and the ground truth, [Defining syntactic and conceptual differences to be distinct types indicates a situation in which there is no conceptual difference, i.e. semantically equivalent, though there are syntactic differences, i.e. different syntactic structures]), the set of input data comprising one or more training queries and one or more training values corresponding to one or more training fields ([0099] The training data can include a number of dialog sequences in a multi-turn dialog. Each dialog sequence can include one or more word tokens. The training data can also include a current prompt to which the machine classifier is being trained to generate a response. The current prompt can include one or more word tokens indicating a statement, question, or other dialog step, [0104] the algorithm 450 shown in FIG. 4B can be used to train a machine classifier having a generator G with parameters θ.sub.G and a discriminator D with parameters θ.sub.D, [Training using data comprised of words indicates each word/token to be a field of a sentence/larger text with values of the words themselves]);
generating variational outputs from a same set of inputs using a function assigning higher multi-metric values to one or more candidate outputs of the plurality of candidate outputs based at least on comparisons between the one or more candidate outputs and the one or more variational responses of ground truth data ([0115] The response samples L can be ranked using the discriminator score [0116] {D*(X.sub.i, Y.sub.i,l)]}.sub.l=1.sup.L, [0122] the selected response is the candidate response having the highest (or lowest) discriminator score. That is, the selected response is the candidate response that is indicated by the machine classifier to be closest to a ground truth response, [The discriminator score is reasonably considered to be a higher multi-metric as the score is dependent upon discriminator output D, Generator output Y, and input x (see Figs. 3A/B), wherein each component of the discriminator function can be represented as its own metric (as would be required for processing numerically represented computer encodings)]),
Barkan and Olabiyi are considered analogous art within speech response generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Barkan to incorporate the teachings of Olabiyi, because of the novel way to employ conditional GANs for multi-turn dialogue models with a HRED generator and discriminator, combining generative and retrieval-based multi-turn dialogue systems, improving their individual performances through sharing context and word embeddings between the generator and discriminator (Olabiyi, [0018]).
Barkan in view of Olabiyi does not disclose:
a plurality of multi-metric values comprising:
one or more first metric values indicating accuracy of the one or more candidate outputs in representing the one or more training values and
one or more second metric values indicating variation in syntax between the one or more candidate outputs and the one or more variational responses.
Challa discloses:
wherein the multi-metric values comprise
one or more first metric values indicating accuracy of the one or more candidate outputs in representing the one or more training values ([Col. 26, Lines 9-10] the assistant system 140 may use methods to implement a grammaticality “filter”, [Col. 26, Lines 15-20] assistant system 140 may then determine, by the one or more classification models, a threshold score for determining a quality-indication for a candidate response based on the plurality of confidence scores associated with the plurality of candidate responses, [Col. 27, Lines 29-36] the assistant system 140 may determine, by a natural-language understanding module 220, one or more slots and one or more intents associated with the user input. Correspondingly, the one or more classification rules may be based on the slots and intents. As an example and not by way of limitation, the classification rules for an intent of getting updates about a friend may be different from those for an intent of getting biographical introduction of a famous politician [Determining a classification based on slots, i.e. entity tags/values, indicates a required value calculation to perform the classification and filtering, wherein the filtering operation indicates a required metric determination for setting the filter]), and
one or more second metric values indicating variation in syntax between the one or more candidate outputs and the one or more variational responses ([Col. 23, Lines 1-5] a grammatical classification and semantic correctness classification dataset for the weather domain that consists of responses generated by 3 data-driven NLG systems, [Col. 27, Lines 10-20] classification rules may be based on one or more of metadata associated with the user input, a language-structure of the user input, user profile data associated with the user, or historical user input from the user and corresponding candidate responses presented to the user, [Col. 26, Lines 15-20] The assistant system 140 may then determine, by the one or more classification models, a threshold score for determining a quality-indication for a candidate response based on the plurality of confidence scores, [Col. 27, Lines 19-23] As another example and not by way of limitation, the filtering module may analyze the language-structure of the user input and select candidate responses with language-structures that match the language-structure of the user input [Determining a classification based on grammatical/semantic correctness and/or overall language structure, tracking to syntax, wherein the classification can be based on historical candidate responses, i.e. sample responses, indicating a required comparison of the old responses to the current response for the classification, wherein the filtering operation indicates a required metric determination for setting the filter. Further, the classifications of Challa could be used for determining syntactic and/or conceptual differences of Olabiyi without a change in functionality to Olabiyi. Performing a plurality of classifications on the same response indicates the collective quality indication to be based upon a multi-metric]).
Barkan, Olabiyi, and Challa are considered analogous art within automated speech response generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Barkan in view of Olabiyi to incorporate the teachings of Challa, because of the novel way to have an assistant system interact with different agents to obtain information or services to be used in generated responses, wherein those generated responses with ungrammatical results are filtered out, reducing processing time/power for response decisions (Challa, [Col. 2, Lines 1-67]).
Regarding claim 11, Barkan in view of Olabiyi, further in view of Challa discloses: the method of claim 10.
Barkan further discloses:
wherein the variational outputs include at least a first output having a first syntax and a second output having a second syntax that is a variant of the first syntax ([Fig. 7], [0066] the DSE model iteratively compares each candidate sentence representation to the representation of the query sentence until the query sentence has been compared to all the candidate sentences at 710. The DSE language model selects similar sentences based on the scores at 712. The system outputs sentences similar to the query sentence).
Challa further discloses:
wherein the variation in syntax corresponds to an arrangement of the one or more training values ([Table 3, “# grammatically/semantically correct”, “# ungrammatical/semantically incorrect”], [Col. 22, Lines 60-65] the assistant system 140 may use a generate, filter, and rank framework, in which candidate responses are first filtered to eliminate unacceptable responses, and then ranked to select the best response. In particular embodiments, acceptability may include grammatical correctness and semantic correctness, [Eliminating unacceptable responses, wherein this determination is made based upon grammar and semantics, i.e. either or both of which could represent syntax, indicating at least two responses with different syntax, i.e. correct and incorrect, based upon the position of values, i.e. entities/slots/labels. See Table 3]), a first value of the one or more training values arranged in a first position relative to a second value of the one or more training values based at least on
the variation in syntax ([A variation in syntax will inherently change the position of a first value relative to a second value. Similarly, changing the positions of values will inherently change the syntax of the sentence]), and
the positioning of at least one of the first value or the second value in the query ([Col. 27, Lines 30-40] the assistant system 140 may determine, by a natural-language understanding module 220, one or more slots and one or more intents associated with the user input. Correspondingly, the one or more classification rules may be based on the slots and intents. As an example and not by way of limitation, the classification rules for an intent of getting updates about a friend may be different from those for an intent of getting biographical introduction of a famous politician (e.g., favoring naturalness v.s. favoring grammaticality), [A syntactical classification based on slots and intents, i.e. the query’s objective, indicates different syntaxes for the intents (disclosed as natural vs. grammatically correct). Further, “a first value… arranged in a first position relative to a second values…based at least on…the positioning of at least one of the first… or second value” tracks to determining positioning based on the existing positioning, i.e. no operation to be performed. It is unclear to the examiner what this step represents]).
Regarding claim 12, Barkan in view of Olabiyi, further in view of Challa discloses: the method of claim 10.
Challa further discloses:
obtaining the one or more training values using an application programming interface (API) corresponding to a domain associated with at least one query of the one or more queries ([Col. 8, Lines 55-63] the social-networking system 160 may enable users to interact with each other as well as receive content from third-party systems 170 or other entities, or to allow users to interact with these entities through an application programming interfaces (API) or other communication channels, [Col. 12, Lines 5-13] The NLU module 220 may classify the text/speech input into a member of the pre-defined taxonomy, e.g., for the input “Play Beethoven's 5th,” the NLU module 220 may classify the input as having the intent [IN:play_music]. In particular embodiments, a domain may be conceptually a namespace for a set of intents, e.g., music [Gathering information through API calls, wherein the requests are clearly in specific domains associated with queries, i.e. “Play Music?”, indicating an obtaining of values, i.e. the music, based on an API call]).
Regarding claim 13, Barkan discloses: One or more processors ([0070] a processor), comprising:
circuitry ([Inherent to the infrastructure of a physical processor]) to:
apply, using a neural network and based at least on processing a training data instance including one or more training queries or one or more training values corresponding to a plurality of training fields ([0020] distilled sentence embedding (DSE) model for computing sentence similarity comparisons based on sentence representations, [0074] training a neural network for comparing the similarities between a primary sentence and several candidate sentences [Determining similarities between sentences indicates a required processing of those sentences, i.e. training data, wherein that processing will be comparing feature vectors, indicating each feature can represent a value corresponding to a field for a domain associated with the query, i.e. a query will inherently have a domain]), a plurality of candidate outputs ([0070] train a distilled sentence embedding (DSE) language model by decoupling a transformer language model using knowledge distillation to calculate sentence embeddings for a plurality of candidate sentences… generate a set of similarity scores for each candidate sentence in the plurality of candidate sentences and a selected sentence associated with an input query [Determining similarity scores between candidate and selected sentences indicates the generation/determination of a plurality of estimated, i.e. candidate, responses/outputs]).
Barkan does not disclose:
the training data instance representative of one or more variational responses to a set of input data, the set of input data comprising the one or more training queries and the one or more training values corresponding to the plurality of training fields;
compare, for assigning higher multi-metric values to one or more candidate outputs of the plurality of candidate outputs, the one or more candidate outputs of the plurality of candidate outputs, the one or more candidate outputs to a plurality of variational responses,
a plurality of multi-metric values comprising:
one or more first metric values indicating accuracy of the one or more candidate outputs in representing the plurality of training fields, and
one or more second metric values indicating variation in syntax between the one or more candidate outputs and the plurality of variational responses,
calculate loss values by applying one or more loss functions that assign higher loss values to candidate outputs having lower accuracy or lower syntactic variation; and,
update, based at least on the loss values, one or more parameters of the neural network.
Olabiyi discloses:
the training data instance representative of one or more variational responses to a set of input data ([0053] The discriminator 362 can share context and word embedding with the generator 360 and can discriminate at the word level. Word-level discrimination can be achieved through a bidirectional RNN and is able to capture both syntactic and conceptual differences between the generator output and the ground truth, [Defining syntactic and conceptual differences within generated responses indicates the responses to be a variational set of input data with respect to the discriminator ranking the received responses]), the set of input data comprising the one or more training queries and the one or more training values corresponding to the plurality of training fields ([0099] The training data can include a number of dialog sequences in a multi-turn dialog. Each dialog sequence can include one or more word tokens. The training data can also include a current prompt to which the machine classifier is being trained to generate a response. The current prompt can include one or more word tokens indicating a statement, question, or other dialog step, [0104] the algorithm 450 shown in FIG. 4B can be used to train a machine classifier having a generator G with parameters θ.sub.G and a discriminator D with parameters θ.sub.D, [Training using data comprised of words indicates each word/token to be a field of a sentence/larger text with values of the words themselves]);
compare, for assigning higher multi-metric values to one or more candidate outputs of the plurality of candidate outputs, the one or more candidate outputs of the plurality of candidate outputs, the one or more candidate outputs to a plurality of variational responses ([0115] The response samples L can be ranked using the discriminator score [0116] {D*(X.sub.i, Y.sub.i,l)}.sub.l=1.sup.L, [0122] the selected response is the candidate response having the highest (or lowest) discriminator score. That is, the selected response is the candidate response that is indicated by the machine classifier to be closest to a ground truth response, [The discriminator score is reasonably considered to be a higher multi-metric as the score is dependent upon discriminator output D, Generator output Y, and input x (see Figs. 3A/B), wherein each component of the discriminator function can be represented as its own metric (as would be required for processing numerically represented computer encodings)]),
calculate loss values by applying one or more loss functions that assign higher loss values to candidate outputs having lower accuracy or lower syntactic variation ([0031] A machine classifier being trained in an asymmetric mode may minimize the discriminator loss of the generator G based on the autoregressive outputs of the generator, [Applying a loss gathered through a discriminator (whose job is to compare input to ground truth as previously disclosed), wherein the determination is made on a word/token level, indicates the loss to be based on a comparison of syntax and/or accuracy (see previously cited [0053] which discloses the generating a probability of correspondence between input and ground truth). Further, minimizing loss indicates reducing overall discrepancy between ground truth and input, indicating a comparison of two pieces of text based on syntactical differences will have a higher loss value with lower accuracy/variation to be minimized]); and,
update, based at least on the loss values, one or more parameters of the neural network ([0020] The machine classifiers may be trained in both auto-regressive and traditional teacher-forcing modes, with the generator including a hierarchical recurrent encoder-decoder network and the discriminator including a bi-directional recurrent neural network, [0103] The model parameters can be updated based on the context, the current prompt, the generated response, the ground truth response, and the discriminator accuracy. By updating the model parameters, the generator can be trained to generate responses that are more accurate. In several embodiments, the model parameters are updated using an autoregression weighted by the discriminator accuracy, [Wherein the discriminator accuracy is directly used for loss, indicating an updating of the model based on loss values as would be required to produce more accurate results. Further, wherein the context of [0103] is with regard to Fig. 4A, defined to be a flow chart of a process for training a machine classifier which contains a neural network]).
Barkan and Olabiyi are considered analogous art within speech response generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Barkan to incorporate the teachings of Olabiyi, because of the novel way to employ conditional GANs for multi-turn dialogue models with a HRED generator and discriminator, combining generative and retrieval-based multi-turn dialogue systems, improving their individual performances through sharing context and word embeddings between the generator and discriminator (Olabiyi, [0018]).
Barkan in view of Olabiyi does not disclose:
a plurality of multi-metric values comprising:
one or more first metric values indicating accuracy of the one or more candidate outputs in representing the plurality of training fields, and
one or more second metric values indicating variation in syntax between the one or more candidate outputs and the plurality of variational responses.
Challa discloses:
a plurality of multi-metric values comprising:
one or more first metric values indicating accuracy of the one or more candidate outputs in representing the plurality of training fields ([Col. 26, Lines 9-10] the assistant system 140 may use methods to implement a grammaticality “filter”, [Col. 26, Lines 15-20] assistant system 140 may then determine, by the one or more classification models, a threshold score for determining a quality-indication for a candidate response based on the plurality of confidence scores associated with the plurality of candidate responses, [Col. 27, Lines 29-36] the assistant system 140 may determine, by a natural-language understanding module 220, one or more slots and one or more intents associated with the user input. Correspondingly, the one or more classification rules may be based on the slots and intents. As an example and not by way of limitation, the classification rules for an intent of getting updates about a friend may be different from those for an intent of getting biographical introduction of a famous politician [Determining a classification based on slots, i.e. entity tags/values, indicates a required value calculation to perform the classification and filtering, wherein the filtering operation indicates a required metric determination for setting the filter]), and
one or more second metric values indicating variation in syntax between the one or more candidate outputs and the plurality of variational responses ([Col. 23, Lines 1-5] a grammatical classification and semantic correctness classification dataset for the weather domain that consists of responses generated by 3 data-driven NLG systems, [Col. 27, Lines 10-20] classification rules may be based on one or more of metadata associated with the user input, a language-structure of the user input, user profile data associated with the user, or historical user input from the user and corresponding candidate responses presented to the user, [Col. 26, Lines 15-20] The assistant system 140 may then determine, by the one or more classification models, a threshold score for determining a quality-indication for a candidate response based on the plurality of confidence scores [Col. 27, Lines 19-23] As another example and not by way of limitation, the filtering module may analyze the language-structure of the user input and select candidate responses with language-structures that match the language-structure of the user input [Determining a classification based on grammatical/semantic correctness and/or overall language structure, tracking to syntax, wherein the classification can be based on historical candidate responses, i.e. sample responses, indicating a required comparison of the old responses to the current response for the classification, wherein the filtering operation indicates a required metric determination for setting the filter. Further, the classifications of Challa could be used for determining syntactic and/or conceptual differences of Olabiyi without a change in functionality to Olabiyi. Performing a plurality of classification on the same response indicates the collective quality indication to be based upon a multi-metric]).
Barkan, Olabiyi, and Challa are considered analogous art within automated speech response generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Barkan in view of Olabiyi to incorporate the teachings of Challa, because of the novel way to have an assistant system interact with different agents to obtain information or services to be used in generated responses, wherein those generated responses with ungrammatical results are filtered out, reducing processing time/power for response decisions (Challa, [Col. 2, Lines 1-67]).
Regarding claim 14, Barkan in view of Olabiyi, further in view of Challa discloses: the one or more processors of claim 13.
Barkan further discloses:
wherein the plurality of candidate outputs include at least a first candidate output having a first syntax and a second candidate output having a second syntax that is a variant of the first syntax ([Fig. 7], [0066] the DSE model iteratively compares each candidate sentence representation to the representation of the query sentence until the query sentence has been compared to all the candidate sentences at 710. The DSE language model selects similar sentences based on the scores at 712. The system outputs sentences similar to the query sentence).
Regarding claim 15, Barkan in view of Olabiyi, further in view of Challa discloses: the one or more processors of claim 14.
Barkan further discloses:
wherein the syntax of the first candidate output represents at least one of a length of the first candidate output or an arrangement of one or more values of the one or more training values corresponding to the plurality of training fields in the first candidate output ([Fig. 4, “Set of Similar Sentences 420”], [Inclusion of sets of similar sentences indicates that the differences are in length, arrangement of values, or other syntactical differences if the sentence is defined as similar to one which can accurately respond to the query]).
Regarding claim 16, Barkan in view of Olabiyi, further in view of Challa discloses: the one or more processors of claim 13.
Barkan further discloses:
wherein the comparing includes evaluating a condition indicative of one or more differences between the plurality of candidate outputs and the plurality of variational responses ([0080] a student language model, wherein the student language model is trained by comparing test similarity scores ranking a test sentence paired with each candidate sentence in the plurality of candidate sentences with training similarity scores generated by a trained teacher model [Comparison of similarity scores between those generated by a teacher model, i.e. sample, and a student model, i.e. estimated, wherein the comparisons are based on differences in how similar they are to a query, i.e. how well they answer]).
Regarding claim 19, Barkan in view of Olabiyi, further in view of Challa discloses: the one or more processors of claim 13.
Olabiyi further discloses:
wherein the neural network comprises at least one of (i) an autoregressive model ([0019] embodiments of the invention may employ autoregressive sampling, [A model performing autoregressive operations indicates that model to be autoregressive]) or (ii) a model having an encoder and a decoder ([0020] the generator including a hierarchical recurrent encoder-decoder network).
Claim(s) 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sarikaya in view of Olabiyi, further in view of Challa, further in view of Mars (US-20220318518-A1).
Regarding claim 5, Sarikaya in view of Olabiyi, further in view of Challa discloses: the one or more processors of claim 1.
Sarikaya in view of Olabiyi, further in view of Challa does not disclose:
Wherein:
a first query and the one or more fields corresponding to the query are a plurality of first fields; and,
a second query linked to the first query, and one or more values corresponding to one or more second fields corresponding to the second query.
Mars discloses:
a first query and the plurality of fields corresponding to the query are a plurality of first fields ([0072] In such example, a first response of the response corpus (e.g., The delivery fee is $3) may be converted to a first embedded response representation [Query tracks to a user asking about a delivery fee, fields include price, implicit reference to a specific restaurant]); and,
a second query linked to the first query, and one or more values corresponding to one or more second fields corresponding to the second query ([0072] a second response of the response corpus (e.g., We have a wide selection of vegetarian pizzas) may be converted to a second embedded response representation distinct from the first embedded response representation [A second query, i.e. “What kind of pizza do you have?”, with fields of pizza type, based on restaurant, i.e. location, which is linked to the first query]).
Sarikaya, Olabiyi, Challa, and Mars are considered analogous art within conversational generation systems. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Sarikaya in view of Olabiyi, further in view of Challa to incorporate the teachings of Mars, because of the novel way to generate and implement a response corpus to a query which requires limited-to-no training experience resulting in reduced training time, samples, and development time (Mars, [0005]).
Claim(s) 7, 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sarikaya in view of Olabiyi, further in view of Challa, further in view of Song et al. (US-20230050655-A1), hereinafter Song.
Regarding claim 7, Sarikaya in view of Olabiyi, further in view of Challa discloses: the one or more processors of claim 1.
Sarikaya in view of Olabiyi, further in view of Challa does not disclose:
Wherein the neural network includes a large language model (LLM).
Song discloses:
Wherein the neural network includes a large language model (LLM) ([0056] As described herein, an end-to-end neural system may incorporate a large-scale pre-trained language model to build a chit-chat dialog system).
Sarikaya, Olabiyi, Challa, and Song are considered analogous art within dialog agent modelling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Sarikaya in view of Olabiyi, further in view of Challa to incorporate the teachings of Song, because of the novel way to use a federated machine learning model to train language models with data from multiple owners without revealing the raw data associated with the owners to preserve privacy of information gathered (Song, [0005]).
Regarding claim 8, Sarikaya in view of Olabiyi, further in view of Challa discloses: the one or more processors of claim 1.
Sarikaya in view of Olabiyi, further in view of Challa does not disclose:
wherein the neural network is pre-trained on a plurality of domains prior to being re-trained for a particular domain included in the plurality of domains or separate from the plurality of domains.
Song discloses:
wherein the neural network is pre-trained on a plurality of domains prior to being re-trained for a particular domain included in the plurality of domains or separate from the plurality of domains ([0022] In an embodiment, the processor may be further configured to train the second initial learning model based on the reply information to result in an updated second learning model. The sensitive information input to the initial learning model may not be used in the training of the learning model that results in the updated second learning model [Updated training on the second model according to the reply information, corresponding to context information, indicates updating an already trained model, i.e. the second initial learning model, based on a specific domain, i.e. context]).
Sarikaya, Olabiyi, Challa, and Song are considered analogous art within dialog agent modelling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Sarikaya in view of Olabiyi, further in view of Challa to incorporate the teachings of Song, because of the novel way to use a federated machine learning model to train language models with data from multiple owners without revealing the raw data associated with the owners to preserve privacy of information gathered (Song, [0005]).
Claim(s) 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sarikaya in view of Olabiyi, further in view of Challa, further in view of Mars, further in view of Song.
Regarding claim 9, Sarikaya in view of Olabiyi, further in view of Challa discloses: the one or more processors of claim 1.
Sarikaya further discloses:
Wherein the one or more processors are comprised in at least one of:
A system for performing conversational AI operations ([0016] Automatic speech recognition (ASR) is a field of computer science, artificial intelligence);
A system incorporating one or more virtual machines (VMs) ([0166] A server may also include one or more virtual machines);
A system of an autonomous ([0175] a smart phone 110b) or semi-autonomous machine ([0175] a smart television 110g);
An in-vehicle infotainment system of an autonomous or semi-autonomous machine ([0175] a vehicle 110e).
Sarikaya in view of Olabiyi, further in view of Challa does not disclose:
A system for performing deep learning operations;
A system implemented using a robot;
A system implemented at least partially using cloud computing resources;
A system for generating or presenting one or more of virtual reality content, augmented reality content, or mixed reality content;
A system for performing simulation operations (Not covered, not required due to disjunctive nature of the claim);
A system for performing digital twin operations (Not covered, not required due to disjunctive nature of the claim);
A system for performing light transport simulation (Not covered, not required due to disjunctive nature of the claim);
A system for performing collaborative content creation for 3D assets (Not covered, not required due to disjunctive nature of the claim);
A system implemented using an edge device (Not covered, not required due to disjunctive nature of the claim);
Mars discloses:
A system for performing deep learning operations ([0044] a deep learning algorithm);
A system implemented using a robot ([0048] virtual assistant 160);
A system implemented at least partially using cloud computing resources ([0099] one or more remote or cloud-based systems);
Sarikaya, Olabiyi, Challa, and Mars are considered analogous art within conversational generation systems. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Sarikaya in view of Olabiyi, further in view of Challa to incorporate the teachings of Mars, because of the novel way to generate and implement a response corpus to a query which requires limited-to-no training experience resulting in reduced training time, samples, and development time (Mars, [0005]).
Sarikaya in view of Olabiyi, further in view of Challa, further in view of Mars does not disclose:
A system for generating or presenting one or more of virtual reality content, augmented reality content, or mixed reality content;
Song discloses:
A system for generating or presenting one or more of virtual reality content, augmented reality content, or mixed reality content ([0097] virtual reality controller and/or virtual reality headset);
Sarikaya, Olabiyi, Challa, Mars, and Song are considered analogous art within dialog agent modelling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Sarikaya in view of Olabiyi, further in view of Challa, further in view of Mars to incorporate the teachings of Song, because of the novel way to use a federated machine learning model to train language models with data from multiple owners without revealing the raw data associated with the owners to preserve privacy of information gathered (Song, [0005]).
Claim(s) 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Barkan in view of Olabiyi, further in view of Challa, further in view of Mars.
Regarding claim 18, Barkan in view of Olabiyi, further in view of Challa discloses: the one or more processors of claim 13.
Barkan in view of Olabiyi, further in view of Challa does not disclose:
a first training query, the plurality of training fields corresponding to the one or more training queries are a plurality of first training fields, and the plurality of variational responses corresponding to the one or more training queries are a plurality of first variational responses;
the training data instance includes a second training query linked to the first training query, second training values corresponding to a plurality of second training fields corresponding to the second training query, and a plurality of second variational responses corresponding to the second training query; and,
the circuitry is to further update the one or more parameters of the neural network based at least on the plurality of second variational responses, the second training values, and a third query comprising the first training query and the second training query.
Mars discloses:
a first training query, the plurality of training fields corresponding to the one or more training queries are a plurality of first training fields, and the plurality of variational responses corresponding to the one or more training queries are a plurality of first variational responses ([0066] For instance, a single response corpus may include a plurality of responses associated with only a single response classification category (e.g., bookings) [A response corpus indicates those responses are to queries, specifically fields associated with booking, i.e. price, location, length of stay, wherein the response corpus represents sample responses]);
the training data instance includes a second training query linked to the first training query, second training values corresponding to a plurality of second training fields corresponding to the second training query, and a plurality of second variational responses corresponding to the second training query ([0066] a second response corpus may include a plurality of response associated with another response classification category (e.g., cancellations) [A response corpus indicates those responses are to queries, specifically fields associated with cancellation, i.e. price, location, time until stay, wherein the response corpus represents sample responses and the tasks of booking and cancellation are clearly linked]); and,
the circuitry is to further update the one or more parameters of the neural network based at least on the plurality of second variational responses, the second training values, and a third query comprising the first training query and the second training query ([0063] Additionally, or alternatively, in a third implementation, a first subset of responses of the response corpus may include only anchor responses and a second subset of responses (of the response corpus) may include anchor responses and associated system-displayed responses related to the anchor responses [A second subset of responses tracks to second sample responses with associated second values] wherein, [0116] In a first implementation, if or when it may be determined that a pairwise of a target query/target response embeddings are misaligned, S340 may function to nudge or move the embedding values of the pairwise by bending or collapsing the embedding values of the pairwise closer together [i.e. updating embedding alignment parameters within a neural network], and Mars defines [0072] a third response of the response corpus (e.g., Your order will arrive in 30 minutes) [A third response to a third query which clearly is dependent on the previous two queries in the pizza ordering example provided in [0072]]).
Barkan, Olabiyi, Challa, and Mars are considered analogous art within query response. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Barkan in view of Olabiyi, further in view of Challa to incorporate the teachings of Mars, because of the novel way to generate and implement a response corpus to a query which requires limited-to-no training experience resulting in reduced training time, samples, and development time (Mars, [0005]).
Claim(s) 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Barkan in view of Olabiyi, further in view of Challa, further in view of Sarikaya, further in view of Mars, further in view of Song.
Regarding claim 20, Barkan in view of Olabiyi, further in view of Challa discloses: the one or more processors of claim 13.
Barkan further discloses:
Wherein the one or more processors are comprised in at least one of:
A system implemented at least partially in a data center ([0023] one or more data centers);
Barkan in view of Olabiyi, further in view of Challa does not disclose:
A system for performing conversational AI operations;
A system incorporating one or more virtual machines (VMs);
A system of an autonomous or semi-autonomous machine;
An in-vehicle infotainment system of an autonomous or semi-autonomous machine;
A system for performing deep learning operations;
A system implemented using a robot;
A system implemented at least partially using cloud computing resources;
A system for generating or presenting one or more of virtual reality content, augmented reality content, or mixed reality content;
A system for performing simulation operations (Not covered; not required due to the disjunctive nature of the claim);
A system for performing digital twin operations (Not covered; not required due to the disjunctive nature of the claim);
A system for performing light transport simulation (Not covered; not required due to the disjunctive nature of the claim);
A system for performing collaborative content creation for 3D assets (Not covered; not required due to the disjunctive nature of the claim);
A system implemented using an edge device (Not covered; not required due to the disjunctive nature of the claim);
Sarikaya further discloses:
Wherein the one or more processors are comprised in at least one of:
A system for performing conversational AI operations ([0016] Automatic speech recognition (ASR) is a field of computer science, artificial intelligence);
A system incorporating one or more virtual machines (VMs) ([0166] A server may also include one or more virtual machines);
A system of an autonomous ([0175] a smart phone 110b) or semi-autonomous machine ([0175] a smart television 110g);
An in-vehicle infotainment system of an autonomous or semi-autonomous machine ([0175] a vehicle 110e).
Barkan, Olabiyi, Challa, and Sarikaya are considered analogous art within query response generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Barkan in view of Olabiyi, further in view of Challa to incorporate the teachings of Sarikaya, because of the novel way to apply spoken language understanding to many different domains for more complete, accurate responses (Sarikaya, [0002]).
Barkan in view of Olabiyi, further in view of Challa, further in view of Sarikaya does not disclose:
A system for performing deep learning operations;
A system implemented using a robot;
A system implemented at least partially using cloud computing resources;
A system for generating or presenting one or more of virtual reality content, augmented reality content, or mixed reality content;
Mars discloses:
A system for performing deep learning operations ([0044] a deep learning algorithm);
A system implemented using a robot ([0048] virtual assistant 160);
A system implemented at least partially using cloud computing resources ([0099] one or more remote or cloud-based systems);
Barkan, Olabiyi, Challa, Sarikaya, and Mars are considered analogous art within conversational generation systems. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Barkan in view of Olabiyi, further in view of Challa, further in view of Sarikaya to incorporate the teachings of Mars, because of the novel way to generate and implement a response corpus to a query that requires limited-to-no training experience, resulting in reduced training time, samples, and development time (Mars, [0005]).
Barkan in view of Olabiyi, further in view of Challa, further in view of Sarikaya, further in view of Mars does not disclose:
A system for generating or presenting one or more of virtual reality content, augmented reality content, or mixed reality content;
Song discloses:
A system for generating or presenting one or more of virtual reality content, augmented reality content, or mixed reality content ([0097] virtual reality controller and/or virtual reality headset);
Barkan, Olabiyi, Challa, Sarikaya, Mars, and Song are considered analogous art within dialog agent modeling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Barkan in view of Olabiyi, further in view of Challa, further in view of Sarikaya, further in view of Mars to incorporate the teachings of Song, because of the novel way to use a federated machine learning model to train language models with data from multiple owners without revealing the owners' raw data, thereby preserving the privacy of the gathered information (Song, [0005]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Laban et al. (US-20230419048-A1) discloses “Embodiments described herein provide a method and system for generating a reading interface for a user. The method includes receiving a first text passage from a first data source and a second text passage from a second data source. The method also includes generating a candidate question relating to contents of the first and the second text passages. The method further includes generating a first answer to the candidate question and a second answer to the candidate question. The method further includes determining that the candidate question qualifies as a discord question when the first answer and the second answer are both available and exhibit semantic diversity” (abstract). See entire document.
Alomari et al. (US-20240021193-A1) discloses “A method of training a neural network to generate conversational replies, the method including: providing a first dataset of stored phrases linked to form a plurality of conversational sequences; training the neural network to generate responses to input phrases using the first dataset; and using the trained neural network to generate a list of conversational replies in response to conversational inputs” (abstract). See entire document.
Ma et al. (US-20200226475-A1) discloses “Methods and systems are provided for a natural language processing system comprising a chatbot adapted for dialog generation. In one example, the system may include a combination of a variational autoencoder (VAE) and a generative adversarial network (GAN) for generating natural responses to input queries. The VAE may convert queries into vector embeddings that may then be used by the GAN to continuously update and improve responses provided by the chatbot” (abstract). See entire document.
Irving et al. (US-20240104336-A1) discloses “Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enabling a user to conduct a dialogue. Implementations of the system learn when to rely on supporting evidence, obtained from an external search system via a search system interface, and are also able to generate replies for the user that align with the preferences of a previously trained response selection neural network. Implementations of the system can also use a previously trained rule violation detection neural network to generate replies that take account of previously learnt rules” (abstract). See entire document.
Khot et al. (“GooAQ: Open Question Answering with Diverse Answer Types”) discloses “We use this dataset to study inherent differences between models producing different answer types, and observe interesting trends. For example, in line with recent work, LM’s strong performance on GOOAQ’s short-answer questions heavily benefits from annotated data. However, their surprisingly high quality in generating coherent and accurate answers for questions requiring long responses (such as ‘how’ and ‘why’ questions) is less reliant on observing annotated data and mainly supported by their pre-training. Moreover, we show that GOOAQ is a valuable training resource, resulting in strong performance on the recent ELI5 long-answers dataset” (abstract). See entire document.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THEODORE JOHN WITHEY whose telephone number is (703)756-1754. The examiner can normally be reached Monday - Friday, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached on (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/THEODORE WITHEY/Examiner, Art Unit 2655
/ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655