Prosecution Insights
Last updated: April 19, 2026
Application No. 18/746,579

CONTROLLING DIALOGUE USING CONTEXTUAL INFORMATION FOR STREAMING SYSTEMS AND APPLICATIONS

Non-Final OA: §101, §102, §103
Filed: Jun 18, 2024
Examiner: SULTANA, NADIRA
Art Unit: 2653
Tech Center: 2600 (Communications)
Assignee: Nvidia Corporation
OA Round: 1 (Non-Final)

Grant Probability: 74% (Favorable)
OA Rounds: 1-2
To Grant: 3y 0m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 74% (72 granted / 97 resolved); +12.2% vs TC avg (above average)
Interview Lift: strong, +31.1% among resolved cases with an interview vs. without
Typical Timeline: 3y 0m average prosecution; 29 applications currently pending
Career History: 126 total applications across all art units

Statute-Specific Performance

§101: 25.4% (-14.6% vs TC avg)
§102: 12.0% (-28.0% vs TC avg)
§103: 54.8% (+14.8% vs TC avg)
§112: 3.6% (-36.4% vs TC avg)

Tech Center averages are estimates. Figures are based on career data from 97 resolved cases.
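
The headline figures above follow directly from the underlying counts. Below is a minimal sketch of how metrics like the allowance rate and interview lift are typically derived; the 72/97 totals come from this report, while the with/without-interview split is a hypothetical placeholder used only to illustrate the formula.

```python
# Minimal sketch of the examiner metrics above. The 72 granted / 97 resolved
# totals are from this report; the with/without-interview split is hypothetical
# and only illustrates how an "interview lift" figure is computed.

def allowance_rate(granted: int, resolved: int) -> float:
    """Allowance rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved

career_rate = allowance_rate(72, 97)   # ~74.2%, displayed as 74%
print(f"Career allow rate: {career_rate:.1f}%")

# Hypothetical split that is consistent with the 72/97 totals.
with_interview = {"granted": 27, "resolved": 28}
without_interview = {"granted": 45, "resolved": 69}

lift = allowance_rate(**with_interview) - allowance_rate(**without_interview)
print(f"Interview lift: {lift:+.1f} percentage points")
```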

Office Action

Rejections under 35 U.S.C. §101, §102, and §103
DETAILED ACTION

Notice of AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Independent claim 1 recites “generating, based at least on information associated with an interactive application, one or more embeddings associated with the information”; “determining, based at least on a textual input, at least a portion of the one or more embeddings”; “determining, based at least on one or more language models processing input data associated with the textual input and the at least the portion of the one or more embedding, a textual output for the textual input”; “and causing a character of the interactive application to output speech associated with the textual output”.

The limitations above, as drafted, recite a process that, under its broadest reasonable interpretation, covers a mental process, because it could be performed in the human mind or with the aid of pen and paper. The "generating", “determining”, and “causing” limitations, as drafted, cover mental activities. More specifically, two people playing an interactive game could write out rules and procedures of the game in textual format, draw a map and a character, and save those on a piece of paper as embeddings. They could generate a textual output based on the textual input (the rules and procedures of the game) and could create output speech for a character of the game. The above steps, as drafted, recite a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind but for the recitation of generic computer components. The claim does not recite any element that precludes the steps from practically being performed in the human mind, and the mere nominal recitation of a generic computer appliance does not take the claim limitations out of the mental processes grouping. Thus, the claim recites a mental process.

The claim recites the additional limitation of a “language model” for performing the method, which is recited at a high level of generality and as performing generic computer functions routinely used in computer applications. The specification at paragraph [0060] describes the language model as a neural network or any other type of language model; this generic recitation, which can be any neural network or any language model, is not sufficient to amount to significantly more than the judicial exception. It is no more than an instruction to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. Taken alone, the additional elements do not amount to significantly more than the identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when the elements are considered individually.
There is no indication that the combination of elements improves the functioning of a computer or any other technology; their collective functions merely provide a conventional computer implementation. Claim 1 is therefore not drawn to eligible subject matter because it is directed to an abstract idea without significantly more.

Independent claim 10 recites “determine, based at least on a textual input associated with an application, one or more first sources of information from one or more second sources of information associated with the application”; “generate input data based at least on the textual input and the one or more first sources of information”; “determine, based at least on one or more language models processing the input data, a textual output for the textual input”; “and cause a character of the application to output speech associated with the textual output”.

The limitations above, as drafted, recite a process that, under its broadest reasonable interpretation, covers a mental process, because it could be performed in the human mind or with the aid of pen and paper. The "generate", “determine”, and “cause” limitations, as drafted, cover mental activities. More specifically, a person using an application (which can be a gaming application) could determine the sources of information based on an input, generate the input data based on a textual input and a source of information, generate a textual output based on the textual input (for example, a query or the rules and procedures of the game), and create output speech for a character of the game. The above steps, as drafted, recite a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind but for the recitation of generic computer components. That is, other than reciting a “processor”, nothing in the claim precludes the steps from practically being performed in the human mind, and the mere nominal recitation of a generic computer appliance does not take the claim limitations out of the mental processes grouping. Thus, the claim recites a mental process.

The claim recites the additional limitation of a “language model” for performing the method, which is recited at a high level of generality and as performing generic computer functions routinely used in computer applications. The specification at paragraph [0060] describes the language model as a neural network or any other type of language model; this generic recitation, which can be any neural network or any language model, is not sufficient to amount to significantly more than the judicial exception. It is no more than an instruction to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. Taken alone, the additional elements do not amount to significantly more than the identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when the elements are considered individually. There is no indication that the combination of elements improves the functioning of a computer or any other technology; their collective functions merely provide a conventional computer implementation.
Claim 10 is therefore not drawn to eligible subject matter because it is directed to an abstract idea without significantly more.

Independent claim 18 recites “generate a response to a query based at least on one or more language models processing a prompt that is associated with one or more first embeddings and to cause the response to be output perceptually within an interactive application, wherein the one or more first embeddings are identified from one or more second embeddings stored in one or more databases”, “and wherein the one or more second embeddings are associated with one or more sources that include contextual information associated with the interactive application”.

The limitations above, as drafted, recite a process that, under its broadest reasonable interpretation, covers a mental process, because it could be performed in the human mind or with the aid of pen and paper. Two people playing an interactive game could write out rules and procedures of the game in textual format, draw a map and a character, and save those on a piece of paper as embeddings. They could generate a response to a query based on the embeddings, and could generate a textual output based on the textual input (the rules and procedures of the game), where the second embeddings or information may come from a source containing contextual information. The above steps, as drafted, recite a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind but for the recitation of generic computer components. That is, other than reciting a “processor” and “processing circuitry”, nothing in the claim precludes the steps from practically being performed in the human mind, and the mere nominal recitation of a generic computer appliance does not take the claim limitations out of the mental processes grouping. Thus, the claim recites a mental process.

The claim recites the additional limitation of a “language model” for performing the method, which is recited at a high level of generality and as performing generic computer functions routinely used in computer applications. The specification at paragraph [0060] describes the language model as a neural network or any other type of language model; this generic recitation, which can be any neural network or any language model, is not sufficient to amount to significantly more than the judicial exception. It is no more than an instruction to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. Taken alone, the additional elements do not amount to significantly more than the identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when the elements are considered individually. There is no indication that the combination of elements improves the functioning of a computer or any other technology; their collective functions merely provide a conventional computer implementation. Claim 18 is therefore not drawn to eligible subject matter because it is directed to an abstract idea without significantly more.
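
For orientation, the independent claims describe a retrieval-augmented dialogue flow: embed contextual information for the application, select the embeddings most relevant to a textual input, have a language model produce a textual output from a prompt built on that context, and have a game character speak the result. The sketch below is illustrative only; it is not the applicant's implementation, and the embed_text, generate_text, and synthesize_speech helpers are hypothetical stand-ins for whatever embedding model, language model, and text-to-speech engine a real system would use.

```python
# Illustrative sketch of the retrieval-augmented dialogue flow recited in the
# independent claims (not the applicant's code). The three helpers below are
# hypothetical placeholders for real embedding, language-model, and TTS services.
import zlib
import numpy as np

def embed_text(text: str) -> np.ndarray:
    # Placeholder embedding: deterministic pseudo-random vector per string.
    rng = np.random.default_rng(zlib.crc32(text.encode("utf-8")))
    return rng.standard_normal(16)

def generate_text(prompt: str) -> str:
    # Placeholder language model: a real system would call an LLM here.
    return "Welcome, traveler. The gate opens once the bell quest is complete."

def synthesize_speech(text: str) -> bytes:
    # Placeholder TTS: a real system would return synthesized audio samples.
    return text.encode("utf-8")

def build_context_index(sources: list[str]) -> list[tuple[str, np.ndarray]]:
    """Embed each source of application information (lore, rules, scene state)."""
    return [(src, embed_text(src)) for src in sources]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer_in_character(query: str, index, character: str, top_k: int = 2) -> bytes:
    """Select relevant context, prompt the model, and return speech for the character."""
    q = embed_text(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    context = "\n".join(src for src, _ in ranked[:top_k])
    prompt = f"You are {character}.\nContext:\n{context}\nPlayer: {query}\n{character}:"
    return synthesize_speech(generate_text(prompt))

index = build_context_index([
    "The castle gate opens only after the bell quest is complete.",
    "Merchant NPCs gather in the market square at noon.",
    "The player currently carries the bronze bell.",
])
audio = answer_in_character("How do I open the castle gate?", index, "Gatekeeper")
```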
Claim 2 recites the additional limitation of “further comprising: determining an identifier associated with the character, wherein the determining the at least the portion of the one or more embeddings is further based at least on the identifier” , where determining that an identifier of an character is associated with the embeddings or the information, could be an evaluation, observation and could be performed in the human mind or with the aid of pen and paper. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claim 2 does not recite any additional limitations. The claim as drafted, is not patent eligible. Claim 3 recites “further comprising: receiving second input data representative of one or more inputs”; “and generating, based at least on the second input data, image data representative of one or more images associated with a state of the interactive application, wherein the determining the at least the portion of the one or more embeddings is further based at least on the image data”, to find out that the second input data is related to an image and the embeddings are associated with the image data is an evaluation, observation and could be performed in the human mind or with the aid of pen and paper. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claim 3 does not recite any additional limitations. The claim as drafted, is not patent eligible. Claim 4 recites “wherein the information includes one or more of: first information indicating one or more settings associated with the interactive application”; “second information indicating one or more locations associated with the interactive application”; “third information indicating one or more tasks associated with the interactive application”; “fourth information associated with the character”; “fifth information associated a user of the interactive application”; “sixth information indicating one or more actions that occurred with respect to the interactive application”; “seventh information associated with a context for a current state associated with the interactive application”; “or one or more images corresponding to the interactive application”, to find out that information regarding the interactive application include multiple types such as location, tasks, character, user, actions , context, image, are an evaluation, observation and could be performed in the human mind or with the aid of pen and paper. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claim 4 does not recite any additional limitations. The claim as drafted, is not patent eligible. Claim 5 recites “further comprising: generating one or more second embeddings based at least on at least one of the textual input or one or more images associated with a context of the interactive application, wherein the determining the at least the portion of the one or more embeddings is based at least on comparing the one or more second embeddings with respect to the one or more embeddings”, generating embeddings based on some textual input or images related with the context of the game could be performed with the aid of pen and paper. To determine some portion of embeddings by comparing with other embeddings, could be performed in human mind or with the aid of pen and paper. 
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claim 5 does not recite any additional limitations. The claim as drafted, is not patent eligible. Claim 6 recites “further comprising: determining, based at least on the at least the portion of the one or more embeddings, one or more textual sources that include at least a portion of the information; and generating a prompt based at least the textual input and the one or more textual sources, wherein the input data represents at least the prompt”, to determine textual source that include some information from a part of the embeddings, generating the input data, which can be a prompt, from the textual sources, could be performed with the aid of pen and paper. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claim 6 does not recite any additional limitations. The claim as drafted, is not patent eligible. Claim 7 recites “wherein: the at least the portion of the one or more embeddings includes one or more image embeddings; the method further comprises determining one or more textual embeddings associated with the one or more image embeddings; and the input data is associated with the textual input and the one or more textual embeddings”, where determining an image and related textual description, is an observation and could be performed in the human mind or with the aid of pen and paper. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claim 7 does not recite any additional limitations. The claim as drafted, is not patent eligible. Claim 8 recites “further comprising: determining one or more filters associated with at least one of the textual input, the character, or the interactive application; and determining, based at least on the one or more filters, at least a second portion of the one or more embeddings from the at least the portion of the one or more embeddings”, where determining a way to filter some information and finding out the related source of information, could be an evaluation, observation and could be performed in the human mind or with the aid of pen and paper. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claim 8 does not recite any additional limitations. The claim as drafted, is not patent eligible. Claim 9 recites “wherein the causing the character of the interactive application to output the speech corresponding to the textual output comprises: generating audio data representative of the speech associated with the textual output; and sending, to a client device, the audio data along with image data representative of one or more images corresponding to at least the character”, where speaking out the textual output could be performed by a person. The claim recites additional limitation of “client device”. Client device is specified in specification, para.[0044],[0045] as performing generic computer functions, which is not sufficient to amount to significantly more than the judicial exception. The claim 9 as drafted, is not patent eligible. (see BASCOM Global Internet v. AT&T Mobility LLC, 827 F.3d 1341, 119 USPQ2d 1236 (Fed. Cir. 2016 which notes server vs client structured to be well known). 
Claim 11 recites the additional limitation of “wherein the one or more processors are further to: determine an identifier associated with the character, wherein the determination of the one or more first sources of contextual information is further based at least on the identifier” , where determining that an identifier of an character is associated with the source of contextual information, could be an evaluation, observation and could be performed in the human mind or with the aid of pen and paper. The claim recites additional limitation of “processor”. Processor is specified in specification, para.[0096] ,[00106], as performing generic computer functions, which is not sufficient to amount to significantly more than the judicial exception. The claim 11 as drafted, is not patent eligible Claim 12 recites “wherein the one or more processors are further to: receive second input data representative of one or more inputs”; “and generate, based at least on the second input data, image data representative of one or more images associated with a state of the application, wherein the determination of the one or more first sources of information is further based at least on the image data”, to find out that the second input data is related to an image and the source of information is associated with the image data is an evaluation, observation and could be performed in the human mind or with the aid of pen and paper. The claim recites additional limitation of “processor”. Processor is specified in specification, para.[0096] ,[00106], as performing generic computer functions, which is not sufficient to amount to significantly more than the judicial exception. The claim 12 as drafted, is not patent eligible Claim 13 recites “wherein the one or more processors are further to: obtain one or more embeddings associated with the one or more second sources of information, wherein the determination of the one or more first sources of information comprises: determining, based at least on the textual input, at least a portion of the one or more embedding; and determining that the one or more first sources of information are associated with the at least the portion of the one or more embeddings”, to determine from the textual input that the sources of information are associated with the embeddings, could be an evaluation, observation and could be performed in the human mind or with the aid of pen and paper. The claim recites additional limitation of “processor”. Processor is specified in specification, para.[0096] ,[00106], as performing generic computer functions, which is not sufficient to amount to significantly more than the judicial exception. The claim 13 as drafted, is not patent eligible Claim 14 recites “wherein the one or more processors are further to: retrieve text from the one or more first sources of information; and generate a prompt based at least the textual input and the text, wherein the input data represents at least the prompt”, to generate an input based on the source of information and the textual input could be performed with the aid of pen and paper. The claim recites additional limitation of “processor”. Processor is specified in specification, para.[0096] ,[00106], as performing generic computer functions, which is not sufficient to amount to significantly more than the judicial exception. 
The claim 14 as drafted, is not patent eligible Claim 15 recites “wherein: the one or more first sources of information include one or more images associated with the application; the one or more processors are further to determine text based at least on the one or more images; and the input data is associated with the textual input and the text”, to determine that the source of the information contains images associated with the application and determining input text based on the images, are an evaluation, observation and could be performed in the human mind or with the aid of pen and paper. The claim recites additional limitation of “processor”. Processor is specified in specification, para.[0096],[00106],as performing generic computer functions, which is not sufficient to amount to significantly more than the judicial exception. The claim 15 as drafted, is not patent eligible Claim 16 recites “wherein the one or more processors are further to: determine one or more filters associated with at least one of the textual input, the character, or the application; and determine, based at least on the one or more filters, one or more third sources of information from the one or more first sources of information, wherein the input data is generated based at least on the textual input and the one or more third sources of information”, where determining a way to filter some information and finding out the related source of information, is an evaluation, observation and could be performed in the human mind or with the aid of pen and paper. The claim recites additional limitation of “processor”. Processor is specified in specification, para.[0096],[00106], as performing generic computer functions, which is not sufficient to amount to significantly more than the judicial exception. The claim 16 as drafted, is not patent eligible Claims 17 and 20 recite “wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing one or more simulation operations; a system for performing one or more digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system that provides one or more cloud gaming applications; a system for performing one or more deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing one or more generative Al operations; a system for performing operations using one or more large language models (LLMs); a system for performing operations using one or more vision language models (VLMs); a system for performing one or more conversational Al operations; a system for generating synthetic data; a system for presenting at least one of virtual reality content, augmented reality content, or mixed reality content; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources”, where determining whether the system could be in different types of application or system, is an evaluation, observation and could be performed in the human mind or with the aid of pen and paper. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claims 17 and 20 do not recite any additional limitations. 
The claims as drafted, are not patent eligible. Claim 19 recites “wherein the processing circuitry is further to: generate one or more images associated with a context of the interactive application, wherein the one or more first embeddings are identified based at least on the textual input and the one or more images”, generating an image associated with the context could be performed with the aid of pen and paper. The claim recites additional limitation of “processing circuitry”. Processing circuitry is specified in specification, para.[00142],[00143], as performing generic computer functions, which is not sufficient to amount to significantly more than the judicial exception. The claim 19 as drafted, is not patent eligible Claim Rejections - 35 USC § 102 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention. (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention. Claims 10, 11, 16-18 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gelfenbeyn et al. ( 20230351120 A1), hereinafter referenced as Gelfenbeyn. Regarding Claim 10, Gelfenbeyn teaches a system comprising: one or more processors to ( Gelfenbeyn: Para.[0115], Fig. 11, processors 1002): determine, based at least on a textual input associated with an application, one or more first sources of information from one or more second sources of information associated with the application ( Gelfenbeyn: Para.[0058]-[0060], [0062], Figs.5, 6A, an architecture diagram 500 is showing a system where user may use the client device to interact with AI characters in a virtual environment using an application running on the user device. The server, the game client, and the application may be set up based on predetermined rules to enable streaming multimodal inputs from the client to the server. At step B, 506, pre-processed data stream of different modalities are obtained in the form of embeddings 618. Diagram 600 illustrates inputs collected from client such as text input 602 and determining the embeddings ( preprocessed data/information) to process the input) ; generate input data based at least on the textual input and the one or more first sources of information (Gelfenbeyn: Para.[0032], [0033], the system may read a log of conversations between a first user and a second user and the description of a scene in a virtual environment and can generate the input data, the parameters of an AI character, based on the log data and the description of the scene. 
Para.[0081]-[0083], Fig.7A, various parts of memories, such as a personal memory 726, world knowledge 730, and contextual knowledge 734 ( different sources of information) provide information); determine, based at least on one or more language models processing the input data, a textual output for the textual input ( Gelfenbeyn: Para.[0069], [0074], Fig. 6B, The dialogue prompts 636 ( input data) may be provided to a LLM 646 to generate dialogue output 654); and cause a character of the application to output speech associated with the textual output ( Gelfenbeyn: Para.[0069], [0074], Fig. 6B, The dialogue output 654, the client-side narrative triggers 656, the animation controls 658, and the voice parameters 644 may be processed using text to speech conversion 660. The output data obtained upon applying the text to speech conversion 660 are sent as a stream to the client 662. The game engine animates the AI character based on the received data by instructing the AI character on what to say, how to move, what to enact, and the like). Regarding Claim 11, Gelfenbeyn teaches the system of claim 10. Gelfenbeyn further teaches, wherein the one or more processors are further to: determine an identifier associated with the character ( Gelfenbeyn: Para. [0065], Fig. 6A illustrates different machine learning models such as goals model 622, which identify which goals need to be activated for the AI character, safety model 624, which identify which unsafe responses need to be filtered out) , wherein the determination of the one or more first sources of contextual information is further based at least on the identifier ( Gelfenbeyn: Para.[0083], Fig. 7A, The contextual knowledge 734 may be processed to include information about an environment or context to contextualize pursuit of the goal). Regarding Claim 16, Gelfenbeyn teaches the system of claim 10. Gelfenbeyn further teaches, wherein the one or more processors are further to: determine one or more filters associated with at least one of the textual input, the character, or the application ( Gelfenbeyn: Para.[0028], filters are determined to be used, such as, prior to sending a request to the LLM, the platform may classify and filter the user questions and messages to change words based on the personalities of AI characters, emotional states of AI characters ); and determine, based at least on the one or more filters, one or more third sources of information from the one or more first sources of information, wherein the input data is generated based at least on the textual input and the one or more third sources of information ( Gelfenbeyn: Para. [0065],[0066], Fig. 6A, filtering of unsafe response for the AI model can be configured based on the embeddings 618 and safety model 624, event model 630 (third source of information). Para. [0042], [0043], the input field may include text field describing the scene and environment in which the AI character is placed); Regarding Claim 17, Gelfenbeyn teaches the system of claim 10. 
Gelfenbeyn further teaches, wherein the system is comprised in at least one of: [a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing one or more simulation operations; a system for performing one or more digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing one or more deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing operations using one or more vision language models (VLMs);a system for generating synthetic data; a system for presenting at least one of virtual reality content, augmented reality content, or mixed reality content; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.] a system that provides one or more cloud gaming applications ( Gelfenbeyn: Para.[0029], the platform can be used in game applications); a system for performing one or more generative Al operations ( Gelfenbeyn: Para.[0030], the platform can be used in generative AI operations, such as generating AI character models); a system for performing operations using one or more large language models (LLMs) ( Gelfenbeyn: Para.[0028], the system may utilize a LLM in conversations with the users.); a system for performing one or more conversational Al operations ( Gelfenbeyn: Para.[0030], the AI character model can engage in conversation with the user). Regarding Claim 18, Gelfenbeyn teaches one or more processors ( Gelfenbeyn: Para.[0115], Fig. 11, processors 1002) comprising: processing circuitry to generate a response to a query based at least on one or more language models processing a prompt that is associated with one or more first embeddings and to cause the response to be output perceptually within an interactive application ( Gelfenbeyn: Para.[0047]-[0052], Fig. 3, The language model 304 can be based on a LLM, can form a request ( prompt) for the LLM, receive a response and process the response from the LLM. The AI character model 300 can include runtime parameters 312 ( first embeddings) and design parameters 314 ( second embeddings). The request for the LLM can include text requests according to the current scene, environmental parameters, an emotional state of the AI character, an emotional state of the user, and current context of the conversation with the user. Para.[0074], Fig. 6B, The output data obtained are sent as a stream to the client 662 of the gaming engine. Para.[0101], Fig.11, In block 1004, the method may include adjusting, by the processor and based on the log data, parameters of the AI character model to cause the AI character model to mimic behavioral characteristics of the first user in follow-up conversations with further users), wherein the one or more first embeddings are identified from one or more second embeddings stored in one or more databases ( Gelfenbeyn: Para. [0047]-[0049]. Fig. 3, The runtime parameters 312 ( first embeddings) may correspond to an emotional state of an AI character ( design parameters 314, as second embeddings)), and wherein the one or more second embeddings are associated with one or more sources that include contextual information associated with the interactive application ( Gelfenbeyn: Para.[0048], [0083], Figs. 
3,7A, the design parameters 314 may correspond to settings for personality and emotions of an AI character. The design parameters 314 can be generated based on character description which can include contextual knowledge, such as 734, processed to include information about an environment or context to contextualize pursuit of the goal). Claim 20 is a processor claim performing the steps in system claim 17 above and as such, claim 20 is similar in scope and content to claim 17 and therefore, claim 20 is rejected under similar rationale as presented against claim 17 above. Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1-9, 12-15 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Gelfenbeyn et al. ( 20230351120 A1), hereinafter referenced as Gelfenbeyn, in view of Ahafonov et al. ( US 20240355010 A1), hereinafter referenced as Ahafonov. Regarding Claim 1, Gelfenbeyn teaches a method comprising: determining, based at least on a textual input, at least a portion of the one or more embeddings ( Gelfenbeyn: Para.[0058]-[0060], [0062], Figs.5, 6A, an architecture diagram 500 is showing a system where user may use the client device to interact with AI characters in a virtual environment using an application running on the user device. The server, the game client, and the application may be set up based on predetermined rules to enable streaming multimodal inputs from the client to the server. At step B, 506, pre-processed data stream of different modalities are obtained in the form of embeddings 618. Diagram 600 illustrates inputs collected from client such as text input 602 and determining the embeddings ( preprocessed data/information) to process the input) ; determining, based at least on one or more language models processing input data associated with the textual input and the at least the portion of the one or more embedding, a textual output for the textual input ( Gelfenbeyn: Para.[0062].[0063], Figs. 5, 6A, at step B, input data has been processed through a series of machine learning models and embedded data. Para.[0069], Figs. 5, 6B, at step C, formatted and composed data such as dialogue prompts 636 ( input data) may be provided to a LLM 646 to generate dialogue output 654); and causing a character of the interactive application to output speech associated with the textual output ( Gelfenbeyn: Para.[0069], [0074], Fig. 6B, The dialogue prompts 636 may be provided to a LLM 646 to generate dialogue output 654. 
The dialogue output 654, the client-side narrative triggers 656, the animation controls 658, and the voice parameters 644 may be processed using text to speech conversion 660. The output data obtained upon applying the text to speech conversion 660 are sent as a stream to the client 662. The game engine animates the AI character based on the received data by instructing the AI character on what to say, how to move, what to enact, and the like). Gelfenbeyn while teaching the method of claim 1, fails to explicitly teach the claimed, generating, based at least on information associated with an interactive application, one or more embeddings associated with the information; However, Ahafonov does teach the claimed, generating, based at least on information associated with an interactive application, one or more embeddings associated with the information ( Ahafonov: Para.[0021], [0028], Figs.1, 2, an interaction system 100 is illustrated for facilitating interactions ( e.g., exchanging text messages, conducting text audio and video calls, or playing games) between multiple user systems 102, each of which hosts multiple applications, including an interaction client 104 and other applications 106. The interaction servers 124 host multiple systems and subsystems such as personalized AI agent system 232. Para.[0129], [0147], [0148], Fig.5 illustrates an example architecture 500 for applying a personal AI agent 502 to provide personalized features to a user of an interaction client 104 by analyzing user data and behavior to understand their preferences and interests. The personal AI agent 502 generates embeddings for the multimodal memory 508 that includes information about individuals and their relationships with other individuals, entities, and devices and the multimodal memory is generated using various data sources as described herein by identifying patterns and connections between entities, using one or more techniques, such as neural networks. These embeddings can be fine-tuned and optimized for specific applications and tasks) ; Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ahafonov’s teaching of system and method for generating an extended reality (XR) try-on experience , into the system and method for observation-based training of an Artificial Intelligence (AI) character model, taught by Gelfenbeyn, because, this would improve the efficiency of using an electronic device by intelligently and automatically generating images that depict real-world objects in a real-world scene in a simple and intuitive manner. (Ahafonov, Para.[0018]). Regarding Claim 2, Gelfenbeyn in view of Ahafonov teach the method of claim 1. Gelfenbeyn further teaches, further comprising: determining an identifier associated with the character ( Gelfenbeyn: Para. [0065], Fig. 6A illustrates different machine learning models such as goals model 622, which identify which goals need to be activated for the AI character, safety model 624, which identify which unsafe responses need to be filtered out) , wherein the determining the at least the portion of the one or more embeddings is further based at least on the identifier ( Gelfenbeyn: Para.[0062], [0065], Fig. 
6A, all different machine learning models, such as goals model 622, safety model 624, which identify different characteristics of the AI character, are configured to process the embeddings 618 ( preprocessed data stream in the form of text and/or embeddings) to recognize what needs to be activated). Regarding Claim 3, Gelfenbeyn in view of Ahafonov teach the method of claim 1. Ahafonov further teaches, further comprising: receiving second input data representative of one or more inputs ( Ahafonov: Para.[0158], Fig. 5, the personal AI agent 502 determines that the user is using the AR/VR device 526 and receives input via the AR/VR device 526 including a voice prompt “What can I cook with these ingredients?"); and generating, based at least on the second input data, image data representative of one or more images associated with a state of the interactive application ( Ahafonov: Para.[0158], The personal AI agent 502 accesses a camera feed on the AR/VR device 526 and identifies objects within the camera feed, which include potatoes and carrots), wherein the determining the at least the portion of the one or more embeddings is further based at least on the image data ( Ahafonov: Para.[0158], The personal AI agent 502 accesses multimodal memory 508 for the user and finds relevant user data from a common space vector ( embeddings), such as a like for a friend 's potato and carrot soup, a user's location in Germany on a recent trip, and a video post from the user with context relating to authenticity of recipes). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ahafonov’s teaching of system and method for generating an extended reality (XR) try-on experience , into the system and method for observation-based training of an Artificial Intelligence (AI) character model, taught by Gelfenbeyn, because, this would improve the efficiency of using an electronic device by intelligently and automatically generating images that depict real-world objects in a real-world scene in a simple and intuitive manner. (Ahafonov, Para.[0018]). Regarding Claim 4, Gelfenbeyn in view of Ahafonov teach the method of claim 1. Gelfenbeyn further teaches, wherein the information includes one or more of: [first information indicating one or more settings associated with the interactive application]; second information indicating one or more locations associated with the interactive application ( Gelfenbeyn: Para.[0095], Fig.9, A scene 902 may be driven by a plurality of parameters. The parameters may include scene and location knowledge 904); [third information indicating one or more tasks associated with the interactive application;] fourth information associated with the character ( Gelfenbeyn: Para.[0076], [0079], [0084],[0085], Fig.7A, AI character personality and background description 706, An identity profile 718 specify elements of an AI character ( e.g., role, interests). Voice configuration 738 ( information) may be used to determine the configuration of voice in real-time, which can allow AI characters to show different expressions when pursuing a goal. 
Dialogue style controls 742 may be used to control a dialogue style of an AI character); sixth information indicating one or more actions that occurred with respect to the interactive application ( Gelfenbeyn: Para.[0086], Fig.7A, Goals and actions 746 received from the user may be processed to specify the goals that an AI character has per scene, and then set up the actions that the AI character has available to pursue the goal); seventh information associated with a context for a current state associated with the interactive application( Gelfenbeyn: Para.[0083], Fig.7A, The contextual knowledge 734 may be processed to include information about an environment or context to contextualize pursuit of the goal);[or one or more images corresponding to the interactive application.] Ahafonov further teaches, fifth information associated a user of the interactive application ( Ahafonov: Para.[0131]-[0141], [0148], the multimodal memory 508, where the embeddings are stored, contain information about user’s demographic, behavioral, contextual, interests, contacts, purchase history data, etc.) ; Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ahafonov’s teaching of system and method for generating an extended reality (XR) try-on experience , into the system and method for observation-based training of an Artificial Intelligence (AI) character model, taught by Gelfenbeyn, because, this would improve the efficiency of using an electronic device by intelligently and automatically generating images that depict real-world objects in a real-world scene in a simple and intuitive manner. (Ahafonov, Para.[0018]). Regarding Claim 5, Gelfenbeyn in view of Ahafonov teach the method of claim 1. Ahafonov further teaches, further comprising: generating one or more second embeddings based at least on at least one of the textual input or one or more images associated with a context of the interactive application ( Ahafonov: Para.[0158], Fig. 5, The personal AI agent 502 accesses a camera feed on the AR/VR device 526 and identifies objects within the camera feed which is located in the multimodal memory 508 for the user and finds relevant user data from a common space Vector ( embeddings generated before), which include potatoes and carrots ( based on user’s query)), wherein the determining the at least the portion of the one or more embeddings is based at least on comparing the one or more second embeddings with respect to the one or more embeddings ( Ahafonov: Para.[0147], The personal AI agent 502 uses these embeddings for cross-modal comparisons and analysis. An embedding of an image can be compared to the embedding of a corresponding text description to identify semantic relationships between the two). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ahafonov’s teaching of system and method for generating an extended reality (XR) try-on experience , into the system and method for observation-based training of an Artificial Intelligence (AI) character model, taught by Gelfenbeyn, because, this would improve the efficiency of using an electronic device by intelligently and automatically generating images that depict real-world objects in a real-world scene in a simple and intuitive manner. (Ahafonov, Para.[0018]). Regarding Claim 6, Gelfenbeyn in view of Ahafonov teach the method of claim 1. 
Ahafonov further teaches, further comprising: determining, based at least on the at least the portion of the one or more embeddings, one or more textual sources that include at least a portion of the information ( Ahafonov: Para.[0163], Fig.6, The personal AI agent system 600 may employ embeddings to generate the multimodal memory 626. Para.[0167], [0168], Fig.6, the information communicated are extracted from multimodal memory 626, the data can be text or image data, video data, audio data, electronic documents, links to data stored on the Internet or the client system 606); and generating a prompt based at least the textual input and the one or more textual sources( Ahafonov: Para.[0167], [0168], Fig.6,a prompt is generated based on the intent 622, where the intent is generated based on the information ( could be textual) extracted/obtained from the multimodal memory); wherein the input data represents at least the prompt ( Ahafonov: Para.[0168], Fig. 6, The user 638 interacts with a device, such as by entering a prompt as user input 602). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ahafonov’s teaching of system and method for generating an extended reality (XR) try-on experience , into the system and method for observation-based training of an Artificial Intelligence (AI) character model, taught by Gelfenbeyn, because, this would improve the efficiency of using an electronic device by intelligently and automatically generating images that depict real-world objects in a real-world scene in a simple and intuitive manner. (Ahafonov, Para.[0018]). Regarding Claim 7, Gelfenbeyn in view of Ahafonov teach the method of claim 1. Ahafonov further teaches, wherein: the at least the portion of the one or more embeddings includes one or more image embeddings ( Ahafonov: Para.[0147], Fig.5, the embeddings used by the personal AI agent can be image embeddings); the method further comprises determining one or more textual embeddings associated with the one or more image embeddings ( Ahafonov: Para.[0147], Fig.5, embeddings of an image could have corresponding text description); and the input data is associated with the textual input and the one or more textual embeddings ( Ahafonov: Para.[0152], Fig.5, the generative machine learning models are trained to receive a prompt as input (which can include any combination of text, images) and which can be derived from multimodal memory 508, where embeddings are stored); Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ahafonov’s teaching of system and method for generating an extended reality (XR) try-on experience , into the system and method for observation-based training of an Artificial Intelligence (AI) character model, taught by Gelfenbeyn, because, this would improve the efficiency of using an electronic device by intelligently and automatically generating images that depict real-world objects in a real-world scene in a simple and intuitive manner. (Ahafonov, Para.[0018]). Regarding Claim 8, Gelfenbeyn in view of Ahafonov teach the method of claim 1. 
Gelfenbeyn further teaches, further comprising: determining one or more filters associated with at least one of the textual input, the character, or the interactive application ( Gelfenbeyn: Para.[0028], filters are determined to be used, such as, prior to sending a request to the LLM, the platform may classify and filter the user questions and messages to change words based on the personalities of AI characters, emotional states of AI characters ); and determining, based at least on the one or more filters, at least a second portion of the one or more embeddings from the at least the portion of the one or more embeddings ( Gelfenbeyn: Para. [0065],[0066], Fig. 6A, filtering of unsafe response for the AI model can be configured based on the embeddings 618 and safety model 624, event model 630 (second portion)); wherein the input data is associated with the textual input and the at least the second portion of the one or more embeddings ( Gelfenbeyn: Para. [0042], [0043], the input field may include text field describing the scene and environment in which the AI character is placed); Regarding Claim 9, Gelfenbeyn in view of Ahafonov teach the method of claim 1. Gelfenbeyn further teaches, wherein the causing the character of the interactive application to output the speech corresponding to the textual output comprises: generating audio data representative of the speech associated with the textual output ( Gelfenbeyn: Para.[0090], Fig.7B, at block 762, the text to speech conversion model determines how the AI character speaks his lines (audio) to pursue the goal), and sending, to a client device, the audio data along with image data representative of one or more images corresponding to at least the character ( Gelfenbeyn: Para.[0090]-[0092], Fig.7B, the outputs obtained in blocks, dialogue output (audio or text) 766, the client side narrative triggers 768, and the animation 770 may be provided to a client 772 (e.g., a client engine, a game engine, a web application, and the like)). Regarding Claim 12, Gelfenbeyn teach the system of claim 10. Gelfenbeyn fails to explicitly teach the claimed, wherein the one or more processors are further to: receive second input data representative of one or more inputs; and generate, based at least on the second input data, image data representative of one or more images associated with a state of the application, wherein the determination of the one or more first sources of information is further based at least on the image data. However, Ahafonov does teach the claimed, wherein the one or more processors are further to: receive second input data representative of one or more inputs( Ahafonov: Para.[0158], Fig. 
5, the personal AI agent 502 determines that the user is using the AR/VR device 526 and receives input via the AR/VR device 526 including a voice prompt “What can I cook with these ingredients?"); and generate, based at least on the second input data, image data representative of one or more images associated with a state of the application( Ahafonov: Para.[0158], The personal AI agent 502 accesses a camera feed on the AR/VR device 526 and identifies objects within the camera feed, which include potatoes and carrots), wherein the determination of the one or more first sources of information is further based at least on the image data ( Ahafonov: Para.[0158], The personal AI agent 502 accesses multimodal memory 508 for the user and finds relevant user data from a common space vector ( embeddings), such as a like for a friend 's potato and carrot soup, a user's location in Germany on a recent trip, and a video post from the user with context relating to authenticity of recipes). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ahafonov’s teaching of system and method for generating an extended reality (XR) try-on experience , into the system and method for observation-based training of an Artificial Intelligence (AI) character model, taught by Gelfenbeyn, because, this would improve the efficiency of using an electronic device by intelligently and automatically generating images that depict real-world objects in a real-world scene in a simple and intuitive manner. (Ahafonov, Para.[0018]). Regarding Claim 13, Gelfenbeyn teach the system of claim 10. Gelfenbeyn fails to explicitly teach the claimed, wherein the one or more processors are further to: obtain one or more embeddings associated with the one or more second sources of information, wherein the determination of the one or more first sources of information comprises: determining, based at least on the textual input, at least a portion of the one or more embedding; and determining that the one or more first sources of information are associated with the at least the portion of the one or more embeddings. However, Ahafonov does teach the claimed, wherein the one or more processors are further to: obtain one or more embeddings associated with the one or more second sources of information ( Ahafonov: Para.[0158], Fig. 5, based on user’s query, personal AI agent 502 access camera feed from multimodal memory 508 to find relevant user data ( second source of information) from a common space vector (embeddings), such as a video from recent trip ), wherein the determination of the one or more first sources of information comprises: determining, based at least on the textual input, at least a portion of the one or more embedding ( Ahafonov: Para.[0158], Fig. 5, The personal AI agent 502 generates a prompt ( textual input) from this common space vector to input into a neural network engine 514, which asks for "authentic German soup recipes); and determining that the one or more first sources of information are associated with the at least the portion of the one or more embeddings ( Ahafonov: Para.[0158], Fig. 5, based on user’s query, personal AI agent 502 access camera feed from multimodal memory 508 to find relevant user data ( first source of information) from a common space vector (embeddings), such as picture of potato and carrot). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ahafonov's teaching of a system and method for generating an extended reality (XR) try-on experience into the system and method for observation-based training of an Artificial Intelligence (AI) character model taught by Gelfenbeyn, because this would improve the efficiency of using an electronic device by intelligently and automatically generating images that depict real-world objects in a real-world scene in a simple and intuitive manner (Ahafonov, Para. [0018]).

Regarding Claim 14, Gelfenbeyn teaches the system of claim 10. Gelfenbeyn fails to explicitly teach the claimed, wherein the one or more processors are further to: retrieve text from the one or more first sources of information; and generate a prompt based at least the textual input and the text, wherein the input data represents at least the prompt. However, Ahafonov does teach the claimed, wherein the one or more processors are further to: retrieve text from the one or more first sources of information (Ahafonov: Para. [0167], [0168], Fig. 6, the information communicated is extracted from multimodal memory 626; the data can be text or image data, video data, audio data, electronic documents, or links to data stored on the Internet or the client system 606); and generate a prompt based at least the textual input and the text, wherein the input data represents at least the prompt (Ahafonov: Para. [0167], [0168], Fig. 6, a prompt is generated based on the intent 622, where the intent is generated based on the information (which could be textual) extracted from the multimodal memory; the user 638 interacts with a device, such as by entering a prompt as user input 602). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ahafonov's teaching of a system and method for generating an extended reality (XR) try-on experience into the system and method for observation-based training of an Artificial Intelligence (AI) character model taught by Gelfenbeyn, because this would improve the efficiency of using an electronic device by intelligently and automatically generating images that depict real-world objects in a real-world scene in a simple and intuitive manner (Ahafonov, Para. [0018]).

Regarding Claim 15, Gelfenbeyn teaches the system of claim 10. Gelfenbeyn fails to explicitly teach the claimed, wherein: the one or more first sources of information include one or more images associated with the application; the one or more processors are further to determine text based at least on the one or more images; and the input data is associated with the textual input and the text.
However, Ahafonov does teach the claimed, wherein: the one or more first sources of information include one or more images associated with the application (Ahafonov: Para. [0147], Fig. 5, the embeddings (source of information) used by the personal AI agent can be image embeddings); the one or more processors are further to determine text based at least on the one or more images (Ahafonov: Para. [0147], Fig. 5, embeddings of an image can have a corresponding text description); and the input data is associated with the textual input and the text (Ahafonov: Para. [0152], Fig. 5, the generative machine learning models are trained to receive a prompt as input (which can include any combination of text and images) and which can be derived from multimodal memory 508, where embeddings are stored). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ahafonov's teaching of a system and method for generating an extended reality (XR) try-on experience into the system and method for observation-based training of an Artificial Intelligence (AI) character model taught by Gelfenbeyn, because this would improve the efficiency of using an electronic device by intelligently and automatically generating images that depict real-world objects in a real-world scene in a simple and intuitive manner (Ahafonov, Para. [0018]).

Regarding Claim 19, Gelfenbeyn teaches the one or more processors of claim 18. Gelfenbeyn fails to explicitly teach the claimed, wherein the processing circuitry is further to: generate one or more images associated with a context of the interactive application, wherein the one or more first embeddings are identified based at least on the textual input and the one or more images. However, Ahafonov does teach the claimed, wherein the processing circuitry is further to: generate one or more images associated with a context of the interactive application (Ahafonov: Para. [0065], the artificial intelligence and machine learning system can implement one or more machine learning models that generate artificial images of a person or object wearing an artificially generated fashion item corresponding to a textual description or prompt, and later generate a new image in which the artificially generated fashion item is replaced with an object or XR object that resembles (looks like) a real-world fashion item or product), wherein the one or more first embeddings are identified based at least on the textual input and the one or more images (Ahafonov: Para. [0145], Fig. 5, the personal AI agent 502 can use the user database 504 to determine that a user posts a picture captured from a mobile phone with the user's dog and that the caption or comments refer to the dog as Jake; the personal AI agent 502 can update the multimodal memory 508 (as embeddings) for the user to store a link that associates the user with a dog named Jake, such as a link between an entity representing a dog (image) and another entity representing the name Jake).
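Claims 14, 15, and 19, as mapped above, revolve around deriving text from selected sources (including text determined from images) and combining it with the textual input into a prompt for the language model. Below is a minimal sketch of that prompt-assembly pattern; caption_image() and complete() are hypothetical stand-ins for an image-to-text model and a language model, and the sketch is an editor's approximation rather than the claimed method or either reference's implementation.

```python
# Editor's illustration only: assemble a language-model prompt from the user's
# textual input plus text derived from selected sources, where an image source
# is first converted to a text description. caption_image() and complete() are
# hypothetical placeholders for an image-to-text model and a language model.
from dataclasses import dataclass

@dataclass
class Source:
    kind: str      # "text" or "image"
    content: str   # raw text, or a path/URI for an image

def caption_image(uri: str) -> str:
    # Placeholder: a real system would run an image-to-text model here.
    return f"[text description of the image at {uri}]"

def source_text(src: Source) -> str:
    return src.content if src.kind == "text" else caption_image(src.content)

def build_prompt(textual_input: str, sources: list[Source]) -> str:
    context = "\n".join(f"- {source_text(s)}" for s in sources)
    return (
        "Context for this request:\n"
        f"{context}\n\n"
        f"Player input: {textual_input}\n"
        "Answer in character."
    )

def complete(prompt: str) -> str:
    # Placeholder for a call to a language model.
    return "..."

sources = [
    Source("text", "The character is an innkeeper in a medieval fantasy town."),
    Source("image", "frames/current_scene.png"),
]
print(complete(build_prompt("What should I do next?", sources)))
```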
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ahafonov's teaching of a system and method for generating an extended reality (XR) try-on experience into the system and method for observation-based training of an Artificial Intelligence (AI) character model taught by Gelfenbeyn, because this would improve the efficiency of using an electronic device by intelligently and automatically generating images that depict real-world objects in a real-world scene in a simple and intuitive manner (Ahafonov, Para. [0018]).

Conclusion

Listed below is the prior art made of record and not relied upon that is considered pertinent to applicant's disclosure.

Nguyen et al. (US 20240112674 A1) teaches a method that includes rendering a first output image of an XR assistant avatar for displays of an extended reality (XR) display device, wherein the XR assistant avatar is interactable by a user to access an assistant system and has a first form indicating a first attention state, which indicates whether the XR assistant avatar is interactable via first voice commands for first functions enabled by the assistant system; detecting voice inputs from the user; determining a second attention state associated with the XR assistant avatar based on the voice inputs; and rendering a second output image of the XR assistant avatar for the displays of the XR display device, wherein the XR assistant avatar is morphed to have a second form indicating the second attention state, which indicates whether the XR assistant avatar is interactable via second voice commands for second functions enabled by the assistant system.

Spiegel et al. (US 20240249318 A1) teaches a system and method for determining user intent and providing targeted advertising using chatbot interactions. The system receives user prompts during chat sessions with a chatbot and generates responses using a large language model. User intent is extracted by analyzing the chat conversations using natural language processing and machine learning techniques. The extracted user intent, comprising weighted keywords and concepts, is used to create a user intent profile. Targeted advertising content is generated based on the user intent profile and provided to the user during subsequent platform interactions. The large language model is continuously retrained using user engagement data to improve intent modeling accuracy. User privacy is maintained by limiting context extraction to chatbot conversations. The system enables personalized and relevant advertising by inferring user intent through conversational interactions.

Taylor et al. (US 20200382448 A1) teaches a chat bot computing system that includes a bot controller and a natural language processor. The natural language processor receives a first textual input and identifies concepts represented by the first textual input. An indication of the concepts is output to the bot controller, which generates a response to the first textual input. The concepts output by the natural language processor are also fed back into the input to the natural language processor, as context information, when a second textual input is received. The natural language processor then identifies concepts represented in the second textual input, based on the second natural language textual input and the context information.

Mikhailiuk et al. (US 20250150414 A1) teaches a computer-implemented method and system for responding to user posts containing images with relevant image responses during conversation between a user and a chatbot. The system receives an image post from the user and generates a description of the image using an image-to-text model. User intent is determined based on the image and description. If responding with an image is appropriate based on the user intent, the system generates a prompt using the image description and passes it to a text generation model to create an image description and caption. The image description and caption are used to synthesize a new image. The resulting image and caption are packaged into a post that is provided as a response to the user. The system uses machine learning pipelines and models to analyze images, detect inappropriate content, classify user intent, generate text, and synthesize images.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NADIRA SULTANA, whose telephone number is (571) 272-4048. The examiner can normally be reached M-F, 7:30 am-5:00 pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Paras D. Shah, can be reached at (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NADIRA SULTANA/
Examiner, Art Unit 2653
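One further pattern from the rejections above: the Gelfenbeyn passage cited for the filtering limitation (Para. [0028]) describes classifying and filtering a user's message, and changing words based on an AI character's personality or emotional state, before the request is sent to the LLM. A minimal sketch of such a pre-request filter follows; the character profile, word substitutions, and banned-word check are hypothetical, and this is an editor's illustration, not Gelfenbeyn's implementation.

```python
# Editor's illustration only: classify and filter a user's message before it is
# sent to an LLM, rewording it per a character profile. All data is hypothetical.
from dataclasses import dataclass

@dataclass
class CharacterProfile:
    name: str
    emotional_state: str                 # e.g., "cheerful", "somber"
    word_substitutions: dict[str, str]   # words to reword for this character

def classify(message: str, banned: set[str]) -> str:
    """Rough classification: flag messages that contain any banned word."""
    tokens = {t.strip(".,!?").lower() for t in message.split()}
    return "flagged" if tokens & banned else "ok"

def apply_filters(message: str, profile: CharacterProfile, banned: set[str]) -> str:
    """Rewrite the message per the character profile before the LLM request."""
    reworded = " ".join(
        profile.word_substitutions.get(token.strip(".,!?").lower(), token)
        for token in message.split()
    )
    if classify(reworded, banned) == "flagged":
        return "[message withheld by safety filter]"
    return reworded

profile = CharacterProfile(
    name="Innkeeper",
    emotional_state="cheerful",
    word_substitutions={"kill": "defeat", "steal": "borrow"},
)
print(apply_filters("How do I kill the dragon?", profile, banned={"cheat"}))
# -> "How do I defeat the dragon?"
```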

Prosecution Timeline

Jun 18, 2024
Application Filed
Feb 10, 2026
Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603086
CONTEXTUAL EDITABLE SPEECH RECOGNITION METHODS AND SYSTEMS
2y 5m to grant · Granted Apr 14, 2026
Patent 12591747
ENTITY-CONDITIONED SENTENCE GENERATION
2y 5m to grant · Granted Mar 31, 2026
Patent 12573413
AUDIO CODING METHOD AND RELATED APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM
2y 5m to grant · Granted Mar 10, 2026
Patent 12567420
METHOD AND APPARATUS FOR CONTROLLING SOUND RECEIVING DEVICE BASED ON DUAL-MODE AUDIO THREE-DIMENSIONAL CODE
2y 5m to grant · Granted Mar 03, 2026
Patent 12536992
ELECTRONIC DEVICE AND METHOD FOR PROVIDING VOICE RECOGNITION SERVICE
2y 5m to grant · Granted Jan 27, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

1-2
Expected OA Rounds
74%
Grant Probability
99%
With Interview (+31.1%)
3y 0m
Median Time to Grant
Low
PTA Risk
Based on 97 resolved cases by this examiner. Grant probability derived from career allow rate.
