DETAILED ACTION
This Office Action is in response to Application No. 18/924,655, filed on 10/23/2024. Claims 1-20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/06/2025 has been acknowledged and is being considered by the examiner.
Claim Objections
Claim 13 is objected to because of the following informalities: The claim recites “a particular user profiles” in line 3. The examiner believes this is a typographical error and should recite “a particular user profile.” Appropriate correction is required.
Claim 14 is objected to because of the following informalities: The claim recites “sores” in line 5. The examiner believes this is a typographical error and should recite “stores.” Appropriate correction is required.
Claim 16 is objected to because of the following informalities: The claim recites “a particular user profiles” in line 16. The examiner believes this is a typographical error and should recite “a particular user profile.” Appropriate correction is required.
Claim Interpretation
Claims 1, 16, and 20 recite the limitation “real-time context.” Applicant’s specification defines this as:
[0035] …For example, data corresponding to various modalities such as visual, auditory, textual, speech, textile etc., modalities or combinations thereof, may be collected and used. The real-time contexts may include, for example, inferred locations and activities of the user. In some examples, the real-time context may also include information indicative of an emotional state of the user. The context-aware dialogue system may detect the emotional state of the user based on facial appearance and/or gaze direction of one or both eyes of the user determined based on image data obtained via an inward-facing camera that may be provided on the portable device (e.g., smart eyewear) of the user. The facial appearance and/or gaze direction may be used to track the facial expression and eye movements of the user to determine emotional state indicative of happiness, sadness, fear, anger, disgust, surprise, etc. experienced by the user. Additionally, or alternatively, the context-aware dialogue system may detect the emotional state of the user based on the audio data obtained via the microphone of portable device (e.g., smart eyewear) of the user. For example, the audio data may include utterances of the user, and the emotional state may be inferred based on content, intonation, sound level, arousal level etc. of the utterances of the user.
Therefore, in line with applicant’s specification, the examiner will interpret “real-time context” as any of an inferred location, an activity, or an emotional state of the user.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-13 and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Shukla (US 2019/0206407) in view of Pandey et al. (US 2023/0267278).
Regarding claim 1, Shukla disclosed:
A method for generating personalized responses in a conversation with a user (Paragraph 3, human machine communication. Paragraph 33, customized responses), the method comprising:
generating, by one or more processors (Paragraph 72, processors), a plurality of real-time contexts (Paragraph 33, acquiring visual or acoustic data to understand user expression/emotion/intent) capturing an environment of the user over time (Paragraph 40, continually collecting multi-modal data), including generating a particular real-time context (Paragraph 36, user is wearing a yellow shirt and appears bored), among the plurality of real-time contexts, based on i) a first data stream (Paragraph 33, visual data) corresponding to a first modality in an environment of the user and ii) a second data stream (Paragraph 33, acoustic data) corresponding to a second modality in the environment of the user, wherein the second modality is different from the first modality, and wherein respective real-time contexts, among the plurality of real-time contexts, correspond to different points in time (Paragraph 42, histories) (Paragraph 33, the user device uses various sensors in one or more modalities, such as visual, acoustic, text, or haptic. The sensor data is captured during dialog and used to facilitate an understanding of the user (expression, emotion, intent) and the surroundings of the user. The data is used to provide a customized response. Paragraph 35, inputs from the user’s side, such as an utterance or action, the appearance of the user, as well as information about the surrounding of the user are provided to the agent device via network connections. Paragraph 36, the user interaction system observes, from the input of the user device, that the user is wearing a yellow shirt and appears to be bored. Paragraph 40, during dialog, the user device continually collects multi-modal sensor data related to the user and their surroundings. Paragraph 42, detected states which are associated with dialog histories…may evolve over time based on the party’s choices made during conversations);
generating, by the one or more processors, a plurality of historical contexts (Paragraph 85, dialog history) based on the plurality of real-time contexts (Paragraph 85, to determine a response for an on-going dialog, additional parameters are needed such as the dialog history. The history, along with the current state of the user (e.g., user’s emotion) associated with a current state of the corresponding dialog tree impacts the decision on an appropriate response);
in response to receiving a conversational cue (Paragraph 35, user’s utterance) provided by the user, generating, by the one or more processors, a current real-time context based on data corresponding to the first modality and the second modality in a current environment of the user (Paragraph 35, inputs from the user’s side, such as the user’s utterance are provided to the user interaction system. Paragraph 36, the user interaction system processes the input, including other information such as the user is wearing a yellow shirt and is bored);
generating, by the one or more processors based on the current real-time context, a personalized response to the conversational cue, wherein generating the personalized response includes (Paragraph 33, sensor data provides contextual information and can be explored to customize a response accordingly. Paragraph 36, based on the user’s shirt and that they appear to be bored, an adaptive response that comments on how nice the user looks is generated. Paragraph 38, basing the decision on how to respond to a user on what is observed during the dialog such as expression, emotion, mindset of the user and generating the response that is based on the specific situation of the dialog and the intended purpose of the dialog)
identifying, based on the current real-time context, relevant user information (Paragraph 47, visual data can also include facial expressions of the user or objects in the background where the user is at (basketball, table, chair). Audio data also includes tone of the user or an accent of the user. This information is leveraged in dialog management to enhance engagement of the user), and
generating the personalized response to the conversational cue using the relevant user information (Paragraph 49, based on the user appearing not engaged, the dialog machine determines to perk up the user to better engage them. The automated dialog machine utters “Would you like to play a game?”); and
causing, by the one or more processors, the personalized response to be provided to the user (Paragraph 49, such a question is delivered in audio form to the user).
While Shukla disclosed using dialog histories (see above), Shukla did not explicitly disclose including identifying one or more relevant historical contexts from among the plurality of historical contexts.
However, in an analogous art, Pandey disclosed including identifying one or more relevant historical contexts from among the plurality of historical contexts (Paragraph 45, obtaining one or more historical conversation logs comprising responses matched to a plurality of conversation contexts and using the one or more historical conversation logs to output one of the agent responses. Paragraph 46, assigning a score to the conversation context agent response from the one or more historical conversation logs).
One of ordinary skill in the art would have been motivated to combine the teachings of Shukla and Pandey because both references involve outputting responses for conversations and, as such, are within the same field of endeavor.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the identifying relevant historical contexts of Pandey with the teachings of Shukla in order to improve discoverability and coverage of designated responses based on an analysis of conversation history (Pandey, Paragraph 91).
Regarding claim 16, Shukla disclosed:
A method for generating personalized responses in a conversation with a user, the method comprising (Paragraph 3, human machine communication. Paragraph 33, customized responses):
generating, by one or more processors (Paragraph 72, processors), a plurality of real-time contexts (Paragraph 33, acquiring visual or acoustic data to understand user expression/emotion/intent), including generating a particular real-time context (Paragraph 36, user is wearing a yellow shirt and appears bored), among the plurality of real-time contexts, based on i) a first data stream (Paragraph 33, visual data) corresponding to a first modality in an environment of the user and ii) a second data stream (Paragraph 33, acoustic data) corresponding to a second modality in the environment of the user, wherein the second modality is different from the first modality, and wherein respective real-time contexts, among the plurality of real-time contexts, correspond to different points in time (Paragraph 42, dialog histories) (Paragraph 33, the user device uses various sensors in one or more modalities, such as visual, acoustic, text, or haptic. The sensor data is captured during dialog and used to facilitate an understanding of the user (expression, emotion, intent) and the surroundings of the user. The data is used to provide a customized response. Paragraph 35, inputs from the user’s side, such as an utterance or action, the appearance of the user, as well as information about the surrounding of the user are provided to the agent device via network connections. Paragraph 36, the user interaction system observes, from the input of the user device, that the user is wearing a yellow shirt and appears to be bored. Paragraph 40, during dialog, the user device continually collects multi-modal sensor data related to the user and their surroundings. Paragraph 42, detected states which are associated with dialog histories…may evolve over time based on the party’s choices made during conversations);
generating, by the one or more processors, user information (Paragraph 47, visual data can also include facial expressions of the user or objects in the background where the user is at (basketball, table, chair). Audio data also includes tone of the user or an accent of the user. This information is leveraged in dialog management to enhance engagement of the user), including
generating a plurality of historical contexts (Paragraph 85, dialog history) based on one or both of i) the plurality of real-time contexts (Paragraph 85, to determine a response for an on-going dialog, additional parameters are needed such as the dialog history. The history, along with the current state of the user (e.g., user’s emotion) associated with a current state of the corresponding dialog tree impacts the decision on an appropriate response) or ii) previous conversations with the user, and
generating, based on the plurality of historical contexts, a plurality of user profiles (Paragraph 65, user profile archive), wherein a particular user profiles, among the plurality of user profiles, includes information regarding a particular aspect of the user (Paragraphs 65, 68, user information such as preferences from a user profile archive. Generating a different initiating sequence based on a user’s preference from the user profile archive and something detected in the dialog scene such as a user’s appearance (i.e., aspect of the user) or objects in the environment of the user. Figure 5, showing a plurality of user profile archives 555);
in response to receiving a conversational cue from the user (Paragraph 35, user’s utterance), generating, by the one or more processors, a current real-time context based on data corresponding to the first modality and the second modality in a current environment of the user (Paragraph 35, inputs from the user’s side, such as the user’s utterance are provided to the user interaction system. Paragraph 36, the user interaction system processes the input, including other information such as the user is wearing a yellow shirt and is bored);
generating, based on the current real-time context, a personalized response (Paragraph 33, customized response) to the conversational cue, including identifying, based on the current real-time context, relevant user information (Paragraph 33, sensor data provides contextual information and can be explored to customize a response accordingly. Paragraph 36, based on the user’s shirt and that they appear to be bored, an adaptive response that comments on how nice the user looks is generated. Paragraph 38, basing the decision on how to respond to a user on what is observed during the dialog such as expression, emotion, mindset of the user and generating the response that is based on the specific situation of the dialog and the intended purpose of the dialog), including
generating the personalized response to the conversational cue using the relevant user information (Paragraph 49, based on the user appearing not engaged, the dialog machine determines to perk up the user to better engage them. The automated dialog machine utters “Would you like to play a game?”); and
causing, by the one or more processors, the personalized response to be provided to the user (Paragraph 49, such a question is delivered in audio form to the user).
Shukla did not explicitly disclose wherein respective historical contexts, among the plurality of historical contexts, include one or both of i) summaries of daily events associated with the user or ii) summaries of the previous conversations with the user; and identifying one or both of i) one or more relevant historical contexts from among the plurality of historical contexts or ii) one or more relevant user profiles from among the plurality of user profiles.
However, in an analogous art, Pandey disclosed wherein respective historical contexts, among the plurality of historical contexts, include one or both of i) summaries of daily events associated with the user or ii) summaries of the previous conversations with the user (Paragraph 45, obtaining one or more historical conversation logs (i.e., summaries of previous conversations) that comprise a plurality of agent responses matched to a plurality of conversation contexts);
identifying one or both of i) one or more relevant historical contexts from among the plurality of historical contexts (Paragraph 45, obtaining one or more historical conversation logs comprising responses matched to a plurality of conversation contexts and using the one or more historical conversation logs to output one of the agent responses. Paragraph 46, assigning a score to the conversation context agent response from the one or more historical conversation logs) or ii) one or more relevant user profiles from among the plurality of user profiles.
One of ordinary skill in the art would have been motivated to combine the teachings of Shukla and Pandey because both references involve outputting responses for conversations and, as such, are within the same field of endeavor.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the identifying relevant historical contexts of Pandey with the teachings of Shukla in order to improve discoverability and coverage of designated responses based on an analysis of conversation history (Pandey, Paragraph 91).
Regarding claim 20, the claim is substantially similar to claim 1. Claim 20 recites a first and second sensor (Shukla, Paragraph 33, visual sensors and acoustic sensors). Therefore, the claim is rejected under the same rationale.
Regarding claims 2 and 17, the limitations of claims 1 and 16 have been addressed. Shukla and Pandey disclosed:
wherein: the first data stream corresponding to the first modality comprises image or video data visually depicting a scene in the environment of the user (Shukla, Paragraph 32, observed surroundings. Paragraph 33, video sensor 140 acquiring visual data); and
the second data stream corresponding to the second modality comprises audio data reflecting an audio environment of the user and sound produced by the user (Shukla, Paragraph 33, acoustic sensor 130 gathering utterances of the user or sound from the environment).
Regarding claim 3, the limitations of claim 2 have been addressed. Shukla and Pandey disclosed:
wherein: the image data comprises images of the environment of the user captured at predetermined intervals of time (Shukla, Paragraph 32, the person and the setting of the dialog are continuously (i.e., predetermined) observed, analyzed, and used to adaptively conduct dialog accordingly); and
the audio data comprises a continuous audio stream capturing the audio environment of the user and the sound produced by the user (Shukla, Paragraph 33, acoustic sensor 130 gathering utterances of the user or sound from the environment).
Regarding claim 4, the limitations of claim 3 have been addressed. Shukla and Pandey disclosed:
wherein generating the particular real-time context includes: generating, using a vision language model, a textual description of the scene based on the image data (Shukla, Paragraph 43, the user device takes multi-modal data (audio, images, video, text) from the sensors and processes the multi-modal data to generate text representing the features of the raw multi-modal data. Paragraph 63, having different models (i.e., vision language model) to detect user state);
transcribing, using a speech recognition model, the audio data to generate a textual representation of the audio environment of the user and the sound produced by the user (Shukla, Paragraph 43, the user device takes multi-modal data (audio, images, video, text) from the sensors and processes the multi-modal data to generate text representing the features of the raw multi-modal data. Paragraph 62, speech recognition is performed based on the audio signal to determine the text of the user’s utterance); and
generating the particular real-time context based on i) the textual description of the scene and ii) the textual representation of the audio data (Shukla, Paragraph 35, inputs from the user’s side, such as the user’s utterance are provided to the user interaction system. Paragraph 36, the user interaction system processes the input, including other information such as the user is wearing a yellow shirt and is bored).
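For illustration of how the combination mapped above could operate, the following is a minimal sketch of fusing a vision language model’s textual scene description with a speech recognition transcript into a single textual real-time context. The dataclass, function names, and model interfaces below are hypothetical placeholders and are not drawn from Shukla, Pandey, or the claims.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RealTimeContext:
    timestamp: datetime
    scene_description: str  # textual description of the visual scene
    audio_transcript: str   # textual representation of the audio environment

def generate_real_time_context(image_frame, audio_chunk,
                               vision_language_model, speech_recognition_model):
    """Fuse two modalities into one textual real-time context.

    The two model objects stand in for any captioning and ASR models;
    their `describe`/`transcribe` interfaces are assumed for this sketch.
    """
    # First modality: describe the visual scene in text.
    scene_description = vision_language_model.describe(image_frame)
    # Second modality: transcribe the audio environment and user speech.
    audio_transcript = speech_recognition_model.transcribe(audio_chunk)
    return RealTimeContext(datetime.now(), scene_description, audio_transcript)
```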
Regarding claim 5, the limitations of claim 4 have been addressed. Shukla and Pandey disclosed:
wherein generating the particular real-time context further includes: inferring, from one or both of the textual description of the scene and the textual representation of the audio data, a location of the user and an activity of the user (Shukla, Paragraph 64, the dialog environment includes lower level concepts such as objects present in the environment and higher level concepts such as places (office, park, beach) and also the nature of the place (vacation place, work place, transit place)); and
generating the particular real-time context to include information indicative of the location of the user and the activity of the user (Shukla, Paragraph 65, the estimated user state and the determined contextual information of the underlying dialog (from paragraph 64) are utilized to adaptively have a conversation with a user or determine how to respond to the last response from the user).
Regarding claim 6, the limitations of claim 5 have been addressed. Shukla and Pandey disclosed:
wherein inferring the location of the user and the activity of the user includes: generating a prompt based on the textual description of the scene and the textual representation of the audio environment of the user and the sound produced by the user; and providing the prompt to a language model to infer the location of the user and the activity of the user (Shukla, Paragraph 62, based on the processed multimodal data (i.e., prompt) from the multimodal analysis unit 510, the user’s state and contextual information surrounding the dialog scene are determined. Paragraph 64, the contextual info determiner 530 detects, based on environment detection models 525, various objects present in the scene captured by multimodal data and extracts relevant features. The environment detection model 525 detects different types of scenes, different types of objects, and different characterizations of environments in order to estimate a place the user is at).
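A similarly hedged sketch of the prompt-construction step recited in claim 6, assuming a generic `language_model.complete` interface rather than any specific API, and reusing the `RealTimeContext` fields from the sketch under claim 4:

```python
def infer_location_and_activity(context, language_model):
    """Build a prompt from the textual context and query a language model."""
    prompt = (
        f"Scene description: {context.scene_description}\n"
        f"Audio transcript: {context.audio_transcript}\n"
        "Based on the above, infer the user's current location and activity. "
        "Answer in the form 'location: ...; activity: ...'."
    )
    # The completion is taken as the inferred location and activity.
    return language_model.complete(prompt)
```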
Regarding claim 7, the limitations of claim 2 have been addressed. Shukla and Pandey disclosed:
wherein the image data further includes data indicative of one or both of i) facial appearance of the user (Shukla, Paragraph 47, visual data captures the facial expression of the user) or ii) gaze direction of one or both eyes of the user.
Regarding claim 8, the limitations of claim 7 have been addressed. Shukla and Pandey disclosed:
further comprising: detecting, by the one or more processors, an emotional state of the user based on analyzing one or both of i) one or both of facial appearance or gaze direction of one or both eyes of the user obtained from the image data (Shukla, Paragraph 47, visual data captures the facial expression of the user) or ii) information indicative of user emotion obtained from the audio data; and
generating, by the one or more processors, the particular real-time context to further include information indicative of the emotional state of the user (Shukla, Paragraph 47, if the user appears to be in a bad mood, the user interaction system initiates using a different sentence, such as “are you okay?”).
Regarding claim 9, the limitations of claim 1 have been addressed. Shukla and Pandey disclosed:
wherein respective historical contexts, among the plurality of historical contexts, include one or both of i) summaries of daily events of the user or ii) summaries of previous conversations with the user (Pandey, Paragraph 45, obtaining one or more historical conversation logs (i.e., summaries of previous conversations) that comprise a plurality of agent responses matched to a plurality of conversation contexts).
For motivation, please refer to claim 1.
Regarding claim 10, the limitations of claim 9 have been addressed. Shukla and Pandey disclosed:
wherein generating the plurality of historical contexts includes: clustering, based on similarities between the real-time contexts among the plurality of real-time contexts, subsets of the real-time contexts into respective daily events (Pandey, Paragraph 40, clustering responses by obtaining a set of agent responses as input and outputting a set of response clusters. Paragraph 41, applying homogeneity filtering (i.e., similarity) to the cluster);
generating, based on the subsets of the real-time contexts clustered into the respective daily events, respective summaries of the daily events (Pandey, Paragraph 45, obtaining one or more historical conversation logs (i.e., summaries) that comprise a plurality of agent responses matched to a plurality of conversation contexts); and
generating the historical contexts to include the respective summaries of the daily events (Pandey, Paragraph 41, applying the homogeneity filter to the cluster and selecting a response cluster to create the representative response).
For motivation, please refer to claim 1.
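For illustration only, the clustering recited in claim 10 could be sketched as below. The `embed` and `summarize` callables and the distance threshold are assumptions, and the scikit-learn call merely stands in for any similarity-based clustering.

```python
from collections import defaultdict
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_into_daily_events(contexts, embed, summarize, distance_threshold=0.5):
    """Group similar real-time contexts into daily events; summarize each.

    `embed` maps a context to a vector and `summarize` maps a list of
    contexts to a short text; both are caller-supplied assumptions.
    """
    vectors = np.stack([embed(c) for c in contexts])
    labels = AgglomerativeClustering(
        n_clusters=None, distance_threshold=distance_threshold
    ).fit(vectors).labels_
    events = defaultdict(list)
    for context, label in zip(contexts, labels):
        events[label].append(context)
    # One historical context (a summary) per clustered daily event.
    return [summarize(group) for group in events.values()]
```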
Regarding claim 11, the limitations of claim 9 have been addressed. Shukla and Pandey disclosed:
wherein generating the plurality of historical contexts includes: separating previous conversations with the user into conversation sessions (Pandey, Paragraph 45, obtaining one or more historical conversation logs that comprise a plurality of agent responses matched to a plurality of conversation contexts);
generating respective conversation summaries of the conversation sessions (Pandey, Paragraph 46, using an input-output pair of the first machine learning model matching a conversation context-designated response pair from the one or more modified conversation logs (i.e., summary)); and
generating the historical contexts to include the respective conversation summaries of the conversation sessions (Pandey, Paragraph 41, applying the homogeneity filter to the cluster and selecting a response cluster to create the representative response).
For motivation, please refer to claim 1.
Regarding claim 12, the limitations of claim 1 have been addressed. Shukla and Pandey disclosed:
further comprising: generating, by the one or more processors, respective sets of one or more indices for respective historical contexts, the one or more indices generated for a particular historical context including one or more of i) a temporal index indicative of a time associated with the particular historical context, ii) a spatial index indicative of a location associated with the particular historical context, and iii) a semantic index indicative of semantic content associated with the particular historical context (Pandey, Paragraph 46, using an input-output pair (i.e., indices) of the first machine learning model matching a conversation context-designated response pair (i.e., semantic content) from the one or more modified conversation logs);
storing, by the one or more processors in a database, the plurality of historical contexts in association with corresponding ones of the respective sets of one or more indices (Pandey, Paragraph 45, the agent responses (output) are matched to conversation contexts (input) and are obtained by the conversation logs (i.e., stored)); and
performing associative retrieval based on the respective sets of one or more indices associated with the historical contexts in the database to identify the one or more relevant historical contexts (Pandey, Paragraph 46, generating response embeddings for a portion of the agent responses, clustering the response embeddings into a plurality of clusters, applying the homogeneity filter, and then generating a response based on the filtered cluster (i.e., associative retrieval)).
For motivation, please refer to claim 1.
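The indexing and associative retrieval recited in claim 12 might be sketched, purely hypothetically, as a toy in-memory store scored over temporal, spatial, and semantic indices (a real system would use a database, as the claim recites):

```python
from dataclasses import dataclass

@dataclass
class IndexedHistoricalContext:
    summary: str
    temporal_index: str        # e.g., an ISO date such as "2024-10-23"
    spatial_index: str         # e.g., an inferred place such as "office"
    semantic_index: frozenset  # e.g., keywords extracted from the summary

class HistoricalContextStore:
    """Toy in-memory stand-in for the claimed database."""

    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)

    def retrieve(self, temporal=None, spatial=None, keywords=frozenset()):
        """Associative retrieval: rank stored contexts by index matches."""
        def score(item):
            return ((item.temporal_index == temporal)
                    + (item.spatial_index == spatial)
                    + len(item.semantic_index & keywords))
        matches = [item for item in self._items if score(item) > 0]
        return sorted(matches, key=score, reverse=True)
```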
Regarding claim 13, the limitations of claim 1 have been addressed. Shukla and Pandey disclosed:
wherein: the method further comprises generating, by the one or more processors, a plurality of user profiles based on the plurality of historical contexts, wherein a particular user profiles, among the plurality of user profiles, includes a textual description of a particular aspect of the user (Shukla, Paragraphs 65, 68, user information such as preferences from a user profile archive. Generating a different initiating sequence based on a user’s preference from the user profile archive and something detected in the dialog scene such as a user’s appearance (i.e., aspect of the user) or objects in the environment of the user. Figure 5, showing a plurality of user profile archives 555); and
identifying the relevant user information further includes identifying one or more relevant user profiles from among the plurality of user profiles (Shukla, Paragraph 68, identifying that a user is known to love to play basketball from the user profile archive 555).
Regarding claim 15, the limitations of claim 1 have been addressed. Shukla and Pandey disclosed:
wherein generating the personalized response includes: generating a dialogue strategy based on the current real-time context (Shukla, Paragraph 38, changing topics to interest the user (i.e., dialog strategy));
identifying the relevant user information based on the dialogue strategy (Shukla, Paragraph 38, user interested in basketball and switching to that topic); and
generating the personalized response based on the current real-time context and the relevant user information identified based on the dialogue strategy (Shukla, Paragraph 50, the automated companion decides to leverage the knowledge of the user enjoying basketball to make the dialog more engaging for the user).
Regarding claim 18, the limitations of claim 17 have been addressed. Shukla and Pandey disclosed:
wherein generating the particular real-time context includes: generating, using a vision language model, a textual description of the scene based on the image data (Shukla, Paragraph 43, the user device takes multi-modal data (audio, images, video, text) from the sensors and processes the multi-modal data to generate text representing the features of the raw multi-modal data. Paragraph 63, having different models (i.e., vision language model) to detect user state);
transcribing, using a speech recognition model, the audio data to generate a textual representation of the audio environment of the user and the sound produced by the user (Shukla, Paragraph 43, the user device takes multi-modal data (audio, images, video, text) from the sensors and processes the multi-modal data to generate text representing the features of the raw multi-modal data. Paragraph 62, speech recognition is performed based on the audio signal to determine the text of the user’s utterance);
inferring, from one or both of the textual description of the scene and the textual representation of the audio data, a location of the user and an activity of the user (Shukla, Paragraph 64, the dialog environment includes lower level concepts such as objects present in the environment and higher level concepts such as places (office, park, beach) and also the nature of the place (vacation place, work place, transit place)); and
generating the particular real-time context to include information indicative of the location of the user and the activity of the user (Shukla, Paragraph 65, the estimated user state and the determined contextual information of the underlying dialog (from paragraph 64) are utilized to adaptively have a conversation with a user or determine how to respond to the last response from the user).
Regarding claim 19, the limitations of claim 16 have been addressed. Shukla and Pandey disclosed:
wherein generating the plurality of historical contexts includes: clustering, based on similarities between the real-time contexts among the plurality of real-time contexts, subsets of the real-time contexts into respective daily events (Pandey, Paragraph 40, clustering responses by obtaining a set of agent responses as input and outputting a set of response clusters. Paragraph 41, applying homogeneity filtering (i.e., similarity) to the cluster);
generating, based on the subsets of the real-time contexts clustered into the respective daily events, respective summaries of the daily events (Pandey, Paragraph 45, obtaining one or more historical conversation logs (i.e., summaries) that comprise a plurality of agent responses matched to a plurality of conversation contexts);
separating previous conversations with the user into conversation sessions; generating respective summaries of the conversation sessions (Pandey, Paragraph 45, obtaining one or more historical conversation logs that comprise a plurality of agent responses matched to a plurality of conversation contexts); and
generating the historical contexts to include i) the respective summaries of the daily events (Pandey, Paragraph 41, applying the homogeneity filter to the cluster and selecting a response cluster to create the representative response) and ii) the respective summaries of the conversation sessions (Pandey, Paragraph 46, using an input-output pair of the first machine learning model matching a conversation context-designated response pair from the one or more modified conversation logs (i.e., summary)).
For motivation, please refer to claim 1.
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Shukla (US 2019/0206407) in view of Pandey et al. (US 2023/0267278), and further in view of Emma et al. (US 2019/0156222).
Regarding claim 14, the limitations of claim 13 have been addressed. Shukla and Pandey disclosed:
wherein generating the plurality of user profiles includes: generating a new user profile based on a historical context among the plurality of historical contexts (Shukla, Paragraph 68, for a new conversation with a new user, initiate the conversation at its initiate node and determine if that is appropriate given the estimated state of the user).
Shukla and Pandey did not explicitly disclose querying a database, that stores user profiles, to determine whether there is a stored user profile that satisfies a similarity criteria with the new user profile; in response to determining that there is a stored profile that satisfies the similarity criteria with the new user profile, updating the stored user profile based on the new user profile; and in response to determining that there is no stored user profile that satisfies the similarity criteria with the new user profile, storing the new user profile in the database as a separate new user profile.
However, in an analogous art, Emma disclosed querying a database, that stores user profiles, to determine whether there is a stored user profile that satisfies a similarity criteria with the new user profile (Paragraph 46, as the user interacts with the AI, each interaction is analyzed, categorized, tagged, and stored in a database. The AI develops the user profile for the user with extracted elements from the interactions. The extracted elements are compared to pre-built personality profiles and matched (i.e., similarity criteria) with at least one of these personality profiles);
in response to determining that there is a stored profile that satisfies the similarity criteria with the new user profile, updating the stored user profile based on the new user profile (Paragraph 101, based on the analyzed, categorized, and tagged media, the system extracts the text, audio, or video clips and updates the profile based on the extracted elements); and
in response to determining that there is no stored user profile that satisfies the similarity criteria with the new user profile, storing the new user profile in the database as a separate new user profile (Paragraph 52, elements are extracted and stored in a new personality profile).
One of ordinary skill in the art would have been motivated to combine the teachings of Shukla and Pandey with Emma because the references involve AI learning through conversations and, as such, are within the same field of endeavor.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the user profile of Emma with the teachings of Shukla and Pandey in order to allow for improved conversation and artificial personality development (Emma, Paragraph 15).
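The query-update-or-insert logic recited in claim 14 can be sketched as follows; the similarity function, the 0.8 threshold, and the list-backed "database" of dict profiles are illustrative assumptions rather than anything disclosed by Emma:

```python
def upsert_user_profile(new_profile, profile_db, similarity, threshold=0.8):
    """Update a sufficiently similar stored profile, else store a new one.

    `profile_db` is a list of dicts standing in for a database table, and
    `similarity` is an assumed callable returning a score in [0, 1].
    """
    best_match, best_score = None, 0.0
    for stored in profile_db:
        score = similarity(stored, new_profile)
        if score > best_score:
            best_match, best_score = stored, score
    if best_match is not None and best_score >= threshold:
        # Similarity criteria satisfied: merge into the stored profile.
        best_match.update(new_profile)
    else:
        # No stored profile satisfies the criteria: store separately.
        profile_db.append(new_profile)
```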
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Steven C. Nguyen, whose telephone number is (571) 270-5663. The examiner can normally be reached M-F, 7 AM - 3 PM, or alternatively through e-mail at Steven.Nguyen2@USPTO.gov.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Christopher Parry can be reached at 571-272-8328. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/S.C.N/ Examiner, Art Unit 2451
/Chris Parry/ Supervisory Patent Examiner, Art Unit 2451