DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55. Priority is accorded as of 01/09/2023.
Status of Claims
This action is in reply to the amendments filed on 12/15/2025.
Claims 1-20 are currently pending and have been examined.
Claims 1, 9, 11, and 20 are amended.
Claims 1-20 are currently rejected.
This action is made FINAL.
Response to Arguments
Applicant’s arguments filed 12/15/2025 have been fully considered but they are not persuasive.
Applicant’s arguments with regard to the art rejections have been considered and appear to be directed solely to the instant amendments to the claims. Accordingly, the claims are addressed in the body of the rejections below.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 11, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Honda et al. (US 2020/0321006), hereinafter Honda, in view of Kim et al. (KR 20190011458), hereinafter Kim.
Regarding claim 1:
Honda teaches:
A system (fig. 1, agent system 1) for a vehicle (fig. 1, vehicle M), the system comprising:
a wireless interface (fig. 1, network NW) configured to connect a server (fig. 1, agent servers 200 and web servers 300) with an input device (fig. 2, microphone 10) and an output device of the vehicle (fig. 2, display 20 and speaker 30); and
the server comprising:
one or more processors (fig. 4, natural language processor 222); and
a memory (fig. 4, storage 250) storing:
sample data, associated with the vehicle (The personal profile 254 includes personal information, preferences, past conversation histories, and the like of occupants stored for each occupant [0067]), that match a plurality of output responses corresponding respectively to the sample data (the conversation manager 224 sets certainty factors of response results having high degrees of matching with the interests of the occupant P to be high with reference to the personal profile 254. For example, when an interest of the occupant P is “dining,” the conversation manager 224 sets a certainty factor of “Italian restaurant” to be higher than those of other information. The conversation manager 224 may set higher certainty factors for higher evaluation results (recommendation degrees) of general users with respect to establishments acquired from the various web server 300. [0093]);
and instructions (a CPU executing a program (software) [0061]) that, when executed by the one or more processors (fig. 4, natural language processor 222), cause the server to:
generate, based on input data received (the agent function 150-1 transmits a voice stream or a voice stream on which processing such as compression or encoding has been performed, acquired from the microphone 10, the audio processor 112, or the like to the agent server 200-1. [0063]), via the wireless interface from the input device of the vehicle (fig. 4, data transferred from 150-1 to 210), a sample datum from the stored sample data (When the voice stream is acquired, the speech recognizer 220 performs speech recognition and outputs text information and the natural language processor 222 performs semantic interpretation on the text information with reference to the dictionary DB 252 [0065]; When text such as “Today's weather” or “How is the weather today?” is recognized as a speech recognition result, for example, the natural language processor 222 generates an internal state in which a user intention has been replaced with “Weather: today.” Accordingly, even when request speech includes variations in text and differences in wording, it is possible to easily make a conversation suitable for the request. The natural language processor 222 may recognize the meaning of text information using artificial intelligence processing such as machine learning processing using probabilities and generate a command based on a recognition result, for example. [0066]);
retrieve, from the memory, an output response (The response sentence generator 228 generates a response sentence [0070]), of the plurality of output responses, that matches the sample datum (The conversation manager 224 determines details of a response (for example, details of an utterance for the occupant and an image to be output) for the occupant of the vehicle M with reference to the personal profile 254, the knowledge base DB 256 and the response rule DB 258 on the basis of an input command [0067]);
output, via the wireless interface to the output device of the vehicle (fig. 4, data transferred from 210 to 150-1), the retrieved output response (The response sentence generator 228 generates a response sentence and transmits the generated response sentence (response result) to the agent apparatus 100 such that details of the utterance determined by the conversation manager 224 are delivered to the occupant of the vehicle M [0070]);
perform multi-task learning (Each of the plurality of agent functions 150-1 to 150-3 determines response details on the basis of the personal profile 254, the knowledge base DB 256 and the response rule DB 258 provided in the storage 250 thereof and determines a certainty factor for the response details, for example. [0092]) based on the input data (it is assumed that, when a command of “Where are recently popular establishments?” has been received from the occupant P, the conversation manager 224 has acquired information of “clothing shop,” “shoes shop” and “Italian restaurant” from the various web server 300 as information corresponding to the command through the network retriever 226. Here, the conversation manager 224 sets certainty factors of response results having high degrees of matching with the interests of the occupant P to be high with reference to the personal profile 254. For example, when an interest of the occupant P is “dining,” the conversation manager 224 sets a certainty factor of “Italian restaurant” to be higher than those of other information. [0093]) and the sample data (The conversation manager 224 may refer to the personal profile 254, refer to whether there have been the same questions in a history of recent conversations (for example, within one month), and when there have been the same questions, set certainty factors of response details the same as replies to the questions to be high. The history of conversations may be a history of conversations with the occupant P who has spoken or a history of conversations included in the personal profile 254 other than the occupant P. The conversation manager 224 may combine the above-described plurality of certainty factor setting conditions and set certainty factors. [0096]),
identify, based on the retrieved output response (when an interest of the occupant P is “dining,” the conversation manager 224 sets a certainty factor of “Italian restaurant” to be higher than those of other information. The conversation manager 224 may set higher certainty factors for higher evaluation results (recommendation degrees) of general users with respect to establishments acquired from the various web server 300. [0093]), a user intention (the agent function 150-1 selects a response result to be output, for example, on the basis of a certainty factor set for each response result. A certainty factor is, for example, a degree (index value) to which a response result for a request (command) included in an utterance of the occupant P is presumed to be a correct response. The certainty factor is, for example, a degree to which a response to an utterance of the occupant is presumed to be a response matching a request of the occupant or expected by the occupant [0092]); and
cause, based on the identified user intention (When it is determined that the certainty factor of the response result is not less than the threshold value in the process of step S202, the first agent function causes the output to output the generated response result (step S208) [0112]), the vehicle to be controlled (The agent apparatus 100 performs control with respect to a vehicle apparatus 50, and the like on the basis of a request from the occupant. [0031]; Agent functions may include a function of performing control of an apparatus in a vehicle (e.g., an apparatus with respect to driving control or vehicle body control), and the like [0026]).
Honda does not explicitly teach the following; however, Kim teaches:
perform multi-task learning (The direct comparison method is a method of setting a unit of a recognition target word, a phoneme, etc. as a feature vector model and comparing how similar the input speech is, and a vector quantization method is typically used [page 15]) based on the input data (“input speech”) and the sample data (“reference model”), wherein the multi-task learning uses a plurality of representative questions corresponding to the input data as output data (According to the vector quantization method, a feature vector of input speech data is mapped to a codebook, which is a reference model, and is encoded into a representative value, thereby comparing the code values with each other. [page 15]);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Honda to include the teachings of Kim with a reasonable expectation of success. Kim teaches the benefit that “shifting the gaze and releasing the steering wheel as a means for the user to visually identify information or operate the device during operation is a threat to safe driving. Therefore, it is expected that a conversation system that grasps a user's intention through a conversation with a user and provides a necessary service to the user can provide a safer and more convenient service when applied to a vehicle” [Kim, page 2].
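For context on the relied-upon teaching, the vector-quantization comparison Kim describes (a feature vector of input speech mapped to the nearest codebook entry of a reference model and encoded as a representative value, with utterances then compared by their code values) can be sketched as follows. The feature vectors and codebook below are hypothetical illustrations only and are not drawn from either reference:

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    # dists: (n_frames, n_codes) Euclidean distances to every codebook entry
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)  # representative code value per frame

# Hypothetical 2-D "feature vectors" and a 3-entry codebook (the reference model)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
utterance_a = np.array([[0.1, -0.1], [0.9, 1.1]])
utterance_b = np.array([[0.2, 0.1], [1.1, 0.8]])

codes_a = quantize(utterance_a, codebook)
codes_b = quantize(utterance_b, codebook)
# Utterances are compared by their encoded code values, not the raw features,
# so small acoustic variations that map to the same codebook entries still match.
match = np.array_equal(codes_a, codes_b)
```

Here both hypothetical utterances encode to the same code sequence despite differing raw features, illustrating how the codebook comparison tolerates variation in the input speech.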
Regarding claim 2:
Honda in view of Kim teaches all the limitations of claim 1, upon which this claim is dependent.
Honda further teaches:
encode an input sequence corresponding to the input data (the agent function 150-1 transmits a voice stream or a voice stream on which processing such as compression or encoding has been performed [0063]); and
classify the sample datum based on the encoded input sequence (the natural language processor 222 performs semantic interpretation on the text information with reference to the dictionary DB 252 [0065]).
Regarding claim 11:
Honda teaches:
A method for controlling a system (an agent apparatus control method [0002]), the method comprising:
establishing a communication channel, via a wireless interface (fig. 1, network NW) and between:
a server (fig. 1, agent servers 200 and web servers 300), and
an input device of a vehicle (fig. 2, microphone 10) and an output device of the vehicle (fig. 2, display 20 and speaker 30);
storing sample data (fig. 4, storage 250), associated with the vehicle (The personal profile 254 includes personal information, preferences, past conversation histories, and the like of occupants stored for each occupant [0067]), that match a plurality of output responses corresponding respectively to the sample data (the conversation manager 224 sets certainty factors of response results having high degrees of matching with the interests of the occupant P to be high with reference to the personal profile 254. For example, when an interest of the occupant P is “dining,” the conversation manager 224 sets a certainty factor of “Italian restaurant” to be higher than those of other information. The conversation manager 224 may set higher certainty factors for higher evaluation results (recommendation degrees) of general users with respect to establishments acquired from the various web server 300. [0093]);
performing multi-task learning (Each of the plurality of agent functions 150-1 to 150-3 determines response details on the basis of the personal profile 254, the knowledge base DB 256 and the response rule DB 258 provided in the storage 250 thereof and determines a certainty factor for the response details, for example. [0092]) based on input data (it is assumed that, when a command of “Where are recently popular establishments?” has been received from the occupant P, the conversation manager 224 has acquired information of “clothing shop,” “shoes shop” and “Italian restaurant” from the various web server 300 as information corresponding to the command through the network retriever 226. Here, the conversation manager 224 sets certainty factors of response results having high degrees of matching with the interests of the occupant P to be high with reference to the personal profile 254. For example, when an interest of the occupant P is “dining,” the conversation manager 224 sets a certainty factor of “Italian restaurant” to be higher than those of other information. [0093]) and the sample data (The conversation manager 224 may refer to the personal profile 254, refer to whether there have been the same questions in a history of recent conversations (for example, within one month), and when there have been the same questions, set certainty factors of response details the same as replies to the questions to be high. The history of conversations may be a history of conversations with the occupant P who has spoken or a history of conversations included in the personal profile 254 other than the occupant P. The conversation manager 224 may combine the above-described plurality of certainty factor setting conditions and set certainty factors. [0096]);
based on the multi-task learning, determining a sample datum (When the voice stream is acquired, the speech recognizer 220 performs speech recognition and outputs text information and the natural language processor 222 performs semantic interpretation on the text information with reference to the dictionary DB 252 [0065]; When text such as “Today's weather” or “How is the weather today?” is recognized as a speech recognition result, for example, the natural language processor 222 generates an internal state in which a user intention has been replaced with “Weather: today.” Accordingly, even when request speech includes variations in text and differences in wording, it is possible to easily make a conversation suitable for the request. The natural language processor 222 may recognize the meaning of text information using artificial intelligence processing such as machine learning processing using probabilities and generate a command based on a recognition result, for example. [0066]), from among the stored sample data, that corresponds to input data received, via the communication channel (fig. 4, data transferred from 150-1 to 210) and from the input device of the vehicle (the agent function 150-1 transmits a voice stream or a voice stream on which processing such as compression or encoding has been performed, acquired from the microphone 10, the audio processor 112, or the like to the agent server 200-1. [0063]);
determining, from the plurality of output responses, an output response (The response sentence generator 228 generates a response sentence [0070]) that matches the determined sample datum (The conversation manager 224 determines details of a response (for example, details of an utterance for the occupant and an image to be output) for the occupant of the vehicle M with reference to the personal profile 254, the knowledge base DB 256 and the response rule DB 258 on the basis of an input command [0067]);
sending, via the communication channel to the output device of the vehicle (fig. 4, data transferred from 210 to 150-1), the output response (The response sentence generator 228 generates a response sentence and transmits the generated response sentence (response result) to the agent apparatus 100 such that details of the utterance determined by the conversation manager 224 are delivered to the occupant of the vehicle M [0070]);
identifying, based on the output response (when an interest of the occupant P is “dining,” the conversation manager 224 sets a certainty factor of “Italian restaurant” to be higher than those of other information. The conversation manager 224 may set higher certainty factors for higher evaluation results (recommendation degrees) of general users with respect to establishments acquired from the various web server 300. [0093]), a user intention (the agent function 150-1 selects a response result to be output, for example, on the basis of a certainty factor set for each response result. A certainty factor is, for example, a degree (index value) to which a response result for a request (command) included in an utterance of the occupant P is presumed to be a correct response. The certainty factor is, for example, a degree to which a response to an utterance of the occupant is presumed to be a response matching a request of the occupant or expected by the occupant [0092]); and
causing, based on the identified user intention (When it is determined that the certainty factor of the response result is not less than the threshold value in the process of step S202, the first agent function causes the output to output the generated response result (step S208) [0112]), the vehicle to be controlled (The agent apparatus 100 performs control with respect to a vehicle apparatus 50, and the like on the basis of a request from the occupant. [0031]; Agent functions may include a function of performing control of an apparatus in a vehicle (e.g., an apparatus with respect to driving control or vehicle body control), and the like [0026]).
Honda does not explicitly teach the following; however, Kim teaches:
performing multi-task learning (The direct comparison method is a method of setting a unit of a recognition target word, a phoneme, etc. as a feature vector model and comparing how similar the input speech is, and a vector quantization method is typically used [page 15]) based on the input data (“input speech”) and the sample data (“reference model”), wherein the multi-task learning uses a plurality of representative questions corresponding to the input data as output data (According to the vector quantization method, a feature vector of input speech data is mapped to a codebook, which is a reference model, and is encoded into a representative value, thereby comparing the code values with each other. [page 15]);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Honda to include the teachings of Kim with a reasonable expectation of success. Kim teaches the benefit that “shifting the gaze and releasing the steering wheel as a means for the user to visually identify information or operate the device during operation is a threat to safe driving. Therefore, it is expected that a conversation system that grasps a user's intention through a conversation with a user and provides a necessary service to the user can provide a safer and more convenient service when applied to a vehicle” [Kim, page 2].
Regarding claim 20:
Honda teaches:
A non-transitory computer-readable recording medium storing instructions (fig. 4, storage 250) that, when executed by a system, cause the system to perform operations comprising:
establishing a communication channel, via a wireless interface (fig. 1, network NW), between:
a server (fig. 1, agent servers 200 and web servers 300), and
an input device of a vehicle (fig. 2, microphone 10) and an output device of the vehicle (fig. 2, display 20 and speaker 30);
storing sample data (fig. 4, storage 250), associated with the vehicle (The personal profile 254 includes personal information, preferences, past conversation histories, and the like of occupants stored for each occupant [0067]), that match a plurality of output responses corresponding respectively to the sample data (the conversation manager 224 sets certainty factors of response results having high degrees of matching with the interests of the occupant P to be high with reference to the personal profile 254. For example, when an interest of the occupant P is “dining,” the conversation manager 224 sets a certainty factor of “Italian restaurant” to be higher than those of other information. The conversation manager 224 may set higher certainty factors for higher evaluation results (recommendation degrees) of general users with respect to establishments acquired from the various web server 300. [0093]);
performing multi-task learning (Each of the plurality of agent functions 150-1 to 150-3 determines response details on the basis of the personal profile 254, the knowledge base DB 256 and the response rule DB 258 provided in the storage 250 thereof and determines a certainty factor for the response details, for example. [0092]) based on input data (it is assumed that, when a command of “Where are recently popular establishments?” has been received from the occupant P, the conversation manager 224 has acquired information of “clothing shop,” “shoes shop” and “Italian restaurant” from the various web server 300 as information corresponding to the command through the network retriever 226. Here, the conversation manager 224 sets certainty factors of response results having high degrees of matching with the interests of the occupant P to be high with reference to the personal profile 254. For example, when an interest of the occupant P is “dining,” the conversation manager 224 sets a certainty factor of “Italian restaurant” to be higher than those of other information. [0093]) and the sample data (The conversation manager 224 may refer to the personal profile 254, refer to whether there have been the same questions in a history of recent conversations (for example, within one month), and when there have been the same questions, set certainty factors of response details the same as replies to the questions to be high. The history of conversations may be a history of conversations with the occupant P who has spoken or a history of conversations included in the personal profile 254 other than the occupant P. The conversation manager 224 may combine the above-described plurality of certainty factor setting conditions and set certainty factors. [0096]);
based on the multi-task learning, determining a sample datum (When the voice stream is acquired, the speech recognizer 220 performs speech recognition and outputs text information and the natural language processor 222 performs semantic interpretation on the text information with reference to the dictionary DB 252 [0065]; When text such as “Today's weather” or “How is the weather today?” is recognized as a speech recognition result, for example, the natural language processor 222 generates an internal state in which a user intention has been replaced with “Weather: today.” Accordingly, even when request speech includes variations in text and differences in wording, it is possible to easily make a conversation suitable for the request. The natural language processor 222 may recognize the meaning of text information using artificial intelligence processing such as machine learning processing using probabilities and generate a command based on a recognition result, for example. [0066]), from among the stored sample data, that corresponds to input data received, via the communication channel (fig. 4, data transferred from 150-1 to 210) and from the input device of the vehicle (the agent function 150-1 transmits a voice stream or a voice stream on which processing such as compression or encoding has been performed, acquired from the microphone 10, the audio processor 112, or the like to the agent server 200-1. [0063]);
determining, from the plurality of output responses, an output response (The response sentence generator 228 generates a response sentence [0070]) that matches the determined sample datum (The conversation manager 224 determines details of a response (for example, details of an utterance for the occupant and an image to be output) for the occupant of the vehicle M with reference to the personal profile 254, the knowledge base DB 256 and the response rule DB 258 on the basis of an input command [0067]);
sending, via the communication channel to the output device of the vehicle (fig. 4, data transferred from 210 to 150-1), the output response (The response sentence generator 228 generates a response sentence and transmits the generated response sentence (response result) to the agent apparatus 100 such that details of the utterance determined by the conversation manager 224 are delivered to the occupant of the vehicle M [0070]);
identifying, based on the output response (when an interest of the occupant P is “dining,” the conversation manager 224 sets a certainty factor of “Italian restaurant” to be higher than those of other information. The conversation manager 224 may set higher certainty factors for higher evaluation results (recommendation degrees) of general users with respect to establishments acquired from the various web server 300. [0093]), a user intention (the agent function 150-1 selects a response result to be output, for example, on the basis of a certainty factor set for each response result. A certainty factor is, for example, a degree (index value) to which a response result for a request (command) included in an utterance of the occupant P is presumed to be a correct response. The certainty factor is, for example, a degree to which a response to an utterance of the occupant is presumed to be a response matching a request of the occupant or expected by the occupant [0092]); and
causing, based on the identified user intention (When it is determined that the certainty factor of the response result is not less than the threshold value in the process of step S202, the first agent function causes the output to output the generated response result (step S208) [0112]), the vehicle to be controlled (The agent apparatus 100 performs control with respect to a vehicle apparatus 50, and the like on the basis of a request from the occupant. [0031]; Agent functions may include a function of performing control of an apparatus in a vehicle (e.g., an apparatus with respect to driving control or vehicle body control), and the like [0026]).
Honda does not explicitly teach the following; however, Kim teaches:
performing multi-task learning (The direct comparison method is a method of setting a unit of a recognition target word, a phoneme, etc. as a feature vector model and comparing how similar the input speech is, and a vector quantization method is typically used [page 15]) based on the input data (“input speech”) and the sample data (“reference model”), wherein the multi-task learning uses a plurality of representative questions corresponding to the input data as output data (According to the vector quantization method, a feature vector of input speech data is mapped to a codebook, which is a reference model, and is encoded into a representative value, thereby comparing the code values with each other. [page 15]);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Honda to include the teachings of Kim with a reasonable expectation of success. Kim teaches the benefit that “shifting the gaze and releasing the steering wheel as a means for the user to visually identify information or operate the device during operation is a threat to safe driving. Therefore, it is expected that a conversation system that grasps a user's intention through a conversation with a user and provides a necessary service to the user can provide a safer and more convenient service when applied to a vehicle” [Kim, page 2].
Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Honda et al. (US 2020/0321006), hereinafter Honda, in view of Kim et al. (KR 20190011458), hereinafter Kim, and in further view of Kohita (US 2021/0365485), hereinafter Kohita, and Lin et al. (US 2024/0029132), hereinafter Lin.
Regarding claim 3:
Honda in view of Kim teaches all the limitations of claim 2, upon which this claim is dependent.
Honda in view of Kim does not explicitly teach the following; however, Kohita teaches:
perform global encoding on the input sequence (the global encoding is calculated based on each of the local encodings of the words in a self-attention fashion, and determining, via an editorial agent, a Q-value for each of the words in terms of each of three actions based on the status. [0004]); and
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Honda in view of Kim to include the teachings of Kohita with a reasonable expectation of success. Kohita teaches the benefit that “Text summarization refers to the technique of shortening long pieces of text. The intention is to create a coherent and fluent summary having only the main points outlined in the document. Automatic text summarization is a common problem in machine learning and natural language processing (NLP)” [Kohita, 0003].
Honda in view of Kim and Kohita does not explicitly teach the following; however, Lin teaches:
perform bidirectional encoding on the global encoded input sequence (a language model may be based on a bi-directional encoding language model, such as BERT, which may include a conditional random field (e.g., a BERT-CRF model) [0066]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Honda in view of Kim and Kohita to include the teachings of Lin with a reasonable expectation of success. Lin teaches the benefit that “the trained attribute extraction model for a first category may be applied to items in a second category and then, based on the predicted likelihood or frequency of attribute values in items of the second category (as predicted by the extraction model of the first category), identify additional attributes and/or attribute values for the second category. After determining additional attributes or attribute values for items in a category, they then may be used to augment the attribute schema of the category and/or provide additional attributes and attribute values to be identified in items of the category, enabling further labeling and improved use of the items based on the additional attribute labels. For example, further item evaluation or search may be improved by the additional attributes determined in the augmented attribute schema” [Lin, 0004].
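For context on the relied-upon encoding teachings, the cited pipeline (Kohita's global encoding calculated from local word encodings in a self-attention fashion, followed by a bidirectional pass over the globally encoded sequence, as relied upon from Lin) can be sketched at a shape level as follows. The random weights stand in for trained parameters, and this toy bidirectional pass is an illustration of left-and-right context, not the BERT model itself:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_encode(local, rng):
    """Self-attention over local word encodings: each word attends to all words."""
    d = local.shape[1]
    # Hypothetical random projections standing in for learned query/key/value weights
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    q, k, v = local @ Wq, local @ Wk, local @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))  # (n_words, n_words) attention weights
    return attn @ v                       # globally contextualized encodings

def bidirectional_encode(seq):
    """Toy bidirectional pass: concatenate forward and backward cumulative context."""
    fwd = np.cumsum(seq, axis=0)              # left-to-right context
    bwd = np.cumsum(seq[::-1], axis=0)[::-1]  # right-to-left context
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(0)
local = rng.standard_normal((5, 4))  # 5 "words", 4-dim local encodings
out = bidirectional_encode(global_encode(local, rng))
# out has shape (5, 8): each position carries bidirectional, globally attended context
```

The sketch shows only the order of operations mapped in the rejection: local encodings are first combined globally via self-attention, and the bidirectional encoding is then performed on that globally encoded sequence.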
Regarding claim 12:
Honda in view of Kim teaches all the limitations of claim 11, upon which this claim is dependent.
Honda in view of Kim does not explicitly teach, however Kohita teaches:
perform global encoding on the input sequence (the global encoding is calculated based on each of the local encodings of the words in a self-attention fashion, and determining, via an editorial agent, a Q-value for each of the words in terms of each of three actions based on the status. [0004]); and
It would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to have modified Honda in view of Kim to include the teachings as taught by Kohita with a reasonable expectation of success. Kohita teaches the benefit of “Text summarization refers to the technique of shortening long pieces of text. The intention is to create a coherent and fluent summary having only the main points outlined in the document. Automatic text summarization is a common problem in machine learning and natural language processing (NLP) [Kohita, 0003]”.
Honda in view of Kim and Kohita does not explicitly teach, however Lin teaches:
perform bidirectional encoding on the global encoded input sequence (a language model may be based on a bi-directional encoding language model, such as BERT, which may include a conditional random field (e.g., a BERT-CRF model) [0066]).
It would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to have modified Honda in view of Kim and Kohita to include the teachings as taught by Lin with a reasonable expectation of success. Lin teaches the benefit of “the trained attribute extraction model for a first category may be applied to items in a second category and then, based on the predicted likelihood or frequency of attribute values in items of the second category (as predicted by the extraction model of the first category), identify additional attributes and/or attribute values for the second category. After determining additional attributes or attribute values for items in a category, they then may be used to augment the attribute schema of the category and/or provide additional attributes and attribute values to be identified in items of the category, enabling further labeling and improved use of the items based on the additional attribute labels. For example, further item evaluation or search may be improved by the additional attributes determined in the augmented attribute schema. [Lin, 0004]”.
Claim(s) 4-5 and 13-14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Honda et al. (US 2020/0321006), herein Honda, in view of Kim et al. (KR 20190011458), herein Kim, in further view of Tong et al. (US 2024/0202535), herein Tong.
Regarding Claim 4:
Honda in view of Kim teaches all the limitations of claim 2, upon which this claim is dependent.
Honda in view of Kim does not explicitly teach, however Tong teaches:
calculate a loss value of the classified sample datum (the first preset condition may be set to that a loss value of the first model tends to converge, or that a loss value of the first model is less than a preset value [0122]); and
adjust, based on the calculated loss value, a weight of a deep learning model used for the multi-task learning (The transformer model is a deep learning model that weights all parts of input data based on an attention mechanism [0004]).
It would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to have modified Honda in view of Kim to include the teachings as taught by Tong with a reasonable expectation of success. Tong teaches the benefit of “the model training system adds an additional supervision signal to training of the first model by using the second output obtained by performing inference on the training data by the second model that is complementary to the first model in performance, and promotes the first model to learn from the second model complementary to the first model, so that the first model can accelerate convergence, and does not need to be pre-trained on a large-scale data set, to greatly shorten training time, improve training efficiency of the first model, and meet a service requirement. [Tong, 0009]”.
Regarding Claim 5:
Honda in view of Kim teaches all the limitations of claim 2, upon which this claim is dependent.
Honda in view of Kim does not explicitly teach, however Tong teaches:
calculate a first loss value for the classified sample datum (The model training system 100 determines a first contrastive loss based on the first feature extracted by the first model from the training data [0109]) and a second loss (determine a second contrastive loss [0125]) value for contrastive learning (a learning target similar to that in a contrastive learning manner is added to the model training system 100, and the model training system 100 adds an additional supervision signal to training of another AI model by using a feature learned from an AI model [0153]).
It would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to have modified Honda in view of Kim to include the teachings as taught by Tong with a reasonable expectation of success. Tong teaches the benefit of “the model training system adds an additional supervision signal to training of the first model by using the second output obtained by performing inference on the training data by the second model that is complementary to the first model in performance, and promotes the first model to learn from the second model complementary to the first model, so that the first model can accelerate convergence, and does not need to be pre-trained on a large-scale data set, to greatly shorten training time, improve training efficiency of the first model, and meet a service requirement. [Tong, 0009]”.
Regarding Claim 13:
Honda in view of Kim teaches all the limitations of claim 11, upon which this claim is dependent.
Honda in view of Kim does not explicitly teach, however Tong teaches:
calculating a loss value of the classified sample datum (the first preset condition may be set to that a loss value of the first model tends to converge, or that a loss value of the first model is less than a preset value [0122]); and
adjust, based on the calculated loss value, a weight of a deep learning model used for the multi-task learning (The transformer model is a deep learning model that weights all parts of input data based on an attention mechanism [0004]).
It would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to have modified Honda in view of Kim to include the teachings as taught by Tong with a reasonable expectation of success. Tong teaches the benefit of “the model training system adds an additional supervision signal to training of the first model by using the second output obtained by performing inference on the training data by the second model that is complementary to the first model in performance, and promotes the first model to learn from the second model complementary to the first model, so that the first model can accelerate convergence, and does not need to be pre-trained on a large-scale data set, to greatly shorten training time, improve training efficiency of the first model, and meet a service requirement. [Tong, 0009]”.
Regarding Claim 14:
Honda in view of Kim teaches all the limitations of claim 11, upon which this claim is dependent.
Honda in view of Kim does not explicitly teach, however Tong teaches:
calculating a first loss value for the classified sample datum (The model training system 100 determines a first contrastive loss based on the first feature extracted by the first model from the training data [0109]) and a second loss (determine a second contrastive loss [0125]) value for contrastive learning (a learning target similar to that in a contrastive learning manner is added to the model training system 100, and the model training system 100 adds an additional supervision signal to training of another AI model by using a feature learned from an AI model [0153]).
It would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to have modified Honda in view of Kim to include the teachings as taught by Tong with a reasonable expectation of success. Tong teaches the benefit of “the model training system adds an additional supervision signal to training of the first model by using the second output obtained by performing inference on the training data by the second model that is complementary to the first model in performance, and promotes the first model to learn from the second model complementary to the first model, so that the first model can accelerate convergence, and does not need to be pre-trained on a large-scale data set, to greatly shorten training time, improve training efficiency of the first model, and meet a service requirement. [Tong, 0009]”.
Claim(s) 6 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Honda et al. (US 2020/0321006), herein Honda, in view of Kim et al. (KR 20190011458), herein Kim, in further view of Tong et al. (US 2024/0202535), herein Tong, in further view of Weng (US 2023/0099906), herein Weng.
Regarding Claim 6:
Honda in view of Kim and Tong teaches all the limitations of claim 5, upon which this claim is dependent.
Honda in view of Kim and Tong does not explicitly teach, however Weng teaches:
sum up the first loss value and the second loss value to obtain a total loss value (calculating a weighted sum of the image similarity loss value and the translation amount loss value to obtain a total loss value [0015]); and
adjust, based on the total loss value (based on the total loss value to obtain the image registration model [0015]), a weight of a deep learning model used for the multi-task learning (adjusting the model parameters of the deep learning model based on the total loss value to obtain the image registration model [0015]).
It would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to have modified Honda in view of Kim and Tong to include the teachings as taught by Weng with a reasonable expectation of success. Weng teaches the benefit of “training a deep learning model by using one sample group including a moving image sample and a reference image sample to obtain a transformation matrix, generating an auxiliary moving image and an auxiliary reference image according to a predetermined size, the moving image sample, and the reference image sample, performing an image transformation processing on the auxiliary moving image according to the transformation matrix to obtain a transformed auxiliary moving image, calculating an image similarity loss value according to the transformed auxiliary moving image and the auxiliary reference image, calculating a translation amount loss value according to an anatomical key point in the moving image sample and a corresponding anatomical key point in the reference image sample, adjusting model parameters of the deep learning model based on the image similarity loss value and the translation amount loss value to obtain the image registration model, and returning to perform step of training the deep learning model by using one sample group including the moving image sample and the reference image sample to obtain the transformation matrix, till a training loss value is convergent and less than a loss value threshold to obtain the pre-trained image registration model [Weng, 0012].”
Regarding Claim 15:
Honda in view of Kim and Tong teaches all the limitations of claim 14, upon which this claim is dependent.
Honda in view of Kim and Tong does not explicitly teach, however Weng teaches:
summing up the first loss value and the second loss value to obtain a total loss value (calculating a weighted sum of the image similarity loss value and the translation amount loss value to obtain a total loss value [0015]); and
adjusting, based on the total loss value (based on the total loss value to obtain the image registration model [0015]), a weight of a deep learning model used for the multi-task learning (adjusting the model parameters of the deep learning model based on the total loss value to obtain the image registration model [0015]).
It would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to have modified Honda in view of Kim and Tong to include the teachings as taught by Weng with a reasonable expectation of success. Weng teaches the benefit of “training a deep learning model by using one sample group including a moving image sample and a reference image sample to obtain a transformation matrix, generating an auxiliary moving image and an auxiliary reference image according to a predetermined size, the moving image sample, and the reference image sample, performing an image transformation processing on the auxiliary moving image according to the transformation matrix to obtain a transformed auxiliary moving image, calculating an image similarity loss value according to the transformed auxiliary moving image and the auxiliary reference image, calculating a translation amount loss value according to an anatomical key point in the moving image sample and a corresponding anatomical key point in the reference image sample, adjusting model parameters of the deep learning model based on the image similarity loss value and the translation amount loss value to obtain the image registration model, and returning to perform step of training the deep learning model by using one sample group including the moving image sample and the reference image sample to obtain the transformation matrix, till a training loss value is convergent and less than a loss value threshold to obtain the pre-trained image registration model [Weng, 0012].”
Claim(s) 7-9 and 16-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Honda et al. (US 2020/0321006), herein Honda, in view of Kim et al. (KR 20190011458), herein Kim, and Tong et al. (US 2024/0202535), herein Tong, in further view of Aguilar (US 2021/0104245), herein Aguilar.
Regarding Claim 7:
Honda in view of Kim and Tong teaches all the limitations of claim 5, upon which this claim is dependent.
Honda in view of Kim and Tong does not explicitly teach, however Aguilar teaches:
calculate the second loss value (The contrastive module 640 also determines a contrastive loss value based on the comparison of the acoustic view loss value and the lexical view loss value. The contrastive module 640 updates the model data 622 using the lexical view loss value and/or the contrastive loss value, thus embedding a portion of the information/data learned by the lexical view/ML model 630 into the trained model 515. [0121]) based on a hidden state (the hidden state [0119]), a positive sample (positive samples [0122]), and a negative sample for the input data (the negative samples [0123]).
It would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to have modified Honda in view of Kim and Tong to include the teachings as taught by Aguilar with a reasonable expectation of success. Aguilar teaches the benefit of being able to “identify portions of the input audio data that represents speech from a particular user. The portions of the input audio data may be processed using a trained machine learning (ML) model to predict a sentiment category for the audio data. The sentiment category may be used in various applications. For example, the sentiment category may be displayed to a user to indicate his or her sentiments during interactions with other persons, and/or to indicate his or her sentiment during particular times of the day. The sentiment category may also be used by application developers for voice-activated systems or smart speaker systems to identify emotions and/or sentiments of a user while interacting with the voice-activated system or smart speaker system. The application developer may be able to determine a user's satisfaction of his or her interactions with the voice-activated system or smart speaker system. Assuming user permission, other components may also receive sentiment data for different operations. [Aguilar, 0026]”.
Regarding Claim 8:
Honda in view of Kim, Tong and Aguilar teaches all the limitations of claim 7, upon which this claim is dependent.
Aguilar further teaches:
wherein the positive sample includes a vector corresponding to a correct output response based on classifying the sample datum (These data vectors are used as positive samples of the triplets in the following contrastive loss function employed by the contrastive module 640 [0122]), and wherein the instructions, when executed by the one or more processors, further cause the system to determine the negative sample based on fact scores of a plurality of vector outputs and based on classifying the sample datum (a first word in the utterance may have a negative semantic (derived from the lexical feature vector) and the corresponding acoustic representation of the first word also indicates a high arousal implying anger, in which case the attention model 620, 635 is trained to realize that the first word corresponds to an anger sentiment and that should affect the processing of the other words in the utterance [0118]).
Regarding Claim 9:
Honda in view of Kim, Tong and Aguilar teaches all the limitations of claim 8, upon which this claim is dependent.
Kim further teaches:
wherein each of the vector corresponding to the correct output response and the predetermined number of vectors corresponds to one of the plurality of representative questions corresponding to the input data (The direct comparison method is a method of setting a unit of a recognition target word, a phoneme, etc. as a feature vector model and comparing how similar the input speech is, and a vector quantization method is typically used. According to the vector quantization method, a feature vector of input speech data is mapped to a codebook, which is a reference model, and is encoded into a representative value, thereby comparing the code values with each other. [page 15]).
Aguilar further teaches:
wherein the negative sample includes a predetermined number of vectors having a highest fact score, excluding a vector corresponding to the correct output response, among the fact scores of the plurality of vector outputs based on classifying the sample datum (where the + and − superscripts refer to positive and negative samples and dis is a distance function that calculates the similarity between two vectors. The system thus uses similar samples that are as close as possible and dissimilar ones that are as far as possible, forcing a margin of at least m for the negative samples. In some embodiments, the negative samples may be randomly chosen to force a different sentiment category and to force a different sentiment category that is acoustically similar to the positive sample (e.g., sadness vs. neutral, or anger vs. happiness) [0123]).
Regarding Claim 16:
Honda in view of Kim and Tong teaches all the limitations of claim 14, upon which this claim is dependent.
Honda in view of Kim and Tong does not explicitly teach, however Aguilar teaches:
calculating the second loss value (The contrastive module 640 also determines a contrastive loss value based on the comparison of the acoustic view loss value and the lexical view loss value. The contrastive module 640 updates the model data 622 using the lexical view loss value and/or the contrastive loss value, thus embedding a portion of the information/data learned by the lexical view/ML model 630 into the trained model 515. [0121]) based on a hidden state (the hidden state [0119]), a positive sample (positive samples [0122]), and a negative sample for the input data (the negative samples [0123]).
It would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to have modified Honda in view of Kim and Tong to include the teachings as taught by Aguilar with a reasonable expectation of success. Aguilar teaches the benefit of being able to “identify portions of the input audio data that represents speech from a particular user. The portions of the input audio data may be processed using a trained machine learning (ML) model to predict a sentiment category for the audio data. The sentiment category may be used in various applications. For example, the sentiment category may be displayed to a user to indicate his or her sentiments during interactions with other persons, and/or to indicate his or her sentiment during particular times of the day. The sentiment category may also be used by application developers for voice-activated systems or smart speaker systems to identify emotions and/or sentiments of a user while interacting with the voice-activated system or smart speaker system. The application developer may be able to determine a user's satisfaction of his or her interactions with the voice-activated system or smart speaker system. Assuming user permission, other components may also receive sentiment data for different operations. [Aguilar, 0026]”.
Regarding Claim 17:
Honda in view of Kim, Tong and Aguilar teaches all the limitations of claim 16, upon which this claim is dependent.
Aguilar further teaches:
wherein the positive sample includes a vector corresponding to a correct output response based on classifying the sample datum (These data vectors are used as positive samples of the triplets in the following contrastive loss function employed by the contrastive module 640 [0122]), and wherein the instructions, when executed by the one or more processors, further cause the system to determine the negative sample based on fact scores of a plurality of vector outputs and based on classifying the sample datum (a first word in the utterance may have a negative semantic (derived from the lexical feature vector) and the corresponding acoustic representation of the first word also indicates a high arousal implying anger, in which case the attention model 620, 635 is trained to realize that the first word corresponds to an anger sentiment and that should affect the processing of the other words in the utterance [0118]).
Regarding Claim 18:
Honda in view of Kim, Tong and Aguilar teaches all the limitations of claim 17, upon which this claim is dependent.
Aguilar further teaches:
wherein the negative sample includes a predetermined number of vectors having a highest fact score, excluding a vector corresponding to the correct output response, among the fact scores of the plurality of vector outputs based on classifying the sample datum (where the + and − superscripts refer to positive and negative samples and dis is a distance function that calculates the similarity between two vectors. The system thus uses similar samples that are as close as possible and dissimilar ones that are as far as possible, forcing a margin of at least m for the negative samples. In some embodiments, the negative samples may be randomly chosen to force a different sentiment category and to force a different sentiment category that is acoustically similar to the positive sample (e.g., sadness vs. neutral, or anger vs. happiness) [0123]).
Claim(s) 10 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Honda et al. (US 2020/0321006), herein Honda, in view of Kim et al. (KR 20190011458), herein Kim, in further view of Zhou et al. (US 2021/0357441), herein Zhou.
Regarding Claim 10:
Honda in view of Kim teaches all the limitations of claim 1, upon which this claim is dependent.
Honda in view of Kim does not explicitly teach, however Zhou teaches:
wherein the sample data stored in the memory include a frequently asked question (FAQ) related to the vehicle (the program may monitor a user's activity on a help center page that provides a list of frequently asked questions (FAQs) and answers to the FAQs. One of the sources for the FAQs may be online discussion communities. The program may monitor which FAQs have been viewed or selected within a certain time period [0084]).
It would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to have modified Honda in view of Kim to include the teachings as taught by Zhou with a reasonable expectation of success. Zhou teaches the benefit of an augmentation machine learning model that may be configured to determine one or more variations of the user query that correspond to a semantic meaning of the user query. A plurality of response candidates may be determined that correspond to the user query by comparing the user query and the one or more variations of the user query to a plurality of documents. A semantic machine learning model may be configured to determine a final response candidate based on performing a semantic comparison between the plurality of response candidates and at least the user query.
Regarding Claim 19:
Honda in view of Kim teaches all the limitations of claim 11, upon which this claim is dependent.
Honda in view of Kim does not explicitly teach, however Zhou teaches:
wherein the sample data stored in the memory include a frequently asked question (FAQ) related to the vehicle (the program may monitor a user's activity on a help center page that provides a list of frequently asked questions (FAQs) and answers to the FAQs. One of the sources for the FAQs may be online discussion communities. The program may monitor which FAQs have been viewed or selected within a certain time period [0084]).
It would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the claimed invention to have modified Honda in view of Kim to include the teachings as taught by Zhou with a reasonable expectation of success. Zhou teaches the benefit of an augmentation machine learning model that may be configured to determine one or more variations of the user query that correspond to a semantic meaning of the user query. A plurality of response candidates may be determined that correspond to the user query by comparing the user query and the one or more variations of the user query to a plurality of documents. A semantic machine learning model may be configured to determine a final response candidate based on performing a semantic comparison between the plurality of response candidates and at least the user query.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Chen (US 2020/0075006) discloses a method, device, and system for interfacing with a terminal with a plurality of response modes. The method includes obtaining a voice command from a user, determining context information corresponding to the voice command, and performing a response operation in response to the voice command, the response operation being based at least in part on a response mode that is determined based at least in part on the context information, and the response mode indicating one or more interfaces for interaction between the terminal and the user.
Li (US 2020/0041993) discloses a system and method for a user to navigate an autonomous vehicle manually or via verbal commands. In one aspect, seven commands are arranged for manual operation, including start, stop, forward, backward, turn-left, turn-right, and U-turn. A panel with buttons or with a knob and buttons is arranged for implementing the commands. In another aspect, when a verbal input is received from a user, the verbal input is analyzed to ascertain whether it contains any of the seven commands. A command issued manually or verbally causes the same maneuver and effect. In yet other aspects, methods to change a travel route and to predetermine vehicle orientation direction at a parking lot are provided.
Ghosh (US 2021/0370950) discloses a method and interactive assistance system for providing personalized assistance to a driver or person in an autonomous vehicle. Parameters related to the user and the vehicle are monitored and compared with historical data to determine a deviation in the parameters. An abnormal condition is detected when the deviation is more than an optimal threshold. Further, a personalized interaction is initiated with the user through a selected one of the interactive assistance engine and one or more assistive activities are performed for handling the abnormal condition. In an embodiment, the method of the present disclosure enhances both safety and user experience of the user of the autonomous vehicle.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Scott R Jagolinzer whose telephone number is (571)272-4180. The examiner can normally be reached M-Th 8AM - 4PM Eastern.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Christian Chace can be reached at (571)272-4190. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Scott R. Jagolinzer
Examiner
Art Unit 3665
/S.R.J./Examiner, Art Unit 3665 /CHRISTIAN CHACE/Supervisory Patent Examiner, Art Unit 3665