Last updated: May 29, 2026
Application No. 17/874,972
METHOD FOR PROVIDING RESPONSE OF VOICE INPUT AND ELECTRONIC DEVICE SUPPORTING THE SAME

Final Rejection §103
Filed: Jul 27, 2022
Priority: Dec 16, 2020 (RE 10-2020-0176703, +1 more)
Examiner: YOUNG, CAMERON KENNETH
Art Unit: 2655
Tech Center: 2600 (Communications)
Assignee: Samsung Electronics Co., Ltd.
OA Round: 4 (Final)
Interview Optional (+16.1% interview lift)

An interview has already been conducted in this application's prosecution history. This examiner has a 73% grant rate with a +16.1% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 22 resolved cases, 2023–2026
Examiner Intelligence

YOUNG, CAMERON KENNETH
Career Allowance Rate: 73% (above average), 16 granted / 22 resolved, +10.7% vs TC avg
Interview Lift: +16.1% (strong), resolved cases with interview vs. without
Typical timeline: 2y 11m avg prosecution, 10 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§103: 96.0% (+56.0% vs TC avg)
§102: 3.0% (-37.0% vs TC avg)
§112: 1.0% (-39.0% vs TC avg)
Based on career data from 22 resolved cases
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
	Applicant’s amendment dated 08/21/2025 has been entered. No additional claims have been cancelled and no new claims have been added. Claims 1, 3 – 5, 7 – 9, 12 – 14, 16 – 18, and 20 remain pending within the application. 

Response to Arguments
Applicant’s arguments, see pages 10 – 12 of Applicant’s Response, filed 08/21/2025, with respect to interpretation of the claims as a mental process have been fully considered and are persuasive.  The 35 U.S.C. § 101 rejections of claims 1, 3 – 5, 7 – 9, 12 – 14, 16 – 18, and 20 have been withdrawn. 
Particularly, Applicant’s arguments that the technological complexity exceeds the capability of the human mind, especially with regard to the assigning of quantitative weights and retrieval of feature information in real-time operation, demonstrate that although the core of the claims amounts to a mental process, the mental process is not a process the human mind is equipped to perform. As such, these arguments, together with the newly made amendments, overcome the previously made 35 U.S.C. § 101 rejections, which have been withdrawn.

Applicant's arguments regarding the combination of Fan, Metallinou, and Ikeno filed 08/21/2025 have been fully considered but they are not persuasive. Particularly, Applicant argues that the newly amended limitations of the claims are not present within the prior art. Examiner respectfully disagrees. 
As laid out with reference to the 35 U.S.C. § 103 rejections below, the references teach the newly amended limitations. Particularly, Examiner notes that the limitation of “assign at least one weight value to the retrieval data based on user preference information retrieved from a stored user profile by calculating a relevance score for each of a plurality of pieces of the retrieval data according to the degree of similarity between features of each piece of the retrieval data and the user preference information” is taught by Metallinou. Metallinou teaches evaluating and scoring retrieval data (i.e., assigning a weight value to data corresponding to the domains retrieved that may be able to handle a request) by calculating relevance scores for the data corresponding to the domains (retrieval data). Metallinou at 22:43 - 22:64. Further, Metallinou teaches that the speech processing system takes into account user preferences, which include applications that are enabled for the user. Metallinou at 25:41 - 26:4. As such, because the retrieval data determines domains capable of performing the user's query and the user preferences determine what applications are enabled, the retrieval data determining domains capable of performing the user's query is based, at least in part, on user preference information retrieved from a stored user profile, which is based on a relevance to the query, including the applications enabled for the user.
Further, Applicant argues that the references do not disclose “obtain feature information at least partially similar to defined preference information, from the retrieval data based on the at least one weight value, wherein the feature information comprises at least one piece of the retrieval data having a highest relevance score;” Examiner respectfully disagrees. 
Particularly, Examiner notes that Metallinou teaches evaluating and scoring retrieval data (i.e., assigning a weight value to data corresponding to the domains retrieved that may be able to handle a request) by calculating relevance scores for the data corresponding to the domains (retrieval data). Metallinou at 22:43 - 22:64. Further, Metallinou contemplates using the highest confidence score (i.e., relevance score, the relevance scores estimate confidence) to determine the proper domain. Metallinou at 22:43 - 22:64.
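For illustration only, the disputed limitation (a relevance score per piece of retrieval data, computed from similarity to user preference information, with the highest-scoring piece taken as the feature information) can be sketched as follows. This sketch is not drawn from Fan, Metallinou, Ikeno, or the application's specification. The function names, the example data, and the choice of Jaccard similarity as the similarity measure are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


def relevance_score(features: Set[str], preferences: Set[str]) -> float:
    """Jaccard similarity between a piece's features and the user's preferences.

    Jaccard similarity is an assumed stand-in; the claim only recites a
    "degree of similarity" and does not name a measure.
    """
    union = features | preferences
    if not union:
        return 0.0
    return len(features & preferences) / len(union)


def score_retrieval_data(
    retrieval_data: Dict[str, Set[str]], preferences: Set[str]
) -> List[Tuple[str, float]]:
    """Assign a weight (relevance score) to each piece, sorted highest first."""
    scored = [
        (name, relevance_score(features, preferences))
        for name, features in retrieval_data.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def select_feature_information(
    retrieval_data: Dict[str, Set[str]], preferences: Set[str]
) -> str:
    """Return the piece of retrieval data with the highest relevance score."""
    return score_retrieval_data(retrieval_data, preferences)[0][0]


# Hypothetical retrieval results and stored user-profile preferences.
example_retrieval_data = {
    "Trattoria Roma": {"italian", "outdoor seating"},
    "Sushi Go": {"japanese", "takeout"},
    "Pasta Bar": {"italian", "takeout", "vegetarian"},
}
example_preferences = {"italian", "vegetarian"}
best_match = select_feature_information(example_retrieval_data, example_preferences)
```

Under these assumed inputs, the piece sharing the most features with the stored preferences receives the highest score and is selected, which is the behavior the limitation recites. Whether Metallinou's domain confidence scores perform the same operation is the question in dispute.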
Further still, Applicant argues that claim 1 incorporates limitations directed to correcting information within a response and inserting feature information into the response to correct that response. However, the combination of Fan, Metallinou, and Ikeno teaches this limitation. Metallinou teaches a slot filler component that replaces words with more accurate data retrieved to substitute broader elements. Metallinou at 30:12 - 30:27. This, in view of Ikeno’s evaluation of incorrect information in a message, amounts to identifying incorrect information and inserting missing information (i.e., higher detail information inserted in lieu of low information elements). Thus, the 35 U.S.C. § 103 rejections of claims 1, 3 – 5, 7 – 9, 12 – 14, 16 – 18, and 20 are maintained for at least this reason. 
Further, Applicant argues that Fan and Ikeno do not teach elements of the claims directed to determining an intent corresponding to a first or second response based on an intent from the user. Examiner respectfully disagrees. Fan’s teachings of determining a response based on a user’s intent, as laid out below with respect to the 35 U.S.C. § 103 rejections of claim 1, demonstrate that Fan’s intent determination and generation of multiple options for response amount to determining a first or second response based on the user’s intent, because a plurality of responses from which a response is determined includes, at least, a first response and a second response. As such, for all the reasons laid out above and in light of the rejections laid out below, the 35 U.S.C. § 103 rejections of Claims 1, 3 – 5, 7 – 9, 12 – 14, 16 – 18, and 20 are maintained.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3 – 5, 7 – 9, 12 – 14, 16 – 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent No. 11,646,035 B1 to Xing Fan et al. (hereinafter Fan) in view of U.S. Patent No. 10,515,625 B1 to Angeliki Metallinou (hereinafter Metallinou) and in further view of U.S. Patent Application Publication No. 2018/0068659 A1 to Atsushi Ikeno et al. (hereinafter Ikeno).
Regarding Claim 1, Fan teaches an electronic device comprising: (Fan teaches an electronic device comprising one or more elements (i.e., computing components, server components, microphones, etc.). Fan at 34:30 - 34:53 and Fig. 12.)
A microphone; (Fan teaches the electronic device comprising an audio capture component such as microphone 1220. Fan at 35:34 - 36:8 and Fig. 12.)
An output device comprising output circuitry; (Fan teaches the electronic device comprising an audio output component such as a headset, speaker, or other components capable of outputting audio. Fan at 35:34 - 36:8 and Fig. 12.)
At least one processor including processing circuitry; and (Fan teaches the electronic device comprising one or more processors (i.e., at least one) and other computer components connected to the processor. Fan at 34:54 - 35:23 and Fig. 12.)
Memory storing instructions; (Fan teaches the electronic device comprises memory coupled to the processor storing computer executable instructions. Fan at 34:54 - 35:23 and Fig. 12.)
which, when executed individually and/or collectively by the at least one processor, cause the electronic device to: Receive a voice input of a user through the microphone; (Fan teaches receiving first dialog data (i.e., voice input) representing a first user input where the input is spoken (i.e., the input is received through the microphone.) Fan at 4:18 - 4:38.)
Determine whether a response corresponding to the voice input corresponds to a first response provided by retrieving information or a second response …, by analyzing the voice input based on at least one of the identified intention or the at least one parameter; (Fan teaches determining whether or not to provide a response to a user based on an analysis of input dialog data including user input speech and determining which skill to use which may comprise a request for information (i.e., determining whether to provide a first response by retrieving information or a second response which is one of the skills within the skill system). Fan at 8:9 - 8:19 and Fan at 11:23 - 11:42. Further, Fan teaches the electronic device determining whether or not to use a skill based on user preferences stored within a user profile. Fan at 12:22 - 12:67. Further, Fan teaches determining the output data based on the intent and the context data (i.e., analyzing the intent and parameter to determine the output). Fan at 5:24 - 6:51.)
In response to the determination of the response corresponding to the voice input corresponds to the first response, obtain a retrieval data by performing an information retrieval query using at least one parameter identified from the voice input and assign at least one weight value to the retrieval data based on user preference information, (Fan teaches determining a user's intent to interact with a device, then processing the information within the user's speech into pieces of the spoken input to determine the specific actions the user would like the device to take (i.e., in response to determining the user input corresponds to the first response, performing an action). Fan at 10:12 - 10:61. Further, Fan teaches retrieving data using one of the skill components of the device. Fan at 11:23 - 11:42. Further still, Fan teaches determining a first intent (i.e., at least one piece of information) from a set of potential intents and ranking the intents based on contextual data (i.e., assigning weights). Fan at 6:30 - 6:57. Further, Fan teaches determining output data using a second intent that corresponds to user input (i.e., extracting feature information (information corresponding to user input) from the acquired data (retrieval data) based on the given weight). Fan at 6:30 - 6:57. Further still, Fan teaches a skill component including one or more databases wherein the skill component requests data or performs an action requested by the user (i.e., the intent of the user is determined and an information query is performed, or a skill is activated). Fan at 11:23 - 11:42.)
Obtain feature information at least partially similar to a defined preference information, from the retrieval data based on the at least one weight value…; (Fan teaches performing a request for information using one of the skill components. Fan at 11:23 - 11:42. Further, Fan teaches the electronic device comprising personal instances of the skills and storing user profile preferences that determine which skills may be used. (i.e., a skill component enabled by user profile preferences that retrieves information obtains information from a database or other information storage and outputs it to the user based on the user's profile preferences. As such, the feature information from the retrieval data is at least partially similar to a defined preference.) Fan at 11:52 - 12:49.)
Generate the response corresponding to the voice input including the feature information in the response; (Fan teaches using a TTS system to generate synthesized speech to be output as a response to the user input wherein the TTS system is employed by skills on the device to output responses to the user. (i.e., the skills retrieving and outputting information use the TTS system to generate responses to user input). Fan at 11:60 - 12:21.)
Output the generated response through the output device…; (Fan teaches outputting responses to the user in response to user input. (i.e., outputting the generated response through the output device). Fan at 14:13 - 14:24.) and
Determine the response corresponding to the voice input corresponds to the first response or the second response, based on at least one of an intention identified from the voice input by analyzing the voice input, an action determined based on the intention or a feature of at least one of the parameters. (Fan teaches determining a skill and user intent based on the analysis of the dialog data (i.e., voice input) including data representing the intent of the user and determining whether or not to use the skill (i.e., perform a retrieval of information or provide a response using a defined template (e.g., “No result could be retrieved”, “There was an error processing your request”, or a clarifying response such as “Would you like me to search for [an element of the user’s voice input] on [a search engine of user’s preference]”)) based on the user intent (i.e., based on the intention or a feature of the parameter). Fan at 29:1 - 29:25. Further, Fan teaches determining a skill based on user preferences stored in a user profile. Fan at 11:52 - 12:49. As such, determining whether or not to use a skill based on the intent of the user is determining whether to provide a first response or a second response.)
Fan, however, does not teach a second response provided by using a defined template, converting the voice input into text and identify an intention of the user and at least one parameter from the text using an automatic speech recognition module and a natural language understanding module, assigning at least one weight value to the retrieval data based on user preference information retrieved from a stored user profile by calculating a relevance score for each of a plurality of pieces of the retrieval data according to the degree of similarity between features of each piece of the retrieval data and the user preference information, and obtain feature information … wherein the feature information comprises at least one piece of the retrieval data having a highest relevance score, and outputting the generated response … as an audible speech output. 
In a similar field of endeavor (e.g., natural language processing of spoken queries), Metallinou teaches using a response template to respond to a user based on user preferences (i.e., providing a response using a defined template). Metallinou at 20:14 - 20:32. As such, Metallinou's profile storing response templates is analogous to Fan's profile storage of user preferences. Thus, Fan's determination of which skill to use in view of Metallinou's response templates would have provided determining a first response by retrieving information or a second response using a defined template.
Further, Metallinou teaches converting the voice input into text and identify an intention of the user and at least one parameter from the text using an automatic speech recognition module and a natural language understanding module; (Metallinou teaches converting audio to text (generating text transcripts), then analyzing the text to determine the meaning of the utterances (i.e., determine an intent) and, further, named entities and contextual data (i.e., at least one parameter). Metallinou at 2:3 - 2:49. As such, Metallinou’s teachings in view of Fan’s teachings of determining an intent of the user and processing intents and context data (i.e., at least one parameter) amount to the limitations of the claims. Fan at 5:24 - 6:51.)
Further, Metallinou teaches assigning at least one weight value to the retrieval data based on user preference information retrieved from a stored user profile by calculating a relevance score for each of a plurality of pieces of the retrieval data according to the degree of similarity between features of each piece of the retrieval data and the user preference information; (Metallinou teaches evaluating and scoring retrieval data (i.e., assigning a weight value to data corresponding to the domains retrieved that may be able to handle a request) by calculating relevance scores for the data corresponding to the domains (retrieval data). Metallinou at 22:43 - 22:64. Further, Metallinou teaches that the speech processing system takes into account user preferences, which includes applications that are enabled for the user. Metallinou at 25:41 - 26:4. As such, because the retrieval data determines domains capable of performing the user's query and the user preferences determines what applications are enabled, then the retrieval data determining domains capable of performing the user's query are based, at least in part, on user preference information retrieved from a stored user profile which are based on a relevance to the query, including the applications enabled for the user.)
Furthermore, Metallinou teaches, in combination with Fan laid out above, obtaining feature information at least partially similar to defined preference information, from the retrieval data based on the at least one weight value, wherein the feature information comprises at least one piece of the retrieval data having a highest relevance score; (Metallinou teaches evaluating and scoring retrieval data (i.e., assigning a weight value to data corresponding to the domains retrieved that may be able to handle a request) by calculating relevance scores for the data corresponding to the domains (retrieval data). Metallinou at 22:43 - 22:64. Further, Metallinou contemplates using the highest confidence score (i.e., relevance score, the relevance scores estimate confidence) to determine the proper domain. Metallinou at 22:43 - 22:64.)
Further, Metallinou teaches outputting the generated response … as an audible speech output; (Metallinou teaches outputting text as speech output using a text-to-speech system to output audible speech in a dialogue with the user. Metallinou at 10:19 - 10:32.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Fan with the teachings of Metallinou to provide a second response using a defined template. Doing so would have improved entity resolution during natural language processing by providing specific, personalized information for each user as recognized by Metallinou at 20:14 – 20:32.
Fan in view of Metallinou (hereinafter Fan-Metallinou), however, does not teach [determining] whether the generated response needs to be corrected based on whether the generated response includes the feature information; and in response to a determination that the generated response does not include the feature information, correct the generated response to include the feature information.
In a similar field of endeavor (e.g., processing of voice requests and generation of responses in response to the voice requests), Ikeno teaches [determining] whether the generated response needs to be corrected based on whether the generated response includes the feature information; (Ikeno teaches generating text responses, then correcting the responses according to a determined category (i.e., the system determines that the response needs to be corrected based on the category). Ikeno at ¶¶ [0087] - [0094].)
and in response to a determination that the generated response does not include the feature information, correct the generated response to include the feature information. (Ikeno teaches correcting the text responses based on the determined category (i.e., correcting the generated text using feature information.) Ikeno at ¶¶ [0087] - [0094]. Further, Ikeno in view of Metallinou provides this limitation. Metallinou teaches a slot filler component that replaces words with more accurate data retrieved to substitute broader elements. Metallinou at 30:12 - 30:27. This amounts to retrieving feature information and inserting that feature information where that information is not included (i.e., replacing broader elements with more specific elements.))
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Fan-Metallinou with the teachings of Ikeno to provide the limitations of claim 1. Doing so would have improved the communication between the user and the system by providing more accurate conversation analysis and recording as recognized by Ikeno at ¶¶ [0084] – [0094]. Further, Fan-Metallinou and Ikeno teach similar fields of endeavor. As such, a person of ordinary skill in the art would have been motivated to modify Fan-Metallinou with the teachings of Ikeno to provide correcting generated texts based on feature information.
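For illustration only, the correction limitation of claim 1 (determine whether the generated response includes the feature information and, if it does not, correct the response to include it) can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Ikeno, Metallinou, or the application. The function names, the case-insensitive substring check, and the way the missing information is appended are all hypothetical choices.

```python
def needs_correction(response: str, feature_information: str) -> bool:
    """Return True when the response does not mention the feature information.

    A case-insensitive substring check is an assumed, simplistic test.
    """
    return feature_information.lower() not in response.lower()


def correct_response(response: str, feature_information: str) -> str:
    """Insert the feature information only when the response lacks it."""
    if not needs_correction(response, feature_information):
        return response
    # Assumed correction strategy: append the missing information to the
    # end of the sentence rather than regenerating the whole response.
    return f"{response.rstrip('. ')}, including {feature_information}."


# Hypothetical generated responses.
corrected = correct_response("I found a few places nearby.", "Pasta Bar")
unchanged = correct_response("Pasta Bar is open now.", "Pasta Bar")
```

Here the correction is conditioned on the absence of the feature information from the response. The rejection instead maps this step to Ikeno's category-based correction combined with Metallinou's slot filler, so the sketch may help when comparing the claimed trigger against the cited passages.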


Regarding claim 3, Fan-Metallinou in view of Ikeno (hereinafter Fan-Metallinou-Ikeno) teaches all the limitations of claim 1 as laid out above. Further, Fan teaches the electronic device of claim 1 wherein the memory stores information related to types of the response matched with intentions of the user, wherein the memory stores instructions which, when executed individually and/or collectively by the at least one processor, cause the electronic device to: identify whether a type of the response matched with the identified intention based on the information related to types of the response corresponds to the first response or the second response. (Fan teaches system memory comprising a memory for storing computer instructions related to types of responses matched with user intent. Fan at 29:1 - 29:25 and 34:30 - 34:53. Further, Fan teaches determining whether or not to generate a response based on the skill/intent information. Fan at 29:1 - 29:25.)

Regarding claim 4, Fan-Metallinou-Ikeno teaches all the limitations of claim 1 as laid out above. Further, Fan teaches the electronic device of claim 1, wherein the memory stores instructions which, when executed individually and/or collectively by the at least one processor, cause the electronic device to: determine a type of an action for providing the response based on the result of analyzing the voice input, and to determine the response corresponding to the voice input corresponds to the first response or the second response based on the determined type of the action. (Fan teaches determining which skill to use (i.e., type of action to perform) and generating a response based on the skill (i.e., generating a response based on the type of action.) Fan at 29:1 - 29:25 and 34:10 - 34:29. Further, Fan-Metallinou teaches determining the response corresponds to the first response or the second response as laid out above. As such, Fan’s teaching of determining which skill to use and generating a response based on the type of action is determining whether a response corresponds to a first response or a second response.)

Regarding claim 5, Fan-Metallinou-Ikeno teaches all the limitations of claim 1 as laid out above. Further, Fan teaches the electronic device of claim 1 wherein the memory stores instructions which, when executed individually and/or collectively by the at least one processor, cause the electronic device to: determine a feature of the at least one parameter, based on the result of analyzing the voice input, and determine the response corresponding to the voice input corresponds to the first response or the second response based on the determined feature. (Fan teaches determining an intent or meaning associated with a spoken input (i.e., determining a feature of the element) and determining whether or not to generate a response based on skill/intent information. Fan at 17:46 - 17:65 and 29:1 - 29:25. Further, Fan in view of Metallinou teaches determining the response corresponding to the voice input corresponds to the first or second response as laid out above. Thus, Fan in view of Metallinou teaches determining the first or second response based on a feature.)

Regarding claim 7, Fan-Metallinou-Ikeno teaches all the limitations of claim 1 as laid out above. Further, Fan teaches the electronic device of claim 1 wherein the memory stores instructions which, when executed individually and/or collectively by the at least one processor, cause the electronic device to: based on the feature information comprising a plurality of pieces of information, set priority of the plurality of pieces of information, based on the weight given to each of the plurality of pieces of information, and generate the response using the plurality of pieces of information based on the set priority. (Fan teaches ranking a plurality of intents, which were derived from analyzing voice input for meaning and intent (i.e., extracting feature information), based on contextual data, selecting the highest-ranking intent (i.e., assigning a priority to the highest-ranking intent), and invoking a skill based on the user input and intent that responds to the user input (i.e., generating the response based on the highest-ranking intent selected). Fan at 6:30 - 6:57.)

Regarding claim 8, Fan-Metallinou-Ikeno teaches all the limitations of claim 7 as laid out above. Further, Fan teaches the electronic device of claim 7, wherein the memory stores instructions which, when executed individually and/or collectively by the at least one processor, cause the electronic device to: generate the response such that each of a plurality of elements of the response corresponds to any one of the plurality of pieces of information, and determine an arrangement order of the plurality of elements based on the set priority. (Fan teaches ranking intents based on contextual data and selecting the highest-ranking intent (i.e., determining an arrangement order of the plurality of elements based on the priority) and invoking a skill to respond to the user based on the highest-ranking intent (i.e., generating a response based on the selected intent). Fan at 6:30 - 6:57.)

Regarding claim 9, Fan teaches an electronic device comprising: (Fan teaches an electronic device comprising one or more elements (i.e., computing components, server components, microphones, etc.). Fan at 34:30 - 34:53 and Fig. 12.)
A communication circuit; (Fan teaches the electronic device comprising an audio capture component such as microphone 1220 and network communication hardware. (i.e., a network communication circuit and an auditory communication circuit) Fan at 34:30 - 36:8 and Fig. 12.)
At least one processor including processing circuitry; and (Fan teaches the electronic device comprising one or more processors (i.e., at least one) and other computer components connected to the processor. Fan at 34:54 - 35:23 and Fig. 12.)
Memory storing instructions; (Fan teaches the electronic device comprises memory coupled to the processor storing computer executable instructions. Fan at 34:54 - 35:23 and Fig. 12.)
which, when executed individually and/or collectively by the at least one processor, cause the electronic device to: Obtain a voice input from an external electronic device connected through the communication circuit; (Fan teaches receiving auditory input from a secondary device (i.e., an external device) in the environment and sending that information to the original electronic device over a network (i.e., the device is connected through a network communication circuit.) Fan at 14:13 - 14:59.)
Determine whether a response corresponding to the voice input corresponds to a first response provided by retrieving information or a second response …, by analyzing the voice input based on at least one of the identified intention or the at least one parameter; (Fan teaches determining whether or not to provide a response to a user based on an analysis of input dialog data including user input speech and determining which skill to use which may comprise a request for information (i.e., determining whether to provide a first response by retrieving information or a second response which is one of the skills within the skill system). Fan at 8:9 - 8:19 and Fan at 11:23 - 11:42. Further, Fan teaches the electronic device determining whether or not to use a skill based on user preferences stored within a user profile. Fan at 12:22 - 12:67. Further still, Fan teaches determining the output data based on the intent and the context data (i.e., analyzing the intent and parameter to determine the output). Fan at 5:24 - 6:51.)
In response to the determination of the response corresponding to the voice input corresponds to the first response, obtain retrieval data by performing an information retrieval query based on user preference information using at least one parameter identified from the voice input and assign at least one weight value to the retrieval data, (Fan teaches determining a user's intent to interact with a device, then processing the information within the user's speech into pieces of the spoken input to determine the specific actions the user would like the device to take (i.e., in response to determining the user input corresponds to the first response, performing an action). Fan at 10:12 - 10:61. Further, Fan teaches retrieving data using one of the skill components of the device. Fan at 11:23 - 11:42. Further still, Fan teaches determining a first intent (i.e., at least one piece of information) from a set of potential intents and ranking the intents based on contextual data (i.e., assigning weights). Fan at 6:30 - 6:57. Further, Fan teaches determining output data using a second intent that corresponds to user input (i.e., extracting feature information (information corresponding to user input) from the acquired data (retrieval data) based on the given weight). Fan at 6:30 - 6:57. Further still, Fan teaches a skill component including one or more databases wherein the skill component requests data or performs an action requested by the user (i.e., the intent of the user is determined and an information query is performed, or a skill is activated). Fan at 11:23 - 11:42.)
Obtain feature information at least partially similar to a defined preference information from the retrieval data based on the at least one weight value; (Fan teaches performing a request for information using one of the skill components. Fan at 11:23 - 11:42. Further, Fan teaches the electronic device comprising personal instances of the skills and storing user profile preferences that determine which skills may be used. (i.e., a skill component enabled by user profile preferences that retrieves information obtains information from a database or other information storage and outputs it to the user based on the user's profile preferences. As such, the feature information from the retrieval data is at least partially similar to a defined preference.) Fan at 11:52 - 12:49.)
Generate the response corresponding to the voice input including the feature information in the response; (Fan teaches using a TTS system to generate synthesized speech to be output as a response to the user input wherein the TTS system is employed by skills on the device to output responses to the user. (i.e., the skills retrieving and outputting information use the TTS system to generate responses to user input). Fan at 11:60 - 12:21.)
Control the communication circuit to transmit the generated response to the external electronic device; (Fan teaches transmitting, over a network, audio information to be output by a secondary device (i.e., an external device that is connected through a network communication circuit.) Fan at 14:13 - 14:59.)
Determine the response corresponding to the voice input corresponds to the first response or the second response, based on at least one of an intention identified from the voice input by analyzing the voice input, an action determined based on the intention, or a feature of the parameter. (Fan teaches determining a skill and user intent based on the analysis of the dialog data (i.e., voice input) including data representing the intent of the user and determining whether or not to use the skill (i.e., take an action) based on the user intent (i.e., based on the intention or a feature of the parameter). Fan at 29:1 - 29:25. Further, Fan teaches determining a skill based on user preferences stored in a user profile. Fan at 11:52 - 12:49. As such, determining whether or not to use a skill based on the intent of the user is determining whether to provide a first response or a second response.)
Fan, however, does not teach a second response provided using a defined template, converting the voice input into text and identifying an intention of the user and at least one parameter from the text using an automatic speech recognition module and a natural language understanding module, assigning at least one weight value to the retrieval data based on user preference information retrieved from a stored user profile by calculating a relevance score for each of a plurality of pieces of the retrieval data according to the degree of similarity between features of each piece of the retrieval data and the user preference information, obtaining feature information … wherein the feature information comprises at least one piece of the retrieval data having a highest relevance score, inserting feature information into the response, and outputting the generated response … as audible speech output.
In a similar field of endeavor (e.g., natural language processing of spoken queries.), Metallinou teaches using a response template to respond to a user based off of user preferences (i.e., providing a response using a defined template.) Metallinou at 20:14 - 20:32. As such, Metallinou's profile storing response templates is analogous to Fan's profile storage of user preferences. Thus, Fan's determination of which skill to use in view of Metallinou's response templates would have provided determining a first response by retrieving information or a second response using a defined template.
Further, Metallinou teaches converting the voice input into text and identifying an intention of the user and at least one parameter from the text using an automatic speech recognition module and a natural language understanding module (Metallinou teaches converting audio to text (generating text transcripts), then analyzing the text to determine the meaning of the utterances (i.e., determining an intent) as well as named entities and contextual data (i.e., at least one parameter). Metallinou at 2:3 - 2:49. As such, Metallinou’s teachings in view of Fan’s teachings of determining an intent of the user and processing intents and context data (i.e., at least one parameter) amount to the limitations of the claims. Fan at 5:24 - 6:51.).
Further, Metallinou teaches assigning at least one weight value to the retrieval data based on user preference information retrieved from a stored user profile by calculating a relevance score for each of a plurality of pieces of the retrieval data according to the degree of similarity between features of each piece of the retrieval data and the user preference information; (Metallinou teaches evaluating and scoring retrieval data (i.e., assigning a weight value to data corresponding to the domains retrieved that may be able to handle a request) by calculating relevance scores for the data corresponding to the domains (retrieval data). Metallinou at 22:43 - 22:64. Further, Metallinou teaches that the speech processing system takes into account user preferences, which includes applications that are enabled for the user. Metallinou at 25:41 - 26:4. As such, because the retrieval data determines domains capable of performing the user's query and the user preferences determines what applications are enabled, then the retrieval data determining domains capable of performing the user's query are based, at least in part, on user preference information retrieved from a stored user profile which are based on a relevance to the query, including the applications enabled for the user.)
Furthermore, Metallinou teaches, in combination with Fan laid out above, obtaining feature information at least partially similar to defined preference information, from the retrieval data based on the at least one weight value, wherein the feature information comprises at least one piece of the retrieval data having a highest relevance score; (Metallinou teaches evaluating and scoring retrieval data (i.e., assigning a weight value to data corresponding to the domains retrieved that may be able to handle a request) by calculating relevance scores for the data corresponding to the domains (retrieval data). Metallinou at 22:43 - 22:64. Further, Metallinou contemplates using the highest confidence score (i.e., relevance score, the relevance scores estimate confidence) to determine the proper domain. Metallinou at 22:43 - 22:64.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Fan with the teachings of Metallinou to provide a second response using a defined template. Doing so would have improved entity resolution during natural language processing by providing specific, personalized information for each user as recognized by Metallinou at 20:14 – 20:32.
Fan in view of Metallinou (hereinafter Fan-Metallinou), however, does not teach [determining] whether the generated response needs to be corrected based on whether the generated response includes the feature information; and in response to a determination that the generated response does not include the feature information, correct the generated response to include the feature information.
In a similar field of endeavor (e.g., processing of voice requests and generation of responses in response to the voice requests), Ikeno teaches [determining] whether the generated response needs to be corrected based on whether the generated response includes the feature information; (Ikeno teaches generating text responses, then correcting the responses according to a determined category (i.e., the system determines that the response needs to be corrected based on the category.) Ikeno at ¶¶ [0087] - [0094].)
and in response to a determination that the generated response does not include the feature information, correct the generated response to include the feature information. (Ikeno teaches correcting the text responses based on the determined category (i.e., correcting the generated text using feature information.) Ikeno at ¶¶ [0087] - [0094]. Further, Ikeno in view of Metallinou provides this limitation. Metallinou teaches a slot filler component that replaces words with more accurate data retrieved to substitute broader elements. Metallinou at 30:12 - 30:27. This amounts to retrieving feature information and inserting that feature information where that information is not included (i.e., replacing broader elements with more specific elements.))
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Fan-Metallinou with the teachings of Ikeno to provide the limitations of claim 9. Doing so would have improved the communication between the user and the system by providing more accurate conversation analysis and recording as recognized by Ikeno at ¶¶ [0084] – [0094]. Further, Fan-Metallinou and Ikeno teach similar fields of endeavor. As such, a person of ordinary skill in the art would have been motivated to modify Fan-Metallinou with the teachings of Ikeno to provide correcting generated texts based on feature information.

Regarding claim 12, Fan-Metallinou-Ikeno teaches all the limitations of claim 9 as laid out above. Further, Fan teaches the electronic device of claim 9 wherein the instructions which, when executed individually and/or collectively by the at least one processor, cause the electronic device to: based on the extracted feature information comprising a plurality of pieces of information, set priority of the plurality of pieces of information based on the weight given to each of the plurality of pieces of information, and generate the response using the plurality of pieces of information based on the set priority. (Fan teaches ranking a plurality of intents, which were derived from analyzing voice input for meaning and intent (i.e., extracting feature information), and ranking the intents based on contextual data and selecting the highest-ranking intent (i.e., assigning a priority to the highest-ranking intent.) and invoking a skill based on the user input and intent that responds to the user input (i.e., generate the response based on the highest-ranking intent selected). Fan at 6:30 - 6:57.)

Regarding claim 13, Fan-Metallinou-Ikeno teaches all the limitations of claim 12 as laid out above. Further, Fan teaches the electronic device of claim 12 wherein the instructions which, when executed individually and/or collectively by the at least one processor, cause the electronic device to: generate the response such that each of a plurality of elements of the response corresponds to any one of the plurality of pieces of information, and determine an arrangement order of the plurality of elements based on the set priority. (Fan teaches ranking intents based on contextual data and selecting the highest-ranking intent (i.e., determining an arrangement order of the plurality of elements based on the priority) and invoking a skill to respond to the user based on the highest-ranking intent (i.e., generating a response based on the selected intent.) Fan at 6:30 - 6:57.)

Regarding claim 14, Fan teaches a method for providing a response to a voice input, the method comprising: obtaining the voice input; (Fan teaches a computer implemented method. Fan at 36:61 - 36:67. Fan teaches receiving first dialog data (i.e., voice input) representing a first user input where the input is spoken (i.e., the input is received through the microphone.) Fan at 4:18 - 4:38.)
determining whether the response corresponding to the voice input corresponds to a first response provided by retrieving information or a second response …, by analyzing the voice input; (Fan teaches determining whether or not to provide a response to a user based on an analysis of input dialog data including user input speech and determining which skill to use which may comprise a request for information (i.e., determining whether to provide a first response by retrieving information or a second response which is one of the skills within the skill system). Fan at 8:9 - 8:19 and Fan at 11:23 - 11:42. Further, Fan teaches the electronic device determining whether or not to use a skill based on user preferences stored within a user profile. Fan at 12:22 - 12:67. Further, Fan teaches determining the output data based on the intent and the context data (i.e., analyzing the intent and parameter to determine the output). Fan at 5:24 - 6:51.)
in response to the determination of the response corresponding to the voice input corresponds to the first response, obtaining a retrieval data by performing an information retrieval query using at least one parameter identified from the voice input and assigning at least one weight value to the retrieval data based on user preference information, (Fan teaches determining a user's intent to interact with a device, then processing the information within the user's speech into pieces of the spoken input to determine the specific actions the user would like the device to take. (i.e., in response to determining the user input corresponds to the first response performing an action) Fan at 10:12 - 10:61. Further, Fan teaches retrieving data using one of the skill components of the device. Fan at 11:23 - 11:42. Further still, Fan teaches determining a first intent (i.e., at least one piece of information) from a set of potential intents and ranking the intents based on contextual data (i.e., assigning weights). Fan at 6:30 - 6:57. Further, Fan teaches determining output data using a second intent that corresponds to user input (i.e., extracting feature information, (information corresponding to user input) from the acquired data (retrieval data) based on the given weight.) Fan at 6:30 - 6:57. Further still, Fan teaches a skill component including one or more databases wherein the skill component requests data or performs an action requested by the user (i.e., the intent of the user is determined and information query is performed, or a skill is activated.). Fan at 11:23 - 11:42.)
obtaining feature information at least partially similar to a defined preference information, from the retrieval data based on the at least one weight value, wherein the feature information comprises at least one piece of the retrieval data having a highest relevance score; (Fan teaches performing a request for information using one of the skill components. Fan at 11:23 - 11:42. Further, Fan teaches the electronic device comprising personal instances of the skills and storing user profile preferences that determine which skills may be used. (i.e., a skill component enabled by user profile preferences that retrieves information obtains information from a database or other information storage and outputs it to the user based on the user's profile preferences. As such, the feature information from the retrieval data is at least partially similar to a defined preference.) Fan at 11:52 - 12:49.)
generating the response corresponding to the voice input including the feature information in the response; (Fan teaches using a TTS system to generate synthesized speech to be output as a response to the user input wherein the TTS system is employed by skills on the device to output responses to the user. (i.e., the skills retrieving and outputting information use the TTS system to generate responses to user input). Fan at 11:60 - 12:21.)
and outputting the generated response through the output device; (Fan teaches outputting responses to the user in response to user input. (i.e., outputting the generated response through the output device). Fan at 14:13 - 14:24.)
wherein the determining whether the response corresponds to the first response or the second response comprises determining the response corresponding to the voice input corresponds to the first response or the second response, based on at least one of an intention identified from the voice input by analyzing the voice input, an action determined based on the intention or a feature of the parameter. (Fan teaches determining a skill and user intent based on the analysis of the dialog data (i.e., voice input) including data representing the intent of the user and determining whether or not to use the skill (i.e., take an action) based on the user intent (i.e., based on the intention or a feature of the parameter). Fan at 29:1 - 29:25. Further, Fan teaches determining a skill based on user preferences stored in a user profile. Fan at 11:52 - 12:49. As such, determining whether or not to use a skill based on the intent of the user is determining whether to provide a first response or a second response.)
Fan, however, does not teach a second response provided using a defined template, converting the voice input into text and identifying an intention of the user and at least one parameter from the text using an automatic speech recognition module and a natural language understanding module, assigning at least one weight value to the retrieval data based on user preference information retrieved from a stored user profile by calculating a relevance score for each of a plurality of pieces of the retrieval data according to the degree of similarity between features of each piece of the retrieval data and the user preference information, obtaining feature information … wherein the feature information comprises at least one piece of the retrieval data having a highest relevance score, inserting feature information into the response, and outputting the generated response … as audible speech output.
In a similar field of endeavor (e.g., natural language processing of spoken queries.), Metallinou teaches using a response template to respond to a user based off of user preferences (i.e., providing a response using a defined template.) Metallinou at 20:14 - 20:32. As such, Metallinou's profile storing response templates is analogous to Fan's profile storage of user preferences. Thus, Fan's determination of which skill to use in view of Metallinou's response templates would have provided determining a first response by retrieving information or a second response using a defined template.
Further, Metallinou teaches converting the voice input into text and identifying an intention of the user and at least one parameter from the text using an automatic speech recognition module and a natural language understanding module (Metallinou teaches converting audio to text (generating text transcripts), then analyzing the text to determine the meaning of the utterances (i.e., determining an intent) as well as named entities and contextual data (i.e., at least one parameter). Metallinou at 2:3 - 2:49.).
Further, Metallinou teaches assigning at least one weight value to the retrieval data based on user preference information retrieved from a stored user profile by calculating a relevance score for each of a plurality of pieces of the retrieval data according to the degree of similarity between features of each piece of the retrieval data and the user preference information; (Metallinou teaches evaluating and scoring retrieval data (i.e., assigning a weight value to data corresponding to the domains retrieved that may be able to handle a request) by calculating relevance scores for the data corresponding to the domains (retrieval data). Metallinou at 22:43 - 22:64. Further, Metallinou teaches that the speech processing system takes into account user preferences, which includes applications that are enabled for the user. Metallinou at 25:41 - 26:4. As such, because the retrieval data determines domains capable of performing the user's query and the user preferences determines what applications are enabled, then the retrieval data determining domains capable of performing the user's query are based, at least in part, on user preference information retrieved from a stored user profile which are based on a relevance to the query, including the applications enabled for the user.)
Furthermore, Metallinou teaches, in combination with Fan laid out above, obtaining feature information at least partially similar to defined preference information, from the retrieval data based on the at least one weight value, wherein the feature information comprises at least one piece of the retrieval data having a highest relevance score; (Metallinou teaches evaluating and scoring retrieval data (i.e., assigning a weight value to data corresponding to the domains retrieved that may be able to handle a request) by calculating relevance scores for the data corresponding to the domains (retrieval data). Metallinou at 22:43 - 22:64. Further, Metallinou contemplates using the highest confidence score (i.e., relevance score, the relevance scores estimate confidence) to determine the proper domain. Metallinou at 22:43 - 22:64.)
Further, Metallinou teaches outputting the generated response … as an audible speech output; (Metallinou teaches outputting text as speech output using a text-to-speech system to output audible speech in a dialogue with the user. Metallinou at 10:19 - 10:32.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Fan with the teachings of Metallinou to provide a second response using a defined template. Doing so would have improved entity resolution during natural language processing by providing specific, personalized information for each user as recognized by Metallinou at 20:14 – 20:32.
Fan in view of Metallinou (hereinafter Fan-Metallinou), however, does not teach [determining] whether the generated response needs to be corrected based on whether the generated response includes the feature information; and in response to a determination that the generated response does not include the feature information, correct the generated response to include the feature information.
In a similar field of endeavor (e.g., processing of voice requests and generation of responses in response to the voice requests), Ikeno teaches [determining] whether the generated response needs to be corrected based on whether the generated response includes the feature information; (Ikeno teaches generating text responses, then correcting the responses according to a determined category (i.e., the system determines that the response needs to be corrected based on the category.) Ikeno at ¶¶ [0087] - [0094].)
and in response to a determination that the generated response does not include the feature information, correct the generated response by inserting the feature information into the response. (Ikeno teaches correcting the text responses based on the determined category (i.e., correcting the generated text using feature information.) Ikeno at ¶¶ [0087] - [0094]. Further, Ikeno in view of Metallinou provides this limitation. Metallinou teaches a slot filler component that replaces words with more accurate data retrieved to substitute broader elements. Metallinou at 30:12 - 30:27. This amounts to retrieving feature information and inserting that feature information where that information is not included (i.e., replacing broader elements with more specific elements.))
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date to combine the teachings of Fan-Metallinou with the teachings of Ikeno to provide the limitations of claim 14. Doing so would have improved the communication between the user and the system by providing more accurate conversation analysis and recording as recognized by Ikeno at ¶¶ [0084] – [0094]. Further, Fan-Metallinou and Ikeno teach similar fields of endeavor. As such, a person of ordinary skill in the art would have been motivated to modify Fan-Metallinou with the teachings of Ikeno to provide correcting generated texts based on feature information.
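For orientation only, the sequence of limitations mapped above for claim 14 (scoring each piece of retrieval data against user preference information, taking the highest-scoring piece as the feature information, and correcting a response that omits it) can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption: the Jaccard overlap standing in for the "degree of similarity", the data shapes, and all function names are hypothetical and are not drawn from the application, Fan, Metallinou, or Ikeno.

```python
from typing import Dict, List, Set


def relevance_score(features: Set[str], preferences: Set[str]) -> float:
    """Jaccard overlap, assumed here as a stand-in for the 'degree of similarity'."""
    union = features | preferences
    if not union:
        return 0.0
    return len(features & preferences) / len(union)


def select_feature_information(retrieval_data: List[Dict], preferences: Set[str]) -> Dict:
    """Score every piece of retrieval data and return the highest-scoring piece."""
    return max(
        retrieval_data,
        key=lambda piece: relevance_score(piece["features"], preferences),
    )


def correct_response(response: str, feature_text: str) -> str:
    """Insert the feature information only when the response does not already include it."""
    if feature_text in response:
        return response  # no correction needed
    return f"{response} ({feature_text})"


# Hypothetical stored preferences and retrieval results.
preferences = {"spicy", "korean"}
retrieval_data = [
    {"name": "Pasta House", "features": {"italian", "mild"}},
    {"name": "Kimchi Spot", "features": {"korean", "spicy", "cheap"}},
]

best = select_feature_information(retrieval_data, preferences)
draft = "Here is a restaurant nearby."
final = correct_response(draft, best["name"])
```

In this sketch the weight assigned to each piece is simply its relevance score; a real system could combine several weights or use a learned similarity measure.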

Regarding claim 16, Fan-Metallinou-Ikeno teaches all the limitations of claim 14 as laid out above. Further, Fan teaches the method of claim 14, wherein determining whether the response corresponds to the first response or the second response comprises identifying whether a type of the response matched with the identified intention based on the information related to types of the response corresponds to the first response or the second response. (Fan teaches system memory comprising a memory for storing computer instructions related to types of responses matched with user intent. Fan at 29:1 - 29:25 and 34:30 - 34:53. Further, Fan teaches determining whether or not to generate a response based on the skill/intent information. Fan at 29:1 - 29:25. As such, Fan teaches determining a type of response based on the intent of the user which entails determining whether a generated response uses a defined response template (i.e., the second response) or whether the response comprises retrieving information (i.e., the first response.).)

Regarding claim 17, Fan-Metallinou-Ikeno teaches all the limitations of claim 14, as laid out above. Further, Fan teaches the method of claim 14, wherein determining whether the response corresponds to the first response or the second response comprises determining a type of an action for providing the response based on the result of analyzing the voice input, and determining the response corresponding to the voice input corresponds to the first response or the second response based on the determined type of the action. (Fan teaches determining which skill to use (i.e., type of action to perform) and generating a response based on the skill (i.e., generating a response based on the type of action.) Fan at 29:1 - 29:25 and 34:10 - 34:29. Further, Fan-Metallinou teaches determining the response corresponds to the first response or the second response as laid out above. As such, Fan’s teaching of determining which skill to use and generating a response based on the type of action is determining whether a response corresponds to a first response or a second response.)

Regarding claim 18, Fan-Metallinou-Ikeno teaches all the limitations of claim 14 as laid out above. Further, Fan teaches the method of claim 14, wherein determining whether the response corresponds to the first response or the second response comprises: determining a feature of the at least one parameter, based on the result of analyzing the voice input, and determining the response corresponding to the voice input corresponds to the first response or the second response based on the determined feature. (Fan teaches determining an intent or meaning associated with a spoken input (i.e., determining a feature of the element) and determining whether or not to generate a response based on skill/intent information. Fan at 17:46 - 17:65 and 29:1 - 29:25. Further, Fan in view of Metallinou teaches determining the response corresponding to the voice input corresponds to the first or second response as laid out above. Thus, Fan in view of Metallinou teaches determining the first or second response based on a feature of the element.)

Regarding claim 20, Fan-Metallinou-Ikeno teaches all the limitations of claim 14 as laid out above. Further, Fan teaches the method of claim 14, wherein generating the response comprises: based on the extracted feature information comprising a plurality of pieces of information, setting priority of the plurality of pieces of information based on the weight given to each of the plurality of pieces of information; determining an arrangement order of a plurality of elements corresponding to the plurality of pieces of information, respectively, based on the set priority; and generating the response to include the plurality of elements based on the arrangement order of the plurality of elements. (Fan teaches ranking a plurality of intents, which were derived from analyzing voice input for meaning and intent (i.e., extracting feature information), and ranking the intents based on contextual data and selecting the highest-ranking intent (i.e., assigning a priority to the highest-ranking intent.) and invoking a skill based on the user input and intent that responds to the user input (i.e., generate the response based on the highest-ranking intent selected). Fan at 6:30 - 6:57. Further, in order to generate the response to user input, Fan teaches determining “slots” (i.e., elements) of the response that are to be filled in later. These slots are linked to the spoken input with the intents interpreted therefrom (i.e., the elements correspond to the plurality of pieces of information which are ranked). Thus, the response is generated using a plurality of pieces of information that are linked to a plurality of elements which are ranked (i.e., arrangement order).)
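The claim 20 language (setting priority from the weights, deriving an arrangement order, and generating the response in that order) can be illustrated with a short Python sketch. It assumes, hypothetically, that each piece of feature information is a text fragment paired with a numeric weight and that a higher weight means higher priority; the function names and the joining format are likewise invented for illustration and do not come from the application or the cited references.

```python
from typing import List, Tuple


def arrange_elements(weighted_pieces: List[Tuple[str, float]]) -> List[str]:
    """Order response elements by descending weight (priority).

    sorted() is stable, so pieces with equal weight keep their input order.
    """
    ordered = sorted(weighted_pieces, key=lambda piece: piece[1], reverse=True)
    return [text for text, _weight in ordered]


def generate_response(subject: str, weighted_pieces: List[Tuple[str, float]]) -> str:
    """Build one sentence whose elements follow the arrangement order."""
    return f"{subject} " + ", ".join(arrange_elements(weighted_pieces)) + "."


# Hypothetical pieces of feature information with their weights.
pieces = [
    ("is open until 10 pm", 0.4),
    ("serves spicy food", 0.9),
    ("is 5 minutes away", 0.6),
]
response = generate_response("Kimchi Spot", pieces)
```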

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
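As a rough aid to the date arithmetic in the paragraph above, the following Python sketch computes the two-month, three-month, and six-month marks from the Nov 18, 2025 mailing date shown in the Prosecution Timeline. It is a simplified sketch, not a docketing tool: the `add_months` helper is hypothetical, and the code does not model weekend or federal-holiday rollover, extension fees, or the advisory-action provision.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the same day-of-month `months` later, clamped to the month's last day."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


mailing_date = date(2025, 11, 18)  # mailing date listed in the Prosecution Timeline

two_month_mark = add_months(mailing_date, 2)        # first-reply window relevant to the advisory action rule
shortened_period_end = add_months(mailing_date, 3)  # end of the shortened statutory period
statutory_maximum = add_months(mailing_date, 6)     # latest possible reply date

print(two_month_mark, shortened_period_end, statutory_maximum)
```

Any date produced this way should be checked against the actual mailing date on the Office Action and the applicable rules before it is relied on.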
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CAMERON KENNETH YOUNG whose telephone number is (703)756-1527. The examiner can normally be reached Mon - Fri, 9:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CAMERON KENNETH YOUNG/Examiner, Art Unit 2655
/ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655
Prosecution Timeline

Sep 11, 2024: Applicant Interview (Telephonic)
Oct 18, 2024: Response Filed
Jan 13, 2025: Final Rejection mailed (§103)
Mar 13, 2025: Request for Continued Examination
Mar 15, 2025: Response after Non-Final Action
May 22, 2025: Non-Final Rejection mailed (§103)
Aug 21, 2025: Response Filed
Nov 18, 2025: Final Rejection mailed (§103, current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/131,815 (Patent 12619827): SYSTEM AND METHOD FOR INTELLIGENT GENERATION OF PRIVILEGE LOGS. 3y 1m to grant, granted May 05, 2026.
17/999,850 (Patent 12602409): INFORMATION SEARCH SYSTEM. 3y 4m to grant, granted Apr 14, 2026.
18/290,574 (Patent 12592230): RECOGNITION OR SYNTHESIS OF HUMAN-UTTERED HARMONIC SOUNDS. 2y 4m to grant, granted Mar 31, 2026.
17/974,455 (Patent 12567429): VOICE CALL CONTROL METHOD AND APPARATUS, COMPUTER-READABLE MEDIUM, AND ELECTRONIC DEVICE. 3y 4m to grant, granted Mar 03, 2026.
18/619,608 (Patent 12525250): Cascade Architecture for Noise-Robust Keyword Spotting. 1y 9m to grant, granted Jan 13, 2026.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 73%
With Interview (+16.1%): 89%
Median Time to Grant: 2y 11m (~0m remaining)
PTA Risk: High
Based on 22 resolved cases by this examiner. Grant probability derived from career allowance rate.