Prosecution Insights
Last updated: April 19, 2026
Application No. 17/895,731

ELECTRONIC APPARATUS AND METHOD OF CONTROLLING THE SAME

Final Rejection (§103)

Filed: Aug 25, 2022
Examiner: SERRAGUARD, SEAN ERIN
Art Unit: 2657
Tech Center: 2600 — Communications
Assignee: Samsung Electronics Co., Ltd.
OA Round: 4 (Final)

Predictions:
Grant Probability: 69% (Favorable)
Expected OA Rounds: 5-6
Time to Grant: 3y 2m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 69% (above average; +6.7% vs TC avg; 92 granted / 134 resolved)
Interview Lift: +33.6% (strong; based on resolved cases with interview)
Typical Timeline: 3y 2m avg prosecution
Career History: 177 total applications across all art units; 43 currently pending
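The headline figures above are simple ratios of the career counts. A minimal sketch of how they relate, assuming the dashboard rounds the allow rate to the nearest whole percent and computes the delta by subtracting a Tech Center average estimate:

```python
# Career counts as reported by the dashboard.
granted = 92
resolved = 134

# Career allow rate: granted as a share of resolved cases.
allow_rate = 100 * granted / resolved
print(round(allow_rate, 1))  # 68.7, displayed as 69%

# The "+6.7% vs TC avg" delta implies a Tech Center average estimate
# (assumption: the delta is allow_rate minus the TC average).
delta_vs_tc = 6.7
tc_avg_estimate = allow_rate - delta_vs_tc
print(round(tc_avg_estimate, 1))  # 62.0
```

The exact baseline the tool uses is not shown; this only illustrates that the displayed 69% and +6.7% are consistent with the underlying 92/134 count.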

Statute-Specific Performance

§101: 9.4% (-30.6% vs TC avg)
§103: 49.7% (+9.7% vs TC avg)
§102: 18.6% (-21.4% vs TC avg)
§112: 19.2% (-20.8% vs TC avg)

Black line = Tech Center average estimate. Based on career data from 134 resolved cases.
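The four per-statute deltas are mutually consistent: assuming each delta is the statute figure minus the Tech Center average estimate, subtracting the delta from the figure recovers the same baseline in every row, which is a quick sanity check on the chart data:

```python
# Per-statute figure and delta vs TC average, as displayed above.
stats = {
    "101": (9.4, -30.6),
    "103": (49.7, +9.7),
    "102": (18.6, -21.4),
    "112": (19.2, -20.8),
}

# Each row should recover the same implied TC-average baseline.
baselines = {s: round(rate - delta, 1) for s, (rate, delta) in stats.items()}
print(baselines)  # every statute yields 40.0
```

All four rows imply a single 40.0% baseline, suggesting the tool compares each statute against one overall Tech Center estimate rather than per-statute averages.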

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Response to Amendments

Applicant’s amendment filed on 18 November 2025 has been entered. In view of the amendment to the claims, the amendment of claims 1, 11, 16, and 19-22 has been acknowledged and entered. In view of the amendment to claims 21 and 22, the objection to claims 21 and 22 is withdrawn. In view of the amendment to claims 1, 11, and 16, the rejection of claims 1-3, 5-13, 15-17, and 19-22 under 35 U.S.C. §103 is withdrawn. In light of the amended claims, new grounds for rejection under 35 U.S.C. §103 are provided in the response below.

Response to Arguments

Applicant’s arguments regarding the prior art rejections under 35 U.S.C. §103, see pages 9-14 of the Response to the Non-Final Office Action dated 18 August 2025, which was received on 18 November 2025 (hereinafter Response and Office Action, respectively), have been fully considered.

With respect to the rejection of claims 1 and 11 under 35 U.S.C. 103 as being obvious over Gurram (U.S. Pat. App. Pub. No. 2005/0240404, hereinafter Gurram) in view of Hwang (U.S. Pat. App. Pub. No. 2021/0043204, hereinafter Hwang), applicant asserts that the cited references fail to teach or suggest all limitations of the claims as amended. Applicant’s arguments in light of the amended claims are persuasive. As such, the rejections of claims 1 and 11 under 35 U.S.C. §103 are withdrawn.

With respect to the rejection of claim 16 under 35 U.S.C. 103 as being obvious over Gurram in view of Gupta (U.S. Pat. App. Pub. No. 2020/0184967, hereinafter Gupta), applicant generally presents two arguments: (1) that Gurram and Gupta fail to disclose “a first type of speech recognition associated with on-device speech recognition engines and a second type of speech recognition associated with server speech recognition engines”; and (2) that Gurram in view of Gupta fails to teach or suggest all limitations of the claims as presented. These arguments are addressed individually below.

In response to applicant’s argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., “a first type of speech recognition associated with on-device speech recognition engines and a second type of speech recognition associated with server speech recognition engines”) are not recited in the rejected claims. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Applicant is invited to amend the claims, during normal prosecution and in light of specification support, such that the claims recite the desired limitations.

Claim 16, as amended, does recite “wherein the subset of voice recognition engines comprises one from among a first subset comprising a plurality of on-device voice recognition engines which are stored on the electronic device, and a second subset comprising a plurality of server voice recognition engines which are stored on a server,” which shares some common terminology with the limitation recited above. (Instant Application, Claim 16). Respectfully, and in light of the arguments presented, applicant appears to be arguing that the different subsets, insofar as they can be argued to be different, are equivalent to different types. However, this argument is not persuasive.
The specification of the Instant Application fails to provide clear support to indicate that different “subsets” refers to, or can otherwise be treated as tantamount to, different types. The commonly understood meaning of “subsets” is a selection or a portion from a related larger group. In looking to the specification to determine whether the applicant has redefined the term “subset,” no evidence could be found that the applicant has chosen to be their own lexicographer. Applicant does indicate that support for the amendment to claim 16 can be found at paragraph [0144], and the amendment does in fact find support at paragraph [0144]. (See Response, pg. 9, citing Instant Application, at ¶ [0144]). However, paragraph [0144] does not support the implied meaning of subsets as types. In fact, the disclosed “subsets” described at paragraph [0144] are understood as indicating the opposite conclusion. (Id.)

As disclosed in the specification, “the on-device engine or the server engine may be considered subsets of the engines.” (Instant Application, ¶ [0144], emphasis added). In this context, the word “considered” indicates a view or perspective. When read together, “considered subsets” is understood as viewing each of the “on-device engine” or the “server engine” as a selection from a related larger group (e.g., viewing each of them as a subset with respect to the other or with respect to a larger group). The target object is not changed by the consideration and, as such, the subsets are not understood as types (i.e., the word “subset” does not reflect a different part, only a different name used to describe the same part). Therefore, in light of the understood meaning of the word and the context provided by the specification, the interpretation of the word “subset” as indicating a “type” is not persuasive.

Regarding the second argument, applicant’s argument in light of the amendment to claim 16 is persuasive. As such, the rejections of claim 16 under 35 U.S.C. §103 are withdrawn.

Applicant further argues that the rejections of dependent claims 2-3, 5-10, 12-13, 15, 17, and 19-22 should be withdrawn for at least the same reasons as independent claims 1, 11, and 16. Applicant’s arguments related to claims 1, 11, and 16, in light of the amended claims, are persuasive. As such, the rejections of claims 2-3, 5-10, 12-13, 15, 17, and 19-22 under 35 U.S.C. §103 are withdrawn. However, upon further consideration, new grounds of rejection under 35 U.S.C. §103 are made in light of combinations of Gurram, Gupta, and newly cited reference Gruber (U.S. Pat. App. Pub. No. 2012/0265528, hereinafter Gruber). The Applicant has not provided any further statement and, therefore, the Examiner directs the Applicant to the below rationale.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-6, 8-13, and 15 are rejected under 35 U.S.C.
103 as being unpatentable over Gurram in view of Gruber.

Regarding claim 1, Gurram discloses An electronic device comprising: a processor (“the speech recognition engines 101 are associated with a client computer 107,” where computers are commonly understood in the art to include a processor; Gurram, ¶ [0025]) configured to: receive a user voice input (“client computer 107 may use, or be associated with, microphone(s) or other physical element for capturing the spoken input 102.”; Gurram, ¶ [0025]); identify at least one state of the electronic device… from among a plurality of states of the electronic device (Though not explicitly defined as states, the system identifies selection information, where the selection information includes “heuristics,” “application information,” and “user preference information,” as well as subcategories thereof, which are “associated with the user interface” and where the selection information includes a determination of a category/type of selectable information, and a determination of device status with regards to the category/type {at least one state of the electronic device}. The state is the device status with regards to the category defined by said selection information. In one example, the “best-suited speech recognition engine may be the speech recognition engine with fewer misrecognitions, over time, of spoken input issued by a user,” where the category defined by the selection information is “speech recognition engine with fewer misrecognitions,” determined based on “track[ing] which of the speech recognition engines 101 is best at recognizing a particular user’s voice,” and the state is either fewer or not fewer for the speech recognition engine with the electronic device being in the active state. The category is defined with relation to the electronic device (e.g., in the sample above, “fewer misrecognitions” is a function of both the device being in the active state and engine performance), and therefore is related to the electronic device. Further, a determination of a category/type necessarily includes the possibility of other categories/types and associated device states of those categories/types {from among a plurality of states of the electronic device}; Gurram, ¶ [0036]-[0039], [0058], [0066]); select a voice recognition engine corresponding to the identified at least one state, from among a plurality of voice recognition engines (The device status with regards to the category is defined by the category, and therefore corresponds to the category. Further, the category is defined with relation to the electronic device (e.g., in the sample above, the device with “fewer misrecognitions” can be selected based on “track[ing] which of the speech recognition engines 101 is best at recognizing a particular user’s voice”) and the selection of “the best” indicates a plurality of speech recognition engines from which one is chosen.; Gurram, ¶ [0037], [0043]) based on correlations between the plurality of voice recognition engines and the plurality of states (“Based on the received spoken input, and using portions of the accessed selection information alone or in combination, an available speech recognition engine best suited to process the spoken input is selected,” where the best-suited speech recognition engine is selected based on a particular user preference for the available speech recognition engine “for a particular task(s)” {based on the intent}; Gurram, ¶ [0047]); and perform an operation corresponding to the user voice input based on the selected voice recognition engine (“The received spoken input 102 may be transferred to the selected speech recognition engine using the accessed session. The received spoken input is then processed using the selected speech recognition engine (260).”; Gurram, ¶ [0048]).

However, Gurram fails to expressly recite identify at least one state of the electronic device corresponding to at least one item related to the electronic device from among a plurality of states of the electronic device, wherein the at least one item comprises at least one of: a network state, an account-login state, a voice speaker, a voice input route, trigger presence or an application running state.

Gruber teaches systems and methods “for improving interpretation and processing of commands provided to [a virtual assistant].” (Gruber, ¶ [0007]).

Regarding claim 1, Gruber teaches identify at least one state of the electronic device (The “virtual assistant 1002 acquires and applies a variety of contextual information {identify at least one state} to perform information processing functions”; Gruber, ¶ [0171]) corresponding to at least one item related to the electronic device from among a plurality of states of the electronic device (The contextual information can be derived from, and thus corresponds to, “several sources of context, including for example device sensor data 1056, application preferences and usage history 1072, dialog history and assistant memory 1052, personal databases 1058, personal acoustic context data 1080, current application context 1060, and event context 2706 {corresponding to at least one item}” as derived by the “virtual assistant 1002 {related to the electronic device}”; Gruber, ¶ [0179]), wherein the at least one item comprises at least one of: a network state, an account-login state, a voice speaker, a voice input route, trigger presence or an application running state (The “vocabulary from user personal database(s) 1058” can influence the “choice and tuning of the statistical language model 1029,” where vocabulary from user personal databases 1058 includes the “voice speaker,” which is understood under the BRI as speaker identification. To clarify, a user personal database is associated with an account for the user and the user must be at least known by the system to associate the personal databases 1058. Further, the system “can use dialog state context to select a custom statistical language model 1029,” including the use of specific keywords, such as learned keywords {trigger presence}, in selecting “an SLM 1029 that biases toward hearing these words.” As well, each of the “Personal Acoustic Context Data 1080” and “Device Sensor Data 1056… can be used to select from possible SLMs 1029 or otherwise tune them to optimize for recognized acoustical contexts,” which includes “the properties of the microphones and cameras in use {a voice input route}” and “the current networks being used, and signatures of connected networks {a network state}”; Gruber, ¶ [0223]-[0224], [0318]-[0319], [0321], [0333], [0335]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition engine selection systems of Gurram to incorporate the teachings of Gruber to include identify at least one state of the electronic device corresponding to at least one item related to the electronic device from among a plurality of states of the electronic device, wherein the at least one item comprises at least one of: a network state, an account-login state, a voice speaker, a voice input route, trigger presence or an application running state.
The systems and methods of Gruber incorporate “context information” from a wide variety of sources “to supplement natural language or gestural input from a user,” including the use of such context for selection between available language models, such as for use in speech to text and natural language understanding, which “helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input,” thus providing the known benefit of improved operational performance in light of changes in operational state for the device and/or the user, as recognized by Gruber. (Gruber, ¶ [0013], [0317]-[0321]).

Regarding claim 2, the rejection of claim 1 is incorporated. Gurram and Gruber disclose all of the elements of the current invention as stated above. Gurram further discloses wherein the plurality of voice recognition engines comprises an on-device voice recognition engine provided in the electronic device (“In one implementation, the speech recognition engines 101 may be stored locally on the client computer 107.”; Gurram, ¶ [0025]), and a server voice recognition engine provided in a server (“In another implementation, the speech recognition engines 101 may be stored remotely, for example, on a server computer that is separate from the client computer 107.”; Gurram, ¶ [0025]), and wherein the processor is further configured to select the on-device voice recognition engine or the server voice recognition engine (“the system 100 may use the speech recognition engine selector 120 to select one or more of the detected available speech recognition engines 101, to be used in recognizing the spoken input 102.”; Gurram, ¶ [0037], [0043]).

Regarding claim 3, the rejection of claim 1 is incorporated. Gurram and Gruber disclose all of the elements of the current invention as stated above. Gurram further discloses wherein the processor is further configured to: identify an intent corresponding to the received user voice input from among a plurality of intents (“it also may be the case that multiple ones of the plurality of speech recognition engines 325 are each simultaneously maintained and associated with a defined session. In this way, a user may switch from one speech recognition engine to another while executing a plurality of tasks or while using a plurality of applications, so that a preferred engine is always being used,” where switching from the first speech recognition engine to the second speech recognition engine is based on a user intent; Gurram, ¶ [0059]), and select the voice recognition engine based on correlations between the plurality of voice recognition engines and the plurality of intents (In one example, “the session manager 130 may be aware that a first speech recognition engine was selected for its abilities in interpreting dictation from the user, while a second speech recognition engine was selected for its abilities in taking navigation commands from the user in the context of a web interface. As a result, if the user switches from dictating text to navigating a web interface, then the session manager may switch from a first session with the first speech recognition engine to a second session with the second speech recognition engine.”; Gurram, ¶ [0063]).

Regarding claim 5, the rejection of claim 3 is incorporated. Gurram and Gruber disclose all of the elements of the current invention as stated above. Gurram further discloses further comprising a storage configured to store first reference data in which at least one intent from among the plurality of intents is assigned to the plurality of voice recognition engines (“the user preference information may be tracked over time, so that, for example, if a particular user consistently uses a particular speech recognition engine for a particular task, then that engine may be automatically selected when the user initiates that task”; Gurram, ¶ [0039]).

Regarding claim 6, the rejection of claim 5 is incorporated. Gurram and Gruber disclose all of the elements of the current invention as stated above. Gurram further discloses wherein the storage is configured to store second reference data including at least one of the correlations between the plurality of voice recognition engines and the plurality of states (“The selection information 122 also includes application information 126 relating to a particular application being used,” as well as “user preference information 128,” where “User preference information 128 may relate to a user’s determination of preferred speech recognition engines for use in an application.”; Gurram, ¶ [0038]-[0039]), and the correlations between the plurality of voice recognition engines and the plurality of intents (“the user preference information may be tracked over time, so that, for example, if a particular user consistently uses a particular speech recognition engine for a particular task, then that engine may be automatically selected when the user initiates that task,” and “If more than one user is authorized to use the system 100, each user may have a different preference in this regard.”; Gurram, ¶ [0038]-[0039], [0024]).

Regarding claim 8, the rejection of claim 6 is incorporated. Gurram and Gruber disclose all of the elements of the current invention as stated above. Gurram further discloses wherein the processor is further configured to adjust at least one of the correlations between the plurality of voice recognition engines and the plurality of states, and the correlations between the plurality of voice recognition engines and the plurality of intents based on a recognition result of the user voice input (The selection information includes heuristics, where “the interface information is associated with available speech recognition engines based on the heuristics {calculate a correlation of each voice recognition engine}” and “The heuristics 124” include “keep[ing] track of which speech recognition engines are better suited to particular environments or tasks,” including which of the speech recognition engines “achieves the best results with a particular application {relates to the intent of the user input},” where best results with a particular application is based on a recognition result of the user voice.; Gurram, ¶ [0058], [0037], [0034]).

Regarding claim 9, the rejection of claim 6 is incorporated. Gurram and Gruber disclose all of the elements of the current invention as stated above.
Gurram further discloses wherein the processor is further configured to: control the storage to store data about history information corresponding to recognition results of the user voice input (“The selection information 122 also includes application information 126 relating to a particular application being used,” as well as “user preference information 128,” where “User preference information 128 may relate to a user’s determination of preferred speech recognition engines for use in an application.”; Gurram, ¶ [0038]-[0039]); and select the voice recognition engine from among a plurality of voice recognition engines having a same correlation based on the history information (“the speech recognition engine information 123, the heuristics 124, the interface information 126, and the user preference information 128 may be used alone or in combination to select the search recognition engine to be used in recognizing the spoken input 102.” As such, and relying on the example of number of misrecognitions, the preferred speech recognition engine can be selected based on user preference based on history information, even in light of the plurality of voice recognition engines having the same correlation to the number of misrecognitions based on the history information.; Gurram, ¶ [0043], [0037]).

Regarding claim 10, the rejection of claim 6 is incorporated. Gurram and Gruber disclose all of the elements of the current invention as stated above. Gurram further discloses wherein the processor is further configured to: control the storage to store data about history information corresponding to recognition results of the user voice input (“Heuristics 124 also are included in the selection information 122,” where “the heuristics 124 may track which of the speech recognition engines 101 is best at recognizing a particular user’s voice. In this example, the best-suited speech recognition engine may be the speech recognition engine with fewer misrecognitions, over time, of spoken input issued by a user.”; Gurram, ¶ [0033], [0037]); and generate a rule for identifying the voice recognition engine based on the history information (“the engine selector 120” may use “selection information 122” as part of “selecting one of the speech recognition engines 101 for use,” where selection information indicating a preferred engine, and selection of an engine based on the selection information, is a rule for identifying the voice recognition engine based on the history information.; Gurram, ¶ [0031], [0033]).

Regarding claim 11, Gurram discloses A method of controlling an electronic device (Systems and methods described with reference to “the speech recognition engines 101... associated with a client computer 107”; Gurram, ¶ [0025]), comprising: receiving a user voice input (“client computer 107 may use, or be associated with, a microphone(s) or other physical element for capturing the spoken input 102.”; Gurram, ¶ [0025]); identifying at least one state of the electronic device...from among a plurality of states of the electronic device (Though not explicitly defined as states, the system identifies selection information, where the selection information includes “heuristics,” “application information,” and “user preference information,” as well as subcategories thereof, which are “associated with the user interface” and where the selection information includes a determination of a category/type of selectable information, and a determination of device status with regards to the category/type {at least one state of the electronic device}. The state is the device status with regards to the category defined by said selection information.
In one example, the “best-suited speech recognition engine may be the speech recognition engine with fewer misrecognitions, over time, of spoken input issued by a user,” where the category defined by the selection information is “speech recognition engine with fewer misrecognitions,” determined based on “track[ing] which of the speech recognition engines 101 is best at recognizing a particular user’s voice,” and the state is either fewer or not fewer for the speech recognition engine with the electronic device being in the active state. The category is defined with relation to the electronic device (e.g., in the sample above, “fewer misrecognitions” is a function of both the device being in the active state and engine performance), and therefore is related to the electronic device. Further, a determination of a category/type necessarily includes the possibility of other categories/types and associated device states of those categories/types {from among a plurality of states of the electronic device}; Gurram, ¶ [0036]-[0039], [0058], [0066]); selecting a voice recognition engine corresponding to the identified state, from among a plurality of voice recognition engines (The device status with regards to the category is defined by the category, and therefore corresponds to the category. Further, the category is defined with relation to the electronic device (e.g., in the sample above, the device with “fewer misrecognitions” can be selected based on “track[ing] which of the speech recognition engines 101 is best at recognizing a particular user’s voice”) and the selection of “the best” indicates a plurality of speech recognition engines from which one is chosen.; Gurram, ¶ [0037], [0043]) based on correlations between the plurality of voice recognition engines and the plurality of states (“Based on the received spoken input, and using portions of the accessed selection information alone or in combination, an available speech recognition engine best suited to process the spoken input is selected,” where the best-suited speech recognition engine is selected based on a particular user preference for the available speech recognition engine “for a particular task(s)” {based on the intent}; Gurram, ¶ [0047]); and performing an operation corresponding to the user voice input based on the selected voice recognition engine (“The received spoken input 102 may be transferred to the selected speech recognition engine using the accessed session. The received spoken input is then processed using the selected speech recognition engine (260).”; Gurram, ¶ [0048]).

However, Gurram fails to expressly recite identify at least one state of the electronic device corresponding to at least one item related to the electronic device from among a plurality of states of the electronic device, wherein the at least one item comprises at least one of: a network state, an account-login state, a voice speaker, a voice input route, trigger presence or an application running state. The relevance of Gruber is described above with relation to claim 1.
Regarding claim 11, Gruber teaches identifying at least one state of the electronic device (The “virtual assistant 1002 acquires and applies a variety of contextual information {identify at least one state} to perform information processing functions”; Gruber, ¶ [0171]) corresponding to at least one item related to the electronic device from among a plurality of states of the electronic device (The contextual information can be derived from, and thus corresponds to, “several sources of context, including for example device sensor data 1056, application preferences and usage history 1072, dialog history and assistant memory 1052, personal databases 1058, personal acoustic context data 1080, current application context 1060, and event context 2706 {corresponding to at least one item}” as derived by the “virtual assistant 1002 {related to the electronic device}”; Gruber, ¶ [0179]), wherein the at least one item comprises at least one of: a power state, a network state, an account-login state, a voice speaker, a voice input route, trigger presence or an application running state (The “vocabulary from user personal database(s) 1058” can influence the “choice and tuning of the statistical language model 1029,” where vocabulary from user personal databases 1058 includes the voice speaker (where “voice speaker” is understood under the BRI to refer to speaker identification). To further clarify, a user personal database is associated with an account for the user and the user must be at least known by the system to associate the personal databases 1058. Further, the system “can use dialog state context to select a custom statistical language model 1029,” including the use of specific keywords, such as learned keywords {trigger presence}, in selecting “an SLM 1029 that biases toward hearing these words.” As well, each of the “Personal Acoustic Context Data 1080” and “Device Sensor Data 1056… can be used to select from possible SLMs 1029 or otherwise tune them to optimize for recognized acoustical contexts,” which includes “the properties of the microphones and cameras in use {a voice input route}” and “the current networks being used, and signatures of connected networks {a network state}”; Gruber, ¶ [0223]-[0224], [0318]-[0319], [0321], [0333], [0335]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition engine selection systems of Gurram to incorporate the teachings of Gruber to include identify at least one state of the electronic device corresponding to at least one item related to the electronic device from among a plurality of states of the electronic device, wherein the at least one item comprises at least one of: a network state, an account-login state, a voice speaker, a voice input route, trigger presence or an application running state.
The systems and methods of Gruber incorporate “context information” from a wide variety of sources “to supplement natural language or gestural input from a user,” including the use of such context for selection between available language models, such as for use in speech to text and natural language understanding, which “helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input,” thus providing the known benefit of improved operational performance in light of changes in operational state for the device and/or the user, as recognized by Gruber. (Gruber, ¶ [0013], [0317]-[0321]).

Regarding claim 12, the rejection of claim 11 is incorporated. Claim 12 is substantially the same as claim 2 and is therefore rejected under the same rationale as above.

Regarding claim 13, the rejection of claim 11 is incorporated. Claim 13 is substantially the same as claim 3 and is therefore rejected under the same rationale as above.

Regarding claim 15, the rejection of claim 13 is incorporated. Claim 15 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.

Claims 7 and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Gurram and Gruber as applied to claims 3, 5, and 13, and further in view of Gupta.

Regarding claim 7, the rejection of claim 5 is incorporated. Gurram and Gruber disclose all of the elements of the current invention as stated above. However, Gurram and Gruber fail to expressly recite wherein the processor is further configured to calculate the correlations based on the at least one intent. Gupta teaches a speech processing system and method able to “hand off” utterances from one speech processing system or speech processing component to another. (Gupta, ¶ [0019]).
Regarding claim 7, Gupta teaches wherein the processor is further configured to calculate the correlations based on the at least one intent (“The system 120 determines (134) a first score corresponding to a first ability of a first speech processing system to respond to the intent and determines (136) a second score corresponding to a second ability of a second speech processing system to respond to the intent,” where the ability to respond is the state, which has a correlation to both the first speech recognition engine and the second speech recognition engine, the score is a calculation of the ability of each speech processing system to respond, and the score/correlation is based on the intent of the user; Gupta, ¶ [0022]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition engine selection systems of Gurram, as modified by the contextual speech recognition model selection of Gruber, to incorporate the teachings of Gupta to include wherein the processor is further configured to calculate the correlations based on the at least one intent. The systems and methods of Gupta can determine user intent for hand off between speech recognition systems or components thereof, including keeping the user informed regarding hand off, “to provide a desirable customer experience,” as recognized by Gupta. (Gupta, ¶ [0019]).

Regarding claim 21, the rejection of claim 3 is incorporated. Gurram and Gruber disclose all of the elements of the current invention as stated above. However, Gurram and Gruber fail to expressly recite wherein the plurality of voice recognition engines comprises a default voice recognition engine, and wherein the default voice recognition engine is configured to analyze the intent of the user voice input. The relevance of Gupta is described above with relation to claim 7.
Regarding claim 21, Gupta teaches wherein the plurality of voice recognition engines comprises a default voice recognition engine (In some embodiments, “the speech processing system manager 294 includes processing components, such as ASR and/or NLU components, that may be used to select a speech processing system 292,” where said processing components are understood as default processing components; Gupta, ¶ [0042]), and wherein the default voice recognition engine is configured to analyze the intent of the user voice input (The default “processing components” can include “NLU components,” where the NLU component “may determine an intent representing an action that a user desires be performed.”; Gupta, ¶ [0042], [0044]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition engine selection systems of Gurram, as modified by the contextual speech recognition model selection of Gruber, to incorporate the teachings of Gupta to include wherein the plurality of voice recognition engines comprises a default voice recognition engine, and wherein the default voice recognition engine is configured to analyze the intent of the user voice input. The systems and methods of Gupta can determine user intent for hand off between speech recognition systems or components thereof, including keeping the user informed regarding hand off, “to provide a desirable customer experience,” as recognized by Gupta. (Gupta, ¶ [0019]).

Regarding claim 22, the rejection of claim 13 is incorporated. Claim 22 is substantially the same as claim 21 and is therefore rejected under the same rationale as above.

Claims 16-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Gurram in view of Gupta and Gruber.
Regarding claim 16, Gurram discloses A method of controlling an electronic device (the systems and methods described with reference to “the speech recognition engines 101... associated with a client computer 107”; Gurram, ¶ [0025]), comprising: receiving a user voice input (“client computer 107 may use, or be associated with, a microphone(s) or other physical element for capturing the spoken input 102.”; Gurram, ¶ [0025]); identifying a state of the electronic device (Though not explicitly defined as states, the system identifies selection information, where the selection information includes “heuristics,” “application information,” and “user preference information,” as well as subcategories thereof, which are “associated with the user interface” and where the selection information includes a determination of a category/type of selectable information, and a determination of device status with regards to the category/type {at least one state of the electronic device}. The state is the device status with regards to the category/type defined by said selection information. In one example, the “best-suited speech recognition engine may be the speech recognition engine with fewer misrecognitions, over time, of spoken input issued by a user” where the category defined by the selection information is “speech recognition engine with fewer misrecognitions,” determined based on “track[ing] which of the speech recognition engines 101 is best at recognizing a particular user’s voice,” and the state is either fewer or not fewer for the speech recognition engine with the electronic device being in the active state. The category is defined with relation to the electronic device (e.g., in the sample above, “fewer misrecognitions” is a function of both the device being in the active state and engine performance), and therefore is related to the electronic device. 
Further, a determination of a category/type necessarily includes the possibility of other categories/types and associated device states of those categories/types {from among a plurality of states of the electronic device}; Gurram, ¶ [0036]-[0039], [0058], [0066]); determining whether a selection rule exists identifying a preferred voice recognition engine corresponding to the intent and the state from among the plurality of voice recognition engines (“the engine selector 120” may use “selection information 122” as part of “selecting one of the speech recognition engines 101 for use” where selection information indicating a preferred engine and selection of an engine based on the selection information is a rule for identifying the voice recognition engine based on the state, and the selection of “the best” indicates a plurality of speech recognition engines from which one is chosen, and where the selection information can further include “that a particular user prefers one type of speech recognition engine to another... for a particular task(s) {a preferred voice recognition engine corresponding to the intent}”; Gurram, ¶ [0024], [0031], [0033], [0043]); based on the selection rule existing, performing an operation corresponding to the user voice input based on the preferred voice recognition engine (“the engine selector 120 may access selection information 122” where the “selection information 122 includes various criteria and information for selecting a particular one of the speech recognition engines 101” where criteria for selection is a selection rule, and where the system may only access selection information which exists. 
As such, if the selection information exists, then embodiments which rely on that selection information will use it “for selecting a particular one of the speech recognition engines 101” and where the selection information can indicate a user preference for a speech recognition engine which “may simply be received from a user prior to the user making use of the system 100.”; Gurram, ¶ [0032]-[0033], [0039]); based on the selection rule not existing, selecting a subset of voice recognition engines from among the plurality of voice recognition engines based on the intent (Conversely, if the selection rule does not exist, the system will rely on the remaining selection information, which includes “heuristics,” “application information,” and the remaining “user preference information,” as well as subcategories thereof, which are “associated with the user interface” and the speech recognition engines. With regards to heuristics, “the heuristics 124 also may relate to which of the speech recognition engines 101 are better suited to processing spoken input associated with particular types of interface elements” where “the heuristics 124 may be used to relate types of speech recognition with particular speech recognition engines”. Thus, the heuristics are understood to include determining “a specific type” associated with the “speech recognition task,” where the speech recognition engines capable of handling a type of speech recognition task form a subset of the speech recognition engines. As such, this type establishes a subset of “speech recognition engines” which “are better suited to processing spoken input associated with [the] particular type” which can also coincide with a particular user preferred type “of speech recognition engine... 
for a particular task(s)”; Gurram, ¶ [0035]-[0039], [0024]), and selecting a voice recognition engine, from among the subset of voice recognition engines, based on the intent and a correlation between the state and the voice recognition engine (“By using the selection information 122, the engine selector 120 may select an optimal one of the speech recognition engines 101. Specifically, the speech recognition engine information 123, the heuristics 124, the interface information 126, and the user preference information 128 may be used... in combination to select the search recognition engine to be used in recognizing the spoken input 102.” As stated above, the remaining selection information, which includes “heuristics,” “application information,” and the remaining “user preference information,” as well as “other designations or types of selection information” can be applied to the subset created based on type through the heuristics approach, such as selecting the best speech recognition engine based on misrecognition rate, where the best speech recognition engine is selected from the subset generated based on the specific type associated with the speech recognition task (e.g., selecting the best speech recognition engine capable of processing speech recognition tasks of the specific type); Gurram, ¶ [0035]-[0037], [0041], [0043]); and performing an operation corresponding to the user voice input based on the selected voice recognition engine (“The received spoken input 102 may be transferred to the selected speech recognition engine using the accessed session. The received spoken input is then processed using the selected speech recognition engine (260).”; Gurram, ¶ [0048]).
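For orientation, the claim 16 control flow that this rejection maps onto the references can be condensed into a short sketch. This is purely illustrative of the claimed logic; every identifier below is hypothetical and none of this code appears in Gurram, Gupta, or Gruber.

```python
# Hypothetical sketch of the claim 16 selection flow; all names are invented
# for illustration and are not drawn from any cited reference.

def select_engine(intent, state, engines, rules, correlations):
    """Choose a voice recognition engine for an (intent, state) pair.

    engines: dict name -> set of intents the engine can handle
    rules: dict (intent, state) -> preferred engine name, if a rule exists
    correlations: dict (engine, state) -> correlation score
    """
    # If a selection rule identifying a preferred engine exists, use it.
    preferred = rules.get((intent, state))
    if preferred is not None:
        return preferred
    # Otherwise, narrow to the subset of engines suited to the intent...
    subset = [name for name, intents in engines.items() if intent in intents]
    # ...and pick the one with the highest correlation to the device state.
    return max(subset, key=lambda name: correlations.get((name, state), 0.0))


engines = {"on_device_asr": {"volume"}, "server_asr": {"volume", "search"}}
rules = {("volume", "muted"): "on_device_asr"}
correlations = {("on_device_asr", "active"): 0.4, ("server_asr", "active"): 0.9}

print(select_engine("volume", "muted", engines, rules, correlations))   # rule path
print(select_engine("search", "active", engines, rules, correlations))  # correlation path
```

As claimed, the default engine's only role before this point is determining the intent; the sketch simply takes that intent as an input.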
However, Gurram fails to expressly recite determining, based on a default voice recognition engine from among a plurality of voice recognition engines, an intent corresponding to the user voice input, [and] wherein the subset of voice recognition engines comprises one from among a first subset comprising a plurality of on-device voice recognition engines which are stored on the electronic device, and a second subset comprising a plurality of server voice recognition engines which are stored on a server. The relevance of Gupta is described above with relation to claim 7.

Regarding claim 16, Gupta teaches determining, based on a default voice recognition engine from among a plurality of voice recognition engines, an intent corresponding to the user voice input (In some embodiments, “the speech processing system manager 294 includes processing components, such as ASR and/or NLU components, that may be used to select a speech processing system 292,” where said processing components are understood as default processing components, and the default “processing components” can include “NLU components,” where the NLU component “may determine an intent representing an action that a user desires be performed.”; Gupta, ¶ [0042], [0044]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition engine selection systems of Gurram to incorporate the teachings of Gupta to include determining, based on a default voice recognition engine from among a plurality of voice recognition engines, an intent corresponding to the user voice input. The systems and methods of Gupta can determine user intent for hand off between speech recognition systems or components thereof, including keeping the user informed regarding hand off, “to provide a desirable customer experience,” as recognized by Gupta. (Gupta, ¶ [0019]).
However, Gurram and Gupta fail to expressly recite wherein the subset of voice recognition engines comprises one from among a first subset comprising a plurality of on-device voice recognition engines which are stored on the electronic device, and a second subset comprising a plurality of server voice recognition engines which are stored on a server. The relevance of Gruber is described above with relation to claim 1.

Regarding claim 16, Gruber teaches wherein the subset of voice recognition engines comprises one from among a first subset comprising a plurality of on-device voice recognition engines which are stored on the electronic device, and a second subset comprising a plurality of server voice recognition engines which are stored on a server (“In the example of FIG. 32, input elicitation functionality and output processing functionality are distributed among client 1304 and server 1340, with client part of input elicitation 2794 a and client part of output processing 2792 a located at client 1304, and server part of input elicitation 2794 b and server part of output processing 2792 b located at server 1340” where the components “located at server 1340” include “complete library of language pattern recognizers 2760 b” and the client device “maintains subsets and/or portions of these components locally” including a “subset of library of language pattern recognizers 2760 a”; Gruber, ¶ [0100]-[0109]; FIG. 32).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition engine selection systems of Gurram, as modified by the speech processing hand-off systems of Gupta, to incorporate the teachings of Gruber to include wherein the subset of voice recognition engines comprises one from among a first subset comprising a plurality of on-device voice recognition engines which are stored on the electronic device, and a second subset comprising a plurality of server voice recognition engines which are stored on a server. The systems and methods of Gruber incorporate “context information” from a wide variety of sources “to supplement natural language or gestural input from a user,” including the use of such context for selection between a plurality of language models, such as for use in speech to text and natural language understanding, which “helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input,” thus providing the known benefit of improved operational performance in light of changes in operational state for the device and/or the user, as recognized by Gruber. (Gruber, ¶ [0013], [0317]-[0321]).

Regarding claim 17, the rejection of claim 16 is incorporated. Gurram, Gupta, and Gruber disclose all of the elements of the current invention as stated above. Gurram further discloses wherein the selecting of the voice recognition engine further comprises: selecting a subset of the plurality of voice recognition engines based on the intent (The system can include “multiple ones of the plurality of speech recognition engines 325 are each simultaneously maintained and associated with a defined session. 
In this way, a user may switch from one speech recognition engine to another while executing a plurality of tasks or while using a plurality of applications”; Gurram, ¶ [0059], [0062]); and selecting the voice recognition engine from among the subset of the plurality of voice recognition engines, (“The selection of a particular session may be based on a number of factors or combinations thereof, such as, for example, the currently-active interface element, a user selection/preference, or heuristics.”; Gurram, ¶ [0063]) based on the voice recognition engine having a highest correlation with the identified state from among correlations between the subset of the plurality of voice recognition engines and a plurality of states. (Selection is based on the highest correlations with regards to the selection information in question for each of the plurality of speech recognition engines 325; Gurram, ¶ [0062]-[0063]).

Regarding claim 19, the rejection of claim 17 is incorporated. Gurram, Gupta, and Gruber disclose all of the elements of the current invention as stated above. Gurram further discloses wherein the default voice recognition engine is included in the plurality of on-device voice recognition engines (The speech recognition engines, including the user preferred speech recognition engine, may be stored either locally or remotely; Gurram, ¶ [0026]-[0027]).

Regarding claim 20, the rejection of claim 17 is incorporated. Gurram and Gupta disclose all of the elements of the current invention as stated above. Gurram further discloses wherein the plurality of on-device voice recognition engines relate to functions of the electronic device (“Particular speech recognition engines 101 may be better suited to process the spoken input 102 associated with particular environments and/or particular tasks. 
For example, DynaSpeak® is designed for use on a mobile device.”; Gurram, ¶ [0024]), and wherein the plurality of server voice recognition engines relate to provision of services from outside of the electronic device (“As another example, IBM ViaVoice® may have better dictation recognition than other speech recognition engines that are available at a particular time within the system 100.”; Gurram, ¶ [0024]).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313) 446-6627. The examiner can normally be reached 07:00-17:00 M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. 
Washburn can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Sean E Serraguard/
Patent Examiner, Art Unit 2657

/DANIEL C WASHBURN/
Supervisory Patent Examiner, Art Unit 2657

Prosecution Timeline

Aug 25, 2022
Application Filed
Aug 23, 2024
Non-Final Rejection — §103
Oct 17, 2024
Interview Requested
Nov 01, 2024
Applicant Interview (Telephonic)
Nov 01, 2024
Examiner Interview Summary
Nov 29, 2024
Response Filed
Mar 11, 2025
Final Rejection — §103
Jun 11, 2025
Request for Continued Examination
Jun 12, 2025
Response after Non-Final Action
Aug 14, 2025
Non-Final Rejection — §103
Nov 18, 2025
Response Filed
Feb 24, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603095
Stereo Audio Signal Delay Estimation Method and Apparatus
2y 5m to grant Granted Apr 14, 2026
Patent 12598250
SYSTEMS AND METHODS FOR COHERENT AND TIERED VOICE ENROLLMENT
2y 5m to grant Granted Apr 07, 2026
Patent 12597429
PACKET LOSS CONCEALMENT
2y 5m to grant Granted Apr 07, 2026
Patent 12512093
Sensor-Processing Systems Including Neuromorphic Processing Modules and Methods Thereof
2y 5m to grant Granted Dec 30, 2025
Patent 12505835
HOME APPLIANCE AND SERVER
2y 5m to grant Granted Dec 23, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

5-6
Expected OA Rounds
69%
Grant Probability
99%
With Interview (+33.6%)
3y 2m
Median Time to Grant
High
PTA Risk
Based on 134 resolved cases by this examiner. Grant probability derived from career allow rate.
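The headline grant probability follows directly from the examiner's career counts; a quick arithmetic check (the 99% with-interview figure comes from the tool's own model, which is not spelled out on this page, so it is not reproduced here):

```python
# Sanity check: 92 granted of 134 resolved cases yields the 69% shown above.
granted, resolved = 92, 134
allow_rate = granted / resolved
print(f"{allow_rate:.1%}")  # 68.7%, displayed rounded as 69%
```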
