Prosecution Insights
Last updated: April 19, 2026
Application No. 18/771,489

SERVER SUPPORTED RECOGNITION OF WAKE PHRASES

Non-Final OA (§103, §112, §DP)
Filed: Jul 12, 2024
Examiner: AGAHI, DARIOUSH
Art Unit: 2656
Tech Center: 2600 — Communications
Assignee: SoundHound AI IP LLC
OA Round: 1 (Non-Final)
86% Grant Probability (Favorable)
1-2 OA Rounds
2y 9m To Grant
99% With Interview

Examiner Intelligence

Grants 86% — above average
86% Career Allow Rate (142 granted / 166 resolved; +23.5% vs TC avg)
Strong +29% interview lift
+29.0% Interview Lift (resolved cases with an interview vs. without)
Typical timeline
2y 9m Avg Prosecution; 27 currently pending
Career history
193 Total Applications (across all art units)
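The headline figures above are simple arithmetic on the examiner's record. Below is a minimal sketch (Python) that reproduces them; the granted/resolved counts and the +29.0% lift come from the page, while the rounding and the additive, clamped combination rule for the with-interview figure are assumptions, not a documented formula.

```python
# Reconstruction of the dashboard's headline numbers from the stats shown above.
# The counts and lift are from the page; the rounding and the additive, clamped
# interview-lift rule are assumptions made for illustration.
granted, resolved = 142, 166
allow_rate = granted / resolved            # 0.8554... -> displayed as 86%
interview_lift = 0.29                      # "+29.0% Interview Lift"

grant_probability = round(allow_rate, 2)                        # 0.86
with_interview = min(grant_probability + interview_lift, 0.99)  # clamped -> 0.99

print(f"career allow rate:        {allow_rate:.1%}")         # 85.5%
print(f"grant probability:        {grant_probability:.0%}")  # 86%
print(f"with interview (derived): {with_interview:.0%}")     # 99%
```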

Statute-Specific Performance

§101: 25.8% (-14.2% vs TC avg)
§103: 47.8% (+7.8% vs TC avg)
§102: 10.0% (-30.0% vs TC avg)
§112: 12.6% (-27.4% vs TC avg)
Based on career data from 166 resolved cases; the Tech Center average (the black line in the chart) is an estimate.

Office Action

§103 §112 §DP
DETAILED ACTION

This Office action is in response to Applicant's submission filed on 7/12/2024. This is a CON application based on 17/584,780 (issued as US 12,051,403). Claims 1-20 are pending in the application, of which claims 1 and 11 are independent and have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 7/12/2024 has been considered by the examiner.

Claim Objections

The listed claims are objected to for the informalities shown and may be addressed with the suggested amendments: Claim 4, line 1 recites "… augmenting the wake phrase audio comprises …". Applicant is advised to review all claims for any potential claim objection issues.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1 and 11, and therefore claims 2-10 and 12-20, which respectively depend therefrom, are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 1, line 7, and claim 11, line 10 recite "… associated with the device type in a repository …", which appears to be indefinite since it is not clear which device type it is referring to. Applicant is advised to review all claims for any potential antecedent basis issues.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement.
See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-12 of U.S. Patent No. 12,051,403. The claims of the issued patent are narrower in scope than those of the instant application; therefore, the claims of the issued patent anticipate the claims of the instant application. Although the instant application has a method claim and a system claim as its independent claims while the issued patent has method and CRM claims as its independent claims, it would be obvious to derive the system claim from the functions of the CRM claims. Please see the claim chart mapping below, as well as the claim mappings for the individual claims. Claims 11-20 of the instant application mirror claims 1-10.

Claim chart (Instant Application 18/771,489 mapped against Issued Patent US 12,051,403):

Instant 1: A computer-implemented method, comprising:
Patent 1: A computer-implemented method comprising:
Patent 1.a: supporting a plurality of virtual assistants;

Instant 1.aa: receiving a request from a virtual assistant device, the request comprising wake phrase audio;
Patent 1.b: receiving a request from a virtual assistant device, the request comprising wake phrase audio and an identification of a device type;

Instant 1.bb: identifying, based on a type of the virtual assistant device, a specific environment for which the virtual assistance device is used;
Patent 1.c: identifying, based on the identification of the device type, a specific environment for which the virtual assistance device is used;

Instant 1.cc: based on the type of the virtual assistant device, retrieving a wake phrase detector associated with the device type in a repository of wake phrase detectors;
Patent 1.f: based on the device type, retrieving a plurality of wake phrase detectors associated with the device type in a repository of wake phrase detectors for the plurality of virtual assistants;

Instant 1.dd: retrieving, from a repository associated with the specific environment, typical non-speech environmental noise for the specific environment;
Patent 1.d: retrieving, from a repository associated with the specific environment, typical non-speech environmental noise for the specific environment,

Instant 1.ee: augmenting the wake phrase audio with the typical non-speech environmental noise for the specific environment; and
Patent 1.e: wherein the typical non-speech environmental noise is used for augmenting positive audio samples and negative audio samples of the wake phrase audio;
Patent 1.g: identifying when the wake phrase audio triggers a wake phrase detector of the plurality of the wake phrase detectors; and
Patent 1.h: in response to the wake phrase audio triggering the wake phrase detector,

Instant 1.ff: providing a corresponding response to the virtual assistant device
Patent 1.i: providing a corresponding response to the virtual assistant device,
Patent 1.j: wherein the wake phrase detector is trained from positive audio samples of the wake phrase and negative audio samples of the wake phrase audio, wherein the positive audio samples contain a match of the wake phrase audio and the negative audio samples contain similar phrases of the wake phrase audio.

Instant 1.g: in response to the augmented wake phrase audio triggers the wake phrase detector.

Instant 2: The computer-implemented method of claim 1, further comprising:
Patent 3: The method of claim 1, further comprising
Instant 2.aa: storing the wake phrase audio in a corpus for training wake phrase detectors.
Patent 3.a: storing the wake phrase audio in a corpus for training wake phrase detectors.

Instant 3: The computer-implemented method of claim 1, further comprising:
Instant 3.aa: receiving an identification of a virtual assistant device type.
Patent 6.a: wherein the identification of the device type uniquely identifies the virtual assistant device with respect to the plurality of virtual assistants supported.

Instant 4: The computer-implemented method of claim 1, wherein augmenting the wake phrase comprises
Instant 4.aa: augmenting positive audio samples and negative audio samples of the wake phrase audio.
Patent 1.e: wherein the typical non-speech environmental noise is used for augmenting positive audio samples and negative audio samples of the wake phrase audio;

Instant 5: The computer-implemented method of claim 1,
Instant 5.aa: wherein the wake phrase detector is trained from positive audio samples of the wake phrase and negative audio samples of the wake phrase audio and wherein the positive audio samples contain a match of the wake phrase audio and the negative audio samples contain similar phrases of the wake phrase audio.
Patent 1.j: wherein the wake phrase detector is trained from positive audio samples of the wake phrase and negative audio samples of the wake phrase audio, wherein the positive audio samples contain a match of the wake phrase audio and the negative audio samples contain similar phrases of the wake phrase audio.

Instant 6: The computer-implemented method of claim 1, further comprising:
Instant 6.aa: identifying when the wake phrase audio triggers a wake phrase detector of the plurality of the wake phrase detectors.
Patent 1.g: identifying when the wake phrase audio triggers a wake phrase detector of the plurality of the wake phrase detectors; and

Instant 7: The computer-implemented method of claim 1, wherein the wake phrase detector is further configured to
Patent 4: The method of claim 1, wherein the wake phrase detector is further configured to
Instant 7.aa: detect noise characteristics present in the wake phrase audio and associated with the device type and, based on the detection, providing the response.
Patent 4.a: detect noise characteristics present in the wake phrase audio and associated with the device type and, based on the detection, providing the response.

Instant 8: The computer-implemented method of claim 1,
Patent 5: The method of claim 1,
Instant 8.aa: wherein the wake phrase detector is trained from only positive audio samples of the wake phrase.
Patent 5.a: wherein the wake phrase detector is trained from only positive audio samples of the wake phrase.

Instant 9: The computer-implemented method of claim 1,
Patent 6: The method of claim 1,
Instant 9.aa: wherein an identification of the type of the virtual assistant device uniquely identifies the virtual assistant device with respect to a plurality of virtual assistants supported.
Patent 6.a: wherein the identification of the device type uniquely identifies the virtual assistant device with respect to the plurality of virtual assistants supported.

Instant 10: The computer-implemented method of claim 1,
Patent 9: The method of claim 1
Instant 10.aa: wherein an identification of the type of the virtual assistant device further identifies a car within which the virtual assistant device is deployed.
Patent 9.a: wherein the identification of the device type further identifies an associated computing device within which the virtual assistant device is deployed, the associated computing device being a car.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-2, 6, 11-12, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Gruenstein et al. (US20180233150A1) (herein "Gruenstein"), and in further view of Nemala et al. (US9640194B1) (herein "Nemala").

Regarding claims 1 and 11, Gruenstein teaches a computer-implemented method [Par. 0009], comprising [claim 1], and a computer system, comprising [Par. 0070]: at least one processor [Par. 0093, 0094]; and memory including instructions [Par. 0073] that, when executed by the at least one processor [Par. 0081], cause the computer system to:

receiving/receive a request from a virtual assistant device, the request comprising wake phrase audio; (Gruenstein, Par. 0033: "The data that identifies the key phrase may be text data for the key phrase, e.g., a text string, or an identifier for the client device 102, e.g., either of which may be included in the request to analyze the audio signal received from the client device 102. The server hotword detection module 114 may use the identifier for the client device 102 to access a database and determine the key phrase for the client device 102 and the audio signal.", and Par. 0043: "… the client device 102 may provide the speech recognition system 112 with data for the user specified hotword that the speech recognition system 112 associates with an identifier for the client device 102, e.g., with a user account for the client device 102.")

identifying/identify, based on a type of the virtual assistant device, a specific environment for which the virtual assistance device is used; (Gruenstein, Par. 0033 and Par. 0043, quoted above; Par. 0063: "… the client device that includes the audio signal also includes data identifying a key phrase, e.g., text for the key phrase or an identifier that can be used to look up a key phrase in a database."; and Par. 0044: "The client device 102 may have different key phrases for different physical geographic locations. For instance, the client device 102 may have a first key phrase for a user's home and a second, different key phrase for the user's office. The client device 102 may use one or more location devices 110 to determine a current physical geographic location for the client device 102 and select a corresponding key phrase.")

Note: Gruenstein teaches having multiple devices with associated key phrases for different environments (home, office, etc.). Choosing a different type of virtual assistant device for a given environment is a design choice, and a PHOSITA would make the selection per the environment where the device is operational.

based on the type of the virtual assistant device, retrieving a wake phrase detector associated with the device type in a repository of wake phrase detectors; (Gruenstein, Par. 0034: "In some examples, the server hotword detection module 114 may use a pre-built hotword biasing model. For instance, the server hotword detection module 114 may analyzes multiple audio signals from the client device 102 or from multiple different client devices, all of which are for the same key phrase, using the same hotword biasing model."; Par. 0043: "… the client device 102 may be configured to detect any of multiple different key phrases encoded in an audio signal. For example, the client device 102 may receive input representing a user specified hotword, such as 'hey indigo' or 'hey gennie.' … the client device 102 may provide the speech recognition system 112 with data for the user specified hotword that the speech recognition system 112 associates with an identifier for the client device [represents device type] 102, e.g., with a user account for the client device 102."; and Par. 0063, quoted above.)

Note: the client device has multiple wake phrases that are tied to the identification (representing the type) of the device, where the user account connects them all in the database, as recited above.

providing/provide a corresponding response to the virtual assistant device in response to the augmented wake phrase audio triggering the wake phrase detector. (Gruenstein, Par. 0002: "… analyzes the entire key phrase to determine whether the user spoke the key phrase. When the server determines that the key phrase is included in the words, the server may parse other words spoken by the user to generate data for an action that the client device should perform.", and Par. 0055: "In response to determining that the response data includes tagged text data representing the one or more utterances encoded in the audio signal, the client device performs an action [response] using the tagged text data (210). For instance, the client device uses the tags in the data to determine the action to perform. The tags may indicate which portion of the tagged data, and the respective portion of the audio signal, correspond to the first utterances for the key phrase. The tags may indicate which portion of the tagged data correspond to an action for the client device to perform, e.g., 'play some music.'")

Gruenstein does not teach; however, Nemala teaches:

retrieving/retrieve, from a repository associated with the specific environment, typical non-speech environmental noise for the specific environment; (Nemala, Col. 7, line 66 – Col. 8, line 6: "As follows from this figure, a frequency analysis module 450 and/or combination module 460 of the training system 410 may receive predetermined reference clean speech signals and predetermined reference noise signals [specific environment] from the clean speech database 420 and the noise database 430, respectively. These reference clean speech and noise signals may be combined by a combination module 460 of the training system 410 into 'synthetic' noisy speech signals.")

augmenting/augment the wake phrase audio with the typical non-speech environmental noise for the specific environment; (Nemala, Col. 7, line 66 – Col. 8, line 6, quoted above, where the reference clean speech and noise signals are combined [augmented] into 'synthetic' noisy speech signals.)

Note: the synthetic noisy speech signal represents the augmented wake phrase audio.

Nemala is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruenstein further in view of Nemala to retrieve, from a repository associated with the specific environment, typical non-speech environmental noise for the specific environment, and to augment the wake phrase audio with the typical non-speech environmental noise for the specific environment.
Motivation to do so would be to produce a signal with an improved signal-to-noise ratio (Nemala, Col. 7, line 3).

Regarding claims 2 and 12, Gruenstein, as modified above, teaches the method and the system of claims 1 and 11, respectively. Gruenstein, as modified above, further teaches storing the wake phrase audio in a corpus for training wake phrase detectors. (Gruenstein, Par. 0024: "… In some implementations, the client hotword detection module is configured to detect occurrence of any of multiple different key phrases, e.g., ten key phrases. The multiple different key phrases include a limited number of different key phrases for which the client hotword detection module 106 is trained.", and Par. 0032: "The server hotword detection module 114 may use a language model 118, an acoustic model 120, or both, to determine whether the one or more first utterances satisfy the second threshold 116 of being a key phrase. The language model 118, and the acoustic model 120, are each trained using a large amount of training data, e.g., compared to the client hotword detection module 106. For example, the language model 118, the acoustic model 120, or both, may be trained using 30,000 hours of training data. The client hotword detection module 106 may be trained using 100 hours of training data.")

Note: training for hotword detection reads on storing the wake phrase audio in a corpus for training wake phrase detectors.

Regarding claims 6 and 16, Gruenstein, as modified above, teaches the method and the system of claims 1 and 11, respectively. Gruenstein, as modified above, further teaches identifying when the wake phrase audio triggers a wake phrase detector of the plurality of the wake phrase detectors. (Gruenstein, Par. 0002: "… When the server determines that the key phrase is included in the words, the server may parse other words spoken by the user to generate data for an action that the client device should perform.", and Par. 0024: "… the client hotword detection module is configured to detect occurrence of any of multiple different key phrases, e.g., ten key phrases. The multiple different key phrases include a limited number of different key phrases for which the client hotword detection module 106 is trained.")

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Gruenstein and Nemala, and in further view of Kothari et al. (US20200312317A1) (herein "Kothari").

Regarding claims 3 and 13, Gruenstein, as modified above, teaches the method and the system of claims 1 and 11, respectively. Gruenstein, as modified above, does not teach; however, Kothari teaches receiving an identification of a virtual assistant device type. (Kothari, Par. 0036: "In some cases, the input audio signal can include identifying information specifying which of the first digital assistant computing device 104 or the second digital assistant computing device 104 is to process the input audio signal. Identifying information can include a label or other identifier assigned to the first or second digital assistant computing device 104, such as 'first', 'home', 'living room', or 'kitchen'. Identifying information can include alphanumeric values. In some cases, if the input audio signal includes identifying information that can be used to select one of the first or second digital computing device 104 to use for further processing, the data processing system 102 can instruct the corresponding digital assistant computing device to perform the further signal processing."; Par. 0062: "The orchestrator component 112 can poll one or more digital assistant computing devices 104 associated with an account identifier to obtain characteristics associated with the one or more digital assistant computing devices 104, and set one of the one or more digital assistant computing devices 104 as a primary signal processor based on an analysis of the characteristics. For example, the orchestrator component 112 can poll the first digital assistant computing device to obtain one or more characteristics of the first digital assistant computing device. The orchestrator component 112 can poll the second digital assistant computing device 104 to obtain the one or more characteristics of the second digital assistant computing device 104."; and Par. 0063: "The characteristic can include or be based on the type of device or a configuration of the device. For example, the type of device can include a speaker device, a television device, a mobile device, and a wearable device.")

Kothari is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruenstein, as modified above, further in view of Kothari to receive an identification of a virtual assistant device type. Motivation to do so would be to coordinate signal processing to reduce the overall processor, memory, and bandwidth utilization of a system that includes multiple digital assistant computing devices (Kothari, Par. 0056).

Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Gruenstein and Nemala, and in further view of Li et al. ("Adversarial Music: Real World Audio Adversary Against Wake-word Detection System") (herein "Li").

Regarding claims 4 and 14, Gruenstein, as modified above, teaches the method and the system of claims 1 and 11, respectively. Gruenstein, as modified above, does not teach; however, Li teaches wherein augmenting the wake phrase comprises augmenting positive audio samples and negative audio samples of the wake phrase audio. (Li, Section 1, Introduction: "We reimplemented the wake-word detection system used in Amazon Alexa based on their latest publications on the architecture [Wu et al., 2018]. We leveraged a large amount of open-sourced speech data to train our wake-word model, testing and making sure it has on par performance compared with the real Alexa. We collected 100 samples of 'Alexa' utterances from 10 people and augmented the data set by varying the volume, tempo and speed. We created a synthetic data set using publicly available data sets as background noise and negative speech examples. This collected database is used to validate our emulated model and be compared with the real Alexa.", and Section 4, Dataset: "We collected 100 positive speech samples (speaking 'Alexa') from 10 peoples (4 males and 6 females; 4 native speakers of English, 6 non-native speakers of English). Each person provided 10 utterances, under the requirement of varying their tone and pitch as much as possible. We further augmented the data to 20x by varying the speed, tempo and the volume of the utterance, resulting in 2000 samples. We used LJ speech dataset for background noise and negative speech examples (speak anything but 'Alexa'). We created a synthetic data set by randomly adding positive and negative speech examples onto a 10s background noise and created binary labels accordingly. While 'hearing' positive speech examples, we set label values as 1.")

Li is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruenstein, as modified above, further in view of Li to augment positive audio samples and negative audio samples of the wake phrase audio. Motivation to do so would be to improve the robustness of the wake-word detection system (Li, Conclusion).

Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Gruenstein and Nemala, and in further view of Li et al. ("Adversarial Music: Real World Audio Adversary Against Wake-word Detection System") (herein "Li").

Regarding claims 5 and 15, Gruenstein, as modified above, teaches the method and the system of claims 1 and 11, respectively. Gruenstein, as modified above, does not teach; however, Foerster teaches wherein the wake phrase detector is trained from positive audio samples of the wake phrase and negative audio samples of the wake phrase audio, and wherein the positive audio samples contain a match of the wake phrase audio and the negative audio samples contain similar phrases of the wake phrase audio. (Foerster, Col. 1, ll. 29-38: "… include the actions of accessing a first neural network that was trained to recognize a given keyword or keyphrase using a set of hotword training data, wherein the hotword training data includes positive hotword training data that correspond to utterances of the keyword or keyphrase, and negative hotword training data that corresponds to utterances of words or phrases that are other than the keyword or keyphrase; selecting a seed hotsound; mapping, to a feature space, [i] the positive hotword training data, [ii] the negative hotword training data, and …", and Col. 13, ll. 17-26: "The first neural network has been trained to recognize a keyword or keyphrase, e.g., 'Ok Google,' using a set of hotword training data. The hotword training data includes a set of positive hotword training data that corresponds to utterances of the keyword or keyphrase, and a set of negative hotword training data that corresponds to utterances of words or phrases that are other than the keyword or keyphrase. In some implementations, the amount of negative hotword training data may be large.")

Foerster is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruenstein, as modified above, further in view of Foerster such that the wake phrase detector is trained from positive audio samples of the wake phrase and negative audio samples of the wake phrase audio, wherein the positive audio samples contain a match of the wake phrase audio and the negative audio samples contain similar phrases of the wake phrase audio. Motivation to do so would be to provide a robust way of generating trigger sounds that improves the stability of hotsounding compared to other sound recognition systems (Foerster, Col. 2, ll. 57-60).

Claims 7-9 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Gruenstein and Nemala, and in further view of Sharifi et al. (US20220093104A1) (herein "Sharifi").

Regarding claims 7 and 17, Gruenstein, as modified above, teaches the method and the system of claims 1 and 11, respectively. Gruenstein, as modified above, does not teach; however, Sharifi teaches wherein the wake phrase detector is further configured to detect noise characteristics present in the wake phrase audio and associated with the device type and, based on the detection, providing the response. (Sharifi, Par. 0034: "… the audio data 103 and content metadata 110 associated with the speech input 104, from the QoS manager 300 in descending order of ranking 312. …, such as, for example, processing, noise modeling, acoustic modeling, language model, annotation, etc., to generate a speech recognition result (e.g., transcription) for the speech input 104. … The TTS module 730 may convert this response from text to speech and output the response in audio form to the user device 200, which is then output as synthesized speech to the user …"; Par. 0042: "… The measured loudness may correspond to the portion of the audio data 103 that corresponds to the hotword detected by the hotword detector 220c, the portion of the audio data 103 that corresponds to the voice query following the hotword, or the entire audio data 103 captured by the user device 200. The audio quality score of the speech input 104 may further indicate a level of background noise present in the audio data 103. Thus, the audio quality score may simply refer to a confidence score of the audio quality of the speech input 104, i.e., how well the speech input 104 was captured by a microphone of the user device 200"; and Par. 0004: "Typically, after a voice enabled device wakes up by detecting the presence of the hotword in an utterance of speech (e.g., input audio), … Accordingly, when a user of a voice enabled device utters the following speech: 'Hey Google, what restaurants are still open right now?', the voice enabled device may wake-up in response to detecting a hotword ('Hey Google'), and provide the terms following the hotword that correspond to a voice query ('what nearby restaurants are still open right now?') to the server-based processing stack for processing.")

Note: in a BRI sense, when a wake phrase detector detects noise characteristics/modeling in a wake phrase, it associates it to the device in question. Sharifi teaches QoS determination, which is another way of saying measuring the noise characteristics of the wake phrase.

Sharifi is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruenstein, as modified above, further in view of Sharifi to detect noise characteristics present in the wake phrase audio and associated with the device type and, based on the detection, provide the response. Motivation to do so would be to allow the user device to decide whether or not to send ASR requests to the query processing stack for processing (Sharifi, Par. 0054).

Regarding claims 8 and 18, Gruenstein, as modified above, teaches the method and the system of claims 1 and 11, respectively. Gruenstein, as modified above, does not teach; however, Sharifi teaches wherein the wake phrase detector is trained from only positive audio samples of the wake phrase. (Sharifi, Par. 0003: "… the voice enabled device captures input audio via a microphone and uses a hotword detector trained to detect the presence of the hotword in the input audio. When the hotword is detected in the input audio, the voice enabled device initiates a wake-up process for processing the hotword and/or any other terms in the input audio following the hotword.")

Sharifi is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruenstein, as modified above, further in view of Sharifi such that the wake phrase detector is trained from only positive audio samples of the wake phrase. Motivation to do so would be to allow the user device to decide whether or not to send ASR requests to the query processing stack for processing (Sharifi, Par. 0054).

Regarding claims 9 and 19, Gruenstein, as modified above, teaches the method and the system of claims 1 and 11, respectively. Gruenstein, as modified above, does not teach; however, Sharifi teaches wherein an identification of the type of the virtual assistant device uniquely identifies the virtual assistant device with respect to a plurality of virtual assistants supported. (Sharifi, Par. 0034, Par. 0042, and Par. 0004, quoted above in connection with claims 7 and 17; the same BRI note applies.)

Sharifi is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruenstein, as modified above, further in view of Sharifi such that an identification of the type of the virtual assistant device uniquely identifies the virtual assistant device with respect to a plurality of virtual assistants supported. Motivation to do so would be to allow the user device to decide whether or not to send ASR requests to the query processing stack for processing (Sharifi, Par. 0054).

Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Gruenstein and Nemala, and in further view of Broy et al. (US20210316682A1) (herein "Broy").

Regarding claims 10 and 20, Gruenstein, as modified above, teaches the method and the system of claims 1 and 11, respectively. Gruenstein, as modified above, does not teach; however, Broy teaches wherein an identification of the type of the virtual assistant device further identifies a car within which the virtual assistant device is deployed. (Broy, Par. 0030: "In detail, FIG. 1 shows a method 100 for determining a digital assistant for performing a vehicle function from a plurality of digital assistants in a vehicle. The method 100 can receive 102 a voice message from a vehicle occupant by means of a digital assistant from the plurality of the digital assistants. Each digital assistant from the plurality of digital assistants can have a unique identifier, e.g. a unique name. The digital assistants in the vehicle can communicate with each other using the unique identifier of a digital assistant. If a digital assistant does not have a unique identifier, an identifier of a vehicle occupant who is associated with the digital assistant can also be used to uniquely identify the digital assistant. In addition or alternatively, a digital assistant can be uniquely identified with regard to a function provided by the digital assistant, and/or with regard to a vehicle occupant who has been identified as the sender of the voice message. Preferably, each digital assistant is uniquely assigned to a vehicle occupant, so that a unique assignment to a digital assistant is possible by means of the identification of the vehicle occupant.")

Broy is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gruenstein, as modified above, further in view of Broy such that an identification of the type of the virtual assistant device further identifies a car within which the virtual assistant device is deployed. Motivation to do so would be to improve the efficiency of executing commands for controlling vehicle functions by means of digital assistants in a vehicle interior (Broy, Par. 0003).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Maker et al. (US 20190251960 A1) teaches (Par. 0006): "But selecting a digital assistant from among multiple digital assistants based on a voice input may be unreliable. This is because multiple digital assistants may detect their own trigger word being present in the voice input even though only one digital assistant can be selected.", and (Par. 0080): "In some other embodiments, user interface and command module 128 may perform trigger word detection for multiple trigger words. For example, user interface and command module 128 may perform trigger word detection for the trigger words 'Hey Roku' and 'OK Google.' In some embodiments, different trigger words may correspond to different digital assistants 180. This enables a user 136 to interact with different digital assistants 180 using different trigger words. In some embodiments, user interface and command module 128 may store the different trigger words in data storage 134 of the audio responsive electronic device 122."

Examiner's Note: The examiner has cited particular columns and line numbers and/or paragraph numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claims, other passages and figures may apply as well. In preparing responses, the applicant is respectfully requested to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passages as taught by the prior art or disclosed by the examiner. In the case of amending the claimed invention, the applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation, and also to verify and ascertain the metes and bounds of the claimed invention.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI, whose telephone number is (408) 918-7689. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, the applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DARIOUSH AGAHI, P.E.
Primary Examiner
/DARIOUSH AGAHI/
Primary Examiner, Art Unit 2656
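To make the claim language above concrete, here is a minimal, hypothetical sketch (Python/NumPy) of the flow recited in independent claim 1, including the Nemala-style combination of speech with stored environmental noise that the §103 rejection relies on: receive wake phrase audio plus a device type, retrieve the matching detector and the environment's typical non-speech noise, augment the audio, and respond on a trigger. Every identifier (ENV_BY_DEVICE_TYPE, NOISE_BY_ENV, mix_at_snr, handle_request) is invented for illustration; neither the application nor the cited references prescribes this implementation.

```python
import numpy as np

# Hypothetical repositories keyed by device type and environment (claim 1
# recites a "repository of wake phrase detectors" and a "repository associated
# with the specific environment"; these keys and contents are invented).
ENV_BY_DEVICE_TYPE = {"car-head-unit": "car", "smart-speaker": "kitchen"}
NOISE_BY_ENV = {
    "car": 0.05 * np.random.randn(16000),      # stand-in for stored road noise
    "kitchen": 0.02 * np.random.randn(16000),  # stand-in for appliance noise
}

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Combine speech with noise at a target SNR: a simplified version of the
    'synthetic noisy speech' combination the rejection attributes to Nemala."""
    noise = np.resize(noise, speech.shape)          # loop/trim noise to length
    p_speech = float(np.mean(speech ** 2))
    p_noise = float(np.mean(noise ** 2)) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise

def handle_request(wake_audio: np.ndarray, device_type: str, detectors: dict):
    """Server-side handling of one request (wake phrase audio + device type)."""
    environment = ENV_BY_DEVICE_TYPE[device_type]   # identify the environment
    detector = detectors[device_type]               # retrieve detector by type
    augmented = mix_at_snr(wake_audio, NOISE_BY_ENV[environment], snr_db=10.0)
    if detector(augmented):                         # detector triggered
        return {"response": "wake phrase acknowledged"}
    return None                                     # no trigger, no response

# Toy energy-threshold "detector" standing in for a trained model:
detectors = {"car-head-unit": lambda audio: float(np.mean(audio ** 2)) > 1e-4}
print(handle_request(0.1 * np.random.randn(16000), "car-head-unit", detectors))
```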

Prosecution Timeline

Jul 12, 2024
Application Filed
Feb 13, 2026
Non-Final Rejection — §103, §112, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596890
SYSTEMS AND METHODS FOR CROSS-LINGUAL TRANSFER LEARNING
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12596876
SYSTEMS AND METHODS FOR IMPROVING TEXTUAL DESCRIPTIONS USING LARGE LANGUAGE MODELS
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12591743
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM FOR EXTRACTING A NAMED ENTITY FROM A DOCUMENT
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12586586
SPEECH RECOGNITION WITH SELECTIVE USE OF DYNAMIC LANGUAGE MODELS
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12579448
TECHNIQUES FOR POSITIVE ENTITY AWARE AUGMENTATION USING TWO-STAGE AUGMENTATION
Granted Mar 17, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2 Expected OA Rounds
86% Grant Probability
99% With Interview (+29.0%)
2y 9m Median Time to Grant
Low PTA Risk
Based on 166 resolved cases by this examiner. Grant probability derived from career allow rate.
