Prosecution Insights
Last updated: April 19, 2026
Application No. 18/750,663

CONTEXTUAL SUPPRESSION OF ASSISTANT COMMAND(S)

Status: Non-Final OA (§DP)
Filed: Jun 21, 2024
Examiner: CHAWAN, VIJAY B
Art Unit: 2658
Tech Center: 2600 — Communications
Assignee: Google LLC
OA Round: 1 (Non-Final)

Grant Probability: 88% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 8m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 88% (776 granted / 882 resolved); above average, +26.0% vs TC avg
Interview Lift: +11.6% (moderate), among resolved cases with an interview
Avg Prosecution: 2y 8m typical; 21 applications currently pending
Career History: 903 total applications across all art units
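
The arithmetic behind these headline figures is simple and worth sanity-checking. Below is a minimal Python sketch using the counts shown above; how the displayed values are rounded and how the interview lift combines with the base rate are assumptions for illustration, not the tool's documented methodology.

    # Sanity check of the headline examiner statistics shown above.
    # Counts come from the panel; the rounding and the way the interview lift is
    # combined with the base rate are assumptions for illustration only.
    granted, resolved = 776, 882
    career_allow_rate = granted / resolved  # ~0.880, displayed as 88%

    interview_lift = 0.116  # reported +11.6% lift for resolved cases with an interview
    with_interview = min(career_allow_rate + interview_lift, 1.0)  # ~0.996, displayed as 99%

    print(f"Career allow rate:              {career_allow_rate:.1%}")
    print(f"Grant probability w/ interview: {with_interview:.1%}")

The same figures reappear in the Prosecution Projections panel below.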

Statute-Specific Performance

§101: 20.9% (-19.1% vs TC avg)
§103: 13.8% (-26.2% vs TC avg)
§102: 33.8% (-6.2% vs TC avg)
§112: 9.4% (-30.6% vs TC avg)
Tech Center averages are estimates • Based on career data from 882 resolved cases

Office Action

§DP
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA.

A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 1-2, 4 and 10 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-2, 5 and 20 of U.S. Patent No. 11,557,293.
Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1-3, 4 and 10 of the instant application are similar in scope and content of the patented claims 1-2, 5 and 20 of the patent issued to the same Applicant. It is clear that all the elements of the application claims 1-2, 4 and 10 are to be found in patented claims 1-2, 5 and 20 (as the application claims 1-2, 4 and 10 fully encompasses patented claims 1-2, 5 and 20). The difference between the application claims and the patent claims lies in the fact that the patent claim includes many more elements and is thus much more specific. Thus the invention of claims 1-2, 5 and 20 of the patent is in effect a “species” of the “generic” invention of the application claims 1-2, 4 and 10. It has been held that the generic invention is “anticipated” by the “species”. See In re Goodman, 29 USPQ2d 2010 (Fed. Cir. 1993). Since application claims 1-2, 4 and 10 is anticipated by claims 1-2, 5 and 20 of the patent, it is not patentably distinct from of the patented claims. Application No: 18/750,663 Patent No: 11,557,293 1. A method implemented by one or more processors, the method comprising: detecting an occurrence of a warm word activation event; and in response to detecting the occurrence of the warm word activation event, activating one or more currently dormant automated assistant functions to process a stream of audio data, using a warm word model associated with the warm word activation event, to monitor for an occurrence of one or more particular words or phrases; in response to detecting the occurrence of the one or more particular words or phrases in a portion of the audio data, determining whether to process a preamble portion of the audio data and/or a postamble portion of the audio data, wherein the preamble portion of the audio data precedes the portion of the audio data that includes the one or more particular words or phrases, and wherein the postamble portion of the audio data follows the portion of the audio data that includes the one or more particular words or phrases in response to determining to refrain from processing the preamble portion of the audio data and/or postamble portion of the audio data: causing an automated assistant to perform an assistant command that is associated with the one or more particular words or phrases; and in response to determining to process the preamble portion of the audio data and/or postamble portion of the audio data: processing the preamble portion of the audio data and/or postamble portion of the audio data to determine whether to perform the assistant command that is associated with the one or more particular words or phrases. 1. 
A method implemented by one or more processors, the method comprising: processing, using a warm word model, a stream of audio data to monitor for an occurrence of one or more particular words or phrases, the stream of audio data being generated by one or more microphones of a client device of a user, and each of the one or more particular words or phrases being associated with an assistant command; in response to determining a portion of the audio data corresponds to one or more of the particular words or phrases: processing, using an automatic speech recognition (ASR) model, a preamble portion of the audio data and/or a postamble portion of the audio data to generate ASR output, wherein the preamble portion of the audio data precedes the portion of the audio data that corresponds to the one or more particular words or phrases, and wherein the postamble portion of the audio data follows the portion of the audio data that corresponds to the one or more particular words or phrases; and determining, based on processing the ASR output, whether the user intended the one or more particular words or phrases to cause performance of the assistant command; in response to determining the user did not intend the one or more particular words or phrases to cause performance of the assistant command that is associated one or more of the particular words or phrases: refraining from causing an automated assistant to perform the assistant command that is associated with one or more of the particular words or phrases; and in response to determining the user intended the one or more particular words or phrases to cause performance of the assistant command that is associated with one or more of the particular words or phrases: causing the automated assistant to perform the assistant command that is associated with one or more of the particular words or phrases. 2. The method of claim 1, further comprising: in response to detecting no occurrence of the one or more particular words or phrases in a portion of the audio data: deactivating one or more currently active automated assistant functions that were previously activated in response to detecting the occurrence of the warm word activation event. 2. The method of claim 1, further comprising: detecting an occurrence of a warm word activation event; and in response to detecting the occurrence of the warm word activation event, activating one or more currently dormant automated assistant functions that utilize the warm word model, wherein processing the stream of audio data using the warm word model to monitor for the occurrence of the one or more particular words or phrases is in response to activating the one or more currently dormant automated assistant functions that utilize the warm word model. 3. The method of claim 2, wherein deactivating one or more of the currently active automated assistant functions that were previously activated in response to detecting the occurrence of the warm word activation event is further in response to detecting no occurrence of the one or more particular words or phrases in a portion of the audio data for a threshold duration of time relative to detecting the occurrence of the warm word activation event. 4. 
The method of claim 1, wherein determining whether to process a preamble portion of the audio data and/or a postamble portion of the audio data is based on detecting voice activity before the portion of the stream of audio data that includes the one or more particular words or phrases and/or after the portion of the stream of audio data that includes the one or more particular words or phrases. 5. The method of claim 4, further comprising: in response to determining the NLU output is insufficient for determining whether the user intended the one or more particular words or phrases to cause performance of the assistant command that is associated with one or more of the particular words or phrases: processing, using the ASR model, the postamble portion of the audio data to generate additional ASR output; and determining, based on processing the additional ASR output, whether the user intended the one or more particular words or phrases to cause performance of the assistant command that is associated with one or more of the particular words or phrases. 5. The method of claim 4, wherein determining to process a preamble portion of the audio data is based on detecting voice activity before the portion of the stream of audio data that includes the one or more particular words or phrases. 6. The method of claim 4, wherein determining to process a postamble portion of the audio data is based on detecting voice activity after the portion of the stream of audio data that includes the one or more particular words or phrases. 7. The method of claim 1, wherein processing the preamble portion of the audio data and/or postamble portion of the audio data to determine whether to perform the assistant command that is associated with the one or more particular words or phrases comprises: determining, based on processing the preamble portion of the audio data and/or postamble portion of the audio data, whether the user intended the one or more particular words or phrases to cause performance of the assistant commend. 8. The method of claim 7, further comprising: in response to determining that the user intended the one or more particular words or phrases to cause performance of the assistant commend: causing the automated assistant to perform the assistant command that is associated with the one or more particular words or phrases. 9. The method of claim 7, further comprising: in response to determining that the user did not intend the one or more particular words or phrases to cause performance of the assistant commend: refraining from causing the automated assistant to perform the assistant command that is associated with the one or more particular words or phrases. 10. 
A system comprising: at least one processor; and memory storing instructions that, when executed, cause the at least one processor to be operable to: detect an occurrence of a warm word activation event; and in response to detecting the occurrence of the warm word activation event, activate one or more currently dormant automated assistant functions to process a stream of audio data, using a warm word model associated with the warm word activation event, to monitor for an occurrence of one or more particular words or phrases; in response to detecting the occurrence of the one or more particular words or phrases in a portion of the audio data, determine whether to process a preamble portion of the audio data and/or a postamble portion of the audio data, wherein the preamble portion of the audio data precedes the portion of the audio data that includes the one or more particular words or phrases, and wherein the postamble portion of the audio data follows the portion of the audio data that includes the one or more particular words or phrases in response to determining to refrain from processing the preamble portion of the audio data and/or postamble portion of the audio data: cause an automated assistant to perform an assistant command that is associated with the one or more particular words or phrases; and in response to determining to process the preamble portion of the audio data and/or postamble portion of the audio data: process the preamble portion of the audio data and/or postamble portion of the audio data to determine whether to perform the assistant command that is associated with the one or more particular words or phrases. 20. A system comprising: at least one processor; and memory storing instructions that, when executed, cause the at least one processor to: process, using a warm word model, a stream of audio data to monitor for an occurrence of one or more particular words or phrases, the stream of audio data being generated by one or more microphones of a client device of a user, and each of the one or more particular words or phrases being associated with an assistant command; in response to determining a portion of the audio data corresponds to one or more of the particular words or phrases: process, using an automatic speech recognition (ASR) model, a preamble portion of the audio data and/or a postamble portion of the audio data to generate ASR output, wherein the preamble portion of the audio data precedes the portion of the audio data that corresponds to the one or more particular words or phrases, and wherein the postamble portion of the audio data follows the portion of the audio data that corresponds to the one or more particular words or phrases; and determine, based on processing the ASR output, whether the user intended the one or more particular words or phrases to cause performance of the assistant command; in response to determining the user did not intend the one or more particular words or phrases to cause performance of the assistant command that is associated one or more of the particular words or phrases: refrain from causing an automated assistant to perform the assistant command that is associated with one or more of the particular words or phrases; and in response to determining the user intended the one or more particular words or phrases to cause performance of the assistant command that is associated with one or more of the particular words or phrases: cause the automated assistant to perform the assistant command that is associated with one or more 
of the particular words or phrases. 11. The system of claim 10, wherein the at least one processor is further operable to: in response to detecting no occurrence of the one or more particular words or phrases in a portion of the audio data: deactivate one or more currently active automated assistant functions that were previously activated in response to detecting the occurrence of the warm word activation event. 12. The system of claim 11, wherein deactivating one or more of the currently active automated assistant functions that were previously activated in response to detecting the occurrence of the warm word activation event is further in response to detecting no occurrence of the one or more particular words or phrases in a portion of the audio data for a threshold duration of time relative to detecting the occurrence of the warm word activation event. 13. The system of claim 10, wherein determining whether to process a preamble portion of the audio data and/or a postamble portion of the audio data is based on detecting voice activity before the portion of the stream of audio data that includes the one or more particular words or phrases and/or after the portion of the stream of audio data that includes the one or more particular words or phrases. 14. The system of claim 13, wherein determining to process a preamble portion of the audio data is based on detecting voice activity before the portion of the stream of audio data that includes the one or more particular words or phrases. 15. The system of claim 13, wherein determining to process a postamble portion of the audio data is based on detecting voice activity after the portion of the stream of audio data that includes the one or more particular words or phrases. 16. The system of claim 10, wherein the instructions to process the preamble portion of the audio data and/or postamble portion of the audio data to determine whether to perform the assistant command that is associated with the one or more particular words or phrases comprise instructions to: determine, based on processing the preamble portion of the audio data and/or postamble portion of the audio data, whether the user intended the one or more particular words or phrases to cause performance of the assistant commend. 17. The system of claim 16, wherein the at least one processor is further operable to: in response to determining that the user intended the one or more particular words or phrases to cause performance of the assistant commend: cause the automated assistant to perform the assistant command that is associated with the one or more particular words or phrases. 18. The system of claim 16, wherein the at least one processor is further operable to: in response to determining that the user did not intend the one or more particular words or phrases to cause performance of the assistant commend: refrain from causing the automated assistant to perform the assistant command that is associated with the one or more particular words or phrases. 19. 
A non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform operations, the operations comprising: detecting an occurrence of a warm word activation event; and in response to detecting the occurrence of the warm word activation event, activating one or more currently dormant automated assistant functions to process a stream of audio data, using a warm word model associated with the warm word activation event, to monitor for an occurrence of one or more particular words or phrases; in response to detecting the occurrence of the one or more particular words or phrases in a portion of the audio data, determining whether to process a preamble portion of the audio data and/or a postamble portion of the audio data, wherein the preamble portion of the audio data precedes the portion of the audio data that includes the one or more particular words or phrases, and wherein the postamble portion of the audio data follows the portion of the audio data that includes the one or more particular words or phrases in response to determining to refrain from processing the preamble portion of the audio data and/or postamble portion of the audio data: causing an automated assistant to perform an assistant command that is associated with the one or more particular words or phrases; and in response to determining to process the preamble portion of the audio data and/or postamble portion of the audio data: processing the preamble portion of the audio data and/or postamble portion of the audio data to determine whether to perform the assistant command that is associated with the one or more particular words or phrases. Claims 1-2, 7-8, 10-11, and 18-19 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 3, 5-6, 13, 15 and 20 of U.S. Patent No. 12,057,119. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1-2, 7-8, 10-11, and 18-19 of the instant application are similar in scope and content of the patented claims 1, 3, 5-6, 13, 15 and 20 of the patent issued to the same Applicant. It is clear that all the elements of the application claims 1-2, 7-8, 10-11, and 18-19 are to be found in patented claims 1, 3, 5-6, 13, 15 and 20 (as the application claims 1-2, 7-8, 10-11, and 18-19 fully encompasses patented claims 1, 3, 5-6, 13, 15 and 20). The difference between the application claims and the patent claims lies in the fact that the patent claim includes many more elements and is thus much more specific. Thus the invention of claims 1, 3, 5-6, 13, 15 and 20 of the patent is in effect a “species” of the “generic” invention of the application claims 1-2, 7-8, 10-11, and 18-19. It has been held that the generic invention is “anticipated” by the “species”. See In re Goodman, 29 USPQ2d 2010 (Fed. Cir. 1993). Since application claims 1-2, 7-8, 10-11, and 18-19 is anticipated by claims 1, 3, 5-6, 13, 15 and 20 of the patent, it is not patentably distinct from of the patented claims. Application No: 18/750,663 Patent No: 12,571,119 1. 
A method implemented by one or more processors, the method comprising: detecting an occurrence of a warm word activation event; and in response to detecting the occurrence of the warm word activation event, activating one or more currently dormant automated assistant functions to process a stream of audio data, using a warm word model associated with the warm word activation event, to monitor for an occurrence of one or more particular words or phrases; in response to detecting the occurrence of the one or more particular words or phrases in a portion of the audio data, determining whether to process a preamble portion of the audio data and/or a postamble portion of the audio data, wherein the preamble portion of the audio data precedes the portion of the audio data that includes the one or more particular words or phrases, and wherein the postamble portion of the audio data follows the portion of the audio data that includes the one or more particular words or phrases in response to determining to refrain from processing the preamble portion of the audio data and/or postamble portion of the audio data: causing an automated assistant to perform an assistant command that is associated with the one or more particular words or phrases; and in response to determining to process the preamble portion of the audio data and/or postamble portion of the audio data: processing the preamble portion of the audio data and/or postamble portion of the audio data to determine whether to perform the assistant command that is associated with the one or more particular words or phrases. 1. A method implemented by one or more processors, the method comprising: processing, using a warm word model, a stream of audio data to monitor for an occurrence of one or more particular words or phrases, the stream of audio data being generated by one or more microphones of a client device of a user, and each of the one or more particular words or phrases being associated with an assistant command; in response to determining a portion of the stream of audio data corresponds to one or more of the particular words or phrases: processing, using a voice activity detection (VAD) model, the stream of audio data to monitor for an occurrence of additional voice activity before the portion of the stream of audio data corresponds to one or more of the particular words or phrases and/or after the portion of the stream of audio data corresponds to one or more of the particular words or phrases; in response to determining that there is no additional voice activity before the portion of the stream of audio data corresponds to one or more of the particular words or phrases and/or after the portion of the stream of audio data corresponds to one or more of the particular words or phrases: causing an automated assistant to perform the assistant command that is associated with one or more of the particular words or phrases; and in response to determining that there is additional voice activity before the portion of the stream of audio data corresponds to one or more of the particular words or phrases and/or after the portion of the stream of audio data corresponds to one or more of the particular words or phrases: further processing the stream of audio data to determine whether to cause the automated assistant to perform the assistant command that is associated with one or more of the particular words or phrases. 2. 
The method of claim 1, further comprising: in response to detecting no occurrence of the one or more particular words or phrases in a portion of the audio data: deactivating one or more currently active automated assistant functions that were previously activated in response to detecting the occurrence of the warm word activation event. 3. The method of claim 2, further comprising: in response to determining the user did not intend the one or more particular words or phrases to cause performance of the assistant command that is associated one or more of the particular words or phrases: refraining from causing the automated assistant to perform the assistant command that is associated with one or more of the particular words or phrases. 3. The method of claim 2, wherein deactivating one or more of the currently active automated assistant functions that were previously activated in response to detecting the occurrence of the warm word activation event is further in response to detecting no occurrence of the one or more particular words or phrases in a portion of the audio data for a threshold duration of time relative to detecting the occurrence of the warm word activation event. 4. The method of claim 1, wherein determining whether to process a preamble portion of the audio data and/or a postamble portion of the audio data is based on detecting voice activity before the portion of the stream of audio data that includes the one or more particular words or phrases and/or after the portion of the stream of audio data that includes the one or more particular words or phrases. 5. The method of claim 4, wherein determining to process a preamble portion of the audio data is based on detecting voice activity before the portion of the stream of audio data that includes the one or more particular words or phrases. 6. The method of claim 4, wherein determining to process a postamble portion of the audio data is based on detecting voice activity after the portion of the stream of audio data that includes the one or more particular words or phrases. 7. The method of claim 1, wherein processing the preamble portion of the audio data and/or postamble portion of the audio data to determine whether to perform the assistant command that is associated with the one or more particular words or phrases comprises: determining, based on processing the preamble portion of the audio data and/or postamble portion of the audio data, whether the user intended the one or more particular words or phrases to cause performance of the assistant commend. 5. The method of claim 2, wherein determining whether the user intended the one or more particular words or phrases to cause performance of the assistant command that is associated with one or more of the particular words or phrases based on processing the ASR output comprises: processing, using a natural language understanding (NLU) model, the ASR output to generate NLU output, wherein the ASR output is generated based on both the preamble portion of the audio data and the postamble portion of the audio data; and determining, based on the NLU output, whether the user intended the one or more particular words or phrases to cause performance of the assistant command. 8. 
The method of claim 7, further comprising: in response to determining that the user intended the one or more particular words or phrases to cause performance of the assistant commend: causing the automated assistant to perform the assistant command that is associated with the one or more particular words or phrases. 6. The method of claim 5, further comprising: in response to determining the NLU output is insufficient for determining whether the user intended the one or more particular words or phrases to cause performance of the assistant command that is associated with one or more of the particular words or phrases: processing, using the ASR model, an additional postamble portion of the audio data to generate additional ASR output, wherein the additional postamble portion of the audio data follows the postamble portion of the audio data; and determining, based on processing the additional ASR output, whether the user intended the one or more particular words or phrases to cause performance of the assistant command that is associated with one or more of the particular words or phrases. 9. The method of claim 7, further comprising: in response to determining that the user did not intend the one or more particular words or phrases to cause performance of the assistant commend: refraining from causing the automated assistant to perform the assistant command that is associated with the one or more particular words or phrases. 10. A system comprising: at least one processor; and memory storing instructions that, when executed, cause the at least one processor to be operable to: detect an occurrence of a warm word activation event; and in response to detecting the occurrence of the warm word activation event, activate one or more currently dormant automated assistant functions to process a stream of audio data, using a warm word model associated with the warm word activation event, to monitor for an occurrence of one or more particular words or phrases; in response to detecting the occurrence of the one or more particular words or phrases in a portion of the audio data, determine whether to process a preamble portion of the audio data and/or a postamble portion of the audio data, wherein the preamble portion of the audio data precedes the portion of the audio data that includes the one or more particular words or phrases, and wherein the postamble portion of the audio data follows the portion of the audio data that includes the one or more particular words or phrases in response to determining to refrain from processing the preamble portion of the audio data and/or postamble portion of the audio data: cause an automated assistant to perform an assistant command that is associated with the one or more particular words or phrases; and in response to determining to process the preamble portion of the audio data and/or postamble portion of the audio data: process the preamble portion of the audio data and/or postamble portion of the audio data to determine whether to perform the assistant command that is associated with the one or more particular words or phrases. 13. 
A system comprising: at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the at least one processor to: process, using a warm word model, a stream of audio data to monitor for an occurrence of one or more particular words or phrases, the stream of audio data being generated by one or more microphones of a client device of a user, and each of the one or more particular words or phrases being associated with an assistant command; in response to determining a portion of the stream of audio data corresponds to one or more of the particular words or phrases: process, using a voice activity detection (VAD) model, the stream of audio data to monitor for an occurrence of additional voice activity before the portion of the stream of audio data corresponds to one or more of the particular words or phrases and/or after the portion of the stream of audio data corresponds to one or more of the particular words or phrases; in response to determining that there is no additional voice activity before the portion of the stream of audio data corresponds to one or more of the particular words or phrases and/or after the portion of the stream of audio data corresponds to one or more of the particular words or phrases: cause an automated assistant to perform the assistant command that is associated with one or more of the particular words or phrases; and in response to determining that there is additional voice activity before the portion of the stream of audio data corresponds to one or more of the particular words or phrases and/or after the portion of the stream of audio data corresponds to one or more of the particular words or phrases: further process the stream of audio data to determine whether to cause the automated assistant to perform the assistant command that is associated with one or more of the particular words or phrases. 11. The system of claim 10, wherein the at least one processor is further operable to: in response to detecting no occurrence of the one or more particular words or phrases in a portion of the audio data: deactivate one or more currently active automated assistant functions that were previously activated in response to detecting the occurrence of the warm word activation event. 15. The system of claim 14, wherein the instructions further cause the at least one processor to: in response to determining the user did not intend the one or more particular words or phrases to cause performance of the assistant command that is associated one or more of the particular words or phrases: refrain from causing the automated assistant to perform the assistant command that is associated with one or more of the particular words or phrases. 12. The system of claim 11, wherein deactivating one or more of the currently active automated assistant functions that were previously activated in response to detecting the occurrence of the warm word activation event is further in response to detecting no occurrence of the one or more particular words or phrases in a portion of the audio data for a threshold duration of time relative to detecting the occurrence of the warm word activation event. 13. 
The system of claim 10, wherein determining whether to process a preamble portion of the audio data and/or a postamble portion of the audio data is based on detecting voice activity before the portion of the stream of audio data that includes the one or more particular words or phrases and/or after the portion of the stream of audio data that includes the one or more particular words or phrases. 14. The system of claim 13, wherein determining to process a preamble portion of the audio data is based on detecting voice activity before the portion of the stream of audio data that includes the one or more particular words or phrases. 15. The system of claim 13, wherein determining to process a postamble portion of the audio data is based on detecting voice activity after the portion of the stream of audio data that includes the one or more particular words or phrases. 16. The system of claim 10, wherein the instructions to process the preamble portion of the audio data and/or postamble portion of the audio data to determine whether to perform the assistant command that is associated with the one or more particular words or phrases comprise instructions to: determine, based on processing the preamble portion of the audio data and/or postamble portion of the audio data, whether the user intended the one or more particular words or phrases to cause performance of the assistant commend. 17. The system of claim 16, wherein the at least one processor is further operable to: in response to determining that the user intended the one or more particular words or phrases to cause performance of the assistant commend: cause the automated assistant to perform the assistant command that is associated with the one or more particular words or phrases. 18. The system of claim 16, wherein the at least one processor is further operable to: in response to determining that the user did not intend the one or more particular words or phrases to cause performance of the assistant commend: refrain from causing the automated assistant to perform the assistant command that is associated with the one or more particular words or phrases. 15. The system of claim 14, wherein the instructions further cause the at least one processor to: in response to determining the user did not intend the one or more particular words or phrases to cause performance of the assistant command that is associated one or more of the particular words or phrases: refrain from causing the automated assistant to perform the assistant command that is associated with one or more of the particular words or phrases. 19. 
A non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform operations, the operations comprising: detecting an occurrence of a warm word activation event; and in response to detecting the occurrence of the warm word activation event, activating one or more currently dormant automated assistant functions to process a stream of audio data, using a warm word model associated with the warm word activation event, to monitor for an occurrence of one or more particular words or phrases; in response to detecting the occurrence of the one or more particular words or phrases in a portion of the audio data, determining whether to process a preamble portion of the audio data and/or a postamble portion of the audio data, wherein the preamble portion of the audio data precedes the portion of the audio data that includes the one or more particular words or phrases, and wherein the postamble portion of the audio data follows the portion of the audio data that includes the one or more particular words or phrases in response to determining to refrain from processing the preamble portion of the audio data and/or postamble portion of the audio data: causing an automated assistant to perform an assistant command that is associated with the one or more particular words or phrases; and in response to determining to process the preamble portion of the audio data and/or postamble portion of the audio data: processing the preamble portion of the audio data and/or postamble portion of the audio data to determine whether to perform the assistant command that is associated with the one or more particular words or phrases. 20. A non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform operations, the operations comprising: processing, using a warm word model, a stream of audio data to monitor for an occurrence of one or more particular words or phrases, the stream of audio data being generated by one or more microphones of a client device of a user, and each of the one or more particular words or phrases being associated with an assistant command; in response to determining a portion of the stream of audio data corresponds to one or more of the particular words or phrases: processing, using a voice activity detection (VAD) model, the stream of audio data to monitor for an occurrence of additional voice activity before the portion of the stream of audio data corresponds to one or more of the particular words or phrases and/or after the portion of the stream of audio data corresponds to one or more of the particular words or phrases; in response to determining that there is no additional voice activity before the portion of the stream of audio data corresponds to one or more of the particular words or phrases and/or after the portion of the stream of audio data corresponds to one or more of the particular words or phrases: causing an automated assistant to perform the assistant command that is associated with one or more of the particular words or phrases; and in response to determining that there is additional voice activity before the portion of the stream of audio data corresponds to one or more of the particular words or phrases and/or after the portion of the stream of audio data corresponds to one or more of the particular words or phrases: further processing the stream of audio data to determine whether to cause 
the automated assistant to perform the assistant command that is associated with one or more of the particular words or phrases.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.

Lang (US 11,380,322 B2) teaches suppressing a wake word response to a local wake word. An example implementation involves a playback device receiving audio content for playback by the playback device and providing a sound data stream representing the received audio content to a voice assistant service (VAS) wake-word engine and a local keyword engine. The playback device plays back a first portion of the audio content and detects, via the local keyword engine, that a second portion of the received audio content includes sound data matching one or more particular local keywords. Before the second portion of the received audio content is played back, the playback device disables a local keyword response of the local keyword engine to the one or more particular local keywords and then plays back the second portion of the audio content via one or more speakers.

Hughes et al. (US 2018/0182390 A1) teach methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for contextual hotwords. In one aspect, a method, during a boot process of a computing device, includes the actions of determining, by a computing device, a context associated with the computing device. The actions further include, based on the context associated with the computing device, determining a hotword. The actions further include, after determining the hotword, receiving audio data that corresponds to an utterance. The actions further include determining that the audio data includes the hotword. The actions further include, in response to determining that the audio data includes the hotword, performing an operation associated with the hotword.

Garcia et al. (US 2019/0295544 A1) teach systems and processes for operating a virtual assistant to provide natural assistant interaction. In accordance with one or more examples, a method includes, at an electronic device with one or more processors and memory: receiving a first audio stream including one or more utterances; determining whether the first audio stream includes a lexical trigger; generating one or more candidate text representations of the one or more utterances; determining whether at least one candidate text representation of the one or more candidate text representations is to be disregarded by the virtual assistant. If at least one candidate text representation is to be disregarded, one or more candidate intents are generated based on candidate text representations of the one or more candidate text representations other than the to-be-disregarded at least one candidate text representation.

Tukka et al. (US 2019/0371342 A1) disclose methods and systems for passive wakeup of a user interaction device and for configuring a dynamic wakeup time for a user interaction device. A method includes detecting an occurrence of at least one first non-voice event associated with at least one device present in an Internet of Things (IoT) environment. The method includes detecting an occurrence of at least one successive event associated with the at least one device.
The method includes estimating a contextual probability of initiating at least one interaction by a user with the user interaction device on detecting the occurrence of at least one of the at least one first event and the at least one successive event. On determining the estimated contextual probability is above a pre-defined threshold value, the method includes configuring the dynamic wakeup time to switch the user interaction device to a passive wakeup state.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIJAY B CHAWAN, whose telephone number is (571) 272-7601. The examiner can normally be reached 7-5 Monday thru Thursday. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/VIJAY B CHAWAN/
Primary Examiner, Art Unit 2658
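
The application claims and the cited references above all describe variants of one control flow: detect a warm word (or hotword), look at the audio around it, and decide whether to perform or to suppress the associated assistant command. As a reading aid only, here is a minimal hypothetical Python sketch of that flow; the function names, the model stand-ins (VAD, ASR, NLU), and the suppression policy are illustrative assumptions, not the applicant's claimed implementation or the method of any cited reference.

    # Hypothetical sketch of a contextual-suppression flow for warm-word commands.
    # All names, types, and policies are illustrative assumptions; this is not the
    # applicant's claimed implementation or the method of any cited reference.
    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class AudioWindow:
        preamble: bytes    # audio preceding the detected warm word
        warm_word: bytes   # audio containing the warm word itself
        postamble: bytes   # audio following the detected warm word

    def maybe_perform_command(
        window: AudioWindow,
        detect_voice_activity: Callable[[bytes], bool],  # stand-in for a VAD model
        transcribe: Callable[[bytes], str],              # stand-in for an ASR model
        infer_intent: Callable[[str], Optional[bool]],   # stand-in for an NLU model
        perform_command: Callable[[], None],
    ) -> bool:
        """Decide whether a detected warm word should trigger its assistant command.

        Returns True if the command was performed, False if it was suppressed.
        """
        # No surrounding speech: the warm word was likely directed at the assistant,
        # so perform the command without any further processing.
        has_context = (
            detect_voice_activity(window.preamble)
            or detect_voice_activity(window.postamble)
        )
        if not has_context:
            perform_command()
            return True

        # Surrounding speech exists: transcribe the preamble/postamble audio and ask
        # the NLU stand-in whether the user actually intended the command.
        context_text = " ".join(
            transcribe(part) for part in (window.preamble, window.postamble) if part
        )
        if infer_intent(context_text):
            perform_command()
            return True

        # Unintended or ambiguous (e.g. the warm word occurred mid-conversation):
        # suppress the command.
        return False

The voice-activity gate loosely mirrors the VAD-based branching recited in the '119 patent claims, while the ASR-plus-intent check on the preamble/postamble audio loosely mirrors the '293 patent claims; the sketch collapses both into one function purely for readability.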

Prosecution Timeline

Jun 21, 2024: Application Filed
Feb 20, 2026: Non-Final Rejection, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603089
ELECTRONIC APPARATUS PERFORMING SPEECH RECOGNITION AND METHOD FOR CONTROLLING THEREOF
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12592229
WAKEWORD DETECTION
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12586579
End-To-End Segmentation in a Two-Pass Cascaded Encoder Automatic Speech Recognition Model
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12585895
Communication Channel Quality Improvement System Using Machine Conversions
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12579968
METHOD OF DETERMINING END POINT DETECTION TIME AND ELECTRONIC DEVICE FOR PERFORMING THE METHOD
Granted Mar 17, 2026 (2y 5m to grant)
Study what changed in these applications to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 88%
With Interview: 99% (+11.6%)
Median Time to Grant: 2y 8m
PTA Risk: Low
Based on 882 resolved cases by this examiner. Grant probability derived from career allow rate.
