Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory obviousness-type double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the conflicting application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement.
Effective January 1, 1994, a registered attorney or agent of record may sign a terminal disclaimer. A terminal disclaimer signed by the assignee must fully comply with 37 CFR 3.73(b).
Claims 1 and 11, and the claims dependent thereon, are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1 and 11, and any dependent claims thereof, of U.S. Patent No. 12046241. Although the conflicting claims are not identical, they are not patentably distinct from each other because said claims of the instant application include all of the features of said claims of U.S. Patent No. 12046241. It would have been obvious to one of ordinary skill in the art to omit the step of a media transfer request given that data is being sent by a user's express action. In re Karlson, 136 USPQ 184 (CCPA 1963): "Omission of an element and its function is an obvious expedient if the remaining elements perform the same functions as before."
The conflicting claims are reproduced below: first the claims of U.S. Patent No. 12046241, then the claims of the present application.

Claims of U.S. Patent No. 12046241:
1. A computer-implemented method when executed on data processing hardware of a first voice assistant device causes the data processing hardware to perform operations comprising: receiving a voice input comprising a hotword and a voice request subsequent to the hotword, the voice input captured by the first voice assistant device and a second voice assistant device, the first voice assistant device and the second voice assistant device each communicatively coupled to a local network implemented at a network interface and configured to respond to voice requests that are subsequent to the hotword; detecting, in the voice input, the hotword; based on detecting the hotword, processing the voice input to determine that the voice request comprises a media play request to playback media content on a media output device; sending, from the first voice assistant device via the local network, a multicast message received by the second voice assistant device, the multicast message received by the second voice assistant device causing the second voice assistant device to not respond to the voice request despite the second voice assistant device capturing the voice input of the hotword and the voice request subsequent to the hotword; and based on determining that the voice request comprises the request to playback the media content on the media output device, causing the media output device to playback the media content.
2. The computer-implemented method of claim 1, wherein processing the voice input further comprises processing the voice input to determine that the voice request further comprises a voice designation of the media output device, the voice designation comprising a description of the media output device.
3. The computer-implemented method of claim 2, wherein causing the media output device to playback the media content comprises causing the media output device to playback the media content using the voice designation of the media output device.
4. The computer-implemented method of claim 1, wherein processing the voice input further comprises processing the voice input to determine that the voice request further comprises a name of a media play application, the media play application streaming the media content from playback on the media output device.
5. The computer-implemented method of claim 4, wherein causing the media output device to playback the media content comprises causing the media output device to playback the media content using the media play application.
6. The computer-implemented method of claim 1, wherein processing the voice input further comprises processing the voice input to determine that the voice request further comprises a description of the media content.
7. The computer-implemented method of claim 1, wherein: the media output device comprises a speaker; and the media content played back by the media output device comprises music audibly played back by the media output device.
8. The computer-implemented method of claim 1, wherein causing the media output device to playback the media content comprises causing the media output device to playback the media content streamed from a remote content source.
9. The computer-implemented method of claim 1, wherein the first voice assistant device and the media output device is communicatively to a local network implemented at a network interface.
10. The computer-implemented method of claim 1, wherein the media output device comprises the first voice assistant device.
11. A first voice activated device comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware causes the data processing hardware to perform operations comprising: receiving a voice input comprising a hotword and a voice request subsequent to the hotword, the voice input captured by the first voice assistant device and a second voice assistant device, the first voice assistant device and the second voice assistant device each communicatively coupled to a local network implemented at a network interface and configured to respond to voice requests that are subsequent to the hotword; detecting, in the voice input, the hotword; based on detecting the hotword, processing the voice input to determine that the voice request comprises a media play request to playback media content on a media output device; sending, from the first voice assistant device via the local network, a multicast message received by the second voice assistant device, the multicast message received by the second voice assistant device causing the second voice assistant device to not respond to the voice request despite the second voice assistant device capturing the voice input of the hotword and the voice request subsequent to the hotword; and based on determining that the voice request comprises the request to playback the media content on the media output device, causing the media output device to playback the media content.
12. The first voice assistant device of claim 11, wherein processing the voice input further comprises processing the voice input to determine that the voice request further comprises a voice designation of the media output device, the voice designation comprising a description of the media output device.
13. The first voice assistant device of claim 12, wherein causing the media output device to playback the media content comprises causing the media output device to playback the media content using the voice designation of the media output device.
14. The first voice assistant device of claim 11, wherein processing the voice input further comprises processing the voice input to determine that the voice request further comprises a name of a media play application, the media play application streaming the media content from playback on the media output device.
15. The first voice assistant device of claim 14, wherein causing the media output device to playback the media content comprises causing the media output device to playback the media content using the media play application.
16. The first voice assistant device of claim 11, wherein processing the voice input further comprises processing the voice input to determine that the voice request further comprises a description of the media content.
17. The first voice assistant device of claim 11, wherein: the media output device comprises a speaker; and the media content played back by the media output device comprises music audibly played back by the media output device.
18. The first voice assistant device of claim 11, wherein causing the media output device to playback the media content comprises causing the media output device to playback the media content streamed from a remote content source.
19. The first voice assistant device of claim 11, wherein the first voice assistant device and the media output device is communicatively to a local network implemented at a network interface.
20. The first voice assistant device of claim 11, wherein the media output device comprises the first voice assistant device.
Claims of the present application (as amended):
1. (Currently Amended) A computer-implemented method when executed on data processing hardware of a first voice assistant device causes the data processing hardware to perform operations comprising: receiving a voice input, the voice input comprising a hotword and a voice request command subsequent to the hotword, the voice input captured by the first voice assistant device and a second voice assistant device, the first voice assistant device and the second voice assistant device each communicatively coupled to a local network implemented at a network interface and configured to respond to voice requests that are subsequent to the hotword; detecting, in the voice input, the hotword; based on detecting the hotword, processing the voice input to determine that the voice request comprises: a media transfer request to transfer playback of media content to a group of one or more media output devices; and a user voice designation of the group of the one or more media output devices, the user voice designation comprising a description of a destination of the group of the one or more media output devices;[[ and]] sending, from the first voice assistant device via the local network, a multicast message received by the second voice assistant device, the multicast message received by the second voice assistant device causing the second voice assistant device to not respond to the voice request despite the second voice assistant device capturing the voice input of the hotword and the voice request subsequent to the hotword; and based on determining that the voice input comprises the media transfer request, causing, using the user voice designation of the group of the one or more media output devices, each media output device in the group of the one or more media output devices to playback the media content.
2. (Original) The method of claim 1, wherein: the media output devices in the group of the one or more media output devices comprise speakers; and the media content played back by each media output device in the group of the one or more media output devices comprises music audibly played back by each media output device in the group of the one or more media output devices.
3. (Original) The method of claim 1, wherein the description of the destination of the group of the one or more media output devices comprises a particular room within a house where the group of the one or more media output devices are located.
4. (Original) The method of claim 1, wherein the description of the destination of the group of the one or more media output devices comprises a particular space within a house where the group of the one or more media output devices are located.
5. (Original) The method of claim 1, wherein causing each media output device in the group of the one or more media output devices to playback the media content comprises causing each media output device to playback the media content streamed from a remote content source.
6. (Currently Amended) The method of claim 1, wherein the first voice assistant device and the each media output device in the group of the one or more media output devices are communicatively coupled to [[a]] the local network implemented at [[a]] the network interface.
7. (Currently Amended) The method of claim 6, wherein the first voice assistant device is configured to communicate with at least one media output device in the group of the one or more media output devices through the local network.
8. (Currently Amended) The method of claim 1, wherein the operations further comprise displaying, via an array of light emitting diodes (LEDs) of the first voice assistant device, a visual pattern on the LEDs while processing the voice input.
9. (Currently Amended) The method of claim 1, wherein the voice input is captured by a microphone implemented by the first voice assistant activated device.
10. (Currently Amended) The method of claim 1, wherein the operations further comprise audibly outputting, from a speaker of the first voice activated device, a voice message response to the voice request confirming that the voice request has been fulfilled.
11. (Currently Amended) A first voice assistant activated device comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware causes the data processing hardware to perform operations comprising: receiving a voice input, the voice input comprising a hotword and a voice request command subsequent to the hotword, the voice input captured by the first voice assistant device and a second voice assistant device, the first voice assistant device and the second voice assistant device each communicatively coupled to a local network implemented at a network interface and configured to respond to voice requests that are subsequent to the hotword; detecting, in the voice input, the hotword; based on detecting the hotword, processing the voice input to determine that the voice request comprises: a media transfer request to transfer playback of media content to a group of one or more media output devices; and a user voice designation of the group of the one or more media output devices, the user voice designation comprising a description of a destination of the group of the one or more media output devices;[[ and]] sending, from the first voice assistant device via the local network, a multicast message received by the second voice assistant device, the multicast message received by the second voice assistant device causing the second voice assistant device to not respond to the voice request despite the second voice assistant device capturing the voice input of the hotword and the voice request subsequent to the hotword; and based on determining that the voice input comprises the media transfer request, causing, using the user voice designation of the group of the one or more media output devices, each media output device in the group of the one or more media output devices to playback the media content.
12. (Currently Amended) The first voice assistant activated device of claim 11, wherein: the media output devices in the group of the one or more media output devices comprise speakers; and the media content played back by each media output device in the group of the one or more media output devices comprises music audibly played back by each media output device in the group of the one or more media output devices.
13. (Currently Amended) The first voice assistant activated device of claim 11, wherein the description of the destination of the group of the one or more media output devices comprises a particular room within a house where the group of the one or more media output devices are located.
14. (Currently Amended) The first voice assistant activated device of claim 11, wherein the description of the destination of the group of the one or more media output devices comprises a particular space within a house where the group of the one or more media output devices are located.
15. (Currently Amended) The first voice assistant activated device of claim 11, wherein causing each media output device in the group of the one or more media output devices to playback the media content comprises causing each media output device to playback the media content streamed from a remote content source.
16. (Currently Amended) The first voice assistant activated device of claim 11, wherein the first voice assistant device and the each media output device in the group of the one or more media output devices are communicatively coupled to [[a]] the local network implemented at [[a]] the network interface.
17. (Currently Amended) The first voice assistant activated device of claim 16, wherein the first voice assistant device is configured to communicate with at least one media output device in the group of the one or more media output devices through the local network.
18. (Currently Amended) The first voice assistant activated device of claim 11, further comprising: an array of light emitting diodes (LEDs), wherein the operations further comprise displaying a visual pattern on the LEDs while processing the voice input.
19. (Currently Amended) The first voice assistant activated device of claim 11, further comprising: a microphone, wherein the voice input is captured by the microphone.
20. (Currently Amended) The first voice assistant activated device of claim 11, further comprising: a speaker, wherein the operations further comprise audibly outputting, from the speaker, a voice message response to the voice request confirming that the voice request has been fulfilled.
Allowable Subject Matter
Claims 1-20 are allowed, pending resolution of the outstanding double patenting rejection.
The following is an examiner’s statement of reasons for allowance:
After a full review of the prior arguments, and after careful review of the complex claims as a whole, the examiner believes that the prior art, taken alone or in combination, fails to teach the claims as a whole, including: receiving a voice input comprising a hotword and a voice request subsequent to the hotword, the voice input captured by the first voice assistant device and a second voice assistant device, the first voice assistant device and the second voice assistant device each communicatively coupled to a local network implemented at a network interface and configured to respond to voice requests that are subsequent to the hotword; detecting, in the voice input, the hotword; based on detecting the hotword, processing the voice input to determine that the voice request comprises a media play request to playback media content on a media output device; sending, from the first voice assistant device via the local network, a multicast message received by the second voice assistant device, the multicast message received by the second voice assistant device causing the second voice assistant device to not respond to the voice request despite the second voice assistant device capturing the voice input of the hotword and the voice request subsequent to the hotword; and based on determining that the voice request comprises the request to playback the media content on the media output device, causing the media output device to playback the media content.
The claims as a whole, as precisely claimed and under the broadest reasonable interpretation (BRI), overcome the prior art of record. The closest prior art group teaches voice input potentially reaching multiple devices simultaneously and communication between devices in a network, such as to handle a command, e.g., "play song". That prior art group also teaches a wake word spoken to a single device and analyzed, after which a determination is made whether to perform automatic speech recognition (ASR) on the command. Other prior art teaches voice activity detection (VAD) plus a command to process voice inputs. Under BRI, the prior art (1) lacks any suggestion that the hotword itself is captured simultaneously among all devices and handled without ASR, (2) only sends the hotword to a single device prior to ASR, and (3) does not suggest that any communication from a first device to a second (or nth) device causes it to not respond or to cease processing per se. The prior art at best teaches a multiple-device system in which devices communicate to control one another, where a wake/trigger word is used to confirm that processing of a command should take place while a device remains asleep. Even assuming that (1) is VAD-based and that (2) can be implemented on multiple devices, such a combination would not be feasibly executed, since one of ordinary skill would not implement a hotword or wake check in a multi-device control system whose core concept is keyword analysis of a single command. Similarly, in a VAD hotword-check system, one of ordinary skill in the art would not reasonably modify a single-device scheme into a many-device scheme. This would be a drastic modification to the core concepts of the prior art. Therefore, the prior art fails to teach or suggest the complex claims as a whole.
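To make the described mechanism concrete, the claimed arbitration can be sketched as follows. This is a purely illustrative model, not code from the application or the prior art: the `LocalBus` and `Device` names, the "ok device" hotword string, and the suppression message are all hypothetical stand-ins for the LAN multicast group and claim language.

```python
# Hypothetical sketch: the first device to detect the hotword multicasts a
# suppression message on the local network; the other device then declines to
# respond even though it captured the same voice input.

class LocalBus:
    """Stands in for a LAN multicast group: every other attached device hears a message."""
    def __init__(self):
        self.devices = []

    def multicast(self, sender, message):
        for d in self.devices:
            if d is not sender:
                d.receive(message)

class Device:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.suppressed = False
        bus.devices.append(self)

    def receive(self, message):
        if message == "SUPPRESS":
            self.suppressed = True  # do not respond to the pending voice request

    def hear(self, utterance):
        # Both devices capture the input; only an unsuppressed device acts on it.
        if self.suppressed or not utterance.startswith("ok device "):
            return None  # no hotword, or another device already claimed the request
        self.bus.multicast(self, "SUPPRESS")  # claim the request on the local network
        request = utterance[len("ok device "):]
        if request.startswith("play "):
            return f"{self.name}: playing {request[len('play '):]}"
        return None

bus = LocalBus()
first, second = Device("first", bus), Device("second", bus)
# The first device processes the input and multicasts before the second acts.
print(first.hear("ok device play jazz"))   # first device responds
print(second.hear("ok device play jazz"))  # second device stays silent
```

The sketch models only the suppression ordering; real devices would arbitrate over actual network sockets and audio capture.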
As supplemental analysis, regarding the exclusion of devices and the ASR-versus-hotword concepts, and assuming under BRI that the system goes to sleep per se, the closest reference, and the one most similar in scope, is Rosenberger (US 8340975), which teaches the broad concept that one device out of several is selected based on the highest weight derived from a voice/speech input. Combining Kim with Rosenberger would not add any combinational strength to the prior art, since the multi-device approach is redundant without some other element that would read upon the claims. Outside of Kim, Rosenberger, and outdated same-assignee art, the remaining references are at best mildly related, covering the time between inputs for detect/non-detect, a first device outputting a new second signal energy with a randomized subtraction signal, and manual option selection to compare signals. This renders Rosenberger the single most pertinent reference. At a high-level first glance, Rosenberger appears to read upon the claims, for example by using a voice input to select one device from among multiple devices based on a weighted message level; under BRI this is sound. However, there are key distinctions that do not warrant its use in a rejection. For instance, in Rosenberger the system expressly uses speech recognition for both the trigger word and the command. While both the present invention and Rosenberger have the end result of isolating a single device to respond to the user, the combination of when speech recognition is performed and when the device is awoken is different. The distinction can be made as follows.

In Rosenberger:
1) All devices are in low power mode and will perform ASR.
2) A trigger word is input and processed using ASR on all devices.
3) The device with the highest message weight is isolated.
4) The device a) recognizes the next command and b) responds.
Present invention claims:
1) All devices only detect audio, no ASR used.
2) A voice input is detected using audio analysis on all devices.
3) The device with the highest value is isolated.
4) The device responds in some capacity.
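The four-step characterization of the present invention above can be illustrated with a small sketch. The scoring function, threshold, and device names below are assumptions chosen for illustration only; the claims recite voice detection and a highest value, not any particular energy computation, and no ASR is involved in the arbitration itself.

```python
# Illustrative contrast with an ASR-first scheme: each device computes only a
# voice-activity score (here, a crude frame-energy measure) from the same
# captured audio, and the device with the highest nonzero score is isolated.

def voice_activity_score(samples, threshold=0.1):
    """Crude VAD stand-in: mean absolute amplitude, zeroed below a threshold."""
    if not samples:
        return 0.0
    energy = sum(abs(s) for s in samples) / len(samples)
    return energy if energy >= threshold else 0.0

def isolate_device(captures):
    """captures maps device name -> audio samples. Returns the device with the
    highest nonzero score, or None if no device detected voice activity."""
    scores = {name: voice_activity_score(s) for name, s in captures.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0.0 else None

# The nearer device captures the utterance at higher amplitude.
captures = {
    "kitchen":  [0.4, -0.5, 0.6, -0.4],   # close to the speaker
    "bedroom":  [0.1, -0.1, 0.2, -0.1],   # farther away, attenuated
    "basement": [0.0, 0.01, -0.01, 0.0],  # silence / noise floor
}
print(isolate_device(captures))  # -> kitchen
```

Note that nothing in the sketch converts audio to text; isolating the responding device requires only a presence/level decision, which is the distinction drawn above.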
The biggest distinction is that the device of the present invention does not begin to respond in any way until both conditions are met: first, voice is detected, and second, the highest voice value is determined. Speech recognition, or any subsequent steps (if present or extrapolated from the claims), cannot take place until the first and second conditions are met (note that speech recognition is never claimed or suggested). Under BRI we can assume that, in the field of the invention, a subsequent command can be processed with ASR between the score and the response. However, the claims are void of ASR suggestions, particularly given the concept of voice detection, which is in line with voice activity detection, used to determine whether human speech is present. While VAD is commonly used as part of ASR, it is not substitutable for ASR per se. Therefore, the interpretation under BRI that ASR is performed by the claims is not warranted, since voice detection is not speech recognition; that is, indicating the presence of a human voice is not converting voice to text. Further assuming that detecting a voice is somehow used in conjunction with ASR, in light of the present invention we can safely assume under BRI that the command is processed with ASR but that the voice input detection portion is void of ASR. Alternatively, the detected voice plus the command can be processed under ASR, but this must be subsequent to the voice activity detection. Such assumptions are outside the tangible claim scope, since no steps past voice detection can be realized, but they are reasonable permutations of subsequent art scope in light of, and in comparison to, Rosenberger; that is, a command commonly follows a trigger. Further assuming that low power mode and being asleep amount to the same thing, if a device can still process data, then Rosenberger processes with ASR, whereas the present invention claims process with voice activity detection and not ASR.
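The two-condition gate just described can be reduced to a minimal sketch, assuming a numeric "voice value" per device. The function name and score representation are hypothetical, not taken from the claims:

```python
# Minimal sketch of the two-condition gate: no downstream step (including any
# hypothetical ASR) runs until (1) voice is detected and (2) this device holds
# the highest voice value among all capturing devices.

def may_respond(voice_detected, my_score, all_scores):
    """A device may respond only if voice activity was detected AND it has the
    highest (nonzero) voice value among all devices that captured the input."""
    if not voice_detected:               # condition 1 not met: stay silent
        return False
    return my_score > 0 and my_score == max(all_scores)  # condition 2

print(may_respond(True, 0.9, [0.9, 0.4, 0.2]))   # detected and highest
print(may_respond(True, 0.4, [0.9, 0.4, 0.2]))   # not the highest: stays silent
print(may_respond(False, 0.9, [0.9, 0.4, 0.2]))  # no voice detected: stays silent
```

The point of the sketch is ordering: both conditions gate everything downstream, so any recognition step, if one existed, could only occur after this function returned true.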
Even assuming ASR is present, we revert to the previous BRI explanation of ASR versus voice detection and the lack of evidence in the claims suggesting ASR. Additional prior art was searched relative to VAD, or voice activity detection, which yielded no workable combination with Rosenberger, since the use of such references strengthens the claims rather than supporting a potential combination with Rosenberger; that is, VAD is performed separately from, and before, ASR. Inserting VAD prior to the ASR in Rosenberger would drastically alter the wake system and provides no basis to embed a VAD, since this would at best create two triggers and remove the trigger-based purpose of Rosenberger, thereby voiding any suggestion of substitutability or combinability. Overall, given the priority dates, the same inventors, the explanation related to Kim and Rosenberger, and the prior-art VAD concepts, the prior art fails to teach or suggest the complex claims as a whole.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 9916839 B1, Scalise; Albert M. et al.: Simultaneous ASR and communication between devices in a network
US 20140278435 A1, Ganong, III; William F. et al.: Wake word trigger for a single device; dialog matching; context
US 6961704 B1, Phillips; Michael S. et al.: Multiple applications controlled
US 8340975 B1, Rosenberger; Theodore Alfred: Multiple voice-controlled devices; security
US 20130238326 A1, KIM; Yongsin et al.: Commands for multiple devices
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL COLUCCI whose telephone number is (571)270-1847. The examiner can normally be reached on M-F 9 AM - 7 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL COLUCCI/Primary Examiner, Art Unit 2655 (571)-270-1847
Examiner FAX: (571)-270-2847
Michael.Colucci@uspto.gov