Last updated: May 04, 2026

Application No. 18/765,108

VOICE SHORTCUT DETECTION WITH SPEAKER VERIFICATION

Non-Final OA §DP

Filed

Jul 05, 2024

Priority

Apr 16, 2021 — continuation of 11/568,878 +1 more

Examiner

ROBERTS, SHAUN A

Art Unit

2655

Tech Center

2600 — Communications

Assignee

Google LLC

OA Round

1 (Non-Final)

Interview Optional

+10.2% interview lift, which is below the 15.0% threshold. A written response is recommended.

Based on 650 resolved cases, 2023–2026
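
The recommendation above reads as a threshold rule. The following is a minimal sketch of such a rule, assuming a direct comparison of the interview lift against the 15.0% threshold; the function name and the returned labels are hypothetical and are not taken from the tool itself.

```python
def recommend_response(interview_lift_pct: float, threshold_pct: float = 15.0) -> str:
    """Return a hypothetical recommendation label for a given interview lift."""
    # Assumption: an interview is recommended only when the lift
    # reaches the threshold; otherwise a written response is suggested.
    if interview_lift_pct >= threshold_pct:
        return "interview recommended"
    return "written response recommended"


print(recommend_response(10.2))
```

With the +10.2% lift reported for this examiner, the sketch returns the written-response label, which matches the recommendation shown.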

Examiner Intelligence

ROBERTS, SHAUN A

Grants 76% — above average

Career Allowance Rate

494 granted / 650 resolved

+14.0% vs TC avg

Moderate +10% lift

+10.2%

Interview Lift

resolved cases with interview

Typical timeline

2y 11m

Avg Prosecution

28 currently pending

Career history

678

Total Applications

across all art units

Statute-Specific Performance

§101: 7.6% (-32.4% vs TC avg)

§103: 49.2% (+9.2% vs TC avg)

§102: 29.4% (-10.6% vs TC avg)

§112: 3.5% (-36.5% vs TC avg)

Based on career data from 650 resolved cases

Office Action

§DP

DETAILED ACTION
1.	This action is responsive to Application No. 18/765,108. All claims have been examined and are currently pending.
Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
3.	The information disclosure statement (IDS) submitted is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Double Patenting
4.	The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
5.	Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-4, 6-10, 13-18, and 20 of U.S. Patent No. 11,568,878. Although the claims at issue are not identical, they are not patentably distinct from each other because they recite similar limitations, and the reference claims thus anticipate the limitations of the application claims; both teach processing the plurality of device speaker embeddings to generate a multi-user speaker embedding, and generating and recognizing separated audio for causing a device to perform an action.
6.	Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-4, 6-16, and 18-20 of U.S. Patent No. 12,033,641. Although the claims at issue are not identical, they are not patentably distinct from each other because they recite similar limitations, and the reference claims thus anticipate the limitations of the application claims; both teach processing the plurality of device speaker embeddings to generate a multi-user speaker embedding, and generating and recognizing separated audio for causing a device to perform an action.

Conclusion
7.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure; see PTO-892. The application would be in condition for allowance once the double patenting rejection is overcome.
	The Application at hand teaches:
[0081] FIG. 5 illustrates an example of generating an attended speaker embedding for multiple users in accordance with various implementations disclosed herein. Speaker-aware technologies, such as voice filter technology generally assume the neural network takes a single embedding (also referred to herein as a d-vector) as a side input, thus can only be personalized for a single user at runtime. However, many smart devices, such as home speakers, can be a shared device among multiple users. For example, smart home speakers are usually shared between multiple family members. In such cases, conventional voice filter model techniques may be impractical to use.
[0084] In some implementations, a system, such as a shared smart home speaker, may have an unknown number of users. In some of those implementations, the system may have multiple speaker embeddings, each corresponding to a distinct user of the shared device. For example, assume we have three users of a shared device and three corresponding speaker embeddings: d_1, d_2, and d_3.
[0085] In some implementations, the speaker embeddings can be concatenated from multiple enrolled users. The concatenated speaker embeddings can be processed using the voice filter model to generate the predicted mask. In some versions of those implementations, the system needs to know the maximal number of enrolled users in advance. For example, the system can have three speaker embeddings d_1, d_2, and d_3 corresponding to three enrolled users.
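
The concatenation approach in paragraph [0085] can be illustrated with a short sketch. It assumes that unused user slots are zero-padded so the side input has a fixed length, which is one plausible reason the maximal number of enrolled users must be known in advance; the padding choice, the function name, and the toy two-dimensional embeddings are all hypothetical and are not taken from the application.

```python
from typing import List


def concat_embeddings(embeddings: List[List[float]], max_users: int, dim: int) -> List[float]:
    """Concatenate per-user speaker embeddings into one fixed-length side input."""
    if len(embeddings) > max_users:
        raise ValueError("more enrolled users than the fixed maximum")
    for emb in embeddings:
        if len(emb) != dim:
            raise ValueError("every embedding must have the same dimension")
    # Flatten d_1, d_2, ... in enrollment order.
    flat = [value for emb in embeddings for value in emb]
    # Assumption: empty user slots are filled with zeros so the length is constant.
    flat.extend([0.0] * (dim * (max_users - len(embeddings))))
    return flat


# Three enrolled users with toy 2-dimensional embeddings, and room for four users.
d_1, d_2, d_3 = [0.1, 0.2], [0.3, 0.4], [0.5, 0.6]
side_input = concat_embeddings([d_1, d_2, d_3], max_users=4, dim=2)
print(len(side_input))  # 8
```

In this sketch the resulting vector would be passed to the voice filter model in place of a single d-vector.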
	
The closest prior art is as follows:

	Zhang et al. (2021/0390970)
[0028] The speaker embedding network 206 may process the enrollment audio of a target speaker, such as the audio data 212, and may generate an utterance-level speaker embedding. Speaker embedding may be a bias signal that may inform the multi-modal separation network 208 to perform and enhance target speaker separation. A pre-trained speaker model may be introduced and utilized for producing speaker embeddings to characterize the target speaker. The speaker model may be pre-trained on the task of speaker verification, using one or more convolution layers followed by a fully connected layer. The input to the speaker model may [be] an enrollment utterance of the target speaker, such as saying one's name when entering a teleconference. The speaker embedding network 206 may output the utterance-level speaker embedding.

Zhao et al. (2021/0082438)
[0032] Speaker identifier 155 computes a similarity between speaker embedding z associated with the unknown speaker and each of enrollment embeddings 160, which comprise utterance-level speaker embeddings associated with each of several enrollment speakers. Each of enrollment embeddings 160 was previously-generated based on speech frames of a respective enrollment speaker using the same components 105, 115, 125, 140 and 150 which were used to generate speaker embedding z. Speaker identifier 155 may identify the unknown speaker as the enrollment speaker whose associated embedding is most similar to speaker embedding z associated with the unknown speaker. If none of enrollment embeddings 160 is sufficiently similar to speaker embedding z, speaker identifier 155 may output an indication that the unknown speaker cannot be identified from (i.e., is not one of) the enrollment speakers. Speaker identifier 155 may be implemented by a computing device executing an algorithm for computing similarities, by a trained neural network, etc.
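
The identification step described in Zhao can be sketched as a nearest-embedding search with a rejection threshold. This is an illustration only: the reference says "a similarity" without fixing the measure, so cosine similarity, the 0.7 threshold, and the names below are assumptions.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(z: List[float], enrolled: Dict[str, List[float]],
             threshold: float = 0.7) -> Optional[str]:
    """Return the most similar enrolled speaker, or None if none is similar enough."""
    best_name, best_score = None, threshold
    for name, embedding in enrolled.items():
        score = cosine(z, embedding)
        # Keep the best candidate seen so far that clears the threshold.
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy enrollment embeddings; real ones would come from the embedding network.
enrolled = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
print(identify([0.9, 0.1], enrolled))    # alice
print(identify([-1.0, -1.0], enrolled))  # None
```

Returning `None` corresponds to the case where the unknown speaker is not one of the enrollment speakers.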

Chen et al. (2019/0318757)
Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input [to] a neural network and masks can be obtained from the neural network. The masks can be applied [to] one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
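
The mask-application step in the Chen abstract can be sketched as an element-wise product. The sketch assumes each mask has the same time-by-frequency shape as the mixed signal's magnitude spectrogram and holds values between 0 and 1; the representation, the function name, and the toy values are hypothetical, and the neural network that would produce the masks is not modeled.

```python
from typing import List

Spectrogram = List[List[float]]  # indexed as [time frame][frequency bin]


def apply_masks(mixed: Spectrogram, masks: List[Spectrogram]) -> List[Spectrogram]:
    """Multiply the mixed spectrogram by each mask to get one signal per speaker."""
    return [
        [[m * x for m, x in zip(mask_row, mixed_row)]
         for mask_row, mixed_row in zip(mask, mixed)]
        for mask in masks
    ]


# Toy 2x2 mixture and two masks that sum to 1 in every cell.
mixed = [[1.0, 2.0], [3.0, 4.0]]
masks = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.5, 0.5]],
]
separated = apply_masks(mixed, masks)
print(separated[0])  # [[1.0, 0.0], [1.5, 2.0]]
print(separated[1])  # [[0.0, 2.0], [1.5, 2.0]]
```

Because the toy masks sum to 1 in every cell, the two separated outputs add back up to the original mixture.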

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAUN A ROBERTS whose telephone number is (571) 270-7541. The examiner can normally be reached Monday-Friday 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAUN ROBERTS/
Primary Examiner, Art Unit 2655

Prosecution Timeline

Jul 05, 2024

Application Filed

Feb 19, 2026

Non-Final Rejection — §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/274,775

Patent 12609133

SCENE ESTIMATE METHOD, SCENE ESTIMATE APPARATUS, AND PROGRAM

2y 8m to grant Granted Apr 21, 2026

18/312,688

Patent 12586599

AUDIO SIGNAL PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM WITH MACHINE LEARNING AND FOR MICROPHONE MUTE STATE FEATURES IN A MULTI PERSON VOICE CALL

2y 10m to grant Granted Mar 24, 2026

18/484,282

Patent 12586568

SYNTHETICALLY GENERATING INNER SPEECH TRAINING DATA

2y 5m to grant Granted Mar 24, 2026

18/179,756

Patent 12573376

Dynamic Language and Command Recognition

3y 0m to grant Granted Mar 10, 2026

18/629,200

Patent 12562157

GENERATING TOPIC-SPECIFIC LANGUAGE MODELS

1y 10m to grant Granted Feb 24, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

76%

Grant Probability

86%

With Interview (+10.2%)

2y 11m (~1y 1m remaining)

Median Time to Grant

Low

PTA Risk

Based on 650 resolved cases by this examiner. Grant probability derived from career allowance rate.
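
The projection figures above can be reproduced from the counts reported for this examiner. The sketch below assumes the interview lift is added in percentage points to the career allowance rate, which is consistent with the 76% and 86% values shown; the function names are hypothetical and the tool's actual calculation is not documented here.

```python
def allowance_rate_pct(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage of resolved cases."""
    return 100 * granted / resolved


def with_interview_pct(base_pct: float, lift_pct_points: float) -> float:
    """Assumption: the interview lift is additive, in percentage points."""
    return base_pct + lift_pct_points


base = allowance_rate_pct(494, 650)       # 494 granted out of 650 resolved
boosted = with_interview_pct(base, 10.2)  # reported interview lift
print(round(base), round(boosted))        # 76 86
```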