Last updated: May 29, 2026
Application No. 18/292,297
IMPROVING DETECTION OF VOICE-BASED KEYWORDS USING FALSELY REJECTED DATA

Non-Final OA §102 §103
Filed: Jan 25, 2024
Priority: Sep 26, 2021 (nonprovisional of PCTCN2021120549)
Examiner: WOZNIAK, JAMES S
Art Unit: 2655
Tech Center: 2600 (Communications)
Assignee: Qualcomm Incorporated
OA Round: 2 (Non-Final)
This examiner grants 60% of cases after interview (+39.4% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 391 resolved cases, 2023–2026

Examiner Intelligence

WOZNIAK, JAMES S
Career allowance rate: grants 60% of resolved cases (233 granted / 391 resolved; -2.4% vs TC avg)
Interview lift: +39.4% for resolved cases with interview (strong)
Typical timeline: 3y 8m average prosecution; 21 currently pending
Career history: 429 total applications across all art units

Statute-Specific Performance
§101: 7.2% (-32.8% vs TC avg)
§103: 82.5% (+42.5% vs TC avg)
§102: 5.8% (-34.2% vs TC avg)
§112: 4.2% (-35.8% vs TC avg)
Based on career data from 391 resolved cases; Tech Center (TC) averages are estimates.
Office Action

§102 §103
DETAILED ACTION

Notice of Pre-AIA  or AIA  Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

In response to the Non-Final Office Action dated 9/3/2025, Applicant filed an amendment on 11/24/2025. In this reply, Applicant elected to make only minor amendments to independent claims 1, 12, 22, and 27. These amendments (i) repeat language from the prior detecting step, which was already incorporated as "the detecting" in the "responsive to the detecting...obtaining" step of the 1/25/2024 claim set, and (ii) add that the obtaining is from an audio buffer, a feature already considered in dependent claim 11. The amendments therefore add limitations that do not alter claim scope, together with an audio buffer that was already mapped to the prior art. Applicant is entitled to its own prosecution strategy, though it is recommended that more substantive amendments be considered in the future in order to meaningfully advance prosecution.
Applicant also argues that the prior art of record fails to teach the limitation "responsive to the detecting, using the user audio detection model, the presence of the true keyword sample in the audio from the user, obtaining, from an audio buffer, a plurality of user audio data preceding the true keyword sample, the plurality of user audio data comprising one or more falsely rejected true keyword samples, the one or more falsely rejected true keyword samples being insufficient to change the operational state of the user equipment" (Remarks, Pages 11-15).
These arguments have been fully considered but are not persuasive for the reasons set forth in the Response to Arguments section below. The grounds of rejection have been updated to reflect the claim language changed by the instant amendment.

Applicant argues that claims 5-6, 10, 20, and 29-30 have been amended to address the noted informalities and requests withdrawal of the objections directed towards minor informalities (Remarks, Page 10).
In response to the correction of the typographical and grammatical issues with these claims, the claim objections directed towards minor informalities have been withdrawn.

Applicant argues that the 35 U.S.C. 101 rejection of claims 22-26 should be withdrawn since these claims have been amended to recite a non-transitory computer-readable storage medium (Remarks, Page 10).
In response, since the storage medium is now directly associated with the modifier "non-transitory" that has an ordinary meaning of excluding signals per se under the broadest reasonable interpretation (BRI), claims 22-26 are no longer directed towards a non-statutory signal per se in step 1 under the BRI.  Accordingly, the 35 U.S.C. 101 rejection of these claims has been withdrawn.

Response to Arguments

With respect to independent Claim 1, Applicant centers their traversal on i) the limitation regarding "responsive to the detecting, using the user audio detection model, the presence of the true keyword sample in the audio from the user, obtaining, from an audio buffer, a plurality of user audio data preceding the true keyword sample, the plurality of user audio data comprising one or more falsely rejected true keyword samples, the one or more falsely rejected true keyword samples being insufficient to change the operational state of the user equipment" and ii) whether this limitation is provided by the teachings of Wu, et al. (U.S. Patent:  10,892,599).  
Specifically, Applicant contends that in Wu the first audio data "was already captured, generated, scored, and stored" before the second audio was even captured, and thus asserts that Wu does not teach obtaining, in response to detecting a true keyword sample, user audio data that precedes the detected true keyword sample (Remarks, Page 14).
In response, Applicant is first directed towards the determination that "the second audio data...corresponds to a positive detection of a wakeword" (Col. 7, Line 60- Col. 8, Line 34). This determination regarding the second audio data maps to the claimed detection of "presence of a true keyword sample in the audio from the user." In making such a determination, or "responsive to" making such a determination, Wu describes in the same citation looking back at "first audio data" that was determined to correspond to an improper/false "negative detection of the keyword" based upon time-based proximity. At that point, the first audio data is obtained from an audio buffer (i.e., the rolling audio buffer described in Col. 3, Lines 30-38) because the "first audio data" is accessed from where it is stored and relied upon to generate "an updated model based on the first audio data" (Col. 3, Lines 1-11; Col. 4, Lines 3-7; Col. 8, Lines 31-34; Col. 9, Lines 44-62; Fig. 5, Element 520). Applicant's argument that the earlier capture-to-storage sequence somehow prevents the first audio data from being retrieved is not persuasive in view of Wu's process of looking back at false negatives based upon time proximity and then actively using the "first audio data" to update the model. Accordingly, this argument is not found to be persuasive.
Applicant next argues that, while Wu discloses a rolling buffer, Wu does not teach that, responsive to detecting a true keyword sample, the system obtains user audio data preceding the detected true keyword sample from an audio buffer, because Wu's first audio data was already captured and evaluated before the second audio was captured. Applicant also argues that Wu's first audio data need not be obtained from the buffer in response to the positive wakeword detection, because the first audio data was already available from the sequential capture and evaluation, and that Wu's buffer serves only as a "general storage mechanism for incoming audio, not as a structure specifically designed to enable retrieval of user audio data" (Remarks, Pages 14-15).
In response, Applicant is directed towards the preceding reply and citations.  The Col. 3 citation explicitly states that first audio data is stored in the rolling buffer after capture.  That “first audio data” stored in the rolling buffer after capture is then relied upon to update the model after positive detection of a wakeword in the second audio data.  Accordingly, this argument is not found to be persuasive.
The art rejections of the remaining independent and dependent claims have been traversed for reasons similar to independent claim 1 (Remarks, Page 15).  In response to such arguments, Applicant is directed towards the preceding claim 1 rebuttal.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.  The limitations being interpreted under 35 U.S.C. 112(f) are as follows and accompanied by an explanation of mapping to corresponding structure in the instant specification:
Limitation interpreted under 35 U.S.C. 112(f)
Explanation of Mapping
means for implementing a user audio detection model
Computer-implemented 35 U.S.C. 112(f) is invoked.  See WMS Gaming, Inc. v. Int’l Game Tech., 184 F.3d 1339, 1349, 51 USPQ2d 1385, 1391 (Fed. Cir. 1999) and MPEP 2181(II)(B) where the corresponding structure "is not simply a general-purpose computer by itself but the special purpose computer as programmed to perform the disclosed algorithm."  In this case, the corresponding structure is a computer processor of user equipment (Paragraph 0101) configured to implement an existing audio detection model based on one or more true keyword samples (Paragraph 0073).
means for detecting audio from a user
A microphone that detects voice-based keywords (Paragraphs 0076-0077). 
means for detecting, using the user audio detection model, presence of a true keyword sample in the audio from the user
Computer-implemented 35 U.S.C. 112(f) is invoked.  In this case, the corresponding structure is a computer processor of user equipment (Paragraph 0101) configured to determine the presence of a true keyword via evaluation against a voice profile of the user and/or based on a similarity of the detected audio to the user keyword model (Paragraph 0078).
means for obtaining, responsive to the detecting of the audio from the user, a plurality of user audio data preceding the true keyword sample…
Computer-implemented 35 U.S.C. 112(f) is invoked.  In this case, the corresponding structure is a computer processor of user equipment (Paragraph 0101) configured to separate and label falsely rejected keyword samples that are obtained from an audio buffer preceding a true keyword sample (Paragraph 0081).
means for transmitting at least a portion of the one or more falsely rejected true keyword samples to a networked entity, or accessing the at least portion of the one or more falsely rejected true keyword samples locally at the user equipment, the at least portion of the one or more falsely rejected true keyword samples configured to be used in generation of an updated user audio detection model
Wired, wireline, or wireless network interfaces for transmission (Paragraph 0060) and/or computer-implemented sample splitting (Paragraph 0084).
means for receiving the updated user audio detection model from the networked entity, or locally generating the updated user audio detection model using the at least portion of the one or more falsely rejected true keyword samples
Wired, wireline, or wireless network interfaces for reception (Paragraph 0060).
means for separating the one or more falsely rejected true keyword samples from the one or more false keyword samples prior to the transmitting of the at least portion of the one or more falsely rejected true keyword samples to the networked entity
Computer-implemented 35 U.S.C. 112(f) is invoked.  In this case, the corresponding structure is a computer processor of user equipment (Paragraph 0101) configured to separate falsely rejected keyword samples from false keyword samples by using a keyword model to compare audio characteristics (Paragraph 0083).
means for determining a presence of the one or more falsely rejected true keyword samples in the plurality of user audio data based on a first similarity threshold associated with the user audio detection model being met or exceeded but not meeting or exceeding a second similarity threshold
Computer-implemented 35 U.S.C. 112(f) is invoked.  In this case, the corresponding structure is a computer processor of user equipment (Paragraph 0101) configured to compare a similarity score, determined with respect to a user keyword model, to first and second similarity thresholds, where the presence of a falsely rejected true keyword sample may be determined based on a first similarity threshold associated with the user keyword model being met or exceeded but a second similarity threshold not being met or exceeded (Paragraph 0040).
means for maintaining an audio buffer
Computer-implemented 35 U.S.C. 112(f) is invoked.  In this case, the corresponding structure is a computer processor of user equipment (Paragraph 0101) configured to record audio to a buffer having a prescribed length of time (Paragraph 0032).


Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-4, 7-9, 11-15, 18-19, 21-24, and 26 are rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Wu, et al. (U.S. Patent:  10,872,599).
With respect to Claim 1, Wu discloses:
A method of updating a user audio detection model on a user equipment, the method comprising: 
implementing the user audio detection model, the user audio detection model configured to change an operational state of the user equipment based on true keyword samples detected (implementing a speech key/wake word detection model that changes the state of a controlled device when a positive/true audio input is detected (e.g., causes the device to wake up/activate), Col. 2, Lines 55-60; Col. 4, Lines 23-61; and Col. 10, Lines 30-57); 
detecting audio from a user (detecting and capturing user audio, Col. 2, Lines 55-65; Col. 3, Lines 30-38; Col. 7, Line 60- Col. 8, Line 34); 
detecting, using the user audio detection model, presence of a true keyword sample in the audio from the user (“positive detection of a wakeword,” Col. 7, Line 60- Col. 8, Line 34; Fig. 5, Element 516); 
responsive to the detecting, using the user audio detection model, the presence of the true keyword sample in the audio from the user, obtaining, from an audio buffer, a plurality of user audio data preceding the true keyword sample, the plurality of user audio data comprising one or more falsely rejected true keyword samples, the one or more falsely rejected true keyword samples being insufficient to change the operational state of the user equipment (obtaining audio data samples from an earlier/first user audio input that pertains to a false negative/detection of the keyword wherein, in a negative detection, the system sees the wakeword as "not present in the first audio" and so does not act on the wakeword to activate a device, Col. 2, Lines 55-65; Col. 3, Line 39- Col. 4, Line 8; Col. 7, Line 60- Col. 8, Line 34; and Fig. 5, Elements 502, 504, and 518; rolling audio buffer storing the first audio data, Col. 3, Lines 30-38); 
transmitting at least a portion of the one or more falsely rejected true keyword samples to a networked entity, or accessing the at least portion of the one or more falsely rejected true keyword samples locally at the user equipment, the at least portion of the one or more falsely rejected true keyword samples configured to be used in generation of an updated user audio detection model (sending first audio data to a server that corresponds to negatively rejected true wakeword samples to generate a trained updated model, Col. 2, Lines 55-65; Col. 4, Lines 8-17; Col. 5, Lines 16-35; Col. 7, Line 27- Col. 8, Line 34; Col. 9, Lines 44-62; Fig. 5, Elements 504, 512, and 520); and receiving the updated user audio detection model from the networked entity, or locally generating the updated user audio detection model using the at least portion of the one or more falsely rejected true keyword samples (receiving the trained updated model at a user device from the server, Col. 2, Lines 55-65; Col. 4, Lines 8-17; Col. 5, Lines 16-35; Col. 7, Line 27- Col. 8, Line 34; Fig. 1, Element 138; and Fig. 5, Element 520).
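The order of operations at issue in the obtaining step (and in the Response to Arguments) may be easier to follow as code. The following is a minimal, hypothetical Python sketch, not code from Wu, Bansal, or the instant specification: each captured segment is scored and kept in a rolling buffer, and only when a segment meets a detection threshold are the buffered segments preceding it copied out as candidate falsely rejected samples. The `Segment` and `LookBackDetector` names, the per-segment score, and the threshold value are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Segment:
    """One captured audio segment and its keyword-model score (hypothetical)."""
    timestamp: float
    score: float


class LookBackDetector:
    """Hypothetical sketch: buffer scored segments and look back on a detection."""

    def __init__(self, buffer_segments: int, detection_threshold: float) -> None:
        # Rolling buffer: once full, each append silently drops the oldest segment.
        self.buffer: Deque[Segment] = deque(maxlen=buffer_segments)
        self.detection_threshold = detection_threshold

    def process(self, segment: Segment) -> Optional[List[Segment]]:
        """Return the buffered segments preceding `segment` only if it is detected."""
        if segment.score >= self.detection_threshold:
            # Responsive to the detection: copy out the audio data that precedes
            # the detected sample before the detected sample itself is buffered.
            preceding = list(self.buffer)
            self.buffer.append(segment)
            return preceding
        # Below the threshold: the segment does not change the device state,
        # but it is retained in the buffer in case a later detection looks back.
        self.buffer.append(segment)
        return None


if __name__ == "__main__":
    detector = LookBackDetector(buffer_segments=2, detection_threshold=0.8)
    for t, s in [(0.0, 0.1), (1.0, 0.6), (2.0, 0.7), (3.0, 0.9)]:
        result = detector.process(Segment(t, s))
        if result is not None:
            print([seg.timestamp for seg in result])  # prints [1.0, 2.0]
```

Transmission to a networked entity and model updating are deliberately left out; the sketch only illustrates the detect-then-look-back ordering.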
With respect to Claim 2, Wu further discloses:
The method of claim 1, wherein: the plurality of user audio data further comprises one or more false keyword samples (false audio error data samples where a keyword is not present in the input audio such as for when a user intentionally tries to confuse a device, Col. 2, Lines 41-45; Col. 9, Line 63- Col. 10, Line 7); and the method further comprises separating the one or more falsely rejected true keyword samples from the one or more false keyword samples prior to the transmitting of the at least portion of the one or more falsely rejected true keyword samples to the networked entity (the false samples are not sent to the server for model updating as opposed to the identified false negatives that are separated and sent to the server for model updating, Col. 2, Lines 41-45; Col. 4, Lines 8-17; Col. 5, Lines 16-35; Col. 7, Line 27- Col. 8, Line 34; Col. 9, Line 63- Col. 10, Line 7).
With respect to Claim 3, Wu further discloses:
The method of claim 2, further comprising discarding the one or more false keyword samples (the false keyword samples are discarded for model update at a server, Col. 2, Lines 41-45; Col. 9, Line 63- Col. 10, Line 7).
With respect to Claim 4, Wu further discloses:
The method of claim 1, further comprising transmitting the user audio detection model to the networked entity prior to the transmitting of the at least portion of the one or more falsely rejected true keyword samples to the networked entity, the networked entity comprising a server apparatus (a wakeword model may be trained in earlier iterations in noting that an initially trained model is updated “more” than one time wherein the user detection model is sent from a user device to a server, Col. 3, Lines 1-20 and Col. 3, Line 64- Col. 4, Line 17).
With respect to Claim 7, Wu further discloses:
The method of claim 1, further comprising replacing the user audio detection model with the received updated user audio detection model (the user trained audio wakeword detection model is replaced with the updated version, Col. 3, Lines 1-20; Col. 9, Lines 22-34; and Fig. 1, Element 138).
With respect to Claim 8, Wu further discloses:
The method of claim 1, further comprising (note that this claim regards a second iteration of the detecting, transmitting, and receiving steps of claim 1 and that Wu further describes second and beyond iterations in that “a trained model [is] updated one or more times to account for how a particular user or users speaks the wakeword,” Col. 3, Lines 1-20): 
detecting one or more second falsely rejected true samples subsequent to the receiving of the updated user audio detection model (obtaining audio data samples from an earlier/first user audio input that pertains to a false negative/detection of the keyword wherein, in a negative detection, the system sees the wakeword as "not present in the first audio" and so does not act on the wakeword to activate a device, Col. 2, Lines 55-65; Col. 3, Line 39- Col. 4, Line 8; Col. 7, Line 60- Col. 8, Line 34; and Fig. 5, Elements 502, 504, and 518); 
transmitting at least a portion of the one or more second falsely rejected true samples to the networked entity, the at least portion of the one or more second falsely rejected true samples configured to be used in generation of a second updated user audio detection model (sending first audio data to a server that corresponds to negatively rejected true wakeword samples to generate a trained updated model, Col. 2, Lines 55-65; Col. 4, Lines 8-17; Col. 5, Lines 16-35; Col. 7, Line 27- Col. 8, Line 34; Col. 9, Lines 44-62; Fig. 5, Elements 504, 512, and 520); 
receiving the second updated user audio detection model from the networked entity (receiving the trained updated model at a user device from the server, Col. 2, Lines 55-65; Col. 4, Lines 8-17; Col. 5, Lines 16-35; Col. 7, Line 27- Col. 8, Line 34; Fig. 1, Element 138; and Fig. 5, Element 520); and replacing the updated user audio detection model with the second updated user audio detection model (the user trained audio wakeword detection model is replaced with the updated version, Col. 3, Lines 1-20; Col. 9, Lines 22-34; and Fig. 1, Element 138;  “a trained model [is] updated one or more times to account for how a particular user or users speaks the wakeword,” Col. 3, Lines 1-20).
With respect to Claim 9, Wu teaches that an audio data sample that is greater than a threshold (i.e., a true sample) may be sent to the server (Col. 2, Lines 37-41) and that training/updating may take place over several iterations using audio data received from a user device (Col. 3, Lines 1-20).  Note that the additional method step recited in claim 9 involves that the true audio data samples are "configured to be used in the generation of the updated user audio detection model" and does not include a step for actually using the true audio data samples for model updating.  Accordingly, since the audio data in Wu is transmitted to the server and the server uses such extracted audio data to train the model, the true audio data samples greater than the threshold are "configured" for the intended use of training.
With respect to Claim 11, Wu further discloses:
The method of claim 1, further comprising temporarily storing the audio from the user in the audio buffer for a prescribed length, the audio buffer comprising the true keyword sample and the one or more falsely rejected true keyword samples (rolling buffer (i.e., a buffer that continually receives input data and overwrites the oldest data with the newest data when filled) that stores incoming audio data, Col. 3, Lines 30-38, wherein audio that is contained in a buffer includes true keyword samples exceeding a threshold and negative rejections of true keywords, Col. 2, Lines 21-54 and Col. 7, Line 60- Col. 8, Line 34).
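The rolling buffer characterized above (one that continually receives input data and overwrites the oldest data with the newest data when filled), sized to a prescribed length as recited in claim 11, can be illustrated with a minimal ring buffer. This is a hypothetical Python sketch, not Wu's or Applicant's implementation; the `RollingAudioBuffer` name, the float-sample representation, and sizing the capacity as prescribed seconds times sample rate are assumptions made for illustration.

```python
from typing import List


class RollingAudioBuffer:
    """Fixed-capacity ring buffer that overwrites the oldest samples when full.

    Hypothetical sketch: capacity = prescribed length (seconds) x sample rate.
    """

    def __init__(self, prescribed_seconds: float, sample_rate: int) -> None:
        self.capacity = int(prescribed_seconds * sample_rate)
        if self.capacity <= 0:
            raise ValueError("buffer must hold at least one sample")
        self._data: List[float] = [0.0] * self.capacity
        self._write = 0  # index where the next sample will be written
        self._count = 0  # number of valid samples currently stored

    def write(self, samples: List[float]) -> None:
        """Append samples; once full, each new sample replaces the oldest one."""
        for sample in samples:
            self._data[self._write] = sample
            self._write = (self._write + 1) % self.capacity
            self._count = min(self._count + 1, self.capacity)

    def snapshot(self) -> List[float]:
        """Return the stored samples ordered from oldest to newest."""
        start = (self._write - self._count) % self.capacity
        return [self._data[(start + i) % self.capacity] for i in range(self._count)]


if __name__ == "__main__":
    buf = RollingAudioBuffer(prescribed_seconds=0.5, sample_rate=8)  # 4 samples
    buf.write([1.0, 2.0, 3.0])
    buf.write([4.0, 5.0, 6.0])
    print(buf.snapshot())  # prints [3.0, 4.0, 5.0, 6.0]
```

Because the oldest samples are overwritten only when the buffer is full, whatever audio preceded the newest input within the prescribed length remains available to be read back.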
Claim 12 is directed towards a user equipment system embodiment comprising a memory and a processor coupled to the memory for carrying out the method of claim 1, and thus, is rejected under similar rationale.  Moreover, Wu teaches method implementation as user equipment comprising a memory and processor coupled to the memory (Col. 3, Lines 21-38; Col. 12, Lines 36-64; Fig. 10 showing user equipment having a processor coupled to a memory).
Claims 13-15, 18-19, and 21 contain subject matter respectively similar to Claims 2-4, 7, 9, and 11 and thus, are rejected under similar rationale.
Claim 22 is directed towards an embodiment of the claimed invention in the form of a non-transitory computer-readable apparatus comprising a storage medium storing a plurality of instructions for carrying out the method of claim 1, and thus is rejected under similar rationale.  Moreover, Wu teaches method implementation as a non-transitory computer-readable medium storing program instructions (Col. 12, Lines 55-64 and Col. 14, Lines 25-39).
Claims 23-24 and 26 contain subject matter respectively similar to Claims 11, 2, and 7, and thus are rejected under similar rationale.
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5, 10, 16, 20, 25, and 27-30 are rejected under 35 U.S.C. 103 as being unpatentable over Wu, et al. in view of Bansal, et al. (U.S. PG Publication:  2016/0077574 A1).
With respect to Claim 5, Wu teaches the method for training a wakeword detection model based upon negative/false rejections of audio containing a wakeword.  Although Wu detects a falsely rejected audio data sample using a threshold comparison (Col. 7, Line 60- Col. 8, Line 34), Wu does not teach determining a presence of the one or more falsely rejected true keyword samples in the plurality of user audio data based on a first similarity threshold associated with the user audio detection model being met or exceeded but not meeting or exceeding a second similarity threshold.  Bansal, however, discloses the use of "additional thresholds" where a wakeup/wakeword candidate audio must score greater than a first threshold TH1 but may not exceed a second threshold (e.g., TH2) (Paragraphs 0041, 0047, 0051-0054, and 0062; Fig. 2a; and Fig. 3d, Elements 312, 314, and 328). Note that such audio inputs do not trigger a wakeup (i.e., are falsely rejected), but are considered in an adaptation operation.
Wu and Bansal are analogous art because they are from a similar field of endeavor in wakeword model training utilizing false accepts/negatives.  Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to utilize the multiple wakeword thresholds taught by Bansal in the detection of false negatives and positive wakewords in the process of Wu to provide a predictable result of more effectively adapting a wakeword model by excluding candidates with very low (i.e., less than TH1) similarity scores and focusing on those having reasonable similarity that can have improved detection with adaptation.
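The two-threshold condition of claim 5 (and the TH1/TH2 band attributed to Bansal above) amounts to a three-way partition of similarity scores. A minimal, hypothetical Python sketch follows. It assumes, for illustration only, that scores are comparable similarity values, that meeting the second threshold is what would change the operational state, and that audio below the first threshold is treated as a false keyword sample to be separated out (compare claims 2-3). The `Label`, `classify`, and `separate` names and the numeric thresholds are not taken from Wu, Bansal, or the specification.

```python
from enum import Enum
from typing import Dict, List


class Label(Enum):
    FALSE_KEYWORD = "false keyword sample"              # first threshold not met
    FALSELY_REJECTED = "falsely rejected true keyword"  # first met, second not met
    ACCEPTED = "accepted keyword"                       # second threshold met


def classify(score: float, first_threshold: float, second_threshold: float) -> Label:
    """Label one similarity score using a two-threshold band (hypothetical)."""
    if first_threshold >= second_threshold:
        raise ValueError("first threshold must be lower than the second")
    if score >= second_threshold:
        return Label.ACCEPTED
    if score >= first_threshold:
        return Label.FALSELY_REJECTED
    return Label.FALSE_KEYWORD


def separate(scores: List[float], first_threshold: float,
             second_threshold: float) -> Dict[Label, List[float]]:
    """Group scores by label, e.g. to keep only near misses for adaptation."""
    groups: Dict[Label, List[float]] = {label: [] for label in Label}
    for s in scores:
        groups[classify(s, first_threshold, second_threshold)].append(s)
    return groups


if __name__ == "__main__":
    groups = separate([0.2, 0.55, 0.7, 0.95], first_threshold=0.5, second_threshold=0.8)
    for label, scores in groups.items():
        print(label.value, scores)
```

In this sketch only the middle band would be kept as candidate falsely rejected true keyword samples; scores below the first threshold are separated out before any transmission.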
With respect to Claim 10, Wu fails to teach, however, Bansal discloses:
The method of claim 1, further comprising training the user audio detection model based on one or more true keyword samples, wherein the one or more true keyword samples and the plurality of user audio data comprise audio data associated with voice of the user (adaptation on a very strong wakeup keyword audio exceeding TH4 (i.e., a true keyword sample), Paragraph 0059, Figs. 3a and 3b, Elements 302 and 306; adaptation to a voice of a user discussed at Paragraphs 0013 and 0035; such teachings in Bansal provide for higher quality, reliable updating/adaptation audio data that can improve speech recognition for a particular user, Paragraph 0035).
Claim 16 contains subject matter similar to claim 5, and thus, is rejected under similar rationale.
Claim 20 contains subject matter similar to claim 10, and thus, is rejected under similar rationale.
Claim 25 contains subject matter similar to claim 5, and thus, is rejected under similar rationale.
With respect to Claim 27, Wu discloses:
User equipment comprising: 
means (structure:  processor, Fig. 10, Element 1004) for implementing a user audio detection model, the user audio detection model configured to change an operational state of the user equipment based on true keyword samples detected (algorithm:  implementing an existing “trained” speech key/wake word detection model that changes the state of a controlled device when a positive/true audio input is detected (e.g., causes the device to wake up/activate), Col. 2, Lines 55-60; Col. 3, Lines 1-20; Col. 4, Lines 23-61; Col. 9, Lines 22-34; and Col. 10, Lines 30-57); 
means for detecting audio from a user (detecting and capturing user audio, Col. 2, Lines 55-65; Col. 3, Lines 30-38; Col. 7, Line 60- Col. 8, Line 34; structure:  microphone, Fig. 10, Element 1020); 
means (structure:  processor, Fig. 10, Element 1004) for detecting, using the user audio detection model, presence of a true keyword sample in the audio from the user (algorithm:  positive detection of a wakeword,” wherein a wakeword is a type of keyword and the detection is based upon a similarity score of correspondence to an existing trained model such as a probability score, Col. 2, Lines 21-65; Col. 7, Line 60- Col. 8, Line 34; Col. 9, Lines 22-34; and Fig. 5, Element 516); 
means for obtaining (structure:  processor, Fig. 10, Element 1004), responsive to the detecting, using the user audio detection model, the presence of the true keyword sample in the audio from the user, from an audio buffer, the plurality of user audio data comprising one or more falsely rejected true keyword samples, the one or more falsely rejected true keyword samples being insufficient to change the operational state of the user equipment (algorithm:  obtaining detected/score-labeled audio data samples from an earlier/first user audio input that pertains to a false negative/detection of the keyword, wherein, in a negative detection, the system sees the wakeword as "not present in the first audio" and so does not act on the wakeword to activate a device, Col. 2, Lines 55-65; Col. 3, Line 39- Col. 4, Line 8; Col. 7, Line 60- Col. 8, Line 34; and Fig. 5, Elements 502, 504, and 518; rolling buffer (i.e., a buffer that continually receives input data and overwrites the oldest data with the newest data when filled) that stores incoming audio data, Col. 3, Lines 30-38, wherein audio that is contained in a buffer includes true keyword samples exceeding a threshold and negative rejections of true keywords, Col. 2, Lines 21-54 and Col. 7, Line 60- Col. 8, Line 34); 
means for transmitting at least a portion of the one or more falsely rejected true keyword samples to a networked entity, or accessing the at least portion of the one or more falsely rejected true keyword samples locally at the user equipment, the at least portion of the one or more falsely rejected true keyword samples configured to be used in generation of an updated user audio detection model (sending, to a server, first audio data that corresponds to negatively rejected true wakeword samples in order to generate a trained updated model, Col. 2, Lines 55-65; Col. 4, Lines 8-17; Col. 5, Lines 16-35; Col. 7, Line 27- Col. 8, Line 34; Col. 9, Lines 44-62; Fig. 5, Elements 504, 512, and 520; structure:  Fig. 10, Element 1014 showing a connection to a network 199; see also Fig. 1; processor, Fig. 10, Element 1004; and the false samples are not sent to the server for model updating as opposed to the identified false negatives that are separated and sent to the server for model updating, Col. 2, Lines 41-45; Col. 4, Lines 8-17; Col. 5, Lines 16-35; Col. 7, Line 27- Col. 8, Line 34; Col. 9, Line 63- Col. 10, Line 7); and 
means for receiving the updated user audio detection model from the networked entity, or locally generating the updated user audio detection model using the at least portion of the one or more falsely rejected true keyword samples (receiving the trained updated model at a user device from the server, Col. 2, Lines 55-65; Col. 4, Lines 8-17; Col. 5, Lines 16-35; Col. 7, Line 27- Col. 8, Line 34; Fig. 1, Element 138; and Fig. 5, Element 520; structure: Fig. 10, Element 1014 showing a connection to a network; see also Fig. 1).
Although Wu does disclose the implementation of an original existing model that is subsequently updated in one or more iterations (Col. 3, Lines 1-20 and Col. 9, Lines 22-34), Wu does not specifically indicate that the samples used to implement the existing model are the “true keyword samples” (see the preceding 35 U.S.C. 112(f) interpretation of the “means for implementing…” limitation).  Bansal, however, discloses adaptation using very strong wakeup keyword audio exceeding TH4 (i.e., a true keyword sample) (Paragraph 0059, Figs. 3a and 3b, Elements 302 and 306; adaptation to a voice of a user discussed at Paragraphs 0013 and 0035).
Wu and Bansal are analogous art for the reasons noted with respect to Claim 5.  Thus, it would have been obvious to utilize the true keyword samples taught by Bansal to train an initial trained model of Wu to predictably provide for higher quality, reliable updating/adaptation audio data that can improve speech recognition for a particular user (Bansal, Paragraph 0035).
With respect to Claim 28, Wu further discloses:
The user equipment of claim 27, wherein: the plurality of user audio data further comprises one or more false keyword samples (false audio error data samples where a keyword is not present in the input audio such as for when a user intentionally tries to confuse a device, Col. 2, Lines 41-45; Col. 9, Line 63- Col. 10, Line 7); and the user equipment further comprises means (structure:  processor, Fig. 10, Element 1004) for separating the one or more falsely rejected true keyword samples from the one or more false keyword samples prior to the transmitting of the at least portion of the one or more falsely rejected true keyword samples to the networked entity (algorithm:  the false samples are not sent to the server for model updating as opposed to the identified false negatives that are separated and sent to the server for model updating wherein the score-based labeling is based on a comparison with a keyword/wakeword model, Col. 2, Lines 41-45; Col. 4, Lines 8-17; Col. 5, Lines 16-35; Col. 7, Line 27- Col. 8, Line 34; Col. 9, Line 63- Col. 10, Line 7).
With respect to Claim 29, Wu fails to teach, but Bansal further discloses:
means (structure:  processor, Paragraphs 0018 and 0072) for determining a presence of the one or more falsely rejected true keyword samples in the plurality of user audio data based on a first similarity threshold associated with the user audio detection model being met or exceeded but not meeting or exceeding a second similarity threshold (algorithm:  the use of "additional thresholds" where a wakeup/wakeword candidate audio must score greater than a first threshold TH1, but may not exceed a second threshold in comparison to a wake up keyword model (e.g., TH2) (Paragraphs 0041, 0047, 0051-0054, and 0062; Fig. 2a; and Fig. 3d, Elements 312, 314, and 328). Note that such audio inputs do not trigger a wakeup (i.e., are falsely rejected), but are considered in an adaptation operation).  The combination of Wu and Bansal is obvious under 35 U.S.C. 103 for the reasons given with respect to claim 5.
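The two-threshold test that this rejection attributes to Bansal can be illustrated with a short Python sketch. It is a hypothetical illustration only: it assumes each candidate utterance has already been reduced to a single similarity score, and it uses invented threshold values (0.5 and 0.8) standing in for TH1 and TH2. The names `Label`, `classify`, and `falsely_rejected` are illustrative and are not drawn from Bansal, Wu, or the application.

```python
from enum import Enum


class Label(Enum):
    ACCEPTED = "accepted"                  # score >= second threshold: device wakes up
    FALSELY_REJECTED = "falsely_rejected"  # first <= score < second: no wakeup, kept for adaptation
    REJECTED = "rejected"                  # score < first threshold: treated as non-keyword audio


# Hypothetical threshold values standing in for TH1 and TH2; not taken from Bansal.
FIRST_TH = 0.5
SECOND_TH = 0.8


def classify(score: float, first_th: float = FIRST_TH, second_th: float = SECOND_TH) -> Label:
    """Label one candidate utterance by its similarity score to the keyword model."""
    if not first_th < second_th:
        raise ValueError("first_th must be lower than second_th")
    if score >= second_th:
        return Label.ACCEPTED
    if score >= first_th:
        return Label.FALSELY_REJECTED
    return Label.REJECTED


def falsely_rejected(scores: list[float]) -> list[int]:
    """Indices of samples that met the first threshold but not the second."""
    return [i for i, s in enumerate(scores) if classify(s) is Label.FALSELY_REJECTED]


print(falsely_rejected([0.2, 0.65, 0.9, 0.79, 0.5]))  # [1, 3, 4]
```

A score exactly equal to the first threshold counts as "met," matching the claim's "met or exceeded" wording, while a score equal to the second threshold is treated as a true detection.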
With respect to Claim 30, Wu further discloses:
The user equipment of claim 27, wherein: the one or more true keyword samples and the plurality of user audio data comprise audio data associated with voice of the user (audio is for a particular user, Col. 2, Lines 21-54 and Col. 3, Lines 1-20); and the user equipment further comprises means (structure:  processor, Fig. 10, Element 1004) for temporarily storing the audio data associated with voice of the user in the audio buffer for a prescribed length, the audio buffer comprising the true keyword sample and the one or more falsely rejected true keyword samples (algorithm:  management of rolling buffer (i.e., a buffer that continually receives input data and overwrites the oldest data with the newest data when filled) that stores incoming audio data, Col. 3, Lines 30-38, wherein audio that is contained in a buffer includes true keyword samples exceeding a threshold and negative rejections of true keywords, Col. 2, Lines 21-54 and Col. 7, Line 60- Col. 8, Line 34).
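The rolling-buffer behavior relied on for claims 27 and 30 can likewise be sketched, again as a hypothetical illustration rather than Wu's implementation. The sketch assumes that the buffer holds scored utterance segments instead of raw audio frames, that its prescribed length is a segment count, and that invented thresholds (0.5 and 0.8) apply; `Segment` and `RollingAudioBuffer` are made-up names. A detection at or above the second threshold triggers retrieval of the earlier buffered segments whose scores fell between the two thresholds.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Segment:
    audio: bytes  # placeholder payload for one candidate utterance
    score: float  # similarity score from the keyword detection model


class RollingAudioBuffer:
    """Holds at most `capacity` segments; when full, each append evicts the oldest."""

    def __init__(self, capacity: int, first_th: float = 0.5, second_th: float = 0.8):
        self._segments = deque(maxlen=capacity)  # deque drops the oldest item automatically
        self.first_th = first_th    # hypothetical TH1
        self.second_th = second_th  # hypothetical TH2

    def push(self, segment: Segment) -> list[Segment]:
        """Store a segment; on a true-keyword detection, return the buffered near misses."""
        self._segments.append(segment)
        if segment.score < self.second_th:
            return []  # no detection yet, so nothing is retrieved
        # Detection: scan the earlier segments still in the buffer for scores that
        # met the first threshold but not the second (falsely rejected candidates).
        earlier = list(self._segments)[:-1]
        return [s for s in earlier if self.first_th <= s.score < self.second_th]


buf = RollingAudioBuffer(capacity=3)
for audio, score in [(b"a", 0.6), (b"b", 0.3), (b"c", 0.7)]:
    buf.push(Segment(audio, score))   # no detection: each call returns []
hits = buf.push(Segment(b"d", 0.9))    # detection; b"a" has already been evicted
print([s.audio for s in hits])         # [b'c']
```

The usage lines show the prescribed length at work: segment `a` (score 0.6) would also have qualified as a near miss, but it was evicted before the detection occurred, so only `c` is returned.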

Claims 6 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Wu, et al. in view of Hoffmeister (U.S. Patent:  9,159,319).
With respect to Claim 6, Wu teaches a method for training a wakeword detection model based upon negative/false rejections of audio containing a wakeword.  Wu does not teach the use of another user audio detection model comprising at least a different detection criterion, as set forth in claim 6.
Hoffmeister, however, discloses:
determining a presence of the one or more falsely rejected true keyword samples in the plurality of user audio data based on another user audio detection model, the another user audio detection model comprising at least a different detection criterion from the user audio detection model (falsely rejected keywords with "low confidence" are used for model updating and in the detection process, audio may be compared to an additional model (e.g., a competitor model) where the detection criterion involves a likelihood mapping to that additional model, Col. 9, Lines 51-60; Col. 10, Lines 11-21; Col. 11, Lines 18-31; Col. 12, Lines 1-19).
Wu and Hoffmeister are analogous art because they are from a similar field of endeavor in keyword model training utilizing false accepts/negatives.  Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to utilize the competitor models taught by Hoffmeister in the wakeword detector taught by Wu to provide a predictable result of improving the detection of keywords by considering similar sounding competitors (Hoffmeister, Col. 2, Lines 23-39).
Claim 17 contains subject matter similar to claim 6, and thus, is rejected under similar rationale.

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES S WOZNIAK whose telephone number is (571)272-7632. The examiner can normally be reached 7-3, off alternate Fridays.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant may use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

JAMES S. WOZNIAK
Primary Examiner
Art Unit 2655



/JAMES S WOZNIAK/               Primary Examiner, Art Unit 2655
Prosecution Timeline

Jan 25, 2024: Application Filed
Sep 03, 2025: Non-Final Rejection mailed (§102, §103)
Nov 24, 2025: Response Filed
Feb 11, 2026: Final Rejection mailed (§102, §103)
Apr 07, 2026: Response after Non-Final Action
May 06, 2026: Request for Continued Examination
May 12, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology:

18/585,204 (Patent 12640139): METHOD AND APPARATUS FOR IMPROVING PERFORMANCE OF ARTIFICIAL INTELLIGENCE MODEL USING SPEECH RECOGNITION RESULTS AS TEXT INPUT. Granted May 26, 2026 (2y 3m to grant).
18/535,521 (Patent 12609113): NATURAL LANGUAGE PROCESSING SYSTEMS AND METHODS FOR INTENT CLASSIFICATION OF SPEECH TRANSCRIPTION. Granted Apr 21, 2026 (2y 4m to grant).
18/544,354 (Patent 12609106): EMOTIVE TEXT-TO-SPEECH WITH AUTO DETECTION OF EMOTIONS. Granted Apr 21, 2026 (2y 4m to grant).
18/399,876 (Patent 12597422): SPEAKING PRACTICE SYSTEM WITH RELIABLE PRONUNCIATION EVALUATION. Granted Apr 07, 2026 (2y 3m to grant).
18/488,578 (Patent 12586569): Knowledge Distillation with Domain Mismatch For Speech Recognition. Granted Mar 24, 2026 (2y 5m to grant).

Based on the 5 most recent grants by this examiner.
Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 60%
With Interview: 99% (+39.4%)
Median Time to Grant: 3y 8m (~1y 3m remaining)
PTA Risk: Moderate

Based on 391 resolved cases by this examiner. Grant probability derived from career allowance rate.