Prosecution Insights
Last updated: April 19, 2026
Application No. 18/253,673

ENABLING TRAINING OF A MACHINE-LEARNING MODEL FOR TRIGGER-WORD DETECTION

Final Rejection — §102, §103
Filed
May 19, 2023
Examiner
WOZNIAK, JAMES S
Art Unit
2655
Tech Center
2600 — Communications
Assignee
Assa Abloy AB
OA Round
2 (Final)
59%
Grant Probability
Moderate
3-4
OA Rounds
3y 7m
To Grant
99%
With Interview

Examiner Intelligence

Grants 59% of resolved cases
59%
Career Allow Rate
227 granted / 385 resolved
-3.0% vs TC avg
Strong +40% interview lift
+40.1%
Interview Lift
allow rate in resolved cases with interview vs. without
Typical timeline
3y 7m
Avg Prosecution
42 currently pending
Career history
427
Total Applications
across all art units

Statute-Specific Performance

§101
18.1%
-21.9% vs TC avg
§103
40.1%
+0.1% vs TC avg
§102
18.4%
-21.6% vs TC avg
§112
16.1%
-23.9% vs TC avg
Black line = Tech Center average estimate • Based on career data from 385 resolved cases

Office Action

§102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

In response to the Non-final Office Action from 10/30/2025, Applicant filed an amendment on 1/14/2026. In this reply, Applicant does not advance prosecution with meaningful amendments to the claims and instead argues that the prior art of record fails to teach identifying a speech segment that is close to being recognized as the trigger word and, if the corresponding semantic vector for the identified speech segment lies within a threshold distance of a vector for the trigger word, labelling the identified speech segment as valid training data for the trigger word (Remarks, Pages 7-9). These arguments have been fully considered but are not found to be persuasive for the reasons noted in the Response to Arguments section below.

Applicant requests withdrawal of the rejection of claim 13 under 35 U.S.C. 112(b) due to the amendment of this claim (Remarks, Page 7). In response to the correction of the antecedent basis issue in claim 13, the 35 U.S.C. 112(b) rejection is now moot and has been withdrawn.

Applicant requests withdrawal of the rejection of claim 13 under 35 U.S.C. 101 due to the amendment of this claim (Remarks, Page 7). By further specifying that the computer-readable storage medium is "non-transitory," non-statutory signals per se have been excluded from the claim as a whole, leaving only statutory embodiments for the medium under the broadest reasonable interpretation (BRI). Accordingly, the 35 U.S.C. 101 rejection has been withdrawn.

Response to Arguments

With respect to independent Claim 1, Applicant argues that Wu, et al.
(CN 108320733 A1) fails to teach that, where the corresponding semantic vector for the identified speech segment lies within a threshold distance of a vector for the trigger word, the identified speech segment is labelled as valid training data for the trigger word. In particular, Applicant contends that Wu does not disclose "exploiting speech data that fails recognition but is semantically close to a trigger word as positive training data" because Wu expressly describes that "using non-wake-up words that are similar to the wake-up word(s) as training data corresponding to the wake-up word(s) is likely to worsen the performance of the wake-up model" and therefore expressly describes filtering out such data as a "counter-example". Thus, Applicant concludes that Wu only uses similarity and distance measures to exclude semantically or acoustically close, but incorrect, data before retraining the wake-up model (Remarks, Pages 7-8).

Moreover, Applicant appears to argue features that are not being claimed. Specifically, Applicant on pages 8-9 describes that claim 1 "expands the training data...by labelling words similar/close to trigger words as trigger words...to improve model robustness" (Remarks, Pages 8-9). Applicant also contends that Wu fails to teach determining that the found section of sound-based data corresponds to the trigger word when a distance in vector space is less than a threshold distance, because Wu teaches the opposite: excluding semantically similar or acoustically close data before retraining the wake-up model.

In response, it should be noted that Applicant's filtering-out arguments overlook the complete statement, namely that non-wake words similar to the wake-up words worsen performance of the wake-up model and so are not used in training, but rather may be used as counter-examples (see Pages 1 and 10). Moreover, as noted, Applicant's argued subject matter does not reflect the instant unamended claim language.
Claim 1 specifically recites a step that positively determines "that the sound-based data corresponds to a trigger word" and labels such data accordingly. Claim 1 then includes a narrowing wherein clause describing that such step further involves performing speech recognition on that data, finding a section that fails to be considered the trigger word based upon speech recognition, and then performing the semantic distance determination based upon vector data. Claim 1 does not state that the found section that fails speech-recognition-based trigger-word detection is used to train the machine-learning model; instead, the claim features a step that identifies the trigger word where the found section is determined to correspond to the trigger word when the distance is less than a threshold. There is no claim limitation specifying that the found section is used to train the ML model.

Turning back to the teachings of Wu, it is noted that Wu does positively label voice data as wake-up/trigger voice data (Pages 4, 6, and 10). Moreover, Wu looks to two types of metrics to establish positive wake-up words: acoustic and semantic level features (e.g., see Page 10). The acoustic features involve acoustic speech recognition by the wake-up model, while the semantic features involve semantic-level feature sets that together comprise a vector (Pages 15-16 and 19). Importantly, such metrics can be used individually or in combination (see "and/or" on Page 10) in the form of acoustic recognition scoring and semantic-based vector processing. Thus, Wu teaches a scenario where the acoustic recognition score of input voice data may not register a positive wake-word example, yet the data would still be detected as a positive example in the screening-out process relying on semantics, if the semantic distance between the sample wake word and the input speech data is minimal (Pages 10 and 15-16).
Thus, the Applicant's arguments directed towards claim 1 are not found to be persuasive because they rely on features that are not claimed, and Wu teaches positive detection of a wake-up/trigger word even when acoustic speech recognition may fail, by relying on semantic detection. The examiner can appreciate the differences underlying the teachings of Wu and the disclosed invention; however, it would be helpful to incorporate these differences into the claims with greater clarity to better support Applicant's position. The remaining independent and dependent claims have been traversed for reasons similar to Claim 1 (Remarks, Page 9). Regarding such arguments, see the response directed towards claim 1.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-2, 7-8, and 13 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wu, et al. (CN 108320733 A1).
With respect to Claim 1, Wu discloses: A method for enabling training a machine-learning, ML, model for trigger-word detection, the method being performed in a training data provider, the method comprising: receiving sound-based data, the sound-based data being based on sounds captured in a space to be monitored (acquisition/collection of voice data from a space using equipment in the form of a microphone, Pages 8-9 and 16); determining that the sound-based data corresponds to a trigger word, and labelling this sound-based data to correspond to the trigger word (detection and labeling/marking of voice data as "positive wake-up voice data," Pages 4, 6, 10, and 16-17); and providing the labelled sound-based data to train the ML model (using the positively labeled wake-up voice data for neural network model training (e.g., a DNN), Pages 17 and 20); wherein the determining that the sound-based data corresponds to the trigger word comprises: performing speech recognition of the sound-based data (performing speech recognition and obtaining results using the "current wake-up model," Pages 2, 8, and 11); finding a section of sound-based data that, using the speech recognition, fails to be considered to be the trigger word, but is close to being considered to be the trigger word (determination of false wake-up data or non-awake words "similar to the wake-up words," Pages 1-2, 5, and 10); obtaining semantic vector data based on the found section of sound-based data (extraction of a "semantic level" feature set that together comprises a vector, Pages 15-16 and 19); and determining that the found section of sound-based data corresponds to the trigger word when a distance, in vector space, between the semantic vector data of the sound-based data and a vector corresponding to the trigger word, is less than a threshold distance (detecting valid/positive wake-up voice data involves a process that "screens out the false wake-up data" utilizing semantic vector sequences and sample 
vector sequences in a distance calculation to identify semantic similarity in the screening process, wherein each sequence of features constitutes a vector and the distance calculation measures vector-space closeness (i.e., minimum distance) between the two vectors, Pages 10-11, 15-16, and 19-20).

With respect to Claim 2, Wu discloses: The method according to claim 1, wherein the sound-based data is in the form of Mel-frequency cepstral coefficients, MFCCs (acoustic characteristics in the form of MFCCs, among others, Pages 1-2).

Claim 7 is directed towards a system embodiment of the claimed invention comprising a processor and a memory storing processor-executable instructions for carrying out the method of claim 1, and thus is rejected under similar rationale. Moreover, Wu teaches method implementation as a system comprising a processor and memory storing program instructions (Pages 7 and 20).

Claim 8 contains subject matter similar to Claim 2, and thus is rejected under similar rationale.

Claim 13 involves an embodiment of the invention directed towards a non-transitory computer-readable storage medium storing a computer program for carrying out the method of claim 1, and thus is rejected under similar rationale. Moreover, Wu teaches method implementation as a computer-readable storage medium storing program instructions within an electronic computing device (Pages 7 and 20).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Hart, et al. (U.S. Patent: 8,700,392).

With respect to Claim 3, Wu teaches the wake-word training procedure that screens out false wake words from the model optimization process as applied to Claim 1. Wu does not discuss what happens to data that fails to correspond to voice sounds, namely that such data is discarded as recited in claim 3. Hart, however, discloses that non-trigger-word inputs can be discarded (Col. 10, Lines 18-37). Wu and Hart are analogous art because they are from a similar field of endeavor in trigger-word recognition. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to utilize the non-trigger-word data deletion taught by Hart in the wake-word detection process taught by Wu to provide a predictable result of saving computer memory/cache space, since clearly non-speech inputs have been deemed to lack trigger-word content.

Claim 9 contains subject matter similar to Claim 3, and thus is rejected under similar rationale.

Claims 4 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Thomson, et al. (U.S. Patent: 10,388,272).

With respect to Claim 4, Wu teaches the wake-word training procedure that screens out false wake words from the model optimization process as applied to Claim 1.
Wu does not discuss what happens to the acquired voice data after training is completed, namely the discarding of the voice data positively relied upon for training. Thomson, however, discloses that after training is completed the audio and text data is deleted (Col. 201, Lines 17-29). Wu and Thomson are analogous art because they are from a similar field of endeavor in keyword-recognition training. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to utilize the post-training data deletion taught by Thomson in the wake-word training process taught by Wu to provide a predictable result of preserving user privacy (Thomson, Col. 201, Lines 17-29).

Claim 10 contains subject matter similar to Claim 4, and thus is rejected under similar rationale.

Claims 5-6, 11-12, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Kim, et al. (U.S. PG Publication: 2016/0267913 A1).

With respect to Claim 5, Wu teaches the wake-word neural-network training procedure that screens out false wake words from the model optimization process as applied to Claim 1. Wu does not teach what is done with the obtained optimized/trained wake-word recognition model, namely transmission of such model to a central location for aggregated learning as set forth in claim 5. Kim, however, discloses using uploaded, locally updated wake-up keyword models to update/train a recognition server model (Paragraphs 0070, 0112, and 0126). Wu and Kim are analogous art because they are from a similar field of endeavor in trigger-word recognition training. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to use the server-model updating process taught by Kim after the wake-word training process taught by Wu to provide a predictable result of synchronizing models across various devices (Kim, Paragraph 0100), thus avoiding unnecessary/duplicate training processes.
With respect to Claim 6, Kim further discloses: The method according to claim 5, further comprising: receiving an updated ML model being based on the central ML model (central server transmission of the wake-up keyword model to a user device, Paragraphs 0050, 0070, 0096, and 0195; note that Wu teaches DNN wake-word models as applied to Claim 1).

Claims 11-12 contain subject matter respectively similar to Claims 5-6, and thus are rejected under similar rationale.

With respect to Claims 15 and 16, Kim further discloses that the model is trained/updated at a local device (Paragraphs 0112 and 0126).

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Sharifi, et al. (U.S. PG Publication: 2022/0115011 A1) teaches a method for detecting "near matches to a hotword or phrase" where, although a trigger/hotword fails an audio recognition threshold, the positive hotword can still be detected "based on a semantic similarity between the text generated by the speech recognition engine and a supported hotword" (Paragraphs 0032, 0037, and 0106).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES S WOZNIAK, whose telephone number is (571) 272-7632. The examiner can normally be reached 7-3, off alternate Fridays.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

JAMES S. WOZNIAK
Primary Examiner
Art Unit 2655

/JAMES S WOZNIAK/Primary Examiner, Art Unit 2655
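The logic at the heart of the dispute — falling back to a semantic-distance test when acoustic recognition narrowly fails, and labelling the near-miss segment as a positive trigger-word example — can be sketched as follows. This is a minimal illustration of the claimed step as the examiner characterizes it; the function names, vectors, and threshold are our own assumptions, not drawn from the application or from Wu.

```python
import math

def euclidean(a, b):
    """Distance in vector space between two semantic vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def label_as_trigger(segment_vec, trigger_vec, threshold, passed_asr):
    """Hypothetical labelling step: a segment counts as the trigger word if
    acoustic recognition succeeded, OR if it failed but the segment's semantic
    vector lies within a threshold distance of the trigger word's vector."""
    if passed_asr:
        return True
    return euclidean(segment_vec, trigger_vec) < threshold

# Near-miss segment: fails acoustic recognition but is semantically close,
# so it is still labelled as valid training data for the trigger word.
print(label_as_trigger([0.9, 0.1], [1.0, 0.0], threshold=0.2, passed_asr=False))
```

On these toy vectors the distance is about 0.14, under the 0.2 threshold, so the segment is labelled positive; a semantically distant segment would be rejected, which mirrors the claim's "less than a threshold distance" limitation.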

Prosecution Timeline

May 19, 2023
Application Filed
Oct 28, 2025
Non-Final Rejection — §102, §103
Jan 14, 2026
Response Filed
Mar 09, 2026
Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597422
SPEAKING PRACTICE SYSTEM WITH RELIABLE PRONUNCIATION EVALUATION
2y 5m to grant Granted Apr 07, 2026
Patent 12586569
Knowledge Distillation with Domain Mismatch For Speech Recognition
2y 5m to grant Granted Mar 24, 2026
Patent 12511476
CONCEPT-CONDITIONED AND PRETRAINED LANGUAGE MODELS BASED ON TIME SERIES TO FREE-FORM TEXT DESCRIPTION GENERATION
2y 5m to grant Granted Dec 30, 2025
Patent 12512100
AUTOMATED SEGMENTATION AND TRANSCRIPTION OF UNLABELED AUDIO SPEECH CORPUS
2y 5m to grant Granted Dec 30, 2025
Patent 12475882
METHOD AND SYSTEM FOR AUTOMATIC SPEECH RECOGNITION (ASR) USING MULTI-TASK LEARNED (MTL) EMBEDDINGS
2y 5m to grant Granted Nov 18, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
59%
Grant Probability
99%
With Interview (+40.1%)
3y 7m
Median Time to Grant
Moderate
PTA Risk
Based on 385 resolved cases by this examiner. Grant probability derived from career allow rate.
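Taken at face value, the panel's headline figures combine additively: the 59% career allow rate plus the 40.1% interview lift yields the 99% "With Interview" projection. A minimal sketch of that arithmetic, assuming the dashboard simply adds the lift to the base rate (its actual model is not disclosed):

```python
# Assumed additive combination of the dashboard's published figures.
base_grant_prob = 0.59   # examiner's career allow rate (227 / 385 resolved)
interview_lift = 0.401   # observed uplift in cases with an interview

# Cap at 100% so an unusually large lift cannot exceed certainty.
with_interview = min(base_grant_prob + interview_lift, 1.0)
print(f"{with_interview:.0%}")  # → 99%
```

Note this treats the lift as an absolute percentage-point increase, not a relative multiplier; a relative reading (0.59 × 1.401 ≈ 83%) would not reproduce the displayed 99%.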
