Prosecution Insights
Last updated: April 19, 2026
Application No. 18/633,349

VOICE WAKE-UP METHOD, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM

Status: Final Rejection (§103)
Filed: Apr 11, 2024
Examiner: PULLIAS, JESSE SCOTT
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: AAC Acoustic Technologies (Shenzhen) Co. Ltd.
OA Round: 2 (Final)
Grant Probability: 83% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 8m
Grant Probability With Interview: 96%

Examiner Intelligence

Career Allow Rate: 83% (above average; 873 granted / 1052 resolved; +21.0% vs TC avg)
Interview Lift: +13.0% (moderate), based on resolved cases with interview
Typical Timeline: 2y 8m average prosecution; 47 applications currently pending
Career History: 1,099 total applications across all art units

Statute-Specific Performance

§101: 15.0% (-25.0% vs TC avg)
§103: 50.4% (+10.4% vs TC avg)
§102: 19.7% (-20.3% vs TC avg)
§112: 4.9% (-35.1% vs TC avg)
Tech Center averages are estimates. Based on career data from 1052 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

This office action is in response to correspondence of 02/03/26 regarding application 18/634,991, in which claims 1, 4, 6, 7, and 10 were amended and claims 2-3 were cancelled. Claims 1 and 4-10 are pending in the application and have been considered.

Response to Arguments

Amended claim 6 overcomes the objection for informalities, and so the objection is withdrawn. Amended claim 10 overcomes the 35 U.S.C. 101 rejection, and so the rejection is withdrawn. Specifically, Applicant has amended the claim to recite a “non-transitory” computer-readable storage medium, which rules out transitory media types.

On pages 6-7, regarding the 35 U.S.C. 103 rejections based in part on Wang, Applicant argues that in Wang the user needs to wait for the host to be woken up before uttering the subsequent command word, that the substantial steps of the voice wake-up method provided by Wang are therefore different from those of the application, and that Wang's disclosure cannot achieve the technical effect of optimizing the computational overhead of the algorithm as achieved by this application. In response, it is noted that the features upon which Applicant relies (i.e., whether the user needs to wait for the host to be woken up before uttering the subsequent command word) are not recited in the rejected claims. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

Second, on page 7, Applicant argues that the pre-processing performed by the present application, including filtering, pre-emphasis, framing, windowing, content extraction, and calculation of the decoded data, is not disclosed in Wang. In response, while not all of these features are explicitly shown in Wang, because of the computation of MFCCs in Wang, one of ordinary skill in the art of audio processing at the time of filing would have immediately recognized that these pre-processing steps are implied, or at least suggested, by Wang: performing pre-emphasis processing on the filtered data to generate pre-emphasis data (computation of MFCCs, especially in conjunction with an HMM recognizer, implies, or at least suggests, pre-emphasis, [0107]-[0111]); framing the pre-emphasis data to generate frames of data (framing, [0107]); windowing each of the frames of the data to generate windowing data (implicit in MFCC computation, [0107], [0111]); extracting effective contents in the windowing data based on the improved endpoint detection algorithm (e.g. the frequency spectrum of the window for a frame containing speech, implicit in MFCC computation, [0107], [0111]); and calculating the effective contents based on a Mel-frequency cepstrum coefficient feature extraction algorithm to generate the effective sound segment of the original voice signal (the MFCCs themselves are computed by taking a DCT of Mel log powers, resulting in Mel cepstral coefficients, [0107], [0111]).
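For context, the MFCC front end at the center of this dispute can be sketched in a few lines. The following minimal numpy sketch runs the claimed chain (pre-emphasis, framing, windowing, Mel log powers, DCT); all parameter values and function names are common textbook defaults assumed for illustration, not taken from Wang or from the application.

```python
# Illustrative MFCC front end: pre-emphasis -> framing -> windowing -> Mel log powers -> DCT.
# Parameter values (0.97 pre-emphasis, 25 ms frames, 10 ms hop, 26 Mel bands, 13
# coefficients) are common defaults, not values from Wang or the application.
import numpy as np
from scipy.fft import dct

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    # Pre-emphasis: first-order high-pass, y[n] = x[n] - 0.97 * x[n-1].
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # Framing: 25 ms frames with a 10 ms hop (assumes len(signal) >= one frame).
    flen, hop = int(0.025 * sr), int(0.010 * sr)
    n_frames = 1 + (len(x) - flen) // hop
    frames = np.stack([x[i * hop : i * hop + flen] for i in range(n_frames)])

    # Windowing: Hamming window per frame, then the per-frame power spectrum.
    frames = frames * np.hamming(flen)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular Mel filterbank over the power spectrum.
    mel_max = 2595 * np.log10(1 + (sr / 2) / 700)
    hz = 700 * (10 ** (np.linspace(0, mel_max, n_mels + 2) / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    # Mel log powers, then a DCT yields the Mel-frequency cepstral coefficients.
    log_mel = np.log(power @ fbank.T + 1e-10)
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_ceps]
```

The Office Action's position is that a reference computing MFCCs necessarily performs some version of each of these stages.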
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang by performing pre-emphasis processing on the filtered data to generate pre-emphasis data; framing the pre-emphasis data to generate frames of data; windowing each of the frames of the data to generate windowing data; extracting effective contents in the windowing data based on the improved endpoint detection algorithm; and calculating the effective contents based on a Mel-frequency cepstrum coefficient feature extraction algorithm to generate the effective sound segment of the original voice signal, in order to address the well-known problems with HMM-based recognizers identified by Wang ([0101]-[0104]) by using a feature (MFCC) offering well-known robust solutions. In other words, the claimed pre-processing steps are not new and non-obvious over the prior art.

Next, on page 7 Applicant further argues: “Moreover, the technical solution of Wang's disclosure first collects the wakeup word in an offline state, then sequentially performs frame processing, extracts feature parameters, conducts clustering on the feature parameters, establishes the observation states of the Hidden Markov Model, adjusts the parameters of the Hidden Markov Model through an algorithm to maximize P(σ′|λ), and compares P(σ′|λ) with the confidence threshold to determine whether the wakeup word is recognized. Therefore, Wang does not disclose operations such as filtering and pre-emphasis before frame processing of the data. Nor does Wang disclose steps such as windowing, content extraction, and calculation after frame processing of the data. Therefore, Wang does not disclose the specific operating method for generating the effective sound segments of the original voice signal in this application, and is even less able to provide technical inspiration for these operations. In addition, there is no evidence to indicate that the specific steps of the operating method for generating the effective sound segments of the original voice signal in this application belong to conventional technical means.”

In response, the examiner respectfully disagrees. Wang discloses filtering an interference signal in the decoded data to generate filtered data because Wang discloses separating, i.e. filtering, signals other than those from the desired source by blind source separation processing, [0083]. Wang implies, or at least suggests, performing pre-emphasis processing on the filtered data to generate pre-emphasis data because the computation of MFCCs, especially in conjunction with an HMM recognizer, implies, or at least suggests, pre-emphasis, [0107]-[0111]. The claimed “windowing” of each of the frames of the data to generate windowing data is implicit in the MFCC computation of Wang, [0107], [0111], as is extracting effective contents in the windowing data based on the improved endpoint detection algorithm, since the frequency spectrum of the window for a frame containing speech is implicit in MFCC computation, [0107], [0111], and calculating the effective contents based on a Mel-frequency cepstrum coefficient feature extraction algorithm to generate the effective sound segment of the original voice signal, since the MFCCs themselves are computed by taking a DCT of Mel log powers, resulting in Mel cepstral coefficients, [0107], [0111].
Moreover, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang by performing pre-emphasis processing on the filtered data to generate pre-emphasis data; framing the pre-emphasis data to generate frames of data; windowing each of the frames of the data to generate windowing data; extracting effective contents in the windowing data based on the improved endpoint detection algorithm; and calculating the effective contents based on a Mel-frequency cepstrum coefficient feature extraction algorithm to generate the effective sound segment of the original voice signal, in order to address the well-known problems with HMM-based recognizers identified by Wang ([0101]-[0104]) by using a feature (MFCC) offering well-known robust solutions. The arguments on pages 7-8 regarding Zwyssig, Junqua, Sakhnov, and Paul, as well as dependent claims 4-10, are similar to those addressed above, and are not persuasive for similar reasons.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5, 7, 9, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 20200027462) in view of Zwyssig et al. (“A Digital Microphone Array for Distant Speech Recognition”, ICASSP 2010), in further view of Junqua et al. (“A Study of Endpoint Detection Algorithms in Adverse Conditions: Incidence on a DTW and HMM Recognizer”, EUROSPEECH ’91).

Consider claim 1: Wang discloses a voice wake-up method, applied to an electronic device (waking up the voice recognition processor, [0154], of an electrical appliance, [0002]), comprising steps: collecting an original voice signal input by a user (acquiring voice information in an analog signal format of a user attempting voice control according to a preset wake word, e.g.
“hello refrigerator”, [0012], [0124]); generating data according to the original voice signal (digital conversion to obtain the voice information in the digital signal format, [0140]); decoding the data to generate decoded data (step 120: processing the voice information to determine whether it contains human voice; if yes, separating out the voice information segment containing voice, [0141]; this is considered “decoding” because it extracts speech segments from the signal); performing preprocessing and feature extraction processing on the decoded data to generate voice features (extracting characteristic parameters from a voice frame containing voice data, such as MFCC etc., [0107], [0111], Fig. 8; those skilled in the art would understand that extracting MFCCs from speech requires a number of steps, i.e. preprocessing); performing pattern matching on the voice features according to a hidden Markov model to generate a recognition result (matching the voice data with the wake-up word by comparing the observed values to the observation state of a hidden Markov model (HMM), [0112], [0127]); and waking up an external processor of the electronic device according to the recognition result (comparing P(σ′|λ) with a confidence threshold value to obtain whether the awakening word is recognized or not, [0153], and if so, the speech recognition processor is awakened, [0154], considered an external processor versus the coprocessor which performs the wakeup word recognition, Abstract); wherein the step of performing preprocessing and feature extraction processing on the decoded data to generate the voice features comprises steps: pre-processing the decoded data by an improved endpoint detection algorithm to generate an effective sound segment of the original voice signal (separating the voice information including the human voice from portions not containing the human voice, i.e. endpoint detection, using an energy threshold, i.e. an improved endpoint detection algorithm, [0029], improved by blind source separation processing, [0083]); and extracting feature information of the effective sound segment by a feature extraction algorithm (extracting characteristic parameters such as MFCC, [0107], [0111]); wherein the step of pre-processing the decoded data by the improved endpoint detection algorithm to generate the effective sound segment of the original voice signal comprises steps: filtering an interference signal in the decoded data to generate filtered data (separating, i.e. filtering, signals other than those from the desired source by blind source separation processing, [0083]). Wang does not specifically mention: performing pre-emphasis processing on the filtered data to generate pre-emphasis data; framing the pre-emphasis data to generate frames of data; windowing each of the frames of the data to generate windowing data; extracting effective contents in the windowing data based on the improved endpoint detection algorithm; and calculating the effective contents based on a Mel-frequency cepstrum coefficient feature extraction algorithm to generate the effective sound segment of the original voice signal.
However, Wang implies, or at least suggests: performing pre-emphasis processing on the filtered data to generate pre-emphasis data (computation of MFCCs, especially in conjunction with an HMM recognizer, implies, or at least suggests, pre-emphasis, [0107]-[0111]); framing the pre-emphasis data to generate frames of data (framing, [0107]); windowing each of the frames of the data to generate windowing data (implicit in MFCC computation, [0107], [0111]); extracting effective contents in the windowing data based on the improved endpoint detection algorithm (e.g. the frequency spectrum of the window for a frame containing speech, implicit in MFCC computation, [0107], [0111]); and calculating the effective contents based on a Mel-frequency cepstrum coefficient feature extraction algorithm to generate the effective sound segment of the original voice signal (the MFCCs themselves are computed by taking a DCT of Mel log powers, resulting in Mel cepstral coefficients, [0107], [0111]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang by performing pre-emphasis processing on the filtered data to generate pre-emphasis data; framing the pre-emphasis data to generate frames of data; windowing each of the frames of the data to generate windowing data; extracting effective contents in the windowing data based on the improved endpoint detection algorithm; and calculating the effective contents based on a Mel-frequency cepstrum coefficient feature extraction algorithm to generate the effective sound segment of the original voice signal, in order to address the well-known problems with HMM-based recognizers identified by Wang ([0101]-[0104]) by using a feature (MFCC) offering well-known robust solutions.

Wang also does not specifically mention generating pulse density modulation data according to the original voice signal. Zwyssig discloses generating pulse density modulation data according to the original voice signal (the ADC on each MEMS chip outputs a binary PDM signal, Section 2.2, page 5107, i.e. the digital microphone converts speech into PDM data, Section 1, page 5106). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang by generating pulse density modulation data according to the original voice signal in order to avoid needing external amplifiers or ADCs, as suggested by Zwyssig (Section 1, page 5106). Doing so would have led to the predictable result of reducing costs for mass production of the array, as suggested by Zwyssig (Section 1, page 5106). The references cited are analogous art in the same field of audio processing.

Wang and Zwyssig do not specifically mention performing vector quantization on the feature information to generate the voice features. Junqua discloses performing vector quantization on the feature information to generate the voice features (VQ-based HMM recognizer, Section 4.2, page 1372). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang and Zwyssig by performing vector quantization on the feature information to generate the voice features in order to mitigate the effects of noise, predictably reducing errors in the automatic speech recognition system, as suggested by Junqua (Section 1, page 1371). The references cited are analogous art in the same field of audio processing.
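For context on the Zwyssig combination: pulse density modulation encodes audio as a 1-bit stream whose pulse density tracks the signal amplitude. A toy first-order sigma-delta modulator, a generic sketch of PDM rather than Zwyssig's actual hardware, looks like this:

```python
# Toy first-order sigma-delta modulator: samples in [-1, 1] -> 1-bit PDM stream.
# A generic sketch of pulse density modulation, not Zwyssig's circuit.
import numpy as np

def pdm_encode(x):
    out = np.empty(len(x), dtype=np.int8)
    integrator = 0.0
    feedback = 0.0
    for i, sample in enumerate(x):
        integrator += sample - feedback       # accumulate quantization error
        out[i] = 1 if integrator >= 0 else 0  # 1-bit quantizer
        feedback = 1.0 if out[i] else -1.0    # feedback DAC level
    return out
```

Real digital MEMS microphones run this loop at a high oversampling ratio (e.g., 64x the audio rate), and a downstream decimation filter recovers multi-bit PCM.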
Consider claim 7: Wang discloses an electronic device (electrical appliance, [0002]), comprising: an external processor and a smart microphone (voice recognition processor CPU and voice collecting assembly in conjunction with co-processor, Fig. 4; the microphone array is considered “smart” as it is used to wake up the device upon hearing a command such as “hello, refrigerator”, [0088], [0188]); wherein the smart microphone comprises a microphone and a digital signal processor (MIC array and co-processor in conjunction with the AD conversion module, Fig. 4, [0118]); the microphone is configured to collect an original voice signal input by a user (acquiring voice information in an analog signal format of a user attempting voice control according to a preset wake word, e.g. “hello refrigerator”, [0012], [0124]); the digital signal processor is configured to generate data according to the original voice signal (digital conversion to obtain the voice information in the digital signal format, [0140]), decode the data to generate decoded data (step 120: processing the voice information to determine whether it contains human voice; if yes, separating out the voice information segment containing voice, [0141]; this is considered “decoding” because it extracts speech segments from the signal), perform preprocessing and feature extraction processing on the decoded data to generate voice features (extracting characteristic parameters from a voice frame containing voice data, such as MFCC etc., [0107], [0111], Fig. 8; those skilled in the art would understand that extracting MFCCs from speech requires a number of steps, i.e. preprocessing), perform pattern matching on the voice features according to a hidden Markov model to generate a recognition result (matching the voice data with the wake-up word by comparing the observed values to the observation state of a hidden Markov model (HMM), [0112], [0127]), and send the recognition result to the external processor (the co-processor wakes up the voice recognition assembly to perform the voice recognition after detecting a wakeup word, [0118]); and the external processor is woken up according to the recognition result (comparing P(σ′|λ) with a confidence threshold value to obtain whether the awakening word is recognized or not, [0153], and if so, the speech recognition processor is awakened, [0154], considered an external processor versus the coprocessor which performs the wakeup word recognition, Abstract). Wang does not specifically mention: performing pre-emphasis processing on the filtered data to generate pre-emphasis data; framing the pre-emphasis data to generate frames of data; windowing each of the frames of the data to generate windowing data; extracting effective contents in the windowing data based on the improved endpoint detection algorithm; and calculating the effective contents based on a Mel-frequency cepstrum coefficient feature extraction algorithm to generate the effective sound segment of the original voice signal.
However, Wang implies, or at least suggests: performing pre-emphasis processing on the filtered data to generate pre-emphasis data (computation of MFCCs, especially in conjunction with an HMM recognizer, implies, or at least suggests, pre-emphasis, [0107]-[0111]); framing the pre-emphasis data to generate frames of data (framing, [0107]); windowing each of the frames of the data to generate windowing data (implicit in MFCC computation, [0107], [0111]); extracting effective contents in the windowing data based on the improved endpoint detection algorithm (e.g. the frequency spectrum of the window for a frame containing speech, implicit in MFCC computation, [0107], [0111]); and calculating the effective contents based on a Mel-frequency cepstrum coefficient feature extraction algorithm to generate the effective sound segment of the original voice signal (the MFCCs themselves are computed by taking a DCT of Mel log powers, resulting in Mel cepstral coefficients, [0107], [0111]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang by performing pre-emphasis processing on the filtered data to generate pre-emphasis data; framing the pre-emphasis data to generate frames of data; windowing each of the frames of the data to generate windowing data; extracting effective contents in the windowing data based on the improved endpoint detection algorithm; and calculating the effective contents based on a Mel-frequency cepstrum coefficient feature extraction algorithm to generate the effective sound segment of the original voice signal, in order to address the well-known problems with HMM-based recognizers identified by Wang ([0101]-[0104]) by using a feature (MFCC) offering well-known robust solutions.

Wang also does not specifically mention generating pulse density modulation data according to the original voice signal. Zwyssig discloses generating pulse density modulation data according to the original voice signal (the ADC on each MEMS chip outputs a binary PDM signal, Section 2.2, page 5107, i.e. the digital microphone converts speech into PDM data, Section 1, page 5106). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang by generating pulse density modulation data according to the original voice signal in order to avoid needing external amplifiers or ADCs, as suggested by Zwyssig (Section 1, page 5106). Doing so would have led to the predictable result of reducing costs for mass production of the array, as suggested by Zwyssig (Section 1, page 5106). The references cited are analogous art in the same field of audio processing.

Wang and Zwyssig do not specifically mention performing vector quantization on the feature information to generate the voice features. Junqua discloses performing vector quantization on the feature information to generate the voice features (VQ-based HMM recognizer, Section 4.2, page 1372). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang and Zwyssig by performing vector quantization on the feature information to generate the voice features in order to mitigate the effects of noise, predictably reducing errors in the automatic speech recognition system, as suggested by Junqua (Section 1, page 1371). The references cited are analogous art in the same field of audio processing.
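For context on the Junqua combination: vector quantization maps each continuous feature vector to the index of its nearest codeword, turning the MFCC stream into the discrete symbols a discrete-observation HMM consumes. A minimal sketch, assuming a codebook already trained offline (e.g., by k-means/LBG); all names are illustrative:

```python
# Nearest-codeword vector quantization of MFCC feature vectors (generic sketch).
# The codebook is assumed to have been trained offline, e.g. with k-means / LBG.
import numpy as np

def quantize(features, codebook):
    """features: (n_frames, dim); codebook: (n_codewords, dim) -> one symbol per frame."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dists, axis=1)  # discrete observation symbols for the HMM
```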
Consider claim 10: the Wang-Zwyssig-Junqua combination discloses a non-transitory computer-readable storage medium (a memory or cache with instructions is inherent for the co-processor and voice recognition processor to perform the operations described, [0079], Wang), comprising: a program stored therein (a memory or cache with a stored program is inherent for the co-processor and voice recognition processor to perform the operations described, [0079], Wang); wherein when the program is operated, a device where the non-transitory computer-readable storage medium is disposed is controlled (waking up the voice recognition processor, [0154], of an electrical appliance, [0002], Wang) to execute the voice wake-up method according to claim 1 (see claim 1 above).

Consider claim 5: Wang discloses that the step of performing pattern matching on the voice features according to the hidden Markov model to generate the recognition result comprises: according to the hidden Markov model, performing pattern matching on the voice features by a forward algorithm (matching the voice data with the wake-up word by comparing the observed values to the observation state of a hidden Markov model, [0112], [0127]; use of a “forward algorithm” to compute the match probability is implicit); and determining whether the original voice signal input by the user comprises a predetermined command through a predetermined discrimination rule to generate the recognition result (comparing P(σ′|λ) with a confidence threshold value to obtain whether the awakening word is recognized or not, [0153]).

Consider claim 9: Wang discloses that the external processor comprises a central processing unit (CPU) or a system-on-chip (voice recognition processor CPU, [0004]).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 20200027462) in view of Zwyssig et al. (“A Digital Microphone Array for Distant Speech Recognition”, ICASSP 2010), in further view of Junqua et al. (“A Study of Endpoint Detection Algorithms in Adverse Conditions: Incidence on a DTW and HMM Recognizer”, EUROSPEECH ’91), in further view of Sakhnov et al. (“Approach for Energy-Based Voice Detector with Adaptive Scaling Factor”, IAENG International Journal of Computer Science, 36:4, IJCS_36_4_16, 19 November 2009).

Consider claim 4: Wang, Zwyssig, and Junqua do not, but Sakhnov does disclose that the improved endpoint detection algorithm comprises a formula δ = φ × (δ_max − δ_min)/(ρ_max − ρ_min) × ρ, where ρ is a short-term energy change rate, δ is a short-term energy threshold, and φ is an adjustable influence factor (page 4, equations 19 and 20; the threshold is the current frame energy ratio multiplied by an adaptive scaling factor; the scaled energy is considered equivalent to the claimed adaptive energy threshold scaled by the short-term energy change rate). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang, Zwyssig, and Junqua such that the improved endpoint detection algorithm comprises the formula δ = φ × (δ_max − δ_min)/(ρ_max − ρ_min) × ρ, where ρ is a short-term energy change rate, δ is a short-term energy threshold, and φ is an adjustable influence factor, in order to deal with the effects of noise, as suggested by Sakhnov (page 1, Section 1), predictably minimizing the false detection rate of inactive segments while maximizing the detection rate of active speech, as suggested by Sakhnov (page 1, Section 1). The references cited are analogous art in the same field of audio processing.
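For context on the claim 4 formula: the threshold δ scales the short-term energy change rate ρ by an adjustable factor φ and by the ratio of the energy range to the change-rate range. A minimal sketch under assumed definitions (per-frame energy as the sum of squared samples; variable names illustrative, not from Sakhnov):

```python
# Sketch of the claim 4 threshold: delta = phi * (delta_max - delta_min) /
# (rho_max - rho_min) * rho, with rho the short-term energy change rate,
# delta the short-term energy threshold, and phi an adjustable influence factor.
# The per-frame energy definition and variable names here are assumptions.
import numpy as np

def adaptive_thresholds(frames, phi=1.0, eps=1e-10):
    energy = np.sum(frames ** 2, axis=1)              # short-term energy per frame
    rho = np.abs(np.diff(energy, prepend=energy[0]))  # short-term energy change rate
    d_min, d_max = energy.min(), energy.max()
    r_min, r_max = rho.min(), rho.max()
    return phi * (d_max - d_min) / max(r_max - r_min, eps) * rho
```

Frames whose energy exceeds the corresponding δ would be treated as active speech, which is the endpoint-detection role the claim assigns to the formula.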
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 20200027462) in view of Zwyssig et al. (“A Digital Microphone Array for Distant Speech Recognition”, ICASSP 2010), in further view of Junqua et al. (“A Study of Endpoint Detection Algorithms in Adverse Conditions: Incidence on a DTW and HMM Recognizer”, EUROSPEECH ’91), in further view of Paul (“Speech Recognition Using Hidden Markov Models”, The Lincoln Laboratory Journal, Volume 3, Number 1 (1990)).

Consider claim 6: Wang discloses that the step of, according to the hidden Markov model, performing pattern matching on the voice features by the forward algorithm and determining whether the original voice signal input by the user comprises the predetermined command through the predetermined discrimination rule to generate the recognition result comprises: converting the voice features into symbol sequences, where the voice features are two-dimensional voice features and the symbol sequences are one-dimensional symbol sequences (converting feature vectors of MFCCs, which are two-dimensional in time, into a phoneme sequence in time, i.e. a one-dimensional sequence, [0107]-[0112]); exhausting all state sequences corresponding to a symbol sequence of a current frame of the data to generate feature frame sequences (this is implicit in the path search in HMM-based recognition, which sums state transition probabilities, [0107]-[0112]); obtaining a generation probability of the feature frame sequences generated by each of the state sequences according to a transition probability and a transmission probability (this is implicit in the path search in HMM-based recognition, which sums state transition probabilities, [0107]-[0112]); and extending a quantity of states in each of the state sequences to a quantity of feature frames, summing probabilities of generating the feature frame sequences of the state sequences, and using a sum thereof as a likelihood probability that the feature frame sequences are identified as word sequences (this is implicit in computing P(σ′|λ), which represents the sum of probabilities through the state transitions corresponding to the feature frame sequences of MFCCs, [0107]-[0112]). Wang and Zwyssig do not specifically mention vector quantization. Junqua discloses vector quantization (VQ-based HMM recognizer, Section 4.2, page 1372). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang and Zwyssig by using vector quantization for reasons similar to those given for claim 1. Wang, Zwyssig, and Junqua do not specifically mention calculating probabilities of the word sequences of the feature frame sequences in the hidden Markov model as prior probabilities of the word sequences; multiplying the likelihood probability and each of the prior probabilities to obtain posterior probabilities of the word sequences; and using one of the word sequences having a maximum posterior probability as the recognition result.
Paul discloses vector quantization (a vector quantizer converts continuous-valued observations into discrete ones, Section 2.7, page 48); calculating probabilities of the word sequences of the feature frame sequences in the hidden Markov model as prior probabilities of the word sequences (observation probabilities of the classes, Section 2.3, page 44); multiplying the likelihood probability and each of the prior probabilities to obtain posterior probabilities of the word sequences (equation 3, Section 2.3, page 44; the word sequence probabilities are modeled using the language model described at Section 5, pages 54-55); and using one of the word sequences having a maximum posterior probability as the recognition result (recognition is performed by finding the best word path through this network, Section 5, pages 54-55). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang, Zwyssig, and Junqua by calculating probabilities of the word sequences of the feature frame sequences in the hidden Markov model as prior probabilities of the word sequences; multiplying the likelihood probability and each of the prior probabilities to obtain posterior probabilities of the word sequences; and using one of the word sequences having a maximum posterior probability as the recognition result, in order to improve recognition performance, as suggested by Paul (Section 1, page 41), with predictable applications in industry, as identified by Paul (Section 1, page 41). The references cited are analogous art in the same field of audio processing.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 20200027462) in view of Zwyssig et al. (“A Digital Microphone Array for Distant Speech Recognition”, ICASSP 2010), in further view of Junqua et al. (“A Study of Endpoint Detection Algorithms in Adverse Conditions: Incidence on a DTW and HMM Recognizer”, EUROSPEECH ’91), in further view of Weber et al. (US 9401140).

Consider claim 8: Wang, Zwyssig, and Junqua do not, but Weber does disclose that the digital signal processor is further configured to receive the hidden Markov model sent by a server, and the hidden Markov model is trained by the server (modeling computing device 300 trains an acoustic model with an HMM and sends it to a remote data store, Fig. 1, Col. 8 lines 35-55, Col. 14 lines 13-48). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wang, Zwyssig, and Junqua such that the digital signal processor is further configured to receive the hidden Markov model sent by a server, and the hidden Markov model is trained by the server, in order to make it less expensive, difficult, and time-consuming to train the model, as suggested by Weber (Col. 1 lines 22-25), predictably improving recognition accuracy, as suggested by Weber (Col. 1 lines 57-59). The references cited are analogous art in the same field of audio processing.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias, whose telephone number is 571/270-5135. The examiner can normally be reached M-F 8:00 AM - 4:30 PM. The examiner's fax number is 571/270-6135. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at 571/272-7516.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Jesse S Pullias/
Primary Examiner, Art Unit 2655
03/31/26
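As a closing illustration of the recognition math recited in claims 5 and 6: the forward algorithm computes the likelihood P(O|λ) of an observation sequence under an HMM, and the claimed decision multiplies that likelihood by a word prior and picks the maximum posterior. A minimal discrete-HMM sketch; the model matrices and the per-word decomposition are illustrative assumptions, not taken from Wang or Paul:

```python
# Minimal discrete-HMM forward algorithm computing P(O | lambda), plus the
# posterior-style decision claim 6 recites: pick the word maximizing
# likelihood x prior. Model matrices and names are illustrative assumptions.
import numpy as np

def forward_likelihood(obs, pi, A, B):
    """obs: symbol indices; pi: (N,) initial; A: (N, N) transition; B: (N, M) emission."""
    alpha = pi * B[:, obs[0]]          # initialize with the first symbol
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate one step, absorb next symbol
    return alpha.sum()                 # P(O | lambda), summed over end states

def recognize(obs, word_models, priors):
    """word_models: {word: (pi, A, B)}; returns the maximum-posterior word."""
    scores = {w: forward_likelihood(obs, *m) * priors[w] for w, m in word_models.items()}
    return max(scores, key=scores.get)
```

Claim 5's threshold test corresponds to comparing the forward likelihood against a confidence value; claim 6's posterior decision corresponds to the likelihood-times-prior maximization above.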

Prosecution Timeline

Apr 11, 2024
Application Filed
Oct 30, 2025
Non-Final Rejection — §103
Feb 03, 2026
Response Filed
Mar 31, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596885: Automatically Labeling Items using a Machine-Trained Language Model
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12573378: SPEECH TENDENCY CLASSIFICATION
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12572740: MULTI-LANGUAGE DOCUMENT FIELD EXTRACTION
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12566929: COMBINING DATA SELECTION AND REWARD FUNCTIONS FOR TUNING LARGE LANGUAGE MODELS USING REINFORCEMENT LEARNING
Granted Mar 03, 2026 (2y 5m to grant)
Patent 12536389: TRANSLATION SYSTEM
Granted Jan 27, 2026 (2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 83%
With Interview: 96% (+13.0%)
Median Time to Grant: 2y 8m
PTA Risk: Moderate
Based on 1052 resolved cases by this examiner. Grant probability is derived from the career allow rate.
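The projection arithmetic appears to reduce to the career allow rate plus the interview lift; a quick check, assuming that is the method (the page does not state it):

```python
# Assumed arithmetic behind the dashboard figures (method not stated on the page).
granted, resolved = 873, 1052
allow_rate = granted / resolved            # 0.8298... -> displayed as 83%
with_interview = allow_rate + 0.13         # 0.9598... -> displayed as 96%
print(f"{allow_rate:.0%}, {with_interview:.0%}")  # prints "83%, 96%"
```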
