Prosecution Insights
Last updated: May 29, 2026
Application No. 17/641,634

KEYWORD DETECTIONS BASED ON EVENTS GENERATED FROM AUDIO SIGNALS

Final Rejection §103 §112
Filed: Mar 09, 2022
Priority: Oct 17, 2019 (nonprovisional of PCTUS2019056638)
Examiner: SERRAGUARD, SEAN ERIN
Art Unit: 2657
Tech Center: 2600 (Communications)
Assignee: Hewlett-Packard Development Company, L.P.
OA Round: 4 (Final)
Grant Probability: 70% (Favorable)
OA Rounds: 5-6
Est. Remaining: 0m
With Interview: 99%

Examiner Intelligence

Grants 70% (above average)
Career Allowance Rate: 70% (99 granted / 142 resolved; +7.7% vs TC avg)
Interview Lift: +33.0% (strong; resolved cases with interview vs. without)
Typical timeline
Avg Prosecution: 3y 0m
Currently pending: 23
Career history
Total Applications: 180 (across all art units)

Statute-Specific Performance

§101: 0.5% (-39.5% vs TC avg)
§103: 95.0% (+55.0% vs TC avg)
§102: 1.4% (-38.6% vs TC avg)
§112: 2.9% (-37.1% vs TC avg)
Based on career data from 142 resolved cases

Office Action

§103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Response to Amendments

Applicant’s amendment filed on 22 July 2025 has been entered. In view of the amendment to the claim(s), the amendment of claim(s) 1-2, 4, 6, 9-10, 14, 18, and 20 has been acknowledged and entered. In view of the amendment to claim(s) 18 and 20, the rejection of claim(s) 18 and 20 under 35 U.S.C. §112(a) is maintained, as modified with consideration of the amendments. In view of the amendment to claim(s) 1-2, 4, 6, 9-10, 14, 18, and 20, the rejection of claims 1-20 under 35 U.S.C. §103 is maintained as modified in response to the amendments.

Response to Arguments

Applicant’s arguments regarding the prior art rejections under 35 U.S.C. §103, see pages 8-10 of the Response to Non-Final Office Action dated 23 April 2025, which was entered on 22 July 2025 (hereinafter Response and Office Action, respectively), have been fully considered. With respect to the rejection(s) of claim(s) 1, and mutatis mutandis claim(s) 9 and 14, under 35 U.S.C. §103 in light of Georges (U.S. Pat. App. Pub. No. 2018/0293974, hereinafter Georges) in view of Jin (U.S. Pat. App. Pub. No. 2016/0260429, hereinafter Jin), applicant asserts that (1) the combination of Georges and Jin does not disclose or suggest at least “at least one event generator to generate a pattern of events by integrating a raw waveform of the audio signal”. However, this argument is not persuasive. As an initial point, applicant asserts that “Office expressly characterizes Jin's “peripheral model” as corresponding to the claimed event generator.” (Response, pg. 9). However, the peripheral model is only understood as part of the event generator.
Jin discloses a peripheral model and a plurality of feature detectors, which, in combination, correspond to the event generator in the instant application. The correlation between the components is better elucidated in comparing FIG. 2 of Jin to FIG. 3 of the instant application, where the peripheral model is understood as equivalent to the integrator 302, and the plurality of feature detectors is understood as equivalent to the comparator of thresholds 304. The rejection below is modified to clarify this distinction in light of the amendments and arguments provided here. Jin teaches the recited limitations. As explained in Jin regarding the peripheral model, the “peripheral model” provides the “frequency components, which are conveyed as simulated AN firing rates to the feature detectors” where “the frequency components may be obtained using a spectrogram method by taking the short-time Fourier transform”. (Jin, [0037]) It is noted that, in the context of Jin, the STFT functions by breaking the raw audio signal into small time segments and then applying an integration to each segment. Examples of this alternative embodiment are further provided at para. [0060] and FIGS. 8 and 10, where Jin explains that “the discrete Fourier transform of the sound waveform” as performed for the STFT “was computed in 9-ms Hamming windows spaced by 1 ms.” (Jin, [0037]). Though considered unnecessary for the rejection, both the peripheral model and the plurality of feature detectors disclose independently performing an integration, where the raw waveform is integrated at the peripheral model. As such, Jin discloses “at least one event generator to generate a pattern of events by integrating a raw waveform of the audio signal.” Therefore, the rejection is maintained over the arguments provided above. 
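The Office's reading of the short-time Fourier transform as an integration of the raw waveform can be illustrated with a minimal sketch. Each STFT coefficient below is a windowed sum over one short segment, the discrete counterpart of an integral. The 9-ms Hamming window and 1-ms spacing come from the Jin passage quoted above; the 16 kHz sample rate, the function names, and the choice of bins are assumptions made only for illustration and do not represent Jin's implementation.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (standard 0.54 / 0.46 coefficients)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def stft_bin(signal, start, window, k):
    """One STFT coefficient: a windowed sum (discrete integral) over one segment."""
    n = len(window)
    return sum(
        signal[start + i] * window[i] * cmath.exp(-2j * math.pi * k * i / n)
        for i in range(n)
    )


def stft_magnitudes(signal, sample_rate, win_ms=9.0, hop_ms=1.0, bins=(0, 1, 2, 3)):
    """Magnitudes of selected frequency bins for each windowed segment."""
    win_len = int(round(sample_rate * win_ms / 1000))  # 144 samples at 16 kHz (assumed rate)
    hop = int(round(sample_rate * hop_ms / 1000))      # 16 samples at 16 kHz (assumed rate)
    window = hamming(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frames.append([abs(stft_bin(signal, start, window, k)) for k in bins])
    return frames


if __name__ == "__main__":
    sr = 16_000  # assumed sample rate, not stated in the Office Action
    tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(320)]  # 20 ms, 1 kHz
    frames = stft_magnitudes(tone, sr, bins=(9, 30))  # bin 9 is 1 kHz for a 144-sample window
    print(len(frames), frames[0])
```

In this sketch the raw samples enter the sum directly, which is the sense in which the transform operates on the raw waveform; whether that satisfies the claim language is the legal question the Office Action addresses, not something the code settles.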
With respect to the rejection of claims 18 and 20 under 35 U.S.C. §112(a), applicant has amended claims 18 and 20 and asserts that support for the amended claims can be found at paragraph [0041] of the as-filed specification. Applicant specifically cites the sentence “Thus, the threshold value in the comparator 304 may be set to generate a number of events that minimizes the processing resources to analyze the data (e.g., reduces power consumption), maximizes the accuracy and confidence of the keyword detector 206 to detect the keyword, and provides privacy such that the audio signal cannot be reconstructed from the events that are generated.” (Response, pg. 7). It is asserted that this sentence supports the limitation “setting a threshold value for the event based keyword detector to generate the plurality of events based on a desired level of privacy before the audio signal is received” in amended claim 18. Likewise, it is asserted that this sentence supports the limitation “further comprising instructions to set the threshold value for the plurality of event generators of the event based keyword detector based on … a desired amount of privacy” in amended claim 20. This argument is not persuasive. With respect to claims 18 and 20, the above paragraph does not support the amended limitations. Though it is agreed that privacy is recited, a “desired level of privacy” is not taught as the basis of setting the threshold value. Privacy itself appears to be an intrinsic result of the system, based on detection of a number of keywords which is less than all words expressed in the utterance. In essence, the fewer words which are detected by the system, the more “privacy” the system provides. However, this is not tantamount to a desired level or amount of privacy. In the sentence itself, “the threshold value in the comparator 304 may be set to generate a number of events.” (Instant Application, para. [0041]).
The number of events results in three (3) features simultaneously “minimizes the processing resources to analyze the data (e.g., reduces power consumption), maximizes the accuracy and confidence of the keyword detector 206 to detect the keyword, and provides privacy such that the audio signal cannot be reconstructed from the events that are generated.” This is a result which occurs naturally on the removal of any words from a statement, not a selection or provision of a desire. Though privacy is arguably a generally desirable trait, a desired level of privacy is not disclosed in the specification in any context. As such, the rejection of claims 18 and 20 is maintained as modified in light of the amendments. Applicant further argues that the rejection(s) of dependent claims 2-8, 10-13, and 15-20 should be withdrawn for at least the same reasons as independent claims 1, 9, and 14. Applicant’s arguments in light of the amended claims are not persuasive for the same reasons as described with reference to claims 1, 9, and 14. As such, the rejections of claims 2-8, 10-13, and 15-20 under 35 U.S.C. §103 are maintained in the rejection below, as modified in response to amendments. The Applicant has not provided any further statement and therefore, the Examiner directs the Applicant to the below rationale. Claim Rejections - 35 USC § 112 The following is a quotation of the first paragraph of 35 U.S.C. 112(a): (a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention. The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 
112: The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention. Claims 18 and 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Regarding claims 18 and 20, the limitation of a privacy threshold is not supported by the specification as filed. Claim 18 recites “setting a threshold value for the event based keyword detector… based on a desired level of privacy,” at line(s) 2-3. Claim 20 recites “set the threshold value for the plurality of event generators of the event based keyword detector based on… a desired amount of privacy,” at line 8. However, the specification as filed does not provide adequate support for a desired level of privacy. Applicant asserts in the response that the amendments do not constitute new matter, and indicates paragraph [0041] as providing support for the amendments. As such, the Office looks to the specification generally for support. 
As indicated in the response, paragraph [0041] recites “Thus, the threshold value in the comparator 304 may be set to generate a number of events that minimizes the processing resources to analyze the data (e.g., reduces power consumption), maximizes the accuracy and confidence of the keyword detector 206 to detect the keyword, and provides privacy such that the audio signal cannot be reconstructed from the events that are generated.” (Instant Application, ¶ [0041]). However, this is not a desired level of privacy. The privacy described is a result of “the number of events” being “reduced,” which is a result of “the threshold value” being “increased”. Privacy results from this based on a missing number of events (i.e., missing words). The resulting level of privacy is not “desired” and the resulting threshold value is not shaped by any asserted intent for privacy. In fact, it is unclear what a desired level of privacy would be in this context. Applicant does not describe or provide examples for any such “levels” or “amounts” of privacy, such that privacy can be quantified or such that any levels or amounts of privacy can be established as a desired privacy. As such, the “desired levels of privacy” and “desired amounts of privacy” as recited in claims 18 and 20, respectively, constitute new matter and the claims are rejected. Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1-7, 9-12, 14-17, and 19-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Georges in view of Jin with further support from Schafer. Regarding claim 1, Georges discloses A device, comprising (The systems and methods described with reference to the “spoken language understanding system”; Georges, ¶ [0030]): a microphone to receive an audio signal (The system receives “audio input signals 130 provided by an audio capture device such as, for example, a microphone or array of microphones.”; Georges, ¶ [0016]); a keyword detector to detect a keyword… (The system detects “a user spoken key-phrase included in an initial segment of an audio signal”; Georges, ¶ [0034]); and a digital signal processor in communication with the keyword detector, (the key-phrase detection {the keyword detector} triggers {in communication with} “an automatic speech recognition (ASR) processor is triggered, in response to the key-phrase detection. 
In some embodiments, the triggering of the ASR processor comprises waking the ASR processor from a relatively lower power consumption idle state, to a relatively higher power consumption recognition state.”; Georges, ¶ [0035]) wherein the digital signal processor is activated in response to detection of the keyword to analyze subsequent audio streams (“an automatic speech recognition (ASR) processor is triggered {activated}, in response to the key-phrase detection” where “the ASR processor recognizes speech based on a combination of both the buffered initial segment and on one or more additional received segments of the audio signal which include further speech from the user.”; Georges, ¶ [0036]). However, Georges fails to expressly recite an event generator to generate a pattern of events by integrating a raw waveform of the audio signal, an amount of data associated with the pattern of events being smaller than an amount of data associated with the audio signal itself, [and] a keyword detector to detect a keyword based on the pattern of events. Jin teaches systems and methods for noise robust speech recognition. (Jin, ¶ [0005]). Regarding claim 1, Jin teaches a microphone to receive an audio signal (“In use, a user may speak commands” where the “spoken words or commands are sensed by an audio pickup such as a microphone to produce an audio signal.”; Jin, ¶ [0082]); at least one event generator to generate a pattern of events (Discloses “The input audio signal is processed as disclosed above to produce a spike sequence” based on performing a transformation of the “speech waveform {a waveform of the audio signal},” depicted in FIG. 
2, using a “peripheral model” which, in combination with the feature detector, is the at least one event generator, where “the peripheral model” processes the waveform to generate “frequency components, which are conveyed as simulated AN firing rates to the feature detectors {generate a pattern of events directly from...}”; Jin, ¶ [0035], [0037], [0082], FIG. 2) by integrating a raw waveform of the audio signal (the “frequency components” which “are conveyed as simulated AN firing rates to the feature detectors... may be obtained using a spectrogram method by taking the short-time Fourier transform” where the STFT is an integration of the raw waveform of the audio signal.; Jin, ¶ [0035], [0037], FIG. 2), the amount of data associated with the pattern of events being smaller than the amount of data associated with the audio signal itself (Though not expressly recited as a comparison of data amounts, the spike code provided in Jin is a sparse sequence of discrete feature detection events which can and, in certain embodiments, does disregard the precise time intervals between spikes, and is represented simply as an ordered list of detector labels. An ordered list of detector labels is necessarily a smaller amount of data than an amount of data associated with the audio signal from which those spike signals are derived. This reduction in size is supported by Schafer, which is the accompanying dissertation for this patent application, which clarifies that “In our system, both the input speech and the stored templates are stored as sequences of discrete neuron identifiers, which in practice take the form of two-byte unsigned integers. 
This represents a 10x to 100x reduction in memory requirements,” as compared to the original waveforms “with similar savings in CPU time.”; Jin, ¶ [0005], [0039], [0050]-[0051], [0057]), a keyword detector to detect a keyword based on the pattern of events (“this spike sequence is compared to one or more spike sequences generated from training data” where the systems and methods described here can be used “for automated speech recognition of a fixed set of vocal commands”; Jin, ¶ [0082]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include an event generator to generate a pattern of events by integrating a raw waveform of the audio signal, an amount of data associated with the pattern of events being smaller than an amount of data associated with the audio signal itself, [and] a keyword detector to detect a keyword based on the pattern of events. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 2, the rejection of claim 1 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite further comprising: a raster plot generator to generate the raster plot of the events. The relevance of Jin is described above with relation to claim 1. Regarding claim 2, Jin teaches further comprising: a raster plot generator to generate a raster plot of the events (As shown in FIG. 
7, the spike sequence can be used to generate “[r]aster plots of feature detector spikes {...of the plurality of events}”; Jin, ¶ [0082], [0017], FIG. 7). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include further comprising: a raster plot generator to generate the raster plot of the events. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 3, the rejection of claim 1 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite wherein the keyword detector comprises a neural network. The relevance of Jin is described above with relation to claim 1. Regarding claim 3, Jin teaches wherein the keyword detector comprises a neural network (“The input audio signal is processed” using “artificial neurons trained to selectively respond to specific speech features” where “artificial neurons” is a neural network.; Jin, ¶ [0082], [0030]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the keyword detector comprises a neural network. 
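The data-size comparison relied on for claim 1 above (an ordered list of two-byte detector labels versus the source waveform) can be made concrete with simple arithmetic. The following is a rough sketch only: the two-byte label size is taken from the quoted Schafer passage, while the 16 kHz, 16-bit mono audio format and the spike count are hypothetical values chosen for illustration and do not appear in the cited references.

```python
def waveform_bytes(duration_s, sample_rate_hz=16_000, bytes_per_sample=2):
    """Bytes needed for raw PCM audio (assumed 16 kHz, 16-bit mono)."""
    return int(duration_s * sample_rate_hz) * bytes_per_sample


def spike_code_bytes(num_spikes, bytes_per_label=2):
    """Bytes needed for an ordered list of two-byte detector labels."""
    return num_spikes * bytes_per_label


def reduction_factor(duration_s, num_spikes):
    """Ratio of waveform size to spike-code size."""
    return waveform_bytes(duration_s) / spike_code_bytes(num_spikes)


if __name__ == "__main__":
    # Hypothetical utterance: 1 second of audio producing 500 spikes.
    print(waveform_bytes(1.0))         # 32000
    print(spike_code_bytes(500))       # 1000
    print(reduction_factor(1.0, 500))  # 32.0
```

Under these assumed values the ratio is 32, which happens to fall inside the 10x to 100x range quoted from Schafer; a different sample rate or spike density would give a different ratio.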
The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 4, the rejection of claim 1 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite wherein the event generator comprises: an integrator to integrate the waveform of the audio signal; a comparator to compare values of an integrated audio signal to a threshold, wherein an event is generated for each value of the integrated audio signal that exceeds the threshold; and a reset timer to pause the integrator for a predefined time after the event is generated. The relevance of Jin is described above with relation to claim 1. Regarding claim 4, Jin teaches wherein the event generator comprises: an integrator to integrate the raw waveform of the audio signal (The peripheral model which performs the transformation of the “speech waveform,” depicted in FIG. 2, conveys “simulated AN firing rates to the feature detectors,” where the frequency components “may be obtained using a spectrogram method by taking the short-time Fourier transform.” As explained above and supported by the explanations in Stankovic, a short-time Fourier transform is well-known in the relevant art as the integral {...an integrator} of multiple overlapping windows of an audio signal {...to integrate the raw waveform of the audio signal}, having of width T with respect to time t.; Jin, ¶ [0035], [0037], [0082], FIG. 
2); a comparator to compare values of an integrated audio signal to a threshold, (“feature detector is modeled as an artificial neuron that takes a weighted sum σ(t)=w·s(t) of its inputs and spikes {integrated audio signal} at peaks in σ(t) that exceed a... threshold”; Jin, ¶ [0039]) wherein an event is generated for each value of the integrated audio signal that exceeds the threshold (Spikes are determined based on exceeding the threshold; Jin, ¶ [0039]); and a reset timer to pause the integrator for a predefined time after the event is generated (identifying spikes of the auditory signal comprises conditioning the auditory signal, which can include “Start/end pause priors: Eliminate any peaks that are within 250 ms of the beginning or end of the file.”; Jin, ¶ [0066], [0073]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the event generator comprises: an integrator to integrate the waveform of the audio signal; a comparator to compare values of an integrated audio signal to a threshold, wherein an event is generated for each value of the integrated audio signal that exceeds the threshold; and a reset timer to pause the integrator for a predefined time after the event is generated. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 5, the rejection of claim 4 is incorporated. Georges discloses all of the elements of the current invention as stated above. 
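The three elements recited in claim 4 (integrator, comparator, reset timer) can be sketched as a simple integrate-and-fire loop. This is an illustrative reading of the claim language only, not the applicant's circuit or Jin's feature detector. The function name, the sample-based pause, and the choice to clear the accumulator after each event are assumptions.

```python
def generate_events(samples, threshold, pause_samples):
    """Return the sample indices at which events fire.

    integrator:  running sum of the raw samples
    comparator:  fires when the running sum exceeds `threshold`
    reset timer: integrator ignores input for `pause_samples` samples after an event
    """
    events = []
    accumulator = 0.0
    paused_until = 0  # integrator is paused for indices below this value
    for i, x in enumerate(samples):
        if i < paused_until:
            continue  # reset timer active: input is not integrated
        accumulator += x  # integrator
        if accumulator > threshold:  # comparator
            events.append(i)
            accumulator = 0.0  # assumption: integrator restarts from zero
            paused_until = i + 1 + pause_samples
    return events


if __name__ == "__main__":
    signal = [1.0] * 20
    print(generate_events(signal, threshold=2.5, pause_samples=2))  # [2, 7, 12, 17]
    print(generate_events(signal, threshold=4.5, pause_samples=2))  # [4, 11, 18]
```

In this toy input, raising the threshold from 2.5 to 4.5 reduces the event count from four to three, which mirrors the relationship between threshold value and number of events discussed with respect to paragraph [0041] of the instant application.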
However, Georges fails to expressly recite wherein the threshold comprises a positive threshold and a negative threshold. The relevance of Jin is described above with relation to claim 1. Regarding claim 5, Jin teaches wherein the threshold comprises a positive threshold and a negative threshold (The threshold, though described in the context of positive values, is understood to encompass negative values as well as positive values. The conversion of values between positive and negative is mathematically trivial for a person having ordinary skill in the art (e.g., multiplying all values by -1).; Jin, ¶ [0039]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the threshold comprises a positive threshold and a negative threshold. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 6, the rejection of claim 2 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite wherein the event generator comprises a plurality of event generators each to integrate the raw waveform of the audio signal, wherein the raster plot generated by the raster plot generator includes each event generated by each one of the plurality of event generators. The relevance of Jin is described above with relation to claim 1. 
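For reference, one possible reading of a threshold that comprises both a positive and a negative threshold is sketched below: a signed running sum emits a +1 event when it rises above the positive threshold and a -1 event when it falls below the negative threshold. The function name, the event encoding, and the reset-to-zero behavior are hypothetical; neither the instant application nor Jin is quoted here as describing this exact scheme.

```python
def generate_signed_events(samples, pos_threshold, neg_threshold):
    """Return (index, polarity) pairs from a signed integrator with two thresholds."""
    if pos_threshold <= 0 or neg_threshold >= 0:
        raise ValueError("expected pos_threshold > 0 and neg_threshold < 0")
    events = []
    accumulator = 0.0
    for i, x in enumerate(samples):
        accumulator += x  # signed integration of the raw samples
        if accumulator > pos_threshold:
            events.append((i, +1))
            accumulator = 0.0  # assumption: restart integration after an event
        elif accumulator < neg_threshold:
            events.append((i, -1))
            accumulator = 0.0
    return events


if __name__ == "__main__":
    signal = [1, 1, 1, -1, -1, -1, -1]
    print(generate_signed_events(signal, 2.5, -2.5))  # [(2, 1), (5, -1)]
```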
Regarding claim 6, Jin teaches wherein the at least one event generator is a plurality of event generators (Discloses a plurality of feature detectors, where the combination of the peripheral model and a feature detector is the event generator. A plurality of feature detectors, each in combination with the peripheral model, is a plurality of event generators.; Jin, ¶ [0037], [0045], [0057]) each to integrate the raw waveform of the audio signal, (the “frequency components” which “are conveyed as simulated AN firing rates to the feature detectors... may be obtained using a spectrogram method by taking the short-time Fourier transform” where the STFT is an integration of the raw waveform of the audio signal.; Jin, ¶ [0035], [0037], FIG. 2) wherein the raster plot generated by the raster plot generator includes each event generated by each one of the plurality of event generators (“As shown in FIG. 7, the “231 feature detectors... are graphed {by each one of the plurality of event generators},” where 231 feature detectors generated a spike {includes each event generated…}.; Jin, ¶ [0017], [0057], FIG. 7). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the event generator comprises a plurality of event generators, wherein the raster plot generated by the raster plot generator includes each event generated by each one of the plurality of event generators. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). 
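The raster plot limitation of claim 6 (one plot that includes each event from each of a plurality of event generators) can be pictured as a grid with one row per generator and one column per time step. The sketch below is a simplified text rendering under that assumption; the function names and the time-step representation are illustrative and are not drawn from FIG. 7 of Jin or from the instant application.

```python
def build_raster(event_times_by_generator, num_steps):
    """Rows are event generators, columns are time steps; 1 marks an event."""
    raster = []
    for times in event_times_by_generator:
        row = [0] * num_steps
        for t in times:
            if 0 <= t < num_steps:
                row[t] = 1
        raster.append(row)
    return raster


def render_raster(raster):
    """Render the raster as text: '|' for an event, '.' for no event."""
    return "\n".join("".join("|" if cell else "." for cell in row) for row in raster)


if __name__ == "__main__":
    # Hypothetical event times from two generators over five time steps.
    raster = build_raster([[0, 3], [2]], num_steps=5)
    print(render_raster(raster))
    # |..|.
    # ..|..
```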
Regarding claim 7, the rejection of claim 6 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite wherein the plurality of event generators are set with different thresholds. The relevance of Jin is described above with relation to claim 1. Regarding claim 7, Jin teaches wherein the plurality of event generators are set with different thresholds (“In addition, the threshold levels are also determined separately for each neuron during training to maximize discriminative ability,”; Jin, ¶ [0039]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the plurality of event generators are set with different thresholds. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 9, Georges discloses A method, comprising (The systems and methods described with reference to the “spoken language understanding system”; Georges, ¶ [0030]): receiving an audio signal (The system receives “audio input signals 130 provided by an audio capture device such as, for example, a microphone or array of microphones.”; Georges, ¶ [0016]) ...detect a pattern... 
that is associated with a keyword (The system detects “a user spoken key-phrase included in an initial segment of an audio signal”; Georges, ¶ [0034]); and activating a digital signal processor in response to the keyword being detected to analyze subsequent audio streams (“an automatic speech recognition (ASR) processor is triggered, in response to the key-phrase detection” where “the ASR processor recognizes speech based on a combination of both the buffered initial segment and on one or more additional received segments of the audio signal which include further speech from the user.”; Georges, ¶ [0035]-[0036]). However, Georges fails to expressly recite generating, by an event based keyword detector, a plurality of events by integrating a raw waveform of the audio signal, an amount of data associated with the pattern of events being smaller than an amount of data associated with the audio signal itself; generating, by the event based keyword detector, a raster plot of the plurality of events; analyzing, by the event based keyword detector, the raster plot to detect a pattern in the plurality of events that is associated with a keyword. The relevance of Jin is described above with relation to claim 1. Regarding claim 9, Jin teaches comprising: receiving an audio signal (“In use, a user may speak commands” where the “spoken words or commands are sensed by an audio pickup such as a microphone to produce an audio signal.”; Jin, ¶ [0082]); generating, by an event based keyword detector, a plurality of events (Discloses “The input audio signal is processed as disclosed above to produce a spike sequence” based on performing a transformation of the “speech waveform,” depicted in FIG. 2, using a “peripheral model” which, in combination with the feature detector, is the at least one event generator, where “the peripheral model” processes the waveform to generate “frequency components, which are conveyed as simulated AN firing rates to the feature detectors {generating... 
a plurality of events...}”; Jin, ¶ [0035], [0037], [0082], FIG. 2) by integrating a raw waveform of the audio signal (the “frequency components” which “are conveyed as simulated AN firing rates to the feature detectors... may be obtained using a spectrogram method by taking the short-time Fourier transform” where the STFT is an integration of the raw waveform of the audio signal.; Jin, ¶ [0035], [0037], FIG. 2), the amount of data associated with the plurality of events being smaller than the amount of data associated with the audio signal itself (Though not expressly recited as a comparison of data amounts, the spike code provided in Jin is a sparse sequence of discrete feature detection events which can and, in certain embodiments, does disregard the precise time intervals between spikes, and is represented simply as an ordered list of detector labels. An ordered list of detector labels is necessarily a smaller amount of data than an amount of data associated with the audio signal from which those spike signals are derived. This reduction in size is supported by Schafer, which is the accompanying dissertation for this patent application, which clarifies that “In our system, both the input speech and the stored templates are stored as sequences of discrete neuron identifiers, which in practice take the form of two-byte unsigned integers. This represents a 10x to 100x reduction in memory requirements,” as compared to the original waveforms “with similar savings in CPU time.”; Jin, ¶ [0005], [0039], [0050]-[0051], [0057]), generating, by the event based keyword detector, a raster plot of the plurality of events (As shown in FIG. 7, the spike sequence can be used to generate a “Raster plots of feature detector spikes {...of the plurality of events}”; Jin, ¶ [0082], [0017], FIG. 
7); analyzing, by the event based keyword detector, the raster plot to detect a pattern in the plurality of events that is associated with a keyword (“this spike sequence is compared to one or more spike sequences generated from training data” where the systems and methods described here can be used “for automated speech recognition of a fixed set of vocal commands”; Jin, ¶ [0082]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include generating, by an event based keyword detector, a plurality of events by integrating a raw waveform of the audio signal, an amount of data associated with the pattern of events being smaller than an amount of data associated with the audio signal itself; generating, by the event based keyword detector, a raster plot of the plurality of events; analyzing, by the event based keyword detector, the raster plot to detect a pattern in the plurality of events that is associated with a keyword. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 10, the rejection of claim 9 is incorporated. Georges discloses all of the elements of the current invention as stated above. 
However, Georges fails to expressly recite wherein the generating the plurality of events, comprises: integrating the audio signal over time; comparing a value of an integrated audio signal at a particular time to a threshold; and generating an event when the value of the integrated audio signal exceeds the threshold. The relevance of Jin is described above with relation to claim 1. Regarding claim 10, Jin teaches wherein the generating the plurality of events, comprises: integrating the raw waveform of the audio signal over time (The peripheral model which performs the transformation of the “speech waveform,” depicted in FIG. 2, conveys “simulated AN firing rates to the feature detectors,” where the frequency components “may be obtained using a spectrogram method by taking the short-time Fourier transform”. As explained above and supported by the explanations in Stankovic, a short-time Fourier transform is well-known in the relevant art as the integral {...an integrator} of multiple overlapping windows of an audio signal {...to integrate the raw waveform of the audio signal}, having a width T with respect to time t {over time}.; Jin, ¶ [0035], [0037], [0082], FIG. 2); comparing a value of an integrated audio signal at a particular time to a threshold (using the integrated audio signals produced by the STFT, the “feature detector is modeled as an artificial neuron that takes a weighted sum σ(t)=w·s(t) of its inputs and spikes at peaks in σ(t) that exceed a fixed threshold”; Jin, ¶ [0039]); and generating an event when the value of the integrated audio signal exceeds the threshold (Spikes are determined based on exceeding the threshold; Jin, ¶ [0039]). 
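The three steps recited for claim 10 (integrating over time, comparing the integrated value to a threshold, and generating an event when the threshold is exceeded) can be illustrated with a minimal sketch. It is offered for illustration only, under stated assumptions, and is not the implementation disclosed in Jin or in the application: the function name `generate_events`, the rectification of each sample before accumulation, and the reset of the running integral after each event are hypothetical choices made for the example.

```python
def generate_events(samples, threshold):
    """Return the sample indices at which an event is generated.

    Assumptions (not taken from the cited references):
      - each sample is rectified with abs() before being accumulated;
      - the running integral is reset to zero after every event.
    """
    events = []
    integral = 0.0
    for index, sample in enumerate(samples):
        integral += abs(sample)       # integrate the signal over time
        if integral > threshold:      # compare the integrated value to the threshold
            events.append(index)      # generate an event at this time
            integral = 0.0            # assumed reset after the event
    return events
```

Under these assumptions the event list is typically much shorter than the sample list, and it grows shorter as the threshold is raised.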
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the generating the plurality of events, comprises: integrating the audio signal over time; comparing a value of an integrated audio signal at a particular time to a threshold; and generating an event when the value of the integrated audio signal exceeds the threshold. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 11, the rejection of claim 10 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite further comprising: pausing for a predefined period of time after the event is generated before continuing to integrate the audio signal over time. The relevance of Jin is described above with relation to claim 1. Regarding claim 11, Jin teaches further comprising: pausing for a predefined period of time after the event is generated before continuing to integrate the audio signal over time (identifying spikes of the auditory signal comprises conditioning the auditory signal, which can include “Start/end pause priors: Eliminate any peaks that are within 250 ms of the beginning or end of the file.”; Jin, ¶ [0066], [0073]). 
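The step recited for claim 11, pausing for a predefined period after an event before integration continues, can be sketched in the same illustrative manner. The sketch follows the claim language as worded and does not model the “start/end pause priors” quoted from Jin; the function name `generate_events_with_pause`, the `pause_samples` parameter (the pause expressed as a count of samples), the rectification, and the reset after each event are all hypothetical assumptions.

```python
def generate_events_with_pause(samples, threshold, pause_samples):
    """Integrate-and-threshold event generation with a pause after each event.

    Assumptions (not taken from the cited references):
      - samples are rectified with abs() before accumulation;
      - the integral resets to zero after an event;
      - samples arriving during the pause are skipped, not integrated.
    """
    events = []
    integral = 0.0
    resume_at = 0  # first index at which integration is allowed again
    for index, sample in enumerate(samples):
        if index < resume_at:
            continue                              # still pausing
        integral += abs(sample)
        if integral > threshold:
            events.append(index)
            integral = 0.0
            resume_at = index + 1 + pause_samples  # start the predefined pause
    return events
```

With a pause of zero samples the sketch reduces to plain integrate-and-threshold behavior; a longer pause yields fewer events for the same input.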
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include further comprising: pausing for a predefined period of time after the event is generated before continuing to integrate the audio signal over time. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 12, the rejection of claim 9 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite further comprising: tuning the event based keyword detector to detect the keyword with an amount of accuracy and an amount of confidence above a desired threshold before the audio signal is received. The relevance of Jin is described above with relation to claim 1. Regarding claim 12, Jin teaches further comprising: tuning the event based keyword detector to detect the keyword with an amount of accuracy and an amount of confidence above a desired threshold before the audio signal is received (“In designing the encoding scheme, it was hypothesized that selective tuning of the feature detectors could yield spike codes that are robust to additive acoustic noise,” where robustness to noise refers to both “the hit rate and false hit rate {an amount of accuracy and an amount of confidence}” and where any threshold may be a desired threshold.; Jin, ¶ [0048]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include further comprising: tuning the event based keyword detector to detect the keyword with an amount of accuracy and an amount of confidence above a desired threshold before the audio signal is received. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 14, Georges discloses A non-transitory computer readable storage medium encoded with instructions executable by a processor (The systems and methods described with reference to the “spoken language understanding system” wherein “the methodology depicted can be implemented as a computer program product including one or more non-transitory machine readable mediums that when executed by one or more processors cause the methodology to be carried out. 
”; Georges, ¶ [0030], [0033]): instructions to set a threshold value for a plurality of event generators of an event based keyword detector (The system sets threshold values for a plurality of feature detectors “such that the neuron spikes in response to the preferred feature but not in response to the background,” where each of the plurality of feature detectors, alongside the peripheral model, form a plurality of event generators.; Jin, ¶ [0037]-[0039]), the non-transitory computer-readable storage medium comprising: instructions to receive an audio signal (The system receives “audio input signals 130 provided by an audio capture device such as, for example, a microphone or array of microphones.”; Georges, ¶ [0016]); instructions to detect a keyword … based on a pattern… (The system detects “a user spoken key-phrase included in an initial segment of an audio signal”; Georges, ¶ [0034]); and instructions to activate a digital signal processor after the keyword is detected by the event based keyword detector to analyze subsequent audio streams (“an automatic speech recognition (ASR) processor is triggered, in response to the key-phrase detection” where “the ASR processor recognizes speech based on a combination of both the buffered initial segment and on one or more additional received segments of the audio signal which include further speech from the user.”; Georges, ¶ [0035]-[0036]). 
However, Georges fails to expressly recite instructions to set a threshold value for a plurality of event generators of an event based keyword detector, instructions to detect a keyword by integrating a raw waveform of the audio signal by the event based keyword detector based on a pattern of events generated by each one of the plurality of event generators and recorded in a raster plot, wherein an event is generated when an integrated audio signal value exceeds the threshold and wherein an amount of data associated with the pattern of events being smaller than an amount of data associated with the audio signal itself. The relevance of Jin is described above with relation to claim 1. Regarding claim 14, Jin teaches instructions to receive an audio signal (“In use, a user may speak commands” where the “spoken words or commands are sensed by an audio pickup such as a microphone to produce an audio signal.”; Jin, ¶ [0082]); instructions to detect a keyword by integrating a raw waveform of the audio signal by the event based keyword detector (Discloses “The input audio signal is processed as disclosed above to produce a spike sequence” based on performing a transformation of the “speech waveform {a waveform of the audio signal},” depicted in FIG. 2, using a “peripheral model” which, in combination with the feature detector, is the at least one event generator, and is used to detect the keyword, where “the peripheral model” processes the waveform to generate “frequency components, which are conveyed as simulated AN firing rates to the feature detectors {generate a pattern of events}” and where the “frequency components” which “are conveyed as simulated AN firing rates to the feature detectors... may be obtained using a spectrogram method by taking the short-time Fourier transform” where the STFT is an integration of the raw waveform of the audio signal.; Jin, ¶ [0035], [0037], [0082], FIG. 
2) based on a pattern of events generated by each one of the plurality of event generators and recorded in a raster plot, (As shown in FIG. 7, the spike sequence can be used to generate a “Raster plots of feature detector spikes {...of the plurality of events}” where “this spike sequence is compared to one or more spike sequences generated from training data” where the systems and methods described here can be used “for automated speech recognition of a fixed set of vocal commands”; Jin, ¶ [0082], [0017], FIG. 7) wherein an event is generated when an integrated audio signal value exceeds the threshold (“temporal integration is implemented by including several time-delayed copies of the AN response as the input to each feature detector” and “feature detector is modeled as an artificial neuron that takes a weighted sum σ(t)=w·s(t) of its inputs and spikes at peaks in σ(t) that exceed a fixed threshold”; Jin, ¶ [0038]-[0039]) and wherein the amount of data associated with the pattern of events being smaller than the amount of data associated with the audio signal itself (Though not expressly recited as a comparison of data amounts, the spike code provided in Jin is a sparse sequence of discrete feature detection events which can and, in certain embodiments, does disregard the precise time intervals between spikes, and is represented simply as an ordered list of detector labels. An ordered list of detector labels is necessarily a smaller amount of data than an amount of data associated with the audio signal from which those spike signals are derived. This reduction in size is supported by Schafer, which is the accompanying dissertation for this patent application, which clarifies that “In our system, both the input speech and the stored templates are stored as sequences of discrete neuron identifiers, which in practice take the form of two-byte unsigned integers. 
This represents a 10x to 100x reduction in memory requirements,” as compared to the original waveforms “with similar savings in CPU time.”; Jin, ¶ [0005], [0039], [0050]-[0051], [0057]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include instructions to detect a keyword directly from a waveform of the audio signal by the event based keyword detector based on a pattern of events generated by each one of the plurality of event generators and recorded in a raster plot, wherein an event is generated when an integrated audio signal value exceeds the threshold. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 15, the rejection of claim 14 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite wherein increasing the threshold value decreases an amount of events generated by an event generator of the plurality of event generators and decreasing the threshold value increases the amount of events generated by the event generator. The relevance of Jin is described above with relation to claim 1. 
Regarding claim 15, Jin teaches wherein increasing the threshold value decreases an amount of events generated by an event generator of the plurality of event generators and decreasing the threshold value increases the amount of events generated by the event generator (“Spikes (in which the peaks are circled) are assigned at peaks in the response that surpass a fixed threshold (horizontal gray lines),” where it is understood that increasing the threshold value will not increase the height of the spike. As such, the number of spikes detected {events generated} at the higher threshold, when established as a fixed threshold, will decrease. Conversely, decreasing the threshold value, for at least the fixed threshold embodiment, will increase the number of spikes detected {events generated}.; Jin, ¶ [0012]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein increasing the threshold value decreases an amount of events generated by an event generator of the plurality of event generators and decreasing the threshold value increases the amount of events generated by the event generator. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 16, the rejection of claim 1 is incorporated. Georges discloses all of the elements of the current invention as stated above. 
However, Georges fails to expressly recite wherein the amount of data associated with the audio signal itself is ten times the amount of data associated with the pattern of events. The relevance of Jin is described above with relation to claim 1. Regarding claim 16, Jin teaches wherein the amount of data associated with the audio signal itself is ten times the amount of data associated with the pattern of events (Though not expressly recited as a comparison of data amounts, examiner takes official notice that this limitation is implicitly taught by Jin, as explained by Schafer. The spike code provided in Jin is a sparse sequence of discrete feature detection events which can and, in certain embodiments, does disregard the precise time intervals between spikes, and is represented simply as an ordered list of detector labels. An ordered list of detector labels is necessarily a smaller amount of data than an amount of data associated with the audio signal from which those spike signals are derived. As further explained in Schafer, which is the accompanying dissertation for Jin (See Schafer, pg. 119, para. 2), for the system described in Jin, “both the input speech and the stored templates are stored as sequences of discrete neuron identifiers, which in practice take the form of two-byte unsigned integers. This represents a 10x to 100x reduction in memory requirements,” as compared to the original waveforms “with similar savings in CPU time.” (See Schafer, pg. 123, para. 1); Jin, ¶ [0005], [0039], [0050]-[0051], [0057]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the amount of data associated with the audio signal itself is ten times the amount of data associated with the pattern of events. 
The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 17, the rejection of claim 9 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite wherein the amount of data associated with the audio signal itself is ten times the amount of data associated with the pattern of events. The relevance of Jin is described above with relation to claim 1. Regarding claim 17, Jin teaches wherein the amount of data associated with the audio signal itself is ten times the amount of data associated with the plurality of events. (Though not expressly recited as a comparison of data amounts, examiner takes official notice that this limitation is implicitly taught by Jin, as explained by Schafer. The spike code provided in Jin is a sparse sequence of discrete feature detection events which can and, in certain embodiments, does disregard the precise time intervals between spikes, and is represented simply as an ordered list of detector labels. An ordered list of detector labels is necessarily a smaller amount of data than an amount of data associated with the audio signal from which those spike signals are derived. As further explained in Schafer, which is the accompanying dissertation for Jin (See Schafer, pg. 119, para. 2), for the system described in Jin, “both the input speech and the stored templates are stored as sequences of discrete neuron identifiers, which in practice take the form of two-byte unsigned integers. 
This represents a 10x to 100x reduction in memory requirements,” as compared to the original waveforms “with similar savings in CPU time.” (See Schafer, pg. 123, para. 1); Jin, ¶ [0005], [0039], [0050]-[0051], [0057]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the amount of data associated with the audio signal itself is ten times the amount of data associated with the pattern of events. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 19, the rejection of claim 14 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite wherein the amount of data associated with the audio signal itself is ten times the amount of data associated with the pattern of events. The relevance of Jin is described above with relation to claim 1. Regarding claim 19, Jin teaches wherein the amount of data associated with the audio signal itself is ten times the amount of data associated with the pattern of events (Though not expressly recited as a comparison of data amounts, examiner takes official notice that this limitation is implicitly taught by Jin, as explained by Schafer. 
The spike code provided in Jin is a sparse sequence of discrete feature detection events which can and, in certain embodiments, does disregard the precise time intervals between spikes, and is represented simply as an ordered list of detector labels. An ordered list of detector labels is necessarily a smaller amount of data than an amount of data associated with the audio signal from which those spike signals are derived. As further explained in Schafer, which is the accompanying dissertation for Jin (See Schafer, pg. 119, para. 2), for the system described in Jin, “both the input speech and the stored templates are stored as sequences of discrete neuron identifiers, which in practice take the form of two-byte unsigned integers. This represents a 10x to 100x reduction in memory requirements,” as compared to the original waveforms “with similar savings in CPU time.” (See Schafer, pg. 123, para. 1); Jin, ¶ [0005], [0039], [0050]-[0051], [0057]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the amount of data associated with the audio signal itself is ten times the amount of data associated with the pattern of events. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 20, the rejection of claim 14 is incorporated. Georges discloses all of the elements of the current invention as stated above. 
However, Georges fails to expressly recite further comprising instructions to set the threshold value for the plurality of event generators of the event based keyword detector based on at least one of a desired amount of processing resources to analyze the data, a desired amount of accuracy and confidence, or a desired amount of privacy. The relevance of Jin is described above with relation to claim 1. Regarding claim 20, Jin teaches further comprising instructions to set the threshold value for the plurality of event generators of the event based keyword detector based on at least one of a desired amount of processing resources to analyze the data, a desired amount of accuracy and confidence, or a desired amount of privacy (Each feature detector is modeled as an artificial neuron that takes a weighted sum σ(t)=w·s(t) of its inputs and spikes at peaks in σ(t) that exceed a fixed threshold {below a desired processing threshold}; Jin, ¶ [0039]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include further comprising instructions to set the threshold value for the plurality of event generators of the event based keyword detector based on at least one of a desired amount of processing resources to analyze the data, a desired amount of accuracy and confidence, or a desired amount of privacy. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Claims 8 and 13 are rejected under 35 U.S.C. 
103 as being unpatentable over Georges and Jin as applied to claims 6 and 12 above, and further in view of Sayyadi-Harikandehei (U.S. Pat. App. Pub. No. 2020/0152205, hereinafter Sayyadi-Harikandehei). Regarding claim 8, the rejection of claim 6 is incorporated. Georges and Jin disclose all of the elements of the current invention as stated above. However, Georges fails to expressly recite further comprising a feedback loop, wherein the feedback loop is to adjust at least one of: a threshold value of the plurality of event generators or an enable setting of the plurality of event generators based on a confidence score from a confidence calculator and an accuracy score from the digital signal processor from a sampled version of the audio signal. The relevance of Jin is described above with relation to claim 1. Regarding claim 8, Jin teaches further comprising a feedback loop, wherein the feedback loop is to adjust at least one of: a threshold value of the plurality of event generators or an enable setting of the plurality of event generators (“Each feature detector is modeled as an artificial neuron that takes a weighted sum σ(t)=w·s(t) of its inputs and spikes at peaks in σ(t) that exceed a fixed threshold” where “the threshold levels are also determined separately for each neuron” and “the threshold could be made to vary based upon the variance or other activity level of the signal σ(t) over a past history of 1-60 seconds.”; Jin, ¶ [0039]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include further comprising a feedback loop, wherein the feedback loop is to adjust at least one of: a threshold value of the plurality of event generators or an enable setting of the plurality of event generators. 
The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). However, Georges and Jin fail to expressly recite based on a confidence score from a confidence calculator and an accuracy score from the digital signal processor from a sampled version of the audio signal. Sayyadi-Harikandehei teaches systems and methods for increasing scrutiny of an audio input. (Sayyadi-Harikandehei, ¶ [0003]). Regarding claim 8, Sayyadi-Harikandehei teaches based on a confidence score from a confidence calculator and an accuracy score from the digital signal processor from a sampled version of the audio signal (“The user device may assign a confidence score indicative of the accuracy of the detection of the wake word (e.g., did the user device detect an actual wake word/phrase, a different/similar word/phrase, background noise, etc.). The user device may compare the confidence score to the threshold (e.g., a wake word detection threshold, etc.) to determine whether to accept one or more words included with audio content as a wake word (or wake phrase) or not. The user device may modify the threshold based on whether the user device recognizes the audio content as originating from an authorized user.”; Sayyadi-Harikandehei, ¶ [0025]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges as modified by the spike based keyword detection of Jin to incorporate the teachings of Sayyadi-Harikandehei to include based on a confidence score from a confidence calculator and an accuracy score from the digital signal processor from a sampled version of the audio signal. Controlling the threshold for detection based on desired characteristics can improve accurate detection based on user or operator needs, which improves the user experience, as recognized by Sayyadi-Harikandehei. (Sayyadi-Harikandehei, ¶ [0027]-[0029]). Regarding claim 13, the rejection of claim 12 is incorporated. Georges and Jin disclose all of the elements of the current invention as stated above. Georges further discloses wherein the tuning, comprises: storing the audio signal in memory (“a received audio signal, which is stored in a buffer for subsequent use by the ASR processor.”; Georges, ¶ [0011]). However, Georges and Jin fail to expressly recite comparing detection of the keyword by the event based keyword detector to detection of the keyword by the digital signal processor in the audio signal stored in the memory; adjusting a threshold value of at least one of a plurality of event generators of the event based keyword detector or disabling at least one of the plurality of event generators of the event based keyword detector; and repeating the storing, the comparing, and the adjusting until the amount of accuracy and the amount of confidence is above the desired threshold. The relevance of Sayyadi-Harikandehei is described above with relation to claim 8. 
Regarding claim 13, Sayyadi-Harikandehei teaches comparing detection of the keyword by the event based keyword detector to detection of the keyword by the digital signal processor in the audio signal stored in the memory (“The user device may assign a confidence score indicative of the accuracy of the detection of the wake word (e.g., did the user device detect an actual wake word/phrase, a different/similar word/phrase, background noise, etc.).”; Sayyadi-Harikandehei, ¶ [0025]); adjusting a threshold value of at least one of a plurality of event generators of the event based keyword detector or disabling at least one of the plurality of event generators of the event based keyword detector (“The user device may compare the confidence score to the threshold (e.g., a wake word detection threshold, etc.) to determine whether to accept one or more words included with audio content as a wake word (or wake phrase) or not. The user device may modify the threshold based on whether the user device recognizes the audio content as originating from an authorized user.”; Sayyadi-Harikandehei, ¶ [0025]); and repeating the storing, the comparing, and the adjusting until the amount of accuracy and the amount of confidence is above the desired threshold (A continuation of failed detections or inadequate confidence levels are understood to result in further updating to the threshold until acceptable values are established.; Sayyadi-Harikandehei, ¶ [0025]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges as modified by the spike based keyword detection of Jin to incorporate the teachings of Sayyadi-Harikandehei to include comparing detection of the keyword by the event based keyword detector to detection of the keyword by the digital signal processor in the audio signal stored in the memory; adjusting a threshold value of at least one of a plurality of event generators of the event based keyword detector or disabling at least one of the plurality of event generators of the event based keyword detector; and repeating the storing, the comparing, and the adjusting until the amount of accuracy and the amount of confidence is above the desired threshold. Controlling the threshold for detection based on desired characteristics can improve accurate detection based on user or operator needs, which improves the user experience, as recognized by Sayyadi-Harikandehei. (Sayyadi-Harikandehei, ¶ [0027]-[0029]).
Claim 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Georges and Jin as applied to claim 9 above, and further in view of Angel (U.S. Pat. App. Pub. No. 2019/0066686, hereinafter Angel).
Regarding claim 18, the rejection of claim 9 is incorporated. Georges and Jin disclose all of the elements of the current invention as stated above. However, Georges and Jin fail to expressly recite further comprising: setting a threshold value for the event based keyword detector to generate the plurality of events based on a desired level of privacy before the audio signal is received. Angel teaches systems and methods for protecting sensitive information collected during verbal communications. (Angel, ¶ [0001]).
Regarding claim 18, Angel teaches further comprising: setting a threshold value for the event based keyword detector to generate the plurality of events based on a desired level of privacy before the audio signal is received (“if a confidence score associated with a protected piece is below a given threshold, a user is notified that further action and/or confirmation is required. In an embodiment, privacy preserving program 102 updates or adjusts the confidence score based on information received from a user in response to the notification.”; Angel, ¶ [0027]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges as modified by the spike based keyword detection of Jin to incorporate the teachings of Angel to include further comprising: setting a threshold value for the event based keyword detector to generate the plurality of events based on a desired level of privacy before the audio signal is received. The systems and methods of Angel “provide for an adaptable, policy driven sanitation of sensitive information” received by “voice application devices” which “recognize that multiple instances, as well as different levels of sensitive information” to “provide for a policy driven hierarchical approach to protecting different classes or types of sensitive information” which “provide[s] for the protection of contextually sensitive information, and not just particular keywords or phrases,” resulting in better overall protection of user privacy, as recognized by Angel. (Angel, ¶ [0013]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Khellah et al. (U.S. Pat. App. Pub. No. 20190115011) discloses detecting keywords in audio including an audio receiver to receive audio, a spike transducer to convert the audio into a plurality of spikes, and a spiking neural network to receive one or more of the spikes and generate a spike corresponding to a detected keyword.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627. The examiner can normally be reached 07:00-17:00 M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Sean E Serraguard/
Patent Examiner, Art Unit 2657

Prosecution Timeline

Feb 05, 2025
Response after Non-Final Action
Apr 23, 2025
Non-Final Rejection mailed — §103, §112
Jul 22, 2025
Response Filed
Oct 23, 2025
Final Rejection mailed — §103, §112
Dec 08, 2025
Interview Requested
Dec 16, 2025
Applicant Interview (Telephonic)
Dec 16, 2025
Examiner Interview Summary
Apr 27, 2026
Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12614545
CROSS-DEVICE DATA SYNCHRONIZATION BASED ON SIMULTANEOUS HOTWORD TRIGGERS
2y 9m to grant Granted Apr 28, 2026
Patent 12609109
Speech Recognition Method and Apparatus, and Computer-Readable Storage Medium
4y 2m to grant Granted Apr 21, 2026
Patent 12603095
Stereo Audio Signal Delay Estimation Method and Apparatus
3y 3m to grant Granted Apr 14, 2026
Patent 12598250
SYSTEMS AND METHODS FOR COHERENT AND TIERED VOICE ENROLLMENT
4y 2m to grant Granted Apr 07, 2026
Patent 12597429
PACKET LOSS CONCEALMENT
3y 3m to grant Granted Apr 07, 2026
Based on 5 most recent grants.

Prosecution Projections

5-6
Expected OA Rounds
70%
Grant Probability
99%
With Interview (+33.0%)
3y 0m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 142 resolved cases by this examiner. Grant probability derived from career allowance rate.