Prosecution Insights
Last updated: April 19, 2026
Application No. 17/641,634

KEYWORD DETECTIONS BASED ON EVENTS GENERATED FROM AUDIO SIGNALS

Final Rejection (§103, §112)
Filed: Mar 09, 2022
Examiner: SERRAGUARD, SEAN ERIN
Art Unit: 2657
Tech Center: 2600 (Communications)
Assignee: Hewlett-Packard Development Company, L.P.
OA Round: 4 (Final)
Grant Probability: 69% (Favorable)
OA Rounds: 5-6
To Grant: 3y 2m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 69% (above average; 92 granted / 134 resolved; +6.7% vs TC avg)
Interview Lift: +33.6% for resolved cases with interview
Avg Prosecution: 3y 2m (43 currently pending)
Total Applications: 177 across all art units

Statute-Specific Performance

§101: 9.4% (-30.6% vs TC avg)
§103: 49.7% (+9.7% vs TC avg)
§102: 18.6% (-21.4% vs TC avg)
§112: 19.2% (-20.8% vs TC avg)
Tech Center averages are estimates, based on career data from 134 resolved cases.

Office Action

§103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Response to Amendments

Applicant’s amendment filed on 22 July 2025 has been entered. In view of the amendment to the claims, the amendment of claims 1-2, 4, 6, 9-10, 14, 18, and 20 has been acknowledged and entered. In view of the amendment to claims 18 and 20, the rejection of claims 18 and 20 under 35 U.S.C. §112(a) is maintained, as modified with consideration of the amendments. In view of the amendment to claims 1-2, 4, 6, 9-10, 14, 18, and 20, the rejection of claims 1-20 under 35 U.S.C. §103 is maintained, as modified in response to the amendments.

Response to Arguments

Applicant’s arguments regarding the prior art rejections under 35 U.S.C. §103, see pages 8-10 of the Response to the Non-Final Office Action dated 23 April 2025, which was entered on 22 July 2025 (hereinafter Response and Office Action, respectively), have been fully considered.

With respect to the rejection of claim 1, and mutatis mutandis claims 9 and 14, under 35 U.S.C. §103 in light of Georges (U.S. Pat. App. Pub. No. 2018/0293974, hereinafter Georges) in view of Jin (U.S. Pat. App. Pub. No. 2016/0260429, hereinafter Jin), applicant asserts that (1) the combination of Georges and Jin does not disclose or suggest at least “at least one event generator to generate a pattern of events by integrating a raw waveform of the audio signal”. However, this argument is not persuasive. As an initial point, applicant asserts that the “Office expressly characterizes Jin’s ‘peripheral model’ as corresponding to the claimed event generator.” (Response, pg. 9). However, the peripheral model is only understood as part of the event generator.
Jin discloses a peripheral model and a plurality of feature detectors, which, in combination, correspond to the event generator in the instant application. The correlation between the components is better elucidated by comparing FIG. 2 of Jin to FIG. 3 of the instant application, where the peripheral model is understood as equivalent to the integrator 302, and the plurality of feature detectors is understood as equivalent to the comparator 304. The rejection below is modified to clarify this distinction in light of the amendments and arguments provided here.

Jin teaches the recited limitations. As explained in Jin regarding the peripheral model, the “peripheral model” provides the “frequency components, which are conveyed as simulated AN firing rates to the feature detectors” where “the frequency components may be obtained using a spectrogram method by taking the short-time Fourier transform”. (Jin, ¶ [0037]). It is noted that, in the context of Jin, the STFT functions by breaking the raw audio signal into small time segments and then applying an integration to each segment. Examples of this alternative embodiment are further provided at para. [0060] and FIGS. 8 and 10, where Jin explains that “the discrete Fourier transform of the sound waveform” as performed for the STFT “was computed in 9-ms Hamming windows spaced by 1 ms.” (Jin, ¶ [0037]). Though considered unnecessary for the rejection, both the peripheral model and the plurality of feature detectors are disclosed as independently performing an integration, where the raw waveform is integrated at the peripheral model. As such, Jin discloses “at least one event generator to generate a pattern of events by integrating a raw waveform of the audio signal.” Therefore, the rejection is maintained over the arguments provided above.
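As a minimal sketch of the windowed-integration reading of the STFT discussed above (9-ms Hamming windows spaced by 1 ms, per Jin's example), assuming a 16 kHz sampling rate; the function, signal, and parameter values here are illustrative, not Jin's actual implementation:

```python
import numpy as np

def stft_frames(signal, fs=16000, win_ms=9, hop_ms=1):
    """Short-time Fourier transform viewed as windowed integration:
    each frame is a DFT (a weighted sum, i.e., a discrete integration)
    over a short Hamming-windowed segment of the raw waveform."""
    win_len = int(fs * win_ms / 1000)   # 9 ms window -> 144 samples at 16 kHz
    hop = int(fs * hop_ms / 1000)       # 1 ms spacing between windows
    window = np.hamming(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        segment = signal[start:start + win_len] * window
        # Magnitude spectrum of each windowed segment: the "frequency
        # components" that would feed downstream feature detectors.
        frames.append(np.abs(np.fft.rfft(segment)))
    return np.array(frames)  # shape: (num_frames, num_frequency_bins)

# Example: 100 ms of a 440 Hz tone at 16 kHz
t = np.arange(0, 0.1, 1 / 16000)
tone = np.sin(2 * np.pi * 440 * t)
spec = stft_frames(tone)
```

Each row of `spec` is one 1-ms step; the heavy overlap between adjacent 9-ms windows is what makes the transform a running integration over the raw waveform rather than a single global one.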
With respect to the rejection of claims 18 and 20 under 35 U.S.C. §112(a), applicant has amended claims 18 and 20 and asserts that support for the amended claims can be found at paragraph [0041] of the as-filed specification. Applicant specifically cites the sentence “Thus, the threshold value in the comparator 304 may be set to generate a number of events that minimizes the processing resources to analyze the data (e.g., reduces power consumption), maximizes the accuracy and confidence of the keyword detector 206 to detect the keyword, and provides privacy such that the audio signal cannot be reconstructed from the events that are generated.” (Response, pg. 7). It is asserted that this sentence supports the limitation “setting a threshold value for the event based keyword detector to generate the plurality of events based on a desired level of privacy before the audio signal is received” in amended claim 18. Similarly, it is asserted that this sentence supports the limitation “further comprising instructions to set the threshold value for the plurality of event generators of the event based keyword detector based on … a desired amount of privacy” in amended claim 20.

This argument is not persuasive. With respect to claims 18 and 20, the above paragraph does not support the amended limitations. Though it is agreed that privacy is recited, a “desired level of privacy” is not taught as the basis for setting the threshold value. Privacy itself appears to be an intrinsic result of the system, based on detection of a number of keywords which is less than all words expressed in the utterance. In essence, the fewer words that are detected by the system, the more “privacy” the system provides. However, this is not tantamount to a desired level or amount of privacy. In the sentence itself, “the threshold value in the comparator 304 may be set to generate a number of events.” (Instant Application, para. [0041]).
The number of events simultaneously produces three (3) results: it “minimizes the processing resources to analyze the data (e.g., reduces power consumption), maximizes the accuracy and confidence of the keyword detector 206 to detect the keyword, and provides privacy such that the audio signal cannot be reconstructed from the events that are generated.” This is a result which occurs naturally upon the removal of any words from a statement, not a selection or provision of a desire. Though privacy is arguably a generally desirable trait, a desired level of privacy is not disclosed in the specification in any context. As such, the rejection of claims 18 and 20 is maintained, as modified in light of the amendments.

Applicant further argues that the rejections of dependent claims 2-8, 10-13, and 15-20 should be withdrawn for at least the same reasons as independent claims 1, 9, and 14. Applicant’s arguments in light of the amended claims are not persuasive for the same reasons as described with reference to claims 1, 9, and 14. As such, the rejections of claims 2-8, 10-13, and 15-20 under 35 U.S.C. §103 are maintained in the rejection below, as modified in response to the amendments. The Applicant has not provided any further statement and therefore, the Examiner directs the Applicant to the below rationale.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 18 and 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Regarding claims 18 and 20, the limitation of a privacy threshold is not supported by the specification as filed. Claim 18 recites “setting a threshold value for the event based keyword detector… based on a desired level of privacy,” at lines 2-3. Claim 20 recites “set the threshold value for the plurality of event generators of the event based keyword detector based on… a desired amount of privacy,” at line 8. However, the specification as filed does not provide adequate support for a desired level of privacy. Applicant asserts in the response that the amendments do not constitute new matter, and indicates paragraph [0041] as providing support for the amendments. As such, the Office looks to the specification generally for support.
As indicated in the response, paragraph [0041] recites “Thus, the threshold value in the comparator 304 may be set to generate a number of events that minimizes the processing resources to analyze the data (e.g., reduces power consumption), maximizes the accuracy and confidence of the keyword detector 206 to detect the keyword, and provides privacy such that the audio signal cannot be reconstructed from the events that are generated.” (Instant Application, ¶ [0041]). However, this is not a desired level of privacy. The privacy described is a result of “the number of events” being “reduced,” which is a result of “the threshold value” being “increased”. Privacy results from this based on a missing number of events (i.e., missing words). The resulting level of privacy is not “desired” and the resulting threshold value is not shaped by any asserted intent for privacy. In fact, it is unclear what a desired level of privacy would be in this context. Applicant does not describe or provide examples for any such “levels” or “amounts” of privacy, such that privacy can be quantified or such that any levels or amounts of privacy can be established as a desired privacy. As such, the “desired level of privacy” and “desired amount of privacy” as recited in claims 18 and 20, respectively, constitute new matter and the claims are rejected. Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7, 9-12, 14-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Georges in view of Jin, with further support from Schafer.

Regarding claim 1, Georges discloses A device, comprising (The systems and methods described with reference to the “spoken language understanding system”; Georges, ¶ [0030]): a microphone to receive an audio signal (The system receives “audio input signals 130 provided by an audio capture device such as, for example, a microphone or array of microphones.”; Georges, ¶ [0016]); a keyword detector to detect a keyword… (The system detects “a user spoken key-phrase included in an initial segment of an audio signal”; Georges, ¶ [0034]); and a digital signal processor in communication with the keyword detector (the key-phrase detection {the keyword detector} triggers {in communication with} “an automatic speech recognition (ASR) processor is triggered, in response to the key-phrase detection.
In some embodiments, the triggering of the ASR processor comprises waking the ASR processor from a relatively lower power consumption idle state, to a relatively higher power consumption recognition state.”; Georges, ¶ [0035]) wherein the digital signal processor is activated in response to detection of the keyword to analyze subsequent audio streams (“an automatic speech recognition (ASR) processor is triggered {activated}, in response to the key-phrase detection” where “the ASR processor recognizes speech based on a combination of both the buffered initial segment and on one or more additional received segments of the audio signal which include further speech from the user.”; Georges, ¶ [0036]).

However, Georges fails to expressly recite an event generator to generate a pattern of events by integrating a raw waveform of the audio signal, an amount of data associated with the pattern of events being smaller than an amount of data associated with the audio signal itself, [and] a keyword detector to detect a keyword based on the pattern of events.

Jin teaches systems and methods for noise robust speech recognition. (Jin, ¶ [0005]). Regarding claim 1, Jin teaches a microphone to receive an audio signal (“In use, a user may speak commands” where the “spoken words or commands are sensed by an audio pickup such as a microphone to produce an audio signal.”; Jin, ¶ [0082]); at least one event generator to generate a pattern of events (Discloses “The input audio signal is processed as disclosed above to produce a spike sequence” based on performing a transformation of the “speech waveform {a waveform of the audio signal},” depicted in FIG. 2, using a “peripheral model” which, in combination with the feature detector, is the at least one event generator, where “the peripheral model” processes the waveform to generate “frequency components, which are conveyed as simulated AN firing rates to the feature detectors {generate a pattern of events directly from...}”; Jin, ¶ [0035], [0037], [0082], FIG. 2) by integrating a raw waveform of the audio signal (the “frequency components” which “are conveyed as simulated AN firing rates to the feature detectors... may be obtained using a spectrogram method by taking the short-time Fourier transform” where the STFT is an integration of the raw waveform of the audio signal.; Jin, ¶ [0035], [0037], FIG. 2), the amount of data associated with the pattern of events being smaller than the amount of data associated with the audio signal itself (Though not expressly recited as a comparison of data amounts, the spike code provided in Jin is a sparse sequence of discrete feature detection events which can and, in certain embodiments, does disregard the precise time intervals between spikes, and is represented simply as an ordered list of detector labels. An ordered list of detector labels is necessarily a smaller amount of data than an amount of data associated with the audio signal from which those spike signals are derived. This reduction in size is supported by Schafer, which is the accompanying dissertation for this patent application, which clarifies that “In our system, both the input speech and the stored templates are stored as sequences of discrete neuron identifiers, which in practice take the form of two-byte unsigned integers.
This represents a 10x to 100x reduction in memory requirements,” as compared to the original waveforms “with similar savings in CPU time.”; Jin, ¶ [0005], [0039], [0050]-[0051], [0057]), a keyword detector to detect a keyword based on the pattern of events (“this spike sequence is compared to one or more spike sequences generated from training data” where the systems and methods described here can be used “for automated speech recognition of a fixed set of vocal commands”; Jin, ¶ [0082]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include an event generator to generate a pattern of events by integrating a raw waveform of the audio signal, an amount of data associated with the pattern of events being smaller than an amount of data associated with the audio signal itself, [and] a keyword detector to detect a keyword based on the pattern of events. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]).

Regarding claim 2, the rejection of claim 1 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite further comprising: a raster plot generator to generate the raster plot of the events. The relevance of Jin is described above with relation to claim 1. Regarding claim 2, Jin teaches further comprising: a raster plot generator to generate a raster plot of the events (As shown in FIG. 7, the spike sequence can be used to generate “[r]aster plots of feature detector spikes {...of the plurality of events}”; Jin, ¶ [0082], [0017], FIG. 7).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include further comprising: a raster plot generator to generate the raster plot of the events. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]).

Regarding claim 3, the rejection of claim 1 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite wherein the keyword detector comprises a neural network. The relevance of Jin is described above with relation to claim 1. Regarding claim 3, Jin teaches wherein the keyword detector comprises a neural network (“The input audio signal is processed” using “artificial neurons trained to selectively respond to specific speech features” where “artificial neurons” is a neural network.; Jin, ¶ [0082], [0030]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the keyword detector comprises a neural network.
The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]).

Regarding claim 4, the rejection of claim 1 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite wherein the event generator comprises: an integrator to integrate the waveform of the audio signal; a comparator to compare values of an integrated audio signal to a threshold, wherein an event is generated for each value of the integrated audio signal that exceeds the threshold; and a reset timer to pause the integrator for a predefined time after the event is generated. The relevance of Jin is described above with relation to claim 1. Regarding claim 4, Jin teaches wherein the event generator comprises: an integrator to integrate the raw waveform of the audio signal (The peripheral model which performs the transformation of the “speech waveform,” depicted in FIG. 2, conveys “simulated AN firing rates to the feature detectors,” where the frequency components “may be obtained using a spectrogram method by taking the short-time Fourier transform.” As explained above and supported by the explanations in Stankovic, a short-time Fourier transform is well-known in the relevant art as the integral {...an integrator} of multiple overlapping windows of an audio signal {...to integrate the raw waveform of the audio signal}, having width T with respect to time t.; Jin, ¶ [0035], [0037], [0082], FIG. 2); a comparator to compare values of an integrated audio signal to a threshold (“feature detector is modeled as an artificial neuron that takes a weighted sum σ(t)=w·s(t) of its inputs and spikes {integrated audio signal} at peaks in σ(t) that exceed a... threshold”; Jin, ¶ [0039]) wherein an event is generated for each value of the integrated audio signal that exceeds the threshold (Spikes are determined based on exceeding the threshold; Jin, ¶ [0039]); and a reset timer to pause the integrator for a predefined time after the event is generated (identifying spikes of the auditory signal comprises conditioning the auditory signal, which can include “Start/end pause priors: Eliminate any peaks that are within 250 ms of the beginning or end of the file.”; Jin, ¶ [0066], [0073]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the event generator comprises: an integrator to integrate the waveform of the audio signal; a comparator to compare values of an integrated audio signal to a threshold, wherein an event is generated for each value of the integrated audio signal that exceeds the threshold; and a reset timer to pause the integrator for a predefined time after the event is generated. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]).

Regarding claim 5, the rejection of claim 4 is incorporated. Georges discloses all of the elements of the current invention as stated above.
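The claim 4 mapping above (an integrator, a comparator against a threshold, and a reset timer that pauses integration after an event) can be sketched as a toy event generator. Everything below is a hypothetical illustration under assumed parameter values; it is not the claimed design or the prior art's actual implementation:

```python
import numpy as np

def generate_events(signal, weights, threshold, reset_samples):
    """Toy event generator: an integrator (running weighted sum over a
    sliding window), a comparator against a threshold, and a reset timer
    that pauses integration for a fixed number of samples after each event.
    Illustrative only; parameter values are assumptions."""
    events = []
    n = len(weights)
    i = 0
    while i + n <= len(signal):
        integrated = float(np.dot(weights, signal[i:i + n]))  # integrator
        if integrated > threshold:                            # comparator
            events.append(i)       # record an event at this sample index
            i += reset_samples     # reset timer: skip ahead, pausing integration
        else:
            i += 1
    return events

# A burst in the middle of a quiet signal should yield a sparse event list:
sig = np.zeros(1000)
sig[400:420] = 1.0
evts = generate_events(sig, weights=np.ones(16), threshold=4.0, reset_samples=50)
```

The reset step is what keeps a single sustained burst from producing a run of back-to-back events, which is the sparsity the event-based representation relies on.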
However, Georges fails to expressly recite wherein the threshold comprises a positive threshold and a negative threshold. The relevance of Jin is described above with relation to claim 1. Regarding claim 5, Jin teaches wherein the threshold comprises a positive threshold and a negative threshold (The threshold, though described in the context of positive values, is understood to encompass negative values as well as positive values. The conversion of values between positive and negative is mathematically trivial for a person having ordinary skill in the art (e.g., multiplying all values by -1).; Jin, ¶ [0039]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the threshold comprises a positive threshold and a negative threshold. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]).

Regarding claim 6, the rejection of claim 2 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite wherein the event generator comprises a plurality of event generators each to integrate the raw waveform of the audio signal, wherein the raster plot generated by the raster plot generator includes each event generated by each one of the plurality of event generators. The relevance of Jin is described above with relation to claim 1.
Regarding claim 6, Jin teaches wherein the at least one event generator is a plurality of event generators (Discloses a plurality of feature detectors, where the combination of the peripheral model and a feature detector is the event generator. A plurality of feature detectors, each in combination with the peripheral model, is a plurality of event generators.; Jin, ¶ [0037], [0045], [0057]) each to integrate the raw waveform of the audio signal (the “frequency components” which “are conveyed as simulated AN firing rates to the feature detectors... may be obtained using a spectrogram method by taking the short-time Fourier transform” where the STFT is an integration of the raw waveform of the audio signal.; Jin, ¶ [0035], [0037], FIG. 2) wherein the raster plot generated by the raster plot generator includes each event generated by each one of the plurality of event generators (As shown in FIG. 7, the “231 feature detectors... are graphed {by each one of the plurality of event generators},” where 231 feature detectors generated a spike {includes each event generated…}.; Jin, ¶ [0017], [0057], FIG. 7).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the event generator comprises a plurality of event generators, wherein the raster plot generated by the raster plot generator includes each event generated by each one of the plurality of event generators. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]).
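The data-reduction rationale relied on for claim 1 (spike events stored as two-byte detector labels versus the raw waveform) can be sized with a rough back-of-envelope sketch. Apart from the two-byte label size quoted from Schafer, every rate and count below is an assumed illustrative value, not a figure from Georges, Jin, or the instant application:

```python
# Back-of-envelope sizing: a one-second utterance stored as raw 16-bit PCM
# audio vs. as an ordered list of two-byte detector labels (spike events).
SAMPLE_RATE_HZ = 16_000      # assumed audio sampling rate
BYTES_PER_SAMPLE = 2         # 16-bit PCM
DURATION_S = 1.0             # one-second utterance
EVENTS_PER_SECOND = 231      # assumed spike density (one per feature detector)
BYTES_PER_EVENT = 2          # two-byte unsigned integer label, per Schafer

waveform_bytes = int(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * DURATION_S)  # 32000
event_bytes = int(EVENTS_PER_SECOND * BYTES_PER_EVENT * DURATION_S)   # 462
reduction = waveform_bytes / event_bytes

print(f"waveform: {waveform_bytes} B, events: {event_bytes} B, "
      f"reduction: {reduction:.0f}x")
```

Under these assumptions the reduction (roughly 69x) falls inside the 10x to 100x range Schafer reports; the actual ratio depends on spike density and the audio format.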
Regarding claim 7, the rejection of claim 6 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite wherein the plurality of event generators are set with different thresholds. The relevance of Jin is described above with relation to claim 1. Regarding claim 7, Jin teaches wherein the plurality of event generators are set with different thresholds (“In addition, the threshold levels are also determined separately for each neuron during training to maximize discriminative ability,”; Jin, ¶ [0039]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the plurality of event generators are set with different thresholds. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 9, Georges discloses A method, comprising (The systems and methods described with reference to the “spoken language understanding system”; Georges, ¶ [0030]): : receiving an audio signal (The system receives “audio input signals 130 provided by an audio capture device such as, for example, a microphone or array of microphones.”; Georges, ¶ [0016]) ...detect a pattern... 
that is associated with a keyword (The system detects “a user spoken key-phrase included in an initial segment of an audio signal”; Georges, ¶ [0034]); and activating a digital signal processor in response to the keyword being detected to analyze subsequent audio streams (“an automatic speech recognition (ASR) processor is triggered, in response to the key-phrase detection” where “the ASR processor recognizes speech based on a combination of both the buffered initial segment and on one or more additional received segments of the audio signal which include further speech from the user.”; Georges, ¶ [0035]-[0036]). However, Georges fails to expressly recite generating, by an event based keyword detector, a plurality of events by integrating a raw waveform of the audio signal, an amount of data associated with the pattern of events being smaller than an amount of data associated with the audio signal itself; generating, by the event based keyword detector, a raster plot of the plurality of events; analyzing, by the event based keyword detector, the raster plot to detect a pattern in the plurality of events that is associated with a keyword. The relevance of Jin is described above with relation to claim 1. Regarding claim 9, Jin teaches comprising: receiving an audio signal (“In use, a user may speak commands” where the “spoken words or commands are sensed by an audio pickup such as a microphone to produce an audio signal.”; Jin, ¶ [0082]); generating, by an event based keyword detector, a plurality of events (Discloses “The input audio signal is processed as disclosed above to produce a spike sequence” based on performing a transformation of the “speech waveform,” depicted in FIG. 2, using a “peripheral model” which, in combination with the feature detector, is the at least one event generator, where “the peripheral model” processes the waveform to generate “frequency components, which are conveyed as simulated AN firing rates to the feature detectors {generating... 
a plurality of events...}”; Jin, ¶ [0035], [0037], [0082], FIG. 2) by integrating a raw waveform of the audio signal (the “frequency components” which “are conveyed as simulated AN firing rates to the feature detectors... may be obtained using a spectrogram method by taking the short-time Fourier transform” where the STFT is an integration of the raw waveform of the audio signal.; Jin, ¶ [0035], [0037], FIG. 2), the amount of data associated with the plurality of events being smaller than the amount of data associated with the audio signal itself (Though not expressly recited as a comparison of data amounts, the spike code provided in Jin is a sparse sequence of discrete feature detection events which can and, in certain embodiments, does disregard the precise time intervals between spikes, and is represented simply as an ordered list of detector labels. An ordered list of detector labels is necessarily a smaller amount of data than an amount of data associated with the audio signal from which those spike signals are derived. This reduction in size is supported by Schafer, which is the accompanying dissertation for this patent application, which clarifies that “In our system, both the input speech and the stored templates are stored as sequences of discrete neuron identifiers, which in practice take the form of two-byte unsigned integers. This represents a 10x to 100x reduction in memory requirements,” as compared to the original waveforms “with similar savings in CPU time.”; Jin, ¶ [0005], [0039], [0050]-[0051], [0057]), generating, by the event based keyword detector, a raster plot of the plurality of events (As shown in FIG. 7, the spike sequence can be used to generate a “Raster plots of feature detector spikes {...of the plurality of events}”; Jin, ¶ [0082], [0017], FIG. 
7); analyzing, by the event based keyword detector, the raster plot to detect a pattern in the plurality of events that is associated with a keyword (“this spike sequence is compared to one or more spike sequences generated from training data” where the systems and methods described here can be used “for automated speech recognition of a fixed set of vocal commands”; Jin, ¶ [0082]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include generating, by an event based keyword detector, a plurality of events by integrating a raw waveform of the audio signal, an amount of data associated with the pattern of events being smaller than an amount of data associated with the audio signal itself; generating, by the event based keyword detector, a raster plot of the plurality of events; analyzing, by the event based keyword detector, the raster plot to detect a pattern in the plurality of events that is associated with a keyword. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 10, the rejection of claim 9 is incorporated. Georges discloses all of the elements of the current invention as stated above. 
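As an aside on the raster-plot limitation mapped for claim 9 above, a raster plot of events can be sketched as one row per event generator with a mark at each time step where that generator fired. The rendering below is an illustrative assumption for explanatory purposes only, not a reproduction of Jin's FIG. 7:

```python
def text_raster(spikes, n_steps):
    """Render a text raster plot: one row per event generator,
    with '|' marking each time step at which that generator
    produced an event. Illustrative only; Jin's FIG. 7 is a
    graphical raster plot of feature detector spikes."""
    rows = []
    for detector, times in sorted(spikes.items()):
        row = ["."] * n_steps
        for t in times:
            row[t] = "|"  # mark an event at time step t
        rows.append(detector + ": " + "".join(row))
    return "\n".join(rows)
```

For example, two generators firing at steps {1, 3} and {2} over five steps render as two aligned rows, making the temporal pattern of events directly comparable across generators.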
However, Georges fails to expressly recite wherein the generating the plurality of events, comprises: integrating the audio signal over time; comparing a value of an integrated audio signal at a particular time to a threshold; and generating an event when the value of the integrated audio signal exceeds the threshold. The relevance of Jin is described above with relation to claim 1. Regarding claim 10, Jin teaches wherein the generating the plurality of events, comprises: integrating the raw waveform of the audio signal over time (The peripheral model which performs the transformation of the “speech waveform,” depicted in FIG. 2, conveys “simulated AN firing rates to the feature detectors,” where the frequency components “may be obtained using a spectrogram method by taking the short-time Fourier transform.” As explained above and supported by the explanations in Stankovic, a short-time Fourier transform is well-known in the relevant art as the integral {...an integrator} of multiple overlapping windows of an audio signal {...to integrate the raw waveform of the audio signal}, each window of width T with respect to time t {over time}.; Jin, ¶ [0035], [0037], [0082], FIG. 2); comparing a value of an integrated audio signal at a particular time to a threshold (using the integrated audio signals produced by the STFT, the “feature detector is modeled as an artificial neuron that takes a weighted sum σ(t)=w·s(t) of its inputs and spikes at peaks in σ(t) that exceed a fixed threshold”; Jin, ¶ [0039]); and generating an event when the value of the integrated audio signal exceeds the threshold (Spikes are determined based on exceeding the threshold; Jin, ¶ [0039]).
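Purely for illustration, the integrate, compare, and generate sequence mapped above can be sketched in a few lines of Python. The function name, window length, and threshold value are assumptions for the sketch and are not drawn from Georges or Jin:

```python
def generate_events(waveform, window, threshold):
    """Sketch of an integrate-and-threshold event generator:
    integrate (sum) the rectified waveform over successive windows
    of time and generate an event whenever the integrated value
    exceeds the threshold."""
    events = []
    for start in range(0, len(waveform) - window + 1, window):
        # Integrate the raw waveform over one window of time.
        integrated = sum(abs(x) for x in waveform[start:start + window])
        # Compare the integrated value at this time to the threshold.
        if integrated > threshold:
            events.append(start)  # generate an event at this time
    return events
```

With silence everywhere except a short burst of energy, only the window covering the burst yields an event, so the event sequence is far sparser than the waveform itself.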
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein the generating the plurality of events, comprises: integrating the audio signal over time; comparing a value of an integrated audio signal at a particular time to a threshold; and generating an event when the value of the integrated audio signal exceeds the threshold. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 11, the rejection of claim 10 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite further comprising: pausing for a predefined period of time after the event is generated before continuing to integrate the audio signal over time. The relevance of Jin is described above with relation to claim 1. Regarding claim 11, Jin teaches further comprising: pausing for a predefined period of time after the event is generated before continuing to integrate the audio signal over time (identifying spikes of the auditory signal comprises conditioning the auditory signal, which can include “Start/end pause priors: Eliminate any peaks that are within 250 ms of the beginning or end of the file.”; Jin, ¶ [0066], [0073]). 
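The claimed pause can likewise be illustrated as a refractory period on such a generator. The sketch below, including the sample-count pause parameter, is an assumption for illustration and not a disclosure of either reference (it merely stands in for Jin's 250 ms figure):

```python
def generate_events_with_pause(samples, threshold, pause):
    """Running integrator that, after generating an event, pauses
    for `pause` samples before continuing to integrate the signal
    over time."""
    events = []
    total = 0.0
    skip_until = -1
    for i, s in enumerate(samples):
        if i < skip_until:
            continue  # refractory pause: integration is suspended
        total += abs(s)  # integrate the signal over time
        if total > threshold:
            events.append(i)        # generate an event
            total = 0.0             # reset the integrator
            skip_until = i + pause  # pause before integrating again
    return events
```

On a constant-amplitude input, events that would otherwise occur back-to-back are spaced out by the pause, which is the behavior the claim 11 limitation describes.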
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include further comprising: pausing for a predefined period of time after the event is generated before continuing to integrate the audio signal over time. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 12, the rejection of claim 9 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite further comprising: tuning the event based keyword detector to detect the keyword with an amount of accuracy and an amount of confidence above a desired threshold before the audio signal is received. The relevance of Jin is described above with relation to claim 1. Regarding claim 12, Jin teaches further comprising: tuning the event based keyword detector to detect the keyword with an amount of accuracy and an amount of confidence above a desired threshold before the audio signal is received (“In designing the encoding scheme, it was hypothesized that selective tuning of the feature detectors could yield spike codes that are robust to additive acoustic noise,” where robustness to noise refers to both “the hit rate and false hit rate {an amount of accuracy and an amount of confidence}” and where any threshold may be a desired threshold.; Jin, ¶ [0048]). 
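The tuning step mapped above can be sketched as a threshold sweep that maximizes hit rate minus false-hit rate. The detector scores and candidate thresholds below are illustrative assumptions, not data from Jin:

```python
def tune_threshold(keyword_scores, background_scores, candidates):
    """Sweep candidate thresholds and keep the one that maximizes
    hit rate minus false-hit rate, i.e. the threshold at which the
    detector's accuracy/confidence trade-off is best. All score
    values supplied to this sketch are illustrative."""
    best_t, best_margin = None, float("-inf")
    for t in candidates:
        hit_rate = sum(s > t for s in keyword_scores) / len(keyword_scores)
        false_rate = sum(s > t for s in background_scores) / len(background_scores)
        if hit_rate - false_rate > best_margin:
            best_t, best_margin = t, hit_rate - false_rate
    return best_t
```

Such tuning happens on training data, i.e. before any live audio signal is received, which is how the claim 12 limitation reads on Jin's pre-trained feature detectors.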
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include further comprising: tuning the event based keyword detector to detect the keyword with an amount of accuracy and an amount of confidence above a desired threshold before the audio signal is received. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 14, Georges discloses A non-transitory computer readable storage medium encoded with instructions executable by a processor (The systems and methods described with reference to the “spoken language understanding system” wherein “the methodology depicted can be implemented as a computer program product including one or more non-transitory machine readable mediums that when executed by one or more processors cause the methodology to be carried out. 
“; Georges, ¶ [0030], [0033]): instructions to set a threshold value for a plurality of event generators of an event based keyword detector (The system sets threshold values for a plurality of feature detectors “such that the neuron spikes in response to the preferred feature but not in response to the background,” where each of the plurality of feature detectors, alongside the peripheral model, form a plurality of event generators.; Jin, ¶ [0037]-[0039]), the non-transitory computer-readable storage medium comprising: instructions to receive an audio signal (The system receives “audio input signals 130 provided by an audio capture device such as, for example, a microphone or array of microphones.”; Georges, ¶ [0016]); instructions to detect a keyword … based on a pattern… (The system detects “a user spoken key-phrase included in an initial segment of an audio signal”; Georges, ¶ [0034]); and instructions to activate a digital signal processor after the keyword is detected by the event based keyword detector to analyze subsequent audio streams (“an automatic speech recognition (ASR) processor is triggered, in response to the key-phrase detection” where “the ASR processor recognizes speech based on a combination of both the buffered initial segment and on one or more additional received segments of the audio signal which include further speech from the user.”; Georges, ¶ [0035]-[0036]). 
However, Georges fails to expressly recite instructions to set a threshold value for a plurality of event generators of an event based keyword detector, instructions to detect a keyword by integrating a raw waveform of the audio signal by the event based keyword detector based on a pattern of events generated by each one of the plurality of event generators and recorded in a raster plot, wherein an event is generated when an integrated audio signal value exceeds the threshold and wherein an amount of data associated with the pattern of events being smaller than an amount of data associated with the audio signal itself. The relevance of Jin is described above with relation to claim 1. Regarding claim 14, Jin teaches instructions to receive an audio signal (“In use, a user may speak commands” where the “spoken words or commands are sensed by an audio pickup such as a microphone to produce an audio signal.”; Jin, ¶ [0082]); instructions to detect a keyword by integrating a raw waveform of the audio signal by the event based keyword detector (Discloses “The input audio signal is processed as disclosed above to produce a spike sequence” based on performing a transformation of the “speech waveform {a waveform of the audio signal},” depicted in FIG. 2, using a “peripheral model” which, in combination with the feature detector, is the at least one event generator, and is used to detect the keyword, where “the peripheral model” processes the waveform to generate “frequency components, which are conveyed as simulated AN firing rates to the feature detectors {generate a pattern of events}” and where the “frequency components” which “are conveyed as simulated AN firing rates to the feature detectors... may be obtained using a spectrogram method by taking the short-time Fourier transform” where the STFT is an integration of the raw waveform of the audio signal.; Jin, ¶ [0035], [0037], [0082], FIG. 
2) based on a pattern of events generated by each one of the plurality of event generators and recorded in a raster plot, (As shown in FIG. 7, the spike sequence can be used to generate a “Raster plots of feature detector spikes {...of the plurality of events}” where “this spike sequence is compared to one or more spike sequences generated from training data” where the systems and methods described here can be used “for automated speech recognition of a fixed set of vocal commands”; Jin, ¶ [0082], [0017], FIG. 7) wherein an event is generated when an integrated audio signal value exceeds the threshold (“temporal integration is implemented by including several time-delayed copies of the AN response as the input to each feature detector” and “feature detector is modeled as an artificial neuron that takes a weighted sum σ(t)=w·s(t) of its inputs and spikes at peaks in σ(t) that exceed a fixed threshold”; Jin, ¶ [0038]-[0039]) and wherein the amount of data associated with the pattern of events being smaller than the amount of data associated with the audio signal itself (Though not expressly recited as a comparison of data amounts, the spike code provided in Jin is a sparse sequence of discrete feature detection events which can and, in certain embodiments, does disregard the precise time intervals between spikes, and is represented simply as an ordered list of detector labels. An ordered list of detector labels is necessarily a smaller amount of data than an amount of data associated with the audio signal from which those spike signals are derived. This reduction in size is supported by Schafer, which is the accompanying dissertation for this patent application, which clarifies that “In our system, both the input speech and the stored templates are stored as sequences of discrete neuron identifiers, which in practice take the form of two-byte unsigned integers. 
This represents a 10x to 100x reduction in memory requirements,” as compared to the original waveforms “with similar savings in CPU time.”; Jin, ¶ [0005], [0039], [0050]-[0051], [0057]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include instructions to detect a keyword directly from a waveform of the audio signal by the event based keyword detector based on a pattern of events generated by each one of the plurality of event generators and recorded in a raster plot, wherein an event is generated when an integrated audio signal value exceeds the threshold. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 15, the rejection of claim 14 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to expressly recite wherein the amount of data associated with the audio signal itself is ten times the amount of data associated with the pattern of events. The relevance of Jin is described above with relation to claim 1. 
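The data-reduction reasoning above can be checked with back-of-the-envelope arithmetic. The sample rate, bit depth, and spike count below are assumed for illustration only; only the two-byte detector label size comes from the quoted Schafer passage:

```python
# Assumed figures for illustration only: one second of 16-bit PCM
# audio sampled at 16 kHz, versus a spike code of 500 two-byte
# detector labels (Schafer's "two-byte unsigned integers").
raw_bytes = 16000 * 2    # 32,000 bytes of raw waveform
spike_bytes = 500 * 2    # 1,000 bytes of detector labels
reduction = raw_bytes / spike_bytes
print(reduction)  # 32.0, within the cited 10x to 100x range
```

Even with generous spike counts, the ordered list of detector labels remains one to two orders of magnitude smaller than the waveform it encodes, consistent with the examiner's inherency position.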
Regarding claim 15, Jin teaches wherein increasing the threshold value decreases an amount of events generated by an event generator of the plurality of event generators and decreasing the threshold value increases the amount of events generated by the event generator (“Spikes (in which the peaks are circled) are assigned at peaks in the response that surpass a fixed threshold (horizontal gray lines),” where it is understood that increasing the threshold value will not increase the height of the spike. As such, the number of spikes detected {events generated} at the higher threshold, when established as a fixed threshold, will decrease. Conversely, decreasing the threshold value, for at least the fixed threshold embodiment, will increase the number of spikes detected {events generated}.; Jin, ¶ [0012]). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the keyword spotting and speech recognition systems of Georges to incorporate the teachings of Jin to include wherein increasing the threshold value decreases an amount of events generated by an event generator of the plurality of event generators and decreasing the threshold value increases the amount of events generated by the event generator. The spike-based detection described by Jin “outperforms a state-of-the-art robust speech recognizer at low signal-to-noise levels,” and “the spike-based encoding scheme” offers “gains in noise robustness over traditional speech recognition methods,” where improved speech recognition provides the known benefit of improved performance and increase in result quality, as recognized by Jin. (Jin, ¶ [0005]). Regarding claim 16, the rejection of claim 1 is incorporated. Georges discloses all of the elements of the current invention as stated above. However, Georges fails to express

Prosecution Timeline

Mar 09, 2022
Application Filed
Apr 20, 2024
Non-Final Rejection — §103, §112
Jul 29, 2024
Response Filed
Nov 02, 2024
Final Rejection — §103, §112
Jan 09, 2025
Response after Non-Final Action
Feb 04, 2025
Request for Continued Examination
Feb 05, 2025
Response after Non-Final Action
Apr 18, 2025
Non-Final Rejection — §103, §112
Jul 22, 2025
Response Filed
Oct 20, 2025
Final Rejection — §103, §112
Dec 08, 2025
Interview Requested
Dec 16, 2025
Applicant Interview (Telephonic)
Dec 16, 2025
Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603095
Stereo Audio Signal Delay Estimation Method and Apparatus
2y 5m to grant Granted Apr 14, 2026
Patent 12598250
SYSTEMS AND METHODS FOR COHERENT AND TIERED VOICE ENROLLMENT
2y 5m to grant Granted Apr 07, 2026
Patent 12597429
PACKET LOSS CONCEALMENT
2y 5m to grant Granted Apr 07, 2026
Patent 12512093
Sensor-Processing Systems Including Neuromorphic Processing Modules and Methods Thereof
2y 5m to grant Granted Dec 30, 2025
Patent 12505835
HOME APPLIANCE AND SERVER
2y 5m to grant Granted Dec 23, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

5-6
Expected OA Rounds
69%
Grant Probability
99%
With Interview (+33.6%)
3y 2m
Median Time to Grant
High
PTA Risk
Based on 134 resolved cases by this examiner. Grant probability derived from career allow rate.
