Prosecution Insights
Last updated: April 19, 2026
Application No. 17/729,238

LIVE SPEECH DETECTION

Non-Final OA (§101, §103)

Filed: Apr 26, 2022
Examiner: CHAVEZ, RODRIGO A
Art Unit: 2658
Tech Center: 2600 — Communications
Assignee: Cirrus Logic International Semiconductor Ltd.
OA Round: 5 (Non-Final)

Grant Probability: 50% (Moderate)
Projected OA Rounds: 5-6
Projected Time to Grant: 3y 5m
Grant Probability with Interview: 88%

Examiner Intelligence

Career Allow Rate: 50% (grants 115 of 228 resolved cases; -11.6% vs TC avg)
Interview Lift: +37.3% (strong) among resolved cases with interview
Avg Prosecution: 3y 5m (typical timeline); 22 applications currently pending
Total Applications: 250 (career history, across all art units)
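The headline figures above are simple ratios over the examiner's resolved cases. Below is a minimal Python sketch of how an allow rate and an interview lift are typically derived. The with/without-interview cohort rates are hypothetical placeholders chosen only to be consistent with the stated +37.3% lift; the report does not disclose the underlying split.

```python
def allow_rate(granted: int, resolved: int) -> float:
    """Share of resolved applications that ended in a grant."""
    return granted / resolved

# Aggregate figures stated in the report.
career = allow_rate(115, 228)
print(f"Career allow rate: {career:.1%}")  # 50.4%

def interview_lift(rate_with: float, rate_without: float) -> float:
    """Percentage-point gap in allow rate between interviewed and non-interviewed cases."""
    return rate_with - rate_without

# Hypothetical cohort rates, consistent with the stated +37.3% lift.
print(f"Interview lift: {interview_lift(0.800, 0.427):+.1%}")  # +37.3%
```

Note that the lift is a percentage-point difference between the two cohorts, not a relative increase.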

Statute-Specific Performance

§101: 16.4% (-23.6% vs TC avg)
§103: 53.1% (+13.1% vs TC avg)
§102: 20.9% (-19.1% vs TC avg)
§112: 5.6% (-34.4% vs TC avg)
Deltas are measured against Tech Center average estimates • Based on career data from 228 resolved cases
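As a quick consistency check on the table above, the Tech Center baseline implied by each examiner rate and its delta can be recovered by subtraction:

```python
# (examiner rate %, delta vs Tech Center average %) from the table above.
stats = {
    "101": (16.4, -23.6),
    "103": (53.1, +13.1),
    "102": (20.9, -19.1),
    "112": (5.6, -34.4),
}
for statute, (rate, delta) in stats.items():
    tc_avg = rate - delta  # delta is defined as (examiner rate - TC average)
    print(f"§{statute}: examiner {rate}% vs implied TC avg {tc_avg:.1f}%")
```

Every implied baseline comes out to 40.0%, which suggests the report benchmarks all four statutes against a single estimated Tech Center average rather than per-statute averages.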

Office Action

§101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/12/2026 has been entered.

Response to Arguments

Applicant's arguments filed 01/12/2026 have been fully considered but they are not persuasive. Regarding the rejection under 35 U.S.C. § 101, applicant argues: “In Applicant's Response filed July 3, 2025, Applicant argued that even if one or more of the features of Claim 1 as then on file were considered to fall within the abstract idea judicial exception, which Applicant does not concede, Claim 1 as a whole would still not have been directed to a judicial exception at least because the feature relating to the triggering of liveness detection circuitry would have integrated any such judicial exception into the practical application of more efficiently controlling liveness detection circuitry.” … “Claim 1 has been amended in order to expedite prosecution such that it is positively limited to performing live speech detection on the received signal.
Applicant submits that at least this feature of Claim 1 as amended, which relates to the technical field of live speech detection, could not be performed as part of a mental process and does not fall within the abstract idea judicial exception” … “Applicant further submits that even if one or more other features of Claim 1 as amended were considered to fall within the abstract idea judicial exception, which Applicant does not concede, the above-discussed feature of Claim 1 as amended relating to live speech detection would integrate any such judicial exception into the practical application of improving the technical field of live speech detection by providing a more efficient mechanism by which to perform live speech detection. This efficiency is achieved by selectively performing live speech detection, which is relatively power intensive, on received signals determined to be suitable for this purpose. Thus, the solution of Claim 1 mitigates against wastefully performing live speech detection on unsuitable signals.” Regarding applicant’s arguments, the examiner respectfully disagrees. The examiner contends that, although the claim has been amended to positively recite a determination of whether the received signal comprises live speech, the recitation fails to provide meaningful language beyond the abstract idea. 
As previously exemplified by the examiner in the Final Office action dated 10/16/2025: “Under broadest reasonable interpretation, a human is capable of receiving audio information in the form of an audio signal or in the form of a frequency spectrum printed on a sheet of paper, and compare the printed spectrum with a model spectrum of live speech to determine whether the signal is suitable to be further analyzed for live speech detection.” In this case, the analysis now relies not only on the binary decision of whether a signal is suitable to be further processed, but also on another binary decision of whether the received signal comprises live speech. The examiner acknowledges the applicant’s argument that the solution recited in the claim appears to mitigate “against wastefully performing live speech detection on unsuitable signals”; however, the examiner contends that the claim is abstract because no specific procedures are tied to the analysis that are beyond what a human is capable of performing. The claim does not go beyond simply deciding whether a signal comprises live speech based on an ultrasonic component. As noted before, a human is capable of making such a decision using his or her own judgment by looking at a printed frequency spectrum showing the measurements of an ultrasonic component. The applicant has failed to provide any meaningful limitations that would provide any context as to particular procedures being employed by a particular machine that would tie the liveness determination to a particular technological environment. The examiner again reiterates from the previous office action that: “For example, no specific procedure is recited regarding the ‘estimating’ of an existing signal characteristic of an ultrasonic component that is based on a model of live speech.
Although the examiner recognizes that an ultrasonic component represents a part of an audio signal that a human is not capable of hearing, (1) the claim does not specifically recite any analysis of an ultrasonic component, which in fact the specification supports, and (2) the examiner contends that, as mentioned before, under broadest reasonable interpretation, a human is capable of ‘estimating’ an ultrasonic component from a printed frequency spectrum of a model of live speech by simple visual comparison, given that the ‘estimating’ procedure is not clearly specified in the claim. Furthermore, no specific procedure is recited regarding the ‘determining’ whether the ultrasonic component is suitable for detecting live speech. The examiner cannot accurately determine the metes and bounds of the ‘determining’ step. Whether this step involves an algorithmic comparison or simply a visual comparison, under the broadest reasonable interpretation, a person of ordinary skill in the art would be capable of determining whether a signal is suitable for further processing, much like how a quality control process is performed in various procedures that are known in the art. For example, a person of ordinary skill may be able to detect that a signal contains a sufficient amount of noise that would require a denoising procedure, and mixing engineers are able to estimate, by looking at a frequency spectrum analyzer, whether a signal contains inaudible artifacts that would need to be filtered out in order to ensure that the overall audio mix is not cluttered by such artifacts and the signal contains only the necessary information. These are merely examples of a person’s ordinary skill in the art, but they exemplify the broadness of the recited subject matter such that it preempts these examples of ordinary skill and renders the recited subject matter abstract.” Regarding the rejection under 35 U.S.C.
§ 103, Applicant’s arguments with respect to claims 1, 2, 6-11, 13-17 and 19-21 have been considered but are moot because of the new ground of rejection in view of Lesso and Lee.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 2, 6-11, 13-17 and 19-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. The Supreme Court has long held that “[l]aws of nature, natural phenomena, and abstract ideas are not patentable.” Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 134 S. Ct. 2347, 2354 (2014) (quoting Assoc. for Molecular Pathology v. Myriad Genetics, Inc., 133 S. Ct. 2107, 2116 (2013) (internal quotation marks omitted)). The “abstract ideas” category embodies the longstanding rule that an idea, by itself, is not patentable. Alice Corp., 134 S. Ct. at 2355 (quoting Gottschalk v. Benson, 409 U.S. 63, 67 (1972)). In Alice, the Supreme Court set forth an analytical “framework for distinguishing patents that claim laws of nature, natural phenomena, and abstract ideas [or mental processes] from those that claim patent-eligible applications of those concepts.” Id. at 2355 (citing Mayo Collaborative Servs. v. Prometheus Labs., Inc., 132 S. Ct. 1289, 1296–97 (2012)). The first step in the analysis is to “determine whether the claims at issue are directed to one of those patent-ineligible concepts.” Id.
If the claims are directed to a patent-ineligible concept, the second step in the analysis is to consider the elements of the claims “individually and ‘as an ordered combination’” to determine whether there are additional elements that “‘transform the nature of the claim’ into a patent-eligible application.” Id. (quoting Mayo, 132 S. Ct. at 1298, 1297). In other words, the second step is to “search for an ‘inventive concept’—i.e., an element or combination of elements that is ‘sufficient to ensure that the patent in practice amounts to significantly more than a patent upon the [ineligible concept] itself’”. Id. (brackets in original) (quoting Mayo, 132 S. Ct. at 1294). The prohibition against patenting an abstract idea “‘cannot be circumvented by attempting to limit the use of the formula to a particular technological environment’ or adding ‘insignificant post-solution activity.’” Bilski v. Kappos, 561 U.S. 593, 610–11 (2010) (citation omitted).

Step 1: This part of the eligibility analysis evaluates whether the claim falls within any statutory category. See MPEP 2106.03. Independent Claim 1 recites a method of detecting a suitability of a signal for live speech detection, and thus is a process (a series of steps or acts). A process is a statutory category of invention. Independent Claim 15 recites a non-transitory storage medium having instructions that, when executed by a processor, perform a method similar to Claim 1. A non-transitory storage medium falls within a statutory category of invention. Independent Claim 16 recites an apparatus comprising one or more processors configured to perform a method similar to Claim 1. An apparatus is a statutory category of invention. Independent Claim 21 recites a method similar to Claim 1 and thus is also statutory. Dependent claims 2, 6-11, 13-14, 17 and 19-20 depend from claims 1 and 16, respectively, and therefore recite their respective statutory classes.
Step 2A, Prong One: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04, subsection II, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim. In applying the framework set out in Alice, the examiner found Applicant’s claims 1, 15, 16 and 21 are directed to a patent-ineligible abstract concept of determining the suitability of a speech signal for live speech detection by measuring and comparing the frequency components of the signal. The steps of Applicant’s claims 1, 2, 6-11, 13-17 and 19-21 are an abstract concept that would fall under the judicial exception of mental processes. Specifically, the claims recite the step of “receiving the signal containing speech...” Under the broadest reasonable interpretation, this limitation may simply involve a human receiving some kind of information that represents another human’s speech. Although the claim recites that the signal is from a transducer, the claimed transducer appears to be nothing more than an additional element representing a device such as a microphone or a speaker that is able to represent an audio signal and is performing in a purely conventional way. Therefore, this step is directed to a mental process of receiving information. Furthermore, the step of “measuring a signal characteristic of an audible component of the received signal” is likewise directed to a mental process. The claim fails to provide any limit on how the signal characteristic is measured. The measurement may simply come from a human looking at a spectrogram representing the speech signal within the audible range of frequencies. Thus, the recited limitation is directed to a mental process.
Further, the claim recites “estimating, based on a model of live speech, an expected signal characteristic of an ultrasonic component of the received signal based on the measured signal characteristic of the audible component”. The claim does not place any limits on how the expected signal characteristic is estimated. Under the broadest reasonable interpretation, a human is capable of looking at a spectrogram that is able to represent measurements in the ultrasonic range and visually comparing them with a spectrogram that models live speech. Therefore, the above steps are also directed to mental processes. The claim further recites “determining, based on the estimated expected signal characteristic, whether the ultrasonic component is suitable for detecting whether the speech is live speech”. Here, based on the findings of previous limitations, the limitation provides nothing more than a binary decision of signal suitability based on the previous measurements and comparisons. Under the broadest reasonable interpretation, a human is capable of comparing and making a decision of suitability based on the measured and compared data. Thus, the limitation recites a mental process. Finally, the step of “in response to determining that the ultrasonic component is suitable for detecting whether the speech is live speech, determining whether the received signal comprises live speech based on a measurement of an ultrasonic component of the received signal” falls under the mental processes category. Although the liveness determination is now positively recited in the claim, the limitation constitutes nothing more than simply deciding whether a signal comprises live speech based on an ultrasonic component, with no specific procedures tied to the analysis that are beyond what a human is capable of performing.
Although an ultrasonic component is used to make such determination, because no specific procedure is tied to how the ultrasonic component is used, under broadest reasonable interpretation, a human would be capable of using his or her own judgment to decide whether a sound signal contains live speech by simply comparing the amount of ultrasonic information present in a printed frequency spectrum of the sound signal. Thus, the claimed limitation recites a mental process as well. The claims recite limitations that, taken in combination, recite at least a series of mental processes.

Step 2A, Prong Two: This part of the eligibility analysis evaluates whether the claim as a whole integrates the recited judicial exception into a practical application of the exception. This evaluation is performed by (1) identifying whether there are any additional elements recited in the claim beyond the judicial exception, and (2) evaluating those additional elements individually and in combination to determine whether the claim as a whole integrates the exception into a practical application. See MPEP 2106.04(d). As discussed above, the claims recite a transducer as an additional element beyond the judicial exception. The examiner has found, however, that the transducer provides no further detail and is recited at such a high level of generality that this limitation is merely a post-solution step. Therefore, this step is an insignificant extra-solution activity and does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Furthermore, independent Claims 15 and 16 further recite “a processor” and “one or more processors” as additional elements beyond the judicial exception. However, these additional elements do not amount to significantly more than the abstract idea because the additional elements constitute a generic computer environment. Alice, 134 S. Ct. at 2357.
The claims need meaningful limitations that go beyond generally linking the use of an abstract idea to a particular technological environment. Therefore, the steps are all abstract and the claim as a whole is abstract. “[S]imply appending generic computer functionality to lend speed or efficiency to the performance of an otherwise abstract concept does not meaningfully limit claim scope for purposes of patent eligibility.” CLS Bank, 2013 U.S. App. LEXIS 9493, at *29 (citing Bancorp, 687 F.3d at 1278, and Dealertrack, Inc. v. Huber, 674 F.3d 1315, 1333-34 (Fed. Cir. 2012) (finding that the claimed computer-aided clearinghouse process is a patent-ineligible abstract idea)); SiRF Tech., Inc. v. Int'l Trade Comm'n, 601 F.3d 1319, 1333 (Fed. Cir. 2010) (“In order for the addition of a machine to impose a meaningful limit on the scope of a claim, it must play a significant part in permitting the claimed method to be performed, rather than function solely as an obvious mechanism for permitting a solution to be achieved more quickly, i.e., through the utilization of a computer for performing calculations.”). Additionally, dependent claims 2, 6-11, 13-14, 17 and 19-20 do not provide any additional elements that integrate the judicial exception into a practical application. The claims simply describe further parameters for comparing data, such as an ultrasonic signal characteristic threshold, using a bandpass filter to measure signal characteristics, and applying weights corresponding to human loudness perception to two or more bandpass filtered signals.

Step 2B: This part of the eligibility analysis evaluates whether the claim as a whole amounts to significantly more than the recited exception, i.e., whether any additional element, or combination of additional elements, adds an inventive concept to the claim. See MPEP 2106.05.
At Step 2A, Prong Two, the additional elements of the transducer and the processor were found to be insignificant extra-solution activity and a generic computer environment. At Step 2B, the re-evaluation of the insignificant extra-solution activity consideration takes into account whether or not the extra-solution activity is well understood, routine, and conventional in the field. See MPEP 2106.05(g). Here, the element of a transducer is merely a device performing as a generic computer environment and is well understood, routine, and conventional in the field. Therefore, this limitation remains insignificant extra-solution activity even upon reconsideration and does not amount to significantly more. Even when considered in combination, these additional elements represent mere instructions to apply an exception and insignificant extra-solution activity, and therefore do not provide an inventive concept. Additionally, dependent claims 2, 6-11, 13-14, 17 and 19-20 do not add an inventive concept. In conclusion, the Examiner notes that none of the recited steps in Applicant's claims 1, 2, 6-11, 13-17 and 19-21 refer to a specific machine by reciting structural limitations of any apparatus or to any specific operations that would cause a machine to be the mechanism to perform these steps. Although the claims may be processed by a computing system having a processor, the computing system is merely a general-purpose computing system. Therefore, all of claims 1, 2, 6-11, 13-17 and 19-21 are abstract.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1, 2, 6-11, 13-17 and 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over Lesso (US PG Pub 20200043484) in view of Y. Lee et al., “Using Sonar for Liveness Detection to Protect Smart Speakers against Remote Attackers,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 4, Issue 1 (March 2020). (hereinafter Lee). As per claims 1, 15 and 16, Lesso discloses: A method, non-transitory storage medium having instructions thereon which, when executed by a processor (Lesso; Fig. 2, item 16), cause the processor to perform a method, and an apparatus for detecting a suitability of a signal for live speech detection, the method comprising: receiving the signal containing speech from a transducer (Lesso; Fig. 5, item 60; p. 0074 - an audio signal representing speech is received; see also Fig. 4, item 12 & p. 0070-0071 - a microphone 12 (for example one of the microphones in the smartphone 10) detects a sound, and this is passed to an initial processing block 40); measuring a signal characteristic of an audible component of the received signal (Lesso; p. 0075-0079 - The received audio signal representing speech may then be passed to a spectrum extraction block 42. 
The spectrum extraction block 42 may be configured to obtain a spectrum of the received audio signal… the spectrum extraction block 42 may be configured to perform a fast Fourier transform on the received audio signal. The result of the fast Fourier transform is an indication of the power or energy present in the signal at different frequencies); estimating, based on a model of live speech (Lesso; p. 0082-0084 - …each speaker model contains separate models of the voiced speech and the unvoiced speech of the enrolled user; see also p. 0134), an expected signal characteristic of an ultrasonic component of the received signal based on the measured signal characteristic of the audible component (Lesso; Fig. 5, items 66 & 68; p. 0114 - for each portion of the audio signal for which speech content is identified, information about an expected frequency spectrum of the corresponding portion of the audio signal is retrieved; see also p. 0118 - the comparison block 50 may compare components of the identified parts of the audio signal with the respective retrieved information for the corresponding test acoustic class in a frequency band in the range of 5-20 kHz, or in the range from 16 kHz upwards. As described above, some loudspeakers may be unable to reproduce ultrasonic and/or near ultrasonic frequencies well. As a result of this, a received audio signal that comprises a test acoustic class which comprises a relatively high level of ultrasonic and/or near ultrasonic frequencies is likely to be reproduced poorly by a loudspeaker at these frequencies (for example, in a frequency band in the range of 5 kHz-20 kHz, or in a frequency band in the range above 16 kHz).
Thus, the spectrum of a received audio signal representing a test acoustic class is likely to differ significantly from the expected spectrum of an audio signal representing a test acoustic class, wherein the test acoustic class comprises a relatively high level of ultrasonic and/or near ultrasonic frequencies, when the received audio signal results from a replay attack. Thus, in order to be able to detect a replay attack using this frequency band, it is preferable for the test acoustic class to comprise a relatively high level of ultrasonic and/or near ultrasonic frequencies, and suitable acoustic classes may for example be fricatives, such as sibilants, and plosives); and determining, based on the estimated expected signal characteristic, whether the ultrasonic component is suitable for detecting whether the speech is live speech (Lesso; Fig. 5, item 68; p. 0118 - the comparison block 50 may compare components of the identified parts of the audio signal with the respective retrieved information for the corresponding test acoustic class in a frequency band in the range of 5-20 kHz, or in the range from 16 kHz upwards. As described above, some loudspeakers may be unable to reproduce ultrasonic and/or near ultrasonic frequencies well. As a result of this, a received audio signal that comprises a test acoustic class which comprises a relatively high level of ultrasonic and/or near ultrasonic frequencies is likely to be reproduced poorly by a loudspeaker at these frequencies (for example, in a frequency band in the range of 5 kHz-20 kHz, or in a frequency band in the range above 16 kHz). Thus, the spectrum of a received audio signal representing a test acoustic class is likely to differ significantly from the expected spectrum of an audio signal representing a test acoustic class, wherein the test acoustic class comprises a relatively high level of ultrasonic and/or near ultrasonic frequencies, when the received audio signal results from a replay attack. 
Thus, in order to be able to detect a replay attack using this frequency band, it is preferable for the test acoustic class to comprise a relatively high level of ultrasonic and/or near ultrasonic frequencies, and suitable acoustic classes may for example be fricatives, such as sibilants, and plosives (suitability determination); see also p. 0112 - Having identified the speech content, information is obtained about a frequency spectrum of each portion of the audio signal for which the specific speech content is identified. For example, whereas the spectrum extraction block 42 may be configured to obtain a spectrum of the entire received audio signal, a second spectrum extraction block 47 may be configured to obtain a spectrum of those portions of the received audio signal for which the particular speech content of interest is identified. For example, as described above, the portions of the signal representing the specific speech content with a high proportion of high frequency components may be considered of interest, and the second spectrum extraction block 47 may be configured to obtain a spectrum of the frames of the received audio in which that speech content is identified (extraction block 47 is used to selectively trigger liveness detection based on determining suitability of the portions of the signal that are of interest based on their high frequency components)). Lesso, however, fails to disclose in response to determining that the ultrasonic component is suitable for detecting whether the speech is live speech, determining whether the received signal comprises live speech based on a measurement of an ultrasonic component of the received signal. Lee does teach in response to determining that the ultrasonic component is suitable for detecting whether the speech is live speech, determining whether the received signal comprises live speech based on a measurement of an ultrasonic component of the received signal (Lee; Pg.
5, Section 2. Background – using Sonar as a technique that uses sound propagation to detect objects for liveness detection by emitting ultrasonic sound (in the range of 18-20 kHz) with a speaker and receiving the reflected sound with a microphone array (i.e., Matrix Voice); see also Pg. 17, Section 4.8 Discussion: Additional Design and Practical Issues - The Speaker-Sonar works by detecting movements… our approach aims to make sure that a command is coming from a real user (i.e., liveness detection); by detecting movement through the emission of ultrasonic sound signals and detection of Doppler shifts, Lee’s disclosure provides for determining suitability for liveness detection). Therefore, it would have been obvious to one of ordinary skill in the art to modify the method, non-transitory storage medium and apparatus of Lesso to include, in response to determining that the ultrasonic component is suitable for detecting whether the speech is live speech, determining whether the received signal comprises live speech based on a measurement of an ultrasonic component of the received signal, as taught by Lee, in order to protect the smart speakers from remote attackers that leverage network-connected speakers to send malicious commands. The key idea… is to make sure that the voice command is indeed coming from the user (Lee; Pg. 2). As per claims 2 and 17, Lesso in view of Lee discloses: The method and apparatus of claims 1 and 16, wherein the measured signal characteristic and the expected signal characteristic are the same signal characteristic and comprise one of: a power level; and a sound pressure level (Lesso; p. 0075-0079 - The received audio signal representing speech may then be passed to a spectrum extraction block 42. The spectrum extraction block 42 may be configured to obtain a spectrum of the received audio signal… the spectrum extraction block 42 may be configured to perform a fast Fourier transform on the received audio signal.
The result of the fast Fourier transform is an indication of the power or energy present in the signal at different frequencies). As per claim 6, Lesso in view of Lee discloses: The method of claim 1, further comprising: determining whether the received signal comprises speech (Lesso; Fig. 5, item 62; p. 0109 - the speech content present in at least a portion of the audio signal is identified). As per claims 7 and 19, Lesso in view of Lee discloses: The method and apparatus of claims 1 and 16, wherein determining whether the ultrasonic component is suitable for detecting whether the speech is live speech comprises: comparing the expected signal characteristic to an ultrasonic signal characteristic threshold (Lesso; p. 0127-0128 - The comparison as generated by the comparison block 50 may be transmitted to a decision block 52. The decision block 52 may determine if a measure of a difference between the frequency spectrum of portions of the audio signal for which speech content is identified and the respective expected frequency spectrum exceeds a threshold level. If the measure of the difference exceeds a threshold level, the decision block 52 may determine that the audio signal may result from a replay attack). As per claim 8, Lesso in view of Lee discloses: The method of claim 1, wherein measuring the signal characteristic of the audible component comprises: bandpass filtering the received audio signal to generate one or more bandpass filtered audio signals; and measuring the signal characteristic in one or more of the one or more bandpass filtered audio signals (Lesso; p. 0077 - the spectrum extraction block 42 may be configured to apply several band-pass filters to the received audio signal representing speech. Each band-pass filter may only allow signals within a particular frequency band of the received audio signal to pass through). 
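As an editorial aside: the kind of measurement Lesso's spectrum extraction block is cited for (a fast Fourier transform indicating the power present at different frequencies, optionally split into bands) can be sketched as follows. This is an illustrative reconstruction, not code from the application or either reference; the sample rate, band edges, and synthetic test signal are all assumptions.

```python
import numpy as np

def band_energy(signal: np.ndarray, fs: int, f_lo: float, f_hi: float) -> float:
    """Energy of `signal` within the [f_lo, f_hi) Hz band, estimated via FFT."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs < f_hi)
    return float(np.sum(np.abs(spectrum[mask]) ** 2))

fs = 96_000                 # assumed sample rate, high enough to capture ultrasonics
t = np.arange(fs) / fs      # one second of samples
# Synthetic stand-in for speech: a 1 kHz audible tone plus a weak 21 kHz component.
x = np.sin(2 * np.pi * 1_000 * t) + 0.05 * np.sin(2 * np.pi * 21_000 * t)

audible = band_energy(x, fs, 20.0, 20_000.0)
ultrasonic = band_energy(x, fs, 20_000.0, 48_000.0)
# A replay through a loudspeaker that reproduces ultrasonics poorly would drive
# this ratio toward zero relative to live speech.
print(f"ultrasonic/audible energy ratio: {ultrasonic / audible:.4f}")  # 0.0025
```

The comparison of measured ultrasonic energy against what a live-speech model predicts is the suitability and liveness cue the rejection describes; a near-zero ratio would mark the signal as a likely replay.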
As per claim 9, Lesso in view of Lee discloses: The method of claim 8, wherein the one or more bandpass filtered audio signals comprise two or more bandpass filtered signals, wherein measuring the signal characteristic of the audible component further comprises: applying weights to the measured signal characteristics in the two or more bandpass filtered signals, wherein the estimation of the expected signal characteristic in the ultrasonic component is based on one or more weighted bandpass filtered signals (Lesso; p. 0099-0101 - In some embodiments, the identification block 46 is configured to identify at least one test acoustic class (weights) in the received audio signal, where an acoustic class is the set of phonemes belonging to the same broad class of phonemes, such as fricatives, vowels, etc. That is, the identification block 46 may be configured to identify the portions of the received audio signal that contain one or more specific test acoustic class…). As per claim 10, Lesso in view of Lee discloses: The method of claim 9, wherein the weights are applied to emphasize one or more of the bandpass filtered signals that correspond to human loudness perception (Lesso; p. 0099-0101 - In some examples, the at least one test acoustic class may comprise one or more specific phonemes. In some examples, the test acoustic class may comprise a set of vowels. In other examples, the test acoustic class may comprise a set of consonants. For example, the at least one test acoustic class may comprise fricatives, and more specifically may comprise sibilants. In another example, the at least one test acoustic class may comprise plosives. It is noted that an audio signal generated by the vocal tract of a human being, in particular when articulating fricatives and sibilant phonemes, contains significant energy in the ultrasound region, above about 20 kHz, and even beyond 30 kHz). 
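The weighted-band measurement of claims 9 and 10 can be illustrated with a short sketch. The band energies and weights below are invented for illustration only (a real implementation might derive weights from an equal-loudness contour such as A-weighting); none of these values come from the application or the references.

```python
# Hypothetical energies measured in four bandpass-filtered copies of the audible
# component (values invented for illustration).
band_energies = [4.0, 9.0, 6.0, 1.0]
# Illustrative weights emphasizing bands where human loudness perception is most
# sensitive (roughly 1-4 kHz); not values from the application.
weights = [0.6, 1.0, 1.0, 0.7]

# Weighted measurements that would feed the estimate of the expected
# ultrasonic signal characteristic.
weighted = [w * e for w, e in zip(weights, band_energies)]
print(weighted)  # [2.4, 9.0, 6.0, 0.7]
```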
As per claim 11, Lesso in view of Lee discloses: The method of claim 9, wherein weights are applied to reduce sensitivity to differences in speech between different cohorts of the population (Lesso; p. 0132 - In some examples, the expected spectrum corresponding to each acoustic class stored in the database 48 will be representative of the acoustic class as spoken by a large cohort of speakers. However, in some examples, the expected spectrum corresponding to each acoustic class stored in the database 48 will be representative of the acoustic class as spoken by a particular individual. Said particular individual may have been identified by the speaker identification process).

As per claim 13, Lesso in view of Lee discloses: The method of claim 1, wherein the model of live speech is generated using a speech model for a user of the transducer (Lesso; p. 0082-0084 - The signal received on the input 70 is also passed to a speaker recognition block 74, which performs a voice biometric process to identify the speaker, from amongst a plurality of enrolled speakers. The process of enrolment in a speaker recognition system typically involves the speaker providing a sample of speech, from which specific features are extracted, and the extracted features are used to form a model of the speaker's speech. In use, corresponding features are extracted from a sample of speech, and these are compared with the previously obtained model to obtain a measure of the likelihood that the speaker is the previously enrolled speaker…).

As per claim 14, Lesso in view of Lee discloses: The method of claim 1, wherein the model of live speech is generated using a cohort of speakers (Lesso; p. 0082-0084 - In the system shown in FIG. 6, one or more speaker model is stored, for example in a database 76. Based on the output of the speaker recognition block 74, one or more speaker model is selected.
In this embodiment, each speaker model contains separate models of the voiced speech and the unvoiced speech of the enrolled user. More specifically, the model of the voiced speech and the model of the unvoiced speech of the enrolled user each comprise amplitude values corresponding to multiple frequencies).

As per claim 20, the electronic device of claim 20 comprises the apparatus of claim 16, which Lesso in view of Lee discloses as set forth above; the claim is therefore rejected on the same grounds.

As per claim 21, Lesso discloses: A method for detecting a suitability of a signal for live speech detection, the method performed by a suitability circuitry and comprising: receiving the signal containing speech from a transducer (Lesso; Fig. 5, item 60; p. 0074 - an audio signal representing speech is received; see also Fig. 4, item 12 & p. 0070-0071 - a microphone 12 (for example one of the microphones in the smartphone 10) detects a sound, and this is passed to an initial processing block 40); measuring a signal characteristic of an audible component of the received signal (Lesso; p. 0075-0079 - The received audio signal representing speech may then be passed to a spectrum extraction block 42. The spectrum extraction block 42 may be configured to obtain a spectrum of the received audio signal… the spectrum extraction block 42 may be configured to perform a fast Fourier transform on the received audio signal. The result of the fast Fourier transform is an indication of the power or energy present in the signal at different frequencies); estimating, based on a trained neural network (Lesso; p. 0103 - The identification of the acoustic class may for example be performed by a trained neural network; p. 0101 - …the at least one test acoustic class may comprise fricatives, and more specifically may comprise sibilants. In another example, the at least one test acoustic class may comprise plosives.
It is noted that an audio signal generated by the vocal tract of a human being, in particular when articulating fricatives and sibilant phonemes, contains significant energy in the ultrasound region, above about 20 kHz, and even beyond 30 kHz), an expected signal characteristic of an ultrasonic component of the received signal based on the measured signal characteristic of the audible component (Lesso; Fig. 5, items 66 & 68; p. 0114 - for each portion of the audio signal for which speech content is identified, information about an expected frequency spectrum of the corresponding portion of the audio signal is retrieved; see also p. 0118 - the comparison block 50 may compare components of the identified parts of the audio signal with the respective retrieved information for the corresponding test acoustic class in a frequency band in the range of 5-20 kHz, or in the range from 16 kHz upwards. As described above, some loudspeakers may be unable to reproduce ultrasonic and/or near ultrasonic frequencies well. As a result of this, a received audio signal that comprises a test acoustic class which comprises a relatively high level of ultrasonic and/or near ultrasonic frequencies is likely to be reproduced poorly by a loudspeaker at these frequencies (for example, in a frequency band in the range of 5 kHz-20 kHz, or in a frequency band in the range above 16 kHz). Thus, the spectrum of a received audio signal representing a test acoustic class is likely to differ significantly from the expected spectrum of an audio signal representing a test acoustic class, wherein the test acoustic class comprises a relatively high level of ultrasonic and/or near ultrasonic frequencies, when the received audio signal results from a replay attack. 
Thus, in order to be able to detect a replay attack using this frequency band, it is preferable for the test acoustic class to comprise a relatively high level of ultrasonic and/or near ultrasonic frequencies, and suitable acoustic classes may for example be fricatives, such as sibilants, and plosives); and determining, based on the estimated expected signal characteristic, whether the ultrasonic component is suitable for detecting whether the speech is live speech (Lesso; Fig. 5, item 68; p. 0118 - the comparison block 50 may compare components of the identified parts of the audio signal with the respective retrieved information for the corresponding test acoustic class in a frequency band in the range of 5-20 kHz, or in the range from 16 kHz upwards. As described above, some loudspeakers may be unable to reproduce ultrasonic and/or near ultrasonic frequencies well. As a result of this, a received audio signal that comprises a test acoustic class which comprises a relatively high level of ultrasonic and/or near ultrasonic frequencies is likely to be reproduced poorly by a loudspeaker at these frequencies (for example, in a frequency band in the range of 5 kHz-20 kHz, or in a frequency band in the range above 16 kHz). Thus, the spectrum of a received audio signal representing a test acoustic class is likely to differ significantly from the expected spectrum of an audio signal representing a test acoustic class, wherein the test acoustic class comprises a relatively high level of ultrasonic and/or near ultrasonic frequencies, when the received audio signal results from a replay attack. Thus, in order to be able to detect a replay attack using this frequency band, it is preferable for the test acoustic class to comprise a relatively high level of ultrasonic and/or near ultrasonic frequencies, and suitable acoustic classes may for example be fricatives, such as sibilants, and plosives (suitability determination); see also p. 
0112 - Having identified the speech content, information is obtained about a frequency spectrum of each portion of the audio signal for which the specific speech content is identified. For example, whereas the spectrum extraction block 42 may be configured to obtain a spectrum of the entire received audio signal, a second spectrum extraction block 47 may be configured to obtain a spectrum of those portions of the received audio signal for which the particular speech content of interest is identified. For example, as described above, the portions of the signal representing the specific speech content with a high proportion of high frequency components may be considered of interest, and the second spectrum extraction block 47 may be configured to obtain a spectrum of the frames of the received audio in which that speech content is identified (extraction block 47 is used to selectively trigger liveness detection based on determining suitability of the portions of the signal that are of interest based on their high frequency components)).

Lesso, however, fails to disclose, in response to determining that the ultrasonic component is suitable for detecting whether the speech is live speech, determining whether the received signal comprises live speech based on a measurement of an ultrasonic component of the received signal. Lee does teach, in response to determining that the ultrasonic component is suitable for detecting whether the speech is live speech, determining whether the received signal comprises live speech based on a measurement of an ultrasonic component of the received signal (Lee; Pg. 5 Section 2. Background – using Sonar as a technique that uses sound propagation to detect objects for liveness detection by emitting ultrasonic sound (in the range of 18-20 kHz) with a speaker and receiving the reflected sound with a microphone array (i.e., matrix voice); see also Pg.
17, Section 4.8 Discussion: Additional Design and Practical Issues - The Speaker-Sonar works by detecting movements… our approach aims to make sure that a command is coming from a real user (i.e., liveness detection); by detecting movement through the emission of ultrasonic sound signals and detection of Doppler shifts, Lee's disclosure provides for determining suitability for liveness detection). Therefore, it would have been obvious to one of ordinary skill in the art to modify the method, non-transitory storage medium and apparatus of Lesso to include, in response to determining that the ultrasonic component is suitable for detecting whether the speech is live speech, determining whether the received signal comprises live speech based on a measurement of an ultrasonic component of the received signal, as taught by Lee, in order to protect smart speakers from remote attackers that leverage network-connected speakers to send malicious commands. The key idea… is to make sure that the voice command is indeed coming from the user (Lee; Pg. 2).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure, and includes: Jones (US PG Pub 20150269941), which discloses embodiments of techniques or systems for fraud detection. A communication may be received where the communication includes one or more voice signals from an individual. Frequency responses associated with these voice signals may be determined and analyzed and utilized to determine whether or not potential fraudulent activity is occurring. For example, if a frequency response is greater than a frequency threshold, potential fraudulent activity may be determined. Further, frequency responses may be cross referenced with voice biometrics, voice printing, or fraud pathway detection results.
In this way, voice stress or frequency responses may be utilized to build other databases related to other types of fraud detection, thereby enhancing one or more aspects of fraud detection. For example, a database may include a voice library, a pathway library, or a frequency library which include characteristics associated with fraudulent activity, thereby facilitating identification of such activity (Jones; Abstract).

Baker, IV (US PG Pub 20130204607) discloses a system that implements voice detection using a receiver, a voice analyzer, and a voice identifier. The receiver receives a transmission from a transmission channel associated with a channel identification. The transmission includes a voice input. The voice analyzer analyzes the voice input and generates a plurality of voice metrics according to a plurality of analysis parameters. The voice identifier compares the voice metrics to one or more stored sets of voice metrics. Each set of voice metrics corresponds to a voice identification associated with the channel identification. The voice identifier also identifies a match between the voice metrics from the voice analyzer and at least one of the stored sets of voice metrics (Baker, IV; Abstract).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rodrigo A Chavez, whose telephone number is (571) 270-0139. The examiner can normally be reached Monday - Friday, 9-6 ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RODRIGO A CHAVEZ/
Examiner, Art Unit 2658

/RICHEMOND DORVIL/
Supervisory Patent Examiner, Art Unit 2658
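Reduced to its essentials, the logic the rejection repeatedly attributes to Lesso's comparison block 50 and decision block 52 is a thresholded comparison between the measured ultrasonic-band spectrum and the expected one, flagging a replay attack when the deviation is large. The sketch below illustrates that idea only; the distance metric, threshold, and band values are assumptions, not taken from either reference.

```python
import numpy as np

def is_replay_attack(measured_band, expected_band, threshold=0.5):
    """Flag a replay attack when the measured ultrasonic-band spectrum
    deviates from the expected spectrum by more than a threshold
    (normalized Euclidean distance; the metric is an assumption)."""
    measured = np.asarray(measured_band, dtype=float)
    expected = np.asarray(expected_band, dtype=float)
    diff = np.linalg.norm(measured - expected) / (np.linalg.norm(expected) + 1e-12)
    return bool(diff > threshold)

# Live speech: fricatives carry strong ultrasonic energy, close to expectation.
live = is_replay_attack([0.9, 0.8, 0.7], [1.0, 0.9, 0.8])      # small deviation
# Replay: a loudspeaker reproduces the ultrasonic band poorly, large deviation.
replay = is_replay_attack([0.1, 0.05, 0.02], [1.0, 0.9, 0.8])
```

This mirrors the cited rationale: loudspeakers reproduce ultrasonic and near-ultrasonic frequencies poorly, so a replayed recording of a fricative-rich utterance shows a spectrum far from the expected one in that band.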

Prosecution Timeline

Apr 26, 2022
Application Filed
Jun 14, 2024
Non-Final Rejection — §101, §103
Aug 08, 2024
Response Filed
Oct 25, 2024
Final Rejection — §101, §103
Dec 19, 2024
Interview Requested
Jan 17, 2025
Applicant Interview (Telephonic)
Jan 25, 2025
Examiner Interview Summary
Jan 27, 2025
Request for Continued Examination
Jan 28, 2025
Response after Non-Final Action
Apr 05, 2025
Non-Final Rejection — §101, §103
Jul 03, 2025
Response Filed
Oct 13, 2025
Final Rejection — §101, §103
Dec 11, 2025
Response after Non-Final Action
Jan 12, 2026
Request for Continued Examination
Jan 26, 2026
Response after Non-Final Action
Mar 20, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597430
MULTI-CHANNEL SIGNAL GENERATOR, AUDIO ENCODER AND RELATED METHODS RELYING ON A MIXING NOISE SIGNAL
2y 5m to grant Granted Apr 07, 2026
Patent 12579984
DATA AUGMENTATION SYSTEM AND METHOD FOR MULTI-MICROPHONE SYSTEMS
2y 5m to grant Granted Mar 17, 2026
Patent 12541653
ENTERPRISE COGNITIVE SOLUTIONS LOCK-IN AVOIDANCE
2y 5m to grant Granted Feb 03, 2026
Patent 12542136
DYNAMICALLY CONFIGURING A WARM WORD BUTTON WITH ASSISTANT COMMANDS
2y 5m to grant Granted Feb 03, 2026
Patent 12531077
METHOD AND APPARATUS IN AUDIO PROCESSING
2y 5m to grant Granted Jan 20, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

5-6
Expected OA Rounds
50%
Grant Probability
88%
With Interview (+37.3%)
3y 5m
Median Time to Grant
High
PTA Risk
Based on 228 resolved cases by this examiner. Grant probability derived from career allow rate.
