Last updated: April 19, 2026
Application No. 18/416,419
DETECTION OF ULTRASONIC SIGNALS

Final Rejection §103
Filed: Jan 18, 2024
Examiner: BRINEY III, WALTER F
Art Unit: 2692
Tech Center: 2600 (Communications)
Assignee: Nokia Technologies Oy
OA Round: 2 (Final)
Interview Optional: +3.8% interview lift. This examiner has a relatively high allow rate; a written response may suffice.
Based on 540 resolved cases, 2023–2026
Examiner Intelligence

Grants 65% — above average
Career Allow Rate
352 granted / 540 resolved
+3.2% vs TC avg
Interview Lift: +3.8% (minimal lift; resolved cases with interview)
Typical timeline
2y 12m
Avg Prosecution
58 currently pending
Career history
598
Total Applications
across all art units
Statute-Specific Performance

§101: 1.7% (-38.3% vs TC avg)
§103: 63.2% (+23.2% vs TC avg)
§102: 13.5% (-26.5% vs TC avg)
§112: 9.4% (-30.6% vs TC avg)
Based on career data from 540 resolved cases
Office Action

§103
Detailed Action
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. See 35 U.S.C. § 100 (note).
Art Rejections
Obviousness
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 16, 17, 20, 24, 29–31, 34 and 35 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Cong Shi et al., Authentication of Voice Commands by Leveraging Vibrations in Wearables, 2021 Annual Computer Security Applications Conference 83 (November/December 2021) (“Shi”) and Yan Michalevsky et al., Gyrophone: Recognizing Speech from Gyroscope Signals, 23d USENIX Security Symposium (20–22 August 2014) (“Michalevsky”).
Claims 18, 19, 21, 32 and 33 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Shi, Michalevsky and US Patent Application Publication 2020/0394302 (published 17 December 2020) (“Nashimoto”).
Claim 22 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Shi, Michalevsky, Nashimoto and US Patent Application Publication 2019/0237096 (published 01 August 2019) (“Trella”).
Claim 23 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Shi, Michalevsky and Trella.
Claims 25 and 27 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Shi, Michalevsky, Nashimoto and US Patent Application Publication 2022/0122606 (published 21 April 2022) (“Kamkar-Parsi”).
Claim 26 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Shi, Michalevsky, and Kamkar-Parsi.
Claim 28 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Shi, Michalevsky, Kamkar-Parsi and Trella.
Claim 16 is drawn to “an apparatus.” The following table illustrates the correspondence between the claimed apparatus and the Shi reference.
Claim 16
The Shi Reference
“16. An apparatus comprising:
The Shi reference describes a cloud server, a voice assistant and a wearable device that together correspond to the claimed apparatus and user device. Shi at 84, 85, FIG.1. Shi’s apparatus is configured to perform a method of detecting inaudible ultrasonic voice command attacks and to ignore such inauthentic voice commands. Id.
“at least one processor; and
“at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform:
One of ordinary skill would have immediately recognized that a cloud server, such as Shi’s cloud server, is a computer having a processor and a memory programmed with instructions that cause the processor to perform a method—namely, Shi’s ultrasonic signal detection method.
“providing first data derived from a signal received by a microphone of a user device;
The voice assistant records audio data with a microphone. Id. at 86.
“providing second data representing mechanical oscillations within a gyroscope of the user device;
The wearable device records mechanical oscillations imparted into an accelerometer by a user’s voice and by an ultrasonic attack. Id. Though Shi describes the wearable as including a gyroscope, Shi does not describe using the signal from the gyroscope to detect an ultrasonic attack. Id. at 85.
“detecting, based at least in part on the first data and the second data, that the signal received by the microphone comprises an ultrasonic signal; and
Shi likewise describes comparing the first audio data from the microphone to the second oscillation data from the accelerometer to detect an ultrasonic attack signal. Id. at 86, 89, FIGs.2, 5.
“responsive to the detection, controlling the user device for mitigating one or more events associated with receipt of the ultrasonic signal by the microphone.”
If an ultrasonic attack signal is detected, Shi’s apparatus determines that a detected voice command is inauthentic, and ignores the command. Id. at 90, FIG.2. For example, an attacker might attack a voice assistant with an inaudible, ultrasonic voice command requesting disclosure of sensitive information, making a purchase or disarming a smart lock. Id. at 83. When Shi’s apparatus detects the inauthentic voice command, Shi’s apparatus ignores the command, effectively blocking audible disclosure of the sensitive information, blocking transmission of purchase information and blocking processing of an unlocking operation. See id. at 83, 90, FIG.2.

Table 1
The table above shows that the Shi reference describes an apparatus that corresponds closely to the claimed apparatus. The Shi reference does not anticipate the claimed invention because Shi’s apparatus does not analyze the output of a gyroscope to detect ultrasonic signals. Rather, Shi leverages the ability of an accelerometer to react to human voice, measuring the similarity between the output of an accelerometer and the output of a microphone to discriminate between authentic user voice commands and inauthentic attacks, including ultrasonic voice command attacks. However, the Michalevsky reference teaches and suggests that a gyroscope, like an accelerometer, is responsive to voice signals. Michalevsky at §§ 2, 3. This would have reasonably suggested that a gyroscope output signal would exhibit characteristics similar to those of Shi’s accelerometer output signal. See Shi at 87–88 (describing the use of an accelerometer to measure voice). This would have further suggested detecting ultrasonic attacks by correlating the output of the gyroscope with the output of a microphone. For the foregoing reasons, the combination of the Shi and the Michalevsky references makes obvious all limitations of the claim.
Claim 17 depends on claim 16 and further requires the following:
“wherein the first data is provided at the output of one or more non-linear components that process the signal received by the microphone.”
Similarly, Shi’s apparatus includes a microphone that produces a non-linear output in response to an ultrasonic input signal. Shi at 84. For the foregoing reasons, the combination of the Shi and the Michalevsky references makes obvious all limitations of the claim.
Claim 18 depends on claim 16 and further requires the following:
“wherein the apparatus is further caused to detect that the signal received by the microphone comprises an ultrasonic signal based, at least in part, on identifying non-zero values of the first data and the second data for one or more corresponding time instances or time periods.”
The obviousness rejection of claim 16, incorporated herein, shows the obviousness of combining the teachings of Shi and Michalevsky to detect ultrasonic attacks by analyzing the outputs of multiple sensors, including a microphone and a gyroscope. The analysis would include performing a time-frequency correlation between a microphone output and an accelerometer output. See Shi at 89–90. In other words, Shi detects time-aligned, or corresponding, outputs in a microphone and an accelerometer, in order to discriminate between attacks and authentic voice commands. Id. Shi does not anticipate the identification of non-zero values of first and second data for one or more corresponding time instances.
The Shi reference teaches that in the presence of an ultrasonic attack, the microphone would produce a demodulated, non-zero audio output. Shi at 84. Shi further teaches that an accelerometer would not produce a substantial signal. Id. at 90.
The Nashimoto reference, like Shi, describes a set of sensors that generally correlate with each other during normal operations, but differ in abnormal, attack situations. Nashimoto at ¶¶ 59–63, FIGs.5–8. For example, an accelerometer, magnetic sensor and gyroscope tend to correlate with each other during normal operation. Id. However, during an attack, the correlation disappears and each sensor exhibits a different pattern of response. Id. For example, during an attack, a gyroscope sensor will output a biased signal pattern along its three axes, with one axis dominating the output, such as the Z axis. Id.
Based on these findings, if one of ordinary skill modified Shi’s apparatus to use the output of a gyroscope instead of, or in addition to, the output of an accelerometer, one of ordinary skill would have reasonably expected that, in response to an ultrasonic attack, a microphone would produce a non-zero output (i.e., an inauthentic voice command) while a gyroscope would simultaneously produce a non-zero biased signal pattern. Accordingly, one of ordinary skill would have reasonably modified Shi’s apparatus to detect an ultrasonic attack by detecting corresponding, or time-aligned, non-zero microphone output signals and non-zero gyroscope output signals. One of ordinary skill would have reasonably recognized that adding the non-zero value analysis would provide an additional measure of authenticity to overcome uncertainty in synchronizing sensor outputs. See Shi at 89. For the foregoing reasons, the combination of the Shi, the Michalevsky and the Nashimoto references makes obvious all limitations of the claim.
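For illustration only, the idea of detecting corresponding, or time-aligned, non-zero microphone and gyroscope outputs can be sketched as a frame-by-frame activity check. The sketch is not taken from Shi, Michalevsky, Nashimoto or the application: the function names, the frame length and both thresholds are hypothetical, "non-zero" is approximated as a mean absolute level above a small threshold, and both streams are assumed to be already resampled to a common rate and aligned.

```python
from typing import List, Sequence


def active_frames(samples: Sequence[float], frame_len: int, threshold: float) -> List[bool]:
    """Mark each non-overlapping frame whose mean absolute value exceeds threshold."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        level = sum(abs(x) for x in frame) / frame_len
        flags.append(level > threshold)
    return flags


def coincident_activity(mic: Sequence[float], gyro: Sequence[float],
                        frame_len: int = 4,
                        mic_threshold: float = 0.05,    # hypothetical value
                        gyro_threshold: float = 0.05    # hypothetical value
                        ) -> List[int]:
    """Return indices of frames in which both streams are active at the same time."""
    mic_flags = active_frames(mic, frame_len, mic_threshold)
    gyro_flags = active_frames(gyro, frame_len, gyro_threshold)
    return [i for i, (m, g) in enumerate(zip(mic_flags, gyro_flags)) if m and g]
```

A caller would treat a non-empty result as one input to the detection decision; how that input is weighed against other measures is left open here.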
Claim 19 depends on claim 16 and further requires the following:
“wherein the detecting comprises performing amplitude envelope correlation using respective waveforms represented by the first data and the second data for generating a first parameter indicative of a similarity between the respective waveforms, and wherein the detection is based, at least in part, on the first parameter.”
The Shi-Michalevsky combination proposed in the obviousness rejection of claim 16, incorporated herein, would perform a spectral correlation analysis on frequency-domain converted versions of the outputs from a microphone and an accelerometer/gyroscope. The correlation would indicate similarity between the output of the microphone and the output of the gyroscope. Shi at 89–90. Accordingly, Shi does not describe performing an amplitude envelope correlation, for example, a time-domain correlation, since Shi performs a time-frequency correlation.
The Nashimoto reference, however, teaches and suggests detecting ultrasonic signals by using time-domain correlation to identify similarities between multiple sensors. Nashimoto at ¶¶ 59–63, FIGs.5–8. This would have reasonably suggested modifying Shi’s apparatus to perform a time-domain correlation between sensors (i.e., a microphone and a gyroscope) to aid in detecting ultrasonic signals. For the foregoing reasons, the combination of the Shi, the Michalevsky and the Nashimoto references makes obvious all limitations of the claim.
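As a hedged illustration of what a time-domain amplitude envelope correlation between two sensor waveforms might look like, the sketch below rectifies each waveform, smooths it with a moving average, and computes a Pearson correlation between the two envelopes. None of this is drawn from Nashimoto, Shi or the application: the function names and the smoothing window are hypothetical, and equal sampling rates and prior alignment are assumed.

```python
import math
from typing import List, Sequence


def amplitude_envelope(samples: Sequence[float], window: int) -> List[float]:
    """Rectify the waveform and smooth it with a trailing moving average."""
    rectified = [abs(x) for x in samples]
    envelope = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]
        envelope.append(running / min(i + 1, window))
    return envelope


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def envelope_correlation(mic: Sequence[float], gyro: Sequence[float],
                         window: int = 8) -> float:
    """Similarity parameter in [-1, 1] between the two amplitude envelopes."""
    return pearson(amplitude_envelope(mic, window), amplitude_envelope(gyro, window))
```

The returned value plays the role of a similarity parameter; any threshold applied to it would be a design choice outside this sketch.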
Claim 21 depends on claim 19 and further requires the following:
“wherein the detecting further comprises performing spectral analysis of frequency domain representations of respective waveforms represented by the first data and the second data for generating a second parameter indicative of similarity between the frequency domain representations, and wherein the detection is based, at least in part, on the first and second parameters meeting respective predetermined conditions.”
The Shi-Michalevsky combination proposed in the obviousness rejection of claim 16, incorporated herein, would perform a spectral correlation analysis on a frequency domain conversion of outputs from a microphone and an accelerometer/gyroscope to produce a second parameter. The second parameter would be a correlation that indicates similarity between the output of the microphone and the output of the gyroscope. Shi at 89–90. One of ordinary skill would have readily recognized that both the time-domain correlation suggested by Nashimoto, Nashimoto at ¶¶ 59–63, FIGs.5–8, and the time-frequency domain correlation suggested by Shi, would be usable together simultaneously as two metrics for detecting ultrasonic signals. For example, one of ordinary skill would have used the two metrics as two independent measures of similarity or would have combined the two metrics as weighted factors in determining a similarity score. In other words, one of ordinary skill would have reasonably recognized that adding the non-zero value analysis would provide an additional measure of authenticity to overcome uncertainty in synchronizing sensor outputs. See Shi at 89. For the foregoing reasons, the combination of the Shi, the Michalevsky and the Nashimoto references makes obvious all limitations of the claim.
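The two ways of using the metrics mentioned above, as independent measures or as weighted factors in a single score, can be sketched as follows. This is an illustrative assumption rather than anything disclosed in the cited references: the function name, the thresholds and the weight are hypothetical, and whether meeting the conditions indicates an attack or an authentic command is deliberately left to the caller.

```python
from typing import Dict, Union


def combined_decision(envelope_similarity: float,
                      spectral_similarity: float,
                      envelope_min: float = 0.6,     # hypothetical condition
                      spectral_min: float = 0.6,     # hypothetical condition
                      envelope_weight: float = 0.5   # hypothetical weight
                      ) -> Dict[str, Union[bool, float]]:
    """Evaluate two similarity parameters independently and as a weighted score."""
    # Independent measures: each parameter must meet its own condition.
    both_met = (envelope_similarity >= envelope_min
                and spectral_similarity >= spectral_min)
    # Weighted factors: a single score blending the two parameters.
    score = (envelope_weight * envelope_similarity
             + (1.0 - envelope_weight) * spectral_similarity)
    return {"both_conditions_met": both_met, "weighted_score": score}
```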
Claim 22 depends on claim 19 and further requires the following:
“wherein the detecting comprises one or more machine-learned models trained using training data comprising predetermined sets of first parameters known to be generated responsive to ultrasonic signals being transmitted to the user device, wherein the detection is based on an output of the one or more machine-learned models.”
The Shi reference does not describe the use of a trained machine learning model to detect ultrasonic attack signals. The Trella reference teaches and suggests training a machine learning model with known attack signals in order to recognize ultrasonic attacks based on signals picked up by a set of sensors. Trella at ¶¶ 20, 27–29, FIG.5. This would have reasonably suggested further modifying Shi’s apparatus to include a similar trained machine learning model for providing another means of detecting ultrasonic attacks. For the foregoing reasons, the combination of the Shi, the Michalevsky, the Nashimoto and the Trella references makes obvious all limitations of the claim.
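To make the notion of a model trained on labelled parameter sets concrete, the following is a minimal sketch of a nearest-centroid classifier. It is not Trella's model and is not described in any cited reference; the function names, the label strings and the feature layout are hypothetical, and a real system would likely use a richer model and far more training data.

```python
from typing import Dict, List, Sequence


def train_centroids(features: Sequence[Sequence[float]],
                    labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average the feature vectors of each label to obtain one centroid per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], vec: Sequence[float]) -> str:
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sqdist(centroid: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(centroids, key=lambda label: sqdist(centroids[label]))
```

Each feature vector would hold one or more similarity parameters computed for a recording, paired with a label stating whether that recording was made while an ultrasonic signal was being transmitted.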
Claim 25 depends on claim 19 and further requires the following:
“wherein the user device comprises an earphone,
“wherein the microphone is provided on an external part of the earphone and a second microphone is provided on an internal part of the earphone, and
“wherein the apparatus is further caused to:
“provide third data derived from a signal received by the second microphone; and
“determine a third parameter indicative of an energy ratio between waveforms represented by the first and third data,
“wherein the detection is further based, at least in part, on the third parameter.”
Claim 27 depends on claim 25 and further requires the following:
“wherein the detecting further comprises performing spectral analysis of frequency domain representations of respective waveforms represented by the first data and the second data for generating a second parameter indicative of similarity between the frequency domain representations, and wherein the detection is based, at least in part, on the third parameter and at least one of the first parameter or the second parameter meeting respective predetermined conditions.”
Claims 25 and 27 are discussed together. The Shi reference describes a method for authenticating a voice command in a smart home environment. Shi at 83, FIGs.1, 2.
One of ordinary skill in the art would have known that voice commands are common features in several other environments, including hearing assistance systems. For example, the Kamkar-Parsi reference describes a hearing device system that includes an earphone, an external, ambient microphone and an internal, canal microphone. Kamkar-Parsi at ¶¶ 11–13, 49–53, FIG.1. Kamkar-Parsi’s hearing device system further supports user voice commands. Id. The Kamkar-Parsi reference teaches and suggests, in a hearing device system, detecting a user’s own voice by comparing the energy output of an external, ambient microphone and the energy output of an internal, canal microphone. Id. at ¶¶ 12, 51.
Read in light of Shi, one of ordinary skill would have reasonably expected that a hearing device system with voice command support, like the one described by Kamkar-Parsi, is susceptible to ultrasonic attacks. Shi and Kamkar-Parsi describe different techniques for detecting and verifying user voice commands. Accordingly, it would have been obvious to combine those techniques in a hearing device embodiment. For example, as suggested by Shi, one of ordinary skill in the art would have performed a time-frequency correlation between the outputs of a microphone and an accelerometer/gyroscope as a factor for validating the authenticity of a user’s voice command; and, as suggested by Kamkar-Parsi, one of ordinary skill would have determined the energy ratio between an external microphone and an internal microphone to produce an additional factor for validating the authenticity of a user’s voice command. In other words, one of ordinary skill would have reasonably recognized that adding own voice detection would provide an additional measure of authenticity to overcome uncertainty in synchronizing sensor outputs. See Shi at 89. For the foregoing reasons, the combination of the Shi, the Michalevsky, the Nashimoto and the Kamkar-Parsi references makes obvious all limitations of the claims.
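An energy ratio between an external microphone and an internal microphone, of the kind discussed above, might be computed along the following lines. This is a sketch under stated assumptions and is not Kamkar-Parsi's method: the function names and the small `floor` constant (added to avoid division by zero) are hypothetical, and both waveforms are assumed to cover the same time span.

```python
import math
from typing import Sequence


def energy(samples: Sequence[float]) -> float:
    """Sum of squared sample values over the analysis window."""
    return sum(x * x for x in samples)


def energy_ratio_db(external: Sequence[float], internal: Sequence[float],
                    floor: float = 1e-12) -> float:
    """External-to-internal energy ratio in decibels.

    Positive values mean the external microphone captured more energy
    than the internal (canal) microphone over the same window.
    """
    return 10.0 * math.log10((energy(external) + floor) / (energy(internal) + floor))
```

For example, an external waveform with ten times the amplitude of the internal waveform yields a ratio of roughly 20 dB.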
Claim 20 depends on claim 16 and further requires the following:
“wherein the detecting comprises performing spectral analysis of frequency domain representations of respective waveforms represented by the first data and the second data for generating a second parameter indicative of similarity between the frequency domain representations, and wherein the detection is based, at least in part, on the second parameter.”
The Shi-Michalevsky combination proposed in the obviousness rejection of claim 16, incorporated herein, would perform a spectral correlation analysis on a frequency domain conversion of outputs from a microphone and an accelerometer/gyroscope to produce a second parameter. Shi at 89–90. The second parameter would be a correlation that indicates similarity between the output of the microphone and the output of the gyroscope. Id. For the foregoing reasons, the combination of the Shi and the Michalevsky references makes obvious all limitations of the claim.
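For illustration, a similarity parameter between frequency domain representations of two waveforms can be sketched with a plain discrete Fourier transform and a cosine similarity between magnitude spectra. This is a simplification and an assumption: Shi is described above as using a time-frequency correlation, whereas this sketch compares a single spectrum per signal, and the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(samples: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..n/2, computed directly (fine for short frames)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def spectral_similarity(mic: Sequence[float], gyro: Sequence[float]) -> float:
    """Similarity in [0, 1] between the magnitude spectra of the two signals."""
    n = min(len(mic), len(gyro))
    return cosine_similarity(magnitude_spectrum(mic[:n]), magnitude_spectrum(gyro[:n]))
```

Because only magnitudes are compared, the result is insensitive to amplitude scaling and to a phase offset between the two sensors.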
Claim 23 depends on claim 20 and further requires the following:
“wherein the detecting comprises one or more machine-learned models trained using training data comprising predetermined sets of second parameters known to be generated responsive to ultrasonic signals being transmitted to the user device, wherein the detection is based on an output of the one or more machine-learned models.”
The Shi reference does not describe the use of a trained machine learning model to detect ultrasonic attack signals. The Trella reference teaches and suggests training a machine learning model with known attack signals in order to recognize ultrasonic attacks based on signals picked up by a set of sensors. Trella at ¶¶ 20, 27–29, FIG.5. This would have reasonably suggested further modifying Shi’s apparatus to include a similar trained machine learning model for providing another means of detecting ultrasonic attacks. For the foregoing reasons, the combination of the Shi, the Michalevsky and the Trella references makes obvious all limitations of the claim.
Claim 26 depends on claim 20 and further requires the following:
“wherein the user device comprises an earphone, wherein the microphone is provided on an external part of the earphone and a second microphone is provided on an internal part of the earphone, and wherein the apparatus is further caused to: provide third data derived from a signal received by the second microphone; and determine a third parameter indicative of an energy ratio between waveforms represented by the first and third data, wherein the detection is further based, at least in part, on the third parameter.”
The Shi reference describes a method for authenticating a voice command in a smart home environment. Shi at 83, FIGs.1, 2.
One of ordinary skill in the art would have known that voice commands are common features in several other environments, including hearing assistance systems. For example, the Kamkar-Parsi reference describes a hearing device system that includes an earphone, an external, ambient microphone and an internal, canal microphone. Kamkar-Parsi at ¶¶ 11–13, 49–53, FIG.1. Kamkar-Parsi’s hearing device system further supports user voice commands. Id. The Kamkar-Parsi reference teaches and suggests, in a hearing device system, detecting a user’s own voice by comparing the energy output of an external, ambient microphone and the energy output of an internal, canal microphone. Id. at ¶¶ 12, 51.
Read in light of Shi, one of ordinary skill would have reasonably expected that a hearing device system with voice command support, like the one described by Kamkar-Parsi, is susceptible to ultrasonic attacks. Shi and Kamkar-Parsi describe different techniques for detecting and verifying user voice commands. Accordingly, it would have been obvious to combine those techniques in a hearing device embodiment. For example, as suggested by Shi, one of ordinary skill in the art would have performed a time-frequency correlation between the outputs of a microphone and an accelerometer/gyroscope as a factor for validating the authenticity of a user’s voice command; and, as suggested by Kamkar-Parsi, one of ordinary skill would have determined the energy ratio between an external microphone and an internal microphone to produce an additional factor for validating the authenticity of a user’s voice command. In other words, one of ordinary skill would have reasonably recognized that adding own voice detection would provide an additional measure of authenticity to overcome uncertainty in synchronizing sensor outputs. See Shi at 89. For the foregoing reasons, the combination of the Shi, the Michalevsky and the Kamkar-Parsi references makes obvious all limitations of the claim.
Claim 28 depends on claim 26 and further requires the following:
“wherein the detecting comprises one or more machine-learned models trained using training data comprising predetermined sets of third parameters and first parameters known to be generated responsive to ultrasonic signals being transmitted to the user device, wherein the detection is based on an output of the one or more machine-learned models.”
The Shi reference does not describe the use of a trained machine learning model to detect ultrasonic attack signals. The Trella reference teaches and suggests training a machine learning model with known attack signals in order to recognize ultrasonic attacks based on signals picked up by a set of sensors. Trella at ¶¶ 20, 27–29, FIG.5. This would have reasonably suggested further modifying Shi’s apparatus to include a similar trained machine learning model for providing another means of detecting ultrasonic attacks. For the foregoing reasons, the combination of the Shi, the Michalevsky, the Kamkar-Parsi and the Trella references makes obvious all limitations of the claim.
Claim 24 depends on claim 16 and further requires the following:
“wherein the user device comprises one of a digital assistant, smartphone, tablet computer, smart speaker, smart glasses, other smart home appliance, head-worn display device, an earphone or a hearing aid.”
Shi’s apparatus comprises a smart home appliance, such as a voice assistant. Shi at 84, 85, FIG.1. For the foregoing reasons, the combination of the Shi and the Michalevsky references makes obvious all limitations of the claim.
Claim 29 depends on claim 16 and further requires the following:
“wherein the second data represents mechanical oscillations with respect to two or more axes of the gyroscope.”
As shown in the obviousness rejection of claim 16, incorporated herein, the teachings of the Nashimoto and Michalevsky references reasonably suggest using a gyroscope as a sensor to detect voice and ultrasonic signals. And as shown in the obviousness rejection of claim 19, incorporated herein, the gyroscope will exhibit a particular pattern among its three axes. One of ordinary skill would have reasonably identified this pattern by monitoring the output of the gyroscope in order to detect an ultrasonic attack. For the foregoing reasons, the combination of the Shi and the Michalevsky references makes obvious all limitations of the claim.
Claim 30 depends on claim 16 and further requires the following:
“wherein the controlling comprises performing at least one of:
“disabling one or more loudspeakers of the user device;
“muting or attenuating an output signal to one or more loudspeakers of the user device, wherein the output signal is derived from at least some of the first data;
“disabling a processing function of the user device that receives as input at least some of the first data; or
“disabling a transmission function of the user device that transmits at least some of the first data.”
If an ultrasonic attack signal is detected, Shi’s apparatus determines that a detected voice command is inauthentic, and ignores the command. Shi at 90, FIG.2. For example, an attacker might attack a voice assistant with an inaudible, ultrasonic voice command requesting disclosure of sensitive information, making a purchase or disarming a smart lock. Id. at 83. When Shi’s apparatus detects the inauthentic voice command, Shi’s apparatus ignores the command, effectively blocking audible disclosure of the sensitive information, blocking transmission of purchase information and blocking processing of an unlocking operation. See id. at 83, 90, FIG.2. For the foregoing reasons, the combination of the Shi and the Michalevsky references makes obvious all limitations of the claim.
Claim 31 is drawn to “a method.” The following table illustrates the correspondence between the claimed method and the Shi reference.
Claim 31
The Shi Reference
“31. A method, comprising:
The Shi reference describes a cloud server, a voice assistant and a wearable device that together perform a method corresponding to the claimed method. Shi at 84, 85, FIG.1. Shi’s apparatus is configured to perform a method of detecting inaudible ultrasonic voice command attacks and to ignore such inauthentic voice commands. Id.
“providing first data derived from a signal received by a microphone of a user device;
The voice assistant records audio data with a microphone. Id. at 86.
“providing second data representing mechanical oscillations within a gyroscope of the user device;
The wearable device records mechanical oscillations imparted into an accelerometer by a user’s voice and by an ultrasonic attack. Id. Though Shi describes the wearable as including a gyroscope, Shi does not describe using the signal from the gyroscope to detect an ultrasonic attack. Id. at 85.
“detecting, based at least in part on the first data and the second data, that the signal received by the microphone comprises an ultrasonic signal; and
Shi likewise describes comparing the first audio data from the microphone to the second oscillation data from the accelerometer to detect an ultrasonic attack signal. Id. at 86, 89, FIGs.2, 5.
“responsive to the detection, controlling the user device for mitigating one or more events associated with receipt of the ultrasonic signal by the microphone.”
If an ultrasonic attack signal is detected, Shi’s apparatus determines that a detected voice command is inauthentic, and ignores the command. Id. at 90, FIG.2. For example, an attacker might attack a voice assistant with an inaudible, ultrasonic voice command requesting disclosure of sensitive information, making a purchase or disarming a smart lock. Id. at 83. When Shi’s apparatus detects the inauthentic voice command, Shi’s apparatus ignores the command, effectively blocking audible disclosure of the sensitive information, blocking transmission of purchase information and blocking processing of an unlocking operation. See id. at 83, 90, FIG.2.

Table 2
The table above shows that the Shi reference describes a method that corresponds closely to the claimed method. The Shi reference does not anticipate the claimed invention because Shi’s apparatus does not analyze the output of a gyroscope to detect ultrasonic signals. Rather, Shi leverages the ability of an accelerometer to react to human voice, measuring the similarity between the output of an accelerometer and the output of a microphone to discriminate between authentic user voice commands and inauthentic attacks, including ultrasonic voice command attacks. However, the Michalevsky reference teaches and suggests that a gyroscope, like an accelerometer, is responsive to voice signals. Michalevsky at §§ 2, 3. This would have reasonably suggested that a gyroscope output signal would exhibit characteristics similar to those of Shi’s accelerometer output signal. See Shi at 87–88 (describing the use of an accelerometer to measure voice). This would have further suggested detecting ultrasonic attacks by correlating the output of the gyroscope with the output of a microphone. For the foregoing reasons, the combination of the Shi and the Michalevsky references makes obvious all limitations of the claim.
Claim 32 depends on claim 31 and further requires the following:
“wherein the detecting is configured to detect that the signal received by the microphone comprises an ultrasonic signal based, at least in part, on identifying non-zero values of the first data and the second data for one or more corresponding time instances or time periods.”
The obviousness rejection of claim 16, incorporated herein, shows the obviousness of combining the teachings of Shi and Michalevsky to detect ultrasonic attacks by analyzing the outputs of multiple sensors, including a microphone and a gyroscope. The analysis would include performing a time-frequency correlation between a microphone output and an accelerometer output. See Shi at 89–90. In other words, Shi detects time-aligned, or corresponding, outputs in a microphone and an accelerometer, in order to discriminate between attacks and authentic voice commands. Id. Shi does not anticipate the identification of non-zero values of first and second data for one or more corresponding time instances.
The Shi reference teaches that in the presence of an ultrasonic attack, the microphone would produce a demodulated, non-zero audio output. Shi at 84. Shi further teaches that an accelerometer would not produce a substantial signal. Id. at 90.
The Nashimoto reference, like Shi, describes a set of sensors that generally correlate with each other during normal operations, but differ in abnormal, attack situations. Nashimoto at ¶¶ 59–63, FIGs.5–8. For example, an accelerometer, magnetic sensor and gyroscope tend to correlate with each other during normal operation. Id. However, during an attack, the correlation disappears and each sensor exhibits a different pattern of response. Id. For example, during an attack, a gyroscope sensor will output a biased signal pattern along its three axes, with one axis dominating the output, such as the Z axis. Id.
Based on these findings, if one of ordinary skill modified Shi’s apparatus to use the output of a gyroscope instead of, or in addition to, the output of an accelerometer, one of ordinary skill would have reasonably expected that, in response to an ultrasonic attack, a microphone would produce a non-zero output (i.e., an inauthentic voice command) while a gyroscope would simultaneously produce a non-zero biased signal pattern. Accordingly, one of ordinary skill would have reasonably modified Shi’s apparatus to detect an ultrasonic attack by detecting corresponding, or time-aligned, non-zero microphone output signals and non-zero gyroscope output signals. One of ordinary skill would have reasonably recognized that adding the non-zero value analysis would provide an additional measure of authenticity to overcome uncertainty in synchronizing sensor outputs. See Shi at 89. For the foregoing reasons, the combination of the Shi, the Michalevsky and the Nashimoto references makes obvious all limitations of the claim.
Claim 33 depends on claim 31 and further requires the following:
“wherein the detecting is configured to perform amplitude envelope correlation using respective waveforms represented by the first data and the second data for generating a first parameter indicative of a similarity between the respective waveforms, and wherein the detection is based, at least in part, on the first parameter.”
The Shi-Michalevsky combination proposed in the obviousness rejection of claim 16, incorporated herein, would perform a spectral correlation analysis on frequency-domain converted versions of the outputs from a microphone and an accelerometer/gyroscope. The correlation would indicate similarity between the output of the microphone and the output of the gyroscope. Shi at 89–90. Accordingly, Shi does not describe performing an amplitude envelope correlation, for example, a time-domain correlation, since Shi performs a time-frequency correlation.
The Nashimoto reference, however, teaches and suggests detecting ultrasonic signals by using time-domain correlation to identify similarities between multiple sensors. Nashimoto at ¶¶ 59–63, FIGs.5–8. This would have reasonably suggested modifying Shi’s apparatus to perform a time-domain correlation between sensors (i.e., a microphone and a gyroscope) to aid in detecting ultrasonic signals. For the foregoing reasons, the combination of the Shi, the Michalevsky and the Nashimoto references makes obvious all limitations of the claim.
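As a rough sketch of what a time-domain amplitude envelope correlation could look like, the example below rectifies each waveform, smooths it with a moving average, and computes a Pearson correlation between the two envelopes. The envelope method, the window length and all names are assumptions made for illustration; none of the cited references is stated to use this particular procedure.

```python
import math
from typing import List, Sequence


def amplitude_envelope(x: Sequence[float], window: int = 4) -> List[float]:
    """Rectify the waveform and smooth it with a trailing moving average."""
    rectified = [abs(v) for v in x]
    envelope = []
    for i in range(len(rectified)):
        chunk = rectified[max(0, i - window + 1): i + 1]
        envelope.append(sum(chunk) / len(chunk))
    return envelope


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input has no variance."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def envelope_similarity(first: Sequence[float], second: Sequence[float],
                        window: int = 4) -> float:
    """Hypothetical 'first parameter': similarity of the two amplitude envelopes."""
    return pearson(amplitude_envelope(first, window),
                   amplitude_envelope(second, window))


# Made-up data: a tone with a rising envelope, and an attenuated copy of it.
first_data = [math.sin(0.9 * n) * (n / 20) for n in range(40)]
second_data = [0.5 * v for v in first_data]
similarity = envelope_similarity(first_data, second_data)
```

Because the correlation is normalized, a scaled copy of the same waveform yields a similarity near 1, while a waveform with a flat envelope yields 0 under this sketch.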
Claim 34 depends on claim 31 and further requires the following:
“wherein the controlling comprises performing at least one of: disabling one or more loudspeakers of the user device; muting or attenuating an output signal to one or more loudspeakers of the user device, wherein the output signal is derived from at least some of the first data; disabling a processing function of the user device that receives as input at least some of the first data; or disabling a transmission function of the user device that transmits at least some of the first data.”
If an ultrasonic attack signal is detected, Shi’s apparatus determines that a detected voice command is inauthentic, and ignores the command. Shi at 90, FIG.2. For example, an attacker might attack a voice assistant with an inaudible, ultrasonic voice command requesting disclosure of sensitive information, making a purchase or disarming a smart lock. Id. at 83. When Shi’s apparatus detects the inauthentic voice command, Shi’s apparatus ignores the command, effectively blocking audible disclosure of the sensitive information, blocking transmission of purchase information and blocking processing of an unlocking operation. See id. at 83, 90, FIG.2. For the foregoing reasons, the combination of the Shi and the Michalevsky references makes obvious all limitations of the claim.
Claim 35 is drawn to “a non-transitory computer readable medium.” The following table illustrates the correspondence between the claimed medium and the Shi reference.
| Claim 35 | The Shi Reference |
| --- | --- |
| 35. A non-transitory computer readable medium comprising program instructions stored thereon for performing at least the following: | The Shi reference describes a cloud server, a voice assistant and a wearable device that together form an apparatus. Shi at 84, 85, FIG.1. Shi’s apparatus is configured to perform a method of detecting inaudible ultrasonic voice command attacks and to ignore such inauthentic voice commands. Id. One of ordinary skill would have immediately recognized that a cloud server, such as Shi’s cloud server, is a computer having a processor and a non-transitory computer readable medium that is programmed to store instructions that cause the processor to perform a method, namely, Shi’s ultrasonic signal detection method. |
| “providing first data derived from a signal received by a microphone of a user device; | The voice assistant records audio data with a microphone. Id. at 86. |
| “providing second data representing mechanical oscillations within a gyroscope of the user device; | The wearable device records mechanical oscillations imparted into an accelerometer by a user’s voice and by an ultrasonic attack. Id. Though Shi describes the wearable as including a gyroscope, Shi does not describe using the signal from the gyroscope to detect an ultrasonic attack. Id. at 85. |
| “detecting, based at least in part on the first data and the second data, that the signal received by the microphone comprises an ultrasonic signal; and | Shi likewise describes comparing the first audio data from the microphone to the second oscillation data from the accelerometer to detect an ultrasonic attack signal. Id. at 86, 89, FIGs.2, 5. |
| “responsive to the detection, controlling the user device for mitigating one or more events associated with receipt of the ultrasonic signal by the microphone.” | If an ultrasonic attack signal is detected, Shi’s apparatus determines that a detected voice command is inauthentic, and ignores the command. Id. at 90, FIG.2. For example, an attacker might attack a voice assistant with an inaudible, ultrasonic voice command requesting disclosure of sensitive information, making a purchase or disarming a smart lock. Id. at 83. When Shi’s apparatus detects the inauthentic voice command, Shi’s apparatus ignores the command, effectively blocking audible disclosure of the sensitive information, blocking transmission of purchase information and blocking processing of an unlocking operation. See id. at 83, 90, FIG.2. |

Table 3
The table above shows that the Shi reference describes a medium that corresponds closely to the claimed medium. The Shi reference does not anticipate the claimed invention because Shi’s apparatus does not analyze the output of a gyroscope to detect ultrasonic signals. Rather, Shi leverages the ability of an accelerometer to react to human voice to measure the similarity between the output of an accelerometer and a microphone to discriminate between authentic user voice commands and inauthentic attacks, including ultrasonic voice command attacks. However, the Michalevsky reference teaches and suggests that a gyroscope, like an accelerometer, is responsive to voice signals. Michalevsky at §§ 2, 3. This would have reasonably suggested that a gyroscope output signal would exhibit characteristics similar to those of Shi’s accelerometer output signal. See Shi at 87–88 (describing the use of an accelerometer to measure voice). This would have further suggested detecting ultrasonic attacks by correlating the output of the gyroscope with the output of a microphone. For the foregoing reasons, the combination of the Shi and the Michalevsky references makes obvious all limitations of the claim.
Summary
Claims 16–35 are rejected under at least one of 35 U.S.C. §§ 102 and 103 as being unpatentable over the cited prior art. In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 C.F.R. § 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
Response to Applicant’s Arguments
Applicant’s Reply at 7–10 (29 January 2026) includes comments pertaining to the rejections presented in this Office action. The Examiner has considered the comments, but they are unpersuasive of any error in the rejections.
Concerning claims 16, 31 and 35, Applicant comments that the Shi reference does not describe the claimed gyroscope. (Reply at 7–8). This point is uncontested. The rejection of claims 16, 31 and 35 is based on the combination of both Shi and Michalevsky, which suggests the use of a gyroscope in place of, or in addition to, Shi’s accelerometer.
Applicant further comments that the Shi reference does not describe detecting based on first and second data that the signal received by the microphone comprises an ultrasonic signal. (Reply at 8.) Applicant’s view of Shi is unpersuasive. Shi states:
“By leveraging wearable devices as a personal identity token, our solution captures users’ voice characteristics in the aerial speech vibration through a wearable’s accelerometer and compares them with the voice characteristics in the audio speech captured by a VA device’s microphone. When a legitimate user gives a command, the similarity between the voice characteristics obtained from the vibration domain and the audio domain should have high similarity. Otherwise, the command is from an adversary.”
Shi at p. 85, ¶ 1. Stated otherwise, Shi compares the output of a microphone and an accelerometer to determine whether a command is from a legitimate user or an adversary. When the comparison yields a high degree of similarity, the command is from a legitimate user; otherwise, the command is from an adversary. An adversary, in the context of Shi, includes ultrasonic attackers, as stated in the section of Shi explaining system performance evaluation. Shi at p. 90, ¶ 6:
“We also evaluate WearID during hidden voice command attacks. We collect 100 samples of 10 hidden voice commands replayed by a loudspeaker and compute the similarity between the microphone and accelerometer recordings. The results show that the similarities are approximately zero for the hidden voice commands, meaning that the commands can be well differentiated from the similarity scores of the legitimate users (i.e., around 0.5 for the Huawei Watch 2 Sport and 0.4 for the LG W150). We test the ability of WearID to defend against ultrasound attacks by replaying a signal sweeping across 15 ~25 kHz from a tweeter speaker. In the experiment, we do not observe any sound signals in the recorded accelerometer readings, which confirms that WearID is not vulnerable to ultrasound attacks.”
Shi at FIG.1 further illustrates hidden voice ultrasound attacks as a type of attack among a set of various attacks. Thus, Shi’s comparison is plainly designed to detect ultrasonic attacks and reject them. And while it is true that Shi’s methodology may detect other adversarial attacks, it is at least sensitive to ultrasonic attacks, and meets the scope of the claims, which do not describe detection in any more restrictive terms (e.g., requiring detection of only ultrasonic signals, or by performing specific detection sub-steps).
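The accept-or-reject comparison described in the quoted passages can be pictured with a short sketch. The similarity scores below are the approximate figures reported in the quoted passage of Shi (around 0.5 and 0.4 for legitimate users on the two watches, approximately zero for hidden voice commands). The threshold of 0.2 and the function name are hypothetical, chosen only to separate those figures, and are not stated in Shi.

```python
def is_legitimate(similarity: float, threshold: float = 0.2) -> bool:
    """Accept a command only when the microphone/accelerometer similarity is high.

    The default threshold is a hypothetical value, not one taken from Shi.
    """
    return similarity >= threshold


# Approximate similarity scores as quoted from Shi at p. 90.
observations = {
    "legitimate user, Huawei Watch 2 Sport": 0.5,
    "legitimate user, LG W150": 0.4,
    "hidden voice command": 0.0,
}

decisions = {label: is_legitimate(score) for label, score in observations.items()}
for label, accepted in decisions.items():
    print(label, "->", "accept" if accepted else "reject")
```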
Applicant comments that Shi’s accelerometer does not distinguish between ultrasonic attacks and distant audible signals. (Reply at 8–9). This observation, however, does not differentiate Shi from the claimed invention. Nothing in the claim language requires detecting ultrasonic signals in a way that cannot also detect audible impersonation attacks at a distance. It is not even clear that such a claim limitation is supported by the originally filed specification.
Applicant comments that Shi’s accelerometer and gyroscope are included in a wearable and that Shi’s microphone is included in a voice assistant (VA) device. (Reply at 9). This is true, but the claim only requires that the microphone and gyroscope are part of a “user device” without specifying the form of the user device. The term “user device” plainly and broadly encompasses a mechanism in a user’s environment, for example, a mechanism to capture sensor data needed for voice recognition and to reject ultrasonic attacks. Accordingly, Shi’s wearable and VA device together form a user device for sensing the acoustic signals and vibrations in the user’s environment. If Applicant desires to pursue this possible distinction with the Shi reference, Applicant is advised to amend the claims to specify the form of the user device, for example, by specifying the specific type of the user device.
Applicant comments that using signals from Michalevsky’s gyroscope instead of from Shi’s accelerometer would not have been obvious. (Reply at 9). Applicant comments that Shi’s accelerometer is useful precisely because it does not respond to ultrasonic signals while Michalevsky’s gyroscope would respond to high-frequency audible signals. This comment is unpersuasive. It assumes that just because some gyroscopes may produce some measurable response to ultrasonic signals, it would be non-obvious to use a gyroscope in place of Shi’s accelerometer. The first problem with this assumption is that it is based on information contained in Applicant’s specification, which was presumably not in the possession of one of ordinary skill in the art at the time this Application was effectively filed. The second problem is that it argues against the combination of references by trying to bodily incorporate the embodiments of the references without considering all that they teach to one of ordinary skill in the art. The Shi reference describes an embodiment where instead of comparing the signals of two or more microphones, the outputs of a microphone and a different type of sensor, namely, an accelerometer, are compared to detect imposter attacks, including inaudible ultrasonic attacks. Michalevsky’s teachings about the capability of gyroscopes to detect acoustic signals strongly indicate that a gyroscope is another type of sensor whose output may be compared to the output of a microphone or accelerometer to detect various types of attacks. In particular, Michalevsky teaches that gyroscopes are sensitive to acoustic signals, particularly around their resonance frequency, which tends to be in the ultrasonic range (i.e., greater than 20 kHz). Michalevsky at § 2.2.
With this suggestion in place, one of ordinary skill would have conducted routine experimentation to determine if a gyroscope is a suitable sensor by comparing its output to that of a microphone and/or accelerometer to gauge its suitability for detecting an attack, including an inaudible ultrasonic attack as suggested by Shi.
For the foregoing reasons, Applicant has not persuasively established any error in the Office action. All the rejections are maintained.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 C.F.R. § 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 C.F.R. § 1.17(a)) pursuant to 37 C.F.R. § 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
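The reply-period rules in the preceding paragraph reduce to simple calendar arithmetic, sketched below. The mailing date of March 18, 2026 is an assumption drawn from the date beneath the examiner's signature and should be checked against the official record. The helper names are hypothetical, and the sketch does not account for deadlines that fall on weekends or federal holidays, or for extension fees.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_expiry(
    mailing: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> date:
    """Expiry of the shortened statutory period under the rule stated above."""
    shortened = add_months(mailing, 3)
    statutory_max = add_months(mailing, 6)
    replied_within_two_months = (
        first_reply is not None and first_reply <= add_months(mailing, 2)
    )
    if (replied_within_two_months and advisory_mailed is not None
            and advisory_mailed > shortened):
        # The period runs to the advisory action, but never past six months.
        return min(advisory_mailed, statutory_max)
    return shortened


mailing_date = date(2026, 3, 18)  # assumed mailing date; verify before relying on it
print("two-month mark:   ", add_months(mailing_date, 2))
print("shortened period: ", shortened_period_expiry(mailing_date))
print("six-month maximum:", add_months(mailing_date, 6))
```

Under that assumed mailing date, the two-month mark would fall on May 18, 2026, the shortened period would end on June 18, 2026, and the six-month outer limit would be September 18, 2026.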
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WALTER F BRINEY III whose telephone number is (571)272-7513. The examiner can normally be reached M-F 8 am-4:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Carolyn Edwards, can be reached at 571-270-7136. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Walter F Briney III/

Walter F Briney III
Primary Examiner
Art Unit 2692

3/18/2026
Prosecution Timeline

Jan 18, 2024
Application Filed
Sep 05, 2025
Non-Final Rejection — §103
Jan 29, 2026
Response Filed
Mar 18, 2026
Final Rejection — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/940,871
Patent 12598444
Apparatus and Method for Rendering a Sound Scene Using Pipeline Stages
2y 5m to grant Granted Apr 07, 2026
18/152,065
Patent 12598442
AUTOMATIC LOUDSPEAKER DIRECTIVITY ADAPTATION
2y 5m to grant Granted Apr 07, 2026
18/562,609
Patent 12598412
Sound Signal Processing Method and Headset Device
2y 5m to grant Granted Apr 07, 2026
18/522,158
Patent 12587791
SOUND-GENERATING DEVICE
2y 5m to grant Granted Mar 24, 2026
18/223,871
Patent 12581245
LOUDSPEAKER
2y 5m to grant Granted Mar 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

3-4
Expected OA Rounds
65%
Grant Probability
69%
With Interview (+3.8%)
2y 12m
Median Time to Grant
Moderate
PTA Risk
Based on 540 resolved cases by this examiner. Grant probability derived from career allow rate.