Prosecution Insights
Last updated: May 29, 2026
Application No. 18/180,277

METHOD FOR MULTIFACTOR AUTHENTICATION USING BONE CONDUCTION AND AUDIO SIGNALS

Non-Final OA §103
Filed: Mar 08, 2023
Priority: Mar 08, 2022 (provisional 63/268,999, +2 more)
Examiner: POUDEL, SAMIKSHYA NMN
Art Unit: 2436
Tech Center: 2400 (Computer Networks)
Assignee: UNIVERSITY OF HOUSTON SYSTEM
OA Round: 3 (Non-Final)
Grant Probability: 47% (Moderate)
OA Rounds: 3-4
Est. Remaining: 0m
With Interview: 99%

Examiner Intelligence

Grants 47% of resolved cases
Career Allowance Rate: 47% (9 granted / 19 resolved), -10.6% vs TC avg
Strong +82% interview lift
Interview Lift: +81.8% (resolved cases with interview)
Typical timeline
Avg Prosecution: 2y 9m (19 currently pending)
Career history
Total Applications: 48 (across all art units)

Statute-Specific Performance

§101: 0.8% (-39.2% vs TC avg)
§103: 93.4% (+53.4% vs TC avg)
§102: 4.1% (-35.9% vs TC avg)
§112: 0.8% (-39.2% vs TC avg)
Based on career data from 19 resolved cases

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 08/28/2025 has been entered.

Response to Amendment

In the response filed on 08/28/2025, the applicant amended claims 1, 12, and 19. No claims were added.

Response to Arguments

With respect to the 35 U.S.C. §103 rejections: Applicant's arguments filed on 08/28/2025 have been received and entered. Applicant's arguments with respect to the newly amended independent claims (Remarks, “Claim Rejections - 35 USC § 103”, pages 7-9) have been considered but are moot because the claim amendments introduce new claim limitations that have not previously been considered. Therefore, the new §103 ground of rejection relies on new references in combination, as presented below.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1, 3, 6-11, and 19-22 are rejected under 35 U.S.C. 103 as being unpatentable over Fawaz (US 20180068671 A1) in view of Zhang (US 20210256979 A1). Regarding claim 1, Fawaz teaches a method for two-way authentication of a user, the method comprising: receiving a bone conduction signal from the user via one or more wearable devices (Fawaz, Fig 2, The method 200 may begin after the system records vibration data via an accelerometer device (Block 202) (i.e., Bone conduction signal), vibration data corresponds to recorded vibrations related to speech from a user associated with the accelerometer device. 
In an embodiment, a user may be associated with the accelerometer device if the user is wearing the device as a necklace, headset, or other suitable equipment (i.e., one or more wearable devices), [0036]) [Examiner interprets the system recording body vibration corresponding to the speech of a user, using an accelerometer worn as a necklace or headset device, as receiving a bone conduction signal from the user via one or more wearable devices]; receiving an audio signal from the user via a microphone separate from the one or more wearable devices, the audio signal to correspond with the bone conduction signal (Fawaz, The system collects various data associated with a voice command, such as a speech signal and vibrations of the head, neck and/or chest of a user corresponding to the speech signals. The data is collected from at least one accelerometer and one microphone disposed in one or more devices. For example, the accelerometer may be disposed in a necklace worn by a user and communicatively coupled to a smartphone, including a microphone, that implements a voice assistant. As the user speaks, the wearable device utilizes an accelerometer to record and then transmit the vibration data to the smart phone which is simultaneously recording the speech signals, [0017] The client device 130 may include a microphone 110 to record speech signals.
However, in other embodiments, the microphone 110 may be separate from the client device 130, [0029]) [Examiner interprets a system having an accelerometer in a necklace and a microphone in a smart phone (i.e., separate from the wearable device), with the smart phone recording speech signals (i.e., audio signal) while the wearable device simultaneously records vibration (i.e., the audio signal to correspond with the bone conduction signal), as the limitation above]; determining a consistency score for the audio signal in relation to the corresponding bone conduction signal, the consistency score comprising a number indicative of a probability that the audio signal and bone conduction signal originate from the user (Fawaz, the VA module 112 may be configured to run various algorithms to determine if the speech signals originated from the user corresponding to the vibration data…the VA module 112 may compare the speech signals to the vibration data to determine if they are correlated. Depending on the criteria of the system, the correlation may need to be within a certain percentage to authenticate the voice command. In other words, if the VA module 112 determines that the speech signals have a high enough correlation to the vibration data, then the speech signals can be attributed to the user and, thus, a voice command corresponding to the speech signals is considered authenticated, [0034] The matching algorithm may include receiving speech signals and vibration data along with the corresponding sampling frequencies. The matching algorithm may conclude by producing a decision value indicating whether there is a match between the speech signals and the vibration data (i.e. authenticate or deny), [0042] The final post-processing step may include measuring the signal similarity between the accelerometer and microphone signals by using the normalized cross correlation.
The system analyzes whether the normalized cross correlation has a significant peak and that the maximum value is outside the range [−0.4, 0.4]. This indicates that the two signals are included within each other as shown in the plot 540 of FIG. 5. In this case, the system will conclude that the resulting microphone signal matches the recorded vibration data, [0052]) [Examiner interprets that VA module calculating normalized cross correlation between speech and vibration signal and produces the decision value compared to thresholds which Under BRI numeric similarity/decision value (i.e., consistency score) for the audio signal in relation to the corresponding bone conduction signal and deciding whether the command is originated from the same user based on this value (i.e., indicative of probability) as limitation above]; in response to the consistency score being greater than or equal to a consistency threshold and in response to a difference in reception time of the bone conduction signal and the audio signal being less than a preselected time interval (Fawaz, the VA module 112 may synchronize the recorded speech signals with the recorded vibrations. The VA module 112 may perform the synchronization by aligning time shifts of the speech signal and vibration data such that there is maximum cross correlation between both signals. This operation may be vital for any comparisons, as the speech signals and the vibration data may not be received and/or recorded at the same time, [0033] the system records vibration data via an accelerometer device (Block 202)….At approximately the same time, the system may also record speech signals via a microphone (Block 204). If speech signals and vibration data are recorded at disparate points of time, then clearly the speech signals are disassociated from the vibration data and authentication automatically fails. 
Thus, the method 200 only continues with speech signals and vibration data which are recorded at substantially the same time, [0036-0037] Comparison of the speech signals to the vibration data may also include performing a correlation analysis to determine the correlation between the two. Acceptable correlation levels between the vibration data and speech signals may depend on a number of factors such as the quality of the recordings, the necessary level of security, usability/wearability, etc. If the speech signals and vibration data are sufficiently correlated, then the speech signal was originated from the user corresponding to the vibration data. Thus, the voice command corresponding to the speech signal is authenticated, [0040] The final post-processing step may include measuring the signal similarity between the accelerometer and microphone signals by using the normalized cross correlation. The system analyzes whether the normalized cross correlation has a significant peak and that the maximum value is outside the range [−0.4, 0.4]. This indicates that the two signals are included within each other as shown in the plot 540 of FIG. 5. 
In this case, the system will conclude that the resulting microphone signal matches the recorded vibration data, [0052]) [Examiner interprets that system using minimum correlation requirement such as correlation outside [-0.4,0.4] (i.e., consistency score), comparing against the threshold and requiring the signals to be recorded substantially at the same time and failing authentication when recorded at disparate times, alignment via time shifts teaches limitation above]; Although, Fawaz teaches VA module uses defined algorithm such as filters, per segment test, correlation to clean and evaluate the microphonic signal and decide whether it matches vibration from the user, [0033,0047,0052], explicit logic applied to the vibration signal such as rejecting segments not corresponding to a human speech or not matching speech segments (length filters, glottal pulses, pitch constraints) to verify if the remaining vibration data comes from the speaking user, [0024,0046,0049] and when both channels matches (i.e., both signals are satisfied), the command is authenticated for access, [0017,0041], Fawaz does not explicitly teach: verifying, using an audio conduction model (AC model), that the audio signal is associated with the user; verifying, using a bone conduction model (BC model), that the bone conduction signal is associated with the user; in response to verification of the audio signal and the bone conduction signal, enabling, for the user, access to a smart device However, Zhang teaches: verifying, using an audio conduction model (AC model), that the audio signal is associated with the user (Zhang, When the first voice component matches the first voiceprint model, it indicates that the voice information collected by the Bluetooth headset at this time is entered by the authorized user. A higher matching degree indicates more similarity between the voice component and the corresponding voiceprint model, and a higher possibility that a voicing user is the authorized user. 
[0092]); verifying, using a bone conduction model (BC model), that the bone conduction signal is associated with the user (Zhang, When the second voice component matches the second voiceprint model, it indicates that the voice information collected by the Bluetooth headset at this time is entered by the authorized user. A higher matching degree indicates more similarity between the voice component and the corresponding voiceprint model, and a higher possibility that a voicing user is the authorized user. [0092]); and in response to verification of the audio signal and the bone conduction signal, enabling, for the user, access to a smart device (Zhang, Bluetooth headset determine that the first voice component matches the first voiceprint model, and the second voice component matches the second voiceprint model and sends to the mobile phone (i.e., a smart device), an operation instruction corresponding to the voice information, for example, an unlock instruction, a power-off instruction, or an instruction for calling a specific contact. 
In this way, the mobile phone performs a corresponding operation based on the operation instruction, so that the user can control the mobile phone by using a voice (i.e., access to a smart device) , [0092]) [ Examiner interprets that checking if the voice components does not match their voice models to authorize users to perform operation on the mobile phone such as access as in response to verification of the audio signal and the bone conduction signal, enabling, for the user, access to a smart device] Therefore, it would have been obvious to PHOSITA before the effective filing date to modify the teaching of Fawaz to include a concept of verifying, using an audio conduction model (AC model), that the audio signal is associated with the user; verifying, using a bone conduction model (BC model), that the bone conduction signal is associated with the user; in response to verification of the audio signal and the bone conduction signal, enabling, for the user, access to a smart device as taught by Zhang for the purpose of determining that the first voice component matches the first voiceprint model, and the second voice component matches the second voiceprint model and sending to the mobile phone (i.e., a smart device), to perform a corresponding operation based on the operation instruction, so that the user can control the mobile phone by using a voice (i.e., access to a smart device), [Zhang: 0092]. Regarding claim 3, Fawaz and Zhang further teaches the method of claim 1, further comprising, prior to determining the consistency score, pre- processing the bone conduction signal (Fawaz, Beginning with the pre-processing step, in an example embodiment, the module 112 may re-sample both vibration data and speech signals to the same sampling rate while applying a low-pass filter to prevent aliasing. 
In some embodiments, the low-pass filter may be 4 Khz, 8 kHz or any other suitable frequency that preserves most of the acoustic features of the speech signals and while reducing the processing load,… algorithm may refine the raw data 320 and 330 by normalizing the magnitude of the data to have a maximum magnitude of unity, which necessitates removal of the spikes in the data. matching algorithm may continue pre-processing by identifying the energy envelope of the vibration data and respective application to the speech signal. Pre-processing of the matching algorithm concludes when the VA module 112 overlays the vibration data envelope to the speech signal so that it removes all portions of the speech signal that did not result from vibrations of the user, [0044-0047]) [Examiner interprets that system preprocessing operations on the vibration data such as low pass filter, normalization, envelope detection performed before the final cross correlation used for authentication as limitation above]; Regarding claim 6, Fawaz and Zhang further teaches the method of claim 1, further comprising, prior to determination of the consistency score and verification, prompting the user to submit an initial or enrollment (1) bone conduction signal and (2) audio signal (Zhang, Fig 4, S403: If being in the wearing state, the Bluetooth headset collects, by using the first voice sensor, a first voice component in the voice information entered by the user, and collects a second voice component in the voice information by using the second voice sensor (i.e. an initial or enrollment (1) bone conduction signal and (2) audio signal ), [0108]. For example, the first voice sensor is an air conduction microphone, and the second voice sensor is a bone conduction microphone. In a process of using the Bluetooth headset, the user may enter voice information “Xiao F, pay by using WeChat”. 
In this case, because the air conduction microphone is exposed in the air, the Bluetooth headset may receive, by using the air conduction microphone, a vibration signal (in other words, the first voice component in the voice information) generated by air vibration after the user makes a sound, In addition, because the bone conduction microphone can be in contact with an ear bone of the user through the skin, the Bluetooth headset may receive, by using the bone conduction microphone, a vibration signal (in other words, the second voice component in the voice information) generated by vibration of the ear bone and the skin after the user makes a sound, [0109] S405: The mobile phone separately performs voiceprint recognition on the first voice component and the second voice component, to obtain a first voiceprint recognition result corresponding to the first voice component and a second voiceprint recognition result corresponding to the second voice component, [0118]) [ Examiner interprets that collecting first and second voice component (i.e., an initial or enrollment (1) bone conduction signal and (2) audio signal ) by prompting user to enter voice before the performing an initial or enrollment (1) bone conduction signal and (2) audio signal (i.e., determination of a consistency score and verification) as prior to determination of a consistency score and verification, prompting the user to submit an initial or enrollment (1) bone conduction signal and (2) audio signal]. Regarding claim 7, Fawaz and Zhang further teaches the method of claim 6, further comprising: training the AC model with a plurality of responses prior to submission of the initial or enrollment audio signal, and training the AC model with the submitted initial or enrollment audio signal (Zhang, Voiceprint models of one or more authorized users are pre-stored on the mobile phone. 
Each authorized user has two voiceprint models, one is a first voiceprint model established based on a voice feature of the user collected when the air conduction microphone (in other words, the first voice sensor) works), [0119] and Fig 4, S403: If being in the wearing state, the Bluetooth headset collects, by using the first voice sensor, a first voice component in the voice information entered by the user (i.e., an initial or enrollment audio signal ), [0108]. S405: The mobile phone separately performs voiceprint recognition on the first voice component to obtain a first voiceprint recognition result corresponding to the first voice component [0118] The second phase is a process in which when the user uses a voice control function on the mobile phone for the first time, the first voiceprint model belonging to the user are established by entering the registration voice (i.e., an initial or enrollment audio signal ), [0121]) [Examiner interprets that user having their prestored voice print models on their phone as training the AC model with a plurality of responses prior to submission of an initial or enrollment audio signal and performing voiceprint recognition result after the collection of a first voice component in the voice information entered by the user (i.e. as an initial or enrollment audio signal) as training the AC model with a submitted initial or enrollment audio signal]. Regarding claim 8, Fawaz and Zhang further teaches the method of claim 6, wherein the BC model comprises a convolutional neural network (CNN) (Zhang, The first phase is establishing the first voiceprint model and the second voiceprint model. The first phase is a background model training phase. In the first phase, a background model of voiceprint recognition is established by using a machine learning algorithm such as a GMM (gaussian mixed model, Gaussian mixture model), an SVM (support vector machines, support vector machine), or a deep neural network framework. 
The mobile phone or the Bluetooth headset may establish, based on the background model and a registration voice entered by a user, a first voiceprint model and a second voiceprint model belonging to the user. The deep neural network framework includes but is not limited to a DNN (deep neural network, deep neural network) algorithm, an RNN (recurrent neural network, recurrent neural network) algorithm, an LSTM (long short-term memory, long short-term memory) algorithm, and the like, [0120]) [In light of the specification, Examiner interprets, under the broadest reasonable interpretation, the BC model as an image classifier based on a CNN, and interprets the second voiceprint model comprising a DNN as the BC model comprising a convolutional neural network (CNN); see the instant application's specification at [0056, 0057]]. Regarding claim 9, Fawaz and Zhang further teach the method of claim 8, further comprising training the CNN with a stored/pre-collected bone conduction dataset (Zhang, The first phase is a background model training phase. In the first phase, a developer may collect voices of related texts (for example, “Hello, Xiao E”) generated when a large quantity of speakers wearing the Bluetooth headset make a sound. Further, after performing filtering and noise reduction on the voices of the related texts, the mobile phone may extract an audio feature (for example, a time-frequency noise spectrum graph, or a gammatone-like spectrogram) in a background sound, and a background model of voiceprint recognition is established by using a machine learning algorithm such as a GMM (gaussian mixed model, Gaussian mixture model), an SVM (support vector machines, support vector machine), or a deep neural network framework. The mobile phone or the Bluetooth headset may establish, based on the background model and a registration voice entered by a user, a first voiceprint model and a second voiceprint model belonging to the user.
The deep neural network framework includes but is not limited to a DNN (deep neural network, deep neural network) algorithm, an RNN (recurrent neural network, recurrent neural network) algorithm, an LSTM (long short-term memory, long short-term memory) algorithm, and the like, [0120]) [ Examiner interprets that collecting voices of related texts generated when a large quantity of speakers wearing the Bluetooth headset make a sound for training a background model (i.e., the second voiceprint model) as training the CNN with a stored/pre-collected bone conduction dataset]. Regarding claim 10, Fawaz and Zhang further teaches the method of claim 1, further comprising, if one or more of the audio signal or the bone conduction signal are not verified, preventing the user from accessing the smart device (Fawaz, If the system has authenticated the voice command, the system may then transmit the voice command to a voice assistant (Block 212). If the speech signals and vibration data are not sufficiently correlated, then the voice command is not authenticated. The system may restart the method each time vibration data is recorded, thus enabling continuous voice authentication and added security for voice assistant devices, [0041]) [Examiner interprets that non authentication of user as preventing the user from accessing the smart device]. Regarding claim 11, Fawaz and Zhang further teaches the method of claim 1, further comprising, prior to determining the consistency score, delaying an earliest received of the bone conduction signal and the audio signal to align the bone conduction signal and the audio signal (Fawaz, the device may include one or more components to process recorded vibrations prior to transmitting the recordings. For example, the device may insert metadata such as timestamps to help the voice authentication module (discussed below) sync the recorded vibrations with recorded speech signals for comparison. 
Further, the device may edit the recorded vibrations to apply a filter, remove noise, adjust frequency, and/or otherwise prepare the recorded vibrations in a format compatible with other components of the system, [0023] the VA module 112 may synchronize the recorded speech signals with the recorded vibrations. The VA module 112 may perform the synchronization by aligning time shifts of the speech signal and vibration data such that there is maximum cross correlation between both signals. This operation may be vital for any comparisons, as the speech signals and the vibration data may not be received and/or recorded at the same time, [0033] Graphs 340 and 350 of FIG. 3 illustrate normalized speech signals and vibration data, respectively. Further, the graphs 340 and 350 have been aligned by finding the time shift that results in the maximum cross correlation of both graphs. In some embodiments, the graphs 340 and 350 may also be truncated so that the two graphs are on the same signal duration, [0045]) [Examiner interprets the system synchronizing via time shifts (i.e., delaying the earlier signal) to align the signals for maximum cross correlation as the limitation above].

Regarding claim 19, claim 19 recites subject matter commensurate with claim 1. Therefore, it is rejected for the same reasons.
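For orientation, the matching test that the Office Action attributes to Fawaz for claims 1 and 11 (align the two recordings at the time shift of maximum cross correlation, per [0033] and [0045], then require the normalized cross correlation peak to fall outside [-0.4, 0.4], per [0052]) can be illustrated with a minimal Python sketch. This is not code from Fawaz, Zhang, or the application. The function names, the `max_offset_s` search window (a stand-in for the claimed "preselected time interval"), and the synthetic test signals are all assumptions made for illustration.

```python
import math


def normalized_xcorr(a, b, max_lag):
    """Map each lag in [-max_lag, max_lag] to the normalized cross correlation
    of a[i] with b[i + lag]. Every value lies in [-1, 1]."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    a0 = [x - mean_a for x in a]
    b0 = [x - mean_b for x in b]
    norm = math.sqrt(sum(x * x for x in a0) * sum(x * x for x in b0))
    if norm == 0.0:
        # A flat signal carries no information, so report zero correlation.
        return {lag: 0.0 for lag in range(-max_lag, max_lag + 1)}
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, x in enumerate(a0):
            j = i + lag
            if 0 <= j < len(b0):
                total += x * b0[j]
        scores[lag] = total / norm
    return scores


def signals_match(mic, accel, sample_rate_hz, max_offset_s=0.5, threshold=0.4):
    """Return (matched, best_lag, peak).

    max_offset_s is a hypothetical parameter: lags beyond it are never
    searched, so recordings further apart than this cannot match.
    threshold=0.4 mirrors the [-0.4, 0.4] range quoted from Fawaz [0052].
    """
    max_lag = int(round(max_offset_s * sample_rate_hz))
    scores = normalized_xcorr(mic, accel, max_lag)
    best_lag = max(scores, key=lambda lag: abs(scores[lag]))
    peak = scores[best_lag]
    return abs(peak) > threshold, best_lag, peak


# Synthetic example: the accelerometer sees the same burst 3 samples later.
burst = [math.sin(2 * math.pi * 5 * n / 100) for n in range(40)]
mic = [0.0] * 10 + burst + [0.0] * 10
accel = [0.0] * 13 + burst + [0.0] * 7
matched, lag, peak = signals_match(mic, accel, sample_rate_hz=100, max_offset_s=0.1)
print(matched, lag)
```

In this synthetic case the sketch reports a match at a lag of 3 samples, which is the delay one would apply to the earlier signal to align the pair. A real system would also need the resampling, filtering, and envelope steps quoted from Fawaz [0044]-[0047], which are omitted here.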
Except for additional elements: Fawaz further teaches: a non-transitory machine-readable storage medium storing processor-executable instructions that, when executed by at least one processor, cause the at least one processor (Fawaz, The method 200 may include one or more functions and/or routines in the form of non-transitory computer-executable instructions that are stored in a tangible computer-readable storage medium and/or executed using a processor of a computing device (e.g., the client device 130 and/or the server 120), [0035]) to: Regarding claim 20, Fawaz and Zhang further teaches the non-transitory machine-readable storage medium of claim 19, wherein the smart device includes the microphone (Fawaz, The client device 130 (i.e., the smart device) may include a microphone 110 to record speech signals. However, in other embodiments, the microphone 110 may be separate from the client device 130. In these embodiments, the microphone 110 may be configured to transmit recorded speech signals to the client device 130, [0029]) Regarding claim 21, Fawaz and Zhang further teaches the non-transitory machine-readable storage medium of claim 20, wherein the bone conduction signal is received via wireless communication (Fawaz, The device may also be capable of transmitting recorded vibrations through wireless signals, [0021] The system may then transmit the recorded vibration data (Block 206). 
In one embodiment, the device housing the accelerometer may implement Bluetooth™ technology to transmit the recorded vibration data, [0031]) Regarding claim 22, Fawaz and Zhang further teaches the non-transitory machine-readable storage medium of claim 19, wherein the consistency threshold is based on similarities between bone conduction signals and audio signals that indicate the bone conduction signal and the audio signal are from the user (Fawaz, the VA module 112 may be configured to run various algorithms to determine if the speech signals originated from the user corresponding to the vibration data…the VA module 112 may compare the speech signals to the vibration data to determine if they are correlated. Depending on the criteria of the system, the correlation may need to be within a certain percentage to authenticate the voice command. In other words, if the VA module 112 determines that the speech signals have a high enough correlation to the vibration data, then the speech signals can be attributed to the user and, thus, a voice command corresponding to the speech signals is considered authenticated, [0034] The matching algorithm may include receiving speech signals and vibration data along with the corresponding sampling frequencies. The matching algorithm may conclude by producing a decision value indicating whether there is a match between the speech signals and the vibration data (i.e. authenticate or deny), [0042] The final post-processing step may include measuring the signal similarity between the accelerometer and microphone signals by using the normalized cross correlation. The system analyzes whether the normalized cross correlation has a significant peak and that the maximum value is outside the range [−0.4, 0.4]. This indicates that the two signals are included within each other as shown in the plot 540 of FIG. 5. 
In this case, the system will conclude that the resulting microphone signal matches the recorded vibration data, [0052]) [Examiner interprets that VA module calculating normalized cross correlation between speech and vibration signal and produces the decision value compared to thresholds which Under BRI numeric similarity/decision value (i.e., consistency score) for the audio signal in relation to the corresponding bone conduction signal and deciding whether the command is originated from the same user based on this value (i.e., indicative of probability) as limitation above]; Claims 12-18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (US 20210256979 A1) in view of Fawaz (US 20180068671 A1). Regarding claim 12, Zhang teaches a method for two-way authentication of a user (Zhang, voice control method to complete a series of operations such as user identity authentication, mobile phone unlocking, and enabling a function of the mobile phone, [0144]), the method comprising: prompting the user to submit initial or enrollment (a) audio signals and (b) bone conduction signals (Zhang, Fig 4, S403: If being in the wearing state, the Bluetooth headset collects, by using the first voice sensor, a first voice component in the voice information entered by the user, and collects a second voice component in the voice information by using the second voice sensor (i.e., an initial or enrollment (1) bone conduction signal and (2) audio signal ), [0108]. For example, the first voice sensor is an air conduction microphone, and the second voice sensor is a bone conduction microphone. In a process of using the Bluetooth headset, the user may enter voice information “Xiao F, pay by using WeChat”. 
In this case, because the air conduction microphone is exposed in the air, the Bluetooth headset may receive, by using the air conduction microphone, a vibration signal (in other words, the first voice component in the voice information) generated by air vibration after the user makes a sound. In addition, because the bone conduction microphone can be in contact with an ear bone of the user through the skin, the Bluetooth headset may receive, by using the bone conduction microphone, a vibration signal (in other words, the second voice component in the voice information) generated by vibration of the ear bone and the skin after the user makes a sound, [0109]) [Examiner interprets collecting the first and second voice components (i.e., an initial or enrollment (1) bone conduction signal and (2) audio signal) by prompting the user to enter voice as prompting the user to submit initial or enrollment (a) audio signals and (b) bone conduction signals]; updating an audio conduction model (AC model) and a bone conduction model (BC model) based on the received initial or enrollment (a) audio signals and (b) bone conduction signals (Zhang, Voiceprint models of one or more authorized users may be pre-stored on the mobile phone.
Each authorized user has two voiceprint models, one is a first voiceprint model established based on a voice feature of the user collected when the air conduction microphone (in other words, the first voice sensor) works, and the other is a second voiceprint model established based on a voice feature of the user collected when the bone conduction microphone (in other words, the second voice sensor) works, [0119], S405: The mobile phone separately performs voiceprint recognition on the first voice component and the second voice component, to obtain a first voiceprint recognition result corresponding to the first voice component and a second voiceprint recognition result corresponding to the second voice component, [0118] The second phase is a process in which when the user uses a voice control function on the mobile phone for the first time, the first voiceprint model and the second voiceprint model belonging to the user are established by entering the registration voice(i.e., an initial or enrollment (1) bone conduction signal and (2) audio signal ), [0121]) [Examiner interprets that collecting registration voice and performing voiceprint recognition result after the collection of a first voice component in the voice information entered by the user (i.e. as an initial or enrollment audio signal) as updating an audio conduction model (AC model) and a bone conduction model (BC model) based on the received initial or enrollment (a) audio signals and (b) bone conduction signals]; after reception of the initial or enrollment (a) audio signals and (b) bone conduction signals (Zhang, Fig 4, , S403: If being in the wearing state, the Bluetooth headset collects, by using the first voice sensor, a first voice component in the voice information entered by the user, and collects a second voice component in the voice information by using the second voice sensor (i.e., an initial or enrollment (1) bone conduction signal and (2) audio signal ), [0108]. 
For example, the first voice sensor is an air conduction microphone, and the second voice sensor is a bone conduction microphone. In a process of using the Bluetooth headset, the user may enter voice information “Xiao F, pay by using WeChat”. In this case, because the air conduction microphone is exposed in the air, the Bluetooth headset may receive, by using the air conduction microphone, a vibration signal (in other words, the first voice component in the voice information) generated by air vibration after the user makes a sound, In addition, because the bone conduction microphone can be in contact with an ear bone of the user through the skin, the Bluetooth headset may receive, by using the bone conduction microphone, a vibration signal (in other words, the second voice component in the voice information) generated by vibration of the ear bone and the skin after the user makes a sound, [0109]); receiving a bone conduction signal from the user via one or more wearable devices (Zhang, Fig 2, the second voice sensor 202 of wearable device 11 collects the voice information sent by the user after bone propagation (i.e., a bone conduction signal), [0071]) Fig 4, S404: The Bluetooth headset sends the second voice component to the mobile phone by using the Bluetooth connection. 
S405: The mobile phone separately performs voiceprint recognition on the second voice component, to obtain a second voiceprint recognition result corresponding to the second voice component, [0118]) [Examiner interprets the mobile phone receiving the second voice component (i.e., a bone conduction signal) from the user via wearable device 11, which collects the bone conduction signal for identity authentication, as receiving a bone conduction signal from the user via one or more wearable devices]; receiving an audio signal from the user via a microphone, the audio signal corresponding to the bone conduction signal (Zhang, the first voice sensor 201 is an air conduction microphone, and when the user wearing the wearable device 11 speaks, the wearable device 11 may collect, by using the first voice sensor 201, voice information sent by the user after air propagation, [0071], Fig 4, S404: The Bluetooth headset sends the first voice component (i.e., audio signal) to the mobile phone by using the Bluetooth connection. S405: The mobile phone separately performs voiceprint recognition on the first voice component to obtain a first voiceprint recognition result corresponding to the first voice component, [0118]) [Examiner interprets the mobile phone receiving the air conduction signal (i.e., audio signal) from the user via the microphone, which collects the air conduction signal for identity authentication, as receiving an audio signal from the user via a microphone, the audio signal corresponding to the bone conduction signal]; determining a consistency score for the audio signal in relation to the corresponding bone conduction signal (Zhang, the Bluetooth headset (i.e., the wearable device 11) separately collects the voice information by using the externally disposed first voice sensor 201 and the internally disposed second voice sensor 202. 
For example, the voice information collected by the first voice sensor 201 is a first voice component (i.e., the audio signal), and the voice information collected by the second voice sensor 202 is a second voice component (i.e., the bone conduction signal), [0090] , the Bluetooth headset perform voiceprint recognition on the first voice component and the second voice component, to obtain a first voiceprint recognition result corresponding to the first voice component and a second voiceprint recognition result corresponding to the second voice component [0091] and When the first voice component matches the first voiceprint model, and the second voice component matches the second voiceprint model, it indicates that the voice information collected by the Bluetooth headset at this time is entered by the authorized user. For example, the Bluetooth headset calculates, by using a specific algorithm, a first degree of matching between the first voice component and the first voiceprint model and a second degree of matching between the second voice component and the second voiceprint model. A higher matching degree indicates more similarity between the voice component and the corresponding voiceprint model, and a higher possibility that a voicing user is the authorized user, when an average value of the first matching degree and the second matching degree (i.e., the average consistency score of first component and second component) is greater than 80 scores (i.e., consistency threshold) , the Bluetooth headset may determine that the first voice component matches the first voiceprint model, and the second voice component matches the second voiceprint model, [0092]) [Under BRI, Examiner interprets that “consistency score” between two signals need only be a single quantitative measure that reflects how well they agree and computing a first matching degree between the air conduction (i.e. 
audio) signal and the first voiceprint model and a second matching degree between the bone conduction signal and the second (bone conduction) voiceprint model, then taking an average of those two degrees (i.e., a single consistency score) and comparing it to a threshold as determining a consistency score for the audio signal in relation to the corresponding bone conduction signal]; in response to the consistency score being greater than or equal to a consistency threshold (Zhang, when an average value of the first matching degree and the second matching degree (i.e., the average consistency score of first component and second component) is greater than 80 scores (i.e., consistency threshold), the Bluetooth headset may determine that the first voice component matches the first voiceprint model, and the second voice component matches the second voiceprint model, [0092]): verifying, using an audio conduction model (AC model), that the audio signal is associated with the user (Zhang, When the first voice component matches the first voiceprint model, it indicates that the voice information collected by the Bluetooth headset at this time is entered by the authorized user. A higher matching degree indicates more similarity between the voice component and the corresponding voiceprint model, and a higher possibility that a voicing user is the authorized user, [0092]); verifying, using a bone conduction model (BC model), that the bone conduction signal is associated with the user (Zhang, When the second voice component matches the second voiceprint model, it indicates that the voice information collected by the Bluetooth headset at this time is entered by the authorized user. A higher matching degree indicates more similarity between the voice component and the corresponding voiceprint model, and a higher possibility that a voicing user is the authorized user. 
[0092]); and in response to verification of the audio signal and the bone conduction signal, enabling, for the user, access to a smart device (Zhang, the Bluetooth headset determines that the first voice component matches the first voiceprint model, and the second voice component matches the second voiceprint model, and sends to the mobile phone (i.e., a smart device) an operation instruction corresponding to the voice information, for example, an unlock instruction, a power-off instruction, or an instruction for calling a specific contact. In this way, the mobile phone performs a corresponding operation based on the operation instruction, so that the user can control the mobile phone by using a voice (i.e., access to a smart device), [0092]) [Examiner interprets checking whether the voice components match their voiceprint models in order to authorize the user to perform an operation on the mobile phone, such as access, as in response to verification of the audio signal and the bone conduction signal, enabling, for the user, access to a smart device]; Although the average value of the first matching degree and the second matching degree (i.e., the average consistency score of first component and second component), which is compared against 80 scores (i.e., consistency threshold), can be considered a consistency score for the audio signal in relation to the corresponding bone conduction signal under BRI [0092], Zhang does not explicitly teach: receiving an audio signal from the user via a microphone separate from the one or more wearable devices; determining a consistency score for the audio signal in relation to the corresponding bone conduction signal, the consistency score comprising a number indicative of a probability that the audio signal and bone conduction signal originate from the user; in response to the consistency score being greater than or equal to a consistency threshold and in response to a difference in reception time of the bone conduction signal and the audio signal being 
less than a preselected time interval. However, Fawaz teaches: receiving an audio signal from the user via a microphone separate from the one or more wearable devices (Fawaz, The system collects various data associated with a voice command, such as a speech signal and vibrations of the head, neck and/or chest of a user corresponding to the speech signals. The data is collected from at least one accelerometer and one microphone disposed in one or more devices. For example, the accelerometer may be disposed in a necklace worn by a user and communicatively coupled to a smartphone, including a microphone, that implements a voice assistant. As the user speaks, the wearable device utilizes an accelerometer to record and then transmit the vibration data to the smart phone which is simultaneously recording the speech signals, [0017] The client device 130 may include a microphone 110 to record speech signals. However, in other embodiments, the microphone 110 may be separate from the client device 130, [0029]) [Examiner interprets the system having an accelerometer in a necklace and a microphone in a smart phone (i.e., a microphone separate from the wearable device), and recording speech signals (i.e., audio signal) while the wearable device simultaneously records vibration (i.e., the audio signal corresponding to the bone conduction signal), as the limitation above]; determining a consistency score for the audio signal in relation to the corresponding bone conduction signal, the consistency score comprising a number indicative of a probability that the audio signal and bone conduction signal originate from the user (Fawaz, the VA module 112 may be configured to run various algorithms to determine if the speech signals originated from the user corresponding to the vibration data…the VA module 112 may compare the speech signals to the vibration data to determine if they are correlated. 
Depending on the criteria of the system, the correlation may need to be within a certain percentage to authenticate the voice command. In other words, if the VA module 112 determines that the speech signals have a high enough correlation to the vibration data, then the speech signals can be attributed to the user and, thus, a voice command corresponding to the speech signals is considered authenticated, [0034] The matching algorithm may include receiving speech signals and vibration data along with the corresponding sampling frequencies. The matching algorithm may conclude by producing a decision value indicating whether there is a match between the speech signals and the vibration data (i.e. authenticate or deny), [0042] The final post-processing step may include measuring the signal similarity between the accelerometer and microphone signals by using the normalized cross correlation. The system analyzes whether the normalized cross correlation has a significant peak and that the maximum value is outside the range [−0.4, 0.4]. This indicates that the two signals are included within each other as shown in the plot 540 of FIG. 5. 
In this case, the system will conclude that the resulting microphone signal matches the recorded vibration data, [0052]) [Examiner interprets that VA module calculating normalized cross correlation between speech and vibration signal and produces the decision value compared to thresholds which Under BRI numeric similarity/decision value (i.e., consistency score) for the audio signal in relation to the corresponding bone conduction signal and deciding whether the command is originated from the same user based on this value (i.e., indicative of probability) as limitation above]; in response to the consistency score being greater than or equal to a consistency threshold and in response to a difference in reception time of the bone conduction signal and the audio signal being less than a preselected time interval (Fawaz, the VA module 112 may synchronize the recorded speech signals with the recorded vibrations. The VA module 112 may perform the synchronization by aligning time shifts of the speech signal and vibration data such that there is maximum cross correlation between both signals. This operation may be vital for any comparisons, as the speech signals and the vibration data may not be received and/or recorded at the same time, [0033] the system records vibration data via an accelerometer device (Block 202)….At approximately the same time, the system may also record speech signals via a microphone (Block 204). If speech signals and vibration data are recorded at disparate points of time, then clearly the speech signals are disassociated from the vibration data and authentication automatically fails. Thus, the method 200 only continues with speech signals and vibration data which are recorded at substantially the same time, [0036-0037] Comparison of the speech signals to the vibration data may also include performing a correlation analysis to determine the correlation between the two. 
Acceptable correlation levels between the vibration data and speech signals may depend on a number of factors such as the quality of the recordings, the necessary level of security, usability/wearability, etc. If the speech signals and vibration data are sufficiently correlated, then the speech signal was originated from the user corresponding to the vibration data. Thus, the voice command corresponding to the speech signal is authenticated, [0040] The final post-processing step may include measuring the signal similarity between the accelerometer and microphone signals by using the normalized cross correlation. The system analyzes whether the normalized cross correlation has a significant peak and that the maximum value is outside the range [−0.4, 0.4]. This indicates that the two signals are included within each other as shown in the plot 540 of FIG. 5. In this case, the system will conclude that the resulting microphone signal matches the recorded vibration data, [0052]) [Examiner interprets that system using minimum correlation requirement such as correlation outside [-0.4,0.4] (i.e., consistency score), comparing against the threshold and requiring the signals to be recorded substantially at the same time and failing authentication when recorded at disparate times, alignment via time shifts teaches limitation above]; Therefore, it would have been obvious to PHOSITA before the effective filing date to modify the teaching of Zhang to include a concept of receiving an audio signal from the user via a microphone separate from the one or more wearable devices; determining a consistency score for the audio signal in relation to the corresponding bone conduction signal, the consistency score comprising a number indicative of a probability that the audio signal and bone conduction signal originate from the user; in response to the consistency score being greater than or equal to a consistency threshold and in response to a difference in reception time of the bone 
conduction signal and the audio signal being less than a preselected time interval as taught by Fawaz for the purpose of performing the synchronization by aligning time shifts of the speech signal and vibration data such that there is maximum cross correlation between both signals [Fawaz:0033], comparing speech signals and vibration data to determine sufficiently correlated, then the speech signal was originated from the user corresponding to the vibration data for authentication [Fawaz:0040] and improving accuracy of the authentication of voice commands by, for example, employing machine learning methods to dynamically learn the cut-off thresholds [Fawaz:0054]. Regarding claim 13, Zhang and Fawaz further teaches the method of claim 12, wherein the initial or enrollment audio signal is received via the microphone (Zhang, Fig 4, S403: If being in the wearing state, the Bluetooth headset collects, by using the first voice sensor, a first voice component in the voice information entered by the user, and collects a second voice component in the voice information by using the second voice sensor (i.e. an initial or enrollment (1) bone conduction signal and (2) audio signal ), [0108]. For example, the first voice sensor is an air conduction microphone, In a process of using the Bluetooth headset, the user may enter voice information “Xiao F, pay by using WeChat”. In this case, because the air conduction microphone is exposed in the air, the Bluetooth headset may receive, by using the air conduction microphone, a vibration signal (in other words, the first voice component in the voice information) generated by air vibration after the user makes a sound, [0109]) [ Examiner interprets that collecting first voice component (i.e., an initial or enrollment audio signal ) by prompting user to enter voice via air conduction microphone as the initial or enrollment audio signal is received via the microphone]. 
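For orientation, the two-part check that the rejection of claim 12 attributes to Fawaz (a normalized cross-correlation compared against a threshold, gated by how close in time the two recordings were made) can be sketched as follows. This is a minimal illustration only, not Fawaz's implementation and not the claimed method: the function names, the `max_time_gap` and `max_lag` parameters, and the equal-sample-rate assumption are hypothetical, and the 0.4 cut-off is simply borrowed from the [−0.4, 0.4] range quoted from Fawaz [0052].

```python
import math


def peak_normalized_xcorr(audio, vibration, max_lag):
    """Return (peak, lag): the largest-magnitude normalized cross-correlation
    between two equal-rate signals over lags in [-max_lag, max_lag]."""
    a_mean = sum(audio) / len(audio)
    v_mean = sum(vibration) / len(vibration)
    a = [x - a_mean for x in audio]          # remove DC offset
    v = [x - v_mean for x in vibration]
    norm = math.sqrt(sum(x * x for x in a) * sum(x * x for x in v))
    if norm == 0.0:                          # a flat signal carries no information
        return 0.0, 0
    best_peak, best_lag = 0.0, 0
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(v):              # only the overlapping samples
                total += x * v[j]
        value = total / norm
        if abs(value) > abs(best_peak):
            best_peak, best_lag = value, lag
    return best_peak, best_lag


def signals_consistent(audio, vibration, audio_time, vibration_time,
                       corr_threshold=0.4, max_time_gap=0.5, max_lag=10):
    """Hypothetical gate: fail if the recordings are too far apart in time,
    otherwise require the correlation peak to fall outside +/- corr_threshold."""
    if abs(audio_time - vibration_time) >= max_time_gap:
        return False
    peak, _ = peak_normalized_xcorr(audio, vibration, max_lag)
    return abs(peak) > corr_threshold
```

In this sketch the correlation peak plays the role of the numeric "consistency score", and the timestamp comparison plays the role of the "preselected time interval"; how either value would be chosen in practice is left open.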
Fawaz further teaches: the audio signal is received via the microphone, wherein the smart device includes the microphone (Fawaz, The client device 130 (i.e., smart device) may include a microphone 110 to record speech signals. However, in other embodiments, the microphone 110 may be separate from the client device 130. In these embodiments, the microphone 110 may be configured to transmit recorded speech signals to the client device 130, [0029]). The same motivation as set forth for claim 12 applies. Regarding claim 14, Zhang and Fawaz further teach the method of claim 13, wherein the initial or enrollment bone conduction signals are received from one of one or more wearable devices (Zhang, Fig 2, the second voice sensor 202 of wearable device 11 collects the voice information sent by the user after bone propagation (i.e., a bone conduction signal), [0071], Fig 4, S404: The Bluetooth headset sends the second voice component to the mobile phone by using the Bluetooth connection. S405: The mobile phone separately performs voiceprint recognition on the second voice component, to obtain a second voiceprint recognition result corresponding to the second voice component, [0118]) [Examiner interprets the mobile phone receiving the second voice component (i.e., a bone conduction signal) from the user via wearable device 11, which collects the bone conduction signal for identity authentication, as the initial or enrollment bone conduction signals being received from one of one or more wearable devices]. Regarding claim 15, Zhang and Fawaz further teach the method of claim 12, further comprising prompting the user to submit initial or enrollment (a) audio signals and (b) bone conduction signals for each of the one or more wearable devices (Zhang, Fig 4, S403: If being in the wearing state, the Bluetooth headset collects, by using the first voice sensor, a first voice component in the voice information entered by the user, and collects a second voice component in the voice information by using the second voice sensor (i.e. 
an initial or enrollment (1) bone conduction signal and (2) audio signal ), [0108]. For example, the first voice sensor is an air conduction microphone, and the second voice sensor is a bone conduction microphone. In a process of using the Bluetooth headset, the user may enter voice information “Xiao F, pay by using WeChat”. In this case, because the air conduction microphone is exposed in the air, the Bluetooth headset may receive, by using the air conduction microphone, a vibration signal (in other words, the first voice component in the voice information) generated by air vibration after the user makes a sound, In addition, because the bone conduction microphone can be in contact with an ear bone of the user through the skin, the Bluetooth headset may receive, by using the bone conduction microphone, a vibration signal (in other words, the second voice component in the voice information) generated by vibration of the ear bone and the skin after the user makes a sound, [0109]) [ Examiner interprets that collecting first and second voice component (i.e., an initial or enrollment (1) bone conduction signal and (2) audio signal ) by prompting user to enter voice as prompting the user to submit an initial or enrollment (1) bone conduction signal and (2) audio signal for each of the one or more wearable devices]. Regarding claim 16, Zhang and Fawaz teaches the method of claim 15, wherein the initial or enrollment (a) audio signals and (b) bone conduction signals include one or more specific phrases (Zhang, In the first phase, a. developer may collect voices of related texts (for example, “Hello, Xiao E”) (i.e. 
one or more specific phrases) generated when a large quantity of speakers wearing the Bluetooth headset make a sound [0120] and when an authorized user 1 uses, for the first time, a voice assistant APP installed on the mobile phone, the voice assistant APP may prompt the user to wear a Bluetooth headset and say a registration voice “Hello Xiao E”, [0121]) [ In light of specification, Examiner interprets that including specific phrases such as “Hello, Xiao E” for the initial or enrollment (a) audio signals and (b) bone conduction signals and prompting user to say the specific phrase during the identity verification as the initial or enrollment (a) audio signals and (b) bone conduction signals include one or more specific phrases, see instant application at spec [0068-0071]]. Regarding claim 17, Zhang and Fawaz teaches the method of claim 12, wherein the BC model comprises a convolutional neural network (CNN), and wherein the CNN is trained using a stored/pre-collected bone conduction dataset to generate corresponding initial embedded bone conduction vectors (Zhang, The first phase is establishing the first voiceprint model and the second voiceprint model. The first phase is a background model training phase. In the first phase, a developer may collect voices of related texts (for example, “Hello, Xiao E”) generated when a large quantity of speakers wearing the Bluetooth headset make a sound. Further, after performing filtering and noise reduction on the voices of the related texts, the mobile phone may extract an audio feature (for example, a time-frequency noise spectrum graph, or a gammatone-like spectrogram) (i.e., initial embedded bone conduction vectors) in a background sound, a background model of voiceprint recognition is established by using a machine learning algorithm such as a GMM (gaussian mixed model, Gaussian mixture model), an SVM (support vector machines, support vector machine), or a deep neural network framework. 
The mobile phone or the Bluetooth headset may establish, based on the background model and a registration voice entered by a user, a first voiceprint model and a second voiceprint model belonging to the user. The deep neural network framework includes but is not limited to a DNN (deep neural network) algorithm, an RNN (recurrent neural network) algorithm, an LSTM (long short-term memory) algorithm, and the like, [0120]) [Examiner interprets extracting audio features from the initially collected bone conduction signal and training the background model (i.e., BC model based on CNN) during the first phase as the BC model comprises a convolutional neural network (CNN), and wherein the CNN is trained using a stored/pre-collected bone conduction dataset to generate corresponding initial embedded bone conduction vectors]. Regarding claim 18, Zhang and Fawaz further teach the method of claim 17, further comprising, prior to verification via the BC model, generating embedded bone conduction vectors using the bone conduction signals and the initial embedded bone conduction vectors (Zhang, Fig 4, In step S405, after obtaining the second voice component (i.e. the bone conduction signals) in the voice information, the mobile phone may separately extract an audio feature of the second voice component (i.e., the initial embedded bone conduction vectors), and then match the second voiceprint model of the authorized user 1 with the audio feature of the second voice component (i.e. verification S406), [0123]). Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Fawaz (US 20180068671 A1) in view of Zhang (US 20210256979 A1) in further view of Klemme (US 20210058701 A1) in further view of Lesso (US 20190295554 A1). 
Regarding claim 2, Fawaz and Zhang teach the method of claim 1, wherein determination of the consistency score includes: Fawaz and Zhang do not explicitly teach: determining a marginal bone conduction power distribution with respect to time; selecting a time range of interest based on an average of the margin marginal bone conduction power distribution with respect to time; determining a marginal audio conduction and bone conduction power distribution with respect to frequency; selecting a top frequency index M from the marginal bone conduction power distribution with respect to frequency and a top frequency index N from the marginal audio conduction power distribution with respect to frequency; determining a correlation matrix between the top frequency index M and the top frequency index N; generating, based on the correlation matrix, the consistency score. However, Klemme teaches: determining a marginal bone conduction power distribution with respect to time (Klemme, Fig 1, The first signal data 116 and second signal data 118 (i.e., bone conduction signal), [0050] the phase comparator 208 of this example aligns signal data 116, 118 in the time domain over the time period for which the data was collected or one or more segments thereof. For example, the phase comparator 208 aligns a first portion of the first acceleration data 116 beginning at time T1 and ending at time T2 with a second portion of the second acceleration data 118 beginning at time T1 and ending at time T2, [0053]) [Examiner interprets analyzing the bone conduction signal’s amplitude or power across different time intervals as determining a marginal bone conduction power distribution with respect to time]; selecting a time range of interest based on an average of the margin marginal bone conduction power distribution with respect to time (Klemme, bone conduction analyzer 130 of FIG. 
2 includes signal modifier 212 which provides means for separating sound data and motion data in the signal data 116, 118. The phase identifier 208 determines that the signal data 116, 118 includes sound data (e.g., based on the portions of the time-aligned signal data 116, 118 that are in-phase) and motion data (e.g., based on the portions of the time-aligned signal data 116, 118 that are out-of-phase). The signal modifier 212 of FIG. 2 modifies the signal data 116, 118 to remove or substantially remove the motion data from the signal data, which may represent noise as compared to the sound (e.g., voice) data., [0056]) [ Examiner interprets that identifying periods where bone conduction signals corresponding to active speech and selecting time ranges where bone conduction signals are in-phase based on the time aligned data as selecting a time range of interest based on an average of the margin marginal bone conduction power distribution with respect to time]; determining a marginal audio conduction and bone conduction power distribution with respect to frequency (Klemme, The sensors 106, 108 may measure bone vibrations and/or collect external sound data such as individual(s) speaking to the user 104, a media-playing device (e.g., the user device 126, another device), environmental noise (e.g., car noise, a passing train, airplane noise, crowd noise, etc.). continuously and/or for specific period(s) [0032, 0033]The sound source identifier 218 of FIG. 2 analyzes phase and magnitude differences between the respective signal data 116, 118 collected by the sensors 106, 108 to determine whether the sound data originated from the user 104 or from an external sound source 132, [0065] the frequency domain converter 210 of FIG. 2 can convert the time domain signal data 116, 118 to the frequency domain (e.g., via Fast Fourier Transform). 
The phase comparator 208 can analyze the signal data 116, 118 in the frequency domain to identify frequencies that are out of phase (i.e., audio from external source) between the signal data 116, 118. In some other examples, the phase comparator 208 identifies phase differences between the signal data in the time domain, [0066]) [Examiner interprets using the frequency domain converter (i.e., FFT) to analyze signals in the frequency domain and generating power spectra for signals received based on their sound sources (i.e., both external audio and bone conduction) as determining a marginal audio conduction and bone conduction power distribution with respect to frequency]; selecting a top frequency index M from the marginal bone conduction power distribution with respect to frequency and a top frequency index N from the marginal audio conduction power distribution with respect to frequency (Klemme, Fig 2, the source identification rule(s) 220 include a rule that sound (e.g., voice) data generated by the user 104 corresponds to portions of the signal data 116, 118 that are in-phase (i.e. marginal bone conduction power distribution) and have substantially equal magnitude (e.g., within threshold range). The source identification rule(s) 220 includes a rule that sound data generated by the external sound source(s) 132 corresponds to out-of-phase portions of the signal data 116, 118. Another example rule 220 can indicate that portions of the signal data 116, 118 that are out-of-phase (i.e., marginal audio conduction power distribution) and that have substantially unequal magnitudes between the portions represent external sound(s) generated by an external sound source 132 that is disposed proximate to the right of the user 104 or to the left of the user 104. 
if the signal data generated by the first sensor 106 has larger magnitude than the signal data generated by the second sensor 108, then the external sound source 132 is disposed to the right of the user 104 whereas if the signal data generated by the second sensor 108 has larger magnitude than the signal data generated by the first sensor 106, then the external sound source is disposed 132 to the left of the user 104. Sound data generated by an external sound source 132 that is disposed substantially in front of the user 104 (e.g., substantially in front of the face of the user 104) includes signal data generated by the respective sensors 106, 108 that is in-phase and has substantially equal magnitudes, but the magnitudes are smaller than the signal data generated by the sensors 106, 108 when the user 104 is the source of the sound data (e.g., when the user is speaking), [0069]) [ Examiner interprets that identifying dominant frequencies from FFT results as shown in fig 3 to find in-phase portion (i.e. bone conduction distribution) and out-phase portions (i.e. 
audio conduction distribution) as selecting a top frequency index M from the marginal bone conduction power distribution with respect to frequency and a top frequency index N from the marginal audio conduction power distribution with respect to frequency]; Therefore, it would have been obvious to PHOSITA before the effective filing date to modify the teaching of Fawaz and Zhang to include a concept of determining a marginal bone conduction power distribution with respect to time; selecting a time range of interest based on an average of the margin marginal bone conduction power distribution with respect to time; determining a marginal audio conduction and bone conduction power distribution with respect to frequency; selecting a top frequency index M from the marginal bone conduction power distribution with respect to frequency and a top frequency index N from the marginal audio conduction power distribution with respect to frequency as taught by Klemme for the purpose of aligning signal data 116, 118 in the time domain over the time period for which the data was collected or one or more segments, analyzing the aligned signal data 116, 118 to identify any phase differences between the signal data 116, 118, comparing the phase to determines if the first and second portions of the signal data 116, 118 are in-phase or out-of-phase [Klemme:0052] and converting the time domain signal data 116, 118 to the frequency domain (e.g., via Fast Fourier Transform), analyzing the signal data 116, 118 in the frequency domain to identify frequencies that are out of phase between the signal data 116, 118 to identify the phase differences between the signal data in the time domain [Klemme:0066]. 
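For orientation only, the claim steps mapped to Klemme above (marginal power over time, a time range of interest chosen from the average of that marginal, marginal power over frequency, and top frequency indices M and N) can be sketched as follows. This is one possible reading of the claim language, not the applicant's implementation and not anything disclosed in Klemme. The function names, the Hann-windowed short-time FFT, the frame and hop sizes, and the "above the mean" selection rule are all assumptions made for illustration.

```python
import numpy as np


def power_spectrogram(x, frame=256, hop=128):
    """Short-time power |FFT|^2 as a (frames, bins) array (assumed representation)."""
    n_frames = 1 + (len(x) - frame) // hop
    window = np.hanning(frame)
    frames = np.stack(
        [x[i * hop : i * hop + frame] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def select_time_and_frequencies(bone, audio, m=5, n=5, frame=256, hop=128):
    """Hypothetical helper: time range of interest plus top-M / top-N frequency bins."""
    p_bone = power_spectrogram(bone, frame, hop)
    p_audio = power_spectrogram(audio, frame, hop)

    # Marginal bone conduction power with respect to time: sum over frequency bins.
    bone_over_time = p_bone.sum(axis=1)

    # Time range of interest: frames whose power exceeds the average (assumed rule).
    active = bone_over_time > bone_over_time.mean()

    # Marginal power with respect to frequency: sum over the selected frames.
    bone_over_freq = p_bone[active].sum(axis=0)
    audio_over_freq = p_audio[active].sum(axis=0)

    # Top-M bone conduction bins and top-N audio bins, strongest first.
    top_m = np.argsort(bone_over_freq)[::-1][:m]
    top_n = np.argsort(audio_over_freq)[::-1][:n]
    return active, top_m, top_n
```

Called with two equally long, time-aligned sample arrays, the helper returns a boolean mask over frames and two arrays of FFT bin indices. Whether "top frequency index" in the claims means the strongest bins, as assumed here, is a claim construction question that this sketch does not settle.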
Fawaz, Zhang, and Klemme does not explicitly teach: determining a correlation matrix between the top frequency index M and the top frequency index N; generating, based on the correlation matrix, the consistency score user; However, Lesso teaches: determining a correlation matrix between the top frequency index M and the top frequency index N (Lesso, the enable module 306 performs a voice-activity detect function on the air-conducted audio signal, so as to detect the presence of audio in the air-conducted audio signal which is characteristic of speech. It generates an output control signal to the biometric module 316 when both the air-conducted audio signal and the bone-conducted audio signal contain a voice. The control signal is generated when portions of the air-conducted audio signal and the bone-conducted audio signal which overlap in time (or are concurrent) both contain a voice [0061] Upon a determination that the bone-conducted audio signal comprises a voice, the enable module 306 cross-correlate the bone-conducted audio signal (and particularly that portion of the bone-conducted audio signal comprising the voice) with the air-conducted audio signal (particularly that portion of the air-conducted audio signal which is concurrent with the portion of the bone-conducted audio signal comprising the voice), to determine a level of correlation between the two signals. Any suitable correlation algorithm may be used, [0062]) [ Examiner interprets that determining relationships or co relation between frequency components between bone conducted signals and audio signals that contains voice or characteristics of speech (i.e., top frequency) as determining a correlation matrix between the top frequency index M and the top frequency index N]; generating, based on the correlation matrix, the consistency score user (Lesso. 
Upon a determination that the bone-conducted audio signal comprises a voice, the enable module 306 may cross-correlate the bone-conducted audio signal (and particularly that portion of the bone-conducted audio signal comprising the voice) with the air-conducted audio signal (particularly that portion of the air-conducted audio signal which is concurrent with the portion of the bone-conducted audio signal comprising the voice), to determine a level of correlation between the two signals. Any suitable correlation algorithm is used,[0062] a correlation value indicative of the level of correlation between the two signals may be compared to a threshold: if the correlation value exceeds the threshold, it may be determined that the signals correlate; if the correlation value is less the threshold, it may be determined that the signals do not correlate., [0076] and If the audio signals do correlate, the method proceeds to step 412, in which the biometric system determines whether the user is authenticated as the authorized user or not, [0077]) [ Examiner interprets that calculating correlation between bone conduction signals and audio signals and authenticating user if the correlation is higher than correlation threshold (i.e. consistency threshold) as generating, based on the correlation matrix, the consistency score user]. Therefore, it would have been obvious to PHOSITA before the effective filing date to modify the teaching of Fawaz, Zhang, and Klemme to include a concept of determining a correlation matrix between the top frequency index M and the top frequency index N and generating, based on the correlation matrix, the consistency score user taught by Lesso for the purpose of determining whether the air-conducted audio signal and the bone-conducted audio signal correlate with each other [Lesso:0075] to determine whether the user is authenticated as the authorized user or not [Lesso:0077]. Claims 4 and 5 are rejected under 35 U.S.C. 
103 as being unpatentable over Fawaz (US 20180068671 A1) in view of Zhang (US 20210256979 A1) in further view of Kurihara (US 20230320903 A1). Regarding claim 4, Fawaz and Zhang teaches method of claim 3, wherein pre-processing the bone conduction signal includes passing the bone conduction signal through (1) a low-pass filter to remove noise generated by human motion (Fawaz, Beginning with the pre-processing step, the module 112 may re-sample both vibration data (i.e., the bone conduction signal) and speech signals to the same sampling rate while applying a low-pass filter to prevent aliasing, the low-pass filter may be 4 Khz, 8 kHz or any other suitable frequency that preserves most of the acoustic features of the speech signals and while reducing the processing load. FIG. 3 illustrates raw speech signals 320 and vibration data 330. With regard to the raw vibration data 330, the graph illustrates a high-energy spike due to an abrupt movement of the accelerometer, and low energy components resulting from speech vibrations. [0044]) Fawaz and Zhang does not explicitly teach: a Wiener filter to remove the noise However, Kurihara teaches: a Wiener filter to remove the noise (Kurihara, Filter circuit 23 includes noise removal filter 23a, low-pass filter 23c. Noise removal filter 23a is a filter for removing noise contained in the first sound signal output from microphone 21, for example, a nonlinear digital filter, but may be a filter using a spectral subtraction method that removes noise in a frequency domain. Noise removal filter 23a is a Wiener filter [0037]. Low-pass filter 23c attenuates a component in a band of 512 Hz or more contained in the first sound signal output from microphone 21. 
The cutoff frequencies is determined empirically or experimentally according to the type of the mobile body in which ear-worn device 20 is expected to be used, [0038]) [ Examiner interprets that low-pass filter 23c removing high frequency noise contained in the first sound signal output from microphone 21and Noise removal filter 23a (i.e., Wiener filter) removing noise as a low-pass filter to remove noise generated by human motion and (2) a Wiener filter to remove the noise]. Therefore, it would have been obvious to PHOSITA before the effective filing date to modify the teaching of Fawaz and Zhang to include a concept of a Wiener filter to remove the noise as taught by Kurihara for the purpose of performing signal processing on the first sound signal output from microphone 21 by undergoing filtering by noise removal filter, [Kurihara:0052]. Regarding claim 5, Fawaz and Zhang teaches the method of claim 1, further comprising, prior to determining the consistency score, passing the audio signal through the filter to remove noise (Fawaz, Beginning with the pre-processing step, the module 112 may re-sample both vibration data and speech signals (i.e., an audio signal) to the same sampling rate while applying a low-pass filter to prevent aliasing, the low-pass filter may be 4 Khz, 8 kHz or any other suitable frequency that preserves most of the acoustic features of the speech signals and while reducing the processing load. FIG. 3 illustrates raw speech signals 320 and vibration data 330. the raw speech signals 320 demonstrate two high-energy segments along with other lower-energy segments corresponding to background noise. [0044]) Fawaz and Zhang does not explicitly teach: the Wiener filter to remove noise However, Kurihara teaches: a Wiener filter to remove noise (Kurihara, Filter circuit 23 includes noise removal filter 23a. 
Noise removal filter 23a is a filter for removing noise contained in the first sound signal output from microphone 21, for example, a nonlinear digital filter, but may be a filter using a spectral subtraction method that removes noise in a frequency domain. Noise removal filter 23a is a Wiener filter [0037]) Therefore, it would have been obvious to PHOSITA before the effective filing date to modify the teaching of Fawaz and Zhang to include a concept of a Wiener filter to remove noise as taught by Kurihara for the purpose of performing signal processing on the first sound signal output from microphone 21 by undergoing filtering by noise removal filter, [Kurihara:0052]. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 20210192244 A1: “relates to a biometric identification system, and in particular to a system that can be used without requiring specific actions to be taken by the user” US 20150150116 A1: “relates to preventing spoofing attacks for bone conduction applications” US 20220301574 A1: “relates to the field of signal processing, and in particular, to systems, methods, apparatus, and storage medium for processing a vibration signal” US 20210377650 A1: “related to speech recognition systems, such as hands-free computer systems, are disclosed. More particularly, embodiments related to computer systems having intelligent personal assistant agents” Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAMIKSHYA POUDEL whose telephone number is (703)756-1540. The examiner can normally be reached 7:30 AM - 5PM Mon- Fri. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SHEWAYE GELAGAY can be reached at (571)272-4219. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /S.N.P./Examiner, Art Unit 2436 /SHEWAYE GELAGAY/Supervisory Patent Examiner, Art Unit 2436

Prosecution Timeline

Dec 13, 2024: Non-Final Rejection mailed (§103)
Mar 11, 2025: Applicant Interview (Telephonic)
Mar 11, 2025: Examiner Interview Summary
Mar 13, 2025: Response Filed
May 28, 2025: Final Rejection mailed (§103)
Aug 28, 2025: Request for Continued Examination
Oct 05, 2025: Response after Non-Final Action
Dec 09, 2025: Non-Final Rejection mailed (§103, current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12619726: CYBER RESILIENCE INTEGRATED SECURITY INSPECTION SYSTEM (CRISIS) AGAINST FALSE DATA INJECTION ATTACKS (4y 1m to grant; granted May 05, 2026)
Patent 12591663: INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING COMPUTER PROGRAM PRODUCT (2y 7m to grant; granted Mar 31, 2026)
Patent 12470379: LINK ENCRYPTION AND KEY DIVERSIFICATION ON A HARDWARE SECURITY MODULE (3y 0m to grant; granted Nov 11, 2025)
Patent 12452254: SECURE SIGNED FILE UPLOAD (3y 6m to grant; granted Oct 21, 2025)
Patent 12341788: NETWORK SECURITY SYSTEMS FOR IDENTIFYING ATTEMPTS TO SUBVERT SECURITY WALLS (2y 7m to grant; granted Jun 24, 2025)
Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 47%
With Interview: 99% (+81.8%)
Median Time to Grant: 2y 9m (~0m remaining)
PTA Risk: High
Based on 19 resolved cases by this examiner. Grant probability derived from career allowance rate.
