DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-6, 14 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kothari et al. US 20240221741 in view of Dusan et al. US 20240292151.
Regarding claim 1, Kothari et al. teach A method comprising: collecting, by a speech signal detection device worn by a first person, a combination of signals comprising electromyograph (EMG) data signals and one or more non-EMG data signals; (Kothari et al. US 20240221741 abstract; paragraphs [0004]-[0013]; [0033]-[0039]; [0047]-[0064]; [0066]-[0076]; [0080]-[0084]; [0124]; [0130]; [0131]; [0160]; [0164]-[0165]; figures 1-12;)
FIG. 3 is an example of a wearable device and components which may be used in the control and operation of the wearable device, in accordance with some embodiments of the technology described herein. Wearable device 300 is an ear-hook wearable device as discussed with relation to FIG. 1A, however any suitable embodiment of the wearable device may be used. The wearable device includes sensors to measure signals from the user 301. The sensors include microphone 311 configured to measure sounds associated with voiced speech, EMG sensor 312 configured to measure muscle activity associated with speech, and IMU 313 configured to measure vibrations associated with voiced speech. The wearable device 300 may include greater or fewer sensors in other examples. For example, the wearable device may include a microphone and EMG sensors, just EMG sensors, a microphone and an IMU, EMG sensors and an IMU, and any combination of these sensors and other sensors as discussed herein (Kothari et al. par. 66). The ML models may be trained on speech data recorded using a wearable device such as wearable device 300, or other mobile devices as described herein. The training data may include data recorded from EMG sensors, microphones, and IMUs, and may be tagged with known words, phrases or actions which were spoken or performed during the recordings. In some examples, the level 1 processing 331 may implement a ML model (Kothari et al. par. 80).
According to the cited passages and figures, examiner interprets the data recorded by the EMG sensor as EMG data and the data recorded by the microphones as non-EMG data.
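For illustration only, the following Python sketch shows one way a wearable could package EMG samples together with non-EMG samples (microphone and IMU) into a single combination-of-signals record, consistent with the mapping above. The sketch is not taken from Kothari et al.; the SignalFrame structure and the sensor read() interface are assumptions made for clarity.

```python
# Illustrative sketch only; not the implementation of the cited references.
from dataclasses import dataclass
from typing import List

@dataclass
class SignalFrame:
    """One frame of the 'combination of signals' collected by the worn device."""
    emg: List[float]         # EMG data signals from the electrode array
    microphone: List[float]  # non-EMG data signals: acoustic samples
    imu: List[float]         # non-EMG data signals: vibration/motion samples

def collect_frame(emg_sensor, microphone, imu) -> SignalFrame:
    # Each sensor object is hypothetical; read() is assumed to return a list of samples.
    return SignalFrame(emg=emg_sensor.read(),
                       microphone=microphone.read(),
                       imu=imu.read())
```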
the ML model trained to establish a relationship between training signals comprising training EMG data signals and training non-EMG data signals
The second level of processing may be more intensive than the first level. Level 2 may involve using one or more trained machine learning (ML) models 340 stored within the wearable device to analyze the signals. The one or more ML models may be used to determine one or more words or phrases spoken silently or voiced by the user. In some examples, the one or more trained ML models may include a trained neural network, such as a convolutional neural network. The trained neural network may be configured to recognize words or phrases or known actions from the recorded signals. The neural network may be a trained convolutional neural network. The ML models may be trained on speech data recorded using a wearable device such as wearable device 300, or other mobile devices as described herein. The training data may include data recorded from EMG sensors, microphones, and IMUs, and may be tagged with known words, phrases or actions which were spoken or performed during the recordings. In some examples, the level 1 processing 331 may implement a ML model (Kothari et al. par. 80).
According to the cited passages and figures, examiner interprets the data recorded by the EMG sensor as EMG data and the data recorded by the microphones as non-EMG data.
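As a hedged illustration of the training description quoted from Kothari et al. par. 80, the sketch below trains a simple classifier on concatenated EMG and non-EMG features tagged with known words or phrases. The use of scikit-learn and an MLP is an assumption made for readability; the reference describes a convolutional neural network without fixing a framework.

```python
# Minimal training sketch under stated assumptions; not the references' model.
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_speech_model(emg_feats, mic_feats, imu_feats, word_labels):
    # Concatenate per-frame EMG and non-EMG features into one training matrix,
    # establishing a relationship between the two kinds of training signals.
    X = np.hstack([np.asarray(emg_feats), np.asarray(mic_feats), np.asarray(imu_feats)])
    model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
    model.fit(X, word_labels)  # labels: known words, phrases, or actions
    return model
```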
Kothari et al. do not explicitly teach processing the combination of signals by a machine learning (ML) model to detect presence of a second person, and ground-truth presence of people data; and controlling operation of the speech signal detection device being worn by the first person in response to detecting the presence of the second person.
Dusan et al. teach processing the combination of signals by a machine learning (ML) model to detect presence of a second person, and ground-truth presence of people data; (Dusan et al. US 20240292151 abstract; paragraphs [0021]-[0024]; [0037]-[0043]; figures 1-15;)
In various operational scenarios in which the user 101 is wearing two media output devices 150 (e.g., implemented as a pair of earbuds), any or all of audio inputs 200, 210, 215, and/or 212 can be received by only one of the two media output devices, equally by both of the media output devices, or at different loudness levels by the two different media output devices. For example, when two media output devices 150 (e.g., a pair of earbuds) are worn in the two ears of a user, the two media output devices are separated by a distance (e.g., the width of the user's head) that can be known or estimated. In one or more implementations, the two media output devices 150 can determine the distance and/or the angular position for the source of each of one or more of the external audio inputs (e.g., the distance and/or angular position of the source of audio input 200 corresponding to the location of the person 202) relative to the locations of the media output devices. In one or more implementations, one or both of the media output device(s) 150 may perform beamforming operations using multiple microphone(s) 152, and/or may perform source separation operations, voice isolation operations, de-noising operations, and/or other audio processing operations to variously enhance, isolate, suppress, or remove, the audio input 200 from the person 202, the audio input 210, the audio input 212, and/or the voice of the user 101 in the microphone signals generated by the microphone(s) 152 in response to these audio inputs (Dusan et al. par. 40). As shown in FIG. 6, the environmental condition detector 302 may be configured to detect and/or identify, based on the audio input (e.g., the audio input obtained by the microphone(s) 152 of the media output device 150) environmental conditions such as a speaker presence, reverberation (reverb), an indoor/outdoor condition, a wind presence condition, and/or an ambient noise presence condition (as examples). For example, the speaker presence condition may indicate the presence of one or more speakers (e.g., people speaking) in the physical environment of the media output device 150, and/or a location, relative to the media output device, of the one or more speakers. In one or more implementations, the speaker presence condition can be detected by providing an audio input to a machine learning model (e.g., a neural network trained as a classifier) that has been trained by adjusting one or more weights and/or other parameters of the machine learning model based on a comparison of training output data (e.g., a speaker presence label indicating whether and/or where a speaker is present in the training audio input) with a training output of the machine learning model generated in response to a training audio input (Dusan et al. par. 63).
According to the cited passages and figures, examiner interprets that the system processes audio from the physical environment and that the machine learning model determines whether another person is present.
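To make the classifier-style detection described in Dusan et al. par. 63 concrete, the following minimal sketch trains a binary presence detector against ground-truth "speaker present" labels and then applies it to new audio features. The logistic-regression choice and the feature representation are assumptions; the reference only specifies a machine learning model trained by comparing its output with presence labels.

```python
# Hedged sketch of presence detection with ground-truth labels; names are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_presence_detector(audio_features, presence_labels):
    # presence_labels: 1 if another person is speaking in the training clip, else 0
    clf = LogisticRegression(max_iter=1000)
    clf.fit(np.asarray(audio_features), np.asarray(presence_labels))
    return clf

def second_person_present(clf, frame_features) -> bool:
    # Classify one feature vector extracted from the current audio input.
    return bool(clf.predict(np.asarray(frame_features).reshape(1, -1))[0])
```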
and controlling operation of the speech signal detection device being worn by the first person in response to detecting the presence of the second person. (Dusan et al. US 20240292151 abstract; paragraphs [0021]-[0024]; [0037]-[0043]; [0052]; [0063]; [0072]; [0079]; figures 1-15;)
In one or more implementations, the media output device 150 may be operable in various operational modes. As examples, the operational modes may include a media output mode (e.g., for outputting audio content such as music, podcasts, etc.), a noise cancellation mode for using the speaker 151 to cancel some or all of the ambient noise in the environment of the media output device 150, a pass-through or transparent mode, a telephony mode, and/or a hearing assistance mode, such as speech enhancement mode. For example, a hearing assistance mode and/or a speech enhancement mode may be configured to enhance speech (e.g., by the user 101 of the media output device or another person 202) in the environment, for output by the speaker 151 (or in an uplink signal from the electronic device 104 to a remote device, such as during a telephone call or audio or video conference). As shown, the media output device 150 may, in some implementations, provide operational mode information that indicates the operational mode of the media output device 150 to the electronic device 104. The electronic device 104 may activate and/or deactivate one or more of the DSPs and/or neural networks 304 based on the operational mode information. (Dusan et al. par. 52). In one example use case, the multi-channel linear prediction block 702 may be used as a reverberation removal block and may be switched off or otherwise deactivated, by the control signals, when a low reverberation environment or an outdoor environment are indicated by the environmental condition information (e.g., the reverb indicator of FIG. 6) generated by the environmental condition detector 302. In another example use case, the voice isolation block 708 may be switched off or otherwise deactivated, by the control signals, when the environmental condition information (e.g., the speaker presence indicator of FIG. 6) indicates that no voices are detected in the audio input (e.g., for a predetermined amount of time). In another example use case, the voice isolation block 708 may be switched on or otherwise activated, by the control signals, when the media output device is in a hearing assistance mode, whether or not the environmental condition information (e.g., the speaker presence indicator of FIG. 6) indicates that voices are detected in the audio input (e.g., for a predetermined amount of time). As another example use case, the de-noising block 706 may be switched off or deactivated, by the control signals, when the environmental condition information (e.g., the ambient noise presence indicator of FIG. 6) indicates low levels of ambient noise in the audio input. As another example use case, the source location beamformer 700 may be switched off or otherwise deactivated when the environmental condition information (e.g., the speaker presence indicator and/or the ambient noise indicator of FIG. 6) indicates that no sources of sound are identified in the physical environment of the media output device (e.g., for a predetermined amount of time). In another example use case, the wind noise suppressor 710 may be switched off or otherwise deactivated when the environmental condition information (e.g., the wind presence indicator of FIG. 6) indicates that wind noise is not present (e.g., for a predetermined amount of time) in the audio input (Dusan et al. par. 72).
According to the cited passages and figures, examiner interprets that the system activates/deactivates DSP blocks and processing modes based on the speaker presence indicator, i.e., the detection of another person, which corresponds to controlling the operation of the speech signal detection device in response to detecting the presence of the second person.
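The control behavior mapped to this limitation can be pictured with the short sketch below, which gates hypothetical processing blocks of the worn device on the speaker-presence indicator, analogous to Dusan et al. par. 72 switching the voice isolation block. The device object, its enable/disable methods, and the notification call are assumptions, not Dusan's API.

```python
# Illustrative control logic; all names are hypothetical.
def control_device(device, second_person_detected: bool) -> None:
    if second_person_detected:
        device.enable("voice_isolation")               # activate a processing block
        device.notify_user("second person detected")   # cf. the claim 14 mapping below
    else:
        device.disable("voice_isolation")              # deactivate when no voices detected
```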
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the use of environmental audio to detect the presence of another person, as taught by Dusan et al., into the method of Kothari et al., and the result of the substitution would have been predictable, namely detecting the presence of a second person.
Regarding claim 2, the combination of Kothari et al. and Dusan et al. disclose The method of claim 1, wherein the ML model is implemented by an individual device external to the speech signal detection device worn by the first person.
The external device may contain one or more trained ML models 221 and processors 222. The processors may be configured to recognize one or more words or phrases from the signals received from wearable device 210. The processors may then execute one or more actions on the external device based on the determined words or phrases. For example, applications and functions of the wearable device may be controlled, text inputs may be provided to the external device and communication may be supported by the external device using the signals received from the wearable device, among other actions (Kothari et al. par. 62).
Regarding claim 3, the combination of Kothari et al. and Dusan et al. disclose The method of claim 2, further comprising: converting the combination of signals into a digital signature; and transmitting wirelessly the digital signature from the speech signal detection device worn to the individual device.
The sensors 211 may be supported by the wearable device to record signals 202 associated with speech, either silent or voiced, at or near the head, face and/or neck of the user 201. Once recorded, the signals may be sent to a signal processing module 212 of the wearable device 210. The signal processing module 212 may perform one or more operations on the signals including filtering, thresholding, and analog to digital conversion, among other operations (Kothari et al. par. 57). The signal processing module 212 may then pass the signals to one or more processors 213 of the wearable device 210. The processors 213 may perform additional processing on the signals including preprocessing, and digital processing. In addition, the processors may utilize one or more machine learning models 214 stored within the wearable device 210 to process the signals. The machine learning models 214 may be used to perform operations including feature extraction, and downsampling, as well as other processes for recognizing one or more words or phrases from signals 202. The processors 213 may process the signals to determine if the user is speaking silently or voiced and may determine one or more words or phrases from the signals and compare these words or phrases to known commands to determine an action the user wishes to perform. In some examples, additionally the processors may process the signals to determine if the user is preparing to speak (Kothari et al. par. 58). After processing, signals may be sent to communication module 216, which may transmit the signals to one or more external devices or systems. The communication module 216 may perform one or more operations on the processed signals to prepare the signals for transmission to one or more external devices or systems. The signals may be transmitted using one or more modalities, including but not limited to wired connection, Bluetooth, Wi-Fi, cellular network, Ant, Ant+, NFMI and SRW, among other modalities. The signals may be communicated to an external processing device and/or to a server for further processing and/or actions (Kothari et al. par. 59).
According to the cited passages and figures, examiner interprets a transmission via Bluetooth, Wi-Fi or cellular network as a wireless transmission.
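For the conversion-and-transmission steps of claims 2-3, the hedged sketch below converts a signal frame into a digest and hands it to a wireless link. Kothari et al. pars. 57-59 describe filtering, analog-to-digital conversion, and transmission over Bluetooth, Wi-Fi, or a cellular network; the hashing used here to form a "digital signature" and the link.send() transport call are assumptions introduced only for the sketch.

```python
# Hedged sketch: convert a signal frame to a digest and hand it to a wireless link.
import hashlib
import json

def to_digital_signature(frame: dict) -> bytes:
    # Serialize the combination of signals deterministically, then hash it.
    payload = json.dumps(frame, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).digest()

def transmit_to_external_device(signature: bytes, link) -> None:
    # 'link' stands in for a Bluetooth/Wi-Fi/cellular transport object (hypothetical).
    link.send(signature)
```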
Regarding claim 4, the combination of Kothari et al. and Dusan et al. disclose The method of claim 1, wherein the non-EMG data signals represent movement of certain muscles in a face and neck region of the first person, physical movements associated with inner speech, and muscle twitches.
Process 600 begins at step 601, which involves recording signals indicative of facial movement and actions performed by a user, at a face, head or neck of the user. The signals may be recorded using different sensors as described herein. For example, EMG sensors, microphones or IMUs may be used, as described with reference to FIGS. 3, and 4A-D. In some examples, additional sensors may be used, as described herein. The signals may be associated with a user speaking out loud, and/or silently, or may be associated with particular actions performed by the user such as tapping their check, and/or clenching their jaw, among other actions described herein (Kothari et al. par. 124).
Regarding claim 5, the combination of Kothari et al. and Dusan et al. disclose The method of claim 1, wherein the non-EMG data signals comprise at least one of inertial measurement unit (IMU) movement or audio data.
Process 600 begins at step 601, which involves recording signals indicative of facial movement and actions performed by a user, at a face, head or neck of the user. The signals may be recorded using different sensors as described herein. For example, EMG sensors, microphones or IMUs may be used, as described with reference to FIGS. 3, and 4A-D. In some examples, additional sensors may be used, as described herein. The signals may be associated with a user speaking out loud, and/or silently, or may be associated with particular actions performed by the user such as tapping their check, and/or clenching their jaw, among other actions described herein (Kothari et al. par. 124).
Regarding claim 6, the combination of Kothari et al. and Dusan et al. disclose The method of claim 1, wherein the non-EMG data signals are received from at least one of an array of biopotential sensors, motion sensors, sound sensors, or photonic sensors that are independent of the EMG data signals.
The wearable device may comprise a sensor arm 110, supported by the ear hook 116. The sensor arm 110 may contain one or more sensors for recording speech signals from the user 101. The one or more sensors supported by the sensor arm may include EMG electrodes 111 configured to detect EMG signals associated with speech of the user. The EMG electrodes may be configured as an electrode array or may be configured as one or more electrode arrays supported by the sensor arm 110 of the wearable device 100 (Kothari et al. par. 35). Process 600 begins at step 601, which involves recording signals indicative of facial movement and actions performed by a user, at a face, head or neck of the user. The signals may be recorded using different sensors as described herein. For example, EMG sensors, microphones or IMUs may be used, as described with reference to FIGS. 3, and 4A-D. In some examples, additional sensors may be used, as described herein. The signals may be associated with a user speaking out loud, and/or silently, or may be associated with particular actions performed by the user such as tapping their check, and/or clenching their jaw, among other actions described herein (Kothari et al. par. 124).
Regarding claim 14, the combination of Kothari et al. and Dusan et al. disclose The method of claim 1, wherein controlling the operation of the speech signal detection device comprises presenting a notification that indicates presence of the second person.
In various operational scenarios in which the user 101 is wearing two media output devices 150 (e.g., implemented as a pair of earbuds), any or all of audio inputs 200, 210, 215, and/or 212 can be received by only one of the two media output devices, equally by both of the media output devices, or at different loudness levels by the two different media output devices. For example, when two media output devices 150 (e.g., a pair of earbuds) are worn in the two ears of a user, the two media output devices are separated by a distance (e.g., the width of the user's head) that can be known or estimated. In one or more implementations, the two media output devices 150 can determine the distance and/or the angular position for the source of each of one or more of the external audio inputs (e.g., the distance and/or angular position of the source of audio input 200 corresponding to the location of the person 202) relative to the locations of the media output devices. In one or more implementations, one or both of the media output device(s) 150 may perform beamforming operations using multiple microphone(s) 152, and/or may perform source separation operations, voice isolation operations, de-noising operations, and/or other audio processing operations to variously enhance, isolate, suppress, or remove, the audio input 200 from the person 202, the audio input 210, the audio input 212, and/or the voice of the user 101 in the microphone signals generated by the microphone(s) 152 in response to these audio inputs (Dusan et al. par. 40). As shown in FIG. 6, the environmental condition detector 302 may be configured to detect and/or identify, based on the audio input (e.g., the audio input obtained by the microphone(s) 152 of the media output device 150) environmental conditions such as a speaker presence, reverberation (reverb), an indoor/outdoor condition, a wind presence condition, and/or an ambient noise presence condition (as examples). For example, the speaker presence condition may indicate the presence of one or more speakers (e.g., people speaking) in the physical environment of the media output device 150, and/or a location, relative to the media output device, of the one or more speakers. In one or more implementations, the speaker presence condition can be detected by providing an audio input to a machine learning model (e.g., a neural network trained as a classifier) that has been trained by adjusting one or more weights and/or other parameters of the machine learning model based on a comparison of training output data (e.g., a speaker presence label indicating whether and/or where a speaker is present in the training audio input) with a training output of the machine learning model generated in response to a training audio input (Dusan et al. par. 63).
According to the cited passages and figures, examiner interprets that the system processes audio from the physical environment and that the machine learning model determines whether another person is present via the audio input. Examiner interprets the audio as a notification of the presence of the second person.
Regarding claim 19, Kothari et al. teach A system comprising: at least one processor; and at least one memory component having instructions stored thereon that, when executed by the at least one processor, cause the at least one processor to perform operations comprising: collecting, by a speech signal detection device worn by a first person, a combination of signals comprising electromyograph (EMG) data signals and one or more non-EMG data signals; (Kothari et al. US 20240221741 abstract; paragraphs [0004]-[0013]; [0033]-[0039]; [0047]-[0064]; [0066]-[0076]; [0080]-[0084]; [0124]; [0130]; [0131]; [0160]; [0164]-[0165]; figures 1-12;)
FIG. 3 is an example of a wearable device and components which may be used in the control and operation of the wearable device, in accordance with some embodiments of the technology described herein. Wearable device 300 is an ear-hook wearable device as discussed with relation to FIG. 1A, however any suitable embodiment of the wearable device may be used. The wearable device includes sensors to measure signals from the user 301. The sensors include microphone 311 configured to measure sounds associated with voiced speech, EMG sensor 312 configured to measure muscle activity associated with speech, and IMU 313 configured to measure vibrations associated with voiced speech. The wearable device 300 may include greater or fewer sensors in other examples. For example, the wearable device may include a microphone and EMG sensors, just EMG sensors, a microphone and an IMU, EMG sensors and an IMU, and any combination of these sensors and other sensors as discussed herein (Kothari et al. par. 66). The ML models may be trained on speech data recorded using a wearable device such as wearable device 300, or other mobile devices as described herein. The training data may include data recorded from EMG sensors, microphones, and IMUs, and may be tagged with known words, phrases or actions which were spoken or performed during the recordings. In some examples, the level 1 processing 331 may implement a ML model (Kothari et al. par. 80).
According to the cited passages and figures, examiner interprets the data recorded by the EMG sensor as EMG data and the data recorded by the microphones as non-EMG data.
the ML model trained to establish a relationship between training signals comprising training EMG data signals and training non-EMG data signals
The second level of processing may be more intensive than the first level. Level 2 may involve using one or more trained machine learning (ML) models 340 stored within the wearable device to analyze the signals. The one or more ML models may be used to determine one or more words or phrases spoken silently or voiced by the user. In some examples, the one or more trained ML models may include a trained neural network, such as a convolutional neural network. The trained neural network may be configured to recognize words or phrases or known actions from the recorded signals. The neural network may be a trained convolutional neural network. The ML models may be trained on speech data recorded using a wearable device such as wearable device 300, or other mobile devices as described herein. The training data may include data recorded from EMG sensors, microphones, and IMUs, and may be tagged with known words, phrases or actions which were spoken or performed during the recordings. In some examples, the level 1 processing 331 may implement a ML model (Kothari et al. par. 80).
According to the cited passages and figures, examiner interprets the data recorded by the EMG sensor as EMG data and the data recorded by the microphones as non-EMG data.
Kothari et al. do not explicitly teach processing the combination of signals by a machine learning (ML) model to detect presence of a second person, and ground-truth presence of people data; and controlling operation of the speech signal detection device being worn by the first person in response to detecting the presence of the second person.
Dusan et al. teach processing the combination of signals by a machine learning (ML) model to detect presence of a second person, and ground-truth presence of people data; (Dusan et al. US 20240292151 abstract; paragraphs [0021]-[0024]; [0037]-[0043]; [0052]; [0063]; [0072]; [0079]; figures 1-15;)
In various operational scenarios in which the user 101 is wearing two media output devices 150 (e.g., implemented as a pair of earbuds), any or all of audio inputs 200, 210, 215, and/or 212 can be received by only one of the two media output devices, equally by both of the media output devices, or at different loudness levels by the two different media output devices. For example, when two media output devices 150 (e.g., a pair of earbuds) are worn in the two ears of a user, the two media output devices are separated by a distance (e.g., the width of the user's head) that can be known or estimated. In one or more implementations, the two media output devices 150 can determine the distance and/or the angular position for the source of each of one or more of the external audio inputs (e.g., the distance and/or angular position of the source of audio input 200 corresponding to the location of the person 202) relative to the locations of the media output devices. In one or more implementations, one or both of the media output device(s) 150 may perform beamforming operations using multiple microphone(s) 152, and/or may perform source separation operations, voice isolation operations, de-noising operations, and/or other audio processing operations to variously enhance, isolate, suppress, or remove, the audio input 200 from the person 202, the audio input 210, the audio input 212, and/or the voice of the user 101 in the microphone signals generated by the microphone(s) 152 in response to these audio inputs (Dusan et al. par. 40). As shown in FIG. 6, the environmental condition detector 302 may be configured to detect and/or identify, based on the audio input (e.g., the audio input obtained by the microphone(s) 152 of the media output device 150) environmental conditions such as a speaker presence, reverberation (reverb), an indoor/outdoor condition, a wind presence condition, and/or an ambient noise presence condition (as examples). For example, the speaker presence condition may indicate the presence of one or more speakers (e.g., people speaking) in the physical environment of the media output device 150, and/or a location, relative to the media output device, of the one or more speakers. In one or more implementations, the speaker presence condition can be detected by providing an audio input to a machine learning model (e.g., a neural network trained as a classifier) that has been trained by adjusting one or more weights and/or other parameters of the machine learning model based on a comparison of training output data (e.g., a speaker presence label indicating whether and/or where a speaker is present in the training audio input) with a training output of the machine learning model generated in response to a training audio input (Dusan et al. par. 63).
According to the cited passages and figures, examiner interprets that the system processes audio from the physical environment and that the machine learning model determines whether another person is present.
and controlling operation of the speech signal detection device being worn by the first person in response to detecting the presence of the second person.
In one or more implementations, the media output device 150 may be operable in various operational modes. As examples, the operational modes may include a media output mode (e.g., for outputting audio content such as music, podcasts, etc.), a noise cancellation mode for using the speaker 151 to cancel some or all of the ambient noise in the environment of the media output device 150, a pass-through or transparent mode, a telephony mode, and/or a hearing assistance mode, such as speech enhancement mode. For example, a hearing assistance mode and/or a speech enhancement mode may be configured to enhance speech (e.g., by the user 101 of the media output device or another person 202) in the environment, for output by the speaker 151 (or in an uplink signal from the electronic device 104 to a remote device, such as during a telephone call or audio or video conference). As shown, the media output device 150 may, in some implementations, provide operational mode information that indicates the operational mode of the media output device 150 to the electronic device 104. The electronic device 104 may activate and/or deactivate one or more of the DSPs and/or neural networks 304 based on the operational mode information. (Dusan et al. par. 52). In one example use case, the multi-channel linear prediction block 702 may be used as a reverberation removal block and may be switched off or otherwise deactivated, by the control signals, when a low reverberation environment or an outdoor environment are indicated by the environmental condition information (e.g., the reverb indicator of FIG. 6) generated by the environmental condition detector 302. In another example use case, the voice isolation block 708 may be switched off or otherwise deactivated, by the control signals, when the environmental condition information (e.g., the speaker presence indicator of FIG. 6) indicates that no voices are detected in the audio input (e.g., for a predetermined amount of time). In another example use case, the voice isolation block 708 may be switched on or otherwise activated, by the control signals, when the media output device is in a hearing assistance mode, whether or not the environmental condition information (e.g., the speaker presence indicator of FIG. 6) indicates that voices are detected in the audio input (e.g., for a predetermined amount of time). As another example use case, the de-noising block 706 may be switched off or deactivated, by the control signals, when the environmental condition information (e.g., the ambient noise presence indicator of FIG. 6) indicates low levels of ambient noise in the audio input. As another example use case, the source location beamformer 700 may be switched off or otherwise deactivated when the environmental condition information (e.g., the speaker presence indicator and/or the ambient noise indicator of FIG. 6) indicates that no sources of sound are identified in the physical environment of the media output device (e.g., for a predetermined amount of time). In another example use case, the wind noise suppressor 710 may be switched off or otherwise deactivated when the environmental condition information (e.g., the wind presence indicator of FIG. 6) indicates that wind noise is not present (e.g., for a predetermined amount of time) in the audio input (Dusan et al. par. 72).
According to the cited passages and figures, examiner interprets that the system activates/deactivates DSP blocks and processing modes based on the speaker presence indicator, i.e., the detection of another person, which corresponds to controlling the operation of the speech signal detection device in response to detecting the presence of the second person.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the use of environmental audio to detect the presence of another person, as taught by Dusan et al., into the system of Kothari et al., and the result of the substitution would have been predictable, namely detecting the presence of a second person.
Regarding claim 20, Kothari et al. teach A non-transitory computer-readable storage medium having stored thereon instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising: collecting, by a speech signal detection device worn by a first person, a combination of signals comprising electromyograph (EMG) data signals and one or more non-EMG data signals; (Kothari et al. US 20240221741 abstract; paragraphs [0004]-[0013]; [0033]-[0039]; [0047]-[0064]; [0066]-[0076]; [0080]-[0084]; [0124]; [0130]; [0131]; [0160]; [0164]-[0165]; figures 1-12;)
FIG. 3 is an example of a wearable device and components which may be used in the control and operation of the wearable device, in accordance with some embodiments of the technology described herein. Wearable device 300 is an ear-hook wearable device as discussed with relation to FIG. 1A, however any suitable embodiment of the wearable device may be used. The wearable device includes sensors to measure signals from the user 301. The sensors include microphone 311 configured to measure sounds associated with voiced speech, EMG sensor 312 configured to measure muscle activity associated with speech, and IMU 313 configured to measure vibrations associated with voiced speech. The wearable device 300 may include greater or fewer sensors in other examples. For example, the wearable device may include a microphone and EMG sensors, just EMG sensors, a microphone and an IMU, EMG sensors and an IMU, and any combination of these sensors and other sensors as discussed herein (Kothari et al. par. 66). The ML models may be trained on speech data recorded using a wearable device such as wearable device 300, or other mobile devices as described herein. The training data may include data recorded from EMG sensors, microphones, and IMUs, and may be tagged with known words, phrases or actions which were spoken or performed during the recordings. In some examples, the level 1 processing 331 may implement a ML model (Kothari et al. par. 80).
According to the cited passages and figures, examiner interprets the data recorded by the EMG sensor as EMG data and the data recorded by the microphones as non-EMG data.
the ML model trained to establish a relationship between training signals comprising training EMG data signals and training non-EMG data signals
The second level of processing may be more intensive than the first level. Level 2 may involve using one or more trained machine learning (ML) models 340 stored within the wearable device to analyze the signals. The one or more ML models may be used to determine one or more words or phrases spoken silently or voiced by the user. In some examples, the one or more trained ML models may include a trained neural network, such as a convolutional neural network. The trained neural network may be configured to recognize words or phrases or known actions from the recorded signals. The neural network may be a trained convolutional neural network. The ML models may be trained on speech data recorded using a wearable device such as wearable device 300, or other mobile devices as described herein. The training data may include data recorded from EMG sensors, microphones, and IMUs, and may be tagged with known words, phrases or actions which were spoken or performed during the recordings. In some examples, the level 1 processing 331 may implement a ML model (Kothari et al. par. 80).
According to the cited passages and figures, examiner interprets the data recorded by the EMG sensor as EMG data and the data recorded by the microphones as non-EMG data.
Kothari et al. do not explicitly teach processing the combination of signals by a machine learning (ML) model to detect presence of a second person, and ground-truth presence of people data; and controlling operation of the speech signal detection device being worn by the first person in response to detecting the presence of the second person.
Dusan et al. teach processing the combination of signals by a machine learning (ML) model to detect presence of a second person, and ground-truth presence of people data; (Dusan et al. US 20240292151 abstract; paragraphs [0021]-[0024]; [0037]-[0043]; [0052]; [0063]; [0072]; [0079]; figures 1-15;)
In various operational scenarios in which the user 101 is wearing two media output devices 150 (e.g., implemented as a pair of earbuds), any or all of audio inputs 200, 210, 215, and/or 212 can be received by only one of the two media output devices, equally by both of the media output devices, or at different loudness levels by the two different media output devices. For example, when two media output devices 150 (e.g., a pair of earbuds) are worn in the two ears of a user, the two media output devices are separated by a distance (e.g., the width of the user's head) that can be known or estimated. In one or more implementations, the two media output devices 150 can determine the distance and/or the angular position for the source of each of one or more of the external audio inputs (e.g., the distance and/or angular position of the source of audio input 200 corresponding to the location of the person 202) relative to the locations of the media output devices. In one or more implementations, one or both of the media output device(s) 150 may perform beamforming operations using multiple microphone(s) 152, and/or may perform source separation operations, voice isolation operations, de-noising operations, and/or other audio processing operations to variously enhance, isolate, suppress, or remove, the audio input 200 from the person 202, the audio input 210, the audio input 212, and/or the voice of the user 101 in the microphone signals generated by the microphone(s) 152 in response to these audio inputs (Dusan et al. par. 40). As shown in FIG. 6, the environmental condition detector 302 may be configured to detect and/or identify, based on the audio input (e.g., the audio input obtained by the microphone(s) 152 of the media output device 150) environmental conditions such as a speaker presence, reverberation (reverb), an indoor/outdoor condition, a wind presence condition, and/or an ambient noise presence condition (as examples). For example, the speaker presence condition may indicate the presence of one or more speakers (e.g., people speaking) in the physical environment of the media output device 150, and/or a location, relative to the media output device, of the one or more speakers. In one or more implementations, the speaker presence condition can be detected by providing an audio input to a machine learning model (e.g., a neural network trained as a classifier) that has been trained by adjusting one or more weights and/or other parameters of the machine learning model based on a comparison of training output data (e.g., a speaker presence label indicating whether and/or where a speaker is present in the training audio input) with a training output of the machine learning model generated in response to a training audio input (Dusan et al. par. 63).
According to the cited passages and figures, examiner interprets that the system processes audio from the physical environment and that the machine learning model determines whether another person is present.
and controlling operation of the speech signal detection device being worn by the first person in response to detecting the presence of the second person.
In one or more implementations, the media output device 150 may be operable in various operational modes. As examples, the operational modes may include a media output mode (e.g., for outputting audio content such as music, podcasts, etc.), a noise cancellation mode for using the speaker 151 to cancel some or all of the ambient noise in the environment of the media output device 150, a pass-through or transparent mode, a telephony mode, and/or a hearing assistance mode, such as speech enhancement mode. For example, a hearing assistance mode and/or a speech enhancement mode may be configured to enhance speech (e.g., by the user 101 of the media output device or another person 202) in the environment, for output by the speaker 151 (or in an uplink signal from the electronic device 104 to a remote device, such as during a telephone call or audio or video conference). As shown, the media output device 150 may, in some implementations, provide operational mode information that indicates the operational mode of the media output device 150 to the electronic device 104. The electronic device 104 may activate and/or deactivate one or more of the DSPs and/or neural networks 304 based on the operational mode information. (Dusan et al. par. 52). In one example use case, the multi-channel linear prediction block 702 may be used as a reverberation removal block and may be switched off or otherwise deactivated, by the control signals, when a low reverberation environment or an outdoor environment are indicated by the environmental condition information (e.g., the reverb indicator of FIG. 6) generated by the environmental condition detector 302. In another example use case, the voice isolation block 708 may be switched off or otherwise deactivated, by the control signals, when the environmental condition information (e.g., the speaker presence indicator of FIG. 6) indicates that no voices are detected in the audio input (e.g., for a predetermined amount of time). In another example use case, the voice isolation block 708 may be switched on or otherwise activated, by the control signals, when the media output device is in a hearing assistance mode, whether or not the environmental condition information (e.g., the speaker presence indicator of FIG. 6) indicates that voices are detected in the audio input (e.g., for a predetermined amount of time). As another example use case, the de-noising block 706 may be switched off or deactivated, by the control signals, when the environmental condition information (e.g., the ambient noise presence indicator of FIG. 6) indicates low levels of ambient noise in the audio input. As another example use case, the source location beamformer 700 may be switched off or otherwise deactivated when the environmental condition information (e.g., the speaker presence indicator and/or the ambient noise indicator of FIG. 6) indicates that no sources of sound are identified in the physical environment of the media output device (e.g., for a predetermined amount of time). In another example use case, the wind noise suppressor 710 may be switched off or otherwise deactivated when the environmental condition information (e.g., the wind presence indicator of FIG. 6) indicates that wind noise is not present (e.g., for a predetermined amount of time) in the audio input (Dusan et al. par. 72).
According to the cited passages and figures, examiner interprets that the system activates/deactivates DSP blocks and processing modes based on the speaker presence indicator, i.e., the detection of another person, which corresponds to controlling the operation of the speech signal detection device in response to detecting the presence of the second person.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the use of environmental audio to detect the presence of another person, as taught by Dusan et al., into the system of Kothari et al., and the result of the substitution would have been predictable, namely detecting the presence of a second person.
Claims 7 and 10-13 are rejected under 35 U.S.C. 103 as being unpatentable over Kothari et al. US 20240221741 in view of Dusan et al. US 20240292151 and further in view of Campman US 20080061962.
Regarding claim 7, the combination of Kothari et al. and Dusan et al. teach all the limitations of claim 1.
The combination of Kothari et al. and Dusan et al. do not explicitly teach The method of claim 1, wherein the second person radiates radio frequency (RF) signals, the RF signals radiated by the second person being received by the speech signal detection device as at least one of the EMG data signals or the one or more non-EMG data signals.
Campman teaches The method of claim 1, wherein the second person radiates radio frequency (RF) signals, the RF signals radiated by the second person being received by the speech signal detection device as at least one of the EMG data signals or the one or more non-EMG data signals. (Campman US 20080061962 abstract; paragraphs [0021]-[0027]; [0033]-[0037]; [0054]-[0058]; figures 1-6;)
The system operates by deploying passive-infrared activated, low-power RF transmitters, referred to as locator transponders, at various locations throughout a building where personnel will pass by or through, such as a doorway or hallway. These locator transponders contain an ability to detect the presence of a person by various sensing means such as, in this instance, passive-infrared radiation from the person's body heat. Other detection methods can also be used such as ultrasonic, RF-field, magnetic field, capacitive-sense, visible light disturbance, pressure floor mat or other sensors that indicate a person's presence (Campman par. 22). Accordingly, it is an object of the invention to provide an apparatus including a locator-transmitter device having a sensor input including but not limited to a passive infrared detector, mechanical or electronic switch input, ultrasonic sonar sensor, optical sensor, radio-frequency (RF) field sensor or other sensor for detecting the presence of a person or object and which contains an adjustable and selectable means for controlling the radiated RF transmitter power output to limit the propagation of its detectable radiated RF signal from a range of several inches to several hundred feet, and a settable unique identity code contained within its emitted RF signal to identify the device and its emitted RF power output level (Campman par. 27).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the radio-frequency (RF) field sensor and RF radiation detection taught by Campman into the method of Kothari et al. and Dusan et al., and the result of the substitution would have been predictable, namely detecting the presence of any object within RF signal range.
Regarding claim 10, the combination of Kothari et al., Dusan et al. and Campman disclose The method of claim 7, further comprising: determining that a portion of the combination of signals result from at least one of noise sources associated with the first person or ambient noise associated with the first person;
Accordingly, the inventors have developed new technologies for silent speech devices allowing the devices to automatically adapt to the current actions of the wearer, allowing for continued interaction with mobile devices, smart devices, communication systems and interactive systems. In some embodiments, the techniques may include a wearable device configured to recognize speech signals of a user including electrical signals indicative of a user's facial muscle movement when the user is speaking (e.g., silently or with voice), motion signals associated with the movement of a wearer's face, vibration signals associated with voiced speech, and/or audio signals, and change its operation in response to the signals. In some examples, the wearable device may additionally or alternatively measure a position of a user's tongue, blood flow of the user, muscle strain of the user, muscle frequencies of the user, temperatures of the user, and magnetic fields of the user, among other signals. Any such signal may be used with the technologies described herein (Kothari et al. par. 33).
and filtering out the portion of the combination of signals from the combination of signals to generate filtered signals, wherein the ML model processes the filtered signals to detect presence of the second person.
As shown in FIG. 6, the environmental condition detector 302 may be configured to detect and/or identify, based on the audio input (e.g., the audio input obtained by the microphone(s) 152 of the media output device 150) environmental conditions such as a speaker presence, reverberation (reverb), an indoor/outdoor condition, a wind presence condition, and/or an ambient noise presence condition (as examples). For example, the speaker presence condition may indicate the presence of one or more speakers (e.g., people speaking) in the physical environment of the media output device 150, and/or a location, relative to the media output device, of the one or more speakers. In one or more implementations, the speaker presence condition can be detected by providing an audio input to a machine learning model (e.g., a neural network trained as a classifier) that has been trained by adjusting one or more weights and/or other parameters of the machine learning model based on a comparison of training output data (e.g., a speaker presence label indicating whether and/or where a speaker is present in the training audio input) with a training output of the machine learning model generated in response to a training audio input (Dusan et al. par. 63). In one or more implementations, the media output device 150 may also, or alternatively, include an echo canceller 800 that cancels an output of a speaker of the media output device that is received as part of the audio input to the microphone(s) 152 and/or the motion sensor 307), before the microphone signals and/or sensor signals are provided to the encoder 502 and/or the mixer 821. As shown, in one or more use cases, a downlink signal 815 from a remote device participating in a call or conference with the electronic device 104 may also be provided to the noise suppressor/post-filter 816, the echo canceller 810, and/or the echo canceller 800 (e.g., and also may be provided for output by a speaker of the media output device 150) (Dusan et al. par. 79).
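One way to picture the claim 10 mapping, in which a portion of the signals attributed to the wearer or to ambient noise is removed before the ML model runs, is the spectral-subtraction sketch below. The references describe de-noising and echo cancellation without fixing an algorithm, so the subtraction approach and the variable names are assumptions made for illustration.

```python
# Illustrative filtering-then-detection sketch; not the algorithm of the references.
import numpy as np

def filter_first_person_noise(mixture_features, wearer_noise_estimate):
    # Remove the portion attributed to the first person, clamping at zero.
    return np.maximum(np.asarray(mixture_features) - np.asarray(wearer_noise_estimate), 0.0)

def detect_after_filtering(presence_clf, mixture_features, wearer_noise_estimate) -> bool:
    filtered = filter_first_person_noise(mixture_features, wearer_noise_estimate)
    # The ML model processes the filtered signals to detect the second person.
    return bool(presence_clf.predict(filtered.reshape(1, -1))[0])
```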
Regarding claim 11, the combination of Kothari et al., Dusan et al. and Campman disclose The method of claim 10, wherein the noise sources associated with the first person are generated in response to muscle activity performed by the first person.
Accordingly, the inventors have developed new technologies for silent speech devices allowing the devices to automatically adapt to the current actions of the wearer, allowing for continued interaction with mobile devices, smart devices, communication systems and interactive systems. In some embodiments, the techniques may include a wearable device configured to recognize speech signals of a user including electrical signals indicative of a user's facial muscle movement when the user is speaking (e.g., silently or with voice), motion signals associated with the movement of a wearer's face, vibration signals associated with voiced speech, and/or audio signals, and change its operation in response to the signals. In some examples, the wearable device may additionally or alternatively measure a position of a user's tongue, blood flow of the user, muscle strain of the user, muscle frequencies of the user, temperatures of the user, and magnetic fields of the user, among other signals. Any such signal may be used with the technologies described herein (Kothari et al. par. 33).
Regarding claim 12, the combination of Kothari et al., Dusan et al. and Campman disclose The method of claim 10, further comprising: determining that muscle movements of the first person were performed at a particular point in time based on the combination of signals; and selecting, as the portion of the combination of signals to be filtered out,
The sensor arm may support additional sensors 112. The additional sensors 112 may include a microphone for recording voiced or whispered speech, and an accelerometer or IMU for recording vibrations associated with speech such as glottal vibrations produced during voiced speech. In some examples the IMU may additionally or alternatively be used to measure facial movements. In some examples the wearable device includes multiple IMUs including at least one IMU configured to measure vibrations associated with speech and at least one IMU configured to measure facial movements. In some examples IMUs may be filtered at different frequencies, depending on whether they are measuring speech vibrations or facial motion. For example, IMU filtering at a lower frequency, for example 5-50 Hz, may measure facial motion related to speech and IMU filtering at a higher frequency, for example 100+Hz, may measure vibrations associated with speech. The additional sensors 112 may include sensors configured to measure a position of a user's tongue, blood flow of the user, muscle strain of the user, muscle frequencies of the user, temperatures of the user, and magnetic fields of the user, among other signals. The additional sensors 112 may include photoplethysmogram sensors, photodiodes, optical sensors, laser doppler imaging, mechanomyography sensors, sonomyography sensors, ultrasound sensors, infrared sensors, functional near-infrared spectroscopy (fNIRS) sensors, capacitive sensors, electroglottography sensors, electroencephalogram (EEG) sensors, and magnetoencephalography (MEG) sensors, among other sensors (Kothari et al. par. 38). The sensors 211 may be supported by the wearable device to record signals 202 associated with speech, either silent or voiced, at or near the head, face and/or neck of the user 201. Once recorded, the signals may be sent to a signal processing module 212 of the wearable device 210. The signal processing module 212 may perform one or more operations on the signals including filtering, thresholding, and analog to digital conversion, among other operations (Kothari et al. par. 57).
a subset of the combination of signals that were collected at the particular point in time.
In some examples, the level 1 processing 331 involves processing fewer signals than the level 2 processing. For example, the level 1 processing may process signals from a subset of the sensors on the device, such as processing only the signals recorded from the microphone 311 and EMG sensors 312, while the level 2 processing may additionally process the signals from the IMU 313, and/or any other sensors on the wearable device 300. In some examples, the sensors of the mobile device may include multiple channels or multiple sensors. For example, the microphone 311 may include multiple microphones, such as at least 2, at least 3, at least 4 at least 5 or at least 10 microphones. Additionally, the EMG sensor 312 may include multiple electrodes capable of recording EMG signals from the user, for example the EMG sensor may include at least 5, at least 10, at least 20 or at least 50 electrodes. During the level 1 processing 331, a subset of the individual sensors of each of the sensors may be analyzed. For example, if the EMG sensor 312 includes 10 electrodes, 3 of the electrodes may be used in the level 1 processing 331. The level 2 processing 332 may analyze signals from a larger subset of individual sensors of each sensor than the level 1 processing 331 or may analyze signals from all of the individual sensors (Kothari et al. par. 82). Although embodiments of dividing training data into target domain training data and source domain training data are shown in FIG. 12, in other variations, the speech model may optionally be trained using training data that includes different measurement modalities such as described above and further herein. In some embodiments, a subset of the modalities may be selected (e.g., for a training iteration, for a set of measurements, for a training subject, etc.). For example, the speech model may be initially trained using audio signals and EMG signals labeled with speech labels. In subsequent training iterations, only EMG signals and no audio signals are used (Kothari et al. par. 160).
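The claim 12 mapping, selecting the subset of signals collected while the wearer's muscle movement occurred so that it can be filtered out, can be sketched as a simple time-window mask. The timestamps, the movement interval, and the function names are hypothetical; the cited passages describe subset selection without this exact mechanism.

```python
# Hypothetical time-window selection sketch for the claim 12 mapping.
import numpy as np

def split_by_movement_window(timestamps, samples, movement_start, movement_end):
    t = np.asarray(timestamps)
    s = np.asarray(samples)
    in_window = (t >= movement_start) & (t <= movement_end)
    removed_subset = s[in_window]    # signals collected at the particular point in time
    kept_signals = s[~in_window]     # remainder of the combination of signals
    return kept_signals, removed_subset
```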
Regarding claim 13, the combination of Kothari et al., Dusan et al. and Campman disclose The method of claim 1, wherein the second person is outside of a field of view of the first person and is within 10 meters of the first person, and wherein the second person is behind the first person or is behind a wall of a room. (Campman US 20080061962 abstract; paragraphs [0021]-[0027]; [0033]-[0037]; [0054]-[0058]; figures 1-6;)
The system operates by deploying passive-infrared activated, low-power RF transmitters, referred to as locator transponders, at various locations throughout a building where personnel will pass by or through, such as a doorway or hallway. These locator transponders contain an ability to detect the presence of a person by various sensing means such as, in this instance, passive-infrared radiation from the person's body heat. Other detection methods can also be used such as ultrasonic, RF-field, magnetic field, capacitive-sense, visible light disturbance, pressure floor mat or other sensors that indicate a person's presence (Campman par. 22). Accordingly, it is an object of the invention to provide an apparatus including a locator-transmitter device having a sensor input including but not limited to a passive infrared detector, mechanical or electronic switch input, ultrasonic sonar sensor, optical sensor, radio-frequency (RF) field sensor or other sensor for detecting the presence of a person or object and which contains an adjustable and selectable means for controlling the radiated RF transmitter power output to limit the propagation of its detectable radiated RF signal from a range of several inches to several hundred feet, and a settable unique identity code contained within its emitted RF signal to identify the device and its emitted RF power output level (Campman par. 27).
According to the cited passages and figures, the system can detect the radiated RF signal over a range of several inches to several hundred feet, so a second person within 10 meters falls well within that coverage range. It is also well known in the art that radio frequency signals can penetrate barriers such as drywall, wood, plastic, glass, or thin brick, so the second person may be behind the first person or behind a wall of a room.
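As a back-of-the-envelope check supporting the range interpretation above (examiner's illustration only, not taken from Campman), a Friis free-space estimate shows that even a low-power transmitter remains readily detectable at 10 meters; the transmit power and frequency below are hypothetical.

import math

def received_power_dbm(tx_power_dbm, distance_m, freq_hz):
    # Friis free-space estimate of received power at a given distance
    wavelength = 3.0e8 / freq_hz
    path_loss_db = 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength)
    return tx_power_dbm - path_loss_db

# e.g. a 0 dBm (1 mW) transmitter at 433 MHz, receiver 10 meters away
print(received_power_dbm(0.0, 10.0, 433e6))   # roughly -45 dBm, far above typical receiver sensitivity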
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Kothari et al. US 20240221741, in view of Dusan et al. US 20240292151, in view of Campman US 20080061962 and further in view of Mower US 5111815.
Regarding claim 8, the combination of Kothari et al., Dusan et al. and Campman teach all the limitations of claim 7.
The combination of Kothari et al., Dusan et al. and Campman do not explicitly teach The method of claim 7, wherein the second person radiates the RF signals in response to power line noise.
Mower teaches The method of claim 7, wherein the second person radiates the RF signals in response to power line noise. (Mower US 5111815 abstract; figures 1-3)
Referring again to FIG. 1, the sense amplifier 16, which advantageously includes an automatic gain control and band pass filter, receives information from the neurosensor 12. Even though the neurosignal from the carotid sinus nerve is constant, some long term drift in signal amplitude from the nerve will occur. This is due to changes in the nerve tissue and changes in the electrode and nerve fiber interface. The automatic gain control will maintain a constant output level of the amplifier in the presence of long term drift. Amplifier 16 may also include a band pass filter to reject noise which may be present in the nerve signal. The noise may include biologic noise such as action potentials from other nerve fibers as well as electrical signals caused by contraction of muscles in the area of the nerve electrode. The noise may also include external signals such as power line noise or radio frequency coupled into the body. The band pass filter incorporated in amplifier 16 may typically have a low frequency cutoff of 300 hertz to eliminate biologically induced signals and line power noise signals, and a high frequency cutoff of 5000 hertz to eliminate radio frequency noise. Amplifier 16 may be constructed according to well known techniques and electronic design rules (Mower col. 3 lines 63-68; col. 4 lines 1-18).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the power line noise signals or radio frequency coupled into the body, as taught by Mower, into the method of Kothari et al., Dusan et al. and Campman, and the results of the substitution would have been predictable, namely detecting the presence of any object within RF signal range.
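For illustration only (not taken from Mower), the amplifier chain described in the quoted passage, an automatic gain control followed by a 300-5000 Hz band-pass filter, could be approximated in software as follows; the sampling rate and AGC window are hypothetical.

import numpy as np
from scipy import signal

fs = 20000.0                                   # hypothetical sampling rate (Hz)
nerve = np.random.randn(int(fs))               # placeholder one-second neural recording

# 300 Hz low cutoff rejects biologic and power-line noise;
# 5000 Hz high cutoff rejects radio-frequency noise
sos = signal.butter(4, [300, 5000], btype="bandpass", fs=fs, output="sos")
filtered = signal.sosfiltfilt(sos, nerve)

# crude automatic gain control: divide by a slowly tracked RMS level
window = int(0.1 * fs)                         # 100 ms tracking window
rms = np.sqrt(np.convolve(filtered**2, np.ones(window) / window, mode="same"))
agc_output = filtered / np.maximum(rms, 1e-9)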
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Kothari et al. US 20240221741, in view of Dusan et al. US 20240292151, in view of Campman US 20080061962, in view of Mower US 5111815 and further in view of Colachis et al. US 20210382554.
Regarding claim 9, the combination of Kothari et al., Dusan et al., Campman and Mower teach all the limitations of claim 8.
The combination of Kothari et al., Dusan et al., Campman and Mower do not explicitly teach The method of claim 8, wherein the second person radiates the RF signals in response to muscle activity of the second person.
Colachis et al. teach The method of claim 8, wherein the second person radiates the RF signals in response to muscle activity of the second person. (Colachis et al. US 20210382554 abstract; paragraphs [0007]-[0016]; [0057]-[0066]; figures 1-5;)
With reference to FIG. 4, in some applications, non-verbal communications radios 10 are used to in a training setting or in motor skill transfer, to convey information from one user to another user. For example, in a multiplayer gaming setting, gamers can help each other using EMG or some motion sensing acquired by the garment 12, and the sensation is transferred to another gamer via their garment 12 to guide them toward better performance. This provides hints using somatosensory stimulation provided by the garment 12. As an example, FIG. 4 depicts a first user 80 (also sometimes referred to herein as an “expert”) who performs a first motor action (e.g., a controller operation on a gaming controller, or a golf swing in the case of a golfing setting, or some other action performed during a sports activity or so forth. The expert 80 wears an expert wearable device 10 comprising the non-verbal communications radio 10 (this instance of the radio 10 is also sometimes referred to herein as an “expert wearable device”). As the first user 80 performs the action, in an operation 82 EMG signals generated by the relevant muscles are recorded by the high-density array of electrodes 14 of the expert wearable device 10. In an optional operation 84, the recorded EMG may be scaled in intensity, spectrally filtered, or otherwise processed to provide a somatosensation pattern. The recorded EMG either as-recorded (i.e. directly from step 82) or after the optional processing 84, then serves as the message to be transmitted to a second instance of the non-verbal communications radio 10 worn by a second user 90 (This instance of the non-verbal communications radio 10 is sometimes referred to herein as a “learner wearable device). Note that in the embodiment of FIG. 4, the semantic database 40 of FIG. 1 is not used; rather, the recorded or recorded-and-processed EMG serves as the message. This signal is then transmitted from the expert wearable device to the learner wearable device as a radio signal encoding the outgoing recorded or recorded-and-processed EMG. The signal is transmitted using the radio transceiver 26 of the learner wearable device 10 of the expert 80 to the learner wearable device 10 of the learner 90. (This radio transmission is diagrammatically indicated by dotted connecting arrow 86). At the second user, in an operation 92 the received radio signal is processed (e.g. demodulated) to extract the signal (which again is the EMG of the first user 80 recorded at operation 82 and optionally processed at operation 84) and is applied to the corresponding muscles of the second user 90 via the high-density array of electrodes 14 of the second user's non-verbal communications radio 10 (Colachis et al. par. 57).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the radio signal encoding the outgoing recorded or recorded-and-processed EMG, as taught by Colachis et al., into the method of Kothari et al., Dusan et al., Campman and Mower, and the results of the substitution would have been predictable, namely enabling communication between two users.
Claims 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Kothari et al. US 20240221741 in view of Dusan et al. US 20240292151 and further in view of Geisert et al. US 20230259194.
Regarding claim 15, the combination of Kothari et al. and Dusan et al. teach all the limitations of claim 14.
The combination of Kothari et al. and Dusan et al. do not explicitly teach The method of claim 14, wherein the notification indicates a direction along which the second person is moving.
Geisert et al. teach The method of claim 14, wherein the notification indicates a direction along which the second person is moving. (Geisert et al. US 20230259194 abstract; paragraphs [0002]-[0006]; [0025]-[0030]; [0044]-[0047]; figures 1-14)
FIGS. 8A-D illustrate example views of a second user 102b wearing a second VR display device 135b with a direction of movement 175 approaching a first user 102a wearing a first VR display device 135a. Based on the second user 102b direction of movement 175 and approach toward the first user 102a, the VR display device 135a may display a proximity warning 180. The proximity warning may be a haptic alert. As an example and not by way of limitation, one or more of the VR display device 135 and/or the controllers 106 may pulse or vibrate to alert the user 102 of an impending collision with another user or obstacle. The haptic alert may be directional, such that a potential collision from the right of the user 102 may cause the right-hand controller 106 to provide the haptic alert. The frequency of the haptic alert may increase as the potential for collision increases. The proximity warning may be an auditory alert. As an example and not by way of limitation, VR display device 135 may play a tone, sound, or other noise to alert the user 102 of an impending collision with another user or obstacle. The auditory alert may be directional, such that a potential collision from the right of the user 102 may cause a right-side speaker of the VR display device 135 to provide the auditory alert. The frequency of the auditory alert may increase as the potential for collision increases. The proximity warning may be a visual alert. As an example and not by way of limitation, VR display device 135 may display or render a flashing light, glow, passthrough view, or other visual cue to alert the user 102 of an impending collision with another user or obstacle. The visual alert may be directional, such that a potential collision from the right of the user 102 may cause a right portion of the VR display device 135 to provide the visual alert. The intensity or size of the portion of the visual alert may increase as the potential for collision increases. The proximity warning may be based on determining a relative speed of one VR display device with respect to another VR display device. If a second VR display device 135b is approaching the first VR display device 135a with a direction of movement 175 with a speed greater than a threshold speed (e.g., 5 miles per hour), the VR system 100 may render a proximity warning on the first VR display device 135a. As an example and not by way of limitation, if the second user 102b wearing the second VR display device 135b is jogging or running towards the first user 102a wearing the first VR display device 135a, the first VR display device 135a may render a proximity warning to alert the first user 102a of the rapidly approaching second user 102b (Geisert et al. par. 46).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the proximity warning responsive to the direction of movement of the second user, as taught by Geisert et al., into the method of Kothari et al. and Dusan et al., and the results of the substitution would have been predictable, namely avoiding collisions.
Regarding claim 16, the combination of Kothari et al., Dusan et al. and Geisert et al. disclose The method of claim 1, wherein controlling the operation of the speech signal detection device comprises activating an augmented reality (AR) element presented to the first person based on movement of the second person who is outside of a field of view of the first person.
One technical challenge may include ensuring the safety of two or more VR users within a shared real-world environment. The solution presented by the embodiments disclosed herein to address this challenge may be to provide proximity warnings on a display of a first VR display device based on a determination that a second VR display device is approaching the first VR display device. In particular embodiments, the VR system 100 may render, for one or more displays 114 of the first VR display device 135a, a first output image comprising a proximity warning with respect to the second VR display device 135b based on determining the pose of the first VR display device 135a with respect to the second VR display device 135b is within a threshold distance. That is, if the second VR display device 135b approaches within a predetermined distance to the first VR display device 135a, the VR system may direct the first VR display device 135a to provide a proximity warning indicating that another VR display device (and accordingly, another user 102b) is approaching the first VR display device 135a (and accordingly, the user 102a) to ensure user safety during the VR experience. This proximity warning may be issued even when the risk of collision comes from a user or object that is outside of the field of view 165 of the user 102. As an example and not by way of limitation, if the user 102b wearing the VR display device 135b approaches within a threshold distance (e.g., 1 meter) of the user 102a wearing the VR display device 135a, an output image comprising a proximity warning may be rendered on the display 114 of the VR display device 135a to alert the user 102a another user 102b is approaching the user 102a. Additionally, the VR system 100 may determine a pose of the first VR display device 135a with respect to one or more anchor points (e.g., one or more objects 145 in the real-world environment 150), determine a pose of the second VR display device 135b with respect to one or more anchor points (e.g., one or more objects 145 in the real-world environment 150), and determine a distance between the first and second VR display devices 135a, 135b based on the pose of the VR display device 135a to the anchor point and the VR display device 135b to the anchor point. As an example and not by way of limitation, if a pose (e.g., position and orientation) between a first VR display device 135a and an anchor point is known, and a pose between a second VR display device 135b and an anchor point is known, the distance between the first and second VR display devices 135a, 135b may be calculated. Accordingly, if the distance between the first and second VR display devices 135a, 135b is determined, then the VR system 100 may determine wither the distance between the first VR display device 135a and the second VR display device 135b is within the threshold distance. Although this disclosure describes rending an output image comprising a proximity warning in a particular manner, this disclosure contemplates rending an output image comprising a proximity warning in any suitable manner (Geisert et al. par. 45).
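As an illustrative sketch only (not taken from Geisert et al.), the anchor-point distance computation and threshold tests described in the quoted passages could be expressed as follows; the threshold values mirror the examples given above but are otherwise hypothetical.

import numpy as np

THRESHOLD_DISTANCE_M = 1.0      # e.g. 1 meter, per the quoted example
THRESHOLD_SPEED_MPS = 2.24      # roughly 5 miles per hour

def device_distance(pos_a_to_anchor, pos_b_to_anchor):
    # distance between two headsets, each localized against a shared anchor point
    return float(np.linalg.norm(np.asarray(pos_a_to_anchor) - np.asarray(pos_b_to_anchor)))

def proximity_warning(pos_a, pos_b, speed_b_toward_a):
    # warn when the second device is within the threshold distance
    # or is approaching faster than the threshold speed
    close = device_distance(pos_a, pos_b) < THRESHOLD_DISTANCE_M
    fast = speed_b_toward_a > THRESHOLD_SPEED_MPS
    return close or fast

# e.g. second user 0.8 m away and walking toward the first user at 1.5 m/s
print(proximity_warning([0.0, 0.0, 0.0], [0.8, 0.0, 0.0], 1.5))   # True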
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Kothari et al. US 20240221741 in view of Dusan et al. US 20240292151 and further in view of Wexler US 20240127824.
Regarding claim 18, the combination of Kothari et al. and Dusan et al. teach the EMG communication device being positioned adjacent to and underneath a neck region of the first person, and the EMG communication device comprising a plurality of electrodes configured to collect the combination of signals.
FIG. 1B is an illustration of wearable device target zone(s) associated with a wearable speech input device such as wearable device 100 (FIG. 1A), in accordance with some embodiments of the technology described herein. The target zones may include one or more areas on or near the user's body part, in which sensor(s) can be placed to measure speech muscle activation patterns while the user is speaking (silently or with voice) or preparing to speak. For example, the speech muscle activation patterns at various target zones may include facial muscle movement, neck muscle movement, chin muscle movement, or a combination thereof associated with the user speaking. In some examples, the sensors may be placed at or near a target zone at which the sensors may be configured to measure the blood flow that occurs as a result of the speech muscle activation associated with the user speaking. Thus, the wearable device 100 may be configured to have its sensors positioned to contact one or more target zones, such as the face and neck of the user (Kothari et al. par. 47). In some embodiments, a second target zone 121 is shown along the jawline of the user. The second target zone 121 may include portions of the user's face above and under the chin of the user. The second target zone 121 may include portions of the user's face under the jawline of the user. The second target zone 121 may be used to measure electrical signals associated with muscles in the face, lips jaw and neck of the user, including the depressor labii inferioris of the user, the depressor anguli oris of the user, the mentalis of the user, the orbicularis oris of the user, the depressor septi of the user, the mentalis of the user, the platysma of the user and/or the risorius of the user. Various sensors may be placed at the second target zone 121. For example, electrodes (e.g., 111 in FIG. 1A) supported by the wearable device 100 (e.g., via a sensor arm 110) may be positioned to contact the second target zone 121. Additional sensors, e.g., accelerometers, may be supported by the wearable device and positioned at the second target zone 121 to measure the movement of the user's jaw. Additional sensor may also include sensors configured to detect the position and activity of the user's tongue (Kothari et al. par. 50). In some embodiments, a third target zone 122 is shown at the neck of the user. The third target zone 122 may be used to measure electrical signals associated with muscles in the neck of the user, e.g., the sternal head of sternocleidomastoid of the user, or the clavicular head of sternocleidomastoid. Various sensors may be positioned at the third target zone 122. For example, accelerometers may be supported at the third target zone to measure vibrations and movement generated by the user's glottis during speech, as well as other vibrations and motion at the neck of user 101 produced during speech (Kothari et al. par. 51).
The combination of Kothari et al. and Dusan et al. do not explicitly teach The method of claim 1, wherein the speech signal detection device comprises an augmented reality (AR) headset that is attached to an EMG communication device.
Wexler teaches The method of claim 1, wherein the speech signal detection device comprises an augmented reality (AR) headset that is attached to an EMG communication device, (Wexler US 20240127824 abstract; [0172]-[0180]; figures 1-13)
Reference is now made to FIG. 2A, which illustrates another example implementation of speech detection system 100, in accordance with the present disclosure. In this example, wearable housing 110 may be integrated with or otherwise attached to a pair of glasses 200 having a frame 202. In this example implementation, glasses 200 may include nasal electrodes 204 and temporal electrodes 206 attached to frame 202 and contacting the user's skin surface. Electrodes 204 and 206 may receive body surface electromyogram (sEMG) signals, which provide additional information regarding the activation of the user's facial muscles. Speech detection system 100 may use the electrical activity sensed by electrodes 204 and 206 together with the output of optical sensing unit 116 in generating, for example, the synthesized audio signals. Additionally or alternatively, speech detection system 100 may include one or more additional optical sensing units 208, similar to optical sensing unit 116, for sensing skin movements in other areas of the user's face, such as eye movement. These additional optical sensing units may be used together with or instead of optical sensing unit 116. In the illustrated example, optical sensing unit 116 may illuminate a first facial region 108A and optical sensing unit 208 may illuminate a second facial region 108B. First facial region 108A and second facial region 108B may be nonoverlapping (Wexler par. 178). Reference is now made to FIG. 2B, illustrating another example implementation of speech detection system 100, in accordance with some embodiments of the present disclosure. In the depicted example, speech detection system 100 may be part of an extended reality appliance 250. Extended reality appliance 250 may include all the sensors discussed above with reference to glasses 200 and more. For example, extended reality appliance 250 may include one or more of a gyroscope, an accelerometer, a magnetometer, an image sensor, a depth sensors, an infrared sensors, a proximity sensor, and/or any other sensor configured to measure one or more properties associated with the individual wearing extended reality appliance 250 and to generate an output relating to the measured property or properties. In some cases, speech detection system 100 may use the input from any one of the sensors of extended reality appliance 250 to determine the vocalized or subvocalized words that individual 102 articulated. For example, speech detection system 100 may use input from an image sensor of extended reality appliance 250 together with data from optical sensing unit 116 (See FIG. 1) to extract meaning of facial movements. In other cases, extended reality appliance 250 may generate output that includes a visual and/or audible presentation associated with the words detected by the speech detection system 100. For example, individual 102 may interact with extended reality appliance 250 using silent commands (Wexler par. 180).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the augmented reality device coupled to the speech detection device, as taught by Wexler, into the method of Kothari et al. and Dusan et al., and the results of the substitution would have been predictable, namely determining the vocalized or subvocalized words that the individual articulated.
Allowable Subject Matter
Claim 17 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is an examiner’s statement of reasons for allowance:
Regarding claim 17, Kothari et al. US 20240221741; Dusan et al. US 20240292151; Campman US 20080061962; Mower US 5111815; Colachis et al. US 20210382554; Geisert et al. US 20230259194; Wexler US 20240127824; Thomaz et al. US 20240420728; Maizels et al. US 20230230594; Chappell, III et al. US 20230047787 and Bhalla et al. US 20210027802 are the closest prior art. They teach every limitation of claim 1, but not the newly amended limitation reciting "The method of claim 1, wherein the ML model comprises an Extreme Gradient Boosting (XGB) model or a multiple layer neural network architecture, further comprising training the ML model by performing training operations comprising: obtaining a batch of the training signals; generating a digital representation of the batch of the training signals; processing the digital representation by the ML model to estimate presence of an external person; obtaining the ground-truth presence of people data associated with the batch of the training signals; computing a deviation between the estimated presence of the external person and the ground-truth presence of people data; and updating one or more parameters of the ML model based on the deviation."
After an updated search, none of the prior art of record, singularly or in combination, teaches or fairly suggests the features of claim 17 quoted above.
The prior art of record fails to disclose the claimed training operations, and, upon consideration of the claimed invention, there is no reasoning that would have led one of ordinary skill in the art to combine the applied references to arrive at the claimed invention.
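For clarity of the record only, the recited training operations correspond to a conventional supervised training loop. The following sketch is purely illustrative and is not drawn from the applicant's disclosure or any cited reference; it assumes PyTorch, and the network size, batch contents, and dimensions are hypothetical.

import torch
import torch.nn as nn

model = nn.Sequential(                 # stands in for the claimed multiple layer architecture
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

def training_step(batch_signals, ground_truth_presence):
    # generate a digital representation of the batch of the training signals
    digital_repr = torch.as_tensor(batch_signals, dtype=torch.float32)
    # process the digital representation by the ML model to estimate presence of an external person
    estimated_presence = model(digital_repr)
    # obtain the ground-truth presence of people data and compute the deviation
    target = torch.as_tensor(ground_truth_presence, dtype=torch.float32).reshape(-1, 1)
    deviation = loss_fn(estimated_presence, target)
    # update one or more parameters of the ML model based on the deviation
    optimizer.zero_grad()
    deviation.backward()
    optimizer.step()
    return deviation.item()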
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THANG D TRAN whose telephone number is (408)918-7546. The examiner can normally be reached Monday - Friday 8:00 am - 5:30 pm (pacific time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Brian A Zimmerman can be reached at 571-272-3059. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/THANG D TRAN/Examiner, Art Unit 2686
/BRIAN A ZIMMERMAN/Supervisory Patent Examiner, Art Unit 2686