DETAILED ACTION
This action is in response to the amendments filed 1/21/2026. Claims 1-2, 4-11, and 13-20 are pending and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 12/8/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Response to Arguments
Applicant's arguments filed 1/21/2026 have been fully considered but they are not persuasive. With regard to the rejection of Claims 1, 10, and 19 under 35 U.S.C. 103 over Chen, WO 2022/151156 A1, further in view of Yeo et al, US Publication No. 2018/0268845 A1, applicant argues the following:
“Thus, unlike Chen, which does not involve an air conduction microphone that does not collect vibrations from other vibration sources, amended claim 1's first sound signal and second signal are acquired through the bone conduction element and the air conduction microphone, respectively. Amended claim 1 further specifies that the second sound signal acquired through the air conduction microphone does not include noise signals caused by a vibration due to a movement of the wearer or the vibration of an environment where the wearer resides. Therefore, amended claim 1's first sound signal and second sound signal are acquired through different types of microphones, and the second sound signal acquired by the air conduction microphone does not include noise signals caused by a vibration due to a movement of the wearer or the vibration of an environment where the wearer resides. Chen does not meet amended claim 1's requirements of how to determine whether a voice signal is included in the claimed first signal based on the claimed first and second sound signals, nor does it meet the requirements of the second sound signal.
Yeo does not remedy the deficiencies of Chen. Yeo discloses in paragraph [0042] that "in certain examples, a method of processing microphone signals to detect a likelihood that a headphone user is actively speaking, such as the example method 300, may include band filtering or sub-band processing. For example, the left and right signals 302, 304 may be filtered to remove frequency components not part of a typical voice or vocal tract range, prior to processing by, e.g., the example method 300. Further, the left and right signals 302, 304 may be separated into frequency sub-bands, and one or more of the frequency sub-bands may be separately processed by, e.g., the example method 300. Either of filtering or sub-band processing, or a combination of the two, may decrease the likelihood of a false positive caused by extraneous sounds not associated with the user's voice. However, either of filtering or sub-band processing may require additional circuit components at additional cost, and/or may require additional computational power or processing resources, therefore consuming more energy from a power source, e.g., a battery. In certain examples, filtering may provide a good compromise between accuracy and power consumption."
Therefore, Yeo simply discloses a post-processing noise cancelling mechanism, which uses a filter to remove environmental noises, and Yeo's filter requires additional circuit components, and/or requires additional computational power or processing resources. However, Yeo does not disclose or suggest acquiring a sound signal through an air conduction microphone that excludes the vibration caused by the wearer's own movement or the noise signal caused by environmental vibration, as recited in amended claim 1. Thus, Yeo does not meet the requirements of the claimed second sound signal, nor disclose or suggest how to determine whether a voice signal is included in the first signal based on the first and second sound signals that are acquired through different types of microphones. Yeo does not remedy the deficiencies of Chen.”
As acknowledged in the prior rejection, Chen does not further teach this exclusion, from the second sound signal, of vibrations due to movement of the wearer and/or the vibration of the environment where the wearer resides. Yeo is relied upon to teach this further limitation, as stated in the cited passage: “For example, the left and right signals 302, 304 may be filtered to remove frequency components not part of a typical voice or vocal tract range, prior to processing by, e.g., the example method 300.” This passage clearly states that, prior to the processing of method 300 (which corresponds to the voice activity detection described in the claimed invention), the signals are passed through filters that eliminate frequencies corresponding to non-voice signals, which would necessarily result in a second sound signal lacking the vibrations as described. The inclusion of components that perform this filtering does not invalidate the limitation being taught, as the signal acquired from the microphone lacks these vibrations prior to the processing. Moreover, filters built within microphone components are known in the art. By way of example, Josefsson et al, US Publication No. 2008/0090625 A1, discloses one such microphone with a built-in filter (FIG. 4 and Paragraph 23: “In accordance with illustrative embodiments of the invention, the microphone microchip 42 includes an internal filter 60 that can substantially attenuate the induced RF carrier noise signals while allowing audio signals from the MEMS microphone 44 to pass substantially undisturbed. To that end, the microphone microchip 42 has a filter 60 configured to substantially attenuate interference signals at or near the frequency of the carrier signal that are coupled into the microchip.”), and in a different embodiment Josefsson et al discloses a microphone with an internal filter to attenuate unwanted noise.
Further, the present application does not clarify any manner in which this function could be performed without additional components, and as such does not demonstrate that selective frequency collection can be achieved without such components. For the above reasons, the prior rejection of independent claims 1, 10, and 19 under 35 U.S.C. 103 is maintained.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-2, 7, 10-11, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al, WO 2022/151156 A1, in view of Yeo et al, US Publication No. 2018/0268845 A1.
Regarding Claim 1, Chen et al teaches an earphone controlling method, comprising: acquiring, by an earphone controlling apparatus, a first sound signal through a bone conduction element of the earphone (Title/Abstract, Paragraph 19, "In a general concept, the present disclosure provides a method and a system which may adjust the ANC module, function or operation of the headphone based on the estimation of the presence and absence of a user's voice, the user's voice is generated by one who is wearing the headphone and is speaking... The method and system provided by the present disclosure contains additional microphones and/or sensors (FIG.1 only shows sensor for simplicity) in the headphone to capture the user voice using extra secondary transfer path(s)... The additional sensor is, but not limited to, an accelerometer, a bone conduction sensor or other general vibration sensor.");
determining, by the earphone controlling apparatus, whether the first sound signal comprises a voice signal sent by a wearer of the earphone according to the first sound signal (Paragraph 23, "When the user is speaking, both the FF microphone and the FB microphone may capture the user voice respectively as a FF signal and a FB signal. Then, a FF VAD detection and a FB VAD detection may be separately performed based on the FF signal from the FF microphone and the FB signal from the FB microphone. A FF VAD flag and a FB VAD flag can be obtained as a result of the FF VAD detection and the FB VAD detection.");
and controlling, by the earphone controlling apparatus, the earphone to operate in a transparent transmission mode in response to determining that the first sound signal comprises the voice signal (Paragraph 20, "Then, for example, if the detection result indicates that the user voice is active, then a control signal may be generated to toggle the headphone to operate from the ANC mode to a transparency mode which completely or partially allows the ambient sound to reach the user ear.").
acquiring a second sound signal through an air conduction microphone of the earphone (FIG. 2 and 3, Paragraph 22, "As shown in FIG. 3, the headphone may comprise a feedforward microphone (FF mic) and a feedback microphone (FB mic)… The FF microphone is arranged in the side of the headphone toward the outside environment.");
and wherein determining whether the first sound signal comprises the voice signal sent by the wearer of the earphone according to the first sound signal comprises: performing a correlation analysis on the first sound signal and the second sound signal to determine a correlation degree of the first sound signal and the second sound signal (FIG. 8, Paragraph 50, "Further, the processor is further configured to receive a feedforward signal from the feedforward microphone of the headphone; determine, a feedforward (FF) voice activity detection (VAD) flag, based on the received feedback signal; and determine, a combination VAD flag, based on the FF VAD flag and the FB VAD flag.", Paragraph 54, "The processor may be further configured to: set the combination VAD flag is to 0, if both the values of the FF VAD flag and the FB VAD flag are 0; set the combination VAD flag to 1, if both the values of the FF VAD flag and the FB VAD flag are 1; and calculate the combination VAD flag using a weight parameter based on the value of the FF VAD flag and the value of the FB VAD flag, if one of the value of the FF VAD flag and the value of the FB VAD flag is not equal to 1.");
and determining that the first sound signal includes the voice signal sent by the wearer of the earphone when the correlation degree satisfies a predetermined correlation condition (FIG. 8, Paragraph 50, "Further, the processor is further configured to receive a feedforward signal from the feedforward microphone of the headphone; determine, a feedforward (FF) voice activity detection (VAD) flag, based on the received feedback signal; and determine, a combination VAD flag, based on the FF VAD flag and the FB VAD flag.", Paragraph 54, "The processor may be further configured to: set the combination VAD flag is to 0, if both the values of the FF VAD flag and the FB VAD flag are 0; set the combination VAD flag to 1, if both the values of the FF VAD flag and the FB VAD flag are 1; and calculate the combination VAD flag using a weight parameter based on the value of the FF VAD flag and the value of the FB VAD flag, if one of the value of the FF VAD flag and the value of the FB VAD flag is not equal to 1.").
Chen et al does not further teach wherein the second sound signal acquired through the air conduction microphone of the earphone does not include noise signals caused by a vibration due to a movement of the wearer or the vibration of an environment where the wearer resides.
However, Yeo et al, in a similar invention in the same field of endeavor, teaches wherein the second sound signal acquired through the air conduction microphone of the earphone does not include noise signals caused by a vibration due to a movement of the wearer or the vibration of an environment where the wearer resides (Paragraph 42, “In certain examples, a method of processing microphone signals to detect a likelihood that a headphone user is actively speaking, such as the example method 300, may include band filtering or sub-band processing. For example, the left and right signals 302, 304 may be filtered to remove frequency components not part of a typical voice or vocal tract range, prior to processing by, e.g., the example method 300. Further, the left and right signals 302, 304 may be separated into frequency sub-bands, and one or more of the frequency sub-bands may be separately processed by, e.g., the example method 300. Either of filtering or sub-band processing, or a combination of the two, may decrease the likelihood of a false positive caused by extraneous sounds not associated with the user's voice.”, the filter being utilized removes environmental noises and thus would necessarily remove movement-related noises and/or the vibrations within the user’s environment, thereby providing a signal lacking these features after the filter is applied. Paragraph 3 further describes the left and right microphones that provide the left and right signals 302 and 304. Paragraph 53 further describes an interior microphone that utilizes signals generated by conduction through bones, and utilizes that signal to further enhance the voice activity detection in the system.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of a feedforward microphone signal not including noise signals caused by a vibration due to a movement of the wearer or the vibration of an environment where the wearer resides, as taught by Yeo et al, with the system as taught by Chen et al. The motivation would be to allow for better correlation analysis by removing noises that do not correspond to a voice signal, ensuring better accuracy.
Regarding Claim 2, Chen et al in view of Yeo et al teaches all the limitations of claim 1, and Chen et al further teaches, wherein, determining whether the first sound signal comprises the voice signal sent by the wearer of the earphone according to the first sound signal comprises at least one of: determining that the first sound signal comprises the voice signal sent by the wearer of the earphone in response to determining that an energy of the first sound signal is greater than a predetermined energy threshold (Paragraph 32, " It can be understood the SNR is one example for a metric of characterizing the microphone signal without specific limitation, any other metrics that can characterize the microphone signal could be used in the method of system disclosed herein, such as the magnitude of the signal, the energy of the signal, the frequency response of the signal, and so on.", "A predetermined threshold interval for SNR of the FF signal may be set, which is defined by a high threshold THH and a low threshold THL. In comparison with a single threshold, using a threshold interval may improve a fault tolerance rate and reduce the misjudgment. At S602, the SNR of the FF signal is compared to the high threshold. If the SNR is greater than or equal to the high threshold THH, then the flow goes to S603, in which the FF VAD flag is set to 1, if not, the method goes to S604.");
and determining that the first sound signal comprises the voice signal sent by the wearer of the earphone in response to determining that a frequency of the first sound signal satisfies a predetermined frequency condition (Paragraph 32, " It can be understood the SNR is one example for a metric of characterizing the microphone signal without specific limitation, any other metrics that can characterize the microphone signal could be used in the method of system disclosed herein, such as the magnitude of the signal, the energy of the signal, the frequency response of the signal, and so on.", "A predetermined threshold interval for SNR of the FF signal may be set, which is defined by a high threshold THH and a low threshold THL. In comparison with a single threshold, using a threshold interval may improve a fault tolerance rate and reduce the misjudgment. At S602, the SNR of the FF signal is compared to the high threshold. If the SNR is greater than or equal to the high threshold THH, then the flow goes to S603, in which the FF VAD flag is set to 1, if not, the method goes to S604.").
Regarding Claim 7, Chen et al in view of Yeo et al teaches all the limitations of claim 1, and Chen et al further teaches wherein the method further comprises: determining whether a third sound signal acquired through the bone conduction element in a predetermined time period includes a voice signal after operating in the transparent transmission mode (FIG. 9, Paragraphs 45-48, "If the combination VAD flag is determined as shown at S901, then the method goes to S902. At S902, the method determines whether the combination VAD flag is equal to 1. If it is equal to 1, then a corresponding control signal is generated to turn off the ANC, at S903. In one example, the adjustment of turning off the ANC may be performed for both the FF microphone and the FB microphone.", "If the combination VAD flag is not equal to 1, then the method goes to S904. At S904, the method determines whether the combination VAD flag is equal to 0. If the combination VAD flag is equal to 0. a corresponding control signal is generated to turn on the ANC, at S905", "If the combination VAD flag is not equal to 0, the method goes to S906. At S906, a corresponding control signal is generated to turn on or off ANC with a certain ratio according to the value of the combination VAD flag.");
and controlling the earphone to switch to a noise reduction mode in response to determining that the third sound signal does not include a voice signal (Paragraph 47, "If the combination VAD flag is not equal to 1, then the method goes to S904. At S904, the method determines whether the combination VAD flag is equal to 0. If the combination VAD flag is equal to 0. a corresponding control signal is generated to turn on the ANC, at S905").
Regarding Claim 10, Chen et al teaches An earphone controlling apparatus, comprising: a processor (Paragraph 5, "The system may comprise a processor. The processor may be configured to receive a feedback signal from the feedback microphone or sensor of the headphone;"); a memory for storing processor-executable instructions (Paragraph 61, “Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium… In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device”);
wherein the processor is configured to: acquire a first sound signal through a bone conduction element of the earphone (Title/Abstract, Paragraph 19, "In a general concept, the present disclosure provides a method and a system which may adjust the ANC module, function or operation of the headphone based on the estimation of the presence and absence of a user's voice, the user's voice is generated by one who is wearing the headphone and is speaking... The method and system provided by the present disclosure contains additional microphones and/or sensors (FIG.1 only shows sensor for simplicity) in the headphone to capture the user voice using extra secondary transfer path(s)... The additional sensor is, but not limited to, an accelerometer, a bone conduction sensor or other general vibration sensor.");
determine whether the first sound signal includes a voice signal sent by a wearer of the earphone according to the first sound signal; control the earphone to operate in a transparent mode in the case the first sound signal includes the voice signal (Paragraph 23, "When the user is speaking, both the FF microphone and the FB microphone may capture the user voice respectively as a FF signal and a FB signal. Then, a FF VAD detection and a FB VAD detection may be separately performed based on the FF signal from the FF microphone and the FB signal from the FB microphone. A FF VAD flag and a FB VAD flag can be obtained as a result of the FF VAD detection and the FB VAD detection.", Paragraph 20, "Then, for example, if the detection result indicates that the user voice is active, then a control signal may be generated to toggle the headphone to operate from the ANC mode to a transparency mode which completely or partially allows the ambient sound to reach the user ear.").
acquire a second sound signal through an air conduction microphone of the earphone (FIG. 2 and 3, Paragraph 22, "As shown in FIG. 3, the headphone may comprise a feedforward microphone (FF mic) and a feedback microphone (FB mic)… The FF microphone is arranged in the side of the headphone toward the outside environment.");
and perform a correlation analysis on the first sound signal and the second sound signal to determine a correlation degree of the first sound signal and the second sound signal (FIG. 8, Paragraph 50, "Further, the processor is further configured to receive a feedforward signal from the feedforward microphone of the headphone; determine, a feedforward (FF) voice activity detection (VAD) flag, based on the received feedback signal; and determine, a combination VAD flag, based on the FF VAD flag and the FB VAD flag.", Paragraph 54, "The processor may be further configured to: set the combination VAD flag is to 0, if both the values of the FF VAD flag and the FB VAD flag are 0; set the combination VAD flag to 1, if both the values of the FF VAD flag and the FB VAD flag are 1; and calculate the combination VAD flag using a weight parameter based on the value of the FF VAD flag and the value of the FB VAD flag, if one of the value of the FF VAD flag and the value of the FB VAD flag is not equal to 1."),
and determine that the first sound signal comprises the voice signal sent by the wearer of the earphone when the correlation degree satisfies a predetermined correlation condition (FIG. 8, Paragraph 50, "Further, the processor is further configured to receive a feedforward signal from the feedforward microphone of the headphone; determine, a feedforward (FF) voice activity detection (VAD) flag, based on the received feedback signal; and determine, a combination VAD flag, based on the FF VAD flag and the FB VAD flag.", Paragraph 54, "The processor may be further configured to: set the combination VAD flag is to 0, if both the values of the FF VAD flag and the FB VAD flag are 0; set the combination VAD flag to 1, if both the values of the FF VAD flag and the FB VAD flag are 1; and calculate the combination VAD flag using a weight parameter based on the value of the FF VAD flag and the value of the FB VAD flag, if one of the value of the FF VAD flag and the value of the FB VAD flag is not equal to 1.").
Chen et al does not further teach wherein the second sound signal acquired through the air conduction microphone of the earphone does not include noise signals caused by a vibration due to a movement of the wearer or the vibration of an environment where the wearer resides.
However, Yeo et al, in a similar invention in the same field of endeavor, teaches wherein the second sound signal acquired through the air conduction microphone of the earphone does not include noise signals caused by a vibration due to a movement of the wearer or the vibration of an environment where the wearer resides (Paragraph 42, “In certain examples, a method of processing microphone signals to detect a likelihood that a headphone user is actively speaking, such as the example method 300, may include band filtering or sub-band processing. For example, the left and right signals 302, 304 may be filtered to remove frequency components not part of a typical voice or vocal tract range, prior to processing by, e.g., the example method 300. Further, the left and right signals 302, 304 may be separated into frequency sub-bands, and one or more of the frequency sub-bands may be separately processed by, e.g., the example method 300. Either of filtering or sub-band processing, or a combination of the two, may decrease the likelihood of a false positive caused by extraneous sounds not associated with the user's voice.”, the filter being utilized removes environmental noises and thus would necessarily remove movement-related noises and/or the vibrations within the user’s environment, thereby providing a signal lacking these features after the filter is applied. Paragraph 3 further describes the left and right microphones that provide the left and right signals 302 and 304. Paragraph 53 further describes an interior microphone that utilizes signals generated by conduction through bones, and utilizes that signal to further enhance the voice activity detection in the system.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of a feedforward microphone signal not including noise signals caused by a vibration due to a movement of the wearer or the vibration of an environment where the wearer resides, as taught by Yeo et al, with the system as taught by Chen et al. The motivation would be to allow for better correlation analysis by removing noises that do not correspond to a voice signal, ensuring better accuracy.
Regarding Claim 11, Chen et al in view of Yeo et al teaches all the limitations of claim 10, and Chen et al further teaches wherein the processor is further configured to: determine that the first sound signal comprises the voice signal sent by the wearer of the earphone when an energy of the first sound signal is greater than a predetermined energy threshold (Paragraph 32, " It can be understood the SNR is one example for a metric of characterizing the microphone signal without specific limitation, any other metrics that can characterize the microphone signal could be used in the method of system disclosed herein, such as the magnitude of the signal, the energy of the signal, the frequency response of the signal, and so on.", "A predetermined threshold interval for SNR of the FF signal may be set, which is defined by a high threshold THH and a low threshold THL. In comparison with a single threshold, using a threshold interval may improve a fault tolerance rate and reduce the misjudgment. At S602, the SNR of the FF signal is compared to the high threshold. If the SNR is greater than or equal to the high threshold THH, then the flow goes to S603, in which the FF VAD flag is set to 1, if not, the method goes to S604.");
and/or, determine that the first sound signal includes the voice signal sent by the wearer of the earphone when a frequency of the first sound signal satisfies a predetermined frequency condition (Paragraph 32, " It can be understood the SNR is one example for a metric of characterizing the microphone signal without specific limitation, any other metrics that can characterize the microphone signal could be used in the method of system disclosed herein, such as the magnitude of the signal, the energy of the signal, the frequency response of the signal, and so on.", "A predetermined threshold interval for SNR of the FF signal may be set, which is defined by a high threshold THH and a low threshold THL. In comparison with a single threshold, using a threshold interval may improve a fault tolerance rate and reduce the misjudgment. At S602, the SNR of the FF signal is compared to the high threshold. If the SNR is greater than or equal to the high threshold THH, then the flow goes to S603, in which the FF VAD flag is set to 1, if not, the method goes to S604.").
Regarding Claim 19, Chen et al teaches A non-transitory computer-readable storage medium having stored therein instructions that, when executed by a processor of an electronic device, causes the electronic device to perform (Paragraph 56, “The disclosure further includes a non-transitory computer-readable medium storing program instructions that, when executed by a processor, cause the processor to perform the steps of: receiving a feedback signal from the feedback microphone or sensor of the headphone; determining, a feedback (FB) voice activity detection (VAD) flag, based on the received feedback signal; generating a control signal based on the value of the determined FB VAD flag; and automatically adjusting a transition of the headphone between an ANC mode and a transparency mode, based on the control signal.”): acquiring a first sound signal through a bone conduction element of the earphone (Title/Abstract, Paragraph 19, "In a general concept, the present disclosure provides a method and a system which may adjust the ANC module, function or operation of the headphone based on the estimation of the presence and absence of a user's voice, the user's voice is generated by one who is wearing the headphone and is speaking... The method and system provided by the present disclosure contains additional microphones and/or sensors (FIG.1 only shows sensor for simplicity) in the headphone to capture the user voice using extra secondary transfer path(s)... The additional sensor is, but not limited to, an accelerometer, a bone conduction sensor or other general vibration sensor.");
determining whether the first sound signal comprises a voice signal sent by a wearer of the earphone according to the first sound signal (Paragraph 23, "When the user is speaking, both the FF microphone and the FB microphone may capture the user voice respectively as a FF signal and a FB signal. Then, a FF VAD detection and a FB VAD detection may be separately performed based on the FF signal from the FF microphone and the FB signal from the FB microphone. A FF VAD flag and a FB VAD flag can be obtained as a result of the FF VAD detection and the FB VAD detection.");
and controlling the earphone to operate in a transparent transmission mode in response to determining that the first sound signal comprises the voice signal (Paragraph 20, "Then, for example, if the detection result indicates that the user voice is active, then a control signal may be generated to toggle the headphone to operate from the ANC mode to a transparency mode which completely or partially allows the ambient sound to reach the user ear.").
and acquiring a second sound signal through an air conduction microphone of the earphone (FIG. 2 and 3, Paragraph 22, "As shown in FIG. 3, the headphone may comprise a feedforward microphone (FF mic) and a feedback microphone (FB mic)… The FF microphone is arranged in the side of the headphone toward the outside environment."),
and wherein determining whether the first sound signal comprises the voice signal sent by the wearer of the earphone according to the first sound signal comprises: performing a correlation analysis on the first sound signal and the second sound signal to determine a correlation degree of the first sound signal and the second sound signal (FIG. 8, Paragraph 50, "Further, the processor is further configured to receive a feedforward signal from the feedforward microphone of the headphone; determine, a feedforward (FF) voice activity detection (VAD) flag, based on the received feedback signal; and determine, a combination VAD flag, based on the FF VAD flag and the FB VAD flag.", Paragraph 54, "The processor may be further configured to: set the combination VAD flag is to 0, if both the values of the FF VAD flag and the FB VAD flag are 0; set the combination VAD flag to 1, if both the values of the FF VAD flag and the FB VAD flag are 1; and calculate the combination VAD flag using a weight parameter based on the value of the FF VAD flag and the value of the FB VAD flag, if one of the value of the FF VAD flag and the value of the FB VAD flag is not equal to 1.");
and determining that the first sound signal includes the voice signal sent by the wearer of the earphone when the correlation degree satisfies a predetermined correlation condition (FIG. 8, Paragraph 50, "Further, the processor is further configured to receive a feedforward signal from the feedforward microphone of the headphone; determine, a feedforward (FF) voice activity detection (VAD) flag, based on the received feedback signal; and determine, a combination VAD flag, based on the FF VAD flag and the FB VAD flag.", Paragraph 54, "The processor may be further configured to: set the combination VAD flag is to 0, if both the values of the FF VAD flag and the FB VAD flag are 0; set the combination VAD flag to 1, if both the values of the FF VAD flag and the FB VAD flag are 1; and calculate the combination VAD flag using a weight parameter based on the value of the FF VAD flag and the value of the FB VAD flag, if one of the value of the FF VAD flag and the value of the FB VAD flag is not equal to 1.").
Chen et al does not further teach, wherein the second sound signal acquired through the air conduction microphone of the earphone does not include noise signals caused by a vibration due to a movement of the wearer or the vibration of an environment where the wearer resides.
However, Yeo et al, in a similar invention in the same field of endeavor teaches, wherein the second sound signal acquired through the air conduction microphone of the earphone does not include noise signals caused by a vibration due to a movement of the wearer or the vibration of an environment where the wearer resides (Paragraph 42, “In certain examples, a method of processing microphone signals to detect a likelihood that a headphone user is actively speaking, such as the example method 300, may include band filtering or sub-band processing. For example, the left and right signals 302, 304 may be filtered to remove frequency components not part of a typical voice or vocal tract range, prior to processing by, e.g., the example method 300. Further, the left and right signals 302, 304 may be separated into frequency sub-bands, and one or more of the frequency sub-bands may be separately processed by, e.g., the example method 300. Either of filtering or sub-band processing, or a combination of the two, may decrease the likelihood of a false positive caused by extraneous sounds not associated with the user's voice.”, the filter being utilized removes environmental noises, and as such would necessarily remove movement related noises and/or the vibrations within the user’s environment, and as such would perform the function of providing a signal lacking these features after the filter is applied, Paragraph 3 further describes the left and right microphones that provide the left and right signals 302 and 304. Paragraph 53 also further describes an interior microphone that utilizes signals generated by conduction through bones, and utilizing that signal to further enhance the voice activity detection in the system.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of a feedforward microphone signal not including noise signals caused by a vibration due to a movement of the wearer or the vibration of an environment where the wearer resides, as taught by Yeo et al, with the system as taught by Chen et al. The motivation would have been to allow for a better correlation analysis by removing noises that do not correspond to a voice signal, thereby ensuring better accuracy.
Regarding Claim 20, Chen et al in view of Yeo et al teaches all the limitations of claim 19, and Chen et al further teaches wherein determining whether the first sound signal comprises the voice signal sent by the wearer of the earphone according to the first sound signal comprises at least one of: determining that the first sound signal comprises the voice signal sent by the wearer of the earphone in response to determining that an energy of the first sound signal is greater than a predetermined energy threshold (Paragraph 32, "It can be understood the SNR is one example for a metric of characterizing the microphone signal without specific limitation, any other metrics that can characterize the microphone signal could be used in the method of system disclosed herein, such as the magnitude of the signal, the energy of the signal, the frequency response of the signal, and so on.", "A predetermined threshold interval for SNR of the FF signal may be set, which is defined by a high threshold THH and a low threshold THL. In comparison with a single threshold, using a threshold interval may improve a fault tolerance rate and reduce the misjudgment. At S602, the SNR of the FF signal is compared to the high threshold. If the SNR is greater than or equal to the high threshold THH, then the flow goes to S603, in which the FF VAD flag is set to 1, if not, the method goes to S604.");
and determining that the first sound signal comprises the voice signal sent by the wearer of the earphone in response to determining that a frequency of the first sound signal satisfies a predetermined frequency condition (Paragraph 32, "It can be understood the SNR is one example for a metric of characterizing the microphone signal without specific limitation, any other metrics that can characterize the microphone signal could be used in the method of system disclosed herein, such as the magnitude of the signal, the energy of the signal, the frequency response of the signal, and so on.", "A predetermined threshold interval for SNR of the FF signal may be set, which is defined by a high threshold THH and a low threshold THL. In comparison with a single threshold, using a threshold interval may improve a fault tolerance rate and reduce the misjudgment. At S602, the SNR of the FF signal is compared to the high threshold. If the SNR is greater than or equal to the high threshold THH, then the flow goes to S603, in which the FF VAD flag is set to 1, if not, the method goes to S604.").
Claims 4-5, 13-14, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al, International Publication No. WO 2022/151156 A1, in view of Yeo et al, US Publication No. 2018/0268845 A1, further in view of Hua et al, Chinese Patent No. CN112770214 A.
Regarding Claim 4, Chen et al in view of Yeo et al teaches all the limitations of claim 1, but does not further teach operating in a transparent transmission mode after determining that a sound signal includes a voice conversation.
However, Hua et al in a similar invention in the same field of endeavor teaches, wherein controlling the earphone to operate in a transparent transmission mode in response to determining that the first sound signal includes the voice signal comprises: determining whether the first sound signal includes a voice conversation content in response to determining that the first sound signal includes the voice signal (See Page 6, "In this embodiment, the first ambient sound signal picked up by the feed-forward microphone for speech recognition, determining whether the first ambient sound signal comprises a voice signal, under the condition of determining the first ambient sound signal comprises a voice signal, explaining that the wearer or other people around the wearer is speaking. Further, the current sampling period by the feedback microphone of the earphone of the second ambient sound signal to speech recognition, can be determined whether the wearer is speaking or other people around the wearer speaking, so as to determine whether the wearer has a dialog behavior.");
and controlling the earphone to operate in the transparent transmission mode in response to determining that the first sound signal comprises the voice conversation content (See Page 5, "Therefore, the present disclosure can be according to the feed-forward microphone of the earphone and feedback microphone picking up the environment sound signal, determining whether the earphone wearer has dialogue behavior, when the earphone wearer dialog behavior, controlling the earphone automatically switching to the transmission mode of the technical solution; so that the user can communicate with other people under the condition of wearing the earphone, which is convenient for the user to use.").
One of ordinary skill in the art would have found it obvious before the effective filing date of the application to combine the teachings of an earphone utilizing signal analysis to determine if a user’s voice is present, and controlling the signal based upon that determination, as taught by Hua et al, with the system as taught by Chen et al in view of Yeo et al. The motivation being to more accurately determine if a voice signal is present, and allow for a more dynamic processing operation.
Regarding Claim 5, Chen et al in view of Yeo et al, further in view of Hua et al teaches all the limitations of claim 4, and further teaches, wherein, determining whether the first sound signal comprises the voice conversation content comprises: performing voice recognition on the first sound signal to determine a voice content included in the first sound signal (See Hua et al, Page 7, "In this embodiment, in the case of determining the first ambient sound signal comprises a voice signal, the second ambient sound signal for voice recognition, if the second ambient sound signal comprises a voice signal, also is a feed-forward microphone and feedback microphone simultaneously detecting the voice signal, indicating that the wearer is speaking.", "Step S3310, performing voice recognition to the second environment sound signal, determining whether the second environment sound signal comprises a voice signal.");
and determining that the first sound signal comprises the voice conversation content in response to determining that the voice content satisfies a predetermined voice conversation content (See Hua et al, Page 7, "In this embodiment, in the case of determining the first ambient sound signal comprises a voice signal, the second ambient sound signal for voice recognition, if the second ambient sound signal comprises a voice signal, also is a feed-forward microphone and feedback microphone simultaneously detecting the voice signal, indicating that the wearer is speaking.", "Step S3310, performing voice recognition to the second environment sound signal, determining whether the second environment sound signal comprises a voice signal.").
Regarding Claim 13, Chen et al in view of Yeo et al teaches all the limitations of claim 10, but does not further teach operating in a transparent transmission mode after determining that a sound signal includes a voice conversation.
However, Hua et al in a similar invention in the same field of endeavor teaches, wherein, the processor is further configured to: determine whether the first sound signal includes a voice conversation content when the first sound signal includes the voice signal (See Page 6, "In this embodiment, the first ambient sound signal picked up by the feed-forward microphone for speech recognition, determining whether the first ambient sound signal comprises a voice signal, under the condition of determining the first ambient sound signal comprises a voice signal, explaining that the wearer or other people around the wearer is speaking. Further, the current sampling period by the feedback microphone of the earphone of the second ambient sound signal to speech recognition, can be determined whether the wearer is speaking or other people around the wearer speaking, so as to determine whether the wearer has a dialog behavior."),
and control the earphone to operate in the transparent transmission mode when the first sound signal includes the voice conversation content (See Page 5, "Therefore, the present disclosure can be according to the feed-forward microphone of the earphone and feedback microphone picking up the environment sound signal, determining whether the earphone wearer has dialogue behavior, when the earphone wearer dialog behavior, controlling the earphone automatically switching to the transmission mode of the technical solution; so that the user can communicate with other people under the condition of wearing the earphone, which is convenient for the user to use.").
One of ordinary skill in the art would have found it obvious before the effective filing date of the application to combine the teachings of an earphone utilizing signal analysis to determine if a user’s voice is present, and controlling the signal based upon that determination, as taught by Hua et al, with the system as taught by Chen et al in view of Yeo et al. The motivation being to more accurately determine if a voice signal is present, and allow for a more dynamic processing operation.
Regarding Claim 14, Chen et al in view of Yeo et al, further in view of Hua et al teaches all the limitations of claim 13, and further teaches, wherein, the processor is further configured to: perform voice recognition on the first sound signal to determine a voice content included in the first sound signal (See Hua et al, Page 7, "In this embodiment, in the case of determining the first ambient sound signal comprises a voice signal, the second ambient sound signal for voice recognition, if the second ambient sound signal comprises a voice signal, also is a feed-forward microphone and feedback microphone simultaneously detecting the voice signal, indicating that the wearer is speaking.", "Step S3310, performing voice recognition to the second environment sound signal, determining whether the second environment sound signal comprises a voice signal."),
and determine that the first sound signal comprises the voice conversation content when the voice content satisfies a predetermined voice conversation content (See Hua et al, Page 7, "In this embodiment, in the case of determining the first ambient sound signal comprises a voice signal, the second ambient sound signal for voice recognition, if the second ambient sound signal comprises a voice signal, also is a feed-forward microphone and feedback microphone simultaneously detecting the voice signal, indicating that the wearer is speaking.", "Step S3310, performing voice recognition to the second environment sound signal, determining whether the second environment sound signal comprises a voice signal.").
Regarding Claim 16, Chen et al in view of Yeo et al, further in view of Hua et al teaches all the limitations of claim 13, and further teaches, wherein, the processor is further configured to: determine whether a third sound signal acquired through the bone conduction element in a predetermined time period includes a voice signal after operating in the transparent transmission mode (See Chen et al, FIG. 9, Paragraphs 45-48, "If the combination VAD flag is determined as shown at S901, then the method goes to S902. At S902, the method determines whether the combination VAD flag is equal to 1. If it is equal to 1, then a corresponding control signal is generated to turn off the ANC, at S903. In one example, the adjustment of turning off the ANC may be performed for both the FF microphone and the FB microphone.", "If the combination VAD flag is not equal to 1, then the method goes to S904. At S904, the method determines whether the combination VAD flag is equal to 0. If the combination VAD flag is equal to 0. a corresponding control signal is generated to turn on the ANC, at S905", "If the combination VAD flag is not equal to 0, the method goes to S906. At S906, a corresponding control signal is generated to turn on or off ANC with a certain ratio according to the value of the combination VAD flag.");
control the earphone to switch to a noise reduction mode when the third sound signal does not include a voice signal (Paragraph 47, "If the combination VAD flag is not equal to 1, then the method goes to S904. At S904, the method determines whether the combination VAD flag is equal to 0. If the combination VAD flag is equal to 0. a corresponding control signal is generated to turn on the ANC, at S905").
Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al, International Publication No. WO 2022/151156 A1, in view of Yeo et al, US Publication No. 2018/0268845 A1, in view of Hua et al, Chinese Patent No. CN 112770214 A, further in view of Meiyappan et al, US Patent No. 10681453 B1.
Regarding Claim 6, Chen et al in view of Yeo et al, further in view of Hua et al teaches all the limitations of claim 4, but does not further teach a consistency between an audio content played by the earphone and a voice content included in the sound signal.
However, Meiyappan et al, in a similar invention in the same field of endeavor teaches, wherein, the method further comprises: detecting a consistency between an audio content currently played by the earphone and a voice content included in the first sound signal in response to determining that the first sound signal does not include the voice conversation content (Column 8, Lines 13-22, "In an aspect, another condition may include detecting that the user is listening to a music stream (e.g., over the Bluetooth A2DP or other music profile) over the headphone speakers and that the speech signal does not relate to the user singing or humming along. In an aspect, when it is detected the headphone speakers are playing a music stream and that the detected speech signal relates to the user singing or humming along, the ANR control algorithm determines that the user does not intend to speak with another subject in the vicinity of the user.");
and controlling the earphone to operate in a noise reduction mode in response to determining that the consistency between the audio content played by the earphone and the voice content included in the first sound signal is greater than or equal to a predetermined threshold (Column 8, Lines 23-31, "In certain aspects, the ANR control algorithm may be configured to check for one or more of the above described conditions in order to determine whether the user desires to speak with another subject in the vicinity of the user… the ANR control algorithm may be configured to check for one or more other conditions in an attempt to determine whether the user desires to speak with another subject.").
One of ordinary skill in the art would have found it obvious before the effective filing date of the application to combine the teachings of an earphone operation method of determining the presence of a speech signal and changing the mode of operation based upon that determination, as taught by Meiyappan et al, with the system as taught by Chen et al in view of Yeo et al, further in view of Hua et al. The motivation being to prevent activation of different operations when the user is producing sound but is not attempting to speak to others.
Regarding Claim 15, Chen et al in view of Yeo et al, further in view of Hua et al teaches all the limitations of claim 13, but does not further teach a consistency between an audio content played by the earphone and a voice content included in the sound signal.
However, Meiyappan et al, in a similar invention in the same field of endeavor teaches, wherein, the processor is further configured to: detect a consistency between an audio content currently played by the earphone and a voice content included in the first sound signal when the first sound signal does not include the voice conversation content (Column 8, Lines 13-22, "In an aspect, another condition may include detecting that the user is listening to a music stream (e.g., over the Bluetooth A2DP or other music profile) over the headphone speakers and that the speech signal does not relate to the user singing or humming along. In an aspect, when it is detected the headphone speakers are playing a music stream and that the detected speech signal relates to the user singing or humming along, the ANR control algorithm determines that the user does not intend to speak with another subject in the vicinity of the user.");
and control the earphone to operate in a noise reduction mode when the consistency between the audio content played by the earphone and the voice content included in the first sound signal is greater than or equal to a predetermined threshold (Column 8, Lines 23-31, "In certain aspects, the ANR control algorithm may be configured to check for one or more of the above described conditions in order to determine whether the user desires to speak with another subject in the vicinity of the user… the ANR control algorithm may be configured to check for one or more other conditions in an attempt to determine whether the user desires to speak with another subject.").
One of ordinary skill in the art would have found it obvious before the effective filing date of the application to combine the teachings of an earphone operation method of determining the presence of a speech signal and changing the mode of operation based upon that determination, as taught by Meiyappan et al, with the system as taught by Chen et al in view of Yeo et al, further in view of Hua et al. The motivation being to prevent activation of different operations when the user is producing sound but is not attempting to speak to others.
Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al, International Publication No. WO 2022/151156 A1, in view of Yeo et al, US Publication No. 2018/0268845 A1, further in view of Burnett et al, US Publication No. 2014/0126737 A1.
Regarding Claim 8, Chen et al in view of Yeo et al teaches all the limitations of claim 1, but does not further teach an air vent controlled during operation.
However, Burnett et al in a similar invention in the same field of endeavor teaches, wherein, the earphone is provided with an air vent valve, and the method further comprises: controlling the air vent valve of the earphone to open in response to determining that the first sound signal comprises the voice signal sent by the wearer of the earphone (Paragraph 72, "Performance of the sensor 100 is enhanced through the use of the seal provided between the diaphragm and the airborne environment of the talker. The seal is provided by the coupler 110. A modified gradient microphone is used in an embodiment because it has pressure ports on both ends. Thus, when the first port 104 is sealed by the coupler 110, the second port 106 provides a vent for air movement through the sensor 100. The second port is not required for operation, but does increase the sensitivity of the device to tissue-borne acoustic signals. The second port also allows more environmental acoustic noise to be detected by the device, but the device's diaphragm's sensitivity to environmental acoustic noise is significantly decreased by the loading of the coupler 110, so the increase in sensitivity to the user's speech is greater than the increase in sensitivity to environmental noise.").
One of ordinary skill in the art would have found it obvious before the effective filing date of the application to combine the teachings of an earphone operation method of determining the presence of a speech signal and changing the mode of operation based upon that determination, as taught by Burnett et al, with the system as taught by Chen et al in view of Yeo et al. The motivation being to allow for a better user experience, and allow for better control of air pressure within the user’s ear.
Regarding Claim 17, Chen et al in view of Yeo et al teaches all the limitations of claim 10, but does not further teach an air vent controlled during operation.
However, Burnett et al in a similar invention in the same field of endeavor teaches, wherein the earphone is provided with an air vent valve, and the processor is further configured to: control the air vent valve of the earphone to open when it is determined that the first sound signal includes the voice signal sent by the wearer of the earphone (Paragraph 72, "Performance of the sensor 100 is enhanced through the use of the seal provided between the diaphragm and the airborne environment of the talker. The seal is provided by the coupler 110. A modified gradient microphone is used in an embodiment because it has pressure ports on both ends. Thus, when the first port 104 is sealed by the coupler 110, the second port 106 provides a vent for air movement through the sensor 100. The second port is not required for operation, but does increase the sensitivity of the device to tissue-borne acoustic signals. The second port also allows more environmental acoustic noise to be detected by the device, but the device's diaphragm's sensitivity to environmental acoustic noise is significantly decreased by the loading of the coupler 110, so the increase in sensitivity to the user's speech is greater than the increase in sensitivity to environmental noise.").
One of ordinary skill in the art would have found it obvious before the effective filing date of the application to combine the teachings of an earphone operation method of determining the presence of a speech signal and changing the mode of operation based upon that determination, as taught by Burnett et al, with the system as taught by Chen et al in view of Yeo et al. The motivation being to allow for a better user experience, and allow for better control of air pressure within the user’s ear.
Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al, International Publication No. WO 2022/151156 A1, in view of Yeo et al, US Publication No. 2018/0268845 A1, further in view of Meiyappan et al, US Patent No. 10681453 B1.
Regarding Claim 9, Chen et al in view of Yeo et al teaches all the limitations of claim 1, but does not further teach the enabling of an operating mode control function of an earphone.
However, Meiyappan et al, in a similar invention in the same field of endeavor teaches, wherein, acquiring a first sound signal through a bone conduction element of the earphone comprises: acquiring the first sound signal through the bone conduction element of the earphone in response to determining that an operating mode control function of the earphone is enabled (Column 4, Lines 21-37, "Wearable audio output devices with ANR capability (e.g., ANR headphones) help users enjoy high quality music and participate in productive voice calls by attenuating sounds including noise external to the audio output devices. However, ANR headphones acoustically isolate the user from the world making it difficult for the user to interact with other people in the vicinity of the user. Thus, when the user wearing the headphones with ANR turned on desires to speak with another person, the user either has to manually lower the level of ANR (e.g., by using a button on the headphones) or has to remove the headphones fully or partially from its regular listening position. This does not provide an optimal experience to the user. Additionally, removing the headphones from its listening position does not allow the user to listen to audio (e.g., music playback or a conference call) while simultaneously speaking to another person.").
One of ordinary skill in the art would have found it obvious before the effective filing date of the application to combine the teachings of an earphone operation method of determining the presence of a speech signal and changing the mode of operation based upon that determination, as taught by Meiyappan et al, with the system as taught by Chen et al in view of Yeo et al. The motivation being to provide options for whether the system operates under the systems as outlined, or to operate in a passive mode.
Regarding Claim 18, Chen et al in view of Yeo et al teaches all the limitations of claim 10, but does not further teach the enabling of an operating mode control function of an earphone.
However, Meiyappan et al, in a similar invention in the same field of endeavor teaches, wherein, the processor is further configured to: acquire the first sound signal through the bone conduction element of the earphone when an operating mode control function of the earphone is enabled (Column 4, Lines 21-37, "Wearable audio output devices with ANR capability (e.g., ANR headphones) help users enjoy high quality music and participate in productive voice calls by attenuating sounds including noise external to the audio output devices. However, ANR headphones acoustically isolate the user from the world making it difficult for the user to interact with other people in the vicinity of the user. Thus, when the user wearing the headphones with ANR turned on desires to speak with another person, the user either has to manually lower the level of ANR (e.g., by using a button on the headphones) or has to remove the headphones fully or partially from its regular listening position. This does not provide an optimal experience to the user. Additionally, removing the headphones from its listening position does not allow the user to listen to audio (e.g., music playback or a conference call) while simultaneously speaking to another person.").
One of ordinary skill in the art would have found it obvious before the effective filing date of the application to combine the teachings of an earphone operation method of determining the presence of a speech signal and changing the mode of operation based upon that determination, as taught by Meiyappan et al, with the system as taught by Chen et al in view of Yeo et al. The motivation being to provide options for whether the system operates under the systems as outlined, or to operate in a passive mode.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DYLAN M NEECE whose telephone number is (703)756-1941. The examiner can normally be reached 10am - 7pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CAROLYN EDWARDS, can be reached on (571)270-7136. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DYLAN MAGUIRE NEECE/Examiner, Art Unit 2692
/CAROLYN R EDWARDS/Supervisory Patent Examiner, Art Unit 2692