Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner’s Comments
The term "device," based on how it is used in the claims, is drawn to any portion of a digital processor, or to a circuit coupled to a digital processor.
The term ‘earphones’ is read as a single earphone device, noting that the structure of the device in Figure 4 requires a single device comprising the processing unit, which would preclude earphones that comprise separate devices.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Shankar et al. (US 12531046 B1).
As per claim 1, Shankar discloses a call noise reduction method, comprising:
acquiring, by a call noise reduction device (any portion of the processor that performs the following processing):
an echo cancellation reference signal (any of the signaling used by the echo cancellation stage of paras. 17-18 in adapting),
a first noise reduction reference signal received by a first microphone coupled to the call noise reduction device (para. 16: one of the microphones of one or more microphone(s) 112 in a microphone array and/or one or more loudspeaker(s) 114) (para. 17: the echo signal picked up by one of the microphones) and
a call signal received by a second microphone coupled to the call noise reduction device (para. 17: the near-end signal received by another microphone);
extracting a first fusion feature of the first noise reduction reference signal and the call signal (para. 96: part of the feature maps as applied to the cited signals above), and
extracting an echo signal feature of the echo cancellation reference signal (any of the processing relative to the echo estimate per para. 104 requires extracting corresponding features per the feature- and feature-layer-based processing therein);
fusing the first fusion feature and the echo signal feature to generate a second fusion feature (the feature- and feature-layer processing per paras. 104-105 in the context of creating the echo estimate per paras. 93 and 106); and
performing, based on the second fusion feature (the echo estimate),
noise reduction processing on the call signal to generate a noise-reduced call signal (any or all of the residual echo suppression per para. 90, echo cancellation per para. 134, and noise reduction per paras. 116-117).
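By way of illustration only, the claimed signal flow maps onto a minimal sketch such as the following; all names and the placeholder suppression mask are hypothetical and are not asserted to be Shankar's disclosed implementation:

    import numpy as np

    def reduce_call_noise(echo_ref, noise_ref, call_sig):
        """Hypothetical sketch of the claim 1 flow; equal-length 1-D time-domain inputs."""
        # Extract a first fusion feature from the noise reference and call signal
        # (stacked complex spectra stand in for learned feature maps).
        first_fusion = np.stack([np.fft.rfft(noise_ref), np.fft.rfft(call_sig)])
        # Extract an echo signal feature from the echo cancellation reference.
        echo_feat = np.fft.rfft(echo_ref)[np.newaxis, :]
        # Fuse the two feature sets into a second fusion feature.
        second_fusion = np.concatenate([first_fusion, echo_feat], axis=0)
        # Derive a placeholder suppression mask from the fused feature, apply it
        # to the call signal's spectrum, and return to the time domain.
        mask = 1.0 / (1.0 + np.abs(second_fusion).mean(axis=0))
        return np.fft.irfft(np.fft.rfft(call_sig) * mask, n=len(call_sig))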
As per claim 2, the method according to claim 1, wherein extracting the first fusion feature of the first noise reduction reference signal and the call signal comprises:
processing, using a first complex convolutional network (the convolution layers per para 96), the first noise reduction reference signal and the call signal by performing complex convolutional fusion to generate the first fusion feature (abstract: configured to process complex-valued spectrograms corresponding to the isolated audio data and/or estimated echo data generated by during echo cancellation),
wherein the first fusion feature comprises phase information and amplitude information corresponding to the first noise reduction reference signal and the call signal, respectively (each feature, as it is processed by a digital system, requires clocking/phase information and amplitude information in order to be read by the digital processor).
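For context, a complex convolution of the kind commonly used in deep complex networks can be expressed as four real convolutions combined per the complex product rule; the sketch below is a hedged illustration and is not asserted to be Shankar's layer implementation per para. 96:

    import numpy as np

    def complex_conv1d(x, w):
        """Complex 1-D convolution via the product rule:
        (x_re + j*x_im) * (w_re + j*w_im)."""
        real = (np.convolve(x.real, w.real, mode="same")
                - np.convolve(x.imag, w.imag, mode="same"))
        imag = (np.convolve(x.real, w.imag, mode="same")
                + np.convolve(x.imag, w.real, mode="same"))
        return real + 1j * imag

Because the output remains complex-valued, np.abs() of the result carries the amplitude information and np.angle() the phase information recited in the claim.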
As per claim 3, the method according to claim 1, wherein extracting the echo signal feature of the echo cancellation reference signal comprises:
processing, using a second complex convolutional network, the echo cancellation reference signal to generate the echo signal feature (the use of the network of the claim 2 rejection to perform the echo estimation per the claim 1 rejection).
As per claim 4, the method according to claim 1, wherein fusing the first fusion feature and the echo signal feature comprises:
concatenating the first fusion feature and the echo signal feature followed by a modulus operation (the absolute value/modulus processing per para. 74) to generate the second fusion feature (para. 97: receives input data 710 and generates a first output, which is concatenated with the input data 710),
wherein the second fusion feature comprises a real-valued feature (part of the real spectrogram per para. 104).
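The concatenate-then-modulus step can be illustrated as follows (hypothetical shapes; the modulus of a complex-valued feature yields the recited real-valued feature):

    import numpy as np

    rng = np.random.default_rng(0)
    # Complex-valued features: 2 channels for the first fusion feature,
    # 1 channel for the echo signal feature, 257 frequency bins each.
    first_fusion = rng.standard_normal((2, 257)) + 1j * rng.standard_normal((2, 257))
    echo_feat = rng.standard_normal((1, 257)) + 1j * rng.standard_normal((1, 257))

    # Concatenate along the channel axis, then take the modulus (absolute
    # value), which discards phase and leaves a purely real-valued feature.
    second_fusion = np.abs(np.concatenate([first_fusion, echo_feat], axis=0))
    assert second_fusion.dtype == np.float64  # real-valued, per the claim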
As per claim 5, the method according to claim 1, wherein performing the noise reduction processing on the call signal comprises:
processing, using a convolutional neural network, the second fusion feature to generate a convolution-processed second fusion feature (per claim 2 rejection);
processing, using a prediction network (the convolution layers), the convolution-processed second fusion feature to generate probability results (the echo estimate per para. 104) corresponding to a plurality of frequency bands (the spectrogram per para. 104);
transforming the call signal into a frequency domain signal by performing, using the probability results as weights, a weighted summation of the frequency domain signals falling into each of the plurality of frequency bands; and converting the weighted (weights per para. 103) summation (required to recover the results from the spectrograms of para. 104) of the frequency domain signals back to a time domain to generate the noise-reduced call signal (para. 25: While the device 110 may generate the playback audio using the far-end reference signal(s) x(t) in the time domain, for ease of illustration FIG. 1 represents the far-end reference signal(s) X(n, k) in the frequency/subband domain as the AEC component 122).
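The weighting-by-band step can be sketched as follows (hypothetical function and band layout, offered only to clarify the claim language):

    import numpy as np

    def band_weighted_denoise(call_sig, band_probs, band_edges):
        """Weight the call signal's spectrum per frequency band, then return
        to the time domain. band_edges: list of (lo_bin, hi_bin) pairs."""
        spec = np.fft.rfft(call_sig)            # frequency-domain signal
        weights = np.zeros(len(spec))
        for p, (lo, hi) in zip(band_probs, band_edges):
            weights[lo:hi] = p                  # spread each band's probability
        # Weighted per-band spectrum, converted back to the time domain.
        return np.fft.irfft(spec * weights, n=len(call_sig))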
As per claim 6, the method according to claim 1, the method further comprises:
acquiring a second noise reduction reference signal received by a third microphone coupled to the call noise reduction device (a third microphone of the array receiving the echo signal at a later point in time); and
extracting a second noise reduction signal feature of the second noise reduction reference signal (analogous to as cited in claim 1 but performed on the echo signal received at a later point in time),
wherein fusing the first fusion feature and the echo signal feature to generate the second fusion feature comprises:
concatenating the first fusion feature, the echo signal feature, and the second noise reduction signal feature followed by a modulus operation to generate the second fusion feature (analogous to the processing per the claim 3,4 rejections, but on the signaling acquired at a later point in time).
As per claim 7, the method according to claim 6, wherein extracting the second noise reduction signal feature of the second noise reduction reference signal comprises: processing, using a third complex convolutional network, the second noise reduction reference signal to generate the second noise reduction signal feature (analogous to the networks cited above, but as applied to the second noise reduction reference signal, which is received at a later point in time).
As per claim 8, the method according to claim 1, wherein the first fusion feature comprises: phase difference information between the first noise reduction reference signal and the call signal, and amplitude information corresponding to the first noise reduction reference signal and the call signal, respectively (per the claim 2 rejection).
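The phase-difference and amplitude limitations of claim 8 can be illustrated directly on complex spectra (hypothetical signals; a sketch, not Shankar's cited processing):

    import numpy as np

    rng = np.random.default_rng(1)
    noise_ref = rng.standard_normal(512)   # first noise reduction reference
    call_sig = rng.standard_normal(512)    # call signal

    ref_spec, call_spec = np.fft.rfft(noise_ref), np.fft.rfft(call_sig)
    # Per-bin phase difference between the two signals...
    phase_diff = np.angle(call_spec * np.conj(ref_spec))
    # ...and amplitude information corresponding to each signal respectively.
    amp_ref, amp_call = np.abs(ref_spec), np.abs(call_spec)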
As per claim 9, Shankar discloses a call noise reduction device comprising:
a processor and a memory coupled to the processor, the memory storing computer-readable instructions that, when executed by the processor, cause the following (a digital processor, memory, and software are required to implement the cited system):
acquiring an echo cancellation reference signal (per claim 1 rejection),
a first noise reduction reference signal received by a first microphone coupled to the call noise reduction device (per claim 1 rejection), and
a call signal received by a second microphone coupled to the call noise reduction device (per claim 1 rejection);
extracting a first fusion feature of the first noise reduction reference signal and the call signal (per claim 1 rejection);
extracting an echo signal feature of the echo cancellation reference signal (per claim 1 rejection);
fusing the first fusion feature and the echo signal feature to generate a second fusion feature (per claim 1 rejection); and
performing, based on the second fusion feature, noise reduction processing on the call signal to generate a noise-reduced call signal (per claim 1 rejection).
As per claim 10, the call noise reduction device according to claim 9, wherein the instructions, when executed by the processor, further cause extracting the first fusion feature of the first noise reduction reference signal and the call signal by: processing, using a first complex convolutional network, the first noise reduction reference signal and the call signal by performing complex convolutional fusion to generate the first fusion feature, wherein the first fusion feature comprises phase information and amplitude information corresponding to the first noise reduction reference signal and the call signal respectively (per the system cited in the claim 2 and 3 rejections).
As per claim 11, the call noise reduction device according to claim 9, wherein the instructions, when executed by the processor, further cause extracting the echo signal feature of the echo cancellation reference signal by:
processing, using a second complex convolutional network, the echo cancellation reference signal to generate the echo signal feature (per claim 3 rejection).
As per claim 12, the call noise reduction device according to claim 9, wherein the instructions, when executed by the processor, further cause fusing the first fusion feature and the echo signal feature by: concatenating the first fusion feature and the echo signal feature followed by a modulus operation to generate the second fusion feature, wherein the second fusion feature comprises a real-valued feature (per claim 4 rejection).
As per claim 13, an earphone (para. 130: headphone) comprising a first microphone, a second microphone, and a processing unit/call noise reduction device (per claim 1 rejection),
wherein: the first microphone receives a first noise reduction reference signal (per claim 1 rejection);
the second microphone receives a call signal (per claim 1 rejection); and
the processing unit is configured to:
acquire an echo cancellation reference signal (per claim 1 rejection),
the first noise reduction reference signal, and the call signal (per claim 1 rejection as read into the digital processor);
extract a first fusion feature of the first noise reduction reference signal and the call signal (per claim 1 rejection);
extract an echo signal feature of the echo cancellation reference signal (per claim 1 rejection);
fuse the first fusion feature and the echo signal feature to generate a second fusion feature (per claim 1 rejection); and
perform, based on the second fusion feature, noise reduction processing on the call signal to generate a noise-reduced call signal (per claim 1 rejection).
As per claim 14, the earphone according to claim 13, wherein the processing unit is further configured to extract the first fusion feature of the first noise reduction reference signal and the call signal by: processing, using a first complex convolutional network, the first noise reduction reference signal and the call signal by performing complex convolutional fusion to generate the first fusion feature, wherein the first fusion feature comprises phase information and amplitude information corresponding to the first noise reduction reference signal and the call signal respectively (per claim 2 rejection).
As per claim 15, the earphone according to claim 13, wherein the processing unit is further configured to extract the echo signal feature of the echo cancellation reference signal by: processing, using a second complex convolutional network, the echo cancellation reference signal to generate the echo signal feature (per claim 3 rejection).
As per claim 16, the earphone according to claim 13, wherein the processing unit is further configured to fuse the first fusion feature and the echo signal feature by: concatenating the first fusion feature and the echo signal feature followed by a modulus operation to generate the second fusion feature, wherein the second fusion feature comprises a real-valued feature (per claim 4 rejection).
As per claim 17, the earphone according to claim 13, wherein the processing unit is further configured to perform the noise reduction processing on the call signal by: processing, using a convolutional neural network, the second fusion feature to generate a convolution-processed second fusion feature; processing, using a prediction network, the convolution-processed second fusion feature to generate probability results corresponding to a plurality of frequency bands; transforming the call signal into a frequency domain signal by: performing, using the probability results as weights, a weighted summation of the frequency domain signals falling into each of the plurality of frequency bands; and converting the weighted summation of the frequency domain signals back to a time domain to generate the noise-reduced call signal (per claim 5 rejection).
As per claim 18, the earphone according to claim 13, wherein the earphone further comprises a third microphone, and wherein the processing unit is further configured to: acquire a second noise reduction reference signal received by the third microphone; and extract a second noise reduction signal feature of the second noise reduction reference signal, wherein fusing the first fusion feature and the echo signal feature to generate the second fusion feature comprises: concatenating the first fusion feature, the echo signal feature, and the second noise reduction signal feature followed by a modulus operation to generate the second fusion feature (per claim 6 rejection).
As per claim 19, the earphone according to claim 18, wherein the processing unit is further configured to extract the second noise reduction signal feature of the second noise reduction reference signal by: processing, using a third complex convolutional network, the second noise reduction reference signal to generate the second noise reduction signal feature (as per claim 7 rejection).
As per claim 20, the earphone according to claim 13, wherein the first fusion feature comprises: phase difference information between the first noise reduction reference signal and the call signal, and amplitude information corresponding to the first noise reduction reference signal and the call signal, respectively (per claim 8 rejection).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER KRZYSTAN, whose telephone number is 571-272-7498 and whose email address is alexander.krzystan@uspto.gov.
The examiner can usually be reached Monday-Friday, 7:30-4:00 EST.
If attempts to reach the examiner by telephone or email are unsuccessful, the examiner’s supervisor, Fan Tsang can be reached on (571) 272-7547.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300 for both regular and After Final communications.
/ALEXANDER KRZYSTAN/Primary Examiner, Art Unit 2653
March 9, 2026