DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Claims 1, 5, 19, 27, and 29 are amended. Claims 1-30 are presented for examination.
Response to Arguments
Rejection under 35 U.S.C. 103
Applicant’s arguments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 6-7, 10-11, 19-20, 23-25, 27, and 29-30 are rejected under 35 U.S.C. 103 as being unpatentable over Amman et al. (US 20160019890 A1; hereinafter referred to as Amman) in view of Sun et al. (US 20250046328 A1; hereinafter referred to as Sun) and Lester et al. (US 20220369031 A1; hereinafter referred to as Lester).
Regarding claim 1, Amman discloses: a device comprising: one or more processors configured to: obtain an input audio signal including at least first speech of a first person ([0005] provide a noise reduction feature to a sound input received by a microphone within a vehicle's cabin in order to better detect a user's speech from the sound input);
select one or more filters from a plurality of filters ([0043] The pre-filter selection strategy at 103 is implemented in order to determine which pre-filters from the database of pre-filter options at 104 will be applied to the sound input 105 received by the cabin microphone) based on sensor data identifying an environmental condition ([0041] After receiving the vehicle operational state information 101, and receiving the external information 102, the noise reduction tool may apply a pre-filter selection strategy at 103), the one or more filters configured to identify a particular sound in the input audio signal, wherein the particular sound is associated with the environmental condition ([0052] Another pre-filter option that may be selected for removing cabin noise is the wind buffeting (non-stationary wind noise) pre-filter 5. The wind buffeting pre-filter 5 may be created to reduce, at least in part, recognized wind noise that may be part of the cabin noise as identified from the vehicle operational state information identified at 101);
process, using the one or more filters, the input audio signal to generate a filtered audio signal ([0066, 0072] the machine learning may have a higher probability of developing a prefilter selection strategy for achieving better noise reduction. This allows the noise reduction tool to generate a resultant sound input signal 105' having a clearer user voice component over a cabin noise component. The resultant sound input 105' may then be transmitted to a receiving communications...in some embodiments a traditional noise reduction filter may be applied after the application of the pre-filters. The traditional noise reduction filter may be, for example, a Weiner filter).
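Examiner's note: Amman's pre-filter selection strategy ([0043]) amounts to choosing filters from a database of pre-filter options based on reported operating conditions. The following is a minimal illustrative sketch only; the condition names and filter names are hypothetical and are not drawn from Amman.

```python
# Hypothetical pre-filter database keyed on sensor-reported conditions,
# loosely mirroring Amman's pre-filter selection strategy ([0043]).
PREFILTERS = {
    "window_down": "wind_buffeting_prefilter",
    "wipers_on": "wiper_prefilter",
    "engine_running": "engine_harmonic_prefilter",
}

def select_prefilters(sensor_conditions):
    """Return the pre-filters matching the reported environmental conditions."""
    return [PREFILTERS[c] for c in sensor_conditions if c in PREFILTERS]

# Sensor data reports an open window and active wipers.
selected = select_prefilters(["window_down", "wipers_on"])
```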
Amman does not explicitly teach, but Sun teaches: selectively adjust weights of a time-frequency mask ([0042] the mask M.sub.1, M.sub.2 comprises a plurality of mask values, one for each time and frequency bin which in general is a real number between zero and one describing the extent to which each time and frequency bin should be suppressed) based on a criterion that identifies a processed audio purpose ([0082] the internal weights and/or parameters of the isolation models 11, 12 are adjusted so as to predict mask M.sub.1 which accurately isolates the speech and mask M.sub.2 which accurately isolates the stationary noise. To accomplish this, the resulting audio signal after applying mask M.sub.1 is compared to a ground truth signal comprising the clean speech from the speech database 179 and the resulting audio signal after applying mask M.sub.2 is compared to a ground truth signal comprising only the stationary noise added from the noise database 177), the weights adjusted to have a first set of values responsive to a first processed audio purpose or to have a second set of values responsive to a second processed audio purpose ([0078] if the classifier 15 predicts the presence of birdsong the selector 16 may select a set of weighting factors 176c which suppresses the stationary noise, amplifies the non-stationary noise and amplifies the speech content as birdsong is considered to not disturb the speech intelligibility while adding a pleasant ambiance);
process, using a trained model that includes the time-frequency mask ([0041] The audio signal S.sub.in may be provided to a trained model wherein the trained model has been trained to output a mask M.sub.1, M.sub.2 for suppressing a certain type of noise wherein the mask M.sub.1, M.sub.2 is typically defined as the magnitude ratio between the desired speech S.sub.m,f and the audio signal mixture X.sub.m,f for each time frame and frequency bin) and is configured to reduce audio associated with speech in the filtered audio signal ([0075] The classifier 15 predicts the presence of at least one noise object (e.g. the presence of at least one noise object of a predetermined set of noise objects) and provides the predicted noise object(s) to a selector 16. The selector 16 accesses a database 173 of trained noise object isolation models 174a, 174b, 174c and selects at least one trained noise object isolation model 174a trained to predict a mask for isolating the at least one predicted noise object {circumflex over (N)}.sub.OBJ,1. Isolating a noise object requires reducing other audio such as speech.), the filtered audio signal to generate an intermediate predicted noise signal ([0075] The predicted mask of the selected noise object isolation model 174a is applied to the audio signal to obtain the noise object), the trained model distinct from the one or more filters ([0076] While the audio processing system 1 in FIG. 6a and FIG. 6b uses a classifier 15 and selector 16 to select appropriate filter data 172a, 172b, 172c or noise object isolator model 174a, 174b, 174c it is envisaged that the classifier 15 and selector 16 may select more than one, such as two or more, filters or noise object isolator models if two or more noise objects are detected to be present in the audio signal by the classifier 15. Also see Fig. 6A.);
Amman and Sun are considered analogous in the field of speech processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Amman to combine the teachings of Sun because doing so would allow for better noise prediction and reduction by using a trained model adapted for different types of audio content, leading to improved speech enhancement and improved flexibility in handling different audio purposes (Sun [0018] While it is difficult to train a model to separate between different types of noise, such as stationary noise content and non-stationary noise content, some implementations of the first aspect of the present invention utilizes trained models adapted for separation of more distinctly different types of audio content, such as speech and stationary noise, and a subsequent manipulation of the separated audio content comprising to more accurately separate different types of noise. The manipulation comprising determining the difference between the stationary noise and the non-speech content).
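Examiner's note: the per-bin masking described by Sun ([0042]) is an element-wise weighting of a time-frequency representation, with each mask value in [0, 1] setting how strongly its bin is suppressed. The sketch below uses hypothetical values only, to illustrate the operation.

```python
import numpy as np

# Hypothetical magnitude spectrogram: 4 frequency bins x 3 time frames.
spectrogram = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [0.5, 0.5, 0.5],
    [2.0, 2.0, 2.0],
])

# Time-frequency mask: one real value in [0, 1] per bin (Sun [0042]);
# a value near 1 keeps the bin, a value near 0 suppresses it.
mask = np.array([
    [1.0, 1.0, 1.0],  # keep (e.g., speech-dominant band)
    [0.2, 0.2, 0.2],  # mostly suppress (noise-dominant band)
    [0.0, 0.0, 0.0],  # fully suppress
    [0.5, 0.5, 0.5],  # partial suppression
])

# Applying the mask is an element-wise product per time/frequency bin.
masked = mask * spectrogram
```

Adjusting the "weights" of such a mask, in the sense mapped above, amounts to choosing a different array of per-bin values for a different processed audio purpose.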
The combination of Amman and Sun does not explicitly teach, but Lester teaches: process, using an adaptive filter ([0209] The ANC pipeline 2409 can employ the reference noise associated with the in-ear audio signal sample 2406 for one or more ANC adaptive filtering processes performed by the ANC pipeline), the intermediate predicted noise signal to generate a predicted noise signal ([0208] the ANC pipeline 2409 can be configured to generate an anti-noise signal (e.g., an anti-noise ANC signal) that is provided to a transceiver 2412 to cancel the external sound. The anti-noise signal contains the predicted noise.);
and subtract the predicted noise signal from the filtered audio signal to generate an output audio signal ([0212] To provide an anti-noise ANC signal for performing ANC associated with the audio signal sample 2402, the AI denoiser audio processing system 2408 of the wearable listening device 2502 can combine the ambient audio signal sample 2406 provided by the ambient microphone 2504 with the in-ear audio signal sample 2404 provided by the in-ear microphone 2506. The anti-noise ANC signal can then be employed by the wearable listening device 2502 to cancel ambient noise associated with the audio signal sample 2402 to provide the audio output 2410. The anti-noise signal subtracts the predicted noise.).
Amman, Sun, and Lester are considered analogous in the field of speech processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Amman and Sun to combine the teachings of Lester because doing so would allow for an adaptive filter to be used to further predict a specific noise for generating a clean output signal, leading to better speech enhancement (Lester [0038] present disclosure involve improved audio processing systems that are configured to employ artificial intelligence (AI) or machine learning (ML) to determine denoiser masks that can be applied to an audio signal sample in a manner that satisfies exacting conversational speech latency requirements).
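Examiner's note: the adaptive filtering and subtraction steps mapped above can be illustrated with a textbook least-mean-squares (LMS) noise canceller, which predicts the noise component from a reference and subtracts the prediction from the mixture. The sketch below uses synthetic signals and a hypothetical noise path; it is not asserted to be Lester's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical signals: a noise reference, and a "filtered audio signal"
# containing speech plus a linearly filtered copy of that noise.
n = 2000
noise_ref = rng.standard_normal(n)                  # noise reference
speech = np.sin(2 * np.pi * 0.01 * np.arange(n))    # stand-in for speech
true_path = np.array([0.6, -0.3, 0.1])              # unknown noise path (hypothetical)
mixture = speech + np.convolve(noise_ref, true_path, mode="full")[:n]

# LMS adaptive filter: predict the noise from the reference, then
# subtract the predicted noise signal from the mixture.
taps, mu = 3, 0.01
w = np.zeros(taps)
output = np.zeros(n)
for i in range(taps, n):
    x = noise_ref[i - taps + 1:i + 1][::-1]  # most recent reference samples
    predicted_noise = w @ x                  # predicted noise sample
    e = mixture[i] - predicted_noise         # output = mixture - prediction
    w += mu * e * x                          # LMS weight update
    output[i] = e

# After adaptation, the residual noise power is far below the noise
# power present in the unprocessed mixture.
err_before = np.mean((mixture[-500:] - speech[-500:]) ** 2)
err_after = np.mean((output[-500:] - speech[-500:]) ** 2)
```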
Regarding claim 2, the combination of Amman, Sun, and Lester teaches: the device of claim 1. Amman further teaches: wherein the one or more processors are further configured to select a second filter of the one or more filters from a plurality of filters ([0050] Another pre-filter option that may be selected for removing cabin noise is the engine harmonic pre-filter 3. The engine harmonic pre-filter 3 may be created to reduce, at least in part, cabin noise resulting from the rotational physical operation of the vehicle engine) based on the sensor data identifying an operational state of a second device ([0025] an operational state of the engine may identify a current engine speed (e.g., measured in revolutions per minute), where the engine is known to make specific known sounds within the vehicle cabin at different engine speeds).
Regarding claim 6, the combination of Amman, Sun, and Lester teaches: the device of claim 1. Amman further teaches: wherein the one or more processors are further configured to receive a microphone output signal from a microphone, and wherein the input audio signal is based on the microphone output signal ([0005] a noise reduction tool configured to provide a noise reduction feature to a sound input received by a microphone within a vehicle's cabin in order to better detect a user's speech from the sound input. More specifically, the noise reduction tool may apply specific noise reduction pre-filters to reduce noise in the sound input caused by vehicle components and/or other external factors that are known to be operating or present while the sound input is being received by the microphone).
Regarding claim 7, the combination of Amman, Sun, and Lester teaches: the device of claim 1. Amman further teaches: wherein the environmental condition corresponds to a window being rolled down ([0031] an operational state for windows may identify a window open position for one or more windows of the vehicle, where the window open position is known to contribute a specific sound into the vehicle cabin), and wherein the particular sound corresponds to wind ([0052] the creation of the wind buffeting pre-filter 5 may serve to reduce, at least in part, cabin noise predicted to be within the vehicle cabin caused by wind buffeting due to the down position of one or more windows from the sound input 105).
Regarding claim 10, the combination of Amman, Sun, and Lester teaches: the device of claim 1. Amman further teaches: wherein the sensor data includes particular sensor data from a door sensor ([0080] The vehicle operational state interface 429 may be configured to receive information related to an operational state for various vehicle components that comprise the vehicle system. For example, the vehicle system may include one or more power windows, an engine, windshield wipers, turn signals, car audio system, HVAC system, suspension system, and other components with the potential of adding to cabin noise. This can also include doors, which can add to cabin noise.).
Regarding claim 11, the combination of Amman, Sun, and Lester teaches: the device of claim 1. Amman further teaches: wherein the environmental condition indicates whether a window is open ([0031] an operational state for windows may identify a window open position for one or more windows of the vehicle, where the window open position is known to contribute a specific sound into the vehicle cabin), whether wipers of a vehicle are activated ([0029] an operational state for a wiper component may identify a speed at which the wiper component is operating, where the wiper component is known to have a specific sound within the vehicle cabin at different wiper operational speeds), or both.
Regarding claim 19, it recites similar limitations as claim 1 and therefore is rejected similarly.
Regarding claim 20, it recites similar limitations as claim 2 and therefore is rejected similarly.
Regarding claim 23, the combination of Amman, Sun, and Lester teaches: the device of claim 1. Sun further teaches: wherein the trained model includes a neural network ([0015] an accurate trained model (e.g. implemented with a neural network) may be used to determine the stationary noise content given a representation of an audio signal).
Regarding claim 24, it recites similar limitations as claim 6 and therefore is rejected similarly.
Regarding claim 25, it recites similar limitations as claim 7 and therefore is rejected similarly.
Regarding claim 27, Amman teaches: a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to... ([0086] the computer-readable medium can include a solid-state memory such as a memory Card or other package that houses one or more non-volatile read-only memories, such as flash memory. Further, the computer-readable medium can be a random access memory or other volatile re-writable memory). The rest of the claim recites similar limitations as claim 1 and therefore is rejected similarly.
Regarding claim 29, it recites similar limitations as claim 1 and therefore is rejected similarly.
Regarding claim 30, the combination of Amman, Sun, and Lester teaches: the apparatus of claim 29. Lester further teaches: wherein the means for obtaining and the means for subtracting are integrated into at least one of a smart speaker, a speaker bar, a smart phone, a computer, a display device, a television, a gaming console, a music player, a camera, a navigation device, a vehicle, a headset, an augmented reality headset, a mixed reality headset, a virtual reality headset, an aerial vehicle, a home automation system, a voice activated device, a wireless speaker and voice activated device, a portable electronic device, a communication device, an internet-of-things (IoT) device, a virtual reality (VR) device, a mobile device, or any combination thereof ([0179] The client device 1802 can be a user device such as a computing device, a desktop computer, a laptop computer, a mobile device, a smartphone, a tablet computer, a netbook, a wearable device, a virtual reality device, or the like).
Claims 3-4, 8-9, 21-22, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Amman in view of Sun and Lester, as applied to claims 1-2, 6-7, 10-11, 19-20, 23-25, 27, and 29-30 above, and further in view of Everman et al. (US 20220386018 A1; hereinafter referred to as Everman).
Regarding claim 3, the combination of Amman, Sun, and Lester teaches: the device of claim 1. The combination of Amman, Sun, and Lester does not explicitly teach, but Everman teaches: wherein the adaptive filter includes a non-linear filter ([0033] signal analysis (i.e., processing) may include non-linear filtering of signals).
Amman, Sun, Lester, and Everman are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Amman, Sun, and Lester to combine the teachings of Everman because applying different types of filters to the audio signal would improve the signal to noise ratio and lead to better speech enhancement (Everman [0032] Signal processing steps may include without limitation any of analyzing, modifying, and/or synthesizing a signal. In some cases, a Signal may be processed in order to improve the signal, for instance by improving transmission, storage efficiency, or signal to noise ratio).
Regarding claim 4, the combination of Amman, Sun, Lester, and Everman teaches: the device of claim 3. Everman further teaches: wherein the non-linear filter includes a Wiener filter ([0033] examples of algorithms that may be performed according to digital signal processing techniques include fast Fourier transform (FFT) implemented using hardware and/or software configurations, finite impulse response (FIR) filter, infinite impulse response (IIR) filter, and adaptive filters such as the Wiener and Kalman filters).
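Examiner's note: a common frequency-domain form of the Wiener filter computes, per bin, the gain G = S / (S + N) from estimated speech and noise powers. The sketch below uses hypothetical power estimates for illustration only.

```python
import numpy as np

# Hypothetical per-bin power estimates (illustrative values only).
speech_psd = np.array([4.0, 1.0, 0.25, 0.0])  # estimated speech power
noise_psd = np.array([1.0, 1.0, 1.0, 1.0])    # estimated noise power

# Frequency-domain Wiener gain per bin: G = S / (S + N).
gain = speech_psd / (speech_psd + noise_psd)

# Speech-dominant bins are passed (gain near 1); noise-dominant bins
# are attenuated (gain near 0).
```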
Regarding claim 8, the combination of Amman, Sun, and Lester teaches: the device of claim 1. The combination of Amman, Sun, and Lester does not explicitly, but Everman teaches: wherein the one or more filters include a linear filter ([0033] non-limiting examples of algorithms that may be performed according to digital signal processing techniques include fast Fourier transform (FFT) implemented using hardware and/or software configurations, finite impulse response (FIR) filter, infinite impulse response (IIR) filter, and adaptive filters such as the Wiener and Kalman filters).
Regarding claim 9, the combination of Amman, Sun, Lester, and Everman teaches: the device of claim 8. Everman further teaches: wherein the linear filter includes a finite impulse response (FIR) filter ([0033] non-limiting examples of algorithms that may be performed according to digital signal processing techniques include fast Fourier transform (FFT) implemented using hardware and/or software configurations, finite impulse response (FIR) filter, infinite impulse response (IIR) filter, and adaptive filters such as the Wiener and Kalman filters).
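Examiner's note: an FIR filter produces each output sample as a finite convolution of the input with its tap coefficients, with no feedback path (unlike an IIR filter). A minimal sketch using a hypothetical 3-tap moving average:

```python
import numpy as np

# Hypothetical 3-tap moving-average FIR filter (a simple linear filter).
taps = np.array([1 / 3, 1 / 3, 1 / 3])
x = np.array([3.0, 3.0, 3.0, 0.0, 0.0, 0.0])

# The FIR output is a finite convolution of the input with the taps,
# truncated here to the input length.
y = np.convolve(x, taps, mode="full")[:len(x)]
```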
Regarding claim 21, it recites similar limitations as claim 3 and therefore is rejected similarly.
Regarding claim 22, it recites similar limitations as claim 4 and therefore is rejected similarly.
Regarding claim 26, it recites similar limitations as claim 8 and therefore is rejected similarly.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Amman in view of Sun and Lester, as applied to claims 1-2, 6-7, 10-11, 19-20, 23-25, 27, and 29-30 above, and further in view of Lu et al. (US 20250037729 A1; hereinafter referred to as Lu).
Regarding claim 5, the combination of Amman, Sun, and Lester teaches: the device of claim 1. The combination of Amman, Sun, and Lester does not explicitly teach, but Lu teaches: wherein the first processed audio purpose corresponds to a voice call ([0059] in an instance in which the machine learning model is to generate denoising masks to be applied to audio content that includes a single talker or other non-dialog-heavy content, the aggressiveness control parameter may be set to a relatively larger value that prioritizes noise reduction over speech preservation. This can be a voice call with little dialog.), wherein the second processed audio purpose corresponds to automatic speech recognition ([0059] In some implementations, the aggressiveness control parameter value may be determined based on a type of audio content that is to be processed using the machine learning model. For example, in an instance in which the machine learning model is to generate denoising masks to be applied to audio content that includes conversational content (e.g., with multiple talkers), or the like, the aggressiveness control parameter may be set to a value that is relatively low, e.g., conservative, and therefore prioritizes speech preservation over noise reduction. This can be ASR in an environment with many speakers.), and wherein the first set of weights result in a more aggressive time-frequency mask than the second set of weights ([0059] Process 500 can begin at 502 by determining an aggressiveness control parameter value that modulates a degree of speech preservation to be used when denoising a noisy audio signal. In some implementations, the aggressiveness control parameter value may be determined based on a type of audio content that is to be processed using the machine learning model).
Amman, Sun, Lester, and Lu are considered analogous in the field of speech processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Amman, Sun, and Lester to combine the teachings of Lu because doing so would allow for the use of an aggressiveness control parameter to control the level of noise removed in a speech enhancement process, leading to improved flexibility in balancing noise reduction and speech preservation (Lu [0040] a trained machine learning model may be used to generate a denoised audio signal from an input noisy audio signal. In some implementations, it may be desirable to control a degree of speech preservation in the denoised audio signal. For example, a more aggressive denoising technique may produce a greater degree of noise reduction while having worse performance on speech preservation, and vice versa).
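Examiner's note: one conventional way an aggressiveness parameter of the kind Lu describes ([0059]) could modulate a denoising mask is by flooring the mask values, so that a conservative setting preserves more of the signal. The flooring scheme below is hypothetical and is offered only to illustrate the claimed first and second sets of weight values.

```python
import numpy as np

def apply_mask_floor(mask, floor):
    """Hypothetical sketch: a lower floor removes more noise (aggressive);
    a higher floor preserves more of the original signal (conservative)."""
    return np.maximum(mask, floor)

# Hypothetical per-bin mask values in [0, 1].
mask = np.array([0.0, 0.1, 0.5, 0.9])

aggressive = apply_mask_floor(mask, floor=0.05)    # e.g., voice call
conservative = apply_mask_floor(mask, floor=0.3)   # e.g., ASR input
```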
Claims 12-13 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Amman in view of Sun and Lester, as applied to claims 1-2, 6-7, 10-11, 19-20, 23-25, 27, and 29-30 above, and further in view of Matsukawa et al. (US 20210065731 A1; hereinafter referred to as Matsukawa).
Regarding claim 12, the combination of Amman, Sun, and Lester teaches: the device of claim 1. The combination of Amman, Sun, and Lester does not explicitly teach, but Matsukawa teaches: wherein the one or more processors are configured to obtain a second input audio signal including at least second speech of a second person ([0052] in some embodiments, the teachings and techniques described herein may be used in applications involving teleconferencing or other voice or audio communication scenarios in which a microphone picks up unwanted sounds, such as an echo, other voices, music, motor noise, fan noise, etc. The teachings and techniques described herein can be used to generate an estimate of the unwanted sound, which can then be used to remove the unwanted sound from the subject signal. The second input can be an unwanted voice audio signal.), wherein the output audio signal is based at least in part on the second input audio signal ([0055] Samples of the real noise, interference, and/or distortion, such as for example samples taken from haptics devices or distortion creating objects, are used during the training phase so that the model can learn to identify, predict, and/or generate the noise, interference, and/or distortion).
Amman, Sun, Lester, and Matsukawa are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Amman, Sun, and Lester to combine the teachings of Matsukawa because doing so would allow second input speech to be detected by the microphone and handled by a filter, such as an echo cancellation filter, to improve speech quality (Matsukawa [0048] the subject signal s may comprise a microphone signal having an unwanted echo. By way of example, the echo may be created by the voice of a far-end speaker on a conference call, music playback, or other sounds picked up by the microphone. In order to remove the noise (i.e. the echo), the AEC algorithm may use an echo cancellation filter).
Regarding claim 13, the combination of Amman, Sun, Lester, and Matsukawa teaches: the device of claim 12. Matsukawa further teaches: wherein the one or more processors are configured to receive a microphone output signal from a microphone ([0004] the processor based apparatus is configured to execute steps comprising: receiving the signal that includes the noise), wherein the input audio signal is based on the microphone output signal ([0050] the microphone generates, produces, or creates the microphone signal s based on the sounds picked up by the microphone), and wherein the input audio signal and the second input audio signal are processed using the one or more filters to generate the filtered audio signal ([0048-0049] y = s - h^T x_estimated. In this equation, y = the output signal having the noise (i.e. the echo) removed or reduced, s = the microphone signal, and h = the echo cancellation filter. In some embodiments, this equation represents a transfer function to cancel the echo or other unwanted sounds picked up by the microphone).
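Examiner's note: Matsukawa's quoted relation ([0048-0049], output y equal to the microphone signal s minus the echo estimate formed from the filter taps h and the estimated reference x) is a single inner product and subtraction. The sketch below uses hypothetical values for illustration only.

```python
import numpy as np

# Hypothetical quantities for the quoted relation y = s - h^T x_estimated:
# s is the microphone signal sample, h the echo-cancellation filter taps,
# and x_estimated the estimated far-end (echo) reference samples.
h = np.array([0.5, 0.25])           # echo-cancellation filter taps
x_estimated = np.array([2.0, 4.0])  # estimated echo reference samples
s = 3.0                             # microphone sample (speech + echo)

# Output with the echo removed: subtract the inner product h^T x.
y = s - h @ x_estimated
```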
Regarding claim 28, it recites similar limitations as claim 12 and therefore is rejected similarly.
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Amman in view of Sun, Lester, and Matsukawa, as applied to claims 12-13 and 28 above, and further in view of Yang et al. (US 20240046946 A1; hereinafter referred to as Yang).
Regarding claim 14, the combination of Amman, Sun, Lester, and Matsukawa teaches: the device of claim 12. The combination of Amman, Sun, Lester, and Matsukawa does not explicitly teach, but Yang teaches: wherein the filtered audio signal and the second input audio signal are processed using the trained model to generate the predicted noise signal ([0068] after the speech mask and noise mask are created from the acoustic feature subsets, the estimated speech mask can be combined (such as multiplied) with a first reference microphone signal 408 (such as noisy audio signal #1 in FIG. 4), and the estimated noise mask can be combined (such as multiplied) with a second reference microphone signal 410 (such as noisy audio signal #N in FIG. 4) to predict speech and noise components for use by the speech filtration model 406).
Amman, Sun, Lester, Matsukawa, and Yang are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Amman, Sun, Lester, and Matsukawa to combine the teachings of Yang because doing so would allow for a trained model to process the second input audio as echo noise in order to generate predicted noise characteristics, leading to better noise prediction (Yang [0053] The audio pre-processing model(s) 202 of embodiments of this disclosure can include a two-branch architecture, including a speech prediction model and a noise prediction model configured to learn intermediate variables for characterizing speech and noise properties, respectively).
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Amman in view of Sun, Lester, and Matsukawa, as applied to claims 12-13 and 28 above, and further in view of Lee (US 20200286635 A1).
Regarding claim 15, the combination of Amman, Sun, Lester, and Matsukawa teaches: the device of claim 12. The combination of Amman, Sun, Lester, and Matsukawa does not explicitly teach, but Lee discloses: wherein the second input audio signal is received from a second device ([0051] the processor 120 may receive ambient noise information of a home appliance from an external device (not shown) that is disposed in the vicinity of the home appliance 100 to detect ambient noise of the home appliance 100).
Amman, Sun, Lester, Matsukawa, and Lee are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Amman, Sun, Lester, and Matsukawa to combine the teachings of Lee because doing so would allow for control and noise reduction of audio from the second device, leading to better user flexibility in managing speech enhancement (Lee [0094] when the ambient noise of the home appliance is small, the noise may be reduced by reducing the driving speed of the motor of the home appliance, and when the ambient noise of the home appliance is great, the motor may be controlled according to the existing control profile, thereby improving efficiency of the home appliance. That is, in some cases, both the effect of reducing the noise and the effect of improving efficiency of functions may be exhibited).
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Amman in view of Sun and Lester, as applied to claims 1-2, 6-7, 10-11, 19-20, 23-25, 27, and 29-30 above, and further in view of Iyer et al. (US 20220059112 A1; hereinafter referred to as Iyer).
Regarding claim 16, the combination of Amman, Sun, and Lester teaches: the device of claim 1. The combination of Amman, Sun, and Lester does not explicitly teach, but Iyer teaches: wherein the one or more processors are further configured to: obtain mode data indicative of an operation mode; and select, based on the operation mode, the trained model from a plurality of trained models to process the filtered audio signal ([0030] the audio noise reduction selection computing module 190 can learn, reinforce, and contextually switch audio noise reduction models to be implemented at the information handling system 100 to suppress and/or minimize one or more stationary noises, based on user, environmental, system, and session attributes).
Amman, Sun, Lester, and Iyer are considered analogous in the field of audio processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Amman, Sun, and Lester to combine the teachings of Iyer because doing so would allow for a selection of a specific trained model to be used to process audio signals based on the context from an operational mode or the environment, leading to better noise prediction using environmental context (Iyer [0013] the audio noise reduction selection computing module can identify and predict, based on the user's environment, context, and behavior, the likely sources of non-stationary noises that occur, and enable accurate suppression of the noises by loading the correct (combination of) audio noise reduction models).
Claims 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Amman in view of Sun and Lester, as applied to claims 1-2, 6-7, 10-11, 19-20, 23-25, 27, and 29-30 above, and further in view of Yang.
Regarding claim 17, the combination of Amman, Sun, and Lester teaches: the device of claim 1. The combination of Amman, Sun, and Lester does not explicitly teach, but Yang teaches: wherein the one or more processors are further configured to update the trained model based on a processed signal generated by performing speech processing on the output audio signal ([0076] The speech model 502, the noise model 504, and the filtering model 506 are updated, such as by using a loss function, during the second stage training to reduce or minimize a difference between the predicted filtering mask and the ground truth filtering mask. In some cases, this can be expressed as a minimization of the loss ℒ(Ŵ, W). The loss function calculates the error or loss associated with the predictions of the ultimate filtering mask output by the denoising system).
Amman, Sun, Lester, and Yang are considered analogous art in the field of speech processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Amman, Sun, and Lester to incorporate the teachings of Yang because doing so would allow for the trained model to be updated using a loss function to improve noise prediction (Yang [0074] the speech model 502 and the noise model 504 are updated, such as by using a loss function, to reduce or minimize a difference or loss between an output speech mask (M.sub.S) and a ground truth speech mask and to reduce or minimize a difference or loss between an output noise mask (M.sub.V) and a ground truth noise mask).
Regarding claim 18, the combination of Amman, Sun, Lester, and Yang teaches: the device of claim 17. Yang further teaches: wherein the criterion is based on determining whether the output audio signal is to be used for automated speech recognition ([0057] embodiments of this disclosure include one or more audio pre-processing models 202 acting as an audio pre-processing front-end to reduce or remove various environmental noises contaminating the audio so that subsequent speech processing systems, such as ASR and wake-up services, can still work properly with minimal or no performance degradation).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Nathan Tengbumroong whose telephone number is (703)756-1725. The examiner can normally be reached Monday - Friday, 11:30 am - 8:00 pm EST.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan can be reached at 571-272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NATHAN TENGBUMROONG/Examiner, Art Unit 2654
/HAI PHAN/Supervisory Patent Examiner, Art Unit 2654