DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/11/2025 has been entered.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-3, 5-11, 14, 16-22 and 24 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant's arguments filed 12/11/2025 have been fully considered but they are not persuasive. Regarding arguments on page 8 of the Remarks, Examiner notes that the newly applied Sun reference is relied upon to teach the use of a machine learning model for the speech enhancement. While Examiner agrees that noise suppression is not necessarily the same thing as speech enhancement, the two are related. Using Applicant's example, Examiner submits that improved intelligibility is an enhancement of speech, as is improving the signal-to-noise ratio. If Applicant is arguing that reducing noise and improving intelligibility are not forms of speech enhancement according to the invention, then the claims should clearly reflect that distinction. However, one of ordinary skill in the art would understand that reducing the noise in a speech signal is a form of speech enhancement. As noted in the Advisory Action of 10/31/2025, Applicant's Specification on page 9, lines 22-27, states "The speech enhancement comprises processing the audio signals 205 so as to increase the intelligibility of the speech" and "portions of the audio signals 205 that are identified as speech can be enhanced and/or portions of the audio signals that are identified as not being speech can be attenuated." Therefore, as the noise reduction performs the steps described as speech enhancement in the Specification, Examiner maintains that this is a reasonable interpretation.
Regarding arguments on pages 8-9 of the Remarks, Examiner notes that the proportions of speech to noise can be controlled by adjusting only the noise, as Dewasurendra teaches. While Applicant appears to be arguing that both speech and remainder are controlled, this is not clearly reflected in the claims. By reducing the noise, the proportion of the speech to the noise is controlled, without having to adjust the speech signal.
Regarding arguments on page 9 of the Remarks, Examiner notes that para [0076-77] of Dewasurendra still appear to teach the argued limitation. The selection of the user override is considered the control parameter. “By selecting the override setting, the user may specifically request that a less aggressive level of noise suppression, or no noise suppression, be applied to the captured audio data by noise suppression unit 24” from para [0076] teaches a user preference for a level of noise suppression to be applied, which is considered “a level of speech enhancement of speech relative to a remainder” as noted above.
Regarding arguments on pages 9-10 of the Remarks, Examiner notes that the cited limitation is rejected based on para [0057] in addition to para [0076-77] of Dewasurendra. The gain values of para [0057] are considered the processing parameter, which are determined based on the user override signal and the sound classification. The newly applied Sun reference teaches using an aggressiveness control parameter, corresponding to the processing parameter, to perform the speech enhancement using a machine learning model.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5, 9, 11, 14, 16-18, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Dewasurendra et al. (US 2017/0092288 A1), hereinafter referred to as Dewasurendra, in view of Sun et al. (US 2025/0037729 A1), hereinafter referred to as Sun.
Regarding claim 1, Dewasurendra teaches:
An apparatus, comprising:
at least one processor (para [0025], where processors are used); and
at least one memory storing instructions that, when executed with the at least one processor (para [0016-17], where a computer-readable medium is used), cause the apparatus at least to:
obtain one or more audio signals (para [0020], [0046], where audio signals are received at the audio pre-processor);
obtain a control parameter for speech enhancement wherein the control parameter indicates a user preference for a level of speech enhancement of speech relative to a remainder for respective ones of the one or more audio signals (para [0076-77], where a user override signal is received, requesting a less aggressive level of noise suppression);
process the one or more audio signals to determine a sound classification based at least on the one or more audio signals (para [0036], [0056-57], where input audio is classified as speech or music or both);
use the control parameter and the sound classification to determine a processing parameter (para [0057], [0076-77], where gain values are controlled based on the classification and detection of the user override signal); and
use a machine learning model for speech enhancement on the one or more audio signals, where use of the machine learning model comprises using the processing parameter for providing an output signal, where the machine learning model uses the processing parameter to control proportions of the speech and the remainder in the output signal (para [0048-50], where gains for both speech and noise are computed and applied to generate a noise-suppressed signal).
Dewasurendra does not teach:
use a machine learning model for speech enhancement on the one or more audio signals, where use of the machine learning model comprises using the processing parameter for providing an output signal, where the machine learning model uses the processing parameter to control proportions of the speech and the remainder in the output signal.
Sun teaches:
use a machine learning model for speech enhancement on the one or more audio signals, where use of the machine learning model comprises using the processing parameter for providing an output signal, where the machine learning model uses the processing parameter to control proportions of the speech and the remainder in the output signal (para [0045], where a machine learning model uses an aggressiveness control parameter to weight noise reduction and speech preservation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Dewasurendra by using the machine learning model of Sun (Sun para [0045]) in the speech enhancement of Dewasurendra (Dewasurendra para [0048-50]), in order to allow the aggressiveness of the noise reduction to be adjusted, e.g., via kernel size (Sun para [0045]).
Regarding claim 2, Dewasurendra in view of Sun teaches:
An apparatus as claimed in claim 1, wherein the instructions, when executed with the at least one processor, cause the apparatus to obtain the control parameter from a control input made by a user (Dewasurendra para [0076], where the user selects the override setting).
Regarding claim 3, Dewasurendra in view of Sun teaches:
An apparatus as claimed in claim 2, wherein the control input is made by the user before the one or more audio signals are captured (Sun Fig. 5 element 502, para [0059], where the control parameters are determined based on the type of audio to be processed).
Regarding claim 5, Dewasurendra in view of Sun teaches:
An apparatus as claimed in claim 1, wherein the control parameter comprises at least one of:
a value where the value indicates a proportion of the output signal that should be speech (Dewasurendra para [0076-77], where the override lowers or removes the noise suppression, affecting the proportion of speech); or
a value where the value indicates a proportion for speech relative to remainder in the output signal (Dewasurendra para [0076-77], where the override lowers or removes the noise suppression, affecting the proportion of speech).
Regarding claim 9, Dewasurendra in view of Sun teaches:
An apparatus as claimed in claim 1, wherein the instructions, when executed with the at least one processor, cause the apparatus to use the processing parameter to control mixing of the one or more audio signals (Dewasurendra para [0046], where the audio signals are separated into speech and noise, and [0049-50], [0057], [0076-77], where the gain values of the speech and noise are controlled based on the classification and detection of the user override signal).
Regarding claim 11, Dewasurendra in view of Sun teaches:
An apparatus as claimed in claim 1, wherein respective signals of the one or more audio signals comprise one or more channels (Dewasurendra para [0046-47], where speech and noise are separated into channels).
Regarding claim 14, Dewasurendra teaches:
A method, comprising:
obtaining one or more audio signals (para [0020], [0046], where audio signals are received at the audio pre-processor);
obtaining a control parameter for speech enhancement wherein the control parameter indicates a user preference for a level of speech enhancement of speech relative to a remainder for respective ones of the one or more audio signals (para [0076-77], where a user override signal is received, requesting a less aggressive level of noise suppression);
processing the one or more audio signals to determine a sound classification based at least on the one or more audio signals (para [0036], [0056-57], where input audio is classified as speech or music or both);
using the control parameter and the sound classification to determine a processing parameter (para [0057], [0076-77], where gain values are controlled based on the classification and detection of the user override signal); and
using a machine learning model for speech enhancement on the one or more audio signals, where use of the machine learning model comprises using the processing parameter for providing an output signal, where the machine learning model uses the processing parameter to control proportions of the speech and the remainder in the output signal (para [0048-50], where gains for both speech and noise are computed and applied to generate a noise-suppressed signal).
Dewasurendra does not teach:
using a machine learning model for speech enhancement on the one or more audio signals, where use of the machine learning model comprises using the processing parameter for providing an output signal, where the machine learning model uses the processing parameter to control proportions of the speech and the remainder in the output signal.
Sun teaches:
using a machine learning model for speech enhancement on the one or more audio signals, where use of the machine learning model comprises using the processing parameter for providing an output signal, where the machine learning model uses the processing parameter to control proportions of the speech and the remainder in the output signal (para [0045], where a machine learning model uses an aggressiveness control parameter to weight noise reduction and speech preservation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Dewasurendra by using the machine learning model of Sun (Sun para [0045]) in the speech enhancement of Dewasurendra (Dewasurendra para [0048-50]), in order to allow the aggressiveness of the noise reduction to be adjusted, e.g., via kernel size (Sun para [0045]).
Regarding claim 16, Dewasurendra in view of Sun teaches:
A method as claimed in claim 14, wherein the control parameter is obtained from a control input made by a user (Dewasurendra para [0076], where the user selects the override setting).
Regarding claim 17, Dewasurendra in view of Sun teaches:
A method as claimed in claim 16, wherein the control input is made by the user before the one or more audio signals are captured (Sun Fig. 5 element 502, para [0059], where the control parameters are determined based on the type of audio to be processed).
Regarding claim 18, Dewasurendra in view of Sun teaches:
A method as claimed in claim 14, wherein the control parameter comprises at least one of:
a value where the value indicates a proportion of the output signal that should be speech (Dewasurendra para [0076-77], where the override lowers or removes the noise suppression, affecting the proportion of speech); or
a value where the value indicates a proportion for speech relative to remainder in the output signal (Dewasurendra para [0076-77], where the override lowers or removes the noise suppression, affecting the proportion of speech).
Regarding claim 24, Dewasurendra in view of Sun teaches:
A non-transitory program storage device readable with an apparatus, tangibly embodying a program of instructions executable with the apparatus for performing the method of claim 14 (Dewasurendra para [0016-17], where a computer-readable medium is used).
Claims 6-8, 10, and 19-22 are rejected under 35 U.S.C. 103 as being unpatentable over Dewasurendra, in view of Sun, and further in view of Yu (US 2022/0277766 A1).
Regarding claim 6, Dewasurendra in view of Sun teaches:
An apparatus as claimed in claim 1,
Dewasurendra in view of Sun does not teach:
wherein the sound classification comprises an indication of a probability that respective signals of the one or more audio signals comprise one or more sound categories.
Yu teaches:
wherein the sound classification comprises an indication of a probability that respective signals of the one or more audio signals comprise one or more sound categories (para [0035], where a confidence score of each classification indicates a likelihood that the specific frame includes speech or music).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Dewasurendra in view of Sun by using the likelihoods of Yu (Yu para [0035]) in the speech classification of Dewasurendra in view of Sun (Dewasurendra para [0036]), for use in a dialogue detector and for determining a smoothing factor for dialogue enhancement (Yu para [0037-38]).
Regarding claim 7, Dewasurendra in view of Sun and Yu teaches:
An apparatus as claimed in claim 6, wherein a first sound category comprises speech and a second sound category comprises not-speech (Dewasurendra para [0058], where speech content is classified with a value of 0 and music content with a value of 1).
Regarding claim 8, Dewasurendra in view of Sun teaches:
An apparatus as claimed in claim 1,
Dewasurendra in view of Sun does not teach:
wherein the instructions, when executed with the at least one processor, cause the apparatus to perform using a machine learning program to classify sounds within the one or more audio signals.
Yu teaches:
wherein the instructions, when executed with the at least one processor, cause the apparatus to perform using a machine learning program to classify sounds within the one or more audio signals (para [0035], where machine learning models are used in the classification).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Dewasurendra in view of Sun by using the machine learning model of Yu (Yu para [0035]) in the speech classification and enhancement of Dewasurendra in view of Sun (Dewasurendra para [0036], [0048-50]), for use in a dialogue detector and for determining a smoothing factor for dialogue enhancement (Yu para [0037-38]).
Regarding claim 10, Dewasurendra in view of Sun teaches:
An apparatus as claimed in claim 1, wherein the instructions, when executed with the at least one processor, cause the apparatus to use a product of the control parameter and a value based on the sound classification to determine the processing parameter (para [0062], where the post-processing gain is determined based on the sound classification, and where M(n) is interpreted as the control parameter).
Dewasurendra in view of Sun does not explicitly teach that the control parameter is a value that can be multiplied.
Yu teaches:
use a product of the control parameter and a value based on the sound classification to determine the processing parameter (para [0036-37], where the user selected gain is a control parameter and is multiplied by the speech confidence score).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Dewasurendra in view of Sun by using the gain multiplication of Yu (Yu para [0036-37]) in determining the processing parameter of Dewasurendra in view of Sun (Dewasurendra para [0057], [0062]), for use in a dialogue detector and for determining a smoothing factor for dialogue enhancement (Yu para [0037-38]).
Regarding claim 19, Dewasurendra in view of Sun teaches:
A method as claimed in claim 14,
Dewasurendra in view of Sun does not teach:
wherein the sound classification comprises an indication of a probability that respective signals of the one or more audio signals comprise one or more sound categories.
Yu teaches:
wherein the sound classification comprises an indication of a probability that respective signals of the one or more audio signals comprise one or more sound categories (para [0035], where a confidence score of each classification indicates a likelihood that the specific frame includes speech or music).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Dewasurendra in view of Sun by using the likelihoods of Yu (Yu para [0035]) in the speech classification of Dewasurendra in view of Sun (Dewasurendra para [0036]), for use in a dialogue detector and for determining a smoothing factor for dialogue enhancement (Yu para [0037-38]).
Regarding claim 20, Dewasurendra in view of Sun and Yu teaches:
A method as claimed in claim 19, wherein a first sound category comprises speech and a second sound category comprises not-speech (Dewasurendra para [0058], where speech content is classified with a value of 0 and music content with a value of 1).
Regarding claim 21, Dewasurendra in view of Sun teaches:
A method as claimed in claim 14,
Dewasurendra in view of Sun does not teach:
wherein processing the one or more audio signals to determine the sound classification comprises using a machine learning program to classify sounds within the one or more audio signals.
Yu teaches:
wherein processing the one or more audio signals to determine the sound classification comprises using a machine learning program to classify sounds within the one or more audio signals (para [0035], where machine learning models are used in the classification).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Dewasurendra in view of Sun by using the machine learning model of Yu (Yu para [0035]) in the speech classification and enhancement of Dewasurendra in view of Sun (Dewasurendra para [0036], [0048-50]), for use in a dialogue detector and for determining a smoothing factor for dialogue enhancement (Yu para [0037-38]).
Regarding claim 22, Dewasurendra in view of Sun teaches:
A method as claimed in claim 14, further comprising using a product of the control parameter and a value based on the sound classification to determine the processing parameter (para [0062], where the post-processing gain is determined based on the sound classification, and where M(n) is interpreted as the control parameter).
Dewasurendra in view of Sun does not explicitly teach that the control parameter is a value that can be multiplied.
Yu teaches:
using a product of the control parameter and a value based on the sound classification to determine the processing parameter (para [0036-37], where the user selected gain is a control parameter and is multiplied by the speech confidence score).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Dewasurendra in view of Sun by using the gain multiplication of Yu (Yu para [0036-37]) in determining the processing parameter of Dewasurendra in view of Sun (Dewasurendra para [0057], [0062]), for use in a dialogue detector and for determining a smoothing factor for dialogue enhancement (Yu para [0037-38]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 2022/0007116 A1 para [0193] teaches using personalization parameters for the amount of noise reduction by a trained neural network.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL whose telephone number is (571)270-0685. The examiner can normally be reached 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BRYAN S BLANKENAGEL/Primary Examiner, Art Unit 2658