DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/25/2026 has been entered.
This communication is in response to the Amendments and Arguments filed on 02/25/2026.
Claims 1-11 are pending and have been examined.
All previous objections/rejections not mentioned in this Office Action have been withdrawn by the examiner.
Notice of Pre-AIA or AIA Status
The present application is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 02/25/2026 have been fully considered.
Regarding the rejection under 35 U.S.C. 101, the rejection has been withdrawn. The independent claims recite generating output data by inhibiting the noise component and inhibiting deterioration of the speech component as a target of recognition, where the output data is transferred to a speech recognition engine, and the speech recognition engine recognizes speech in the output data. This is directed to a technological improvement because speech is recognized from data in which noise has been suppressed without deteriorating the speech component.
Applicant’s arguments with respect to claims 1, 5, 7, and 9 have been considered but are not persuasive and/or are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant asserts on page 9 that Tasaki does not disclose determining a weighting coefficient based on input data in the predetermined section, and does not appear to rely on a section without speech to determine the weighting coefficient. The Examiner respectfully disagrees with this assertion. Tasaki teaches that a period may be detected as certainly having background noise or as certainly being a speech period, where the weighting coefficients are set to particular values when the period is background noise, 0 for the decoded speech signal and 1 for the transformed decoded signal, and each weighting coefficient is multiplied by the respective signal (9:15-10:27).
The newly cited art of Kermorvant teaches that a noise estimate is initialized during the first 10 frames of a signal, under the assumption that the first 10 frames contain only noise (Sec. 2.1.2).
Please see the updated mappings below for further detail.
Hence, Applicant’s arguments are not persuasive and/or are moot.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-11 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1, 5, 7, and 9 each recite “before the data is segmented”. There is insufficient antecedent basis for this limitation in the claim, as multiple types of “data” have been previously recited prior to the introduction of the term “the data”. In the interest of compact prosecution, the Examiner will interpret the claims as --before the input data is segmented--. The Examiner suggests amending the claims to clarify which data is being referenced.
Claims 2-4, 6, 8, 10, and 11 are rejected as being dependent upon a rejected base claim.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 5, and 7-11 are rejected under 35 U.S.C. 103 as being unpatentable over Tasaki (US Patent No. 6526378), hereinafter Tasaki, in view of Ichikawa (U.S. PG Pub No. 2006/0136203), hereinafter Ichikawa, and further in view of Kermorvant ("A comparison of noise reduction techniques for robust speech recognition", IDIAP Research Report, July 1999), hereinafter Kermorvant.
Regarding claims 1, 5, 7, and 9, Tasaki teaches
(claims 1 and 5) A noise suppression device comprising (an apparatus (1:8-14)):
(claims 7 and 9) A noise suppression method executed by a computer (an apparatus and method (1:8-14)), comprising:
(claims 1 and 5) processing circuitry configured to…(a signal processing unit (6:2-17));
to generate post-noise suppression data by performing a noise suppression process on the input data (an input signal having a predetermined length is output as decoded speech, i.e. input data, which is processed by an amplitude smoother to suppress degraded sound such as quantization noise, i.e. performing a noise suppression process on input data, and after further processing is output as a transformed decoded speech, i.e. generate post-noise suppression data (1:8-14),(6:1-32),(6:52-7:1),(7:16-36));
((claims 1 and 7) to determine a weighting coefficient based on the input data in a predetermined section in a time series and the post-noise suppression data in the predetermined section)/((claims 5 and 9) to segment data in a whole section of the input data into a plurality of predetermined short sections in a time series and to determine a weighting coefficient in each of the plurality of short sections based on the input data in the plurality of short sections and the post-noise suppression data in the plurality of short sections), the input data including a speech component and a noise component (a signal having a predetermined length such as 1 frame length is obtained and output as the decoded speech, which is further processed, i.e. to segment data in a whole section of the input data into a plurality of predetermined short sections in a time series, and an addition control value is determined based on processing of the decoded speech through the signal evaluator, which is used by the weighted value adder to weight the decoded speech, i.e. to determine a weighting coefficient based on the input data, and the transformed decoded speech, i.e. to determine a weighting coefficient based on…the post-noise suppression data, where the weight can change between frames, i.e. in a predetermined section in a time series/in each of the plurality of short sections, and the signal can have both speech and noise, i.e. input data including a speech component and a noise component (6:25-32),(7:37-49),(8:11-31),(9:15-34),(10:57-62)), the input data in the predetermined section not including the speech component and including the noise component… (a period may be detected as having certainly a background noise, i.e. including the noise component, rather than certainly a speech period, i.e. not including the speech component (8:7-36),(8:58-9:11));
to generate output data by performing weighted addition on the input data and the post-noise suppression data by using values based on the weighting coefficient as weights, the weighted addition including multiplication of the input data with the weighting coefficient and multiplication of the post-noise suppression data with a complement of the weighting coefficient (the weighted value adder weights and adds the decoded speech and the transformed decoded speech, i.e. performing weighted addition on the input data and the post-noise suppression data, where a higher addition control value results in a smaller weight for the decoded speech and a larger weight for the transformed decoded speech, and a smaller addition control value results in a smaller weight for the transformed decoded speech and a larger weight for the decoded speech, i.e. using values based on the weighting coefficient as weights, where the weight is multiplied by the corresponding decoded or transformed decoded signal, and the weights can have a relationship to each other, such as when the weighting coefficient for the decoded speech is 1, the weighting coefficient for the transformed decoded speech is 0, when it is a speech period instead of a background noise period, i.e. weighted addition including multiplication of the input data with the weighting coefficient and multiplication of the post-noise suppression data with a complement of the weighting coefficient (9:15-10:27)), the output data being generated by inhibiting the noise component and inhibiting deterioration of the speech component as a target of recognition (for the output speech, i.e. output data being generated by, the background noise component that competes with the speech is suppressed, i.e. inhibiting the noise component, and quality degradation of the speech is suppressed, i.e. inhibiting deterioration of the speech component (1:8-14),(7:37-49),(9:15-10:1),(10:28-40)).
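For illustration only, and not as part of the record or any teaching of the cited references, the weighted addition described above can be sketched as follows, where w is a hypothetical weighting coefficient applied to the input data and its complement (1 - w) is applied to the post-noise suppression data:

```python
# Illustrative sketch only: weighted addition of an input frame and a
# noise-suppressed frame using a coefficient w and its complement (1 - w).
def weighted_addition(input_frame, suppressed_frame, w):
    # w = 1 passes the input data through unchanged; w = 0 passes only the
    # post-noise suppression data, mirroring the speech-period/noise-period
    # extremes discussed at Tasaki 9:15-10:27.
    return [w * x + (1.0 - w) * y
            for x, y in zip(input_frame, suppressed_frame)]

# Example: an even blend of the two signals.
out = weighted_addition([1.0, 2.0], [0.0, 0.0], 0.5)
# out == [0.5, 1.0]
```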
While Tasaki provides the use of weighted addition between signals for noise suppression, Tasaki does not specifically teach that the input sound is received from a microphone or that the output data is provided to a speech recognition device, and thus does not teach
receive input data from a microphone;
to transfer the output data to a speech recognition engine; and
to recognize, using the speech recognition engine, speech in the output data.
Ichikawa, however, teaches receive input data from a microphone (a microphone converts sound from the surroundings into an observed signal that is further processed by a noise reduction unit [0053]-[0055]);
to transfer the output data to a speech recognition engine (the output of the noise reduction process, i.e. output data, is used for a speech recognition process performed by a computer i.e. transfer…to a speech recognition engine [0051],[0069],[0109]); and
to recognize, using the speech recognition engine, speech in the output data (speech recognition was performed on the output of the noise reduction process, i.e. output data, to recognize digits or words, i.e. to recognize speech, where the speech recognition process is performed by a computer, i.e. using the speech recognition engine [0051],[0069],[0089],[0092],[0109]).
And Ichikawa further teaches that adaptive coefficients are learned during non-speech segments [0079].
Tasaki and Ichikawa are analogous art because they are from a similar field of endeavor in performing noise reduction on speech signals. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the use of weighted addition between signals for noise suppression teachings of Tasaki with the receipt of signals from a microphone and the output of noise-reduced data to a speech recognition system as taught by Ichikawa. It would have been obvious to combine the references to improve the speech provided to a speech recognition system in an environment where noise is present (Ichikawa [0001]).
While Tasaki in view of Ichikawa provides identifying speech versus noise-only segments, Tasaki in view of Ichikawa does not specifically teach that the predetermined section is a predetermined time in the input data before the data is segmented, and thus does not teach
the predetermined section being determined as a predetermined time in the input data before the data is segmented.
Kermorvant, however, teaches the predetermined section being determined as a predetermined time in the input data before the data is segmented (the initialization of the noise estimate is done on the first 10 frames, i.e. determined as a predetermined time in the input data before the data is segmented, which makes the assumption that the first 10 frames contain only noise, i.e. predetermined section (Sec. 2.1.2)).
Tasaki, Ichikawa, and Kermorvant are analogous art because they are from a similar field of endeavor in performing noise reduction on speech signals. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the identifying speech versus noise-only segments teachings of Tasaki, as modified by Ichikawa, with the initialization of the noise estimate in the first 10 frames of the signal as taught by Kermorvant. It would have been obvious to combine the references to initialize an estimate of the noise power spectrum required for spectral subtraction before updating the noise power spectrum during subsequent non-speech periods (Kermorvant (Sec. 2.1.2)).
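For illustration only, and not as part of the record, the initialization scheme attributed to Kermorvant above might be sketched as follows, assuming frames are represented as per-frame power spectra (the function and frame representation are hypothetical):

```python
# Illustrative sketch: initialize a noise power estimate by averaging the
# first 10 frames, assumed to contain only noise (cf. Kermorvant Sec. 2.1.2).
def init_noise_estimate(frames, n_init=10):
    init = frames[:n_init]
    n_bins = len(init[0])
    # Average each spectral bin across the initialization frames.
    return [sum(f[b] for f in init) / len(init) for b in range(n_bins)]
```

Frames after the first 10 would not affect the initial estimate; subsequent non-speech periods would update it separately.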
Regarding claim 2, Tasaki in view of Ichikawa and Kermorvant teaches claim 1, and Kermorvant further teaches
uses a period from a time point when inputting the input data is started till elapse of a predetermined time as the predetermined section (the initialization of the noise estimate is done on the first 10 frames, i.e. uses a period from a time point when inputting the input data is started till elapse of a predetermined time, which makes the assumption that the first 10 frames contain only noise, i.e. predetermined section (Sec. 2.1.2)).
Where the motivation to combine is the same as previously presented.
Regarding claims 8 and 10, Tasaki in view of Ichikawa and Kermorvant teaches claims 7 and 9, and Ichikawa further teaches
A non-transitory computer-readable storage medium storing a noise suppression program that causes a computer to execute the noise suppression method… (a computer usable medium having program code embodied therein for causing a computer to effect the method steps [0118]).
Where the motivation to combine is the same as previously presented.
Regarding claim 11, Tasaki in view of Ichikawa and Kermorvant teaches claim 1, and Ichikawa further teaches
convert, using the speech recognition engine, the recognized speech into text (speech recognition was performed on the output of the noise reduction process, i.e. output data, to recognize the characters associated with digits or words, i.e. convert…the recognized speech into text, where the speech recognition process is performed by a computer, i.e. using the speech recognition engine [0051],[0069],[0089-90],[0092],[0109]).
Where the motivation to combine is the same as previously presented.
Claims 3 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Tasaki, in view of Ichikawa and Kermorvant, and further in view of Rahbar (U.S. PG Pub No. 2007/0255560), hereinafter Rahbar.
Regarding claim 3, Tasaki in view of Ichikawa and Kermorvant teaches claim 1.
While Tasaki in view of Ichikawa and Kermorvant provides calculating power to determine an addition control value, Tasaki in view of Ichikawa and Kermorvant does not specifically teach a ratio between the power of the input data and the power of the post-noise suppression data, and thus does not teach
calculates the weighting coefficient based on a ratio between power of the input data in the predetermined section and power of the post-noise suppression data in the predetermined section.
Rahbar, however, teaches calculates the weighting coefficient based on a ratio between power of the input data in the predetermined section and power of the post-noise suppression data in the predetermined section (the ratios are calculated for each data frame, where a frame can be identified as being only noise present, i.e. in the predetermined section, where the spectral gain estimator calculates the noise reduction filter coefficients based on the ratio between estimated clean speech power, i.e. power of the post-noise suppression data, and the total power for the data frame, i.e. power of the input data, such that low noise conditions will have little effect on the input signal, and high noise levels will be determined by a product of the outputs of the power ratios, i.e. weighting coefficient [0020],[0025],[0028]).
Where Tasaki teaches that weights are determined based on the noise being high or low, such as a higher weight being given to the decoded speech during low noise (9:15-29).
And where Kermorvant teaches that the noise power spectrum estimate is initialized in the first 10 frames (Sec. 2.1.2).
Tasaki, Ichikawa, Kermorvant, and Rahbar are analogous art because they are from a similar field of endeavor in noise suppression during audio with speech. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the calculating power to determine an addition control value teachings of Tasaki, as modified by Ichikawa and Kermorvant, with comparing the power of the clean speech and the total power of the data frame as taught by Rahbar. It would have been obvious to combine the references to maximize noise suppression while minimizing speech distortion under severe noisy conditions and with very low computational complexity (Rahbar [0011]).
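For illustration only, and not as a teaching of any cited reference, a ratio-based coefficient of the kind discussed above might be sketched as follows; the specific frame-power computation and clipping to [0, 1] are assumptions made for the sketch:

```python
# Illustrative sketch: derive a weighting coefficient from the ratio of
# post-noise suppression frame power to input frame power.
def frame_power(frame):
    # Mean squared amplitude of the frame.
    return sum(s * s for s in frame) / len(frame)

def ratio_coefficient(input_frame, suppressed_frame):
    p_in = frame_power(input_frame)
    if p_in == 0.0:
        return 1.0  # no input energy: treat the frame as clean
    ratio = frame_power(suppressed_frame) / p_in
    # A ratio near 1 indicates low noise (suppression changed little);
    # clip to [0, 1] so the value can serve directly as a weight.
    return max(0.0, min(1.0, ratio))
```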
Regarding claim 4, Tasaki in view of Ichikawa and Kermorvant teaches claim 1, and Tasaki further teaches
a noise type judgment model used for judging which of the plurality of types of noise … corresponds to the noise component included in the input data based on a spectral feature value of the input data, wherein the processing circuitry calculates noise, as one of the plurality of types of noise, being most similar to the data in the predetermined section in the input data by using the noise type judgment model, and outputs a candidate for the weighting coefficient associated with …the calculated noise …as the weighting coefficient (processing occurs in a frame, where the background noise likeness evaluator and frictional sound likeness calculator determine whether there is background noise in the input, i.e. predetermined section, using the power input and the estimated noise power, and frictional sound in the input decoded speech using the number of crossing zero, i.e. a noise type judgment model used for judging which of the plurality of types of noise…corresponds to the noise component included in the input data based on a spectral feature value of the input data, wherein the processing circuitry calculates noise as one of the plurality of types of noise being most similar to the data in the predetermined section in the input data by using the noise type judgment model, where the background noise likeness and frictional sound likeness are weighted and added for the addition control value, which determines the weighting of the decoded speech to the transformed decoded speech, i.e. outputs a candidate for the weighting coefficient associated with …the calculated noise … as the weighting coefficient (8:11-24),(9:15-34),(10:57-62),(23:27-24:12),(24:20-31)).
While Tasaki in view of Ichikawa and Kermorvant provides determining a background noise and frictional sound likeness to calculate weighting, Tasaki in view of Ichikawa and Kermorvant does not specifically teach the use of a weighting coefficient table, and thus does not teach
a weighting coefficient table to hold predetermined candidates for the weighting coefficient while associating the predetermined candidates with noise identification numbers assigned respectively to a plurality of types of noise.
Rahbar, however, teaches a weighting coefficient table to hold predetermined candidates for the weighting coefficient while associating the predetermined candidates with noise identification numbers assigned respectively to a plurality of types of noise (a lookup table can be used to look up the appropriate gain, i.e. a weighting coefficient table to hold predetermined candidates for the weighting coefficient, based on calculated noise ratios used as inputs that will vary based on whether the noise levels are low or high, i.e. associating the predetermined candidates with noise identification numbers assigned respectively to a plurality of types of noise [0020],[0028]).
Where Tasaki teaches that the noise can be determined as background noise, frictional sound, or a combination (24:20-31).
And where Kermorvant teaches that the noise power spectrum estimate is initialized in the first 10 frames (Sec. 2.1.2).
Tasaki, Ichikawa, Kermorvant, and Rahbar are analogous art because they are from a similar field of endeavor in noise suppression during audio with speech. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the determining a background noise and frictional sound likeness to calculate weighting teachings of Tasaki, as modified by Ichikawa and Kermorvant, with the use of a lookup table to determine the appropriate gain based on an evaluation of noise as taught by Rahbar. It would have been obvious to combine the references to maximize noise suppression while minimizing speech distortion under severe noisy conditions and with very low computational complexity (Rahbar [0011]).
Allowable Subject Matter
Claim 6 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
The closest prior art of Tasaki teaches using different threshold values of the addition control value to adjust the weighting coefficients. However, Tasaki does not teach comparing a ratio of the power of the input data to the post-noise suppression data to a set of thresholds, where the threshold values depend on whether the input data is judged as speech or noise, and setting different weighting coefficient values based on the relationship of the ratio to the corresponding thresholds, where the weighting coefficient values depend on whether the input data is judged as speech or noise.
Ichikawa teaches calculating adaptive coefficients after the segment is determined to be speech or non-speech through the basis of the power of the observed signal. However, Ichikawa does not teach comparing a ratio of the power of the input data to the post-noise suppression data to a set of thresholds, where the threshold values depend on whether the input data is judged as speech or noise, and setting different weighting coefficient values based on the relationship of the ratio to the corresponding thresholds, where the weighting coefficient values depend on whether the input data is judged as speech or noise.
Kermorvant teaches adaptive noise characteristics. However, Kermorvant does not teach comparing a ratio of the power of the input data to the post-noise suppression data to a set of thresholds, where the threshold values depend on whether the input data is judged as speech or noise, and setting different weighting coefficient values based on the relationship of the ratio to the corresponding thresholds, where the weighting coefficient values depend on whether the input data is judged as speech or noise.
Rahbar teaches using a lookup table to determine noise reduction filter coefficients, where the table may be used or bypassed depending on the value of the ratio between the estimated clean speech power and total power for the data frame. However, Rahbar does not teach determining whether input data is speech or noise, comparing a ratio of the power of the input data to the post-noise suppression data to a set of thresholds, where the threshold values depend on whether the input data is judged as speech or noise, and setting different weighting coefficient values based on the relationship of the ratio to the corresponding thresholds, where the weighting coefficient values depend on whether the input data is judged as speech or noise.
Grosse-Schulte (US 2008/0140396) teaches classifying the input signal as voiced or unvoiced and comparing the SNR to a predetermined threshold. However, Grosse-Schulte does not teach comparing a ratio of the power of the input data to the post-noise suppression data to a set of thresholds, where the threshold values depend on whether the input data is judged as speech or noise, and setting different weighting coefficient values based on the relationship of the ratio to the corresponding thresholds, where the weighting coefficient values depend on whether the input data is judged as speech or noise.
None of Tasaki, Ichikawa, Kermorvant, Rahbar, and Grosse-Schulte, either alone or in combination, teaches or makes obvious comparing a ratio of the power of the input data to the post-noise suppression data to a set of thresholds, where the threshold values depend on whether the input data is judged as speech or noise, and setting different weighting coefficient values based on the relationship of the ratio to the corresponding thresholds, where the weighting coefficient values depend on whether the input data is judged as speech or noise, where the weighting coefficient values are used for a weighting addition on input data and post-noise suppression data to generate output data. Therefore, none of the cited prior art either alone or in combination, teaches or makes obvious the combination of limitations as recited in the dependent claims including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICOLE A K SCHMIEDER whose telephone number is (571)270-1474. The examiner can normally be reached 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NICOLE A K SCHMIEDER/Primary Examiner, Art Unit 2659