Last updated: April 19, 2026
Application No. 18/717,386
SPEECH SIGNAL PROCESSING METHOD, SPEECH SIGNAL PROCESSING APPARATUS, AND PROGRAM

Non-Final OA §101 §102 §103 §112
Filed: Jun 06, 2024
Examiner: SMITH, SEAN THOMAS
Art Unit: 2659
Tech Center: 2600 (Communications)
Assignee: Nippon Telegraph and Telephone Corporation
OA Round: 1 (Non-Final)
Interview: Optional

+33.3% interview lift. This examiner has a relatively high allow rate; a written response may suffice.
Based on 6 resolved cases, 2023–2026
Examiner Intelligence

SMITH, SEAN THOMAS
Career Allow Rate: 83% (above average)
5 granted / 6 resolved
+21.3% vs TC avg
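The headline figures above can be checked with simple arithmetic. The following is a minimal sketch, assuming the allow rate is granted divided by resolved and that "+21.3% vs TC avg" is a difference in percentage points; the page does not state how either number is computed, and the variable names are hypothetical.

```python
# Hypothetical check of the examiner statistics displayed on this page.
granted, resolved = 5, 6

allow_rate = 100 * granted / resolved        # percent
delta_vs_tc = 21.3                           # percentage points, as displayed
implied_tc_avg = allow_rate - delta_vs_tc    # assumption: a simple difference

print(f"allow rate: {allow_rate:.1f}%")
print(f"implied TC average: {implied_tc_avg:.1f}%")
```

With only 6 resolved cases, a single additional grant or abandonment would move the rate by roughly 17 percentage points, so the figure is best read as a rough indicator.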
Interview Lift: +33.3% (strong; resolved cases with interview vs. without)
Typical timeline: 2y 8m average prosecution
37 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101: 27.9% (-12.1% vs TC avg)
§103: 50.7% (+10.7% vs TC avg)
§102: 12.9% (-27.1% vs TC avg)
§112: 8.6% (-31.4% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 6 resolved cases
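The per-statute offsets can be inverted to recover the baseline they were measured against. This is a sketch under the assumption that each offset equals the examiner's rate minus the Tech Center average, in percentage points; the dictionary below only restates the numbers displayed above.

```python
# Displayed per-statute rates and their stated offsets from the Tech Center average.
stats = {
    "§101": (27.9, -12.1),
    "§103": (50.7, +10.7),
    "§102": (12.9, -27.1),
    "§112": (8.6, -31.4),
}

# Assumption: offset = rate - TC average, so TC average = rate - offset.
implied_tc_avg = {s: round(rate - offset, 1) for s, (rate, offset) in stats.items()}
print(implied_tc_avg)
```

Under that assumption every statute resolves to the same 40.0% baseline, which suggests the "Tech Center average estimate" is a single figure applied to all four statutes rather than a per-statute average.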
Office Action

§101 §102 §103 §112
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). Claims 1-7 and 9-21 are granted the benefit of the earlier filing date of PCT/JP 2021/045610, filed December 10th, 2021.
Should applicant desire to obtain the benefit of foreign priority under 35 U.S.C. 119(a)-(d) prior to declaration of an interference, a certified English translation of the foreign application must be submitted in reply to this action.  37 CFR 41.154(b) and 41.202(e).
Failure to provide a certified translation may result in no benefit being accorded for the non-English application.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on June 6th, 2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Objections
Claim 11 (and by dependency, claims 12-13) is objected to because of the following informalities: The claims recite the "voice signal processing method according to claim 7," while claim 7 is directed to a "voice signal processing device." Appropriate correction is required.
Claims 20 and 21 are objected to because of the following informalities: The claims recite an "emphasized signal". Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f):
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f), except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f), except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f), because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: "acquisition unit" and "determination unit" in claim 7, and "switching model unit" in claim 20.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f), they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f), applicant may:  (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f).
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claim 7 recites the limitation "determine an input signal to be used for the voice recognition." There is insufficient antecedent basis for this limitation in the claim.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-7 and 9-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite a mental process that can be performed in the human mind or with the aid of pen and paper. This judicial exception is not integrated into a practical application because a computer is invoked merely as a tool to execute an abstract idea. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because an abstract idea is merely applied on a generic computer.
Regarding claim 1, the claim recites “A voice signal processing method comprising: acquiring an output value indicating whether to perform voice enhancement on an observation signal in which a voice or noise of another speaker overlaps with a voice of a target speaker, or indicating a degree of necessity of performing the voice enhancement; and deciding, under a predetermined condition, a ratio between the observation signal and an enhancement signal generated by the voice enhancement using the output value that has been acquired, to determine an input signal to be used for voice recognition.”
The limitations of “acquiring an output value indicating whether to perform voice enhancement,” and “deciding… a ratio… to determine an input signal,” as drafted cover mental activities which can be performed in the mind or with the aid of pen and paper. Taken individually, or as a whole, these limitations describe acts which are equivalent to human mental work of listening and decision making.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the steps of the claimed invention can be performed mentally, and no additional features in the claims would preclude them from being performed as such. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
Regarding claim 2, the claim depends from claim 1, and thus recites the limitations of claim 1, “wherein the predetermined condition is defined by the following expression when the output value is ^k, the enhancement signal is ^S, the observation signal is Y, the input signal is ~S, and λ is a preset value in a range of 0 < λ < 1.
    [Expression shown as image media_image1.png; not reproduced in this text]
”
Taken individually, or as a whole with claim 1, these limitations describe acts which are equivalent to human mental work defined by mathematical concepts. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
Regarding claim 3, the claim depends from claim 1, and thus recites the limitations of claim 1, “wherein the predetermined condition is defined by the following expressing when the output value is ^k, the enhancement signal is ^S, the observation signal is Y, and the input signal is ~S.
    [Expression shown as image media_image2.png; not reproduced in this text]
”
Taken individually, or as a whole with claim 1, these limitations describe acts which are equivalent to human mental work defined by mathematical concepts. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
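For orientation, the step of "deciding a ratio between the observation signal and an enhancement signal using the output value" can be pictured in code. The expressions of claims 2 and 3 appear only as images and are not reproduced here, so the sketch below is an illustration under stated assumptions, not the claimed expressions: a hard switch that compares the output value ^k against the preset λ, and a soft blend that uses ^k directly as a mixing weight. The function names `hard_switch` and `soft_blend` and the toy signal values are hypothetical.

```python
def hard_switch(y, s_hat, k_hat, lam=0.5):
    """Assumed rule: use the enhancement signal when k_hat exceeds lam, else the observation."""
    return list(s_hat) if k_hat > lam else list(y)


def soft_blend(y, s_hat, k_hat):
    """Assumed rule: weight the enhancement signal by k_hat and the observation by 1 - k_hat."""
    return [k_hat * s + (1.0 - k_hat) * o for s, o in zip(s_hat, y)]


y = [1.0, 2.0, 3.0]        # observation signal Y (toy values)
s_hat = [0.0, 2.0, 4.0]    # enhancement signal ^S (toy values)

print(hard_switch(y, s_hat, k_hat=0.8))   # selects the enhancement signal
print(soft_blend(y, s_hat, k_hat=0.25))   # mostly the observation signal
```

In both variants the result plays the role of the input signal ~S passed to voice recognition; which variant, if either, matches claims 2 and 3 would have to be confirmed against the original expressions.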
Regarding claim 4, the claim depends from claim 1, and thus recites the limitations of claim 1, “wherein the output value is an output value output by a learned model, and the learned model receives, as an input, at least one of the observation signal and the enhancement signal, and output whether to perform the voice enhancement from a viewpoint of voice recognition performance or the degree of necessity of performing the voice enhancement.”
The limitation of “output whether to perform the voice enhancement from a viewpoint of voice recognition performance or the degree of necessity of performing the voice enhancement,” as drafted covers mental activities which can be performed in the mind or with the aid of pen and paper. Taken individually, or as a whole with claim 1, these limitations describe acts which are equivalent to human mental work of decision making. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
Regarding claim 5, the claim depends from claim 4, and thus recites the limitations of claims 1 and 4, “wherein the learned model is learned to minimize L, which is a calculation result defined by the following expression, when a loss coefficient is L and a training label used to generate the learned model is k.
    [Expression shown as image media_image3.png; not reproduced in this text]
”
The recited loss function is well-understood and readily available to a person having ordinary skill in the art of machine learning, as described in specification paragraph [0028], “As the loss function, for example, a known cross entropy loss define by the following expression can be used.
    [Expression shown as image media_image4.png; not reproduced in this text]
”
Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
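The specification passage quoted above refers to "a known cross entropy loss," but the expression itself is an image that is not reproduced in this text. As a point of reference only, the standard binary cross entropy between a training label k and a model output ^k is sketched below. That the claimed expression takes this binary form is an assumption; the function name and the clamping constant `eps` are hypothetical.

```python
import math


def binary_cross_entropy(k, k_hat, eps=1e-12):
    """Standard binary cross entropy: -(k*log(k_hat) + (1-k)*log(1-k_hat))."""
    # Clamp the prediction away from 0 and 1 so the logarithms stay finite.
    k_hat = min(max(k_hat, eps), 1.0 - eps)
    return -(k * math.log(k_hat) + (1 - k) * math.log(1.0 - k_hat))


# A confident correct prediction is penalized less than a confident wrong one.
print(binary_cross_entropy(1, 0.9))
print(binary_cross_entropy(1, 0.1))
```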
Regarding claim 6, the claim depends from claim 5, and thus recites the limitations of claims 1 and 4-5, “wherein, in the observation signal, when a true value of a ratio between the voice of the target speaker and the voice of the another speaker is SIR, a true value of a ratio between the voice of the target speaker and the noise is SNR, an output value of the learned model when the SIR is input is ^SIR, and an output value of the learned model when the SNR is input is ^SNR, Lmulti that is a calculation result defined by the following expression is used as the loss coefficient by using parameters α and β.
    [Expression shown as image media_image5.png; not reproduced in this text]
”
The recited multitask loss function is well-understood and readily available to a person having ordinary skill in the art of machine learning. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
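The expression for Lmulti in claim 6 is likewise an image that is not reproduced here. A common way to build a multitask loss is to add weighted auxiliary terms to a main loss, and the sketch below shows that pattern with squared-error terms for the SIR and SNR estimates. The additive form, the squared-error terms, and the default weights are all assumptions for illustration and may differ from the claimed expression; `multitask_loss` is a hypothetical name.

```python
def multitask_loss(main_loss, sir, sir_hat, snr, snr_hat, alpha=0.1, beta=0.1):
    """Assumed form: main loss plus weighted squared errors of the SIR and SNR estimates."""
    sir_term = (sir - sir_hat) ** 2    # error of the estimated SIR (^SIR)
    snr_term = (snr - snr_hat) ** 2    # error of the estimated SNR (^SNR)
    return main_loss + alpha * sir_term + beta * snr_term


# Toy values: the SIR estimate is off by 2, the SNR estimate is exact.
print(multitask_loss(0.5, sir=10.0, sir_hat=8.0, snr=5.0, snr_hat=5.0))
```

Setting α and β to zero reduces this sketch to the main loss alone, which is one way to see how the two parameters trade off the auxiliary estimation tasks against the primary objective.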
Regarding claim 20, the claim depends from claim 1, and thus recites the limitations of claim 1, “further comprising a switching model unit, in which the switching model unit performs speech recognition by using emphasized signal and observed signal, wherein degradation due to speech enhancement is prevented.”
The limitation of “performs speech recognition by using emphasized signal and observed signal,” as drafted covers mental activities which can be performed in the mind or with the aid of pen and paper. Taken individually, or as a whole with claim 1, these limitations describe acts which are equivalent to human mental work of decision making, or choosing a desirable audio track. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
Regarding claim 21, the claim depends from claim 20, and thus recites the limitations of claims 1 and 20, “wherein the emphasized signal and the observed signal are switched based on determining whether the speech enhancement is required.”
The limitation of “determining whether the speech enhancement is required,” as drafted covers mental activities which can be performed in the mind or with the aid of pen and paper. Taken individually, or as a whole with the preceding claims, these limitations describe acts which are equivalent to human mental work of decision making. Accordingly, the claim is directed to an abstract idea without significantly more. The claim is not patent eligible.
Regarding claims 7 and 9-13, system claims 7 and 9-13 and method claims 1-6 are related as a method and system of using the same, with each system element’s function corresponding to the method step. Accordingly, claims 7 and 9-13 are similarly rejected under the same rationale as applied to claims 1-6.
Regarding claims 14-19, computer-readable medium claims 14-19 and method claims 1-6 are related as method and computer-readable medium for performing the same, with each computer-readable medium element’s function corresponding to the method step. Accordingly, claims 14-19 are similarly rejected under the same rationale as applied to claims 1-6.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 7, 14 and 20-21 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Japan publication JP 2000082999 to Sasaki and Haneda (hereinafter, "Sasaki").
Regarding claims 1, 7 and 14, Sasaki teaches a method, system and computer-readable medium comprising: acquiring an output value indicating whether to perform voice enhancement on an observation signal in which a voice or noise of another speaker overlaps with a voice of a target speaker, or indicating a degree of necessity of performing the voice enhancement (page 2, "Next, for example, a determination of PX(n) < Pth is performed for a predetermined threshold value Pth, and if this conditional expression is satisfied, it is determined that the noise is present."); and
deciding, under a predetermined condition, a ratio between the observation signal and an enhancement signal generated by the voice enhancement using the output value that has been acquired, to determine an input signal to be used for voice recognition (page 2, "The S/N ratio, which is the ratio of the target speech signal to the noise signal, is estimated using PAV N,k(n).").
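The two Sasaki passages cited above describe a threshold test on signal power, PX(n) < Pth, and an S/N ratio estimated from averaged noise power. The sketch below illustrates that general idea in simplified form. It is not Sasaki's implementation: it works on full-band frames rather than per frequency band, uses plain averages rather than long-term averaging, and all names (`frame_power`, `estimate_snr`, `p_th`) and sample values are hypothetical.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def estimate_snr(frames, p_th):
    """Label frames with power below p_th as noise, then return speech power / noise power."""
    powers = [frame_power(f) for f in frames]
    noise = [p for p in powers if p < p_th]     # PX(n) < Pth: treated as noise
    speech = [p for p in powers if p >= p_th]
    if not noise or not speech:
        return None                             # ratio undefined without both kinds of frame
    return (sum(speech) / len(speech)) / (sum(noise) / len(noise))


frames = [[0.1, -0.1], [1.0, -1.0], [0.1, 0.1], [2.0, 2.0]]   # toy frames
print(estimate_snr(frames, p_th=0.5))
```

An applicant comparing this reference with claim 1 may want to note that the sketch, like the cited passages, yields a signal-to-noise estimate; whether that corresponds to the claimed "ratio between the observation signal and an enhancement signal" is a question for the response, not something the code settles.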
Regarding claim 20, Sasaki teaches the voice signal processing method according to claim 1, further comprising a switching model unit, in which the switching model unit performs speech recognition by using emphasized signal and observed signal, wherein degradation due to speech enhancement is prevented (page 4, "Measures the ratio of the long-term average audio signal to noise signal of each band, and adaptively adds the optimal amount of each band signal to prevent deterioration in sound quality and maximize the noise reduction effect at the same time It is possible to" and page 5, “Averaging is performed using the following equation to obtain SNRAVk(n), which is transferred to the optimum input signal addition rate determination unit 53.”).
Regarding claim 21, Sasaki teaches the voice signal processing method according to claim 20, wherein the emphasized signal and the observed signal are switched based on determining whether the speech enhancement is required (page 3, "Based on (n), the optimum input signal addition rate α of the input signal is determined… The optimum input signal addition rate α is transferred to the input signal addition unit 54. In the input signal adding section 54, the band signal X transferred from the frequency band dividing section 22, the gainfactor inserting section 28, and the optimum input signal adding rate determining section 53, respectively.").
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4, 11 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Sasaki in view of China invention application 111179962 to Wang and Lin (hereinafter, "Wang").
Regarding claims 4, 11 and 17, Sasaki does not explicitly teach a method, system or computer-readable medium “wherein the output value is an output value output by a learned model, and the learned model receives, as an input, at least one of the observation signal and the enhancement signal, and output whether to perform the voice enhancement from a viewpoint of voice recognition performance or the degree of necessity of performing the voice enhancement,” and thus, Wang is introduced.
Wang teaches the output value is an output value output by a learned model, and the learned model receives, as an input, at least one of the observation signal and the enhancement signal, and output whether to perform the voice enhancement from a viewpoint of voice recognition performance or the degree of necessity of performing the voice enhancement (page 4, "In one possible implementation, the accuracy determining module, which is used for executing any one of the following steps:
the clean speech signal based on the student model output of the first clean voice signal and the marking in the mixed voice signal, determining the accuracy information of the iteration process;").
Sasaki and Wang are considered analogous because they are each concerned with audio signal separation. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Sasaki with the teachings of Wang for the purpose of improving system performance with machine learning. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Allowable Subject Matter
Claims 2-3 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Japan publication JP 2008-309856 to Hata et al.
Japan publication JP 2011-065128 to Tachioka.
Japan publication JP 2014-102318 to Teranishi.
Japan publication JP 2015-203813 to Yamamoto et al.
Japan publication JP 2018-205512 to Endo.
WIPO publication WO 2021/144934 to Koizumi.
China invention application 112201267 to Li et al.
China invention application 112700786 to Zhang et al.
China invention application 113593590 to Lan et al.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN T SMITH whose telephone number is (571)272-6643. The examiner can normally be reached Monday - Friday 8:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, PIERRE-LOUIS DESIR, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SEAN THOMAS SMITH/Examiner, Art Unit 2659     

/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659
Prosecution Timeline

Jun 06, 2024
Application Filed
Feb 02, 2026
Non-Final Rejection: §101, §102, §103, §112 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/393,807
Patent 12602540
LEVERAGING A LARGE LANGUAGE MODEL ENCODER TO EVALUATE PREDICTIVE MODELS
2y 5m to grant Granted Apr 14, 2026
18/092,987
Patent 12530534
SYSTEM AND METHOD FOR GENERATING STRUCTURED SEMANTIC ANNOTATIONS FROM UNSTRUCTURED DOCUMENT
2y 5m to grant Granted Jan 20, 2026
Study what changed to get past this examiner. Based on 2 most recent grants.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 83%
With Interview (+33.3%): 99%
Median Time to Grant: 2y 8m
PTA Risk: Low
Based on 6 resolved cases by this examiner. Grant probability derived from career allow rate.