DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 07/29/2024, 10/02/2025, and 11/06/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
a neural network configured to generate a target waveform signal from an intermediate representation signal by changing a time component or a feature component of the intermediate representation signal indicating an intermediate representation between an input signal and the target waveform signal using a neural network function in claim 1.
a non-neural network configured to act for at least a part of processing for generating the target waveform signal from the intermediate representation signal using a non-neural network function indicating a relationship between the time component and the feature component of the intermediate representation signal in claim 1.
the neural network performs upsampling of a time component of the intermediate representation signal using the neural network function in claim 2.
the neural network generates the target waveform signal from the intermediate representation signal of which the time component is changed by the neural network in claim 4.
the non-neural network performs frequency-time transformation based on an inverse short-time Fourier transform, an inverse wavelet transform, or a predetermined basis function on the intermediate representation signal in claim 5.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
1. Claims 1-7 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claims 1, 6, and 7, “A waveform signal generation system”, “A waveform signal generation method”, and “A non-transitory computer readable medium” are recited, all of which are directed to one of the four statutory categories of invention (machine, process, and article of manufacture, respectively; Step 1: YES). However, the claim limitations, under their broadest reasonable interpretation, recite mathematical concepts, which fall into the category of abstract ideas (Step 2A Prong 1: YES).
The following limitations, under their broadest reasonable interpretation, recite mathematical concepts:
…generate a target waveform signal from an intermediate representation signal by changing a time component or a feature component of the intermediate representation signal indicating an intermediate representation between an input signal and the target waveform signal …: changing a time component or a feature component of an intermediate representation amounts to a mathematical calculation.
… generating the target waveform signal from the intermediate representation signal using a non-neural network function indicating a relationship between the time component and the feature component of the intermediate representation signal: generating a target waveform using a function is a mathematical calculation.
Claims 1, 6, and 7 do not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). The only additional limitations in claim 1 are “A waveform signal generation system”, “a neural network configured to generate…using a neural network function”, and “a non-neural network configured to act for at least a part of processing for generating…”. Claim 6 contains the additional limitation of “generating…using a neural network function”. Claim 7 contains the additional limitation “A non-transitory computer readable medium which stores a program causing a computer to function as the waveform signal generation system according to claim 1”. Each of these limitations is recited at a high level of generality and amounts to mere instructions to implement the judicial exception using a generic computer. Even when viewed in combination, mere instructions to implement the judicial exception using a generic computer do not integrate the judicial exception into a practical application as they do not impose any meaningful limits on practicing the abstract idea. Therefore, claims 1, 6, and 7 are directed to abstract ideas.
Claims 1, 6, and 7 do not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the only additional limitations amount to mere instructions to implement the judicial exception using a generic computer, which even when viewed in combination, do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claims 1, 6, and 7 are not patent eligible.
Regarding claims 2-5, “The waveform signal generation system” is recited, which falls into one of the statutory categories of invention (machine; Step 1: YES). However, the claim limitations, under their broadest reasonable interpretation, recite further mathematical concepts which fall into the category of abstract ideas (Step 2A Prong 1: YES).
Claim 2:
…performs upsampling of a time component of the intermediate representation signal …: upsampling a signal amounts to a mathematical calculation.
Claim 2 contains the additional limitations “wherein the neural network performs…using the neural network function”, which amounts to mere instructions to implement the judicial exception using a generic computer.
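To illustrate the purely mathematical character of upsampling a time component, the following sketch doubles the number of frames along the time axis of a small (time × feature) intermediate representation by linear interpolation; the array values are hypothetical, and a learned upsampling layer (e.g., a transposed convolution) differs only in that its weights are trained, the operation itself remaining arithmetic on signal values.

```python
import numpy as np

# Hypothetical intermediate representation: 3 time frames, 2 feature dimensions.
rep = np.array([[0.0, 1.0],
                [2.0, 3.0],
                [4.0, 5.0]])

# Upsample the time axis from 3 frames to 5 by linear interpolation,
# leaving the feature dimension unchanged.
t_old = np.arange(rep.shape[0])
t_new = np.linspace(0, rep.shape[0] - 1, 2 * rep.shape[0] - 1)
up = np.stack([np.interp(t_new, t_old, rep[:, k])
               for k in range((rep.shape[1]))], axis=1)

print(up.shape)  # (5, 2): time component upsampled, feature component unchanged
```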
Claim 3:
Claim 3 contains the additional limitation “wherein the neural network is a convolutional neural network”, which amounts to mere instructions to implement the judicial exception using a generic computer.
Claim 4:
…generates the target waveform signal from the intermediate representation signal of which the time component is changed: generating the target waveform from the intermediate signal whose time component has been altered amounts to a mathematical calculation.
Claim 4 contains the additional limitation “wherein the non-neural network generates…of which the time component is changed by the neural network”, which amounts to mere instructions to implement the judicial exception using a generic computer.
Claim 5:
…performs frequency-time transformation based on an inverse short-time Fourier transform, an inverse wavelet transform, or a predetermined basis function on the intermediate representation signal: performing a frequency-time transformation is a mathematical calculation.
Claim 5 contains the additional limitation “wherein the non-neural network performs…”, which amounts to mere instructions to implement the judicial exception using a generic computer.
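The frequency-time transformation recited here is likewise a fixed, non-learned mathematical mapping. As a minimal sketch (using SciPy's `stft`/`istft`; the test signal is hypothetical), a waveform is mapped to a time-frequency intermediate representation and recovered by the inverse STFT:

```python
import numpy as np
from scipy.signal import stft, istft

# Hypothetical time-domain signal: one second of a 440 Hz tone at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)

# Forward STFT produces the time-frequency intermediate representation;
# the inverse STFT is the frequency-time transformation back to a waveform.
_, _, Zxx = stft(x, fs=fs, nperseg=256)
_, x_rec = istft(Zxx, fs=fs, nperseg=256)

# Reconstruction recovers the original waveform up to numerical error.
print(np.allclose(x, x_rec[:len(x)], atol=1e-8))
```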
Claims 2-5 do not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). As discussed above, the only additional limitations are mere instructions to implement the judicial exception using a generic computer, which even when viewed in combination do not integrate the judicial exception into a practical application as they do not impose any meaningful limits on practicing the abstract idea. Therefore, claims 2-5 are directed to abstract ideas.
Claims 2-5 do not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the only additional limitations are mere instructions to implement the judicial exception using a generic computer, which even when viewed in combination do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claims 2-5 are not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
2. Claims 1-7 are rejected under 35 U.S.C. 103 as being unpatentable over Kaneko et al. (WO 2020/175530 A1, hereinafter Kaneko) in view of Kameoka & Kaneko (WO 2020/036178 A1, hereinafter Kameoka).
Regarding claim 1, Kaneko discloses A waveform signal generation system (Fig. 4; para. 0063 “Next, a configuration of data conversion learning apparatus according to an embodiment of the present invention will be described. As illustrated in FIG. 4, the data conversion learning device 100 according to the embodiment of the present invention can be configured by a computer including a CPU, a RAM, and a ROM that stores a program for executing a data conversion learning processing routine to be described later and various data. The data conversion learning device 100 functionally includes an input unit 10, a calculation unit 20, and an output unit 50, as illustrated in FIG. 4.”) comprising: a neural network (generator: para. 0099 “Next, in step S152, the acoustic feature amount sequence of the audio signal of the domain of the conversion destination is estimated from the acoustic feature amount sequence extracted by the acoustic feature extraction unit 72 using the forward direction generator G X→Y learned by the data conversion learning device 100.”; para. 0053 “In the embodiment of the present invention, a combination of a 2D CNN and a 1D CNN is used as the generator…”) configured to generate a target waveform signal (time domain signal: para. 0100 “In step S156, a time domain signal is generated from the acoustic feature amount sequence of the voice signal of the estimated domain…”) from an intermediate representation signal (acoustic feature amount sequence from voice signal of input domain: para. 0098 “First, in step S150, the acoustic feature amount sequence is extracted from the voice signal of the input domain of the conversion source…”) by changing a time component or a feature component of the intermediate representation signal indicating an intermediate representation between an input signal and the target waveform signal using a neural network function (para. 0051 “In the generator using the 2D CNN, down-sampling is performed in the time direction and the feature amount dimension direction in order to efficiently view the relationship in the time direction and the feature amount dimension direction…Then, up-sampling is performed in the time direction and the feature amount dimension direction, and the original size is returned…”).
Kaneko discloses further processing after the neural network to generate the target waveform (para. 0100 “In step S156, a time domain signal is generated from the acoustic feature amount sequence of the voice signal of the estimated domain of the conversation destination, and the time domain signal is output as the voice signal of the conversion destination by the output unit 90…”). However, Kaneko does not specifically disclose a non-neural network configured to act for at least a part of processing for generating the target waveform signal from the intermediate representation signal using a non-neural network function indicating a relationship between the time component and the feature component of the intermediate representation signal.
Kameoka teaches a non-neural network configured to act for at least a part of processing for generating the target waveform signal from the intermediate representation signal using a non-neural network function indicating a relationship between the time component and the feature component of the intermediate representation signal (para. 0067-0069 “[0067] As the acoustic feature vector, (A1) A vector having a logarithmic amplitude spectrum as an element…[0069] After the learning of F is completed, the acoustic feature amount sequence x of the input voice and the target attribute code c are input to G, so that the acoustic feature amount sequence of the converted voice is obtained…From the above, the converted voice can be obtained by the calculation process of the time domain signal according to the calculation process of the acoustic feature amount. For example, in a case where (A1) is used as the acoustic feature amount, a vocoder is used in the cause of using inverse transform (inverse STFT, wavelet inverse transform, or the like) of time frequency analysis…”; para. 0089 “In step S156, a time domain signal is generated from the acoustic feature amount sequence of the estimated target voice signal, and the time domain signal is output as a target voice signal by the output unit 90…”).
Kaneko and Kameoka are considered to be analogous to the claimed invention as they both are in the same field of waveform generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kaneko to incorporate the teachings of Kameoka in order to specifically have the non-neural network function indicate a relationship between the time component and the feature component of the intermediate representation signal. Doing so would be beneficial, as utilizing such functions as inverse STFTs would allow for reconstruction to generate a time domain output waveform for applications such as speech enhancement (Pandey et al., US 2022/0232342 A1, para. 0112).
Regarding claim 2, Kaneko in view of Kameoka discloses wherein the neural network performs upsampling of a time component of the intermediate representation signal using the neural network function (Kaneko, para. 0051 “In the generator using the 2D CNN, down-sampling is performed in the time direction and the feature amount dimension direction in order to efficiently view the relationship in the time direction and the feature amount dimension direction…Then, up-sampling is performed in the time direction and the feature amount dimension direction, and the original size is returned…”).
Regarding claim 3, Kaneko in view of Kameoka discloses wherein the neural network is a convolutional neural network (Kaneko, para. 0053 “In the embodiment of the present invention, a combination of a 2D CNN and a 1D CNN is used as the generator…”).
Regarding claim 4, Kaneko in view of Kameoka discloses wherein the non-neural network generates the target waveform signal from the intermediate representation signal of which the time component is changed by the neural network (Kaneko discloses an intermediate representation signal of which the time component is changed by the neural network (see claim 1 mapping); Kameoka teaches generating a target waveform from an intermediate signal: para. 0067-0069 “[0067] As the acoustic feature vector, (A1) A vector having a logarithmic amplitude spectrum as an element…[0069] After the learning of F is completed, the acoustic feature amount sequence x of the input voice and the target attribute code c are input to G, so that the acoustic feature amount sequence of the converted voice is obtained…From the above, the converted voice can be obtained by the calculation process of the time domain signal according to the calculation process of the acoustic feature amount. For example, in a case where (A1) is used as the acoustic feature amount, a vocoder is used in the cause of using inverse transform (inverse STFT, wavelet inverse transform, or the like) of time frequency analysis…”; para. 0089 “In step S156, a time domain signal is generated from the acoustic feature amount sequence of the estimated target voice signal, and the time domain signal is output as a target voice signal by the output unit 90…”).
Kaneko and Kameoka are considered to be analogous to the claimed invention as they both are in the same field of waveform generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kaneko to incorporate the teachings of Kameoka in order to specifically have the non-neural network generate the target waveform signal from the intermediate representation signal of which the time component is changed by the neural network, using the same rationale to combine given for claim 1.
Regarding claim 5, Kaneko in view of Kameoka discloses wherein the non-neural network performs frequency-time transformation based on an inverse short-time Fourier transform, an inverse wavelet transform, or a predetermined basis function on the intermediate representation signal (Kameoka: para. 0069 “After the learning of F is completed, the acoustic feature amount sequence x of the input voice and the target attribute code c are input to G, so that the acoustic feature amount sequence of the converted voice is obtained…From the above, the converted voice can be obtained by the calculation process of the time domain signal according to the calculation process of the acoustic feature amount. For example, in a case where (A1) is used as the acoustic feature amount, a vocoder is used in the cause of using inverse transform (inverse STFT, wavelet inverse transform, or the like) of time frequency analysis…”).
Kaneko and Kameoka are considered to be analogous to the claimed invention as they both are in the same field of waveform generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kaneko to incorporate the teachings of Kameoka in order to specifically have the non-neural network perform frequency-time transformation based on an inverse short-time Fourier transform, an inverse wavelet transform, or a predetermined basis function on the intermediate representation signal, using the same rationale to combine given for claim 1.
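The structure of the proposed combination, a learned stage that produces the time-frequency intermediate representation followed by a fixed, non-neural inverse transform that yields the time-domain waveform, can be sketched as follows; the `neural_stage` placeholder and the random representation are hypothetical stand-ins, not Kaneko's actual generator:

```python
import numpy as np
from scipy.signal import istft

rng = np.random.default_rng(0)

def neural_stage(Z):
    # Placeholder for a learned generator (here just an identity mapping);
    # in the combination this would be Kaneko's CNN-based stage.
    return Z

# Hypothetical complex time-frequency representation:
# 129 frequency bins (nperseg=256, one-sided) by 40 time frames.
Zxx = rng.standard_normal((129, 40)) + 1j * rng.standard_normal((129, 40))

# Non-neural stage: fixed inverse STFT produces the time-domain waveform.
_, waveform = istft(neural_stage(Zxx), fs=8000, nperseg=256)

print(waveform.ndim)  # 1: a one-dimensional, real-valued time-domain signal
```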
Regarding claim 6, claim 6 is a method claim with limitations similar to claim 1, and is thus also rejected for analogous reasons.
Regarding claim 7, claim 7 is a non-transitory computer readable medium claim with limitations similar to claim 1, and is thus also rejected for analogous reasons.
Additionally, Kaneko discloses A non-transitory computer readable medium which stores a program causing a computer to function…(Kaneko, para. 0063 “As illustrated in Fig. 4, the data conversion learning device 100 according to the embodiment of the present invention can be configured by a computer including…a ROM that stores a program for executing a data conversion learning processing routine…”).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Li et al. (US 12,190,896 B2): inverse transform layer (Fig. 3)
Andreev et al. (US 2023/0326476 A1): HIFI GAN, upsampling via convolutions, inverse STFT (Fig. 1)
Jin et al. (US 2023/0162725 A1): audio super resolution model utilizing convolutions to upsample input audio data (para. 0032)
Dhawan et al. (US 2022/0115028 A1): speech generation, performing time-frequency transformation via post-processing module (Fig. 2A)
Arik et al. (US 2019/0355347 A1): waveform synthesis via convolutional layers (Fig. 1)
Yang et al. (NPL Multi-Band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech): GAN architecture, convolutional neural network with upsampling layers, using reduced parameters for increased speed (Fig. 1 and Abstract)
Ashraf et al. (NPL Underwater Ambient-Noise Removing GAN Based on Magnitude and Phase Spectra): GAN with separate post-processing step comprising inverse STFT (Fig. 4)
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CODY DOUGLAS HUTCHESON whose telephone number is (703)756-1601. The examiner can normally be reached M-F 8:00AM-5:00PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CODY DOUGLAS HUTCHESON/Examiner, Art Unit 2659
/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659