DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
1. Regarding the rejection under 35 U.S.C. § 101, Applicant's arguments filed 04/28/2026 have been fully considered but they are not persuasive.
Applicant argues that the claims are patent eligible and not directed to an abstract idea, citing Example 39 of the 2019 Revised Patent Subject Matter Eligibility Guidance (see pgs. 6-7). Specifically, Applicant argues that the claims are based on mathematical concepts but do not themselves recite mathematical relationships, formulas, or calculations, and further argues that the claims do not recite mental processes or certain methods of organizing human activity and thus do not recite an abstract idea. The Examiner respectfully disagrees. A claim that recites a mathematical calculation, when the claim is given its broadest reasonable interpretation in light of the specification, will be considered as falling within the mathematical concepts grouping under the Step 2A Prong 1 analysis. Furthermore, a claim need not recite the word “calculating” in order to recite a mathematical calculation. Under this reasoning, the claims as currently written recite limitations that fall within the mathematical concepts grouping of abstract ideas. Generating a second intermediate representation signal by changing a time or feature component of a first intermediate representation signal (e.g., resolution changes) amounts to a mathematical calculation. Generating a third intermediate representation signal or a target waveform signal using a non-neural network function indicating a relationship between the time and feature components (e.g., performing frequency-time transformations) amounts to further mathematical calculations. Dimensionally compressing a first intermediate representation signal in a frequency direction (e.g., resolution changes) amounts to a mathematical calculation. Finally, generating amplitude and phase spectrograms amounts to further mathematical calculations. Therefore, the claims recite abstract ideas under the Step 2A Prong 1 analysis.
Hence, Applicant’s arguments are not persuasive.
2. Regarding the rejection under 35 U.S.C. § 103, Applicant’s arguments with respect to claims 1-7 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
3. Claims 1-7 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claims 1, 6, and 7, “A waveform signal generation system”, “A waveform signal generation method”, and “A non-transitory computer readable medium” are recited, each of which is directed to one of the four statutory categories of invention (machine, process, and article of manufacture, respectively; Step 1: YES). However, the claim limitations, under their broadest reasonable interpretation, recite mathematical concepts, which fall within the abstract idea groupings (Step 2A Prong 1: YES).
The following limitations, under their broadest reasonable interpretation, recite mathematical concepts:
…generate a second intermediate representation signal, in which at least a portion of a target waveform signal is restored, from a first intermediate representation signal by changing a time component or a feature component of the first intermediate representation signal…: changing a time component or a feature component of a first intermediate representation signal to obtain a second intermediate representation signal amounts to a mathematical calculation.
…generate a third intermediate representation signal or the target waveform signal from the second intermediate representation signal using a …function indicating a relationship between a time component and a feature component of the second intermediate representation signal: generating a third intermediate representation signal or a target waveform signal using such a function is a mathematical calculation.
wherein the first intermediate representation signal, the second intermediate representation signal, and the third intermediate representation signal each indicate an intermediate representation…: generating the recited intermediate representation signals using the steps recited above amounts to mathematical calculations.
wherein the first intermediate representation signal is information dimensionally compressed in a frequency direction: dimensionally compressing a representation signal in the frequency direction amounts to a mathematical calculation.
wherein the second intermediate representation signal is an amplitude spectrogram and a phase spectrogram: generating amplitude and phase spectrograms amounts to mathematical calculations.
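For illustration only, and not as a characterization of Applicant's disclosure or of any cited reference, the frequency-direction compression and amplitude/phase spectrogram limitations above can be expressed as ordinary array arithmetic. The following Python/NumPy sketch assumes a 256-sample frame, a 64-sample hop, a periodic Hann window, and pairwise averaging of adjacent frequency bins as one possible form of dimensional compression; none of these choices is drawn from the claims.

```python
import numpy as np

def complex_spectrogram(x, n_fft=256, hop=64):
    """Short-time Fourier transform: rows are frequency bins, columns are frames."""
    win = np.hanning(n_fft + 1)[:-1]  # periodic Hann window (assumed choice)
    starts = range(0, len(x) - n_fft + 1, hop)
    frames = np.stack([x[s:s + n_fft] * win for s in starts])
    return np.fft.rfft(frames, axis=1).T

def compress_frequency(spec, factor=2):
    """Average each group of `factor` adjacent bins (hypothetical compression rule).

    The number of frequency rows shrinks; the number of time frames is unchanged.
    """
    n_bins = (spec.shape[0] // factor) * factor  # drop any leftover top bins
    return spec[:n_bins].reshape(-1, factor, spec.shape[1]).mean(axis=1)

x = np.random.default_rng(0).standard_normal(2048)
spec = complex_spectrogram(x)                # 129 bins x 29 frames
amplitude = np.abs(spec)                     # amplitude spectrogram
phase = np.angle(spec)                       # phase spectrogram
compressed = compress_frequency(amplitude)   # 64 bins x 29 frames
```

Each step is an arithmetic operation on arrays (windowed multiplication, a discrete Fourier transform, modulus, argument, and averaging), which is the sense in which the limitations are characterized above as mathematical calculations.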
Claims 1, 6, and 7 do not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). The only additional limitations in claim 1 are “A waveform signal generation system”, “a neural network configured to generate…using a neural network function”, and “a non-neural network configured to act for at least a part of processing for generating…”. Claim 6 contains the additional limitation of “generating…using a neural network function”. Claim 7 contains the additional limitation “A non-transitory computer readable medium which stores a program causing a computer to function as the waveform signal generation system according to claim 1”. Each of these limitations is recited at a high level of generality and amounts to mere instructions to implement the judicial exception using a generic computer. Even when viewed in combination with the claims as a whole, mere instructions to implement the judicial exception using a generic computer do not integrate the judicial exception into a practical application, as they do not impose any meaningful limits on practicing the abstract idea. Therefore, claims 1, 6, and 7 are directed to abstract ideas.
Claims 1, 6, and 7 do not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the only additional limitations amount to mere instructions to implement the judicial exception using a generic computer, which even when viewed in combination with the claims as a whole, do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claims 1, 6, and 7 are not patent eligible.
Regarding claims 2-5, “The waveform signal generation system” is recited, which falls into one of the statutory categories of invention (machine; Step 1: YES). However, the claim limitations, under their broadest reasonable interpretation, recite further mathematical concepts, which fall within the abstract idea groupings (Step 2A Prong 1: YES).
Claim 2:
…performs upsampling of a time component of the first intermediate representation signal …: upsampling a signal amounts to a mathematical calculation.
Claim 2 contains the additional limitation “wherein the neural network performs…using the neural network function”, which amounts to mere instructions to implement the judicial exception using a generic computer.
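As an illustration of why upsampling a time component is characterized above as a mathematical calculation, the following Python/NumPy sketch doubles the number of time frames of a small feature-by-time array by linear interpolation. A fixed linear interpolation is used here purely as an assumed stand-in for any learned upsampling layer; the function name and the example values are hypothetical.

```python
import numpy as np

def upsample_time(rep, factor=2):
    """Linearly interpolate each feature row along the time (frame) axis.

    Hypothetical stand-in for a learned upsampling layer; the feature
    dimension is left unchanged and only the number of frames grows.
    """
    n_feat, n_frames = rep.shape
    old_t = np.arange(n_frames)
    new_t = np.linspace(0, n_frames - 1, factor * (n_frames - 1) + 1)
    return np.stack([np.interp(new_t, old_t, row) for row in rep])

rep = np.array([[0.0, 2.0, 4.0],
                [1.0, 1.0, 3.0]])
up = upsample_time(rep)  # 2 feature rows x 5 frames
```

Each new frame is a weighted average of its two neighboring original frames, so the operation reduces to arithmetic on the input values.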
Claim 3:
Claim 3 contains the additional limitation “wherein the neural network is a convolutional neural network”, which amounts to mere instructions to implement the judicial exception using a generic computer.
Claim 4:
…generates the target waveform signal from the first intermediate representation signal of which the time component is changed: generating the target waveform signal from the intermediate representation signal whose time component has been altered amounts to a mathematical calculation.
Claim 4 contains the additional limitation “wherein the non-neural network function generates…of which the time component is changed by the neural network”, which amounts to mere instructions to implement the judicial exception using a generic computer.
Claim 5:
…performs frequency-time transformation based on an inverse short-time Fourier transform, an inverse wavelet transform, or a predetermined basis function on the first intermediate representation signal: performing a frequency-time transformation is a mathematical calculation.
Claim 5 contains the additional limitation “wherein the non-neural network performs…”, which amounts to mere instructions to implement the judicial exception using a generic computer.
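To illustrate that a frequency-time transformation such as an inverse short-time Fourier transform is a fixed sequence of calculations requiring no neural network, the following Python/NumPy sketch computes a forward STFT and then inverts it by weighted overlap-add. The frame length, hop size, and periodic Hann window are assumptions chosen for illustration; they are not taken from the claims or from Kameoka.

```python
import numpy as np

N_FFT, HOP = 256, 64               # assumed frame length and hop size
WIN = np.hanning(N_FFT + 1)[:-1]   # periodic Hann window (assumed)

def stft(x):
    """Forward STFT: rows are frequency bins, columns are frames."""
    starts = range(0, len(x) - N_FFT + 1, HOP)
    frames = np.stack([x[s:s + N_FFT] * WIN for s in starts])
    return np.fft.rfft(frames, axis=1).T

def istft(spec):
    """Inverse STFT by weighted overlap-add (least-squares synthesis)."""
    frames = np.fft.irfft(spec.T, n=N_FFT, axis=1)  # back to windowed frames
    length = N_FFT + HOP * (frames.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for k, frame in enumerate(frames):
        s = k * HOP
        out[s:s + N_FFT] += frame * WIN   # apply synthesis window and add
        norm[s:s + N_FFT] += WIN ** 2     # accumulate window energy
    nonzero = norm > 1e-10                # skip samples no window covers
    out[nonzero] /= norm[nonzero]
    return out

x = np.random.default_rng(1).standard_normal(2048)
y = istft(stft(x))  # time-domain signal recovered from the spectrogram
```

Away from the signal edges, where the overlapping windows fully cover each sample, the reconstruction matches the original samples to floating-point precision.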
Claims 2-5 do not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). As discussed above, the only additional limitations are mere instructions to implement the judicial exception using a generic computer, which even when viewed in combination with the claims as a whole do not integrate the judicial exception into a practical application as they do not impose any meaningful limits on practicing the abstract idea. Therefore, claims 2-5 are directed to abstract ideas.
Claims 2-5 do not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the only additional limitations are mere instructions to implement the judicial exception using a generic computer, which even when viewed in combination with the claims as a whole do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claims 2-5 are not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
4. Claims 1-7 are rejected under 35 U.S.C. 103 as being unpatentable over Kaneko et al. (WO 2020/175530 A1, hereinafter Kaneko) in view of Kameoka & Kaneko (WO 2020/036178 A1, hereinafter Kameoka) and further in view of Jansson (US 2020/0043517 A1).
Regarding claim 1, Kaneko discloses A waveform signal generation system (Fig. 4; para. 0063 “Next, a configuration of data conversion learning apparatus according to an embodiment of the present invention will be described. As illustrated in FIG. 4, the data conversion learning device 100 according to the embodiment of the present invention can be configured by a computer including a CPU, a RAM, and a ROM that stores a program for executing a data conversion learning processing routine to be described later and various data. The data conversion learning device 100 functionally includes an input unit 10, a calculation unit 20, and an output unit 50, as illustrated in FIG. 4.”) comprising: a neural network (generator: para. 0099 “Next, in step S152, the acoustic feature amount sequence of the audio signal of the domain of the conversion destination is estimated from the acoustic feature amount sequence extracted by the acoustic feature extraction unit 72 using the forward direction generator G X→Y learned by the data conversion learning device 100.”; para. 0053 “In the embodiment of the present invention, a combination of a 2D CNN and a 1D CNN is used as the generator…”) configured to generate a second intermediate representation signal (output of main converter unit layers in generator: para. 0051 “In the generator using the 2D CNN, down-sampling is performed in the time direction and the feature amount dimension direction in order to efficiently view the relationship in the time direction and the feature amount direction, and the dimension is increased in the channel direction instead. Next, the conversion is gradually performed by the main conversion unit including a plurality of layers…”; para. 0053 “For example, as illustrated in FIG. 2, the generator includes a down-sampling converter G1, a main converter G2, and an up-sampling converter G3”), in which at least a portion of a target waveform signal is restored (para. 0053 “Next, the main conversion unit G2 performs dynamic conversion by the 1D CNN.”; para. 0051 “Next, the conversion is gradually performed by the main conversion unit including a plurality of layers…”) (time domain signal: para. 0100 “In step S156, a time domain signal is generated from the acoustic feature amount sequence of the voice signal of the estimated domain…”), from a first intermediate representation signal (output of downsampling layers: para. 0051 “First, similarly to the generator using the 2D CNN, the down-sampling conversion unit G1 performs down-sampling for the time direction and the feature amount dimension direction…”) by changing a time component or a feature component of the first intermediate representation signal using a neural network function (para. 0051 “In the generator using the 2D CNN, down-sampling is performed in the time direction and the feature amount dimension direction in order to efficiently view the relationship in the time direction and the feature amount dimension direction…Then, up-sampling is performed in the time direction and the feature amount dimension direction, and the original size is returned…”),
wherein the first intermediate representation signal, the second intermediate representation signal…each indicate an intermediate representation between an input signal and the target waveform signal (the first signal (output of the downsampling layers) and the second signal (output of the main conversion unit layers) are each intermediate signals between the input signal (voice signal: para. 0098 “First, in step S150, the acoustic feature amount sequence is extracted from the voice signal of the input domain of the conversion source…”) and the target waveform (output time domain signal: para. 0100 “In step S156, a time domain signal is generated from the acoustic feature amount sequence of the voice signal of the estimated domain of the conversion destination, and the time domain signal is output as the voice signal of the conversion destination by the output unit 90, and the data conversion processing routine is ended…”)),
wherein the first intermediate representation signal is information dimensionally compressed… (para. 0051 “In the generator using the 2D CNN, down-sampling is performed in the time direction and the feature amount dimension direction in order to efficiently view the relationship in the time direction and the feature amount dimension direction…Then, up-sampling is performed in the time direction and the feature amount dimension direction, and the original size is returned…”).
Kaneko discloses further processing after the neural network to generate the target waveform (para. 0100 “In step S156, a time domain signal is generated from the acoustic feature amount sequence of the voice signal of the estimated domain of the conversion destination, and the time domain signal is output as the voice signal of the conversion destination by the output unit 90…”). However, Kaneko does not specifically disclose a non-neural network configured to generate a third intermediate representation signal or the target waveform signal from the second intermediate representation signal using a non-neural network function indicating a relationship between a time component and a feature component of the second intermediate representation signal…[wherein…] the third intermediate representation signal [each] indicate an intermediate representation between an input signal and the target waveform signal.
Kameoka teaches a non-neural network configured to generate a third intermediate representation signal or the target waveform signal from the second intermediate representation signal using a non-neural network function indicating a relationship between a time component and a feature component of the second intermediate representation signal…[wherein…] the third intermediate representation signal [each] indicate an intermediate representation between an input signal and the target waveform signal. (In order for the reference to read on this limitation, the reference must teach at least one of: a non-neural network generating a third intermediate representation signal indicating an intermediate representation between an input and target waveform signal, OR a non-neural network generating the target waveform; Kameoka teaches the second option (a non-neural network configured to generate the target waveform): para. 0067-0069 “[0067] As the acoustic feature vector, (A1) A vector having a logarithmic amplitude spectrum as an element…[0069] After the learning of F is completed, the acoustic feature amount sequence x of the input voice and the target attribute code c are input to G, so that the acoustic feature amount sequence of the converted voice is obtained…From the above, the converted voice can be obtained by the calculation process of the time domain signal according to the calculation process of the acoustic feature amount. For example, in a case where (A1) is used as the acoustic feature amount, a vocoder is used in the cause of using inverse transform (inverse STFT, wavelet inverse transform, or the like) of time frequency analysis…”; para. 0089 “In step S156, a time domain signal is generated from the acoustic feature amount sequence of the estimated target voice signal, and the time domain signal is output as a target voice signal by the output unit 90…”).
Kaneko and Kameoka are considered to be analogous to the claimed invention as they both are in the same field of waveform generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kaneko to incorporate the teachings of Kameoka in order to specifically have a non-neural network configured to generate a third intermediate representation signal or the target waveform signal from the second intermediate representation signal using a non-neural network function indicating a relationship between a time component and a feature component of the second intermediate representation signal, wherein the third intermediate representation signal indicates an intermediate representation between an input signal and the target waveform signal. Doing so would be beneficial, as utilizing such functions as inverse STFTs would allow for reconstruction to generate a time domain output waveform for applications such as speech enhancement (Pandey et al., US 2022/0232342 A1, para. 0112).
Kaneko in view of Kameoka discloses downsampling in a time direction and a feature amount direction (Kaneko, para. 0051 “In the generator using the 2D CNN, down-sampling is performed in the time direction and the feature amount dimension direction in order to efficiently view the relationship in the time direction and the feature amount dimension direction…Then, up-sampling is performed in the time direction and the feature amount dimension direction, and the original size is returned…”). However, Kaneko in view of Kameoka does not specifically disclose [wherein the first intermediate representation signal is information dimensionally compressed] in a frequency direction. Further, Kaneko in view of Kameoka does not specifically disclose the second intermediate representation signal is an amplitude spectrogram and a phase spectrogram.
Jansson teaches [wherein the first intermediate representation signal is information dimensionally compressed] in a frequency direction (para. 0067 “Generally, the procedure 400 according to the present example aspect of the present application includes computing a Time-Frequency Representation (TFR) for the tracks T^O and T^I, using a TFR obtainer 602, to yield corresponding TFRs X^O and X^I, respectively, in the frequency domain (step 402), wherein the TFRs X^O and X^I each are a spectrogram of 2D coefficients, having frequency and phase content, and then performing steps 404 to 410 as will be described below.”; the downsampling layers halve the number of frequency bins, which reads on compressing dimensionally in the frequency direction: para. 0072 “Each downsampling layer 502b to 502n reduces in half the number of bins and frames, while increasing the number of feature channels.”) and the second intermediate representation signal is an amplitude spectrogram and a phase spectrogram (the output of the taught neural network architecture is a complex spectrogram containing both amplitude and phase: para. 0075 “In step 408, the output of layer 504n is employed as a mask for being applied by mask combiner 608 to the input image of layer 502a, to provide an estimated magnitude spectrogram 508, which, in an example case where the U-Net architecture 500 is trained to predict/isolate an instrumental component of a mixed original signal, is an estimated instrumental magnitude spectrum (of course, in another example case where the U-Net architecture 500 is trained to predict/isolate a vocal component of a mixed original signal, the spectrogram is an estimated vocal magnitude spectrum). That step 408 is performed to combine the image (e.g., preferably a magnitude component) from layer 504n with the phase component from the mixed original spectrogram 502a to provide a complex value spectrogram having both phase and magnitude components (i.e., to render independent of the amplitude of the original spectrogram). Step 408 may be performed in accordance with any suitable technique.”).
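The combination described in Jansson para. 0075, in which an estimated magnitude is paired with the phase of the mixture spectrogram to form a complex-valued spectrogram, can be illustrated with the following Python/NumPy sketch. The random mask is a hypothetical stand-in for the output of layer 504n; no trained network is modeled, and the function name is illustrative only.

```python
import numpy as np

def apply_mask_with_mixture_phase(mixture_spec, mask):
    """Scale the mixture magnitude by a mask and reattach the mixture phase."""
    est_magnitude = mask * np.abs(mixture_spec)                 # magnitude estimate
    return est_magnitude * np.exp(1j * np.angle(mixture_spec))  # complex spectrogram

rng = np.random.default_rng(2)
mixture = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
mask = rng.uniform(0.0, 1.0, size=(4, 3))  # hypothetical stand-in for a network output
estimate = apply_mask_with_mixture_phase(mixture, mask)
```

Because the mask is real and non-negative, the result carries the masked magnitude together with the unmodified mixture phase, i.e., both magnitude and phase components.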
Kaneko, Kameoka, and Jansson are considered to be analogous to the claimed invention as they are in the same field of waveform generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kaneko in view of Kameoka to incorporate the teachings of Jansson in order to have the first intermediate representation signal be dimensionally compressed in a frequency direction, and to have the second intermediate representation signal be an amplitude spectrogram and a phase spectrogram. Performing the former would be beneficial, as the taught dimensionality compression utilizes a neural network architecture which recreates fine, low-level detail required for high-quality audio reproduction (Jansson, para. 0048). Additionally, performing the latter would be beneficial, as utilizing complex spectrograms, which contain both magnitude and phase information, leads to better resulting quality by avoiding the use of noisy phases when reconstructing a clean waveform (NPL A Comparative Study of Time and Frequency Domain Approaches to Deep Learning based Speech Enhancement, pg. 7 section D.2).
Regarding claim 2, Kaneko in view of Kameoka and Jansson discloses wherein the neural network performs upsampling of a time component of the intermediate representation signal using the neural network function (Kaneko, para. 0051 “In the generator using the 2D CNN, down-sampling is performed in the time direction and the feature amount dimension direction in order to efficiently view the relationship in the time direction and the feature amount dimension direction…Then, up-sampling is performed in the time direction and the feature amount dimension direction, and the original size is returned…”).
Regarding claim 3, Kaneko in view of Kameoka and Jansson discloses wherein the neural network is a convolutional neural network (Kaneko, para. 0053 “In the embodiment of the present invention, a combination of a 2D CNN and a 1D CNN is used as the generator…”).
Regarding claim 4, Kaneko in view of Kameoka and Jansson discloses wherein the non-neural network generates the target waveform signal from the intermediate representation signal of which the time component is changed by the neural network (Kaneko discloses an intermediate representation signal of which the time component is changed by the neural network (see claim 1 mapping); Kameoka teaches generating a target waveform from an intermediate signal: para. 0067-0069 “[0067] As the acoustic feature vector, (A1) A vector having a logarithmic amplitude spectrum as an element…[0069] After the learning of F is completed, the acoustic feature amount sequence x of the input voice and the target attribute code c are input to G, so that the acoustic feature amount sequence of the converted voice is obtained…From the above, the converted voice can be obtained by the calculation process of the time domain signal according to the calculation process of the acoustic feature amount. For example, in a case where (A1) is used as the acoustic feature amount, a vocoder is used in the cause of using inverse transform (inverse STFT, wavelet inverse transform, or the like) of time frequency analysis…”; para. 0089 “In step S156, a time domain signal is generated from the acoustic feature amount sequence of the estimated target voice signal, and the time domain signal is output as a target voice signal by the output unit 90…”).
Kaneko, Kameoka, and Jansson are considered to be analogous to the claimed invention as they are in the same field of waveform generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Kameoka in order to specifically have the non-neural network generate the target waveform signal from the intermediate representation signal of which the time component is changed by the neural network, using the same rationale to combine given for claim 1.
Regarding claim 5, Kaneko in view of Kameoka and Jansson discloses wherein the non-neural network performs frequency-time transformation based on an inverse short-time Fourier transform, an inverse wavelet transform, or a predetermined basis function on the intermediate representation signal (Kameoka: para. 0069 “After the learning of F is completed, the acoustic feature amount sequence x of the input voice and the target attribute code c are input to G, so that the acoustic feature amount sequence of the converted voice is obtained…From the above, the converted voice can be obtained by the calculation process of the time domain signal according to the calculation process of the acoustic feature amount. For example, in a case where (A1) is used as the acoustic feature amount, a vocoder is used in the cause of using inverse transform (inverse STFT, wavelet inverse transform, or the like) of time frequency analysis…”).
Kaneko, Kameoka, and Jansson are considered to be analogous to the claimed invention as they are in the same field of waveform generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Kameoka in order to specifically have the non-neural network perform frequency-time transformation based on an inverse short-time Fourier transform, an inverse wavelet transform, or a predetermined basis function on the intermediate representation signal, using the same rationale to combine given for claim 1.
Regarding claim 6, claim 6 is a method claim with limitations similar to claim 1, and is thus also rejected for analogous reasons.
Regarding claim 7, claim 7 is a non-transitory computer readable medium claim with limitations similar to claim 1, and is thus also rejected for analogous reasons.
Additionally, Kaneko discloses A non-transitory computer readable medium which stores a program causing a computer to function…(Kaneko, para. 0063 “As illustrated in Fig. 4, the data conversion learning device 100 according to the embodiment of the present invention can be configured by a computer including…a ROM that stores a program for executing a data conversion learning processing routine…”).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Lester et al. (US 2022/0369031 A1): encoder branch of DNN model formed with set of downsampling layers that downsample an input in a frequency axis while keeping time axis at same resolution in order to reduce latency (para. 0163)
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CODY DOUGLAS HUTCHESON whose telephone number is (703)756-1601. The examiner can normally be reached M-F 8:00AM-5:00PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached at (571)-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CODY DOUGLAS HUTCHESON/Examiner, Art Unit 2659
/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659