DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 10/20/2025 has been entered. Claims 1-20 are pending in the application and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner Comment Regarding Patent Subject Matter Eligibility under 35 U.S.C. 101
Dependent claims 8 and 15 recite a method of training the neural network model by comparing the first and second enhanced waveforms to calculate loss values used to adjust the neural network parameters. Accordingly, in view of the practical application of the trained neural network based on these comparisons, as described in the specification at [0034], these claims are directed to patent-eligible subject matter under Step 2A, Prong Two.
Response to Amendment
The response filed on 10/20/2025 has been entered and considered in this Office action. Claims 1-20 have been examined.
Response to Arguments
Applicant's arguments filed 10/20/2025 have been fully considered as follows:
Applicant’s arguments with respect to claim 1 (also representative of claims 9 and 17) state that
“Furthermore, to advance prosecution, claim 1 is amended herein to clarify that the method further comprises "converting a first audio data, in a time domain, to a first frequency-based representation of the first audio data at a first frequency," "resampling the low frequency representation to generate a second frequency-based representation of the first audio data at a second frequency that is higher than the first frequency." These steps of "converting [ ... ]," and "resampling [... ]" further distinguish claim 1 from any mathematical concept.
Accordingly, Applicant respectfully submits that the Office Action has improperly characterized the claims and respectfully requests withdrawal of the rejection of claim 1. ”
The examiner respectfully disagrees. The amendments to claims 1, 9 and 17 recite processing of audio signals to generate frequency-based representations and manipulating those representations using neural networks; nothing in the claim elements precludes the steps from being performed by a computer applying mathematical formulae or calculations. Each claim recites the additional element of generating through a "neural network," which is recited at a high level of generality and amounts to merely using a computer as a tool to perform an abstract idea, or mere instructions to apply the exception using a generic computer component. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the human mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. This judicial exception is not integrated into a practical application. Because the added limitations, other than the generic computer components, are grouped with the abstract idea, no additional elements remain for consideration under Step 2A, Prong Two, or Step 2B that could qualify as a technological improvement or an inventive concept. The recitation of a circuit in these claims does not negate the abstract nature of the limitations because the claims merely use the circuit as a tool to perform the otherwise mental process. See MPEP 2106.05(I): an inventive concept "cannot be furnished by the unpatentable law of nature (or natural phenomenon or abstract idea) itself," Genetic Techs. v. Merial LLC (Fed. Cir. 2016); instead, an "inventive concept" is furnished by an element or combination of elements that is recited in the claim in addition to (beyond) the judicial exception, and is sufficient to ensure that the claim as a whole amounts to significantly more than the judicial exception itself, Alice Corp., 573 U.S. at 217-18, 110 USPQ2d at 1981 (citing Mayo, 566 U.S. at 72-73, 101 USPQ2d at 1966). The rejections of claims 1, 9 and 17 under 35 U.S.C. 101 are maintained and updated accordingly in the Claim Rejections - 35 USC § 101 section of this Office action.
Applicant’s arguments with respect to claim 1 (also representative of claims 9 and 17) state that
“Nonetheless, to advance prosecution, claim 1 is amended herein to clarify that the method may further comprise "converting a first audio data, in a time domain, to a first frequency-based representation of the first audio data at a first frequency" and "resampling the low frequency representation to generate a second frequency-based representation of the first audio data at a second frequency that is higher than the first frequency." Neither Chen nor Kumar, alone or in combination, disclose "converting a first audio data, in a time domain, to a first frequency-based representation of the first audio data at a first frequency," let alone "resampling the low frequency representation to generate a second frequency-based representation of the first audio data at a second frequency that is higher than the first frequency." For at least these reasons, neither Chen nor Kumar, alone or in combination with one another, discloses the elements of claim 1 as amended herein.”
Applicant’s arguments above with respect to claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
With respect to the rejections of the remaining dependent claims under 35 U.S.C. 103, to the extent those claims are traversed in the Remarks filed 10/20/2025 for substantially the same reasons as presented for independent claims 1, 9 and 17, the Examiner respectfully directs Applicant to the responses provided above for claims 1, 9 and 17. For at least those same reasons, the Examiner respectfully disagrees; Applicant's arguments have been fully considered but are not persuasive.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1, 9 and 17, and dependent claims 2-8, 10-16 and 18-20, are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. The amendments at line 6 of independent claims 1, 9 and 17, reciting "the first audio data at a second frequency that is higher than the first frequency," are not supported by the specification. Upon review of the specification, the Examiner was unable to locate support for the newly amended limitations and requests that, in the next response, Applicant indicate the respective supporting portions of the specification. The amendments to claim 1 are inferred as being based on the specification at [0033]-[0034], which describe the resampled spectrogram as being at a higher frequency than the clipped frequency spectrogram, with the super-resolution neural network trained by comparing the high frequency spectrogram with the resampled spectrogram, where the resampled spectrogram is resampled at the designed frequency of the super-resolution neural network 214. Dependent claims 2-8, 10-16 and 18-20 are likewise rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement, due to their dependence on independent claims 1, 9 and 17, respectively.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-7, 9-14 and 16-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
According to USPTO guidelines, a claim is directed to non-statutory subject matter if:
STEP 1: the claim does not fall within one of the four statutory categories of invention (process, machine, manufacture, or composition of matter), or
STEP 2: the claim recites a judicial exception (e.g. an abstract idea) without reciting additional elements that amount to significantly more than the judicial exception, as determined using the following analysis:
STEP 2A (Prong 1): Does the claim recite an abstract idea, law of nature, or natural phenomenon? The guidelines provide three groupings of subject matter that are considered abstract ideas:
Mathematical concepts - mathematical relationships, formulas or equations, calculations
Certain methods of organizing human activity - fundamental economic principles or practices, commercial or legal interactions, managing personal behavior or relationships or interactions between people
Mental processes - concepts that are practically performed in the human mind (including an observation, evaluation, judgment, or opinion)
STEP 2A (Prong 2): Does the claim recite additional elements that integrate the judicial exception into a practical application? The guidelines provide the following exemplary considerations that are indicative that an additional element (or combination of elements) may have integrated the judicial exception into a practical application:
an additional element reflects an improvement in the functioning of a computer, or an improvement to other technology or technical field;
an additional element that applies or uses a judicial exception to affect a particular treatment or prophylaxis for a disease or medical condition;
an additional element implements a judicial exception with, or uses a judicial exception in conjunction with, a particular machine or manufacture that is integral to the claim;
an additional element effects a transformation or reduction of a particular article to a different state or thing; and
an additional element applies or uses the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception.
While the guidelines further state that the exemplary considerations are not an exhaustive list and that there may be other examples of integrating the exception into a practical application, the guidelines also list examples in which a judicial exception has not been integrated into a practical application:
an additional element merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea;
an additional element adds insignificant extra-solution activity to the judicial exception; and
an additional element does no more than generally link the use of a judicial exception to a particular technological environment or field of use.
STEP 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? Consider whether an additional element or combination of elements:
adds a specific limitation or combination of limitations that are not well-understood, routine, or conventional activity in the field, which is indicative that an inventive concept may be present; or
simply appends well-understood, routine and conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception, which is indicative that an inventive concept may not be present.
Using the two-step inquiry, claim 1 is directed to an abstract idea as shown below:
STEP 1: Does the claim fall within one of the four statutory categories of invention?
YES. Claim 1 is directed to a method, which is a process.
STEP 2A (Prong 1): Does the claim recite an abstract idea, law of nature, or natural phenomenon?
YES. The claim recites an abstract idea:
The limitation of converting a first audio data, in a time domain, to a first frequency-based representation of the first audio data at a first frequency, as drafted, is a process that, under its broadest reasonable interpretation, recites a mathematical formula or calculation used to compute a frequency-based representation of the audio signal using a computer. The limitation of performing filtering of the first frequency-based representation to generate a low frequency representation of the first frequency-based representation, as drafted, is a process that, under its broadest reasonable interpretation, recites a mathematical formula or calculation.
The limitation of resampling the low frequency representation to generate a second frequency-based representation of the first audio data at a second frequency that is higher than the first frequency, as drafted, is a process that, under its broadest reasonable interpretation, recites a mathematical formula or calculation, or insignificant extra-solution activity.
The limitation of generating, using one or more neural networks and based at least on the second frequency-based representation of the first audio data, second audio data at the second frequency, as drafted, is a process that, under its broadest reasonable interpretation, recites a mathematical formula or calculation, or insignificant extra-solution activity.
STEP 2A (Prong 2): Does the claim recite additional elements that integrate the judicial exception into a practical application?
NO.
Claim 1 recites the additional element of generating through a "neural network," which is recited at a high level of generality and amounts to merely using a computer as a tool to perform an abstract idea, or mere instructions to apply the exception using a generic computer component. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the human mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. This judicial exception is not integrated into a practical application.
STEP 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
NO.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using a neural network and audio data conversion amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Claim 1 is not patent eligible.
Claim 2 further specifies the format of an input to the neural network and is a process that, under its broadest reasonable interpretation, is a data gathering process (insignificant extra-solution activity) and does not reflect an improvement in the functioning of a computer or other technology. The claim is not patent eligible.
Claim 3 further recites a process of generating a resampled representation of the frequency representation of audio signals. This limitation, under its broadest reasonable interpretation, is a data gathering process (insignificant extra-solution activity) using well-understood, routine, and conventional components recited at a high level of generality, and does not reflect an improvement in the functioning of a computer or other technology. The claim is not patent eligible.
Claim 4 further describes the processing of the audio sequence and is a process that, under its broadest reasonable interpretation, is insignificant extra-solution activity and does not reflect an improvement in the functioning of a technology or computer. The claim is not patent eligible.
Claim 5 further describes generation of the frequency representation of audio data using a processor and is a process that, under its broadest reasonable interpretation, is a mathematical computation process that can be performed by a human using mathematical computational tools and does not reflect an improvement in the functioning of a technology or computer. The claim is not patent eligible.
Claim 6 further describes the generation of the frequency representation of audio data using parallel processors and is a process that, under its broadest reasonable interpretation, is a mathematical computation process that can be performed by a human using mathematical computational tools and does not reflect an improvement in the functioning of a technology or computer. The claim is not patent eligible.
Claim 7 further describes the training of the neural network but does not specify the criteria for adjusting the parameters of the neural network; this type of limitation merely confines the use of the abstract idea to a particular neural network, fails to add an inventive concept to the claims, and thus also falls within the extra-solution activity category. The claim is not patent eligible.
Claims 9-14 are analogous to claims 1-4 and 6-7, respectively, as directed to a processing device that performs the operations set forth in claims 1-4 and 6-7, and are subject to the same rejections as claims 1-4 and 6-7, respectively. Claim 9 merely recites the words "apply it" (or an equivalent) with the judicial exception, merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. The claims are not patent eligible.
Claims 16 and 20 further specify the type of processors for claims 9 and 17, respectively, and merely recite the words "apply it" (or an equivalent) with the judicial exception, merely include instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. The claims are not patent eligible.
Claims 17-19 are analogous to claims 1-3, respectively, as directed to a system comprising processors that perform the operations set forth in claims 1-3, and are subject to the same rejections as claims 1-3, respectively.
Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-2, 4-10, 12-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Eskimez, S. E., et al. (2019), "Adversarial training for speech super-resolution," IEEE Journal of Selected Topics in Signal Processing, 13(2), 347-358, further in view of Kumar et al., US PgPub. 2022/0101872.
Regarding claim 1, Eskimez teaches a method comprising: converting a first audio data, in a time domain, to a first frequency-based representation of the first audio data at a first frequency (Eskimez, sect. IV D, We computed the short-time Fourier transform (STFT) on both low and high-resolution signals); performing filtering of the first frequency-based representation to generate a low frequency representation of the first frequency-based representation (Eskimez, sect. IV D, We applied a low-pass filter and downsampled the high-resolution signals to obtain their parallel low-resolution signals for training and testing); resampling the low frequency representation to generate a second frequency-based representation of the first audio data at a second frequency that is higher than the first frequency (Eskimez, sect. II A, Before being fed to the network, the low-resolution waveform is upsampled to match the sampling rate of the target super-resolution signal; Eskimez, sect. IV D, The low-resolution signals were created by applying an order 8 Chebyshev type I low-pass filter and downsampling the high-resolution signals. The low-resolution signals were upsampled to match the size of the high-resolution signals using cubic upscaling as the input to their neural network. As noted in the 112(a) rejection above, the amendments to claim 1 are inferred as being based on the specification at [0033]-[0034], which describe the resampled spectrogram as being at a higher frequency than the clipped frequency spectrogram, with the super-resolution neural network trained by comparing the high frequency spectrogram with the resampled spectrogram, where the resampled spectrogram is resampled at the designed frequency of the super-resolution neural network 214); and generating, using one or more neural networks and based at least on the second frequency-based representation of the first audio data, second audio data at the second frequency (Eskimez, sect. III A, XNB is fed to the proposed generator network, or namely the Speech-Super Resolution Generative Adversarial Network (SSR-GAN), to estimate the high-frequency range LPS, XWB. The original narrowband and the predicted high-frequency range are concatenated to obtain the estimated wideband LPS XSR). While Eskimez teaches generating, using one or more neural networks and based at least on the second frequency-based representation of the first audio data, second audio data at the second frequency, to further compact prosecution Kumar is also relied upon for this limitation (see Kumar, [0102], the model can be trained to transform magnitude spectrograms to a given sampling rate (here, Sampling Rate B); by applying an inverse transform to the second magnitude spectrogram and the phase associated with the first magnitude spectrogram, the media production platform can obtain a second discrete audio signal ("Discrete Audio Signal B") that has the higher sampling rate).
Eskimez and Kumar are considered to be analogous to the claimed invention because they relate to processing audio data using neural network based super resolution models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Eskimez on training the audio super resolution models with the upsampling discrete audio signals to higher sampling rates teachings and trainings of super resolution model teachings of Kumar to improve audio quality( see Kumar, [0025-0026]).
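Purely for illustration, and not as part of the claim mapping or the record, the sequence of operations attributed to Eskimez above (transforming time-domain audio to a frequency-based representation, low-pass filtering, and resampling to a higher rate so that a super-resolution network can later fill in the padded high-frequency bands) can be sketched as follows; the function name, cutoff bin, and upsampling factor are hypothetical choices, not taken from the reference:

```python
import numpy as np

def lowpass_and_upsample(x, cutoff_bin, up_factor):
    """Illustrative pipeline: time-domain audio -> frequency-based
    representation -> low-pass filtering -> resampling to a higher
    rate by zero-padding the spectrum. The zero-valued bins are the
    padded high-frequency bands a super-resolution network would
    later fill in."""
    X = np.fft.rfft(x)                   # frequency-based representation
    X[cutoff_bin:] = 0.0                 # low-pass: discard high bins
    n_out = len(x) * up_factor
    X_pad = np.zeros(n_out // 2 + 1, dtype=complex)
    X_pad[:len(X)] = X * up_factor       # scaling preserves amplitude
    return np.fft.irfft(X_pad, n=n_out)  # waveform at the higher rate

# Example: a 440 Hz tone at 8 kHz, band-limited and upsampled 2x to 16 kHz
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
y = lowpass_and_upsample(x, cutoff_bin=2000, up_factor=2)
```

The zero-valued high-frequency bins in the padded spectrum correspond to the "padded values" discussed in the claim mapping; a super-resolution model would be trained to predict their content.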
(Image: media_image1.png, greyscale, 290 x 252)
Regarding claim 2, Eskimez in view of Kumar teaches the method of claim 1. Kumar further teaches wherein the first frequency-based representation is a frequency-based spectrogram generated using a short-time Fourier transform (STFT) operation (see Kumar, [0051] Assume, for example, that the media production platform 210 acquires input indicative of a selection of an audio file to be upsampled. Generally, the audio file will be in the form of a discrete audio signal with a frequency less than 40,000 Hz. For example, the discrete audio signal may have a sampling rate of 8,000 Hz, 16,000 Hz, 22,050 Hz, or 24,000 Hz depending on the computing device used for recording. In such a scenario, the upsampling module 216 can apply a Fourier transform (e.g., an STFT) to the audio file to produce a first magnitude spectrum). The same motivation to combine as claim 1 applies here.
Regarding claim 4, Eskimez in view of Kumar teaches the method of claim 1. Eskimez further teaches wherein the one or more neural networks infer one or more non-zero audio data values for one or more frequency bands from the second frequency-based representation (see Eskimez, sect. III B, For the generator network, we employ a common bottleneck autoencoder architecture described in [7]. The generator is a sequence-to-sequence model that accepts the narrowband LPS with T time steps and outputs the high-frequency range LPS with T time steps). Kumar further teaches wherein the one or more neural networks infer one or more non-zero audio data values for one or more frequency bands from the second frequency-based representation (see Kumar, [0102], The media production platform can then apply a model to the first magnitude spectrogram to produce a second magnitude spectrogram ("Magnitude Spectrogram B"). As discussed above, the model can be trained to transform magnitude spectrograms to a given sampling rate (here, Sampling Rate B), with non-zero values as shown in Fig. 8). The same motivation to combine as claim 1 applies here.
Regarding claim 5, Eskimez in view of Kumar teaches the method of claim 1. Kumar teaches wherein the generating of the first frequency-based representation of the first audio data and the resampling are performed using at least one graphics processing unit (GPU) (see Kumar, [0045], The processor 202 can have generic characteristics similar to general-purpose processors, or the processor 202 may be an application-specific integrated circuit (ASIC) that provides control functions to the computing device 200 (GPU). As shown in FIG. 2, the processor 202 can be coupled to all components of the computing device 200, either directly or indirectly, for communication purposes). The same motivation to combine as claim 1 applies here.
Regarding claim 7, Eskimez in view of Kumar teaches the method of claim 1. Eskimez further teaches wherein the one or more neural networks are trained, at least, by: selecting a high frequency audio waveform (see Eskimez, sect. IV A, which describes the dataset selected at 48 kHz and 16 kHz sampling rates); generating a high frequency spectrogram of the high frequency audio waveform (see Eskimez, sect. IV D); performing filtering of the high frequency spectrogram to generate a low frequency spectrogram that does not include audio data for one or more higher frequency audio bands (see Eskimez, sect. IV D, The low-resolution signals were created by applying an order 8 Chebyshev type I low-pass filter (does not include audio data from higher frequency bands) and downsampling the high-resolution signals; Eskimez, sect. III A, In order to avoid discontinuities at the concatenation [6], we also predict the highest C frequency bins of the narrowband spectrogram, where C is called the offset parameter. During concatenation, the top C frequency bins are removed from the narrowband spectrogram); performing a resampling of the low frequency spectrogram to generate a padded spectrogram at the high frequency including padded values for the one or more higher frequency audio bands (Eskimez, sect. IV D, The low-resolution signals were upsampled to match the size of the high-resolution signals using cubic upscaling as the input to their neural network. The samples were chopped into patches with the length of 6000 in the high-resolution space (0.375 seconds), which is the same for 2x and 4x scales (padded values)); and adjusting one or more network parameters of the one or more neural networks using the high frequency spectrogram and the padded spectrogram (see Eskimez, sect. III B, The discriminator network accepts the concatenated narrowband and high-frequency range LPSs as input, where the high-frequency range LPS could be generated by the generator network or coming directly from the data distribution. Including the narrowband to the discriminator's input is essentially conditioning the input high-frequency range LPS on the narrowband LPS, similar to conditional GANs [31]. The discriminator contains three convolutional layers as shown in Figure 2. Different from the generator, we do not employ BN layers in the discriminator. Using BN in the discriminator leads to instabilities during training, especially if the discriminator loss is regularized [8], [21]. The convolutional layers are followed by two FC layers. We use LeakyReLU activation with a slope of 0.2 in all layers, except for the output layer, where we use a linear activation function. The details of both network architectures are shown in Table I). Kumar further teaches generating a high frequency spectrogram of the high frequency audio waveform (Kumar, [0051], Assume, for example, that the media production platform 210 acquires input indicative of a selection of an audio file to be upsampled. Generally, the audio file will be in the form of a discrete audio signal with a frequency less than 40,000 Hz. For example, the discrete audio signal may have a sampling rate of 8,000 Hz, 16,000 Hz, 22,050 Hz, or 24,000 Hz depending on the computing device used for recording. In such a scenario, the upsampling module 216 can apply a Fourier transform (e.g., an STFT) to the audio file to produce a first magnitude spectrum).
Regarding claim 8, Eskimez in view of Kumar teaches the method of claim 7. Eskimez further teaches comparing a first enhanced waveform generated for the high frequency spectrogram and a second enhanced waveform generated for the padded spectrogram to calculate a loss value based on one or more differences between the first enhanced waveform and the second enhanced waveform, wherein the adjusting of the one or more network parameters is based at least on the loss value (see Eskimez, sec III C, equation (1) Our initial testing showed that using Log-Spectral Distance (LSD) (or Log-Spectral Distortion) function as our training objective yield slightly better results for the SSR task and Adversarial Loss to calculate the generator loss based on weighted LSD loss).
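Purely for illustration, and not as part of the claim mapping or the record, a training objective of the Log-Spectral Distance (LSD) type cited from Eskimez, sect. III C, compares the log power spectra of a reference waveform and an enhanced waveform; the non-overlapping framing, FFT size, and function name below are simplifying assumptions, not the reference's exact formulation:

```python
import numpy as np

def log_spectral_distance(ref, est, n_fft=512, eps=1e-10):
    """Sketch of an LSD-style loss: compare log power spectra of a
    reference waveform and an enhanced/estimated waveform, frame by
    frame, and average the per-frame RMS differences."""
    def frame_power(x):
        n = len(x) // n_fft * n_fft
        frames = x[:n].reshape(-1, n_fft)   # non-overlapping frames
        return np.abs(np.fft.rfft(frames, axis=1)) ** 2 + eps
    p_ref, p_est = frame_power(ref), frame_power(est)
    diff = np.log10(p_ref) - np.log10(p_est)
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=1))))

# Identical waveforms give zero distance; a scaled copy does not.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
```

A loss value computed this way could then drive the adjustment of network parameters, as recited in claim 8.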
Regarding claim 9, this claim is directed to a processor corresponding to the method presented in claim 1 and is rejected under the same grounds stated above regarding claim 1.
Regarding claim 10, this claim is directed to a processor corresponding to the method presented in claim 2 and is rejected under the same grounds stated above regarding claim 2.
Regarding claim 12, this claim is directed to a processor corresponding to the method presented in claim 4 and is rejected under the same grounds stated above regarding claim 4.
Regarding claim 14, this claim is directed to a processor corresponding to the method presented in claim 7 and is rejected under the same grounds stated above regarding claim 7.
Regarding claim 15, this claim is directed to a processor corresponding to the method presented in claim 8 and is rejected under the same grounds stated above regarding claim 8.
Regarding claim 16, Eskimez in view of Kumar teaches the processor of claim 9. Kumar further teaches wherein the processor is comprised in at least one of: a system for rendering graphical output (see Kumar, Fig. 9, 918); a system for performing deep learning operations (see Kumar, [0050] At a high level, the model is a machine learning framework that is comprised of one or more algorithms adapted to upsample an audio signal that is provided as input, thereby producing another audio signal having a different sampling rate.); a system implemented at least partially using cloud computing resources (see Kumar, [0108], Fig. 9, 912/914).
Regarding claim 17, it is directed to a system claim corresponding to the method claim presented in claim 1 and is rejected under the same grounds stated above regarding claim 1.
Regarding claim 18, it is directed to a system claim corresponding to the method claim presented in claim 2 and is rejected under the same grounds stated above regarding claim 2.
Regarding claim 20, it is directed to a system claim corresponding to the processor claim presented in claim 16 and is rejected under the same grounds stated above regarding claim 16.
Claims 3, 11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Eskimez, S. E., et al. (2019). Adversarial training for speech super-resolution. IEEE Journal of Selected Topics in Signal Processing, 13(2), 347-358, in view of Kumar et al., US PgPub. 2022/0101872, further in view of Rogers, US PgPub. 2007/0100606.
Regarding claim 3, Eskimez in view of Kumar teaches the method of claim 1. Eskimez teaches wherein the resampling is performed using a fast Fourier transform (FFT) resampler to add one or more padded entries to one or more frequency bands in the second frequency-based representation that do not contain data values in the first frequency-based representation (Eskimez, sec. IV D, The low-resolution signals were upsampled to match the size of the high-resolution signals using cubic upscaling as the input to their neural network. The samples were chopped into patches with the length of 6000 in the high-resolution space (0.375 seconds), which is the same for 2x and 4x scales (padded values)). Eskimez teaches resampling to add one or more padded entries but, to further compact prosecution, Rogers is used to further teach wherein the resampling is performed using a fast Fourier transform (FFT) resampler to add one or more padded entries to one or more frequency bands in the second frequency-based representation that do not contain data values in the first frequency-based representation (see Rogers, [0037] Because one or more of the blocks associated with the digitized audio signal 200 will be transformed using an FFT, the block width can be set to a power of 2 that corresponds to the size of the FFT, such as 512 samples, 1,024 samples, 2,048 samples, or 4,096 samples. In an implementation, if the last block 220 includes fewer samples than are required to form a full block, one or more additional zero-value samples can be added to complete that block. For example, if the FFT size is 1,024 and the last block 220 only includes 998 samples, 26 zero-value samples can be added to fill in the remainder of the block).
Eskimez and Kumar pertain to processing audio data using neural network based super resolution models. Rogers teaches techniques to resample the selected portion of the input digital audio signal by upsampling and to resample the portion of the output digital audio signal by downsampling (see Rogers, Fig. 3). Applying the known technique of zero-value padding, as taught by Rogers, during FFT processing of the digitized audio signal of Eskimez in view of Kumar to yield an enhanced audio output would have been obvious to one of ordinary skill in the art.
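For illustration only, the zero-value padding Rogers [0037] describes (completing a final, short block so every block matches the FFT size) can be sketched as follows. The function name is an illustrative assumption, not from the reference:

```python
import numpy as np

def pad_last_block(samples, fft_size=1024):
    """Zero-pad so the sample count is a whole number of FFT-sized blocks."""
    remainder = len(samples) % fft_size
    if remainder:
        # e.g., a 998-sample final block gets 26 zero-value samples appended
        samples = np.concatenate(
            [samples, np.zeros(fft_size - remainder, dtype=samples.dtype)])
    return samples.reshape(-1, fft_size)

# Rogers' example: an FFT size of 1,024 and a last block of only 998 samples
blocks = pad_last_block(np.ones(998))
```

Each row of `blocks` can then be transformed independently with an FFT of the chosen power-of-2 size.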
Regarding claim 11, it is directed to a processor claim corresponding to the method claim presented in claim 3 and is rejected under the same grounds stated above regarding claim 3.
Regarding claim 19, it is directed to a system claim corresponding to the method claim presented in claim 3 and is rejected under the same grounds stated above regarding claim 3.
Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Eskimez, S. E., et al. (2019). Adversarial training for speech super-resolution. IEEE Journal of Selected Topics in Signal Processing, 13(2), 347-358, in view of Kumar et al., US PgPub. 2022/0101872, further in view of Zeyu et al., US PgPub. 2023/0162725.
Regarding claim 6, Eskimez in view of Kumar teaches the method of claim 5. However, Eskimez in view of Kumar fail to teach wherein the first audio data is received in a first audio stream, and wherein a batch of audio streams including the first audio stream is to be processed in parallel using one or more GPUs.
However, Zeyu further teaches wherein the first audio data is received in a first audio stream, and wherein a batch of audio streams including the first audio stream is to be processed in parallel using one or more GPUs (see Zeyu, [0097] the processor(s) 1202 may include one or more central processing units (CPUs), graphics processing units (GPUs); see also Zeyu, [0074] As illustrated in FIG. 10, the method 1000 includes an act 1006 providing the upsampled audio data to an audio super resolution model, the audio super resolution model trained to perform bandwidth expansion from narrow-band to wide-band. In some embodiments, the audio may be processed in parallel by dividing the audio into batches. Each batch may then be processed by dividing the batch into sub-batches, with each sub-batch being processed in parallel; Zeyu, [0103]).
Eskimez, Kumar and Zeyu are considered to be analogous to the claimed invention because they relate to processing audio data using neural network based super resolution models. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the audio super resolution model of Eskimez in view of Kumar with the batch processing of audio data taught by Zeyu in order to improve processing speed (see Zeyu, [0043]).
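For illustration only, the batch/sub-batch parallelism Zeyu [0074] describes (dividing audio into batches, then processing sub-batches in parallel) can be sketched with a thread pool standing in for GPU execution. All names, the sub-batch size, and the 2x-duplication stand-in for the super resolution model are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def upsample_chunk(chunk):
    # Stand-in for the super resolution model: duplicate each sample (2x).
    return [s for sample in chunk for s in (sample, sample)]

def process_stream(stream, sub_batch_size=4, workers=2):
    """Split a stream into sub-batches and process them in parallel."""
    sub_batches = [stream[i:i + sub_batch_size]
                   for i in range(0, len(stream), sub_batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves sub-batch order, so the output reassembles cleanly
        results = list(pool.map(upsample_chunk, sub_batches))
    return [s for sub in results for s in sub]

out = process_stream(list(range(8)))
```

On actual hardware, the sub-batches would be dispatched to one or more GPUs rather than CPU threads, but the ordering and reassembly concerns are the same.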
Regarding claim 13, it is directed to a processor claim corresponding to the method claim presented in claim 6 and is rejected under the same grounds stated above regarding claim 6.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Schmit et al., US PgPub. 2020/0243102, teaches a method for generating a bandwidth-enhanced audio signal for an input audio signal using spectral vectors for the purposes of raw signal generation and raw signal processing, using the parametric representation output by the neural network processor (see Schmit, Fig. 2e).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday 12:00 pm - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta can be reached at (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NANDINI SUBRAMANI/ Examiner, Art Unit 2656
/BHAVESH M MEHTA/ Supervisory Patent Examiner, Art Unit 2656