DETAILED ACTION
Claims 1–25 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant’s submission filed on 20 January 2026 has been entered.
Response to Amendment
In response to the final Office action mailed 20 October 2025, Applicant filed a response on 20 January 2026.
Response to Arguments
Applicant’s arguments with respect to the independent claims have been considered but are moot because of the new grounds of rejection necessitated by the amendment to the claims. The claims are addressed as currently presented in the following sections.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over PASCUAL et al. (WO 2024/086012 A1: hereafter — Pascual¹) in view of POLLET et al. (US 2021/0118425 A1: hereafter — Pollet) further in view of Munjal, Prateek, Akanksha Paul, and Narayanan C. Krishnan. “Implicit discriminator in variational autoencoder.” 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020 (hereafter — Munjal).
For claim 1, Pascual discloses a method for training machine learning models, comprising:
synthesizing an audio data set in a second domain using a generator within a generative adversarial network (GAN), wherein the generator outputs the synthesized audio data set in the second domain [[such that the audio data set is recognized by an acoustic model]], wherein the generator is a machine learning model trained in coordination with a decoder [[and a loss of the decoder as feedback, wherein the decoder is outside the GAN]] (Pascual: [058] — a generative-adversarial-network implementation that includes a generator trained to generate synthesised audio signals (noting that a generative adversarial network is a trained machine learning model); [072]–[073] — converting an audio signal from one class to another class (indicating the conversion from one domain into another domain)), wherein the generator is trained based on original audio data in a first domain to output synthetic audio features in the second domain (Pascual: [038]–[043] — a generator which features an encoder and a decoder, the encoder taking in an input signal — z, which may be an original audio signal having a first sample rate, to transform it into a sequence of hidden features as embedding vectors (the generator taking in an audio signal and outputting audio features), with Down GBlocks for performing the transformation; [072]–[073] — original audio is in a first class and it is being synthesised to be presented in a second class as provided to a generator (from a first domain to a second domain)), wherein the audio data set has at least a portion of the output synthetic audio features in the second domain of the trained generator (Pascual: [072] — the presence of conditioning features which determine what the target type of sound should imitate (an indication of features in the second domain)), wherein the first domain and the second domain indicate different purposes (Pascual: [073] — the domains could be one for a dog barking and the other being a piano sound, both serving different purposes), wherein the decoder is configured to transform audio features in the second domain into audio features in the first domain (Pascual: [082] — a decoder able to convert hidden features back into a first resolution (teaching of converting back from a second domain into a first domain)).
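For orientation only, and forming no part of the grounds of rejection, the following is a minimal sketch of the encoder/decoder generator structure the cited Pascual passages describe (downsampling blocks producing embedding vectors, mirrored upsampling blocks producing the converted signal). All layer names, sizes, and shapes are illustrative assumptions, not Pascual’s disclosed architecture.

```python
# Illustrative sketch only; layer sizes and names are assumptions.
import torch
import torch.nn as nn

class DomainTransferGenerator(nn.Module):
    """Encoder ("down" blocks) maps a first-domain signal to hidden
    embedding vectors; the decoder half upsamples the embeddings into a
    signal intended to lie in the second domain."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(  # downsampling blocks
            nn.Conv1d(1, channels, kernel_size=16, stride=4, padding=6),
            nn.ReLU(),
            nn.Conv1d(channels, channels * 2, kernel_size=16, stride=4, padding=6),
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(  # mirrored upsampling blocks
            nn.ConvTranspose1d(channels * 2, channels, kernel_size=16, stride=4, padding=6),
            nn.ReLU(),
            nn.ConvTranspose1d(channels, 1, kernel_size=16, stride=4, padding=6),
            nn.Tanh(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)     # hidden features / embedding vectors
        return self.decoder(z)  # synthetic audio in the target domain

gen = DomainTransferGenerator()
first_domain = torch.randn(1, 1, 16000)  # one second of audio at 16 kHz
second_domain = gen(first_domain)        # same shape, converted domain
```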
The reference of Pascual provides teaching for synthesising audio data using a generator and a decoder, but fails to teach the further limitations of this claim regarding the presence of a trained acoustic model. This is, however, not new to the art, as the reference of Pollet is introduced to teach this as:
… wherein the generator outputs the synthesized audio data set in the second domain such that the audio data set is recognized by an acoustic model (Pollet: [0008] — applying the generated TTS voice as training data to generate an acoustic model, the TTS voice that is developed not being personal data (thereby teaching an adaptation for speech recognition, an acoustic model being used for speech recognition to convert speech into a phonetic representation)) … ;
training an acoustic model in the second domain using the synthesized audio data set, wherein the acoustic model is adapted to recognize audio data sets in the second domain (Pollet: [0008] — applying the generated TTS voice as training data to generate an acoustic model, the TTS voice that is developed not being personal data (showing that both are in the domain of being non-personal data, while also noting that an acoustic model is used for converting speech into a phonetic representation, thereby teaching an adaptation for speech recognition)).
The reference of Pascual provides teaching for having a generator trained on original audio to output synthetic audio features in a second domain, and a decoder to transform audio features from the second domain into the first domain, for the purpose of generating synthesised audio. This differs from the claimed invention in that the claimed invention further provides teaching for training an acoustic model using the synthesised audio data set. This is not new to the art, as the reference of Pollet is seen to teach above.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to incorporate the known teaching of Pollet which teaches applying generated text-to-speech voice for training an acoustic model, with the teaching of Pascual which performs the generation of the synthesised speech, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of applying speech data that does not contain personal voice information (from human speech) as well as speech data that would not need further transcription as the transcript would already be available, thereby making training easier (Pollet: [0008]) and also obtaining acoustic model data useful for speech recognition.
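As an illustration of this rationale (the synthesised audio arrives with its transcript already known, so no manual transcription step is needed), the following hedged sketch trains a small acoustic model on one synthetic pair; the model shape, feature dimensions, and label inventory are assumptions, not Pollet’s disclosure.

```python
# Hedged sketch: training an acoustic model on a synthetic (audio, transcript)
# pair. All shapes and the CTC setup are illustrative assumptions.
import torch
import torch.nn as nn

vocab_size = 30  # assumed acoustic-unit inventory, index 0 reserved as blank
acoustic_model = nn.Sequential(
    nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, vocab_size))
ctc = nn.CTCLoss(blank=0)
opt = torch.optim.Adam(acoustic_model.parameters(), lr=1e-4)

# One synthetic training pair: 200 frames of 80-dim features plus the
# transcript the generator was conditioned on (dummy values here).
feats = torch.randn(200, 1, 80)                     # (time, batch, feature)
transcript = torch.randint(1, vocab_size, (1, 12))  # known a priori, no ASR needed

log_probs = acoustic_model(feats).log_softmax(-1)   # (time, batch, vocab)
loss = ctc(log_probs, transcript, torch.tensor([200]), torch.tensor([12]))
opt.zero_grad()
loss.backward()
opt.step()
```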
The combination of Pascual in view of Pollet provides teaching for the use of a decoder and a GAN for synthesising an audio data set, but differs from the claimed invention in that the claimed invention further provides teaching showing that the decoder is separate from the GAN, the loss of the decoder being used as feedback.
This is, however, not new to the art, as the reference of Munjal is now introduced to teach as:
… wherein the generator is a machine learning model trained in coordination with a decoder and a loss of the decoder as feedback, wherein the decoder is outside the GAN (Munjal: Page 5 Col 1 Equation (13) — provides a decoder loss which is calculated based on a GAN loss (indicating that the decoder is outside the GAN), and this loss is used in learning (indicating a use of the decoder loss for further training, or as feedback to the system)) …
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to incorporate the known teaching of Munjal, which provides a decoder outside a GAN with the decoder loss being used as feedback, to improve upon the teaching of the combination of Pascual in view of Pollet, which teaches the use of a decoder and a GAN for synthesising an audio data set, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of the well-known use of feedback loss for improving the performance of a GAN, thereby minimising loss through a feedback loop. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
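For illustration only, a minimal sketch of the loss structure this combination describes: a decoder that sits outside the GAN reconstructs the first-domain input, and its loss is folded into the generator update as feedback. The weighting and loss choices (binary cross-entropy, L1) are assumptions, not Munjal’s Equation (13).

```python
# Hedged sketch: decoder outside the GAN, decoder loss used as feedback.
import torch
import torch.nn.functional as F

def generator_step(generator, discriminator, external_decoder, x_first,
                   opt_g, lam: float = 10.0) -> float:
    """One generator update. `external_decoder` is not part of the GAN and
    is not updated here; only its reconstruction loss flows back."""
    fake_second = generator(x_first)           # second-domain output
    scores = discriminator(fake_second)
    gan_loss = F.binary_cross_entropy_with_logits(
        scores, torch.ones_like(scores))       # adversarial (GAN) loss
    decoder_loss = F.l1_loss(external_decoder(fake_second), x_first)
    total = gan_loss + lam * decoder_loss      # decoder loss as feedback
    opt_g.zero_grad()
    total.backward()
    opt_g.step()
    return total.item()
```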
As for claim 13, computer program product claim 13 and method claim 1 are related as a computer program product storing the executable instructions required for performing the claimed method steps on a computer. Pascual in [021] provides a non-transitory computer-readable storage medium suitable to read upon this claim. Accordingly, claim 13 is similarly rejected under the same rationale as applied above with respect to method claim 1.
As for claim 14, system claim 14 and method claim 1 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Pascual in [030] provides a processor as well as storage memory suitable to read upon the limitations of this claim. Accordingly, claim 14 is similarly rejected under the same rationale as applied above with respect to method claim 1.
Claims 2 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Pascual (WO 2024/086012 A1) in view of Pollet (US 2021/0118425 A1) further in view of Munjal (“Implicit discriminator in variational autoencoder.” 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020) as applied to claims 1 and 14, and further in view of Danielescu et al. (US 2024/0096313 A1: hereafter — Danielescu).
For claim 2, claim 1 is incorporated and the combination of Pascual in view of Pollet further in view of Munjal discloses the method, wherein the synthesized audio data set is a first audio data set (Pollet: [0020] — generated waveforms).
The combination of Pascual in view of Pollet further in view of Munjal however fails to teach the further limitations of this claim, for which the reference of Danielescu is now introduced to teach as wherein the synthesized audio data set is a first audio data set, further comprising:
applying the trained acoustic model to features from a second audio data set in order to generate a plurality of acoustic predictions for the second audio data set (Danielescu: [0005], [0081] — an acoustic model which is trained to predict speech sounds based on input feature coefficients so as to generate output data indicating predicted speech).
The combination of Pascual in view of Pollet further in view of Munjal provides teaching for training an acoustic model, but differs from the claimed invention in that the claimed invention further provides teaching for applying the trained acoustic model to audio features to generate acoustic predictions. This is not new to the art, as the reference of Danielescu is seen to teach above.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to incorporate the known teaching of Danielescu which applies a trained acoustic model to audio features in order to generate acoustic predictions, with the generation of an acoustic model as taught by the combination of Pascual in view of Pollet further in view of Munjal, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result that such a process would reduce the error between sound predictions and actually observed speech sounds, thereby being able to operate in a fully self-supervised manner in which only future speech is needed to adjust the parameters of the trained acoustic model (Danielescu: [0081]).
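By way of a hedged sketch only, applying a trained acoustic model to features from a second audio data set reduces to a forward pass producing per-frame acoustic predictions; the model and feature shapes below are illustrative assumptions, not Danielescu’s implementation.

```python
# Hedged sketch: a forward pass yielding a plurality of acoustic predictions.
import torch
import torch.nn as nn

# Same illustrative model shape as in the earlier sketch.
acoustic_model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 30))

second_set_feats = torch.randn(150, 1, 80)  # features from a second data set
with torch.no_grad():
    predictions = acoustic_model(second_set_feats).softmax(-1)
# `predictions` holds per-frame posteriors over 30 assumed acoustic units,
# i.e., a plurality of acoustic predictions for the second audio data set.
```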
As for claim 15, system claim 15 and method claim 2 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Accordingly, claim 15 is similarly rejected under the same rationale as applied above with respect to method claim 2.
Claims 3 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Pascual (WO 2024/086012 A1) in view of Pollet (US 2021/0118425 A1) further in view of Munjal (“Implicit discriminator in variational autoencoder.” 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020), further in view of Danielescu (US 2024/0096313 A1) as applied to claims 2 and 15, and further in view of Chang et al. (US 11,915,690 B1: hereafter — Chang).
For claim 3, claim 2 is incorporated but the combination of Pascual in view of Pollet, further in view of Munjal and further in view of Danielescu fails to teach the limitation of this claim, for which the reference of Chang is now introduced to teach as:
the method, further comprising:
applying at least one speech recognition model to the plurality of acoustic predictions for the audio data set (Chang: Col 20 lines 33-36 — a speech recognition component making use of acoustic unit data that is output by an acoustic model (the acoustic unit data output by the acoustic model here is akin to the claimed plurality of acoustic predictions for the audio data set as this is output by an acoustic model, the output here being provided to a speech recognition model/component)).
The combination of Pascual in view of Pollet, further in view of Munjal and further in view of Danielescu provides teaching for the generation of acoustic predictions, but differs from the claimed invention in that the claimed invention further provides the application of at least one speech recognition model to the plurality of acoustic predictions. This is not new to the art, as the reference of Chang is seen to teach above.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to incorporate the known teaching of Chang which applies an ASR component to results of an acoustic model which are the acoustic predictions, with the generation of these acoustic predictions from an acoustic model as taught by the combination of Pascual in view of Pollet, further in view of Munjal and further in view of Danielescu, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of identifying the exact spoken words in textual form, that are represented by the acoustic predictions of the acoustic model, aiding viewing and understanding by an observer. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
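As a hedged illustration of the step Chang is applied for, one simple way a recognition component can consume per-frame acoustic predictions is greedy CTC-style decoding; this is an illustrative stand-in, not Chang’s component.

```python
# Hedged sketch: collapsing acoustic predictions into a label sequence.
import torch

def greedy_ctc_decode(posteriors: torch.Tensor, blank: int = 0) -> list:
    """posteriors: (time, vocab). Take the argmax per frame, merge repeated
    units, and drop blanks, yielding the recognised unit sequence."""
    best = posteriors.argmax(-1).tolist()
    out, prev = [], None
    for unit in best:
        if unit != prev and unit != blank:
            out.append(unit)
        prev = unit
    return out

posteriors = torch.randn(150, 30).softmax(-1)  # assumed acoustic predictions
print(greedy_ctc_decode(posteriors))           # recognised acoustic units
```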
As for claim 16, system claim 16 and method claim 3 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Accordingly, claim 16 is similarly rejected under the same rationale as applied above with respect to method claim 3.
Claims 4 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Pascual (WO 2024/086012 A1) in view of Pollet (US 2021/0118425 A1) further in view of Munjal (“Implicit discriminator in variational autoencoder.” 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020) as applied to claims 1 and 14, and further in view of MONGE ALVAREZ et al. (US 2021/0225358 A1: hereafter — Monge Alvarez).
For claim 4, claim 1 is incorporated and the combination of Pascual in view of Pollet further in view of Munjal, particularly further considering the reference of Pascual, discloses the method, wherein synthesizing the audio data set further comprises:
generating, using the generator, the plurality of synthetic audio features (Pascual: [011] — a generator being guided toward generating audio to be synthesised; [013]).
The combination of Pascual in view of Pollet further in view of Munjal, however, fails to teach the further limitation of this claim, for which Monge Alvarez is now introduced to teach as:
inputting the plurality of synthetic audio features to a voice encoder, wherein the audio data set is created based on an output of the voice encoder (Monge Alvarez: [0068] — inputting generated acoustic features into a vocoder (voice encoder) to synthesise expressive speech (generating the synthesised output audio data set)).
The combination of Pascual in view of Pollet further in view of Munjal provides teaching for generating synthetic audio features using a generator, but differs from the claimed invention in that the claimed invention further provides inputting the synthetic audio features into a voice encoder to have the voice encoder output an audio data set. This is not new to the art, as the reference of Monge Alvarez is seen to teach above.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Monge Alvarez which has a vocoder receive generated acoustic features to output synthesised speech as a generated audio data set, with the teaching of the combination of Pascual in view of Pollet further in view of Munjal, which generates synthetic audio features using a generator, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of generating a wide range of synthetic speech data useful for training speech recognition systems in the absence of human speech training data. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
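For illustration of the vocoder step only, the sketch below uses Griffin-Lim as a stand-in vocoder: generated acoustic features in, a waveform (the audio data set entry) out. Monge Alvarez’s actual vocoder is not specified by this sketch, and the feature values are placeholders.

```python
# Hedged sketch: a stand-in vocoder turning generated features into audio.
import numpy as np
import librosa

n_fft, hop = 1024, 256
# Placeholder for generator-produced magnitude-spectrogram features; in the
# described system these would come from the trained generator.
synthetic_features = np.abs(
    np.random.randn(1 + n_fft // 2, 120)).astype(np.float32)

waveform = librosa.griffinlim(synthetic_features, n_iter=32, hop_length=hop)
print(waveform.shape)  # the synthesised waveform forming the audio data set
```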
As for claim 17, system claim 17 and method claim 4 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Accordingly, claim 17 is similarly rejected under the same rationale as applied above with respect to method claim 4.
Claims 5, 7, 9, 10, 18, 20, 22 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Pascual (WO 2024/086012 A1) in view of Pollet (US 2021/0118425 A1) further in view of Munjal (“Implicit discriminator in variational autoencoder.” 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020) as applied to claims 1 and 14, and further in view of Beaufays et al. (US 2022/0115000 A1: hereafter — Beaufays).
For claim 5, claim 1 is incorporated but the combination of Pascual in view of Pollet further in view of Munjal fails to fully disclose the limitations of this claim, for which the reference of Beaufays is now introduced to teach as:
the method, wherein the GAN further including a discriminator configured to predict whether outputs of the generator are authentic, wherein the generator is trained further in coordination with the discriminator (Beaufays: [0004] — a generative adversarial network; [0010] — a discriminator which predicts that the synthesised speech audio is actually a spoken utterance (determining authenticity) as well as the presence of a trained on-device generator and a trained on-device discriminator).
The combination of Pascual in view of Pollet further in view of Munjal provides teaching for generating synthetic audio features using a generator, but differs from the claimed invention in that the claimed invention further provides that the generator is included in a generative adversarial network, the generator being trained alongside a discriminator which is configured to predict if outputs are authentic. This is not new to the art, as the reference of Beaufays is seen to teach above.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Beaufays which includes a generative adversarial network along with a generator trained with a discriminator able to predict whether outputs are authentic, with the teaching of the combination of Pascual in view of Pollet further in view of Munjal, which generates synthetic audio features using a generator, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of generating synthetic yet authentic sounding audio data that is usable for training ASR models, the data being capable of having features similar to natural human speech. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
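As a hedged illustration of the discriminator role described above, the sketch below is a generic binary classifier whose sigmoid score is read as a prediction of authenticity; the architecture is an assumption, not Beaufays’s on-device discriminator.

```python
# Illustrative discriminator: predicts whether audio is authentic.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=16, stride=4, padding=6),
            nn.LeakyReLU(0.2),
            nn.Conv1d(32, 64, kernel_size=16, stride=4, padding=6),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        return self.net(audio)  # logit: positive leans "authentic"

disc = Discriminator()
p_authentic = torch.sigmoid(disc(torch.randn(1, 1, 16000)))  # P(authentic)
```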
For claim 7, claim 5 is incorporated and the combination of Pascual in view of Pollet further in view of Munjal and further in view of Beaufays discloses the method, further comprising:
training the GAN in a plurality of iterations (Beaufays: [0004] — training a GAN that contains a TTS generator; [0011] — updating weights of the TTS generator model based on a generated gradient, performed by backpropagation (backpropagation being an example of an iterative process)), wherein training the GAN at each iteration of the plurality of iterations further comprises:
generating, via the generator, a plurality of training synthetic audio features in the second domain (Beaufays: [0004] — the TTS generator model generates synthesised speech audio data (note from claim 1 that the Pascual reference teaches the generator outputting audio features in the second domain));
determining, via the discriminator, whether each of the plurality of training synthetic audio features is authentic (Beaufays: [0017] — a discriminator which, at every training instance (iteration), is trained to be able to distinguish between real human audio and synthesised audio (being able to determine if audio features are authentic)).
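For orientation, a hedged sketch of the iteration structure recited here (generate training synthetic features, then have the discriminator judge authenticity, once per iteration); `generator` and `discriminator` are assumed to be modules like the illustrative ones above, and the update scheme is a common GAN recipe rather than Beaufays’s exact procedure.

```python
# Hedged sketch: one generator and one discriminator update per iteration.
import torch
import torch.nn.functional as F

def train_gan(generator, discriminator, batches, opt_g, opt_d):
    """`batches` yields (source, real) pairs: `source` feeds the generator,
    `real` is authentic audio of the same shape as the generator output."""
    for source, real in batches:        # one loop pass = one iteration
        fake = generator(source)        # training synthetic audio features
        ones = torch.ones(real.size(0), 1)
        zeros = torch.zeros(real.size(0), 1)

        # Discriminator: learn to judge which inputs are authentic.
        d_loss = (F.binary_cross_entropy_with_logits(discriminator(real), ones) +
                  F.binary_cross_entropy_with_logits(discriminator(fake.detach()), zeros))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # Generator: updated so its outputs are judged authentic.
        g_loss = F.binary_cross_entropy_with_logits(discriminator(fake), ones)
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()
```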
For claim 9, claim 7 is incorporated and the combination of Pascual in view of Pollet further in view of Munjal and further in view of Beaufays discloses the method, wherein training the GAN at each iteration further comprises:
determining a total loss based on a loss of the GAN and a loss of the decoder (Beaufays: [0085]–[0086] — calculating a loss which may be an adversarial loss (as the GAN loss) and an additional loss based on comparing synthesised speech to the ground truth (taken as the claimed decoder loss here); [0087] — having a combination of the generator loss and the additional loss to update the TTS generator model); and
providing the determined loss as feedback to the GAN (Beaufays: [0018] — “the on-device TTS generator model can be updated based on the loss (e.g., the loss may be backpropagated across the on-device TTS generator model to update weights thereof)”; [0087] — having a combination of the generator loss and the additional loss to update the TTS generator model).
For claim 10, claim 7 is incorporated and the combination of Pascual in view of Pollet further in view of Munjal and further in view of Beaufays discloses the method, wherein determining the loss at each iteration further comprises:
creating, via the decoder, a plurality of training synthetic audio features in the first domain (Beaufays: [0042] — generating synthesised speech audio data; FIG. 1B, [0060] — updating a TTS generator model which includes training with synthesised speech audio data 102 (the continuous updating would indicate the presence of a plurality of training synthetic audio features) (note from claim 1 that the Pascual reference was applied to teach the decoder outputting audio features in the first domain)); and
comparing the plurality of training synthetic audio features in the first domain to the original audio data, wherein the loss of the decoder is determined based on the comparison (Beaufays: [0086] — obtaining a loss based on the synthesised audio data and the ground truth audio data).
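Read together, claims 9 and 10 describe a loss computed as sketched below: the decoder recreates first-domain features, the comparison with the original audio yields the decoder loss, and the total of GAN loss plus decoder loss is fed back. This is an illustrative reading with assumed loss choices, extending the claim 1 sketch, not the references’ exact computation.

```python
# Hedged sketch of the claim 9/10 loss structure per iteration.
import torch
import torch.nn.functional as F

def iteration_loss(generator, decoder, discriminator, original_first_domain):
    fake_second = generator(original_first_domain)  # second-domain features
    scores = discriminator(fake_second)
    gan_loss = F.binary_cross_entropy_with_logits(
        scores, torch.ones_like(scores))            # loss of the GAN

    recreated_first = decoder(fake_second)          # back to the first domain
    decoder_loss = F.l1_loss(recreated_first, original_first_domain)  # comparison

    return gan_loss + decoder_loss                  # total loss, fed back
```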
As for claim 18, system claim 18 and method claim 5 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Accordingly, claim 18 is similarly rejected under the same rationale as applied above with respect to method claim 5.
As for claim 20, system claim 20 and method claim 7 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Accordingly, claim 20 is similarly rejected under the same rationale as applied above with respect to method claim 7.
As for claim 22, system claim 22 and method claim 9 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Accordingly, claim 22 is similarly rejected under the same rationale as applied above with respect to method claim 9.
As for claim 23, system claim 23 and method claim 10 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Accordingly, claim 23 is similarly rejected under the same rationale as applied above with respect to method claim 10.
Claims 6 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Pascual (WO 2024/086012 A1) in view of Pollet (US 2021/0118425 A1) further in view of Munjal (“Implicit discriminator in variational autoencoder.” 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020), further in view of Beaufays (US 2022/0115000 A1), as applied to claims 5 and 18, and further in view of Fernandez Guajardo et al. (US 2022/0392428 A1: hereafter — Fernandez Guajardo).
For claim 6, claim 5 is incorporated but the combination of Pascual in view of Pollet further in view of Munjal and further in view of Beaufays fails to disclose the limitation of this claim, for which Fernandez Guajardo is now introduced to teach as:
the method, wherein the discriminator is initially trained based on the original audio data and a plurality of training synthetic audio features generated by the generator (Fernandez Guajardo: [0049] — a synthesis training using a global discriminator model that is trained using voice samples and synthetic audio streams (original audio and synthetic audio)).
The combination of Pascual in view of Pollet further in view of Munjal and further in view of Beaufays provides teaching for a GAN including a discriminator that predicts whether outputs of the generator are authentic, as well as a generator that generates synthesised audio signals (Pascual: [057]). This combination differs from the claimed invention in that the claimed invention further provides teaching for training the discriminator with original audio and synthetic audio features generated by the generator. This is not new to the art, as the reference of Fernandez Guajardo is seen to teach above.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Fernandez Guajardo, which trains a discriminator based on original audio and synthetic audio, with the teaching of a discriminator that predicts whether outputs of the generator are authentic as taught by the combination of Pascual in view of Pollet further in view of Munjal and further in view of Beaufays, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of providing the discriminator with training data of original audio and synthetic audio data, enabling the discriminator to effectively distinguish between both forms of audio data. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
As for claim 19, system claim 19 and method claim 6 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Accordingly, claim 19 is similarly rejected under the same rationale as applied above with respect to method claim 6.
Claims 8 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Pascual (WO 2024/086012 A1) in view of Pollet (US 2021/0118425 A1) further in view of Munjal (“Implicit discriminator in variational autoencoder.” 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020), further in view of Beaufays (US 2022/0115000 A1), as applied to claims 7 and 20, and further in view of Nair et al. (US 11,514,948 B1: hereafter — Nair).
For claim 8, claim 7 is incorporated and the combination of Pascual in view of Pollet further in view of Munjal and further in view of Beaufays discloses the method, wherein training the GAN at each iteration further comprises:
[[discarding]] each training synthetic audio feature that is not determined to be authentic, wherein the discarded features are not utilized during subsequent iterations (Beaufays: [0017] — during training, the discriminator labels the speech as either ‘real’/‘human’ or ‘fake’/‘synthesised’, the inauthentic instances being considered negative instances (an indication of a negative label means that such training data would not be applied for determining authenticity, which would mean the features are discarded for such purposes)); and
keeping each training synthetic audio feature that is determined to be authentic, wherein the kept features are utilized during subsequent iterations (Beaufays: [0017] — during training, the discriminator labels the speech as ‘real’/‘human’ for the authentic ones, these being considered positive instances (an indication of a positive label means that such training data would be applied for determining authenticity, which would mean the features are kept for such training purposes)).
The combination of Pascual in view of Pollet further in view of Munjal and further in view of Beaufays provides teaching for applying a negative label to audio features which the discriminator determines to be fake or inauthentic, as provided above with regard to the Beaufays reference. This combination, however, fails to precisely teach discarding those labelled as inauthentic.
The reference of Nair is now introduced to teach this as:
discarding each training synthetic audio feature that is not determined to be authentic, wherein the discarded features are not utilized during subsequent iterations (Nair: Col 9 lines 32-39, Col 13 lines 33-41 — rejecting proposed data determined to be false by the discriminator).
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Nair which rejects candidates that are identified as being false, with the teaching of adding a negative label to such determined candidates as taught by the combination of Pascual in view of Pollet further in view of Munjal and further in view of Beaufays, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of entirely rejecting false data representations so as not to taint the iterative training process with unacceptable training information. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
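As a hedged sketch of the keep/discard step the combination is applied for, the function below scores each training synthetic feature with the discriminator and retains only those judged authentic for subsequent iterations; the threshold and tensor shapes are assumptions.

```python
# Hedged sketch: keep authentic features, discard the rest.
import torch

def filter_authentic(features: torch.Tensor, discriminator,
                     threshold: float = 0.5):
    """features: (batch, 1, time). Returns (kept, discarded); the discarded
    features are simply not reused in subsequent training iterations."""
    with torch.no_grad():
        p_authentic = torch.sigmoid(discriminator(features)).squeeze(-1)
    keep = p_authentic >= threshold
    return features[keep], features[~keep]
```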
As for claim 21, system claim 21 and method claim 8 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Accordingly, claim 21 is similarly rejected under the same rationale as applied above with respect to method claim 8.
Claims 11 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Pascual (WO 2024/086012 A1) in view of Pollet (US 2021/0118425 A1) further in view of Munjal (“Implicit discriminator in variational autoencoder.” 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020), further in view of Beaufays (US 2022/0115000 A1), as applied to claims 9 and 22, and further in view of Jin et al. (US 2021/0343305 A1: hereafter — Jin).
For claim 11, claim 9 is incorporated but the combination of Pascual in view of Pollet further in view of Munjal and further in view of Beaufays fails to disclose the limitations of this claim, for which Jin is now introduced to teach as the method, wherein each of the generator and the discriminator has a respective loss function, wherein the loss provided to the generator as feedback at each iteration is determined based further on an output of the loss function of each of the generator and the discriminator at the iteration (Jin: [0051] — an iterative loop for generating audio; [0060] — a training system over many iterations; [0059] — training the GAN with loss functions to update the prediction model (applying the loss function to each iteration); [0044] — executing a generator loss function and a discriminator loss function).
The combination of Pascual in view of Pollet further in view of Munjal and further in view of Beaufays provides teaching for training the GAN through iterations whereby a total loss is determined and used as feedback into the system. This differs from the claimed invention in that the claimed invention further provides teaching for the generator and the discriminator having their respective loss functions, the outputs of which are provided back to the generator as feedback. This is not new to the art, as the reference of Jin is seen to teach above.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Jin, which feeds both generator and discriminator loss functions at every iteration back into the trained model, with the teaching of the combination of Pascual in view of Pollet further in view of Munjal and further in view of Beaufays, which trains the GAN over several iterations making use of a total loss, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of training the prediction model to minimise loss in generating synthetic audio while also improving the performance of the prediction model. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
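For illustration of the ‘respective loss functions’ language, the sketch below writes the generator and discriminator losses as standalone functions of discriminator scores, whose outputs can be combined into the per-iteration feedback; this is a common non-saturating GAN formulation assumed for orientation, not Jin’s exact functions.

```python
# Hedged sketch: respective loss functions for generator and discriminator.
import torch
import torch.nn.functional as F

def discriminator_loss(real_scores: torch.Tensor,
                       fake_scores: torch.Tensor) -> torch.Tensor:
    return (F.binary_cross_entropy_with_logits(
                real_scores, torch.ones_like(real_scores)) +
            F.binary_cross_entropy_with_logits(
                fake_scores, torch.zeros_like(fake_scores)))

def generator_loss(fake_scores: torch.Tensor) -> torch.Tensor:
    return F.binary_cross_entropy_with_logits(
        fake_scores, torch.ones_like(fake_scores))

# At each iteration, both outputs contribute to the feedback signal, e.g.:
# feedback = generator_loss(fake) + discriminator_loss(real, fake)
```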
As for claim 24, system claim 24 and method claim 11 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Accordingly, claim 24 is similarly rejected under the same rationale as applied above with respect to method claim 11.
Claims 12 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Pascual (WO 2024/086012 A1) in view of Pollet (US 2021/0118425 A1) further in view of Munjal (“Implicit discriminator in variational autoencoder.” 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020) as applied to claims 1 and 14, and further in view of Stanton et al. (US 2023/0206898 A1: hereafter — Stanton).
For claim 12, claim 1 is incorporated. As applied to claim 1 above, the combination of Pascual in view of Pollet further in view of Munjal was applied to show a generator converting from a first domain to a second domain, and a decoder for transforming from a second domain to the first domain.
The combination of Pascual in view of Pollet further in view of Munjal however fails to teach the further limitations of this claim, for which Stanton is now introduced to teach as:
the method, wherein the generator is configured to output synthetic audio features as spectrograms in the second domain, wherein the decoder is configured to transform the spectrograms in the second domain into spectrograms in the first domain (Stanton: [0160] — causing the system to predict two spectrogram frames for each decoder step (indicating the decoder outputting spectrograms); [0161]–[0162] — a generator which receives a spectrogram and outputs a generated waveform audio (which can also be represented in spectrogram form)).
The combination of Pascual in view of Pollet further in view of Munjal provides teaching for a generator converting from a first domain to a second domain, and a decoder for transforming from a second domain to the first domain, but differs from the claimed invention in that the claimed invention further teaches the generator outputting synthetic audio features as spectrograms in the second domain, and the decoder transforming spectrograms from the second domain to the first domain. This is not new to the art, as the reference of Stanton is seen to teach above.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Stanton which provides a generator outputting synthetic audio features as spectrograms in the second domain, and the decoder transforming spectrograms from the second domain to the first domain, with the teaching of the combination of Pascual in view of Pollet further in view of Munjal which provides teaching for a generator converting from a first domain to a second domain, and a decoder for transforming from a second domain to the first domain, to thereby come up with the claimed invention.
The combination of both prior art elements would have been obvious to try, given that generated audio can be represented as spectrograms, a spectrogram being a suitable representation to view frequency features of the generated audio. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
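By way of a hedged illustration of the spectrogram representation at issue, the snippet below computes a standard magnitude spectrogram; arrays of this shape are the form the claim has the generator emit in the second domain and the decoder transform back. Parameter choices are illustrative.

```python
# Hedged sketch: audio represented as a magnitude spectrogram.
import numpy as np
import librosa

audio = np.random.randn(16000).astype(np.float32)  # stand-in waveform
spec = np.abs(librosa.stft(audio, n_fft=1024, hop_length=256))
print(spec.shape)  # (513, frames): frequency bins by time frames
```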
As for claim 25, system claim 25 and method claim 12 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Accordingly, claim 25 is similarly rejected under the same rationale as applied above with respect to method claim 12.
Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure.
Korani et al. (US 2021/0287780 A1) provides teaching for output reconstruction from a variational autoencoder (VAE) decoder which serves as feedback input to a GAN discriminator that also sends feedback to the generator when adversarial loss is used ([0019]).
Any inquiry concerning this communication or earlier communications from the Examiner should be directed to OLUWADAMILOLA M. OGUNBIYI whose telephone number is (571)272-4708. The Examiner can normally be reached Monday – Thursday (8:00 AM – 5:30 PM Eastern Standard Time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s Supervisor, PARAS D. SHAH can be reached at (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/OLUWADAMILOLA M OGUNBIYI/Examiner, Art Unit 2653
/Paras D Shah/Supervisory Patent Examiner, Art Unit 2653
02/06/2026
¹ Pascual has a publication with an earlier priority date of 17 October 2022 containing the same subject matter.