Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1 and 3-20 are rejected under 35 U.S.C. 103 as being unpatentable over Cho (US 20220414338) in view of McCoy.
In claims 1, 17, and 18, Cho discloses:
obtaining, by one or more of the processors, acoustic feature data comprising a value for one or more audio characteristics (paragraph 41: feature vectors are generated based upon an utterance, which is acoustic);
selecting, by one or more of the processors, a first latent embedding from a codebook of latent embeddings based upon processing the acoustic feature data using a machine learning model (paragraph 41: latent codes are identified from a codebook for each feature vector, and a convolutional network is a machine learning model); and
generating, by one or more of the processors, an output based upon the selected first latent embedding (paragraphs 41-42: the output is text information).
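By way of illustration only, the codebook selection described in paragraph 41 of Cho is consistent with the standard vector-quantization lookup of a VQ-VAE, in which a feature vector is matched to its nearest codebook entry. The NumPy sketch below shows that generic technique; the array shapes, names, and Euclidean distance metric are illustrative assumptions, not details taken from Cho.

```python
import numpy as np

def select_latent_embedding(feature_vector, codebook):
    """Select the codebook entry nearest to an acoustic feature vector.

    feature_vector: (d,) array, e.g. one encoded acoustic feature.
    codebook: (K, d) array of K latent embeddings.
    Returns the index of, and the value of, the selected embedding.
    """
    # Squared Euclidean distance from the feature to every codebook entry.
    distances = np.sum((codebook - feature_vector) ** 2, axis=1)
    index = int(np.argmin(distances))  # nearest-neighbor code index
    return index, codebook[index]

# Illustrative usage with random data.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))  # K=512 codes of dimension 64
feature = rng.normal(size=64)          # one acoustic feature vector
idx, embedding = select_latent_embedding(feature, codebook)
```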
Cho fails to disclose that the output is an audio output or that it is for a video game; however, McCoy discloses machine learning that generates custom music for a video game (paragraph 52). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Cho with McCoy in order to allow the vector-quantized variational autoencoders of Cho to be used in other contexts, including the creation of audio in video games.
In claim 3, Cho in view of McCoy fails to disclose MIDI audio data; however, Cho discloses WAV, MP3, WMV, AVI, MOV, and MP4 formats and states that the disclosure is not limited to these formats. Official notice is taken that MIDI was a notoriously well-known file format, and it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Cho in view of McCoy with this well-known technique in order to use whatever file format the operator prefers, with the attendant advantages of the MIDI format, such as small file size and ease of modification.
In claim 4, Cho discloses one or more neural network layers (paragraph 54).
In claim 5, Cho discloses decoding, by one or more of the processors, the first latent embedding using a decoder machine learning model to generate the output sample (paragraph 41).
In claim 6, Cho discloses selecting, by one or more of the processors, a second latent embedding from the codebook based upon a label, and wherein generating, by one or more of the processors, an audio sample based upon the first selected latent embedding comprises: combining, by one or more of the processors, the first latent embedding and the second latent embedding to generate a combined latent embedding, and decoding, by one or more of the processors, the combined latent embedding to generate the output sample (paragraph 41 discloses multiple latent embeddings for each feature, which are then combined to create the summarization).
In claims 7 and 19, Cho discloses sampling, by one or more of the processors, a probability distribution over the codebook conditioned on the label (paragraph 2).
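By way of illustration only, sampling a probability distribution over the codebook conditioned on a label can be pictured as drawing a code index from a label-conditioned categorical distribution. The sketch below assumes some model has already produced logits over the codebook from the conditioning label; that assumption, and all names and shapes, are illustrative only and not taken from Cho.

```python
import numpy as np

def sample_code_index(label_logits, rng):
    """Draw a codebook index from a label-conditioned distribution.

    label_logits: (K,) unnormalized scores over the K codebook entries,
        assumed to be produced by some model from the conditioning label.
    """
    # Softmax converts logits into a probability distribution over codes.
    probs = np.exp(label_logits - label_logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(2)
logits = rng.normal(size=512)  # illustrative logits for one label
index = sample_code_index(logits, rng)
```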
In claim 8, Cho discloses obtaining, by one or more of the processors, second acoustic feature data; selecting, by one or more of the processors, a third latent embedding from the codebook based upon processing the second acoustic feature data using the acoustic machine learning model; and wherein generating, by one or more of the processors, an output audio sample based upon the first selected latent embedding comprises: combining, by one or more of the processors, the first latent embedding and the third latent embedding to generate a combined latent embedding, and decoding, by one or more of the processors, the combined latent embedding to generate the output sample (paragraphs 41-42: a plurality of audio features are received, each is provided a latent embedding from the codebook, and these are combined to create the summary text).
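By way of illustration only, the combining and decoding recited in claims 6, 8, and 20 can be sketched generically as a (possibly weighted) sum of latent embeddings followed by decoding; with all weights set to 1 this reduces to a plain sum, consistent with the examiner's remark at claim 20 below. The stand-in linear decoder is an illustrative assumption; an actual decoder machine learning model would be a trained network.

```python
import numpy as np

def combine_embeddings(first, second, w1=1.0, w2=1.0):
    """Weighted sum of two latent embeddings; w1 = w2 = 1 is a plain sum."""
    return w1 * first + w2 * second

def decode(combined, projection):
    """Stand-in decoder: one linear projection back to sample space.

    Only illustrates where decoding occurs in the flow; a real decoder
    would be a trained neural network, not a random projection.
    """
    return projection @ combined

rng = np.random.default_rng(1)
first, second = rng.normal(size=64), rng.normal(size=64)
projection = rng.normal(size=(256, 64))  # illustrative decoder weights
output_sample = decode(combine_embeddings(first, second), projection)
```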
In claim 9, Cho discloses obtaining, by one or more of the processors, a training audio sample; obtaining, by one or more of the processors, training acoustic feature data comprising a value for one or more audio characteristics of the training audio sample; generating, by one or more of the processors, a first training latent embedding based upon processing the training acoustic feature data using the acoustic machine learning model; generating, by one or more of the processors, a second training latent embedding based upon processing the training audio sample using an encoder machine learning model; determining, by one or more of the processors, a value of a loss function, wherein the loss function comprises an acoustic loss term based upon a comparison between the first and second training latent embeddings; and updating, by one or more of the processors, the acoustic machine learning model based upon the value of the loss function (paragraphs 19, 20, 51-58, 88-108).
In claim 10, Cho discloses that the first training latent embedding is selected from the codebook of latent embeddings based upon processing of the acoustic feature data using the acoustic machine learning model, and wherein the second training latent embedding is selected from the codebook of latent embeddings based upon the processing of the training audio sample using the encoder machine learning model (paragraphs 19, 20, 51-58, 88-108).
In claim 11, Cho discloses updating the codebook of latent embeddings based upon the value of the loss function (paragraph 100).
In claim 12, Cho discloses generating a reconstruction of the training audio sample based upon processing the second training latent embedding using the decoder machine learning model, wherein the loss function further comprises a reconstruction loss term based upon a comparison between the training audio sample and the reconstruction of the training audio sample, and updating the decoder machine learning model based upon the value of the loss function (paragraph 100).
In claim 13, Cho discloses updating the encoder machine learning model based upon the loss function (paragraph 100).
In claim 14, Cho discloses quantizing the output of the processing by the encoder machine learning model, wherein generating the second training latent embedding is based upon the quantized output (paragraph 40).
In claim 15, Cho discloses that the loss function further comprises a quantization loss term (paragraphs 99-100).
In claim 16, Cho discloses that the quantization loss term is based upon a comparison between the second training latent embedding from a current training iteration and the second training latent embedding from a previous training iteration (paragraphs 19, 20, 51-58, 88-108).
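By way of illustration only, and not as a mapping to Cho's specific equations, the loss terms recited in claims 9-16 resemble the standard VQ-VAE training objective: a reconstruction term, a quantization (codebook) term comparing embeddings, and a commitment-style term. The NumPy sketch below shows that generic objective; all names, shapes, and the beta weight are illustrative assumptions.

```python
import numpy as np

def vq_training_loss(sample, reconstruction, encoder_embedding,
                     codebook_embedding, beta=0.25):
    """Illustrative VQ-VAE-style training loss with three terms.

    In a real framework the codebook term would stop gradients to the
    encoder and the commitment term would stop gradients to the codebook;
    plain NumPy cannot express that, so only scalar values are shown.
    """
    # Reconstruction term: training sample vs. its decoded reconstruction.
    reconstruction_loss = np.mean((sample - reconstruction) ** 2)
    # Quantization (codebook) term: selected code vs. encoder output.
    quantization_loss = np.mean((encoder_embedding - codebook_embedding) ** 2)
    # Commitment term: keeps the encoder output close to its chosen code.
    commitment_loss = beta * quantization_loss
    return reconstruction_loss + quantization_loss + commitment_loss

rng = np.random.default_rng(3)
sample = rng.normal(size=256)
reconstruction = sample + 0.1 * rng.normal(size=256)  # toy reconstruction
enc_emb, code_emb = rng.normal(size=64), rng.normal(size=64)
loss = vq_training_loss(sample, reconstruction, enc_emb, code_emb)
```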
In claim 20, Cho discloses that the combining of the first and second latent embeddings is based upon a weighted sum of the first and second latent embeddings (it is noted by the examiner that “weighted” might simply mean that all weights are 1, or for that matter, 0; in this case, Cho uses each latent embedding for each feature, combining them to form a text summary of the audio).
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Cho in view of McCoy, and further in view of Dorn (US 20240261685).
In claim 2, Cho in view of McCoy fails to disclose at least one value modified from the values corresponding to an existing audio sample, the modified values based upon a desired change in the corresponding audio characteristic of the existing audio sample; however, Dorn discloses this limitation (paragraph 33). It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine Cho in view of McCoy with Dorn in order to allow for audio changes in accordance with changes to the specification of sound-producing elements within the graphical scene.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS HAYNES HENRY whose telephone number is (571)270-3905. The examiner can normally be reached M-F 10-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Peter Vasat, can be reached at 571-270-7625. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/THOMAS H HENRY/ Primary Examiner, Art Unit 3715