Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
2. The information disclosure statement (IDS) submitted on July 11, 2024, is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 102
3. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
4. Claims 1-30 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yang (U.S. Publication No. 20210089863).
Regarding claim 1, Yang discloses a device comprising:
a memory ([0033] - memory);
and one or more processors coupled to the memory and operably configured to ([0033] - The device 102 includes one or more processors…):
generate a first input data state for data samples in a time series of data samples of a portion of an audio data stream ([0040] - The processor 104 (e.g., the autoencoder 130) receives a sequence 120 of input data corresponding to sequential values of a signal to be encoded, such as the audio data 109);
provide the first input data state to a first bottleneck and a second input data state, different from the first input data state, to a second bottleneck, the first bottleneck associated with a first bitrate and the second bottleneck associated with a second bitrate ([0039] - The bit-rate of compression may be controlled by the dimension of the bottleneck together with the codebook size in each dimension. [0048] - A neural network 410, such as a fully-connected neural network, is configured to receive the data 460 and generate data 462 (e.g., bottleneck));
and generate a first encoded frame based on a first output data state from the first bottleneck and a second encoded frame based on a second output data state from the second bottleneck, the first encoded frame and the second encoded frame bundled in a packet ([0052] - an encoder/transmitter device that includes the encoder portion 132 may be configured to encode sequential data (e.g., audio data or video data) and to transmit the encoded data (e.g., a sequence of encoded data that includes a version of the data 462 that is generated for each frame of sequential input data)).
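For illustration only, the dual-bottleneck arrangement recited in claim 1 may be sketched as follows. This is a hypothetical sketch of the claim language, not Yang's or applicant's actual implementation; the choice of PyTorch, the layer sizes, and the names (DualBottleneckEncoder, low_rate_bottleneck, and so on) are assumptions made solely for the example.

```python
import torch
import torch.nn as nn

class DualBottleneckEncoder(nn.Module):
    def __init__(self, feat_dim=80, hidden_dim=128,
                 low_rate_dim=8, high_rate_dim=32):
        super().__init__()
        # Shared recurrent front end produces one hidden "input data state" per frame.
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Two bottlenecks with different widths, standing in for different bitrates.
        self.low_rate_bottleneck = nn.Linear(2 * hidden_dim, low_rate_dim)
        self.high_rate_bottleneck = nn.Linear(2 * hidden_dim, high_rate_dim)

    def forward(self, frames):
        # frames: (batch, time, feat_dim) slice of an audio feature stream.
        states, _ = self.gru(frames)
        first_state, second_state = states[:, 0], states[:, 1]
        # Each input data state is routed to its own bottleneck.
        low_rate_code = self.low_rate_bottleneck(first_state)
        high_rate_code = self.high_rate_bottleneck(second_state)
        # The two "encoded frames" are bundled into a single packet.
        return {"predicted_frame": low_rate_code, "reference_frame": high_rate_code}


# Example: two consecutive 80-dimensional feature frames bundled in one packet.
packet = DualBottleneckEncoder()(torch.randn(1, 2, 80))
```

The differing bottleneck widths stand in for the differing bitrates; in practice the codes would also be quantized against codebooks of different sizes, consistent with Yang's paragraph [0039].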
Regarding claim 2, Yang discloses the device, wherein the first bottleneck and the second bottleneck are integrated into a bottleneck layer of a feedback autoencoder ([0019] - a recurrent autoencoder architecture, referred to as a feedback recurrent autoencoder).
Regarding claim 3, Yang discloses the device, wherein the first and second input data states correspond to first and second encoder hidden states generated at a bidirectional gated recurrent unit (GRU) layer of the feedback autoencoder ([0035] - The first neural network 133 includes one or more convolutional neural networks, one or more fully-connected neural networks, one or more gated recurrent units (GRUs)…).
Regarding claim 4, Yang discloses the device, wherein the first bitrate is distinct from the second bitrate ([0039] - The bit-rate of compression may be controlled by the dimension of the bottleneck together with the codebook size in each dimension).
Regarding claim 5, Yang discloses the device, wherein the one or more processors are operably configured to allocate a smaller number of bits to latent codes generated at the first bottleneck than latent codes generated at the second bottleneck ([0064] - The clockwork recurrent neural networks enable further reduced bit-rate (e.g., higher compression) by having codes that [have] different time-scales, providing multi-time compression (e.g., latent variables with a time-scale hierarchy)).
Regarding claim 6, Yang discloses the device, wherein a first codebook associated with the first bottleneck has a smaller size than a second codebook associated with the second bottleneck ([0039] - The bit-rate of compression may be controlled by the dimension of the bottleneck together with the codebook size in each dimension).
Regarding claim 7, Yang discloses the device, wherein the packet comprises a predicted frame and a reference frame, wherein a first input data state, associated with the predicted frame, is provided to the first bottleneck, and wherein a second input data state, associated with the reference frame, is provided to the second bottleneck ([0026] - Assuming that a decoder hidden state contains a prediction (e.g., an extrapolated estimate) of a next frame based on previously decoded frames, the encoder portion can use the prediction to form a compact set of codes that carry the information regarding a residual corresponding to a difference between the data associated with the particular time frame (e.g., x.sub.t+1) and the prediction for the particular time frame).
Regarding claim 8, Yang discloses the device, wherein a bit size of the predicted frame is less than a bit size of the reference frame ([0061] - exhibits lower distortion using the same output data size (e.g., at the same compression rate) as the first autoencoder architecture 602, a smaller output data size (e.g., a higher compression rate) at the same distortion, or a combination thereof).
Regarding claim 9, Yang discloses the device, wherein the input data state for each frame of the packet is generated using an attention mechanism ([0091] - the core network may include or correspond to a Public Switched Telephone Network (PSTN), a packet backbone network, or both).
Regarding claim 10, Yang discloses the device, wherein the attention mechanism comprises a transformer ([0072] - A wide-band (16 KHz) audiobook is used having spectrograms computed from square-root Hanning windowed short-time Fourier transform (STFT) with step size of 160 and window size (same as fast Fourier transform (FFT)-size) of 320…).
Regarding claim 11, Yang discloses the device, wherein the one or more processors are operably configured to dynamically change the first bitrate and the second bitrate based on network conditions ([0065] - generated (e.g., learned) based on the latent (e.g., corresponding to the quantizer 134), enabling variable bit-rate compression. For example, a probabilistic model may be generated based on a frequency that each value of the quantized latent occurs).
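For illustration only, the bitrate adaptation recited in claim 11 amounts to choosing different code sizes for the two bottlenecks as network conditions change. The following minimal sketch is hypothetical; the thresholds, rates, and the function name select_bitrates are assumptions, not taken from Yang or the application.

```python
# Hypothetical sketch only: pick per-bottleneck bitrates from an estimate of
# the available bandwidth. Thresholds and rates are illustrative assumptions.
def select_bitrates(estimated_bandwidth_kbps: float) -> tuple[int, int]:
    """Return (first_bitrate_kbps, second_bitrate_kbps) for the two bottlenecks."""
    if estimated_bandwidth_kbps < 16:
        return 3, 6    # constrained network: small latent codes for both frames
    if estimated_bandwidth_kbps < 64:
        return 6, 12   # moderate network
    return 12, 24      # ample bandwidth: larger latent codes


print(select_bitrates(24.0))  # -> (6, 12)
```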
Regarding claim 12, Yang discloses a method comprising:
generating a first input data state for data samples in a time series of data samples of a portion of an audio data stream ([0040] - The processor 104 (e.g., the autoencoder 130) receives a sequence 120 of input data corresponding to sequential values of a signal to be encoded, such as the audio data 109);
providing the first input data state to a first bottleneck and a second input data state, different from the first input data state, to a second bottleneck, the first bottleneck associated with a first bitrate and the second bottleneck associated with a second bitrate ([0039] - The bit-rate of compression may be controlled by the dimension of the bottleneck together with the codebook size in each dimension. [0048] - A neural network 410, such as a fully-connected neural network, is configured to receive the data 460 and generate data 462 (e.g., bottleneck));
and generating a first encoded frame based on a first output data state from the first bottleneck and a second encoded frame based on a second output data state from the second bottleneck, the first encoded frame and the second encoded frame bundled in a packet ([0052] - an encoder/transmitter device that includes the encoder portion 132 may be configured to encode sequential data (e.g., audio data or video data) and to transmit the encoded data (e.g., a sequence of encoded data that includes a version of the data 462 that is generated for each frame of sequential input data)).
Dependent claims 13-20 are analogous in scope to claims 2-7, 9, and 11, and are rejected according to the same reasoning.
Regarding claim 21, Yang discloses a device comprising:
a memory ([0033] - memory);
and one or more processors coupled to the memory and operably configured to ([0033] - The device 102 includes one or more processors…):
receive, at a decoder network, a packet that includes a first encoded frame bundled with a second encoded frame, the first encoded frame comprising a first output data state generated from a first bottleneck of a feedback autoencoder, the second encoded frame comprising a second output data state generated from a second bottleneck of the feedback autoencoder, wherein the first bottleneck is associated with a first bitrate and the second bottleneck is associated with a second bitrate ([0020] - An autoencoder includes an encoder portion and a decoder portion. [0026] - The decoder portion can combine the prediction together with the codes to form the reconstruction output as well as updating an extrapolation at the decoder portion for the next time frame, forming an end-to-end trainable predictive coder. [0039] - The bit-rate of compression may be controlled by the dimension of the bottleneck together with the codebook size in each dimension. [0048] - A neural network 410, such as a fully-connected neural network, is configured to receive the data 460 and generate data 462 (e.g., bottleneck));
generate a reconstructed first data sample based on the first output data state, the reconstructed first data sample corresponding to a first data sample in a time series of data samples of a portion of an audio data stream ([0026] - The decoder portion can combine the prediction together with the codes to form the reconstruction output as well as updating an extrapolation at the decoder portion for the next time frame, forming an end-to-end trainable predictive coder [0040] - The processor 104 (e.g., the autoencoder 130) receives a sequence 120 of input data corresponding to sequential values of a signal to be encoded, such as the audio data 109);
and generate a reconstructed second data sample based on the second output data state, the reconstructed second data sample corresponding to a second data sample in the time series of data samples ([0026] - The decoder portion can combine the prediction together with the codes to form the reconstruction output as well as updating an extrapolation at the decoder portion for the next time frame, forming an end-to-end trainable predictive coder [0040] - The processor 104 (e.g., the autoencoder 130) receives a sequence 120 of input data corresponding to sequential values of a signal to be encoded, such as the audio data 109).
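For illustration only, a decoder-side counterpart to the encoder sketch given after claim 1 could unpack the bundled packet and reconstruct one data sample per encoded frame, roughly as follows. Again, the framework, layer sizes, and names are assumptions made for the example, not the reference's or applicant's implementation.

```python
import torch
import torch.nn as nn

class DualBottleneckDecoder(nn.Module):
    def __init__(self, feat_dim=80, hidden_dim=128,
                 low_rate_dim=8, high_rate_dim=32):
        super().__init__()
        # Each bottleneck code is expanded back to a common hidden size.
        self.expand_low = nn.Linear(low_rate_dim, hidden_dim)
        self.expand_high = nn.Linear(high_rate_dim, hidden_dim)
        # Shared output head maps hidden states to reconstructed feature frames.
        self.reconstruct = nn.Sequential(nn.ReLU(), nn.Linear(hidden_dim, feat_dim))

    def forward(self, packet):
        # The packet bundles one encoded frame from each bottleneck.
        first_out = self.expand_low(packet["predicted_frame"])
        second_out = self.expand_high(packet["reference_frame"])
        # One reconstructed data sample per encoded frame in the packet.
        return self.reconstruct(first_out), self.reconstruct(second_out)


# Dummy packet with an 8-dimensional and a 32-dimensional latent code.
packet = {"predicted_frame": torch.randn(1, 8),
          "reference_frame": torch.randn(1, 32)}
first_sample, second_sample = DualBottleneckDecoder()(packet)
```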
Regarding claim 22, Yang discloses the device, wherein the first output data state is distinct from the second output data state ([0037] - The first state data 150 and the second state data 152 correspond to a state of the decoder portion 136 resulting from generation of the representation 146 for one set of input data 140).
Dependent claims 23-26 are analogous in scope to claims 3-4 and 9-10, and are rejected according to the same reasoning.
Regarding claim 27, Yang discloses a method comprising:
receiving, at a decoder network, a packet that includes a first encoded frame bundled with a second encoded frame, the first encoded frame comprising a first output data state generated from a first bottleneck of a feedback autoencoder, the second encoded frame comprising a second output data state generated from a second bottleneck of the feedback autoencoder, wherein the first bottleneck is associated with a first bitrate and the second bottleneck is associated with a second bitrate ([0020] - An autoencoder includes an encoder portion and a decoder portion. [0026] - The decoder portion can combine the prediction together with the codes to form the reconstruction output as well as updating an extrapolation at the decoder portion for the next time frame, forming an end-to-end trainable predictive coder. [0039] - The bit-rate of compression may be controlled by the dimension of the bottleneck together with the codebook size in each dimension. [0048] - A neural network 410, such as a fully-connected neural network, is configured to receive the data 460 and generate data 462 (e.g., bottleneck));
generating a reconstructed first data sample based on the first output data state, the reconstructed first data sample corresponding to a first data sample in a time series of data samples of a portion of an audio data stream ([0026] - The decoder portion can combine the prediction together with the codes to form the reconstruction output as well as updating an extrapolation at the decoder portion for the next time frame, forming an end-to-end trainable predictive coder [0040] - The processor 104 (e.g., the autoencoder 130) receives a sequence 120 of input data corresponding to sequential values of a signal to be encoded, such as the audio data 109);
and generating a reconstructed second data sample based on the second output data state, the reconstructed second data sample corresponding to a second data sample in the time series of data samples ([0026] - The decoder portion can combine the prediction together with the codes to form the reconstruction output as well as updating an extrapolation at the decoder portion for the next time frame, forming an end-to-end trainable predictive coder [0040] - The processor 104 (e.g., the autoencoder 130) receives a sequence 120 of input data corresponding to sequential values of a signal to be encoded, such as the audio data 109).
Dependent claims 28-30 are analogous in scope to claims 22 and 9-10, and are rejected according to the same reasoning.
Conclusion
5. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Bentaleb (U.S. Publication No. 20220030308) discloses a method and device for streaming content. Ragot (U.S. Publication No. 20210258363) discloses bitrate adaptation of a voice-over-IP communication session.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ETHAN DANIEL KIM whose telephone number is (571) 272-1405. The examiner can normally be reached on Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached on (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ETHAN DANIEL KIM/
Examiner, Art Unit 2658
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658