DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 04/19/2024, 08/14/2024, and 04/07/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 2 and 12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 2 and 12 recite the limitation "the first subband signal" in the last line of each claim. There is insufficient antecedent basis for this limitation in the claims.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. More specifically, the claims are directed to the abstract idea groupings of mathematical concepts and/or mental processes.
Independent claims 1, 11, and 20 recite:
1. An audio processing method, performed by an electronic device, comprising:
performing multichannel signal decomposition on an audio signal to obtain N subband signals of the audio signal, frequency bands of the N subband signals increasing sequentially and N being an integer greater than 2;
performing signal compression on each subband signal of the N subband signals to obtain a subband signal feature of each subband signal; and
performing quantization encoding on the subband signal feature of each subband signal to obtain a bitstream of each subband signal.
11. An audio processing apparatus comprising:
at least one memory configured to store program code; and
at least one processor configured to read the program code and operate as instructed by the program code, the program code comprising:
decomposition code configured to cause at least one of the at least one processor to perform multichannel signal decomposition on an audio signal to obtain N subband signals of the audio signal, frequency bands of the N subband signals increasing sequentially and N being an integer greater than 2;
compression code configured to cause at least one of the at least one processor to perform signal compression on each subband signal of the N subband signals to obtain a subband signal feature of each subband signal; and
encoding code configured to cause at least one of the at least one processor to perform quantization encoding on the subband signal feature of each subband signal to obtain a bitstream of each subband signal.
20. A non-transitory computer-readable storage medium, storing computer code which, when executed by at least one processor, causes the at least one processor to at least:
[the limitations as in claim 1, above].
This reads on a human (e.g., mentally and/or using pen and paper):
Performing mathematical operations (e.g., mathematical code/algorithm/equation) for decomposing / splitting a signal into a predefined number (greater than 2) of subsignals/frames/subbands, wherein the subsignals/frames/subbands are associated with different frequencies (e.g., 0 - 100 Hz, 100 - 1000 Hz, 1 kHz - 10 kHz);
Performing a predetermined set of steps (e.g., mathematical code/algorithm/equation) for compression / downsampling on the subsignals/frames/subbands to obtain features for each; and
Performing a predetermined set of steps (e.g., mathematical code/algorithm/equation) for quantization / mapping into a smaller set of discrete values.
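For illustration only, the three recited operations can be reduced to a short, purely hypothetical Python/NumPy sketch (the filter design, band edges, feature size, and quantizer step below are invented for this example and are not the applicant's disclosed implementation):

    import numpy as np
    from scipy.signal import butter, sosfilt

    def decompose(x, fs, edges):
        """Split x into len(edges) subbands whose frequency bands increase."""
        subbands = []
        for i, lo in enumerate(edges):
            if i == 0:
                sos = butter(4, edges[1], btype="low", fs=fs, output="sos")
            elif i == len(edges) - 1:
                sos = butter(4, lo, btype="high", fs=fs, output="sos")
            else:
                sos = butter(4, [lo, edges[i + 1]], btype="band", fs=fs, output="sos")
            subbands.append(sosfilt(sos, x))
        return subbands

    def compress(sub, dim=32):
        """Toy 'subband signal feature': block averages down to dim values."""
        n = len(sub) // dim
        return sub[: n * dim].reshape(dim, n).mean(axis=1)

    def quantize_encode(feat, step=0.05):
        """Scalar quantization to integer index values (a stand-in bitstream)."""
        return np.round(feat / step).astype(np.int32)

    fs = 16000
    x = np.random.randn(fs)                 # 1 s of test audio
    edges = [0, 1000, 4000]                 # N = 3 bands, increasing in frequency
    bitstreams = [quantize_encode(compress(s)) for s in decompose(x, fs, edges)]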
This judicial exception is not integrated into a practical application because, for example: claim 1 recites “an electronic device”; claim 11 recites “an audio processing apparatus,” “memory,” “program code,” “processor,” “decomposition code,” “compression code,” and “encoding code”; and claim 20 recites “a non-transitory computer-readable storage medium.” As an example, ¶ [0055] of the as-filed specification discloses: “The processor 520 may be an integrated circuit chip with a signal processing capability, for example, a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, any conventional processor, or the like.” Therefore, a general-purpose computer or computing device is described and is merely used as a tool to apply the abstract idea. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a computer amounts to no more than a generic, general-purpose computing device. The claims are not patent eligible.
With respect to claims 2 and 12, the claim(s) recite:
2 and 12. The audio processing method/apparatus according to claims 1 and 11,
wherein feature dimensionality of the subband signal feature of the subband signal is not positively correlated with a frequency band of the subband signal, and feature dimensionality of a subband signal feature of an Nth subband signal is lower than feature dimensionality of a subband signal feature of the first subband signal.
This reads on a human (e.g., mentally and/or using pen and paper):
Determining whether the dimensions of the features of the subsignals/frames/subbands do not positively correlate with a frequency band (e.g., 0 - 100 Hz, 100 - 1000 Hz, 1 kHz - 10 kHz) of the subsignals/frames/subbands, and
Determining whether the dimensions of the features of the last subsignal/frame/subband are lower than the dimensions of the features of the first subsignal/frame/subband.
No additional limitations are present.
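As a purely hypothetical numeric illustration of this mapping: with N = 4 subbands ordered from lowest to highest frequency, a feature-dimensionality allocation of 64, 32, 32, and 16 satisfies both recited conditions, since dimensionality never increases with frequency and the 4th (highest) subband's 16 dimensions are lower than the 1st subband's 64.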
With respect to claims 3 and 13, the claim(s) recite:
3 and 13. The audio processing method/apparatus according to claims 1 and 11,
wherein the multichannel signal decomposition is implemented through multi-layer two-channel subband decomposition and the performing multichannel signal decomposition comprises:
performing first-layer two-channel subband decomposition on the audio signal to obtain a first-layer low-frequency subband signal and a first-layer high-frequency subband signal;
performing an (i+1)th-layer two-channel subband decomposition on an ith-layer subband signal to obtain an (i+1)th-layer low-frequency subband signal and an (i+1)th-layer high-frequency subband signal, the ith-layer subband signal being an ith-layer low-frequency subband signal, or the ith-layer subband signal being an ith-layer high-frequency subband signal and an ith-layer low-frequency subband signal, and i being an increasing natural number with a value range of 1≤i<N; and
determining a last-layer subband signal and a high-frequency subband signal at each layer of the multi-layers that has not undergone the two-channel subband decomposition as subband signals of the audio signal.
This reads on a human (e.g., mentally and/or using pen and paper):
Performing mathematical operations (e.g., mathematical code/algorithm/equation) for decomposing / splitting a signal into a predefined number (greater than 2) of subsignals/frames/subbands, wherein the operations comprise:
Performing the operations to obtain a first low-frequency subsignal/frame/subband and a first high-frequency subsignal/frame/subband;
Performing the operations again to obtain a second low-frequency subsignal/frame/subband and a second high-frequency subsignal/frame/subband; and
Performing the operations until all subsignals/frames/subbands are processed.
No additional limitations are present.
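A minimal sketch of the mapping above, assuming a Haar analysis pair and splitting only the low band at each layer (the claim also permits splitting both bands; the filter choice here is invented for illustration):

    import numpy as np

    def two_channel_split(x):
        """One layer of two-channel decomposition (Haar analysis pair):
        scaled sums of sample pairs form the low band, scaled differences
        form the high band, each at half the input length."""
        x = x[: len(x) // 2 * 2]
        low = (x[0::2] + x[1::2]) / np.sqrt(2)
        high = (x[0::2] - x[1::2]) / np.sqrt(2)
        return low, high

    def multilayer_decompose(x, layers=3):
        """Split the low band repeatedly; keep each layer's high band plus the
        last-layer low band, giving layers+1 subbands of increasing frequency."""
        highs, low = [], x
        for _ in range(layers):
            low, high = two_channel_split(low)
            highs.append(high)
        return [low] + highs[::-1]          # lowest band first

    subbands = multilayer_decompose(np.random.randn(1024))   # N = 4 subbands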
With respect to claims 4 and 14, the claim(s) recite:
4 and 14. The audio processing method/apparatus according to claims 3 and 13,
wherein performing the first-layer two-channel subband decomposition comprises:
sampling the audio signal to obtain a sampled signal, the sampled signal comprising a plurality of sample points obtained through sampling;
performing first-layer low-pass filtering on the sampled signal to obtain a first-layer low-pass filtered signal;
downsampling the first-layer low-pass filtered signal to obtain the first-layer low-frequency subband signal;
performing first-layer high-pass filtering on the sampled signal to obtain a first-layer high-pass filtered signal; and
downsampling the first-layer high-pass filtered signal to obtain the first-layer high-frequency subband signal.
This reads on a human (e.g., mentally and/or using pen and paper):
Wherein the performing of the operations to obtain a first low-frequency subsignal/frame/subband and a first high-frequency subsignal/frame/subband comprises:
Sampling (e.g., selecting a number of samples/values) the signal;
Performing operations to obtain a first low-frequency subsignal/frame/subband;
Changing the sampling (e.g., selecting a lower number of samples/values) of the signal;
Performing operations to obtain a first high-frequency subsignal/frame/subband; and
Changing the sampling (e.g., selecting a lower number of samples/values) of the signal.
No additional limitations are present.
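The first-layer steps mapped above can likewise be sketched with generic half-band filters (the Butterworth design is an assumption made here for illustration; the claim does not specify a filter type):

    import numpy as np
    from scipy.signal import butter, sosfilt

    def first_layer_split(x, fs):
        """Low-pass filter then downsample by 2 (low band); high-pass filter
        then downsample by 2 (high band), per the recited step order."""
        sos_lo = butter(8, fs / 4, btype="low", fs=fs, output="sos")
        sos_hi = butter(8, fs / 4, btype="high", fs=fs, output="sos")
        low = sosfilt(sos_lo, x)[::2]       # downsampling keeps every other sample
        high = sosfilt(sos_hi, x)[::2]
        return low, high

    fs = 32000
    sampled = np.random.randn(fs)           # stands in for the recited sampled signal
    low_band, high_band = first_layer_split(sampled, fs)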
With respect to claims 5 and 15, the claim(s) recite:
5 and 15. The audio processing method/apparatus according to claims 1 and 11,
wherein the performing signal compression on each subband signal comprises:
performing the following processing on each subband signal:
calling a first neural network model corresponding to the subband signal; and
performing feature extraction on the subband signal through the first neural network model to obtain the subband signal feature of the subband signal, structural complexity of the first neural network model being positively correlated with dimensionality of the subband signal feature of the subband signal.
This reads on a human (e.g., mentally and/or using pen and paper):
Wherein the performing of a predetermined set of steps (e.g., mathematical code/algorithm/equation for compression / downsampling) on the subsignals/frames/subbands to obtain features for each comprises:
Performing the following steps on each:
Using a predetermined set of steps (i.e., model) corresponding to the subsignals/frames/subbands;
Using the predetermined set of steps (i.e., model) to extract features / values from the subsignals/frames/subbands, wherein the structural complexity of the model is positively correlated (e.g., a mathematical concept) with the dimensionality of the features / values of the subsignals/frames/subbands.
No additional limitations are present.
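A self-contained toy sketch of "calling a first neural network model corresponding to the subband signal" (all sizes are invented; the only point is that a wider model, i.e., greater structural complexity, is paired with a larger feature dimensionality):

    import numpy as np

    def make_model(feat_dim, width):
        """Toy one-hidden-layer network; width (complexity) grows with feat_dim."""
        w1 = np.random.randn(width, 256) * 0.01
        w2 = np.random.randn(feat_dim, width) * 0.01
        return lambda x: w2 @ np.tanh(w1 @ x)

    # One model per subband; lower bands get wider models and larger features.
    models = [make_model(64, 512), make_model(32, 256), make_model(16, 128)]
    features = [m(np.random.randn(256)) for m in models]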
With respect to claims 6 and 16, the claim(s) recite:
6 and 16. The audio processing method/apparatus according to claims 5 and 15,
wherein the performing feature extraction on the subband signal through the first neural network model comprises:
performing the following processing on the subband signal through the first neural network model:
performing convolution on the subband signal to obtain a convolution feature of the subband signal;
performing pooling on the convolution feature to obtain a pooling feature of the subband signal;
downsampling the pooling feature to obtain a downsampling feature of the subband signal; and
performing convolution on the downsampling feature to obtain the subband signal feature of the subband signal.
This reads on a human (e.g., mentally and/or using pen and paper):
Wherein using the predetermined set of steps (i.e., model) to extract features / values from the subsignals/frames/subbands comprises:
Performing the following steps on each:
Using a predetermined set of steps (i.e., model) corresponding to the subsignals/frames/subbands by using convolution (i.e., mathematical concept) to obtain convolution features / values;
Pooling (e.g., averaging / selecting among samples/values) the convolution features / values;
Changing the sampling (e.g., selecting a lower number of samples/values) of the features / values; and
Using convolution (i.e., mathematical concept) again to obtain the new features / values of the subsignals/frames/subbands.
No additional limitations are present.
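A hypothetical PyTorch sketch of the recited per-subband layer order (layer sizes, kernel widths, and the trailing temporal average are invented for illustration):

    import torch
    import torch.nn as nn

    class SubbandEncoder(nn.Module):
        """Convolution -> pooling -> downsampling -> convolution, as recited."""
        def __init__(self, feat_dim=32):
            super().__init__()
            self.conv1 = nn.Conv1d(1, 16, kernel_size=5, padding=2)   # convolution
            self.pool = nn.AvgPool1d(kernel_size=2)                   # pooling
            self.down = nn.Conv1d(16, 16, kernel_size=4, stride=4)    # downsampling
            self.conv2 = nn.Conv1d(16, feat_dim, kernel_size=3, padding=1)

        def forward(self, subband):              # subband: (batch, 1, samples)
            h = torch.relu(self.conv1(subband))
            h = self.pool(h)
            h = torch.relu(self.down(h))
            return self.conv2(h).mean(dim=-1)    # (batch, feat_dim) feature

    feature = SubbandEncoder()(torch.randn(1, 1, 1024))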
With respect to claims 7 and 17, the claim(s) recite:
7 and 17. The audio processing method/apparatus according to claims 1 and 11,
wherein the performing signal compression comprises:
separately performing feature extraction on a first k subband signals of the N subband signals to obtain subband signal features respectively corresponding to the first k subband signals, k being an integer within a value range of 1<k<N; and
separately performing bandwidth extension on a last N–k subband signals of the N subband signals to obtain subband signal features respectively corresponding to the last N–k subband signals.
This reads on a human (e.g., mentally and/or using pen and paper):
Wherein the performing of a predetermined set of steps (e.g., mathematical code/algorithm/equation for compression / downsampling) on the subsignals/frames/subbands to obtain features for each comprises:
Obtaining the features of each of the first subsignals/frames/subbands individually (e.g., one at a time); and
Extending the bandwidth (i.e., a predetermined number) of each of the remaining subsignals/frames/subbands individually (e.g., one at a time) to obtain their features.
No additional limitations are present.
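The recited split between the first k and last N-k subbands can be sketched as follows (the stand-in feature extractor and envelope function are hypothetical; a DCT-based envelope matching claims 8 and 18 is sketched further below):

    import numpy as np

    def extract_features(sub, dim=32):          # stand-in model-based extraction
        n = len(sub) // dim
        return sub[: n * dim].reshape(dim, n).mean(axis=1)

    def spectral_envelope(sub, bands=8):        # stand-in bandwidth-extension feature
        power = np.abs(np.fft.rfft(sub)) ** 2
        return np.array([g.mean() for g in np.array_split(power, bands)])

    subbands = [np.random.randn(512) for _ in range(4)]   # N = 4
    k = 2                                                  # 1 < k < N
    features = ([extract_features(s) for s in subbands[:k]]
                + [spectral_envelope(s) for s in subbands[k:]])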
With respect to claims 8 and 18, the claim(s) recite:
8 and 18. The audio processing method/apparatus according to claims 7 and 17,
wherein the separately performing bandwidth extension on the last N–k subband signals of the N subband signals comprises:
performing the following processing on each of the last N–k subband signals:
performing frequency domain transform based on a plurality of sample points comprised in the subband signal to obtain transform coefficients respectively corresponding to the plurality of sample points;
dividing the transform coefficients respectively corresponding to the plurality of sample points into a plurality of subbands;
performing mean processing on a transform coefficient comprised in each subband to obtain average energy corresponding to each subband, and using the average energy as a subband spectral envelope of a corresponding subband; and
determining a subband spectral envelope of each subband as a subband signal feature of the subband signal.
This reads on a human (e.g., mentally and/or using pen and paper):
Wherein the extending of the bandwidth (i.e., a predetermined number) of each of the remaining subsignals/frames/subbands individually (e.g., one at a time) to obtain their features comprises:
Performing the following steps on each subsignal/frame/subband:
Performing/computing a frequency domain transform (e.g., Fourier transform) based on the samples from the subsignals/frames/subbands to obtain sample coefficients;
Dividing the sample coefficients into a plurality of groups/subbands;
Performing mean processing (i.e., mathematical concept) on the coefficients in each group to obtain an average energy as an envelope/window (i.e., mathematical concept); and
Determining the average-energy envelope/window (i.e., mathematical concept) of each group as the feature for each of the subsignals/frames/subbands.
No additional limitations are present.
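The envelope computation mapped above reduces to a few lines (the DCT type, normalization, and group count are assumptions for illustration):

    import numpy as np
    from scipy.fft import dct

    def subband_envelope(sub, n_groups=8):
        """Transform the sample points, divide the coefficients into subbands,
        and average the energy in each group to form the spectral envelope."""
        coeffs = dct(sub, type=2, norm="ortho")       # transform coefficients
        groups = np.array_split(coeffs, n_groups)     # divide into subbands
        return np.array([np.mean(g ** 2) for g in groups])   # mean energy

    envelope = subband_envelope(np.random.randn(512))  # used as the feature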
With respect to claims 9 and 19, the claim(s) recite:
9 and 19. The audio processing method/apparatus according to claims 8 and 18,
wherein the performing frequency domain transform comprises:
obtaining a reference subband signal of a reference audio signal, the reference audio signal being an audio signal adjacent to the audio signal, and a frequency band of the reference subband signal being the same as that of the subband signal; and
performing, based on a plurality of sample points comprised in the reference subband signal and the plurality of sample points comprised in the subband signal, discrete cosine transform on the plurality of sample points comprised in the subband signal to obtain the transform coefficients respectively corresponding to the plurality of sample points comprised in the subband signal.
This reads on a human (e.g., mentally and/or using pen and paper):
Wherein performing/computing the frequency domain transform (e.g., Fourier transform) based on the samples from the subsignals/frames/subbands to obtain sample coefficients comprises:
Obtaining or identifying a reference signal (i.e., predetermined) and a frequency band/range of said reference signal; and
Performing a discrete cosine transform (i.e., mathematical concept) on the plurality of sample points/ values to obtain transform coefficients.
No additional limitations are present.
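The reference-signal variant resembles an overlapped (MDCT-style) transform; a purely illustrative sketch, with the overlap policy assumed here rather than taken from the claim, is:

    import numpy as np
    from scipy.fft import dct

    def transform_with_reference(current, reference):
        """DCT over the adjacent (reference) frame's samples concatenated with
        the current frame's, keeping coefficients for the current points."""
        window = np.concatenate([reference, current])
        coeffs = dct(window, type=2, norm="ortho")
        return coeffs[: len(current)]

    prev_sub = np.random.randn(256)     # same-band subband of the adjacent signal
    cur_sub = np.random.randn(256)
    coeffs = transform_with_reference(cur_sub, prev_sub)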
With respect to claim 10, the claim(s) recite:
10. The audio processing method according to claim 1,
wherein the performing quantization encoding comprises:
quantizing the subband signal feature of each subband signal to obtain an index value of the subband signal feature; and
performing entropy encoding on the index value of the subband signal feature to obtain a bitstream of the subband signal.
This reads on a human (e.g., mentally and/or using pen and paper):
Wherein the performing of a predetermined set of steps (e.g., mathematical code/algorithm/equation) for quantization / mapping into a smaller set of discrete values comprises:
Obtaining an index value for the signal features; and
Performing a predetermined set of steps (i.e., entropy encoding – mathematical concept) on the index values to obtain a bitstream of the subsignals/frames/subbands.
No additional limitations are present.
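A compact sketch of the mapping above, pairing scalar quantization with a small Huffman coder (one concrete entropy-encoding choice; the claim does not limit the entropy code):

    import heapq
    import numpy as np
    from collections import Counter

    def quantize(features, step=0.1):
        """Map each feature to an integer index value."""
        return np.round(np.asarray(features) / step).astype(int)

    def huffman_table(symbols):
        """Build a prefix-code table from symbol frequencies."""
        heap = [[w, i, {s: ""}] for i, (s, w) in enumerate(Counter(symbols).items())]
        heapq.heapify(heap)
        while len(heap) > 1:
            lo, hi = heapq.heappop(heap), heapq.heappop(heap)
            for s in lo[2]: lo[2][s] = "0" + lo[2][s]
            for s in hi[2]: hi[2][s] = "1" + hi[2][s]
            heapq.heappush(heap, [lo[0] + hi[0], lo[1], {**lo[2], **hi[2]}])
        return heap[0][2]

    indexes = quantize(np.random.randn(64)).tolist()
    table = huffman_table(indexes)
    bitstream = "".join(table[i] for i in indexes)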
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-2, 11-12, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Feng et al. (US 20230154474 A1) in view of Kanagawa (US 20240339105 A1).
As to independent claim 1, Feng et al. teaches:
1. An audio processing method, performed by an electronic device (see ¶ [0006]: “… the present disclosure provides a computer-implemented method for providing high quality audio for playback over a low bit rate network connection in real-time communication. The method is performed by a real-time communication software application…”), comprising:
performing multichannel signal decomposition on an audio signal to obtain N subband signals of the audio signal (see ¶ [0006] citation as in preamble above and further ¶ [0006]: “… The method is performed by a real-time communication software application and includes receiving a stream of audio input data on a sending device; suppressing noise from the stream of audio input data to generate clean audio input data on the sending device; splitting the clean audio input data into a set of frames of audio data on the sending device; standardizing each frame within the set of frames to generate a set of frames of standardized audio data on the sending device, wherein audio data of the frame is resampled according to two frequency ranges corresponding to a wideband mode and a super wideband mode, thereby forming lower sub-band audio data and higher sub-band audio data…”
and ¶ [0030]: “…Accordingly, at 308, the improved encoder 114 decomposes the standardized PCM data of each frame into two sub-bands of audio data…”),
performing signal compression on each subband signal of the N subband signals to obtain a subband signal feature of each subband signal (see ¶ [0035]: “Turning back to FIG. 3, at 312, the improved encoder 114 compresses the extracted set of audio features for each frame using a signal compressor, such as a vector quantization and frame correlation method. In one implementation, the signal compressor is a difference vector quantization (DVQ) method. Alternatively, the signal compressor is a residual vector quantization (RVQ) method. In a further implementation, the compression uses a proper interpolation policy. The compression process is further illustrated by reference to FIG. 5.”); and
performing quantization encoding on the subband signal feature of each subband signal to obtain a bitstream of each subband signal (see ¶ [0006]: “[0006] Generally speaking, pursuant to the various embodiments, the present disclosure provides a computer-implemented method for providing high quality audio for playback over a low bit rate network connection in real-time communication. The method is performed by a real-time communication software application and includes receiving a stream of audio input data on a sending device; suppressing noise from the stream of audio input data to generate clean audio input data on the sending device; splitting the clean audio input data into a set of frames of audio data on the sending device; standardizing each frame within the set of frames to generate a set of frames of standardized audio data on the sending device, wherein audio data of the frame is resampled according to two frequency ranges corresponding to a wideband mode and a super wideband mode, thereby forming lower sub-band audio data and higher sub-band audio data; extracting a set of audio features for each frame within the set of frames of standardized audio data, thereby forming a set of sets of audio features on the sending device; quantizing the set of audio features for each frame within the set of frames of standardized audio data into a compressed set of audio features on the sending device; […] In one implementation, the inverse quantization process is an inverse difference vector quantization (DVQ) method, an inverse residual vector quantization (RVQ) method, or an inverse interpolation method. Quantizing the set of audio features includes compressing the set of audio features of each i-frame within the set of frames using a residual vector quantization (RVQ) method or a difference vector quantization (DVQ) method, wherein there is at least one i-frame with the set of frames; and compressing the set of audio features of each non-i-frames within the set of frames using interpolation. In one implementation, the two frequency ranges are 0 to 16 kHz and 16 kHz to 32 kHz respectively; and the noise is suppressed based on machine learning.”,
¶ [0035]: “Turning back to FIG. 3, at 312, the improved encoder 114 compresses the extracted set of audio features for each frame using a signal compressor, such as a vector quantization and frame correlation method. In one implementation, the signal compressor is a difference vector quantization (DVQ) method. Alternatively, the signal compressor is a residual vector quantization (RVQ) method. In a further implementation, the compression uses a proper interpolation policy. The compression process is further illustrated by reference to FIG. 5.”,
Fig. 6-7 (decoder receiving wideband/superwideband packets), and
¶ [0040]: “Referring now to FIG. 6, a flowchart illustrating a process by which the improved decoder 116 decodes a received packet in super wideband mode and obtains audio data for playback on the receiving device 116 is shown and generally indicated at 600. At 602, the improved decoder 116 receives the audio data packet sent by the sender device 102 at 316. Once the packet is retrieved, at 604, the improved decoder 116 retrieves the set of audio features of each frame from the packet. When the sub-bands are 0 kHz-16 kHz and 16 kHz-32 kHz, the higher sub-band has the sampling frequency range of 16 kHz-32 kHz while the lower sub-band has the other range. For the higher sub-band, the LPC coefficients and energy features (such as the ratios of energy summation between the lower and higher sub-bands) are directly retrieved from the packet.”).
However, Feng et al. does not explicitly teach, but Kanagawa does teach:
frequency bands of the N subband signals increasing sequentially and N being an integer greater than 2 (see Fig. 2 and ¶ [0044]: “FIG. 2 is a diagram illustrating an example of subband signals. The vertical axis in FIG. 2 corresponds to amplitude response, and the horizontal axis corresponds to normalized frequency. FIG. 2 illustrates a case where four subband signals sub1, sub2, sub3, and sub4 are generated by filtering the speech signal (full-band signal). The subband signal sub1 is a low-frequency subband signal. The subband signal sub2 is a low-frequency to mid-frequency subband signal. The subband signal sub3 is a mid-frequency to high-frequency subband signal. The subband signal sub4 is a high-frequency subband signal.”)
Feng et al. and Kanagawa are considered to be analogous to the claimed invention because they are in the same field of endeavor in audio/speech signal processing associated with sub-band generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Feng et al. to incorporate the teachings of Kanagawa of frequency bands of the N subband signals increasing sequentially and N being an integer greater than 2 which provides the benefit of generating speech waveforms at high speed from acoustic feature values ([0015] of Kanagawa).
As to independent claim 11, Feng et al. in combination with Kanagawa teaches the limitations as in claim 1, above.
Feng et al. further teaches:
11. An audio processing apparatus (see Fig. 1 (102, 104: devices), and
¶ [0020]: “Turning to the Figures and to FIG. 1 in particular, a block diagram illustrating a real-time communication (RTC) system is shown and generally indicated at 100. The RTC system includes a set of electronic communication devices, such as those indicated at 102 and 104, adapted to communicate with each other over a network (such as the Internet) 122. In one implementation, the network communication protocol is Transmission Control Protocol (TCP) and the Internet Protocol (IP) (collectively referred to as TCP/IP). The devices 102-104 are also referred to herein as participating devices. The devices 102-104 connect to the Internet 122 via wireless or wired networks, such as Wi-Fi networks and Ethernet networks.”) comprising:
at least one memory configured to store program code (see ¶ [0022]: “Referring to FIG. 2, a block diagram illustrating the wireless communication device 102 is shown. The device 102 includes a processing unit 202, some amount of memory 204 operatively coupled to the processing unit 202, one or more user input interfaces (such as a touch pad, a keyboard, a mouse, etc.) 206 operatively coupled to the processing unit 202, a voice input interface (such as a microphone) 208 operatively coupled to the processing unit 202, a voice output interface (such as a speaker) 210 operatively coupled to the processing unit 202,…”); and
at least one processor configured to read the program code and operate as instructed by the program code (see ¶ [0020 and 0022] citations as in limitations above and further ¶ [0022-0024]: “[0022] …The device 102 also includes an operating system (such as iOS®, Android, etc.) 220 running on the processing unit 202. One or more computer software applications 222-224 are loaded and executed on the device 102. The computer software applications 222-224 are implemented using one or more computer software programming languages, such as C, C++, C#, Java, etc. [0023] In one implementation, the computer software application 222 is a real-time communication software application. For example, the application 222 enables an online meeting between two or more of people over the Internet 122. Such a real-time communication involves audio and/or video communication.”), the program code comprising:
decomposition code configured to cause at least one of the at least one processor (see ¶ [0022-0024] citations as in limitation above. More specifically: “[0022] …The device 102 also includes an operating system (such as iOS®, Android, etc.) 220 running on the processing unit 202… [0024] Turning back to FIG. 1, the RTC devices 102-104 are adapted to participate in RTC sessions. Each of the 102-104 RTC devices runs the improved RTC application software 222, which includes a machine learning based noise suppression module 112, an encoder 114 and a decoder 116.”) to perform multichannel signal decomposition on an audio signal to obtain N subband signals of the audio signal (see ¶ [0030]: “…Accordingly, at 308, the improved encoder 114 decomposes the standardized PCM data of each frame into two sub-bands of audio data…”);
compression code configured to cause at least one of the at least one processor (see ¶ [0022-0024] citations as in limitation above. More specifically: “[0022] …The device 102 also includes an operating system (such as iOS®, Android, etc.) 220 running on the processing unit 202… [0024] Turning back to FIG. 1, the RTC devices 102-104 are adapted to participate in RTC sessions. Each of the 102-104 RTC devices runs the improved RTC application software 222, which includes a machine learning based noise suppression module 112, an encoder 114 and a decoder 116.” ) to perform signal compression on each subband signal of the N subband signals to obtain a subband signal feature of each subband signal (see ¶ [0035]: “Turning back to FIG. 3, at 312, the improved encoder 114 compresses the extracted set of audio features for each frame using a signal compressor, such as a vector quantization and frame correlation method. In one implementation, the signal compressor is a difference vector quantization (DVQ) method. Alternatively, the signal compressor is a residual vector quantization (RVQ) method. In a further implementation, the compression uses a proper interpolation policy. The compression process is further illustrated by reference to FIG. 5.”); and
encoding code configured to cause at least one of the at least one processor (see ¶ [0022-0024] citations as in limitation above. More specifically: “[0022] …The device 102 also includes an operating system (such as iOS®, Android, etc.) 220 running on the processing unit 202… [0024] Turning back to FIG. 1, the RTC devices 102-104 are adapted to participate in RTC sessions. Each of the 102-104 RTC devices runs the improved RTC application software 222, which includes a machine learning based noise suppression module 112, an encoder 114 and a decoder 116.”) to perform quantization encoding on the subband signal feature of each subband signal to obtain a bitstream of each subband signal (see ¶ [0006]: “[0006] Generally speaking, pursuant to the various embodiments, the present disclosure provides a computer-implemented method for providing high quality audio for playback over a low bit rate network connection in real-time communication. The method is performed by a real-time communication software application and includes receiving a stream of audio input data on a sending device; suppressing noise from the stream of audio input data to generate clean audio input data on the sending device; splitting the clean audio input data into a set of frames of audio data on the sending device; standardizing each frame within the set of frames to generate a set of frames of standardized audio data on the sending device, wherein audio data of the frame is resampled according to two frequency ranges corresponding to a wideband mode and a super wideband mode, thereby forming lower sub-band audio data and higher sub-band audio data; extracting a set of audio features for each frame within the set of frames of standardized audio data, thereby forming a set of sets of audio features on the sending device; quantizing the set of audio features for each frame within the set of frames of standardized audio data into a compressed set of audio features on the sending device; […] In one implementation, the inverse quantization process is an inverse difference vector quantization (DVQ) method, an inverse residual vector quantization (RVQ) method, or an inverse interpolation method. Quantizing the set of audio features includes compressing the set of audio features of each i-frame within the set of frames using a residual vector quantization (RVQ) method or a difference vector quantization (DVQ) method, wherein there is at least one i-frame with the set of frames; and compressing the set of audio features of each non-i-frames within the set of frames using interpolation. In one implementation, the two frequency ranges are 0 to 16 kHz and 16 kHz to 32 kHz respectively; and the noise is suppressed based on machine learning.”,
¶ [0035]: “Turning back to FIG. 3, at 312, the improved encoder 114 compresses the extracted set of audio features for each frame using a signal compressor, such as a vector quantization and frame correlation method. In one implementation, the signal compressor is a difference vector quantization (DVQ) method. Alternatively, the signal compressor is a residual vector quantization (RVQ) method. In a further implementation, the compression uses a proper interpolation policy. The compression process is further illustrated by reference to FIG. 5.”,
Fig. 6-7 (decoder receiving wideband/superwideband packets), and
¶ [0040]: “Referring now to FIG. 6, a flowchart illustrating a process by which the improved decoder 116 decodes a received packet in super wideband mode and obtains audio data for playback on the receiving device 116 is shown and generally indicated at 600. At 602, the improved decoder 116 receives the audio data packet sent by the sender device 102 at 316. Once the packet is retrieved, at 604, the improved decoder 116 retrieves the set of audio features of each frame from the packet. When the sub-bands are 0 kHz-16 kHz and 16 kHz-32 kHz, the higher sub-band has the sampling frequency range of 16 kHz-32 kHz while the lower sub-band has the other range. For the higher sub-band, the LPC coefficients and energy features (such as the ratios of energy summation between the lower and higher sub-bands) are directly retrieved from the packet.”)
Kanagawa further teaches:
frequency bands of the N subband signals increasing sequentially and N being an integer greater than 2 (see Fig. 2 and ¶ [0044]: “FIG. 2 is a diagram illustrating an example of subband signals. The vertical axis in FIG. 2 corresponds to amplitude response, and the horizontal axis corresponds to normalized frequency. FIG. 2 illustrates a case where four subband signals sub1, sub2, sub3, and sub4 are generated by filtering the speech signal (full-band signal). The subband signal sub1 is a low-frequency subband signal. The subband signal sub2 is a low-frequency to mid-frequency subband signal. The subband signal sub3 is a mid-frequency to high-frequency subband signal. The subband signal sub4 is a high-frequency subband signal.”)
Feng et al. and Kanagawa are considered to be analogous to the claimed invention because they are in the same field of endeavor in audio/speech signal processing associated with sub-band generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Feng et al. to incorporate the teachings of Kanagawa of frequency bands of the N subband signals increasing sequentially and N being an integer greater than 2 which provides the benefit of generating speech waveforms at high speed from acoustic feature values ([0015] of Kanagawa).
As to independent claim 20, Feng et al. in combination with Kanagawa teaches the limitations as in claim 1, above.
Feng et al. further teaches:
20. A non-transitory computer-readable storage medium, storing computer code which, when executed by at least one processor (see ¶ [0021]: “The communication devices 102-104 each can be a laptop computer, a tablet computer, a smartphone, or other types of portable devices capable of accessing the Internet 122 over a network link. Taking the device 102 as an example, the devices 102-104 are further illustrated by reference to FIG. 2.”
and ¶ [0022]: “Referring to FIG. 2, a block diagram illustrating the wireless communication device 102 is shown. The device 102 includes a processing unit 202, some amount of memory 204 operatively coupled to the processing unit 202, one or more user input interfaces (such as a touch pad, a keyboard, a mouse, etc.) 206 operatively coupled to the processing unit 202, a voice input interface (such as a microphone) 208 operatively coupled to the processing unit 202, a voice output interface (such as a speaker) 210 operatively coupled to the processing unit 202, a video input interface (such as a camera) 212 operatively coupled to the processing unit 202, a video output interface (such as a display screen) 214 operatively coupled to the processing unit 202, and a network interface (such as a Wi Fi network interface) 216 operatively coupled to the processing unit 202 for connecting to the Internet 122. The device 102 also includes an operating system (such as iOS®, Android, etc.) 220 running on the processing unit 202. One or more computer software applications 222-224 are loaded and executed on the device 102. The computer software applications 222-224 are implemented using one or more computer software programming languages, such as C, C++, C#, Java, etc.”), causes the at least one processor to at least:
[the limitations as in claim 1, above].
Regarding claims 2 and 12, Feng et al. in combination with Kanagawa teach the limitations as in claims 1 and 11, above.
Kanagawa further teaches:
2 and 12. The audio processing method/apparatus according to claims 1 and 11,
wherein feature dimensionality of the subband signal feature of the subband signal is not positively correlated with a frequency band of the subband signal (see Fig. 2 and ¶ [0042]: “The intermediate representation of the acoustic feature value is information obtained by extending the sequence length of the acoustic feature value to be the same as the number of speech samples…”
and ¶ [0044]: “FIG. 2 is a diagram illustrating an example of subband signals. The vertical axis in FIG. 2 corresponds to amplitude response, and the horizontal axis corresponds to normalized frequency. FIG. 2 illustrates a case where four subband signals sub1, sub2, sub3, and sub4 are generated by filtering the speech signal (full-band signal). The subband signal sub1 is a low-frequency subband signal. The subband signal sub2 is a low-frequency to mid-frequency subband signal. The subband signal sub3 is a mid-frequency to high-frequency subband signal. The subband signal sub4 is a high-frequency subband signal.”), and
feature dimensionality of a subband signal feature of an Nth subband signal is lower than feature dimensionality of a subband signal feature of the first subband signal (see Fig. 2 and ¶ [0044] citations as in limitation above. More specifically, ¶ [0042]: “The intermediate representation of the acoustic feature value is information obtained by extending the sequence length of the acoustic feature value to be the same as the number of speech samples…”).
Feng et al. and Kanagawa are considered to be analogous to the claimed invention because they are in the same field of endeavor in audio/speech signal processing associated with sub-band generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Feng et al. to incorporate the teachings of Kanagawa of wherein feature dimensionality of the subband signal feature of the subband signal is not positively correlated with a frequency band of the subband signal, and feature dimensionality of a subband signal feature of an Nth subband signal is lower than feature dimensionality of a subband signal feature of the first subband signal which provides the benefit of generating speech waveforms at high speed from acoustic feature values ([0015] of Kanagawa).
Claims 5-6 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Feng et al. (US 20230154474 A1) in view of Kanagawa (US 20240339105 A1) as applied to claims 1 and 11 above, and further in view of Ghaffarzadegan et al. (US 20190365342 A1).
Regarding claims 5 and 15, Feng et al. in combination with Kanagawa teach the limitations as in claims 1 and 11, above.
Feng et al. further teaches:
5 and 15. The audio processing method/apparatus according to claims 1 and 11,
(claim 5) wherein the performing signal compression on each subband signal (see ¶ [0035]: “Turning back to FIG. 3, at 312, the improved encoder 114 compresses the extracted set of audio features for each frame using a signal compressor, such as a vector quantization and frame correlation method. In one implementation, the signal compressor is a difference vector quantization (DVQ) method. Alternatively, the signal compressor is a residual vector quantization (RVQ) method. In a further implementation, the compression uses a proper interpolation policy. The compression process is further illustrated by reference to FIG. 5.”) comprises:
(claim 15) wherein the compression code is further configured to cause at least one of the at least one processor (see ¶ [0035] citation as in limitation above and further see ¶ [0022-0024] citations as in limitation above. More specifically: “[0022] …The device 102 also includes an operating system (such as iOS®, Android, etc.) 220 running on the processing unit 202… [0024] Turning back to FIG. 1, the RTC devices 102-104 are adapted to participate in RTC sessions. Each of the 102-104 RTC devices runs the improved RTC application software 222, which includes a machine learning based noise suppression module 112, an encoder 114 and a decoder 116.” ) to:
However, Feng et al. in combination with Kanagawa do not explicitly teach, but Ghaffarzadegan et al. does teach:
performing the following processing on each subband signal (see ¶ [0044]: “…The first convolutional layer 120 is configured to extract features of the respective frequency sub-band segment…”
and ¶ [0028]: “…The phonocardiogram classification model 42 is configured to receive a digital audio waveform corresponding segmented cardiac cycle 100…”):
calling a first neural network model corresponding to the subband signal (see ¶ [0006 and 0028] citations as in limitation above and further ¶ [0006]: “ …The processor is configured to: operate the transceiver to receive the phonocardiogram from the stethoscope; segment the phonocardiogram into a plurality of segments, each segment comprising a time series of acoustic values corresponding to only one cardiac cycle from the phonocardiogram; for each segment in the plurality of segments: decompose the respective segment into a respective plurality of frequency sub-band segments using the first convolutional neural network, each frequency sub-band segment comprising a time series of acoustic values corresponding to a respective frequency sub-band of the respective segment; and determine a probability that the respective segment contains the abnormal heart sound based on the respective plurality of frequency sub-band segments using at least one neural network; and operate the output device to generate a perceptible output depending on the probabilities that each segment in the plurality of segments contains the abnormal heart sound.”
and ¶ [0028-0029]: “[0028]…The phonocardiogram classification model 42 comprises at least one neural network, in particular at least one convolutional neural network (CNN), configured to extract features from a segmented cardiac cycle 100 and classify the segmented cardiac cycle 100 as being normal or abnormal. [0029] The classification parameters 44 of the phonocardiogram classification model 42 comprise a plurality of kernel weights and/or filter values which are learned in a training process and used by the convolutional neural network(s) to extract features from the segmented cardiac cycle and to classify the segmented cardiac cycle 100 as being normal or abnormal…”); and
performing feature extraction on the subband signal through the first neural network model to obtain the subband signal feature of the subband signal (see ¶ [0006 and 0028-0029] citations as in limitations above and further ¶ [0044]: “The first convolutional layer 120 is implemented as a convolutional neural network having a predetermined number of filters with a predetermined length and/or kernel size. The first convolutional layer 120 is configured to extract features of the respective frequency sub-band segment. In the illustrated embodiment, the first convolutional layer 120 of each branch has 8 filters of length and/or kernel size 5. The first convolutional layer 120 is followed by a Rectified Linear Unit (ReLU) activation of the output. In at least one embodiment, the first convolutional layer 120 is also followed by batch normalization and/or L2 regularization. After activation, the first maxpooling layer 130 pools and/or reduces the dimensionality of the output with a predetermined pool size (e.g., 2). In at least one embodiment, after the maxpooling, a dropout layer is applied to dropout a random set (e.g., 50%) of activations.”),
structural complexity of the first neural network model being positively correlated with dimensionality of the subband signal feature of the subband signal (see ¶ [0044] citation as in limitation above and further ¶ [0033]: “Through a local connectivity pattern of neurons between adjacent layers, the 1D-CNN of the time-convolution layer 110 is configured to perform cross-correlation between its input x[n] (i.e., the segmented cardiac cycle 100) and its kernel.”
¶ [0042]: “FIG. 6 shows decompositions of a segmented cardiac cycle 100 with an exemplary FIR filter bank, an exemplary tConv layer, and an exemplary LP-tConv layer, as well as magnitude and phase responses of the exemplary LP-tConv layer. Particularly, the plots in rows (1), (2), (3), and (4) correspond to the four respective input branches of time-convolution layers 110. The plots in column (A) correspond the decomposed frequency sub-bands using a FIR filter bank designed to implement band pass filters for the frequency sub-bands 25-45 Hz (1), 45-80 Hz (2), 80-200 Hz (3), and 200-500 Hz (4). The plots in column (B) correspond to the decomposed frequency sub-bands using a tConv layer having a learned kernel initialized based on the equivalent FIR coefficients for the sub-bands of column (A). The plots in column (C) correspond to the decomposed frequency sub-bands using a LP-tConv layer having a learned kernel initialized based on the equivalent FIR coefficients for the sub-bands of column (A). Finally, the plots in column (D) correspond to magnitude response (solid line) and phase response (dashed line) of the learned LP-tConv layer of column (C). As can be observed, the learned kernel for the higher frequency sub-bands are less affected by the training process after initialization, compared the lower frequency sub-bands.”
and ¶ [0055]: “As discussed above, each time-convolution layer 110 has a unique set of kernel weights b.sub.0.sup.j, b.sub.1.sup.j, b.sub.2.sup.j . . ., b.sub.N.sup.j, where j corresponds to the respective time-convolution layer 110. In at least one embodiment, the processor 32 is configured to determine each respective frequency sub-band segment based on a respective segmented cardiac cycle 100 using a different respective time-convolution layer 110. More particularly, in one embodiment, the processor 32 is configured to determine each respective frequency sub-band segment by calculating a cross-correlation between the respective segmented cardiac cycle 100 and the unique set of kernel weights b.sub.0.sup.j, b.sub.1.sup.j, b.sub.2.sup.j . . . , b.sub.N.sup.j of the different respective time-convolution layer 110. As a result, each time-convolution layer 110 generates a different pathologically significant frequency sub-band segment, due to unique filtering characteristics of each time-convolution layer 110. As discussed above, exemplary frequency sub-band segment comprises are shown in columns (B) and (C) of the FIG. 6.”).
Feng et al., Kanagawa, and Ghaffarzadegan et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in audio signal processing associated with sub-band generation. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Feng et al. in combination with Kanagawa to incorporate the teachings of Ghaffarzadegan et al. of performing the following processing on each subband signal: calling a first neural network model corresponding to the subband signal; and performing feature extraction on the subband signal through the first neural network model to obtain the subband signal feature of the subband signal, structural complexity of the first neural network model being positively correlated with dimensionality of the subband signal feature of the subband signal, which provides the benefit of offering an improvement over a traditionally implemented FIR filter-bank front-end ([0041] of Ghaffarzadegan et al.).
Regarding claims 6 and 16, Feng et al. in combination with Kanagawa and Ghaffarzadegan et al. teach the limitations as in claims 5 and 15, above.
Feng et al. further teaches:
6 and 16. The audio processing method/apparatus according to claims 5 and 15,
(claim 16) wherein the compression code is further configured to cause at least one of the at least one processor (see ¶ [0022-0024] citations as in claim 11 above. More specifically: “[0022] …The device 102 also includes an operating system (such as iOS®, Android, etc.) 220 running on the processing unit 202… [0024] Turning back to FIG. 1, the RTC devices 102-104 are adapted to participate in RTC sessions. Each of the 102-104 RTC devices runs the improved RTC application software 222, which includes a machine learning based noise suppression module 112, an encoder 114 and a decoder 116.” ) to:
Ghaffarzadegan et al. further teaches:
(claim 6) wherein the performing feature extraction on the subband signal through the first neural network model (see ¶ [0006 and 0028-0029, and 0044] citations as in claims 5 and 15 above: “[0044] …The first convolutional layer 120 is configured to extract features of the respective frequency sub-band segment…”) comprises:
performing the following processing on the subband signal through the first neural network model (see ¶ [0006 and 0028-0029, and 0044] citations as in claims 5 and 15 above: “[0044] …The first convolutional layer 120 is implemented as a convolutional neural network having a predetermined number of filters with a predetermined length and/or kernel size. The first convolutional layer 120 is configured to extract features of the respective frequency sub-band segment. In the illustrated embodiment, the first convolutional layer 120 of each branch has 8 filters of length and/or kernel size 5.”):
performing convolution on the subband signal to obtain a convolution feature of the subband signal (see ¶ [0006 and 0028-0029, and 0044] citations as in claims 5 and 15 above: “[0044] …The first convolutional layer 120 is implemented as a convolutional neural network having a predetermined number of filters with a predetermined length and/or kernel size. The first convolutional layer 120 is configured to extract features of the respective frequency sub-band segment. In the illustrated embodiment, the first convolutional layer 120 of each branch has 8 filters of length and/or kernel size 5.”);
performing pooling on the convolution feature to obtain a pooling feature of the subband signal (see ¶ [0006 and 0028-0029] citations as in limitations above and further ¶ [0044]: “The first convolutional layer 120 is implemented as a convolutional neural network having a predetermined number of filters with a predetermined length and/or kernel size. The first convolutional layer 120 is configured to extract features of the respective frequency sub-band segment. In the illustrated embodiment, the first convolutional layer 120 of each branch has 8 filters of length and/or kernel size 5. The first convolutional layer 120 is followed by a Rectified Linear Unit (ReLU) activation of the output. In at least one embodiment, the first convolutional layer 120 is also followed by batch normalization and/or L2 regularization. After activation, the first maxpooling layer 130 pools and/or reduces the dimensionality of the output with a predetermined pool size (e.g., 2). In at least one embodiment, after the maxpooling, a dropout layer is applied to dropout a random set (e.g., 50%) of activations.”);
downsampling the pooling feature to obtain a downsampling feature of the subband signal (see ¶ [0006 and 0028-0029] citations as in limitations above and further ¶ [0044]: “The first convolutional layer 120 is implemented as a convolutional neural network having a predetermined number of filters with a predetermined length and/or kernel size. The first convolutional layer 120 is configured to extract features of the respective frequency sub-band segment. In the illustrated embodiment, the first convolutional layer 120 of each branch has 8 filters of length and/or kernel size 5. The first convolutional layer 120 is followed by a Rectified Linear Unit (ReLU) activation of the output. In at least one embodiment, the first convolutional layer 120 is also followed by batch normalization and/or L2 regularization. After activation, the first maxpooling layer 130 pools and/or reduces the dimensionality of the output with a predetermined pool size (e.g., 2). In at least one embodiment, after the maxpooling, a dropout layer is applied to dropout a random set (e.g., 50%) of activations.”); and
performing convolution on the downsampling feature to obtain the subband signal feature of the subband signal (see ¶ [0006, 0028-0029, and 0044] citations as in limitations above and further Fig. 3 (first convolutional layer 120, the first maxpooling layer 130, the second convolutional layer 140, the second maxpooling layer 150) and ¶ [0045]: “The second convolutional layer 140 is similarly implemented as a convolutional neural network having a predetermined number of filters with a predetermined length and/or kernel size. The second convolutional layer 140 is configured to extract features of the respective frequency sub-band segment. In at least one embodiment, the second convolutional layer 140 has fewer filters than the first convolutional layer 120. In the illustrated embodiment, the second convolutional layer 140 of each branch has 4 filters of length and/or kernel size 5. The second convolutional layer 140 is followed by a Rectified Linear Unit (ReLU) activation of the output. In at least one embodiment, the second convolutional layer 140 is also followed by batch normalization and/or L2 regularization. After activation, the second maxpooling layer 150 pools and/or reduces the dimensionality of the output with a predetermined pool size (e.g., 2). In at least one embodiment, after the maxpooling, a dropout layer is applied to dropout a random set (e.g., 50%) of activations.”
and ¶ [0058]: “…In the particular embodiments described herein, the processor 32 is configured to use the phonocardiogram classification model 42, which includes the first convolutional layer 120, the first maxpooling layer 130, the second convolutional layer 140, the second maxpooling layer 150…”).
Feng et al., Kanagawa, and Ghaffarzadegan et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in audio signal processing associated with sub-band generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Feng et al. in combination with Kanagawa to incorporate the teachings of Ghaffarzadegan et al. of wherein the performing feature extraction on the subband signal through the first neural network model comprises: performing the following processing on the subband signal through the first neural network model: performing convolution on the subband signal to obtain a convolution feature of the subband signal; performing pooling on the convolution feature to obtain a pooling feature of the subband signal; downsampling the pooling feature to obtain a downsampling feature of the subband signal; and performing convolution on the downsampling feature to obtain the subband signal feature of the subband signal, which provides the benefit of offering an improvement over traditionally implemented FIR filter-bank front-ends ([0041] of Ghaffarzadegan et al.).
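For illustration of the pipeline mapped in the limitations above, the following is a minimal sketch, in PyTorch, of the convolution, pooling, downsampling, and convolution sequence quoted from ¶ [0044]-[0045] of Ghaffarzadegan et al. The input-channel count, segment length, and the omission of the optional batch-normalization/L2-regularization steps are assumptions made for illustration, not findings from the reference.

```python
# Minimal sketch (illustrative assumptions noted above) of the quoted
# per-branch pipeline: conv (8 filters, size 5) -> ReLU -> maxpool (2)
# -> dropout (50%) -> conv (4 filters, size 5) -> ReLU -> maxpool (2).
import torch
import torch.nn as nn

branch = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=8, kernel_size=5),  # first convolutional layer 120
    nn.ReLU(),                                                # ReLU activation (¶ [0044])
    nn.MaxPool1d(kernel_size=2),                              # first maxpooling layer 130
    nn.Dropout(p=0.5),                                        # dropout of a random 50% of activations
    nn.Conv1d(in_channels=8, out_channels=4, kernel_size=5),  # second convolutional layer 140 (fewer filters)
    nn.ReLU(),                                                # ReLU activation (¶ [0045])
    nn.MaxPool1d(kernel_size=2),                              # second maxpooling layer 150
)

subband_segment = torch.randn(1, 1, 400)   # (batch, channels, samples); length is assumed
subband_feature = branch(subband_segment)  # the extracted subband signal feature
```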
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Feng et al. (US 20230154474 A1) and further in view of Kanagawa (US 20240339105 A1) as applied to claim 1 above, and further in view of You (US 7937271 B2).
Regarding claim 10, Feng et al. in combination with Kanagawa teach the limitations as in claim 1, above.
Feng et al. further teaches:
10. The audio processing method according to claim 1,
wherein the performing quantization encoding (see ¶ [0006, 0035, and 0040] citations as in claim 1, above, e.g., “[0006] … quantizing the set of audio features for each frame within the set of frames of standardized audio data into a compressed set of audio features on the sending device;...”) comprises:
However, Feng et al. in combination with Kanagawa do not explicitly teach, but You does teach:
quantizing the subband signal feature of each subband signal to obtain an index value of the subband signal feature (see Col. 1, lines 40-67: “(6) The present invention addresses this need by providing, among other things, decoding systems, methods and techniques in which audio data are retrieved from a bit stream by applying code books to specified ranges of quantization indexes (in some cases even crossing boundaries of quantization units) and by identifying a sequence of different windows to be applied within a single frame of the audio data based on window information within the bit stream.
(7) Thus, in one representative embodiment, the invention is directed to systems, methods and techniques for decoding an audio signal from a frame-based bit stream. Each frame includes processing information pertaining to the frame and entropy-encoded quantization indexes representing audio data within the frame. The processing information includes: (i) entropy code book indexes, (ii) code book application information specifying ranges of entropy-encoded quantization indexes to which the code books are to be applied, and (iii) window information. The entropy-encoded quantization indexes are decoded by applying the identified code books to the corresponding ranges of entropy-encoded quantization indexes. Subband samples are then generated by dequantizing the decoded quantization indexes, and a sequence of different window functions that were applied within a single frame of the audio data is identified based on the window information. Time-domain audio data are obtained by inverse-transforming the subband samples and using the plural different window functions indicated by the window information.”); and
performing entropy encoding on the index value of the subband signal feature to obtain a bitstream of the subband signal (see Col. 1, lines 40-67 and further Col. 3, lines 7-18: “(10) As discussed in more detail in the '346 Application, in the preferred embodiments of the invention the audio data within bit stream 20 have been transformed into subband samples (preferably using a unitary sinusoidal-based transform technique), quantized, and then entropy-encoded. In the preferred embodiments, the audio data have been transformed using the modified discrete cosine transform (MDCT), quantized and then entropy-encoded using appropriate Huffman encoding. However, in alternate embodiments other transform and/or entropy-encoding techniques instead may be used, and references in the following discussion to MDCT or Huffman should be understood as exemplary only.”).
Feng et al., Kanagawa, and You are considered to be analogous to the claimed invention because they are in the same field of endeavor in audio signal processing associated with sub-band generation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Feng et al. in combination with Kanagawa to incorporate the teachings of You of quantizing the subband signal feature of each subband signal to obtain an index value of the subband signal feature, and performing entropy encoding on the index value of the subband signal feature to obtain a bitstream of the subband signal, which provides the benefit of achieving greater efficiency and simultaneously providing more acceptable reproduction of the original audio signal (Col. 2, lines 2-3 of You).
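For illustration of the quantization-then-entropy-encoding flow cited from You, the following minimal sketch quantizes feature values to integer index values and entropy-encodes the index sequence with a Huffman code book, consistent with the Huffman encoding noted in Col. 3, lines 7-18. The quantization step size and the code-book construction are illustrative assumptions, not details taken from the reference.

```python
# Minimal sketch: scalar quantization to index values, then Huffman
# entropy encoding of the indexes into a bitstream (step size and
# code-book details are illustrative assumptions).
import heapq
from collections import Counter
import numpy as np

def quantize(features, step=0.1):
    """Map each feature value to an integer quantization index."""
    return np.round(np.asarray(features) / step).astype(int)

def huffman_book(symbols):
    """Build a Huffman code book {symbol: bit string} from frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate single-symbol case
        return {next(iter(freq)): "0"}
    heap = [[n, i, [s, ""]] for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]          # 0 bit on the low branch
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]          # 1 bit on the high branch
        heapq.heappush(heap, [lo[0] + hi[0], next_id] + lo[2:] + hi[2:])
        next_id += 1
    return {s: code for s, code in heap[0][2:]}

indices = quantize([0.03, 0.12, 0.11, -0.4, 0.12]).tolist()  # index values
book = huffman_book(indices)                                 # code book
bitstream = "".join(book[i] for i in indices)                # subband bitstream
```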
Allowable Subject Matter
Claims 3-4, 7-9, 13-14, and 17-19 would be allowable if rewritten in a form that overcomes the rejection(s) under 35 U.S.C. 101, as described above.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding claim(s) 3 and 13, and their respective dependents, the closest prior art of record includes: Feng et al. (US 20230154474 A1), Kanagawa (US 20240339105 A1), Ghaffarzadegan et al. (US 20190365342 A1), and You (US 7937271 B2).
The claim(s) recite:
3 and 13. The audio processing method/apparatus according to claims 1 and 11,
(claim 3) wherein the multichannel signal decomposition is implemented through multi-layer two-channel subband decomposition and the performing multichannel signal decomposition comprises:
(claim 13) wherein the multichannel signal decomposition is implemented through multi-layer two-channel subband decomposition and the decomposition code is further configured to cause at least one of the at least one processor to:
performing first-layer two-channel subband decomposition on the audio signal to obtain a first-layer low-frequency subband signal and a first-layer high-frequency subband signal;
performing an (i+1)th-layer two-channel subband decomposition on an ith-layer subband signal to obtain an (i+1)th-layer low-frequency subband signal and an (i+1)th-layer high-frequency subband signal, the ith-layer subband signal being an ith-layer low-frequency subband signal, or the ith-layer subband signal being an ith-layer high-frequency subband signal and an ith-layer low-frequency subband signal, and i being an increasing natural number with a value range of 1≤i<N; and
determining a last-layer subband signal and a high-frequency subband signal at each layer of the multi-layers that has not undergone the two-channel subband decomposition as subband signals of the audio signal.
Feng et al. teaches (at least in Figs. 6-7 and ¶ [0006, 0030, 0035, and 0040]):
A method for providing high-quality audio for playback, including steps of receiving a stream of audio, suppressing noise from the audio to generate clean audio, splitting the clean audio data into a set of frames of audio data, resampling the audio data frames according to different frequency ranges, and forming lower and higher sub-band audio data (i.e., wideband and super wideband modes).
Compression of extracted audio features for each frame of the audio data.
Resampling audio data of the frames according to frequency ranges (i.e., wideband and super wideband), extracting sets of audio features, and quantizing the set of audio features.
However, Feng et al. does not teach:
Frequency bands of the N subband signals increasing sequentially and N being an integer greater than 2;
The performing multichannel signal decomposition comprising:
performing first-layer two-channel subband decomposition on the audio signal to obtain a first-layer low-frequency subband signal and a first-layer high-frequency subband signal;
performing a following (i.e., (i+1)th) layer two-channel subband decomposition on an ith-layer subband signal to obtain a following (i.e., (i+1)th) layer low-frequency subband signal and a following (i.e., (i+1)th) layer high-frequency subband signal, the ith-layer subband signal being an ith-layer low-frequency subband signal, or the ith-layer subband signal being an ith-layer high-frequency subband signal and an ith-layer low-frequency subband signal, and i being an increasing natural number with a value range of 1≤i<N; and
determining a last-layer subband signal and a high-frequency subband signal at each layer of the multi-layers that has not undergone the two-channel subband decomposition as subband signals of the audio signal.
Kanagawa teaches (at least in Fig. 2 and ¶ [0042 and 0044]):
Generating subband signals associated with different frequency ranges (i.e., subband 1: low-frequency subband signal, subband 2: low-to-mid-frequency subband signal, subband 3: mid-to-high-frequency subband signal, and subband 4: high-frequency subband signal).
However, Kanagawa does not teach:
The performing multichannel signal decomposition comprising:
performing first-layer two-channel subband decomposition on the audio signal to obtain a first-layer low-frequency subband signal and a first-layer high-frequency subband signal;
performing a following (i.e., (i+1)th) layer two-channel subband decomposition on an ith-layer subband signal to obtain a following (i.e., (i+1)th) layer low-frequency subband signal and a following (i.e., (i+1)th) layer high-frequency subband signal, the ith-layer subband signal being an ith-layer low-frequency subband signal, or the ith-layer subband signal being an ith-layer high-frequency subband signal and an ith-layer low-frequency subband signal, and i being an increasing natural number with a value range of 1≤i<N; and
determining a last-layer subband signal and a high-frequency subband signal at each layer of the multi-layers that has not undergone the two-channel subband decomposition as subband signals of the audio signal.
Ghaffarzadegan et al. teaches (at least in Fig. 3 and ¶ [0006, 0028-29, 0033, 0042, 0044-45, 0055, and 0058]):
Processing audio waveforms and for each segment of the plurality of segments, decomposing the segments into respective frequency sub-band segments using a first convolutional neural network (CNN).
At least one neural network (i.e., CNN) configured to extract features from segments.
Performing cross-correlation between the input signal and its kernel weights.
Decomposing into frequency sub-bands using FIR filter banks to implement band-pass filters for the different frequency sub-bands (a minimal filter-bank sketch follows this list).
However, Ghaffarzadegan et al. does not teach:
The limitations identified in the Kanagawa “does not teach” list, above.
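For illustration of the FIR filter-bank front-end noted in the Ghaffarzadegan et al. teachings above, the following minimal sketch splits a signal into frequency sub-bands with FIR band-pass filters. The band edges, tap count, and windowed (firwin) design are illustrative assumptions.

```python
# Minimal sketch of an FIR filter-bank front-end: one FIR filter per
# sub-band (low-pass for the first band, band-pass otherwise). Band
# edges and tap count are illustrative assumptions.
import numpy as np
from scipy.signal import firwin, lfilter

def fir_subbands(x, fs, edges, numtaps=129):
    """Split x into sub-bands delimited by 'edges' (Hz, below Nyquist)."""
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if lo == 0:
            taps = firwin(numtaps, hi, fs=fs)                         # low-pass for the first band
        else:
            taps = firwin(numtaps, [lo, hi], pass_zero=False, fs=fs)  # band-pass filter
        bands.append(lfilter(taps, 1.0, x))
    return bands

fs = 16000
x = np.random.randn(fs)  # one second of test audio
subbands = fir_subbands(x, fs, edges=[0, 1000, 2000, 4000, 7900])
```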
You teaches (at least in Col. 1, lines 40-67 and Col. 3, lines 7-18):
Audio data being retrieved from a bit stream by applying code books to specified ranges of quantization indexes.
Decoding audio signals in a frame-based bit stream.
Frames including processing information pertaining to the frames and entropy-encoded quantization indexes representing audio data within the frame.
However, You does not teach:
The limitations identified in the Kanagawa “does not teach” list, above.
None of Feng et al., Kanagawa, Ghaffarzadegan et al., and You, either alone or in combination, teaches or makes obvious the performing multichannel signal decomposition comprising: performing first-layer two-channel subband decomposition on the audio signal to obtain a first-layer low-frequency subband signal and a first-layer high-frequency subband signal; performing a following (i.e., (i+1)th) layer two-channel subband decomposition on an ith-layer subband signal to obtain a following (i.e., (i+1)th) layer low-frequency subband signal and a following (i.e., (i+1)th) layer high-frequency subband signal, the ith-layer subband signal being an ith-layer low-frequency subband signal, or the ith-layer subband signal being an ith-layer high-frequency subband signal and an ith-layer low-frequency subband signal, and i being an increasing natural number with a value range of 1≤i<N; and determining a last-layer subband signal and a high-frequency subband signal at each layer of the multi-layers that has not undergone the two-channel subband decomposition as subband signals of the audio signal. Therefore, none of the cited prior art, either alone or in combination, teaches or makes obvious the combination of limitations as recited in claims 3 and 13, and their respective dependents.
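For illustration of the multi-layer two-channel subband decomposition recited above, the following minimal sketch re-splits the low band at each layer and collects the high band at each layer plus the last-layer low band. Haar analysis filters stand in for the unspecified two-channel filter pair, and only the low-band branch is re-split (the claims also permit re-splitting the high band); both choices are assumptions made for illustration.

```python
# Minimal sketch of the claimed decomposition tree, with Haar filters
# as a stand-in for the unspecified two-channel filter pair.
import numpy as np

def two_channel_split(x):
    """One two-channel analysis step: low (sum) and high (difference) bands."""
    x = x[: len(x) - len(x) % 2]   # truncate to even length for pairwise ops
    low = (x[0::2] + x[1::2]) / np.sqrt(2)
    high = (x[0::2] - x[1::2]) / np.sqrt(2)
    return low, high

def multilayer_decompose(signal, layers):
    """Collect the high band at each layer plus the last-layer low band."""
    subbands, current = [], signal
    for _ in range(layers):
        low, high = two_channel_split(current)
        subbands.append(high)      # high band kept; not decomposed further
        current = low              # low band split again at the next layer
    subbands.append(current)       # last-layer (lowest-frequency) subband
    return subbands[::-1]          # order the bands low-to-high in frequency

bands = multilayer_decompose(np.random.randn(1024), layers=3)  # yields N = 4 subbands
```

With layers = 3, the sketch yields N = 4 subband signals whose frequency bands increase sequentially, consistent with N being an integer greater than 2 as recited.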
Similarly, regarding claim(s) 7 and 17, and their respective dependents, the closest prior art of record also includes: Feng et al. (US 20230154474 A1), Kanagawa (US 20240339105 A1), Ghaffarzadegan et al. (US 20190365342 A1), and You (US 7937271 B2).
The claim(s) recite:
7 and 17. The audio processing method/apparatus according to claims 1 and 11,
(claim 7) wherein the performing signal compression comprises:
(claim 17) wherein the compression code is further configured to cause at least one of the at least one processor to:
separately performing feature extraction on a first k subband signals of the N subband signals to obtain subband signal features respectively corresponding to the first k subband signals, k being an integer within a value range of 1<k<N; and
separately performing bandwidth extension on a last N–k subband signals of the N subband signals to obtain subband signal features respectively corresponding to the last N–k subband signals.
Feng et al., Kanagawa, Ghaffarzadegan et al., and You disclose teachings as discussed under the reasons for the indication of allowable subject matter regarding claims 3 and 13, above.
None of Feng et al., Kanagawa, Ghaffarzadegan et al., and You, either alone or in combination, teaches or makes obvious wherein the performing signal compression comprises: separately performing feature extraction on a first k subband signals of the N subband signals to obtain subband signal features respectively corresponding to the first k subband signals, k being an integer within a value range of 1<k<N; and separately performing bandwidth extension on a last N–k subband signals of the N subband signals to obtain subband signal features respectively corresponding to the last N–k subband signals. Therefore, none of the cited prior art, either alone or in combination, teaches or makes obvious the combination of limitations as recited in claims 7 and 17, and their respective dependents.
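For illustration of the split recited in claims 7 and 17, the following minimal sketch performs feature extraction on the first k (lower) subbands and computes a lighter bandwidth-extension parameterization for the last N-k (higher) subbands. The per-frame log-energy envelope used here as the bandwidth-extension feature, and the frame length, are illustrative assumptions; the claims do not fix the parameterization.

```python
# Minimal sketch: feature extraction on subbands[:k], bandwidth-extension
# features on subbands[k:] (the envelope parameterization is assumed).
import numpy as np

def envelope_feature(subband, frame=160):
    """Per-frame log energy envelope of a high-frequency subband."""
    n = len(subband) // frame * frame
    frames = np.asarray(subband[:n]).reshape(-1, frame)
    return np.log1p(np.sqrt((frames ** 2).mean(axis=1)))

def compress_subbands(subbands, k, extract_features):
    """First k subbands: feature extraction; last N-k: bandwidth extension."""
    low_feats = [extract_features(sb) for sb in subbands[:k]]
    high_feats = [envelope_feature(sb) for sb in subbands[k:]]
    return low_feats + high_feats

# Usage with a placeholder extractor (e.g., the CNN branch sketched earlier
# could serve as extract_features for the lower subbands).
feats = compress_subbands([np.random.randn(512) for _ in range(4)], k=2,
                          extract_features=lambda sb: sb[:64])
```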
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Regarding encoding methods to perform quantization bit allocation for spectral coefficients of audio signals (pertinent to claims 1, 11, and 20):
Liu et al. (US 20160275955 A1).
Regarding coding of extended-band spectrum (pertinent to claims 1, 11, and 20):
Kawashima et al. (US 20150294673 A1).
Regarding multichannel decomposition of signals into subbands (i.e., 16 bands) and non-uniformity of the bands (pertinent to claims 1-2, 11-12, and 20):
Chen et al. (CN 1375817 A).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Keisha Y Castillo-Torres whose telephone number is (571)272-3975. The examiner can normally be reached Monday - Friday, 9:00 am - 4:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached at (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Keisha Y. Castillo-Torres
Examiner
Art Unit 2659
/Keisha Y. Castillo-Torres/Examiner, Art Unit 2659