Prosecution Insights
Last updated: April 19, 2026
Application No. 17/761,656

Audio Encoding and Audio Decoding

Non-Final OA §103
Filed: Mar 18, 2022
Examiner: ZHANG, LESHUI
Art Unit: 2695
Tech Center: 2600 — Communications
Assignee: Nokia Technologies Oy
OA Round: 5 (Non-Final)
Grant Probability: 78% (Favorable)
OA Rounds: 5-6
To Grant: 2y 10m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 78% — above average (719 granted / 928 resolved; +15.5% vs TC avg)
Interview Lift: +36.0% among resolved cases with interview
Avg Prosecution: 2y 10m typical timeline; 47 applications currently pending
Total Applications: 975 across all art units (career history)
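
The 78% allow rate follows directly from the career counts above (719 granted of 928 resolved). The page does not spell out how the +36.0% interview lift relates to the 99% with-interview figure; the sketch below shows one plausible reading, assuming the lift is the percentage-point gap in allow rate between resolved cases with and without an examiner interview. The variable names and the implied without-interview rate are assumptions for illustration, not figures reported by the tool.

```python
# Illustrative sketch only (not the analytics vendor's actual formula).
# Assumption: "interview lift" = allow rate with interview minus allow rate
# without interview, among this examiner's resolved cases.

def allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate as a fraction of resolved cases."""
    return granted / resolved

career_rate = allow_rate(719, 928)        # ~0.775, displayed as 78%
with_interview_rate = 0.99                # reported "99% With Interview"
interview_lift = 0.36                     # reported "+36.0% Interview Lift"

# Under this reading, the implied allow rate without an interview would be:
implied_without_interview = with_interview_rate - interview_lift   # ~0.63

print(f"career allow rate:         {career_rate:.1%}")
print(f"with-interview allow rate: {with_interview_rate:.1%}")
print(f"implied without-interview: {implied_without_interview:.1%}")
```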

Statute-Specific Performance

§101: 5.5% (-34.5% vs TC avg)
§103: 42.5% (+2.5% vs TC avg)
§102: 13.6% (-26.4% vs TC avg)
§112: 28.7% (-11.3% vs TC avg)
Tech Center average is an estimate • Based on career data from 928 resolved cases
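
The "vs TC avg" deltas appear to be percentage-point differences against an estimated Tech Center baseline. Under that assumption, the baseline can be backed out with simple arithmetic, and all four statutes recover the same ~40% figure, consistent with the note that the Tech Center average is an estimate rather than a per-statute measurement. The interpretation of the deltas is an assumption; the values are transcribed from the table above.

```python
# Minimal sketch: back out the implied Tech Center baseline from the reported
# examiner rates and "vs TC avg" deltas, assuming the deltas are percentage-point
# differences (examiner rate minus TC average).

stats = {
    "§101": (5.5, -34.5),
    "§103": (42.5, +2.5),
    "§102": (13.6, -26.4),
    "§112": (28.7, -11.3),
}

for statute, (examiner_rate, delta_vs_tc) in stats.items():
    implied_tc_avg = examiner_rate - delta_vs_tc
    print(f"{statute}: examiner {examiner_rate:.1f}%, implied TC avg {implied_tc_avg:.1f}%")
# Each statute backs out to an implied TC average of ~40.0%.
```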

Office Action

§103
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This Office Action is in response to an RCE communication filed on December 15, 2025, wherein claims 1, 18, 19-20, 23 were amended and claims 15, 17, 21, 24 maintained their cancellation status. In view of this communication, claims 1-14, 16, 18-20, 22-23 are currently pending in this Office Action. With respect to the objection of claims 1-14, 16, 18-20, 22-23 due to a formality issue, as set forth in the previous Office Action, the claim amendment (including the canceled "… over time" language in the claims) and the argument (see paragraph 2 of page 12 in the Remarks filed on December 15, 2025) have been fully considered, and the argument is persuasive. Therefore, the objection of claims 1-14, 16, 18-20, 22-23 due to the formality issue, as set forth in the previous Office Action, has been withdrawn. The Office appreciates the explanation of the amendment and the analysis of the prior art; however, although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993) and MPEP 2145.

Claim Objections

Claims 1-14, 16, 18-20, 22-23 are objected to because of the following informalities: Claim 1 recites "wherein the one or more transport audio signals and the spatial metadata are encoded separately from the least one audio signal and with a different bitrate," wherein "the least one audio signal" should read -- wherein the one or more transport audio signals and the spatial metadata are encoded separately from the at least one identified audio signal and with a different bitrate --. Claims 2-14, 16, 18 are objected to due to their dependency on claim 1. Claims 19-20, 23 are objected to for at least a similar reason as described for claim 1 above, since claims 19-20, 23 recite a similarly deficient feature. Claim 22 is objected to due to its dependency on claim 20. Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 19-20, 23 are rejected under 35 U.S.C. 103 as being unpatentable over Purnhagen et al. (US 20170339505 A1, hereinafter Purnhagen) in view of Fuchs et al. (US 20200265851 A1, hereinafter Fuchs) and Panchagnula et al. (US 20160255348 A1, hereinafter Panchagnula). Claim 1. Purnhagen teaches an apparatus (title and abstract, ln 1-17, a multichannel audio encoder 300 in fig.
3) comprising: at least one processor (digital signal processor DSP or microprocessor, para 229); and at least one memory storing instructions that (computer readable media including RAM, ROM, EEPROM flash memory, etc., stored with computer readable instructions, program modules, para 229), when executed with the at least one processor (processor and instructions above), cause the apparatus at least to: receive multi-channel audio signals (generated from an audio authoring equipment 301 or recorded by transducers, para 144, e.g., M-channel received for encoding in figs. 1-2, para 140); identify at least one audio signal (channels C or LFE as a group for encoding in fig. 3, e.g., M=11 in 11.1 channel configuration, regardless of encoding mode F1, F2, or F3 in figs. 6-8) to separate from the multi-channel audio signals (C/LFE, etc. are to be separated as one group for encoding in fig. 3); separate, based on the identified at least one audio signal (C or LFE channel in fig. 3 and discussion above), the multi-channel audio signals (M-channel signals above) into at least a first sub-set of audio signals (including C/LFE for encoding from M-channel signals in fig. 3) and a second sub-set of audio signals (including at least LS, LB, …, RS, RB, … from M-channel signals for encoding in fig. 3), wherein the first sub-set comprises the identified at least one audio signal (the group containing C/LFE, etc., as identified in M-channel above) and the second sub-set comprises one or more audio signals (including at least LS, RS, …, as rest of M-channel signals in fig. 3), of the multi-channel audio signals (M-channel as input audio channel signals in fig. 3, and the discussion above), other than the at least one identified audio signals (the separated LS, RS, … do not have channels C and LFE, etc., in fig. 3), wherein the first subset comprises at least a center loudspeaker channel signal (C/LFE as first subset and discussion above); analyze the one or more audio signals of the second sub-set of audio signals (via encoding sections 100/303 in fig. 3) to determine one or more transport audio signals (L1, L2, R1, R2 in fig. 3) and metadata (βL, γL, βR, γR as dry and wet parameters in fig. 3, para 144-145); and encode the at least one identified audio signal (via MDCT 308 and applied to at least one of C and LFE channels in fig. 3, para 147), the one or more transport audio signals (L1, L2, R1, R2 via MDCT 306 in fig. 3) and metadata (via switch 304 and quantization 307, etc., in fig. 3, para 145-148), wherein the complexity degree of the one or more transport audio signals and the spatial metadata are more than the at least one audio signal (only two channels C and LFE with only MDCT 308 for encoding, compared to the multiple channels L1, L2, R1, R2, with additional encoding 303 compensated to the encoding 100 in fig. 3, para 144). However, Purnhagen does not explicitly teach wherein the metadata is spatial metadata and encoding the spatial metadata and does not explicitly teach wherein it is with a different bitrate of encoding the at least one audio signal from encoding the one or more transport audio signals and the spatial metadata. Fuchs teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-9 and an audio encoder in fig. 1a) and wherein the apparatus comprises at least one processor (encoder processor 200, etc., in figs. 
2a/2b, microprocessor, para 345) and at least one memory (FPGA, para 345 or non-transitory digital storage medium, para 22) including computer program code (thereon a computer program for performing a method of encoding directional audio coding parameters, etc., para 22), the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus (computer having processor and non-transitory digital storage medium having stored thereon the computer program, para 22) to analyze one or more audio signals of the second sub-set of audio signals to determine one or more transport audio signals and spatial metadata (B-format audio signals analyzed by the filter bank analysis 130 and diffuseness estimation 110 and direction estimation 120 in fig. 1a and to generate diffuseness parameter and direction parameters for spatial metadata encoder 200 in fig. 2a and beamforming signal selection for EVS encoder 150 in fig. 1a) and encode the one or more transport audio signals and the spatial metadata (through elements 200 and 150 in fig. 1a) for benefits of achieving a lot bit encoding with high quality (para 24) with low delay (para 110). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein the metadata is the spatial metadata and encoding the spatial metadata, as taught by Fuchs, to the metadata and encoding in the apparatus, as taught by Purnhagen, for the benefits discussed above. However, the combination of Purnhagen does not explicitly teach wherein it is with a different bitrate for encoding the at least one audio signal and for encoding the one or more transport audio signals and the spatial metadata. Panchagnula teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-14 and a video encoding system including audio encoder 104 in fig. 1) and wherein one or more transport audio signals (left and right channels carry sound effects and music and rear channels might be silent, para 24, and as complex program, para 44) and the spatial metadata (metadata, para 44) are disclosed and a at least one identified audio signal is also disclosed (a center channel, e.g., mono or stereo channel, as simple dialog, para 24, 44) and wherein it is with a different bitrate 402 between for encoding the at least one audio signal and for encoding the one or more transport audio signals and the spatial metadata (a target audio bitrate is set based on the complexity of program, e.g., a news broadcast audio signal is in majority of dialog with a lower bitrate than the complex program containing music and sound effect, para 44, such as left channel and right channel, para 48) for benefits of improving a performance of the system (by maintaining a sound quality with low transmission bitrate so that saving bitrate for high quality of video image without loss of audio quality in fig. 5, para 50 and para 75). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the use of a different bitrate for encoding the at least one audio signal and for encoding the one or more transport audio signals and the spatial metadata, as taught by Panchagnula, to the at least one identified audio signal and the one or more transport audio signals and the spatial metadata in the apparatus, as taught by the combination of Purnhagen and Fuchs, for the benefits discussed above. Claim 19 recites a method for coding audio signals and has been analyzed and rejected according to claim 1 above. Claim 20: the combination of Purnhagen, Fuchs, and Panchagnula teaches an apparatus (Purnhagen, title and abstract, ln 1-17, a multi-channel audio decoder 1000 in fig. 10 and title and abstract, ln 1-9 and an audio encoder in fig. 1a) comprising: at least one processor (Purnhagen, digital signal processor DSP or microprocessor, para 229 and Panchagnula, processors, para 20); and at least one memory storing instructions that, when executed (Purnhagen, computer readable media including RAM, ROM, EEPROM flash memory, etc., stored with computer readable instructions, program modules, para 229 and Panchagnula, memory storing software elements, para 20) with the at least one processor (Purnhagen, processor and instructions above and Panchagnula, implementing audio and video encoders, para 20), cause the apparatus at least to: receive encoded data comprising at least one audio signal, one or more transport audio signals and spatial metadata (Purnhagen, via receiving section 1001 in fig. 10, para 174 and bitstream B from the audio encoding system 300 of fig. 3, and Fuchs, spatial metadata in fig. 1a) for decoding (via decoding sections 900, 1005, etc., in fig. 10, para 166, 177), wherein the one or more transport audio signals and the spatial metadata are encoded separately from the at least one audio signal (see the discussion in claim 1 above, based on Purnhagen and Fuchs); decode the received encoded data to decode the at least one audio signal (Purnhagen, via MDCT⁻¹ 1010 to recover C/LFE in fig. 10, para 181), the one or more transport audio signals (via MDCT⁻¹ 1002, 1006 in fig. 10) and the spatial metadata (Purnhagen, via q⁻¹ for reconstructing α = (αL, αR) in fig. 10 and Fuchs, spatial metadata in fig. 1a), wherein the at least one audio signal comprises at least a center loudspeaker channel signal (Purnhagen, reconstructed C channel outputted from the element 1010 in fig. 10); synthesize the decoded one or more transport audio signals and the decoded spatial metadata (Purnhagen, via decoding sections 900, 1005 in fig. 10 and Fuchs, spatial metadata in fig. 1a) to provide a set of audio signals (Purnhagen, Ḹ, Ṝ, ḸS, ṜS, … outputted from elements 900, 1005 in fig. 10); identify multi-channel indices of the at least one audio signal (Purnhagen, labeling of C/LFE in fig. 10) and audio signals of the set of audio signals (Purnhagen, represented by the labeling Ḹ, Ṝ, ḸS, ṜS, … in fig. 10 and via the synthesis section 1011 in fig. 10); and combine, using the multi-channel indices (Purnhagen, mapping the channel signals to loudspeakers section 1012 in fig. 10, para 181), at least the decoded at least one audio signal (Purnhagen, reconstructed C/LFE in fig. 10) and the set of audio signals (Purnhagen, Ḹ, Ṝ, ḸS, ṜS, … in fig. 10) to provide the multi-channel audio signals (Purnhagen, playback by the multi-speaker system in fig. 10, para 181).
Claim 23 recites a method for decoding audio signals and has been analyzed and rejected according to claim 20 above.

Claims 1-14, 16, 18-20, 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Oomen et al. (US 20150142453 A1, hereinafter Oomen) in view of references Purnhagen (above), Fuchs (above), and Panchagnula et al. (US 20160255348 A1, hereinafter Panchagnula). Claim 1. Oomen teaches an apparatus (title and abstract, ln 1-15, an encoder 1201 in figs. 12-13) comprising: at least one processor (one or more data processors, para 196); and at least one memory including a computer program code (software running on the processors and DSPs; thus, a memory for storing the software is inherently disclosed), the at least one memory and the computer program code configured to, with the at least one processor (the processors with software implementation, para 196), cause the apparatus at least to: receive multi-channel audio signals (via a receiver 1301 in fig. 13, receiving audio signals, and each of the audio signals has time-frequency tiles, i.e., multi-channel audio signals, para 72, or the audio signals representing sound objects and extracted from a multi-channel downmix, para 17); identify at least one audio signal to separate from the multi-channel audio signals (at least some of the time-frequency tiles that are to be non-downmixed, para 72, for example, according to energy, spatial characteristics, and coherence between pairs of the time-frequency tiles, para 81); separate, based on the identified at least one audio signal, the multi-channel audio signals into at least a first sub-set of audio signals (the non-downmixed time-frequency tiles must include at least some of the time-frequency tiles above) and a second sub-set of audio signals (downmix time-frequency tiles, forwarded to encoder 1309 in fig. 13), wherein the first sub-set comprises the identified at least one audio signal and the second sub-set comprises one or more audio signals, of the multi-channel audio signals, other than the at least one identified audio signal (separation is based on the energy of the time-frequency tiles, the spatial characteristic of the time-frequency tiles, and a coherence characteristic between pairs of the time-frequency tiles, para 81, in order to improve the efficiency of encoding, para 25-26, and thus, the selected downmix time-frequency tiles are inherently not the non-downmix time-frequency tiles); analyze the one or more audio signals of the second sub-set of audio signals to determine one or more transport audio signals and metadata (performing downmix on the selected downmix time-frequency tiles, and the downmixer 1305 also generates parametric data for upmix in reconstructing the original audio object tile, e.g., ILD, ICC, ITD, IPD, etc., para 120); and encode the at least one identified audio signal (part of the non-downmix signal through 1309), the one or more transport audio signals (downmixed time-frequency tiles are encoded by 1307 in fig. 13, para 121) and the metadata (encoding the ILD, ICC, ITD, IPD, etc., by the encoder 1307, para 122), wherein the one or more transport audio signals are encoded separately from the at least one audio signal (the downmix signal is encoded by ENC 1307 and the non-downmix signal is encoded by 1309, and elements 1307, 1309 are separate in fig. 13).
However, Oomen does not explicitly teach wherein the first sub-set comprises at least a center loudspeaker channel signal and does not explicitly teach wherein the metadata is spatial metadata and does not explicitly teach wherein it is with a different bitrate for encoding the at least one audio signal and for encoding the one or more transport audio signals and the spatial metadata. Purnhagen teaches an apparatus (title and abstract, ln 1-17, an multichannel audio encoder 300 in fig. 3) comprising: at least one processor (digital signal processor DSP or microprocessor, para 229); and at least one memory storing instructions that (computer readable media including RAM, ROM, EEPROM flash memory, etc., stored with computer readable instructions, program modules, para 229), when executed with the at least one processor (processor and instructions above), cause the apparatus at least to: receive multi-channel audio signals (generated from an audio authoring equipment 301 or recorded by transducers, para 144, e.g., M-channel received for encoding in figs. 1-2, para 140); identify at least one audio signal (channels C or LFE as a group for encoding in fig. 3, e.g., M=11 in 11.1 channel configuration, regardless of encoding mode F1, F2, or F3 in figs. 6-8) to separate from the multi-channel audio signals (C/LFE, etc. are to be separated as one group for encoding in fig. 3); separate, based on the identified at least one audio signal (C or LFE channel in fig. 3 and discussion above), the multi-channel audio signals (M-channel signals above) into at least a first sub-set of audio signals (including C/LFE for encoding from M-channel signals in fig. 3) and a second sub-set of audio signals (including at least LS, LB, …, RS, RB, … from M-channel signals for encoding in fig. 3), wherein the first sub-set comprises the identified at least one audio signal (the group containing C/LFE, etc., as identified in M-channel above) and the second sub-set comprises one or more audio signals (including at least LS, RS, …, as rest of M-channel signals in fig. 3), of the multi-channel audio signals (M-channel as input audio channel signals in fig. 3, and the discussion above), other than the at least one identified audio signals (the separated LS, RS, … do not have channels C and LFE, etc., in fig. 3), wherein the first subset comprises at least a center loudspeaker channel signal (C/LFE as first subset and discussion above); analyze the one or more audio signals of the second sub-set of audio signals (via encoding sections 100/303 in fig. 3) to determine one or more transport audio signals (L1, L2, R1, R2 in fig. 3) and metadata (βL, γL, βR, γR as dry and wet parameters in fig. 3, para 144-145); and encode the at least one identified audio signal (via MDCT 308 and applied to at least one of C and LFE channels in fig. 3, para 147), the one or more transport audio signals (L1, L2, R1, R2 via MDCT 306 in fig. 3) and metadata (via switch 304 and quantization 307, etc., in fig. 3, para 145-148) for benefits of achieving an efficient audio encoding and decoding (by adapting multiple encoding modes upon the channel contents, para 4, 23) and improving playback sound quality (by applying less artifact decorrelated signal caused by switching among encoding modes, para 38, improving listener’s experiences by improving perceived fidelity, para 23). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein the first sub-set comprises at least the center loudspeaker channel signal, as taught by Purnhagen, to the first sub-set in the apparatus, as taught by Oomen, for the benefits discussed above. However, the combination of Oomen and Purnhagen does not explicitly teach wherein the metadata is spatial metadata and encoded and does not explicitly teach wherein it is with a different bitrate for encoding the at least one audio signal and for encoding the one or more transport audio signals and the spatial metadata. Fuchs teaches wherein the metadata is the spatial metadata and one or more audio signals are analyzed to determine the spatial metadata and the spatial metadata further encoded for the similar benefits discussed in claim 1 above. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein the metadata is the spatial metadata and encoding the spatial metadata, as taught by Fuchs, to the metadata and encoding in the apparatus, as taught by Oomen and Purnhagen, for the benefits discussed above. However, the combination of Oomen, Purnhagen, and Fuchs does not explicitly teach wherein it is with a different bitrate for encoding the at least one audio signal and for encoding the one or more transport audio signals and the spatial metadata. Panchagnula teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-14 and a video encoding system including audio encoder 104 in fig. 1) and wherein one or more transport audio signals (left and right channels carry sound effects and music and rear channels might be silent, para 24, and as complex program, para 44) and the spatial metadata (metadata, para 44) are disclosed and a at least one identified audio signal is also disclosed (a center channel, e.g., mono or stereo channel, as simple dialog, para 24, 44) and wherein it is with a different bitrate 402 between for encoding the at least one audio signal and for encoding the one or more transport audio signals and the spatial metadata (a target audio bitrate is set based on the complexity of program, e.g., a news broadcast audio signal is in majority of dialog with a lower bitrate than the complex program containing music and sound effect, para 44, such as left channel and right channel, para 48) for benefits of improving a performance of the system (by maintaining a sound quality with low transmission bitrate so that saving bitrate for high quality of video image without loss of audio quality in fig. 5, para 50 and para 75). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein it is with the different bitrate for encoding the at least one audio signal and for encoding the one or more transport audio signals and the spatial metadata, as taught by Panchagnula, to the at least one identified audio signal and the one or more transport audio signals and the spatial metadata in the apparatus, as taught by the combination of Oomen, Purnhagen, and Fuchs, for the benefits discussed above. Claim 19 recited a method for coding audio signals and has been analyzed and rejected according to claim 1 above. Claim 20: Oomen teaches an apparatus (title and abstract, ln 1-15, an decoder 1203 in figs. 
12, 14) comprising: at least one processor (one or more data processors, para 196); and at least one memory including a computer program code (software running on the processors and DSPs, and thus, memory for storing the software is inherently), the at least one memory and the computer program code configured to, with the at least one processor (the processors with software implementation, para 196), cause the apparatus at least to: receive encoded data comprising at least one audio signal, one or more transport audio signals and metadata for decoding (via receiver 1401, including encoded downmix time-frequency tiles, encoded non-downmix time-frequency tiles, and encoded data signal comprising a downmix indication, etc., para 37), wherein the one or more transport audio signals and the spatial metadata are encoded separately from the least one audio signal (Oomen, discussed in claim 1 above); decode the received encoded data to decode the at least one audio signal, the one or more transport audio signals and the metadata (decoding based on downmix indication so that both encoded downmix and non-downmix time frequency times are decoded, para 135); synthesize the decoded one or more transport audio signals and the decoded metadata to provide a set of audio signals (upmix the decoded downmix time frequency tile based on the upmix parameters, para 41, 140); identify multi-channel indices of the at least one audio signal and audio signals of the set of audio signals (based on the downmix indication, para 135, e.g., indicating the time-frequency tiles are downmixed or non-downmixed, para 146); and combine, using the multi-channel indices, at least the decoded at least one audio signal and the set of audio signals to provide the multi-channel audio signals (rendering 1407 in fig. 14, para 143-146). However, Oomen does not explicitly teach wherein the at least one audio signal comprises at least a center loudspeaker channel signal and Oomen does not explicitly teach wherein the metadata is spatial metadata and does not explicitly teach wherein it is with a different bitrate for encoding the at least one audio signal and for encoding the one or more transport audio signals and the spatial metadata. Purnhagen teaches wherein the at least one audio signal comprises at least the center loudspeaker channel signal for the benefits, as discussed in claim 1 above. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein the at least one audio signal comprises at least the center loudspeaker channel signal, as taught by Purnhagen, to the at least one audio signal in the apparatus, as taught by Oomen, for the benefits discussed above. However, the combination of Oomen and Purnhagen does not explicitly teach wherein the metadata is spatial metadata and does not explicitly teach wherein it is with a different bitrate for encoding the at least one audio signal and for encoding the one or more transport audio signals and the spatial metadata. Fuchs teaches wherein the metadata is the spatial metadata and one or more audio signals are analyzed to determine the spatial metadata and the spatial metadata further encoded for the similar benefits discussed in claim 1 above. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein the metadata is the spatial metadata and encoding the spatial metadata, as taught by Fuchs, to the metadata and encoding in the apparatus, as taught by Oomen and Purnhagen, for the benefits discussed above. However, the combination of Oomen, Purnhagen, and Fuchs does not explicitly teach wherein it is with a different bitrate for encoding the at least one audio signal and for encoding the one or more transport audio signals and the spatial metadata. Panchagnula teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-14 and a video encoding system including audio encoder 104 in fig. 1) and wherein one or more transport audio signals (left and right channels carry sound effects and music and rear channels might be silent, para 24, and as complex program, para 44) and the spatial metadata (metadata, para 44) are disclosed and a at least one identified audio signal is also disclosed (a center channel, e.g., mono or stereo channel, as simple dialog, para 24, 44) and wherein it is with a different bitrate 402 between for encoding the at least one audio signal and for encoding the one or more transport audio signals and the spatial metadata (a target audio bitrate is set based on the complexity of program, e.g., a news broadcast audio signal is in majority of dialog with a lower bitrate than the complex program containing music and sound effect, para 44, such as left channel and right channel, para 48) for benefits of improving a performance of the system (by maintaining a sound quality with low transmission bitrate so that saving bitrate for high quality of video image without loss of audio quality in fig. 5, para 50 and para 75). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein it is with the different bitrate for encoding the at least one audio signal and for encoding the one or more transport audio signals and the spatial metadata, as taught by Panchagnula, to the at least one identified audio signal and the one or more transport audio signals and the spatial metadata in the apparatus, as taught by the combination of Oomen, Purnhagen, and Fuchs, for the benefits discussed above. Claim 23 recited a method for decoding audio signals and has been analyzed and rejected according to claim 20 above. Claim 2: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 1 above, wherein the first sub-set of audio signals is a fixed sub-set of the multi-channel audio signals and the second sub-set of audio signals is a fixed sub-set of the multi-channel audio signals (Oomen, the selection is based on the energy, location, and coherence, and thus, fixed frame by frame, para 81 and segmenting into time segments, e.g., 20ms duration as fixed relatively, para 116 and Purnhagen, 11.1 are fixed channel configuration in fig. 3). Claim 3: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 1 above, wherein the first sub-set further comprises at least one of: a pair of stereo channel signals, or one or more dominantly voice audio channel signals (Oomen, including speech and music, para 2, and Purnhagen, L/R channel can be placed into the first sub-set with C/LFE in fig. 3). 
Claim 4: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 1 above, wherein the second sub-set of audio signals is a variable sub-set of the multi-channel audio signals (Oomen, segmenting, such as 20ms per segment, variable with respect to multiple segments, para 116 and Purnhagen, C/LFE can be grouped with L/R, or not, i.e., flexible, and similar in RS, LS, RB, LB, …, in fig. 3). Claim 5: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 4 above, wherein a count of the first sub-set of audio signals is variable (Oomen, both options for downmix and nondowminx, para 168, 174, and Purnhagen, the discussion in claim 4 above). Claim 6: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 1 above, wherein the first sub-set of audio signals comprises signals that are determined to satisfy a first criterion and the second sub-set of audio signals comprises signals that are determined not to satisfy the first criterion (Oomen, criteria used for selection, para 168, and Purnhagen, low frequency effect for criteria on frequency requirement, and center channel is located at the center, etc. upon the 11.1 channel configuration). Claim 7: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 6 above, wherein the first criterion is dependent upon one or more first audio characteristics of the multi-channel audio signals (e.g., center location for channel C and frequency characteristics for LFE channel in 11.1 configuration in fig. 3), wherein the first sub-set of audio signals share the one or more first audio characteristics (Purnhagen, e.g., C and LFE are located at the front of listener in figs. 6-8) and second sub-set of audio signals do not share the one or more first audio characteristics (Oomen, for efficiency of encoding, no overlap between the downmix and non-downmix, and the discussion in claim 1 above and Purnhagen, left and right LS/LB and RS/RB are not shared each other for 11.1 configuration in fig. 3). Claim 8: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 6 above, wherein the first criterion is dependent upon one or more spectral properties of the multi-channel audio signals, wherein at least one audio signal of the first sub-set of audio signals share the one or more spectral properties and the second sub-set of audio signals do not share the one or spectral properties (Oomen, indicated by downmix indication, para 127, and Purnhagen, e.g., frequency requirement for LFE in 11.1 configuration, and while L/R and C/LFE are the same group in figs. 6-8, L/R/C sharing the same frequency bandwidth inherently and LS, LB and RS, RB are not sharing each other in 11.1 channel configuration). Claim 9: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 7 above, wherein the one or more first audio characteristics comprise an energy level of an audio signal, wherein the respective audio signals of first sub-set of audio signals have an energy level greater than respectively audio signals of the second sub-set of audio signals (Oomen, selection is also based on energy between pairs and time frequency tiles, para 81, and Purnhagen, C/LFE and L/R are located at the front of the listener in figs. 6-8, and thus, inherently having stronger energy level than surround and back sound defined by LS/LB, and RS/RB). 
Claim 10: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 7 above,, wherein the one or more first audio characteristics comprise audio signal correlation, wherein respective audio signals of the first sub-set of audio signals have greater cross-correlation with the audio signals of the first sub-set than the one or more audio signals of the second sub-set (Purnhagen, dry channel signals L/R, C/LFE are dry channel signals inherently having greater cross correlation than surround sound signals LS/RS, LB/RB which are wet channel signals, para 137, 141-143), or wherein the one or more first audio characteristics comprise audio signal de-correlation, wherein at least one audio signal of the first sub-set of audio signals has low cross-correlation with at least one other audio signal of the first sub-set and with the one or more audio signals of the second sub-set, or wherein the one or more first audio characteristics comprise audio characteristics defined with an audio classifier, wherein the at least one audio signal of the first sub-set of audio signals conveys voice and the one or more audio signals of the second sub-set do not (Oomen, the selection is also relied on coherence between pairs, para 172-174, and Purnhagen, discussion in claim 9 above, and center comprising speech or voice and music on the surround being common in 11.1 or 5.1 configuration). Claim 11: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 1 above, wherein the multi-channel audio signals comprise multiple audio signals, where respective ones of the multiple audio signals are configured for rendering audio via a different output channel (Oomen, decoded time-frequency tiles are distributed to different channels for rendering in fig. 15 and Purnhagen, through loudspeaker system 1012 in fig. 10). Claim 12: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 1 above, wherein a count of the audio signals of the first sub-set is dependent upon an available bandwidth (Oomen, relied on available bit rate, para 151 and Purnhagen, using Hoffman coding to save bandwidth, para 150). Claim 13: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 1 above, wherein analyzing the one or more audio signals of the second sub-set of audio signals and the metadata comprises the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: analyze the second sub-set of audio signals but not the first sub-set of audio signals (Oomen, only downmix time-frequency tiles used for evaluating upmix parameters, para 122, 128, and Purnhagen, the elements 100, 303 and 304 are not applied to C/LFE in fig. 3). Claim 14: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 13 above, wherein the metadata comprises at least one of: parameterization of time-frequency portions of the second sub-set of audio signals (Purnhagen, via QMF to convert audio signals in frequency domain in fig. 3); or an encoding of at least spatial energy distribution of a sound field defined with the second sub-set of audio signals (Oomen, including energies calculation for selection, para 81 and Purnhagen, including information for decorrelation or wet audio signal regeneration, abstract, also including level differences and cross-correlation, para 3). 
Claim 16: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 1 above, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to provide control information that at least identifies at least one of: which of the multi-channel audio signals are comprised in the first sub-set of audio signals; or processed audio signals produced with the analysis (Oomen, based on the downmix indication, para 126 and Purnhagen, 5.1 in fig. 13, para 33, or the 11.1 channel configuration in figs. 6-8, para 33, 126). Claim 18: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 1 above, wherein the one or more transport audio signals comprise one or more processed audio signals produced via metadata assisted spatial audio processing (Oomen, through element 1405 by using the decoded parameters outputted from element 1401 in fig. 14, and Purnhagen, through element 900/1005 by using the side information S in fig. 10, and Fuchs, through audio renderer 420 by using metadata including decoded diffuseness parameters and decoded direction parameters, etc., in fig. 7a), wherein the one or more processed audio signals and the metadata are at least one of: jointly encoded with the at least one audio signal, or encoded separately from the at least one audio signal (Oomen, joint encoding and joint decoding, para 126 and Purnhagen, disjoint first and second groups of one or more channels, para 36). Claim 22: the combination of Oomen, Purnhagen, Fuchs, and Panchagnula further teaches, according to claim 20 above, further comprising at least one of: a joint decoder for decoding the received encoded data to decode the at least one audio signal, the one or more transport audio signals and the metadata, or a first decoder for decoding at least a first sub-set of the received encoded data to provide the at least one audio signal, and a second, different, decoder for decoding at least a second sub-set of the received encoded data to provide the one or more transport audio signals and the metadata (Oomen, joint encoding and joint decoding, para 126, and disjoining, para 147, and Purnhagen, MDCT⁻¹ 1010 for reconstructing C/LFE and MDCT⁻¹ 1002, 1006 for decoding L1, L2, R1, R2 in fig. 10).

Response to Arguments

Applicant's arguments filed on December 15, 2025 have been fully considered but are moot in view of the new ground(s) of rejection necessitated by applicant's amendment. The Office has thoroughly reviewed Applicant's arguments, e.g., about the newly added feature "with a different bitrate" for encoding "the one or more transport audio signals and the spatial metadata" and for encoding "the least one audio signal", etc. (paragraph 2 of page 13, paragraph 2 of page 16, paragraphs 3-4 of page 20, and paragraphs 1-3 of page 21 in the Remarks filed on December 15, 2025), but firmly believes that the cited references reasonably and properly meet the claimed limitations. In response to this Office Action, the examiner respectfully requests that support be shown for language added to any original claims on amendment and for any new claims; that is, indicate support for newly added claim language by specifically pointing to the page(s) and line numbers in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG, whose telephone number is (571) 270-5589. The examiner can normally be reached Monday-Friday, 6:30am-4:00pm EST. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Vivian Chin, can be reached at 571-272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LESHUI ZHANG/
Primary Examiner, Art Unit 2695
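
To make the claim mapping above easier to follow, here is a minimal, illustrative sketch of the claim 1 encoding flow as this Office Action characterizes it: the multi-channel input is split into a first sub-set (comprising at least a center loudspeaker channel) that is encoded on its own, and a second sub-set that is analyzed into one or more transport audio signals plus spatial metadata, with the two paths encoded separately and at different bitrates. Every function name, bitrate, and channel label below is a hypothetical placeholder; this is not the applicant's implementation or that of any cited reference.

```python
# Illustrative sketch of the claim 1 encoding flow as characterized in this Office
# Action. All names, bitrates, and channel labels are hypothetical.

def encode_multichannel(channels: dict[str, list[float]]) -> dict:
    # 1. Identify at least one audio signal to separate (here: the center channel).
    identified = [name for name in channels if name == "C"]

    # 2. Separate into a first sub-set (the identified signal(s)) and a second
    #    sub-set (the remaining channels of the multi-channel input).
    first_subset = {name: channels[name] for name in identified}
    second_subset = {name: sig for name, sig in channels.items() if name not in identified}

    # 3. Analyze the second sub-set to determine transport audio signals and
    #    spatial metadata (placeholder analysis for illustration only).
    transport_signals = list(second_subset.values())[:2]        # e.g., a stereo downmix
    spatial_metadata = {"channels_analyzed": sorted(second_subset)}

    # 4. Encode the two paths separately and at different bitrates, per the amended
    #    claim language ("encoded separately ... with a different bitrate").
    return {
        "first_subset_stream": {"bitrate_kbps": 64, "signals": sorted(first_subset)},
        "transport_stream": {"bitrate_kbps": 128,
                             "num_transport_signals": len(transport_signals),
                             "spatial_metadata": spatial_metadata},
    }

if __name__ == "__main__":
    silence = [0.0] * 480
    layout_5_1 = {name: silence for name in ("L", "R", "C", "LFE", "LS", "RS")}
    print(encode_multichannel(layout_5_1))
```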

Prosecution Timeline

Mar 18, 2022: Application Filed
Feb 24, 2024: Non-Final Rejection — §103
May 20, 2024: Response Filed
Jul 27, 2024: Final Rejection — §103
Oct 31, 2024: Response after Non-Final Action
Nov 08, 2024: Response after Non-Final Action
Nov 27, 2024: Request for Continued Examination
Dec 06, 2024: Response after Non-Final Action
Jan 25, 2025: Non-Final Rejection — §103
Apr 28, 2025: Response Filed
Jul 19, 2025: Final Rejection — §103
Nov 17, 2025: Response after Non-Final Action
Dec 15, 2025: Request for Continued Examination
Dec 18, 2025: Response after Non-Final Action
Jan 24, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585677: AUTOMATED GENERATION OF IMPROVED LIST-TYPE ANSWERS IN QUESTION ANSWERING SYSTEMS (2y 5m to grant; granted Mar 24, 2026)
Patent 12572757: VIDEO PROCESSING METHOD, VIDEO PROCESSING APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM (2y 5m to grant; granted Mar 10, 2026)
Patent 12567423: SYSTEM AND METHODS FOR UPSAMPLING OF DECOMPRESSED SPEECH DATA USING A NEURAL NETWORK (2y 5m to grant; granted Mar 03, 2026)
Patent 12567424: METHOD AND DEVICE FOR MULTI-CHANNEL COMFORT NOISE INJECTION IN A DECODED SOUND SIGNAL (2y 5m to grant; granted Mar 03, 2026)
Patent 12561354: SYSTEMS AND METHODS FOR ITEM-SPECIFIC KEYWORD RECOMMENDATION (2y 5m to grant; granted Feb 24, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 78%
With Interview: 99% (+36.0%)
Median Time to Grant: 2y 10m
PTA Risk: High
Based on 928 resolved cases by this examiner. Grant probability derived from career allow rate.
