DETAILED ACTION
Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 05/13/2024 and 02/03/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Status of Claims
The claims were amended by a preliminary amendment filed on 03/06/2025; the initial claim set was filed on 05/13/2024. For examination purposes, the amended claim set has been used. Claim 21 was amended. Claims 1-23 are pending, of which claims 1, 10 and 17 are independent.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-6, 10-13 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 20170105085 A1), hereinafter referenced as Kim, in view of Lee et al. (US 7801733 B2), hereinafter referenced as Lee.
Regarding Claim 1, Kim teaches a decoder-side method, the method comprising:
receiving a bitstream that includes an encoded representation of an input audio signal and metadata associated with the input audio signal (Kim: Para.[0093],[0094], Fig. 4, audio decoding device 22A receives encoded audio data, such as bitstream 56A. Para.[0043], input audio signals could be object-based audio, which involves discrete pulse-code-modulation (PCM) data for single audio objects with associated metadata containing their location coordinates and other information);
producing a plurality of audio driver signals by rendering the input audio signal based on the metadata (Kim: Para.[0096],[0098], Fig. 4, Audio decoding unit 204 may be configured to decode coded audio signal 62 into audio signal 70. Audio decoding unit 204 may decode channels C1-CN of audio signal 62 into channels C1-CN of decoded audio signal 70. HOA generation unit 208A may be configured to generate an HOA sound field based on multi-channel audio data and spatial positioning vectors (metadata) and provide HOA coefficients 212A to rendering unit 210);
and driving a plurality of speakers using the plurality of audio driver signals (Kim: Para.[0100], Fig. 4, rendering unit 210 may render audio signals 26A for playback at a plurality of local loudspeakers).
Kim, while teaching the method of claim 1, fails to explicitly teach the claimed, producing a decoded representation of the input audio signal by decoding the encoded representation using a Matching Pursuit (MP) coding-based algorithm.
However, Lee does teach the claimed, producing a decoded representation of the input audio signal by decoding the encoded representation using a Matching Pursuit (MP) coding-based algorithm (Lee: Column 17, lines 29-43, the high-band speech signal is encoded and decoded based on a structure in which a harmonic structure and a stochastic structure are combined. The harmonic structure searches for an amplitude and a phase of a sine wave dictionary using a matching pursuit (MP) algorithm. Hence, the wideband speech encoding and decoding system can reproduce high-quality sound at a low bitrate and with low complexity).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Lee’s teaching of a high-band speech encoding and decoding apparatus in a wideband speech encoding and decoding method into the method of coding higher-order ambisonics audio data taught by Kim, because this would reproduce high-quality sound even at a low bitrate in wideband speech encoding and decoding (Lee: Column 2, lines 29-59).
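For context, the matching pursuit principle Lee relies on can be sketched as a greedy decomposition of a signal over a dictionary of unit-norm atoms. The sinusoidal toy dictionary and all names below are illustrative assumptions, not Lee's actual implementation:

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_iter=5):
    """Greedy matching pursuit: at each step pick the dictionary atom
    most correlated with the residual and subtract its projection."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        correlations = dictionary.T @ residual   # atoms are unit-norm columns
        k = int(np.argmax(np.abs(correlations)))
        coeffs[k] += correlations[k]
        residual -= correlations[k] * dictionary[:, k]
    return coeffs, residual

# Toy dictionary of unit-norm sinusoidal atoms (cf. Lee's sine wave dictionary)
n = 256
t = np.arange(n)
atoms = np.stack([np.sin(2 * np.pi * f * t / n) for f in range(1, 33)], axis=1)
atoms /= np.linalg.norm(atoms, axis=0)

# A signal built from two atoms is recovered essentially exactly
signal = 3.0 * atoms[:, 4] + 1.5 * atoms[:, 20]
coeffs, residual = matching_pursuit(signal, atoms, n_iter=5)
```

Because integer-frequency sines over a full period are mutually orthogonal, the two-atom test signal above is recovered in two iterations; over a redundant (non-orthogonal) dictionary, plain MP converges only gradually.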
Claim 10 is a decoder-side device claim comprising: at least one processor; and memory having stored instructions which, when executed by the at least one processor (Kim: Para.[0226]-[0228], various aspects of the techniques in each of the sets of examples may provide for a non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause the one or more processors to perform the method for which the audio decoding device has been configured to perform), cause the decoder-side device to perform the steps of method claim 1 above. As such, claim 10 is similar in scope and content to claim 1 and is therefore rejected under a similar rationale as presented against claim 1 above.
Regarding Claim 2, Kim in view of Lee teach the method of claim 1. Kim further teaches, wherein the decoded representation comprises a higher-order ambisonics (HOA) representation of the input audio signal, wherein the method further comprises applying a conversion matrix to the HOA representation to reconstruct the input audio signal (Kim: Para.[0053]-[0055], the audio decoder may generate a set of higher order ambisonics (HOA) coefficients based on the multi-channel audio signal and the spatial positioning vectors. To reconstruct, the decoder may use rendering matrix D of the number of HOA coefficients and the number of channels. Para.[0098], Fig. 4, HOA generation unit 208A may generate set of HOA coefficients 212A).
Regarding Claim 3, Kim in view of Lee teach the method of claim 2. Kim further teaches, wherein the input audio signal comprises a plurality of full-range audio channels of a surround-sound format, wherein the method further comprises receiving at least one band-limited audio channel associated with the surround-sound format (Kim: Para.[0085], input audio signal 50 may be a six-channel audio signal for a source loudspeaker configuration of 5.1 (i.e., a front-left channel, a center channel, a front-right channel, a surround back left channel, a surround back right channel, and a low-frequency effects (LFE) channel)),
wherein producing the plurality of audio driver signals comprises assigning each of the channels to a particular speaker of the plurality of speakers based on the metadata ( Kim: Para.[0098]-[0100], Fig. 4, HOA generation unit 208A may be configured to generate an HOA sound field based on multi-channel audio data and spatial positioning vectors ( metadata) and provide HOA coefficients 212A to rendering unit 210. Rendering unit 210 may render audio signals 26A for playback at a plurality of local loudspeakers, where the plurality of local loudspeakers includes L loudspeakers, audio signals 26A may include channels C1 through CL that are respectively intended for playback through loudspeakers 1 through L ).
Claim 12 is a decoder-side device claim performing the steps of method claim 3 above. As such, claim 12 is similar in scope and content to claim 3 and is therefore rejected under a similar rationale as presented against claim 3 above.
Regarding Claim 4, Kim in view of Lee teach the method of claim 2. Kim further teaches, wherein the input audio signal comprises a set of one or more audio objects and the metadata comprises positional information relating to the set of one or more audio objects ( Kim: Para.[0043], input audio signals could be object-based audio, which involves discrete pulse-code-modulation (PCM) data for audio objects with associated metadata containing their location coordinates ( positional information) and other information),
wherein the plurality of audio driver signals are produced by spatially rendering the set of one or more audio objects according to the positional information ( Kim: Para.[0098], [0099], Fig. 4, HOA generation unit 208A generate plurality of audio signals, based on spatial positioning vectors ).
Regarding Claim 5, Kim in view of Lee teach the method of claim 4. Kim further teaches, further comprising: determining a number of the set of one or more audio objects (Kim: Para.[0100], Fig. 4, audio signals 26A may include audio signals for channels C1 through CL (set of audio objects));
and determining the conversion matrix based on the number (Kim: Para.[0101], Fig. 4, Eq. 29 illustrates determining of rendering matrix D (conversion matrix) based on the number of audio signals 26A).
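To illustrate how a conversion/rendering matrix is dimensioned by the number of signals, the sketch below builds a first-order horizontal ambisonic encoding matrix for a hypothetical five-loudspeaker layout and takes its pseudoinverse as the rendering matrix. This is a generic textbook construction under assumed conventions, not a reproduction of Kim's Eq. 29:

```python
import numpy as np

def encoding_matrix(azimuths):
    """First-order horizontal ambisonic encoding vectors, one column per
    direction (illustrative convention: W = 1, X = cos, Y = sin)."""
    az = np.asarray(azimuths, dtype=float)
    return np.stack([np.ones_like(az),    # W (omnidirectional)
                     np.cos(az),          # X
                     np.sin(az)], axis=0)

# The rendering matrix D is the pseudoinverse of the encoding matrix, so
# its shape (L x N_hoa) follows directly from the number of loudspeakers.
speakers = np.deg2rad([30, -30, 110, -110, 0])   # assumed 5.0 layout
E = encoding_matrix(speakers)   # 3 x 5 encoding matrix
D = np.linalg.pinv(E)           # 5 x 3 rendering (conversion) matrix

# Encode a source at 30 degrees into HOA, then render to the 5 loudspeakers
hoa = encoding_matrix([np.deg2rad(30)])[:, 0]
gains = D @ hoa
```

The point relevant to claims 5 and 22 is purely dimensional: the shape of D follows from the number of loudspeaker or object signals on one side and the number of HOA coefficients on the other.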
Regarding Claim 6, Kim in view of Lee teach the method of claim 4. Kim further teaches, further comprising receiving an output speaker layout for the plurality of speakers, wherein the set of one or more audio objects are spatially rendered according to the output speaker layout ( Kim: Para.[0121], Fig.6, source loudspeaker setup information 48 specifies a CICP speaker layout index. Rendering format unit 110 may determine, based on this CICP speaker layout index, locations of loudspeakers in the source loudspeaker setup. Accordingly, representation unit 115 may include, in spatial vector representation data 71A, an indication of the CICP speaker layout index).
Regarding Claim 11, Kim in view of Lee teach the decoder-side device of claim 10. Kim further teaches, wherein the decoded representation comprises a higher-order ambisonics (HOA) representation of the input audio signal, wherein the memory has further instructions to apply a conversion matrix to the HOA representation to reconstruct the input audio signal, wherein the plurality of audio driver signals are produced by rendering the input audio signal based on the metadata ( Kim: Para.[0053]-[0055], the audio decoder may generate a set of higher order ambisonics (HOA) coefficients based on the multi-channel audio signal and the spatial positioning vectors. To reconstruct, the decoder may use rendering matrix D of number of HOA coefficients and number of channels. Para.[0098], Fig. 4, HOA generation unit 208A may generate set of HOA coefficients 212A. Para.[0043], input audio signals could be object-based audio, which involves discrete pulse-code-modulation (PCM) data for single audio objects with associated metadata containing their location coordinates and other information).
Regarding Claim 13, Kim in view of Lee teach the decoder-side device of claim 11. Kim further teaches, wherein the input audio signal comprises a set of one or more audio objects and the metadata comprises positional information relating to the set of one or more audio objects ( Kim: Para.[0043], input audio signals could be object-based audio, which involves discrete pulse-code-modulation (PCM) data for audio objects with associated metadata containing their location coordinates ( positional information) and other information),
wherein the plurality of audio driver signals are produced by spatially rendering the set of one or more audio objects according to the positional information and a layout of the plurality of speakers ( Kim: Para.[0098], [0099], Fig. 4, HOA generation unit 208A generate plurality of audio signals, based on spatial positioning vectors. Para.[0121], Fig.6, source loudspeaker setup information 48 specifies a CICP speaker layout index. Rendering format unit 110 may determine, based on this CICP speaker layout index, locations of loudspeakers in the source loudspeaker setup. Accordingly, representation unit 115 may include, in spatial vector representation data 71A, an indication of the CICP speaker layout index).
Regarding Claim 17, Kim teaches an encoder-side method, the method comprising:
receiving an input audio signal of a piece of audio content and metadata relating to the input audio signal ( Kim: Para.[0084],[0085], Fig. 3, audio encoding device 14A receives audio signal 50. Para.[0043], input audio signals could be object-based audio, which involves discrete pulse-code-modulation (PCM) data for single audio objects with associated metadata containing their location coordinates and other information);
and transmitting the encoded input audio signal and the metadata to an audio playback device ( Kim: Para.[0090],[0091], Figs. 3, 4, bitstream 56A generated by the encoding device 14A is transmitted to decoding device 22A. Para.[0096],[0100], Fig. 4, Audio decoding unit 204 may be configured to decode coded audio signal 62 into audio signal 70. Audio decoding unit 204 may decode channels C1-CN of audio signal 62 into channels C1-CN of decoded audio signal 70. Rendering unit 210 may render audio signals 26A for playback at a plurality of local loudspeakers).
Kim, while teaching the method of claim 17, fails to explicitly teach the claimed, encoding, using a Matching Pursuit (MP) coding-based algorithm, the input audio signal.
However, Lee does teach the claimed, encoding, using a Matching Pursuit (MP) coding-based algorithm, the input audio signal (Lee: Column 17, lines 29-43, the high-band speech signal is encoded and decoded based on a structure in which a harmonic structure and a stochastic structure is combined. The harmonic structure searches for an amplitude and a phase of a sine wave dictionary using a matching pursuit (MP) algorithm. Hence, the wideband speech encoding and decoding system according to the present invention can reproduce high-quality sound at a low bitrate and with low complexity).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Lee’s teaching of a high-band speech encoding and decoding apparatus in a wideband speech encoding and decoding method into the method of coding higher-order ambisonics audio data taught by Kim, because this would reproduce high-quality sound even at a low bitrate in wideband speech encoding and decoding (Lee: Column 2, lines 29-59).
Claims 7-9, 14-16 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 20170105085 A1), hereinafter referenced as Kim, in view of Lee et al. (US 7801733 B2), hereinafter referenced as Lee, further in view of Peters et al. (US 20150127354 A1), hereinafter referenced as Peters.
Regarding Claim 7, Kim in view of Lee teach the method of claim 2. Kim in view of Lee fail to explicitly teach the claimed, wherein the bitstream is received from an encoder-side device, wherein the conversion matrix is an inverse matrix of a matrix used by the encoder-side device to produce the encoded representation of the input audio signal.
However, Peters does teach the claimed, wherein the bitstream is received from an encoder-side device, wherein the conversion matrix is an inverse matrix of a matrix used by the encoder-side device to produce the encoded representation of the input audio signal ( Peters: Para.[0785]-[0786], Fig. 60, the encoded bitstream generated by audio encoding device 20 after step 854, is generated by inverse nearfield filtering).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Peters’s teaching of compressing higher order ambisonics (HOA) audio data into the method taught by Kim in view of Lee, because this would improve the coding efficiency (Peters: Para.[0087]).
Claim 14 is a decoder-side device claim performing the steps of method claim 7 above. As such, claim 14 is similar in scope and content to claim 7 and is therefore rejected under a similar rationale as presented against claim 7 above.
Regarding Claim 8, Kim in view of Lee teach the method of claim 1. Kim in view of Lee fail to explicitly teach the claimed, wherein the decoded representation of the input audio signal comprises a mixed signal, wherein the method further comprises splitting the mixed signal into: a plurality of surround-sound channels of a surround-sound format, one or more audio objects that include one or more audio signals, and HOA data that includes a plurality of HOA signals.
However, Peters does teach the claimed, wherein the decoded representation of the input audio signal comprises a mixed signal (Peters: Para.[0387], Fig. 18 illustrates audio decoding device 354, where the decoded representation 356, from the renderer 355, shows mixed signals such as for 5.1 speakers, 22.2 speakers…Binaural headphones),
wherein the method further comprises splitting the mixed signal into: a plurality of surround-sound channels of a surround-sound format, one or more audio objects that include one or more audio signals, and HOA data that includes a plurality of HOA signals (Peters: Para.[0387],[0388], Figs. 18, 19 illustrate audio decoding device 354 and the output representation of the audio signals 356, with 5.1 speakers, 22.2 speakers (surround-sound format), Binaural headphones (audio object), and an HOA representation).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Peters’s teaching of compressing higher order ambisonics (HOA) audio data into the method taught by Kim in view of Lee, because this would improve the coding efficiency (Peters: Para.[0087]).
Claim 15 is a decoder-side device claim performing the steps of method claim 8 above. As such, claim 15 is similar in scope and content to claim 8 and is therefore rejected under a similar rationale as presented against claim 8 above.
Regarding Claim 9, Kim in view of Lee further in view of Peters teach the method of claim 8. Peters further teaches, wherein producing the plurality of audio driver signals comprises: rendering the plurality of surround-sound channels and the one or more audio signals (Peters: Para.[0387], Fig. 18 illustrates audio decoding device 354, where the decoded representation 356, from the renderer 355, shows audio driver signals such as for 5.1 speakers, 22.2 speakers (surround sound), Binaural headphones),
and mixing the renderings into the plurality of audio driver signals (Peters: Para.[0387], Fig. 18 illustrates audio decoding device 354, where 366 represents the mixing of the plurality of audio driver signals).
Kim further teaches rendering the plurality of HOA signals according to the metadata and an output speaker layout of the plurality of speakers (Kim: Para.[0098],[0099], Fig. 4, HOA generation unit 208A generates a plurality of audio signals based on spatial positioning vectors. Para.[0121], Fig. 6, source loudspeaker setup information 48 specifies a CICP speaker layout index. Rendering format unit 110 may determine, based on this CICP speaker layout index, locations of loudspeakers in the source loudspeaker setup. Accordingly, representation unit 115 may include, in spatial vector representation data 71A, an indication of the CICP speaker layout index).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Peters’s teaching of compressing higher order ambisonics (HOA) audio data into the method taught by Kim in view of Lee, because this would improve the coding efficiency (Peters: Para.[0087]).
Claim 16 is a decoder-side device claim performing the steps of method claim 9 above. As such, claim 16 is similar in scope and content to claim 9 and is therefore rejected under a similar rationale as presented against claim 9 above.
Regarding Claim 23, Kim in view of Lee teach the method of claim 17. Kim further teaches, wherein the input audio signal comprises: a plurality of surround-sound audio channels of the piece of audio content, a higher-order ambisonics (HOA) representation of the piece of audio content and a set of one or more audio objects of the piece of audio content ( Kim: Para.[0085], Fig. 3, audio signal 50 may be a six-channel audio signal for a source loudspeaker configuration of 5.1 (i.e., a front-left channel, a center channel, a front-right channel, a surround back left channel, a surround back right channel, and a low-frequency effects (LFE) channel). Para.[0142], Fig. 13, audio encoding device 14C determine, based on the data indicating the virtual source location for the audio object and data indicating a plurality of loudspeaker locations, a spatial vector of the audio object in a HOA domain),
Lee further teaches, wherein encoding comprises encoding, using the MP coding-based algorithm, the mixed audio signal into a bitstream for transmission to the audio playback device (Lee: Column 17, lines 29-43, the high-band speech signal is encoded and decoded based on a structure in which a harmonic structure and a stochastic structure is combined. The harmonic structure searches for an amplitude and a phase of a sine wave dictionary using a matching pursuit (MP) algorithm. Hence, the wideband speech encoding and decoding system according to the present invention can reproduce high-quality sound at a low bitrate and with low complexity).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Lee’s teaching of a high-band speech encoding and decoding apparatus in a wideband speech encoding and decoding method into the method of coding higher-order ambisonics audio data taught by Kim, because this would reproduce high-quality sound even at a low bitrate in wideband speech encoding and decoding (Lee: Column 2, lines 29-59).
Kim in view of Lee, while teaching the method of claim 23, fail to explicitly teach the claimed, wherein the method further comprises producing a mixed audio signal that includes the plurality of surround-sound audio channels, the HOA representation, and the set of one or more audio objects.
However, Peters does teach the claimed, wherein the method further comprises producing a mixed audio signal that includes the plurality of surround-sound audio channels, the HOA representation, and the set of one or more audio objects ( Peters: Para.[0387],[0388], Figs. 18, 19 illustrates audio decoding device 354 and the output representation of the audio signals 356, where 5.1 speakers, 22.2 speakers ( surround sound format), Binaural headphones ( audio object), HOA representation).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Peters’s teaching of compressing higher order ambisonics (HOA) audio data into the method taught by Kim in view of Lee, because this would improve the coding efficiency (Peters: Para.[0087]).
Claims 18, 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 20170105085 A1), hereinafter referenced as Kim, in view of Lee et al. (US 7801733 B2), hereinafter referenced as Lee, further in view of Routray et al. (Upscaling HOA Signals using Order Recursive Matching Pursuit in Spherical Harmonics Domain, IEEE, 2022), hereinafter referenced as Routray.
Regarding Claim 18, Kim in view of Lee teach the method of claim 17. Kim further teaches, wherein the input audio signal comprises a plurality of surround-sound audio channels that includes a sound source, wherein the plurality of surround-sound audio channels comprises a first set of one or more full-range audio channels and a second set of one or more band-limited audio channels (Kim: Para.[0085], Fig. 3, input audio signal 50 may be a six-channel audio signal for a source loudspeaker configuration of 5.1, where the first set can be the front-left channel, center channel, front-right channel, surround back left channel, and surround back right channel, and the second set can be the low-frequency effects (LFE) channel (band-limited)),
wherein the method further comprises converting the first set into a higher-order ambisonics (HOA) representation of the sound source ( Kim: Para.[0092], Fig. 3, multi-channel audio signal from source loudspeaker is configured to represent a set of higher-order ambisonics (HOA) coefficients),
Kim in view of Lee, while teaching claim 18, fail to explicitly teach the claimed, wherein encoding comprises encoding, using the MP coding-based algorithm, the HOA representation into a bitstream for transmission to the audio playback device.
However, Routray does teach the claimed, wherein encoding comprises encoding, using the MP coding-based algorithm, the HOA representation into a bitstream for transmission to the audio playback device ( Routray: Section III.C, source is encoded using MP ( matching pursuit) and proposed ORMP ( order recursive matching pursuit) algorithm. Encoded signals are decoded using a regular spaced loudspeaker array. The decoded HOA signals are rendered using 10, 25, and 64 numbers of regularly spaced loudspeakers).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Routray’s teaching of upscaling HOA signals using order recursive matching pursuit in the spherical harmonics domain into the method taught by Kim in view of Lee, because, by using order recursive matching pursuit, the spatial resolution during reproduction of the signal would be improved (Routray: Section IV).
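As a rough illustration of how an order-recursive pursuit improves on plain MP, the sketch below implements orthogonal matching pursuit, a closely related greedy variant that re-fits all selected atoms by least squares at each step. It is not Routray's exact ORMP algorithm, and the sinusoidal dictionary is an assumed toy:

```python
import numpy as np

def orthogonal_mp(signal, dictionary, n_iter=3):
    """Greedy pursuit that, unlike plain MP, re-solves a least-squares
    fit over every atom selected so far at each iteration."""
    residual = signal.astype(float).copy()
    support = []
    for _ in range(n_iter):
        k = int(np.argmax(np.abs(dictionary.T @ residual)))
        if k not in support:
            support.append(k)
        sub = dictionary[:, support]
        sol, *_ = np.linalg.lstsq(sub, signal, rcond=None)
        residual = signal - sub @ sol
    coeffs = np.zeros(dictionary.shape[1])
    coeffs[support] = sol
    return coeffs, residual

# Toy dictionary of unit-norm sinusoidal atoms
n = 256
t = np.arange(n)
atoms = np.stack([np.sin(2 * np.pi * f * t / n) for f in range(1, 33)], axis=1)
atoms /= np.linalg.norm(atoms, axis=0)

signal = 2.0 * atoms[:, 7] - 1.0 * atoms[:, 15]
coeffs, residual = orthogonal_mp(signal, atoms, n_iter=2)
```

Over a redundant, non-orthogonal dictionary (the realistic HOA upscaling case), the per-step least-squares re-fit is what prevents the coefficient mis-allocation that plain MP suffers from.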
Regarding Claim 20, Kim in view of Lee further in view of Routray teach the method of claim 18. Kim further teaches, wherein the metadata comprises surround-sound speaker layout information for the plurality of surround-sound audio channels ( Kim: Para.[0168], Fig. 17, audio encoding device 14D is encoding channel based audio, may obtain source loudspeaker setup information, source location of an audio object. Para.[0191], audio encoding device 14 may receive six-channels of audio data in the 5.1 surround sound format. Para.[0043], input audio signals could be object-based audio, with associated metadata containing their location coordinates ( positional information) and other information),
wherein the first set is converted into HOA representation according to the surround-sound speaker layout information ( Kim: Para.[0177], Fig. 17, audio encoding device 14D may obtain, based on the source loudspeaker configuration, a plurality of spatial positioning vectors in the Higher-Order Ambisonics (HOA) domain that, in combination with the multi-channel audio signal, represent a set of higher-order ambisonics (HOA) coefficients that represent the multi-channel audio signal).
Regarding Claim 21, Kim in view of Lee teach the method of claim 17. Kim further teaches, wherein receiving the[[an]] input audio signal comprises receiving a set of one or more audio objects, each audio object having at least one audio signal ( Kim: Para.[0139],[0141], Fig. 13, audio encoding device 14C is configured to encode object-based audio data. Bitstream generation unit 52C obtains an audio signal 50B for the audio object),
wherein the method further comprises producing a higher-order ambisonics (HOA) representation of the set of one or more audio objects ( Kim: Para.[0142], Fig. 13, audio encoding device 14C determine, based on the data indicating the virtual source location for the audio object and data indicating a plurality of loudspeaker locations, a spatial vector of the audio object in a HOA domain),
Kim in view of Lee, while teaching claim 21, fail to explicitly teach the claimed, wherein encoding comprises encoding, using the MP coding-based algorithm, the HOA representation into a bitstream for transmission to the audio playback device.
However, Routray does teach the claimed, wherein encoding comprises encoding, using the MP coding-based algorithm, the HOA representation into a bitstream for transmission to the audio playback device ( Routray: Section III.C, source is encoded using MP ( matching pursuit) and proposed ORMP ( order recursive matching pursuit) algorithm. Encoded signals are decoded using a regular spaced loudspeaker array. The decoded HOA signals are rendered using 10, 25, and 64 numbers of regularly spaced loudspeakers).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Routray’s teaching of upscaling HOA signals using order recursive matching pursuit in the spherical harmonics domain into the method taught by Kim in view of Lee, because, by using order recursive matching pursuit, the spatial resolution during reproduction of the signal would be improved (Routray: Section IV).
Regarding Claim 22, Kim in view of Lee, further in view of Routray teach the method of claim 21. Kim further teaches, further comprising: determining a number of the set of one or more audio objects ( Kim: Para.[0100], Fig. 4, audio signals 26A may include audio signals for channels C1 through CL ( set of audio objects));
and determining a conversion matrix based on the number ( Kim: Para.[0101], Fig. 4, Eq. 29 illustrates determining of rendering matrix D ( conversion matrix) based on the number of audio signal 26A),
wherein the HOA representation is produced by applying the conversion matrix to the set of one or more audio objects ( Kim: Para.[0051], an audio encoder may determine and encode one or more spatial positioning vectors (SPVs) that enable conversion of the encoded audio data into HOA coefficients. Para.[0101], in eq. 29, H represents HOA coefficients and D is the rendering matrix ).
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 20170105085 A1), hereinafter referenced as Kim, in view of Lee et al. (US 7801733 B2), hereinafter referenced as Lee, further in view of Routray et al. (Upscaling HOA Signals using Order Recursive Matching Pursuit in Spherical Harmonics Domain, IEEE, 2022), hereinafter referenced as Routray, further in view of Sun et al. (Immersive audio, capture, transport, and rendering: a review, Cambridge University, 24th August, 2021), hereinafter referenced as Sun.
Regarding Claim 19, Kim in view of Lee further in view of Routray teach the method of claim 18. Kim in view of Lee further in view of Routray fail to explicitly teach the claimed, further comprising encoding the second set into the bitstream separately from the encoded HOA representation.
However, Sun does teach the claimed, further comprising encoding the second set into the bitstream separately from the encoded HOA representation ( Sun: Section III, D, Fig.26 illustrates encoding different sets into bitstreams separately ).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Sun’s teaching of immersive audio background and end-to-end workflow, covering audio capture, compression, and rendering, into the method taught by Kim in view of Lee, further in view of Routray, because, with a better understanding of the immersive audio system, the overall coding efficiency and the customer’s experience could be improved (Sun: Sections III.B, IV.E).
Conclusion
The prior art made of record and not relied upon, which is considered pertinent to applicant's disclosure, is listed below.
Friedrich et al. (US 11929082 B2) teaches a system related to the field of audio coding, and in particular to an audio decoder having at least two decoding modes, and associated decoding methods and decoding software for such an audio decoder. In one of the decoding modes, at least one dynamic audio object is mapped to a set of static audio objects, the set of static audio objects corresponding to a predefined speaker configuration. The disclosure further relates to a corresponding audio encoder, and associated encoding methods and encoding software for such an audio encoder.
Chabanne et al. (US 9622014 B2) teaches a method and system of rendering and playing back spatial audio content using a channel-based format. Spatial audio content that is played back through legacy channel-based equipment is transformed into the appropriate channel-based format resulting in the loss of certain positional information within the audio objects and positional metadata comprising the spatial audio content. To retain this information for use in spatial audio equipment even after the audio content is rendered as channel-based audio, certain metadata generated by the spatial audio processor is incorporated into the channel-based data. The channel-based audio can then be sent to a channel-based audio decoder or a spatial audio decoder. The spatial audio decoder processes the metadata to recover at least some positional information that was lost during the down-mix operation by upmixing the channel-based audio content back to the spatial audio content for optimal playback in a spatial audio environment.
Adami et al. (US 20220101867 A1) teaches an audio encoder for encoding audio input data to obtain audio output data, which includes an input interface for receiving a plurality of audio channels, a plurality of audio objects, and metadata related to one or more of the plurality of audio objects; a mixer for mixing the plurality of objects and the plurality of channels to obtain a plurality of pre-mixed channels, each pre-mixed channel including audio data of a channel and audio data of at least one object; a core encoder for core encoding core encoder input data; and a metadata compressor for compressing the metadata related to the one or more of the plurality of audio objects, wherein the audio encoder is configured to operate in at least one mode of a group of two modes.
Soulier et al. (US 20220038818 A1) teaches methods and systems for optimizing the routing of audio data to audio transmitting devices using a Bluetooth network. One method includes receiving an encoded audio bitstream comprising first and second audio channels at a first speaker of the audio rendering system, separating a first set of spectral components of the first audio channel and a second set of spectral components of the second audio channel from the encoded audio bitstream without decoding the audio bitstream, generating a first encoded bitstream from the first set of spectral components, and forwarding the first encoded bitstream to a second speaker of the audio rendering system over the wireless link.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NADIRA SULTANA, whose telephone number is (571) 272-4048. The examiner can normally be reached M-F, 7:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Paras D. Shah can be reached on (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NADIRA SULTANA/Examiner, Art Unit 2653