DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the claim amendment filed on December 18, 2025, in which claims 1, 12, and 14 were amended and claims 3 and 15 were cancelled.
The Office appreciates the explanation of the amendment and the analyses of the prior art; however, although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993) and MPEP 2145.
In the reply to this Office Action, the Examiner respectfully requests that support be shown for language added to any original claims upon amendment and for any new claims. That is, indicate support for newly added claim language by specifically pointing to the page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-6, 8-10, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Eggerding et al. (US 20170243596 A1, hereinafter Eggerding) in view of Purnhagen et al. (US 20200020345 A1, hereinafter Purnhagen).
Claim 1: Eggerding teaches a method (title and abstract, ln 1-15, method steps in fig. 5 and details in figs. 4a/4b) comprising:
receiving a first frame of audio of a first format (receiving one of channel-based segment and object-based segment from a bitstream in Dolby Digital Plus Bitstream to Dolby Digital Plus Decoder 402 in fig. 4a, para 47);
receiving a second frame of audio of a second format different from the first format (receiving other one of the channel-based segment and object-based segment from the bitstream in Dolby Digital Plus Bitstream to Dolby Digital Plus Decoder 402 in fig. 4b, para 47), the second frame for playback subsequent to the first frame (through switching mechanism 301, para 46 and forming adaptive audio mix 208 with the channel 202 followed by the object 204 in fig. 2, and processing and playback in a continuous form, para 36, and from segment to segment of the audio stream, para 51);
decoding the first frame of audio into a decoded first frame (by Dolby Digital Plus Decoder 402, Dolby Surround Upmixer 404 in fig. 4a, for channel-based decoding, para 47-48);
decoding the second frame of audio into a decoded second frame (by Dolby Digital Plus Decoder 402 and object audio renderer 406 in fig. 4b, for object-based decoding, para 47-48); and
generating a plurality of output frames of a third format (adaptive audio format to contain channel-based elements and object-based audio elements in a digital bitstream, para 38 and 208 in fig. 2, the digital bitstream, para 43, e.g., frame 1 and frame 2 with evolution frame 706 in the Dolby Digital Plus™ frame, para 62-63) by performing rendering (via additional processing 410 by providing sound steering, object trajectory, height effects, according to adaptive-audio enabled speaker information, para 49) based on the decoded first frame and the decoded second frame (by taking outputs from Dolby Surround Upmixer 404 for decoded channel-based audio and by taking outputs from Object Audio Renderer 406 for decoded object-based audio), wherein the first format is an object-based audio format and the second format is a channel-based audio format, or the first format is a channel-based audio format and the second format is an object-based audio format (channel-based audio segment format 200 and object-based audio segment format 204 in fig. 2); and
wherein generating the plurality of output frames of the third format includes generating a hybrid output frame (including an evolution frame and a non-update frame in frames 708, 710 in figs. 7-8, and fades are applied at the transient location between the two types of audio signals, para 49) that includes two portions (adaptive audio mix 208 in fig. 2, to generate one or more bitstreams containing both conventional channel-based audio elements and object-based audio elements, para 38, i.e., as one continuous audio stream containing the hybrid output mix, para 36, 48), said generating the hybrid output frame comprises:
obtaining one portion of the hybrid output frame from a portion of the frame of audio of the object-based audio format (object-based audio signal 204, as outputted from the decoder 302 in fig. 3); and
obtaining the other portion of the hybrid output frame from a portion of the frame of audio of the channel-based audio format (channel-based audio signal portion 202 in the adaptive audio format or mix 208 in fig. 2, as outputted from the decoder 302 in fig. 3).
However, Eggerding does not explicitly teach wherein the portion of the frame of audio of the object-based audio format is downmixed.
Purnhagen teaches an analogous field of endeavor by disclosing a method (title and abstract, ln 1-8, and a method implemented by a renderer 122 in fig. 9) wherein downmixing a frame of audio of the object-based audio format is disclosed (reconstructed approximate N audio objects 106’ are further rendered to an output signal 124 having a format suitable for playback on a headphone configuration, i.e., from N reconstructed audio object signals to two channel signals, equivalent to a downmix, in fig. 9; alternatively, L auxiliary signals are formed from the N audio objects and the reconstructed N audio objects are synthesized or rendered via 624 into N reconstructed audio object signals, i.e., downmixing) for reconstructing audio objects in a more accurate manner (para 34-36) with low-cost operations (para 5, e.g., less computation owing to the smaller number of audio objects after the downmix).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied downmixing the frame of audio of the object-based audio format, as taught by Purnhagen, to the portion of the frame of audio of the object-based audio format for obtaining the portion of the hybrid output frame in the method, as taught by Eggerding, for the benefits discussed above.
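By way of illustration only (the names, shapes, and gain values below are assumptions, not drawn from Eggerding or Purnhagen), a minimal sketch of the combined teaching, downmixing an object-based portion to the channel layout and splicing it with a channel-based portion into one hybrid output frame, could look like:
```python
import numpy as np

def downmix_objects(objects: np.ndarray, gains: np.ndarray) -> np.ndarray:
    """Downmix N object signals to C channels via a gain (rendering) matrix.
    objects has shape (N, samples); gains has shape (C, N). Hypothetical layout."""
    return gains @ objects  # result: (C, samples)

def hybrid_frame(channel_part: np.ndarray, object_part: np.ndarray,
                 gains: np.ndarray) -> np.ndarray:
    """Form one output frame whose first portion comes from channel-based
    audio and whose second portion is the downmixed object-based audio."""
    return np.concatenate([channel_part, downmix_objects(object_part, gains)],
                          axis=1)

# Example: two objects downmixed to stereo, spliced after a stereo portion.
rng = np.random.default_rng(0)
channel_part = rng.standard_normal((2, 512))  # (channels, samples)
object_part = rng.standard_normal((2, 512))   # (objects, samples)
gains = np.array([[1.0, 0.5], [0.5, 1.0]])    # (channels, objects)
frame = hybrid_frame(channel_part, object_part, gains)
assert frame.shape == (2, 1024)
```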
Claim 2: the combination of Eggerding and Purnhagen further teaches, according to claim 1 above, wherein generating the plurality of output frames of the third format (Eggerding, adaptive audio mix 208 in fig. 2, discussed in claim 1 above) includes downmixing the frame of audio of the object-based audio format (Eggerding, groups of sound elements, and Purnhagen, the approximate N audio objects 106’ are further rendered to an output signal 124 having a format suitable for playback on a headphone configuration, i.e., from N reconstructed audio object signals downmixed to two channel signals in fig. 9, or L auxiliary signals from the N audio objects are synthesized or rendered via 624 into N reconstructed audio object signals).
Claim 4: the combination of Eggerding and Purnhagen further teaches, according to claim 3 above, wherein a duration of the portion of the frame of audio of the object-based audio format is determined (Eggerding, latency management is implemented within the object audio renderer 406; the object audio renderer, as part of the object-based decoding processing, is queried for latency in samples by a latency-determining algorithm, as the duration of the portion of the frame of audio of the object-based audio format, in fig. 4b, para 48) based on a latency of an associated decoding process (Eggerding, the queried latency of components including the object audio renderer as part of the decoding processing, in samples, as discussed above).
Claim 5: the combination of Eggerding and Purnhagen further teaches, according to claim 1 above, wherein the first frame of audio and the second frame of audio are received in a first bitstream (Eggerding, audio segment, either channel-based or object-based, in audio bitstream and delivered to audio decoder, para 12-13).
Claim 6: the combination of Eggerding and Purnhagen further teaches, according to claim 1 above, after rendering (Eggerding, after the object audio renderer in fig. 4b), performing one or more fades to resolve output discontinuities (Eggerding, application of fades for minimizing transients, para 49, wherein transients are signal discontinuities).
Claim 8: the combination of Eggerding and Purnhagen further teaches, according to claim 1 above, wherein decoding the frame of audio of the object-based format (Eggerding, decoding object audio by 402 in fig. 4b) includes modifying object audio metadata (OAMD) (Eggerding, metadata is updated over time to define different rendering attributes, para 15) associated with the frame of audio of the object-based format (Eggerding, the metadata is updated over time at the decoder, para 8, and to indicate changes of rendering parameters with respect to PCM samples from the input audio bitstream at the audio object decoder, para 62).
Claim 9: the combination of Eggerding and Purnhagen further teaches, according to claim 8 above, wherein when the first frame is of the channel-based format, and the second frame is of the object-based format (Eggerding, channel-based and object-based segments in the input bitstream and discussed in claim 1 above), modifying the OAMD associated with the frame of audio of the object-based format (Eggerding, OAM including data defining type of audio segment upon detection of the change of type, para 12, and updating the metadata at the decoder, para 8, and to indicate changes of rendering parameters from the input audio bitstream at the audio object decoder, para 62) includes at least one of:
applying, to the OAMD associated with the frame of audio of the object-based format, a time offset corresponding to a latency of the decoding process (Eggerding, a metadata element passed to a metadata frame with a sample offset indicating at which sample in an audio block the frame applies at the OARI, i.e., the sample offset is counted in association with processing by the Dolby TrueHD decoder and Dolby Digital Plus Decoder 402 in fig. 4b, para 14, including oa_sample_offset, oa_sample_offset_code, etc., in table 1 of fig. 14, or the Evolution frame payload containing object-based metadata OAMD, para 58, with either 2000 samples or 1000 samples as the time offset depending on the sampling frequency applied in the decoding processing, para 82); and
setting a ramp duration (Eggerding, measuring a degree of timing and alignment of metadata updates in the Evolution framework payload against each of the audio blocks the metadata is applied to, para 82-84) specified in the OAMD of the frame of audio of the object-based format to zero (Eggerding, based on the timing and alignment of the metadata updates, the maximum/minimum block size, and other factors, the processing block sizes of the audio are adjusted for optimal pairing with the audio blocks, specifically for objects that update non-uniformly with respect to the data block boundaries, including time scaling with sampling frequency in table 2, para 83; i.e., the ramp duration to compensate the alignment and timing is inherently zero for optimally matching the boundaries of the audio block, para 84, for example, by filtering out those metadata updates that do not align on block boundaries, para 59), wherein the ramp duration specifies the time to transition from the previous OAMD to the OAMD of the frame of audio of the object-based format (Eggerding, boundaries from OA metadata to OA metadata associated with audio blocks, para 84, e.g., by filtering to compensate block alignment, para 59).
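Purely as an illustration of the metadata handling mapped above (the field and function names are hypothetical, not Eggerding's syntax), applying a decode-latency sample offset and zeroing the ramp duration might be sketched as:
```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Oamd:
    """Hypothetical object audio metadata record (illustrative only)."""
    sample_offset: int  # sample at which this metadata frame applies
    ramp_duration: int  # samples to transition from the previous OAMD

def adjust_for_format_switch(oamd: Oamd, decoder_latency: int) -> Oamd:
    """Shift the metadata by the decoder latency and make the update
    instantaneous (ramp of zero) at a channel-to-object transition."""
    return replace(oamd,
                   sample_offset=oamd.sample_offset + decoder_latency,
                   ramp_duration=0)

adjusted = adjust_for_format_switch(Oamd(sample_offset=0, ramp_duration=256),
                                    decoder_latency=2000)  # e.g., 2000 samples
assert adjusted.sample_offset == 2000 and adjusted.ramp_duration == 0
```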
Claim 10: Eggerding further teaches, according to claim 8 above, wherein when the first frame is of the object-based format, and the second frame is of the channel-based format (channel-based and object-based segments in the input bitstream, discussed in claim 1 above), modifying the OAMD associated with the frame of audio of the object-based format (OAM including data defining the type of audio segment upon detection of a change of type, para 12, updating the metadata at the decoder, para 8, and indicating changes of rendering parameters from the input audio bitstream at the audio object decoder, para 62), and providing OAMD that includes position data specifying the positions of the objects of the object-based format (the metadata also includes 3D position and object size for audio objects, para 54-55), except for explicitly teaching providing OAMD that includes position data specifying the positions of the channels of the channel-based format.
Purnhagen teaches an analogous field of endeavor by disclosing a method (title and abstract, ln 1-8, and a method implemented by a renderer 122 in fig. 9), wherein metadata is disclosed (104 from the encoder to the renderer 122 of the decoder in fig. 1) and wherein providing OAMD that includes position data specifying the positions of the channels of the channel-based format is disclosed (the metadata includes positional information specifying the positions of the bed channels of the channel-based format 106b compared to the object-based format 106a, para 58, or a presentation matrix applied in the renderer 122 is generated based on the positional information, para 73) so that audio contents including audio objects and bed channels can be rendered on a desired loudspeaker or headphone configuration (para 64).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied providing OAMD that includes position data specifying the positions of the channels of the channel-based format, as taught by Purnhagen, to providing OAMD that includes position data specifying the positions of the objects of the object-based format in the method, as taught by Eggerding, for the benefits above.
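As a hedged illustration of this combination (the channel labels and coordinates below are assumptions, not values from Purnhagen), representing bed channels as statically positioned objects in OAMD-like records might look like:
```python
# Hypothetical fixed (x, y, z) positions for a 5.1 bed rendered as objects;
# the coordinates are illustrative assumptions, not values from Purnhagen.
BED_POSITIONS = {
    "L": (-1.0, 1.0, 0.0), "R": (1.0, 1.0, 0.0), "C": (0.0, 1.0, 0.0),
    "LFE": (0.0, 1.0, -1.0), "Ls": (-1.0, -1.0, 0.0), "Rs": (1.0, -1.0, 0.0),
}

def oamd_for_bed(channels: list[str]) -> list[dict]:
    """Build OAMD-like position entries so a renderer can treat bed channels
    of the channel-based format as statically positioned objects."""
    return [{"channel": ch, "position": BED_POSITIONS[ch]} for ch in channels]

print(oamd_for_bed(["L", "R", "C"]))
```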
Claim 12 has been analyzed and rejected according to claim 1 above, and the combination of Eggerding and Purnhagen further teaches an electronic device (Eggerding, computer or processing devices, para 36, e.g., AVR systems using Dolby™ Atmos™ technology, para 6, and Purnhagen, decoding devices, para 39), comprising:
one or more processors (Eggerding, DSPs, para 53 and processor-based computing device, para 88 and Purnhagen, DSP or microprocessor, para 103); and
a memory (Eggerding, machine-readable or computer-readable media, including register transfer and logic components, para 88, and Purnhagen, RAM, ROM, EEPROM, flash memory, para 103) storing one or more programs (Eggerding, storing data and instructions, para 88, and Purnhagen, storing instructions and program modules, para 103) configured to be executed by the one or more processors to implement the method of claim 1 (Eggerding, as hardware, firmware, and instructions implemented by the processors, para 88, and Purnhagen, the stored software is implemented by the processor, para 103).
Claim 14 has been analyzed and rejected according to claims 1, 12 above.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Eggerding (above) in view of Purnhagen (above) and Luyten et al. (US 20210360348 A1, hereinafter Luyten).
Claim 7: the combination of Eggerding and Purnhagen teaches, according to claim 6 above, the one or more fading operations (Eggerding, the fades for minimizing transients, para 49), except for applying a limiter, wherein the one or more fading operations includes a fade in and a fade out, wherein both the fade in and the fade out have a duration equal to a delay of the limiter.
Luyten teaches an analogous field of endeavor by disclosing a method (title and abstract, ln 1-14 and figs. 1, 3) wherein one or more fading operations are disclosed to include a fade in and a fade out (audio signals src 1 and src 2 in fig. 1, with mixing factors g1 and g2 controlled by ctrl 1 112 in fig. 1; fade-in for src 2 from 216 to 218 following time t1 and for src 1 from 226 to 228 following time t2 in fig. 2, and fade-out for src 1 from 212 to zero following time t1 and for src 2 from 222 to 224 following time t2 in fig. 2), a limiter is applied (110 and 111 in fig. 1), and wherein both the fade in and the fade out have a duration equal to the delay of the limiter (the delay added in 110 and 111 for compensating the time lag while reducing the output amplitude, provided at an audio output port of the cross-fade module, of the first audio signal src1 and increasing the output amplitude of the second audio signal in response to the control signal, para 13, for smooth transitions from one source to the other that are less annoying to the user, para 37).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the limiter and wherein the one or more fading operations includes the fade in and the fade out, wherein both the fade in and the fade out have the duration equal to the delay of the limiter, as taught by Luyten, to the fade in and the fade out in the one or more fading operations, as taught by the combination of Eggerding and Purnhagen, for the benefits discussed above.
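A minimal sketch of the claimed fade behavior as combined above, assuming linear gain ramps whose length equals the limiter's delay in samples (an illustrative assumption, not Luyten's exact mechanism):
```python
import numpy as np

def crossfade(outgoing: np.ndarray, incoming: np.ndarray,
              limiter_delay: int) -> np.ndarray:
    """Crossfade two mono signals; both the fade-out and the fade-in span
    exactly limiter_delay samples, matching the limiter's delay."""
    fade_out = np.linspace(1.0, 0.0, limiter_delay)
    mixed = (outgoing[:limiter_delay] * fade_out
             + incoming[:limiter_delay] * (1.0 - fade_out))
    return np.concatenate([mixed, incoming[limiter_delay:]])

out = crossfade(np.ones(480), np.full(480, 0.5), limiter_delay=128)
assert out.shape == (480,)
```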
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Eggerding (above) in view of Purnhagen (above) and Mate et al. (US 20210029480 A1, hereinafter Mate).
Claim 11: the combination of Eggerding and Purnhagen further teaches, according to claim 1 above, wherein the first frame of audio and the second frame of audio are delivered (Eggerding, in an input audio bitstream in fig. 3, para 44), except for explicitly teaching that the delivery is in accordance with an adaptive streaming protocol.
Mate teaches an analogous field of endeavor by disclosing a method (title and abstract, ln 1-16 and figs. 6-10) wherein audio frames (scene representation as audio channels, audio objects, or HOA, para 25) are delivered (within a bitrate budget with S3, S4, S6 in time T1, and S5, S7, S8 in T1 as a channel downmix, etc., in fig. 2, para 33) in accordance with an adaptive streaming protocol (via DASH delivery from the MPD server to a client 304 for a 6DOF audio player, para 55, including an MPEG-DASH media presentation description in the stream, para 71) for lower-latency performance in a flexible manner (para 31).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied delivery of the first frame and the second frame of audio in accordance with the adaptive streaming protocol, as taught by Mate, to the delivery of the first frame and the second frame in the method, as taught by the combination of Eggerding and Purnhagen, for the benefits discussed above.
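For illustration only (the manifest structure below is a hypothetical simplification, not MPEG-DASH syntax as used by Mate), adaptive delivery of mixed-format segments within a bitrate budget might be modeled as:
```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class Segment:
    """Hypothetical manifest entry: one timed segment of one audio format."""
    start_time: float
    fmt: str      # "channel" or "object"
    bitrate: int  # bits per second offered for this segment

def select_segments(manifest: list[Segment], budget: int) -> Iterator[Segment]:
    """Yield segments in playback order, keeping each within the bitrate
    budget; a stand-in for a streaming client's adaptation logic."""
    for seg in sorted(manifest, key=lambda s: s.start_time):
        if seg.bitrate <= budget:
            yield seg

manifest = [Segment(0.0, "channel", 256_000), Segment(2.0, "object", 384_000)]
for seg in select_segments(manifest, budget=512_000):
    print(f"t={seg.start_time}: fetch {seg.fmt} segment at {seg.bitrate} bps")
```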
Allowable Subject Matter
Claim 13 is in condition for allowance.
Response to Arguments
Applicant's arguments filed on December 18, 2025 have been fully considered but are moot in view of the new ground(s) of rejection necessitated by applicant's amendment. Although a new ground of rejection has been used to address the additional limitations added to claims 1, 12, and 14, a response is considered necessary for several of applicant’s arguments since the references Eggerding and Purnhagen continue to be used to meet several claimed limitations.
With respect to the prior art rejection of independent claim 1, similar to claim 14, under 35 U.S.C. § 103, as set forth in the Office Action, applicant challenged prior art Eggerding and argued: “Neither Eggerding nor Purnhagen disclose a hybrid output frame” because “The adaptive audio mix 208 disclosed by Eggerding is a mix of bed channels and object channels and not a hybrid output frame that includes, e.g., samples from channel-based audio and object-based audio combined into a single frame … paragraphs [0060]-[0062]. A mix is not a frame. Likewise, Purnhagen discloses reconstruction of audio objects and possible bed channels, neither of which are hybrid frames”, as asserted in paragraphs 1-2 of page 8 of the Remarks filed on December 18, 2025.
In response to the argument cited above, the Office respectfully disagrees because the claims broadly recite a “hybrid output frame” that includes “two portions”, with no limitation on how the “two portions” are distributed or composited. Eggerding thus not only disclosed a mixed frame (figs. 7-8) comprising two portions (an evolution frame, or object-based audio signal portion, and a non-evolution frame, or channel-based audio signal portion, specified based on metadata updates and non-updated metadata, respectively, in figs. 7-8, para 62-63), but also disclosed mixing at the transient portion of the frame between the two types of audio signals in order to minimize transients (fades applied to minimize transients in the system, i.e., mixing the two types of audio signals, para 49). Applicant is silent on these disclosures, and the argument above is therefore moot. As a matter of record, prior art Chinen (US 20180033440 A1) has not been applied in this Office Action but also discloses a similar scheme achieving a hybrid frame comprising both an object-based audio signal and a channel-based audio signal (through the fade-in and fade-out, para 315, 328-330, etc.).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG whose telephone number is (571)270-5589. The examiner can normally be reached Monday-Friday, 6:30am-4:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin, can be reached at 571-272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LESHUI ZHANG/
Primary Examiner,
Art Unit 2695