Last updated: April 19, 2026

Application No. 18/605,688

Spatial Audio Upscaling Using Machine Learning

Final Rejection §103 §DP

Filed

Mar 14, 2024

Examiner

MONIKANG, GEORGE C

Art Unit

2692

Tech Center

2600 — Communications

Assignee

Apple Inc.

OA Round

2 (Final)

Interview Optional

+7.2% interview lift. This examiner has a relatively high allow rate; a written response may suffice.

Based on 941 resolved cases, 2023–2026

Examiner Intelligence

MONIKANG, GEORGE C

Grants 74% — above average

Career Allow Rate

701 granted / 941 resolved

+12.5% vs TC avg

Interview Lift

+7.2% (moderate), resolved cases with interview vs. without

Typical timeline

3y 0m

Avg Prosecution

48 currently pending

Career history

989

Total Applications

across all art units

Statute-Specific Performance

§101

3.9%

-36.1% vs TC avg

§103

58.6%

+18.6% vs TC avg

§102

22.5%

-17.5% vs TC avg

§112

4.0%

-36.0% vs TC avg

Tech Center averages are estimates. Based on career data from 941 resolved cases

Office Action

§103 §DP

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 1/26/2026 have been fully considered but they are not persuasive.
With regard to applicant’s argument concerning whether, in the Salehin reference, rotation of the formatted FOA audio is carried out before or after spatial conversion to higher order ambisonics, the examiner maintains the rejection. The Salehin reference teaches the concept of captured sounds converted to ambisonics being played back over a plurality of loudspeaker configurations, where the played-back captured sounds can be rotated based on the head position of a listener, and where the rotation technique may be described with respect to first order ambisonics and also higher order ambisonics (Salehin, para 0086). Therefore, it would have been obvious for one of ordinary skill in the art to perform audio rotation based on head position of the listener with respect to first order ambisonics, as taught in Salehin, as part of the first order ambisonics formatting of Sen et al for the purpose of accommodating changes in the user’s head position.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claim 11 of 18/605,688
An audio system, comprising: a processor configured to: obtain first order Ambisonics (FOA) audio that includes a captured sound scene; rotate the FOA audio based on a user head position to produce a rotated FOA audio; format each signal of the rotated FOA audio to a stream of audio frames, as a formatted rotated FOA audio; provide the formatted rotated FOA audio to a machine learning model that is configured to upscale the formatted rotated FOA audio into a desired higher order Ambisonics (HOA) format; and obtain output audio of the captured sound scene in the desired HOA format from the machine learning model.

Claim 1 of 18/641,125
A method performed by one or more digital processors for loudspeaker calibration and room personalization, the method comprising: a) generating audio stimuli that are wirelessly sent to inputs of a plurality of speaker drivers simultaneously, wherein the speaker drivers are in one or more loudspeaker housings in a room, and in response the speaker drivers produce a sound field around a portable audio capture device in the room; b) generating a first order ambisonics, FOA, capture of the sound field using multiple microphone outputs of an integrated microphone array in the portable audio capture device; and c) processing the FOA capture to determine a plurality of sets of filters for the plurality of speaker drivers, respectively, each set of filters corrects for sound coloration.

Claim 2 of 18/641,125
The method of claim 1 further comprising: rendering a multi-channel sound program into a plurality of audio signals; applying the plurality of sets of filters to the plurality of audio signals, respectively, to produce a plurality of corrected audio signals; and wirelessly sending the plurality of corrected audio signals to inputs of the plurality of speaker drivers, respectively.

Claim 3 of 18/641,125
The method of claim 2 further comprising upscaling the FOA capture into a higher order ambisonics, HOA, capture; and processing the HOA capture to determine the plurality of sets of filters.

Claim 10 of 18/641,125
An article of manufacture comprising a non-transitory machine-readable medium having stored therein instructions that configure an audio system to perform loudspeaker calibration and room personalization, the audio system being configured to: a) generate audio stimuli that are to be wirelessly sent to inputs of a plurality of speaker drivers simultaneously, wherein the speaker drivers are in one or more loudspeaker housings in a room, and in response the speaker drivers produce a sound field around a portable audio capture device in the room; b) generate a first order ambisonics capture, FOA capture, of the sound field using multiple microphone outputs of an integrated microphone array in the portable audio capture device; and c) process the FOA capture to determine a plurality of sets of filters for the plurality of speaker drivers, respectively, each set of filters corrects for sound coloration.

Claim 11 of 18/641,125
The article of manufacture of claim 10 further comprising stored instructions that configure the audio system to: render a multi-channel sound program into a plurality of audio signals; apply the plurality of sets of filters to the plurality of audio signals, respectively, to produce a plurality of corrected audio signals; and send the plurality of corrected audio signals to inputs of the plurality of speaker drivers, respectively.

Claim 12 of 18/641,125
The article of manufacture of claim 11 further comprising stored instructions that configure the audio system to: upscale the FOA capture into a higher order ambisonics capture, HOA capture; and process the HOA capture to determine the plurality of sets of filters.


Claim 11 (of application number 18/605,688 hereinafter referred to as ‘688) is provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-3, 10-12 of copending Application No. 18/641,125 (hereinafter referred to as ‘125) in view of Wang et al, US Patent Pub. 20190306451 A1, in view of Sen et al, US Patent Pub. 20210098004 A1, and further in view of Salehin et al, US Patent Pub. 20200107118 A1. 
Claims 1, 11, 16 of ‘688 are obvious variations in wording of claims 1-3, 10-12 of ‘125, but the ‘125 claims fail to disclose rotating the FOA audio based on a user head position to produce a rotated FOA audio, and formatting the rotated FOA audio before using a machine learning model to upscale the FOA to HOA. Sen et al discloses a system that teaches the concept of formatting lower order ambisonics such as first order ambisonics (Sen et al, para 0004). It would have been obvious to modify ‘125 claims 1-3, 10-12 such that its first order ambisonics are formatted as taught in Sen et al before being processed into high order ambisonics via the machine learning neural network for the purpose of making the format future-proof and well suited to artificial intelligence systems. Wang et al discloses a neural network that is utilized to carry out the upscaling from first order ambisonics audio to high order ambisonics audio (Wang et al, para 0019: neural network machine learning model processes first order spatial audio to obtain high order spatial audio; wherein the spatial audio includes any three-dimensional audio, for example ambisonics audio). It would have been obvious to modify ‘125 claims 1-3, 10-12 such that its upscaling is carried out by a neural network machine learner as taught in Wang et al for the purpose of creating a system that can learn and adapt to new data.
Furthermore, Salehin et al teaches the concept of the first order ambisonics signals being adjusted based on head position (Salehin et al, para 0086). It would have been obvious for one of ordinary skill in the art to perform audio rotation based on head position of the listener with respect to first order ambisonics, as taught in Salehin, as part of the first order ambisonics formatting of Sen et al for the purpose of accommodating changes in the user’s head position.

This is a provisional nonstatutory double patenting rejection.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 11 & 13 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al, US Patent Pub. 20190306451 A1, in view of Sen et al, US Patent Pub. 20210098004 A1, and further in view of Salehin et al, US Patent Pub. 20200107118 A1.
Re Claim 11, Wang et al discloses an audio system, comprising: a processor configured to: obtain first order Ambisonics (FOA) audio that includes a captured sound scene (para 0019: neural network machine learning model processes first order spatial audio to obtain high order spatial audio; wherein the spatial audio includes any three-dimensional audio, for example ambisonics audio; wherein the machine learning algorithm of the neural network also includes a Fourier transform for time domain processing (para 0092)); provide the FOA audio to a machine learning model that is configured to upscale the FOA audio into a desired higher order Ambisonics (HOA) format (para 0019: neural network machine learning model processes first order spatial audio to obtain high order spatial audio; wherein the spatial audio includes any three-dimensional audio, for example ambisonics audio; wherein the machine learning algorithm of the neural network also includes a Fourier transform for time domain processing (para 0092)); and obtain output audio of the captured sound scene in the desired HOA format from the machine learning model (para 0019: neural network machine learning model processes first order spatial audio to obtain high order spatial audio; wherein the spatial audio includes any three-dimensional audio, for example ambisonics audio; wherein the machine learning algorithm of the neural network also includes a Fourier transform for time domain processing (para 0092)); but fails to disclose rotating the FOA audio based on a user head position to produce a rotated FOA audio, and formatting each signal of the rotated FOA audio to a stream of audio frames, as a formatted rotated FOA audio. However, Sen et al discloses a system that teaches the concept of formatting lower order ambisonics such as first order ambisonics (Sen et al, para 0004).
It would have been obvious to modify Wang et al such that its first order ambisonics are formatted as taught in Sen et al before being processed into high order ambisonics via the machine learning neural network for the purpose of making the format future-proof and well suited to artificial intelligence systems.
Furthermore, Salehin et al teaches the concept of the first order ambisonics signals being adjusted based on head position (Salehin et al, para 0086). It would have been obvious for one of ordinary skill in the art to perform audio rotation based on head position of the listener with respect to first order ambisonics, as taught in Salehin, as part of the first order ambisonics formatting of Sen et al for the purpose of accommodating changes in the user’s head position.
Re Claim 13, the combined teachings of Wang et al, Sen et al and Salehin disclose the audio system of claim 11, further comprising: rendering the output audio based on a plurality of desired speaker positions of a surround sound speaker format to produce a plurality of speaker channels (Sen et al, claim 6: sound rendered in surround format).
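For orientation only, the two claim 11 steps that the rejection treats as missing from Wang et al (rotating the FOA audio by head position, then formatting each signal into a stream of audio frames) might be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation and not the method of any cited reference: the function names `rotate_foa_yaw` and `frame_signal` are hypothetical, only yaw (rotation about the vertical axis) is handled, the axes are assumed to be X front, Y left, Z up, and the machine learning upscaling step is omitted entirely.

```python
import math

def rotate_foa_yaw(w, x, y, z, head_yaw_rad):
    """Counter-rotate a first order Ambisonics scene about the vertical axis.

    Assumption: positive yaw means the head turns to the left, so the
    scene is rotated by -yaw to keep sources fixed in the room.
    W (omnidirectional) and Z (vertical) are unaffected by a yaw rotation.
    """
    c = math.cos(-head_yaw_rad)
    s = math.sin(-head_yaw_rad)
    x_rot = [c * xi - s * yi for xi, yi in zip(x, y)]
    y_rot = [s * xi + c * yi for xi, yi in zip(x, y)]
    return list(w), x_rot, y_rot, list(z)

def frame_signal(signal, frame_len, hop):
    """Split one channel into fixed-length frames, zero-padding the last one."""
    frames = []
    for start in range(0, len(signal), hop):
        frame = list(signal[start:start + frame_len])
        frame += [0.0] * (frame_len - len(frame))
        frames.append(frame)
    return frames

# A single-sample source straight ahead (X = 1, Y = 0); the head turns 90 degrees left.
w, x, y, z = rotate_foa_yaw([1.0], [1.0], [0.0], [0.0], math.pi / 2)
# Each rotated channel would then be framed before being passed to a model.
x_frames = frame_signal(x, frame_len=4, hop=4)
```

After the head turns left, the source that was in front ends up on the listener's right (negative Y), which is the behavior the head-tracking rotation is meant to produce.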

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Wang et al, US Patent Pub. 20190306451 A1, Sen et al, US Patent Pub. 20210098004 A1 and Salehin et al, US Patent Pub. 20200107118 A1, as applied to claim 11 above, in view of Sunder et al, US Patent 11076257 B1.
Re Claim 12, the combined teachings of Wang et al and Sen et al disclose the audio system of claim 11, but fail to explicitly disclose further comprising: rendering the output audio with a binaural renderer to produce binaural aural audio comprising a left audio channel and a right audio channel. However, Sunder et al teaches the general concept of converting ambisonics audio to binaural left and binaural right audio output channels (Sunder et al, abstract). It would have been obvious to modify the Wang et al and Sen et al system such that the high order ambisonics can be transmitted to be output as binaural left and binaural right channel outputs as taught in Sunder et al for the purpose of obtaining enhanced audio quality.

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Wang et al, US Patent Pub. 20190306451 A1, Sen et al, US Patent Pub. 20210098004 A1 and Salehin et al, US Patent Pub. 20200107118 A1, as applied to claim 11 above, in view of Sun et al, WO 2016033358 A1.
Re Claim 14, the combined teachings of Wang et al and Sen et al disclose the audio system of claim 11, but fail to explicitly disclose further comprising: rendering the output audio with a cross talk canceller (XTC) to produce binaural aural audio comprising a left audio channel and a right audio channel. However, Sun et al teaches the concept of cross talk cancellation to produce binaural left and right channel output signals (Sun et al, paras 0043-0044, 0070). It would have been obvious to modify the Wang et al and Sen et al system such that it incorporates cross talk cancellation as taught in Sun et al for the purpose of minimizing sound leaks.



Allowable Subject Matter
Claim 15 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter for claim 15: The prior art does not teach or moderately suggest the following limitations:
Wherein the processor is further configured to: remove higher order components of the output audio in the desired HOA format, to result in a compressed FOA audio; and transmit the compressed FOA audio to a remote device, wherein the remote device applies the machine learning model to the compressed FOA audio to obtain the desired HOA format.
Limitations such as these may be useful in combination with other limitations of claim 11.
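The first claim 15 limitation (removing higher order components of HOA audio to obtain FOA audio) can be illustrated with a short sketch. It relies on the standard fact that an Ambisonics signal of order N has (N + 1)^2 channels, and it assumes ACN channel ordering, in which the order-0 and order-1 components occupy the first four channels. The function names are hypothetical and this is not taken from the application.

```python
def ambisonic_channel_count(order):
    """Number of channels in a full 3D Ambisonics signal of the given order."""
    return (order + 1) ** 2

def truncate_to_foa(hoa_channels):
    """Drop all components above first order.

    Assumption: channels are in ACN order, so the first four channels
    are the order-0 and order-1 components.
    """
    foa_count = ambisonic_channel_count(1)
    if len(hoa_channels) < foa_count:
        raise ValueError("input has fewer channels than first order Ambisonics")
    return hoa_channels[:foa_count]

# Third order audio has 16 channels; truncation keeps the first 4.
third_order = [[0.0] * 8 for _ in range(ambisonic_channel_count(3))]
foa = truncate_to_foa(third_order)
```

In the claim, the remote device then applies the machine learning model to this reduced signal to recover the HOA format; that step is not shown here.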

Claims 1-6, 8-10, 16 & 19-20 are allowed. 
The following is an examiner’s statement of reasons for allowance:
Claims 1 & 16 are allowed for the reasons set forth in applicant’s arguments filed 01/26/2026 page 6.
Claims 2-6, 8-10 depend on claim 1. Claims 19-20 depend on claim 16.



Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
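The reply windows described above can be worked out with a small date calculation. The sketch below assumes the mailing date is Feb 26, 2026, which is the date shown beside the examiner's signature and in the prosecution timeline on this page; the date printed on the action itself controls. The helper `add_months` is hypothetical, and weekend or federal holiday adjustments are not handled.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Return the same day-of-month `months` later, clamped to the month's end."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

# Assumption: mailing date taken from the signature block, not confirmed.
mailing_date = date(2026, 2, 26)

two_month_first_reply = add_months(mailing_date, 2)   # first-reply window for the advisory action rule
shortened_period_end = add_months(mailing_date, 3)    # THREE MONTHS shortened statutory period
statutory_maximum = add_months(mailing_date, 6)       # SIX MONTHS outer limit

print(two_month_first_reply, shortened_period_end, statutory_maximum)
```

Under that assumption the three dates come out as 2026-04-26, 2026-05-26, and 2026-08-26.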

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEORGE C MONIKANG whose telephone number is (571)270-1190. The examiner can normally be reached Mon. - Fri., 9AM-5PM, ALT. Fridays off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Carolyn R Edwards can be reached at 571-270-7136. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GEORGE C MONIKANG/
Primary Examiner, Art Unit 2692
2/26/2026


Prosecution Timeline

Mar 14, 2024

Application Filed

Nov 15, 2024

Response after Non-Final Action

Oct 31, 2025

Non-Final Rejection — §103, §DP

Jan 26, 2026

Response Filed

Feb 26, 2026

Final Rejection — §103, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/436,924

Patent 12604126

VEHICULAR MICROPHONE AND VEHICLE

2y 5m to grant Granted Apr 14, 2026

18/442,631

Patent 12596518

MICROPHONE INTERFACE, VEHICLE, CONNECTION METHOD, AND PRODUCTION METHOD

2y 5m to grant Granted Apr 07, 2026

18/529,910

Patent 12596888

CONTEXTUALIZATION OF GENERATIVE LANGUAGE MODELS BASED ON ENTITY RESOURCE IDENTIFIERS

2y 5m to grant Granted Apr 07, 2026

18/536,551

Patent 12598428

TRANSDUCER AND ELECTRONIC DEVICE

2y 5m to grant Granted Apr 07, 2026

18/541,435

Patent 12591749

MACHINE LEARNING SYSTEM FOR MULTI-DOMAIN LONG DOCUMENT CLUSTERING

2y 5m to grant Granted Mar 31, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4

Expected OA Rounds

74%

Grant Probability

82%

With Interview (+7.2%)

3y 0m

Median Time to Grant

Moderate

PTA Risk

Based on 941 resolved cases by this examiner. Grant probability derived from career allow rate.
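The projection figures appear consistent with simple arithmetic on the career numbers, as the quick check below suggests. It assumes the page adds the interview lift to the unrounded allow rate as percentage points and then rounds; the page does not state its formula, so this is a guess at the derivation rather than a description of it.

```python
granted = 701
resolved = 941
interview_lift_points = 7.2  # percentage points, as displayed on the page

# Career allow rate as a percentage of resolved cases.
allow_rate_pct = 100 * granted / resolved

# Assumption: the lift is added to the unrounded allow rate.
with_interview_pct = allow_rate_pct + interview_lift_points

print(f"allow rate: {allow_rate_pct:.1f}%")
print(f"with interview: {with_interview_pct:.1f}%")
```

Rounded to whole percentages, these match the 74% and 82% shown above.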
