Prosecution Insights
Last updated: April 19, 2026
Application No. 18/364,638

HYBRID RENDERING

Status: Non-Final OA (§103)
Filed: Aug 03, 2023
Examiner: ZHU, QIN
Art Unit: 2691
Tech Center: 2600 — Communications
Assignee: Qualcomm Incorporated
OA Round: 3 (Non-Final)
Grant Probability: 88% (Favorable)
OA Rounds: 3-4
To Grant: 2y 1m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 88% (534 granted / 610 resolved; +25.5% vs TC avg), an above-average grant rate
Interview Lift: +2.6% across resolved cases with an interview (a minimal lift)
Avg Prosecution: 2y 1m (a fast prosecutor; 29 applications currently pending)
Total Applications: 639 across all art units (career history)

Statute-Specific Performance

§101: 3.8% (-36.2% vs TC avg)
§103: 42.0% (+2.0% vs TC avg)
§102: 20.9% (-19.1% vs TC avg)
§112: 16.3% (-23.7% vs TC avg)
Tech Center averages are estimates • Based on career data from 610 resolved cases

Office Action

§103
DETAILED ACTION

This action is in response to communications filed 1/26/2026:
- Claims 1-37 are pending
- 35 USC 112(f) interpretations are maintained

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant’s arguments with respect to claim(s) 1-37 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Response to Amendment

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that use the word “means” or “step” but are nonetheless not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph because the claim limitation(s) recite(s) sufficient structure, materials, or acts to entirely perform the recited function. Such claim limitation(s) is/are: “means for determining…means for determining…means for rendering…and means for rendering…” in claim 33. Because this/these claim limitation(s) is/are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are not being interpreted to cover only the corresponding structure, material, or acts described in the specification as performing the claimed function, and equivalents thereof.

If applicant intends to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to remove the structure, materials, or acts that performs the claimed function; or (2) present a sufficient showing that the claim limitation(s) does/do not recite sufficient structure, materials, or acts to perform the claimed function.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-5, 7-14, 16-17, 19-21, and 23-37 is/are rejected under 35 U.S.C. 103 as being unpatentable over Walsh et al (US20200382894, hereinafter “Walsh”) in view of Zamani et al (US20240098444, hereinafter “Zamani”).
Regarding claim 1, Walsh teaches a device (¶2, system) comprising: a memory configured to store first audio data and second audio data (¶38, storage); and one or more processors (¶38, processor) coupled to the memory and configured to: determine priorities of a plurality of audio sources of an audio scene (¶15, sound sources within user’s field of view (FOV) are rendered using a more complex method of rendering the audio vs those sources that are not in direct FOV).

Walsh fails to explicitly teach determine whether a single renderer or multiple renderers that include an object renderer and a first ambisonics renderer are to be used to render the plurality of audio sources; and based on a determination that the multiple renderers are to be used to render the plurality of audio sources: render, using the object renderer, the first audio data to generate a first audio signal, wherein the first audio data represents a first audio source of the plurality of audio sources that is associated with a first priority; and render, using the first ambisonics renderer, the second audio data to generate a second audio signal, wherein the second audio data represents a second audio source of the plurality of audio sources that is associated with a second priority.

Zamani teaches determine whether a single renderer or multiple renderers that include an object renderer and a first ambisonics renderer are to be used to render the plurality of audio sources (Fig. 4, ¶83-92, based on the previously determined priorities, objects are rendered using object renderer 418 or ambisonics renderer 454); and based on a determination that the multiple renderers are to be used to render the plurality of audio sources: render, using the object renderer, the first audio data to generate a first audio signal, wherein the first audio data represents a first audio source of the plurality of audio sources that is associated with a first priority (¶87, Fig. 4, using determined priorities of object data, the audio signal can be rendered using object renderer); and render, using the first ambisonics renderer, the second audio data to generate a second audio signal, wherein the second audio data represents a second audio source of the plurality of audio sources that is associated with a second priority (¶87, Fig. 4, using determined priorities of object data, the audio signal can be rendered using ambisonics renderer).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the audio rendering apparatus (as taught by Walsh) with the layout module (as taught by Zamani). The rationale to do so is to combine prior art elements according to known methods to yield the predictable result of outputting audio according to a desired output layout (Zamani, ¶34).

Regarding claim 2, Walsh in view of Zamani teaches wherein the object renderer provides a higher spatial accuracy than the first ambisonics renderer (Walsh, ¶13, a combination of personalized HRTFs and frequency domain interpolation to provide improved performance in frontal localization and externalization vs just virtualization of speakers through Ambisonics (see Fig. 2)).

Regarding claim 3, Walsh in view of Zamani teaches wherein the first ambisonics renderer uses fewer processing resources as compared to the object renderer (Walsh, ¶13, rendering frontal objects requiring more computational complexity vs virtualizing speakers using generalized HRTFs).
Regarding claim 4, Walsh in view of Zamani teaches wherein the one or more processors are configured to: determine a field of view of a user; and assign the first priority to the first audio source based at least in part on a determination that a first source position of the first audio source is within the field of view (Walsh, Fig. 2, ¶15, using more complex processing to provide improved localized sounds in a user’s FOV).

Regarding claim 5, Walsh in view of Zamani teaches wherein the field of view corresponds to a cone in forward-looking direction from the head of the user (Walsh, Figs. 2-3, ¶19, a user’s FOV is determined using a 60 degree cone of vision).

Regarding claim 7, Walsh in view of Zamani teaches wherein the one or more processors are configured to assign a priority to an audio source based at least in part on a source position of the audio source, a source identifier of the audio source, a source type of the audio source, a source output of the audio source, a source localization angle, or a combination thereof (Walsh, ¶15, audio sources in the user’s FOV is given priority (by applying more computation resources to output a more accurate audio output)).

Regarding claim 8, Walsh in view of Zamani teaches wherein the one or more processors are configured to assign a priority to an audio source based at least in part on an audio source position of the audio source in the audio scene, a visual source position of the audio source in a visual scene, or both (Walsh, ¶15, audio sources in the user’s FOV is given priority (by applying more computation resources to output a more accurate audio output)).

Regarding claim 9, Walsh in view of Zamani teaches wherein the one or more processors are configured to: assign the first priority to the first audio source based at least in part on determining that the first audio source has a first source position within a central target region of a visual scene; and assign the second priority to the second audio source based at least in part on determining that the second audio source has a second source position within a peripheral target region of the visual scene (Walsh, ¶15, Fig. 2, sound sources not within user’s field of view (FOV) are rendered using a lesser complex method of rendering the audio and vice versa).

Regarding claim 10, Walsh in view of Zamani teaches wherein the one or more processors are configured to: assign a third priority to a third audio source of the plurality of audio sources based at least in part on determining that the third audio source has a third source position that is in a particular target region between the central target region and the peripheral target region (Walsh, Fig. 2, ¶15, the entire audio scene can be broken down into 3 categories including a second category that describes audio sources that are in the peripherals of the user’s FOV); and render, using a second ambisonics renderer of the multiple renderers, third audio data to generate a third audio signal, wherein the third audio data represents the third audio source, and wherein the second ambisonics renderer is a higher-order ambisonics renderer than the first ambisonics renderer (Zamani, Fig. 4, ¶82, based on a set priority (similar to the priority taught in Walsh), a different renderer (i.e. ambisonics renderer) can be used to render one or more bitstreams).
Regarding claim 11, Walsh in view of Zamani teaches wherein the one or more processors are configured to: based on determining that a first renderer priority of the object renderer matches the first priority of the first audio source, select the object renderer to render the first audio data; and based on determining that a second renderer priority of the first ambisonics renderer matches the second priority of the second audio source, select the first ambisonics renderer to render the second audio data (Zamani, ¶84, higher priority items may have a higher resolution (i.e. higher order ambisonics) versus those of a lower priority; Walsh, ¶15, based on the user’s FOV, different techniques with differing computation requirements can be applied to render audio objects).

Regarding claim 12, Walsh in view of Zamani teaches wherein the one or more processors are configured to assign a priority to an audio source based at least in part on determining whether a source position of the audio source is within one or more target regions (Walsh, Fig. 2, ¶15, regions are based on the user’s FOV and includes those in the direct FOV, peripheral, or outside the FOV).

Regarding claim 13, Walsh in view of Zamani teaches wherein the one or more target regions are based on at least one of a gaze direction of a user, a source localization angle, or a source output (Walsh, ¶15, regions are selected based on the user’s FOV/gaze).

Regarding claim 14, Walsh in view of Zamani teaches wherein the one or more processors are configured to update the priorities based on a change in a source position, a change in a gaze direction of a user, a change in a source localization angle, a change in a source output, or a combination thereof (Walsh, Fig. 3, ¶20, a user’s gaze direction determines which audio object is in FOV and thus the priority is modified accordingly).

Regarding claim 16, Walsh in view of Zamani teaches wherein the one or more processors are configured to determine the change in the source position based on detecting a movement of an audio source (Walsh, ¶36, 3D audio includes 3D positional data that are dynamic).

Regarding claim 17, Walsh in view of Zamani teaches wherein the one or more processors are configured to mix the first audio signal and the second audio signal to generate an output audio signal (Walsh, ¶21, output signal can be a combination of one or more rendered audio objects).

Regarding claim 19, Walsh in view of Zamani teaches wherein the one or more processors are configured to, based on determining that a multi-render criterion is satisfied, determine that the multiple renderers are to be used to render the plurality of audio sources (Zamani, Fig. 4, multiple renderers are available to be used in conjunction with a set priority).

Regarding claim 20, Walsh in view of Zamani teaches wherein the one or more processors are configured to determine that the multi-render criterion is satisfied based on determining that a count of the audio sources is greater than a count threshold, that available memory is less than a memory threshold, that remaining battery charge is less than a battery threshold, that a user setting indicates that multiple renderers are to be used, that at least two of the audio sources have source positions in different target regions, or a combination thereof (Zamani, ¶28, available bandwidth can be used as criteria for determining multi-rendering).
Regarding claim 21, Walsh in view of Zamani teaches wherein the one or more processors are configured to, based on determining that the multi-render criterion is not satisfied, transition from using the multiple renderers to using a single renderer to generate an output audio signal (Walsh, ¶14-15, Fig. 4, if no audio objects are found outside of the user’s FOV (only within the FOV) then only one type of rendering technique needs to be applied; Zamani, ¶28-29, available bandwidth can be used in consideration to determining which renderer to use).

Regarding claim 23, Walsh in view of Zamani teaches further comprising one or more microphones, wherein the one or more processors are configured to receive the first audio data from the one or more microphones (Zamani, ¶37, one or more microphones to capture audio signals).

Regarding claim 24, Walsh in view of Zamani teaches wherein the one or more processors are further configured to apply audio source extraction to audio data to generate the first audio data and the second audio data (Zamani, ¶37, audio data can be obtained from the microphones and digitized to obtain the first and second audio data or audio can be obtained from an intermediary device).

Regarding claim 25, it is rejected similarly as claim 1. The method can be found in Walsh (¶2, methods).

Regarding claims 26-27, they are rejected similarly as claims 8-9, respectively. The method can be found in Walsh (¶2, methods).

Regarding claims 28-30, they are rejected similarly as claims 12-14, respectively. The method can be found in Walsh (¶2, methods).

Regarding claim 31, it is rejected similarly as claim 1. The medium can be found in Walsh (¶31, medium).

Regarding claim 32, it is rejected similarly as claim 17. The medium can be found in Walsh (¶31, medium).

Regarding claim 33, it is rejected similarly as claim 1. The medium can be found in Walsh (¶31, medium).

Regarding claim 34, Walsh in view of Zamani teaches wherein the means for determining priorities, the means for determining whether the single means for rendering or the multiple means for rendering are to be used, the means for rendering first audio data, and the means for rendering second audio data are integrated into at least one of a communication device, a mobile device, a computer, a display device, a television, a gaming console, a music player, a vehicle, a headset, an augmented reality headset, a mixed reality headset, a virtual reality headset, ear phones, an aerial vehicle, a home automation system, a voice-activated device, a wireless speaker and voice activated device, or an internet-of-things (IoT) device (Walsh, ¶36, integration into a headset).

Regarding claim 35, it is rejected similarly as a combination of claims 1 and 12.

Regarding claim 36, Walsh in view of Zamani teaches wherein the one or more processors are configured to assign a priority to an audio source based at least in part on a source identifier of the audio source, a source type of the audio source, a source output of the audio source, a source localization angle, or a combination thereof (Zamani, ¶87, priority can be determined within the metadata or derived from the type that is associated with the object; Walsh, Fig. 2, a localization parameter can be used to determine a FOV and thus a priority during rendering).

Regarding claim 37, Walsh in view of Zamani teaches wherein the one or more processors are configured to update the priorities based on user input (Walsh, Fig. 2, a user can move their head around to determine which audio objects are in view and thus determining their priority).

Claim(s) 6, 15, 18, and 22 is/are rejected under 35 U.S.C. 103 as being unpatentable over Walsh et al (US20200382894, hereinafter “Walsh”) in view of Zamani et al (US20240098444, hereinafter “Zamani”) in further view of Olivieri et al (US20210160644, hereinafter “Olivieri”).

Regarding claim 6, Walsh in view of Zamani fail to explicitly teach wherein the one or more processors are configured to: estimate a head orientation of a user; based on the head orientation and a first source position of the first audio source, determine that the user is facing the first audio source; and assign the first priority to the first audio source based on the determination that the user is facing the first audio source.

Olivieri teaches wherein the one or more processors are configured to: estimate a head orientation of a user (¶69, device capable of tracking a user’s movements; ¶78, 6DOF includes head orientation); based on the head orientation and a first source position of the first audio source, determine that the user is facing the first audio source (¶120, directional mapper determines which sounds are coming in front of the user); and assign the first priority to the first audio source based on the determination that the user is facing the first audio source (¶120, assigning a predetermined priority for objects in and out of a user’s FOV).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the audio rendering apparatus (as taught by Walsh in view of Zamani) with the head tracker (as taught by Olivieri). The rationale to do so is to combine prior art elements according to known methods to yield the predictable result of rendering audio according to a user’s head orientation so that priority can be given to audio objects in the user’s field of view (Olivieri, ¶120).

Regarding claim 15, Walsh in view of Zamani in further view of Olivieri teaches wherein the one or more processors are configured to estimate the change in the gaze direction based on detecting a head rotation of the user (Olivieri, ¶57, 120, user’s 6 DOF (including rotation) is tracked to determine a gaze direction and to give priorities of the audio objects in the user’s gaze direction accordingly).

Regarding claim 18, Walsh in view of Zamani in further view of Olivieri teaches wherein the one or more processors are configured to: apply a first gain to the first audio signal to generate a first gain adjusted signal; apply a second gain to the second audio signal to generate a second gain adjusted signal, wherein the first gain is higher than the second gain; and mix the first gain adjusted signal and the second gain adjusted signal to generate an output audio signal (Olivieri, ¶68, rendering of the audio signals can include determine which signal should be predominantly heard by the user (simplest way is by adjusting a volume)).

Regarding claim 22, Walsh in view of Zamani in further view of Olivieri teaches wherein the first audio source is live, and wherein the second audio source is virtual (Olivieri, ¶71, bitstream could comprise both of captured audio streams (i.e. live audio streams) and synthesized audio streams (i.e. virtual audio streams)).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Refer to PTO-892, Notice of References Cited for a listing of analogous art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to QIN ZHU whose telephone number is (571)270-1304. The examiner can normally be reached Monday-Thursday 6AM-4PM EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen can be reached on 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/QIN ZHU/
Primary Examiner, Art Unit 2691
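For orientation on the claimed technology, below is a minimal sketch of the hybrid-rendering flow recited in claim 1 and mapped above: assign priorities to audio sources (here, by whether a source falls inside a forward-looking field-of-view cone, as described in the cited Walsh reference), decide whether a single renderer or multiple renderers will be used, and route higher-priority sources to an object renderer and lower-priority sources to an ambisonics renderer. The sketch is written in Python; the class names, the priority scheme, and the source-count multi-render criterion are illustrative assumptions, not taken from the application, this Office Action, or the cited references.

# Hypothetical sketch of the claim 1 flow. All names, thresholds, and the priority
# scheme below are illustrative assumptions.
from dataclasses import dataclass
from typing import List


@dataclass
class AudioSource:
    samples: List[float]   # audio data for this source
    azimuth_deg: float     # angle between the source direction and the user's forward-looking direction


class ObjectRenderer:
    """Stand-in for a higher-accuracy, higher-cost per-object renderer."""
    def render(self, samples: List[float]) -> List[float]:
        return samples  # placeholder: a real renderer would apply per-object spatialization


class AmbisonicsRenderer:
    """Stand-in for a cheaper scene-based (ambisonics) renderer."""
    def render(self, samples: List[float]) -> List[float]:
        return samples  # placeholder: a real renderer would encode/decode an ambisonics mix


def in_field_of_view(source: AudioSource, cone_deg: float = 60.0) -> bool:
    # Illustrative FOV test: a forward-looking cone (the cited art mentions a 60 degree cone of vision).
    return abs(source.azimuth_deg) <= cone_deg / 2.0


def assign_priority(source: AudioSource) -> int:
    # Illustrative priority scheme: sources inside the field of view get the higher priority (0).
    return 0 if in_field_of_view(source) else 1


def multi_render_criterion_met(sources: List[AudioSource], count_threshold: int = 4) -> bool:
    # Illustrative multi-render criterion: only split renderers above a source-count threshold.
    return len(sources) > count_threshold


def render_scene(sources: List[AudioSource]) -> List[List[float]]:
    object_renderer = ObjectRenderer()
    ambisonics_renderer = AmbisonicsRenderer()

    if not multi_render_criterion_met(sources):
        # Single-renderer path: render everything with one renderer.
        return [ambisonics_renderer.render(s.samples) for s in sources]

    # Multi-renderer path: route each source by its assigned priority.
    rendered = []
    for source in sources:
        renderer = object_renderer if assign_priority(source) == 0 else ambisonics_renderer
        rendered.append(renderer.render(source.samples))
    return rendered

A fuller implementation would also handle the intermediate peripheral region and the higher-order ambisonics renderer discussed for claim 10, and would mix the per-renderer outputs into a single output signal as in claim 17, but the routing decision follows the same pattern.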

Prosecution Timeline

Aug 03, 2023: Application Filed
Jul 11, 2025: Non-Final Rejection — §103
Oct 14, 2025: Response Filed
Nov 19, 2025: Final Rejection — §103
Dec 19, 2025: Interview Requested
Dec 31, 2025: Applicant Interview (Telephonic)
Dec 31, 2025: Examiner Interview Summary
Jan 13, 2026: Response after Non-Final Action
Jan 26, 2026: Request for Continued Examination
Jan 30, 2026: Response after Non-Final Action
Mar 24, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12604125
DETECTING ACTIVE SPEAKERS USING HEAD DETECTION
2y 5m to grant • Granted Apr 14, 2026
Patent 12603076
NOISE CONTROL SYSTEM, NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM INCLUDING A PROGRAM, AND NOISE CONTROL METHOD
2y 5m to grant • Granted Apr 14, 2026
Patent 12597900
METHOD AND APPARATUS TO EVALUATE AUDIO EQUIPMENT FOR DYNAMIC DISTORTIONS AND OR DIFFERENTIAL PHASE AND OR FREQUENCY MODULATION EFFECTS
2y 5m to grant • Granted Apr 07, 2026
Patent 12593169
DIRECTION-BASED FILTERING FOR AUDIO DEVICES USING TWO MICROPHONES
2y 5m to grant • Granted Mar 31, 2026
Patent 12587805
SOUND-FIELD CONTROL METHOD AND DEVICE, ELECTRONIC DEVICE AND COMPUTER-READABLE STORAGE MEDIUM
2y 5m to grant • Granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 88%
With Interview: 90% (+2.6%)
Median Time to Grant: 2y 1m
PTA Risk: High
Based on 610 resolved cases by this examiner. Grant probability derived from career allow rate.
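As a rough cross-check, assuming the displayed grant probability is simply the career allow rate and the interview lift is additive in percentage points (an assumption about the tool's methodology, not a documented formula):

534 granted / 610 resolved ≈ 87.5%, shown as 88%; 87.5% + 2.6 points ≈ 90.1%, matching the with-interview figure.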
