Prosecution Insights
Last updated: April 19, 2026
Application No. 18/364,638

HYBRID RENDERING

Status: Non-Final OA (§103)
Filed: Aug 03, 2023
Examiner: ZHU, QIN
Art Unit: 2691
Tech Center: 2600 — Communications
Assignee: Qualcomm Incorporated
OA Round: 3 (Non-Final)
Grant Probability: 88% (Favorable)
OA Rounds: 3-4
To Grant: 2y 1m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 88% (534 granted / 610 resolved; +25.5% vs TC avg), an above-average grant rate
Interview Lift: +2.6% across resolved cases with an interview (a minimal lift)
Avg Prosecution: 2y 1m (a fast prosecutor; 29 applications currently pending)
Total Applications: 639 across all art units (career history)

Statute-Specific Performance

§101: 3.8% (-36.2% vs TC avg)
§103: 42.0% (+2.0% vs TC avg)
§102: 20.9% (-19.1% vs TC avg)
§112: 16.3% (-23.7% vs TC avg)
Tech Center averages are estimates • Based on career data from 610 resolved cases

Office Action

§103
DETAILED ACTION

This action is in response to communications filed 1/26/2026:
- Claims 1-37 are pending
- 35 USC 112(f) interpretations are maintained

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant’s arguments with respect to claim(s) 1-37 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Response to Amendment

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that use the word “means” or “step” but are nonetheless not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph because the claim limitation(s) recite(s) sufficient structure, materials, or acts to entirely perform the recited function. Such claim limitation(s) is/are: “means for determining…means for determining…means for rendering…and means for rendering…” in claim 33. Because this/these claim limitation(s) is/are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are not being interpreted to cover only the corresponding structure, material, or acts described in the specification as performing the claimed function, and equivalents thereof.

If applicant intends to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to remove the structure, materials, or acts that performs the claimed function; or (2) present a sufficient showing that the claim limitation(s) does/do not recite sufficient structure, materials, or acts to perform the claimed function.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-5, 7-14, 16-17, 19-21, and 23-37 is/are rejected under 35 U.S.C. 103 as being unpatentable over Walsh et al (US20200382894, hereinafter “Walsh”) in view of Zamani et al (US20240098444, hereinafter “Zamani”).
Regarding claim 1, Walsh teaches a device (¶2, system) comprising: a memory configured to store first audio data and second audio data (¶38, storage); and one or more processors (¶38, processor) coupled to the memory and configured to: determine priorities of a plurality of audio sources of an audio scene (¶15, sound sources within user’s field of view (FOV) are rendered using a more complex method of rendering the audio vs those sources that are not in direct FOV).

Walsh fails to explicitly teach determine whether a single renderer or multiple renderers that include an object renderer and a first ambisonics renderer are to be used to render the plurality of audio sources; and based on a determination that the multiple renderers are to be used to render the plurality of audio sources: render, using the object renderer, the first audio data to generate a first audio signal, wherein the first audio data represents a first audio source of the plurality of audio sources that is associated with a first priority; and render, using the first ambisonics renderer, the second audio data to generate a second audio signal, wherein the second audio data represents a second audio source of the plurality of audio sources that is associated with a second priority.

Zamani teaches determine whether a single renderer or multiple renderers that include an object renderer and a first ambisonics renderer are to be used to render the plurality of audio sources (Fig. 4, ¶83-92, based on the previously determined priorities, objects are rendered using object renderer 418 or ambisonics renderer 454); and based on a determination that the multiple renderers are to be used to render the plurality of audio sources: render, using the object renderer, the first audio data to generate a first audio signal, wherein the first audio data represents a first audio source of the plurality of audio sources that is associated with a first priority (¶87, Fig. 4, using determined priorities of object data, the audio signal can be rendered using object renderer); and render, using the first ambisonics renderer, the second audio data to generate a second audio signal, wherein the second audio data represents a second audio source of the plurality of audio sources that is associated with a second priority (¶87, Fig. 4, using determined priorities of object data, the audio signal can be rendered using ambisonics renderer).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the audio rendering apparatus (as taught by Walsh) with the layout module (as taught by Zamani). The rationale to do so is to combine prior art elements according to known methods to yield the predictable result of outputting audio according to a desired output layout (Zamani, ¶34).

Regarding claim 2, Walsh in view of Zamani teaches wherein the object renderer provides a higher spatial accuracy than the first ambisonics renderer (Walsh, ¶13, a combination of personalized HRTFs and frequency domain interpolation to provide improved performance in frontal localization and externalization vs just virtualization of speakers through Ambisonics (see Fig. 2)).

Regarding claim 3, Walsh in view of Zamani teaches wherein the first ambisonics renderer uses fewer processing resources as compared to the object renderer (Walsh, ¶13, rendering frontal objects requiring more computational complexity vs virtualizing speakers using generalized HRTFs).
Regarding claim 4, Walsh in view of Zamani teaches wherein the one or more processors are configured to: determine a field of view of a user; and assign the first priority to the first audio source based at least in part on a determination that a first source position of the first audio source is within the field of view (Walsh, Fig. 2, ¶15, using more complex processing to provide improved localized sounds in a user’s FOV).

Regarding claim 5, Walsh in view of Zamani teaches wherein the field of view corresponds to a cone in forward-looking direction from the head of the user (Walsh, Figs. 2-3, ¶19, a user’s FOV is determined using a 60 degree cone of vision).

Regarding claim 7, Walsh in view of Zamani teaches wherein the one or more processors are configured to assign a priority to an audio source based at least in part on a source position of the audio source, a source identifier of the audio source, a source type of the audio source, a source output of the audio source, a source localization angle, or a combination thereof (Walsh, ¶15, audio sources in the user’s FOV is given priority (by applying more computation resources to output a more accurate audio output)).

Regarding claim 8, Walsh in view of Zamani teaches wherein the one or more processors are configured to assign a priority to an audio source based at least in part on an audio source position of the audio source in the audio scene, a visual source position of the audio source in a visual scene, or both (Walsh, ¶15, audio sources in the user’s FOV is given priority (by applying more computation resources to output a more accurate audio output)).

Regarding claim 9, Walsh in view of Zamani teaches wherein the one or more processors are configured to: assign the first priority to the first audio source based at least in part on determining that the first audio source has a first source position within a central target region of a visual scene; and assign the second priority to the second audio source based at least in part on determining that the second audio source has a second source position within a peripheral target region of the visual scene (Walsh, ¶15, Fig. 2, sound sources not within user’s field of view (FOV) are rendered using a lesser complex method of rendering the audio and vice versa).

Regarding claim 10, Walsh in view of Zamani teaches wherein the one or more processors are configured to: assign a third priority to a third audio source of the plurality of audio sources based at least in part on determining that the third audio source has a third source position that is in a particular target region between the central target region and the peripheral target region (Walsh, Fig. 2, ¶15, the entire audio scene can be broken down into 3 categories including a second category that describes audio sources that are in the peripherals of the user’s FOV); and render, using a second ambisonics renderer of the multiple renderers, third audio data to generate a third audio signal, wherein the third audio data represents the third audio source, and wherein the second ambisonics renderer is a higher-order ambisonics renderer than the first ambisonics renderer (Zamani, Fig. 4, ¶82, based on a set priority (similar to the priority taught in Walsh), a different renderer (i.e. ambisonics renderer) can be used to render one or more bitstreams).
Regarding claim 11, Walsh in view of Zamani teaches wherein the one or more processors are configured to: based on determining that a first renderer priority of the object renderer matches the first priority of the first audio source, select the object renderer to render the first audio data; and based on determining that a second renderer priority of the first ambisonics renderer matches the second priority of the second audio source, select the first ambisonics renderer to render the second audio data (Zamani, ¶84, higher priority items may have a higher resolution (i.e. higher order ambisonics) versus those of a lower priority; Walsh, ¶15, based on the user’s FOV, different techniques with differing computation requirements can be applied to render audio objects).

Regarding claim 12, Walsh in view of Zamani teaches wherein the one or more processors are configured to assign a priority to an audio source based at least in part on determining whether a source position of the audio source is within one or more target regions (Walsh, Fig. 2, ¶15, regions are based on the user’s FOV and includes those in the direct FOV, peripheral, or outside the FOV).

Regarding claim 13, Walsh in view of Zamani teaches wherein the one or more target regions are based on at least one of a gaze direction of a user, a source localization angle, or a source output (Walsh, ¶15, regions are selected based on the user’s FOV/gaze).

Regarding claim 14, Walsh in view of Zamani teaches wherein the one or more processors are configured to update the priorities based on a change in a source position, a change in a gaze direction of a user, a change in a source localization angle, a change in a source output, or a combination thereof (Walsh, Fig. 3, ¶20, a user’s gaze direction determines which audio object is in FOV and thus the priority is modified accordingly).

Regarding claim 16, Walsh in view of Zamani teaches wherein the one or more processors are configured to determine the change in the source position based on detecting a movement of an audio source (Walsh, ¶36, 3D audio includes 3D positional data that are dynamic).

Regarding claim 17, Walsh in view of Zamani teaches wherein the one or more processors are configured to mix the first audio signal and the second audio signal to generate an output audio signal (Walsh, ¶21, output signal can be a combination of one or more rendered audio objects).

Regarding claim 19, Walsh in view of Zamani teaches wherein the one or more processors are configured to, based on determining that a multi-render criterion is satisfied, determine that the multiple renderers are to be used to render the plurality of audio sources (Zamani, Fig. 4, multiple renderers are available to be used in conjunction with a set priority).

Regarding claim 20, Walsh in view of Zamani teaches wherein the one or more processors are configured to determine that the multi-render criterion is satisfied based on determining that a count of the audio sources is greater than a count threshold, that available memory is less than a memory threshold, that remaining battery charge is less than a battery threshold, that a user setting indicates that multiple renderers are to be used, that at least two of the audio sources have source positions in different target regions, or a combination thereof (Zamani, ¶28, available bandwidth can be used as criteria for determining multi-rendering).
Regarding claim 21, Walsh in view of Zamani teaches wherein the one or more processors are configured to, based on determining that the multi-render criterion is not satisfied, transition from using the multiple renderers to using a single renderer to generate an output audio signal (Walsh, ¶14-15, Fig. 4, if no audio objects are found outside of the user’s FOV (only within the FOV) then only one type of rendering technique needs to be applied; Zamani, ¶28-29, available bandwidth can be used in consideration to determining which renderer to use).

Regarding claim 23, Walsh in view of Zamani teaches further comprising one or more microphones, wherein the one or more processors are configured to receive the first audio data from the one or more microphones (Zamani, ¶37, one or more microphones to capture audio signals).

Regarding claim 24, Walsh in view of Zamani teaches wherein the one or more processors are further configured to apply audio source extraction to audio data to generate the first audio data and the second audio data (Zamani, ¶37, audio data can be obtained from the microphones and digitized to obtain the first and second audio data or audio can be obtained from an intermediary device).

Regarding claim 25, it is rejected similarly as claim 1. The method can be found in Walsh (¶2, methods).

Regarding claims 26-27, they are rejected similarly as claims 8-9, respectively. The method can be found in Walsh (¶2, methods).

Regarding claims 28-30, they are rejected similarly as claims 12-14, respectively. The method can be found in Walsh (¶2, methods).

Regarding claim 31, it is rejected similarly as claim 1. The medium can be found in Walsh (¶31, medium).

Regarding claim 32, it is rejected similarly as claim 17. The medium can be found in Walsh (¶31, medium).

Regarding claim 33, it is rejected similarly as claim 1. The medium can be found in Walsh (¶31, medium).

Regarding claim 34, Walsh in view of Zamani teaches wherein the means for determining priorities, the means for determining whether the single means for rendering or the multiple means for rendering are to be used, the means for rendering first audio data, and the means for rendering second audio data are integrated into at least one of a communication device, a mobile device, a computer, a display device, a television, a gaming console, a music player, a vehicle, a headset, an augmented reality headset, a mixed reality headset, a virtual reality headset, ear phones, an aerial vehicle, a home automation system, a voice-activated device, a wireless speaker and voice activated device, or an internet-of-things (IoT) device (Walsh, ¶36, integration into a headset).

Regarding claim 35, it is rejected similarly as a combination of claims 1 and 12.

Regarding claim 36, Walsh in view of Zamani teaches wherein the one or more processors are configured to assign a priority to an audio source based at least in part on a source identifier of the audio source, a source type of the audio source, a source output of the audio source, a source localization angle, or a combination thereof (Zamani, ¶87, priority can be determined within the metadata or derived from the type that is associated with the object; Walsh, Fig. 2, a localization parameter can be used to determine a FOV and thus a priority during rendering).

Regarding claim 37, Walsh in view of Zamani teaches wherein the one or more processors are configured to update the priorities based on user input (Walsh, Fig. 2, a user can move their head around to determine which audio objects are in view and thus determining their priority).

Claim(s) 6, 15, 18, and 22 is/are rejected under 35 U.S.C. 103 as being unpatentable over Walsh et al (US20200382894, hereinafter “Walsh”) in view of Zamani et al (US20240098444, hereinafter “Zamani”) in further view of Olivieri et al (US20210160644, hereinafter “Olivieri”).

Regarding claim 6, Walsh in view of Zamani fail to explicitly teach wherein the one or more processors are configured to: estimate a head orientation of a user; based on the head orientation and a first source position of the first audio source, determine that the user is facing the first audio source; and assign the first priority to the first audio source based on the determination that the user is facing the first audio source.

Olivieri teaches wherein the one or more processors are configured to: estimate a head orientation of a user (¶69, device capable of tracking a user’s movements; ¶78, 6DOF includes head orientation); based on the head orientation and a first source position of the first audio source, determine that the user is facing the first audio source (¶120, directional mapper determines which sounds are coming in front of the user); and assign the first priority to the first audio source based on the determination that the user is facing the first audio source (¶120, assigning a predetermined priority for objects in and out of a user’s FOV).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the audio rendering apparatus (as taught by Walsh in view of Zamani) with the head tracker (as taught by Olivieri). The rationale to do so is to combine prior art elements according to known methods to yield the predictable result of rendering audio according to a user’s head orientation so that priority can be given to audio objects in the user’s field of view (Olivieri, ¶120).

Regarding claim 15, Walsh in view of Zamani in further view of Olivieri teaches wherein the one or more processors are configured to estimate the change in the gaze direction based on detecting a head rotation of the user (Olivieri, ¶57, 120, user’s 6 DOF (including rotation) is tracked to determine a gaze direction and to give priorities of the audio objects in the user’s gaze direction accordingly).

Regarding claim 18, Walsh in view of Zamani in further view of Olivieri teaches wherein the one or more processors are configured to: apply a first gain to the first audio signal to generate a first gain adjusted signal; apply a second gain to the second audio signal to generate a second gain adjusted signal, wherein the first gain is higher than the second gain; and mix the first gain adjusted signal and the second gain adjusted signal to generate an output audio signal (Olivieri, ¶68, rendering of the audio signals can include determine which signal should be predominantly heard by the user (simplest way is by adjusting a volume)).

Regarding claim 22, Walsh in view of Zamani in further view of Olivieri teaches wherein the first audio source is live, and wherein the second audio source is virtual (Olivieri, ¶71, bitstream could comprise both of captured audio streams (i.e. live audio streams) and synthesized audio streams (i.e. virtual audio streams)).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Refer to PTO-892, Notice of References Cited for a listing of analogous art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to QIN ZHU whose telephone number is (571)270-1304. The examiner can normally be reached Monday-Thursday 6AM-4PM EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen can be reached on 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/QIN ZHU/
Primary Examiner, Art Unit 2691
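For orientation on the claimed technology, below is a minimal sketch of the hybrid-rendering flow recited in claim 1 and mapped above: assign priorities to audio sources (here, by whether a source falls inside a forward-looking field-of-view cone, as described in the cited Walsh reference), decide whether a single renderer or multiple renderers will be used, and route higher-priority sources to an object renderer and lower-priority sources to an ambisonics renderer. The sketch is written in Python; the class names, the priority scheme, and the source-count multi-render criterion are illustrative assumptions, not taken from the application, this Office Action, or the cited references.

# Hypothetical sketch of the claim 1 flow. All names, thresholds, and the priority
# scheme below are illustrative assumptions.
from dataclasses import dataclass
from typing import List


@dataclass
class AudioSource:
    samples: List[float]   # audio data for this source
    azimuth_deg: float     # angle between the source direction and the user's forward-looking direction


class ObjectRenderer:
    """Stand-in for a higher-accuracy, higher-cost per-object renderer."""
    def render(self, samples: List[float]) -> List[float]:
        return samples  # placeholder: a real renderer would apply per-object spatialization


class AmbisonicsRenderer:
    """Stand-in for a cheaper scene-based (ambisonics) renderer."""
    def render(self, samples: List[float]) -> List[float]:
        return samples  # placeholder: a real renderer would encode/decode an ambisonics mix


def in_field_of_view(source: AudioSource, cone_deg: float = 60.0) -> bool:
    # Illustrative FOV test: a forward-looking cone (the cited art mentions a 60 degree cone of vision).
    return abs(source.azimuth_deg) <= cone_deg / 2.0


def assign_priority(source: AudioSource) -> int:
    # Illustrative priority scheme: sources inside the field of view get the higher priority (0).
    return 0 if in_field_of_view(source) else 1


def multi_render_criterion_met(sources: List[AudioSource], count_threshold: int = 4) -> bool:
    # Illustrative multi-render criterion: only split renderers above a source-count threshold.
    return len(sources) > count_threshold


def render_scene(sources: List[AudioSource]) -> List[List[float]]:
    object_renderer = ObjectRenderer()
    ambisonics_renderer = AmbisonicsRenderer()

    if not multi_render_criterion_met(sources):
        # Single-renderer path: render everything with one renderer.
        return [ambisonics_renderer.render(s.samples) for s in sources]

    # Multi-renderer path: route each source by its assigned priority.
    rendered = []
    for source in sources:
        renderer = object_renderer if assign_priority(source) == 0 else ambisonics_renderer
        rendered.append(renderer.render(source.samples))
    return rendered

A fuller implementation would also handle the intermediate peripheral region and the higher-order ambisonics renderer discussed for claim 10, and would mix the per-renderer outputs into a single output signal as in claim 17, but the routing decision follows the same pattern.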

Prosecution Timeline

Aug 03, 2023: Application Filed
Jul 11, 2025: Non-Final Rejection — §103
Oct 14, 2025: Response Filed
Nov 19, 2025: Final Rejection — §103
Dec 19, 2025: Interview Requested
Dec 31, 2025: Applicant Interview (Telephonic)
Dec 31, 2025: Examiner Interview Summary
Jan 13, 2026: Response after Non-Final Action
Jan 26, 2026: Request for Continued Examination
Jan 30, 2026: Response after Non-Final Action
Mar 24, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12604125
DETECTING ACTIVE SPEAKERS USING HEAD DETECTION
2y 5m to grant • Granted Apr 14, 2026
Patent 12603076
NOISE CONTROL SYSTEM, NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM INCLUDING A PROGRAM, AND NOISE CONTROL METHOD
2y 5m to grant • Granted Apr 14, 2026
Patent 12597900
METHOD AND APPARATUS TO EVALUATE AUDIO EQUIPMENT FOR DYNAMIC DISTORTIONS AND OR DIFFERENTIAL PHASE AND OR FREQUENCY MODULATION EFFECTS
2y 5m to grant • Granted Apr 07, 2026
Patent 12593169
DIRECTION-BASED FILTERING FOR AUDIO DEVICES USING TWO MICROPHONES
2y 5m to grant • Granted Mar 31, 2026
Patent 12587805
SOUND-FIELD CONTROL METHOD AND DEVICE, ELECTRONIC DEVICE AND COMPUTER-READABLE STORAGE MEDIUM
2y 5m to grant • Granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 88%
With Interview: 90% (+2.6%)
Median Time to Grant: 2y 1m
PTA Risk: High
Based on 610 resolved cases by this examiner. Grant probability derived from career allow rate.
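As a rough cross-check, assuming the displayed grant probability is simply the career allow rate and the interview lift is additive in percentage points (an assumption about the tool's methodology, not a documented formula):

534 granted / 610 resolved ≈ 87.5%, shown as 88%; 87.5% + 2.6 points ≈ 90.1%, matching the with-interview figure.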
