Prosecution Insights
Last updated: April 19, 2026
Application No. 18/807,130

SPATIALLY AWARE AUDIO-AUGMENTED CONVERSATIONAL AGENTS

Non-Final OA: §101, §102, §103
Filed
Aug 16, 2024
Examiner
AZAD, ABUL K
Art Unit
2656
Tech Center
2600 — Communications
Assignee
Nvidia Corporation
OA Round
1 (Non-Final)
Grant Probability: 85% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 6m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 85% (665 granted / 781 resolved; +23.1% vs TC avg), above average
Interview Lift: +14.3% among resolved cases with interview (moderate lift)
Typical Timeline: 2y 6m average prosecution; 21 applications currently pending
Career History: 802 total applications across all art units

Statute-Specific Performance

§101: 16.6% (-23.4% vs TC avg)
§103: 36.6% (-3.4% vs TC avg)
§102: 28.4% (-11.6% vs TC avg)
§112: 5.1% (-34.9% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 781 resolved cases
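Each delta above is the examiner's rate minus the Tech Center average, so the implied baseline can be recovered by subtraction. A minimal sketch (variable names are illustrative, not from the report):

```python
# Examiner rates by statute (percent) and their reported deltas versus
# the Tech Center average, copied from the table above.
examiner = {"101": 16.6, "103": 36.6, "102": 28.4, "112": 5.1}
delta_vs_tc = {"101": -23.4, "103": -3.4, "102": -11.6, "112": -34.9}

# Delta = examiner rate - TC average, so the implied TC baseline is:
tc_avg = {s: round(examiner[s] - delta_vs_tc[s], 1) for s in examiner}
```

Notably, all four statutes imply the same 40.0% baseline, consistent with the caption's description of the black line as a single Tech Center average estimate.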

Office Action

Rejections under §101, §102, and §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA. This action is in response to the communication filed on August 16, 2024. Claims 1-20 are pending in this action.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. The claims recite the abstract idea of generating a training data set, and they do not include additional elements sufficient to amount to significantly more than that judicial exception. The claims are drawn to a process/system (a series of steps or acts) similar to an idea "of itself," such as an instantiated concept, plan, or scheme, as well as to a mental process (thinking) that can be performed in the human mind or by a human using pen and paper. The claims do not require that the method be implemented by a particular machine or that a particular article undergo a particular transformation; no physical object or data is transformed into a different state or thing. Generating a training data set is similar to delivering user-selected media content to a portable device, which the courts found to be an abstract idea (Affinity Labs of Tex., LLC v. Amazon.com Inc., 120 USPQ2d 1210 (Fed. Cir. 2016)), and to displaying certain results of collection and analysis, also found to be abstract (Elec. Power Grp., LLC v. Alstom S.A., 119 USPQ2d 1739 (Fed. Cir. 2016)).

This judicial exception is not integrated into a practical application because the claims broadly recite the result (generate a training data set; generate an encoded representation of multichannel audio; generate a training dataset for the machine-learning model; update, using the training data set ... generate output corresponding to input spatial audio) rather than sufficiently claiming a technical means of achieving the result. See Two-Way Media Ltd. v. Comcast Cable Commc'ns, LLC, 874 F.3d 1329, 1337 (Fed. Cir. 2017) ("The claim requires the functional results ... but does not sufficiently describe how to achieve these results in a non-abstract way."). The claims recite a judicial exception relating to generating a training data set, along with a generic processor device that is simply used as a tool to implement the abstract idea. The claims do not change the underlying or other technology; rather, the claimed techniques use a computing device as a tool. The claimed additional element, the processor device, "merely use[s] a processor as a tool to perform an abstract idea" or "do[es] no more than generally link the use of a judicial exception to a particular technological environment." Memorandum, 84 Fed. Reg. at 55; see Customedia Techs., LLC v. Dish Network Corp., No. 2018-2239, 2020 WL 1069742, at *3 (Fed. Cir. Mar. 6, 2020) ("We have held that it is not enough, however, to merely improve a fundamental practice or abstract process by invoking a computer merely as a tool."). Accordingly, claims 1-20 do not integrate the judicial exception into a practical application. See Memorandum, 84 Fed. Reg. at 54. Because the claims recite a judicial exception and fail to integrate that exception into a practical application, they are "directed to the ... judicial exception." Id. at 54.
The claims do not include additional elements sufficient to amount to significantly more than the judicial exception, because the additional elements are simply a generic processor and the claims amount to no more than generating a training data set. Taken individually or as an ordered combination, the claimed elements do not transform the claims into a patent-eligible application: the claims merely recite the use of already-existing processors to generate a training data set; there is no "inventive concept" in using processors/computing devices for well-understood, routine, and conventional activities commonly used in the machine-learning industry; the claims at most attempt to limit the abstract idea to a particular technological environment, a limitation held insufficient in this context; and the dependent claims are not rendered patent-eligible by the recitation of additional steps, even though those limitations may narrow the scope of the claims. For example, claims 2 and 12 recite that the "machine-learning model comprises [a] language model"; claims 3, 13, and 19 recite updating the "machine-learning model to generate output text data relating to at least an audio source represented in the input spatial audio"; and claims 8 and 14 recite "output data indicative of the spatial information based on the input video and the input audio." The claims as a whole do not amount to significantly more than the abstract idea itself. Accordingly, claims 1-20 are ineligible.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 5, 7, 11, 16, and 18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Veluri et al. (WO 2024/254467 A2).

As per claim 1, Veluri discloses one or more processors (Fig. 1) comprising one or more circuits (Paragraph 0045) to: generate an encoded representation of multichannel audio data corresponding to a machine-learning model (Paragraphs 0014, 0019, 0050, and 0075, "binaural audio", "neural network"); generate a training dataset for the machine-learning model using the encoded representation (Paragraph 0047, neural network trained to extract target signals from the audio signals), the training dataset indicating spatial information for at least one audio source represented in the multichannel audio data (Paragraph 0048, "spatial information", "binaural input signal"); and update, using the training dataset, one or more parameters of the machine-learning model to generate output corresponding to input spatial audio (Paragraph 0074, "output").

As per claims 11 and 18, they are analyzed and rejected for the same reasons set forth in the rejection of claim 1, because the corresponding claims have similar limitations.

As per claim 5, Veluri discloses wherein the one or more circuits are to generate the multichannel audio data by applying a spatial transform operation to a plurality of audio sources (Paragraph 0048, "binaural audio").
As per claim 7, Veluri discloses wherein the one or more circuits are to update the one or more parameters of the machine-learning model to generate output spatial audio according to the input spatial audio (Paragraphs 0047-0048).

As per claim 16, Veluri discloses wherein the output data comprises one or more of a number of sound sources represented in the input audio, an estimated distance of a sound source represented in the input audio, or an estimated location of a sound source represented in the input audio (Paragraph 0031; this is inherent in the HRTF).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-4, 8-10, 12-15, 17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Veluri et al. (WO 2024/254467 A2) in view of Zhao et al. (ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst).

As per claims 2 and 12, Veluri does not explicitly disclose, but Zhao discloses, wherein the machine-learning model (or neural network) comprises at least one of a large language model (LLM), a vision language model (VLM), or a multi-modal language model (MMLM) (Abstract).
As per claims 3, 13, and 19, Veluri does not explicitly disclose, but Zhao discloses, wherein the spatial information comprises text data, and wherein the one or more circuits are to update the one or more parameters of the machine-learning model to generate output text data relating to at least an audio source represented in the input spatial audio (Section 2.1, Multimodal Learning, and Fig. 2, output as "language response").

As per claim 4, Veluri does not explicitly disclose, but Zhao discloses, wherein the output text data identifies one or more of a distance to the audio source represented in the input spatial audio, a number of audio sources represented in the input spatial audio, or a transcription or diarization output of speech from a moving audio source represented in the input spatial audio (Fig. 1).

As per claims 8 and 14, Veluri does not explicitly disclose, but Zhao discloses, wherein the one or more circuits are to: generate the training dataset to include an encoded representation of video data; and update, using the training dataset, the one or more parameters of the machine-learning model to generate output spatial audio tracking at least one audio source depicted in the video data (Section 2.1, Multimodal Learning, and Fig. 2).

As per claim 9, Veluri does not explicitly disclose, but Zhao discloses, wherein the one or more circuits are to update the one or more parameters of the machine-learning model to receive single-channel audio data and the encoded representation of the video data to generate the output spatial audio (Fig. 2).
As per claims 10 and 17, Veluri does not explicitly disclose, but Zhao discloses, wherein the one or more processors are comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system for performing generative AI operations using a language model; a system for performing generative AI operations using a large language model (LLM); a system for performing generative AI operations using a vision language model (VLM); a system for performing generative AI operations using a multi-modal language model; a system for generating synthetic data; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources (Abstract).

As per claim 15, Veluri does not explicitly disclose, but Zhao discloses, wherein the output data comprises an encoded output of the language model, and the one or more processors are to generate output multichannel audio based on the encoded output of the language model (Fig. 2).

As per claim 20, Veluri does not explicitly disclose, but Zhao discloses, wherein the output text data identifies one or more of a distance to the audio source represented in the input spatial audio, a number of audio sources represented in the input spatial audio, or a transcription of speech from a moving audio source represented in the input spatial audio (Fig. 2).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Veluri by including multimodalities as taught by Zhao, so as to provide quantitative and qualitative results on zero-shot multimodal tasks (Abstract).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Veluri et al. (WO 2024/254467 A2) in view of Zhao et al. (ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst), further in view of McCauley et al. (US 2017/0293461).

As per claim 6, Veluri in view of Zhao does not explicitly disclose, but McCauley discloses, wherein the spatial transform operation generates the multichannel audio data as B-format audio (Paragraph 0002). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Veluri in view of Zhao by including B-format audio as taught by McCauley, so that a representation of the video signal may be used as a frame of reference for the location of the audio sources (Abstract).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Rees (US 2021/0081830) discloses encoding machine-learning models and determining ownership of machine-learning models. Rubenstein et al. (US 2024/0428056) discloses performing tasks using generative neural networks.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Abul K. Azad, whose telephone number is (571) 272-7599. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh Mehta, can be reached at (571) 272-7453.

Any response to this action should be mailed to: Commissioner for Patents, P.O. Box 1450, Alexandria, VA 22313-1450, or faxed to (571) 273-8300.
Hand-delivered responses should be brought to 401 Dulany Street, Alexandria, VA 22314 (Customer Service Window).

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

March 16, 2026
/ABUL K AZAD/
Primary Examiner, Art Unit 2656
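For context on the claim 6 rejection: "B-format audio" is the four-channel representation used by first-order ambisonics. The sketch below is illustrative background only (it is not drawn from Veluri, Zhao, or McCauley) and encodes a mono sample at a given direction into the traditional W, X, Y, Z channels:

```python
import math

def encode_b_format(sample: float, azimuth: float, elevation: float):
    """Encode one mono sample into first-order ambisonics B-format
    (traditional W, X, Y, Z channels, with the conventional -3 dB
    scaling on W). Angles are in radians; azimuth 0 is straight ahead.
    """
    w = sample / math.sqrt(2.0)                           # omnidirectional
    x = sample * math.cos(azimuth) * math.cos(elevation)  # front-back
    y = sample * math.sin(azimuth) * math.cos(elevation)  # left-right
    z = sample * math.sin(elevation)                      # up-down
    return w, x, y, z

# A source directly ahead at ear level puts all directional energy in X.
w, x, y, z = encode_b_format(1.0, azimuth=0.0, elevation=0.0)
```

A "spatial transform operation" applied to a plurality of sources, as claim 5 recites, would sum such per-source channel tuples into one multichannel signal.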

Prosecution Timeline

Aug 16, 2024
Application Filed
Mar 16, 2026
Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603098: AUTOMATIC GAIN CONTROL METHOD AND APPARATUS FOR VOICE INTERACTION SYSTEM, AND SYSTEM (granted Apr 14, 2026; 2y 5m to grant)
Patent 12592236: VOICE INTERACTION METHOD AND APPARATUS (granted Mar 31, 2026; 2y 5m to grant)
Patent 12586582: APPARATUS PERFORMING BASED ON VOICE RECOGNITION AND ARTIFICIAL INTELLIGENCE AND METHOD FOR CONTROLLING THEREOF (granted Mar 24, 2026; 2y 5m to grant)
Patent 12586587: METHOD FOR ANALYZING USER UTTERANCE BASED ON UTTERANCE CACHE AND ELECTRONIC DEVICE SUPPORTING THE SAME (granted Mar 24, 2026; 2y 5m to grant)
Patent 12573399: DISPLAY CONTROL DEVICE AND DISPLAY CONTROL METHOD (granted Mar 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 85%
With Interview: 99% (+14.3%)
Median Time to Grant: 2y 6m
PTA Risk: Low
Based on 781 resolved cases by this examiner. Grant probability derived from career allow rate.
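The headline figures are consistent with simple arithmetic on the examiner's record. A minimal sketch, assuming (as the footnote states) that grant probability equals the career allow rate; the additive interview lift and the 99% cap are assumptions that reproduce the displayed numbers:

```python
# Figures from the examiner's career record shown above.
granted, resolved = 665, 781
allow_rate = granted / resolved          # ~0.851, displayed as 85%

# Assumption: the tool adds the reported +14.3% interview lift to the
# base rate and caps the result at 99%.
interview_lift = 0.143
with_interview = min(allow_rate + interview_lift, 0.99)
```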
