Prosecution Insights
Last updated: April 19, 2026
Application No. 18/807,130

SPATIALLY AWARE AUDIO-AUGMENTED CONVERSATIONAL AGENTS

Non-Final OA: §101, §102, §103
Filed
Aug 16, 2024
Examiner
AZAD, ABUL K
Art Unit
2656
Tech Center
2600 — Communications
Assignee
Nvidia Corporation
OA Round
1 (Non-Final)
Grant Probability: 85% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 6m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 85% (665 granted / 781 resolved; +23.1% vs TC avg), above average
Interview Lift: +14.3% among resolved cases with interview (moderate lift)
Typical Timeline: 2y 6m average prosecution; 21 applications currently pending
Career History: 802 total applications across all art units

Statute-Specific Performance

§101: 16.6% (-23.4% vs TC avg)
§103: 36.6% (-3.4% vs TC avg)
§102: 28.4% (-11.6% vs TC avg)
§112: 5.1% (-34.9% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 781 resolved cases
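Each delta above is the examiner's rate minus the Tech Center average, so the implied baseline can be recovered by subtraction. A minimal sketch (variable names are illustrative, not from the report):

```python
# Examiner rates by statute (percent) and their reported deltas versus
# the Tech Center average, copied from the table above.
examiner = {"101": 16.6, "103": 36.6, "102": 28.4, "112": 5.1}
delta_vs_tc = {"101": -23.4, "103": -3.4, "102": -11.6, "112": -34.9}

# Delta = examiner rate - TC average, so the implied TC baseline is:
tc_avg = {s: round(examiner[s] - delta_vs_tc[s], 1) for s in examiner}
```

Notably, all four statutes imply the same 40.0% baseline, consistent with the caption's description of the black line as a single Tech Center average estimate.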

Office Action

Rejections under §101, §102, and §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA. This action is in response to the communication filed on August 16, 2024. Claims 1-20 are pending in this action.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. The claims recite the abstract idea of generating a training data set, and they do not include additional elements sufficient to amount to significantly more than that judicial exception. The claims are drawn to a process/system (a series of steps or acts) similar to an idea "of itself," such as an instantiated concept, plan, or scheme, as well as to a mental process (thinking) that can be performed in the human mind or by a human using pen and paper. The claims do not require that the method be implemented by a particular machine or that a particular article undergo a particular transformation; no physical object or data is transformed into a different state or thing. Generating a training data set is similar to delivering user-selected media content to a portable device, which the courts found to be an abstract idea (Affinity Labs of Tex., LLC v. Amazon.com Inc., 120 USPQ2d 1210 (Fed. Cir. 2016)), and to displaying certain results of collection and analysis, also found to be abstract (Elec. Power Grp., LLC v. Alstom S.A., 119 USPQ2d 1739 (Fed. Cir. 2016)).

This judicial exception is not integrated into a practical application because the claims broadly recite the result (generate a training data set; generate an encoded representation of multichannel audio; generate a training dataset for the machine-learning model; update, using the training data set ... generate output corresponding to input spatial audio) rather than sufficiently claiming a technical means of achieving the result. See Two-Way Media Ltd. v. Comcast Cable Commc'ns, LLC, 874 F.3d 1329, 1337 (Fed. Cir. 2017) ("The claim requires the functional results ... but does not sufficiently describe how to achieve these results in a non-abstract way."). The claims recite a judicial exception relating to generating a training data set, along with a generic processor device that is simply used as a tool to implement the abstract idea. The claims do not change the underlying or other technology; rather, the claimed techniques use a computing device as a tool. The claimed additional element, the processor device, "merely use[s] a processor as a tool to perform an abstract idea" or "do[es] no more than generally link the use of a judicial exception to a particular technological environment." Memorandum, 84 Fed. Reg. at 55; see Customedia Techs., LLC v. Dish Network Corp., No. 2018-2239, 2020 WL 1069742, at *3 (Fed. Cir. Mar. 6, 2020) ("We have held that it is not enough, however, to merely improve a fundamental practice or abstract process by invoking a computer merely as a tool."). Accordingly, claims 1-20 do not integrate the judicial exception into a practical application. See Memorandum, 84 Fed. Reg. at 54. Because the claims recite a judicial exception and fail to integrate that exception into a practical application, they are "directed to the ... judicial exception." Id. at 54.
The claims do not include additional elements sufficient to amount to significantly more than the judicial exception, because the additional elements are simply a generic processor and the claims amount to no more than generating a training data set. Taken individually or as an ordered combination, the claimed elements do not transform the claims into a patent-eligible application: the claims merely recite the use of already-existing processors to generate a training data set; there is no "inventive concept" in using processors/computing devices for well-understood, routine, and conventional activities commonly used in the machine-learning industry; the claims at most attempt to limit the abstract idea to a particular technological environment, a limitation held insufficient in this context; and the dependent claims are not rendered patent-eligible by the recitation of additional steps, even though those limitations may narrow the scope of the claims. For example, claims 2 and 12 recite that the "machine-learning model comprises [a] language model"; claims 3, 13, and 19 recite updating the "machine-learning model to generate output text data relating to at least an audio source represented in the input spatial audio"; and claims 8 and 14 recite "output data indicative of the spatial information based on the input video and the input audio." The claims as a whole do not amount to significantly more than the abstract idea itself. Accordingly, claims 1-20 are ineligible.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 5, 7, 11, 16, and 18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Veluri et al. (WO 2024/254467 A2).

As per claim 1, Veluri discloses one or more processors (Fig. 1) comprising one or more circuits (Paragraph 0045) to: generate an encoded representation of multichannel audio data corresponding to a machine-learning model (Paragraphs 0014, 0019, 0050, and 0075, "binaural audio", "neural network"); generate a training dataset for the machine-learning model using the encoded representation (Paragraph 0047, neural network trained to extract target signals from the audio signals), the training dataset indicating spatial information for at least one audio source represented in the multichannel audio data (Paragraph 0048, "spatial information", "binaural input signal"); and update, using the training dataset, one or more parameters of the machine-learning model to generate output corresponding to input spatial audio (Paragraph 0074, "output").

As per claims 11 and 18, they are analyzed and rejected for the same reasons set forth in the rejection of claim 1, because the corresponding claims have similar limitations.

As per claim 5, Veluri discloses wherein the one or more circuits are to generate the multichannel audio data by applying a spatial transform operation to a plurality of audio sources (Paragraph 0048, "binaural audio").
As per claim 7, Veluri discloses wherein the one or more circuits are to update the one or more parameters of the machine-learning model to generate output spatial audio according to the input spatial audio (Paragraphs 0047-0048).

As per claim 16, Veluri discloses wherein the output data comprises one or more of a number of sound sources represented in the input audio, an estimated distance of a sound source represented in the input audio, or an estimated location of a sound source represented in the input audio (Paragraph 0031; this is inherent in the HRTF).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-4, 8-10, 12-15, 17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Veluri et al. (WO 2024/254467 A2) in view of Zhao et al. (ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst).

As per claims 2 and 12, Veluri does not explicitly disclose, but Zhao discloses, wherein the machine-learning model (or neural network) comprises at least one of a large language model (LLM), a vision language model (VLM), or a multi-modal language model (MMLM) (Abstract).
As per claims 3, 13, and 19, Veluri does not explicitly disclose, but Zhao discloses, wherein the spatial information comprises text data, and wherein the one or more circuits are to update the one or more parameters of the machine-learning model to generate output text data relating to at least an audio source represented in the input spatial audio (Section 2.1, Multimodal Learning, and Fig. 2, output as "language response").

As per claim 4, Veluri does not explicitly disclose, but Zhao discloses, wherein the output text data identifies one or more of a distance to the audio source represented in the input spatial audio, a number of audio sources represented in the input spatial audio, or a transcription or diarization output of speech from a moving audio source represented in the input spatial audio (Fig. 1).

As per claims 8 and 14, Veluri does not explicitly disclose, but Zhao discloses, wherein the one or more circuits are to: generate the training dataset to include an encoded representation of video data; and update, using the training dataset, the one or more parameters of the machine-learning model to generate output spatial audio tracking at least one audio source depicted in the video data (Section 2.1, Multimodal Learning, and Fig. 2).

As per claim 9, Veluri does not explicitly disclose, but Zhao discloses, wherein the one or more circuits are to update the one or more parameters of the machine-learning model to receive single-channel audio data and the encoded representation of the video data to generate the output spatial audio (Fig. 2).
As per claims 10 and 17, Veluri does not explicitly disclose, but Zhao discloses, wherein the one or more processors are comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system for performing generative AI operations using a language model; a system for performing generative AI operations using a large language model (LLM); a system for performing generative AI operations using a vision language model (VLM); a system for performing generative AI operations using a multi-modal language model; a system for generating synthetic data; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources (Abstract).

As per claim 15, Veluri does not explicitly disclose, but Zhao discloses, wherein the output data comprises an encoded output of the language model, and the one or more processors are to generate output multichannel audio based on the encoded output of the language model (Fig. 2).

As per claim 20, Veluri does not explicitly disclose, but Zhao discloses, wherein the output text data identifies one or more of a distance to the audio source represented in the input spatial audio, a number of audio sources represented in the input spatial audio, or a transcription of speech from a moving audio source represented in the input spatial audio (Fig. 2).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Veluri by including multimodalities as taught by Zhao, so as to provide quantitative and qualitative results on zero-shot multimodal tasks (Abstract).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Veluri et al. (WO 2024/254467 A2) in view of Zhao et al. (ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst), further in view of McCauley et al. (US 2017/0293461).

As per claim 6, Veluri in view of Zhao does not explicitly disclose, but McCauley discloses, wherein the spatial transform operation generates the multichannel audio data as B-format audio (Paragraph 0002). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Veluri in view of Zhao by including B-format audio as taught by McCauley, so that a representation of the video signal may be used as a frame of reference for the location of the audio sources (Abstract).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Rees (US 2021/0081830) discloses encoding machine-learning models and determining ownership of machine-learning models. Rubenstein et al. (US 2024/0428056) discloses performing tasks using generative neural networks.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Abul K. Azad, whose telephone number is (571) 272-7599. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh Mehta, can be reached at (571) 272-7453.

Any response to this action should be mailed to: Commissioner for Patents, P.O. Box 1450, Alexandria, VA 22313-1450, or faxed to (571) 273-8300.
Hand-delivered responses should be brought to 401 Dulany Street, Alexandria, VA 22314 (Customer Service Window).

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

March 16, 2026
/ABUL K AZAD/
Primary Examiner, Art Unit 2656
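For context on the claim 6 rejection: "B-format audio" is the four-channel representation used by first-order ambisonics. The sketch below is illustrative background only (it is not drawn from Veluri, Zhao, or McCauley) and encodes a mono sample at a given direction into the traditional W, X, Y, Z channels:

```python
import math

def encode_b_format(sample: float, azimuth: float, elevation: float):
    """Encode one mono sample into first-order ambisonics B-format
    (traditional W, X, Y, Z channels, with the conventional -3 dB
    scaling on W). Angles are in radians; azimuth 0 is straight ahead.
    """
    w = sample / math.sqrt(2.0)                           # omnidirectional
    x = sample * math.cos(azimuth) * math.cos(elevation)  # front-back
    y = sample * math.sin(azimuth) * math.cos(elevation)  # left-right
    z = sample * math.sin(elevation)                      # up-down
    return w, x, y, z

# A source directly ahead at ear level puts all directional energy in X.
w, x, y, z = encode_b_format(1.0, azimuth=0.0, elevation=0.0)
```

A "spatial transform operation" applied to a plurality of sources, as claim 5 recites, would sum such per-source channel tuples into one multichannel signal.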

Prosecution Timeline

Aug 16, 2024
Application Filed
Mar 16, 2026
Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603098: AUTOMATIC GAIN CONTROL METHOD AND APPARATUS FOR VOICE INTERACTION SYSTEM, AND SYSTEM (granted Apr 14, 2026; 2y 5m to grant)
Patent 12592236: VOICE INTERACTION METHOD AND APPARATUS (granted Mar 31, 2026; 2y 5m to grant)
Patent 12586582: APPARATUS PERFORMING BASED ON VOICE RECOGNITION AND ARTIFICIAL INTELLIGENCE AND METHOD FOR CONTROLLING THEREOF (granted Mar 24, 2026; 2y 5m to grant)
Patent 12586587: METHOD FOR ANALYZING USER UTTERANCE BASED ON UTTERANCE CACHE AND ELECTRONIC DEVICE SUPPORTING THE SAME (granted Mar 24, 2026; 2y 5m to grant)
Patent 12573399: DISPLAY CONTROL DEVICE AND DISPLAY CONTROL METHOD (granted Mar 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 85%
With Interview: 99% (+14.3%)
Median Time to Grant: 2y 6m
PTA Risk: Low
Based on 781 resolved cases by this examiner. Grant probability derived from career allow rate.
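The headline figures are consistent with simple arithmetic on the examiner's record. A minimal sketch, assuming (as the footnote states) that grant probability equals the career allow rate; the additive interview lift and the 99% cap are assumptions that reproduce the displayed numbers:

```python
# Figures from the examiner's career record shown above.
granted, resolved = 665, 781
allow_rate = granted / resolved          # ~0.851, displayed as 85%

# Assumption: the tool adds the reported +14.3% interview lift to the
# base rate and caps the result at 99%.
interview_lift = 0.143
with_interview = min(allow_rate + interview_lift, 0.99)
```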
