DETAILED ACTION
This action is in response to the original filing on 04/05/2024 and the preliminary amendment filed 05/10/2024. Claims 36-61 and 66-67 are pending and have been considered below.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 36, 66, and 67 are objected to because of the following informalities:
Claims 36, 66, and 67 recite “outputting audio corresponding to the first object”; however, they should recite - - outputting the audio corresponding to the first object - -.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 36-61 and 66-67 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding claim 36, claim 36 recites “while a first object is visible, via the display generation component, outputting, via the one or more audio output devices, audio corresponding to the first object at a first prominence of audio output”. The relationship between these elements is unclear. It is unclear whether “via the display generation component” modifies “while a first object is visible” or “outputting”. For the purposes of examination, this limitation is interpreted as: while a first object is visible on a display generation component, outputting, via the one or more audio output devices, audio corresponding to the first object at a first prominence of audio output.
Claim 36 further recites “detecting an occurrence of an event that includes detecting attention of a user moving away from being directed to the first object”. It is unclear whether “that includes detecting attention of a user moving away from being directed to the first object” modifies the detecting or the event. For the purposes of examination, this limitation is interpreted as: detecting an occurrence of an event, wherein the event includes detecting attention of a user moving away from being directed to the first object.
Regarding claims 66 and 67, claims 66 and 67 contain substantially similar limitations to those found in claim 36. Consequently, claims 66 and 67 are rejected for the same reasons.
Regarding claim 52, claim 52 recites “while a fifth object different from the first object is visible, via the display generation component, outputting, via the one or more audio output devices, audio corresponding to the fifth object at a third prominence of audio output”. The relationship between these elements is unclear. It is unclear whether “via the display generation component” modifies “while a fifth object different from the first object is visible” or “outputting”. For the purposes of examination, this limitation is interpreted as: while a fifth object different from the first object is visible on the display generation component, outputting, via the one or more audio output devices, audio corresponding to the fifth object at a third prominence of audio output.
Claim 52 further recites “detecting an occurrence of a third event that includes detecting the attention of a user moving away from being directed to the first object”. It is unclear whether “that includes detecting the attention of a user moving away from being directed to the first object” modifies the detecting or the third event. For the purposes of examination, this limitation is interpreted as: detecting an occurrence of a third event, wherein the third event includes detecting the attention of a user moving away from being directed to the first object.
Regarding claim 53, claim 53 recites “wherein the first type of object is an object that corresponds to a music application, a video application, a spoken audio application, a communication application, or one or more combinations thereof”. It is unclear to which previous limitations “one or more combinations thereof” refers. For the purposes of examination, this limitation is interpreted as: wherein the first type of object is an object that corresponds to one or more of a music application, a video application, a spoken audio application, or a communication application.
Regarding claim 56, claim 56 recites “an object that is the first type of object different from the fifth object”. It is unclear how “the first type of object different from the fifth object” is intended to refer to the previously recited first type of object. For the purposes of examination, this limitation is interpreted as: an object, different from the fifth object, that is the first type of object.
Regarding claims 37-61, claims 37-61 are also rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for depending on an indefinite parent claim.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 36, 37, 39, 40, 42, 43, 47, 52-55, 57-61, 66, and 67 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Romblom et al. (WO 2022178194 A1, published 08/25/2022), hereinafter Romblom.
Regarding claim 67, Romblom teaches:
A computer system that is in communication with one or more audio output devices and a display generation component, comprising: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for (Romblom Figs. 1-6; [0026], audio channels can be used to drive left and right speakers of a headphone set to form a spatialized audio environment through what is known as binaural audio; [0058], The audio processing system 150 includes one or more buses 162 that serve to interconnect the various components of the system. One or more processors 152 are coupled to bus 162 as is known in the art; [0059], the processor 152 retrieves computer program instructions stored in a machine readable storage medium (memory) and executes those instructions to perform operations described herein):
while a first object is visible, via the display generation component, outputting, via the one or more audio output devices, audio corresponding to the first object at a first prominence of audio output (Romblom Figs. 1-6; [0006], a head mounted display; When a sound source is not the subject of a user's attention, for example, the sound source leaves the user's field of view, such sound sources can be spatially rendered in a manner that is less distracting to a user; [0007], Sensors can track a user’s head motion, gaze, hand gestures, or other input that indicates a user’s attention to a sound source; [0021], a user’s eye can be tracked with image sensors to determine gaze; [0023], as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0038], The binaural Tenderer 30 can generate spatialized audio for each audio source and combine them to form spatialized audio content 31. The spatialized audio acoustically resembles the locations of sound sources shown on display 38; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0054], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. 
In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely);
while outputting audio corresponding to the first object at the first prominence of audio output, detecting an occurrence of an event that includes detecting attention of a user moving away from being directed to the first object; and in response to detecting the occurrence of the event that includes detecting the attention of the user moving away from being directed to the first object, outputting, via the one or more audio output devices, the audio corresponding to the first object at a second prominence of audio output that is lower than the first prominence of audio output (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0031], the method can be repeated periodically to continuously update the spatial rendering of sound sources even as the user’s attention shifts throughout a session; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. 
When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0040], Figures 3-6 show how decorrelation of sounds that are not the subject of the user’s attention can be performed and controlled in different manners; [0041], as shown in Figure 3, the decorrelation of a sound source can be inversely proportional to the measure of the user attention; [0042], decorrelation level can vary based on content type, recognizing that some content (e.g., speech) is more distracting than others (e.g., running water); [0043], As shown in Figure 5, the relationship between decorrelation of the sound source and the measure of user attention can be non-linear; [0044], Figure 6 shows a two-state relationship to decorrelation where if a measure of the user’s attention satisfies a threshold then it is decorrelated. Otherwise, the sound is not decorrelated; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. 
Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007])
Regarding claims 36 and 66, claims 36 and 66 contain substantially similar limitations to those found in claim 67. Consequently, claims 36 and 66 are rejected for the same reasons.
Regarding claim 37, Romblom teaches all the limitations of claim 36, further comprising:
wherein detecting the attention of the user moving away from being directed to the first object includes detecting a first gaze input of the user moving from a first gaze position to a second gaze position, and wherein the first gaze position corresponds to a location corresponding to the first object and the second gaze position corresponds to a location that does not correspond to the first object (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. 
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
Regarding claim 39, Romblom teaches all the limitations of claim 36, further comprising:
wherein detecting the attention of the user moving away from being directed to the first object includes detecting that a second object, different from the first object, is becoming active (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0031], the method can be repeated periodically to continuously update the spatial rendering of sound sources even as the user’s attention shifts throughout a session; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. 
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0040-0045])
Regarding claim 40, Romblom teaches all the limitations of claim 39, further comprising:
wherein detecting that the second object is becoming active includes detecting a third gaze input directed to the second object (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. 
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
Regarding claim 42, Romblom teaches all the limitations of claim 39, further comprising:
wherein detecting that the second object is becoming active includes detecting an input that is directed to the second object (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. 
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
Regarding claim 43, Romblom teaches all the limitations of claim 36, further comprising:
wherein detecting the attention of the user moving away from being directed to the first object includes detecting that the first object is becoming inactive (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. 
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
Regarding claim 47, Romblom teaches all the limitations of claim 36, further comprising:
wherein detecting the attention of the user moving away from being directed to the first object includes detecting an interaction with a fourth object different from the first object (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0033], Attention to a sound source can be determined from interaction with the sound source, for example, by selecting an application, or interacting with an object that is associated with the sound source; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. 
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0051], if a user manipulates a control (e.g., a dial or button) on the movie player, the measure of the user attention to the movie player can become high; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
Regarding claim 52, Romblom teaches all the limitations of claim 36, and Romblom further teaches:
further comprising: while a fifth object different from the first object is visible, via the display generation component, outputting, via the one or more audio output devices, audio corresponding to the fifth object at a third prominence of audio output; while outputting audio corresponding to the fifth object at a third prominence of audio output, detecting an occurrence of a third event that includes detecting the attention of a user moving away from being directed to the first object; and in response to detecting the occurrence of the third event that includes detecting the attention of the user moving away from being directed to the first object: in accordance with a determination that the fifth object corresponds to a first type of object, continuing to output audio corresponding to the fifth object at the third prominence of audio output; and in accordance with a determination that the fifth object corresponds to a second type of object different from the first type of object, forgoing outputting audio corresponding to the fifth object at the third prominence of audio output (Romblom Figs. 1-9; [0006], a head mounted display; When a sound source is not the subject of a user's attention, for example, the sound source leaves the user's field of view, such sound sources can be spatially rendered in a manner that is less distracting to a user; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. 
For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases; [0031], the method can be repeated periodically to continuously update the spatial rendering of sound sources even as the user’s attention shifts throughout a session; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0040], Figures 3-6 show how decorrelation of sounds that are not the subject of the user’s attention can be performed and controlled in different manners; [0041], as shown in Figure 3, the decorrelation of a sound source can be inversely proportional to the measure of the user attention; [0042], decorrelation level can vary based on content type, recognizing that some content (e.g., speech) is more distracting than others (e.g., running water); [0043], As shown in Figure 5, the relationship between decorrelation of the sound source and the measure of user attention can be non-linear; [0044], Figure 6 shows a two-state relationship to decorrelation where if a measure of the user’s attention satisfies a threshold then it is decorrelated. Otherwise, the sound is not decorrelated; [0046], different control relationships can be applied for different sound sources. Different control relationships can be determined based on a type of the sound source; for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. 
For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type. The criteria can be stored as metadata. For example, the importance level from some sound sources such as an alert from a messenger application may be deemed as high importance because such a sound source is associated with communication with the user, which may be of higher priority. Thus, even though another application might be subject to the user’s attention, some sound sources, such as alerts, calls, alarms, system notifications, can be rendered spatially without decorrelation; [0054], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; the loudness level of sound source 44 might be below a threshold so that the system need not decorrelate sound source 44, thus saving computational resources; [0055], if a virtual distance between the user and sound source 45 is beyond the distance threshold (e.g., 5 virtual meters, 8 virtual meters, or 10 virtual meters) then the spatialization of the sound source according to room acoustics may sufficiently decorrelate the sound. 
In such a case, performing decorrelation may be wasteful of computational resources; as described, user attention can be quantified as one or more values that may be increased or decreased dependent on the user’s gaze direction ([0023]); depending on the type of source, decorrelation may change linearly with the attention value or remain static until an attention threshold is crossed ([0041-0044]); some types of sources are not decorrelated dependent on the user’s changing gaze direction, which shifts throughout the session ([0031], [0050], [0054-0055])
Regarding claim 53, Romblom teaches all the limitations of claim 52, and further teaches:
wherein the first type of object is an object that corresponds to a music application, a video application, a spoken audio application, a communication application, or one or more combinations thereof (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0040-0042], decorrelation level can vary based on content type, recognizing that some content (e.g., speech) is more distracting than others (e.g., running water); [0043], As shown in Figure 5, the relationship between decorrelation of the sound source and the measure of user attention can be non-linear; [0044], Figure 6 shows a two-state relationship to decorrelation where if a measure of the user’s attention satisfies a threshold then it is decorrelated. 
Otherwise, the sound is not decorrelated; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated; [0050], even though another application might be subject to the user’s attention, some sound sources, such as alerts, calls, alarms, system notifications, can be rendered spatially without decorrelation; see also [0006], [0031], [0054-0055])
Regarding claim 54, Romblom teaches all the limitations of claim 36, and further teaches:
further comprising: detecting a request to display a system user interface; and in response to detecting the request to display the system user interface, displaying, via the display generation component, the system user interface, including: a first control that, when selected, causes the computer system to adjust output of audio corresponding to the fifth object; and a second control that, when selected, causes the computer system to adjust output of audio corresponding to a sixth object, wherein the sixth object is the first type of object (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0031], the method can be repeated periodically to continuously update the spatial rendering of sound sources even as the user’s attention shifts throughout a session; [0033], Attention to a sound source can be determined from interaction with the sound source, for example, by selecting an application, or interacting with an object that is associated with the sound source; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. 
When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0051], if a user manipulates a control (e.g., a dial or button) on the movie player, the measure of the user attention to the movie player can become high; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0040-0045])
Regarding claim 55, Romblom teaches all the limitations of claim 54, and further teaches:
wherein detecting the request to display the system user interface includes detecting a fifth gaze input at a fixed location in a viewport of a three-dimensional environment, and wherein the three-dimensional environment is visible in the viewport of the three-dimensional environment (Romblom Figs. 1-9; [0019], One or more sound sources can have virtual locations in a spatial audio environment. They can be displayed in a visual environment, such as, for example, in 3D extended reality (XR) setting; [0023], as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; Suitable spatial filters are determined for each sound source based on the user’s head position (which can include spherical coordinates such as azimuth and elevation, and/or a three-dimensional position such as X, Y, and Z) and corresponding sound source location; [0039], The sound sources can be displayed in a three dimensional spatial environment (e.g., in an extended reality environment); [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. 
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0053], A user can move about in the XR setting, such that the user’s position, head position, and/or gaze in the XR setting can be tracked and updated; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
Regarding claim 57, Romblom teaches all the limitations of claim 36, and further teaches:
further comprising: in response to detecting the occurrence of the event that includes detecting the attention of the user moving away from being directed to the first object, outputting, via the one or more audio output devices, audio corresponding to a seventh object at a fourth prominence of audio output that is higher than a fifth prominence of audio output, wherein the fifth prominence of audio output is a prominence of audio output at which audio corresponding to the seventh object was output before detecting the occurrence of the event that includes detecting the attention of the user moving away from being directed to the first object, wherein the seventh object is different from the first object (Romblom Figs. 1-9; [0006], a head mounted display; When a sound source is not the subject of a user's attention, for example, the sound source leaves the user's field of view, such sound sources can be spatially rendered in a manner that is less distracting to a user; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases; [0031], the method can be repeated periodically to continuously update the spatial rendering of sound sources even as the user’s attention shifts throughout a session; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. 
When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0040], Figures 3-6 show how decorrelation of sounds that are not the subject of the user’s attention can be performed and controlled in different manners; [0041], as shown in Figure 3, the decorrelation of a sound source can be inversely proportional to the measure of the user attention; [0042], decorrelation level can vary based on content type, recognizing that some content (e.g., speech) is more distracting than others (e.g., running water); [0043], As shown in Figure 5, the relationship between decorrelation of the sound source and the measure of user attention can be non-linear; [0044], Figure 6 shows a two-state relationship to decorrelation where if a measure of the user’s attention satisfies a threshold then it is decorrelated. Otherwise, the sound is not decorrelated; [0046], different control relationships can be applied for different sound sources. Different control relationships can be determined based on a type of the sound source; for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type. The criteria can be stored as metadata. 
For example, the importance level from some sound sources such as an alert from a messenger application may be deemed as high importance because such a sound source is associated with communication with the user, which may be of higher priority. Thus, even though another application might be subject to the user’s attention, some sound sources, such as alerts, calls, alarms, system notifications, can be rendered spatially without decorrelation; [0054], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; the loudness level of sound source 44 might be below a threshold so that the system need not decorrelate sound source 44, thus saving computational resources; [0055], if a virtual distance between the user and sound source 45 is beyond the distance threshold (e.g., 5 virtual meters, 8 virtual meters, or 10 virtual meters) then the spatialization of the sound source according to room acoustics may sufficiently decorrelate the sound. In such a case, performing decorrelation may be wasteful of computational resources)
Regarding claim 58, Romblom teaches all the limitations of claim 36, and further teaches:
further comprising: while outputting audio corresponding to the first object at the first prominence of audio output, outputting audio corresponding to an eighth object different from the first object at a sixth prominence of audio output that is different from the first prominence of audio output (Romblom Figs. 1-9; [0023], as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases; [0031], the method can be repeated periodically to continuously update the spatial rendering of sound sources even as the user’s attention shifts throughout a session; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], different control relationships can be applied for different sound sources. Different control relationships can be determined based on a type of the sound source; for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type. 
The criteria can be stored as metadata. For example, the importance level from some sound sources such as an alert from a messenger application may be deemed as high importance because such a sound source is associated with communication with the user, which may be of higher priority. Thus, even though another application might be subject to the user’s attention, some sound sources, such as alerts, calls, alarms, system notifications, can be rendered spatially without decorrelation; [0054], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; the loudness level of sound source 44 might be below a threshold so that the system need not decorrelate sound source 44, thus saving computational resources; see also [0006], [0040-0045])
Regarding claim 59, Romblom teaches all the limitations of claim 36, and further teaches:
wherein the first object is an application window (Romblom Figs. 1-9; [0023], as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0032], A sound source can be represented visually by an object, which can be, for example, an application window, a graphic, an avatar, an animation, an image, or other computer-rendered object; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. 
In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
Regarding claim 60, Romblom teaches all the limitations of claim 53, and further teaches:
wherein the object is visible in a three-dimensional environment (Romblom Figs. 1-9; [0019], One or more sound sources can have virtual locations in a spatial audio environment. They can be displayed in a visual environment, such as, for example, in 3D extended reality (XR) setting; [0023], as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; Suitable spatial filters are determined for each sound source based on the user’s head position (which can include spherical coordinates such as azimuth and elevation, and/or a three-dimensional position such as X, Y, and Z) and corresponding sound source location; [0039], The sound sources can be displayed in a three dimensional spatial environment (e.g., in an extended reality environment); [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. 
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0053], A user can move about in the XR setting, such that the user’s position, head position, and/or gaze in the XR setting can be tracked and updated; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
Regarding claim 61, Romblom teaches all the limitations of claim 36, and further teaches:
wherein: in accordance with a determination that the first object is associated with a first location, the audio corresponding to the first object is output such that the audio corresponding to the first object is spatially positioned at the first location; and in accordance with a determination that the first object is associated with a second location different from the first location, the audio corresponding to the first object is output such that the audio corresponding to the first object is spatially positioned at the second location (Romblom Figs. 1-9; [0019], One or more sound sources can have virtual locations in a spatial audio environment. They can be displayed in a visual environment, such as, for example, in 3D extended reality (XR) setting; [0023], as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0026], Spatialized audio maintains the illusion that one or more sounds originate from virtual locations in the environment; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. 
When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; Suitable spatial filters are determined for each sound source based on the user’s head position (which can include spherical coordinates such as azimuth and elevation, and/or a three-dimensional position such as X, Y, and Z) and corresponding sound source location; [0039], The sound sources can be displayed in a three dimensional spatial environment (e.g., in an extended reality environment); [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 38, 41, 46, 51, and 56 are rejected under 35 U.S.C. 103 as being unpatentable over Romblom in view of McCoy et al. (US 20200097246 A1, published 03/26/2020), hereinafter McCoy.
Regarding claim 38, Romblom teaches all the limitations of claim 36, and further teaches:
wherein detecting the attention of the user moving away from being directed to the first object includes detecting that a second gaze input has not been directed to the first object for an amount of time (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. 
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
However, Romblom fails to expressly disclose “includes detecting that a second gaze input has not been directed to the first object for more than a threshold amount of time”. In the same field of endeavor, McCoy teaches:
includes detecting that a second gaze input has not been directed to the first object for more than a threshold amount of time (McCoy Figs. 1-6; [0004], Based on the gaze and/or the perceived locations of virtual content, virtual content currently occupying the user's attention may be identified and the presentation of audio may be modified; the modification may include one or more of enhancing audio associated with and/or accompanying the identified virtual content, diminishing the audio not associated with and/or not accompanying the identified virtual content; [0063], Gaze information may be determined over time; [0069], the modification component 122 may be configured to modify the presentation of the audio content, ambient sounds, and/or other content based on a gaze direction being directed toward, intersecting with, and/or otherwise coinciding with a perceived location of a virtual object for a predetermined period of time; the predetermined period of time may be a period of time in the range of one and eight seconds; [0074], A gaze direction 400 of the user 301 may be determined. Responsive to the gaze direction 400 of the user 301 being toward the first location, presentation of one or more of the first audio content 304, the second audio content 310, ambient sound 314, and/or other audio content may be modified. In some implementations, the modification may include one or more of increasing a volume of first audio content 304, decreasing a volume of second audio content 310, ceasing presentation of second audio content 310, decreasing a volume of ambient sound 314, ceasing presentation of ambient sound 314, and/or other modifications; [0075], FIG. 5 illustrates another view of interactive environment 300 from the perspective of user 301. The gaze direction 400 of the user 301 may be determined. 
Responsive to the gaze direction 400 of the user 301 being toward the second location, presentation of one or more of the first audio content 304, the second audio content 310, ambient sound 314, and/or other audio content may be modified. In some implementations, the modification may include one or more of increasing a volume of second audio content 310, decreasing a volume of first audio content 304, ceasing presentation of first audio content 304, decreasing a volume of ambient sound 314, ceasing presentation of ambient sound 314, and/or other modifications; [0076], the gaze direction 400 of the user 301 may be determined. Responsive to the gaze direction 400 of the user 301 being toward the second user 601, presentation of audio content may be modified. In some implementations, the modification may include one or more of effectuating presentation of the user-specific audio content 604, decreasing a volume of first audio content 304, ceasing presentation of first audio content 304, decreasing a volume of second audio content 310, ceasing presentation of second audio content 310, decreasing a volume of ambient sound, ceasing presentation of ambient sound, and/or other modifications)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated detecting that a second gaze input has not been directed to the first object for more than a threshold amount of time, as suggested in McCoy, into Romblom. Doing so would be desirable because audio content associated with individual virtual objects and/or individual physical objects may be enhanced and/or diminished based on individual virtual objects and/or individual physical objects occupying the user's attention (see McCoy [0029]). Gaze information may be determined over time (see McCoy [0063]). The system of McCoy would improve the system of Romblom by providing additional methods of measuring attention (see Romblom [0040-0046]), such as a predetermined time period (see McCoy [0069]). McCoy’s modification component 122 may be configured to modify the presentation of the audio content, ambient sounds, and/or other content based on a gaze direction being directed toward, intersecting with, and/or otherwise coinciding with a perceived location of a virtual object for a predetermined period of time (see McCoy [0069]), thereby better measuring the user’s attention and avoiding confusion when a user accidentally glances quickly at a sound source without intending to focus on the audio of that object. By providing additional measures of attention, McCoy’s system provides additional flexibility and features, thereby increasing the usefulness and desirability of the system.
Regarding claim 41, Romblom teaches all the limitations of claim 39, further comprising:
wherein detecting that the second object is becoming active includes detecting a fourth gaze input has been directed to the second object for an amount of time (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated.
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
However, Romblom fails to expressly disclose detecting a fourth gaze input has been directed to the second object for more than a second threshold amount of time. In the same field of endeavor, McCoy teaches:
detecting a fourth gaze input has been directed to the second object for more than a second threshold amount of time (McCoy Figs. 1-6; [0004], Based on the gaze and/or the perceived locations of virtual content, virtual content currently occupying the user's attention may be identified and the presentation of audio may be modified; the modification may include one or more of enhancing audio associated with and/or accompanying the identified virtual content, diminishing the audio not associated with and/or not accompanying the identified virtual content; [0063], Gaze information may be determined over time; [0069], the modification component 122 may be configured to modify the presentation of the audio content, ambient sounds, and/or other content based on a gaze direction being directed toward, intersecting with, and/or otherwise coinciding with a perceived location of a virtual object for a predetermined period of time; the predetermined period of time may be a period of time in the range of one and eight seconds; [0074], A gaze direction 400 of the user 301 may be determined. Responsive to the gaze direction 400 of the user 301 being toward the first location, presentation of one or more of the first audio content 304, the second audio content 310, ambient sound 314, and/or other audio content may be modified. In some implementations, the modification may include one or more of increasing a volume of first audio content 304, decreasing a volume of second audio content 310, ceasing presentation of second audio content 310, decreasing a volume of ambient sound 314, ceasing presentation of ambient sound 314, and/or other modifications; [0075], FIG. 5 illustrates another view of interactive environment 300 from the perspective of user 301. The gaze direction 400 of the user 301 may be determined.
Responsive to the gaze direction 400 of the user 301 being toward the second location, presentation of one or more of the first audio content 304, the second audio content 310, ambient sound 314, and/or other audio content may be modified. In some implementations, the modification may include one or more of increasing a volume of second audio content 310, decreasing a volume of first audio content 304, ceasing presentation of first audio content 304, decreasing a volume of ambient sound 314, ceasing presentation of ambient sound 314, and/or other modifications; [0076], the gaze direction 400 of the user 301 may be determined. Responsive to the gaze direction 400 of the user 301 being toward the second user 601, presentation of audio content may be modified. In some implementations, the modification may include one or more of effectuating presentation of the user-specific audio content 604, decreasing a volume of first audio content 304, ceasing presentation of first audio content 304, decreasing a volume of second audio content 310, ceasing presentation of second audio content 310, decreasing a volume of ambient sound, ceasing presentation of ambient sound, and/or other modifications)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated detecting a fourth gaze input has been directed to the second object for more than a second threshold amount of time, as suggested in McCoy, into Romblom. Doing so would be desirable because audio content associated with individual virtual objects and/or individual physical objects may be enhanced and/or diminished based on individual virtual objects and/or individual physical objects occupying the user's attention (see McCoy [0029]). Gaze information may be determined over time (see McCoy [0063]). The system of McCoy would improve the system of Romblom by providing additional methods of measuring attention (see Romblom [0040-0046]), such as a predetermined time period (see McCoy [0069]). McCoy’s modification component 122 may be configured to modify the presentation of the audio content, ambient sounds, and/or other content based on a gaze direction being directed toward, intersecting with, and/or otherwise coinciding with a perceived location of a virtual object for a predetermined period of time (see McCoy [0069]), thereby better measuring the user’s attention and avoiding confusion when a user accidentally glances quickly at a sound source without intending to focus on the audio of that object. By providing additional measures of attention, McCoy’s system provides additional flexibility and features, thereby increasing the usefulness and desirability of the system.
Regarding claim 46, Romblom teaches all the limitations of claim 36, further comprising:
wherein: before detecting the attention of the user moving away from being directed to the first object; and while detecting the attention of the user moving away from being directed to the first object (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. 
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
However, Romblom fails to expressly disclose wherein: before detecting the attention of the user moving away from being directed to the first object, the computer system is a first distance from the first object; and while detecting the attention of the user moving away from being directed to the first object, the computer system is the first distance from the first object. In the same field of endeavor, McCoy teaches:
wherein: before detecting the attention of the user moving away from being directed to the first object, the computer system is a first distance from the first object; and while detecting the attention of the user moving away from being directed to the first object, the computer system is the first distance from the first object (McCoy Figs. 1-6; [0004], Based on the gaze and/or the perceived locations of virtual content, virtual content currently occupying the user's attention may be identified and the presentation of audio may be modified; the modification may include one or more of enhancing audio associated with and/or accompanying the identified virtual content, diminishing the audio not associated with and/or not accompanying the identified virtual content; [0037-0038], Presentation device 141 may superimpose images of virtual content over views of the real-world such that the virtual content may be perceived by the viewing user as being present in the real world; [0063], Gaze information may be determined over time; [0069], the modification component 122 may be configured to modify the presentation of the audio content, ambient sounds, and/or other content based on a gaze direction being directed toward, intersecting with, and/or otherwise coinciding with a perceived location of a virtual object for a predetermined period of time; the predetermined period of time may be a period of time in the range of one and eight seconds; [0071], The one or more real-world objects may include real-world object 316; [0074], A gaze direction 400 of the user 301 may be determined. Responsive to the gaze direction 400 of the user 301 being toward the first location, presentation of one or more of the first audio content 304, the second audio content 310, ambient sound 314, and/or other audio content may be modified.
In some implementations, the modification may include one or more of increasing a volume of first audio content 304, decreasing a volume of second audio content 310, ceasing presentation of second audio content 310, decreasing a volume of ambient sound 314, ceasing presentation of ambient sound 314, and/or other modifications; [0075], FIG. 5 illustrates another view of interactive environment 300 from the perspective of user 301. The gaze direction 400 of the user 301 may be determined. Responsive to the gaze direction 400 of the user 301 being toward the second location, presentation of one or more of the first audio content 304, the second audio content 310, ambient sound 314, and/or other audio content may be modified. In some implementations, the modification may include one or more of increasing a volume of second audio content 310, decreasing a volume of first audio content 304, ceasing presentation of first audio content 304, decreasing a volume of ambient sound 314, ceasing presentation of ambient sound 314, and/or other modifications; [0076], FIG. 6 illustrates another view of interactive environment 300 from the perspective of user 301. The interactive environment 300 may further include one or more of a second user 601, a second presentation device 602 installed on the head of the second user 601, other users, and/or other presentation devices; the gaze direction 400 of the user 301 may be determined. Responsive to the gaze direction 400 of the user 301 being toward the second user 601, presentation of audio content may be modified. 
In some implementations, the modification may include one or more of effectuating presentation of the user-specific audio content 604, decreasing a volume of first audio content 304, ceasing presentation of first audio content 304, decreasing a volume of second audio content 310, ceasing presentation of second audio content 310, decreasing a volume of ambient sound, ceasing presentation of ambient sound, and/or other modifications (as shown in Figs. 3-6, while changing gaze, the user remains a distance from real world objects in the display)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the limitation wherein: before detecting the attention of the user moving away from being directed to the first object, the computer system is a first distance from the first object; and while detecting the attention of the user moving away from being directed to the first object, the computer system is the first distance from the first object, as suggested in McCoy, into Romblom. Doing so would be desirable because audio content associated with individual virtual objects and/or individual physical objects may be enhanced and/or diminished based on individual virtual objects and/or individual physical objects occupying the user's attention (see McCoy [0029]). Gaze information may be determined over time (see McCoy [0063]). The system of McCoy would improve the system of Romblom by providing the ability to display both virtual and real-world objects at a distance from the user (see McCoy [0037-0038]) and modify audio associated with the virtual and real-world objects (see McCoy [0071-0076]). By providing additional objects for audio modification, McCoy’s system provides additional flexibility and features, thereby increasing the usefulness and desirability of the system.
Regarding claim 51, Romblom teaches all the limitations of claim 36, further comprising:
further comprising: while outputting, via the one or more audio output devices, the audio corresponding to the first object at the second prominence of audio output, detecting an occurrence of a second event that includes detecting the attention of the user moving away from being directed to the first object; and in response to detecting the occurrence of the second event that includes detecting the attention of the user moving away from being directed to the first object, reducing prominence of the output of audio corresponding to the first object by modifying the audio corresponding to the first object (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. 
For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
However, Romblom fails to expressly disclose reducing prominence of the output of audio corresponding to the first object by pausing the audio corresponding to the first object. In the same field of endeavor, McCoy teaches:
reducing prominence of the output of audio corresponding to the first object by pausing the audio corresponding to the first object (McCoy Figs. 1-6; [0004], Based on the gaze and/or the perceived locations of virtual content, virtual content currently occupying the user's attention may be identified and the presentation of audio may be modified; the modification may include one or more of enhancing audio associated with and/or accompanying the identified virtual content, diminishing the audio not associated with and/or not accompanying the identified virtual content; [0063], Gaze information may be determined over time; [0069], the modification component 122 may be configured to modify the presentation of the audio content, ambient sounds, and/or other content based on a gaze direction being directed toward, intersecting with, and/or otherwise coinciding with a perceived location of a virtual object for a predetermined period of time; the predetermined period of time may be a period of time in the range of one and eight seconds; [0074], A gaze direction 400 of the user 301 may be determined. Responsive to the gaze direction 400 of the user 301 being toward the first location, presentation of one or more of the first audio content 304, the second audio content 310, ambient sound 314, and/or other audio content may be modified. In some implementations, the modification may include one or more of increasing a volume of first audio content 304, decreasing a volume of second audio content 310, ceasing presentation of second audio content 310, decreasing a volume of ambient sound 314, ceasing presentation of ambient sound 314, and/or other modifications; [0075], FIG. 5 illustrates another view of interactive environment 300 from the perspective of user 301. The gaze direction 400 of the user 301 may be determined.
Responsive to the gaze direction 400 of the user 301 being toward the second location, presentation of one or more of the first audio content 304, the second audio content 310, ambient sound 314, and/or other audio content may be modified. In some implementations, the modification may include one or more of increasing a volume of second audio content 310, decreasing a volume of first audio content 304, ceasing presentation of first audio content 304, decreasing a volume of ambient sound 314, ceasing presentation of ambient sound 314, and/or other modifications; [0076], the gaze direction 400 of the user 301 may be determined. Responsive to the gaze direction 400 of the user 301 being toward the second user 601, presentation of audio content may be modified. In some implementations, the modification may include one or more of effectuating presentation of the user-specific audio content 604, decreasing a volume of first audio content 304, ceasing presentation of first audio content 304, decreasing a volume of second audio content 310, ceasing presentation of second audio content 310, decreasing a volume of ambient sound, ceasing presentation of ambient sound, and/or other modifications)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated reducing prominence of the output of audio corresponding to the first object by pausing the audio corresponding to the first object, as suggested in McCoy, into Romblom. Doing so would be desirable because audio content associated with individual virtual objects and/or individual physical objects may be enhanced and/or diminished based on individual virtual objects and/or individual physical objects occupying the user's attention (see McCoy [0029]). Gaze information may be determined over time (see McCoy [0063]). The system of McCoy would improve the system of Romblom by providing options for decreasing the prominence of audio, such as pausing the audio (see McCoy [0074-0076]). McCoy’s additional modifications would allow the system to direct the user’s attention as needed, avoiding confusion when there are multiple sound sources. By providing additional audio modifications, McCoy’s system provides additional flexibility and features, thereby increasing the usefulness and desirability of the system.
Regarding claim 56, Romblom teaches all the limitations of claim 52, further comprising:
wherein the fifth object is the first type of object, the method comprising: while outputting audio corresponding to the fifth object, detecting a request to output audio corresponding to an object that is the first type of object different from the fifth object; and in response to detecting the request to output audio corresponding to the object that is the first type of object: outputting audio corresponding to the object that is the first type of object; and modifying outputting audio corresponding to the fifth type of object (Romblom Figs. 1-9; [0006], a head mounted display; When a sound source is not the subject of a user's attention, for example, the sound source leaves the user's field of view, such sound sources can be spatially rendered in a manner that is less distracting to a user; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. 
When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0040], Figures 3-6 show how decorrelation of sounds that are not the subject of the user’s attention can be performed and controlled in different manners; [0041], as shown in Figure 3, the decorrelation of a sound source can be inversely proportional to the measure of the user attention; [0042], decorrelation level can vary based on content type, recognizing that some content (e.g., speech) is more distracting than others (e.g., running water); [0043], As shown in Figure 5, the relationship between decorrelation of the sound source and the measure of user attention can be non-linear; [0044], Figure 6 shows a two-state relationship to decorrelation where if a measure of the user’s attention satisfies a threshold then it is decorrelated. Otherwise, the sound is not decorrelated; [0046], different control relationships can be applied for different sound sources. Different control relationships can be determined based on a type of the sound source; for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type. The criteria can be stored as metadata. 
For example, the importance level from some sound sources such as an alert from a messenger application may be deemed as high importance because such a sound source is associated with communication with the user, which may be of higher priority. Thus, even though another application might be subject to the user’s attention, some sound sources, such as alerts, calls, alarms, system notifications, can be rendered spatially without decorrelation; [0054], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; the loudness level of sound source 44 might be below a threshold so that the system need not decorrelate sound source 44, thus saving computational resources; [0055], if a virtual distance between the user and sound source 45 is beyond the distance threshold (e.g., 5 virtual meters, 8 virtual meters, or 10 virtual meters) then the spatialization of the sound source according to room acoustics may sufficiently decorrelate the sound. In such a case, performing decorrelation may be wasteful of computational resources; as described, user attention can be quantified as one or more values that may be increased or decreased dependent on the user’s gaze direction ([0023]); depending on the type of source, decorrelation may change linearly with the attention value or remain static until an attention threshold is crossed ([0041-0044]); some types of sources are not decorrelated dependent on the user’s gaze direction ([0050], [0054-0055])
However, Romblom fails to expressly disclose ceasing outputting audio corresponding to the fifth type of object. In the same field of endeavor, McCoy teaches:
ceasing outputting audio corresponding to the fifth type of object (McCoy Figs. 1-6; [0004], Based on the gaze and/or the perceived locations of virtual content, virtual content currently occupying the user's attention may be identified and the presentation of audio may be modified; the modification may include one or more of enhancing audio associated with and/or accompanying the identified virtual content, diminishing the audio not associated with and/or not accompanying the identified virtual content; [0063], Gaze information may be determined over time; [0066], The modification component 112 may be configured to, responsive to the gaze direction of the user being toward the first location, modify the presentation of the first audio content, the second audio content, and/or other audio content. In some implementations, the modification may include one or more of decreasing a volume of the second audio content and/or other audio content, ceasing presentation of the second audio content and/or other audio content, increasing a volume of the first audio content, and/or other modifications; [0069], the modification component 122 may be configured to modify the presentation of the audio content, ambient sounds, and/or other content based on a gaze direction being directed toward, intersecting with, and/or otherwise coinciding with a perceived location of a virtual object for a predetermined period of time; the predetermined period of time may be a period of time in the range of one and eight seconds; [0074], A gaze direction 400 of the user 301 may be determined. Responsive to the gaze direction 400 of the user 301 being toward the first location, presentation of one or more of the first audio content 304, the second audio content 310, ambient sound 314, and/or other audio content may be modified.
In some implementations, the modification may include one or more of increasing a volume of first audio content 304, decreasing a volume of second audio content 310, ceasing presentation of second audio content 310, decreasing a volume of ambient sound 314, ceasing presentation of ambient sound 314, and/or other modifications; [0075], FIG. 5 illustrates another view of interactive environment 300 from the perspective of user 301. The gaze direction 400 of the user 301 may be determined. Responsive to the gaze direction 400 of the user 301 being toward the second location, presentation of one or more of the first audio content 304, the second audio content 310, ambient sound 314, and/or other audio content may be modified. In some implementations, the modification may include one or more of increasing a volume of second audio content 310, decreasing a volume of first audio content 304, ceasing presentation of first audio content 304, decreasing a volume of ambient sound 314, ceasing presentation of ambient sound 314, and/or other modifications; [0076], the gaze direction 400 of the user 301 may be determined. Responsive to the gaze direction 400 of the user 301 being toward the second user 601, presentation of audio content may be modified. In some implementations, the modification may include one or more of effectuating presentation of the user-specific audio content 604, decreasing a volume of first audio content 304, ceasing presentation of first audio content 304, decreasing a volume of second audio content 310, ceasing presentation of second audio content 310, decreasing a volume of ambient sound, ceasing presentation of ambient sound, and/or other modifications)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated ceasing outputting audio corresponding to the fifth type of object as suggested in McCoy into Romblom. Doing so would be desirable because audio content associated with individual virtual objects and/or individual physical objects may be enhanced and/or diminished based on individual virtual objects and/or individual physical objects occupying the user's attention (see McCoy [0029]). Gaze information may be determined over time (see McCoy [0063]). The system of McCoy would improve the system of Romblom by providing options for decreasing the prominence of audio, such as pausing the audio (see McCoy [0074-0076]). Romblom discloses that different types of content may be modified differently (Romblom [0046]), and McCoy discloses that some content may be modified and other audio content may be ceased (see McCoy [0066]). McCoy's additional modifications would better allow the system to focus the user's attention as needed, thereby avoiding confusion when there are multiple sound sources. By providing additional audio modifications, McCoy's system provides additional flexibility and features, thereby increasing the usefulness and desirability of the system.
Claims 44 and 45 are rejected under 35 U.S.C. 103 as being unpatentable over Romblom in view of Perry (US 20170221264 A1, published 08/03/2017).
Regarding claim 44, Romblom teaches all the limitations of claim 43, further comprising:
wherein detecting that the first object is becoming inactive includes detecting a request (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. 
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
However, Romblom fails to expressly disclose wherein detecting that the first object is becoming inactive includes detecting a request to cease displaying the first object. In the same field of endeavor, Perry teaches:
wherein detecting that the first object is becoming inactive includes detecting a request to cease displaying the first object (Perry Figs. 1-10; abs. The virtual reality scene is adjusted to move the object of current focus of the user toward a point of view of the user within the virtual reality scene. Audio content associated with the object of current focus of the user is provided to the head mounted display; [0019], FIG. 4C shows the top view of the user immersed in the virtual reality scene as shown in FIG. 4B, after having determined that the display D1 is the object of current focus of the user 100; [0020], FIG. 4D shows the top view of the user immersed in the virtual reality scene as shown in FIG. 4C, with the user having changed their focus direction; [0021-0022], FIG. 4F shows the top view of the user immersed in the virtual reality scene as shown in FIG. 4E, with the user having changed their focus direction toward display D5; [0031], FIG. 6A shows an example of the field of view of the user immersed in the virtual reality scene; [0032-0035], FIG. 6E shows the field of view of the user within the virtual reality scene as shown in FIG. 6D, after completion of bringing the display D36 toward the user in the spatial manner; [0036-0038], FIG. 6H shows the field of view of the user within the virtual reality scene as shown in FIG. 6G, after completion of bringing the display D35 toward the user in the spatial manner; [0039], FIG. 6I shows the field of view of the user within the virtual reality scene as shown in FIG. 6H, after the user has activated the transparency control so as to make the object of current focus of the user transparent; [0040], FIG. 6J shows the field of view of the user within the virtual reality scene as shown in FIG. 
6I, after having determined that the display D39 is the new object of current focus of the user, with the virtual reality scene adjusting to diminish the display D35 back to its normal position and to bring the display D39 toward the user in a spatial manner (first object is obscured by a third object different from the first object); [0041], FIG. 6K shows the field of view of the user within the virtual reality scene as shown in FIG. 6J, after completion of bringing the display D39 toward the user in the spatial manner; [0112], FIG. 6J shows the field of view 401 of the user 100 within the virtual reality scene 400 as shown in FIG. 6I, after having determined that the display D39 is the new object of current focus of the user 100, with the virtual reality scene 400 adjusting to diminish the display D35 back to its normal position and to bring the display D39 toward the user 100 in a spatial manner, in accordance with some embodiments of the present invention. Again, once the display D39 is determined to be the object of current focus of the user 100, the audio content associated with the display D39 can be included in an audio feed provided to the user 100; [0117], when an object of current focus of the user 100 is brought toward the user 100 in a spatial manner, the object of current focus of the user 100 can be enlarged to substantially fill the field of view 401 of the user 100 (first object is obscured by a third object different from the first object); [0121], upon detecting the change in the eye gaze direction of the user, the method includes changing the focus direction of the user within the field of view of the user within virtual reality scene to align with the eye gaze direction of the user. And, the method includes determining a new object of current focus of the user within the virtual reality scene based on the determined focus direction of the user, where the focus direction of the user is directed toward the new object of current focus of the user)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein detecting that the first object is becoming inactive includes detecting a request to cease displaying the first object as suggested in Perry into Romblom. Doing so would be desirable because in many virtual reality applications, it is not only desirable to have the user feel visually immersed in the virtual reality scene, but it is also desirable to provide the user with an ability to select objects displayed within the virtual reality scene for a more focused view. It is within this context that the present invention arises (see Perry [0003]). The visual content and audio content of the various displays D1-D12 can be essentially any type of content associated with essentially any type of computer application and/or information source (see Perry [0086]). The system of Perry would improve the system of Romblom by ceasing display of an inactive object, while simultaneously bringing the object of current focus of the user toward the user in a spatial manner, such that an enlarged version of the display D3 is positioned prominently in front of the user 100 (see Perry [0088]), which would better enable the user to interact with a desired object (see Perry [0107]). By providing additional display modifications, Perry’s system provides additional flexibility and features, thereby increasing the usefulness and desirability of the system.
Regarding claim 45, Romblom teaches all the limitations of claim 43, further comprising:
wherein detecting that the first object is becoming inactive includes detecting (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. 
Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
However, Romblom fails to expressly disclose wherein detecting that the first object is becoming inactive includes detecting that the first object is obscured by a third object different from the first object. In the same field of endeavor, Perry teaches:
wherein detecting that the first object is becoming inactive includes detecting that the first object is obscured by a third object different from the first object (Perry Figs. 1-10; abs. The virtual reality scene is adjusted to move the object of current focus of the user toward a point of view of the user within the virtual reality scene. Audio content associated with the object of current focus of the user is provided to the head mounted display; [0031], FIG. 6A shows an example of the field of view of the user immersed in the virtual reality scene; [0032-0035], FIG. 6E shows the field of view of the user within the virtual reality scene as shown in FIG. 6D, after completion of bringing the display D36 toward the user in the spatial manner; [0036-0038], FIG. 6H shows the field of view of the user within the virtual reality scene as shown in FIG. 6G, after completion of bringing the display D35 toward the user in the spatial manner; [0039], FIG. 6I shows the field of view of the user within the virtual reality scene as shown in FIG. 6H, after the user has activated the transparency control so as to make the object of current focus of the user transparent; [0040], FIG. 6J shows the field of view of the user within the virtual reality scene as shown in FIG. 6I, after having determined that the display D39 is the new object of current focus of the user, with the virtual reality scene adjusting to diminish the display D35 back to its normal position and to bring the display D39 toward the user in a spatial manner (first object is obscured by a third object different from the first object); [0041], FIG. 6K shows the field of view of the user within the virtual reality scene as shown in FIG. 6J, after completion of bringing the display D39 toward the user in the spatial manner; [0112], FIG. 6J shows the field of view 401 of the user 100 within the virtual reality scene 400 as shown in FIG. 
6I, after having determined that the display D39 is the new object of current focus of the user 100, with the virtual reality scene 400 adjusting to diminish the display D35 back to its normal position and to bring the display D39 toward the user 100 in a spatial manner, in accordance with some embodiments of the present invention. Again, once the display D39 is determined to be the object of current focus of the user 100, the audio content associated with the display D39 can be included in an audio feed provided to the user 100; [0117], when an object of current focus of the user 100 is brought toward the user 100 in a spatial manner, the object of current focus of the user 100 can be enlarged to substantially fill the field of view 401 of the user 100 (first object is obscured by a third object different from the first object); [0121], upon detecting the change in the eye gaze direction of the user, the method includes changing the focus direction of the user within the field of view of the user within virtual reality scene to align with the eye gaze direction of the user. And, the method includes determining a new object of current focus of the user within the virtual reality scene based on the determined focus direction of the user, where the focus direction of the user is directed toward the new object of current focus of the user)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein detecting that the first object is becoming inactive includes detecting that the first object is obscured by a third object different from the first object as suggested in Perry into Romblom. Doing so would be desirable because in many virtual reality applications, it is not only desirable to have the user feel visually immersed in the virtual reality scene, but it is also desirable to provide the user with an ability to select objects displayed within the virtual reality scene for a more focused view. It is within this context that the present invention arises (see Perry [0003]). The visual content and audio content of the various displays D1-D12 can be essentially any type of content associated with essentially any type of computer application and/or information source (see Perry [0086]). The system of Perry would improve the system of Romblom by obscuring an inactive object, while simultaneously bringing the object of current focus of the user toward the user in a spatial manner, such that an enlarged version of the display D3 is positioned prominently in front of the user 100 (see Perry [0088]), which would better enable the user to interact with a desired object (see Perry [0107]). By providing additional display modifications, Perry’s system provides additional flexibility and features, thereby increasing the usefulness and desirability of the system.
Claims 48 and 50 are rejected under 35 U.S.C. 103 as being unpatentable over Romblom in view of Tajik (US 20210084429 A1, published 03/18/2021), hereinafter Tajik.
Regarding claim 48, Romblom teaches all the limitations of claim 36, further comprising:
wherein outputting, via the one or more audio output devices, the audio corresponding to the first object at the second prominence of audio output includes reducing (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. 
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
However, Romblom fails to expressly disclose reducing a volume level of the audio corresponding to the first object from a first volume level to a second volume level different from the first volume level. In the same field of endeavor, Tajik teaches:
reducing a volume level of the audio corresponding to the first object from a first volume level to a second volume level different from the first volume level (Tajik Figs. 1-10; [0032], mixed reality objects comprise corresponding pairs of real objects and virtual objects (i.e., 122A/122B, 124A/124B, 126A/126B) that occupy corresponding locations in coordinate space 108; [0034], the presentation described above may also incorporate audio aspects; [0053], a MRE (such as experienced via a mixed reality system, e.g., mixed reality system 200 described above) can present, to a user, audio signals that may correspond to a “listener” coordinate, such that the audio signals represent what a user might hear at that listener coordinate; [0054], the virtual sound is presented to the user as a real audio signal (e.g., via speakers 2134 and 2136), the user may perceive the virtual sound as originating from the position of the sound source, and traveling in the direction of an orientation of the sound source; [0066], a virtual audio signal comprises base sound data (e.g., a computer file representing an audio waveform) and one or more parameters that can be applied to that base sound data. 
Such parameters may correspond to attenuation of the base sound (e.g., a volume drop-off); filtering of the base sound (e.g., a low-pass filter); time delay (e.g., phase shift) of the base sound; reverberation parameters for applying artificial reverb and echo effects; voltage-controlled oscillator (VCO) parameters for applying time-based modulation effects; pitch modulation of the base sound (e.g., to simulate Doppler effects); or other suitable parameters; [0067], The MRE could also apply a low-pass filter to the virtual audio signal, resulting in the signal appearing more muffled as high-frequency content is rolled off; [0079], wearable head device 510 may be configured to detect a position of the user's head, and to approximate the respective locations of the user's ears based on that position (e.g., by estimating or detecting the width of the user's head, and identifying the locations of the ears as being located along the circumference of the head and separated by the width of the head). By identifying the locations of the user's ears, audio signals can be presented to the ears that correspond to those particular locations; [0094], As the wearable head device 200, 400A is displaced or rotated, the spatialization of virtual audio may be adjusted; [0097], spatialized audio signals can be determined by applying left and right Head Related Transfer Functions (HRTFs))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated reducing a volume level of the audio corresponding to the first object from a first volume level to a second volume level different from the first volume level as suggested in Tajik into Romblom. Doing so would be desirable because VR systems may experience various drawbacks that result from replacing a user's real environment with a virtual environment. One drawback is a feeling of motion sickness that can arise when a user's field of view in a virtual environment no longer corresponds to the state of his or her inner ear, which detects one's balance and orientation in the real environment (not a virtual environment). Similarly, users may experience disorientation in VR environments where their own bodies and limbs (views of which users rely on to feel “grounded” in the real environment) are not directly visible. Another drawback is the computational burden (e.g., storage, processing power) placed on VR systems which must present a full 3D virtual environment, particularly in real-time applications that seek to immerse the user in the virtual environment. Similarly, such environments may need to reach a very high standard of realism to be considered immersive, as users tend to be sensitive to even minor imperfections in virtual environments—any of which can destroy a user's sense of immersion in the virtual environment (see Tajik [0005]). It may be desirable to present audio cues to a user of an XR system in a way that mimics aspects, particularly subtle aspects, of our own sensory experiences (see Tajik [0008]). When users are presented with audio signals, such as described above, they may experience difficulty quickly and accurately identifying the source of the audio signal in the virtual environment—even though identifying audio sources in the real environment is an intuitive natural ability. 
It is desirable to improve the ability of the user to perceive a position or orientation of the sound source in the MRE, such that the user's experience in a virtual or mixed reality environment more closely resembles the user's experience in the real world (see Tajik [0055]). Tajik can flexibly apply multiple different filter types, such as volume drop-off and low-pass filters, in combination with the time delays, phase shifts, and HRTFs of Romblom (see Romblom [0025-0026] and Tajik [0066], [0097]), thereby enhancing the audio modifications beyond those of Romblom. By providing additional audio modifications, Tajik’s system provides additional flexibility and features, thereby increasing the usefulness and desirability of the system.
Regarding claim 50, Romblom teaches all the limitations of claim 36, further comprising:
wherein outputting, via the one or more audio output devices, the audio corresponding to the first object at the second prominence of audio output includes applying (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. 
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
However, Romblom fails to expressly disclose applying a low-pass filter to the audio corresponding to the first object at the first prominence of audio output. In the same field of endeavor, Tajik teaches:
applying a low-pass filter to the audio corresponding to the first object at the first prominence of audio output (Tajik Figs. 1-10; [0032], mixed reality objects comprise corresponding pairs of real objects and virtual objects (i.e., 122A/122B, 124A/124B, 126A/126B) that occupy corresponding locations in coordinate space 108; [0034], the presentation described above may also incorporate audio aspects; [0053], a MRE (such as experienced via a mixed reality system, e.g., mixed reality system 200 described above) can present, to a user, audio signals that may correspond to a “listener” coordinate, such that the audio signals represent what a user might hear at that listener coordinate; [0054], the virtual sound is presented to the user as a real audio signal (e.g., via speakers 2134 and 2136), the user may perceive the virtual sound as originating from the position of the sound source, and traveling in the direction of an orientation of the sound source; [0066], a virtual audio signal comprises base sound data (e.g., a computer file representing an audio waveform) and one or more parameters that can be applied to that base sound data. 
Such parameters may correspond to attenuation of the base sound (e.g., a volume drop-off); filtering of the base sound (e.g., a low-pass filter); time delay (e.g., phase shift) of the base sound; reverberation parameters for applying artificial reverb and echo effects; voltage-controlled oscillator (VCO) parameters for applying time-based modulation effects; pitch modulation of the base sound (e.g., to simulate Doppler effects); or other suitable parameters; [0067], The MRE could also apply a low-pass filter to the virtual audio signal, resulting in the signal appearing more muffled as high-frequency content is rolled off; [0079], wearable head device 510 may be configured to detect a position of the user's head, and to approximate the respective locations of the user's ears based on that position (e.g., by estimating or detecting the width of the user's head, and identifying the locations of the ears as being located along the circumference of the head and separated by the width of the head). By identifying the locations of the user's ears, audio signals can be presented to the ears that correspond to those particular locations; [0094], As the wearable head device 200, 400A is displaced or rotated, the spatialization of virtual audio may be adjusted; [0097], spatialized audio signals can be determined by applying left and right Head Related Transfer Functions (HRTFs))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated applying a low-pass filter to the audio corresponding to the first object at the first prominence of audio output as suggested in Tajik into Romblom. Doing so would be desirable because VR systems may experience various drawbacks that result from replacing a user's real environment with a virtual environment. One drawback is a feeling of motion sickness that can arise when a user's field of view in a virtual environment no longer corresponds to the state of his or her inner ear, which detects one's balance and orientation in the real environment (not a virtual environment). Similarly, users may experience disorientation in VR environments where their own bodies and limbs (views of which users rely on to feel “grounded” in the real environment) are not directly visible. Another drawback is the computational burden (e.g., storage, processing power) placed on VR systems which must present a full 3D virtual environment, particularly in real-time applications that seek to immerse the user in the virtual environment. Similarly, such environments may need to reach a very high standard of realism to be considered immersive, as users tend to be sensitive to even minor imperfections in virtual environments—any of which can destroy a user's sense of immersion in the virtual environment (see Tajik [0005]). It may be desirable to present audio cues to a user of an XR system in a way that mimics aspects, particularly subtle aspects, of our own sensory experiences (see Tajik [0008]). When users are presented with audio signals, such as described above, they may experience difficulty quickly and accurately identifying the source of the audio signal in the virtual environment—even though identifying audio sources in the real environment is an intuitive natural ability. 
It is desirable to improve the ability of the user to perceive a position or orientation of the sound source in the MRE, such that the user's experience in a virtual or mixed reality environment more closely resembles the user's experience in the real world (see Tajik [0055]). Tajik can flexibly apply multiple different filter types, such as volume drop-off and low-pass filters, in combination with the time delays, phase shifts, and HRTFs of Romblom (see Romblom [0025-0026] and Tajik [0066], [0097]), thereby enhancing the audio modifications beyond those of Romblom. By providing additional audio modifications, Tajik’s system provides additional flexibility and features, thereby increasing the usefulness and desirability of the system.
Claim 49 is rejected under 35 U.S.C. 103 as being unpatentable over Romblom in view of Jang et al. (US 20210258709 A1, published 08/19/2021), hereinafter Jang.
Regarding claim 49, Romblom teaches all the limitations of claim 36, and further teaches:
wherein outputting, via the one or more audio output devices, the audio corresponding to the first object at the second prominence of audio output includes (Romblom Figs. 1-9; [0023], The measure of user attention can be quantified as one or more values. For example, it can be a normalized value from 0.0 to 1.0, 0 to 100, a binary 0 or 1, a set of values, or other convention that indicates the user’s attention level to a sound source. Each input can contribute numerically to the measure of user’s attention. In some aspects, the measure of user attention can be determined based on head direction. For example, as the user’s head is directed towards a sound source, the measure of user attention increases, and as the user’s head is directed away from the sound source, the measure of user attention decreases. The same can apply to a user’s gaze; [0037], when a user pays attention to a sound source, the audio is rendered to originate from a concise point in space. When the user turns or looks away from the sound source, the audio will be rendered to sound more spread out, so as to be less distracting; [0046], for speech sound sources, an abrupt transition may be useful to quickly decorrelate one speech sound source in favor of another speech sound source (e.g., in a virtual meeting) because speech can be especially distracting; [0049], Measure of the user's attention can be determined and used as a basis to decorrelate sound sources. For example, if a user turns her gaze towards the movie player, then the other applications such as the web browser, the messenger application, and the music player can be decorrelated. 
In some aspects, for example, in an XR setting, some applications can be open but not shown to the user when those applications are outside the field of view of the user; [0050], the decorrelation can be further determined based on criteria such as importance level, loudness level, a virtual distance, or content type; [0054-0055], For example, the user can be turned towards and gazing at sound source 41 in the XR setting 40. Based on the tracked head position (e.g., azimuth, elevation) and/or gaze, the user attention is measured as high for sound source 41 and low for the other sound sources. In such a case, sound sources such as sound source 42 and sound source 43 can be spatially rendered with decorrelation, while sound source 41 can be spatially rendered concisely; see also [0006-0007], [0031], [0040-0045])
However, Romblom fails to expressly disclose wherein outputting, via the one or more audio output devices, the audio corresponding to the first object at the second prominence of audio output includes increasing an amount of reverberation for the audio corresponding to the first object from a first reverberation level to a second reverberation level different from the first reverberation level. In the same field of endeavor, Jang teaches:
wherein outputting, via the one or more audio output devices, the audio corresponding to the first object at the second prominence of audio output includes increasing an amount of reverberation for the audio corresponding to the first object from a first reverberation level to a second reverberation level different from the first reverberation level (Jang Figs. 1-10; [0059], when an audio zooming effect corresponding to an audio object is set to be used, the audio zooming effect corresponding to the audio object may be activated based on a gaze of a user. For example, when a user views a content in a VR, an audio object included in the content and corresponding to a direction of a gaze of the user may be activated; [0065], the user may more readily listen to the audio zooming effect corresponding to audio object 2 which is closer to the user; [0072], an audio zooming effect corresponding to an activated audio object may be controlled based on a volume level adjustment method, and applied to a corresponding content. In this example, a volume level of an audio signal corresponding to the activated audio object may be controlled to increase by a preset level compared to a volume level of an audio signal corresponding to an inactivated audio object; [0073], an audio zooming effect corresponding to an activated audio object may be controlled based on a direct to reverberant (D/R) ratio, or a ratio of a direct sound to a reflected sound, and applied to a corresponding content. 
That is, the audio zooming effect may be controlled by controlling a D/R ratio of the activated audio object and a D/R ratio of an inactivated audio object; [0074], by reducing a ratio of a direct sound to an indirect sound (e.g., reflected sound and reverberant sound) of an audio signal propagated directly from an activated audio object to be less than a ratio of a direct sound to an indirect sound of an audio signal corresponding to an inactivated audio object, it is possible to increase a rate of the direct sound from the activated audio object. That is, as a rate of a direct sound corresponding to an audio object increases, an audio zooming effect corresponding to the audio object may increase. In addition, as a rate of an indirect sound including a reflected sound and a reverberant sound corresponding to an audio object increases, an audio zooming effect corresponding to the audio object may decrease)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein outputting, via the one or more audio output devices, the audio corresponding to the first object at the second prominence of audio output includes increasing an amount of reverberation for the audio corresponding to the first object from a first reverberation level to a second reverberation level different from the first reverberation level as suggested in Jang into Romblom. Doing so would be desirable because when content is transmitted through media, information associated with a transmission path of an audio source included in the content may not suffice, which may obstruct selective attention. Thus, an existing method such as dialogue enhancement may be used to amplify the volume of a relatively significant sound over a relatively insignificant sound, thereby enhancing or magnifying the significant sound (see Jang [0003]). However, such an existing method may not be readily applicable to a virtual reality (VR) space and immersive media that enable an interaction with an object. Thus, there is a desire for technology for enhancing or magnifying an audio source in which a listener is interested among a plurality of audio sources in a VR (see Jang [0004]). By providing additional audio modifications, Jang’s system provides additional flexibility and features, thereby increasing the usefulness and desirability of the system.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Cappello (US 20220011860 A1); see Figs. 1-14 and [0120-0123].
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN T REPSHER III whose telephone number is (571)272-7487. The examiner can normally be reached Monday - Friday, 8AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch, can be reached at (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JOHN T REPSHER III/ Primary Examiner, Art Unit 2143