Prosecution Insights
Last updated: April 19, 2026
Application No. 18/781,885

SYSTEMS, DEVICES, AND METHODS FOR AUDIO PRESENTATION IN A THREE-DIMENSIONAL ENVIRONMENT

Non-Final OA §103
Filed
Jul 23, 2024
Examiner
BEUTEL, WILLIAM A
Art Unit
2616
Tech Center
2600 — Communications
Assignee
Apple Inc.
OA Round
1 (Non-Final)
Grant Probability
70% (Favorable)
OA Rounds
1-2
To Grant
2y 7m
With Interview
90%

Examiner Intelligence

Career Allow Rate
70% (328 granted / 469 resolved), +7.9% vs TC avg, above average
Interview Lift
+20.4% (strong), measured across resolved cases with interview
Avg Prosecution
2y 7m typical timeline; 28 applications currently pending
Total Applications
497 across all art units
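
The headline figures above follow from simple arithmetic on the career counts. A minimal sketch of that derivation, assuming the dashboard treats grant probability as the raw career allow rate and applies the interview lift additively (the product's actual model is not disclosed, and the 62.0% TC baseline is only what the "+7.9%" delta implies):

```python
# Hypothetical reconstruction of the dashboard's headline figures from
# the career counts shown above; the real model is not disclosed.

granted, resolved = 328, 469      # this examiner's career totals
tc_avg = 0.620                    # assumed TC 2600 baseline implied by "+7.9%"

allow_rate = granted / resolved               # 0.699 -> "70% Career Allow Rate"
vs_tc = allow_rate - tc_avg                   # +0.079 -> "+7.9% vs TC avg"
with_interview = allow_rate + 0.204           # 0.903 -> "90% With Interview"

print(f"career allow rate: {allow_rate:.1%}")      # 69.9%
print(f"vs TC average:     {vs_tc:+.1%}")          # +7.9%
print(f"with interview:    {with_interview:.1%}")  # 90.3%
```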

Statute-Specific Performance

§101
9.9% (-30.1% vs TC avg)
§103
49.8% (+9.8% vs TC avg)
§102
10.7% (-29.3% vs TC avg)
§112
22.0% (-18.0% vs TC avg)
Tech Center averages are estimates, based on career data from 469 resolved cases.
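
One sanity check worth noting: every row implies the same Tech Center baseline, which suggests the deltas are computed against a single 40.0% estimate rather than per-statute averages. A quick verification using only the displayed values (what the rate itself measures, e.g. share of rejections by statute, is not stated on the card):

```python
# Recover the implied TC baseline from each displayed (rate, delta) pair.
rows = {"§101": (9.9, -30.1), "§103": (49.8, 9.8),
        "§102": (10.7, -29.3), "§112": (22.0, -18.0)}

for statute, (rate, delta) in rows.items():
    baseline = rate - delta  # every statute yields 40.0
    print(f"{statute}: implied TC average = {baseline:.1f}%")
```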

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claims 6-8 are objected to because of the following informalities: Claim 6 recites “within in a viewport of the computing system,” which appears to contain a grammatical or typographical error. Claims 7-8 depend from claim 6 and are therefore objected to for incorporating the same language. Appropriate correction is required.

Claim Rejections - 35 U.S.C. § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-13 are rejected under 35 U.S.C. 103 as being unpatentable over Morris et al. (US 2022/0086203 A1) in view of Hertensteiner et al. (US 2025/0205597 A1).

Regarding claim 12, Morris discloses: a computer system that is in communication with a display generation component, an audio output device, and one or more input devices (Morris, Abstract, Fig. 2A, and ¶78: system for teleconferencing virtual environment; ¶79: each client device including display, speakers, and input devices; also ¶127), the computer system comprising: one or more processors (Morris, ¶79: processors; ¶127: CPU); memory (Morris, ¶79: memory devices; ¶127: storage device and memory); and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors (Morris, ¶127: storage device includes software; ¶128: central processing unit 421 is any logic circuitry that responds to and processes instructions fetched from the main memory unit 422; ¶132: software programs stored on storage device for implementing systems), the one or more programs including instructions for:

displaying, via the display generation component, a three-dimensional environment from a viewpoint of a user (Morris, ¶37: implementations of the systems and methods discussed herein provide a three-dimensional virtual environment with teleconferencing audio and video feeds placed within the environment via three-dimensional virtual avatars, including indications of directional orientation or facing, and with mixing of spatial audio providing directionality and distance cues; Fig. 1A and ¶38: FIG. 1A is an illustration of an example of a teleconferencing virtual environment 10 corresponding to a viewport of a virtual camera or view of a display rendered for a user);

while displaying the three-dimensional environment from the viewpoint of the user, detecting an event associated with the three-dimensional environment (Morris, ¶37: provide a three-dimensional virtual environment with teleconferencing audio and video feeds placed within the environment via three-dimensional virtual avatars, including indications of directional orientation or facing, and with mixing of spatial audio providing directionality and distance cues; ¶40: a first user may turn their virtual camera to look at an avatar of another user (e.g. in response to them speaking), and the first user's avatar 102 may turn correspondingly; ¶49: detecting user's speech includes detecting input audio at a microphone of the user's computing device exceeding a threshold); and

in response to detecting the event: in accordance with a determination that the event corresponds to activation of a sound effect (Morris, ¶43: audio corresponding to each avatar (e.g. recorded by microphones at each user's computing device and provided as a media stream) may be spatially mixed based on the relative positions and orientations of the corresponding avatars and the user's avatar; ¶49: detecting user's speech includes detecting input audio at a microphone of the user's computing device exceeding a threshold; also note ¶66 discloses effects automatically triggered based on detecting speech);

in accordance with a determination that the event corresponds to activation of a spatialized sound effect, presenting, via the audio output device, the sound effect as emanating from a location in the three-dimensional environment associated with the event (Morris, ¶43: audio corresponding to each avatar (e.g. recorded by microphones at each user's computing device and provided as a media stream) may be spatially mixed based on the relative positions and orientations of the corresponding avatars and the user's avatar, where the stereo audio image of each audio stream may be placed in a position within the stereo field corresponding to the relative position of the avatar to the user's virtual camera and avatar; ¶47: spatial audio may be used to place an audio stream corresponding to an avatar in an appropriate position (e.g. panning within a stereo field or stereo attenuation, dynamically adjusting the level of an audio signal within stereo channels to control its apparent direction and/or distance) corresponding to the relative positions or orientations of the avatar and the viewer).

Morris does not explicitly discuss the distinction between spatialized and non-spatialized sound effects as claimed. Hertensteiner discloses:

in response to detecting the event: in accordance with a determination that the event corresponds to activation of a sound effect (Hertensteiner, Fig. 8 and ¶89: upon receiving indication, audio service performs dictionary lookup which identifies set of acoustic data from dictionary, and based on retrieved acoustic data, render audio signals either as custom acoustic data in dictionary or using default acoustic data; ¶90: In some embodiments, input audio signal 902 is received at audio service 600 from the client application; for example, FIG. 6 shows that audio signals (such as input audio signal 902) and sound source properties may be presented to audio service 600 (e.g., at rendering engine 640) directly by the client application.
At stage 910, first acoustic data (such as runtime acoustic data 530) is identified for the client application. Examples of identifying the acoustic data for a client application are described above with respect to FIG. 7 and FIG. 8 and processes 700 and 800. At stage 920, the first acoustic data is applied to the input audio signal 902 to produce an output audio signal 904, which may correspond to audio output 650 in FIG. 6. Stage 920 can be performed by audio service 600, by rendering engine 640 (which may belong to audio service 600), or by some other suitable process. In some embodiments, output audio signal 904 can be presented to a user, such as a user of a wearable device via speakers or headphones of the wearable device.);

in accordance with a determination that the event corresponds to activation of a spatialized sound effect, presenting, via the audio output device, the sound effect as emanating from a location in the three-dimensional environment associated with the event (Hertensteiner, ¶24: a virtual object in the virtual environment may generate a sound originating from a location coordinate of the object (e.g., a virtual character may speak or cause a sound effect); ¶30: a user/listener/head coordinate system 114 (comprising an x-axis 114X, a y-axis 114Y, and a z-axis 114Z) with its origin at point 115 (e.g., user/listener/head coordinate) can define a coordinate space for the user/listener/head on which the mixed reality system 112 is located; ¶36: virtual monster associated with one or more audio channels such as a footstep sound effect generated as the monster walks around the MRE; also ¶55: virtual monster emits sound corresponding to monster's speech or sound effects, or real object, such as lamp, emitting virtual sound corresponding to lamp being switched on or off, where virtual sound can correspond to a position and orientation of the sound source (whether real or virtual));

in accordance with a determination that the event corresponds to activation of a non-spatialized sound effect, presenting, via the audio output device, the sound effect as emanating from a predetermined location for audio in the three-dimensional environment that has a predetermined spatial relationship relative to the viewpoint of the user (Hertensteiner, ¶24: the virtual environment may be associated with musical cues or ambient sounds that may or may not be associated with a particular location, and processor can determine an audio signal corresponding to a “listener” coordinate—for instance, an audio signal corresponding to a composite of sounds in the virtual environment, and mixed and processed to simulate an audio signal that would be heard by a listener at the listener coordinate—and present the audio signal to a user via one or more speakers; ¶62: some client applications (such as applications that play non-spatialized music tracks, or menu systems that present non-spatialized sound effects) need not interact with virtual environments, even if those applications are in use by a user of a MR system).

Both Morris and Hertensteiner are directed to systems for providing spatial audio to users of 3D virtual environments.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, and with a reasonable expectation of success, to modify the system and technique for providing 3D spatial audio with 3D visual graphics for an interactive, collaborative virtual environment with spatial audio based on user location relative to virtual objects generating sounds to a user as provided by Morris, by further accounting for different types of sound effects as provided by Hertensteiner, using known electronic interfacing and programming techniques. The modification results in an improved virtual environment by allowing for a greater diversity of sound effects to a user to provide for additional interactivity to a user for a more engaging and versatile graphic-audio interactive system.

Regarding claim 1, the system of claim 12 performs the same method as claim 1, and as such claim 1 is rejected based on the same rationale as claim 12 set forth above. Furthermore, claim 1 recites the “in accordance with a determination” steps as contingent, and as such only one of the steps is required; see MPEP 2111.04(II).

Regarding claim 13, the claim recites a non-transitory computer readable storage medium, which is substantially the same as the memory recited by claim 12, and as such claim 13 is rejected based on the same rationale as claim 12 set forth above.

Regarding claim 2, Morris further discloses: the event corresponding to activation of the spatialized sound effect includes display of a representation of a participant of a communication session in the three-dimensional environment (Morris, ¶39: avatars 102 corresponding to other users displayed within the virtual environment at specified positions; ¶47: spatial audio used to place an audio stream corresponding to an avatar in an appropriate position, where users are able to “direct” their conversation to another user by turning their avatar, and also allow for users to have clearer conversations with others by reducing distractions of other nearby conversations); the representation of the participant including a first visual element having an appearance of a representation of a mouth corresponding to the participant (Morris, Fig. 1A shows avatars with representation of participants’ mouths; also ¶46: a three dimensional face corresponding to the user may be rendered on the full body avatar, and in one implementation, the face may be animated to match movements of the user's face captured by the camera of the user's computing device (e.g. with animated lips or eyes following the user's lips or eyes, via facial recognition technology)); and the location in the three-dimensional environment associated with the event is a location in the three-dimensional environment corresponding to where the first visual element is located (Morris, ¶43: audio corresponding to each avatar (e.g. recorded by microphones at each user's computing device and provided as a media stream) may be spatially mixed based on the relative positions and orientations of the corresponding avatars and the user's avatar, where the stereo audio image of each audio stream may be placed in a position within the stereo field corresponding to the relative position of the avatar to the user's virtual camera and avatar; ¶47: spatial audio may be used to place an audio stream corresponding to an avatar in an appropriate position (e.g. panning within a stereo field or stereo attenuation, dynamically adjusting the level of an audio signal within stereo channels to control its apparent direction and/or distance) corresponding to the relative positions or orientations of the avatar and the viewer).

Regarding claim 3, Morris further discloses: the event corresponding to activation of the spatialized sound effect includes display of a representation of a participant of a communication session in the three-dimensional environment, the representation of the participant being a representation of a geometric shape (Morris, Fig. 1A and ¶¶38-39: teleconferencing virtual environment, including avatars of other users comprising geometric shapes in 3D), and the location in the three-dimensional environment associated with the event is a location in the three-dimensional environment corresponding to a center of the representation of the geometric shape (Morris, Fig. 1A shows image of user at center of geometric shape; ¶43: audio streams corresponding to the user and corresponding avatar, where audio corresponding to each avatar (e.g. recorded by microphones at each user's computing device and provided as a media stream) may be spatially mixed based on the relative positions and orientations of the corresponding avatars and the user's avatar; ¶49: detecting user's speech includes detecting input audio at a microphone of the user's computing device exceeding a threshold; also note ¶66 discloses effects automatically triggered based on detecting speech).

Regarding claim 4, Morris further discloses: the event corresponding to activation of the spatialized sound effect includes display of a user interface element that includes a representation of a participant of a communication session in the three-dimensional environment (Morris, Fig. 1A and ¶¶38-39: teleconferencing virtual environment, including avatars of other users in 3D), and the location in the three-dimensional environment associated with the event is a location in the three-dimensional environment corresponding to a center of the user interface element (Morris, Fig. 1A shows image of user at center of geometric shape; ¶43: audio streams corresponding to the user and corresponding avatar, where audio corresponding to each avatar (e.g. recorded by microphones at each user's computing device and provided as a media stream) may be spatially mixed based on the relative positions and orientations of the corresponding avatars and the user's avatar; ¶49: detecting user's speech includes detecting input audio at a microphone of the user's computing device exceeding a threshold; also note ¶66 discloses effects automatically triggered based on detecting speech).

Regarding claim 5, Morris further discloses: in response to detecting the event and in accordance with the determination that the event corresponds to activation of the sound effect (Morris, ¶43 and ¶49 disclose detecting speech of multiple participants; also ¶45: a “virtual avatar” 108 that does not correspond to a user may be placed within the virtual environment and correspond to a video stream and/or audio stream from a source other than a user, such as a media server, where images, slides, animations, music videos, television programs, movies, or other content may be displayed in the same manner as a user's avatar, with a video stream rendered on a surface of the virtual avatar 108 and/or an audio stream spatially placed in the environment (e.g. with attenuation and/or panning as discussed above) at a position corresponding to the virtual avatar):

in accordance with a determination the event corresponding to activation of the spatialized sound effect includes: display of a first user interface element that includes shared virtual content of a communication session in the three-dimensional environment, wherein the shared virtual content is further associated with audio corresponding to the shared virtual content, different from the audio corresponding to a participant (note that even a second participant reads on this claim, as audio corresponding to a second participant is different from audio for “a participant”; Fig. 1A and ¶¶43-44: audio stream of second avatar, with audio placed in position of avatar in virtual environment; also ¶45: a “virtual avatar” 108 that does not correspond to a user may be placed within the virtual environment and correspond to a video stream and/or audio stream from a source other than a user, such as a media server, with content displayed in the same manner as a user's avatar and an audio stream spatially placed in the environment at a position corresponding to the virtual avatar); and display of a second user interface element, different from the first user interface element, that includes a representation of a participant of a communication session in the three-dimensional environment, wherein the second user interface element is further associated with audio corresponding to the participant, different from the audio corresponding to a visual shared virtual content (Morris, Fig. 1A and ¶45, as above):

presenting, via the audio output device, the audio corresponding to the participant as spatialized audio that emanates from a first location in the three-dimensional environment, wherein the first location is above the display of the second user interface element in the three-dimensional environment (Morris, Figs. 1A and 1B and ¶43: the stereo audio image of each audio stream may be placed in a position within the stereo field corresponding to the relative position of the avatar to the user's virtual camera and avatar, allowing easy localization of speakers and separation of simultaneous speakers at distinct positions, with Fig. 1A showing an avatar “above” another avatar location; also Figs. 1A-1B and ¶45: a virtual avatar 108 that does not correspond to a user may be placed within the virtual environment and correspond to a video stream and/or audio stream from a source other than a user, where spatial audio from a virtual avatar 108 may be provided to users with avatars 102 in a region “beneath” the virtual avatar or in a direction of its facing, using the same techniques discussed above with regards to other avatars and audio streams); and

presenting, via the audio output device, the audio corresponding to the visual shared virtual content as spatialized audio that emanates from a second location, different from the first location, corresponding to a center of the first user interface element in the three-dimensional environment (Morris, ¶43: the stereo audio image of each audio stream may be placed in a position within the stereo field corresponding to the relative position of the avatar to the user's virtual camera and avatar, allowing easy localization of speakers and separation of simultaneous speakers at distinct positions; also ¶45: a “virtual avatar” 108 that does not correspond to a user may be placed within the virtual environment and correspond to a video stream and/or audio stream from a source other than a user, such as a media server, with a video stream rendered on a surface of the virtual avatar 108 and/or an audio stream spatially placed in the environment (e.g. with attenuation and/or panning as discussed above) at a position corresponding to the virtual avatar; Fig. 1B and ¶45 disclose spatial audio provided in an area corresponding to the center of the region shown in Fig. 1B).

Regarding claim 6, Morris further discloses: wherein the predetermined location for audio in the three-dimensional environment that has the predetermined spatial relationship relative to the viewpoint of the user is within in a viewport of the computer system (Morris, Fig. 1A and ¶38: FIG. 1A is an illustration of an example of a teleconferencing virtual environment 10 corresponding to a viewport of a virtual camera or view of a display rendered for a user (e.g. on a display of a computing device of the user), according to some implementations. The virtual environment may, in some implementations, comprise a ground plane 20 and skybox 30, and may include additional environmental objects not illustrated, including walls, buildings, stairs, ramps, platforms, mountains, water features, clouds, or any other such objects. For example, in some implementations, the virtual environment may be configured as a club or bar with corresponding features (e.g. walls, tables and chairs, a stage or dance floor, or other such features). Although shown as a single two dimensional image of a three dimensional environment, in some implementations, stereoscopic views may be provided (e.g. for virtual reality headsets or similar devices).)
Regarding claim 7, Morris further discloses: wherein presenting, via the audio output device, the sound effect as emanating from the predetermined location for audio in the three-dimensional environment includes presenting the sound effect as synthesized stereo audio (Morris, ¶43: audio corresponding to each avatar is recorded at each user's computing device and provided as a media stream, where the audio is spatially mixed based on relative positions and orientations of the corresponding avatars, such that the stereo audio image of each audio stream may be placed in a position within the stereo field corresponding to the relative position of the avatar to the user's virtual camera and avatar, allowing easy localization of speakers and separation of simultaneous speakers at distinct positions, including left and right stereo fields).

Regarding claim 8, Morris further discloses: the event corresponding to activation of the sound effect includes display of a user interface element that includes a representation of a participant of a communication session in the three-dimensional environment, and presenting, via the audio output device, the sound effect as emanating from the predetermined location for audio in the three-dimensional environment includes presenting the sound effect as synthesized stereo audio from the predetermined location for audio, different from a location of the user interface element in the three-dimensional environment (Morris, Figs. 1A and 1B, element 108, and ¶45: spatial audio from a virtual avatar 108 may be provided to users with avatars 102 in a region “beneath” the virtual avatar or in a direction of its facing, using the same techniques discussed above with regards to other avatars and audio streams; ¶45: a “virtual avatar” 108 that does not correspond to a user may be placed within the virtual environment and correspond to a video stream and/or audio stream from a source other than a user, such as a media server, where images, slides, animations, music videos, television programs, movies, or other content may be displayed in the same manner as a user's avatar, with a video stream rendered on a surface of the virtual avatar 108 and/or an audio stream spatially placed in the environment (e.g. with attenuation and/or panning as discussed above) at a position corresponding to the virtual avatar; Fig. 1B and ¶45 disclose spatial audio provided in an area corresponding to the center of the region shown in Fig. 1B).

Morris modified by Hertensteiner further discloses: the event corresponding to activation of the non-spatialized sound effect (Hertensteiner, ¶24: the virtual environment may be associated with musical cues or ambient sounds that may or may not be associated with a particular location, and processor can determine an audio signal corresponding to a “listener” coordinate—for instance, an audio signal corresponding to a composite of sounds in the virtual environment, and mixed and processed to simulate an audio signal that would be heard by a listener at the listener coordinate—and present the audio signal to a user via one or more speakers; ¶62: some client applications (such as applications that play non-spatialized music tracks, or menu systems that present non-spatialized sound effects) need not interact with virtual environments, even if those applications are in use by a user of a MR system).

Both Morris and Hertensteiner are directed to systems for providing spatial audio to users of 3D virtual environments.
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, and with a reasonable expectation of success, to modify the system and technique for providing 3D spatial audio with 3D visual graphics for an interactive, collaborative virtual environment with spatial audio based on user location relative to virtual objects generating sounds to a user as provided by Morris, by further accounting for different types of sound effects as provided by Hertensteiner, using known electronic interfacing and programming techniques. The modification results in an improved virtual environment by allowing for a greater diversity of sound effects to a user to provide for additional interactivity to a user for a more engaging and versatile graphic-audio interactive system.

Regarding claim 9, Morris further discloses: wherein the computer system is in a communication session with one or more other participants and wherein the sound effect includes audio corresponding to the one or more other participants of the communication session, and the method comprising: while in the communication session with the one or more other participants: in accordance with a determination that the event includes display, in the three-dimensional environment, of a representation of a first three-dimensional environment of a participant of the communication session, presenting, via the audio output device, the audio corresponding to the one or more other participants of the communication session as spatialized audio emanating from a location in the three-dimensional environment that is above the display of the representation of the first three-dimensional environment (Morris, Fig. 1A and ¶¶38-39: avatars 102 corresponding to other users displayed within the virtual environment at specified positions; ¶43: audio corresponding to each avatar is recorded at each user's computing device and provided as a media stream, where the audio is spatially mixed based on relative positions and orientations of the corresponding avatars, such that the stereo audio image of each audio stream may be placed in a position within the stereo field corresponding to the relative position of the avatar to the user's virtual camera and avatar; note in Fig. 1A the avatars are shown above the environment ground plane 20; ¶47: spatial audio used to place an audio stream corresponding to an avatar in an appropriate position, where users are able to “direct” their conversation to another user by turning their avatar, and also allow for users to have clearer conversations with others by reducing distractions of other nearby conversations; ¶78: system for teleconferencing virtual environment).
Regarding claim 10, Morris further discloses: while presenting a sound effect as emanating from the location in the three-dimensional environment associated with the event or from the predetermined location for audio, wherein the viewpoint of the user is a first viewpoint of the user while presenting the sound effect as emanating from the location in the three-dimensional environment associated with the event or from the predetermined location for audio: detecting, via the one or more input devices, an action corresponding to a request to change a viewpoint of the user from the first viewpoint of the user to a second viewpoint of the user different from the first viewpoint of the user, and in response to detecting the action: displaying, via the display generation component, the three-dimensional environment from the second viewpoint of the user (Morris, ¶39: avatars 102 may be moved within the three dimensional environment by the corresponding user with up to six degrees of freedom, where a user may use a mouse to freely look around without mouse movements, may move using arrow keys, and the same rotations and/or translations may be applied to the virtual camera of the user within the three dimensional environment, or body movement detection, such that the user's viewpoint corresponds to the orientation and position of the avatar 102 and/or video stream 104 position or facing; ¶40 discloses turning while conversation occurs); and

in accordance with a determination that the sound effect is being presented as emanating from the location in the three-dimensional environment associated with the event when the action is detected, continuing presenting, via the audio output device, the sound effect as emanating from the location in the three-dimensional environment associated with the event; and in accordance with a determination that the sound effect is being presented as emanating from the predetermined location for audio in the three-dimensional environment that has the predetermined spatial relationship relative to the first viewpoint of the user when the action is detected, changing the location of the sound effect such that the sound effect is being presented as emanating from a first predetermined location for audio in the three-dimensional environment that has the predetermined spatial relationship relative to the second viewpoint of the user, wherein the predetermined spatial relationship relative to the second viewpoint of the user that the first predetermined location for audio has is the same as the predetermined spatial relationship relative to the first viewpoint of the user (Morris, ¶40: a first user may turn their virtual camera to look at an avatar of another user (e.g. in response to them speaking), and the first user's avatar 102 may turn correspondingly; the other user (and any additional users) may see the first user's avatar's rotation, and may intuitively interpret this as a signal that the first user is paying attention to the other user; as shown, this correspondence of avatar and video stream may allow users to face each other to have a conversation in a group, e.g. groups 106A and 106B, which may be distributed throughout the space; ¶43: audio corresponding to each avatar (e.g. recorded by microphones at each user's computing device and provided as a media stream) may be spatially mixed based on the relative positions and orientations of the corresponding avatars and the user's avatar, where the stereo audio image of each audio stream may be placed in a position within the stereo field corresponding to the relative position of the avatar to the user's virtual camera and avatar, allowing easy localization of speakers and separation of simultaneous speakers at distinct positions; i.e., the relative position of audio mapped to individual avatars in the virtual environment is consistent across viewpoints of the user avatar as the user moves/rotates the avatar position).

Regarding claim 11, Morris further discloses: wherein the spatialized sound effect or the non-spatialized sound effect is associated with a user interface element that is displayed at a first location in the three-dimensional environment, and the method comprising: while presenting the sound effect that is associated with the user interface element and while displaying the user interface element, detecting, via the one or more input devices, an action corresponding to a request to change a location of the user interface element that is associated with the sound effect from the first location in the three-dimensional environment to a second location, different from the first location, in the three-dimensional environment; and in response to detecting the action: displaying, via the display generation component, the user interface element at the second location in the three-dimensional environment (Morris, ¶39: avatars 102 may be moved within the three dimensional environment by the corresponding user with up to six degrees of freedom, where a user may use a mouse to freely look around without mouse movements, may move using arrow keys, and the same rotations and/or translations may be applied to the virtual camera of the user within the three dimensional environment, or body movement detection, such that the user's viewpoint corresponds to the orientation and position of the avatar 102 and/or video stream 104 position or facing, and where translation/movement is also applied to the avatar 102 to move and reorient it within the environment 10);

in accordance with a determination that the sound effect is the spatialized sound effect that is associated with the user interface element, wherein the spatialized sound effect is being presented at the first location in the three-dimensional environment when the action is detected, changing the location of the presentation of the spatialized sound effect such that the sound effect is being presented as emanating from the second location in the three-dimensional environment (Morris, ¶43: audio corresponding to each avatar (e.g. recorded by microphones at each user's computing device and provided as a media stream) may be spatially mixed based on the relative positions and orientations of the corresponding avatars and the user's avatar, where the stereo audio image of each audio stream may be placed in a position within the stereo field corresponding to the relative position of the avatar to the user's virtual camera and avatar; ¶47: spatial audio may be used to place an audio stream corresponding to an avatar in an appropriate position (e.g. panning within a stereo field or stereo attenuation, dynamically adjusting the level of an audio signal within stereo channels to control its apparent direction and/or distance) corresponding to the relative positions or orientations of the avatar and the viewer).

Morris modified by Hertensteiner further discloses: in accordance with a determination that the event corresponds to activation of a spatialized sound effect, presenting, via the audio output device, the sound effect as emanating from a location in the three-dimensional environment associated with the event (Hertensteiner, ¶24: a virtual object in the virtual environment may generate a sound originating from a location coordinate of the object (e.g., a virtual character may speak or cause a sound effect); ¶30: a user/listener/head coordinate system 114 (comprising an x-axis 114X, a y-axis 114Y, and a z-axis 114Z) with its origin at point 115 (e.g., user/listener/head coordinate) can define a coordinate space for the user/listener/head on which the mixed reality system 112 is located; ¶36: virtual monster associated with one or more audio channels such as a footstep sound effect generated as the monster walks around the MRE; also ¶55: virtual monster emits sound corresponding to monster's speech or sound effects, or real object, such as lamp, emitting virtual sound corresponding to lamp being switched on or off, where virtual sound can correspond to a position and orientation of the sound source (whether real or virtual)); and in accordance with a determination that the sound effect is the non-spatialized sound effect that is associated with the user interface element, continuing presenting, via the audio output device, the sound effect as emanating from the predetermined location for audio in the three-dimensional environment that has the predetermined spatial relationship relative to the viewpoint of the user (Hertensteiner, ¶24: the virtual environment may be associated with musical cues or ambient sounds that may or may not be associated with a particular location, and processor can determine an audio signal corresponding to a “listener” coordinate—for instance, an audio signal corresponding to a composite of sounds in the virtual environment, and mixed and processed to simulate an audio signal that would be heard by a listener at the listener coordinate—and present the audio signal to a user via one or more speakers; ¶62: some client applications (such as applications that play non-spatialized music tracks, or menu systems that present non-spatialized sound effects) need not interact with virtual environments, even if those applications are in use by a user of a MR system).

Both Morris and Hertensteiner are directed to systems for providing spatial audio to users of 3D virtual environments. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, and with a reasonable expectation of success, to modify the system and technique for providing 3D spatial audio with 3D visual graphics for an interactive, collaborative virtual environment with spatial audio based on user location relative to virtual objects generating sounds to a user as provided by Morris, by further accounting for different types of sound effects as provided by Hertensteiner, using known electronic interfacing and programming techniques.
The modification results in an improved virtual environment by allowing for a greater diversity of sound effects to a user to provide for additional interactivity to a user for a more engaging and versatile graphic-audio interactive system.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM A BEUTEL, whose telephone number is (571) 272-3132. The examiner can normally be reached Monday-Friday, 9:00 AM - 5:00 PM (EST).

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, DANIEL HAJNIK, can be reached at 571-272-7642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM A BEUTEL/
Primary Examiner, Art Unit 2616
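
For readers mapping the rejection onto the claims, the dispositive limitation is the two-way branch between spatialized and non-spatialized sound effects. The sketch below restates that branch, plus the stereo panning and distance attenuation Morris's ¶¶43 and 47 describe, in Python. All names and numeric choices are illustrative assumptions; this is not the applicant's, Morris's, or Hertensteiner's actual implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class Vec3:
    x: float
    y: float
    z: float

# Assumed fixed anchor in front of the viewpoint for non-spatialized
# effects; the claim only requires *some* predetermined spatial
# relationship relative to the viewpoint.
HEAD_LOCKED_OFFSET = Vec3(0.0, 0.0, -1.0)

def emanation_point(spatialized: bool, event_loc: Vec3, viewpoint: Vec3) -> Vec3:
    """Choose the apparent source location per the claim's two branches."""
    if spatialized:
        # Spatialized branch: tied to the event's place in the 3D environment.
        return event_loc
    # Non-spatialized branch: predetermined location that tracks the viewpoint.
    return Vec3(viewpoint.x + HEAD_LOCKED_OFFSET.x,
                viewpoint.y + HEAD_LOCKED_OFFSET.y,
                viewpoint.z + HEAD_LOCKED_OFFSET.z)

def stereo_gains(source: Vec3, listener: Vec3, facing: float) -> tuple[float, float]:
    """Constant-power pan plus 1/r attenuation, in the spirit of the
    panning/attenuation Morris describes (not his actual algorithm)."""
    dx, dz = source.x - listener.x, source.z - listener.z
    azimuth = math.atan2(dx, -dz) - facing        # 0 rad = straight ahead (-z)
    pan = max(-1.0, min(1.0, math.sin(azimuth)))  # -1 = hard left, +1 = hard right
    gain = 1.0 / max(1.0, math.hypot(dx, dz))     # simple distance rolloff
    theta = (pan + 1.0) * math.pi / 4.0           # map pan to [0, pi/2]
    return gain * math.cos(theta), gain * math.sin(theta)
```

For example, with the listener at the origin facing -z, a source two meters to the right yields stereo_gains of roughly (0.0, 0.5): fully right-panned and attenuated by distance.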

Prosecution Timeline

Jul 23, 2024
Application Filed
Jan 23, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12581262
AUGMENTED REALITY INTERACTION METHOD AND ELECTRONIC DEVICE
2y 5m to grant • Granted Mar 17, 2026
Patent 12572258
APPARATUS AND METHOD WITH IMAGE PROCESSING USER INTERFACE
2y 5m to grant • Granted Mar 10, 2026
Patent 12566531
CONFIGURING A 3D MODEL WITHIN A VIRTUAL CONFERENCING SYSTEM
2y 5m to grant • Granted Mar 03, 2026
Patent 12561927
MEDIA RESOURCE DISPLAY METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM
2y 5m to grant • Granted Feb 24, 2026
Patent 12554384
SYSTEMS AND METHODS FOR IMPROVED CONTENT EDITING AT A COMPUTING DEVICE
2y 5m to grant • Granted Feb 17, 2026
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds
1-2
Grant Probability
70%
With Interview (+20.4%)
90%
Median Time to Grant
2y 7m
PTA Risk
Low
Based on 469 resolved cases by this examiner. Grant probability derived from career allow rate.

Free tier: 3 strategy analyses per month