Prosecution Insights
Last updated: April 19, 2026
Application No. 18/502,644

SYSTEMS AND METHODS OF RECEIVING VOICE INPUT

Final Rejection: §103, §112

Filed: Nov 06, 2023
Examiner: REPSHER III, JOHN T
Art Unit: 2143
Tech Center: 2100 — Computer Architecture & Software
Assignee: Sonos Inc.
OA Round: 4 (Final)

Grant Probability: 58% (Moderate)
OA Rounds: 5-6
To Grant: 3y 5m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 58% (grants 58% of resolved cases: 203 granted / 347 resolved; +3.5% vs TC avg)
Interview Lift: +48.0% (strong; allow rate of resolved cases with an interview vs. without)
Avg Prosecution: 3y 5m (typical timeline); 18 applications currently pending
Total Applications: 365 (career history, across all art units)
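The lift figure above is shown without its formula. As a reading aid, here is a minimal sketch of one plausible definition (the percentage-point difference in allowance rate between interviewed and non-interviewed resolved cases); the function names and example counts are invented for illustration and are not this examiner's actual splits.

    # Hypothetical reconstruction of the "interview lift" metric shown above.
    # Assumes lift = allow rate (with interview) - allow rate (without),
    # in percentage points. Counts below are invented for illustration.
    def allow_rate(granted: int, resolved: int) -> float:
        return 100.0 * granted / resolved

    def interview_lift(g_iv: int, r_iv: int, g_no: int, r_no: int) -> float:
        return allow_rate(g_iv, r_iv) - allow_rate(g_no, r_no)

    print(f"{interview_lift(80, 90, 123, 257):+.1f} pts")  # e.g. +41.0 pts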

Statute-Specific Performance

§101: 8.9% (-31.1% vs TC avg)
§103: 49.6% (+9.6% vs TC avg)
§102: 12.7% (-27.3% vs TC avg)
§112: 20.6% (-19.4% vs TC avg)

Tech Center averages are estimates. Based on career data from 347 resolved cases.

Office Action

Grounds of rejection: §103, §112
DETAILED ACTION

Remarks

Claims 1-20 and 22 have been examined and rejected. This Office action is responsive to the amendment filed on 01/15/2026, which has been entered in the above identified application.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-14 and 22 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Regarding claim 1, claim 1 recites “receiving, via the at least one microphone, voice input data from a first source, wherein the first playback device is farther away from the first source than the second playback device is from the first source, determining, in response to receiving the voice input data from the first source, feedback for playback, the determined feedback for playback comprising at least one audio feedback element and at least one visual feedback element, determining a first location of the first playback device relative to the first source, wherein the first playback device is farther away from the first source than the second playback device is from the first source, determining a second location of the second playback device relative to the first source, determining, based on information related to a bonded configuration of the first playback device and the second playback device, that the first playback device is closer to a visual output device than the second playback device, after determining that the first playback device is closer to the visual output device than the second playback device, selecting, the first playback device for playback of the determined feedback for playback, and causing output, of the determined feedback for playback, by the first playback device so that the determined feedback for playback is output farther away from the first source than the second playback device is from the first source”.
Per the instant specification:

[0106] In some embodiments, one or more of the components 551-556 described above can operate in conjunction with the microphone array 524 to determine the location of a user in the home environment and/or relative to a location of one or more of the NMDs 103. The location or proximity of a user may be detected and compared to a variable stored in the command information 590, as described below.

[0138] To address the aforementioned shortcomings of conventional systems, the disclosed technology determines one or more feedback parameters derived from the voice input data, media content data, and/or data related to the listening environment (such as secondary data) and, based on those parameters, selects the feedback element(s) and/or tailors the characteristics of the selected feedback element(s). Such parameters include, for example, the type of command, the type of media content, the input interface over which the audio content is received, the grouping and/or location (relative to the user, environment, or other playback devices) of the NMD receiving the voice input, the volume at which the media content is being played back (if the voice input is received while media content is being played back), the amount of background noise, and a particular user profile.

[0162] In those embodiments where all of the playback devices are voice-enabled (such as that shown in FIG. 12B), the process 1100 may select one or more of the voice-enabled playback devices for output of the one or more feedback elements. The process 1100 may select the playback device(s) based on a variety of factors, such as proximity of the playback device to the user and/or location of the playback device relative to the user, another one or more playback devices, and/or the visual output device (e.g., a television, a projector screen, etc.). For instance, the process 1100 may determine that the first playback device is closer to the user than the second and third playback devices (for example, if the user is sitting at the left side of the couch) and, based on that determination, the process 1100 may cause the feedback element(s) to be output on the first playback device. As another example, the process 1100 may determine, based on information related to the bonded configuration of the first, second, and third playback devices, that the third playback device is closest to the visual output device (e.g., the television) and, based on that determination, the process 1100 may cause the feedback element(s) to be output on the third playback device. The inventors have recognized that outputting the feedback element(s) through the playback device closest to the visual output device is generally preferred by the user, as the user is more accustomed to receiving an audio feedback element from a playback device at the center of their visual attention rather than one that is out of sight (emphasis added).

The specification appears to disclose a first example of determining a closest playback device based on locations of first and second devices relative to a location of a source. The specification further appears to disclose a second, separate example of selecting a device based on a “bonded configuration of the first, second, and third playback devices”; however, the second example, as discussed in [0162] and shown in Fig. 12, does not include the first playback device being farther away from the first source than the second playback device, or causing output of the determined feedback for playback by the first playback device so that the determined feedback for playback is output farther away from the first source than the second playback device is from the first source. The second example does not determine a first location of the first playback device relative to the first source, wherein the first playback device is farther away from the first source than the second playback device is from the first source, or determine a second location of the second playback device relative to the first source. The determination to output the feedback on the third device is made “based on information related to the bonded configuration of the first, second, and third playback devices, that the third playback device is closest to the visual output device (e.g., the television)”. As described in the first example, when using relative locations with respect to a source, the closest device to the source is selected. In the second example, a different output device may be selected; however, this selection does not consider locations relative to the source, but rather the bonded configuration of the three devices.

Thus, the specification does not disclose receiving, via the at least one microphone, voice input data from a first source, wherein the first playback device is farther away from the first source than the second playback device is from the first source, determining, in response to receiving the voice input data from the first source, feedback for playback, the determined feedback for playback comprising at least one audio feedback element and at least one visual feedback element, determining a first location of the first playback device relative to the first source, wherein the first playback device is farther away from the first source than the second playback device is from the first source, determining a second location of the second playback device relative to the first source, determining, based on information related to a bonded configuration of the first playback device and the second playback device, that the first playback device is closer to a visual output device than the second playback device, after determining that the first playback device is closer to the visual output device than the second playback device, selecting, the first playback device for playback of the determined feedback for playback, and causing output, of the determined feedback for playback, by the first playback device so that the determined feedback for playback is output farther away from the first source than the second playback device is from the first source.

Regarding claim 8, claim 8 contains substantially similar limitations to those found in claim 1. Consequently, claim 8 is rejected for the same reasons.

Regarding claims 2-7, 9-14, and 22, claims 2-7, 9-14, and 22 are also rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as being dependent on parent claims failing to comply with the written description requirement.
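To make the written-description point concrete, here is a minimal sketch (not from the application; all names, fields, and coordinates are illustrative assumptions) of the two selection strategies [0162] discloses as separate examples:

    from dataclasses import dataclass
    import math

    @dataclass
    class Device:
        name: str
        xy: tuple            # room coordinates (illustrative)
        bonded_rank: int     # closeness to the visual output device per
                             # bonded-configuration metadata (0 = closest)

    def select_by_proximity(devices, source_xy):
        # First disclosed example: the device closest to the voice source wins.
        return min(devices, key=lambda d: math.dist(d.xy, source_xy))

    def select_by_bonded_configuration(devices):
        # Second disclosed example: bonded-configuration metadata alone
        # identifies the device nearest the visual output device; the
        # source's location is never consulted.
        return min(devices, key=lambda d: d.bonded_rank)

Claim 1, by contrast, recites a single flow that both determines source-relative locations (with the first device farther from the source than the second) and selects by bonded configuration; the rejection's position is that the specification describes the two strategies only separately.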
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-14 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Kim (US 20180277107 A1, published 09/27/2018) in view of Hart et al. (US 9842584 B1, published 12/12/2017), hereinafter Hart, in further view of Williams et al. (US 9898250 B1, published 02/20/2018), hereinafter Williams, in further view of Janus (US 20140270280 A1, published 09/18/2014).

Regarding claim 1, Kim teaches the claim comprising:

A media playback system associated with a listening environment, the media playback system comprising: a first playback device; and a second playback device comprising: at least one microphone, one or more processors, and tangible computer-readable memory storing instructions that, when executed by the one or more processors, cause the second playback device to perform operations comprising (Kim Figs. 1-8; [0021], FIG. 1 is a schematic diagram illustrating a multi-device intelligent personal assistant (IPA) system 100, configured to implement one or more aspects of the various embodiments. Multi-device IPA system 100 includes a master smart device 120, a slave smart device 130, and a slave smart device 140, all communicatively connected to each other via communication network 150. Also shown in FIG. 1 is a user 90, who generates a user request via a verbal utterance 91; [0023], Master smart device 120 also generates an audio signal 121 via a microphone 122 in response to verbal utterance 91; [0025], FIG. 2 illustrates a computing device 200 configured to implement one or more aspects of the disclosure. Computing device 200 may be employed as master smart device 120, slave smart device 130, and/or slave smart device 140; [0026], computing device 200 includes, without limitation, an interconnect (bus) 240 that connects a processing unit 250, an input/output (I/O) device interface 260 coupled to input/output (I/O) devices 280, memory 210, a storage 230, and a network interface 270; [0031], Memory 210 includes various software programs that can be executed by processor 250 and application data associated with said software programs, including speech recognition application 211, audio signal merging application 212, loudness matching application 213, temporal alignment application 214, master selection application 215 and/or topology application 216):

receiving, via the at least one microphone, voice input data from a first source, determining, in response to receiving the voice input data from the first source, feedback for playback, the determined feedback for playback comprising at least one audio feedback element (Kim Figs. 1-8; [0023], Each of master smart device 120, slave smart device 130, and slave smart device 140 is an IPA-enabled computing device configured to receive and act on certain voice commands from a user. In operation, one or more of master smart device 120, slave smart device 130, and slave smart device 140 detect verbal utterance 91 and convert verbal utterance 91 to a respective audio signal, such as a digital audio signal. Thus, slave smart device 130 generates an audio signal 131 in response to verbal utterance 91, for example via a microphone 132, and transmits audio signal 131 to master smart device 120. Similarly, slave smart device 140 generates an audio signal 141 in response to verbal utterance 91, for example via a microphone 142, and transmits audio signal 141 to master smart device 120. Master smart device 120 also generates an audio signal 121 via a microphone 122 in response to verbal utterance 91, and then constructs a speech recognition audio signal based on portions of audio signal 131, audio signal 141, and/or audio signal 121, as described in greater detail below. The speech recognition audio signal is then transferred to a speech recognition application for evaluation. When a response audio signal 125 is returned by the speech recognition application, master smart device 120 determines which smart device in multi-device IPA system 100 is closest to user 90, and transmits response audio signal 125 to that smart device for conversion into sound energy by an appropriate loudspeaker 123, 133, or 143; [0037], Master smart device 120 is further configured to determine which smart device in multi-device IPA system 100 is closest to user 90 and provide that smart device with any response audio signal 125 returned by speech recognition application 211. Consequently, the appropriate smart device in multi-device IPA system 100 provides any forthcoming audio response to user 90; [0053], response audio signal 125 may include a speech-based response to the voice command or commands detected in step 408; see also [0027-0028], smart devices include screens),

determining a first location of the first playback device relative to the first source, determining a second location of the second playback device relative to the first source, determining, based on information related to a bonded configuration of the first playback device and the second playback device, selecting the first playback device for playback of the determined feedback for playback, and causing output, of the determined feedback for playback, by the first playback device so that the determined feedback for playback is output (Kim Figs. 1-8; [0023], Each of master smart device 120, slave smart device 130, and slave smart device 140 is an IPA-enabled computing device configured to receive and act on certain voice commands from a user; When a response audio signal 125 is returned by the speech recognition application, master smart device 120 determines which smart device in multi-device IPA system 100 is closest to user 90, and transmits response audio signal 125 to that smart device for conversion into sound energy by an appropriate loudspeaker 123, 133, or 143 (as shown in Fig. 1, loudspeaker 133 is farther from the source than microphone 132); [0044], It is noted that the relative signal strength of one of the audio signals received in step 401 with respect to the other audio signals can vary throughout temporal segments 501A-501N. For example, audio signal 131 has the strongest audio signal strength in temporal segments 510, whereas audio signal 141 has the strongest audio signal strength in temporal segments 520. Such a change in relative audio signal strength may be a result of a change in location or orientation of user 90 with respect to one or more of master smart device 120, slave smart device 130, or slave device 140. Thus, during the time interval represented by temporal segments 510, user 90 may be proximate or directly facing slave smart device 130, while in the time interval represented by temporal segments 520, user 90 may be more directly facing or closer to slave smart device 140 (causing output of the determined feedback by the first playback device so that the feedback is output farther away from the first source than from where the voice input from the first source was received); [0054], In step 410, master smart device 120 determines which of the smart devices included in multi-device IPA system 100 is closest to user 90. In some embodiments, master smart device 120 determines which smart devices is closest to user 90 based on segment metadata 302. Specifically, the master smart device 120 may determine that the smart device that is closest to user 90 is the smart device from which the last temporal segment 531N of speech recognition audio signal 530 originated; [0055], In step 411, master smart device 120 transmits response audio signal 125 to the smart device determined to be the closest to user 90 in step 410. Thus, the smart device that is located closest to user 90 provides the audible response to voice commands included in verbal utterance 91; see also [0027-0028], smart devices include screens)
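As a reading aid, a minimal sketch of the closest-device heuristic Kim describes at [0054]-[0055]: the merged speech-recognition signal carries per-segment metadata naming the originating device, and the device that contributed the last segment is treated as closest to the user and receives the response audio. The data shapes and device names below are illustrative assumptions.

    # Sketch of Kim's heuristic ([0054]): the device whose microphone
    # produced the final temporal segment of the merged signal is deemed
    # closest to the user and gets the response audio ([0055]).
    def pick_response_device(segment_metadata):
        """segment_metadata: ordered (segment_id, source_device) pairs."""
        _, last_device = segment_metadata[-1]
        return last_device

    # Illustration: the user drifted toward slave device 140 mid-utterance.
    segments = [(0, "slave_130"), (1, "slave_130"), (2, "slave_140")]
    assert pick_response_device(segments) == "slave_140"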
However, Kim fails to expressly disclose at least one visual feedback element.

In the same field of endeavor, Hart teaches: at least one visual feedback element (Hart Figs. 1-8; col. 2 [line 10], the first device includes a microphone for generating audio signals representative of user speech, as well as a speaker for outputting audible content in response to identified voice commands in the user speech. However, the first device might not include a display for displaying graphical content. As such, the first device may be configured to identify devices that include displays and that are proximate to the first device. The first device may then instruct one or more of these other devices to output visual content associated with a user's voice command; col. 3 [line 4], the second device may display content in any number of ways. In some implementations, the second device may include an application that is specifically configured to interact with the first device (e.g., a “companion application”). The companion application may be configured to receive information and/or instructions from the first device and/or a remote computing resource and display the appropriate content associated with the user's command. For instance, the application may display one or more links that lead to web sites, applications, or other destinations that include content about Benjamin Franklin. Additionally or alternatively, the application may directly pull in and display this content, such as detailed information about Benjamin Franklin; col. 3 [line 32], the second device awakens and directly causes display of the content upon receiving the instruction from the first device; col. 6 [line 28], while the voice-controlled device 106 may utilize these other devices to output visual content, the voice-controlled device 106 may additionally or alternatively utilize these devices to output additional audible content; col. 10 [line 25], the first content comprises audible content while the second content comprises visual content; col. 10 [line 39], the process 600 identifies a device on which to output the content by identifying a device that is within a threshold distance of the first device and/or the user. In other instances, the process 600 identifies and selects a device on which to output content based on a type of the device, information regarding whether the device is powered on, and the like. The process 600 may also ensure that this other device is associated with the user. At 608, the process 600 visually presents the content on the identified device that is within the threshold distance)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated at least one visual feedback element as suggested in Hart into Kim. Doing so would be desirable because homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, and portable communication devices. As computing devices evolve, many different ways have been introduced to allow users to interact with these devices, such as through mechanical means (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. Another way to interact with computing devices is through a user speaking to a device and the device outputting audio to the user in return. However, in some instances, certain content is best output in a form other than audio alone (see Hart col. 1 [line 14]). The system of Hart is applicable to any client device (see Hart col. 2 [line 25]) to display any type of visual content (see Hart col. 11 [line 52]). The system of Hart would improve the system of Kim by enabling the master device to provide both audible and visual output to the slave devices (see Hart col. 6 [line 28]), thereby ensuring the user receives desired audible and visual feedback in the best output form (see Hart col. 1 [line 14]).
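A minimal sketch of the hand-off Hart describes at col. 10 [line 39]: audible content plays on the voice-only device, while visual content is routed to a display-equipped device within a threshold distance of the user. The field names and threshold value are illustrative assumptions, not Hart's API.

    from dataclasses import dataclass

    @dataclass
    class Client:
        name: str
        has_display: bool
        distance_to_user: float  # meters (illustrative)

    def pick_display_device(clients, threshold=3.0):
        # Visual content goes to the nearest display within the threshold;
        # if none qualifies, the caller falls back to audio-only output.
        candidates = [c for c in clients
                      if c.has_display and c.distance_to_user <= threshold]
        return min(candidates, key=lambda c: c.distance_to_user, default=None)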
However, Kim in view of Hart fails to expressly disclose wherein the first playback device is farther away from the first source than the second playback device is from the first source, wherein the first playback device is farther away from the first source than the second playback device is from the first source, determining, based on information related to a bonded configuration of the first playback device and the second playback device, selecting the first playback device for playback of the determined feedback for playback, causing output, of the determined feedback for playback, by the first playback device so that the determined feedback for playback is output farther away from the first source than the second playback device is from the first source.

In the same field of endeavor, Williams teaches: wherein the first playback device is farther away from the first source than the second playback device is from the first source, wherein the first playback device is farther away from the first source than the second playback device is from the first source, determining, based on information related to a bonded configuration of the first playback device and the second playback device, selecting the first playback device for playback of the determined feedback for playback, causing output, of the determined feedback for playback, by the first playback device so that the determined feedback for playback is output farther away from the first source than the second playback device is from the first source (Williams Figs. 1-18; col. 7 [line 18], the server(s) 112 may select every speaker 20 and/or output zone, may identify a preferred output zone (e.g., living room) based on previously received commands, may identify speaker(s) and/or an output zone associated with the device 110 that received the input audio 11; col. 8 [line 18], the system 100 may enable the user 10 to instruct the server(s) 112 to generate output audio 30 using any combination of the speaker(s) 20; the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); col. 8 [line 46], an audio capture component, such as a microphone of device 110, captures audio 11 corresponding to a spoken utterance; col. 17 [line 33], the server(s) 112 may generate output audio 30 using the speakers 20a-1/20a-2/20b/20c and/or device 110c in Room 1, Room 3 and/or Room 4 of the house 440; the device 110c (e.g., television) may act as an input device (e.g., include a microphone array configured to receive the input audio 11) and as an output device (e.g., include speakers configured to generate the output audio 30); while devices 110a, 110b-1 and 110b-2 are included as input devices, they may generate output audio 30 without departing from the present disclosure; col. 18 [line 44], FIG. 5A illustrates output devices located in house 540a, such as device 110a in Room 1, speaker 20a-1 and speaker 20a-2 in Room 1, device 110c (e.g., television) in Room 1, speaker 20b in Room 3 and speaker 20c in Room 4; col. 19 [line 8], the user 10 and/or the server(s) 112 may select output devices and generate output zones, as illustrated in FIG. 5C; a house 540b illustrated in FIG. 5C includes the device 110a, the device 110c and the speakers 20a in Zone 1, speaker 20b in Zone 2 and speaker 20c in Zone 4, as illustrated by interface 520 shown in FIG. 5D; Zone 5 (not shown) may include Zone 1, Zone 2, Zone 3 and Zone 4 and may be used to generate output audio 30 all over the house 540b; col. 19 [line 50], an output zone may include input devices and/or output devices in multiple rooms; col. 20 [line 53], the server(s) 112 may refer to an input device 110, a location of the input device 110, a first input/output association or the like and the speaker controller 22 may identify a second input/output association and/or output zone including corresponding speaker(s) 20. Thus, the server(s) 112 may send the instruction to the speaker controller 22 indicating output devices; col. 22 [line 22], a device 110a may receive input audio 712 from user 10 in Room 1 and the server(s) 112 may determine that selected output devices 710 include speakers 20a in Room 1, speaker 20b in Room 3 and speaker 20c in Room 4; col. 20 [line 6], a speaker 20 (e.g., first speaker 20a-1) may have an identification (e.g., unique name), location (e.g., specific location) and/or address (e.g., network address) used by both the server(s) 112 and the speaker controller 22. As a second example, the server(s) 112 and the speaker controller 22 may both group speakers 20a-1 and 20a-2 in a first output zone (e.g., zone 1) and the first output zone may have an identification (e.g., unique name), location (e.g., specific room or location); col. 22 [line 32], As illustrated in FIG. 7B, the device 110a may receive input audio 722 from user 10 in Room 1 and the server(s) 112 may determine that selected output devices 720 include speakers 20a in Room 1; col. 22 [line 46], As illustrated in FIG. 7D, a device 110b-2 may receive input audio 742 from user 10 in Room 2 and the server(s) 112 may determine that selected output devices 740 include speakers 20a in Room 1. Thus, the server(s) 112 may send audio data to the selected output devices 740 and the selected output devices 740 may generate output audio 744 in a different output zone than where the input audio 742 was received; col. 22 [line 54], the speaker controller 22 may determine the selected output devices; col. 25 [line 51], the server(s) 112 may generate voice output indicating to the user 10 that the command was performed. For example, the voice output may state “audio muted,” “increasing volume,” “decreasing volume” or the like; col. 32 [line 21], While the speaker(s) 20 is playing the output audio, an input device 110 may receive (1214) input audio; col. 32 [line 44], the speaker controller 22 may determine (1226) output devices, as discussed above with regard to FIGS. 11A-11C, and may send (1228) the second command and the voice output data to the speaker(s) 20. The speaker(s) 20 may lower (1230) the volume of the output audio from a first volume level to a second volume level, play (1232) voice output using the voice output data and raise (1234) the volume of the output audio from the second volume level to the first volume level; examiner note: per the instant specification [0162], the determination to output on speakers closest to the television is “based on information related to the bonded configuration of the first, second, and third playback devices”)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein the first playback device is farther away from the first source than the second playback device is from the first source, wherein the first playback device is farther away from the first source than the second playback device is from the first source, determining, based on information related to a bonded configuration of the first playback device and the second playback device, selecting the first playback device for playback of the determined feedback for playback, causing output, of the determined feedback for playback, by the first playback device so that the determined feedback for playback is output farther away from the first source than the second playback device is from the first source as suggested in Williams into Kim in view of Hart. Doing so would be desirable because homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, speakers and portable communication devices. As these computing devices evolve, many different ways have been introduced to allow users to interact with computing devices, such as through mechanical devices (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. Another way to interact with computing devices is through natural language input such as speech input (see Williams col. 1 [line 23]). An environment may include a number of different entertainment systems, including standalone speakers, wired speakers, wireless speakers or the like. However, the different entertainment systems may be separated as different devices are controlled separately from each other (see Williams col. 2 [line 26]). Using the techniques described herein, a user is able to conveniently interact with multiple entertainment systems/speakers at one time using voice commands (see Williams col. 2 [line 49]). Additionally, outputting audio based on a location would enable the user to receive audio in a preferred output zone based on the voice command (see Williams col. 7 [line 18]), which would enable the user to flexibly receive desired audio feedback in preferred locations on desired devices.
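A minimal sketch of the duck-and-speak sequence Williams describes at col. 32 [line 44] (lower (1230), play voice output (1232), raise (1234)). The Speaker class here is an illustrative stand-in, not Williams's or Sonos's API.

    class Speaker:
        def __init__(self, name, volume=1.0):
            self.name, self.volume = name, volume
        def play(self, clip):
            print(f"{self.name}: {clip!r} at volume {self.volume:.1f}")

    def speak_over_output(speaker, voice_clip, ducked=0.3):
        original = speaker.volume   # first volume level
        speaker.volume = ducked     # lower to a second volume level (1230)
        speaker.play(voice_clip)    # play the voice output (1232)
        speaker.volume = original   # raise back to the first level (1234)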
However, Kim in view of Hart in further view of Williams fails to expressly disclose determining, based on information related to a bonded configuration of the first playback device and the second playback device, that the first playback device is closer to a visual output device than the second playback device, after determining that the first playback device is closer to the visual output device than the second playback device, selecting the first playback device for playback of the determined feedback for playback.

In the same field of endeavor, Janus teaches: determining, based on information related to a bonded configuration of the first playback device and the second playback device, that the first playback device is closer to a visual output device than the second playback device, after determining that the first playback device is closer to the visual output device than the second playback device, selecting the first playback device for playback of the determined feedback for playback (Janus Figs. 1-7; [0015], audio effects may comprise sounds, tones, speech, music, and/or any other types of audio effects; [0017], audio playback devices 150-p may comprise one or more wired and/or wireless speakers; [0018], apparatus 100, displays 145-n, and audio playback devices 150-p are illustrated as separate components; [0019], apparatus 100 and/or system 140 may be operative to generate audio playback information operative on one or more audio playback devices 150-p to cause one or more desired audio effects to be generated. In some embodiments, apparatus 100 and/or system 140 may be operative to generate audio playback information based on audio information that corresponds to particular visual effects. For example, apparatus 100 and/or system 140 may be operative to generate audio playback information for an error chime corresponding to an error window in an operating system. In various such embodiments, apparatus 100 and/or system 140 may be operative to generate audio playback information such that for a user listening to audio playback devices 150-p, the apparent origin of a given audio effect corresponds to the position of its associated visual effect on one or more displays 145-n. Continuing with the previous example, apparatus 100 and/or system 140 may be operative to generate audio playback information such that if the error window appears in an upper right corner of a display 145-n, the apparent origin of the error chime to a listening user is also the upper right corner of the display; [0020], audiovisual application 107 may be operative to generate graphics information 108. Graphics information 108 may comprise data, information, logic, and/or instructions corresponding to one or more user interface elements to be displayed on one or more displays 145-n. Such user interface elements may comprise any visual or optical sensory effect(s) such as, for example, images, pictures, video, text, graphics, menus, textures, and/or patterns. Such user interface elements may be associated with menus, prompts, and/or controls usable to operate audiovisual application 107, and/or may be associated with content presented by audiovisual application 107; [0021], audiovisual application 107 may be operative to generate audio information 110 corresponding to graphics information 108. Audio information 110 may comprise data, information, logic, and/or instructions corresponding to one or more audio effects to be produced by one or more audio playback devices 150-p in conjunction with the presentation of one or more user interface elements by one or more displays 145-n. In an example embodiment in which audiovisual application 107 comprising an operating system, particular audio information 110 may correspond to an alert sound to be produced when a visual prompt of the operating system is displayed; [0022], presentation layout information 112 of FIG. 1 may indicate that there are two displays in presentation area 200 and may identify their respective positions therein. Likewise, presentation layout information 112 may indicate that there are four audio playback devices in presentation area 200 and may identify the respective positions of those audio playback devices; [0024], apparatus 100 may utilize one or more conventional position sensing techniques to sense the positions of one or more displays 145-n and/or one or more audio playback devices 150-p. Based on this information, audio management module 106 may be operative to determine presentation layout information 112 identifying the relative locations of one or more displays 145-n and/or one or more audio playback devices 150-p within the presentation area; [0027], audio management module 106 may be operative to generate audio playback information 118 based on audio information 110 and audio location information 116 for that audio information 110. In some embodiments, audio management module 106 may be operative to generate audio playback information 118 operative on one or more audio playback devices 150-p to generate an audio effect such that it appears to originate from a position identified by audio location information 116. In various embodiments, audio management module 106 may be operative to generate audio playback information 118 using one or more techniques for controlling the apparent origins of audio effects; [0034], At 408, audio playback information for the audio effect may be generated based on the audio location information. For example, audio management module 106 of FIG. 1 may be operative to generate audio playback information 118 operative on audio playback devices 208-1 and 208-2 of FIG. 2 to cause the audio effect to be generated such that it appears to originate from the right side of display 20; [0067], voice recognition device and software)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated determining, based on information related to a bonded configuration of the first playback device and the second playback device, that the first playback device is closer to a visual output device than the second playback device, after determining that the first playback device is closer to the visual output device than the second playback device, selecting the first playback device for playback of the determined feedback for playback as suggested in Janus into Kim in view of Hart in view of Williams. Doing so would be desirable because in systems comprising large displays, large display arrays, or displays separated by significant distances, it may not be possible for a user to maintain all of the collective display area within his field of vision simultaneously. As a result, prompts or other visual elements requiring user attention may be presented in portions of the collective display area lying outside the user's field of vision. Additionally, in some conventional configurations, audio effects corresponding to any visual elements presented in the collective display area may be generated such that they appear to originate from the same point, such as the midpoint between two speakers. As a result, in such conventional systems, audio effects corresponding to visual elements may not appear to originate from positions corresponding to the positions of those visual elements (see Janus [0002]). The system of Janus is useable for any type of audio effects, such as speech, for any audio application (see Janus [0015]), such as voice recognition software (see Janus [0067]). An advantage of some embodiments may be that localizing audio effects according to the locations of their corresponding visual effects may assist a user in locating display items that require attention. Another advantage of various embodiments may be that performing such audio localization may result in a more natural and pleasurable user experience during content consumption, because audio effects may appear to originate from their associated visual sources to a greater extent than they do in conventional systems. Other advantages may be associated with the disclosed subject matter, and the embodiments are not limited in this context (see Janus [0019]).
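A minimal sketch of the localization Janus describes at [0024] and [0034]: from sensed layout positions of speakers and displays, route an audio effect so its apparent origin matches the on-screen position of the associated visual element. Coordinates and names are illustrative assumptions.

    import math

    def speaker_nearest_visual(speaker_positions, visual_xy):
        """speaker_positions: dict name -> (x, y) from layout sensing."""
        return min(speaker_positions,
                   key=lambda name: math.dist(speaker_positions[name], visual_xy))

    layout = {"left": (0.0, 0.0), "right": (4.0, 0.0)}
    error_window_xy = (3.5, 0.5)  # e.g., upper-right corner of the display
    assert speaker_nearest_visual(layout, error_window_xy) == "right"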
Regarding claim 8, claim 8 contains substantially similar limitations to those found in claim 1, the only difference being wherein the first playback device is closer to a visual output device than the second playback device is to the visual output device, so that the determined feedback for playback is output through a playback device closer to the visual output device than the second playback device (Janus Figs. 1-7; [0015], audio effects may comprise sounds, tones, speech, music, and/or any other types of audio effects; [0017], audio playback devices 150-p may comprise one or more wired and/or wireless speakers; [0018], apparatus 100, displays 145-n, and audio playback devices 150-p are illustrated as separate components; [0019], apparatus 100 and/or system 140 may be operative to generate audio playback information operative on one or more audio playback devices 150-p to cause one or more desired audio effects to be generated. In some embodiments, apparatus 100 and/or system 140 may be operative to generate audio playback information based on audio information that corresponds to particular visual effects. For example, apparatus 100 and/or system 140 may be operative to generate audio playback information for an error chime corresponding to an error window in an operating system. In various such embodiments, apparatus 100 and/or system 140 may be operative to generate audio playback information such that for a user listening to audio playback devices 150-p, the apparent origin of a given audio effect corresponds to the position of its associated visual effect on one or more displays 145-n. Continuing with the previous example, apparatus 100 and/or system 140 may be operative to generate audio playback information such that if the error window appears in an upper right corner of a display 145-n, the apparent origin of the error chime to a listening user is also the upper right corner of the display; [0020], audiovisual application 107 may be operative to generate graphics information 108. Graphics information 108 may comprise data, information, logic, and/or instructions corresponding to one or more user interface elements to be displayed on one or more displays 145-n. Such user interface elements may comprise any visual or optical sensory effect(s) such as, for example, images, pictures, video, text, graphics, menus, textures, and/or patterns. Such user interface elements may be associated with menus, prompts, and/or controls usable to operate audiovisual application 107, and/or may be associated with content presented by audiovisual application 107; [0021], audiovisual application 107 may be operative to generate audio information 110 corresponding to graphics information 108. Audio information 110 may comprise data, information, logic, and/or instructions corresponding to one or more audio effects to be produced by one or more audio playback devices 150-p in conjunction with the presentation of one or more user interface elements by one or more displays 145-n. In an example embodiment in which audiovisual application 107 comprising an operating system, particular audio information 110 may correspond to an alert sound to be produced when a visual prompt of the operating system is displayed; [0022], As shown in FIG. 2, presentation area 200 is a three-dimensional space defined by and comprising displays 202 and 204 and audio playback devices 206-1, 206-2, 208-1, and 208-2; presentation layout information 112 of FIG. 1 may indicate that there are two displays in presentation area 200 and may identify their respective positions therein. Likewise, presentation layout information 112 may indicate that there are four audio playback devices in presentation area 200 and may identify the respective positions of those audio playback devices; [0024], apparatus 100 may utilize one or more conventional position sensing techniques to sense the positions of one or more displays 145-n and/or one or more audio playback devices 150-p. Based on this information, audio management module 106 may be operative to determine presentation layout information 112 identifying the relative locations of one or more displays 145-n and/or one or more audio playback devices 150-p within the presentation area; [0027], audio management module 106 may be operative to generate audio playback information 118 based on audio information 110 and audio location information 116 for that audio information 110. In some embodiments, audio management module 106 may be operative to generate audio playback information 118 operative on one or more audio playback devices 150-p to generate an audio effect such that it appears to originate from a position identified by audio location information 116. In various embodiments, audio management module 106 may be operative to generate audio playback information 118 using one or more techniques for controlling the apparent origins of audio effects; [0034], At 408, audio playback information for the audio effect may be generated based on the audio location information. For example, audio management module 106 of FIG. 1 may be operative to generate audio playback information 118 operative on audio playback devices 208-1 and 208-2 of FIG. 2 to cause the audio effect to be generated such that it appears to originate from the right side of display 20; [0067], voice recognition device and software).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein the first playback device is closer to a visual output device than the second playback device is to the visual output device, so that the determined feedback for playback is output through a playback device closer to the visual output device than the second playback device as suggested in Janus into Kim in view of Hart in view of Williams. Doing so would be desirable because in systems comprising large displays, large display arrays, or displays separated by significant distances, it may not be possible for a user to maintain all of the collective display area within his field of vision simultaneously. As a result, prompts or other visual elements requiring user attention may be presented in portions of the collective display area lying outside the user's field of vision. Additionally, in some conventional configurations, audio effects corresponding to any visual elements presented in the collective display area may be generated such that they appear to originate from the same point, such as the midpoint between two speakers. As a result, in such conventional systems, audio effects corresponding to visual elements may not appear to originate from positions corresponding to the positions of those visual elements (see Janus [0002]). The system of Janus is useable for any type of audio effects, such as speech, for any audio application (see Janus [0015]), such as voice recognition software (see Janus [0067]). An advantage of some embodiments may be that localizing audio effects according to the locations of their corresponding visual effects may assist a user in locating display items that require attention. Another advantage of various embodiments may be that performing such audio localization may result in a more natural and pleasurable user experience during content consumption, because audio effects may appear to originate from their associated visual sources to a greater extent than they do in conventional systems. Other advantages may be associated with the disclosed subject matter, and the embodiments are not limited in this context (see Janus [0019]). Consequently, claim 8 is rejected for the same reasons.
Regarding claim 2, Kim in view of Hart in further view of Williams in further view of Janus teaches all the limitations of claim 1, further comprising: determining a location of the visual output device (Janus Figs. 1-7; [0015], audio effects may comprise sounds, tones, speech, music, and/or any other types of audio effects; [0017], audio playback devices 150-p may comprise one or more wired and/or wireless speakers; [0018-0019], apparatus 100 and/or system 140 may be operative to generate audio playback information such that for a user listening to audio playback devices 150-p, the apparent origin of a given audio effect corresponds to the position of its associated visual effect on one or more displays 145-n. Continuing with the previous example, apparatus 100 and/or system 140 may be operative to generate audio playback information such that if the error window appears in an upper right corner of a display 145-n, the apparent origin of the error chime to a listening user is also the upper right corner of the display; [0020-0021], In an example embodiment in which audiovisual application 107 comprising an operating system, particular audio information 110 may correspond to an alert sound to be produced when a visual prompt of the operating system is displayed; [0024], apparatus 100 may utilize one or more conventional position sensing techniques to sense the positions of one or more displays 145-n and/or one or more audio playback devices 150-p. Based on this information, audio management module 106 may be operative to determine presentation layout information 112 identifying the relative locations of one or more displays 145-n and/or one or more audio playback devices 150-p within the presentation area; [0034], At 408, audio playback information for the audio effect may be generated based on the audio location information. For example, audio management module 106 of FIG. 1 may be operative to generate audio playback information 118 operative on audio playback devices 208-1 and 208-2 of FIG. 2 to cause the audio effect to be generated such that it appears to originate from the right side of display 20; see also [0022], [0027], [0067])

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated determining a location of the visual output device as suggested in Janus into Kim in view of Hart in view of Williams. Doing so would be desirable because in systems comprising large displays, large display arrays, or displays separated by significant distances, it may not be possible for a user to maintain all of the collective display area within his field of vision simultaneously. As a result, prompts or other visual elements requiring user attention may be presented in portions of the collective display area lying outside the user's field of vision. Additionally, in some conventional configurations, audio effects corresponding to any visual elements presented in the collective display area may be generated such that they appear to originate from the same point, such as the midpoint between two speakers. As a result, in such conventional systems, audio effects corresponding to visual elements may not appear to originate from positions corresponding to the positions of those visual elements (see Janus [0002]). The system of Janus is useable for any type of audio effects, such as speech, for any audio application (see Janus [0015]), such as voice recognition software (see Janus [0067]). An advantage of some embodiments may be that localizing audio effects according to the locations of their corresponding visual effects may assist a user in locating display items that require attention. Another advantage of various embodiments may be that performing such audio localization may result in a more natural and pleasurable user experience during content consumption, because audio effects may appear to originate from their associated visual sources to a greater extent than they do in conventional systems. Other advantages may be associated with the disclosed subject matter, and the embodiments are not limited in this context (see Janus [0019]).

Regarding claim 9, claim 9 contains substantially similar limitations to those found in claim 2. Consequently, claim 9 is rejected for the same reasons.

Regarding claim 3, Kim in view of Hart in further view of Williams in further view of Janus teaches all the limitations of claim 1. Williams further teaches: synchronously playing back media content via the first playback device at a first volume level and via the second playback device at a second volume level (Williams Figs. 1-18; col. 3 [line 37], a volume of the audio; col. 3 [line 54] – col. 4 [line 14], a speaker controller 22 may control multiple speakers 20 and may send audio data to the multiple speakers 20 so that the multiple speakers 20 collectively generate output audio 30; col. 7 [line 18], the server(s) 112 may select every speaker 20 and/or output zone, may identify a preferred output zone (e.g., living room) based on previously received commands, may identify speaker(s) and/or an output zone associated with the device 110 that received the input audio 11; col. 8 [line 18], the system 100 may enable the user 10 to instruct the server(s) 112 to generate output audio 30 using any combination of the speaker(s) 20; the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio 30 over the speaker(s) 20 while displaying output video on a television; col. 8 [line 46], an audio capture component, such as a microphone of device 110, captures audio 11 corresponding to a spoken utterance; col. 17 [line 33], the server(s) 112 may generate output audio 30 using the speakers 20a-1/20a-2/20b/20c and/or device 110c in Room 1, Room 3 and/or Room 4 of the house 440; col. 19 [line 8], the user 10 and/or the server(s) 112 may select output devices and generate output zones, as illustrated in FIG. 5C; Zone 5 (not shown) may include Zone 1, Zone 2, Zone 3 and Zone 4 and may be used to generate output audio 30 all over the house 540b; col. 19 [line 50], an output zone may include input devices and/or output devices in multiple rooms; see also col. 22 [line 32], col. 22 [line 46], col. 22 [line 54], col. 25 [line 51], col. 32 [line 44])

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated synchronously playing back media content via the first playback device at a first volume level and via the second playback device at a second volume level as suggested in Williams into Kim in view of Hart in further view of Janus.
Doing so would be desirable because homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, speakers and portable communication devices. As these computing devices evolve, many different ways have been introduced to allow users to interact with computing devices, such as through mechanical devices (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. Another way to interact with computing devices is through natural language input such as speech input (see Williams col. 1 [line 23]). An environment may include a number of different entertainment systems, including standalone speakers, wired speakers, wireless speakers or the like. However, the different entertainment systems may be separated as different devices are controlled separately from each other (see Williams col. 2 [line 26]). Using the techniques described herein, a user is able to conveniently interact with multiple entertainment systems/speakers at one time using voice commands. Additionally or alternatively, the system may send audio data directly to the multiple entertainment systems/speakers in response to the voice commands (see Williams col. 2 [line 49]). Additionally, the system of Williams would improve the system of Kim by enabling the user to flexibly output desired audio in desired ways, thereby increasing the usefulness of the speakers in the system as well as enabling the user to enjoy media content.

Regarding claim 10, claim 10 contains substantially similar limitations to those found in claim 3. Consequently, claim 10 is rejected for the same reasons.
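To illustrate the claim 3 limitation as mapped above, a minimal sketch of one bonded group rendering the same stream with a per-device volume level; the data shapes are illustrative assumptions, not Williams's or Sonos's API.

    from dataclasses import dataclass

    @dataclass
    class GroupMember:
        name: str
        volume: float  # per-device volume level

    def play_in_sync(members, stream_uri, start_tick):
        # Every member renders the same stream against a shared clock tick;
        # only the volume differs per device (first/second volume levels).
        return {m.name: {"uri": stream_uri, "start": start_tick, "vol": m.volume}
                for m in members}

    schedule = play_in_sync(
        [GroupMember("soundbar", 0.6), GroupMember("surround", 0.4)],
        "stream-1", 1024)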
For example, the voice output may state “audio muted,” “increasing volume,” “decreasing volume” or the like. Thus, the output audio 960 may include the music playing at a first volume and the voice output playing at a second volume higher than the first volume; see also col. 17 [line 33], col. 22 [line 32], col. 22 [line 46], col. 22 [line 54], col. 25 [line 51], col. 32 [line 44]) It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated causing a volume level of at least one audio feedback element of the determined feedback for playback output by the first playback device to be adjusted based on the first volume level as suggested in Williams into Kim in view of Hart in further view of Janus. Doing so would be desirable because homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, speakers and portable communication devices. As these computing devices evolve, many different ways have been introduced to allow users to interact with computing devices, such as through mechanical devices (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. Another way to interact with computing devices is through natural language input such as speech input (see Williams col. 1 [line 23]). An environment may include a number of different entertainment systems, including standalone speakers, wired speakers, wireless speakers or the like. However, the different entertainment systems may be separated as different devices are controlled separately from each other (see Williams col. 2 [line 26]). Using the techniques described herein, a user is able to conveniently interact with multiple entertainment systems/speakers at one time using voice commands. Additionally or alternatively, the system may send audio data directly to the multiple entertainment systems/speakers in response to the voice commands (see Williams col. 2 [line 49]). Additionally, basing the volume of the audio feedback on the first volume level would better enable the user to hear voice responses. Regarding claim 11, claim 11 contains substantially similar limitations to those found in claim 4. Consequently, claim 11 is rejected for the same reasons. Regarding claim 5, Kim in view of Hart in further view of Williams in further view of Janus teaches all the limitations of claim 3. Williams further teaches: determining that the media content is lean back audio; and in response to determining that the media content is lean back audio, reducing playback of the media content via the second playback device from the second volume level to a third volume level, wherein the third volume level is lower than the second volume level (Williams Figs. 1-18; abs. the system receives voice commands and may determine speakers playing output audio in proximity to the voice commands; the system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output; col. 3 [line 54] – col. 4 [line 14], a speaker controller 22 may control multiple speakers 20 and may send audio data to the multiple speakers 20 so that the multiple speakers 20 collectively generate output audio 30; col. 
7 [line 18], the server(s) 112 may select every speaker 20 and/or output zone, may identify a preferred output zone (e.g., living room) based on previously received commands, may identify speaker(s) and/or an output zone associated with the device 110 that received the input audio 11; col. 8 [line 1], the user 10 hears the voice output at a first volume and the music at a second, lower, volume; col. 8 [line 18], the system 100 may enable the user 10 to instruct the server(s) 112 to generate output audio 30 using any combination of the speaker(s) 20; the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio 30 over the speaker(s) 20 while displaying output video on a television; col. 8 [line 46], an audio capture component, such as a microphone of device 110, captures audio 11 corresponding to a spoken utterance; col. 13 [line 19], the device 110 may be associated with domains for different applications such as music, telephony, calendaring, contact lists, and device-specific communications; col. 25 [line 51], As illustrated in FIG. 9B, device 110b may receive input audio 940 from user 10 including a command and the device 110b may send audio data corresponding to the input audio 940 to the server(s) 112. The server(s) 112 may determine the command and may send data 950 to speaker(s) 20 to generate output audio 960, which includes music and voice output; In response to some commands, the server(s) 112 may generate voice output indicating to the user 10 that the command was performed. For example, the voice output may state “audio muted,” “increasing volume,” “decreasing volume” or the like. Thus, the output audio 960 may include the music playing at a first volume and the voice output playing at a second volume higher than the first volume; col. 27 [line 50], the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio over the speaker(s) 20 while displaying output video on a television. When the system 100 receives the input audio, the system 100 may control the speaker(s) 20 to lower a volume of the output audio while pausing the output video on the television; see also col. 17 [line 33], col. 22 [line 32], col. 22 [line 46], col. 22 [line 54], col. 25 [line 51], col. 32 [line 44]) It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated determining that the media content is lean back audio; and in response to determining that the media content is lean back audio, reducing playback of the media content via the second playback device from the second volume level to a third volume level, wherein the third volume level is lower than the second volume level as suggested in Williams into Kim in view of Hart in further view of Janus. Doing so would be desirable because homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, speakers and portable communication devices. As these computing devices evolve, many different ways have been introduced to allow users to interact with computing devices, such as through mechanical devices (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. 
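The claim 5 behavior mapped above, ducking lean-back content on the second playback device to a lower, third volume level while feedback plays, reduces to a small sketch. The content classifier and duck ratio are assumptions for illustration, not anything taught by Williams or recited in the claim.

```python
# Hedged sketch of the claim 5 behavior as the examiner reads it onto the
# prior art: when the content is "lean back" audio and voice feedback is
# about to play, the second playback device is ducked from its current
# (second) volume level to a lower third volume level. The classifier and
# constant below are hypothetical placeholders.
DUCK_RATIO = 0.4  # assumed: third volume = 40% of the second volume

def is_lean_back(media_metadata: dict) -> bool:
    # Placeholder heuristic: ambient/background listening content.
    return media_metadata.get("category") in {"music", "ambient", "radio"}

def duck_for_feedback(second_volume: float, media_metadata: dict) -> float:
    """Return the volume the second device should use while feedback plays."""
    if is_lean_back(media_metadata):
        third_volume = second_volume * DUCK_RATIO
        return min(third_volume, second_volume)  # always lower, per the claim
    return second_volume

print(duck_for_feedback(0.5, {"category": "music"}))  # -> 0.2
```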
Another way to interact with computing devices is through natural language input such as speech input (see Williams col. 1 [line 23]). An environment may include a number of different entertainment systems, including standalone speakers, wired speakers, wireless speakers or the like. However, the different entertainment systems may be separated as different devices are controlled separately from each other (see Williams col. 2 [line 26]). Using the techniques described herein, a user is able to conveniently interact with multiple entertainment systems/speakers at one time using voice commands. Additionally or alternatively, the system may send audio data directly to the multiple entertainment systems/speakers in response to the voice commands (see Williams col. 2 [line 49]). Additionally, lowering the volume on the speakers would better enable the user to hear voice responses. Additionally, the system of Williams would improve the system of Kim by enabling the user to flexibly output desired audio in desired ways, thereby increasing the usefulness of the speakers in the system as well as enabling the user to enjoy media content.

Regarding claim 12, claim 12 contains substantially similar limitations to those found in claim 5. Consequently, claim 12 is rejected for the same reasons.

Regarding claim 6, Kim in view of Hart in further view of Williams in further view of Janus teaches all the limitations of claim 3. Hart further teaches: wherein causing output of the determined feedback for playback by the first playback device comprises causing output of only visual feedback elements of the determined feedback for playback (Hart Figs. 1-8; col. 2 [line 10], the first device includes a microphone for generating audio signals representative of user speech, as well as a speaker for outputting audible content in response to identified voice commands in the user speech. However, the first device might not include a display for displaying graphical content. As such, the first device may be configured to identify devices that include displays and that are proximate to the first device. The first device may then instruct one or more of these other devices to output visual content associated with a user's voice command; col. 3 [line 4], the second device may display content in any number of ways. In some implementations, the second device may include an application that is specifically configured to interact with the first device (e.g., a “companion application”). The companion application may be configured to receive information and/or instructions from the first device and/or a remote computing resource and display the appropriate content associated with the user's command. For instance, the application may display one or more links that lead to web sites, applications, or other destinations that include content about Benjamin Franklin. Additionally or alternatively, the application may directly pull in and display this content, such as detailed information about Benjamin Franklin; col. 3 [line 32], the second device awakens and directly causes display of the content upon receiving the instruction from the first device; col. 6 [line 28]; col. 10 [line 25], the first content comprises audible content while the second content comprises visual content; col. 10 [line 39], the process 600 identifies a device on which to output the content by identifying a device that is within a threshold distance of the first device and/or the user. 
In other instances, the process 600 identifies and selects a device on which to output content based on a type of the device, information regarding whether the device is powered on, and the like. The process 600 may also ensure that this other device is associated with the user. At 608, the process 600 visually presents the content on the identified device that is within the threshold distance) It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein causing output of the determined feedback by the first playback device comprises causing output of only visual feedback elements of the determined feedback as suggested in Hart into Kim in view of Williams in further view of Janus. Doing so would be desirable because homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, and portable communication devices. As computing devices evolve, many different ways have been introduced to allow users to interact with these devices, such as through mechanical means (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. Another way to interact with computing devices is through a user speaking to a device and the device outputting audio to the user in return. However, in some instances, certain content is best output in a form other than audio alone (see Hart col. 1 [line 14]). The system of Hart would improve the system of Kim by enabling the master device to provide both audible and visual output to the slave devices (see Hart col. 6 [line 28]), thereby ensuring the user receives desired audible and visual feedback in the best output form (see Hart col. 1 [line 14]). Williams further teaches: determining that the media content is lean in audio (Williams Figs. 1-18; abs. the system receives voice commands and may determine speakers playing output audio in proximity to the voice commands; the system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output; col. 3 [line 54] – col. 4 [line 14], a speaker controller 22 may control multiple speakers 20 and may send audio data to the multiple speakers 20 so that the multiple speakers 20 collectively generate output audio 30; col. 7 [line 18], the server(s) 112 may select every speaker 20 and/or output zone, may identify a preferred output zone (e.g., living room) based on previously received commands, may identify speaker(s) and/or an output zone associated with the device 110 that received the input audio 11; col. 8 [line 1], the user 10 hears the voice output at a first volume and the music at a second, lower, volume; col. 8 [line 18], the system 100 may enable the user 10 to instruct the server(s) 112 to generate output audio 30 using any combination of the speaker(s) 20; the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio 30 over the speaker(s) 20 while displaying output video on a television; col. 8 [line 46], an audio capture component, such as a microphone of device 110, captures audio 11 corresponding to a spoken utterance; col. 
13 [line 19], the device 110 may be associated with domains for different applications such as music, telephony, calendaring, contact lists, and device-specific communications; col. 25 [line 51], As illustrated in FIG. 9B, device 110b may receive input audio 940 from user 10 including a command and the device 110b may send audio data corresponding to the input audio 940 to the server(s) 112. The server(s) 112 may determine the command and may send data 950 to speaker(s) 20 to generate output audio 960, which includes music and voice output; In response to some commands, the server(s) 112 may generate voice output indicating to the user 10 that the command was performed. For example, the voice output may state “audio muted,” “increasing volume,” “decreasing volume” or the like. Thus, the output audio 960 may include the music playing at a first volume and the voice output playing at a second volume higher than the first volume; col. 27 [line 50], the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio over the speaker(s) 20 while displaying output video on a television. When the system 100 receives the input audio, the system 100 may control the speaker(s) 20 to lower a volume of the output audio while pausing the output video on the television; see also col. 17 [line 33], col. 22 [line 32], col. 22 [line 46], col. 22 [line 54], col. 25 [line 51], col. 32 [line 44]) It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated determining that the media content is lean in audio as suggested in Williams into Kim in view of Hart. Doing so would be desirable because homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, speakers and portable communication devices. As these computing devices evolve, many different ways have been introduced to allow users to interact with computing devices, such as through mechanical devices (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. Another way to interact with computing devices is through natural language input such as speech input (see Williams col. 1 [line 23]). An environment may include a number of different entertainment systems, including standalone speakers, wired speakers, wireless speakers or the like. However, the different entertainment systems may be separated as different devices are controlled separately from each other (see Williams col. 2 [line 26]). Using the techniques described herein, a user is able to conveniently interact with multiple entertainment systems/speakers at one time using voice commands. Additionally or alternatively, the system may send audio data directly to the multiple entertainment systems/speakers in response to the voice commands (see Williams col. 2 [line 49]). Additionally, the system of Williams would improve the system of Kim by enabling the user to flexibly output desired audio in desired ways, thereby increasing the usefulness of the speakers in the system as well as enabling the user to enjoy media content.

Regarding claim 13, claim 13 contains substantially similar limitations to those found in claim 6. Consequently, claim 13 is rejected for the same reasons.

Regarding claim 7, Kim in view of Hart in further view of Williams in further view of Janus teaches all the limitations of claim 1. Williams further teaches: wherein the first playback device and the second playback device are part of a home theater system (Williams Figs. 1-18; col. 3 [line 37], a volume of the audio; col. 3 [line 54] – col. 4 [line 14], a speaker controller 22 may control multiple speakers 20 and may send audio data to the multiple speakers 20 so that the multiple speakers 20 collectively generate output audio 30; col. 7 [line 18], the server(s) 112 may select every speaker 20 and/or output zone, may identify a preferred output zone (e.g., living room) based on previously received commands, may identify speaker(s) and/or an output zone associated with the device 110 that received the input audio 11; col. 8 [line 18], the system 100 may enable the user 10 to instruct the server(s) 112 to generate output audio 30 using any combination of the speaker(s) 20; the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio 30 over the speaker(s) 20 while displaying output video on a television; col. 8 [line 46], an audio capture component, such as a microphone of device 110, captures audio 11 corresponding to a spoken utterance; col. 17 [line 33], the server(s) 112 may generate output audio 30 using the speakers 20a-1/20a-2/20b/20c and/or device 110c in Room 1, Room 3 and/or Room 4 of the house 440; col. 19 [line 8], the user 10 and/or the server(s) 112 may select output devices and generate output zones, as illustrated in FIG. 5C; Zone 5 (not shown) may include Zone 1, Zone 2, Zone 3 and Zone 4 and may be used to generate output audio 30 all over the house 540b; col. 19 [line 50], an output zone may include input devices and/or output devices in multiple rooms; see also col. 22 [line 32], col. 22 [line 46], col. 22 [line 54], col. 25 [line 51], col. 32 [line 44]) It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein the first playback device and the second playback device are part of a home theater system as suggested in Williams into Kim in view of Hart in further view of Janus. Doing so would be desirable because homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, speakers and portable communication devices. As these computing devices evolve, many different ways have been introduced to allow users to interact with computing devices, such as through mechanical devices (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. Another way to interact with computing devices is through natural language input such as speech input (see Williams col. 1 [line 23]). An environment may include a number of different entertainment systems, including standalone speakers, wired speakers, wireless speakers or the like. However, the different entertainment systems may be separated as different devices are controlled separately from each other (see Williams col. 2 [line 26]). Using the techniques described herein, a user is able to conveniently interact with multiple entertainment systems/speakers at one time using voice commands. Additionally or alternatively, the system may send audio data directly to the multiple entertainment systems/speakers in response to the voice commands (see Williams col. 2 [line 49]). 
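The home-theater reading that runs through claims 1 and 7, a bonded configuration whose role metadata implies which playback device sits nearer the visual output device, might be sketched as follows. The role names and selection rule are hypothetical, offered only to illustrate the kind of inference the rejection describes.

```python
# Illustrative sketch (assumptions labeled) of the bonded home-theater idea:
# a bonded configuration records each device's role, and that role metadata
# implies which device sits nearer the television. The role vocabulary and
# the selection rule are hypothetical, not drawn from the cited art.
BOND_ROLES_NEAR_TV = {"front", "soundbar"}  # assumed roles adjacent to the screen

def select_feedback_device(bond: dict) -> str:
    """Given {device_name: role}, pick the device closest to the visual
    output device for rendering the determined feedback."""
    for name, role in bond.items():
        if role in BOND_ROLES_NEAR_TV:
            return name
    # Fall back to any device if no role implies TV adjacency.
    return next(iter(bond))

home_theater = {"first": "soundbar", "second": "surround-left"}
print(select_feedback_device(home_theater))  # -> "first"
```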
Additionally, the system of Williams would improve the system of Kim by enabling the user to flexibly output desired audio in desired ways, thereby increasing the usefulness of the speakers in the system as well as enabling the user to enjoy media content.

Regarding claim 14, claim 14 contains substantially similar limitations to those found in claim 7. Consequently, claim 14 is rejected for the same reasons.

Regarding claim 22, Kim in view of Hart in further view of Williams in further view of Janus teaches all the limitations of claim 8. Janus further teaches: wherein the first playback device, the second playback device, and the visual output device are in the same room (Janus Figs. 1-7; [0015], audio effects may comprise sounds, tones, speech, music, and/or any other types of audio effects; [0017], audio playback devices 150-p may comprise one or more wired and/or wireless speakers; [0018], apparatus 100, displays 145-n, and audio playback devices 150-p are illustrated as separate components; [0019], apparatus 100 and/or system 140 may be operative to generate audio playback information such that for a user listening to audio playback devices 150-p, the apparent origin of a given audio effect corresponds to the position of its associated visual effect on one or more displays 145-n. Continuing with the previous example, apparatus 100 and/or system 140 may be operative to generate audio playback information such that if the error window appears in an upper right corner of a display 145-n, the apparent origin of the error chime to a listening user is also the upper right corner of the display; [0020], audiovisual application 107 may be operative to generate graphics information 108. Graphics information 108 may comprise data, information, logic, and/or instructions corresponding to one or more user interface elements to be displayed on one or more displays 145-n; [0021], In an example embodiment in which audiovisual application 107 comprising an operating system, particular audio information 110 may correspond to an alert sound to be produced when a visual prompt of the operating system is displayed; [0022], As shown in FIG. 2, presentation area 200 is a three-dimensional space defined by and comprising displays 202 and 204 and audio playback devices 206-1, 206-2, 208-1, and 208-2; [0024], identifying the relative locations of one or more displays 145-n and/or one or more audio playback devices 150-p within the presentation area; [0034], At 408, audio playback information for the audio effect may be generated based on the audio location information. For example, audio management module 106 of FIG. 1 may be operative to generate audio playback information 118 operative on audio playback devices 208-1 and 208-2 of FIG. 2 to cause the audio effect to be generated such that it appears to originate from the right side of display 20; [0067], voice recognition device and software; see also [0027]) It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein the first playback device, the second playback device, and the visual output device are in the same room as suggested in Janus into Kim in view of Hart in view of Williams. Doing so would be desirable because in systems comprising large displays, large display arrays, or displays separated by significant distances, it may not be possible for a user to maintain all of the collective display area within his field of vision simultaneously. 
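The Janus-style localization underlying the claim 22 mapping, rendering audio so its apparent origin tracks the on-screen position of the associated visual effect, can be illustrated with constant-power panning. Janus does not disclose this particular computation, so treat it purely as a sketch of one standard way to place an apparent source between two speakers.

```python
# Toy sketch of position-tracking audio localization: derive left/right
# playback gains so an alert chime appears to originate at the horizontal
# screen position of its visual element. Constant-power panning is a
# standard technique; its use here is an assumption for illustration, not
# a description of Janus's actual implementation.
import math

def pan_gains(x_norm: float) -> tuple:
    """x_norm: horizontal position of the visual effect, 0.0 (left edge)
    to 1.0 (right edge). Returns (left_gain, right_gain)."""
    theta = x_norm * math.pi / 2  # map the position onto a quarter circle
    return math.cos(theta), math.sin(theta)

# Error window in the upper-right corner -> chime appears to come from the right.
left, right = pan_gains(0.9)
print(f"left={left:.2f} right={right:.2f}")  # left is about 0.16, right about 0.99
```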
As a result, prompts or other visual elements requiring user attention may be presented in portions of the collective display area lying outside the user's field of vision. Additionally, in some conventional configurations, audio effects corresponding to any visual elements presented in the collective display area may be generated such that they appear to originate from the same point, such as the midpoint between two speakers. As a result, in such conventional systems, audio effects corresponding to visual elements may not appear to originate from positions corresponding to the positions of those visual elements (see Janus [0002]). The system of Janus is useable for any type of audio effects, such as speech, for any audio application (see Janus [0015]), such as voice recognition software (see Janus [0067]). An advantage of some embodiments may be that localizing audio effects according to the locations of their corresponding visual effects may assist a user in locating display items that require attention. Another advantage of various embodiments may be that performing such audio localization may result in a more natural and pleasurable user experience during content consumption, because audio effects may appear to originate from their associated visual sources to a greater extent than they do in conventional systems. Other advantages may be associated with the disclosed subject matter, and the embodiments are not limited in this context (see Janus [0019]).

Claims 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Kim (US 20180277107 A1, published 09/27/2018) in view of Hart et al. (US 9842584 B1, published 12/12/2017), hereinafter Hart, in further view of Janus (US 20140270280 A1, published 09/18/2014).

Regarding claim 15, Kim teaches the claim comprising: Tangible, non-transitory, computer-readable media storing instructions executable by one or more processors to cause a media playback system to perform operations, the media playback system comprising a first playback device and a second playback device, the operations comprising (Kim Figs. 1-8; [0006], The various embodiments set forth a non-transitory computer-readable medium including instructions that, when executed by one or more processors, configure the one or more processors to perform speech recognition in a multi-device system; [0021], FIG. 1 is a schematic diagram illustrating a multi-device intelligent personal assistant (IPA) system 100, configured to implement one or more aspects of the various embodiments. Multi-device IPA system 100 includes a master smart device 120, a slave smart device 130, and a slave smart device 140, all communicatively connected to each other via communication network 150. Also shown in FIG. 1 is a user 90, who generates a user request via a verbal utterance 91; [0023], Master smart device 120 also generates an audio signal 121 via a microphone 122 in response to verbal utterance 91; [0025], FIG. 2 illustrates a computing device 200 configured to implement one or more aspects of the disclosure. 
Computing device 200 may be employed as master smart device 120, slave smart device 130, and/or slave smart device 140; [0026], computing device 200 includes, without limitation, an interconnect (bus) 240 that connects a processing unit 250, an input/output (I/O) device interface 260 coupled to input/output (I/O) devices 280, memory 210, a storage 230, and a network interface 270; [0031], Memory 210 includes various software programs that can be executed by processor 250 and application data associated with said software programs, including speech recognition application 211, audio signal merging application 212, loudness matching application 213, temporal alignment application 214, master selection application 215 and/or topology application 216): receiving, via at least one microphone of the second playback device, voice input data from a first source; determining, in response to receiving the voice input data from the first source, feedback for playback, the determined feedback for playback comprising at least one audio feedback element (Kim Figs. 1-8; [0023], Each of master smart device 120, slave smart device 130, and slave smart device 140 is an IPA-enabled computing device configured to receive and act on certain voice commands from a user. In operation, one or more of master smart device 120, slave smart device 130, and slave smart device 140 detect verbal utterance 91 and convert verbal utterance 91 to a respective audio signal, such as a digital audio signal. Thus, slave smart device 130 generates an audio signal 131 in response to verbal utterance 91, for example via a microphone 132, and transmits audio signal 131 to master smart device 120. Similarly, slave smart device 140 generates an audio signal 141 in response to verbal utterance 91, for example via a microphone 142, and transmits audio signal 141 to master smart device 120. Master smart device 120 also generates an audio signal 121 via a microphone 122 in response to verbal utterance 91, and then constructs a speech recognition audio signal based on portions of audio signal 131, audio signal 141, and/or audio signal 121, as described in greater detail below. The speech recognition audio signal is then transferred to a speech recognition application for evaluation. When a response audio signal 125 is returned by the speech recognition application, master smart device 120 determines which smart device in multi-device IPA system 100 is closest to user 90, and transmits response audio signal 125 to that smart device for conversion into sound energy by an appropriate loudspeaker 123, 133, or 143; [0037], Master smart device 120 is further configured to determine which smart device in multi-device IPA system 100 is closest to user 90 and provide that smart device with any response audio signal 125 returned by speech recognition application 211. 
Consequently, the appropriate smart device in multi-device IPA system 100 provides any forthcoming audio response to user 90; [0053], response audio signal 125 may include a speech-based response to the voice command or commands detected in step 408; see also [0027-0028], smart devices include screens), determining, based on information related to a bonded configuration of the first playback device and the second playback device; selecting the first playback device for playback of the determined feedback for playback so that the determined feedback for playback is output through a playback device closer to the visual output device, and causing output of the determined feedback for playback by the first playback device (Kim Figs. 1-8; [0023], Each of master smart device 120, slave smart device 130, and slave smart device 140 is an IPA-enabled computing device configured to receive and act on certain voice commands from a user; When a response audio signal 125 is returned by the speech recognition application, master smart device 120 determines which smart device in multi-device IPA system 100 is closest to user 90, and transmits response audio signal 125 to that smart device for conversion into sound energy by an appropriate loudspeaker 123, 133, or 143 (as shown in Fig. 1, loudspeaker 133 is farther from the source than microphone 132); [0044], It is noted that the relative signal strength of one of the audio signals received in step 401 with respect to the other audio signals can vary throughout temporal segments 501A-501N. For example, audio signal 131 has the strongest audio signal strength in temporal segments 510, whereas audio signal 141 has the strongest audio signal strength in temporal segments 520. Such a change in relative audio signal strength may be a result of a change in location or orientation of user 90 with respect to one or more of master smart device 120, slave smart device 130, or slave device 140. Thus, during the time interval represented by temporal segments 510, user 90 may be proximate or directly facing slave smart device 130, while in the time interval represented by temporal segments 520, user 90 may be more directly facing or closer to slave smart device 140 (causing output of the determined feedback by the first playback device so that the feedback is output farther away from the first source than from where the voice input from the first source was received); [0054], In step 410, master smart device 120 determines which of the smart devices included in multi-device IPA system 100 is closest to user 90. In some embodiments, master smart device 120 determines which smart devices is closest to user 90 based on segment metadata 302. Specifically, the master smart device 120 may determine that the smart device that is closest to user 90 is the smart device from which the last temporal segment 531N of speech recognition audio signal 530 originated; [0055], In step 411, master smart device 120 transmits response audio signal 125 to the smart device determined to be the closest to user 90 in step 410. Thus, the smart device that is located closest to user 90 provides the audible response to voice commands included in verbal utterance 91; see also [0027-0028], smart devices include screens) However, Kim fails to expressly disclose at least one visual feedback element. In the same field of endeavor, Hart teaches: at least one visual feedback element (Hart Figs. 1-8; col. 
2 [line 10], the first device includes a microphone for generating audio signals representative of user speech, as well as a speaker for outputting audible content in response to identified voice commands in the user speech. However, the first device might not include a display for displaying graphical content. As such, the first device may be configured to identify devices that include displays and that are proximate to the first device. The first device may then instruct one or more of these other devices to output visual content associated with a user's voice command; col. 3 [line 4], the second device may display content in any number of ways. In some implementations, the second device may include an application that is specifically configured to interact with the first device (e.g., a “companion application”). The companion application may be configured to receive information and/or instructions from the first device and/or a remote computing resource and display the appropriate content associated with the user's command. For instance, the application may display one or more links that lead to web sites, applications, or other destinations that include content about Benjamin Franklin. Additionally or alternatively, the application may directly pull in and display this content, such as detailed information about Benjamin Franklin; col. 3 [line 32], the second device awakens and directly causes display of the content upon receiving the instruction from the first device; col. 6 [line 28], while the voice-controlled device 106 may utilize these other devices to output visual content, the voice-controlled device 106 may additionally or alternatively utilize these devices to output additional audible content; col. 10 [line 25], the first content comprises audible content while the second content comprises visual content; col. 10 [line 39], the process 600 identifies a device on which to output the content by identifying a device that is within a threshold distance of the first device and/or the user. In other instances, the process 600 identifies and selects a device on which to output content based on a type of the device, information regarding whether the device is powered on, and the like. The process 600 may also ensure that this other device is associated with the user. At 608, the process 600 visually presents the content on the identified device that is within the threshold distance) It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated at least one visual feedback element as suggested in Hart into Kim. Doing so would be desirable because homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, and portable communication devices. As computing devices evolve, many different ways have been introduced to allow users to interact with these devices, such as through mechanical means (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. Another way to interact with computing devices is through a user speaking to a device and the device outputting audio to the user in return. However, in some instances, certain content is best output in a form other than audio alone (see Hart col. 1 [line 14]). The system of Hart is applicable to any client device (see Hart col. 2 [line 25]) to display any type of visual content (see Hart col. 11 [line 52]). 
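The Hart selection step relied on for the visual feedback element, in which a displayless voice device hands visual content to a nearby, user-associated display within a threshold distance, might look roughly like this. The names and the threshold value are hypothetical; the sketch only mirrors the selection criteria the quoted passages describe.

```python
# Sketch (hypothetical names throughout) of a Hart-style selection step: a
# displayless voice device identifies a nearby display-equipped device,
# confirms it is within a threshold distance and associated with the user,
# and sends it the visual content while keeping the audible portion itself.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    has_display: bool
    distance_m: float  # distance from the voice-controlled device
    owner: str

THRESHOLD_M = 5.0  # assumed proximity threshold

def pick_display_device(devices: list, user: str):
    candidates = [d for d in devices
                  if d.has_display and d.distance_m <= THRESHOLD_M
                  and d.owner == user]
    # Prefer the nearest qualifying display, if any.
    return min(candidates, key=lambda d: d.distance_m, default=None)

devices = [Device("speaker", False, 0.0, "alice"),
           Device("tablet", True, 2.5, "alice"),
           Device("tv", True, 7.0, "alice")]
print(pick_display_device(devices, "alice").name)  # -> "tablet"
```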
The system of Hart would improve the system of Kim by enabling the master device to provide both audible and visual output to the slave devices (see Hart col. 6 [line 28]), thereby ensuring the user receives desired audible and visual feedback in the best output form (see Hart col. 1 [line 14]). However, Kim in view of Hart fails to expressly disclose determining based on information related to a bonded configuration of the first playback device and the second playback device, that the first playback device is closer to a visual output device than the second playback device; after determining that the first playback device is closer to the visual output device than the second playback device, selecting the first playback device for playback of the determined feedback for playback so that the determined feedback for playback is output through a playback device closer to the visual output device than the second playback device. In the same field of endeavor, Janus teaches: determining based on information related to a bonded configuration of the first playback device and the second playback device, that the first playback device is closer to a visual output device than the second playback device; after determining that the first playback device is closer to the visual output device than the second playback device, selecting the first playback device for playback of the determined feedback for playback so that the determined feedback for playback is output through a playback device closer to the visual output device than the second playback device (Janus Figs. 1-7; [0015], audio effects may comprise sounds, tones, speech, music, and/or any other types of audio effects; [0017], audio playback devices 150-p may comprise one or more wired and/or wireless speakers; [0018], apparatus 100, displays 145-n, and audio playback devices 150-p are illustrated as separate components; [0019], apparatus 100 and/or system 140 may be operative to generate audio playback information operative on one or more audio playback devices 150-p to cause one or more desired audio effects to be generated. In some embodiments, apparatus 100 and/or system 140 may be operative to generate audio playback information based on audio information that corresponds to particular visual effects. For example, apparatus 100 and/or system 140 may be operative to generate audio playback information for an error chime corresponding to an error window in an operating system. In various such embodiments, apparatus 100 and/or system 140 may be operative to generate audio playback information such that for a user listening to audio playback devices 150-p, the apparent origin of a given audio effect corresponds to the position of its associated visual effect on one or more displays 145-n. Continuing with the previous example, apparatus 100 and/or system 140 may be operative to generate audio playback information such that if the error window appears in an upper right corner of a display 145-n, the apparent origin of the error chime to a listening user is also the upper right corner of the display; [0020], audiovisual application 107 may be operative to generate graphics information 108. Graphics information 108 may comprise data, information, logic, and/or instructions corresponding to one or more user interface elements to be displayed on one or more displays 145-n. Such user interface elements may comprise any visual or optical sensory effect(s) such as, for example, images, pictures, video, text, graphics, menus, textures, and/or patterns. Such user interface elements may be associated with menus, prompts, and/or controls usable to operate audiovisual application 107, and/or may be associated with content presented by audiovisual application 107; [0021], audiovisual application 107 may be operative to generate audio information 110 corresponding to graphics information 108. Audio information 110 may comprise data, information, logic, and/or instructions corresponding to one or more audio effects to be produced by one or more audio playback devices 150-p in conjunction with the presentation of one or more user interface elements by one or more displays 145-n. In an example embodiment in which audiovisual application 107 comprising an operating system, particular audio information 110 may correspond to an alert sound to be produced when a visual prompt of the operating system is displayed; [0022], presentation layout information 112 of FIG. 1 may indicate that there are two displays in presentation area 200 and may identify their respective positions therein. Likewise, presentation layout information 112 may indicate that there are four audio playback devices in presentation area 200 and may identify the respective positions of those audio playback devices; [0024], apparatus 100 may utilize one or more conventional position sensing techniques to sense the positions of one or more displays 145-n and/or one or more audio playback devices 150-p. Based on this information, audio management module 106 may be operative to determine presentation layout information 112 identifying the relative locations of one or more displays 145-n and/or one or more audio playback devices 150-p within the presentation area; [0027], audio management module 106 may be operative to generate audio playback information 118 based on audio information 110 and audio location information 116 for that audio information 110. In some embodiments, audio management module 106 may be operative to generate audio playback information 118 operative on one or more audio playback devices 150-p to generate an audio effect such that it appears to originate from a position identified by audio location information 116. In various embodiments, audio management module 106 may be operative to generate audio playback information 118 using one or more techniques for controlling the apparent origins of audio effects; [0034], At 408, audio playback information for the audio effect may be generated based on the audio location information. For example, audio management module 106 of FIG. 1 may be operative to generate audio playback information 118 operative on audio playback devices 208-1 and 208-2 of FIG. 2 to cause the audio effect to be generated such that it appears to originate from the right side of display 20; [0067], voice recognition device and software) It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated determining based on information related to a bonded configuration of the first playback device and the second playback device, that the first playback device is closer to a visual output device than the second playback device; after determining that the first playback device is closer to the visual output device than the second playback device, selecting the first playback device for playback of the determined feedback for playback so that the determined feedback for playback is output through a playback device closer to the visual output device than the second playback device as suggested in Janus into Kim in view of Hart. Doing so would be desirable because in systems comprising large displays, large display arrays, or displays separated by significant distances, it may not be possible for a user to maintain all of the collective display area within his field of vision simultaneously. As a result, prompts or other visual elements requiring user attention may be presented in portions of the collective display area lying outside the user's field of vision. Additionally, in some conventional configurations, audio effects corresponding to any visual elements presented in the collective display area may be generated such that they appear to originate from the same point, such as the midpoint between two speakers. As a result, in such conventional systems, audio effects corresponding to visual elements may not appear to originate from positions corresponding to the positions of those visual elements (see Janus [0002]). The system of Janus is useable for any type of audio effects, such as speech, for any audio application (see Janus [0015]), such as voice recognition software (see Janus [0067]). An advantage of some embodiments may be that localizing audio effects according to the locations of their corresponding visual effects may assist a user in locating display items that require attention. Another advantage of various embodiments may be that performing such audio localization may result in a more natural and pleasurable user experience during content consumption, because audio effects may appear to originate from their associated visual sources to a greater extent than they do in conventional systems. Other advantages may be associated with the disclosed subject matter, and the embodiments are not limited in this context (see Janus [0019]).

Regarding claim 16, Kim in view of Hart in further view of Janus teaches all the limitations of claim 15, further comprising: determining a location of the visual output device (Janus Figs. 1-7; [0015], audio effects may comprise sounds, tones, speech, music, and/or any other types of audio effects; [0017], audio playback devices 150-p may comprise one or more wired and/or wireless speakers; [0018-0019], apparatus 100 and/or system 140 may be operative to generate audio playback information such that for a user listening to audio playback devices 150-p, the apparent origin of a given audio effect corresponds to the position of its associated visual effect on one or more displays 145-n. 
Continuing with the previous example, apparatus 100 and/or system 140 may be operative to generate audio playback information such that if the error window appears in an upper right corner of a display 145-n, the apparent origin of the error chime to a listening user is also the upper right corner of the display; [0020-0021], In an example embodiment in which audiovisual application 107 comprising an operating system, particular audio information 110 may correspond to an alert sound to be produced when a visual prompt of the operating system is displayed; [0024], apparatus 100 may utilize one or more conventional position sensing techniques to sense the positions of one or more displays 145-n and/or one or more audio playback devices 150-p. Based on this information, audio management module 106 may be operative to determine presentation layout information 112 identifying the relative locations of one or more displays 145-n and/or one or more audio playback devices 150-p within the presentation area; [0034], At 408, audio playback information for the audio effect may be generated based on the audio location information. For example, audio management module 106 of FIG. 1 may be operative to generate audio playback information 118 operative on audio playback devices 208-1 and 208-2 of FIG. 2 to cause the audio effect to be generated such that it appears to originate from the right side of display 20; see also [0022], [0027], [0067]) It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated determining a location of the visual output device as suggested in Janus into Kim in view of Hart. Doing so would be desirable because in systems comprising large displays, large display arrays, or displays separated by significant distances, it may not be possible for a user to maintain all of the collective display area within his field of vision simultaneously. As a result, prompts or other visual elements requiring user attention may be presented in portions of the collective display area lying outside the user's field of vision. Additionally, in some conventional configurations, audio effects corresponding to any visual elements presented in the collective display area may be generated such that they appear to originate from the same point, such as the midpoint between two speakers. As a result, in such conventional systems, audio effects corresponding to visual elements may not appear to originate from positions corresponding to the positions of those visual elements (see Janus [0002]). The system of Janus is useable for any type of audio effects, such as speech, for any audio application (see Janus [0015]), such as voice recognition software (see Janus [0067]). An advantage of some embodiments may be that localizing audio effects according to the locations of their corresponding visual effects may assist a user in locating display items that require attention. Another advantage of various embodiments may be that performing such audio localization may result in a more natural and pleasurable user experience during content consumption, because audio effects may appear to originate from their associated visual sources to a greater extent than they do in conventional systems. Other advantages may be associated with the disclosed subject matter, and the embodiments are not limited in this context (see Janus [0019]).

Claims 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of Hart in further view of Janus in further view of Williams et al. (US 9898250 B1, published 02/20/2018), hereinafter Williams.

Regarding claim 17, Kim in view of Hart in further view of Janus teaches all the limitations of claim 15. However, Kim in view of Hart in further view of Janus fails to expressly disclose synchronously playing back media content via the first playback device at a first volume level and via the second playback device at a second volume level. In the same field of endeavor, Williams teaches: synchronously playing back media content via the first playback device at a first volume level and via the second playback device at a second volume level (Williams Figs. 1-18; col. 3 [line 37], a volume of the audio; col. 3 [line 54] – col. 4 [line 14], a speaker controller 22 may control multiple speakers 20 and may send audio data to the multiple speakers 20 so that the multiple speakers 20 collectively generate output audio 30; col. 7 [line 18], the server(s) 112 may select every speaker 20 and/or output zone, may identify a preferred output zone (e.g., living room) based on previously received commands, may identify speaker(s) and/or an output zone associated with the device 110 that received the input audio 11; col. 8 [line 18], the system 100 may enable the user 10 to instruct the server(s) 112 to generate output audio 30 using any combination of the speaker(s) 20; the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio 30 over the speaker(s) 20 while displaying output video on a television; col. 8 [line 46], an audio capture component, such as a microphone of device 110, captures audio 11 corresponding to a spoken utterance; col. 17 [line 33], the server(s) 112 may generate output audio 30 using the speakers 20a-1/20a-2/20b/20c and/or device 110c in Room 1, Room 3 and/or Room 4 of the house 440; col. 19 [line 8], the user 10 and/or the server(s) 112 may select output devices and generate output zones, as illustrated in FIG. 5C; Zone 5 (not shown) may include Zone 1, Zone 2, Zone 3 and Zone 4 and may be used to generate output audio 30 all over the house 540b; col. 19 [line 50], an output zone may include input devices and/or output devices in multiple rooms; see also col. 22 [line 32], col. 22 [line 46], col. 22 [line 54], col. 25 [line 51], col. 32 [line 44]) It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated synchronously playing back media content via the first playback device at a first volume level and via the second playback device at a second volume level as suggested in Williams into Kim in view of Hart in further view of Janus. Doing so would be desirable because homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, speakers and portable communication devices. As these computing devices evolve, many different ways have been introduced to allow users to interact with computing devices, such as through mechanical devices (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. Another way to interact with computing devices is through natural language input such as speech input (see Williams col. 1 [line 23]). 
An environment may include a number of different entertainment systems, including standalone speakers, wired speakers, wireless speakers or the like. However, the different entertainment systems may be separated as different devices are controlled separately from each other (see Williams col. 2 [line 26]). Using the techniques described herein, a user is able to conveniently interact with multiple entertainment systems/speakers at one time using voice commands. Additionally or alternatively, the system may send audio data directly to the multiple entertainment systems/speakers in response to the voice commands (see Williams col. 2 [line 49]). Additionally, the system of Williams would improve the system of Kim by enabling the user to flexibly output desired audio in desired ways, thereby increasing the usefulness of the speakers in the system as well as enabling the user to enjoy media content.

Regarding claim 18, Kim in view of Hart in further view of Janus in further view of Williams teaches all the limitations of claim 17. Williams further teaches: causing a volume level of at least one audio feedback element of the determined feedback for playback output by the first playback device to be adjusted based on the first volume level (Williams Figs. 1-18; col. 3 [line 37], a volume of the audio; col. 3 [line 54] – col. 4 [line 14], a speaker controller 22 may control multiple speakers 20 and may send audio data to the multiple speakers 20 so that the multiple speakers 20 collectively generate output audio 30; col. 7 [line 18], the server(s) 112 may select every speaker 20 and/or output zone, may identify a preferred output zone (e.g., living room) based on previously received commands, may identify speaker(s) and/or an output zone associated with the device 110 that received the input audio 11; col. 8 [line 18], the system 100 may enable the user 10 to instruct the server(s) 112 to generate output audio 30 using any combination of the speaker(s) 20; the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio 30 over the speaker(s) 20 while displaying output video on a television; col. 8 [line 46], an audio capture component, such as a microphone of device 110, captures audio 11 corresponding to a spoken utterance; col. 25 [line 51], As illustrated in FIG. 9B, device 110b may receive input audio 940 from user 10 including a command and the device 110b may send audio data corresponding to the input audio 940 to the server(s) 112. The server(s) 112 may determine the command and may send data 950 to speaker(s) 20 to generate output audio 960, which includes music and voice output; In response to some commands, the server(s) 112 may generate voice output indicating to the user 10 that the command was performed. For example, the voice output may state “audio muted,” “increasing volume,” “decreasing volume” or the like. Thus, the output audio 960 may include the music playing at a first volume and the voice output playing at a second volume higher than the first volume; see also col. 17 [line 33], col. 22 [line 32], col. 22 [line 46], col. 22 [line 54], col. 25 [line 51], col. 
32 [line 44]) It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated causing a volume level of at least one audio feedback element of the determined feedback for playback output by the first playback device to be adjusted based on the first volume level as suggested in Williams into Kim in view of Hart in further view of Janus. Doing so would be desirable because homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, speakers and portable communication devices. As these computing devices evolve, many different ways have been introduced to allow users to interact with computing devices, such as through mechanical devices (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. Another way to interact with computing devices is through natural language input such as speech input (see Williams col. 1 [line 23]). An environment may include a number of different entertainment systems, including standalone speakers, wired speakers, wireless speakers or the like. However, the different entertainment systems may be separated as different devices are controlled separately from each other (see Williams col. 2 [line 26]). Using the techniques described herein, a user is able to conveniently interact with multiple entertainment systems/speakers at one time using voice commands. Additionally or alternatively, the system may send audio data directly to the multiple entertainment systems/speakers in response to the voice commands (see Williams col. 2 [line 49]). Additionally, basing the volume of the audio feedback on the first volume level would better enable the user to hear voice responses. Regarding claim 19, Kim in view of Hart in further view of Janus in further view of Williams teaches all the limitations of claim 17. Williams further teaches: determining that the media content is lean back audio; and in response to determining that the media content is lean back audio, reducing playback of the media content via the second playback device from the second volume level to a third volume level, wherein the third volume level is lower than the second volume level (Williams Figs. 1-18; abs. the system receives voice commands and may determine speakers playing output audio in proximity to the voice commands; the system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output; col. 3 [line 54] – col. 4 [line 14], a speaker controller 22 may control multiple speakers 20 and may send audio data to the multiple speakers 20 so that the multiple speakers 20 collectively generate output audio 30; col. 7 [line 18], the server(s) 112 may select every speaker 20 and/or output zone, may identify a preferred output zone (e.g., living room) based on previously received commands, may identify speaker(s) and/or an output zone associated with the device 110 that received the input audio 11; col. 8 [line 1], the user 10 hears the voice output at a first volume and the music at a second, lower, volume; col. 
8 [line 18], the system 100 may enable the user 10 to instruct the server(s) 112 to generate output audio 30 using any combination of the speaker(s) 20; the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio 30 over the speaker(s) 20 while displaying output video on a television; col. 8 [line 46], an audio capture component, such as a microphone of device 110, captures audio 11 corresponding to a spoken utterance; col. 13 [line 19], the device 110 may be associated with domains for different applications such as music, telephony, calendaring, contact lists, and device-specific communications; col. 25 [line 51], As illustrated in FIG. 9B, device 110b may receive input audio 940 from user 10 including a command and the device 110b may send audio data corresponding to the input audio 940 to the server(s) 112. The server(s) 112 may determine the command and may send data 950 to speaker(s) 20 to generate output audio 960, which includes music and voice output; In response to some commands, the server(s) 112 may generate voice output indicating to the user 10 that the command was performed. For example, the voice output may state “audio muted,” “increasing volume,” “decreasing volume” or the like. Thus, the output audio 960 may include the music playing at a first volume and the voice output playing at a second volume higher than the first volume; col. 27 [line 50], the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio over the speaker(s) 20 while displaying output video on a television. When the system 100 receives the input audio, the system 100 may control the speaker(s) 20 to lower a volume of the output audio while pausing the output video on the television; see also col. 17 [line 33], col. 22 [line 32], col. 22 [line 46], col. 22 [line 54], col. 25 [line 51], col. 32 [line 44]) It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated determining that the media content is lean back audio; and in response to determining that the media content is lean back audio, reducing playback of the media content via the second playback device from the second volume level to a third volume level, wherein the third volume level is lower than the second volume level as suggested in Williams into Kim in view of Hart in further view of Janus. Doing so would be desirable because homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, speakers and portable communication devices. As these computing devices evolve, many different ways have been introduced to allow users to interact with computing devices, such as through mechanical devices (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. Another way to interact with computing devices is through natural language input such as speech input (see Williams col. 1 [line 23]). An environment may include a number of different entertainment systems, including standalone speakers, wired speakers, wireless speakers or the like. However, the different entertainment systems may be separated as different devices are controlled separately from each other (see Williams col. 2 [line 26]). 
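Read as a system behavior, claims 18 and 19 describe a familiar ducking pattern: while voice feedback plays, lean-back media on the second device drops to a lower "third" level, and the feedback volume is derived from the first device's current level. Neither the claims nor Williams supply source code, so the following is only a minimal sketch of that pattern; the Speaker class, the play_voice_feedback function, and the DUCK_RATIO and FEEDBACK_BOOST constants are hypothetical names introduced here for illustration.

```python
# Illustrative sketch only: hypothetical names, not drawn from the claims
# or from Williams. Models the ducking behavior described for claims 18-19.

from dataclasses import dataclass

DUCK_RATIO = 0.5       # hypothetical: "third volume level" = half the second level
FEEDBACK_BOOST = 1.2   # hypothetical: feedback plays above the first volume level

@dataclass
class Speaker:
    name: str
    volume: float  # current level, 0.0 to 1.0

def play_voice_feedback(first: Speaker, second: Speaker,
                        is_lean_back_audio: bool) -> float:
    """Return the feedback volume for the first speaker and, for lean-back
    media, duck the second speaker from its second level to a lower third level."""
    if is_lean_back_audio:
        # Claim 19: reduce media playback on the second device while feedback plays.
        second.volume *= DUCK_RATIO
    # Claim 18: adjust the audio feedback volume based on the first volume level.
    return min(1.0, first.volume * FEEDBACK_BOOST)

# Example: music on the kitchen speaker ducks from 0.8 to 0.4 while the
# voice feedback plays at 0.72 on the living-room speaker.
living = Speaker("living-room", volume=0.6)
kitchen = Speaker("kitchen", volume=0.8)
fb = play_voice_feedback(living, kitchen, is_lean_back_audio=True)
assert abs(fb - 0.72) < 1e-9 and abs(kitchen.volume - 0.4) < 1e-9
```

The only load-bearing ideas are the two commented lines: the ducking is conditional on the lean-back determination, and the feedback level is a function of the first volume level rather than a fixed constant.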
Regarding claim 20, Kim in view of Hart in further view of Janus in further view of Williams teaches all the limitations of claim 17. Hart further teaches: wherein causing output of the determined feedback for playback by the first playback device comprises causing output of only visual feedback elements of the determined feedback for playback (Hart Figs. 1-8; col. 2 [line 10], the first device includes a microphone for generating audio signals representative of user speech, as well as a speaker for outputting audible content in response to identified voice commands in the user speech; however, the first device might not include a display for displaying graphical content; as such, the first device may be configured to identify devices that include displays and that are proximate to the first device; the first device may then instruct one or more of these other devices to output visual content associated with a user's voice command; col. 3 [line 4], the second device may display content in any number of ways; in some implementations, the second device may include an application that is specifically configured to interact with the first device (e.g., a “companion application”); the companion application may be configured to receive information and/or instructions from the first device and/or a remote computing resource and display the appropriate content associated with the user's command; for instance, the application may display one or more links that lead to web sites, applications, or other destinations that include content about Benjamin Franklin; additionally or alternatively, the application may directly pull in and display this content, such as detailed information about Benjamin Franklin; col. 3 [line 32], the second device awakens and directly causes display of the content upon receiving the instruction from the first device; col. 6; col. 10 [line 25], the first content comprises audible content while the second content comprises visual content; col. 10 [line 39], the process 600 identifies a device on which to output the content by identifying a device that is within a threshold distance of the first device and/or the user; in other instances, the process 600 identifies and selects a device on which to output content based on a type of the device, information regarding whether the device is powered on, and the like; the process 600 may also ensure that this other device is associated with the user; at 608, the process 600 visually presents the content on the identified device that is within the threshold distance).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein causing output of the determined feedback by the first playback device comprises causing output of only visual feedback elements of the determined feedback, as suggested in Hart, into Kim in view of Janus in view of Williams. Doing so would be desirable because homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, and portable communication devices. As computing devices evolve, many different ways have been introduced to allow users to interact with these devices, such as through mechanical means (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. Another way to interact with computing devices is through a user speaking to a device and the device outputting audio to the user in return. However, in some instances, certain content is best output in a form other than audio alone (see Hart col. 1 [line 14]). The system of Hart would improve the system of Kim by enabling the master device to provide both audible and visual output to the slave devices (see Hart col. 6 [line 28]), thereby ensuring the user receives desired audible and visual feedback in the best output form (see Hart col. 1 [line 14]).

Williams further teaches: determining that the media content is lean in audio (Williams Figs. 1-18; abs., the system receives voice commands and may determine speakers playing output audio in proximity to the voice commands; the system may generate voice output and send the voice output to the speakers, along with a command to reduce a volume of output audio while playing the voice output; col. 3 [line 54] – col. 4 [line 14], a speaker controller 22 may control multiple speakers 20 and may send audio data to the multiple speakers 20 so that the multiple speakers 20 collectively generate output audio 30; col. 7 [line 18], the server(s) 112 may select every speaker 20 and/or output zone, may identify a preferred output zone (e.g., living room) based on previously received commands, and may identify speaker(s) and/or an output zone associated with the device 110 that received the input audio 11; col. 8 [line 1], the user 10 hears the voice output at a first volume and the music at a second, lower, volume; col. 8 [line 18], the system 100 may enable the user 10 to instruct the server(s) 112 to generate output audio 30 using any combination of the speaker(s) 20; the system 100 may control the multiple speaker systems to play music from an audio source in response to a voice command (e.g., input audio 11); the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio 30 over the speaker(s) 20 while displaying output video on a television; col. 8 [line 46], an audio capture component, such as a microphone of device 110, captures audio 11 corresponding to a spoken utterance; col. 13 [line 19], the device 110 may be associated with domains for different applications such as music, telephony, calendaring, contact lists, and device-specific communications; col. 25 [line 51], as illustrated in FIG. 9B, device 110b may receive input audio 940 from user 10 including a command, and the device 110b may send audio data corresponding to the input audio 940 to the server(s) 112; the server(s) 112 may determine the command and may send data 950 to speaker(s) 20 to generate output audio 960, which includes music and voice output; in response to some commands, the server(s) 112 may generate voice output indicating to the user 10 that the command was performed, for example, the voice output may state “audio muted,” “increasing volume,” “decreasing volume,” or the like; thus, the output audio 960 may include the music playing at a first volume and the voice output playing at a second volume higher than the first volume; col. 27 [line 50], the system 100 may control the multiple speaker systems to play audio corresponding to a video source, such as playing output audio over the speaker(s) 20 while displaying output video on a television; when the system 100 receives the input audio, the system 100 may control the speaker(s) 20 to lower a volume of the output audio while pausing the output video on the television; see also col. 17 [line 33], col. 22 [line 32], col. 22 [line 46], col. 22 [line 54], col. 25 [line 51], col. 32 [line 44]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated determining that the media content is lean in audio, as suggested in Williams, into Kim in view of Hart in further view of Janus. Doing so would be desirable because homes are becoming more wired and connected with the proliferation of computing devices such as desktops, tablets, entertainment systems, speakers, and portable communication devices. As these computing devices evolve, many different ways have been introduced to allow users to interact with computing devices, such as through mechanical devices (e.g., keyboards, mice, etc.), touch screens, motion, and gesture. Another way to interact with computing devices is through natural language input such as speech input (see Williams col. 1 [line 23]). An environment may include a number of different entertainment systems, including standalone speakers, wired speakers, wireless speakers, or the like. However, the different entertainment systems may be separated, as different devices are controlled separately from each other (see Williams col. 2 [line 26]). Using the techniques described herein, a user is able to conveniently interact with multiple entertainment systems/speakers at one time using voice commands. Additionally or alternatively, the system may send audio data directly to the multiple entertainment systems/speakers in response to the voice commands (see Williams col. 2 [line 49]). Additionally, the system of Williams would improve the system of Kim by enabling the user to flexibly output desired audio in desired ways, thereby increasing the usefulness of the speakers in the system as well as enabling the user to enjoy media content.
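The Hart passages cited for claim 20 reduce to a small routine: a screenless speaker identifies a display-capable device within a threshold distance and routes only the visual elements of the feedback to it. The sketch below is a hypothetical illustration, not code from Hart; the Device type, the THRESHOLD_M constant, and the "type" tagging of feedback elements are assumptions made here.

```python
# Illustrative sketch only: hypothetical names, not from Hart or the claims.
# Models threshold-distance display selection plus visual-only feedback output.

from dataclasses import dataclass
from typing import Optional

THRESHOLD_M = 5.0  # hypothetical proximity threshold; Hart col. 10 describes
                   # selecting a device "within a threshold distance"

@dataclass
class Device:
    name: str
    has_display: bool
    distance_m: float  # distance from the voice-capturing device

def pick_display_device(candidates: list[Device]) -> Optional[Device]:
    """Pick the closest display-capable device within the threshold."""
    nearby = [d for d in candidates
              if d.has_display and d.distance_m <= THRESHOLD_M]
    return min(nearby, key=lambda d: d.distance_m) if nearby else None

def visual_elements_only(feedback: list[dict]) -> list[dict]:
    """Claim 20: output only the visual feedback elements of the determined feedback."""
    return [e for e in feedback if e.get("type") == "visual"]

# Example: the TV (2 m away) is chosen over a distant tablet, and only the
# on-screen card is routed to it; the audio chime element is dropped.
devices = [Device("tv", True, 2.0), Device("tablet", True, 8.0),
           Device("speaker", False, 1.0)]
feedback = [{"type": "audio", "asset": "chime"},
            {"type": "visual", "asset": "now-playing card"}]
target = pick_display_device(devices)
assert target is not None and target.name == "tv"
assert visual_elements_only(feedback) == [{"type": "visual", "asset": "now-playing card"}]
```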
Response to Arguments

The Examiner acknowledges the Applicant's amendments to claims 1, 2, 8, 9, 15, and 16, the cancellation of claim 21, and the addition of claim 22. The rejection of claims 15-21 under 35 U.S.C. 112(a) is respectfully withdrawn.

Regarding the rejection of claims 1-20 under 35 U.S.C. 112(a), the Applicant alleges the rejection of claims 1-21 under 35 U.S.C. 112(a) should be withdrawn (see remarks pp. 9-10). Examiner respectfully disagrees. As discussed in the rejection above, claims 1-14 and 22 contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor had possession of the claimed invention.

Applicant alleges that Kim in view of Hart in further view of Williams, as described in the previous Office action, does not explicitly teach determining, based on information related to a bonded configuration of the first playback device and the second playback device, that the first playback device is closer to a visual output device than the second playback device, and, after determining that the first playback device is closer to the visual output device than the second playback device, selecting the first playback device for playback of the determined feedback for playback, as has been amended into the claim. Examiner has therefore rejected independent claim 1 under 35 U.S.C. § 103 as unpatentable over Kim in view of Hart in further view of Williams in further view of Janus. Similar arguments have been presented for claims 8 and 15, and thus Applicant's arguments are not persuasive for the same reasons.

Applicant states that the dependent claims recite all the limitations of the independent claims and, thus, are allowable in view of the remarks set forth regarding the independent claims. However, as discussed above, Kim in view of Hart in further view of Williams in further view of Janus is considered to teach the independent claims, and consequently the dependent claims are rejected.
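The limitation now in dispute keys the selection off bonded-configuration metadata rather than measured distance to the talker, which is why the selected device may be farther from the first source. As a minimal sketch of how role labels in a bond could stand in for proximity to a visual output device (the BondedConfig shape and the role names below are hypothetical, not drawn from the application or the cited art):

```python
# Illustrative sketch only: hypothetical metadata shape and role labels.
# Selects, from a bonded pair, the device whose bond role implies it sits
# nearer the visual output device (e.g., a TV).

from dataclasses import dataclass

NEAR_TV_ROLES = {"front", "soundbar", "center"}  # hypothetical role labels

@dataclass
class BondedConfig:
    roles: dict[str, str]  # device name -> bonded role (e.g., home-theater bond)

def select_feedback_device(config: BondedConfig,
                           first: str, second: str) -> str:
    """Pick the bonded device whose role implies proximity to the visual
    output device; fall back to the first device otherwise."""
    for device in (first, second):
        if config.roles.get(device) in NEAR_TV_ROLES:
            return device
    return first

# Example: in a TV-bonded pair, the soundbar is chosen for feedback even
# though it may be farther from the speaking user than the surround unit.
cfg = BondedConfig(roles={"soundbar": "soundbar", "surround-left": "surround"})
assert select_feedback_device(cfg, "surround-left", "soundbar") == "soundbar"
```

The design point the sketch isolates is that the bond's role metadata, not microphone-side distance, drives the choice, which mirrors the claim language the Applicant argues the cited combination lacks.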
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: VanBlon (US 20180088894 A1); see Figs. 1-3 and [0015].

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN T REPSHER III, whose telephone number is (571) 272-7487. The examiner can normally be reached Monday through Friday, 8AM-5PM EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Jennifer Welch, can be reached at (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JOHN T REPSHER III/
Primary Examiner, Art Unit 2143

Prosecution Timeline

Nov 06, 2023
Application Filed
Aug 02, 2024
Non-Final Rejection — §103, §112
Nov 13, 2024
Interview Requested
Nov 21, 2024
Examiner Interview Summary
Nov 21, 2024
Applicant Interview (Telephonic)
Dec 09, 2024
Response Filed
Feb 08, 2025
Final Rejection — §103, §112
Mar 14, 2025
Interview Requested
May 13, 2025
Request for Continued Examination
May 18, 2025
Response after Non-Final Action
Jun 04, 2025
Interview Requested
Jun 30, 2025
Applicant Interview (Telephonic)
Jun 30, 2025
Examiner Interview Summary
Oct 10, 2025
Non-Final Rejection — §103, §112
Jan 15, 2026
Response Filed
Feb 06, 2026
Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12574602
CONTROL DISPLAY METHOD, ELECTRONIC DEVICE, AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM
2y 5m to grant • Granted Mar 10, 2026
Patent 12568166
TIME-AVERAGED PROXIMITY SENSOR
2y 5m to grant • Granted Mar 03, 2026
Patent 12554991
Device and Method for Performing Self-Learning Operations of an Artificial Neural Network
2y 5m to grant • Granted Feb 17, 2026
Patent 12511029
USER INTERFACE FOR AN AUTOMATED MASSAGE SYSTEM WITH BODY MODEL AND CONTROL OBJECT
2y 5m to grant • Granted Dec 30, 2025
Patent 12483602
COMPUTER IMPLEMENTED METHOD AND APPARATUS FOR MANAGEMENT OF NON-BINARY PRIVILEGES IN A STRUCTURED USER ENVIRONMENT
2y 5m to grant • Granted Nov 25, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

5-6
Expected OA Rounds
58%
Grant Probability
99%
With Interview (+48.0%)
3y 5m
Median Time to Grant
High
PTA Risk
Based on 347 resolved cases by this examiner. Grant probability derived from career allow rate.
