Detailed Action
Notice of Pre-AIA or AIA status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This Final Office action is responsive to the communication filed under 37 C.F.R. § 1.111 on December 12, 2025 (hereafter “Response”). The amendments to the claims are acknowledged and have been entered.
Claims 1, 8, 11, 18, and 20 are now amended.
Claims 1–20 are pending in the application.
Response to Arguments
Objections
The previous objections to the title and the specification are hereby withdrawn, responsive to the Applicant amending the specification to correct the informalities raised therein, and amending the title to recite one that is more descriptive of the claimed invention. However, the new amendment to the title introduces a minor grammatical informality, and therefore, a new objection is raised in order to prompt correction of the informality.
Indefiniteness
The amendments to claims 8 and 18 resolve the antecedent basis issue raised in the last Office action, and therefore, the 35 U.S.C. § 112 rejection is withdrawn.
Prior Art Rejections
The rejection of claims 1–4, 9–14, 19, and 20 under 35 U.S.C. § 102(a)(1) as being anticipated by U.S. Patent Application Publication No. 2013/0307771 A1 (“Parker”) is hereby withdrawn, in response to the Applicant narrowing the scope of the sound source location in claims 1, 11, and 20 to something not explicitly disclosed by Parker.
The 35 U.S.C. § 102 rejections of claims 13–19 involving the “broader” interpretation of claim 11 are also withdrawn, because the amendment to claim 11 narrows the scope of the processor’s instructions to directly cause the claimed operations, rather than merely “facilitate” them.
Claims 1, 2, 11, and 12 stand rejected under 35 U.S.C. §§ 102(a)(1) and (a)(2) as being anticipated by U.S. Patent Application Publication No. 2021/0134286 A1 (“Burton”).1 The Applicant’s arguments concerning Burton’s anticipation of claims 1 and 11 have been considered, but are not persuasive.
The Applicant argues that “in Burton, the characteristics of the audio signal provided by utterances are merely used to determine ‘who’ - i.e., which user, is the speaker, but are not used for determining ‘where’ the speaker or the user is.” (Response 11).
Respectfully, the Applicant is incorrect. Burton explicitly discloses that the loudness levels of the same utterance, as measured by several devices distributed at different locations in a room, are compared to determine where the user is relative to each of the devices in the room. “For example, if a first candidate device and a second candidate device both receive the same spoken utterance input, but the amplitude, audibility, and/or quality of the audio signal is greater for the first candidate device (with less noise), the system can determine that the first candidate device is more likely to represent the target device. In other words, the SNR and loudness of a user's speech is likely to be higher at the device endpoint they are close to and intended to interact with.” Burton ¶ 60. Accordingly, this argument is not persuasive.
The Applicant also argues that “nowhere in Burton has mentioned that a location of a speaker or the user is determined solely based on the characteristics of the audio signal of the spoken utterance,” (Response 11), but this argument is not persuasive either, because none of the claims require the location of the speaker to be determined “solely” based on the characteristics of the audio signal of the spoken utterance.
The Applicant also argues that Burton does not mention “any information regarding determining a location of a speaker or the user based on a sound field distribution obtained through measuring sound pressure near the sound source location.” (Response 11). This argument is not persuasive either, because it is inaccurate. Burton explicitly discloses using the “loudness of a user’s speech” measured at several different points in the room (each point corresponding to a respective one of the controllable devices) to determine which “device endpoint they are close to and intended to interact with.” Burton ¶ 60. By definition, all sound is perceived from “mechanical radiant energy that is transmitted by longitudinal pressure waves in a material medium (such as air).” See sound, Merriam-Webster’s Online Dictionary, <https://www.merriam-webster.com/dictionary/sound>. Accordingly, “loudness” is a measure of the amount of pressure carried by the sound at the point of measurement. Burton therefore does determine a location of a speaker or the user based on a sound field distribution obtained through measuring sound pressure near the sound source location, as claimed.
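For additional context only (the following relationship is not quoted from Burton; it is the standard acoustics definition), the loudness reported at a measurement point is conventionally expressed as a sound pressure level computed directly from the measured sound pressure:

$$
L_p = 20\,\log_{10}\!\left(\frac{p_{\mathrm{rms}}}{p_{\mathrm{ref}}}\right)\ \mathrm{dB},
\qquad p_{\mathrm{ref}} = 20\ \mu\mathrm{Pa},
$$

so a device that reports a greater loudness or amplitude for the same utterance has, by definition, measured a greater sound pressure at its own location.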
Accordingly, claims 1, 2, 11, and 12 stand rejected under 35 U.S.C. § 102 as anticipated by Burton, and the remaining claims (not individually discussed in the Response) are rejected under 35 U.S.C. § 103 based on the combination of Burton with other prior art references, as discussed herein.
For these reasons, the Applicant’s request for a Notice of Allowance (Response 14) is respectfully denied.
Specification
The Office objects to the specification for having the following informality: the serial conjunction in the new title of the invention is improperly formed. The title should be amended as follows: METHOD, APPARATUS, AND SYSTEM FOR INTERFACE CONTROL VIA SOUND SOURCE AND LINE-OF-SIGHT INFORMATION
Appropriate correction is required.
Claim Objections
The Office objects to claims 1, 11, and 20 for having the following informalities.
Objection I
The amendments to claims 1, 11, and 20 tack all of the new limitations onto the end of the claims, rather than reciting them together with the elements that actually introduce them. This makes the narrative of each claim difficult to follow, as one must jump back and forth to fully understand the limitations of the first recited step.
Also, the use of a “wherein” clause paired with the passive voice (“is determined based on a sound field distribution obtained through measuring sound pressure near the sound source location”) makes it unclear whether the new limitations are even required at all. Claims 1 and 11 do not yet recite an affirmative step of actually measuring the sound pressure near the sound source location or even obtaining the sound field distribution; they only say that they “obtain” the location, so it is not clear whether one needs to perform this extra step in order to infringe (or anticipate) the claim.
Additionally, if the “wherein” clause were to remain in place, the “and” conjunction should not have been moved, because the wherein clause is not a step in the list of steps recited in the claims.
Claim 1 should therefore be amended as follows, if the Applicant wishes to maintain the newly added limitation:
1. An interface control method applied to an interface control apparatus, the method comprising:
obtaining a speech instruction of a user;
obtaining a sound source location where the user delivered the speech instruction, based on a sound field distribution obtained through measuring sound pressure near the sound source location;
obtaining line-of-sight information of the user;
determining a target window on an interface based on the sound source location and the line-of-sight information; and
controlling the target window based on the speech instruction.
Claims 11 and 20 contain similar informalities and are objected to for the same reasons, with similar changes proposed.
Objection II
The Office separately objects to claim 20 for having the following informality: the Applicant inadvertently excluded a word (“the”) from the deletion instruction on lines 4–5 of the claim, resulting in a double recitation of that word (“cause the the interface control apparatus to perform”).
Appropriate correction is required.
Claim Rejections – 35 U.S.C. § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. § 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 2, 11, and 12 are rejected under 35 U.S.C. §§ 102(a)(1) and (a)(2) as being anticipated by U.S. Patent Application Publication No. 2021/0134286 A1 (“Burton”).
Claim 1
Burton discloses:
An interface control method applied to an interface control apparatus, the method comprising:
Reference is made to FIG. 2, which illustrates a “processing environment 230,” implemented as a computing device, that performs the claimed method as part of its normal operation by executing the code in its various modules. Burton ¶ 27. “Under the principles of inherency, if a prior art device, in its normal and usual operation, would necessarily perform the method claimed, then the method claimed will be considered to be anticipated by the prior art device.” MPEP § 2112.02.
Burton also explicitly discloses the method 800 performed by the processing environment 230 as well, see Burton FIG. 8, but for the sake of brevity, this rejection will focus on Burton’s disclosure of the normal operations performed by processing environment 230.
obtaining a speech instruction of a user and a sound source location of the user;
“[A] parallel signals processing component 232 of the multiple device response detector 240 can receive and process data from devices 206 that indicates that two or more devices in a device neighborhood have received a similar spoken utterance at or around the same time.” Burton ¶ 41. Spoken utterances include “commands or requests,” Burton ¶ 47, and therefore fall within the scope of the claimed speech instruction.
Additionally, “a signal characteristics component 224 may receive and analyze attention data collected by each of the candidate devices at or around the time the spoken utterance 212 occurred, for example via attention sensors 214a, 214b, 214bb, 214c. This information can be used to assign a likelihood of that a particular candidate device was the target device.” Burton ¶ 42. More specifically, as shown in FIG. 4, this data can include “characteristics of the audio signal provided by utterances 410, such as but not limited to amplitude (loudness) 412, audibility 414, and signal-to-noise ratio (SNR) 416.” Burton ¶ 60. Such characteristics fall within the scope of sound source location because “if a first candidate device and a second candidate device both receive the same spoken utterance input, but the amplitude, audibility, and/or quality of the audio signal is greater for the first candidate device (with less noise), the system can determine that the first candidate device is more likely to represent the target device.” Burton ¶ 60.
obtaining line-of-sight information of the user;
The attention data obtained by signal characteristics component 224 further includes eye-gaze data: “eye-gaze data 432 may be collected by some or all of the candidate devices at or around the time at which the spoken utterance is received.” Burton ¶ 62.
determining a target window on an interface based on the sound source location and the line-of-sight information;
After using all of the above information to assign a likelihood level to each candidate device as the target device, a device ranking component 226 ranks the candidate devices, and the ranking is “used by a device selector 228 to select the device that was most likely to represent the target device.” Burton ¶ 42.
As shown in FIGS. 1A, 1B, and 5–7, at least one of the candidate devices displays content provided by a respectively executing application. For example, “[i]n FIG. 1A, the first user 100 is viewing a document 160 on a display 152 of the first client 150,” Burton ¶ 22, while “[i]n FIG. 5, the first user 510 is viewing a spreadsheet 556 on the viewscreen 550.” Burton ¶ 72.
and controlling the target window based on the speech instruction;
“The most likely target device is then selected by a device selector 228 as the device that will render a response to the spoken utterance for a VA interaction.” Burton ¶ 42; see also Burton ¶¶ 24 (FIG. 1B, visual presentation 140) and 75 (FIG. 6, indication 600).
and wherein the sound source location is a location where the user delivers the speech instruction, and is determined based on a sound field distribution obtained through measuring sound pressure near the sound source location.
“For example, if a first candidate device and a second candidate device both receive the same spoken utterance input, but the amplitude, audibility, and/or quality of the audio signal is greater for the first candidate device (with less noise), the system can determine that the first candidate device is more likely to represent the target device. In other words, the SNR and loudness of a user's speech is likely to be higher at the device endpoint they are close to and intended to interact with.” Burton ¶ 60.
Claim 2
Burton discloses the method according to claim 1,
wherein the target window is closest to the sound source location and is located in a line-of-sight direction indicated by the line-of-sight information.
As shown in FIG. 4, device selector 228 uses a combination of both the audio data 410 and eye-gaze data 432 at the time of the utterance, to “provide a clearer indication as to which device was the intended target device.” Burton ¶ 62.
In more detail, “if a first candidate device and a second candidate device both receive the same spoken utterance input, but the amplitude, audibility, and/or quality of the audio signal is greater for the first candidate device (with less noise), the system can determine that the first candidate device is more likely to represent the target device.” Burton ¶ 60. At the same time, “eye-gaze data 432 may be collected by some or all of the candidate devices at or around the time at which the spoken utterance is received” to “provide a clearer indication as to which device was the intended target device,” Burton ¶ 62, where “gaze that is directed toward a device may be used to identify an intent to invoke the services of the virtual assistant at that device.” Burton ¶ 73.
Claims 11 and 12
Claims 11 and 12 are directed to an interface control apparatus that performs the same method recited in claims 1 and 2, and that includes a memory and a processor configured to perform that method as part of its normal operation. Burton discloses that method for the reasons given in the rejection of claims 1 and 2, and further discloses a “processing environment 230,” implemented as a computing device with the same components, that performs the claimed method as part of its normal operation by executing the code in its various modules. Burton ¶ 27.
Therefore, claims 11 and 12 are anticipated by Burton according to both this finding and the findings set forth in the rejection of claims 1 and 2.
Claim Rejections – 35 U.S.C. § 103
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were effectively filed absent any evidence to the contrary. Applicant is advised of the obligation under 37 C.F.R. § 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned at the time a later invention was effectively filed in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
I. Burton and Noda teach claims 3–8 and 13–18.
Claims 3–8 and 13–18 are rejected under 35 U.S.C. § 103 as being unpatentable over Burton as applied to claim 1 and the narrow interpretation of claim 11, above, and further in view of U.S. Patent Application Publication No. 2018/0239440 A1 (“Noda”).
Claim 3
Burton teaches the method according to claim 1,
wherein a window closest to the sound source location is a first window,
“[I]f a first candidate device and a second candidate device both receive the same spoken utterance input, but the amplitude, audibility, and/or quality of the audio signal is greater for the first candidate device (with less noise), the system can determine that the first candidate device is more likely to represent the target device.” Burton ¶ 60.
a window in the line-of-sight direction indicated by the line-of-sight information is a second window,
“Furthermore, attention data 430 as collected by various sensors of the candidate devices may provide a clearer indication as to which device was the intended target device.” Burton ¶ 62.
Burton does not appear to explicitly contemplate a preset priority of the utterance data 410 over the attention data 430, or vice versa. This leads to a problem in cases where the utterance data 410 and the attention data 430 conflict with one another (e.g., if the user is looking at a device farther from where the user is speaking).
However, many solutions to this problem were known prior to the effective filing date of the claimed invention. In particular, Noda teaches a technique that, when applied to Burton’s base system, results in a system that determines a target device (which is displaying a target application window), wherein determining the target window on the interface based on the sound source location and the line-of-sight information comprises:
determining the target window based on a priority of the sound source location and a priority of the line-of-sight information,
“The information processing apparatus 100 (for example, the display control unit 143) according to the present embodiment has a function of reflecting operation information on display on the basis of priority set for each modal,” Noda ¶ 55, where “modal” refers to the different available modalities of input, such as “gaze input, gesture input, touch input and voice input.” Noda ¶ 56.
wherein based on the priority of the sound source location being higher than the priority of the line-of-sight information, the first window is the target window; or based on the priority of the line-of-sight information being higher than the priority of the sound source location, the second window is the target window.
“In the example illustrated in FIG. 3, lower priority is set for gaze input, gesture input, touch input and voice input in this order.” Noda ¶ 56. However, the display control unit 143 is not limited to the order shown in FIG. 3; the priority may differ depending on the user’s preferences or other circumstances, as described elsewhere in Noda’s disclosure.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to substitute Burton’s mechanism for resolving conflicts between gaze and voice (see Burton ¶ 42, second-to-last sentence) with Noda’s mechanism for resolving the same conflicts, i.e., by instituting a hierarchy amongst input modals like the one shown in FIG. 3. One would have been motivated to substitute Burton’s mechanism with Noda’s mechanism because, depending on the circumstances, gaze input may be either more or less convenient than other forms of input. See Noda ¶¶ 40–41.
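Purely as an illustration of the priority-based selection that results from the proposed combination (all names below are hypothetical and are not drawn from Burton or Noda), the following sketch shows how a preset priority between the two modalities resolves a conflict between the window closest to the sound source and the window in the line-of-sight direction:

```python
# Illustrative sketch only; hypothetical names, not drawn from Burton or Noda.
# A preset priority between the two modalities decides whether the window
# nearest the sound source (first window) or the window in the gaze direction
# (second window) is treated as the target window.

def select_target_window(first_window, second_window, priority):
    # priority maps each modality to a rank; a higher rank is preferred
    if priority["sound_source"] > priority["line_of_sight"]:
        return first_window   # sound source location outranks gaze
    return second_window      # gaze outranks sound source location

# Example preset priority (cf. Noda FIG. 3, which ranks the input modals):
preset = {"line_of_sight": 2, "sound_source": 1}
print(select_target_window("window_A", "window_B", preset))  # -> window_B
```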
Claim 4
Burton and Noda teach the method according to claim 3,
wherein the priority of the sound source location and the priority of the line-of-sight information are predefined.
“For example, the display control unit 143 may set higher priority for a modal with higher operation load for the user or higher operation accuracy or higher reliability.” Noda ¶ 56. Alternatively, “the priority may be arbitrarily set by the user.” Noda ¶ 57.
Claim 5
Burton and Noda teach the method according to claim 4,
wherein the sound source location has first priority information, the line-of-sight information has second priority information, the first priority information is used to determine the priority of the sound source location, and the second priority information is used to determine the priority of the line-of-sight information.
The voice modal and the gaze modal are each assigned a respective rank in the hierarchy (e.g., as shown in FIG. 3). Noda ¶ 56. Hence, “the user can operate the indicator preferentially using a modal with higher priority.” Noda ¶ 57.
Claim 7
Burton and Noda teach the method according to claim 3,
wherein there is a first correlation between the sound source location and a service indicated by the speech instruction,
As shown in FIG. 4, “characteristics of the audio signal provided by utterances 410, such as but not limited to amplitude (loudness) 412, audibility 414, and signal-to-noise ratio (SNR) 416” provide a first source of likelihood, for each of the devices in the system, that a user wishes to invoke an assistant on a particular device endpoint. Burton ¶ 60.
and there is a second correlation between the line-of-sight information and the service indicated by the speech instruction.
“Furthermore, attention data 430 as collected by various sensors of the candidate devices may provide a clearer indication as to which device was the intended target device. For example, eye-gaze data 432 may be collected by some or all of the candidate devices at or around the time at which the spoken utterance is received.” Burton ¶ 62. In other words, the attention data 430 includes information about the likelihood that a user wishes to invoke the assistant on a particular device endpoint. See Burton ¶ 62.
Claim 8
Burton and Noda teach the method according to claim 7, further comprising:
adjusting the first correlation and the second correlation based on an execution result of the speech instruction.
Sometimes, “there remains ambiguity between candidate devices,” Burton ¶ 42, e.g., if there is a tie between the utterance data 410 and the attention data 430 (which, again, correspond to the claimed first and second correlations for the reasons given for claim 7). In such a case, “the system 200 can choose a most recently used or a most frequently used candidate device,” Burton ¶ 42, thus setting aside the correlation of one modality in favor of the other, on the basis of the last device that successfully executed prior commands (most recently used device), or on the basis of the device that most often successfully executed prior commands (most frequently used device).
Claims 13–15, 17, and 18
Claims 13–15, 17, and 18 are rejected over the combined teachings and suggestions of Burton with Noda for the same reasons as given above for claims 3–5, 7, and 8.
II. Burton, Noda, and Chaudhary teach claims 6 and 16.
Claims 6 and 16 are rejected under 35 U.S.C. § 103 as being unpatentable over Burton and Noda as applied to claims 5 and 15 above, and further in view of U.S. Patent Application Publication No. 2021/0012770 A1 (“Chaudhary”).
Claim 6
Burton and Noda teach the method according to claim 5, but do not appear to explicitly disclose adjusting the first priority information and the second priority information based on the execution result.
Chaudhary, however, teaches a method comprising:
obtaining an execution result of the speech instruction;
As shown in FIG. 2, a multi-modal user interface system maintains history data 258, 268 of each user 250, 260 of the system “based on historical trends corresponding to multi-modal inputs of the first user processed by the multi-modal recognition engine 130,” Chaudhary ¶ 50, or the second user, Chaudhary ¶ 51, respectively. “For example, the processor 108 may determine, such as based on the first history data 258, that speech inputs from the first user are less reliably interpreted as compared to gesture inputs from the first user.” Chaudhary ¶ 53.
and adjusting the first priority information and the second priority information based on the execution result.
As a result of the history data, the system adjusts the weight that it applies to each mode of input for that user, accordingly. For example, if the first user’s history 258 indicates his speech inputs are less reliably interpreted (i.e., the result of his speech commands tends to fail or be wrong), “the weight W1 may be reduced from a default W1 value, and the weight W2 may be increased from a default W2 value in the first weight data 254 to reduce reliance on speech inputs and to increase reliance on gesture inputs from the first user.” Chaudhary ¶ 53.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to improve Burton and Noda’s prioritized multi-modal input system with Chaudhary’s technique of prioritizing the different modes of input based on the results of past voice commands. One would have been motivated to apply Chaudhary’s weighting technique to better accommodate differences among the users. “For example, the first user and the second user may have different accents, different styles of gesturing, different body mechanics when performing video input, or any combination thereof.” Chaudhary ¶ 51.
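The history-based adjustment that Chaudhary contributes to the combination can be sketched as follows; the names, threshold, and step size below are hypothetical and are offered only as an illustration of the weighting behavior described in Chaudhary ¶ 53, not as Chaudhary’s implementation:

```python
# Illustrative sketch only; names and values are hypothetical, not Chaudhary's.
# If the execution results recorded in a user's history show that speech inputs
# are interpreted less reliably, reduce the weight on speech and increase the
# weight on the other modality for that user.

def adjust_weights(w_speech, w_other, speech_success_rate,
                   threshold=0.8, step=0.1):
    if speech_success_rate < threshold:          # poor speech execution results
        w_speech = max(0.0, w_speech - step)     # rely less on speech input
        w_other = min(1.0, w_other + step)       # rely more on the other modality
    return w_speech, w_other

print(adjust_weights(0.5, 0.5, speech_success_rate=0.6))  # -> (0.4, 0.6)
```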
Claim 16
The additional limitations of claim 16 (narrow version) are substantially similar to those recited in claim 6, and therefore, claim 16 is rejected according to the same findings and rationale as provided in the rejection above.
III. Burton and Parker teach claims 9, 19, and 20.
Claim 20 is rejected under 35 U.S.C. § 103 as being unpatentable over Burton in view of U.S. Patent Application Publication No. 2013/0307771 A1 (“Parker”).
Claims 9 and 19 are rejected under 35 U.S.C. § 103 as being unpatentable over Burton, as applied to claims 1 and 11 above, and further in view of Parker.
Claim 9
Burton teaches the method according to claim 1, wherein the controlling the target window based on the speech instruction comprises:
displaying, in the target window, an icon corresponding to a service indicated by the speech instruction
“In FIG. 6, the viewscreen 550 presents an indication 600 that the spoken utterance issued by the first user 510 has been received and is being processed by the computing device 552.” Burton ¶ 75.
Burton does not appear to explicitly disclose that its indication 600 further includes “one or more indexes,” i.e., as described in paragraph 79 of the present disclosure.
Parker, however, teaches the method according to claim 1, wherein the controlling the target window based on the speech instruction comprises:
displaying, in the target window, an icon corresponding to a service indicated by the speech instruction, wherein the icon comprises one or more indexes.
“Interface 310 provides a number of soft buttons, tiles, icons or text for the driver to select various options for the infotainment system,” Parker ¶ 54, which “may change based upon the driver's point of focus on the display itself.” Parker ¶ 60. Accordingly, the interface 310 displays one or more icons or tiles (the claimed indexes) corresponding to the interaction set that is currently active (based on the driver’s point of focus). See Parker ¶¶ 61–63.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to add Parker’s “soft buttons, tiles, icons or text” to Burton’s indication 600, thereby providing the claimed one or more indexes as part of that indication 600. One would have been motivated to add Parker’s analogous indexes to Burton’s indication 600 because this would reduce the number of button-presses the user would need to issue in order to navigate the user interface. See Parker ¶ 62.
Claim 19
Claim 19 is substantially similar to claim 9, and therefore rejected over the same findings and rationale as provided above for claim 9.
Claim 20
Burton teaches:
“FIG. 10 is a block diagram illustrating components of an example machine 1000 configured to read instructions from a machine-readable medium (for example, a machine-readable storage medium) and perform any of the features described herein. The example machine 1000 is in a form of a computer system, within which instructions 1016 (for example, in the form of software components) for causing the machine 1000 to perform any of the features described herein may be executed.” Burton ¶ 93.
obtaining a speech instruction of a user and a sound source location of the user;
“[A] parallel signals processing component 232 of the multiple device response detector 240 can receive and process data from devices 206 that indicates that two or more devices in a device neighborhood have received a similar spoken utterance at or around the same time.” Burton ¶ 41. Spoken utterances include “commands or requests,” Burton ¶ 47, and therefore fall within the scope of the claimed speech instruction.
Additionally, “a signal characteristics component 224 may receive and analyze attention data collected by each of the candidate devices at or around the time the spoken utterance 212 occurred, for example via attention sensors 214a, 214b, 214bb, 214c. This information can be used to assign a likelihood of that a particular candidate device was the target device.” Burton ¶ 42. More specifically, as shown in FIG. 4, this data can include “characteristics of the audio signal provided by utterances 410, such as but not limited to amplitude (loudness) 412, audibility 414, and signal-to-noise ratio (SNR) 416.” Burton ¶ 60. Such characteristics fall within the scope of sound source location because “if a first candidate device and a second candidate device both receive the same spoken utterance input, but the amplitude, audibility, and/or quality of the audio signal is greater for the first candidate device (with less noise), the system can determine that the first candidate device is more likely to represent the target device.” Burton ¶ 60.
obtaining line-of-sight information of the user;
The attention data obtained by signal characteristics component 224 further includes eye-gaze data: “eye-gaze data 432 may be collected by some or all of the candidate devices at or around the time at which the spoken utterance is received.” Burton ¶ 62.
determining a target window on an interface based on the sound source location and the line-of-sight information;
After using all of the above information to assign a likelihood level to each candidate device as the target device, a device ranking component 226 ranks the candidate devices, and the ranking is “used by a device selector 228 to select the device that was most likely to represent the target device.” Burton ¶ 42.
As shown in FIGS. 1A, 1B, and 5–7, at least one of the candidate devices displays content provided by a respectively executing application. For example, “[i]n FIG. 1A, the first user 100 is viewing a document 160 on a display 152 of the first client 150,” Burton ¶ 22, while “[i]n FIG. 5, the first user 510 is viewing a spreadsheet 556 on the viewscreen 550.” Burton ¶ 72.
and controlling the target window based on the speech instruction;
“The most likely target device is then selected by a device selector 228 as the device that will render a response to the spoken utterance for a VA interaction.” Burton ¶ 42; see also Burton ¶¶ 24 (FIG. 1B, visual presentation 140) and 75 (FIG. 6, indication 600).
and wherein the sound source location is a location where the user delivers the speech instruction, and is determined based on a sound field distribution obtained through measuring sound pressure near the sound source location.
“For example, if a first candidate device and a second candidate device both receive the same spoken utterance input, but the amplitude, audibility, and/or quality of the audio signal is greater for the first candidate device (with less noise), the system can determine that the first candidate device is more likely to represent the target device. In other words, the SNR and loudness of a user's speech is likely to be higher at the device endpoint they are close to and intended to interact with.” Burton ¶ 60.
Burton does not appear to explicitly disclose an implementation of machine 1000 within a vehicle.
Parker, however, teaches:
A vehicle, comprising an interface control apparatus which includes: a memory having processor-executable instructions stored thereon; and a processor configured to execute the instructions in the memory to cause the interface control apparatus to perform the following:
Reference is made to FIGS. 1 and 5, which illustrate “a system 101 that uses gaze detection for management of user interaction.” Parker ¶ 33. “[S]ystem 101 may be used in a vehicle,” Parker ¶ 39, and comprises a processor 104 and a memory with instructions for performing the functionality disclosed therein. See Parker ¶¶ 33–37.
obtaining a speech instruction of a user and a sound source location of the user;
With respect to the speech instruction, Parker discloses that system 101 includes a “speech detector 107” with “a microphone that captures the user's sounds. These sounds are then compared to a dictionary or grammar of known words to identify the user's spoken inputs or commands.” Parker ¶ 34.
With respect to the sound source location, Parker further discloses that “[t]he spoken inputs or commands are provided to interface controller 103 for further processing,” Parker ¶ 34, such as the further processing shown in FIG. 5. Specifically, as shown in FIG. 5, “[a]udio selector 503 receives audio information 504, such as the user's spoken commands,” and “ensures that the best audio signal is presented to the system,” by using “a beam former and microphone array, a multitude of individual microphones, or a single microphone . . . to capture and select user audio.” Parker ¶ 65. “Processing by audio selector 503 may be based on signals from gaze detector 501, such as signals indicating whether the user is facing a microphone.” Parker ¶ 65.
obtaining line-of-sight information of the user;
“System 101 also includes gaze detector 112 that monitors the user’s body, head, eye, and/or iris position to determine the user's line of sight and/or point of focus.” Parker ¶ 36. Much like with the audio signal, FIG. 5 further illustrates the flow of gaze detector 112’s signal in the context of the program executed by system 101.
determining a target window on an interface based on the sound source location and the line-of-sight information;
“User interface controller 103 may further select an interaction set 114 corresponding to the area the driver has focused on so that any command spoken by the driver are optimized for that area.” Parker ¶ 39.
The “area the driver has focused on” is determined based on both sound source location and line-of-sight information. Specifically, as shown in FIG. 5, gate and segmenter component 507 receives the directionally-selected audio signal 504 from audio selector 503, and also receives the gaze information from gaze detector 501, causing gate and segmenter component 507 to “segment[] the audio signals and pair[] audio segments to changes in gaze information.” Parker ¶ 67. “For example, the user may speak one series of commands while looking at a first object and then speak other commands when looking at a second object. Gate and segmenter component 507 links each set of commands with the appropriate object.” Parker ¶ 67. As a further accuracy enhancement, “[t]he gaze detection information is provided to context engine 509, which may influence speech recognition. Context engine 509 understands how the gaze information relates to certain components, objects, or sections of a display and can apply changes in the user's gaze to the operation of the speech recognizer 508.” Parker ¶ 68.
and controlling the target window based on the speech instruction.
“By identifying the area of the display that the driver is focused on, the system can respond with a result that the user expects, such as a new display with additional objects that are related to the area of the driver's focus.” Parker ¶ 40. For example, in the case of a navigation application that presents a map on display 102, “gaze detector 112 determines that the driver is looking at the map display 102 [and] sends gaze information to user interface controller 103 to identify the map as the driver's point of focus,” so that “[i]f the driver then says ‘find home,’ the speech detector 107 recognizes the command and provides it to the navigation application via user interface controller 103,” which responds by updating the map application with the route. Parker ¶ 40.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to improve Burton’s machine 1000 in the same way that Parker improved its very similar system 101, i.e., by installing it in a vehicle. One would have been motivated to follow Parker’s teaching of installing such a system in a vehicle because such a system “can minimize the amount of time that the driver has to look inside the vehicle and away from the road.” Parker ¶ 57.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 C.F.R. § 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 C.F.R. § 1.17(a)) pursuant to 37 C.F.R. § 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Justin R. Blaufeld whose telephone number is (571)272-4372. The examiner can normally be reached M-F 9:00am - 4:00pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James K. Trujillo, can be reached at (571) 272-3677. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Justin R. Blaufeld
Primary Examiner
Art Unit 2151
/Justin R. Blaufeld/Primary Examiner, Art Unit 2151
1 Even though the rejection of claims 1 and 11 under 35 U.S.C. § 102 over Burton is maintained, the new grounds of rejection for claims 2, 9, 12, and 19 involving the Burton reference were necessitated by the amendment to independent claims 1 and 11, because claims 2, 9, 12, and 19 were previously rejected over a different reference (Parker) that the present amendment to claims 1 and 11 overcomes.