DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the communications filed on 06/27/2025, claim 1 is amended and claims 3 and 5-6 are cancelled. Claims 1-2, 4, and 7-23 are pending in this examination.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. This examination is in response to US Patent Application No. 17/447,929.
Examiner Note
Claim 20 recites “a computer readable storage medium”. The computer readable storage medium is described in Paragraph 71 as: The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Applicant is encouraged to review the relevant references mentioned at the conclusion section of this office action.
Response to Arguments
Applicant's arguments filed 06/27/2025 have been fully considered but they are not persuasive:
On pages 7-9 of the remarks filed on 06/27/2025, Applicant addresses the 35 U.S.C. 112(a) and 112(b) rejections of claim 1 for the limitations “statistically significant differences” and “wherein analyzing comprises detecting and identifying a polarized light at different intensities directed from a laser entering a workspace located within the location and hitting the one or more other IoT devices, and wherein performing the ameliorative action comprises: determining that the audio command was not authorized based on both detecting and identifying the polarized light at different intensities directed from a laser intrusion entering a workspace located within the location and hitting the one or more other IoT devices and the anomaly; sending an alert utilizing an alert system to notify an authorized owner of the first IoT device about possible attacks; and delaying processing of the audio command until the authorized owner approves the audio command”. Applicant submits that, with regard to the first issue raised in the last full paragraph of page 4 of the Office Action, the phrase “statistically significant differences”, when read in the context of the entire instant application, including the specification, is clear to those of ordinary skill in the art. With regard to the second issue, raised in the paragraph bridging pages 4-5 of the Office Action, Applicant submits that the objected-to language is no longer in the claims as amended because the phrase “and or” is no longer recited. Consequently, according to Applicant, the claims as amended, when viewed in light of the specification, are clear because the words and phrases are clear in the context of the specification.
The Examiner respectfully disagrees with Applicant's arguments regarding claim 1 on pages 7-9 of the remarks filed on 06/27/2025.
Regarding the first issue, the phrase “statistically significant differences” would not be clear to those of ordinary skill in the art, because one of ordinary skill would not be able to determine the magnitude of difference that qualifies as statistically significant, and the specification does not provide any details on how it is measured.
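For illustration only, the following sketch shows why the magnitude matters: the same measured audio delay is or is not flagged as an anomaly depending on which significance threshold is chosen. The sketch is not taken from the specification or from any cited reference. The z-score test, the function names, and the sample delay values are all assumptions made for this example.

```python
from statistics import mean, stdev


def delay_z_score(baseline_delays_ms, observed_delay_ms):
    """Return how many sample standard deviations the observed delay
    lies from the mean of the baseline delays (assumed test statistic)."""
    mu = mean(baseline_delays_ms)
    sigma = stdev(baseline_delays_ms)
    return (observed_delay_ms - mu) / sigma


def is_anomalous(baseline_delays_ms, observed_delay_ms, z_threshold):
    """Flag the observed delay when its absolute z-score exceeds the threshold."""
    return abs(delay_z_score(baseline_delays_ms, observed_delay_ms)) > z_threshold


# Hypothetical baseline of expected audio delays (milliseconds) and one new observation.
baseline = [10.0, 12.0, 11.0, 9.0, 13.0]
observed = 15.0

flag_at_2 = is_anomalous(baseline, observed, z_threshold=2.0)
flag_at_3 = is_anomalous(baseline, observed, z_threshold=3.0)
```

Under these assumed values the observation is flagged at a threshold of 2.0 but not at 3.0, so the outcome turns entirely on a parameter that the claim language leaves open.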
Regarding the second issue, the claimed limitation still does not indicate that the first IoT device is also hit by the laser before the alert is sent and the owner is notified of a possible attack; it recites only that the one or more other IoT devices are hit by the laser.
The Examiner maintains the rejections on both issues.
Applicant submits the following on pages 11-12 of the remarks filed on 06/27/2025, regarding claim 1:
For example, the Ady, Riaz, Tian, Takeshi Sugawara, Snyder and/or Amman references do not disclose or suggest the claimed invention because Ady, Riaz, Tian, Takeshi, Snyder and/or Amman do not disclose or suggest the claimed elements of sending an alert utilizing an alert system to notify an owner about possible attacks; and delaying processing of the audio command until the authorized owner approves the audio command. The references are silent regarding delaying processing of an audio command until an authorized owner approves an audio command.
The Examiner respectfully disagrees with Applicant's arguments regarding claim 1 on pages 11-12 of the remarks filed on 06/27/2025.
Snyder discloses delaying processing of the audio command until the authorized owner approves the audio command [¶59, The term "as-live" refers to a live media production that has been recorded for a delayed broadcast over traditional or network mediums. The delay period is typically a matter of seconds and is based on a number of factors. For example, a live broadcast may be delayed to grant an editor sufficient time to approve the content or edit the content to remove objectionable subject matter], and [¶112, If audio commands are assigned to a variable, an audio object 110 must be selected. For audio, the property page field(s) includes an audio command field, audio control channel field, audio preset field, cross-fade group field, and audio grouping field. The audio command field includes the instructions for controlling an audio device. Audio commands include, but are not limited to, fade up, fade down, cross-fade-up, and cross-fade-down], and [¶57, … Media productions also include live or recorded audio (including radio broadcast) …].
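For illustration only, the concept of holding content until an approver releases it can be sketched as a hold-and-release queue. The sketch is not taken from Snyder or from the application. The class name PendingCommandQueue and all of its methods are hypothetical.

```python
class PendingCommandQueue:
    """Holds commands until an approver releases or rejects them (hypothetical)."""

    def __init__(self):
        self._held = {}      # command id -> command text
        self._next_id = 0

    def hold(self, command):
        """Delay processing: store the command and return its id."""
        command_id = self._next_id
        self._next_id += 1
        self._held[command_id] = command
        return command_id

    def approve(self, command_id):
        """Release a held command for processing once it has been approved."""
        return self._held.pop(command_id)

    def reject(self, command_id):
        """Discard a held command without processing it."""
        self._held.pop(command_id, None)

    def pending(self):
        """Return the commands still awaiting approval, oldest first."""
        return [self._held[k] for k in sorted(self._held)]


queue = PendingCommandQueue()
first = queue.hold("unlock front door")
second = queue.hold("fade up")
queue.reject(second)              # never processed
released = queue.approve(first)   # processed only after approval
```

In this sketch nothing is processed at the moment it is received; a command leaves the queue only through an explicit approval or rejection.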
Applicant submits on pages 12-13 of the remarks filed on 06/27/2025, regarding claim 1, that the Ady, Riaz, Tian, Takeshi Sugawara, Snyder and/or Amman references do not disclose or suggest the claimed invention because Ady, Riaz, Tian, Takeshi, Snyder and/or Amman do not disclose or suggest the claimed elements of performing an ameliorative action in response to an anomaly identified during the analyzing, wherein the anomaly comprises statistically significant differences in expected patterns of audio delay.
The Examiner respectfully disagrees with Applicant's arguments regarding claim 1 on pages 12-13 of the remarks filed on 06/27/2025.
ADY discloses this limitation as: [¶34, Alternatively or in addition to responding to tactile control inputs during this version of a privacy mode, the always-on privacy mode utility 122 can configure the first user device 100 to respond to voice commands that are determined to be made directly within hand-held or earpiece proximity. For example, the audio amplitude and delay analyzer utility 128 can respond to the at least one sound receiving component 149 receiving an audible command by measuring a volume magnitude of the audible command. Further, the audio amplitude and delay analyzer utility 128 can compare the volume magnitude to a loudness threshold that is pre-selected to indicate when a user is speaking directly into the first user device 100. In response to the volume magnitude exceeding the loudness threshold, the voice-activated information assistant 130 is allowed to process and respond to the audible command], and [¶40, In one or more embodiments, the audible acknowledgement may be the same from two or more user devices and thus the uniqueness of the audible acknowledgement as the basis of distinguishing the audible acknowledgement originating from the first user device 100 from another user device is questionable. However, the audio amplitude and delay analyzer 128 can discern sound qualities that can be used to differentiate the first user device 100 from another user device. For example, due to processing delays by the first user device 100, another user device can respond first to the voice activation. Thus, even with a delay due to the time required for the sound to travel from the other user device to the first user device 100, a delay may be insufficient to distinguish when using one sound receiving component 149. In this instance, the audio amplitude variance between the two sources can be used to determine whether the source is the first user device 100 or another user device. 
In another instance, the first user device 100 can be in close proximity to another user device 100 such that delay and amplitude are not significantly different. However, the audio amplitude and delay analyzer utility 128 can detect differences between what the front microphone 150 and the back microphone 152 detects, and these differences can be utilized to distinguish the two audio input sources 210. In yet another example, due to the volume settings of the first user device 100 and another user device, the amplitude detected can be the same at the at least one sound receiving component 149. However, the distance can impart a delay that is detectable by the amplitude and delay analyzer utility 128], and [¶44, In one or more embodiments, the user interface 134 displays and audibly responds to the authorized user 110 as depicted. For example, a privacy announcement 320, a challenge query 323 and a tactile input control 325 can be displayed on the user interface 134 and/or interfaced with as depicted at 110'. Similarly, the user interface 134 allows or in some instances requires direct interaction with the user interface 134 within earpiece proximity. Thresholds for audible volume and sensitivity for receiving audible voice commands can be pre-determined or adjusted to constrain the size of the loudspeaker proximity 315. For example, a third user device 305 and a third user 309 can be outside of the loudspeaker proximity 315. 
Thus, while third user device 305 may also detect voice activation signal 102, an audible acknowledgement 106 by the third user device 305 may have a variance in volume magnitude or time delay in arrival at the first user device 100 as to be undetectable], and [¶56, If the determination is that the second timer has expired in block 510, then the always-on privacy mode utility 122 accesses the privacy settings in block 512 with regard to the one or more manners in which the always-on privacy mode utility 122 is configured to determine whether an authorized user has been verified. In a first illustrative determination, the always-on-privacy mode utility 122 accesses settings regarding verification based upon the authorized user speaking directly into the first user device 100. A determination is made in decision block 514 as to whether this first verification setting is enabled. If enabled in decision block 514, then the always-on privacy mode utility 122 compares the volume magnitude of the received confirmation as measured by the audio amplitude and delay analyzer utility 128 to a threshold. After the comparison in block 515, the always-on privacy mode utility 122 makes a determination in decision block 516 as to whether the confirmation response is verified based on the comparison.], and [ claim 3, The method of claim 1 wherein processing and responding to a received audible command comprises: measuring a volume magnitude of the audible command; comparing the volume magnitude to a loudness threshold that is pre-selected to indicate when a user is speaking directly into the user device; and in response to the volume magnitude exceeding the loudness threshold, processing and responding to the received audible command], and [¶¶ 35-39, 44].
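For illustration only, the two checks described in the ADY passages quoted above (a volume magnitude compared to a loudness threshold, and a delay imparted by distance) can be sketched as follows. The sketch is not ADY's implementation. The function names, the numeric values, and the assumed speed of sound are illustrative assumptions.

```python
# Approximate speed of sound in air at about 20 degrees C (assumed value).
SPEED_OF_SOUND_M_PER_S = 343.0


def exceeds_loudness_threshold(volume_magnitude, loudness_threshold):
    """Return True when the measured volume exceeds the pre-selected threshold,
    i.e. when the command would be processed under the ADY ¶34 comparison."""
    return volume_magnitude > loudness_threshold


def propagation_delay_ms(distance_m):
    """Return the time, in milliseconds, for sound to travel distance_m metres."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


# Hypothetical readings: a nearby speaker and a device 3.43 m away.
near_user_ok = exceeds_loudness_threshold(volume_magnitude=0.8, loudness_threshold=0.5)
far_user_ok = exceeds_loudness_threshold(volume_magnitude=0.3, loudness_threshold=0.5)
delay_from_other_device = propagation_delay_ms(3.43)
```

With the assumed speed of sound, a source 3.43 m away arrives roughly 10 ms later, which is the kind of distance-imparted delay that ¶40 describes as detectable.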
Furthermore, Takeshi discloses: [page 2644, section 6.4, Acoustic Stealthiness. To tackle the issue of the device owner hearing the targeted device acknowledging the execution of voice command (or asking for a PIN number during the brute forcing process), the attacker (equated to intruder) can start the attack by asking the device to lower its speaker volume. For some devices (EcoBee, Google Nest Camera IQ, and Fire TV), the volume can be reduced to completely zero, while for other devices it can be set to barely-audible levels. Moreover, the attacker can also abuse device features to achieve the same goal. For Google Assistant, enabling the “do not disturb mode” mutes reminders, broadcast messages and other spoken notifications. For Amazon Echo devices, enabling “whisper mode” significantly reduces the volume of the device responses during the attack to almost inaudible levels], and [ Page 2631, section 1.1, Attack Stealthiness and Cheap Setup. We then show how an attacker can build a cheap yet effective injection setup, using commercially available laser pointers and laser drivers. Moreover, by using infrared lasers and abusing volume features (e.g., whisper mode for Alexa devices) on the target device, we show how an attacker can mount a light-based audio injection attack while minimizing the chance of discovery by the target’s legitimate owner.].
RIAZ further discloses this limitation as: [ABSTRACT A method and system that utilize perceptual quality of service (QoS) metrics to determine whether to admit new calls onto a VoIP network is described. Perceptual QoS metrics are generated at a communications device, such as a telephone, that represent the measurement of perceptible variations in the quality level of a voice signal, or audio and video signal received at the communications device. A call admission threshold is generated from the perceptual QoS metrics for each node in the network that has communications devices attached thereto. When a request to admit a new call on the network through a node is received, the call admission threshold is compared with a priority value associated with the request to determine whether to admit the new call onto the network], and [ Page 7, Figure 2 is a flow chart illustrating the operation of the VoIP network 100 concerning the admittance of a new call onto the network through a network gateway or node according to one embodiment of the present invention. In block 210, perceptual QoS agents located at various voice terminals distributed over the network measure and generate one or more perceptual QoS metrics that relate to perceptible variations in the quality of the voice stream. In the simplest form, the resulting metrics may comprise a single number indicating the overall quality of the voice signal, or the metrics may comprise a more sophisticated set of data that individually describes the various anomalies within the voice signal, such as delay, choppiness, and speech level variation…], and [Pages 8-9, In yet another embodiment wherein the threshold value assigned to the node represents either an acceptable or unacceptable perceptual QoS on that node, no priority value may be assigned to the call request. 
Rather, the call will be automatically admitted if the perceived QoS level is acceptable, and block is the perceived QoS level on an associated node is not acceptable. Typically, the VoIP networks of the present invention are designed to handle a certain volume of calls with an acceptable perceptual QoS. Accordingly, most of the time, the call admission thresholds for the various nodes and gateways on the network are extremely low and few, if any, calls are blocked. The call admission algorithms are therefore of particular importance when the capacity of the network or portions of the network is taxed. By using call admission algorithms based on perceptual QoS metrics that are directly related to the quality of the call as would be perceived by a user, the number of calls serviced by a portion of the network experiencing a high level of traffic can be maximized without causing unnecessary caller frustration due to poor call quality. This is in contrast to many prior art VoIP networks that assign hard limits to the various nodes and gateways indicating the maximum number of calls it will support], and [claim 7. The method according to claim 1, wherein the said perceptible anomalies are selected from the group consisting of: audio distortion, audio choppiness, audio jitter and variations in audio level].
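For illustration only, the comparison described in the RIAZ abstract (a call admission threshold compared with a priority value associated with the request) can be sketched as follows. The quoted passages do not state the direction of the comparison or how the threshold is derived from the perceptual QoS metrics, so the rule below (admit when the priority is at least the threshold) and the function name admit_call are assumptions.

```python
def admit_call(call_admission_threshold, request_priority):
    """Admit the new call when its priority meets or exceeds the node's
    call admission threshold (assumed comparison direction)."""
    return request_priority >= call_admission_threshold


# Hypothetical values: a lightly loaded node has a low threshold,
# a heavily loaded node has a high one.
admitted_on_quiet_node = admit_call(call_admission_threshold=0.1, request_priority=0.5)
admitted_on_busy_node = admit_call(call_admission_threshold=0.9, request_priority=0.5)
```

This assumed direction is consistent with the quoted statement that thresholds are usually extremely low and few calls are blocked.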
The Examiner maintains the rejection.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION. —The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-2, 4, and 7-23 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant regards as the invention.
The independent claims 1, 18, and 20 recite “statistically significant differences”, which renders the claims indefinite. The term “statistically significant differences” is not defined by the claims, and the specification does not clearly explain how this term is defined or how it is calculated.
The independent claims 1, 18, and 20 recite “wherein analyzing comprises detecting and identifying a polarized light at different intensities directed from a laser entering a workspace located within the location and hitting the one or more other IoT devices, and wherein performing the ameliorative action comprises: determining that the audio command was not authorized based on both detecting and identifying the polarized light at different intensities directed from a laser intrusion entering a workspace located within the location and hitting the one or more other IoT devices and the anomaly; sending an alert utilizing an alert system to notify an authorized owner of the first IoT device about possible attacks; and delaying processing of the audio command until the authorized owner approves the audio command”, which renders the claims indefinite. This limitation states that a polarized light directed from a laser entering the workspace within the location and hitting the one or more other IoT devices is detected and identified. It is unclear why the “one or more other IoT devices” are the devices being hit and analyzed. The analysis should address whether the first IoT device is hit by the laser and whether an attack on that device is causing the audio delay, because it is the owner of the first IoT device who receives an alert that the first IoT device is under attack, and processing of the audio command is then delayed until that owner approves it. In other words, the audio delay is analyzed on the first IoT device, not on the one or more other IoT devices.
Claims 2, 4, 7-17, 19, and 21-23 do not cure the deficiency of claims 1, 18, and 20 and are rejected under 35 USC 112, 2nd paragraph, for their dependency upon claims 1, 18, and 20.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL. —The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-2, 4, and 7-23 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement.
The independent claims 1, 18, and 20 contain the limitation “statistically significant differences”, which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA the inventor(s), at the time the application was filed, had possession of the claimed invention.
Applicant is kindly requested to show the examiner support in the original disclosure for the new or amended claims. See MPEP 714.02 and 2163.06 (“Applicant should specifically point out the support for any amendments made to the disclosure”).
Claims 2, 4, 7-17, 19, and 21-23 do not cure the deficiency of claims 1, 18, and 20 and are rejected under 35 USC 112, 1st paragraph, for their dependency upon claims 1, 18, and 20.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 4, 7-14, 16 and 18-23 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication No. US2014/0372126 to ADY, in view of SMAILZADEH RIAZ (WO 0232097 A2), hereinafter “RIAZ”, further in view of US Patent Application Publication No. US2018/0047394 to Tian, further in view of NPL: “Light Commands: Laser-Based Audio Injection Attacks on Voice-Controllable Systems” by Takeshi Sugawara, hereinafter “Takeshi”, further in view of Hodge (US10027797), and further in view of SNYDER ROBERT (US20040210945), hereinafter “Snyder”.
Regarding claim 1, ADY discloses a method for preventing attacks of audio-based virtual assistants, comprising [¶37, As a first example, the always-on privacy mode utility 122 can configure the first user device 100 to generate a challenge requesting confirmation that the first pre-established, audible activation command originated from an authorized user of the first user device 100 by producing, via the at least one sound producing component 159, a challenge query that solicits the pre-established confirmation response 146 as an audible response detectable within the loudspeaker proximity of the first user device 100], and [¶¶36, 39]; and
wherein the analyzing includes comparing the audio command received by the first IoT device to audio received by the one or more other IoT devices including computing an audio delay [¶34, Alternatively or in addition to responding to tactile control inputs during this version of a privacy mode, the always-on privacy mode utility 122 can configure the first user device 100 to respond to voice commands that are determined to be made directly within hand-held or earpiece proximity. For example, the audio amplitude and delay analyzer utility 128 can respond to the at least one sound receiving component 149 receiving an audible command by measuring a volume magnitude of the audible command. Further, the audio amplitude and delay analyzer utility 128 can compare the volume magnitude to a loudness threshold that is pre-selected to indicate when a user is speaking directly into the first user device 100. In response to the volume magnitude exceeding the loudness threshold, the voice-activated information assistant 130 is allowed to process and respond to the audible command], and [¶37, As a first example, the always-on privacy mode utility 122 can configure the first user device 100 to generate a challenge requesting confirmation that the first pre-established, audible activation command originated from an authorized user of the first user device 100 by producing, via the at least one sound producing component 159, a challenge query that solicits the pre-established confirmation response 146 as an audible response detectable within the loudspeaker proximity of the first user device 100. The at least one sound receiving component 149 receives an audible confirmation response to the challenge query produced. 
The audio query utility 124 verifies that the first pre-established, audible activation command originated from an authorized user 110 of the first user device 100 by comparing the received audible confirmation response to the pre-established confirmation response 146 for a match. For example, the received audible confirmation response 146 can be a pre-selected one of a specific identifier assigned to the first user device 100 and a pre-recorded name of the authorized user. The multiple device detection (MDD) utility 232 can then respond by enabling the voice-activated information assistant 130 to respond to a subsequent voice command], and [¶44, In one or more embodiments, the user interface 134 displays and audibly responds to the authorized user 110 as depicted. For example, a privacy announcement 320, a challenge query 323 and a tactile input control 325 can be displayed on the user interface 134 and/or interfaced with as depicted at 110'. Similarly, the user interface 134 allows or in some instances requires direct interaction with the user interface 134 within earpiece proximity. Thresholds for audible volume and sensitivity for receiving audible voice commands can be pre-determined or adjusted to constrain the size of the loudspeaker proximity 315. For example, a third user device 305 and a third user 309 can be outside of the loudspeaker proximity 315. Thus, while third user device 305 may also detect voice activation signal 102, an audible acknowledgement 106 by the third user device 305 may have a variance in volume magnitude or time delay in arrival at the first user device 100 as to be undetectable]; and
performing an ameliorative action in response to an anomaly identified during the analyzing, wherein the anomaly comprises statistically significant differences in expected patterns of audio delay.
ADY discloses this limitation as: [¶34, Alternatively or in addition to responding to tactile control inputs during this version of a privacy mode, the always-on privacy mode utility 122 can configure the first user device 100 to respond to voice commands that are determined to be made directly within hand-held or earpiece proximity. For example, the audio amplitude and delay analyzer utility 128 can respond to the at least one sound receiving component 149 receiving an audible command by measuring a volume magnitude of the audible command. Further, the audio amplitude and delay analyzer utility 128 can compare the volume magnitude to a loudness threshold that is pre-selected to indicate when a user is speaking directly into the first user device 100. In response to the volume magnitude exceeding the loudness threshold, the voice-activated information assistant 130 is allowed to process and respond to the audible command], and [¶40, In one or more embodiments, the audible acknowledgement may be the same from two or more user devices and thus the uniqueness of the audible acknowledgement as the basis of distinguishing the audible acknowledgement originating from the first user device 100 from another user device is questionable. However, the audio amplitude and delay analyzer 128 can discern sound qualities that can be used to differentiate the first user device 100 from another user device. For example, due to processing delays by the first user device 100, another user device can respond first to the voice activation. Thus, even with a delay due to the time required for the sound to travel from the other user device to the first user device 100, a delay may be insufficient to distinguish when using one sound receiving component 149. In this instance, the audio amplitude variance between the two sources can be used to determine whether the source is the first user device 100 or another user device. 
In another instance, the first user device 100 can be in close proximity to another user device 100 such that delay and amplitude are not significantly different. However, the audio amplitude and delay analyzer utility 128 can detect differences between what the front microphone 150 and the back microphone 152 detects, and these differences can be utilized to distinguish the two audio input sources 210. In yet another example, due to the volume settings of the first user device 100 and another user device, the amplitude detected can be the same at the at least one sound receiving component 149. However, the distance can impart a delay that is detectable by the amplitude and delay analyzer utility 128], and [¶44, In one or more embodiments, the user interface 134 displays and audibly responds to the authorized user 110 as depicted. For example, a privacy announcement 320, a challenge query 323 and a tactile input control 325 can be displayed on the user interface 134 and/or interfaced with as depicted at 110'. Similarly, the user interface 134 allows or in some instances requires direct interaction with the user interface 134 within earpiece proximity. Thresholds for audible volume and sensitivity for receiving audible voice commands can be pre-determined or adjusted to constrain the size of the loudspeaker proximity 315. For example, a third user device 305 and a third user 309 can be outside of the loudspeaker proximity 315. 
Thus, while third user device 305 may also detect voice activation signal 102, an audible acknowledgement 106 by the third user device 305 may have a variance in volume magnitude or time delay in arrival at the first user device 100 as to be undetectable], and [¶56, If the determination is that the second timer has expired in block 510, then the always-on privacy mode utility 122 accesses the privacy settings in block 512 with regard to the one or more manners in which the always-on privacy mode utility 122 is configured to determine whether an authorized user has been verified. In a first illustrative determination, the always-on-privacy mode utility 122 accesses settings regarding verification based upon the authorized user speaking directly into the first user device 100. A determination is made in decision block 514 as to whether this first verification setting is enabled. If enabled in decision block 514, then the always-on privacy mode utility 122 compares the volume magnitude of the received confirmation as measured by the audio amplitude and delay analyzer utility 128 to a threshold. After the comparison in block 515, the always-on privacy mode utility 122 makes a determination in decision block 516 as to whether the confirmation response is verified based on the comparison.], and [ claim 3, The method of claim 1 wherein processing and responding to a received audible command comprises: measuring a volume magnitude of the audible command; comparing the volume magnitude to a loudness threshold that is pre-selected to indicate when a user is speaking directly into the user device; and in response to the volume magnitude exceeding the loudness threshold, processing and responding to the received audible command], and [¶¶ 35-39, 44].
Examiner Note: Furthermore, Takeshi discloses: [page 2644, section 6.4, Acoustic Stealthiness. To tackle the issue of the device owner hearing the targeted device acknowledging the execution of voice command (or asking for a PIN number during the brute forcing process), the attacker (equated to intruder) can start the attack by asking the device to lower its speaker volume. For some devices (EcoBee, Google Nest Camera IQ, and Fire TV), the volume can be reduced to completely zero, while for other devices it can be set to barely-audible levels. Moreover, the attacker can also abuse device features to achieve the same goal. For Google Assistant, enabling the “do not disturb mode” mutes reminders, broadcast messages and other spoken notifications. For Amazon Echo devices, enabling “whisper mode” significantly reduces the volume of the device responses during the attack to almost inaudible levels], and [ Page 2631, section 1.1, Attack Stealthiness and Cheap Setup. We then show how an attacker can build a cheap yet effective injection setup, using commercially available laser pointers and laser drivers. Moreover, by using infrared lasers and abusing volume features (e.g., whisper mode for Alexa devices) on the target device, we show how an attacker can mount a light-based audio injection attack while minimizing the chance of discovery by the target’s legitimate owner.].
RIAZ further discloses this limitation as: [ABSTRACT A method and system that utilize perceptual quality of service (QoS) metrics to determine whether to admit new calls onto a VoIP network is described. Perceptual QoS metrics are generated at a communications device, such as a telephone, that represent the measurement of perceptible variations in the quality level of a voice signal, or audio and video signal received at the communications device. A call admission threshold is generated from the perceptual QoS metrics for each node in the network that has communications devices attached thereto. When a request to admit a new call on the network through a node is received, the call admission threshold is compared with a priority value associated with the request to determine whether to admit the new call onto the network], and [ Page 7, Figure 2 is a flow chart illustrating the operation of the VoIP network 100 concerning the admittance of a new call onto the network through a network gateway or node according to one embodiment of the present invention. In block 210, perceptual QoS agents located at various voice terminals distributed over the network measure and generate one or more perceptual QoS metrics that relate to perceptible variations in the quality of the voice stream. In the simplest form, the resulting metrics may comprise a single number indicating the overall quality of the voice signal, or the metrics may comprise a more sophisticated set of data that individually describes the various anomalies within the voice signal, such as delay, choppiness, and speech level variation…], and [Pages 8-9, In yet another embodiment wherein the threshold value assigned to the node represents either an acceptable or unacceptable perceptual QoS on that node, no priority value may be assigned to the call request. 
Rather, the call will be automatically admitted if the perceived QoS level is acceptable and block is the perceived QoS level on an associated node is not acceptable. Typically, the VoIP networks of the present invention are designed to handle a certain volume of calls with an acceptable perceptual QoS. Accordingly, most of the time, the call admission thresholds for the various nodes and gateways on the network are extremely low and few, if any, calls are blocked. The call admission algorithms are therefore of particular importance when the capacity of the network or portions of the network is taxed. By using call admission algorithms based on perceptual QoS metrics that are directly related to the quality of the call as would be perceived by a user, the number of calls serviced by a portion of the network experiencing a high level of traffic can be maximized without causing unnecessary caller frustration due to poor call quality. This is in contrast to many prior art VoIP networks that assign hard limits to the various nodes and gateways indicating the maximum number of calls it will support], and [claim 7. The method according to claim 1, wherein the said perceptible anomalies are selected from the group consisting of: audio distortion, audio choppiness, audio jitter and variations in audio level].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of ADY by incorporating “utilizing perceptual quality of service (QoS) metrics”, as taught by RIAZ. One could have been motivated to do so in order to determine whether to admit a new call onto a network based on the perceptual quality of service of other communications active on the network, as well as the perceptual quality level required by the user requesting the new call, where the perceptual QoS agents located at various voice terminals distributed over the network measure and generate one or more perceptual QoS metrics that relate to perceptible variations in the quality of the voice stream [RIAZ, ¶48].
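For illustration only, the call admission comparison described in the RIAZ abstract can be sketched as follows. This is a minimal sketch and not RIAZ's implementation: the 1-to-5 quality scale, the `acceptable_quality` and `scale` parameters, and the rule that maps quality scores to a threshold are all assumptions introduced here. The sketch only mirrors the cited behavior that the threshold stays low while perceived quality is acceptable and that a request's priority value is compared with the threshold.

```python
from statistics import mean


def call_admission_threshold(quality_scores, acceptable_quality=4.0, scale=10.0):
    """Derive a node's admission threshold from perceptual QoS scores.

    Hypothetical mapping (an assumption, not taken from RIAZ): the threshold
    is zero while mean quality is acceptable and rises as quality degrades.
    """
    if not quality_scores:
        return 0.0
    shortfall = max(0.0, acceptable_quality - mean(quality_scores))
    return shortfall * scale


def admit_call(priority, threshold):
    """Admit the new call when its priority value meets the node's threshold."""
    return priority >= threshold


# Usage: a healthy node admits a low-priority call; a degraded node blocks it.
healthy = call_admission_threshold([4.5, 4.2])
degraded = call_admission_threshold([3.0, 3.0])
print(admit_call(1, healthy), admit_call(5, degraded), admit_call(12, degraded))
```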
While ADY discloses receiving an audio command at a first Internet-of-Things (IoT) device in a location and analyzing the audio command using one or more other IoT devices in the location as:
[¶28, Turning now to FIG. 2, a diagram of an example always-on privacy mode environment 200 is illustrated. When voice activation signal 102 (i.e., first pre-established, audible activation command) is received from an audio input source 210, the first user device 100 employs mechanisms and techniques to discern whether the first user device 100 is in a multiple device environment], and [ ¶¶35, In one or more embodiments of the present disclosure, the always-on privacy mode utility 122 can configure the first user device 100 to continue responding to voice commands with privacy maintained by verifying that such voice commands come from an authorized user 110. To that end, the audio query utility 124 of the always-on privacy mode utility 122 can generate a challenge requesting confirmation that the first pre-established, audible activation command originated from an authorized user of the user device. The at least one sound receiving component 149 receives a confirmation response to the challenge. In response to the received confirmation response being verified by the audio query utility 124 as a pre-established confirmation response that is assigned to the user device 100, the always-on privacy mode utility 122 processes and responds to a received audible command], and [¶¶11, 39].
However, ADY and RIAZ do not explicitly disclose the limitation, whereas Tian discloses: [Abstract, Systems and methods for associating audio signals in an environment surrounding a voice-controlled system include receiving by a voice-controlled system through a microphone, an audio signal from a user of a plurality of users within an environment surrounding the microphone. The voice-controlled system determines a source location of the audio signal. The voice-controlled system determines a first user location of a first user and a second user location of a second user. The voice-controlled system then determines that the first user location correlates with the source location such that the source location and the first user location are within a predetermined distance of each other. In response, the voice-controlled system performs at least one security action associated with the first user providing the audio signal], and [see FIG. 1 and corresponding text for a method for location-based voice recognition], and [see FIG. 6, ¶42 for more details], and [FIG. 10, ¶95, FIG. 10 includes a plurality of user devices 1002, a plurality of voice-controlled devices 1004, a messaging service provider device 1006, and a plurality of networking devices 1008 in communication over a network 1010. Any of the user devices 1002 may be any of the user devices discussed above and operated by the users discussed above. The voice-controlled devices 1004 may be the voice-controlled devices discussed above and may be operated by the users discussed above...].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of ADY and RIAZ by incorporating “an audio source location engine configured with one or more machine learning algorithms to perform supervised machine learning, unsupervised machine learning, semi-supervised learning, reinforcement learning, deep learning, and other machine learning algorithms”, as taught by Tian. One could have been motivated to do so in order to implement an audio source location engine that generates and updates acoustic signatures while the voice-controlled device is in operation. For example, some audio commands may be associated with a source location with a high certainty whereas other audio commands may be associated with a source location with low certainty [Tian, ¶48].
Tian and RIAZ do not explicitly disclose: wherein analyzing comprises detecting and identifying a polarized light at different intensities directed from a laser entering a workspace located within the location and hitting the one or more other IoT devices and wherein performing the ameliorative action comprises determining that the audio command was not authorized based on both detecting and identifying the polarized light at different intensities directed from a laser intrusion entering a workspace located within the location and hitting the one or more other IoT devices and the anomaly.
While ADY discloses [ ¶¶34-40, 44, 56, claim 3, the audio amplitude and delay analyzer utility 128 can respond to the at least one sound receiving component 149 receiving an audible command by measuring a volume magnitude of the audible command. Further, the audio amplitude and delay analyzer utility 128 can compare the volume magnitude to a loudness threshold that is pre-selected to indicate when a user is speaking directly into the first user device 100. In response to the volume magnitude exceeding the loudness threshold, the voice-activated information assistant 130 is allowed to process and respond to the audible command].
Furthermore, Takeshi discloses: [Abstract, examining various products that use Amazon’s Alexa, Apple’s Siri, Facebook’s Portal, and Google Assistant, we show how to use light to obtain control over these devices at distances up to 110 meters and from two separate buildings. Next, we show that user authentication on these devices is often lacking, allowing the attacker (equated to intruder /intrusion) to use light-injected voice commands to unlock the target’s smartlock-protected front doors, open garage doors, shop on e-commerce websites at the target’s expense, or even unlock and start various vehicles connected to the target’s Google account (e.g., Tesla and Ford). Finally, we conclude with possible software and hardware defenses against our attacks], and [ see page 2633. Section 2-1, Voice-Controllable System, and 2.2, Attacks (equated to intrusion) on Voice-Controllable Systems], and [page1634, section 3, Threat Model, Line of Sight. We do assume however that the attacker has remote line of sight access to the target device and its microphones. We argue that such an assumption is reasonable, as voice-activated devices (such as smart speakers, thermostats, security cameras, or even phones) are often left visible to the attacker (equated to intruder), including through closed glass windows], and [Page 2642, Laser Focusing and Aiming. As in Section 5.2, it is impossible to focus the laser using the small lens typically used for laser pointers. We thus mounted the laser to an Opteka 650- 1300 mm telephoto lens. Next, to aim the laser across large distances, we have mounted the telephoto lens on a Manfrotto 410 geared tripod head. This allows us to precisely aim the laser beam on the target device across large distances, achieving an accuracy far exceeding the one possible with regular (non-geared) tripod heads where the attacker’s arm directly moves the laser module. 
Finally, in order to see the laser spot and the device’s microphone ports from far away, we have used a consumer-grade Meade Infinity 102 telescope. As can be seen in Figure 10 (left), the Google Home microphone’s ports are clearly visible through the telescope], and [Page 2634, section 2.6 Laser Sources, Choice of a Laser. A laser is a device that emits a beam of coherent light that stays narrow over a long distance and be focused to a tight spot... in this paper we focus on laser emitting diodes, which are common in consumer laser products such as laser pointers. Next, as the light intensity emitted from a laser diode is directly proportional to the diode’s driving current, we can easily encode analog signals via the beam’s intensity by using a laser driver capable of amplitude modulation], and [page 2644, section 6.4, Acoustic Stealthiness. To tackle the issue of the device owner hearing the targeted device acknowledging the execution of voice command (or asking for a PIN number during the brute forcing process), the attacker (equated to intruder) can start the attack by asking the device to lower its speaker volume. For some devices (EcoBee, Google Nest Camera IQ, and Fire TV), the volume can be reduced to completely zero, while for other devices it can be set to barely-audible levels. Moreover, the attacker can also abuse device features to achieve the same goal. For Google Assistant, enabling the “do not disturb mode” mutes reminders, broadcast messages and other spoken notifications. For Amazon Echo devices, enabling “whisper mode” significantly reduces the volume of the device responses during the attack to almost inaudible levels].
Examiner Note: The Edmund Optics reference “Introduction to Polarization” (Understanding Polarization) indicates that the most common source of polarized light is a laser.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of ADY, RIAZ, and Tian by incorporating “Laser-Based Audio Injection Attacks on Voice-Controllable Systems”, as taught by Takeshi. One could have been motivated to do so because an attacker can inject arbitrary audio signals into a target microphone by aiming amplitude-modulated light at the microphone’s aperture, an effect that leads to a remote voice-command injection attack on voice-controllable systems [Takeshi, ¶48].
RIAZ, Tian, and Takeshi do not explicitly disclose:
and sending an alert utilizing an alert system to notify an authorized owner of the first IoT device about possible attacks
While ADY discloses this limitation as: [Abstract, in response to detecting, the method includes triggering entry into a privacy mode of audible command input and producing a privacy mode announcement via at least one of a display and a sound producing component], and [¶11, In response to detecting at least one second, audible acknowledgement within the pre-set time interval, the user device triggers entry into a privacy mode of audible command input and produces a privacy mode announcement via at least one of a display and a sound producing component].
Furthermore, Hodge discloses: [Col. 8 lines 28-37 … audio analysis module 314 may transmit the audio and associated data to the inmate call monitoring stations 204a and 204b only after an anomaly has been detected in a particular inmate call. For example, by transmitting data after an anomaly has been detected, the audio analysis module 314 allows the inmate call monitoring stations 204a and 204b to receive data for audio calls where suspicious activity may be occurring, such that individuals at the monitoring stations are able to monitor the calls further and take action quickly], and [Col8 lines 28-37, Anomaly detection module 316 analyzes audio and detects call anomalies in phone calls of inmates at the correctional facility. In some embodiments, anomaly detection module 316 is configured to detect any call anomaly or calling event which may indicate that an inmate is engaging in an illicit activity during a phone call (e.g., calling events that are prohibited by the correctional facility). For example, inmates may engage in illicit activities during phone calls, such as by using phone calls to call one or more individuals whom they are not allowed to contact (e.g., judges, attorneys, witnesses, and the like). Anomaly detection module 316 is configured to identify violations by detecting call anomalies], and [Col.9 lines 34-67, Col. 10 lines 1-7, In some embodiments, anomaly detection module 316 performs anomaly detection of inmate phone calls in or near real-time (e.g., as the phone calls are occurring), whereas in other embodiments, anomaly detection module 316 receives audio data for an inmate phone call from communication center 110 and/or from audio analysis module 314 and performs a delayed anomaly detection on the received audio data. 
Upon detection of a call anomaly, anomaly detection module 316 communicates with alarm control module 318 to perform alarm activation and communicates with monitoring subsystem 310 to provide alerts and/or notifications to at least one of the inmate call monitoring stations 204a and 204b. For example, anomaly detection module 316 detects a call anomaly and communicates the detected call anomaly to the alarm control module 318 and the monitoring subsystem 310, and the monitoring subsystem 310 generates one or more notifications that are transmitted to at least one of the inmate call monitoring stations 204a and 204b, in which the one or more notifications indicate that a call anomaly has been detected in a particular inmate's phone call].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of ADY, RIAZ, Tian, and Takeshi by incorporating “Anomaly detection module 316 analyzes audio and detects call anomalies”, as taught by Hodge. One could have been motivated to do so in order to receive audio data from audio analysis module 314 and perform a delayed anomaly detection on the received audio data. Upon detection of a call anomaly, anomaly detection module 316 communicates with alarm control module 318 to perform alarm activation and communicates with monitoring subsystem 310 to provide alerts and/or notifications to at least one of the inmate call monitoring stations 204a and 204b [Hodge, Col. 9 lines 34-67, Col. 10 lines 1-7].
ADY, RIAZ, Tian, Takeshi, and Hodge do not explicitly disclose, but Snyder discloses, delaying processing of the audio command until the authorized owner approves the audio command [¶59, The term "as-live" refers to a live media production that has been recorded for a delayed broadcast over traditional or network mediums. The delay period is typically a matter of seconds and is based on a number of factors. For example, a live broadcast may be delayed to grant an editor sufficient time to approve the content or edit the content to remove objectionable subject matter], and [¶112, If audio commands are assigned to a variable, an audio object 110 must be selected. For audio, the property page field(s) includes an audio command field, audio control channel field, audio preset field, cross-fade group field, and audio grouping field. The audio command field includes the instructions for controlling an audio device. Audio commands include, but are not limited to, fade up, fade down, cross-fade-up, and cross-fade-down], and [¶57, … Media productions also include live or recorded audio (including radio broadcast) …].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of ADY, RIAZ, Tian, Takeshi, and Hodge by incorporating “as-live”, which refers to a live media production that has been recorded for a delayed broadcast, as taught by Snyder. One could have been motivated to do so in order to delay a live broadcast, granting an editor sufficient time to approve the content or edit the content to remove objectionable subject matter [Snyder, ¶¶48, 57].
Regarding claim 2, ADY discloses, wherein the comparing comprises computing a relative volume level [¶34, Alternatively or in addition to responding to tactile control inputs during this version of a privacy mode, the always-on privacy mode utility 122 can configure the first user device 100 to respond to voice commands that are determined to be made directly within hand-held or earpiece proximity. For example, the audio amplitude and delay analyzer utility 128 can respond to the at least one sound receiving component 149 receiving an audible command by measuring a volume magnitude of the audible command. Further, the audio amplitude and delay analyzer utility 128 can compare the volume magnitude to a loudness threshold that is pre-selected to indicate when a user is speaking directly into the first user device 100. In response to the volume magnitude exceeding the loudness threshold, the voice-activated information assistant 130 is allowed to process and respond to the audible command], and [¶44, 56].
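For illustration only, the loudness comparison quoted from ADY ¶34 can be sketched as follows. This is a minimal sketch under stated assumptions and not ADY's implementation: the use of root-mean-square amplitude as the "volume magnitude", the normalized sample range, the function names, and the default threshold value of 0.3 are all hypothetical choices made here.

```python
import math


def volume_magnitude(samples):
    """Root-mean-square amplitude of audio samples normalized to [-1.0, 1.0].

    Using RMS as the volume magnitude is an assumption of this sketch.
    """
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_process_command(samples, loudness_threshold=0.3):
    """Return True only when the measured magnitude exceeds the threshold.

    The threshold stands in for the pre-selected loudness level that indicates
    a user speaking directly into the device; 0.3 is a hypothetical value.
    """
    return volume_magnitude(samples) > loudness_threshold


# Usage: a loud, close-range command is processed; a faint, distant one is not.
near = [0.5, -0.5, 0.5, -0.5]
far = [0.05, -0.05, 0.05, -0.05]
print(should_process_command(near), should_process_command(far))
```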
Regarding claim 4, ADY discloses wherein the analyzing is in response to a receipt of the audio command by the first IoT device [¶¶34, 37, 44].
Regarding claims 7 and 21, ADY discloses, wherein the ameliorative action comprises: not processing, at a first time, the audio command; receiving feedback from the authorized owner about the identified anomaly comprising at least one of negative feedback and positive feedback; and in response to the feedback, processing, at a second time subsequent to the first time, the audio command [¶11, The illustrative embodiments of the present disclosure provide a method and a user device that discriminately provide audible responses to a voice command received by a user device that supports voice activation. According to one aspect, the user device includes: audio receiving mechanisms that detect a first pre-established, audible activation command that activates the user device. In response to detecting the first pre-established, audible activation command, the user device produces a first audible acknowledgement within loudspeaker proximity of the user device. The user device monitors for detection of at least one second, audible acknowledgement produced by another user device within a pre-set time interval. In response to not detecting any second, audible acknowledgement within the pre-set time interval, the user device processes and responds to a received audible command in response to not detecting any second, audible acknowledgement within the pre-set time interval. In response to detecting at least one second, audible acknowledgement within the pre-set time interval, the user device triggers entry into a privacy mode of audible command input and produces a privacy mode announcement via at least one of a display and a sound producing component], and [¶39, In at least one embodiment, the always-on privacy mode utility 122 further configures the user device 100 to receive a control input at a user interface 134 of the user device 100 to perform one of: (a) modifying and (b) adding a pre-established confirmation response 146 assigned to one of the user device(s) 100 and the authorized user. To that end, the at least one sound receiving component 149 receives a new confirmation response from the authorized user 110. The always-on privacy mode utility updates the pre-established confirmation response 146 in the data storage device 142 to match the new confirmation response], and [¶45, In the illustrative scenario, each user device 100, 104 has a corresponding user associated therewith. Specifically, first user device 100 has a corresponding first user 110 and the second user device 104 has a corresponding second user 108. In the illustrative scenario, both of the first and second user devices 100, 104 are configured to monitor for the voice activation signal 102 and to respond with the audible acknowledgement 106. Privacy of the first user 110 can be compromised if the first user device 100 were to audibly disclose private information in response to a voice command that was not intended for the first user device 100, that did not originate from an authorized user, or that was not intended to be a voice command. To address this issue, the always-on privacy mode utility 122 of the first user device 100 prevents inadvertent audible response containing private information that can be overheard by the second user 108], and [¶38].
Regarding claims 8 and 22, ADY discloses, further comprising: receiving a historical data corpus of audio commands from the first IoT device and the one or more other IoT devices [¶30, The server 224 can also provide additional user information from a database 228 and additional query functionality from a received command/query response engine 230], and [see FIG. 2, #146 (pre-established confirmation responses)], and [¶31, the first user device 100 can access locally or remotely stored data or programs within data storage device 142, depicted as containing privacy settings 144 and pre-established confirmation response 146], and [¶37, The audio query utility 124 verifies that the first pre-established, audible activation command originated from an authorized user 110 of the first user device 100 by comparing the received audible confirmation response to the pre-established confirmation response 146 for a match. For example, the received audible confirmation response 146 can be a pre-selected one of a specific identifier assigned to the first user device 100 and a pre-recorded name of the authorized user].
ADY, RIAZ, Takeshi, Hodge, and Snyder do not explicitly disclose, but Tian discloses, training an artificial intelligence model from a historical corpus to analyze the audio command [¶47, In yet another example, the audio source location engine 314 may be configured with one or more machine learning algorithms to perform supervised machine learning, unsupervised machine learning, semi-supervised learning, reinforcement learning, deep learning, and other machine learning algorithms known to one of skill in the art in possession of the present disclosure in determining a source location of a user within an environment. In one example, the audio source location engine may include a supervised machine learning algorithm to calibrate the audio source location engine 314 for a particular environment. The environment where the voice-controlled device 300 is located may have unique acoustic properties. When the voice-controlled device 300 is initiated for the first time, users may be instructed to undergo an initial calibration routine of the voice-controlled device 300. For example, the voice-controlled device 300 may prompt the user to issue a certain set of audio commands at a predetermined location in the environment. The audio command may have unique characteristics that are based on the predetermined location and the environment in which the user is providing the audio commands. The voice-controlled device 300 may generate an acoustic signature based on the audio command provided by the user that is particular to the unique characteristics of the predetermined location. The audio source location engine 314 may compare the acoustic signatures generated during the calibration to subsequent acoustic signatures associated with subsequent audio commands to determine a source location of those subsequent audio commands].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of ADY, RIAZ, Takeshi, Hodge, and Snyder by incorporating “an audio source location engine configured with one or more machine learning algorithms to perform supervised machine learning, unsupervised machine learning, semi-supervised learning, reinforcement learning, deep learning, and other machine learning algorithms”, as taught by Tian. One could have been motivated to do so in order to implement an audio source location engine that generates and updates acoustic signatures while the voice-controlled device is in operation. For example, some audio commands may be associated with a source location with a high certainty whereas other audio commands may be associated with a source location with low certainty [Tian, ¶48].
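For illustration only, the comparison of calibration acoustic signatures with a subsequent signature described in Tian ¶47 can be sketched as follows. This is a minimal sketch and not Tian's implementation: representing a signature as a short tuple of numeric features, using Euclidean distance as the comparison, and the location names and feature values are all assumptions introduced here.

```python
import math


def nearest_location(calibration, signature):
    """Return the calibrated location whose signature is closest, plus the distance.

    `calibration` maps a location name to a feature tuple recorded during the
    initial calibration routine. A small distance suggests higher certainty;
    treating distance as a certainty proxy is an assumption of this sketch.
    """
    name, reference = min(
        calibration.items(), key=lambda item: math.dist(item[1], signature)
    )
    return name, math.dist(reference, signature)


# Hypothetical calibration data: one signature per predetermined location.
calibration = {
    "sofa": (0.9, 0.1, 0.3),
    "kitchen": (0.2, 0.8, 0.5),
}

# Usage: a subsequent audio command is matched to the closest calibrated location.
location, distance = nearest_location(calibration, (0.85, 0.15, 0.3))
print(location, round(distance, 3))
```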
Regarding claims 9 and 23, ADY, RIAZ, Takeshi, Hodge, and Snyder do not explicitly disclose, but Tian discloses, further comprising using the feedback to further train the artificial intelligence model [¶47, In yet another example, the audio source location engine 314 may be configured with one or more machine learning algorithms to perform supervised machine learning, unsupervised machine learning, semi-supervised learning, reinforcement learning, deep learning, and other machine learning algorithms known to one of skill in the art in possession of the present disclosure in determining a source location of a user within an environment. In one example, the audio source location engine may include a supervised machine learning algorithm to calibrate the audio source location engine 314 for a particular environment. The environment where the voice-controlled device 300 is located may have unique acoustic properties. When the voice-controlled device 300 is initiated for the first time, users may be instructed to undergo an initial calibration routine of the voice-controlled device 300. For example, the voice-controlled device 300 may prompt the user to issue a certain set of audio commands at a predetermined location in the environment. The audio command may have unique characteristics that are based on the predetermined location and the environment in which the user is providing the audio commands. The voice-controlled device 300 may generate an acoustic signature based on the audio command provided by the user that is particular to the unique characteristics of the predetermined location. The audio source location engine 314 may compare the acoustic signatures generated during the calibration to subsequent acoustic signatures associated with subsequent audio commands to determine a source location of those subsequent audio commands].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of ADY, RIAZ, Takeshi, Hodge, and Snyder by incorporating “an audio source location engine configured with one or more machine learning algorithms to perform supervised machine learning, unsupervised machine learning, semi-supervised learning, reinforcement learning, deep learning, and other machine learning algorithms”, as taught by Tian. One could have been motivated to do so in order to implement an audio source location engine that generates and updates acoustic signatures while the voice-controlled device is in operation. For example, some audio commands may be associated with a source location with a high certainty whereas other audio commands may be associated with a source location with low certainty [Tian, ¶48].
Regarding claim 10, ADY discloses, further comprising identifying the one or more other IoT devices [¶28, Turning now to FIG. 2, a diagram of an example always-on privacy mode environment 200 is illustrated. When voice activation signal 102 (i.e., first pre-established, audible activation command) is received from an audio input source 210, the first user device 100 employs mechanisms and techniques to discern whether the first user device 100 is in a multiple device environment, depicted as containing a second user device 104. This discernment by the first user device 100 can be triggered by detecting an audible acknowledgement 106 from the second user device 104 to the voice activation signal 102. When in a multiple device environment, another user 108 associated with the second user device 104 may be the source of the voice activation signal 102 rather than an authorized user 110 associated with the first user device 100].
Examiner Note: Tian also discloses this limitation as: [ see FIG. 10 and corresponding text for more details, ¶95, FIG. 10 includes a plurality of user devices 1002, a plurality of voice-controlled devices 1004].
Regarding claim 11, ADY discloses, wherein the identifying comprises receiving a user selection of the one or more other IoT devices [¶28, Turning now to FIG. 2, a diagram of an example always-on privacy mode environment 200 is illustrated. When voice activation signal 102 (i.e., first pre-established, audible activation command) is received from an audio input source 210, the first user device 100 employs mechanisms and techniques to discern whether the first user device 100 is in a multiple device environment, depicted as containing a second user device 104. This discernment by the first user device 100 can be triggered by detecting an audible acknowledgement 106 from the second user device 104 to the voice activation signal 102. When in a multiple device environment, another user 108 associated with the second user device 104 may be the source of the voice activation signal 102 rather than an authorized user 110 associated with the first user device 100].
Examiner Note: Tian also discloses this limitation as: [ see FIG. 10 and corresponding text for more details, ¶95, FIG. 10 includes a plurality of user devices 1002, a plurality of voice-controlled devices 1004].
Regarding claim 12, ADY discloses, wherein the identifying comprises analyzing a historical data corpus to identify the one or more other IoT devices [¶30, The server 224 can also provide additional user information from a database 228 and additional query functionality from a received command/query response engine 230], and [see FIG. 2, #146 (pre-established confirmation response)], and [¶31, the first user device 100 can access locally or remotely stored data or programs within data storage device 142, depicted as containing privacy settings 144 and pre-established confirmation response 146], and [¶37, The audio query utility 124 verifies that the first pre-established, audible activation command originated from an authorized user 110 of the first user device 100 by comparing the received audible confirmation response to the pre-established confirmation response 146 for a match. For example, the received audible confirmation response 146 can be a pre-selected one of a specific identifier assigned to the first user device 100 and a pre-recorded name of the authorized user].
Regarding claim 13, ADY discloses, further comprising determining a security level for the command; and wherein the analyzing is performed in response to the security level exceeding a predetermined threshold [¶34, Alternatively or in addition to responding to tactile control inputs during this version of a privacy mode, the always-on privacy mode utility 122 can configure the first user device 100 to respond to voice commands that are determined to be made directly within hand-held or earpiece proximity. For example, the audio amplitude and delay analyzer utility 128 can respond to the at least one sound receiving component 149 receiving an audible command by measuring a volume magnitude of the audible command. Further, the audio amplitude and delay analyzer utility 128 can compare the volume magnitude to a loudness threshold that is pre-selected to indicate when a user is speaking directly into the first user device 100. In response to the volume magnitude exceeding the loudness threshold, the voice-activated information assistant 130 is allowed to process and respond to the audible command].
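For illustration only, the loudness-threshold gating quoted from ADY ¶34 can be sketched as follows. This is a minimal sketch under stated assumptions, not ADY's implementation: the reference gives neither a numeric threshold nor a way to measure "volume magnitude", so the root-mean-square measure, the value of LOUDNESS_THRESHOLD, and the function names volume_magnitude and should_process_command are all hypothetical.

```python
import math
from typing import Sequence

# Hypothetical value: ADY ¶34 does not state a numeric loudness threshold.
LOUDNESS_THRESHOLD = 0.5


def volume_magnitude(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of samples normalized to [-1.0, 1.0].

    RMS is an assumed measure; the reference only says "volume magnitude".
    """
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_process_command(
    samples: Sequence[float], threshold: float = LOUDNESS_THRESHOLD
) -> bool:
    """Allow processing only when the measured volume exceeds the threshold."""
    return volume_magnitude(samples) > threshold
```

Under these assumptions, a command captured close to the device (large amplitude) passes the gate, while a distant or quiet command does not.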
Regarding claims 14 and 16, further comprising: identifying a change to the location; and calculating an impact of the change on an expected audio profile of the audio command at the one or more other IoT devices:
ADY discloses: [¶28, Turning now to FIG. 2, a diagram of an example always-on privacy mode environment 200 is illustrated. When voice activation signal 102 (i.e., first pre-established, audible activation command) is received from an audio input source 210, the first user device 100 employs mechanisms and techniques to discern whether the first user device 100 is in a multiple device environment], and [¶35, In one or more embodiments of the present disclosure, the always-on privacy mode utility 122 can configure the first user device 100 to continue responding to voice commands with privacy maintained by verifying that such voice commands come from an authorized user 110. To that end, the audio query utility 124 of the always-on privacy mode utility 122 can generate a challenge requesting confirmation that the first pre-established, audible activation command originated from an authorized user of the user device. The at least one sound receiving component 149 receives a confirmation response to the challenge. In response to the received confirmation response being verified by the audio query utility 124 as a pre-established confirmation response that is assigned to the user device 100, the always-on privacy mode utility 122 processes and responds to a received audible command], and [¶¶11, 39].
Furthermore, Tian discloses: [Abstract, Systems and methods for associating audio signals in an environment surrounding a voice-controlled system include receiving by a voice-controlled system through a microphone, an audio signal from a user of a plurality of users within an environment surrounding the microphone. The voice-controlled system determines a source location of the audio signal. The voice-controlled system determines a first user location of a first user and a second user location of a second user. The voice-controlled system then determines that the first user location correlates with the source location such that the source location and the first user location are within a predetermined distance of each other. In response, the voice-controlled system performs at least one security action associated with the first user providing the audio signal], and [¶¶26-27, When multiple users are within an environment surrounding a voice-controlled device, the voice-controlled device may have trouble distinguishing which user is providing a particular audio command, separating multiple audio signals received from different users at the same time, and associating a first audio signal provided by a user with a second audio signal provided by the same user at a later time. These issues become even more troublesome when the users are moving around the environment or in close proximity to each other, as resulting associations between what is being said in the environment based on a source location of the audio signal may be incorrect. [0027] The voice association system of the present disclosure includes a voice-controlled device that may determine the source location of one or more audio signals in its environment that includes multiple users. The voice-controlled device may then determine a user location for each user within the environment. 
If a user location correlates with the source location, as well as with a user identity determined from the audio signal, then the voice-controlled device may associate a particular signal with a particular user within its environment. When source location and/or voice comparison between audio signals are indeterminable, user location and user identity (based on user location) may be used by the voice-controlled device to distinguish the audio signals and associate them with distinct processes associated with each user in the environment], and [see FIG.1 and corresponding text for a method for location-based voice recognition, ¶37], and [see FIG 6 , ¶42 for more details], and [FIG. 10 , ¶95, FIG. 10 includes a plurality of user devices 1002, a plurality of voice-controlled devices 1004, a messaging service provider device 1006, and a plurality of networking devices 1008 in communication over a network 1010. Any of the user devices 1002 may be any of the user devices discussed above and operated by the users discussed above. The voice-controlled devices 1004 may be the voice-controlled devices discussed above and may be operated by the users discussed above...], and [¶48, Similarly, the audio source location engine 314 may be configured with unsupervised machine learning algorithms such that the audio source location engine 314 may generate and update acoustic signatures while the voice-controlled device 300 is in operation. For example, some audio commands may be associated with a source location with a high certainty whereas other audio commands may be associated with a source location with low certainty. If the audio command is associated with a source location with a high certainty (e.g., user's audio command is picked up by multiple microphones, the audio signal has very little noise, and other), then that audio command's acoustic signature may be added to a training set of acoustic signatures], and [¶¶40,58,61, and 63].
Furthermore, Takeshi discloses: [ Page 2646, section 7.3, while line of sight access is often available for smart speakers visible through windows, the situation is different for mobile devices such as smart watches, phones and tablets. This is since unlike static smart speakers, these devices are often mobile, requiring an attacker to quickly aim and inject commands. When combined with the precise aiming and higher laser power required to attack such devices, successful LightCommands attacks might be particularly challenging].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of ADY, RIAZ, Tian, and Takeshi, that is, to combine providing audible responses to a voice command received by a user device that supports voice activation, as taught by ADY, with associating audio signals in an environment surrounding a voice-controlled system, including receiving, by the voice-controlled system through a microphone, an audio signal from a user of a plurality of users within the environment surrounding the microphone and determining a source location of the audio signal, as taught by Tian, and with laser-based audio injection attacks on voice-controllable devices, which require line-of-sight access together with the precise aiming and laser power needed to attack different devices such as smart speakers, smart watches, phones, and tablets, as taught by Takeshi, because different laser beam intensities will have a different impact on each device.
Regarding claims 18 and 20, these claims are interpreted and rejected for the same rationale set forth in claim 1.
Regarding claim 19, ADY discloses wherein the processing unit and the memory are configured into a cloud computing environment [¶24, To support the wireless communication, first user device 100 includes one or more communication components, including wireless wide area network (WWAN) transceiver 165 with connected antenna 166 to communicate with a radio access network (RAN) 168 of a cellular network 169...], and [¶30, the always-on privacy mode environment 200 includes a distributed architecture, depicted as the first user device 100 communicating via a data packet network 222 to a server 224. The communication mechanism 194 of the first user device 100 can communicate with a data packet network access component 226 of the server 224. For example, certain functions such as the text-to-voice conversion module 218 and a voice-to-text conversion module 220 can be downloaded from server 224 or be provided as remote functions on server 224. The server 224 can also provide additional user information from a database 228 and additional query functionality from a received command/query response engine 230], and [¶¶25-27].
Examiner Note: Tian also discloses this limitation as: [ see FIG. 10 and corresponding text for more details, ¶95, FIG. 10 The embodiment of the networked system 1000 illustrated in FIG. 10 includes a plurality of user devices 1002, a plurality of voice-controlled devices 1004, a messaging service provider device 1006, and a plurality of networking devices 1008 in communication over a network 1010.], and [¶97, The network 1010 may be implemented as a single network or a combination of multiple networks. For example, in various embodiments, the network 1010 may include the Internet and/or one or more intranets, landline networks, wireless networks, and/or other appropriate types of networks].
Claims 15 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over US Patent Application Publication No. US2014/0372126 to ADY, in view of SMAILZADEH RIAZ (WO 0232097 A2), hereinafter “RIAZ”, in view of US Patent Application Publication No. US2018/0047394 to Tian, further in view of NPL: “Light Commands: Laser-Based Audio Injection Attacks on Voice-Controllable Systems” by Takeshi Sugawara, hereinafter “Takeshi”, and further in view of US2017/0140777 to Amman.
Regarding claims 15, and 17, ADY, RIAZ, Tian, and Takeshi do not explicitly disclose, however, Amman discloses wherein the change is chosen from the group consisting of an opening of a door and an activation of a heating, air-conditioning and ventilation (HVAC) system [¶5, In a first illustrative embodiment, a system includes a head and torso simulator (HATS) system, configured to play back pre-recorded audio commands while simulating a driver head location as an output location. The system also includes a vehicle speaker system and a processor configured to engage a vehicle heating, ventilation and air-conditioning (HVAC) system. The processor is also configured to play back audio commands through the HATS system while playing back pre-recorded vehicle environment noises through the speaker system. The processor is further configured to determine if the audio commands, recorded by a vehicle microphone, are recognizable in the environment noises and HVAC noises and, for each audio command in a set of commands, repeat the engagement, play back of commands and noises, and determination, recording the results of the determination for each audio command].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of ADY, RIAZ, Tian, and Takeshi by incorporating “tuning speech recognition systems to accommodate ambient noise”, as taught by Amman. One would have been motivated to do so in order to recognize the audio command in the presence of ambient noise [Amman, ¶¶1, 32].
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Velusamy (US2012/0144415) [ [0030] In step 220, detection agent 105 causes data, either data relating to an anomaly detected as described above with respect to step 210, or data requested from server 170 is described with respect to 215, to be sent to server 170. Such data may include statistics and/or other data relating to an anomaly and/or statistics and/or other data relating to a media stream provided over a given period of time, as well as contents of the loopback buffer or other similar mechanism. For example, data provided to server 170 in this step could include a number of packets dropped or delayed, a video or audio channel associated with the data and/or anomaly, an identifier for the content processing device 110, an identifier for network nodes associated with the content processing device 110, such as identifiers for a central office, routers, ONTs 130, OLTs 135, etc., other data identifying one or more anomalies such as those mentioned above, such as a report that a signal has been lost, a frame has been frozen, etc.].
Lee (US2019/0373041) [ [0003] The described techniques relate to improved methods, systems, devices, and apparatuses that support improvising audio dejittering using delay standard deviation from one or more determined values, such as a delay mean. The techniques may relate to determining a delay (e.g., a mean delay) applied to packets in a packet voice communication system. The techniques may further relate to calculating a parameter (e.g., a standard deviation) for at least a subset of packets in a group (e.g., a talk spurt) based on the delay and determining a target delay for the group by applying an adjustment or factor (e.g., a moving average) to the parameter, and applying the target delay to at least one packet from the group].
Berkowitz (US2012/0008760) [ 011…Most Internet applications use connection routing entirely at the IP address layer. In this approach, each IP switching element (called a "router") examines each received packet, examines its IP address, and forwards it towards a destination based on comparing that address with a routing table. This works very well for network traffic that is not particularly delay-sensitive. However, VoIP applications tend to be delay sensitive. For example, if Alice is talking to Bob, and a congested router delays the voice stream from Alice, Bob will hear the problem, either as a skip in the speech, a burst of noise, or some other anomaly…].
Doshi (US6219339) [ (25) Upon receiving the first packet of a call, the receiver waits for an initial period of time, referred to herein as the "build-out" delay, before reconstructing and playing out the audio signal during a connection, or call. Once the build-out delay has passed, the receiver reconstructs the audio signal using the recovered sequence numbers to re-order received packets for the duration of the connection. Unfortunately, the use of sequence numbers, by themselves, and a single build-out delay for the entire call does not mitigate other anomalies present in packet voice systems due to packet delay and packet loss].
Sanborn (US11393471) [Col.7, lines 49-52, ...a machine learning model may be trained to recognize whispered speech based on resonance, volume, and/or other features of the audio data 211], and [ Col. 9 lines 29-43, Various machine learning techniques may be used to train and/or operate the machine learning models usable by the speech characteristics detector 285. In machine learning techniques, component is “trained” by repeatedly providing it examples of data and how the data should be processed using an adaptive model until it can consistently identify how a new example of the data should be processed, even if the new example is different from the examples included in the training set. Getting an adaptive model to consistently identify a pattern is in part dependent upon providing the component with training data that represents the desired decision features in such a way that patterns emerge. Providing data with consistent patterns and recognizing such patterns when presented with new and different data is within the capacity of today's systems].
EP3133472 [ Voice communication 108 may also include (e.g., capture) background noise 110 that may be present at the location of HVAC component 106 while user 102 is speaking. Background noise 110 can include a number of different types of background noise. For example, background noise 110 can include a number of different noise levels that come from a number of different sources, such as, for example, noisy equipment, among other sources. Voice communication 108 may also include a command (e.g., trigger phrase) from user 102 to activate wireless interface 124 and/or MUI system 130. That is, user 102 can activate wireless interface 124 and/or MUI system 130 using the command (e.g., wireless interface 124 and/or MUI system 130 can activate upon receiving the command). MUI system 130 will be further described herein. The command can be, for example, "start voice control". However, embodiments of the present disclosure are not limited to a particular command].
WO2013/049248A2 [ [00948] The touchless or contactless biometric data gathering discussed above may be controlled in several ways, such as the control techniques discussed else in this disclosure. For example, in one embodiment, a user may initiate a data-gathering session by pressing a touch pad on the glasses, or by giving a voice command. In another embodiment, the user may initiate a session by a hand movement or gesture or using any of the control techniques described herein. Any of these techniques may bring up a menu, from which the user may select an option, such as "begin data gathering session," "terminate data-gathering session," or "continue session." If a data- gathering session is selected, the computer-controlled menu may then offer menu choices for number of cameras, which cameras, and so forth, much as a user selects a printer. There may also be modes, such as a polarized light mode, a color filter mode, and so forth. After each selection, the system may complete a task or offer another choice, as appropriate. User intervention may also be required, such as turning on a source of polarized light or other light source, applying filters or polarizers, and so forth], and [¶984].
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
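The timing rule stated above can be sketched as a small date calculation. This is an illustrative sketch only, under stated assumptions: the function names add_months and shortened_period_end are hypothetical, month arithmetic clamps to the last day of a shorter month, the six-month limit is applied as a simple cap, and adjustments for weekends, federal holidays, and extension fees are not modeled.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Same day-of-month `months` later, clamped to the month's last day."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_end(
    mailed: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> date:
    """End of the shortened statutory period under the rule quoted above.

    Assumption: weekends and holidays are ignored.
    """
    three = add_months(mailed, 3)
    six = add_months(mailed, 6)
    replied_within_two = (
        first_reply is not None and first_reply <= add_months(mailed, 2)
    )
    # The period moves to the advisory action's mailing date only when the
    # first reply came within two months and the advisory action was mailed
    # after the three-month date; it is never later than six months.
    if replied_within_two and advisory_mailed is not None and advisory_mailed > three:
        return min(advisory_mailed, six)
    return three
```

For example, with a hypothetical mailing date of January 15, 2025, a first reply on March 10, 2025, and an advisory action mailed May 1, 2025, the sketch returns May 1, 2025; without a timely first reply it returns April 15, 2025.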
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAHRIAR ZARRINEH whose telephone number is (571)272-1207. The examiner can normally be reached Monday-Friday, 8:30am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jorge Ortiz-Criado, can be reached at 571-272-7624. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHAHRIAR ZARRINEH/Primary Examiner, Art Unit 2496