DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections – 35 USC § 103
1. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
2. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
3. Claims 1-7 & 9-19 are rejected under 35 U.S.C. 103 as being unpatentable over Sumner et al. (US 20160259656 A1, hereinafter Sumner ‘656) in view of Schramm et al. (US 20200302932 A1, hereinafter Schramm ‘932).
Regarding claim 1; Sumner ‘656 discloses an electronic device (Fig. 2A, Portable Multifunction Device 200),
comprising:
a memory (Fig. 2A, Memory 202)
configured to store voice recognition models each associated with a respective virtual assistant (i.e. Examples of other applications 236 that are, optionally, stored in memory 202 include voice recognition, and voice replication. Paragraph 0117);
an audio transducer (Fig. 2A, Microphone 213);
and at least one processor (Fig. 2A, Processor(s) 220).
Sumner ‘656 does not expressly disclose the following limitations.
Schramm ‘932 discloses at least one processor (i.e. The various components shown in Fig. 7A are implemented in hardware, software instructions for execution by one or more processors, firmware, including one or more signal processing and/or application specific integrated circuits, or a combination thereof. Paragraph 0192)
configured to: receive an indication of a virtual assistant at a companion device (i.e. Upon receiving the user utterance, primary device 802 sends a representation of the user utterance and contextual information to DA server 806 (as represented by arrow 812). The contextual information indicates, for example, that a wireless communication connection is established between primary device 802 and companion device 804 and that companion device 804 is a registered device of primary device 802. Paragraphs 0254-0255);
select, based on the virtual assistant at the companion device, one of the voice recognition models (i.e. In some examples, the contextual information specifies an operating state of companion device 804. The operating state of the companion device 804 includes, for example, whether or not companion device 804 is currently engaged in an active call. In some examples, the contextual information specifies the type of device corresponding to primary device 802 and/or companion device 804. For example, the contextual information specifies that primary device 802 is a smart speaker device (e.g., without stand-alone telephony functions) and/or that companion device 804 is a smartphone device (e.g., having stand-alone telephony functions). Paragraph 0255);
and detect, by providing an audio input from the audio transducer to the selected one of the voice recognition models, a trigger phrase associated with the virtual assistant at the companion device (i.e. In the present example, the user utterance is received in conjunction with invoking a digital assistant on primary device 802. The digital assistant is invoked, for example, upon determining that a first portion of the user utterance contains a predefined spoken trigger (e.g., “Hey Siri, . . . ”). Invoking the digital assistant causes primary device 802 to obtain audio data (e.g., via a microphone of primary device 802) containing a second portion of the user utterance (e.g., “ . . . call my mom”) and to automatically perform speech recognition (e.g., using STT processing module 730) and natural language processing (e.g., using natural language processing module 732) on the second portion of the user utterance. Paragraph 0254).
Sumner ‘656 and Schramm ‘932 are combinable because they are from the same field of endeavor of speech systems (Schramm ‘932 at “Field”).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the speech system taught by Sumner ‘656 to include the limitations taught by Schramm ‘932. The motivation for doing so would have been that intelligent automated assistants (or digital assistants) provide a beneficial interface between human users and electronic devices, better allowing users to interact with devices or systems using natural language in spoken and/or text form. Therefore, it would have been obvious to combine Sumner ‘656 with Schramm ‘932 to obtain the invention as claimed.
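For illustration only (an editorial sketch, not part of the claim mapping or the cited references), the selection behavior recited in claim 1 can be outlined as follows. All names (TriggerDetector, VoiceRecognitionModel, on_companion_indication, etc.) are hypothetical placeholders, and each model is stubbed as a callable that scores an audio frame for its trigger phrase:

```python
from typing import Callable, Dict, Optional

# A "model" is stubbed as a callable returning True when its trigger
# phrase is detected in the supplied audio frame (hypothetical).
VoiceRecognitionModel = Callable[[bytes], bool]

class TriggerDetector:
    def __init__(self, models: Dict[str, VoiceRecognitionModel]):
        # memory configured to store voice recognition models, each
        # associated with a respective virtual assistant
        self._models = models
        self._selected: Optional[VoiceRecognitionModel] = None

    def on_companion_indication(self, assistant_id: str) -> None:
        # receive an indication of the virtual assistant at the
        # companion device and select (load) the corresponding model
        # before any detection is attempted (cf. claims 2 and 17)
        self._selected = self._models[assistant_id]

    def on_audio(self, frame: bytes) -> bool:
        # provide the audio input from the audio transducer to the
        # selected model; True indicates the trigger phrase was detected
        return self._selected is not None and self._selected(frame)
```

Calling on_companion_indication again with a different assistant identifier swaps in the other model, which likewise illustrates the new-companion-device limitation of claim 7.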
Regarding claim 2; Sumner ‘656 discloses wherein the at least one processor is configured to load the selected one of the voice recognition models into the at least one processor prior to detecting the trigger phrase (i.e. Based on the quantity and/or relative importance of the activated nodes, natural language processing module 732 can select one of the actionable intents as the task that the user intended the digital assistant to perform. In some examples, the domain that has the most “triggered” nodes can be selected. In some examples, the domain having the highest confidence value (e.g., based on the relative importance of its various triggered nodes) can be selected. In some examples, the domain can be selected based on a combination of the number and the importance of the triggered nodes. Paragraph 0239).
Regarding claim 3; Schramm ‘932 discloses wherein the indication comprises a device type of the companion device (i.e. In some examples, the contextual information specifies the type of device corresponding to primary device 802 and/or companion device 804. For example, the contextual information specifies that primary device 802 is a smart speaker device (e.g., without stand-alone telephony functions) and/or that companion device 804 is a smartphone device (e.g., having stand-alone telephony functions). Paragraph 0255).
Regarding claim 4; Sumner ‘656 discloses wherein the electronic device comprises a media output device (Fig. 2A, Speaker 211; i.e. Digital assistant client module 229 can also be capable of providing output in audio (e.g., speech output), visual, and/or tactile forms through various output interfaces (e.g., speaker 211, touch-sensitive display system 212, tactile output generator(s) 267, etc.) of portable multifunction device 200. For example, output can be provided as voice, sound, alerts, text messages, menus, graphics, videos, animations, vibrations, and/or combinations of two or more of the above. During operation, digital assistant client module 229 can communicate with DA server 106 using RF circuitry 208. Paragraph 0091).
Regarding claim 5; Schramm ‘932 discloses wherein the at least one processor is configured to receive the indication of the virtual assistant in association with a pairing process between the electronic device and the companion device (i.e. Second user device 122 serves as a primary device and user device 104 serves as a companion device of second user device 122. In these examples, user device 104 is paired to second user device 122 and a wireless communication connection is established with each other upon successfully exchanging authentication information. Paragraph 0035).
Regarding claim 6; Schramm ‘932 discloses wherein the companion device comprises a smartphone (i.e. Companion device 804 provides information indicating that it is a smartphone having telephony functions and that it is currently not engaged in a call. Paragraph 0253).
Regarding claim 7; Schramm ‘932 discloses wherein the at least one processor is further configured to: receive a new indication of an other virtual assistant at a new companion device other than the companion device (i.e. Upon receiving the user utterance, primary device 802 sends a representation of the user utterance and contextual information to DA server 806 (as represented by arrow 812). The contextual information indicates, for example, that a wireless communication connection is established between primary device 802 and companion device 804 and that companion device 804 is a registered device of primary device 802. Paragraphs 0254-0255);
select, based on the new indication, an other one of the voice recognition models (i.e. In some examples, the contextual information specifies an operating state of companion device 804. The operating state of the companion device 804 includes, for example, whether or not companion device 804 is currently engaged in an active call. In some examples, the contextual information specifies the type of device corresponding to primary device 802 and/or companion device 804. For example, the contextual information specifies that primary device 802 is a smart speaker device (e.g., without stand-alone telephony functions) and/or that companion device 804 is a smartphone device (e.g., having stand-alone telephony functions). Paragraph 0255);
and detect, by providing a new audio input from the audio transducer to the selected other one of the voice recognition models, another trigger phrase associated with the other virtual assistant at the new companion device (i.e. In the present example, the user utterance is received in conjunction with invoking a digital assistant on primary device 802. The digital assistant is invoked, for example, upon determining that a first portion of the user utterance contains a predefined spoken trigger (e.g., “Hey Siri, . . . ”). Invoking the digital assistant causes primary device 802 to obtain audio data (e.g., via a microphone of primary device 802) containing a second portion of the user utterance (e.g., “ . . . call my mom”) and to automatically perform speech recognition (e.g., using STT processing module 730) and natural language processing (e.g., using natural language processing module 732) on the second portion of the user utterance. Paragraph 0254).
Regarding claim 9; Sumner ‘656 discloses an electronic device (Fig. 2A, Portable Multifunction Device 200),
comprising: a memory (Fig. 2A, Memory 202)
configured to store a plurality of voice recognition models each associated with a trigger phrase (i.e. Examples of other applications 236 that are, optionally, stored in memory 202 include voice recognition, and voice replication. Paragraph 0117);
an audio transducer (Fig. 2A, Microphone 213);
and at least one processor (Fig. 2A, Processor(s) 220).
Sumner ‘656 does not expressly disclose the following limitations.
Schramm ‘932 discloses at least one processor configured to: determine a device type of a companion device (i.e. In some examples, the contextual information specifies the type of device corresponding to primary device 802 and/or companion device 804. For example, the contextual information specifies that primary device 802 is a smart speaker device (e.g., without stand-alone telephony functions) and/or that companion device 804 is a smartphone device (e.g., having stand-alone telephony functions). Paragraph 0255);
select, based on the device type of the companion device, one or more of the voice recognition models (i.e. Primary device 802 receives and analyzes the command from DA server 806 (e.g., using DA client module 229) and determines, based on a set of rules, which device is to execute the command. For example, primary device 802 is communicatively coupled to several devices (including companion device 804). Based on the command, primary device 802 determines whether the command is to be executed by itself or by one of the coupled devices. For example, based on the “place a call” actionable intent node and the “telephony” super domain specified in the command, primary device 802 determines that the command is to be executed by a device having telephony functions. Paragraph 0259);
load the selected one or more of the voice recognition models (i.e. In accordance with determining that companion device 804 is a smartphone having telephony functions, primary device 802 sends instructions to companion device 804 (as represented by arrow 816) via the established wireless communication connection (e.g., using DA client module 229). The instructions cause companion device 804 to perform tasks that satisfy the user intent. Paragraph 0260);
and detect, based on an audio input from the audio transducer, the trigger phrase associated with one of the selected one or more of the voice recognition models (i.e. In the present example, the user utterance is received in conjunction with invoking a digital assistant on primary device 802. The digital assistant is invoked, for example, upon determining that a first portion of the user utterance contains a predefined spoken trigger (e.g., “Hey Siri, . . . ”). Invoking the digital assistant causes primary device 802 to obtain audio data (e.g., via a microphone of primary device 802) containing a second portion of the user utterance (e.g., “ . . . call my mom”) and to automatically perform speech recognition (e.g., using STT processing module 730) and natural language processing (e.g., using natural language processing module 732) on the second portion of the user utterance. Paragraph 0254).
Sumner ‘656 and Schramm ‘932 are combinable because they are from the same field of endeavor of speech systems (Schramm ‘932 at “Field”).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the speech system taught by Sumner ‘656 to include the limitations taught by Schramm ‘932. The motivation for doing so would have been that intelligent automated assistants (or digital assistants) provide a beneficial interface between human users and electronic devices, better allowing users to interact with devices or systems using natural language in spoken and/or text form. Therefore, it would have been obvious to combine Sumner ‘656 with Schramm ‘932 to obtain the invention as claimed.
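As an editorial illustration of claim 9's device-type-driven selection, a minimal sketch follows; the model store, device-type keys, and trigger-phrase names are assumptions for illustration and are not drawn from Sumner ‘656 or Schramm ‘932:

```python
from typing import Dict, List

# memory storing a plurality of models, each associated with a trigger
# phrase (the byte payloads are placeholders for real model weights)
MODEL_STORE: Dict[str, bytes] = {
    "hey_assistant_a": b"<model-a-weights>",
    "ok_assistant_b": b"<model-b-weights>",
}

# assumed mapping from companion device type to candidate trigger phrases
TRIGGERS_BY_DEVICE_TYPE: Dict[str, List[str]] = {
    "smartphone_type_a": ["hey_assistant_a"],
    "smartphone_type_b": ["ok_assistant_b"],
    "unknown": ["hey_assistant_a", "ok_assistant_b"],
}

def load_models_for(device_type: str) -> Dict[str, bytes]:
    # select, based on the device type, one or more of the stored models
    # and load them for subsequent trigger-phrase detection
    triggers = TRIGGERS_BY_DEVICE_TYPE.get(
        device_type, TRIGGERS_BY_DEVICE_TYPE["unknown"])
    return {name: MODEL_STORE[name] for name in triggers}
```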
Regarding claim 10; Schramm ‘932 discloses wherein the at least one processor is configured to determine the device type based on connection information associated with a connection between the electronic device and the companion device (i.e. The contextual information indicates, for example, that a wireless communication connection is established between primary device 802 and companion device 804 and that companion device 804 is a registered device of primary device 802. In some examples, the contextual information specifies an operating state of companion device 804. The operating state of the companion device 804 includes, for example, whether or not companion device 804 is currently engaged in an active call. In some examples, the contextual information specifies the type of device corresponding to primary device 802 and/or companion device 804. Paragraph 0255).
Regarding claim 11; Schramm ‘932 discloses wherein the at least one processor is configured to determine the device type based on the connection information, responsive to establishing the connection (i.e. The contextual information indicates, for example, that a wireless communication connection is established between primary device 802 and companion device 804 and that companion device 804 is a registered device of primary device 802. In some examples, the contextual information specifies an operating state of companion device 804. The operating state of the companion device 804 includes, for example, whether or not companion device 804 is currently engaged in an active call. In some examples, the contextual information specifies the type of device corresponding to primary device 802 and/or companion device 804. Paragraph 0255).
Regarding claim 12; Schramm ‘932 discloses wherein the device type corresponds to a manufacturer of the companion device (i.e. In some examples, the contextual information specifies the type of device corresponding to primary device 802 and/or companion device 804. Paragraph 0255).
Regarding claim 13; Schramm ‘932 discloses wherein the device type corresponds to an operating system of the companion device (i.e. In some examples, the contextual information specifies an operating state of companion device 804. The operating state of the companion device 804 includes, for example, whether or not companion device 804 is currently engaged in an active call. In some examples, the contextual information specifies the type of device corresponding to primary device 802 and/or companion device 804. For example, the contextual information specifies that primary device 802 is a smart speaker device (e.g., without stand-alone telephony functions) and/or that companion device 804 is a smartphone device (e.g., having stand-alone telephony functions). Paragraph 0255).
Regarding claim 14; Schramm ‘932 discloses wherein the device type corresponds to a vendor of the companion device (i.e. In some examples, server system 108 also employs various virtual devices and/or services of third-party service providers (e.g., third-party cloud service providers) to provide the underlying computing resources and/or infrastructure resources of server system 108. In some examples, the contextual information specifies the type of device corresponding to primary device 802 and/or companion device 804. Paragraphs 0034 & 0255).
Regarding claim 15; Schramm ‘932 discloses wherein the device type corresponds to a service provider associated with the companion device (i.e. In some examples, server system 108 also employs various virtual devices and/or services of third-party service providers (e.g., third-party cloud service providers) to provide the underlying computing resources and/or infrastructure resources of server system 108. In some examples, the contextual information specifies the type of device corresponding to primary device 802 and/or companion device 804. Paragraphs 0034 & 0255).
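For illustration, claims 10-15 together suggest deriving a device type from connection information exchanged when the link to the companion device is established. A minimal hypothetical sketch (the field names and fallback logic below are assumptions, not taken from either reference):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConnectionInfo:
    # metadata a pairing/connection handshake might expose (assumed)
    manufacturer: Optional[str] = None       # cf. claim 12
    operating_system: Optional[str] = None   # cf. claim 13
    vendor: Optional[str] = None             # cf. claim 14
    service_provider: Optional[str] = None   # cf. claim 15

def device_type_from(info: ConnectionInfo) -> str:
    # responsive to establishing the connection (cf. claims 10-11),
    # reduce the available connection metadata to a device-type key
    for value in (info.manufacturer, info.operating_system,
                  info.vendor, info.service_provider):
        if value:
            return value.lower()
    return "unknown"
```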
Regarding claim 16; Sumner ‘656 discloses a processor (Fig. 2A, Processor(s) 220).
Sumner ‘656 does not expressly disclose the following limitations.
Schramm ‘932 discloses a processor configured to: receive an indication of a virtual assistant at an electronic device that does not include the processor (i.e. Upon receiving the user utterance, primary device 802 sends a representation of the user utterance and contextual information to DA server 806 (as represented by arrow 812). The contextual information indicates, for example, that a wireless communication connection is established between primary device 802 and companion device 804 and that companion device 804 is a registered device of primary device 802. Paragraphs 0254-0255);
select, from among a plurality of voice recognition models each associated with a respective virtual assistant, one of the plurality of voice recognition models that is associated with the indicated virtual assistant at the electronic device (i.e. In some examples, the contextual information specifies an operating state of companion device 804. The operating state of the companion device 804 includes, for example, whether or not companion device 804 is currently engaged in an active call. In some examples, the contextual information specifies the type of device corresponding to primary device 802 and/or companion device 804. For example, the contextual information specifies that primary device 802 is a smart speaker device (e.g., without stand-alone telephony functions) and/or that companion device 804 is a smartphone device (e.g., having stand-alone telephony functions). Paragraph 0255);
and detect, by providing an audio input from an audio transducer to the selected one of the plurality of voice recognition models, a trigger phrase associated with the virtual assistant at the electronic device (i.e. In the present example, the user utterance is received in conjunction with invoking a digital assistant on primary device 802. The digital assistant is invoked, for example, upon determining that a first portion of the user utterance contains a predefined spoken trigger (e.g., “Hey Siri, . . . ”). Invoking the digital assistant causes primary device 802 to obtain audio data (e.g., via a microphone of primary device 802) containing a second portion of the user utterance (e.g., “ . . . call my mom”) and to automatically perform speech recognition (e.g., using STT processing module 730) and natural language processing (e.g., using natural language processing module 732) on the second portion of the user utterance. Paragraph 0254).
Sumner ‘656 and Schramm ‘932 are combinable because they are from the same field of endeavor of speech systems (Schramm ‘932 at “Field”).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the speech system taught by Sumner ‘656 to include the limitations taught by Schramm ‘932. The motivation for doing so would have been that intelligent automated assistants (or digital assistants) provide a beneficial interface between human users and electronic devices, better allowing users to interact with devices or systems using natural language in spoken and/or text form. Therefore, it would have been obvious to combine Sumner ‘656 with Schramm ‘932 to obtain the invention as claimed.
Regarding claim 17; Sumner ‘656 discloses wherein the processor is configured to load the selected one of the plurality of voice recognition models into the processor prior to detecting the trigger phrase (i.e. Based on the quantity and/or relative importance of the activated nodes, natural language processing module 732 can select one of the actionable intents as the task that the user intended the digital assistant to perform. In some examples, the domain that has the most “triggered” nodes can be selected. In some examples, the domain having the highest confidence value (e.g., based on the relative importance of its various triggered nodes) can be selected. In some examples, the domain can be selected based on a combination of the number and the importance of the triggered nodes. Paragraph 0239).
Regarding claim 18; Schramm ‘932 discloses wherein the indication comprises a device type of the electronic device (i.e. In some examples, the contextual information specifies the type of device corresponding to primary device 802 and/or companion device 804. For example, the contextual information specifies that primary device 802 is a smart speaker device (e.g., without stand-alone telephony functions) and/or that companion device 804 is a smartphone device (e.g., having stand-alone telephony functions). Paragraph 0255).
Regarding claim 19; Schramm ‘932 discloses wherein the device type corresponds to an operating system of the electronic device (i.e. In some examples, the contextual information specifies an operating state of companion device 804. The operating state of the companion device 804 includes, for example, whether or not companion device 804 is currently engaged in an active call. In some examples, the contextual information specifies the type of device corresponding to primary device 802 and/or companion device 804. For example, the contextual information specifies that primary device 802 is a smart speaker device (e.g., without stand-alone telephony functions) and/or that companion device 804 is a smartphone device (e.g., having stand-alone telephony functions). Paragraph 0255).
4. Claims 8 & 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sumner ‘656 and Schramm ‘932 as applied to the claims above, and further in view of Binder et al. (US 20140222436 A1, hereinafter Binder ‘436).
Regarding claim 8; Sumner ‘656 as modified does not expressly disclose the following limitations.
Binder ‘436 discloses wherein the at least one processor is configured to detect the trigger phrase by performing a low power listening operation using the selected one of the voice recognition models (i.e. One technique for initiating a speech-based service with a voice trigger is to have the speech-based service continuously listen for a predetermined trigger word, phrase, or sound. The main processor of an electronic device is kept in a low-power or un-powered state while one or more sound detectors that use less power remain active. Paragraph 0007);
the low power listening operation comprising: periodically or continuously providing the audio input from the audio transducer to the selected one of the voice recognition models (i.e. One technique for initiating a speech-based service with a voice trigger is to have the speech-based service continuously listen for a predetermined trigger word, phrase, or sound (any of which may be referred to herein as “the trigger sound”). However, continuously operating the speech-based service (e.g., the voice-based digital assistant) requires substantial audio processing and battery power. In order to reduce the power consumed by providing voice trigger functionality, several techniques may be employed. In some implementations, the main processor of an electronic device (i.e., an “application processor”) is kept in a low-power or un-powered state while one or more sound detectors that use less power (e.g., because they do not rely on the application processor) remain active. Paragraph 0007);
and triggering an active listening mode for the companion device responsive to an output of the selected one of the voice recognition models that indicates a detection of the trigger phrase in the audio input (i.e. Voice triggers can also be implemented so that they are activated in response to a specific, predetermined word, phrase, or sound, and without requiring a physical interaction by the user. For example, a user may be able to activate a SIRI digital assistant on an IPHONE by reciting the phrase “Hey, SIRI.” In response, the device outputs a beep, sound, or speech output (e.g., “what can I do for you?”) indicating to the user that the listening mode is active. Paragraph 0006).
Sumner ‘656 and Binder ‘436 are combinable because they are from the same field of endeavor of speech systems (Binder ‘436 at “Technical Field”).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify the speech system taught by Sumner ‘656 to include the limitations taught by Binder ‘436. The motivation for doing so would have been to provide a method and system for activating a voice-based digital assistant (or other speech-based service) using a voice input or signal rather than a tactile input. Therefore, it would have been obvious to combine Sumner ‘656 with Binder ‘436 to obtain the invention as claimed.
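As an editorial illustration of the low power listening operation recited in claims 8 and 20, a hypothetical sketch follows; the callables passed in stand for the selected voice recognition model, the audio transducer, and the companion device's active-listening trigger, and all names are illustrative placeholders:

```python
import time
from typing import Callable

def low_power_listen(model: Callable[[bytes], bool],
                     read_audio_frame: Callable[[], bytes],
                     trigger_active_listening: Callable[[], None],
                     poll_interval_s: float = 0.1) -> None:
    # periodically (or, with poll_interval_s = 0, continuously) provide
    # the audio input from the transducer to the selected model
    while True:
        frame = read_audio_frame()
        if model(frame):
            # an output of the selected model indicating detection of
            # the trigger phrase triggers the active listening mode
            trigger_active_listening()
            return
        time.sleep(poll_interval_s)
```

In such a sketch the loop would run on a low-power detector while the main application processor stays idle, consistent with the cited passage of Binder ‘436.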
Regarding claim 20; Binder ‘436 discloses wherein the processor is configured to detect the trigger phrase by performing a low power listening operation using the selected one of the plurality of voice recognition models (i.e. One technique for initiating a speech-based service with a voice trigger is to have the speech-based service continuously listen for a predetermined trigger word, phrase, or sound. The main processor of an electronic device is kept in a low-power or un-powered state while one or more sound detectors that use less power remain active. Paragraph 0007)
the low power listening operation comprising: periodically or continuously providing the audio input from the audio transducer to the selected one of the plurality of voice recognition models (i.e. One technique for initiating a speech-based service with a voice trigger is to have the speech-based service continuously listen for a predetermined trigger word, phrase, or sound (any of which may be referred to herein as “the trigger sound”). However, continuously operating the speech-based service (e.g., the voice-based digital assistant) requires substantial audio processing and battery power. In order to reduce the power consumed by providing voice trigger functionality, several techniques may be employed. In some implementations, the main processor of an electronic device (i.e., an “application processor”) is kept in a low-power or un-powered state while one or more sound detectors that use less power (e.g., because they do not rely on the application processor) remain active. Paragraph 0007)
and triggering an active listening mode for the electronic device responsive to an output of the selected one of the plurality of voice recognition models that indicates a detection of the trigger phrase in the audio input (i.e. Voice triggers can also be implemented so that they are activated in response to a specific, predetermined word, phrase, or sound, and without requiring a physical interaction by the user. For example, a user may be able to activate a SIRI digital assistant on an IPHONE by reciting the phrase “Hey, SIRI.” In response, the device outputs a beep, sound, or speech output (e.g., “what can I do for you?”) indicating to the user that the listening mode is active. Paragraph 0006).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARCUS T. RILEY, ESQ., whose telephone number is (571) 270-1581. The examiner can normally be reached 9:00 a.m. to 5:00 p.m., Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan can be reached at 571-272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MARCUS T RILEY/Primary Examiner, Art Unit 2654