DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 2, 4, 6, 8-13, 15, and 17-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Krishnan et al. (US 2022/0093101 A1, “Krishnan”).
As to claims 1, 12, 20, Krishnan discloses a method implemented by one or more processors, the method comprising:
receiving a stream of audio data, the stream of audio data being generated by one or more microphones of a client device of a user, and the stream of audio data capturing at least a portion of a spoken utterance provided by the user that is directed to an automated assistant implemented at least in part at the client device (client device 110 includes a microphone array to capture audio including speech directed by a user to the digital assistant, para. 0060, and on-device language processing components, para. 0098);
determining, based on processing the stream of audio data, audio-based characteristics associated with the portion of the spoken utterance (conversation analyzer 1120 considers audio characteristics such as pause timing/length, para. 0320);
determining, based on the audio-based characteristics associated with the portion of the spoken utterance, whether the user has paused in providing the spoken utterance (system detects a silence that is classified as a pause, para. 0320, 0349); and
in response to determining that the user has paused in providing the spoken utterance:
determining natural conversation output to be provided for audible presentation to the user, the natural conversation output to be provided for audible presentation to the user to indicate the automated assistant is waiting for the user to continue providing of the spoken utterance (in response to a pause in speech, the system outputs a backchannel response to identify to the user that the system is continuing to pay attention and is waiting, para. 0040, 0349; system may act more human-like as a natural participant in a conversation and may answer questions or interject information that may be helpful to the conversation, para. 0311, 0342, 0495-0499); and
causing the natural conversation output to be provided for audible presentation to the user via one or more speakers of the client device (synthesized speech output, para. 0040, 0043, 0080, 0495-0499).
As to claims 2, 13, Krishnan discloses: wherein causing the natural conversation output to be provided for audible presentation to the user via the one or more speakers of the client device is further in response to determining that the user has paused in providing the spoken utterance for a threshold duration of time (pause time data may be used to determine the timing of an interjection, para. 0320).
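By way of illustration only, the pause handling mapped above to claims 1, 2, 12, 13, and 20 might be sketched as follows. This is a hypothetical sketch, not an implementation from Krishnan or from the claims; every name in it (handle_audio_stream, is_pause, the 0.7-second threshold, and the asr, tts, and speaker interfaces) is invented for illustration.

    import time

    PAUSE_THRESHOLD_S = 0.7  # hypothetical threshold duration of a pause

    def handle_audio_stream(frames, asr, pause_classifier, tts, speaker):
        """Classify silences in a captured utterance and, once a silence has
        lasted a threshold duration and is classified as a pause, play a short
        backchannel so the user knows the assistant is waiting."""
        silence_start = None
        for frame in frames:                   # audio from the device microphone(s)
            asr.feed(frame)                    # ASR keeps consuming the stream
            if frame.is_silence:
                silence_start = silence_start or time.monotonic()
                elapsed = time.monotonic() - silence_start
                # audio-based characteristics (pause timing/length) drive the decision
                if elapsed >= PAUSE_THRESHOLD_S and pause_classifier.is_pause(frame, elapsed):
                    speaker.play(tts.synthesize("mm-hmm"))  # natural conversation output
                    silence_start = None       # do not repeat the backchannel
            else:
                silence_start = None           # user resumed speaking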
As to claims 4, 15, Krishnan discloses:
determining whether the user has completed providing of the spoken utterance (determining an incomplete user utterance when detected silence is classified as a pause, para. 0349),
wherein determining natural conversation output to be provided for audible presentation to the user is further in response to determining that the user has not completed providing of the spoken utterance (a backchannel response is determined and output in order to encourage a completed utterance by the user, para. 0349).
As to claim 6, Krishnan discloses: in response to determining that the user has completed providing the spoken utterance: causing the automated assistant to initiate fulfillment of the spoken utterance (computing device performs task based on user’s spoken commands, para. 0002).
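Purely as a sketch of the mapping for claims 4, 15, and 6, and again with hypothetical names, the completeness determination branches as follows:

    def on_silence_classified(label, assistant):
        """A silence classified as a pause means the utterance is incomplete
        and draws a backchannel to encourage completion; an endpoint means
        the utterance is complete, so fulfillment is initiated."""
        if label == "pause":                  # user has not completed the utterance
            assistant.play_backchannel()      # e.g., "uh-huh"
        else:                                 # label == "endpoint"
            assistant.initiate_fulfillment()  # perform the requested task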
As to claims 8, 17, Krishnan discloses: keeping one or more automated assistant components that utilize the ASR model active while causing the natural conversation output to be provided for audible presentation to the user via one or more speakers of the client device (system may act more human-like as a natural participant in a conversation and may answer questions or interject information that may be helpful to the conversation, para. 0311, 0342, 0495-0499; synthesized speech output via loudspeakers of device 110, para. 0040, 0043, 0080, 0094-0095, 0495-0499).
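The concurrency mapped to claims 8 and 17 could, as a hypothetical sketch, be realized by playing the output on a separate thread so that the components using the ASR model are never deactivated (speaker.play and asr_loop.run are invented interfaces):

    import threading

    def play_without_blocking_asr(speaker, clip, asr_loop):
        """Play the backchannel on its own daemon thread; the ASR loop keeps
        running and continues consuming microphone audio during playback."""
        threading.Thread(target=speaker.play, args=(clip,), daemon=True).start()
        asr_loop.run()  # ASR components remain active throughout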
As to claims 9, 18, Krishnan discloses: wherein causing the natural conversation output to be provided for audible presentation to the user via the one or more speakers of the client device comprises:
processing, using a text-to-speech (TTS) model, the natural conversation output to generate synthesized speech audio data that includes the natural conversation output (TTS component 280 creates audio data corresponding to the system-generated natural language response, para. 0074, 0077); and
causing the synthesized speech audio data to be provided for audible presentation to the user via the one or more speakers of the client device (synthesized speech output via loudspeakers of device 110, para. 0094-0095).
As to claims 10, 19, Krishnan discloses: wherein causing the natural conversation output to be provided for audible presentation to the user via the one or more speakers of the client device comprises:
obtaining, from on-device memory of the client device, synthesized speech audio data that includes the natural conversation output (on-device language processing components include ASR, NLU, TTS, NLG, para. 0098); and
causing the synthesized speech audio data to be provided for audible presentation to the user via the one or more speakers of the client device (synthesized speech output via loudspeakers of device 110, para. 0094-0095).
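Claims 9/18 and 10/19 recite alternative sources for the audible output. As a hypothetical sketch only (get_output_audio and its parameters are invented), the two paths could be:

    def get_output_audio(text, tts_model=None, on_device_cache=None):
        """Return synthesized speech audio data for the natural conversation
        output, either retrieved from on-device memory (claims 10, 19) or
        generated by running a TTS model over the text (claims 9, 18)."""
        if on_device_cache is not None and text in on_device_cache:
            return on_device_cache[text]   # pre-synthesized audio from memory
        return tts_model.synthesize(text)  # TTS-generated audio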
As to claim 11, Krishnan discloses: wherein the one or more processors are implemented locally at the client device of the user (device 110 may conduct its own speech processing using on-device language processing components, para. 0098).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan in view of Bratt et al. (US 2022/0115001 A1, “Bratt”).
Krishnan differs from claims 3, 14 in that although it teaches the use of machine learning models, it does not specifically disclose: wherein determining whether the user has paused in providing the spoken utterance comprises:
processing, using an audio-based classification machine learning (ML) model, the audio-based characteristics associated with the portion of the spoken utterance to generate output; and
determining, based on the output generated using the audio-based classification ML model, whether the user has paused in providing the spoken utterance.
Bratt teaches a conversational assistant that uses pause analytics, implemented with a trained machine learning model (para. 0186, 0188), to analyze and interpret detected pauses in user speech (para. 0040, 0076, 0133, 0177). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Krishnan with the pause-analytics model of Bratt in order to improve the flow of dialogue between the user and the assistant.
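As an illustration of the kind of audio-based classification ML model recited in claims 3 and 14 and taught at a high level by Bratt, the following sketch trains a toy classifier; the features, training data, and decision threshold are all invented for illustration:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # each row: [silence_ms, pitch_slope, energy_drop], invented audio-based features
    X = np.array([[200, -0.1, 0.3], [1500, -0.9, 0.8],
                  [350, -0.2, 0.4], [1200, -0.7, 0.9]])
    y = np.array([1, 0, 1, 0])  # 1 = mid-utterance pause, 0 = utterance complete

    model = LogisticRegression().fit(X, y)
    output = model.predict_proba([[400, -0.3, 0.5]])[0, 1]  # output of the ML model
    user_has_paused = output > 0.5  # determination based on that output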
Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan in view of Kodish-Wachs (US 11,062,704 B1).
Krishnan differs from claims 5, 16 in that it does not disclose: processing, using a natural language understanding (NLU) model, the stream of ASR output, to generate a stream of NLU output, and wherein determining whether the user has completed providing of the spoken utterance is based on the stream of NLU output.
Kodish-Wachs teaches a natural language processor, encompassing NLP, ASR, and NLU (col. 5, lines 42-45), that determines whether an utterance is complete or incomplete, for example by analyzing the grammatical structure of the utterance or by detecting that a portion of the utterance is undecipherable and therefore incomplete (col. 9, line 50 – col. 10, line 7). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Krishnan with the above teaching of Kodish-Wachs in order to further improve computing functions, as taught by Kodish-Wachs (col. 4, lines 27-67).
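Purely as a sketch of the grammatical-structure analysis Kodish-Wachs describes for claims 5 and 16, with a hypothetical token-level heuristic standing in for a full NLU model:

    def utterance_is_complete(tokens):
        """Treat an utterance as incomplete if it ends in a dangling function
        word or contains an undecipherable token, per the cited passage."""
        dangling = {"the", "a", "an", "to", "and", "or", "of", "for", "with"}
        if not tokens or tokens[-1].lower() in dangling:
            return False               # e.g., "set a timer for" is incomplete
        if "<unk>" in tokens:          # undecipherable portion detected
            return False
        return True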
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Krishnan in view of Hansen et al. (US 11,038,934 B1, “Hansen”).
Krishnan differs from claim 7 in that it does not disclose: wherein causing the automated assistant to initiate the fulfillment of the spoken utterance comprises: causing, based on the stream of NLU output, a stream of fulfillment data to be generated, wherein the stream of fulfillment data includes an indication of the fulfillment of the spoken utterance.
Hansen teaches a digital assistant which outputs an audio and/or visual response based on the performance of one or more tasks in response to a user request in the form of a natural language command (col. 15, line 49 – col. 16, line 8; col. 87, lines 33-49; col. 88, lines 6-16). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Krishnan with the above teaching of Hansen in order to indicate to the user performance of a requested task.
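Finally, the stream of fulfillment data recited in claim 7 could, as a hypothetical sketch with invented field names, be a generator over NLU results:

    def fulfillment_stream(nlu_outputs):
        """For each NLU result (intent plus slots) in the stream, yield
        fulfillment data carrying an indication that fulfillment has begun."""
        for nlu in nlu_outputs:
            yield {
                "intent": nlu["intent"],            # e.g., "set_timer"
                "slots": nlu.get("slots", {}),
                "status": "fulfillment_initiated",  # indication of fulfillment
            }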
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Garg et al. (US 2022/0020376 A1) teach determining a pause using machine learning (para. 0050).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Stella L Woo whose telephone number is (571)272-7512. The examiner can normally be reached Monday - Friday, 8 a.m. to 5 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar can be reached at 571-272-7488. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
STELLA L. WOO
Primary Examiner
Art Unit 2693
/Stella L. Woo/ Primary Examiner, Art Unit 2693