Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Priority
Acknowledgment is made of applicant's claim for foreign priority based on Chinese application 202310296836.X filed on 03/23/2023. A certified copy of said foreign application has been received.
Claim Rejections - 35 USC § 101
35 U.S.C. §101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 9-14 are rejected under 35 U.S.C. § 101 as being directed to non-statutory subject matter.
Regarding Claims 9-14, claim 9 recites a computer readable storage medium.
According to the specification US 2024/0321268 A1 at ¶139: “The computer readable storage medium may be one readable medium or any combination of a plurality of readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but not limited to electricity, magnetism, light, electromagnetism, infrared ray, or a semiconductor system, apparatus or device, or any combination of the above”.
Here, the scope of “computer readable storage medium” includes a “readable signal medium”, i.e., a transitory signal.
While a signal (e.g., electricity, magnetism, light, electromagnetism, infrared ray) is man-made and physical – it exists in the real world and has tangible causes and effects – the change in electric potential and energy embodying such a claimed computer readable storage medium is fleeting and devoid of any semblance of permanence during transmission. In re Nuijten, 500 F.3d 1346, 1356 (Fed. Cir. 2007). Therefore, a computer readable storage medium whose scope includes a transient recording medium is devoid of matter and is not statutory within the meaning of § 101. Id. at 1357.
Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 103 that form the basis for the rejections under this section made in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-6, 8-13, and 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Weinberg et al. (US 2022/0293124 A1) in view of Pitschel et al. (US 9922642 B2).
Regarding Claim 1, Weinberg discloses a human-machine interaction method (¶243, systems and processes for continuous dialog with a digital assistant), including:
in response to receiving target interaction information collected when a target user interacts with a target device according to a first interaction mode (¶244, upon being invoked by “Hey Siri”, the digital assistant enters a listening state 804), performing semantic recognition on the target interaction information to obtain target semantic information (¶244, digital assistant samples speech input including questions and commands and begins processing the questions and commands in processing state 806; per ¶208, converting speech input into text, identifying a user’s intent expressed in the natural language input, actively obtaining information needed to fully infer the user’s intent, and determining a task flow for fulfilling the inferred intent);
determining completeness of the target semantic information (¶208, processing includes actively eliciting and obtaining information needed to fully infer the user’s intent);
in response to the target semantic information being incomplete, determining whether there is to-be-combined semantic information cached in a preset semantic state record library (¶217, natural language processing module 728 uses contextual information including prior interactions / dialogues between the digital assistant and the user to clarify, supplement, and further define information; ¶230, generate a structured query to represent identified actionable intent of the user request, determine that user’s utterance contains insufficient information to complete the structured query, and populate the structured query with contextual information), wherein the to-be-combined semantic information is semantic information obtained when the target user interacts with the target device according to at least one second interaction mode during a target interaction phase (¶244, during processing state 806, the user may interrupt the processing of the speech input by way of a subsequent speech input; ¶¶283-84, in one example, prior to second speech input, speech assistant obtained prior interactions / dialogues including a first speech input and a third speech input where the user interrupted / corrected the digital assistant; this interaction mode allows the user to interrupt the digital assistant and the system reduces unnecessary output from the device);
in response to presence of the to-be-combined semantic information in the semantic state record library, generating complete semantic information based on the target semantic information and the to-be-combined semantic information (¶230, natural language processing module 732 populates some parameters of the structured query with received contextual information); and
based on the complete semantic information, determining a target controlled object and a control mode for controlling the target controlled object, and generating a control instruction corresponding to the control mode (¶245, if the digital assistant completes processing of the speech input to obtain one or more results, then the digital assistant enters a response state 808 where digital assistant provides one or more results; e.g., ¶235, task flow processing module 736 performs steps (1)-(4) to make a restaurant reservation for restaurant reservation structured query at ABC Café, on 3/12/12, at 7pm, for a party of 5).
Weinberg does not disclose updating the to-be-combined semantic information in the semantic state record library based on the target semantic information.
Pitschel teaches a human-machine interaction method (Col 9, Rows 25-31, I/O processing module 328 interacts with user to obtain user input and to provide response to user input) performing semantic recognition on target interaction information to obtain target semantic information (Col 10, Rows 6-14, natural language processing module 332 takes words / tokens of speech to text processed user input and associates the token sequence with one or more actionable intents), determining completeness of the target semantic information (Col 13, Rows 31-56, generate a structured query to represent the identified actionable intent and determine that the user utterance contains insufficient information to complete the structured query associated with the domain), using to-be-combined semantic information cached in a preset semantic state record library to generate complete semantic information for the incomplete target semantic information (Col 13, Rows 54-59, use context information to populate parameters of the structured query; per Col 10, Rows 33-38, context information includes prior interaction / dialogue between the digital assistant and the user), and updating the to-be-combined semantic information in the semantic state record library based on the target semantic information (Col 17, Rows 26-35, digital assistant maintains a user log 370 based on user requests and interactions to store information such as user requests received, context information, responses provided to the user, clarification inputs, and the parameters used by digital assistant to generate and provide the response).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to update the to-be-combined semantic information in the semantic state record library based on the target semantic information to provide a searchable semantic state record library (Pitschel, Col 17, Rows 35-38).
Regarding Claim 2, Weinberg discloses after the determining the completeness of the target semantic information, in response to determining that the target semantic information is complete, generating a control instruction corresponding to the target semantic information based on the target semantic information (¶235, once task flow processing module 736 has completed the structured query for an actionable intent, proceed to perform the ultimate task associated with the actionable intent; see e.g., task flow steps (1)-(4)); and
as modified by Pitschel, updating the to-be-combined semantic information in the semantic state record library based on the target semantic information (Pitschel, Col 17, Rows 29-38, user log stores context information surrounding the user requests, responses provided to the user, the parameters used by the digital assistant to generate and provide the response).
Regarding Claim 3, Weinberg discloses wherein, the method further includes:
after the determining whether there is to-be-combined semantic information cached in a predetermined semantic state record library, in response to determining that there is not the to-be-combined semantic information in the semantic state record library, generating the to-be-combined semantic information based on the target semantic information (¶230, populating some parameters of the structured query with contextual information means not all parameters can be populated with contextual information (i.e., there is not contextual information to complete the structured query); e.g., ¶227, for “invite my friends to my birthday party”, access user data 748 to determine who the “friends” are and when and where the “birthday party” would be held), and as modified by Pitschel, storing the to-be-combined semantic information into the semantic state record library (Pitschel, Col 17, Rows 29-38, user log stores context information surrounding the user requests, responses provided to the user, the parameters used by the digital assistant to generate and provide the response).
Regarding Claim 4, Weinberg discloses wherein the receiving target interaction information collected when the target user interacts with the target device according to the first interaction mode includes:
receiving first interaction information collected from the target user in response to an interaction start signal for triggering a first-round interaction with the target device during the target interaction phase (¶244, user utters “hey Siri” to invoke the digital assistant; once invoked, the digital assistant enters listening state 804 to sample audio including speech input from the user); and
in response to determining that a type of the first interaction information is speech interaction information, determining the first interaction information as the target interaction information (¶244, the digital assistant samples speech input from the user while in the listening state 804 and begins processing the speech input in processing state 806; per ¶208, converting speech input into text, identifying a user’s intent expressed in a natural language input received from the user, determining the task flow for fulfilling the intent).
Regarding Claim 5, Weinberg discloses wherein in response to receiving target interaction information collected when a target user interacts with a target device according to a first interaction mode, the performing semantic recognition on the target interaction information to obtain target semantic information includes:
receiving second interaction information collected when interacting with the target user according to any of predetermined at least one interaction mode, in response to an interaction start signal for triggering a non-first-round interaction with the target device during the target interaction phase (¶284, at block 1406 (after receiving first speech and third speech at blocks 1402 and 1404), initiate a session window associated with a user gaze directed to a displayed digital assistant object and receive a second speech input; the system focuses on a relevant window of time to capture relevant additional speech from the user and improves user experience by capturing additional speech from the user);
generating the target interaction information based on the second interaction information (¶285, determine that the second speech input includes speech directed to the digital assistant according to detected user gaze being directed to a display of the digital assistant electronic device and detect a command within the second speech input); and
performing semantic recognition on the target interaction information according to a semantic recognition mode corresponding to the target interaction information to obtain the target semantic information (¶285, identify a command within the second speech input and determine that the second speech input includes speech directed to the digital assistant; i.e., ¶208, converting speech input into text, identifying a user’s intent expressed in a natural language input received from the user, determining the task flow for fulfilling the intent).
Regarding Claim 6, Weinberg discloses wherein the performing semantic recognition on the target interaction information according to the semantic recognition mode corresponding to the target interaction information to obtain the target semantic information includes:
in response to determining that the target interaction information is line-of-sight interaction information obtained by utilizing a line-of-sight interaction mode among the at least one interaction mode, performing recognition on the line-of-sight interaction information according to the line-of-sight recognition mode to obtain target controlled object information, and determining the target controlled object information as target semantic information (¶285, determine that the second speech input includes speech directed to the digital assistant according to detected user gaze being directed to a display of the digital assistant electronic device and detect a command within the second speech input; i.e., perform semantic processing per ¶208); and
in response to determining that the target interaction information is interaction information obtained by utilizing a non-line-of-sight interaction mode among the at least one interaction mode, performing semantic recognition on the target interaction information according to a semantic recognition mode corresponding to the target interaction information to obtain the target controlled object information and/or target instruction information (¶283, at blocks 1402-1404, perform processing of first speech input and third speech input to identify predefined words therein to determine speech inputs include speech directed to the digital assistant in a non-line of sight mode prior to line of sight mode / gaze detection at block 1406), and
determining the target controlled object information and/or the target instruction information as target semantic information (¶283, for non-gaze / non-line of sight mode, determine speech is directed to digital assistant and determine corresponding task flows per ¶235; for gaze / line of sight mode, ¶286, determine second speech input includes speech directed to the digital assistant and determine corresponding task flows per ¶235).
Regarding Claim 8, Weinberg as modified by Pitschel discloses wherein the updating the to-be-combined semantic information in the semantic state record library based on the target semantic information includes:
extracting target controlled object information and/or target instruction information from the target semantic information (Pitschel, Col 17, Rows 29-35, user log stores the responses provided to the user (e.g., Col 15, Rows 15-23, restaurant reservation at ABC Café, 3/12/2012, at 7pm, for party of 5); compare Weinberg, ¶235, restaurant reservation at ABC Café, 3/12/2012, at 7pm, for party of 5); and
updating to-be-combined controlled object information and/or to-be-combined instruction information included in the to-be-combined semantic information by using the target controlled object information and/or the target instruction information (Pitschel, Col 17, Rows 29-35, user log stores the parameters and the procedures used by the digital assistant to generate and provide the response).
Regarding Claim 9, Weinberg discloses a computer-readable storage medium (¶¶198-99, digital assistant system 700 includes memory 702 / non-transitory computer readable medium), in which a computer program is stored, the computer program is configured for being executed by a processor (¶197, software instructions for execution by one or more processors; ¶198, digital assistant 700 includes processors 704) to implement the method according to claim 1 (¶197, software instructions for execution by one or more processors).
Regarding Claim 10, Weinberg discloses wherein the method further includes:
after the determining the completeness of the target semantic information, in response to determining that the target semantic information is complete, generating a control instruction corresponding to the target semantic information based on the target semantic information (¶235, once task flow processing module 736 has completed the structured query for an actionable intent, proceed to perform the ultimate task associated with the actionable intent; see e.g., task flow steps (1)-(4)); and
as modified by Pitschel, updating the to-be-combined semantic information in the semantic state record library based on the target semantic information (Pitschel, Col 17, Rows 29-38, user log stores context information surrounding the user requests, responses provided to the user, the parameters used by the digital assistant to generate and provide the response).
Regarding Claim 11, Weinberg discloses wherein, the method further includes: after the determining whether there is to-be-combined semantic information cached in a predetermined semantic state record library, in response to determining that there is not the to-be-combined semantic information in the semantic state record library, generating the to-be-combined semantic information based on the target semantic information (¶230, populating some parameters of the structured query with contextual information means not all parameters can be populated with contextual information (i.e., there is not contextual information to complete the structured query); e.g., ¶227, for “invite my friends to my birthday party”, access user data 748 to determine who the “friends” are and when and where the “birthday party” would be held), and as modified by Pitschel, storing the to-be-combined semantic information into the semantic state record library (Pitschel, Col 17, Rows 29-38, user log stores context information surrounding the user requests, responses provided to the user, the parameters used by the digital assistant to generate and provide the response).
Regarding Claim 12, Weinberg discloses wherein the receiving target interaction information collected when the target user interacts with the target device according to the first interaction mode includes:
receiving first interaction information collected from the target user in response to an interaction start signal for triggering a first-round interaction with the target device during the target interaction phase (¶244, user utters “hey Siri” to invoke the digital assistant; once invoked, the digital assistant enters listening state 804 to sample audio including speech input from the user); and
in response to determining that a type of the first interaction information is speech interaction information, determining the first interaction information as the target interaction information (¶244, the digital assistant samples speech input from the user while in the listening state 804 and begins processing the speech input in processing state 806; per ¶208, converting speech input into text, identifying a user’s intent expressed in a natural language input received from the user, determining the task flow for fulfilling the intent).
Regarding Claim 13, Weinberg discloses wherein in response to receiving target interaction information collected when a target user interacts with a target device according to a first interaction mode, the performing semantic recognition on the target interaction information to obtain target semantic information includes:
receiving second interaction information collected when interacting with the target user according to any of predetermined at least one interaction mode, in response to an interaction start signal for triggering a non-first-round interaction with the target device during the target interaction phase (¶284, at block 1406 (after receiving first speech and third speech at blocks 1402 and 1404), initiate a session window associated with a user gaze directed to a displayed digital assistant object and receive a second speech input; the system focuses on a relevant window of time to capture relevant additional speech from the user and improves user experience by capturing additional speech from the user);
generating the target interaction information based on the second interaction information (¶285, determine that the second speech input includes speech directed to the digital assistant according to detected user gaze being directed to a display of the digital assistant electronic device and detect a command within the second speech input); and
performing semantic recognition on the target interaction information according to a semantic recognition mode corresponding to the target interaction information to obtain the target semantic information (¶285, identify a command within the second speech input and determine that the second speech input includes speech directed to the digital assistant; i.e., ¶208, converting speech input into text, identifying a user’s intent expressed in a natural language input received from the user, determining the task flow for fulfilling the intent).
Regarding Claim 15, Weinberg discloses an electronic device (¶198, digital assistant system 700), including:
a processor (¶198, digital assistant system 700 includes one or more processors 704); and
a memory configured for storing processor-executable instructions (¶197 and ¶199, memory 702 includes non-transitory computer readable medium for software instructions); wherein the processor is configured for reading the executable instructions from the memory and executing the instructions to implement the method according to claim 1 (¶197, software instructions for execution by one or more processors).
Regarding Claim 16, Weinberg discloses wherein the method further includes:
after the determining the completeness of the target semantic information, in response to determining that the target semantic information is complete, generating a control instruction corresponding to the target semantic information based on the target semantic information (¶235, once task flow processing module 736 has completed the structured query for an actionable intent, proceed to perform the ultimate task associated with the actionable intent; see e.g., task flow steps (1)-(4)); and
as modified by Pitschel, updating the to-be-combined semantic information in the semantic state record library based on the target semantic information (Pitschel, Col 17, Rows 29-38, user log stores context information surrounding the user requests, responses provided to the user, the parameters used by the digital assistant to generate and provide the response).
Regarding Claim 17, Weinberg discloses wherein, the method further includes:
after the determining whether there is to-be-combined semantic information cached in a predetermined semantic state record library, in response to determining that there is not the to-be-combined semantic information in the semantic state record library, generating the to-be-combined semantic information based on the target semantic information (¶230, populating some parameters of the structured query with contextual information means not all parameters can be populated with contextual information (i.e., there is not contextual information to complete the structured query); e.g., ¶227, for “invite my friends to my birthday party”, access user data 748 to determine who the “friends” are and when and where the “birthday party” would be held), and
as modified by Pitschel, storing the to-be-combined semantic information into the semantic state record library (Pitschel, Col 17, Rows 29-38, user log stores context information surrounding the user requests, responses provided to the user, the parameters used by the digital assistant to generate and provide the response).
Regarding Claim 18, Weinberg discloses wherein the receiving target interaction information collected when the target user interacts with the target device according to the first interaction mode includes:
receiving first interaction information collected from the target user in response to an interaction start signal for triggering a first-round interaction with the target device during the target interaction phase (¶244, user utters “hey Siri” to invoke the digital assistant; once invoked, the digital assistant enters listening state 804 to sample audio including speech input from the user); and
in response to determining that a type of the first interaction information is speech interaction information, determining the first interaction information as the target interaction information (¶244, the digital assistant samples speech input from the user while in the listening state 804 and begins processing the speech input in processing state 806; per ¶208, converting speech input into text, identifying a user’s intent expressed in a natural language input received from the user, determining the task flow for fulfilling the intent).
Regarding Claim 19, Weinberg discloses wherein in response to receiving target interaction information collected when a target user interacts with a target device according to a first interaction mode, the performing semantic recognition on the target interaction information to obtain target semantic information includes:
receiving second interaction information collected when interacting with the target user according to any of predetermined at least one interaction mode, in response to an interaction start signal for triggering a non-first-round interaction with the target device during the target interaction phase (¶284, at block 1406 (after receiving first speech and third speech at blocks 1402 and 1404), initiate a session window associated with a user gaze directed to a displayed digital assistant object and receive a second speech input; the system focuses on a relevant window of time to capture relevant additional speech from the user and improves user experience by capturing additional speech from the user);
generating the target interaction information based on the second interaction information (¶285, determine that the second speech input includes speech directed to the digital assistant according to detected user gaze being directed to a display of the digital assistant electronic device and detect a command within the second speech input); and
performing semantic recognition on the target interaction information according to a semantic recognition mode corresponding to the target interaction information to obtain the target semantic information (¶285, identify a command within the second speech input and determine that the second speech input includes speech directed to the digital assistant; i.e., ¶208, converting speech input into text, identifying a user’s intent expressed in a natural language input received from the user, determining the task flow for fulfilling the intent).
Claims 7, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Weinberg et al. (US 2022/0293124 A1) in view of Pitschel et al. (US 9922642 B2) as applied to claim 1, and further in view of Weinstein et al. (US 9990925 B2).
Regarding Claims 7, 14, and 20, Weinberg discloses wherein the method further includes: in response to triggering a sleep signal for causing the target device to enter an interactive sleep state, controlling the target device to enter the interactive sleep state and exit a target interaction phase (¶288, toggling the device between speech recognition states based on speech thresholds, the system conserves processing resources by transitioning to low power states when appropriate).
Weinberg and Pitschel do not disclose deleting the to-be-combined semantic information from the semantic state record library.
Weinstein discloses processing speech inputs for semantic information that are likely sensitive data (Col 3, Rows 63-66, speech processing system 106 analyzes speech recognition request 107 to generate text transcription; Col 4, Rows 35-38 and Col 5, Rows 55-59, determine at least a portion of the data in the request 107 is likely sensitive data) and deleting the semantic information from a semantic state record library (Col 5, Rows 64-66, portions of request 107 that are sensitive data are not to be logged and may be deleted).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to delete the to-be-combined semantic information from the semantic state record library if it is determined that the to-be-combined semantic information is sensitive in order to address the challenge that certain to-be-combined semantic information may not be loggable and needs to be removed from the system quickly (Weinstein, Col 2, Rows 28-32).
Conclusion
Prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 2019/0369748 A1 discloses an intelligent automated assistant system determining one or more criteria representing expressed user disinterest to automatically deactivating a virtual assistant session, the criteria includes determination of whether direction of user gaze is directed to the system (Abstract and see ¶271).
CN 115982328 A discloses information processing device acquiring natural language information to be identified, performing intention identification on natural language information to be identified according to target historical interaction information to obtain an intention identified result, trimming / correcting the natural language information to be recognized according to the intention recognition result and the target historical interaction information to obtain trimmed / corrected natural language information.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner Richard Z. Zhu whose telephone number is 571-270-1587 or examiner’s supervisor Hai Phan whose telephone number is 571-272-6338. Examiner Richard Zhu can normally be reached on M-Th, 7:30-17:00.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RICHARD Z ZHU/Primary Examiner, Art Unit 2654 02/13/2026