Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Acknowledgement
Acknowledgement is made of applicant’s amendment filed on 11/26/2025. Applicant’s submission has been entered and made of record.
Status of the Claims
Claims 1-9 and 19-30 are pending. Claims 10-18 were withdrawn.
Response to Applicant’s Arguments
In response to applicant’s argument that “That is, the core goal of Kanungo is to enable a single device (such as electronic device 101) to independently complete the action or actions corresponding to the user's speech without the collaboration of other devices, and thus no executor determination is required”, the Examiner respectfully disagrees.
Kanungo discloses that the smart glasses electronic device 101 / processor 120 (¶43) can request another device (such as electronic devices 102 and 104 or server 106) to perform at least some functions associated therewith, where the other electronic device is able to execute the requested functions (i.e., executors) and transfer a result of the execution to the smart glasses electronic device 101 / processor 120 (¶45).
In particular, once smart glasses electronic device 101 / processor 120 has access to the intent of the utterance and the slots for items such as one or more named entities included in the utterance, the processor 120 constructs an action plan and uses the action plan to instruct at least one action of the electronic device 101 or of another device or system that corresponds to one or more instructions or requests provided in the utterance (¶56).
To that end, the smart glasses electronic device 101 / processor 120 can process and understand audio data to identify entities (¶50) in order to retrieve named entities from the personal database 204 (¶50), such as Internet of Things (IoT) device names, including personalized names for devices like ovens, refrigerators, light or illumination devices, microwaves, thermostats, and so on (¶51).
Therefore, smart glasses electronic device 101 / processor 120 can instruct other IoT devices / executors by determining named entities corresponding to IoT devices / executors.
In response to applicant’s argument that “Further, the trained language model 502 in the server only outputs intents, slots, and instructions, where the instructions are used for basic guidance for action execution, and they are unstructured and non-directly executable commands that cannot be distributed. Even assuming that the electronic device 101 can instruct other devices to perform other operations, the electronic device 101 can only generate corresponding new control instructions based on the output of the trained language model 502, and use the generated new control instructions to control other devices to perform the actions. The electronic device 101 cannot instruct different devices to perform different operations by distributing the output of the trained language model 502 to other devices”, the Examiner respectfully disagrees.
Exemplary claim 1 recites:
(1) receive the at least one task execution command sent by the model server,
(2) determine one or more executors of one or more tasks according to description information of each task or executor description information of each task in the at least one task execution command, and
(3) distribute each task execution command to the determined one or more executors, so that each executor executes at least one action associated with the received task execution command, wherein the determined one or more executors comprise at least one of smart glasses and devices in an Internet of Things (IoT)
With respect to (1), Kanungo discloses the smart glasses electronic device 101 / processor 120 receives an intent and one or more slots associated with the utterance from the server (¶36).
With respect to (2), the processor 120 further identifies a named entity corresponding to at least one slot using the named entity database (¶36); i.e., the processor 120 retrieves named entities from the personal database 204 (¶50) storing records for IoT device names corresponding to smart things application category that can understand and execute commands given by a user in respective topic domains (¶51).
In other words, the slot information received by the processor 120 corresponds to at least the executor description information of each task because the slot information enables the processor 120 to identify the named entity of an IoT device / executor that can understand and execute a command.
In the example of the utterance “start Dobby at 8 pm at 425 degrees” (¶57), the processor 120 receives an intent and slot tags for the utterance to create an action plan that indicates the processor 120 should instruct the smart oven device to preheat to 425 degrees at 8 pm (¶57). This example demonstrates at least that the smart glasses electronic device 101 / processor 120 determines the named entity of an executor / IoT device based on slot information received from the server. Therefore, the slot information received from the model server meets the recited “executor description information of each task in the at least one task execution command”.
With respect to (3), in the example of the utterance “start Dobby at 8 pm at 425 degrees”, once the processor 120 receives the intent and slot tags for the utterance and creates the plan indicating that the processor 120 should instruct the smart oven IoT device to preheat to 425 degrees at 8 pm (¶57), the processor 120 in turn instructs at least an action of the smart oven IoT device (¶56) to preheat to 425 degrees at 8 pm.
Therefore, the smart glasses electronic device 101 / processor 120 at least distributes a preheat task execution command to the determined IoT smart oven device to execute preheating to 425 degrees at 8 pm.
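For illustration only, the executor determination and command distribution discussed above can be sketched as follows. This is a hypothetical sketch, not code from Kanungo; the database contents, slot keys, and the names `NAMED_ENTITY_DB`, `determine_executor`, and `dispatch` are assumptions made purely for the example.

```python
# Hypothetical sketch of slot-based executor determination and dispatch,
# loosely following Kanungo ¶¶56-57. Names and data are illustrative only.

NAMED_ENTITY_DB = {
    "Dobby": {"type": "iot_smart_oven"},   # personalized IoT device name
    "mom": {"type": "phone_contact"},       # contact handled on the glasses
}

def determine_executor(slots: dict) -> str:
    """Map a slot's named entity to the device/executor that handles it."""
    record = NAMED_ENTITY_DB[slots["device_name"]]
    if record["type"].startswith("iot_"):
        return record["type"]       # an IoT device executor
    return "smart_glasses"          # fall back to the glasses themselves

def dispatch(intent: str, slots: dict) -> str:
    """Send the task execution command to the determined executor."""
    executor = determine_executor(slots)
    return f"{executor}: {intent}({slots['temperature']}, {slots['time']})"

# "start Dobby at 8 pm at 425 degrees" -> intent + slot tags
slots = {"device_name": "Dobby", "temperature": "425 degrees", "time": "8 pm"}
print(dispatch("preheat", slots))
# -> iot_smart_oven: preheat(425 degrees, 8 pm)
```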
In response to applicant’s argument that “In contrast, the prompt generator of Amended Claim 1 uses a plurality of associated semantics as one piece of prompt message according to a relationship between various semantics. Therefore, the encoded utterance of Kanungo is different from the prompt message of Amended Claim 1, and Kanungo fails to disclose, teach, or suggest the innovation (2) of Amended Claim 1”, the Examiner respectfully disagrees.
Kanungo discloses smart glasses electronic device 101 / processor 120 uses voice assistant 202 / machine learning language models 205 (Fig. 2) to process audio data utterance and understanding for use in identifying entities and planning (¶49):
[Kanungo, Fig. 2: voice assistant 202 and machine learning models 205]
In particular, the machine learning models 205 include a delexicalization and annotation model that delexicalizes, encodes, and masks portions of the utterances using the personal database 204 for further processing by another MEM-BERT language model at the server (¶49).
Therefore, the machine learning models 205 constitute a prompt generator that generates a prompt / encoded utterance to prompt the MEM-BERT language model at the server.
Finally, the processor 120 uses the ML models 205 to perform semantic parsing on the audio input to at least retrieve one or more named entities from the personal database 204 (¶50). Specifically, the semantic parsing corresponds to a delexicalization and annotation operation consulting the personal database 204 based on user utterances to delexicalize each utterance by querying the personal database 204 for the named entities included in the utterance (¶71). The process replaces any named entity in the utterance with an entity mask (¶71).
For example, for the utterance “set a reminder to call Joe at 7 pm”, the process generates an encoded utterance “set a reminder to call_MASK1_ at 7 pm” as a prompt (¶73) to the MEM-BERT language model at the server. In this particular example, the smart glasses electronic device 101 / processor 120 uses the annotation function in the ML models 205 to determine the semantic relationship between “Joe” in the utterance “set a reminder to call Joe at 7 pm” and personalized data in the personal database 204 in order to replace “Joe” with a mask and generate the delexicalized / encoded utterance “set a reminder to call_MASK1_ at 7 pm”.
In other words, the ML models 205 / prompt generator (1) parsed “set a reminder to call Joe at 7 pm” to perform named entity recognition of “Joe” in the utterance (i.e., first semantics / understanding that “Joe” is a named entity), (2) determined that the named entity “Joe” corresponds / relates to personalized data in the personal database 204 (i.e., second semantics / understanding that “Joe” is personal data), and (3) generated the encoded utterance / prompt “set a reminder to call_MASK1_ at 7 pm” by masking “Joe” (i.e., one piece of the prompt message) based on the relationship between the named entity “Joe” in the utterance and the personalized data in the personal database 204.
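A minimal sketch of the masking step discussed above, assuming a toy personal database; the database contents and the function name `delexicalize` are hypothetical and not taken from Kanungo (note also that this sketch produces a space before the mask token, whereas Kanungo’s encoded utterance renders it without one):

```python
# Hypothetical sketch of delexicalization per the process described in ¶71:
# replace each named entity found in a personal database with a mask token.

PERSONAL_DATABASE = {
    "Joe": "contact",        # personalized contact name
    "Dobby": "iot_device",   # personalized smart-oven name
}

def delexicalize(utterance: str) -> str:
    """Replace each named entity found in the personal database with a mask."""
    encoded = utterance
    mask_index = 1
    for entity in PERSONAL_DATABASE:
        if entity in encoded:
            encoded = encoded.replace(entity, f"_MASK{mask_index}_")
            mask_index += 1
    return encoded

print(delexicalize("set a reminder to call Joe at 7 pm"))
# -> set a reminder to call _MASK1_ at 7 pm
```

The encoded utterance, rather than the raw utterance containing personal data, is what would then be transmitted to the server-side language model.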
By creating an encoded utterance / prompt including one or more masked entities (¶36), the ML models 205 / prompt generator uses a plurality of associated semantics corresponding to the masked entities.
Therefore, Kanungo discloses wherein the smart glasses system is further configured to generate, through a prompt generator, the at least one first prompt message based on the parsed semantics, wherein the prompt generator uses a plurality of associated semantics as one piece of prompt message according to a relationship between various semantics.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
(a) NOVELTY; PRIOR ART.—A person shall be entitled to a patent unless—
(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention; or
(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
(b) EXCEPTIONS.—
(1) DISCLOSURES MADE 1 YEAR OR LESS BEFORE THE EFFECTIVE FILING DATE OF THE CLAIMED INVENTION.—A disclosure made 1 year or less before the effective filing date of a claimed invention shall not be prior art to the claimed invention under subsection (a)(1) if—
(A) the disclosure was made by the inventor or joint inventor or by another who obtained the subject matter disclosed directly or indirectly from the inventor or a joint inventor; or
(B) the subject matter disclosed had, before such disclosure, been publicly disclosed by the inventor or a joint inventor or another who obtained the subject matter disclosed directly or indirectly from the inventor or a joint inventor.
(2) DISCLOSURES APPEARING IN APPLICATIONS AND PATENTS.—A disclosure shall not be prior art to a claimed invention under subsection (a)(2) if—
(A) the subject matter disclosed was obtained directly or indirectly from the inventor or a joint inventor;
(B) the subject matter disclosed had, before such subject matter was effectively filed under subsection (a)(2), been publicly disclosed by the inventor or a joint inventor or another who obtained the subject matter disclosed directly or indirectly from the inventor or a joint inventor; or
(C) the subject matter disclosed and the claimed invention, not later than the effective filing date of the claimed invention, were owned by the same person or subject to an obligation of assignment to the same person.
Claims 1-2 and 19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Kanungo et al. (US 2024/0071376 A1).
Regarding Claim 1, Kanungo discloses a natural language command control system based on generative artificial intelligence large language model (GAILLM) (Fig. 1), comprising:
a smart glasses system (Fig. 1, electronic devices 101, 102; ¶15, smart glasses / wearable system comprising electronic devices 101, 102) and a model server (Fig. 1, server 106), wherein:
the model server is configured with the GAILLM (¶46, server 106 stores a trained language model; ¶63, the language model pretrained using world knowledge; ¶69, the language model being trained from fine tuning a Generative Pretrained Transformer model);
the smart glasses system is configured to obtain a first user speech (¶¶35-36, electronic device 101 / processor 120 receives and processes audio input / input utterances), perform a semantic parsing on the first user speech, generate at least one first prompt message based on the parsed semantics (¶36, perform delexicalization on the utterance using a named entity database to create an encoded utterance including one or more masked entities; see ¶52 and Fig. 4, process input utterance to tag the named entity in the utterance), and send the at least one first prompt message to the model server (¶36, transmit encoded utterance to a server on which a language model is stored);
the model server is configured to obtain at least one task execution command through the GAILLM based on the at least one first prompt message from the smart glasses system, and send the at least one task execution command to the smart glasses system (¶38, ¶46, and ¶58, server 106 with language model processes the encoded utterances from device 101, and transmits corresponding intent, slots and instructions back to device 101); and
the smart glasses system is further configured to execute at least one action corresponding to the at least one task execution command (¶38 and ¶58, server 106 transmits instructions back to device 101 to execute functions associated with instructions included in the utterances; see ¶56, electronic device 101 / processor 120 creates an action plan for executing one or more actions performable by electronic device 101 based on the intent of the utterance and corresponding slots);
wherein the smart glasses system is further configured to:
receive the at least one task execution command sent by the model server (¶36, electronic device 101 transmits encoded utterance to the server and receive an intent and one or more slots associated with the utterance from the server; ¶58, server 106 transmits instructions to execute functions associated with instructions included in the utterances);
determine one or more executors of one or more tasks according to description information of each task or executor description information of each task in the at least one task execution command (¶36, upon receiving an intent and one or more slots associated with the utterance from the server, the processor 120 identifies a named entity corresponding to at least one slot using the named entity database; ¶57, e.g., for the utterance “start Dobby at 8 pm at 425 degrees”, the processor 120 in electronic device 101 receives an intent and slot tags (i.e., from the server) for the utterance to create an action plan that indicates the processor 120 should instruct the smart oven device “Dobby” to preheat to 425 degrees at 8 pm); and
distribute each task execution command to the determined one or more executors, so that each executor executes at least one action associated with the received task execution command (¶57, for “start Dobby at 8 pm at 425 degrees”, the processor 120 instructs smart oven device “Dobby” to preheat to 425 degrees at 8 pm), wherein the determined one or more executors comprise at least one of smart glasses (¶57, for “call mom” processor 120 instructs a phone application in smart glasses electronic device 101 to begin communication session with “mom” contact stored on the electronic device 101), a smart mobile terminal (¶15, electronic device 101 can be a mobile phone; ¶57, for “call mom” processor 120 instructs a phone application electronic device 101 to begin communication session with “mom” contact stored on the electronic device 101), and devices in an Internet of Things (IoT) (¶57, smart oven being IoT device per ¶51); and
wherein the smart glasses system is further configured to generate, through a prompt generator (¶49 and Fig. 2, using machine learning models 205 with a delexicalization and annotation model that delexicalizes, encodes, and masks portions of utterances using personal database 204 for further processing by MEM-BERT language model at the server), the at least one first prompt message based on the parsed semantics, wherein the prompt generator uses a plurality of associated semantics as one piece of prompt message according to a relationship between various semantics (¶36, processor 120 performs delexicalization on the utterance using a named entity database to create an encoded utterance (i.e., prompt) including one or more masked entities and transmit the encoded utterance to server to prompt the server’s language model; see ¶52 and Fig. 4, process input utterance to tag the named entity in the utterance; e.g., ¶57, for “start Dobby at 8 pm at 425 degrees”, processor 120 (1) performs named entity recognition of “Dobby” in the utterance and (2) determines “Dobby” is the personalized name for the user’s smart oven device in order to create an encoded utterance / prompt using delexicalization; i.e., based on the relationship between named entity “Dobby” and a corresponding personal data in the personal database 204, create encoded utterance / prompt to mask at least one piece in the encoded utterance / prompt where “Dobby” is delexicalized based on the relationship).
Regarding claim 2, Kanungo discloses wherein the smart glasses system comprises the smart glasses, and the smart mobile terminal (Fig. 1, electronic devices 101, 102; ¶15, “electronic device” can be a mobile phone (i.e., electronic device 102 can be a mobile phone); ¶43, electronic device 101 can be an augmented reality wearable device / eyeglasses communicating with electronic device 102) and/or a prompt server (¶45, server 106 includes a group of one or more servers such that one can be the prompt server and the other the model server), the at least one task execution command comprises the description information of each task and the executor description information of each task (¶58, the server 106 transmits instructions back to execute functions associated with instructions included in the utterance; per ¶56, once the processor 120 / electronic device 101 received the intent of the utterance and the slots for items such as one or more named entities included in the utterance, construct an action plan to instruct at least one action of the electronic device 101 or another device or system that corresponds to one or more instructions or requests provided in the utterance; e.g., ¶57, server instruction for “call mom” includes intent for a phone application to perform calling mom corresponding to the slot for named entity “mom”; for “start Dobby at 8 pm at 425 degrees”, the server instruction includes intent to preheat IoT smart oven device according to degrees slot “425” and time slot “8 pm”), and the prompt generator is configured on the smart glasses (¶15, electronic device 101 being a wearable device / smart glasses; ¶36, electronic device 101 / processor 120 performs a function of delexicalization on the utterance to create the encoded utterances / prompt), the smart mobile terminal (¶45, electronic device 101 / smart glasses can request another electronic device (i.e., electronic devices 102 and 104) to perform the function), the model server (¶45, electronic 
device 101 / smart glasses can request one of the one or more servers to perform a function), or the prompt server (¶45, electronic device 101 / smart glasses can request one of the one or more servers to perform the function); and
wherein the smart glasses are configured to obtain the first user speech through a built-in microphone (¶¶35-36, electronic device 101 / processor 120 receives and processes audio input / input utterances from an input device like a microphone; ¶42, electronic device 101 with sensor 180 including one or more microphones), perform the semantic parsing on the first user speech (¶36, processor 120 / electronic device 101 receives and processes audio input through the microphone and perform delexicalization on the utterance using a named entity database stored on the electronic device 101 / memory 130; e.g., ¶57, processor 120 processes “start Dobby at 8 pm at 425 degrees” to determine from personal database 204 that “Dobby” is the personalized name for the user’s smart oven device), and generate, through the prompt generator, the at least one first prompt message based on the parsed semantics (¶36, create an encoded utterance including one or more masked entities; e.g., masked entity corresponding to “Dobby”).
Regarding Claim 19, Kanungo discloses a computer-implemented natural language command control method based on generative artificial intelligence large language model (GAILLM) (¶46 and ¶69, server 106 stores a GPT based language model), applied to a smart wearable device system (Fig. 1, ¶15, smart glasses / wearable system comprising electronic devices 101, 102), comprising:
obtaining a first user speech, performing a semantic parsing on the first user speech, and obtaining a parsing result (electronic device 101 / processor 120 receives and processes audio input / input utterances to perform delexicalization on the utterance using a named entity database and to create an encoded utterance; ¶38, ¶46, and ¶58, server 106 with language model processes the encoded utterances from device 101, and transmits corresponding intent, slots and instructions back to device 101);
obtaining at least one task execution command through the GAILLM based on the parsing result (¶58, server 106 processes the encoded input using the GPT based machine learning language model and transmit instructions back to electronic device 101 to execute functions associated with instructions included in the utterances); and
executing at least one action corresponding to the at least one task execution command (¶56, electronic device 101 / processor 120 creates an action plan for executing one or more actions performable by electronic device 101 based on the intent of the utterance and corresponding slots);
wherein the step of obtaining the at least one task execution command through the GAILLM based on the parsing result comprises:
generating, through a prompt generator, at least one first prompt message based on the parsing result, wherein the prompt generator uses a plurality of associated semantics as one piece of prompt message according to a relationship between various semantics in the parsing results (¶36, processor 120 performs delexicalization on the utterance using a named entity database to create an encoded utterance (i.e., prompt) including one or more masked entities and transmit the encoded utterance to server to prompt the server’s language model; see ¶52 and Fig. 4, process input utterance to tag the named entity in the utterance; e.g., ¶57, for “start Dobby at 8 pm at 425 degrees”, processor 120 determines (1) “Dobby” is the personalized name for (2) the user’s smart oven device to create an encoded utterance / prompt using delexicalization; per ¶45, electronic device 101 can request electronic device 102 to perform this task); and
obtaining the at least one task execution command through the GAILLM based on the at least one first prompt message (¶36, electronic device 101 transmits encoded utterance (i.e., LM prompt) to the server and receive an intent and one or more slots associated with the utterance from the server; ¶58, server 106 transmits instructions to execute functions associated with instructions included in the utterances); and
wherein the step of executing the at least one action corresponding to the at least one task execution command comprises:
determine one or more executors of one or more tasks according to description information of each task or executor description information of each task in the at least one task execution command (¶57, e.g., for the utterance “start Dobby at 8 pm at 425 degrees”, the processor 120 in electronic device 101 receives an intent and slot tags (i.e., received from the server) for the utterance to create an action plan that indicates the processor 120 should instruct the smart oven device “Dobby” to preheat to 425 degrees at 8 pm); and
distribute each task execution command to the determined one or more executors, so that each executor executes at least one action associated with the received task execution command (¶57, for “start Dobby at 8 pm at 425 degrees”, the processor 120 instructs smart oven device “Dobby” to preheat to 425 degrees at 8 pm), wherein the determined one or more executors comprise at least one of a smart wearable device (¶57, for “call mom” processor 120 instructs a phone application in smart glasses electronic device 101 to begin communication session with “mom” contact stored on the electronic device 101), a smart mobile terminal, and devices in an Internet of Things (IoT) (¶57, smart oven being IoT device per ¶51).
Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 103 that form the basis for the rejections under this section made in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3-4, 20-24, and 28-30 are rejected under 35 U.S.C. 103 as being unpatentable over Kanungo et al. (US 2024/0071376 A1) in view of Jung et al. (US 2024/0119930 A1).
Regarding claim 3, Kanungo discloses wherein the smart glasses system comprises the smart glasses and the smart mobile terminal (Fig. 1, electronic devices 101, 102; ¶15, “electronic device” can be a mobile phone (i.e., electronic device 102 can be a mobile phone); ¶43, electronic device 101 can be an augmented reality wearable device / eyeglasses communicating with electronic device 102), and the prompt generator is configured on the smart mobile terminal (¶36, transmit encoded utterance to a server on which a language model is stored; per ¶45, electronic device 101 can request electronic device 102 to perform this task); and
wherein the smart glasses are further configured to send the first user speech to the smart mobile terminal through a wireless network (¶36, electronic device 101 / processor 120 creates encoded utterance and transmit the encoded utterance to a server on which a language model is stored);
the smart mobile terminal is configured to convert the first user speech into a first text through a speech-to-text engine (¶49, electronic device 101 / processor 120 performs automated speech recognition tasks on utterances received; per ¶45, electronic device 101 can request electronic device 102 to perform this task), perform the semantic parsing on the first text, generate the at least one first prompt message based on the parsed semantics (¶36, perform delexicalization on the utterance using a named entity database to create an encoded utterance including one or more masked entities; see ¶52 and Fig. 4, process input utterance to tag the named entity in the utterance; per ¶45, electronic device 101 can request electronic device 102 to perform this task), and send the at least one first prompt message to the model server (¶36, transmit encoded utterance to a server on which a language model is stored; per ¶45, electronic device 101 can request electronic device 102 to perform this task);
the model server is configured to obtain the at least one task execution command through the GAILLM based on the at least one first prompt message from the smart mobile terminal, and send the at least one task execution command to the smart mobile terminal (¶38, ¶46, and ¶58, server 106 with language model processes the encoded utterances from client device (i.e., electronic device 102), and transmits corresponding intent, slots and instructions back to client device (i.e., electronic device 102)); and
the smart mobile terminal is further configured to execute the at least one action corresponding to the at least one task execution command (¶38 and ¶58, server 106 transmits instructions back to client device (i.e., electronic device 102) to execute functions associated with instructions included in the utterances; see ¶56, electronic device 101 / processor 120 creates an action plan for executing one or more actions performable by electronic device 101 based on the intent of the utterance and corresponding slots; per ¶45, electronic device 101 can request electronic device 102 to perform this task).
Kanungo does not disclose that the smart glasses send the first user speech to the smart mobile terminal through Bluetooth.
Jung teaches a smart wearable device system (Fig. 1, AI device 10) comprising a smart wearable device (¶50, glass type AI device (a smart glass)) and a smart mobile terminal (¶296, AI device 10 supported like a mobile device) communicating with each other via Bluetooth (¶98).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to send the first user speech from the smart wearable device / smart glasses to the smart mobile terminal through Bluetooth in order to transmit and receive data from external devices through wireless communication technologies (Jung, ¶97).
Regarding claim 4, Kanungo discloses wherein the smart mobile terminal is further configured to send, the first prompt messages, and appearance order of semantics corresponding to each of the first prompt messages in the first text, to the model server (¶52 and see Fig. 4B, in one example, create encoded and masked utterances for “if I call Mike in California at 6 pm” comprising delexicalized utterance “if I call_PERSON_in_LOCATION_at 6 pm”; ¶36, transmit encoded utterance to a server on which a language model is stored; per ¶45, electronic device 101 (i.e., smart glasses) can request electronic device 102 (i.e., mobile phone) to perform this task), or configured to send the first prompt messages to the model server one by one according to appearance order of semantics corresponding to each of the first prompt messages in the first text (Figs. 4A – 4C, creating respective encoded utterances for “set a reminder to call Joe at 7 pm”, “if I call Mike in California at 6 pm”, and “his time be set a meeting with Hongbin” for transmission to the server per ¶36);
the model server is further configured to: obtain the task execution commands through the GAILLM based on the first prompt messages and the appearance order from the smart mobile terminal (¶75, trained language model annotates the encoded utterance with intent and slot tags), and send, the task execution commands, and execution order of each of the task execution commands, to the smart mobile terminal (¶38, ¶46, and ¶58, server 106 with language model processes the encoded utterances from client device (i.e., device 102), and transmits corresponding intent, slots and instructions back to client device (i.e., device 102)), wherein the execution order corresponds to the appearance order (¶76, create an action plan determining what actions to perform and under what parameters to perform the actions requested in the original user utterance 601; e.g., ¶57, for “call mom”, create an action plan to instruct audio output device 208 to output “calling Mom” and to cause a phone application to begin a communication session with a “mom” contact stored on electronic device 101; ¶45, electronic device 101 can request server 106 to perform this task / action planning); and
the smart mobile terminal is further configured to execute actions corresponding to each of the task execution commands according to the execution order (¶45, request electronic device 102 (i.e., mobile phone) to perform the tasks; and per ¶56, using the action plan to instruct at least one action of the electronic device 101 or another device that corresponds to one or more instructions or requests provided in the utterance).
Regarding claim 20, Kanungo discloses wherein the smart wearable device system comprises the smart wearable device and the smart mobile terminal (Fig. 1, electronic devices 101, 102; ¶15, “electronic device” can be a mobile phone (i.e., electronic device 102 can be a mobile phone); ¶43, electronic device 101 can be an augmented reality wearable device / eyeglasses communicating with electronic device 102), the prompt generator is configured on the smart wearable device (¶15, electronic device 101 being a wearable device / smart glasses; ¶36, electronic device 101 / processor 120 performs a function of delexicalization on the utterance to create the encoded utterances / prompt), the GAILLM is configured on a model server (¶46, server 106 stores a trained language model; ¶63, the language model pretrained using word knowledge; ¶69, the language model being trained from fine tuning a Generative Pretrained Transformer model), and the steps of obtaining the first user speech, performing the semantic parsing on the first user speech and obtaining the parsing result, obtaining the at least one task execution command through the GAILLM based on the parsing result, and executing the at least one action corresponding to the at least one task execution command comprise:
obtaining, by the smart wearable device, the first user speech (¶¶35-36, electronic device 101 / processor 120 receives and processes audio input / input utterances from an input device like a microphone; ¶42, electronic device 101 with sensor 180 including one or more microphones), converting the first user speech into a first text through a speech-to-text engine (¶49, electronic device 101 / processor 120 performs automated speech recognition tasks on utterances received),
performing the semantic parsing on the first text, generating the at least one first prompt message based on the parsed semantics using the prompt generator (¶36, perform delexicalization on the utterance using a named entity database to create an encoded utterance including one or more masked entities; see ¶52 and Fig. 4, process input utterance to tag the named entity in the utterance), and sending the at least one first prompt message to the smart mobile terminal through a wireless communication interface (¶36 and ¶41, processor 120 communicates with electronic device 102 via wireless communication interface 170);
sending, by the smart mobile terminal, the at least one first prompt message to the model server (¶36, transmit encoded utterance to a server on which a language model is stored; per ¶45, electronic device 101 can request electronic device 102 to perform this task), to obtain, through the GAILLM on the model server, the at least one task execution command based on the at least one first prompt message (¶38, ¶46, and ¶58, server 106 with language model processes the encoded utterances from client device (i.e., electronic device 102), and transmits corresponding intent, slots and instructions back to client device (i.e., electronic device 102)); and
receiving, by the smart mobile terminal, the at least one task execution command from the model server (¶38, ¶46, and ¶58, server 106 with language model processes the encoded utterances from client device (i.e., electronic device 102), and transmits corresponding intent, slots and instructions back to client device (i.e., electronic device 102)), and executing the at least one action corresponding to the at least one task execution command (¶38 and ¶58, server 106 transmits instructions back to client device (i.e., electronic device 102) to execute functions associated with instructions included in the utterances; see ¶56, electronic device 101 / processor 120 creates an action plan for executing one or more actions performable by electronic device 101 based on the intent of the utterance and corresponding slots; per ¶45, electronic device 101 can request electronic device 102 to perform this task).
Kanungo does not disclose sending the at least one first prompt message to the smart mobile terminal through a Bluetooth.
Jung teaches a smart wearable device system (Fig. 1, AI device 10) comprising a smart wearable device (¶50, glass type AI device (a smart glass)) and a smart mobile terminal (¶296, AI device 10 supported like a mobile device) communicating with each other via Bluetooth (¶98).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to transmit the first prompt message from the smart wearable device to the smart mobile terminal through Bluetooth in order to transmit and receive data from external devices through wireless communication technologies (Jung, ¶97).
Regarding claim 21, Kanungo discloses wherein the smart wearable device system comprises the smart wearable device and the smart mobile terminal (Fig. 1, electronic devices 101, 102; ¶15, “electronic device” can be a mobile phone (i.e., electronic device 102 can be a mobile phone); ¶43, electronic device 101 can be an augmented reality wearable device / eyeglasses communicating with electronic device 102), the prompt generator is configured on the smart mobile terminal (¶45, electronic device 101 / smart glasses can request another electronic device (i.e., electronic devices 102 and 104) to perform the function), the GAILLM is configured on a model server (¶46, server 106 stores a trained language model; ¶63, the language model pretrained using word knowledge; ¶69, the language model being trained from fine tuning a Generative Pretrained Transformer model), and the steps of obtaining the first user speech, performing the semantic parsing on the first user speech and obtaining the parsing result, obtaining the at least one task execution command through the GAILLM based on the parsing result, and executing the at least one action corresponding to the at least one task execution command comprise:
obtaining, by the smart wearable device, the first user speech (¶¶35-36, electronic device 101 / processor 120 receives and processes audio input / input utterances from an input device like a microphone; ¶42, electronic device 101 with sensor 180 including one or more microphones), and sending the first user speech to the smart mobile terminal through a wireless interface (¶36 and ¶41, processor 120 communicates with electronic device 102 via wireless communication interface 170);
converting, by the smart mobile terminal, the first user speech into a first text through a speech-to-text engine (¶49, electronic device 101 / processor 120 performs automated speech recognition tasks on utterances received; per ¶45, electronic device 101 can request electronic device 102 to perform this task), performing the semantic parsing on the first text, generating the at least one first prompt message based on the parsed semantics using the prompt generator (¶36, perform delexicalization on the utterance using a named entity database to create an encoded utterance including one or more masked entities; see ¶52 and Fig. 4, process input utterance to tag the named entity in the utterance; per ¶45, electronic device 101 can request electronic device 102 to perform this task), and sending the at least one first prompt message to the model server (¶36, transmit encoded utterance to a server on which a language model is stored; per ¶45, electronic device 101 can request electronic device 102 to perform this task), to obtain, through the GAILLM on the model server, the at least one task execution command based on the at least one first prompt message (¶38, ¶46, and ¶58, server 106 with language model processes the encoded utterances from client device (i.e., electronic device 102), and transmits corresponding intent, slots and instructions back to client device (i.e., electronic device 102)); and
receiving, by the smart mobile terminal, the at least one task execution command from the model server, and executing the at least one action corresponding to the at least one task execution command (¶38 and ¶58, server 106 transmits instructions back to client device (i.e., electronic device 102) to execute functions associated with instructions included in the utterances; see ¶56, electronic device 101 / processor 120 creates an action plan for executing one or more actions performable by electronic device 101 based on the intent of the utterance and corresponding slots; per ¶45, electronic device 101 can request electronic device 102 to perform this task).
Kanungo does not disclose sending the first user speech to the smart mobile terminal through a Bluetooth.
Jung teaches a smart wearable device system (Fig. 1, AI device 10) comprising a smart wearable device (¶50, glass type AI device (a smart glass)) and a smart mobile terminal (¶296, AI device 10 supported like a mobile device) communicating with each other via Bluetooth (¶98).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to send the first user speech from the smart wearable device to the smart mobile terminal through Bluetooth in order to transmit and receive data from external devices through wireless communication technologies (Jung, ¶97).
Regarding claim 22, Kanungo discloses wherein the smart wearable device system comprises the smart wearable device (¶15 and ¶43, electronic device 101 can be an augmented reality wearable device / eyeglasses) and the smart mobile terminal (¶15, “electronic device” can be a mobile phone (i.e., electronic device 102 can be a mobile phone)), the GAILLM is configured on the smart mobile terminal (¶¶48-49 and Fig. 2, system 200 with machine learning models 205 can be used with electronic device 101 or any suitable devices (i.e., electronic device 102 / mobile phone)), and the steps of obtaining the first user speech, performing the semantic parsing on the first user speech and obtaining the parsing result, obtaining the at least one task execution command through the GAILLM based on the parsing result, and executing the at least one action corresponding to the at least one task execution command comprise:
obtaining, by the smart wearable device, the first user speech (¶42 and Fig. 1, electronic device 101 includes one or more sensors 180, which comprises one or more microphones), and sending the first user speech to the smart mobile terminal through a wireless network (¶36, electronic device 101 / processor 120 creates encoded utterance and transmit the encoded utterance to a server on which a language model is stored; in view of ¶45 and ¶48, electronic device 102 / mobile phone can implement system 200 with machine learning model and be requested to perform the function of the server); and
converting, by the smart mobile terminal, the first user speech into a first text through a speech-to-text engine (¶49, processes 205 can be accessed by processor 120 to perform one or more automated speech recognition tasks; per ¶45, electronic device 101 can request electronic device 102 / mobile phone to execute processes 205 / automated speech recognition tasks), performing the semantic parsing on the first text and obtaining a parsing result, obtaining the at least one task execution command through the GAILLM based on the parsing result (¶36, perform delexicalization on the utterance using a named entity database to create an encoded utterance including one or more masked entities; see ¶52 and Fig. 4, process input utterance to tag the named entity in the utterance; per ¶45, electronic device 101 can request electronic device 102 to perform this task), and executing the at least one action corresponding to the at least one task execution command (¶56, electronic device 101 / processor 120 creates an action plan for executing one or more actions performable by electronic device 101 based on the intent of the utterance and corresponding slots; per ¶45, electronic device 101 can request electronic device 102 to perform this task where electronic device 102 is able to execute the requested functions and transfer a result of the execution to the electronic device 101).
Kanungo does not disclose sending the first user speech to the smart mobile terminal through a Bluetooth.
Jung teaches a smart wearable device system (Fig. 1, AI device 10) comprising a smart wearable device (¶50, glass type AI device (a smart glass)) and a smart mobile terminal (¶296, AI device 10 supported like a mobile device) communicating with each other via Bluetooth (¶98).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to send the first user speech from the smart wearable device to the smart mobile terminal through Bluetooth in order to transmit and receive data from external devices through wireless communication technologies (Jung, ¶97).
Regarding claim 23, Kanungo discloses wherein the smart mobile terminal generates the first prompt messages based on the parsed semantics using the prompt generator, and sends, the first prompt messages, and appearance order of semantics corresponding to each of the first prompt messages in the first text, to the model server (¶52 and see Fig. 4B, in one example, create encoded and masked utterances for “if I call Mike in California at 6 pm” comprising delexicalized utterance “if I call_PERSON_in_LOCATION_at 6 pm”; ¶36, transmit encoded utterance to a server on which a language model is stored; per ¶45, electronic device 101 (i.e., smart glasses) can request electronic device 102 (i.e., mobile phone) to perform this task);
the model server obtains the task execution commands through the GAILLM based on the first prompt messages and the appearance order (¶75, trained language model annotates the encoded utterance with intent and slot tags), and sends, the task execution commands, and execution order of each of the task execution commands, to the smart mobile terminal (¶38, ¶46, and ¶58, server 106 with language model processes the encoded utterances from client device (i.e., device 102), and transmits corresponding intent, slots and instructions back to client device (i.e., device 102)), wherein the execution order corresponds to the appearance order (¶76, create an action plan determining what actions to perform and under what parameters to perform the actions requested in the original user utterance 601; e.g., ¶57, for “call mom”, create an action plan to instruct audio output device 208 to output “calling Mom” and to cause a phone application to begin a communication session with a “mom” contact stored on electronic device 101; ¶45, electronic device 101 can request server 106 to perform this task / action planning); and
the step of receiving, by the smart mobile terminal, the at least one task execution command from the model server, and executing the at least one action corresponding to the at least one task execution command further comprises:
receiving, by the smart mobile terminal, the task execution commands and the execution order of each of the task execution commands from the model server, and executing actions corresponding to each of the task execution commands according to the execution order (¶45, request electronic device 102 (i.e., mobile phone) to perform the tasks; and per ¶56, using the action plan to instruct at least one action of the electronic device 101 or another device that corresponds to one or more instructions or requests provided in the utterance).
Regarding claim 24, Kanungo discloses wherein the smart mobile terminal generates the first prompt messages based on the parsed semantics using the prompt generator, and sends the first prompt messages one by one to the model server (Figs. 4A – 4C, creating respective encoded utterances / prompt for “set a reminder to call Joe at 7 pm”, “if I call Mike in California at 6 pm”, and “his time be set a meeting with Hongbin” for transmission to the server per ¶36; per ¶45, electronic device 101 (i.e., smart glasses) can request electronic device 102 (i.e., mobile phone) to perform this task).
Regarding claim 28, Kanungo discloses wherein the smart wearable device system comprises the smart wearable device (¶15 and ¶43, electronic device 101 can be an augmented reality wearable device / eyeglasses), the smart mobile terminal (¶15, “electronic device” can be a mobile phone (i.e., electronic device 102 can be a mobile phone)) and a prompt server (¶45, server 106 includes a group of one or more servers such that one can be the prompt server and the other the model server), the prompt generator is configured on the prompt server (per ¶45, electronic device 101 can request a server 106 in the group of one or more servers to perform the task of generating the prompt / encoded utterance in ¶36), the GAILLM is configured on the smart mobile terminal (¶¶48-49 and Fig. 2, system 200 with machine learning models 205 can be used with electronic device 101 or any suitable devices (i.e., electronic device 102 / mobile phone)), and the steps of obtaining the first user speech, performing the semantic parsing on the first user speech and obtaining the parsing result, obtaining the at least one task execution command through the GAILLM based on the parsing result, and executing the at least one action corresponding to the at least one task execution command comprise:
obtaining, by the smart wearable device, the first user speech (¶¶35-36, electronic device 101 / processor 120 receives and processes audio input / input utterances from an input device like a microphone; ¶42, electronic device 101 with sensor 180 including one or more microphones), and sending the first user speech to the prompt server through a wireless network (¶36 and ¶41, processor 120 communicates with server via wireless communication interface 170; ¶45, electronic device 101 / smart glasses can request one of the one or more servers to perform a function);
converting, by the prompt server, the first user speech into a first text using a speech-to-text engine (¶49, processes 205 can be accessed by processor 120 to perform one or more automated speech recognition tasks; ¶58, processes 205 can be stored on server 106; per ¶45, electronic device 101 can request server 106 to execute processes 205 / automated speech recognition tasks), performing the semantic parsing on the first text, generating the at least one first prompt message based on the parsed semantics using the prompt generator (¶36, perform delexicalization on the utterance using a named entity database to create an encoded utterance including one or more masked entities; see ¶52 and Fig. 4, process input utterance to tag the named entity in the utterance; per ¶45, electronic device 101 can request server 106 to perform this task), and sending the at least one first prompt message to the smart wearable device through the wireless network (¶41, communication interface 170 sets up communication between electronic device 101 / smart glasses and external electronic device 102 / mobile phone and server 106; per ¶45, since electronic device 101 / smart glasses requested the one of the one or more servers to perform the function (i.e., speech to text), the one or more servers transmit the results of speech to text back to the electronic device 101 / smart glasses);
sending, by the smart wearable device, the at least one first prompt message to the smart mobile terminal through the wireless network (¶41, communication interface 170 sets up communication between electronic device 101 / smart glasses and external electronic device 102 / mobile phone and server 106; ¶45, electronic device 101 / smart glasses can request the electronic device 102 / mobile phone to perform a function); and
obtaining, by the smart mobile terminal, the at least one task execution command through the GAILLM based on the at least one first prompt message (see ¶52 and Fig. 4, process input utterance to tag the named entity in the utterance; per ¶45, electronic device 101 can request electronic device 102 to perform this task), and executing the at least one action corresponding to the at least one task execution command (¶56, electronic device 101 / processor 120 creates an action plan for executing one or more actions performable by electronic device 101 based on the intent of the utterance and corresponding slots; per ¶45, electronic device 101 can request electronic device 102 to perform this task where electronic device 102 is able to execute the requested functions and transfer a result of the execution to the electronic device 101).
Kanungo does not disclose sending the first prompt message to the smart mobile terminal through a Bluetooth.
Jung teaches a smart wearable device system (Fig. 1, AI device 10) comprising a smart wearable device (¶50, glass type AI device (a smart glass)) and a smart mobile terminal (¶296, AI device 10 supported like a mobile device) communicating with each other via Bluetooth (¶98), where a speech to text server is configured to convert first user speech into a first text using a speech to text engine (Fig. 1, STT server 20 and see ¶58).
Kanungo notes that all or some of the operations executed on the electronic device 101 / smart glasses can be executed on another electronic device or multiple other electronic devices, such as electronic device 102 / smart mobile terminal or server 106 (¶45).
When implementing a speech-to-text engine on the group of one or more servers constituting server 106 (Kanungo, ¶45) to convert the first user speech into the first text and to generate the first prompt message based on semantic parsing, and requesting electronic device 102 / smart mobile terminal configured with the GAILLM to generate the task execution command, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement a Bluetooth-type wireless network to facilitate transfer of the first prompt message from electronic device 101 / smart glasses to electronic device 102 / smart mobile terminal in order to transmit and receive data from external devices through wireless communication technologies (Jung, ¶97).
Regarding claim 29, Kanungo discloses wherein the smart wearable device system comprises the smart wearable device (¶15 and ¶43, electronic device 101 can be an augmented reality wearable device / eyeglasses), the smart mobile terminal (¶15, “electronic device” can be a mobile phone (i.e., electronic device 102 can be a mobile phone)) and a prompt server (¶45, server 106 includes a group of one or more servers such that one can be the prompt server and the other the model server), the prompt generator is configured on the prompt server (per ¶45, electronic device 101 can request a server 106 in the group of one or more servers to perform the task of generating the prompt / encoded utterance in ¶36), the GAILLM is configured on a model server (¶36, ¶46, ¶57, server 106 (i.e., the group of one or more servers where one can be the prompt server and the other model server) uses trained GPT model to process encoded utterance, perform delexicalization and named entity identification to determine intent and slots associated with each utterance), and the steps of obtaining the first user speech, performing the semantic parsing on the first user speech and obtaining the parsing result, obtaining the at least one task execution command through the GAILLM based on the parsing result, and executing the at least one action corresponding to the at least one task execution command comprise:
obtaining, by the smart wearable device, the first user speech (¶¶35-36, electronic device 101 / processor 120 receives and processes audio input / input utterances from an input device like a microphone; ¶42, electronic device 101 with sensor 180 including one or more microphones), and sending the first user speech to the prompt server through a wireless network (¶36 and ¶41, processor 120 communicates with server via wireless communication interface 170; ¶45, electronic device 101 / smart glasses can request one of the one or more servers to perform a function);
converting, by the prompt server, the first user speech into a first text using a speech-to-text engine (¶49, processes 205 can be accessed by processor 120 to perform one or more automated speech recognition tasks; ¶58, processes 205 can be stored on server 106; per ¶45, electronic device 101 can request server 106 to execute processes 205 / automated speech recognition tasks), performing the semantic parsing on the first text, generating at least one first prompt message based on the parsed semantics using the prompt generator (¶36, perform delexicalization on the utterance using a named entity database to create an encoded utterance including one or more masked entities; see ¶52 and Fig. 4, process input utterance to tag the named entity in the utterance; per ¶45, electronic device 101 can request server 106 to perform this task), and sending the at least one first prompt message to the smart wearable device through the wireless network (¶41, communication interface 170 sets up communication between electronic device 101 / smart glasses and external electronic device 102 / mobile phone and server 106; per ¶45, since electronic device 101 / smart glasses requested the one of the one or more servers to perform automated speech recognition, the server transmits the ASR results back to the electronic device 101 / smart glasses);
sending, by the smart wearable device, the at least one first prompt message to the smart mobile terminal through the wireless network (¶41, communication interface 170 sets up communication between electronic device 101 / smart glasses and external electronic device 102 / mobile phone and server 106; ¶45, electronic device 101 / smart glasses can request electronic device 102 / mobile phone to perform a function);
sending, by the smart mobile terminal, the at least one first prompt message to the model server (¶36, transmit encoded utterance to a server on which a language model is stored; per ¶45, electronic device 101 can request electronic device 102 and server 106 to perform this task; i.e., electronic device 101 can request electronic device 102 to transmit the encoded utterance to server 106), so that the model server obtains the at least one task execution command through the GAILLM based on the at least one first prompt message (¶38, ¶46, and ¶58, server 106 (the group of one or more servers) with language model processes the encoded utterances from client device (i.e., electronic device 102), and transmits corresponding intent, slots and instructions back to client device (i.e., electronic device 102)); and
receiving, by the smart mobile terminal, the at least one task execution command from the model server, and executing the at least one action corresponding to the at least one task execution command (¶38 and ¶58, server 106 transmits instructions back to client device (i.e., electronic device 102) to execute functions associated with instructions included in the utterances; see ¶56, electronic device 101 / processor 120 creates an action plan for executing one or more actions performable by electronic device 101 based on the intent of the utterance and corresponding slots; per ¶45, electronic device 101 can request electronic device 102 to perform this task).
Kanungo does not disclose sending the first prompt message to the smart mobile terminal through a Bluetooth.
Jung teaches a smart wearable device system (Fig. 1, AI device 10) comprising a smart wearable device (¶50, glass type AI device (a smart glass)) and a smart mobile terminal (¶296, AI device 10 supported like a mobile device) communicating with each other via Bluetooth (¶98), where a speech to text server is configured to convert first user speech into a first text using a speech to text engine (Fig. 1, STT server 20 and see ¶58).
Kanungo notes that all or some of the operations executed on the electronic device 101 / smart glasses can be executed on another electronic device or multiple other electronic devices, such as electronic device 102 / smart mobile terminal or server 106 (¶45).
When implementing a speech-to-text engine on the group of one or more servers constituting server 106 (Kanungo, ¶45) to convert the first user speech into the first text and to generate the first prompt message based on semantic parsing as requested by electronic device 101 / smart wearable device, and requesting server 106 (the group of one or more servers, one being the prompt server and the other being the model server configured with the GAILLM) to generate the task execution command, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement a Bluetooth-type wireless network to facilitate transfer of the first prompt message from electronic device 101 / smart glasses to electronic device 102 / smart mobile terminal in order to transmit and receive data from external devices through wireless communication technologies (Jung, ¶97).
Regarding claim 30, Kanungo discloses wherein the smart wearable device system comprises the smart wearable device (¶15 and ¶43, electronic device 101 can be an augmented reality wearable device / eyeglasses), the smart mobile terminal (¶15, “electronic device” can be a mobile phone (i.e., electronic device 102 can be a mobile phone)) and a prompt server (¶45, server 106 includes a group of one or more servers such that one can be the prompt server and the other the model server), the prompt generator is configured on the prompt server (per ¶45, electronic device 101 can request a server 106 in the group of one or more servers to perform the task of generating the prompt / encoded utterance in ¶36), the GAILLM is configured on a model server (¶36, ¶46, ¶57, server 106 (i.e., the group of one or more servers where one can be the prompt server and the other model server) uses trained GPT model to process encoded utterance, perform delexicalization and named entity identification to determine intent and slots associated with each utterance), and the steps of obtaining the first user speech, performing the semantic parsing on the first user speech and obtaining the parsing result, obtaining the at least one task execution command through the GAILLM based on the parsing result, and executing the at least one action corresponding to the at least one task execution command comprise:
obtaining, by the smart wearable device, the first user speech (¶¶35-36, electronic device 101 / processor 120 receives and processes audio input / input utterances from an input device like a microphone; ¶42, electronic device 101 with sensor 180 including one or more microphones), and sending the first user speech to the prompt server through a wireless network (¶36 and ¶41, processor 120 communicates with server via wireless communication interface 170; ¶45, electronic device 101 / smart glasses can request one of the one or more servers to perform a function);
converting, by the prompt server, the first user speech into a first text using a speech-to-text engine (¶49, processes 205 can be accessed by processor 120 to perform one or more automated speech recognition tasks; ¶58, processes 205 can be stored on server 106; per ¶45, electronic device 101 can request server 106 to execute processes 205 / automated speech recognition tasks), performing the semantic parsing on the first text, generating at least one first prompt message based on the parsed semantics using the prompt generator (¶36, perform delexicalization on the utterance using a named entity database to create an encoded utterance including one or more masked entities; see ¶52 and Fig. 4, process input utterance to tag the named entity in the utterance; per ¶45, electronic device 101 can request server 106 to perform this task), and sending the at least one first prompt message to the smart wearable device through the wireless network (¶41, communication interface 170 sets up communication between electronic device 101 / smart glasses and external electronic device 102 / mobile phone and server 106; per ¶45, since electronic device 101 / smart glasses requested the one of the one or more servers to perform automated speech recognition, the server transmits the ASR results back to the electronic device 101 / smart glasses);
sending, by the smart wearable device, the at least one first prompt message to the model server through the wireless network (¶41, communication interface 170 sets up communication between electronic device 101 / smart glasses and external electronic device 102 / mobile phone and server 106; ¶45, electronic device 101 / smart glasses can request server 106 to perform a function), so that the model server obtains the at least one task execution command through the GAILLM based on the at least one first prompt message (¶38, ¶46, and ¶58, server 106 (the group of one or more servers) with language model processes the encoded utterances from client device (i.e., electronic device 101 / smart glasses), and transmits corresponding intent, slots and instructions back to client device (i.e., electronic device 101 / smart glasses));
receiving, by the smart wearable device, the at least one task execution command from the model server through the wireless network (¶38, ¶46, and ¶58, server 106 (the group of one or more servers) with language model processes the encoded utterances from client device (i.e., electronic device 101 / smart glasses), and transmits corresponding intent, slots and instructions back to client device (i.e., electronic device 101 / smart glasses)), and sending the at least one task execution command to the smart mobile terminal through a wireless network (¶45, electronic device 101 / smart glasses can request electronic device 102 / mobile phone to perform a function); and
executing, by the smart mobile terminal, the at least one action corresponding to the at least one task execution command (¶56, electronic device 101 / processor 120 creates an action plan for executing one or more actions performable by electronic device 101 based on the intent of the utterance and corresponding slots; per ¶45, electronic device 101 can request electronic device 102 to perform this task).
Kanungo does not disclose the smart wearable device sending the at least one task execution command to the smart mobile terminal through Bluetooth.
Jung teaches a smart wearable device system (Fig. 1, AI device 10) comprising a smart wearable device (¶50, glass type AI device (a smart glass)) and a smart mobile terminal (¶296, AI device 10 supported like a mobile device) communicating with each other via Bluetooth (¶98), where a speech-to-text server is configured to convert first user speech into a first text using a speech-to-text engine (Fig. 1, STT server 20 and see ¶58).
Kanungo noted that all or some of the operations executed on the electronic device 101 / smart glasses can be executed on another or multiple other electronic devices such as electronic devices 102 / smart mobile terminal (¶45).
When electronic device 101 / smart wearable device requests electronic device 102 / mobile phone to perform or execute the action corresponding to the task execution command, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement a Bluetooth-type wireless network to facilitate transfer of the task execution command from electronic device 101 / smart glasses to electronic device 102 / smart mobile terminal in order to transmit and receive data from external devices through wireless communication technologies (Jung, ¶97).
Claims 5-6, 8-9, and 25-27 are rejected under 35 U.S.C. 103 as being unpatentable over Kanungo et al. (US 2024/0071376 A1) in view of Jung et al. (US 2024/0119930 A1) as applied to claims 3 and 19 above, and further in view of Pitschel et al. (US 9922642 B2).
Regarding claim 5, Kanungo discloses the smart mobile terminal is further configured to convert the text into a speech (¶57, processor 120 instructs audio output device 208 to output “calling Mom”; per ¶45, electronic device 101 / processor 120 (i.e., smart glasses) can request electronic device 102 (i.e., mobile phone) to perform this task), and send the speech to the smart glasses (¶57 and Fig. 2, request electronic device 102 (i.e., mobile phone) to instruct audio output device 208 (built into electronic device 101 / smart glasses) to output “calling Mom”);
the smart glasses are further configured to receive the speech through the wireless communication interface, play the speech through a built-in speaker (¶57, instruct audio output device 208 (built into electronic device 101 / smart glasses) to output “calling Mom”), obtain a second user speech through the microphone, and send the second user speech to the smart mobile terminal through the wireless communication interface (¶57, receive “start Dobby at 8 pm at 425 degrees” via audio input device 206 (per ¶36, a microphone input device) and request electronic device 102 (i.e., mobile phone) to process the utterance into encoded utterance per ¶45, which requires communicating the utterance to electronic device 102 via communication interface 170 per ¶¶43-44); and
the smart mobile terminal is further configured to convert the second user speech into a second text using the speech-to-text engine (¶45 and ¶49, processor 120 can request electronic device 102 to perform automated speech recognition tasks), perform a semantic parsing on the second text, generate, through the prompt generator, the second prompt message based on the parsed semantics in the second text (¶57, create an encoded utterance using delexicalization (see Fig. 4); ¶45, processor 120 can request the electronic device 102 to perform this function / create the encoded utterance or prompt), and send the second prompt message to the model server (¶36 and ¶58, request electronic device 102 (i.e., mobile phone) to transmit encoded utterance to server 106).
Kanungo does not disclose using a text-to-speech engine to convert the text into speech, send the speech to the smart glasses through the Bluetooth, and send the second user speech to the smart mobile terminal through the Bluetooth.
Jung teaches a smart wearable device system (Fig. 1, AI device 10) comprising a smart wearable device (¶50, glass type AI device (a smart glass)) and a smart mobile terminal (¶296, AI device 10 supported like a mobile device) communicating with each other via Bluetooth (¶98), where the smart mobile terminal converts text into speech through a text to speech engine (¶196 and Fig. 5, processor 180 of AI device 10 includes speech synthesis engine 550).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use a text-to-speech engine on a smart mobile terminal to convert text into speech, send the speech to the smart glasses through Bluetooth, and send the second user speech to the smart mobile terminal through the Bluetooth in order to transmit and receive data from external devices through wireless communication technologies (Jung, ¶97).
Kanungo does not disclose wherein the model server is further configured to determine, through the GAILLM, whether there is information needs to be supplemented or confirmed based on the at least one first prompt message.
Pitschel teaches a system (Fig. 3A and Col 7, Rows 15-26, digital assistant system 300 comprising a client portion / user device 104 and server portion server system 106/108) comprising an AI language model server and a smart terminal / user device (Fig. 1 and see Col 4, Rows 4-17 and Rows 38-48, client-server digital assistant comprising a server storing AI models (Col 1, Rows 40-41 and Col 4, Rows 24-25, Col 16, Rows 26-31, trained AI language model)) wherein the AI language model server is further configured to determine, through the AI language model, whether there is information needs to be supplemented or confirmed based on the at least one first prompt message (Col 9, Rows 49-56, speech to text processing module 330 uses language models to recognize speech input for conversion into sequence of words; Col 10, Rows 6-14, NLP module 332 associates words with actionable intents representing a task that can be performed by the digital assistant; Col 13, Rows 40-56, for “Make me a dinner reservation at a sushi place at 7”, identify actionable intent “restaurant reservation” and determine that the user utterance contains insufficient information to complete an associated structured query).
when there is no the information needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on the at least one first prompt message (Col 13, Row 64 – Col 14, Row 4 and Col 15, Rows 4-7, once task flow processor 336 has completed the structured query for an actionable intent, proceed to perform the ultimate task associated with the actionable intent);
when there is the information needs to be supplemented or confirmed, generate, through the AI language model, a text comprising prompt information of the information needs to be supplemented or confirmed (Col 14, Rows 1-21, initiate additional dialogue with the user to obtain additional information to complete the structured query), and send the text to the smart mobile terminal (Col 9, Rows 25-30 and 40-48, I/O processing module 328 interacts with the user through user device 104, such as by sending follow-up questions and receiving answers from the user; e.g., Col 14, Rows 25-33, actionable intent for restaurant reservation requires parameters “party size” and “date” and therefore generates questions “for how many people?” and “on which day?”);
receive a second prompt message from the smart mobile terminal (Col 14, Rows 33-37, once answers are received from the user, populate the structured query with the missing information), and determine, through the AI language model, whether there is the information needs to be supplemented or confirmed based on all received prompt messages (Col 15, Rows 54-60, iteratively obtain information to further clarify and refine user intent);
when there is no the information needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on all received prompt messages (Col 15, Rows 58-60, finally generate a response to fulfill the user’s intent); and
when there is the information needs to be supplemented or confirmed, return to the step of generating, through the AI language model, the text comprising the prompt information of the information needs to be supplemented or confirmed, and sending the text to the smart mobile terminal (Col 15, Rows 54-60, iteratively obtaining information to further clarify and refine the user intent using NLP 332, dialogue processor 334, and task flow processor 336; in view of Col 9, Rows 25-30 and 40-48, I/O processing module 328 interacts with the user through user device 104, such as by sending follow-up questions and receiving answers from the user).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to further configure the model server to determine, through the GAILLM (i.e., AI language model), whether there is information needs to be supplemented or confirmed based on the at least one first prompt message and second prompt message in order to clarify and refine user intent to finally generate a response to fulfill the user’s intent (Pitschel, Col 15, Rows 54-60) when there is a need to initiate dialogue with the user in order to obtain additional information and disambiguate potentially ambiguous utterances (Pitschel, Col 14, Rows 10-15).
Regarding claim 6, Kanungo discloses wherein the smart glasses system comprises the smart glasses (¶15 and ¶43, electronic device 101 can be an augmented reality wearable device / eyeglasses) and a prompt server (¶45, server 106 includes a group of one or more servers such that one can be the prompt server and the other the model server), the prompt generator is configured on the prompt server (per ¶45, electronic device 101 can request a server 106 in the group of one or more servers to perform the task of generating the prompt / encoded utterance in ¶36); and
wherein the smart glasses are configured to obtain the first user speech through a built-in microphone (¶42 and Fig. 1, electronic device 101 includes one or more sensors 180, which comprises one or more microphones), and send the first user speech to the prompt server (¶36, electronic device 101 / processor 120 creates encoded utterance and transmit the encoded utterance to a server on which a language model is stored);
the prompt server is configured to convert the first user speech into a first text (¶49, processes 205 can be accessed by processor 120 to perform one or more automated speech recognition tasks; ¶58, processes 205 can be stored on server 106; per ¶45, electronic device 101 can request server 106 to execute processes 205 / automated speech recognition tasks) and to perform the semantic parsing on the first text, generate, through the prompt generator, the at least one first prompt message based on the parsed semantics, and send the at least one first prompt message to the model server (¶36, ¶46, ¶57, server 106 (i.e., the group of one or more servers where one can be the prompt server and the other model server) uses trained GPT model to process encoded utterance, perform delexicalization and named entity identification to determine intent and slots associated with each utterance);
the prompt server is further configured to send one or more control instructions to at least one device in the IoT according to the at least one task execution command to control the at least one device to execute one or more actions specified by the at least one task execution command (¶¶57-58, e.g., server 106 receives audio input “call mom” from client device and transmits instructions back to the client device to execute functions associated with instructions included in “call mom” to cause a phone application to begin a communication session with “mom” contact stored on the electronic device 101; per ¶51, applications also include smart things application including Internet of Things devices), generate a corresponding text according to the prompt information for conversion to speech (¶57, instruction includes instructing audio output device 208 to output “calling Mom”), and the prompt server is further configured to send the speech to the smart glasses (¶¶57-58, server 106 transmits instructions comprising instruction to audio output device 208 to output “calling Mom” to processor 120 / electronic device 101 (i.e., smart glasses)), and the smart glasses are further configured to play the speech through a built-in speaker (per Fig. 2 and ¶50, audio output device 208 is a speaker built into electronic device 101 / smart glasses).
Kanungo does not disclose the control system further comprises a speech-to-text server and a text-to-speech server.
Jung teaches a smart wearable device system (Fig. 1, AI device 10) comprising a smart wearable device (¶50, glass type AI device (a smart glass)), a smart mobile terminal (¶296, AI device 10 supported like a mobile device) communicating with each other and with a prompt / model server (Fig. 1, NLP server 30), a speech to text server configured to convert first user speech into a first text for transmission to the prompt / model server (Fig. 1, STT server 20 and see ¶58), and a text to speech server configured to convert corresponding text into a speech and send the speech to the prompt / model server (¶84, ¶¶89-90, and Fig. 1, Speech Synthesis Server 40 generates synthetic voice corresponding to given text data in which a user intention is reflected based on analysis information from NLP server 30).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement a speech-to-text server and a text-to-speech server as part of the group of one or more servers (i.e., prompt server and model server) of Kanungo (¶45) in order to generate synthetic voice corresponding to given text data based on analysis information from the prompt / model server (Jung, ¶84; compare Kanungo, ¶¶57-58, server 106 transmits instruction and synthetic voice “calling Mom” to smart glasses / electronic device 101 to instruct audio output device 208 / built-in speaker to output “calling Mom”).
Kanungo does not disclose wherein the model server is further configured to determine, through the GAILLM, whether there is information needs to be supplemented or confirmed based on the at least one first prompt message.
Pitschel teaches a system (Fig. 3A and Col 7, Rows 15-26, digital assistant system 300 comprising a client portion / user device 104 and server portion server system 106/108) comprising a smart terminal / user device and an AI language model server comprising a model server and a prompt server (Fig. 1 and see Col 4, Rows 4-17 and Rows 38-48, client-server digital assistant comprising a server storing AI models (Col 1, Rows 40-41 and Col 4, Rows 24-25, Col 16, Rows 26-31, trained AI language model)) wherein the AI language model server / prompt server is further configured to determine, through the AI language model, whether there is information needs to be supplemented or confirmed based on the at least one first prompt message (Col 9, Rows 49-56, speech to text processing module 330 uses language models to recognize speech input for conversion into sequence of words; Col 10, Rows 6-14, NLP module 332 associates words with actionable intents representing a task that can be performed by the digital assistant; Col 13, Rows 40-56, for “Make me a dinner reservation at a sushi place at 7”, identify actionable intent “restaurant reservation” and determine that the user utterance contains insufficient information to complete an associated structured query);
when there is no the information needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on the at least one first prompt message and send the at least one task execution command to the prompt server (Col 13, Row 64 – Col 14, Row 4 and Col 15, Rows 4-7, once task flow processor 336 has completed the structured query for an actionable intent, proceed to perform the ultimate task associated with the actionable intent); and
when there is the information needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on the at least one first prompt message, and send the at least one task execution command (Col 13, Rows 48-51, generate a partial structured query including parameters Cuisine = “Sushi” and Time = 7 pm), and prompt information of the information needs to be supplemented or confirmed, to the prompt server (Col 14, Rows 25-37, invoke dialogue flow processor 334 to determine “party size” and “date” information for the structured query is missing, dialogue flow processor 334 generates “For how many people?” and “on which day?”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to further configure the model server and prompt server to determine, through the GAILLM (i.e., AI language model), whether there is information needs to be supplemented or confirmed based on the at least one first prompt message and second prompt message in order to clarify and refine user intent to finally generate a response to fulfill the user’s intent (Pitschel, Col 15, Rows 54-60) when there is a need to initiate dialogue with the user in order to obtain additional information and disambiguate potentially ambiguous utterances (Pitschel, Col 14, Rows 10-15).
Regarding claim 8, Kanungo discloses wherein the smart glasses system comprises smart glasses (¶15 and ¶43, electronic device 101 can be an augmented reality wearable device / eyeglasses) and a smart mobile terminal (¶15, “electronic device” can be a mobile phone (i.e., electronic device 102 can be a mobile phone)), the control system further comprises a prompt server (¶45, server 106), the prompt server is configured with a speech-to-text engine (¶58, process 205 can be stored on server 106; in view of ¶45 and ¶49, electronic device 101 can request a server in the server group of server 106 to access process 205 to perform automated speech recognition tasks) and the prompt generator (per ¶45, electronic device 101 can request a server 106 in the group of one or more servers to perform the task of generating the prompt / encoded utterance in ¶36), and the smart mobile terminal is configured with a text-to-speech engine (¶57, instruct output device 208 on electronic device 102 to output “calling Mom”; compare Jung, ¶196 and Fig. 5, processor 180 of AI device 10 includes speech synthesis engine 550); and
wherein the smart glasses are configured to obtain the first user speech by a built-in microphone (¶42 and Fig. 1, electronic device 101 includes one or more sensors 180, which comprises one or more microphones), and send the first user speech to the prompt server through a wireless network (¶36, electronic device 101 / processor 120 creates encoded utterance and transmit the encoded utterance to a server on which a language model is stored);
the prompt server is configured to convert the first user speech into a first text through the speech-to-text engine (¶49, processes 205 can be accessed by processor 120 to perform one or more automated speech recognition tasks; ¶58, processes 205 can be stored on server 106; per ¶45, electronic device 101 can request server 106 to execute processes 205 / automated speech recognition tasks; compare Jung, Fig. 1, STT server 20 and see ¶58), perform the semantic parsing on the first text, generate, through the prompt generator, the at least one first prompt message based on the parsed semantics (¶36, perform delexicalization on the utterance using a named entity database to create an encoded utterance including one or more masked entities; see ¶52 and Fig. 4, process input utterance to tag the named entity in the utterance; per ¶45, electronic device 101 can request server 106 to perform this task), and send the at least one first prompt message to the smart mobile terminal (¶36, transmit encoded utterance to a server on which a language model is stored; per ¶45, electronic device 101 can request electronic device 102 to perform this task, which requires server 106 to transmit the encoded utterance to electronic device 102);
the smart mobile terminal is configured to send the at least one first prompt message to the model server (¶36, transmit encoded utterance to a server on which a language model is stored; in view of ¶45, server 106 includes a group of one or more servers where one can be the prompt server and the other server can be the model server; per ¶45, electronic device 101 can request electronic device 102 to perform this task requires electronic device to send the encoded utterance to the other server / model server with the GPT language model);
the model server is configured to obtain the at least one task execution command through the GAILLM based on the at least one first prompt message from the smart mobile terminal, and send the at least one task execution command to the smart mobile terminal (¶38, ¶46, and ¶58, server 106 with language model processes the encoded utterances from client device (i.e., electronic device 102), and transmits corresponding intent, slots and instructions back to client device (i.e., electronic device 102)); and
the smart mobile terminal is further configured to execute the at least one action corresponding to the at least one task execution command (¶38 and ¶58, server 106 transmits instructions back to client device (i.e., electronic device 102) to execute functions associated with instructions included in the utterances; see ¶56, electronic device 101 / processor 120 creates an action plan for executing one or more actions performable by electronic device 101 based on the intent of the utterance and corresponding slots; per ¶45, electronic device 101 can request electronic device 102 to perform this task);
the smart mobile terminal is further configured to generate a corresponding text according to the prompt information, convert the corresponding text into a speech through the text-to-speech engine, and send the speech to the smart glasses (¶57, e.g., electronic device 102 can instruct electronic device 101 / smart glasses to output “calling Mom” via output device 208 using synthesized voice generated by a speech synthesis engine such as speech synthesis engine 550 of AI device 10 of Jung); and
the smart glasses are further configured to play the speech through a built-in speaker (¶57, electronic device 101 outputs “calling Mom” via audio output device 208).
Kanungo does not disclose wherein the model server is further configured to determine, through the GAILLM, whether there is information needs to be supplemented or confirmed based on the at least one first prompt message.
Pitschel teaches a system (Fig. 3A and Col 7, Rows 15-26, digital assistant system 300 comprising a client portion / user device 104 and server portion server system 106/108) comprising an AI language model server and a smart terminal / user device (Fig. 1 and see Col 4, Rows 4-17 and Rows 38-48, client-server digital assistant comprising a server storing AI models (Col 1, Rows 40-41 and Col 4, Rows 24-25, Col 16, Rows 26-31, trained AI language model)) wherein the AI language model server is further configured to determine, through the AI language model, whether there is information needs to be supplemented or confirmed based on the at least one first prompt message (Col 9, Rows 49-56, speech to text processing module 330 uses language models to recognize speech input for conversion into sequence of words; Col 10, Rows 6-14, NLP module 332 associates words with actionable intents representing a task that can be performed by the digital assistant; Col 13, Rows 40-56, for “Make me a dinner reservation at a sushi place at 7”, identify actionable intent “restaurant reservation” and determine that the user utterance contains insufficient information to complete an associated structured query);
when there is no the information needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on the at least one first prompt message (Col 13, Row 64 – Col 14, Row 4 and Col 15, Rows 4-7, once task flow processor 336 has completed the structured query for an actionable intent, proceed to perform the ultimate task associated with the actionable intent) and send the at least one task execution command to the smart mobile terminal (Col 18, Rows 23-40);
when there is the information needs to be supplemented or confirmed, generate, through the AI language model, a text comprising prompt information of the information needs to be supplemented or confirmed (Col 14, Rows 1-21, initiate additional dialogue with the user to obtain additional information to complete the structured query), and send the text to the smart mobile terminal (Col 9, Rows 25-30 and 40-48, I/O processing module 328 interacts with the user through user device 104, such as by sending follow-up questions and receiving answers from the user; e.g., Col 14, Rows 25-33, actionable intent for restaurant reservation requires parameters “party size” and “date” and therefore generates questions “for how many people?” and “on which day?”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to further configure the model server to determine, through the GAILLM (i.e., AI language model), whether there is information needs to be supplemented or confirmed based on the at least one first prompt message and second prompt message in order to clarify and refine user intent to finally generate a response to fulfill the user’s intent (Pitschel, Col 15, Rows 54-60) when there is a need to initiate dialogue with the user in order to obtain additional information and disambiguate potentially ambiguous utterances (Pitschel, Col 14, Rows 10-15).
Regarding claim 9, Kanungo discloses wherein the smart glasses system comprises the smart glasses (¶15 and ¶43, electronic device 101 can be an augmented reality wearable device / eyeglasses) and the smart mobile terminal (¶15, “electronic device” can be a mobile phone (i.e., electronic device 102 can be a mobile phone)), and the smart mobile terminal is configured with the GAILLM (¶¶48-49 and Fig. 2, system 200 with machine learning models 205 can be used with electronic device 101 or any suitable devices (i.e., electronic device 102 / mobile phone)); and
wherein the smart glasses are configured to obtain the first user speech through a built-in microphone (¶42 and Fig. 1, electronic device 101 includes one or more sensors 180, which comprises one or more microphones), and send the first user speech to the smart mobile terminal through a wireless network (¶36, electronic device 101 / processor 120 creates encoded utterance and transmit the encoded utterance to a server on which a language model is stored; in view of ¶45 and ¶48, electronic device 102 / mobile phone can implement system 200 with machine learning model and be requested to perform the function of the server);
the smart mobile terminal is configured to convert the first user speech into a first text through a speech-to-text engine (¶49, processes 205 can be accessed by processor 120 to perform one or more automated speech recognition tasks; per ¶45, electronic device 101 can request electronic device 102 / mobile phone to execute processes 205 / automated speech recognition tasks), perform the semantic parsing on the first text, obtain the at least one task execution command through the GAILLM based on the parsed semantics (¶36, perform delexicalization on the utterance using a named entity database to create an encoded utterance including one or more masked entities; see ¶52 and Fig. 4, process input utterance to tag the named entity in the utterance; per ¶45, electronic device 101 can request electronic device 102 to perform this task), and execute the at least one action corresponding to the at least one task execution command (¶56, electronic device 101 / processor 120 creates an action plan for executing one or more actions performable by electronic device 101 based on the intent of the utterance and corresponding slots; per ¶45, electronic device 101 can request electronic device 102 to perform this task where electronic device 102 is able to execute the requested functions and transfer a result of the execution to the electronic device 101);
the smart glasses are further configured to receive the speech through the wireless interface (¶45, electronic device 102 / mobile phone transfers the result of the execution to the electronic device 101 / smart glasses), play the speech through a built-in speaker (¶57, instruct audio output device 208 (built into electronic device 101 / smart glasses) to output “calling Mom”), obtain a second user speech through the built-in microphone, and send the second user speech to the smart mobile terminal through the wireless network (¶57, receive “start Dobby at 8 pm at 425 degrees” via audio input device 206 (per ¶36, a microphone input device) and request electronic device 102 (i.e., mobile phone) to process the utterance into encoded utterance per ¶45, which requires communicating the utterance to electronic device 102 via communication interface 170 per ¶¶43-44); and
the smart mobile terminal is further configured to convert the second user speech into a second text through the speech to text engine (¶45 and ¶49, processor 120 can request electronic device 102 to perform automated speech recognition tasks), perform a semantic parsing on the second text (¶57, create an encoded utterance using delexicalization (see Fig. 4)), and determine, through the GAILLM, the at least one task execution command based on all parsed semantics (¶57, determine that “Dobby” is user’s smart oven device, create an encoded utterance using delexicalization to determine an intent and slot tags for the utterance, and create an action plan to instruct the smart oven device to preheat to 425 degrees at 8 pm; per ¶45, electronic device 101 can request electronic device 102 to perform this task).
Kanungo does not disclose sending the first user speech to the smart mobile terminal through the Bluetooth, and sending the second user speech to the smart mobile terminal through the Bluetooth.
Jung teaches a smart wearable device system (Fig. 1, AI device 10) comprising a smart wearable device (¶50, glass type AI device (a smart glass)) and a smart mobile terminal (¶296, AI device 10 supported like a mobile device) communicating with each other via Bluetooth (¶98), where the smart mobile terminal converts text into speech through a text-to-speech engine (¶196 and Fig. 5, processor 180 of AI device 10 includes speech synthesis engine 550).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to send the first user speech to the smart mobile terminal through the Bluetooth, and to send the second user speech to the smart mobile terminal through the Bluetooth, in order to transmit and receive data from external devices through wireless communication technologies (Jung, ¶97).
Kanungo does not disclose determining, through the GAILLM, whether there is information that needs to be supplemented or confirmed based on the at least one first prompt message.
Pitschel teaches a system (Fig. 3A and Col 7, Rows 15-26, digital assistant system 300 comprising a client portion / user device 104 and a server portion / server system 106/108) comprising a smart terminal / user device (Fig. 1 and see Col 4, Rows 4-17 and Rows 38-48, client-server digital assistant comprising a server storing AI models (Col 1, Rows 40-41 and Col 4, Rows 24-25, Col 16, Rows 26-31, trained AI language model)) configured with an AI language model (Col 5, Rows 1-6, an implementation installed on a user device means the digital assistant server with the trained AI language model can be implemented on the user device), wherein the smart terminal is further configured to determine, through the AI language model, whether there is information that needs to be supplemented or confirmed based on the at least one first prompt message (Col 9, Rows 49-56, speech-to-text processing module 330 uses language models to recognize speech input for conversion into a sequence of words; Col 10, Rows 6-14, NLP module 332 associates words with actionable intents representing a task that can be performed by the digital assistant; Col 13, Rows 40-56, for “Make me a dinner reservation at a sushi place at 7”, identify the actionable intent “restaurant reservation” and determine that the user utterance contains insufficient information to complete an associated structured query);
when there is no information that needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on the at least one first prompt message (Col 13, Row 64 – Col 14, Row 4 and Col 15, Rows 4-7, once task flow processor 336 has completed the structured query for an actionable intent, proceed to perform the ultimate task associated with the actionable intent);
when there is information that needs to be supplemented or confirmed, generate, through the AI language model, a text comprising prompt information of the information that needs to be supplemented or confirmed (Col 14, Rows 1-21, initiate additional dialogue with the user to obtain additional information to complete the structured query; Col 9, Rows 25-30 and 40-48, I/O processing module 328 interacts with the user via user device 104, such as by sending follow-up questions and receiving answers from the user; e.g., Col 14, Rows 25-33, the actionable intent for a restaurant reservation requires the parameters “party size” and “date”, and therefore the system generates the questions “for how many people?” and “on which day?”);
receive a second user speech from the smart mobile terminal (Col 14, Rows 33-37, once answers are received from the user, populate the structured query with the missing information), and determine, through the AI language model, whether there is information that needs to be supplemented or confirmed based on all received prompt messages (Col 15, Rows 54-60, iteratively obtain information to further clarify and refine the user intent);
when there is no information that needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on all received prompt messages (Col 15, Rows 58-60, finally generate a response to fulfill the user’s intent); and
when there is information that needs to be supplemented or confirmed, return to the step of generating, through the AI language model, the text comprising the prompt information of the information that needs to be supplemented or confirmed (Col 15, Rows 54-60, iteratively obtaining information to further clarify and refine the user intent using NLP 332, dialogue processor 334, and task flow processor 336).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further configure the model server to determine, through the GAILLM (i.e., the AI language model), whether there is information that needs to be supplemented or confirmed based on at least the first user speech and the second user speech (which are user responses to questions on information that needs to be supplemented or confirmed), and to use the text-to-speech engine of Jung to output question text regarding the information that needs to be supplemented or confirmed (i.e., using the speech synthesis engine to generate “for how many people?” and “on which day?”), in order to clarify and refine the user intent to finally generate a response that fulfills the user’s intent (Pitschel, Col 15, Rows 54-60) when there is a need to initiate dialogue with the user in order to obtain additional information and disambiguate potentially ambiguous utterances (Pitschel, Col 14, Rows 10-15).
Regarding claim 25, Kanungo discloses wherein the method further comprises: receiving, by the smart mobile terminal, the text from the model server (¶58, server 106 transmits instructions back to the client device to execute functions associated with instructions included in the utterances), converting the text into a speech (¶57, processor 120 instructs audio output device 208 to output “calling Mom”; per ¶45, electronic device 101 / processor 120 (i.e., smart glasses) can request electronic device 102 (i.e., mobile phone) to perform this task), and sending the speech to the smart wearable device (¶57 and Fig. 2, request electronic device 102 (i.e., mobile phone) to instruct audio output device 208 (built into electronic device 101 / smart glasses) to output “calling Mom”);
playing, by the smart wearable device, the speech (¶57, instruct audio output device 208 (built into electronic device 101 / smart glasses) to output “calling Mom”), obtaining a second user speech, and sending the second user speech to the smart mobile terminal through the Bluetooth (¶57, receive “start Dobby at 8 pm at 425 degrees” via audio input device 206 (per ¶36, a microphone input device) and request electronic device 102 (i.e., mobile phone) to process the utterance into encoded utterance per ¶45, which requires communicating the utterance to electronic device 102 via communication interface 170 per ¶¶43-44); and
converting, by the smart mobile terminal, the second user speech into a second text using the speech-to-text engine (¶45 and ¶49, processor 120 can request electronic device 102 to perform automated speech recognition tasks), performing a semantic parsing on the second text, generating a second prompt message based on the parsed semantics in the second text by using the prompt generator (¶57, create an encoded utterance using delexicalization (see Fig. 4) by requesting electronic device 102 / mobile phone to perform this function per ¶45), and sending the second prompt message to the model server (¶36 and ¶58, request electronic device 102 (i.e., mobile phone) to transmit encoded utterance to server 106).
Kanungo does not disclose using a text-to-speech engine to convert the text into speech, sending the speech to the smart glasses through the Bluetooth, and sending the second user speech to the smart mobile terminal through the Bluetooth.
Jung teaches a smart wearable device system (Fig. 1, AI device 10) comprising a smart wearable device (¶50, glass type AI device (a smart glass)) and a smart mobile terminal (¶296, AI device 10 supported like a mobile device) communicating with each other via Bluetooth (¶98), where the smart mobile terminal converts text into speech through a text-to-speech engine (¶196 and Fig. 5, processor 180 of AI device 10 includes speech synthesis engine 550).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a text-to-speech engine on a smart mobile terminal to convert text into speech, send the speech to the smart glasses through Bluetooth, and send the second user speech to the smart mobile terminal through the Bluetooth, in order to transmit and receive data from external devices through wireless communication technologies (Jung, ¶97).
Kanungo does not disclose wherein the model server is further configured to determine, through the GAILLM, whether there is information that needs to be supplemented or confirmed based on the at least one first prompt message.
Pitschel teaches a system (Fig. 3A and Col 7, Rows 15-26, digital assistant system 300 comprising a client portion / user device 104 and a server portion / server system 106/108) comprising an AI language model server and a smart terminal / user device (Fig. 1 and see Col 4, Rows 4-17 and Rows 38-48, client-server digital assistant comprising a server storing AI models (Col 1, Rows 40-41 and Col 4, Rows 24-25, Col 16, Rows 26-31, trained AI language model)), wherein the AI language model server is further configured to determine, through the AI language model, whether there is information that needs to be supplemented or confirmed based on the at least one first prompt message (Col 9, Rows 49-56, speech-to-text processing module 330 uses language models to recognize speech input for conversion into a sequence of words; Col 10, Rows 6-14, NLP module 332 associates words with actionable intents representing a task that can be performed by the digital assistant; Col 13, Rows 40-56, for “Make me a dinner reservation at a sushi place at 7”, identify the actionable intent “restaurant reservation” and determine that the user utterance contains insufficient information to complete an associated structured query);
when there is no information that needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on the at least one first prompt message (Col 13, Row 64 – Col 14, Row 4 and Col 15, Rows 4-7, once task flow processor 336 has completed the structured query for an actionable intent, proceed to perform the ultimate task associated with the actionable intent);
when there is information that needs to be supplemented or confirmed, generate, through the AI language model, a text comprising prompt information of the information that needs to be supplemented or confirmed (Col 14, Rows 1-21, initiate additional dialogue with the user to obtain additional information to complete the structured query; Col 9, Rows 25-30 and 40-48, I/O processing module 328 interacts with the user via user device 104, such as by sending follow-up questions and receiving answers from the user; e.g., Col 14, Rows 25-33, the actionable intent for a restaurant reservation requires the parameters “party size” and “date”, and therefore the system generates the questions “for how many people?” and “on which day?”);
receive a second user speech from the smart mobile terminal (Col 14, Rows 33-37, once answers are received from the user, populate the structured query with the missing information), and determine, through the AI language model, whether there is information that needs to be supplemented or confirmed based on all received prompt messages (Col 15, Rows 54-60, iteratively obtain information to further clarify and refine the user intent);
when there is no information that needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on all received prompt messages (Col 15, Rows 58-60, finally generate a response to fulfill the user’s intent); and
when there is information that needs to be supplemented or confirmed, return to the step of generating, through the AI language model, the text comprising the prompt information of the information that needs to be supplemented or confirmed (Col 15, Rows 54-60, iteratively obtaining information to further clarify and refine the user intent using NLP 332, dialogue processor 334, and task flow processor 336).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further configure the model server to determine, through the GAILLM (i.e., the AI language model), whether there is information that needs to be supplemented or confirmed based on at least the first user speech and the second user speech (which are user responses to questions on information that needs to be supplemented or confirmed), and to use the text-to-speech engine of Jung to output question text regarding the information that needs to be supplemented or confirmed (i.e., using the speech synthesis engine to generate “for how many people?” and “on which day?”) to the smart mobile terminal, which in turn outputs the question text to the smart wearable device, in order to clarify and refine the user intent to finally generate a response that fulfills the user’s intent (Pitschel, Col 15, Rows 54-60) when there is a need to initiate dialogue with the user in order to obtain additional information and disambiguate potentially ambiguous utterances (Pitschel, Col 14, Rows 10-15).
Regarding claim 26, Kanungo discloses wherein the steps of performing the semantic parsing on the first user speech and obtaining the parsing result, obtaining the at least one task execution command through the GAILLM based on the parsing result, and executing the at least one action corresponding to the at least one task execution command comprise:
performing, by the smart mobile terminal, the semantic parsing on the first user speech and obtaining a first parsing result (¶¶48-49 and Fig. 2, system 200 with machine learning models 205 can be used with electronic device 101 or any suitable devices (i.e., electronic device 102 / mobile phone); ¶36, perform delexicalization on the utterance using a named entity database to create an encoded utterance including one or more masked entities; see ¶52 and Fig. 4, process input utterance to tag the named entity in the utterance; per ¶45, electronic device 101 can request electronic device 102 to perform this task);
converting the text into a speech (¶57, e.g., electronic device 102 can instruct electronic device 101 / smart glasses to output “calling Mom” via output device 208), and sending the speech to the smart wearable device through the wireless interface;
playing, by the smart wearable device, the speech (¶57, electronic device 101 outputs “calling Mom” via audio output device 208), obtaining a second user speech, and sending the second user speech to the smart mobile terminal through the wireless interface (¶57, receive “start Dobby at 8 pm at 425 degrees” via audio input device 206 (per ¶36, a microphone input device) and request electronic device 102 (i.e., mobile phone) to process the utterance into encoded utterance per ¶45, which requires communicating the utterance to electronic device 102 via communication interface 170 per ¶¶43-44);
converting, by the smart mobile terminal, the second user speech into a second text using a speech-to-text engine (¶45 and ¶49, processor 120 can request electronic device 102 to perform automated speech recognition tasks), performing a semantic parsing on the second user speech and obtaining a second parsing result (¶57, determine that “Dobby” is user’s smart oven device, create an encoded utterance using delexicalization to determine an intent and slot tags for the utterance, and create an action plan to instruct the smart oven device to preheat to 425 degrees at 8 pm; per ¶45, electronic device 101 can request electronic device 102 to perform this task).
Kanungo does not disclose sending the speech to the smart mobile terminal through the Bluetooth, and sending the second user speech to the smart mobile terminal through the Bluetooth.
Jung teaches a smart wearable device system (Fig. 1, AI device 10) comprising a smart wearable device (¶50, glass type AI device (a smart glass)) and a smart mobile terminal (¶296, AI device 10 supported like a mobile device) communicating with each other via Bluetooth (¶98), where the smart mobile terminal converts text into speech through a text-to-speech engine (¶196 and Fig. 5, processor 180 of AI device 10 includes speech synthesis engine 550).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to send the speech to the smart mobile terminal through the Bluetooth, and to send the second user speech to the smart mobile terminal through the Bluetooth, in order to transmit and receive data from external devices through wireless communication technologies (Jung, ¶97).
Kanungo does not disclose determining, through the GAILLM, whether there is information that needs to be supplemented or confirmed based on the at least one first prompt message.
Pitschel teaches a system (Fig. 3A and Col 7, Rows 15-26, digital assistant system 300 comprising a client portion / user device 104 and a server portion / server system 106/108) comprising an AI language model implemented on a smart terminal / user device (Fig. 1 and see Col 4, Rows 4-17 and Rows 38-48, client-server digital assistant comprising a server storing AI models (Col 1, Rows 40-41 and Col 4, Rows 24-25, Col 16, Rows 26-31, trained AI language model); Col 5, Rows 1-6, implement the digital assistant as a standalone application on a user device; per Col 4, Rows 38-42, user device 104 can be a smart phone) configured to determine, through the AI language model, whether there is information that needs to be supplemented or confirmed based on at least the parsing result of the first prompt message (Col 9, Rows 49-56, speech-to-text processing module 330 uses language models to recognize speech input for conversion into a sequence of words; Col 10, Rows 6-14, NLP module 332 associates words with actionable intents representing a task that can be performed by the digital assistant; Col 13, Rows 40-56, for “Make me a dinner reservation at a sushi place at 7”, identify the actionable intent “restaurant reservation” and determine that the user utterance contains insufficient information to complete an associated structured query);
when there is no information that needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on at least the parsing result of the first prompt message (Col 13, Row 64 – Col 14, Row 4 and Col 15, Rows 4-7, once task flow processor 336 has completed the structured query for an actionable intent, proceed to perform the ultimate task associated with the actionable intent);
when there is information that needs to be supplemented or confirmed, generate, through the AI language model, a text comprising prompt information of the information that needs to be supplemented or confirmed (Col 14, Rows 1-21, initiate additional dialogue with the user to obtain additional information to complete the structured query; Col 9, Rows 25-30 and 40-48, I/O processing module 328 interacts with the user via user device 104, such as by sending follow-up questions and receiving answers from the user; e.g., Col 14, Rows 25-33, the actionable intent for a restaurant reservation requires the parameters “party size” and “date”, and therefore the system generates the questions “for how many people?” and “on which day?”);
receive a second user speech from the smart mobile terminal (Col 14, Rows 33-37, once answers are received from the user, populate the structured query with the missing information), and determine, through the AI language model, whether there is information that needs to be supplemented or confirmed based on the first parsing result and a second parsing result based on semantic parsing of the second user speech (Col 15, Rows 54-60, iteratively obtain information to further clarify and refine the user intent);
when there is no information that needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on the first parsing result and the second parsing result (Col 15, Rows 58-60, finally generate a response to fulfill the user’s intent); and
when there is information that needs to be supplemented or confirmed, return to the step of generating, through the AI language model, the text comprising the prompt information of the information that needs to be supplemented or confirmed (Col 15, Rows 54-60, iteratively obtaining information to further clarify and refine the user intent using NLP 332, dialogue processor 334, and task flow processor 336).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further configure the smart mobile terminal to determine, through the GAILLM (i.e., the AI language model), whether there is information that needs to be supplemented or confirmed based on at least the first user speech and the second user speech (which are user responses to questions on information that needs to be supplemented or confirmed), and to use the text-to-speech engine of Jung to output question text regarding the information that needs to be supplemented or confirmed (i.e., using the speech synthesis engine to generate “for how many people?” and “on which day?”), in order to clarify and refine the user intent to finally generate a response that fulfills the user’s intent (Pitschel, Col 15, Rows 54-60) when there is a need to initiate dialogue with the user in order to obtain additional information and disambiguate potentially ambiguous utterances (Pitschel, Col 14, Rows 10-15).
Regarding claim 27, Kanungo discloses wherein the smart wearable device system comprises a smart wearable device (¶15 and ¶43, electronic device 101 can be an augmented reality wearable device / eyeglasses) and a prompt server (¶45, server 106 includes a group of one or more servers such that one can be the prompt server and the other the model server), the prompt generator is configured on the prompt server (per ¶45, electronic device 101 can request a server 106 in the group of one or more servers to perform the task of generating the prompt / encoded utterance in ¶36), and the steps of obtaining the first user speech, performing the semantic parsing on the first user speech and obtaining the parsing result, obtaining the at least one task execution command through the GAILLM based on the parsing result, and executing the at least one action corresponding to the at least one task execution command comprise:
obtaining, by the smart wearable device, the first user speech through a built-in microphone of the smart wearable device (¶¶35-36, electronic device 101 / processor 120 receives and processes audio input / input utterances from an input device like a microphone; ¶42, electronic device 101 with sensor 180 including one or more microphones), and sending the first user speech to the prompt server (¶36, electronic device 101 / processor 120 creates an encoded utterance and transmits the encoded utterance to a server on which a language model is stored);
converting, by the prompt server, the first user speech into a first text through the speech-to-text server (¶49, processes 205 can be accessed by processor 120 to perform one or more automated speech recognition tasks; ¶58, processes 205 can be stored on server 106; per ¶45, electronic device 101 can request server 106 to execute processes 205 / automated speech recognition tasks);
performing, by the prompt server, the semantic parsing on the first text, generating the at least one first prompt message based on the parsed semantics using the prompt generator, and sending the at least one first prompt message to the model server (¶36, ¶46, ¶57, server 106 (i.e., the group of one or more servers where one can be the prompt server and the other the model server), where the smart glasses / electronic device 101 request a first server to perform the function of creating the encoded utterance and request another server to use the trained GPT model to process the encoded utterance, perform delexicalization and named entity identification to determine the intent and slots associated with each utterance);
sending, by the prompt server, one or more control instructions to at least one device in the IoT according to the at least one task execution command to control the at least one device to execute one or more actions specified by the at least one task execution command (¶¶57-58, e.g., server 106 receives audio input “call mom” from the client device and transmits instructions back to the client device to execute functions associated with instructions included in “call mom” to cause a phone application to begin a communication session with the “mom” contact stored on the electronic device 101; per ¶51, applications also include a smart things application including Internet of Things devices), generating a corresponding text according to the prompt information (¶57, the instruction includes instructing audio output device 208 to output “calling Mom”), and
the prompt server is further configured to send the speech to the smart wearable device (¶¶57-58, server 106 transmits instructions, comprising an instruction to audio output device 208 to output “calling Mom”, to processor 120 / electronic device 101 (i.e., smart glasses)), and the smart wearable device is further configured to play the speech through a built-in speaker (per Fig. 2 and ¶50, audio output device 208 is a speaker built into electronic device 101 / smart glasses).
Kanungo does not disclose that the control system further comprises a speech-to-text server and a text-to-speech server.
Jung teaches a smart wearable device system (Fig. 1, AI device 10) comprising a smart wearable device (¶50, glass type AI device (a smart glass)) and a smart mobile terminal (¶296, AI device 10 supported like a mobile device) communicating with each other and with a prompt / model server (Fig. 1, NLP server 30), a speech-to-text server configured to convert the first user speech into a first text for transmission to the prompt / model server (Fig. 1, STT server 20 and see ¶58), and a text-to-speech server configured to convert a corresponding text into a speech and send the speech to the prompt / model server (¶84, ¶¶89-90, and Fig. 1, Speech Synthesis Server 40 generates a synthetic voice corresponding to given text data in which a user intention is reflected based on analysis information from NLP server 30).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement a speech-to-text server, to convert the first user speech transmitted from the prompt server into a first text, and a text-to-speech server, as part of the group of one or more servers (i.e., the prompt server and the model server) of Kanungo (¶45), in order to generate a synthetic voice corresponding to given text data based on analysis information from the prompt / model server (Jung, ¶84; compare Kanungo, ¶¶57-58, server 106 transmits an instruction and the synthetic voice “calling Mom” to the smart glasses / electronic device 101 to instruct audio output device 208 / built-in speaker to output “calling Mom”).
Kanungo does not disclose wherein the model server is further configured to determine, through the GAILLM, whether there is information that needs to be supplemented or confirmed based on the at least one first prompt message.
Pitschel teaches a system (Fig. 3A and Col 7, Rows 15-26, digital assistant system 300 comprising a client portion / user device 104 and a server portion / server system 106/108) comprising a smart terminal / user device and an AI language model server comprising a model server and a prompt server (Fig. 1 and see Col 4, Rows 4-17 and Rows 38-48, client-server digital assistant comprising a server storing AI models (Col 1, Rows 40-41 and Col 4, Rows 24-25, Col 16, Rows 26-31, trained AI language model)), wherein the AI language model / prompt server is further configured to determine, through the AI language model, whether there is information that needs to be supplemented or confirmed based on the at least one first prompt message (Col 9, Rows 49-56, speech-to-text processing module 330 uses language models to recognize speech input for conversion into a sequence of words; Col 10, Rows 6-14, NLP module 332 associates words with actionable intents representing a task that can be performed by the digital assistant; Col 13, Rows 40-56, for “Make me a dinner reservation at a sushi place at 7”, identify the actionable intent “restaurant reservation” and determine that the user utterance contains insufficient information to complete an associated structured query);
when there is no information that needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on the at least one first prompt message and send the at least one task execution command to the prompt server (Col 13, Row 64 – Col 14, Row 4 and Col 15, Rows 4-7, once task flow processor 336 has completed the structured query for an actionable intent, proceed to perform the ultimate task associated with the actionable intent); and
when there is information that needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on the at least one first prompt message, and send the at least one task execution command (Col 13, Rows 48-51, generate a partial structured query including the parameters Cuisine = “Sushi” and Time = 7 pm), and prompt information of the information that needs to be supplemented or confirmed, to the prompt server (Col 14, Rows 25-37, invoke dialogue flow processor 334 to determine that the “party size” and “date” information for the structured query is missing, and dialogue flow processor 334 generates “For how many people?” and “on which day?”);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further configure the model server and the prompt server to determine, through the GAILLM (i.e., the AI language model), whether there is information that needs to be supplemented or confirmed based on the at least one first prompt message, and to send at least one task execution command through the GAILLM / AI language model, together with prompt information of the information that needs to be confirmed or supplemented, to the prompt server, in order to clarify and refine the user intent to finally generate a response that fulfills the user’s intent (Pitschel, Col 15, Rows 54-60) when there is a need to initiate dialogue with the user in order to obtain additional information and disambiguate potentially ambiguous utterances (Pitschel, Col 14, Rows 10-15).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Kanungo et al. (US 2024/0071376 A1) in view of Pitschel et al. (US 9922642 B2).
Regarding claim 7, Kanungo discloses wherein the smart glasses system comprises the smart glasses (¶15 and ¶43, electronic device 101 can be an augmented reality wearable device / eyeglasses), the smart mobile terminal (¶15, “electronic device” can be a mobile phone (i.e., electronic device 102 can be a mobile phone)) and a prompt server (¶45, server 106 includes a group of one or more servers such that one can be the prompt server and the other the model server); and
wherein the smart glasses are configured to obtain the first user speech by a built-in microphone (¶¶35-36, electronic device 101 / processor 120 receives and processes audio input / input utterances from an input device like a microphone; ¶42, electronic device 101 with sensor 180 including one or more microphones), and send the first user speech to the smart mobile terminal (¶36 and ¶41, processor 120 communicates with electronic device 102 via wireless communication interface 170);
the smart mobile terminal is configured to convert the first user speech into a first text (¶49, electronic device 101 / processor 120 performs automated speech recognition tasks on utterances received; per ¶45, electronic device 101 can request electronic device 102 to perform this task), and send the first text to the prompt server (¶36 and ¶41, processor 120 communicates with electronic device 102 via wireless communication interface 170; per ¶38, ¶46, and ¶58, server 106 with language model processes the encoded utterances from client device (i.e., device 102));
the prompt server is configured to perform the semantic parsing on the first text, generate, through the prompt generator, the at least one first prompt message based on the parsed semantics, and send the at least one first prompt message to the model server (¶36, ¶46, ¶57, server 106 (i.e., the group of one or more servers where one can be the prompt server and the other the model server) uses a trained GPT model to process the encoded utterance, performing delexicalization and named entity identification to determine the intent and slots associated with each utterance);
the prompt server is further configured to send one or more control instructions to at least one device in an IoT according to the at least one task execution command to control the at least one device to execute one or more actions specified by the at least one task execution command (¶¶57-58, e.g., server 106 receives audio input “call mom” from the client device and transmits instructions back to the client device to execute functions associated with instructions included in “call mom” to cause a phone application to begin a communication session with a “mom” contact stored on the electronic device 101; per ¶51, the applications also include a smart things application including Internet of Things devices), generate a corresponding text according to the prompt information (¶57, the instruction includes instructing audio output device 208 to output “calling Mom”), and send the corresponding text to the smart mobile terminal (¶38, ¶46, and ¶58, server 106 with the language model processes the encoded utterances from the client device (i.e., device 102), and transmits the corresponding intent, slots, and instructions back to the client device (i.e., device 102));
the smart mobile terminal is further configured to convert the corresponding text into a speech (¶57, for “call mom”, create an action plan to instruct audio output device 208 to output “calling Mom” and to cause a phone application to begin a communication session with a “mom” contact stored on electronic device 101; ¶45, electronic device 101 can request electronic device 102 / mobile phone to perform this task / action planning), and send the speech to the smart glasses (¶57 and Fig. 2, request electronic device 102 (i.e., mobile phone) to instruct audio output device 208 (built into electronic device 101 / smart glasses) to output “calling Mom”); and
the smart glasses are further configured to play the speech through a built-in speaker (¶57, instruct audio output device 208 (built into electronic device 101 / smart glasses) to output “calling Mom”).
Kanungo does not disclose wherein the model server is further configured to determine, through the GAILLM, whether there is information that needs to be supplemented or confirmed based on the at least one first prompt message.
Pitschel teaches a system (Fig. 3A and Col 7, Rows 15-26, digital assistant system 300 comprising a client portion / user device 104 and a server portion / server system 106/108) comprising a smart terminal device / user device and an AI language model server comprising a model server and a prompt server (Fig. 1; see Col 4, Rows 4-17 and Rows 38-48, client-server digital assistant comprising a server storing AI models (Col 1, Rows 40-41; Col 4, Rows 24-25; Col 16, Rows 26-31, trained AI language model)), wherein the AI language model / prompt server is further configured to determine, through the AI language model, whether there is information that needs to be supplemented or confirmed based on the at least one first prompt message (Col 9, Rows 49-56, speech-to-text processing module 330 uses language models to recognize speech input for conversion into a sequence of words; Col 10, Rows 6-14, NLP module 332 associates words with actionable intents representing a task that can be performed by the digital assistant; Col 13, Rows 40-56, for “Make me a dinner reservation at a sushi place at 7”, identify the actionable intent “restaurant reservation” and determine that the user utterance contains insufficient information to complete an associated structured query);
when there is no information that needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on the at least one first prompt message and send the at least one task execution command to the prompt server (Col 13, Row 64 – Col 14, Row 4 and Col 15, Rows 4-7, once task flow processor 336 has completed the structured query for an actionable intent, it proceeds to perform the ultimate task associated with the actionable intent); and
when there is information that needs to be supplemented or confirmed, obtain, through the AI language model, the at least one task execution command based on the at least one first prompt message, and send the at least one task execution command (Col 13, Rows 48-51, generate a partial structured query including parameters Cuisine = “Sushi” and Time = 7 pm), and prompt information of the information that needs to be supplemented or confirmed, to the prompt server (Col 14, Rows 25-37, invoke dialogue flow processor 334 to determine that the “party size” and “date” information for the structured query is missing, and dialogue flow processor 334 generates “For how many people?” and “on which day?”);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to further configure the model server and prompt server to determine, through the GAILLM (i.e., AI language model), whether there is information that needs to be supplemented or confirmed based on the at least one first prompt message and the second prompt message, in order to clarify and refine user intent and ultimately generate a response to fulfill the user’s intent (Pitschel, Col 15, Rows 54-60) when there is a need to initiate dialogue with the user in order to obtain additional information and disambiguate potentially ambiguous utterances (Pitschel, Col 14, Rows 10-15).
Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner Richard Z. Zhu whose telephone number is 571-270-1587 or examiner’s supervisor Hai Phan whose telephone number is 571-272-6338. Examiner Richard Zhu can normally be reached on M-Th, 0730-1700.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RICHARD Z ZHU/Primary Examiner, Art Unit 2654 02/11/2026