DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed with the present application.
Should applicant desire to obtain the benefit of foreign priority under 35 U.S.C. 119(a)-(d) prior to declaration of an interference, a certified English translation of the foreign application must be submitted in reply to this action. See 37 CFR 41.154(b) and 41.202(e).
Failure to provide a certified translation may result in no benefit being accorded for the non-English application.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on June 12 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
The abstract of the disclosure is objected to because it exceeds 150 words. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).
Claim Objections
Claims 1, 13 and 19 are objected to because of the following informalities: the claims recite "analysis of the ." Appropriate correction is required.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-2, 4-5, 13-14 and 17-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by U.S. Patent Application Publication 2021/0097776 to Faulkner et al. (hereinafter, "Faulkner").
Regarding claims 1, 13 and 19, Faulkner teaches a system, method and computer-readable media comprising communication circuitry (paragraph [0054], "In some embodiments, the controller 110 is communicatively coupled with the display generation component 120 (e.g., an HMD, a display, a projector, a touch-screen, etc.) via one or more wired or wireless communication channels 144 (e.g., BLUETOOTH, IEEE 802.11x, IEEE 802.16x, IEEE 802.3x, etc.).");
memory storing one or more computer programs (paragraph [0006], "In some embodiments, the computer system has a graphical user interface (GUI), one or more processors, memory and one or more modules, programs or sets of instructions stored in the memory for performing multiple functions."); and
one or more processors communicatively coupled to the communication circuitry and the memory (paragraph [0006], "In some embodiments, the computer system has a graphical user interface (GUI), one or more processors, memory and one or more modules, programs or sets of instructions stored in the memory for performing multiple functions."),
wherein the one or more computer programs include computer-executable instructions that, when executed by the one or more processors individually or collectively, cause the electronic device to:
receive, from an external electronic device, information indicating detection of user's gaze on a specified virtual object displayed on a display of the external electronic device wearable on at least part of a user's body through the communication circuitry (paragraph [0091], "Once the device-specific and user-specific parameters are determined for the eye tracking device 130, images captured by the eye tracking cameras can be processed using a glint-assisted method to determine the current visual axis and point of gaze of the user with respect to the display, in accordance with some embodiments."),
receive information from analysis of the user's gaze (paragraph [0090], "The gaze tracking device 130 optionally captures images of the user's eyes (e.g., as a video stream captured at 60-120 frames per second (fps)), analyze the images to generate gaze tracking information, and communicate the gaze tracking information to the controller 110."),
receive, from the external electronic device, first information from analysis of the user's face (paragraph [0070], "In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the face of the user that includes the eyes of the user (and may be referred to as an eye-tracking camera)."), second information from analysis of the user's gesture (paragraph [0070], "In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the user's hand(s) and optionally arm(s) of the user (and may be referred to as a hand-tracking camera)."), or third information from analysis of whether the user started utterance (paragraph [0172], "In FIG. 7T, the computer system detects an input that corresponds to a request to activate a voice-based virtual assistant."), corresponding to a point in time at which the user's gaze is detected (paragraph [0172], “In some embodiments, the user optionally turns to look at a predefined location in the three-dimensional environment that corresponds to a home location of the voice-based virtual assistant, and/or provides an activation input (e.g., a tap input on the user's finger or a controller, a gaze input, etc.).”),
determine a user's intention to utter a voice command based on whether at least one of the first information, the second information, or the third information, and the information from analysis of the user's gaze satisfy a specified condition (paragraph [0173], "In FIGS. 7U and 7W, in response to detecting the input that corresponds to the request to activate the voice-based virtual assistant in the three-dimensional environment, the computer system displays a visual representation of the virtual assistant in the three-dimensional environment."), and
execute a voice recognition application stored in the memory upon determining that there is the intention to utter and control the voice recognition application to be in a state of being capable of receiving a voice command of the user (paragraph [0248], "In response to detecting the request to activate the voice-based virtual assistant (12006): the computer system activates the voice-based virtual assistant configured to receive voice commands (e.g., for interacting with the three-dimensional scene).").
Regarding claims 2, 14 and 20, Faulkner teaches a system, method and computer-readable media wherein the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors individually or collectively, cause the electronic device to:
receive audio data corresponding to a user's voice input from the external electronic device through the communication circuitry (paragraph [0259], "In some embodiments, the device processes the voice input to determine a user command for the voice assistant after the voice-based virtual assistant is activated, and provides the user command to the virtual assistant as input to trigger performance of a corresponding operation by the virtual assistant."), and
execute a command corresponding to the voice input through the voice recognition application (paragraph [0259], "In some embodiments, the device processes the voice input to determine a user command for the voice assistant after the voice-based virtual assistant is activated, and provides the user command to the virtual assistant as input to trigger performance of a corresponding operation by the virtual assistant."), and
wherein the voice input does not include a wake-up word (paragraph [0253], "For example, as the user looks around a room, the central region of the user's visual field is clear and surrounded by a purple vignette, the objects within the central region of the user's visual field is the target of the voice command or provides the context of the voice command detected by the voice-based virtual assistant (e.g., 'turn this on', or 'change this photo').").
Regarding claims 4 and 17, Faulkner teaches a system, method and computer-readable media wherein the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors individually or collectively, cause the electronic device to determine that there is the intention to utter in case the information from analysis of the user's gaze indicates that a dwell time of the user's gaze on the specified virtual object is equal to or longer than a specified time (paragraph [0254], "In some embodiments, the first criteria also include a criterion that is met when the gaze input meets preset gaze stability and duration thresholds."), and wherein the second information indicates a gesture for the specified virtual object (paragraph [0262], "In some embodiments, the visual representation of the voice-based virtual assistant has a predefined location in a three-dimensional environment that includes the first virtual object and the first physical object (e.g., the three-dimensional environment is an augmented reality environment), and the request to activate the voice-based virtual assistant includes an input (e.g., a gaze input, gesture input, or a combination of both) directed to the predefined location.").
Regarding claims 5 and 18, Faulkner teaches a system, method and computer-readable media wherein the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors individually or collectively, cause the electronic device to determine that there is the intention to utter in case the information from analysis of the user's gaze indicates that a dwell time of the user's gaze on the specified virtual object is equal to or longer than a specified time, and wherein the third information indicates that the user starts uttering (paragraph [0254], "In some embodiments, detecting the request to activate the voice-based virtual assistant includes detecting a gaze input that meets first criteria, wherein the first criteria include a criterion that is met when the gaze input is directed to a location corresponding to the visual representation of the voice-based virtual assistant in the three-dimensional scene (e.g., the virtual assistant is activated when the user gazes upon the visual representation of the virtual assistant). In some embodiments, the first criteria also include a criterion that is met when the gaze input meets preset gaze stability and duration thresholds. In some embodiments, the request to activate the voice-based virtual assistant includes a preset trigger command 'Hey, assistant!'").
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3, 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Faulkner in view of U.S. Patent 11,614,794 to Mixter et al. (hereinafter, "Mixter").
Regarding claims 3 and 16, Faulkner teaches a system, method and media wherein the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors individually or collectively, cause the electronic device to determine that there is the intention to utter in case the information from analysis of the user's gaze indicates that a dwell time of the user's gaze on the specified virtual object is equal to or longer than a specified time (paragraph [0254], "In some embodiments, detecting the request to activate the voice-based virtual assistant includes detecting a gaze input that meets first criteria, wherein the first criteria include a criterion that is met when the gaze input is directed to a location corresponding to the visual representation of the voice-based virtual assistant in the three-dimensional scene (e.g., the virtual assistant is activated when the user gazes upon the visual representation of the virtual assistant). In some embodiments, the first criteria also include a criterion that is met when the gaze input meets preset gaze stability and duration thresholds.").
Faulkner does not explicitly teach “wherein the first information indicates a specified facial expression,” and thus, Mixter is introduced.
Mixter teaches the first information indicates a specified facial expression (column 22, line 57, "In some implementations, the system determines both mouth movement and a directed gaze have been detected based on detecting mouth movement and directed gaze co-occur or occur within a threshold temporal proximity of one another.").
Faulkner and Mixter are considered analogous because they are each concerned with user interactions with virtual assistants. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Faulkner with the teachings of Mixter for the purpose of improving user interaction response time. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 8, Faulkner teaches, at paragraph [0166], "The computer performs an operation corresponding to a currently selected user interface object in response to a gesture input for activating the user interface object detected while the gaze input is on the currently selected user interface object," but does not explicitly teach “The electronic device of claim 1, wherein the first information, the second information, and the third information include analyzed information based on inputted information within a specified time from a point in time at which user's gaze is detected;” however, Mixter teaches the first information, the second information, and the third information include analyzed information based on inputted information within a specified time from a point in time at which user's gaze is detected (column 2, line 17, "As one example, the automated assistant can be adapted in response to detecting mouth movement of a user (optionally for a threshold duration), detecting that the gaze of the user is directed at an assistant device (optionally for the same or different threshold duration), and optionally that the mouth movement and the directed gaze of the user co-occur or occur within a threshold temporal proximity of one another (e.g., within 0.5 seconds, within 1.0 seconds, or other threshold temporal proximity).").
Faulkner and Mixter are considered analogous because they are each concerned with user interactions with virtual assistants. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Faulkner with the teachings of Mixter for the purpose of improving user interaction response time. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Claims 6 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Faulkner in view of U.S. Patent 10,684,703 to Hindi et al. (hereinafter, "Hindi").
Regarding claim 6, Faulkner teaches the electronic device of claim 4, wherein the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors individually or collectively, cause the electronic device to:
determine that there is the intention to utter in case the second information indicates a first gesture for the specified virtual object (paragraph [0262], "In some embodiments, the visual representation of the voice-based virtual assistant has a predefined location in a three-dimensional environment that includes the first virtual object and the first physical object (e.g., the three-dimensional environment is an augmented reality environment), and the request to activate the voice-based virtual assistant includes an input (e.g., a gaze input, gesture input, or a combination of both) directed to the predefined location.").
Faulkner does not explicitly teach “transmit a request to ask about the intention to utter to the external electronic device,” or “determine the intention to utter according to a response received from the external electronic device in case the second information indicates a second gesture for the specified virtual object,” and thus, Hindi is introduced.
Hindi teaches [transmitting] a request to ask about the intention to utter to the external electronic device (column 40, line 57, "As described above, in order to complete a structured query, task flow processing module 736 needs to initiate additional dialogue with the user in order to obtain additional information, and/or disambiguate potentially ambiguous utterances. When such interactions are necessary, task flow processing module 736 invokes dialogue flow processing module 734 to engage in a dialogue with the user. In some examples, dialogue flow processing module 734 determines how (and/or when) to ask the user for the additional information and receives and processes the user responses. The questions are provided to and answers are received from the users through I/O processing module 728."); and
[determining] the intention to utter according to a response received from the external electronic device in case the second information indicates a second gesture for the specified virtual object (column 41, line 1, "In some examples, dialogue flow processing module 734 presents dialogue output to the user via audio and/or visual output, and receives input from the user via spoken or physical (e.g., clicking) responses.").
Faulkner and Hindi are considered analogous because they are each concerned with user interactions with virtual assistants. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Faulkner with the teachings of Hindi for the purpose of improving assistant response quality. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 11, Faulkner teaches the electronic device of claim 1, [wherein] the information from analysis of the user's gaze indicates that a dwell time of the user's gaze on the specified virtual object is equal to or longer than the specified time (paragraph [0254], "In some embodiments, the first criteria also include a criterion that is met when the gaze input meets preset gaze stability and duration thresholds.") but does not explicitly teach “transmit a request to ask about the intention to utter to the external electronic device,” or “determine the intention to utter according to a response received from the external electronic device.”
Hindi teaches [transmitting] a request to ask about the intention to utter to the external electronic device (column 40, line 61, "When such interactions are necessary, task flow processing module 736 invokes dialogue flow processing module 734 to engage in a dialogue with the user. In some examples, dialogue flow processing module 734 determines how (and/or when) to ask the user for the additional information and receives and processes the user responses."); and
[determining] the intention to utter according to a response received from the external electronic device (column 41, line 1, "In some examples, dialogue flow processing module 734 presents dialogue output to the user via audio and/or visual output, and receives input from the user via spoken or physical (e.g., clicking) responses.").
Faulkner and Hindi are considered analogous because they are each concerned with user interactions with virtual assistants. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Faulkner with the teachings of Hindi for the purpose of improving assistant response quality. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Faulkner in view of U.S. Patent 10,860,096 to Kelly et al. (hereinafter, "Kelly").
Regarding claim 7, Faulkner teaches the electronic device of claim 5, wherein the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors individually or collectively, cause the electronic device to determine that the third information does not satisfy the specified condition in case the third information indicates that a user other than the user starts uttering (paragraph [0259], "In some embodiments, if the gaze input does not meet the first criteria or the voice input does not meet the second criteria, the virtual assistant does not perform an operation that corresponds to the voice command in the voice input."), but does not explicitly teach “or that sound input to the external electronic device is noise,” and thus, Kelly is introduced.
Kelly teaches [determining] that the third information does not satisfy the specified condition in case the… sound input to the external electronic device is noise (column 20, line 55, "In some embodiments, event monitor 171 sends requests to the peripherals interface 118 at predetermined intervals. In response, peripherals interface 118 transmits event information. In other embodiments, peripherals interface 118 transmits event information only when there is a significant event (e.g., receiving an input above a predetermined noise threshold and/or for more than a predetermined duration).").
Faulkner and Kelly are considered analogous because they are each concerned with user interactions with virtual assistants. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Faulkner with the teachings of Kelly for the purpose of improving assistant response quality. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Claims 9-10, 12 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Faulkner in view of WIPO Publication 2018/117608 to Lee et al. (hereinafter, "Lee").
Regarding claim 9, Faulkner does not explicitly teach “The electronic device of claim 1, wherein the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors individually or collectively, cause the electronic device to provide a hint for the voice command based on context information related to the user through the voice recognition application, and wherein the context information includes at least one of a usage history of the user for the voice recognition application or the user’s gesture,” and thus, Lee is introduced.
Lee teaches [providing] a hint for the voice command based on context information related to the user through the voice recognition application (page 7, "If the reliability of all of the plurality of candidate speech intents is less than a predetermined value, the processor 120 may display a UI including the plurality of candidate speech intents so that the user may directly select the speech intents. In addition, the processor 120 may perform an operation corresponding to the speech intent selected by the user."), and
wherein the context information includes at least one of a usage history of the user for the voice recognition application or the user's gesture (page 6, "In addition, the processor 120 may use big data and user-specific history data for speech recognition and reliability measurement.").
Faulkner and Lee are considered analogous because they are each concerned with user interactions with virtual assistants. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Faulkner with the teachings of Lee for the purpose of improving assistant response quality. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claim 10, Lee further teaches the electronic device of claim 9, wherein the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors individually or collectively, cause the electronic device to provide a command to execute at least one function as the hint in a natural language form based on the user's gesture toward a virtual object representing an application supporting the at least one function (page 13, "When the reliability of the predicted plurality of speech intents is determined to be less than a predetermined value, the processor 120 may display the plurality of speech intents and receive a user selection. As shown in the second drawing from the left of FIG. 10, the processor 120 may select a list UI that selects 'weather search', 'navigation', and 'city information' corresponding to a plurality of speech intents associated with the entity name 'Seoul.'").
Regarding claims 12 and 15, Faulkner does not explicitly teach a system or method “wherein the one or more computer programs further include computer-executable instructions that, when executed by the one or more processors individually or collectively, cause the electronic device to construct and train an intention to utter determination model using analysis information including the first information, the second information, and the third information and information on whether the voice recognition application is actually used,” however, Lee teaches [constructing] and [training] an intention to utter determination model using analysis information including the […] third information and information on whether the voice recognition application is actually used (page 7, "The data learner 121 may learn criteria for speech recognition, language understanding, and user's speech intent determination… The data learner 121 acquires data to be used for learning, and applies the acquired data to a data recognition model to be described later to learn criteria for determining speech recognition and user speech intent… The data recognizer 122 may recognize a situation from predetermined data by using the learned data recognition model. The data recognizer 122 may obtain predetermined data according to a predetermined criterion by learning, and use the data recognition model by using the acquired data as an input value. For example, the data recognizer 122 may recognize the input user voice by using the learned acoustic model and the language model. The data recognizer 122 may determine the user's intention to speak based on the recognized user voice. The data recognition unit 122 may update the data recognition model by using the data acquired as the voice recognition and speech intention result values for each user as input values again.").
Faulkner and Lee are considered analogous because they are each concerned with user interactions with virtual assistants. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have used the training method of Lee with the first, second and third information of Faulkner for the purpose of improving assistant response quality. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
U.S. Patent Application Publication 2023/0055477 to Mohajer et al.
U.S. Patent Application Publication 2020/0103980 to Katz et al.
U.S. Patent Application Publication 2019/0187787 to White et al.
U.S. Patent Application Publication 2016/0162020 to Lehman et al.
U.S. Patent Application Publication 2015/0187357 to Xia et al.
U.S. Patent Application Publication 2014/0354533 to Swaminathan.
U.S. Patent 11,393,491 to Han et al.
U.S. Patent 10,061,352 to Trail.
U.S. Patent 8,482,527 to Kim.
Korean Publication 10-2015-0066882 to Lee et al.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN T SMITH whose telephone number is (571) 272-6643. The examiner can normally be reached Monday - Friday 8:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, PIERRE-LOUIS DESIR can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SEAN THOMAS SMITH/Examiner, Art Unit 2659
/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659