DETAILED ACTION
1. This communication is in response to the Amendments and Arguments filed on 12/18/2025. Claims 1-19 are pending and have been examined.
Response to Amendments and Arguments
2. The 35 USC 112(f) interpretation of claims 1-17 is withdrawn because the claims have been amended to remove the generic placeholder “section.”
The 35 USC 101 rejection of claim 19 as a computer program or software per se is withdrawn because the claim has been amended to recite a “non-transitory computer-readable storage medium.”
The 35 USC 101 (abstract idea) rejections of claims 1-7, 11, 13-19 are maintained because the amended claims are still considered to involve a mental process: no specific display device (such as a transmissive display device) is recited for the amended limitation “wherein the circuitry executes the process of displaying the character information according to the determination result regarding the conveyance information and a detected line-of-sight of at least one of the speaker or the receiver.” As such, “the process of displaying” is subject to the broadest reasonable interpretation (BRI).
With respect to the 35 USC 103 rejections, the applicant’s amendments and arguments have been carefully considered but are not persuasive. In particular, the applicant argues that the references do not teach all limitations of the independent claims as a whole: “acquire character information obtained by converting utterance of a speaker into characters by speech recognition, determine presence or absence of a conveyance intention that the speaker tries to convey speaker's own utterance content to a receiver using the character information based on a state of the speaker, execute a process of displaying the character information on display devices used by respective ones of the speaker and the receiver, and execute a process of presenting a determination result regarding the conveyance intention to at least one of the speaker or the receiver, wherein the circuitry executes the process of displaying the character information according to the determination result regarding the conveyance information and a detected line-of-sight of at least one of the speaker or the receiver.” The examiner respectfully disagrees.
Note that NAPOLITANO teaches: [0004] “perform natural language processing on the spoken user input to infer the user's intent; [0078], speech-to-text (STT) .. automatic speech recognition (ASR) <read on ‘converting utterance of a speaker into characters’> ..”
JP teaches: [0004] “a closed captioning controller configured to display closed captioning text <read on ‘displaying character information’ such as the converted speaker’s utterance> .. an eye tracking device configured to detect the location of the line of sight .. The closed captioning controller can be configured to recognize a predetermined line-of-sight pattern of the user's line of sight <read on ‘a state of the speaker’> and, upon detecting the predetermined line-of-sight pattern, partially or completely de-emphasize the display of the closed captioning text <read on ‘presenting a determination result’ or not – (without de-emphasizing or with de-emphasizing the converted speaker’s utterance) based on the ‘state of the speaker’ which is the user's line of sight>.”
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
3. Claims 1-7, 11, 13-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The independent claims 1, 18, 19 recite an apparatus, a method and a computer readable storage medium thus relating to a statutory category.
Claims 1, 18, 19 further recite “converting utterance of a speaker into characters by speech recognition .. determines presence or absence of a conveyance intention .. based on a state of the speaker .. displaying the character information .. presenting a determination result ..” The limitations as drafted cover a mental process. More specifically, a human can mentally convert voice input to text, observe a user’s line-of-sight to confirm the user’s intention, and mentally present the converted text and the confirmed intention. Note that the recited “a state of the speaker,” without a more specific description, is subject to BRI.
This judicial exception is not integrated into a practical application. In particular, independent claims 1, 18, 19 recite additional elements of “An information processing apparatus” and “a computer system” which amount to general purpose computing devices – see SPECIFICATION Figs. 3 and 4. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.
Claims 1, 18, 19 do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claimed additional limitations amount to no more than insignificant extra-solution activity. The claims are not patent eligible.
With respect to dependent claim 2, the claim further recites “generates notification data for notifying .. that the conveyance intention is absent ..” where a human can, based on a speaker’s state (such as line-of-sight), mentally determine whether the speaker’s intention is present and generate a notification. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 3, the claim further recites “the notification data includes at least one of visual data, haptic data, and sound data” where a human can generate any selected notification signal, such as a text notification. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 4, the claim further recites “detects a line-of-sight of the speaker .. determines whether or not the line-of-sight of the speaker is deviated from an area in which the character information is displayed .. starts a determination process of the conveyance intention ..” where a human can detect/determine if the line-of-sight deviates from a certain area. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 5, the claim further recites “executes the determination process of the conveyance intention based on at least one of the line-of-sight of the speaker, a speech speed of the speaker, a volume of the speaker, a head direction of the speaker, or a hand position of the speaker ..” where a human can detect/determine any of the above mentioned signals. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 6, the claim further recites “determines that the conveyance intention is absent if a state that the line-of-sight of the speaker is deviated from the area in which the character information is displayed continues for a predetermined time ..” where a human can detect/determine if the line-of-sight stays deviated for a certain length of time. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 7, the claim further recites “executes the determination process of the conveyance intention based on the line-of-sight of the speaker and a line-of-sight of the receiver ..” where a human can detect/determine the line-of-sight of the speaker and the receiver. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 8, the claim further recites “making a field of view of the speaker difficult to look if the line-of-sight of the speaker deviates from the area in which the character information is displayed ..” A human is not able to mentally make a field of view of the speaker difficult to look at. The claim is considered to include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 9, the claim further recites “sets a speed at which the field of view of the speaker is difficult to look based on at least one of reliability of the speech recognition, the speech speed of the speaker, a motion tendency of the line-of-sight of the speaker, or a noise level around the speaker ..” A human is not able to mentally set a speed at which the field of view of the speaker becomes difficult to look at. The claim is considered to include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 10, the claim further recites “decreasing transparency of at least a part of the transmissive display device or a process of displaying an object that blocks the field of view of the speaker on the transmissive display device, as the process of making the field of view of the speaker difficult to look ..” A human is not able to mentally perform these operations. The claim is considered to include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 11, the claim further recites “cancels the process of making the field of view of the speaker difficult to look if the line-of-sight of the speaker returns to the area in which the character information is displayed ..” where a human can cancel any operation if the line-of-sight returns from deviation from a certain area. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 12, the claim further recites “displays the character information so as to intersect the line-of-sight of the speaker in the display device used by the speaker if it determines that the conveyance intention is absent ..” A human is not able to mentally display the character information so as to intersect the line-of-sight of the speaker. The claim is considered to include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 13, the claim further recites “executes a suppression process regarding the speech recognition if it determines that the conveyance intention is absent ..” where a human can cancel any operation under any condition. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 14, the claim further recites “stops a speech recognition process or stops the process of displaying the character information .. as the suppression process ..” where a human can stop any operation under any condition. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 15, the claim further recites “presents to at least the receiver that the conveyance intention is present if it determines that the conveyance intention is present ..” where a human can display/present any information with a generic display device. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 16, the claim further recites “generates dummy information that the speaker appears to be uttering even if a speech of the speaker is absent, wherein the control section displays the dummy information on the display device used by the receiver until the character information indicating the utterance content of the speaker is acquired by the speech recognition during a period in which it determines that the conveyance intention is present ..” where a human can display any dummy information with a generic display device. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to dependent claim 17, the claim further recites “wherein the dummy information includes at least one of information of a dummy effect that the speaker appears to be uttering or information of a dummy character string that the character information appears to be outputted ..” where a human can generate any dummy information. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Allowable Subject Matter
4. With respect to 35 USC 103 rejections, claims 8-12, 16-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. However, claims 11, 16-17 must also overcome 35 USC 101 abstract idea rejections.
Claim Rejections - 35 USC § 103
5. Claims 1-7, 13-15, 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Napolitano, et al. (US 20170068423; hereinafter NAPOLITANO) in view of JP, et al. (JP2017517045A; hereinafter JP).
As per claim 1, NAPOLITANO (Title: Intelligent automated assistant in a media environment) discloses “An information processing apparatus, comprising: circuitry configured to acquire character information obtained by converting utterance of a speaker into characters by speech recognition (NAPOLITANO, [0004], The virtual assistant can perform natural language processing on the spoken user input to infer the user's intent and operationalize the user's intent into tasks; [0078], speech-to-text (STT) .. automatic speech recognition (ASR) <read on ‘converting utterance of a speaker into characters’>);
[ determine presence or absence of a conveyance intention that the speaker tries to convey speaker’s own utterance content to a receiver using the character information based on a state of the speaker ];
execute a process of [ displaying the character information on display devices ] used by respective ones of the speaker and the receiver, and execute a process of [ presenting a determination result regarding the conveyance intention to at least one of the speaker and the receiver; wherein the circuitry executes the process of displaying the character information according to the determination result regarding the conveyance information and a detected line-of-sight of at least one of the speaker or the receiver ].”
NAPOLITANO does not explicitly disclose “determines presence or absence of a conveyance intention .. based on a state of the speaker .. displaying the character information on display devices .. presenting a determination result regarding the conveyance intention to at least one of the speaker and the receiver; wherein the circuitry executes the process of displaying the character information according to the determination result regarding the conveyance information and a detected line-of-sight of at least one of the speaker or the receiver.” However, these limitations are taught by JP (Title: Smart closed captioning with eye tracking).
In the same field of endeavor, JP teaches: [0004] “a closed captioning controller configured to display closed captioning text <read on to display any character information> .. an eye tracking device configured to detect the location of the line of sight .. The closed captioning controller can be configured to recognize a predetermined line-of-sight pattern of the user's line of sight <read on ‘a state of the speaker’> and, upon detecting the predetermined line-of-sight pattern, partially or completely de-emphasize the display of the closed captioning text <read on ‘presenting a determination result’ or not based on the ‘state of the speaker’ which is the user's line of sight>.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of JP in the system (as taught by NAPOLITANO) to detect a speaker’s line-of-sight to determine if the speaker’s intent is present or not.
As per claim 2 (dependent on claim 1), NAPOLITANO in view of JP further discloses “wherein the circuitry generates notification data for notifying the at least one of the speaker or the receiver that the conveyance intention is absent if it determines that the conveyance intention is absent (JP, [0004], The closed captioning controller .. upon detecting the predetermined line-of-sight pattern, partially or completely de-emphasize the display of the closed captioning text <read on ‘notifying .. the conveyance intention is absent’>).”
As per claim 3 (dependent on claim 2), NAPOLITANO in view of JP further discloses “wherein the notification data includes at least one of visual data, haptic data, or sound data. (See Claim 2 <read on ‘notification data .. visual data’>).”
As per claim 4 (dependent on claim 1), NAPOLITANO in view of JP further discloses “detects the line-of-sight of the speaker and determine whether or not the line-of-sight of the speaker is deviated from an area in which the character information is displayed in the display device used by the speaker based on a detection result of the line-of-sight of the speaker, and wherein the circuitry starts a determination process of the conveyance intention if the line-of-sight of the speaker deviates from the area in which the character information is displayed (See Claim 1. JP, [0004], an eye tracking device configured to detect the location of the line of sight <read on ‘if the line-of-sight of the speaker deviates from an area’>).”
As per claim 5 (dependent on claim 4), NAPOLITANO in view of JP further discloses “wherein the circuitry executes the determination process of the conveyance intention based on at least one of the line-of-sight of the speaker, a speech speed of the speaker, a volume of the speaker, a head direction of the speaker, or a hand position of the speaker (See Claim 2. JP, [0004], The closed captioning controller .. upon detecting the predetermined line-of-sight pattern, partially or completely de-emphasize the display of the closed captioning text <read on ‘determination process of the conveyance intention’>).”
As per claim 6 (dependent on claim 5), NAPOLITANO in view of JP further discloses “wherein the circuitry determines that the conveyance intention is absent if a state that the line-of-sight of the speaker is deviated from the area in which the character information is displayed continues for a predetermined time (JP, [0012], When a user's line of sight is tracked over a time interval .. the closed captioning controller can be configured to partially or completely deemphasize the display of the closed captioning text).”
As per claim 7 (dependent on claim 5), NAPOLITANO in view of JP further discloses “wherein the circuitry executes the determination process of the conveyance intention based on the line-of-sight of the speaker and a line-of-sight of the receiver (JP, [0004], an eye tracking device configured to detect the location of the line of sight .. The closed captioning controller can be configured to recognize a predetermined line-of-sight pattern of the user's line of sight and, upon detecting the predetermined line-of-sight pattern, partially or completely de-emphasize the display of the closed captioning text <read on a ready mechanism to detect line-of-sight including the speaker and the receiver, as well as any pre-defined condition for the system response>).”
As per claim 13 (dependent on claim 1), NAPOLITANO in view of JP further discloses “wherein the circuitry executes a suppression process regarding the speech recognition if it determines that the conveyance intention is absent (NAPOLITANO, [0078], speech-to-text (STT) .. automatic speech recognition (ASR) <where the system can start or stop a component speech recognition process under any preset condition, similar to the following JP’s teaching for de-emphasizing display>; JP, [0004], The closed captioning controller .. upon detecting the predetermined line-of-sight pattern, partially or completely de-emphasize the display of the closed captioning text).”
As per claim 14 (dependent on claim 13), NAPOLITANO in view of JP further discloses “wherein the circuitry stops a speech recognition process or stops the process of displaying the character information on at least one of the display devices used by the respective ones of the speaker and the receiver, as the suppression process (see Claim 13).”
As per claim 15 (dependent on claim 1), NAPOLITANO in view of JP further discloses “wherein the circuitry presents to at least the receiver that the conveyance intention is present if it determines that the conveyance intention is present (JP, [0004], an eye tracking device configured to detect the location of the line of sight .. The closed captioning controller can be configured to recognize a predetermined line-of-sight pattern of the user's line of sight and, upon detecting the predetermined line-of-sight pattern, partially or completely de-emphasize the display of the closed captioning text <read on ‘determining .. presenting a determination result’ such as when no de-emphasizing is shown means the intention is present>).”
Claims 18, 19 (both similar in scope to claim 1) are rejected under the same rationale as detailed above for claim 1.
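For orientation only, the line-of-sight limitations quoted above for claims 4 and 6 (the line-of-sight deviates from the area in which the character information is displayed, and the conveyance intention is determined to be absent when that deviation continues for a predetermined time) can be sketched as follows. This is an illustrative reading of the quoted claim language, not the applicant's implementation and not the method of either cited reference; the names `GazeSample` and `conveyance_intention_absent` and the 2.0-second threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class GazeSample:
    """One gaze measurement (hypothetical structure, not from the record)."""
    t: float        # timestamp in seconds
    in_area: bool   # True if the gaze falls inside the character display area


def conveyance_intention_absent(samples: Iterable[GazeSample],
                                threshold_s: float = 2.0) -> bool:
    """Return True if the gaze stays outside the display area
    continuously for at least threshold_s seconds (assumed value)."""
    deviation_start = None  # time at which the current deviation began
    for s in samples:
        if s.in_area:
            # Gaze returned to the display area: reset the deviation timer.
            deviation_start = None
            continue
        if deviation_start is None:
            deviation_start = s.t
        if s.t - deviation_start >= threshold_s:
            return True
    return False
```

Under this sketch, a brief glance away that returns to the display area before the threshold elapses resets the timer, so only a sustained deviation yields an "absent" determination.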
Conclusion
6. THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
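The timing rule stated in the preceding paragraph can be sketched as a small date calculation. This is a simplified illustration under stated assumptions, not an official computation: the function names and the example mailing date in the usage note are hypothetical, a period ending on a day that does not exist in the target month is assumed to end on that month's last day, and any adjustment for weekends or federal holidays is not modeled.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def add_months(d: date, months: int) -> date:
    """Same day-of-month `months` later, clamped to the month's last day."""
    idx = d.month - 1 + months
    year, month = d.year + idx // 12, idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_deadlines(mailing: date,
                    first_reply: Optional[date] = None,
                    advisory_mailed: Optional[date] = None) -> Tuple[date, date]:
    """Return (end of shortened statutory period, six-month outer limit)."""
    shortened = add_months(mailing, 3)
    outer = add_months(mailing, 6)
    # First reply filed within two months of the mailing date?
    replied_in_two = (first_reply is not None
                      and first_reply <= add_months(mailing, 2))
    # If so, and the advisory action is mailed after the three-month period,
    # the shortened period runs to the advisory mailing date, capped at six months.
    if replied_in_two and advisory_mailed is not None and advisory_mailed > shortened:
        shortened = min(advisory_mailed, outer)
    return shortened, outer
```

For a hypothetical mailing date of `date(2026, 3, 16)` with no reply on file, `reply_deadlines` returns June 16, 2026 as the end of the shortened period and September 16, 2026 as the outer limit.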
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG-TZER TZENG whose telephone number is 571-272-4609. The examiner can normally be reached on M-F (8:30-5:00). The fax phone number where this application or proceeding is assigned is 571-273-4609.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Paras Shah (SPE) can be reached on 571-270-1650.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FENG-TZER TZENG/ 3/11/2026
Primary Examiner, Art Unit 2653