Prosecution Insights
Last updated: April 19, 2026
Application No. 18/442,910

GAZE BASED DICTATION

Non-Final OA — §101, §103
Filed: Feb 15, 2024
Examiner: OGUNBIYI, OLUWADAMILOLA M
Art Unit: 2653
Tech Center: 2600 — Communications
Assignee: Apple Inc.
OA Round: 3 (Non-Final)
Grant Probability: 78% (Favorable)
OA Rounds: 3-4
To Grant: 2y 12m
With Interview: 96%

Examiner Intelligence

Career Allow Rate: 78%, above average (236 granted / 304 resolved; +15.6% vs TC avg)
Interview Lift: +18.6% (strong; resolved cases with vs. without interview)
Typical timeline: 2y 12m avg prosecution; 31 currently pending
Career history: 335 total applications across all art units
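For reference, the headline numbers above appear to follow from simple arithmetic on the career counts. The short sketch below (Python, hypothetical variable names, not the tool's disclosed methodology) reproduces them under that assumption.

```python
# Assumption: grant probability = career allow rate, and the with-interview
# figure simply adds the quoted interview lift. Not the tool's stated formula.
granted, resolved = 236, 304
allow_rate = granted / resolved                      # ~0.776, shown as 78%
interview_lift = 0.186                               # +18.6% lift quoted above

print(f"Career allow rate: {allow_rate:.1%}")        # 77.6%
print(f"With interview: {min(allow_rate + interview_lift, 1.0):.0%}")  # ~96%
```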

Statute-Specific Performance

§101: 20.1% (-19.9% vs TC avg)
§103: 47.0% (+7.0% vs TC avg)
§102: 12.1% (-27.9% vs TC avg)
§112: 13.7% (-26.3% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 304 resolved cases
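Taken at face value, the "vs TC avg" figures are plain percentage-point differences, and every statute then implies the same Tech Center baseline. A quick check under that reading (hypothetical names, not the tool's actual formula):

```python
# Implied Tech Center baseline = examiner rate minus the quoted delta
# (assumption: deltas are simple percentage-point differences).
statute_rates = {"§101": (20.1, -19.9), "§103": (47.0, +7.0),
                 "§102": (12.1, -27.9), "§112": (13.7, -26.3)}

for statute, (rate, delta) in statute_rates.items():
    print(f"{statute}: {rate}% examiner vs {rate - delta:.1f}% implied TC avg")
# All four rows work out to a 40.0% baseline, consistent with a single
# "black line" Tech Center average estimate on the chart.
```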

Office Action

§101 §103
DETAILED ACTION

Claims 1, 3 – 9 and 11 – 21 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 27 February 2026 has been entered.

Response to Amendment

With regard to the Final Office Action from 22 December 2025, the Applicant has filed a response on 27 February 2026. Claim 10 has been cancelled. Claims 1, 19 and 20 were objected to for minor informalities. The claims have been amended in such a way that now overcomes the previous objection. The claim objection is hereby withdrawn.

Response to Arguments

With regard to the 35 U.S.C. 101 rejection given to the claims for being directed to a judicial exception without significantly more, the Applicant indicates (Remarks: page 7 par 5) that amended claim 1 is actually not directed towards mental processes. The Applicant indicates that the human mind is not equipped to perform the indicated task of a machine learning model in the way the machine learning model would (Remarks: page 8 par 1), particularly here regarding the machine learning model being trained to determine whether to enter the editing mode based on a velocity of the detected gaze of the user over a word displayed on a screen of the electronic device. To this, the Examiner indicates that a human can indeed mentally detect the speed of the detected gaze of a person's eyes on a section on a screen, or on a piece of paper. The machine learning model present here is a tool being used to perform this detection which a human can perform. When the speed is detected by the human, the human may, based on this, decide to activate an editing mode. This is still a mental process by its recitation.

The Applicant goes on to indicate that (Remarks: page 8 par 3) claim 1 integrates the alleged abstract idea into a practical application. Such practical applications, as stated here, include 'improving the efficiency of dictating and editing text with an electronic device and a digital assistant.' The Examiner indicates here that this mentioned practical application by itself is a mental process. Consider a situation whereby a first human is dictating to a second human who writes down what is being dictated. Having the first human dwell on a written word for a reasonable length of time can indicate an intent to edit that word, especially in a situation where both humans have agreed on such a considerable dwell time being an indication to edit a word or a section. As stated, the indicated practical application is itself a mental process that can be performed in the human mind. The inclusion of the machine learning model simply serves as a particular tool being used to perform the mental process.
The Applicant states (Remarks: page 9 par 3) that the claim recites a specific machine learning model trained in a specific manner to perform a specific task, the task being to determine whether to enter the editing mode based on a velocity of the detected gaze of the user over a word displayed on the screen of the electronic device, and that this improves ‘the efficiency and responsiveness of a digital assistant of the electronic device.’ The Examiner has explained above that the indicated task of activating an editing mode based on a detected gaze velocity, is one that can mentally be performed by a human. The claim here attaches a machine learning model that is trained to perform this task that can be performed by a human. The task itself can be mentally performed and at this point, a machine learning model is attached to indicate that it was trained to be able to perform this, and is thereby performing it, but is present only as a tool to be applied to performing it. The machine learning model is not specific enough by its recitation, or by its training, to perform a task that a human would not be able to (relatively) perform. The Examiner hereby maintains the 35 U.S.C. 101 rejection. Regarding the 35 U.S.C. 103 rejection, the Applicant indicates that the prior art of record is not suitable to read upon the claimed invention. The Applicant indicates (Remarks: page 10 par 3) that the Dhatt reference does not teach of a machine learning model to determine to enter a dictation mode, nor of the determination to enter an editing mode, based on gaze detection. The Examiner indicates that first, the reference of Zurek et al. was applied to teach of the use of a gaze to determine to enter a dictation mode, the Examiner then making use of the Dhatt reference for teaching of the use of a machine learning model being applied to gaze detection. However, for the purpose of moving the application forward and based on the amendment to the independent claims that require a new ground of rejection, the claims will be addressed by their current presentation in the following section. Claim Rejections - 35 USC § 101 35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title. Claims 1, 3 – 9, 11 – 21 are rejected under 35 U.S.C. 101 because this claimed invention is directed to a judicial exception without significantly more. Independent claims 1, 19 and 20 provide teaching for detecting a gaze of a user, applying a machine learning model to the gaze to determine if to enter a dictation mode, when in a dictation mode, receiving an utterance, and applying a machine learning model to the gaze to determine if to enter an editing mode based on a velocity of the gaze over a displayed word, and then when it is determined that an editing mode shouldn’t be entered, displaying a text associated with the utterance on a screen of the electronic device. Nothing in the claims preclude the claimed technique from being performed in the human mind. 
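For readers tracing the claim language, the sketch below is a minimal, hypothetical rendering of the recited flow (detect gaze, model decides whether to enter dictation mode, receive an utterance, model decides whether to enter editing mode from the gaze velocity over a displayed word, otherwise display the transcription). It is illustrative only; the model interface, helpers, and names are assumptions, not the Applicant's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol

@dataclass
class Gaze:
    x: float          # gaze point in screen coordinates
    y: float
    velocity: float   # speed of gaze movement (e.g. px/sec)

class ModeModel(Protocol):
    """Stand-in for the claimed model 'trained to determine a mode based on the detected gaze'."""
    def enter_dictation(self, gaze: Gaze) -> bool: ...
    def enter_editing(self, gaze: Gaze, word_under_gaze: Optional[str]) -> bool: ...

def run_dictation(model: ModeModel, gaze: Gaze, word_under_gaze: Optional[str],
                  listen: Callable[[], bytes], transcribe: Callable[[bytes], str],
                  display: Callable[[str], None]) -> None:
    # Determine, from the detected gaze, whether to enter the dictation mode.
    if not model.enter_dictation(gaze):
        return
    # In the dictation mode: receive an utterance.
    utterance = listen()
    # Determine whether to enter the editing mode; per the claim, the model is
    # trained on the velocity of the gaze over a word displayed on the screen.
    if model.enter_editing(gaze, word_under_gaze):
        return  # the editing flow of the dependent claims would run here
    # Not entering the editing mode: display a textual representation of the utterance.
    display(transcribe(utterance))
```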
The entire process involves data gathering through collecting a user’s gaze information and the collection of an utterance, data analysis through determining if to enter a dictation mode and also a determination to display a textual representation based on a decision not to enter an editing mode, data transformation by converting the utterance into text, and data presentation through the displaying of a textual representation of the utterance on a screen. A human may observe a user’s gaze, observe the user’s gaze is an indication that the user intends on beginning a dictation, collect utterances from the user who has signalled through a gaze that a dictation should begin, receive the user’s utterance, observe that the user slows a gaze on a particular word to determine if to begin to make edits, and when it is determined that there is no need to make edits, display a transcription of the utterance for the user to observe. The claim hereby recites a mental process. This judicial exception is not integrated into a practical application as the claims simply teach of data gathering, analysis, transformation, and presentation. While the claim makes mention of a storage medium, processors, an electronic device, these are recited in generic terms. The invention is not tied to any particular defining structure and simply provides instructions to apply the judicial exception. The technique can be performed by a generic computer which would be presented as a tool to implement the abstract idea (classifiable as automation of the mental process steps). The Specification in [0048] provides several computer devices suitable to read upon the limitations of this claim. The computer parts are recited at a high level of generality that they amount to no more than mere instructions to apply the exception using a generic computer. The trained machine learning model serves as an additional element used to make certain decisions, the decisions being to either enter a dictation mode or an editing mode. A generic machine learning model is provided in [0200] of the Specification, and it is recited without specificity on which machine learning model is applied, nor the specificities on how to apply the machine learning model. The presence of a machine learning model serves the purpose of providing nothing more than mere instructions to implement an abstract idea on a generic computer. A machine learning model is used to generally apply the abstract idea without limiting how the trained machine learning model functions. The machine learning model is described at high level of generality that mentioning it amounts to using a computer with a generic machine learning model to apply the abstract idea. A human who understands how to determine to activate a dictation mode and an editing mode based on predetermined gazes could perform the steps of this claim. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the invention is not tied to a practical application. The claims provide techniques that amount to no more than mere instructions that apply the judicial exception which can be performed by a generic device. While the claims make mention of a trained machine learning model, the claims do not recite specifics on how the model is performed, and therefore still does not amount to significantly more than the mentioned judicial exception. Mere instructions to apply an exception using a generic device cannot provide an inventive concept. 
Claims 1, 19 and 20 are not eligible. Claim 3 provides determining whether to enter the dictation mode and the editing mode based on the machine learning model. The machine learning model serves as an additional element used to make a decision. A generic machine learning model is provided in [0200] of the Specification, and it is recited without specificity on which machine learning model is applied, nor the specificities on how to apply the machine learning models. The presence of a machine learning model serves the purpose of providing nothing more than mere instructions to implement an abstract idea on a generic computer. A machine learning model is used to generally apply the abstract idea without limiting how the trained machine learning model functions. The machine learning model is described at high level of generality that mentioning it amounts to using a computer with a generic machine learning model to apply the abstract idea. A human who understands how to determine to activate a dictation mode and an editing mode could perform the steps of this claim. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 4 provides that the determination based on the gaze to enter a dictation mode involves determining whether the detected gaze is directed at a text field displayed on the screen of an electronic device, as detected by the machine learning model able to determine a mode. A human may observe the gaze of a user and if it is observed to be directed to a particular text field, encourage the user to begin dictating. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 5 provides that the determination based on the gaze to enter a dictation mode involves determining a first location on the text field where the detected gaze of the user is directed to, as detected by the machine learning model able to determine a mode. A human may observe the gaze of a user and if it is observed to be directed to a particular location of a text field, encourage the user to begin dictating. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 6 provides teaching for determining whether the detected gaze is directed at a text field displayed on a screen comprises determining a time the gaze was directed, and determining that the gaze time exceeds a threshold. A human may observe a user’s gaze and time the gaze to collect the duration of the gaze. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 7 provides applying the gaze and the utterance to determine whether to enter the editing mode comprises determining that the user’s gaze is directed to a second location on the text field and a determination is made not to enter the editing mode when a determination that the second location is at the end of the text displayed in the text field, as detected by the machine learning model able to determine a mode. A human may observe a user’s gaze to be at a particular location at the end of a text field, and determine based on the observation that an edit does not need to be made. 
This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 8 provides teaching for determining to enter an editing mode based on the location being a word of text displayed in the text field, as detected by the machine learning model able to determine a mode. If a human observes that the user is gazing at a word of text in the text field, the human may determine that as a reason to activate editing. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 9 provides a determination to enter the editing mode based on a determination that a gaze spread of the detected gaze is below a spread threshold, as detected by the machine learning model able to determine a mode. A human could observe the range of a user's eye focus on certain displayed words, and if the range isn't wide, up to a threshold, the human may activate editing. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 11 provides determining a third location on a screen to display the textual representation. A human may select a different new location to write out a text transcription. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 12 provides that the third location is based on the user's gaze on the screen. A human may observe a user's gaze to see the location the user would like to place the transcription, and place the transcription at the location the user looked at. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 13 provides that the third location is based on the end of the text displayed on the screen. A human may continue to place a transcription at the end of the previous transcription text, so as to continue writing. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 14 provides that associated with a determination to enter an editing mode, a word displayed on a screen is determined to be edited, determining a change to be made to the word, and editing the word by applying the change to the word. A human may, after activating editing, determine a word to edit, the change to be made to the word, and then editing the word by changing it based on the determined change. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 15 provides that the determination of the displayed word to edit is based on one or more of a detected gaze, distance between a location of the detected gaze of the user and the word, a dwell time of the detected gaze and the utterance. A human may determine a word to edit based on observation of the user to determine one of the gaze direction, distance between the user's gaze and the word, dwell time of the detected gaze, or based on listening to the user's utterance.
This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 16 provides that the determined change to the word is based on the utterance and a context of displayed words. A human may listen to the user’s utterance and determine the context of the displayed words to determine the change to be made to the word. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 17 provides teaching for the determining the word to edit based on a linguistic property of the word. A human may read through the words of a sentence, consider the linguistic property of the word, and determine to edit a word that is out of place based on its linguistic property. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 18 provides the determination of a first distance between a location of a detected gaze and a first word to be equal to a second distance between a location of the detected gaze and a second word, and if these are equal, a determination is made to edit the first and second words. This is a calculation that a human may perform by observing the user’s gaze location to be equally between two words, or even with the user indicating the location of his/her gaze, so that the human may activate the editing of both words. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim 21 provides that the machine learning model is trained to determine a mode based on different factors which include a direction of a user’s gaze, a proximity of the detected gaze to a word displayed on the screen, and a dwell time of the gaze. A human may make a decision based on predetermined factors associated with a gaze such as determining if the user performs any of the above-stated operations. The indicated machine learning model is simply provided here in generic terms as a tool applied to perform the intended determination action. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception. Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. 
Claims 1, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 19, 20 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Pu et al. (US 2022/0284904 A1: hereafter — Pu) in view of Thörn (US 2015/0364140 A1). For claim 1, Pu discloses a non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device (Pu: [0239] — non-transitory storage medium; [0232] — a processor; [0233] — execution of computer instructions; [0085] — electronic device), the one or more programs including instructions for: detecting a gaze of a user (Pu: [0008] — ‘the assistant system may use gaze as an additional signal to determine when the user wants to input text and/or make an edit to the inputted text’); determining, using a machine learning model trained to determine a mode based on the detected gaze of the user, whether to enter a dictation mode (Pu: [0008] — ‘the assistant system may use gaze as an additional signal to determine when the user wants to input text and/or make an edit to the inputted text’; [0133] — machine learning being used for face tracking as well as for recognising gestures (indicating the use of machine learning for determining to enter a dictation mode using a detected gaze)); in accordance with a determination to enter the dictation mode: receiving an utterance (Pu: [0187] — ‘[a]s an example and not by way of limitation, when the user focuses on a field with gaze, the assistant system 140 may prompt the user to dictate their utterance to enter text into that field’ (the gaze being detected to prompt the user to dictate an utterance)); determining, using the machine learning model trained to determine the mode based on the detected gaze of the user, whether to enter an editing mode, [[wherein the machine learning model is trained to determine whether to enter the editing mode based on a velocity of the detected gaze of the user]] over a word displayed on a screen of the electronic device (Pu: [0187] — ‘[i]f the user indicates they want to make an edit, the user's gaze on a particular section of text may be used as a signal by the assistant system 140 to determine what to prompt the user to edit’ (user gaze to determine to enter an editing mode); [0133] — machine learning being used for face tracking as well as for recognising gestures (indicating the use of machine learning for determining to enter an editing mode using a detected gaze); [0213] — the user making use of gaze to make edits to a word by focusing a gaze on a word); and in accordance with a determination not to enter the editing mode, displaying a textual representation of the utterance on the screen of the electronic device (Pu: [0112] — ‘the ASR module 208b may allow a user to dictate and have speech transcribed as written text’; [0120] — ‘[t]he dictated message may be “be there in twenty”, which is displayed in the textbox 500’). The reference of Pu provides teaching for receiving a user’s gaze and also activating a dictation mode to enter text and an editing mode for editing the dictated text. This reference however fails to disclose the further limitation regarding determining to enter the editing mode based on the velocity of the detected gaze of the user. 
This is however not new to the art as the reference of Thorn is now introduced to teach as: determining, using the machine learning model trained to determine the mode based on the detected gaze of the user, whether to enter an editing mode, wherein the machine learning model is trained to determine whether to enter the editing mode based on a velocity of the detected gaze of the user over a word displayed on a screen of the electronic device (Thorn: [0069] — activating text editing of speech-to-text generated text, through user gaze, based on detecting a dwell time of the user’s gaze on the activation area, to know if a dwell time exceeds a threshold (thereby teaching of determining that a user focuses the gaze at a particular region with a slow speed/velocity, calculated based on the dwell time)). Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to improve upon the activation of an editing mode based on gaze detection on a word meant to be edited as determined through machine learning just as taught by the reference of Pu, by incorporating the detection of a dwell time of the gaze for entering the editing mode as taught by the reference of Thorn, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of applying the determined speed of the user’s gaze on a word intended to be edited as a confirmation that the user does indeed desire to edit that word over other words which the user’s gaze quickly dashes over, leading to a simple way of detecting the desire to make an edit. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007). For claim 3, the combination of Pu in view of Thorn discloses the non-transitory computer-readable storage medium, wherein determining whether to enter the dictation mode and determining whether to enter the editing mode are determined by the same machine learning model (Pu: [0133] — machine learning being used for face tracking as well as for recognising gestures (indicating the use of machine learning for determining to enter a dictation mode and to enter an editing mode using a detected gaze)). For claim 4, claim 1 is incorporated and the combination of Pu in view of Thorn discloses the non-transitory computer-readable storage medium, wherein determining, using a machine learning model trained to determine a mode based on the detected gaze of the user, whether to enter the dictation mode further comprises: determining whether the detected gaze of the user is directed at a text field displayed on a screen of the electronic device (Pu: [0187] — determining that a user should begin a dictation in a text field when a user gaze is detected at that text field of a display on a user interface). For claim 5, claim 4 is incorporated and the combination of Pu in view of Thorn discloses the non-transitory computer-readable storage medium wherein determining, using a machine learning model trained to determine a mode based on the detected gaze of the user, whether to enter a dictation mode further comprises: determining a first location on the text field where the detected gaze of the user is directed (Pu: [0187] — having the user focus a gaze attention on a source, or on an assistant icon (to indicate the fixing of the gaze attention on a location of the screen)). 
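The combination rationale above reads a long dwell on a word as evidence of low gaze velocity. A toy illustration of that proxy, assuming evenly sampled gaze points and arbitrary thresholds (all names hypothetical, not drawn from either reference):

```python
import math

SAMPLE_DT = 1 / 60          # assumed gaze sampling interval (seconds)
SPEED_THRESHOLD = 30.0      # px/sec below which the gaze counts as "dwelling"
MIN_DWELL_S = 0.8           # minimum dwell duration to suggest an edit

def mean_gaze_speed(points, dt=SAMPLE_DT):
    """Average speed over consecutive (x, y) gaze samples taken every dt seconds."""
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    return sum(steps) / (dt * len(steps)) if steps else 0.0

def dwell_indicates_edit(points, dt=SAMPLE_DT):
    """Low sustained velocity over a word, i.e. the dwell-time signal in the rationale."""
    dwell_time = dt * (len(points) - 1)
    return dwell_time >= MIN_DWELL_S and mean_gaze_speed(points, dt) < SPEED_THRESHOLD

# Roughly one second of samples hovering near one word returns True.
print(dwell_indicates_edit([(400 + 0.2 * i, 300.0) for i in range(61)]))
```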
For claim 6, claim 4 is incorporated and the combination of Pu in view of Thorn discloses the non-transitory computer-readable storage medium, wherein determining whether the detected gaze of the user is directed at the text field displayed on a screen of the electronic device (Pu: [0187] — determining that a user should begin a dictation in a text field when a user gaze is detected at that text field of a display on a user interface) further comprises: determining a time that the detected gaze of the user is directed at the text field (Thorn: [0069] — activating text editing of speech-to-text generated text, through user gaze, based on detecting a dwell time of the user’s gaze on the activation area, to know if a dwell time exceeds a threshold); and determining that the detected gaze of the user is directed at the text field in accordance with a determination that the time exceeds a threshold (Thorn: [0069] — activating text editing of speech-to-text generated text, through user gaze, based on detecting a dwell time of the user’s gaze on the activation area, to know if a dwell time exceeds a threshold). For claim 7, claim 1 is incorporated and the combination of Pu in view of Thorn discloses the non-transitory computer-readable storage medium, wherein determining, using a machine learning model trained to determine a mode based on the detected gaze of the user, whether to enter the editing mode further comprises: determining a second location on the text field where the detected gaze of the user is directed (Pu: FIG. 9D Part 915; [0210] — directing the gaze at a location on the display for the purpose of editing); and determining not to enter the editing mode in accordance with a determination that the second location is at the end of text displayed in the text field (Pu: FIG. 9E Part 915; [0210] — the user’s gaze is directed to a location at the end of the text, and this is not an indication to enter an editing mode). For claim 8, claim 7 is incorporated and the combination of Pu in view of Thorn discloses the non-transitory computer-readable storage medium, wherein determining, using a machine learning model trained to determine a mode based on the detected gaze of the user, whether to enter the editing mode further comprises: determining to enter the editing mode in accordance with a determination that the second location is on a word of text displayed in the text field (Pu: FIG. 9D Part 915; [0210] — directing the gaze at a location on the display for the purpose of editing, the gaze being directed to the text displayed on the screen). For claim 11, claim 1 is incorporated and the combination of Pu in view of Thorn discloses the non-transitory computer-readable storage medium, the one or more programs further including instructions for: determining a third location to display the textual representation of the utterance on the screen of the electronic device (Pu: FIG. 6D Part 615, [0207] — ‘FIG. 6D illustrates an example user interface showing the new dictation. The user may say “I'll be there in thirty 615”’ (showing a new location 615 to display the textual representation)). 
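As a companion to the claim 7 and claim 8 mappings above, the following hypothetical sketch shows the location test being described: a gaze at the end of the text means no edit, while a gaze landing on a displayed word selects that word for editing. Geometry and names are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]          # x0, y0, x1, y1 in screen coordinates

def hit(box: Box, x: float, y: float) -> bool:
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

def editing_decision(gaze_x: float, gaze_y: float,
                     words: List[Tuple[str, Box]],
                     end_of_text: Box) -> Tuple[bool, Optional[str]]:
    """(enter_editing, word_to_edit) from the second gaze location."""
    if hit(end_of_text, gaze_x, gaze_y):
        return False, None                        # gaze at the end of text: keep dictating
    for text, box in words:
        if hit(box, gaze_x, gaze_y):
            return True, text                     # gaze on a displayed word: edit that word
    return False, None
```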
For claim 12, claim 11 is incorporated and the combination of Pu in view of Thorn discloses the non-transitory computer-readable storage medium, wherein the third location to display the textual representation of the utterance on the screen of the electronic device is determined based on the location of the user's gaze on the screen (Pu: [0187] — determining that a user should begin a dictation in a text field when a user gaze is detected at that text field of a display on a user interface (the text being placed in the gaze location which is the text field region)). For claim 13, claim 11 is incorporated and the combination of Pu in view of Thorn discloses the non-transitory computer-readable storage medium, wherein the third location to display the textual representation of the utterance on the screen of the electronic device is determined based on the end of text displayed on the screen of the electronic device (Pu: FIG. 6D Part 615 — the location of the displayed textual representation of the utterance shows up at the end of the previous text, after the text in Part 500). For claim 14, claim 1 is incorporated and the combination of Pu in view of Thorn discloses the non-transitory computer-readable storage medium, the one or more programs further including instructions for: in accordance with a determination to enter the editing mode: determining a word displayed on the screen of the electronic device to edit (Pu: [0210], FIGs. 9C, 9D — 'In FIG. 9C, the circle 915 may indicate the user's gaze input, which is fixated at the block "in twenty" 910. This means the user wants to edit the block "in twenty" 910. FIG. 9D illustrates an example user interface showing an edit to the block.' (choosing to edit 'in twenty')); determining a change to be made to the word displayed on the screen of the electronic device (Pu: [0210], FIG. 9D — choosing the replacement as 'in thirty'); and editing the word by applying the change to the word (Pu: [0210], FIG. 9E — enacting the change to 'in thirty'). For claim 15, claim 14 is incorporated and the combination of Pu in view of Thorn discloses the non-transitory computer-readable storage medium, wherein determining the word displayed on the screen of the electronic device to edit is based on one or more of the detected gaze of the user, the distance between a location of the detected gaze of the user and the word, a dwell time of the detected gaze of the user, and the utterance (Pu: [0210], FIG. 9C Part 915 — "In FIG. 9C, the circle 915 may indicate the user's gaze input, which is fixated at the block "in twenty" 910" as an indication of determining the word to be edited). As for claim 19, electronic device of claim 19 and computer programme product claim 1 are related as device system for performing the available programme instructions. Pu in [0232] provides a processor and in [0239] provides storage memory, suitable to read upon the limitations of this claim. Accordingly, claim 19 is similarly rejected under the same rationale as applied above with respect to computer programme product claim 1. As for claim 20, method claim 20 and computer-readable medium claim 1 are related as method detailing procedures taken to implement the available computer-programmable instructions. Pu in FIG. 16 provides a method, an electronic device in [0085], a processor in [0232] and a storage medium in [0239], suitable to address the limitations of this claim.
Accordingly, claim 20 is similarly rejected under the same rationale as applied above with respect to computer-readable medium claim 1. For claim 21, claim 1 is incorporated and the combination of Pu in view of Thorn discloses the non-transitory computer-readable storage medium wherein the machine learning model is trained to determine a mode based on a plurality of factors, the plurality of factors including at least one of a direction of the detected gaze of the user (Pu: [0203] — determines that a user's eyes are moving in a direction), a proximity of the detected gaze of the user to a word displayed on the screen of the electronic device, and a dwell time of the detected gaze of the user (Thorn: [0069] — determining user gaze dwell time). Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Pu (US 2022/0284904 A1) in view of Thorn (US 2015/0364140 A1) as applied to claim 1, further in view of Schwartz (US 2020/0168038 A1). For claim 9, claim 1 is incorporated and the combination of Pu in view of Thorn discloses the non-transitory computer-readable storage medium, wherein determining, using a machine learning model trained to determine a mode based on the detected gaze of the user, whether to enter the editing mode further comprises: determining to enter the editing mode [[in accordance with a determination that a gaze spread of the detected gaze is below a spread threshold]] (Pu: [0210] — 'FIG. 9C illustrates an example user interface showing a gaze input. In FIG. 9C, the circle 915 may indicate the user's gaze input, which is fixated at the block "in twenty" 910. This means the user wants to edit the block "in twenty" 910.' (teaching of a user using a gaze to have the system fixate on a certain area, the fixation on the words leading to a determination to perform editing)). The combination of Pu in view of Thorn provides teaching for activating an edit by determining where a user's gaze is fixated upon, but differs from the claimed invention in that the claimed invention further provides teaching for a determination that a gaze spread is below a threshold. This isn't new to the art as the reference of Schwartz is now introduced to teach this as: determining to enter the editing mode in accordance with a determination that a gaze spread of the detected gaze is below a spread threshold (Schwartz: Claim 7 — determining a high engagement level when a gaze direction is determined to be within a pre-determined area of focus (a gaze direction being within an area of focus indicates that the gaze spread is within a threshold, and the high engagement level indicates a fixation on the gazed-upon area)). Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to improve upon the teaching of the combination of Pu in view of Thorn which teach of determining where a gaze is fixated upon to determine that an edit should be made, by applying the known technique of Schwartz which determines that a user's gaze is fixed on a location through determining that there is a high engagement level, through determining that the area of focus of the gaze is less than a threshold, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of being able to properly determine the user's focus location without confusing it with a larger area, so that the editing can be localised to only the words that need editing. See KSR Int'l Co. v. Teleflex Inc., 550 U.S.
398, 415-421, 82 USPQ2d 1385, 1395-97 (2007). Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Pu (US 2022/0284904 A1) in view of Thorn (US 2015/0364140 A1) as applied to claim 14, further in view of Goel et al. (US 2015/0242391 A1: hereafter — Goel). For claim 16, claim 14 is incorporated and the combination of Pu in view of Thorn discloses the non-transitory computer-readable storage medium, wherein determining the change to be made to the word displayed on the screen of the electronic device is based on the utterance [[and a context of the words displayed on the screen of the electronic device]] (Pu: [0210] — 'As illustrated in FIG. 9D, the user may have dictated the edit as "in thirty" 920 to replace "in twenty" 910' (user dictates the utterance that is to be used to make the edit)). The combination of Pu in view of Thorn however fails to teach the further limitation of this claim regarding the use of context, for which the reference of Goel is now introduced to teach as: the non-transitory computer-readable storage medium, wherein determining the change to be made to the word displayed on the screen of the electronic device is based on the utterance and a context of the words displayed on the screen of the electronic device (Goel: [0041] — providing replacements for one or more words by making use of context-appropriate words; FIG. 3 — making corrections to words displayed on a screen of an electronic device). Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to combine the known teaching of Goel which determines the change to be made to the word based on a context of displayed words, with the teaching of determining the change to be made based on the utterance as taught by the combination of Pu in view of Thorn, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of ensuring that the replacement word is contextually appropriate for the sentence it is being inserted into, leading to a grammatically correct sentence. See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007). Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Pu (US 2022/0284904 A1) in view of Thorn (US 2015/0364140 A1) as applied to claim 14, further in view of Bojja et al. (US 2017/0185581 A1: hereafter — Bojja). For claim 17, claim 14 is incorporated but the combination of Pu in view of Thorn fails to disclose the teaching of this claim, for which the reference of Bojja is now introduced to teach as the non-transitory computer-readable storage medium, wherein determining the word displayed on the screen of the electronic device to edit is based on a linguistic property of the word (Bojja: [0036] — grammar error correction method which parses an input sentence to determine parts of speech of individual words (the part of speech being a linguistic property)). The combination of Pu in view of Thorn provides teaching for determining a change is to be made to a word displayed on a screen, but differs from the claimed invention in that the claimed invention further provides teaching for determining the word based on a linguistic property of the word. This isn't new to the art as the reference of Bojja is seen to teach above, the linguistic property being a part-of-speech.
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to improve upon the teaching of the combination of Pu in view of Thorn which teaches of determining that a change is to be made to a word displayed on a screen, by applying the known technique of Bojja which determines the part-of-speech of each word in a sentence for the purpose of performing error correction, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of ensuring that the arrangement of the words in the sentence are linguistically correct, so the discovery of a linguistic misalignment would signal an error in the sentence. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007). Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Pu (US 2022/0284904 A1) in view of Thorn (US 2015/0364140 A1) as applied to claim 1, further in view of POWDERLY et al. (US 2018/0307303 A1: hereafter — Powderly). For claim 18, claim 1 is incorporated but the combination of Pu in view of Thorn fails to disclose the limitations of this claim, for which the reference of Powderly is now introduced to teach as the non-transitory computer-readable storage medium, the one or more programs further including instructions for: in accordance with a determination that a first distance between a location of the detected gaze of the user and a first word is equal to a second distance between a location of the detected gaze of the user and a second word, determining to edit both the first word and the second word (Powderly: [0308] — a user may select a phrase using gaze by focusing on first word and a second word, so that both words (and those in-between) may be selected for editing (the centre of the user’s gaze would be a point equidistant to both the first and second words)). The combination of Pu in view of Thorn provides teaching for obtaining a transcription of words of an utterance of a user received based on a user’s gaze, but differs from the claimed invention in that the claimed invention further provides teaching for determining to edit two words for which the distance between the user’s gaze and each of the words is equal. This isn’t new to the art as is seen to be taught by the reference of Powderly above. Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to improve upon the teaching of the combination of Pu in view of Thorn which applies gaze to begin a dictation and provide a transcription of the dictation from a user, by applying the known technique of Powderly which is able to make a selection of two words for an edit based on the user’s gaze having gone through both words, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of ensuring the editing of multiple words with a single action. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007). Conclusion The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure. LEWIS (US 2021/0074277 A1) provides teaching for capturing a user’s gaze and dwell time to be able to determine to edit a word, while detecting saccades focused on particular words ([0034], [0053], [0055]). RUDCHENKO et al. 
(US 2019/0034038 A1) provides teaching for eye-gaze editing being used to determine that a user intends to begin a word or end a word by detecting that the eye-gaze location is inside or outside the virtual keyboard region, respectively, this being performed through the use of a machine learning algorithm. [0034]. Any inquiry concerning this communication or earlier communications from the Examiner should be directed to OLUWADAMILOLA M. OGUNBIYI whose telephone number is (571)272-4708. The Examiner can normally be reached Monday – Thursday (8:00 AM – 5:30 PM Eastern Standard Time). Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, PARAS D SHAH can be reached at (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /OLUWADAMILOLA M OGUNBIYI/Examiner, Art Unit 2653 /Paras D Shah/Supervisory Patent Examiner, Art Unit 2653 03/21/2026

Prosecution Timeline

Feb 15, 2024
Application Filed
Sep 04, 2024
Response after Non-Final Action
Apr 08, 2025
Response after Non-Final Action
Sep 06, 2025
Non-Final Rejection — §101, §103
Sep 24, 2025
Examiner Interview Summary
Sep 24, 2025
Applicant Interview (Telephonic)
Sep 29, 2025
Response Filed
Dec 17, 2025
Final Rejection — §101, §103
Feb 23, 2026
Examiner Interview Summary
Feb 23, 2026
Applicant Interview (Telephonic)
Feb 27, 2026
Request for Continued Examination
Mar 02, 2026
Response after Non-Final Action
Mar 21, 2026
Non-Final Rejection — §101, §103 (current)
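As a rough cross-check of the docket above against the projections below, using simple calendar arithmetic (not the tool's methodology):

```python
from datetime import date

filed = date(2024, 2, 15)        # Application Filed
current_oa = date(2026, 3, 21)   # current Non-Final Rejection

months_elapsed = (current_oa.year - filed.year) * 12 + current_oa.month - filed.month
print(f"~{months_elapsed} months and 3 OA rounds so far")   # ~25 months
# Against the ~36-month ("2y 12m") median time to grant projected below, that
# leaves on the order of 11 months for the remaining projected round(s).
```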

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579979
NAMING DEVICES VIA VOICE COMMANDS
2y 5m to grant · Granted Mar 17, 2026
Patent 12537007
METHOD FOR DETECTING AIRCRAFT AIR CONFLICT BASED ON SEMANTIC PARSING OF CONTROL SPEECH
2y 5m to grant · Granted Jan 27, 2026
Patent 12508086
SYSTEM AND METHOD FOR VOICE-CONTROL OF OPERATING ROOM EQUIPMENT
2y 5m to grant · Granted Dec 30, 2025
Patent 12499885
VOICE-BASED PARAMETER ASSIGNMENT FOR VOICE-CAPTURING DEVICES
2y 5m to grant · Granted Dec 16, 2025
Patent 12469510
TRANSFORMING SPEECH SIGNALS TO ATTENUATE SPEECH OF COMPETING INDIVIDUALS AND OTHER NOISE
2y 5m to grant · Granted Nov 11, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 78%
With Interview: 96% (+18.6%)
Median Time to Grant: 2y 12m
PTA Risk: High
Based on 304 resolved cases by this examiner. Grant probability derived from career allow rate.
