Prosecution Insights
Last updated: April 19, 2026
Application No. 18/483,678

ELECTRONIC DEVICE WITH VOICE CONTROL DIRECTED BASED UPON CONTEXT DATA

Non-Final OA — §102, §103
Filed: Oct 10, 2023
Examiner: WOZNIAK, JAMES S
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: LENOVO (SINGAPORE) PTE. LTD.
OA Round: 3 (Non-Final)
Grant Probability: 59% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 3y 7m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 59% (227 granted / 385 resolved; -3.0% vs TC avg)
Interview Lift: +40.1% (strong lift, based on resolved cases with interview)
Avg Prosecution: 3y 7m (typical timeline)
Total Applications: 427 across all art units (42 currently pending)
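
The headline figures above follow directly from the raw counts. A minimal sketch of the arithmetic, assuming the allow rate is simply grants divided by resolved cases and the "vs TC avg" figure is a straight percentage-point difference (the tool's exact methodology is not stated):

```python
# Hedged sketch: how the Examiner Intelligence figures could be derived
# from the raw counts shown above. Simple ratios and percentage-point
# deltas are assumed; the analytics tool may compute these differently.

granted = 227          # career grants among resolved cases
resolved = 385         # total resolved cases
tc_avg_delta = -3.0    # stated delta vs Tech Center average, in points

allow_rate = granted / resolved * 100        # 227 / 385 -> ~59.0%
implied_tc_average = allow_rate - tc_avg_delta   # ~62.0%

print(f"Career allow rate: {allow_rate:.1f}%")
print(f"Implied TC average allow rate: {implied_tc_average:.1f}%")
```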

Statute-Specific Performance

§101: 18.1% (-21.9% vs TC avg)
§103: 40.1% (+0.1% vs TC avg)
§102: 18.4% (-21.6% vs TC avg)
§112: 16.1% (-23.9% vs TC avg)
Tech Center average estimate shown for comparison • Based on career data from 385 resolved cases
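
Each "vs TC avg" delta is consistent with a Tech Center average estimate of roughly 40% per statute. A small sketch under that assumption (the per-statute TC baselines are inferred from the deltas, not given directly):

```python
# Hedged sketch: back out the implied Tech Center baseline per statute
# from the examiner's rate and the stated delta. Baselines are inferred.

rates = {
    "§101": (18.1, -21.9),
    "§103": (40.1, 0.1),
    "§102": (18.4, -21.6),
    "§112": (16.1, -23.9),
}

for statute, (examiner_rate, delta_vs_tc) in rates.items():
    implied_tc_avg = examiner_rate - delta_vs_tc   # each works out to ~40.0%
    print(f"{statute}: examiner {examiner_rate}% vs implied TC avg {implied_tc_avg:.1f}%")
```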

Office Action

§102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

In response to the Advisory Office Action mailed on 12/12/2025, Applicant has filed a Request for Continued Examination (RCE) on 12/12/2025. In this reply, Applicant resubmits the amendments previously submitted after Final, which have now been entered via the filing of the RCE, along with the associated arguments filed on 12/4/2025 (see the marked box on EFS form PTO/SB/30EFS). The resubmitted arguments regard the failure of the prior art of record to teach "determine the subject matter contained on a display and actuate a control point in real time at a first location on the display based on the context data, subject matter contained on the display, and the voice command" (Remarks, Pages 7-8). These arguments have been fully considered; however, due to the position of record explained in the Advisory Action and maintained herein, they are not found to be persuasive.

Response to Arguments

With respect to independent Claim 1, Applicant argues that Gatzke, et al. (U.S. PG Publication: 2024/0211204 A1) fails to teach "determine the subject matter contained on a display and actuate a control point in real time at a first location on the display based on the context data, subject matter contained on the display, and the voice command" because Gatzke discloses a 3D video headset with a gaze-and-select feature that does not make determinations related to subject matter on a display itself nor actuate control points based on the subject matter (Remarks, Page 7). In response to these arguments, it is noted that Gatzke discloses the limitation entered upon the filing of the RCE in the combination of the following citations: Paragraph 0028, which describes UI objects that are "interactive"; Paragraph 0029, which describes various UI objects (i.e., displayed subject matter) and what happens when such objects are actuated with a voice command; and Paragraph 0030, which links all of these concepts together (UI object (i.e., displayed subject matter), voice command, gaze-based context information) in the description "While the participant 102-1 is looking/gazing at a particular voice enabled UI object, say, for example, gazing/looking at the "Share Content" voice-enabled UI object 220-2 as shown at 222 in FIG. 2B, and it is determined that the participant 102-1 is not currently an active speaker for the video conference, a media stream generated for the participant 102-1 (e.g., media stream 122-1) can be examined/analyzed to determine whether the participant has spoken a voice command associated with the particular voice-enabled UI object at which the participant is looking/gazing." Upon such a determination described in Paragraph 0030, the control point or UI object is actuated (see "cause the video conference application 120 to perform one or more actions" in Paragraph 0028; see also Paragraph 0022 describing various UI objects and associated actions carried out by performing the visual context-based voice commands). Accordingly, based upon at least these citations that describe displayed UI elements being actuated based upon voice commands and gaze context, Applicant's arguments directed towards claim 1 are not found to be persuasive.
The prior art rejections of the remaining independent and dependent claims have been traversed for reasons similar to Claim 1 (Remarks, Page 8). In regard to such arguments, see the response directed towards independent claim 1.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-8, 17-26, and 28 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Gatzke, et al. (U.S. PG Publication: 2024/0211204 A1).

With respect to Claim 1, Gatzke discloses: A system for operating a program on a primary electronic device comprising: a primary electronic device including a microphone and a display, and having a memory to store executable instructions and one or more processors, when implementing the executable instructions (processor(s) and memories storing instructions, Paragraphs 0015, 0017, and 0051; microphone and video displays, Paragraphs 0014-0015), to: obtain, with a sensor (see camera, Paragraphs 0021 and 0030), context data related to a user of the primary electronic device (eye gaze tracking that provides a context for a voice command interface operating on an electronic device, Paragraphs 0018, 0020-0021, and 0030); determine in real time, a first location on the display of the primary electronic device based on the context data (objects on a display are identified based upon gaze tracking (e.g., "UI buttons and/or menu commands" and "to determine when the participant is looking/gazing at a particular voice-enabled UI object"), Paragraphs 0021, 0030, and 0034; the operations for this step and others mentioning "real-time" in Gatzke are performed in the context of streaming conferencing communication (Paragraph 0013) and utilize high speed networks and computer equipment (Paragraphs 0055-0056), thus it is implied that such operations utilized in conference room-style communications are performed in real time); determine subject matter contained on the display (determination of displayed subject matter such as displayed interactive "UI objects", Paragraphs 0008, 0018, and 0028; Fig.
2A, Elements associated with 220); determine in real time, using the microphone of the primary electronic device, a voice command related to the first location on the display (Paragraph 0020- "utter or speak a voice command associated with the given voice-enabled UI object in order to cause the video conference application 120 to perform one or more actions within the video conference;" Paragraph 0030- "media stream generated for the participant 102-1 (e.g., media stream 122-1) can be examined/analyzed to determine whether the participant has spoken a voice command associated with the particular voice-enabled UI object at which the participant is looking/gazing;" See also Paragraphs 0032, 0036, and 0064; device microphone as an audio input, Paragraph 0015); and actuate a control point in real time at the first location on the display based on the context data, the subject matter contained on the display, and the voice command (Paragraph 0022- "the voice command is associated with or otherwise linked to a particular voice-enabled UI object such that speaking or uttering the voice command by the participant 102-1 may cause performance of one or more actions, operations, etc. within the video conference for the participant (e.g., mute, hang-up, share content, etc.);" Paragraph 0028- "interactive environment 200 may include one or more voice-enabled UI objects 220, each of which can be associated with a particular voice command ( or a particular set of voice commands) that can be uttered or spoken by participant 102-1 in order to cause the video conference application 120 to perform one or more actions within the video conference;” Paragraph 0032- “for the "Share Content" voice-enabled UI object 220-2 at which it is determined that the participant 102-1 is gazing/looking at (as generally shown at 222), if it is further determined that the participant is not an active speaker for the video conference and has spoken the "Share Content" voice command.” See also Paragraphs 0030 and 0035). 
With respect to Claim 2, Gatzke further discloses: The system of claim 1, wherein to determine the voice command the one or more processors are further configured to: detect a sound in an environment of the user (Paragraph 0031- “monitoring audio obtained for the participant 102-1 via audio I/O device(s)”); determine the user created the sound (Paragraph 0031- “determine if a volume of audio for the participant 102-1 satisfies (e.g., is greater than or is greater than or equal to) a particular "active speaker" threshold”); identify at least one word from the sound (Paragraph 0032- “determined that the participant is not an active speaker for the video conference and has spoken the "Share Content" voice command;” note that the particular word(s) of the spoken command are determined; see also Paragraph 0029 for other command words that may be identified); convert the at least one word into voice to text data (Paragraph 0032- “determined that the participant is not an active speaker for the video conference and has spoken the "Share Content" voice command;” note that the particular word(s) of the spoken command are determined; see also Paragraph 0029 for other command words that may be identified; accordingly, the system taking in a sound command and converting the command into a particular command word in a system vocabulary is a transcription operation); and compare the at least one word to a list of words associated with the first location to determine the voice command (different recognized voice commands are matched with different system vocabulary commands (e.g., for muting, hanging up a call, sharing content, etc.) to have those specific commands carried out, Paragraphs 0029 and 0032; for example, a user speaking a “share content” command that is recognized is implied to be matched to the share content command in the vocabulary so that this particular function is performed instead of muting or hanging up a call).

With respect to Claim 3, Gatzke further discloses: The system of claim 1, wherein the one or more processors are further configured to determine a second location on the display in response to actuating the control point at the first location (additional display locations can be identified in response to execution/actuation of the control point, such as a display of shared content in response to a “share content” command or in response to a new user gaze location in a UI after a command was issued, Paragraphs 0020, 0029 and 0034).

With respect to Claim 4, Gatzke further discloses: The system of claim 1, wherein the one or more processors are further configured to: determine whether data is stored within the memory that defines the control point; and dynamically adjust the control point, in real time, in the memory in response to determining the control point is in the memory ("gaze tracking logic" that keeps track of "eye tracking/gaze details" to determine whether a user is looking towards "one or more voice-enabled user interface objects," Paragraphs 0015, 0021, 0025-0026, and 0045; data and information is tracked/stored in memory, Paragraph 0054; tracking information is ongoing and dynamic and can identify status when an object is being looked at and when gaze ceases, Paragraph 0034).
With respect to Claim 5, Gatzke further discloses: The system of claim 1, wherein the primary electronic device includes at least one sensor in communication with the one or more processors, and the one or more processors obtain the context data from the at least one sensor (Paragraph 0030- “eye tracking/gaze details/information generated via video camera(s) 112;” see also Paragraph 0021).

With respect to Claim 6, Gatzke further discloses: The system of claim 1, wherein the one or more processors utilize computer vision to determine the first location on the display (computer-based eye/gaze tracking, Paragraphs 0015, 0018, and 0021).

With respect to Claim 7, Gatzke further discloses: The system of claim 1, wherein the one or more processors are further configured to obtain context data from a communication from a secondary electronic device (secondary electronic device such as a video headset obtains context data in the form of gaze, Paragraphs 0012, 0017, and 0021; Fig. 1, Elements 110-1 and 130-1 showing a multi-device environment; see also the use of an audio I/O device for obtaining audio-based context information to determine an active speaker, Paragraph 0031).

With respect to Claim 8, Gatzke further discloses: The system of claim 1, the one or more processors are further configured to determine the control point is within the first location before actuating the control point (detection of a user looking at a particular location/UI object within a display prior to voice command actuation/operation (e.g., muting a call or sharing content), Paragraphs 0020, 0026, and 0030).

Claim 17 relates to the method practiced by the system of claim 1 embodied as a non-transitory computer readable storage medium comprising computer executable code, and thus contains similar subject matter. Accordingly, claim 17 is rejected for reasons similar to claim 1. Furthermore, Gatzke discloses the method embodiment as a non-transitory computer readable storage medium comprising computer executable code (Paragraphs 0062-0063). Claims 18-20 contain subject matter respectively similar to claims 2-4, and thus are rejected under similar rationale.

With respect to Claim 21, Gatzke discloses: A system for operating a program on a primary electronic device comprising: a primary electronic device having a memory to store executable instructions and one or more processors, when implementing the executable instructions (processor(s) and memories storing instructions, Paragraphs 0015, 0017, and 0051; multi-device operating environment, Fig. 1, e.g., Elements 110-1 and 130-1), to: obtain, with a sensor (see camera, Paragraphs 0021 and 0030), context data related to a user of the primary electronic device (eye gaze tracking that provides a context for a voice command interface operating on an electronic device, Paragraphs 0018, 0020-0021, and 0030); determine subject matter contained on the display (determination of displayed subject matter such as displayed interactive "UI objects", Paragraphs 0008, 0018, and 0028; Fig.
2A, Elements associated with 220); determine whether the program is muting the user based on the context data (context data in the form of gaze used to determine that a user is muted, Paragraphs 0026, 0029, and 0034); determine, based on the context data and the subject matter contained on the display in real time, whether the user intends to communicate using sound via the program in response to determining the program is muting the user (again relying on gaze information of UI objects (“no longer looking at”) to determine that a user intends to speak, Paragraphs 0030, 0034, 0049, and 0065; the operations for this step and others mentioning "real-time" in Gatzke are performed in the context of streaming conferencing communication (Paragraph 0013) and utilize high speed networks and computer equipment (Paragraphs 0055-0056), thus it is implied that such operations utilized in conference room-style communications are performed in real time); and automatically actuate the program to unmute a microphone in real time to allow the sound of the user to be communicated to the program when determining that user intends to communicate using the sound (unmuting operation is automatically performed based on the context information in the form of gaze tracking, Paragraphs 0034, 0049, and 0065; audio input is received via a microphone that is muted if mute is active, Paragraph 0015).

With respect to Claim 22, Gatzke discloses: The system of claim 21, wherein the program is a conference calling application (note that claim 22 further limits an intended use recitation (i.e., “for operating a program”) that does not structurally limit the system of claim 21; however, it is nevertheless worth noting that Gatzke discloses a conference calling application in the form of a “video conference application” such as Webex, Paragraph 0011).

With respect to Claim 23, Gatzke further discloses: The system of claim 21, wherein to determine the user intends to communicate using sound the one or more processors are configured to: analyze the context data to determine the user is in front of the primary electronic device (the gaze tracking is capable of detecting a user looking directly at the primary device, wherein “directly at” corresponds to a user being in front of the device, Paragraph 0026); and determine, using Computer Vision (CV), a gaze of the user is looking at a working space (computer-based eye/gaze tracking, Paragraphs 0015, 0018, and 0021, pertaining to looking at a computer-based conference work space, Paragraph 0016 and Fig. 3A, element 300).

With respect to Claim 24, Gatzke further discloses: The system of claim 21, further comprising: the sensor coupled to the one or more processors and configured to obtain the context data (Paragraph 0030- “eye tracking/gaze details/information generated via video camera(s) 112;” see also Paragraph 0021; see Fig. 5 showing a communication bus (508) coupling I/O sensors to the processor(s)).

With respect to Claim 25, Gatzke further discloses: The system of claim 24, wherein the sensor is at least one of a camera, microphone, infrared sensor, or temperature sensor (Paragraph 0030- “eye tracking/gaze details/information generated via video camera(s) 112”).
With respect to Claim 26, Gatzke further discloses: The system of claim 21, wherein the one or more processors are further configured to: determine a working space of the user (physical work environment recognition, Paragraph 0016); and determine whether the user is within the working space in front of the primary electronic device (the gaze tracking is capable of detecting a user looking directly at the primary device, wherein “directly at” corresponds to a user being in front of the device, Paragraphs 0015, 0018, 0021, and 0026).

With respect to Claim 28, Gatzke further discloses: The system of claim 21, wherein the one or more processors are further configured to: determine the sound of the user is a voice command (Paragraph 0032- “determined that the participant is not an active speaker for the video conference and has spoken the "Share Content" voice command;” note that the particular word(s) of the spoken command are determined; see also Paragraph 0029 for other command words that may be identified); and implement the voice command (Paragraph 0022- "the voice command is associated with or otherwise linked to a particular voice-enabled UI object such that speaking or uttering the voice command by the participant 102-1 may cause performance of one or more actions, operations, etc. within the video conference for the participant (e.g., mute, hang-up, share content, etc.)).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Gatzke, et al. in view of Terrano (U.S. PG Publication: 2020/0090401 A1). With respect to Claim 9, Gatzke teaches the gaze-directed voice command execution system as applied to claim 9. Although Gatzke teaches that gaze tracking may employ any “techniques/logic/algorithms now known in the art” (Paragraph 0021), Gatzke does not specifically teach utilizing an artificial intelligence application to analyze the context data. Terrano, however, discloses using artificial intelligence (AI) software (see “machine learning”) to analyze context data in the form of gaze tracking information (Paragraphs 0021 and 0038). Gatzke and Terrano are analogous art because they are from a similar field of endeavor in voice command processing using gaze detection. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to use the gaze tracking using artificial intelligence as taught by Terrano as one of the techniques used to analyze gaze taught by Gatzke to provide a predictable result of using an algorithm that can learn how to effectively track a gaze direction of a user.

Claim 27 is rejected under 35 U.S.C. 103 as being unpatentable over Gatzke, et al. in view of Jorasch, et al. (U.S. PG Publication: 2021/0399911 A1).
With respect to Claim 27, Gatzke teaches the system for video conference management using context in the form of gaze tracking and muting functionality as applied to Claim 26. Although Gatzke teaches the removal of “another individual” (e.g., family) or animal (e.g., “dog”) from conference audio (Paragraph 0023), Gatzke does not describe their identification and removal from image data. Jorasch, however, discloses the identification of other objects in a video conference, such as other people or children, and their removal from the scene (Paragraphs 2514 and 2834). Gatzke and Jorasch are analogous art because they are from a similar field of endeavor in video conferencing interfaces. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to use the image filtering taught by Jorasch in the video conference application taught by Gatzke in order to provide a predictable result of removing distractions in a video conference.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Summa, et al. (U.S. PG Publication: 2020/0227034 A1; originally cited in the 7/8/2025 PTO-892) teaches "tracking a gaze of a human user that is viewing the displayed graphics scene; detecting a location of a tracked gaze of the human user relative to a location of a predetermined activation area while the graphics scene is displayed on the video display device;" and, based upon where a user is looking, mutes a voice chat to process a voice command.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES S WOZNIAK whose telephone number is (571)272-7632. The examiner can normally be reached 7-3, off alternate Fridays. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant may use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

JAMES S. WOZNIAK
Primary Examiner
Art Unit 2655
/JAMES S WOZNIAK/Primary Examiner, Art Unit 2655

Prosecution Timeline

Oct 10, 2023
Application Filed
Jul 03, 2025
Non-Final Rejection — §102, §103
Oct 06, 2025
Response Filed
Oct 14, 2025
Final Rejection — §102, §103
Dec 04, 2025
Response after Non-Final Action
Dec 12, 2025
Request for Continued Examination
Jan 13, 2026
Response after Non-Final Action
Jan 30, 2026
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597422
SPEAKING PRACTICE SYSTEM WITH RELIABLE PRONUNCIATION EVALUATION
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12586569
Knowledge Distillation with Domain Mismatch For Speech Recognition
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12511476
CONCEPT-CONDITIONED AND PRETRAINED LANGUAGE MODELS BASED ON TIME SERIES TO FREE-FORM TEXT DESCRIPTION GENERATION
Granted Dec 30, 2025 (2y 5m to grant)
Patent 12512100
AUTOMATED SEGMENTATION AND TRANSCRIPTION OF UNLABELED AUDIO SPEECH CORPUS
Granted Dec 30, 2025 (2y 5m to grant)
Patent 12475882
METHOD AND SYSTEM FOR AUTOMATIC SPEECH RECOGNITION (ASR) USING MULTI-TASK LEARNED (MTL) EMBEDDINGS
Granted Nov 18, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 59%
With Interview (+40.1%): 99%
Median Time to Grant: 3y 7m
PTA Risk: High
Based on 385 resolved cases by this examiner. Grant probability derived from career allow rate.
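
The projection figures compose cleanly: the 99% with-interview number matches the 59% baseline plus the +40.1-point interview lift, capped at 100%. A minimal sketch, assuming this additive, capped composition (the tool's actual model is not disclosed):

```python
# Hedged sketch: one plausible way the "with interview" projection is
# composed from the baseline grant probability and the interview lift.
# An additive combination capped at 100% is assumed here.

base_grant_probability = 59.0   # % -- career allow rate for this examiner
interview_lift = 40.1           # percentage points -- stated interview lift

with_interview = min(base_grant_probability + interview_lift, 100.0)
print(f"Projected grant probability with interview: {with_interview:.0f}%")  # 99%
```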
