Prosecution Insights
Last updated: April 19, 2026
Application No. 18/579,756

Machine Learning for Automated Navigation of User Interfaces

Non-Final OA (§103, §112)
Filed: Jan 16, 2024
Examiner: SILVERMAN, SETH ADAM
Art Unit: 2172
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Google LLC
OA Round: 2 (Non-Final)
Grant Probability: 73% (Favorable)
Expected OA Rounds: 2-3
Time to Grant: 2y 4m
Grant Probability With Interview: 88%

Examiner Intelligence

Career Allow Rate: 73% (327 granted / 449 resolved), +17.8% vs TC avg (above average)
Interview Lift: +14.8% on resolved cases with an interview (moderate)
Typical Timeline: 2y 4m average prosecution
Currently Pending: 47 applications
Career History: 496 total applications across all art units
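A quick arithmetic check of the panel's headline numbers (values taken from the figures above; the dashboard rounds to whole percentages):

```python
granted, resolved = 327, 449        # examiner's career totals shown above
allow_rate = granted / resolved     # career allow rate (~0.728)
interview_lift = 0.148              # reported allow-rate lift with an interview

print(round(allow_rate * 100))                       # 73
print(round((allow_rate + interview_lift) * 100))    # 88
```

The displayed 73% and 88% figures are consistent with the underlying 327/449 counts and the +14.8% interview lift.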

Statute-Specific Performance

§101: 8.9% (-31.1% vs TC avg)
§103: 58.5% (+18.5% vs TC avg)
§102: 20.1% (-19.9% vs TC avg)
§112: 9.4% (-30.6% vs TC avg)

Tech Center averages are estimates. Based on career data from 449 resolved cases.

Office Action

Rejections: §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 1/16/2024, 4/19/2024, 6/26/2024, and 8/7/2025 were filed before the first office action. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Response to Arguments

Applicant's arguments, filed 1/14/2026, with respect to the rejection of claim 1 under 35 USC 102 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection under 35 USC 103 is made in view of the combination of Harries and Williams, wherein Williams has been added to cure the deficiencies of Harries. Accordingly, this action has been reset to a Non-Final Office action.

Response to Amendment

The previous 112(b) rejections to claim and 10 have been removed in view of the applicant's amendments.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 20 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 20 is rejected for indefinite language that neither elaborates on the computation/determination of the loss, nor on how the determination of this loss affects the navigation of the machine-learned interface navigation model. In line with the wording of present claim 20, the masking operation may not have any effect at all.

Claim Rejection Notes

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 4, 6, 9, 11-14, 16-19, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Harries et al. ("DRIFT: Deep Reinforcement Learning for Functional Software Testing", 33rd Deep Reinforcement Learning Workshop (NeurIPS 2019), 12/8/2019, pages 1-10, XP093006546, Vancouver, Canada, DOI: 10.48550/arxiv.2007.08220, URL: https://www.microsoft.com/en-us/research/uploads/prod/2020/02/DRIFT_26_CameraReadySubmission_NeurlPS_DRL.pdf), in view of Williams et al. (US 20070033172 A1, published 2/8/2007).

Claim 1.
(Currently Amended): Harries teaches a computing system configured to navigate user interfaces using machine learning, the computing system comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the computing system to perform operations (this computing system performs a method referred to as Deep Reinforcement Learning for Functional Software Testing (DRIFT), as can be seen from the title and the abstract; the method is described in more detail in section 3 on pages 2-6, and its computer implementation follows, e.g., from "PyTorch" in the second paragraph of section 4 on page 6 [Harries]), the operations comprising: obtaining, by the computing system, user interface data descriptive of a user interface that comprises a plurality of user interface elements ("UITree" in the first two paragraphs of section 3.1 and FIG. 1 [Harries]); generating, by the computing system and based on the user interface data, a plurality of element embeddings respectively for the plurality of user interface elements (the first two paragraphs of the subsection "Modeling the state using Graph Neural Networks" of section 3.2 on page 4, from which it follows that the element embeddings are represented as respective element embedding vectors [Harries]); processing, by the computing system, the plurality of element embeddings with a machine-learned interface navigation model to generate a selected action as an output of the machine-learned interface navigation model ("and then apply the graph neural network" in the first paragraph of the subsection "Modeling the state using Graph Neural Networks" of section 3.2 on page 4; also the third paragraph spanning pages 4 and 5 [Harries]), wherein the machine-learned interface navigation model selects the selected action from a predefined action space comprising a plurality of predefined candidate actions (paragraph spanning pages 5 and 6 [Harries]); and performing, by the computing system, the selected action on the user interface (paragraph spanning pages 5 and 6 [Harries]).

Harries does not teach wherein the machine-learned interface navigation model is further configured to receive data descriptive of a query as an input alongside the plurality of element embeddings, wherein the query indicates a desired result of machine interaction with the user interface. However, Williams teaches this limitation (by highlighting various commands and user interface elements as the user enters query text, and by providing additional distinctive highlighting in response to navigation within the menu containing search results, the present invention provides the user with a quick mechanism for learning how to find commands and other user interface elements [Williams, 0087]). Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine-learned element-embedding user interface invention of Harries to include the query features of Williams. One would have been motivated to make this modification to allow both novice and advanced users to navigate a user interface, and to provide a mechanism that avoids the need for users to memorize the location of commands in a user interface (Williams, 0006).

Claim 21, having similar elements to claim 1, is likewise rejected.

Claim 3.
(Currently Amended): The combination of Harries and Williams teaches the computing system of claim 12. Harries further teaches wherein the query does not reference any of the plurality of predefined candidate actions (the query input is a scalar reward, and this reward does not reference any of the plurality of predefined actions [Harries, section 2, par. 3, pg. 2]).

Claim 4 (Currently Amended): The combination of Harries and Williams teaches the computing system of claim 1. Harries further teaches wherein the query comprises a single instruction, and wherein the computing system is configured to perform a plurality of actions from the predefined action space in response to the single instruction (the agent has a policy π which selects an action given the state, π(a|s) = P[A = a | S = s]; the action is passed to the environment, and the state is updated using the transition function T [Harries, section 2, paragraph beginning with "Reinforcement Learning", pg. 2]).

Claim 6: The combination of Harries and Williams teaches the computing system of claim 1. Harries further teaches wherein the user interface data comprises structural metadata descriptive of a structure of the user interface ("UITree" [Harries, first two paragraphs of section 3.1, pg. 1, FIG. 1]).

Claim 9: The combination of Harries and Williams teaches the computing system of claim 1. Harries further teaches wherein processing, by the computing system, the plurality of element embeddings with the machine-learned interface navigation model to generate the selected action further comprises processing, by the computing system, the plurality of element embeddings with the machine-learned interface navigation model to further generate an element index and an argument as an output of the machine-learned interface navigation model, wherein the element index identifies one of the plurality of user interface elements as a target of the selected action ("node identified", "action type" [Harries, fourth paragraph of section 3.1, pg. 3]).

Claim 11: The combination of Harries and Williams teaches the computing system of claim 1. Harries further teaches wherein the machine-learned interface navigation model comprises: a first attention model configured to perform self-attention on the plurality of element embeddings to generate a plurality of first intermediate embeddings; a second attention model configured to perform attention between a query embedding and the plurality of first intermediate embeddings to generate one or more second intermediate embeddings ([Harries, paragraphs spanning pgs. 4-5, and paragraphs that follow]); and one or more prediction heads configured to process the one or more second intermediate embeddings to generate one or more predictions, the one or more prediction heads comprising at least an action prediction head configured to select the selected action from the plurality of predefined candidate actions (Examiner's Official Notice: in the field it is common practice to try out different structures for a neural network at hand, and neural networks, in particular so-called Transformers, whose structure is characterized by an attention mechanism, are frequently encountered).

Claim 12: The combination of Harries and Williams teaches the computing system of claim 1.
Harries further teaches wherein the machine-learned interface navigation model comprises a reinforcement learning agent ([Harries, last paragraph of section 1, pg. 2]).

Claim 13 (Currently Amended): Harries teaches a computer-implemented method to train a machine-learned interface navigation model to navigate interfaces (algorithm 1 on page 5; in the last paragraph of section 1 on page 2, the interface navigation model is referred to as an agent, and the method's computer implementation follows from "PyTorch" in the second paragraph of section 4 on page 6 [Harries]), the method comprising: obtaining, by a computing system comprising one or more computing devices, user interface data descriptive of a user interface that comprises a plurality of user interface elements ("UITree" in the first two paragraphs of section 3.1 and FIG. 1 [Harries]); generating, by the computing system and based on the user interface data, a plurality of element embeddings respectively for the plurality of user interface elements (the first two paragraphs of the subsection "Modeling the state using Graph Neural Networks" of section 3.2 on page 4, from which it follows that the element embeddings are represented as respective element embedding vectors [Harries]); processing, by the computing system, the plurality of element embeddings with the machine-learned interface navigation model to generate a selected action as an output of the machine-learned interface navigation model ("and then apply the graph neural network" in the first paragraph of the subsection "Modeling the state using Graph Neural Networks" of section 3.2 on page 4; also the third paragraph spanning pages 4 and 5 [Harries]), wherein the machine-learned interface navigation model selects the selected action from a predefined action space comprising a plurality of predefined candidate actions (paragraph spanning pages 5 and 6 [Harries]); determining, by the computing system, a reward based at least in part on the selected action ("get_reward" in algorithm 1 [Harries, pg. 5]); and modifying, by the computing system, one or more values of one or more parameters of the machine-learned interface navigation model based at least in part on the reward ("train(batch, …)" in algorithm 1 [Harries, pg. 5]).

Claim 14 (Currently Amended): The combination of Harries and Williams teaches the computer-implemented method of claim 13. Williams further teaches wherein the query corresponds to a user utterance (voice command [Williams, 0049]).

Claim 16: The combination of Harries and Williams teaches the computer-implemented method of claim 13. Harries further teaches wherein: processing, by the computing system, the plurality of element embeddings with the machine-learned interface navigation model to generate the selected action further comprises processing, by the computing system, the plurality of element embeddings with the machine-learned interface navigation model to further generate an element index and an argument as an output of the machine-learned interface navigation model, wherein the element index identifies one of the plurality of user interface elements as a target of the selected action; and performing, by the computing system, the selected action comprises performing the selected action on the identified user interface element in accordance with the argument ("node identified" and "action type" [Harries, section 3.1, fourth paragraph, pg. 3]).

Claim 17: The combination of Harries and Williams teaches the computer-implemented method of claim 13.
Harries further teaches wherein the user interface data comprises augmented user interface data generated by performance of one or more augmentation operations on existing user interface training data ([Harries, section 3.1, second paragraph, pg. 3]; Examiner's Note: GUIs obtained from other GUIs by performing one or more augmentation operations on the latter are frequently encountered).

Claim 18: The combination of Harries and Williams teaches the computer-implemented method of claim 17. Harries further teaches wherein the one or more augmentation operations comprise modifying texts or locations of one or more user interface elements that have been classified as irrelevant ([Harries, section 3.1, second paragraph, pg. 3]; Examiner's Note: GUIs obtained from other GUIs by performing one or more augmentation operations on the latter are frequently encountered).

Claim 19: The combination of Harries and Williams teaches the computer-implemented method of claim 13. Harries further teaches wherein determining, by the computing system, a reward based at least in part on the selected action comprises comparing, by the computing system, the selected action to a demonstration action that was included in a human demonstration (function call "episode_meets_objective(…)" in algorithm 1 [Harries, pg. 5]).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Harries et al. ("DRIFT: Deep Reinforcement Learning for Functional Software Testing", 33rd Deep Reinforcement Learning Workshop (NeurIPS 2019), 12/8/2019, pages 1-10, XP093006546, Vancouver, Canada, DOI: 10.48550/arxiv.2007.08220, URL: https://www.microsoft.com/en-us/research/uploads/prod/2020/02/DRIFT_26_CameraReadySubmission_NeurlPS_DRL.pdf) and Williams et al. (US 20070033172 A1, published 2/8/2007), and further in view of Chen et al. ("From UI Design Image to GUI Skeleton: A Neural Machine Translator to Bootstrap Mobile GUI Implementation", Proceedings of ICSE '18: 40th International Conference on Software Engineering, 27 May 2018, XP093006555, DOI: 10.1145/3180155.3180222, retrieved from the Internet: URL: https://ieeexplore.ieee.org/stampPDF/getPDF.jsp?tp=&arnumber=8453135&ref=aHROCHM6Ly9pZWVIeHBsb3JILmIIZWUub3JUnL2RvY3ViZW50LZgONTMxMzU=).

Claim 5: The combination of Harries and Williams teaches the computing system of claim 1. Harries further teaches wherein: the user interface data comprises imagery that depicts the user interface; and generating, by the computing system, the plurality of element embeddings comprises one or more of the following: performing optical character recognition on the imagery; processing the imagery with an icon recognition model; and processing the imagery with an image detection model (as part of a disadvantageous modification of the system [Harries, third paragraph of section 3.1, pg. 3]). Harries does not explicitly teach an icon recognition model. However, Chen teaches an icon recognition model ([Chen, FIG. 3]). Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine-learned element-embedding user interface invention of the combination of Harries and Williams to include the icon recognition model features of Chen. One would have been motivated to make this modification to further narrow the machine-learned elements of Harries with the narrower machine learning of icon recognition, which would improve said machine learning by expanding its abilities.

Claims 7, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Harries et al.
("DRIFT: Deep Reinforcement Learning for Functional Software Testing", 33rd Deep Reinforcement Learning Workshop (NeurIPS 2019), 12/8/2019, pages 1-10, XP093006546, Vancouver, Canada, DOI: 10.48550/arxiv.2007.08220, URL: https://www.microsoft.com/en-us/research/uploads/prod/2020/02/DRIFT_26_CameraReadySubmission_NeurlPS_DRL.pdf) and Williams et al. (US 20070033172 A1, published 2/8/2007), and further in view of Eskonen et al. ("Automating GUI Testing with Image-Based Deep Reinforcement Learning", 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS), 1 August 2020, pages 160-167, XP093006561, DOI: 10.1109/ACSOS49614.2020.00038, ISBN: 978-1-7281-7277-4, retrieved from the Internet: URL: https://ieeexplore.ieee.org/stampPDF/getPDF.jsp?tp=&arnumber=9196452&ref=).

Claim 7: The combination of Harries and Williams teaches the computing system of claim 1. Harries does not teach wherein the selected action comprises a macro action that comprises a sequence of two or more component actions. However, Eskonen teaches this limitation (first paragraph on page 163, left column, and second paragraph on page 165, left column [Eskonen]). Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the machine-learned element-embedding user interface invention of the combination of Harries and Williams to include the macro action features of Eskonen. One would have been motivated to make this modification to improve the machine learning process of the former.

Claim 8: The combination of Harries and Williams teaches the computing system of claim 7. Harries does not teach wherein the macro action comprises a focus and type action in which an argument is entered into a data entry field of the user interface. However, Eskonen teaches this limitation (first paragraph on page 163, left column, and second paragraph on page 165, left column [Eskonen]). Therefore, it would have been obvious, for the same reasons given for claim 7, to include the macro action features of Eskonen in the combination of Harries and Williams.

Claim 15: The combination of Harries and Williams teaches the computer-implemented method of claim 13. Harries does not teach wherein the selected action comprises a macro action that comprises a sequence of two or more component actions. However, Eskonen teaches this limitation (first paragraph on page 163, left column, and second paragraph on page 165, left column [Eskonen]). It would likewise have been obvious to include the macro action features of Eskonen for the reasons given for claim 7.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SETH A SILVERMAN, whose telephone number is (571) 272-9783. The examiner can normally be reached Mon-Thur, 8AM-4PM MST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Adam Queler, can be reached at (571) 272-4140. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Seth A Silverman/
Primary Examiner, Art Unit 2172
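For readers less familiar with the cited art, the training method recited in claim 13 (embed the UI elements, select an action from a predefined action space, determine a reward from the selected action, then modify the model's parameters based on that reward) follows a standard reinforcement learning pattern. The sketch below is a minimal REINFORCE-style toy, not the DRIFT implementation or the application's disclosed model; the action names, dimensions, pooling step, and reward rule are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

DIM, N_ELEMENTS = 8, 5                           # toy sizes, chosen arbitrarily
ACTIONS = ["click", "scroll", "focus", "type"]   # hypothetical predefined action space

def policy_probs(theta, elements):
    """pi(a|s): mean-pool the element embeddings, then score each candidate action."""
    pooled = elements.mean(axis=0)
    logits = theta @ pooled
    exp = np.exp(logits - logits.max())
    return exp / exp.sum(), pooled

theta = np.zeros((len(ACTIONS), DIM))            # parameters of the toy navigation model
elements = rng.normal(size=(N_ELEMENTS, DIM))    # one embedding per UI element
lr = 0.5

for _ in range(300):
    probs, pooled = policy_probs(theta, elements)
    a = rng.choice(len(ACTIONS), p=probs)        # select an action from the action space
    reward = 1.0 if ACTIONS[a] == "click" else 0.0  # toy reward: pretend "click" meets the objective
    # REINFORCE: grad of log pi(a|s) w.r.t. theta is (one_hot(a) - probs) outer pooled
    grad = -np.outer(probs, pooled)
    grad[a] += pooled
    theta += lr * reward * grad                  # modify parameters based on the reward

probs, _ = policy_probs(theta, elements)
print(ACTIONS[int(np.argmax(probs))])            # the trained policy now favors "click"
```

The loop mirrors the "get_reward" / "train(batch, …)" steps the examiner cites from Harries' algorithm 1 only in outline; the real system uses graph neural networks over a UITree rather than mean pooling.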

Prosecution Timeline

Jan 16, 2024 - Application Filed
Oct 09, 2025 - Non-Final Rejection (§103, §112)
Jan 08, 2026 - Examiner Interview Summary
Jan 08, 2026 - Applicant Interview (Telephonic)
Jan 14, 2026 - Response Filed
Feb 26, 2026 - Non-Final Rejection (§103, §112) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12587581 - SYSTEMS, METHODS, AND MEDIA FOR CAUSING AN ACTION TO BE PERFORMED ON A USER DEVICE (granted Mar 24, 2026; 2y 5m to grant)
Patent 12579201 - INFORMATION PROCESSING SYSTEM (granted Mar 17, 2026; 2y 5m to grant)
Patent 12578200 - NAVIGATIONAL USER INTERFACES (granted Mar 17, 2026; 2y 5m to grant)
Patent 12572269 - PERFORMING A CONTROL OPERATION BASED ON MULTIPLE TOUCH POINTS (granted Mar 10, 2026; 2y 5m to grant)
Patent 12572261 - SPATIAL NAVIGATION AND CREATION INTERFACE (granted Mar 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 73%
With Interview: 88% (+14.8%)
Median Time to Grant: 2y 4m
PTA Risk: Moderate

Based on 449 resolved cases by this examiner. Grant probability derived from the career allow rate.
