Last updated: May 29, 2026

Application No. 18/634,414

GENERATIVE ARTIFICIAL INTELLIGENCE INTEGRATED WITH USER INTERFACE ELEMENT DETECTION AND AUTOMATION INCLUDING CONTEXTUAL AWARENESS

Non-Final OA §102 §103

Filed

Apr 12, 2024

Examiner

NAZAR, AHAMED I

Art Unit

2178

Tech Center

2100 — Computer Architecture & Software

Assignee

UIPATH, INC.

OA Round

1 (Non-Final)

This examiner grants 53% of resolved cases, with a +32.8% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 383 resolved cases, 2023–2026

Examiner Intelligence

NAZAR, AHAMED I

Grants 53% of resolved cases

Career Allowance Rate

204 granted / 383 resolved

-1.7% vs TC avg

Strong +33% interview lift

Interview lift: +32.8% (resolved cases with an interview compared with those without)

Typical timeline

4y 1m

Avg Prosecution

12 currently pending

Career history

409

Total Applications

across all art units

Statute-Specific Performance

§101: 0.7% (-39.3% vs TC avg)

§103: 87.0% (+47.0% vs TC avg)

§102: 10.0% (-30.0% vs TC avg)

§112: 1.0% (-39.0% vs TC avg)

Based on career data from 383 resolved cases

Office Action

§102 §103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This communication is responsive to the application filed 4/12/2024.
Claims 1-20 are pending with claims 1 and 11 as independent claims. 

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 4/12/2024 and 9/8/2025 were filed on and after the mailing date of the application on 4/12/2024.  The submissions are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
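A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.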

Claims 1-5, 7-15, and 17-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Singh et al. (US 2021/0107164, published 4/15/2021, hereinafter as Singh).

Claim 1. A method executed by an interface engine implemented as a computer program within a computing environment, the interface engine executing detection and automation of computer activity, the method comprising:
recording the computer activity across one or more user interfaces (UIs); Singh teaches in [0024 and 0064] “computing system applications (e.g., desktop and laptop applications, mobile device applications, wearable computer applications, etc.)… In order to extract data pertaining to actions taken by users on computing systems 602, 604, 606, listeners 610 may record where a user clicked on the screen and in what application, keystrokes, which button was clicked, instances of the user switching between applications, focus changes, that an email was sent and what the email pertains to, etc. Such data can be used to generate a high-fidelity log of the user's interactions with computing systems 602, 604, 606.” (emphasis added) examiner note: screen (user interface) activities across computing systems 602, 604, and 606 may be recorded,
automatically processing the computer activity utilizing at least one generative AI model to extract patterns comprising portions of the computer activity that are similar or resilient to changes; Singh teaches in [0017, 0065-0066, and 0070-0075] “The data collected by the listeners may then be sent to one or more servers and be stored in a database. This data may be analyzed by AI layers to recognize patterns of user behavioral processes therein. These recognized processes may then be distilled into respective RPA workflows and deployed to automate the processes… data may be generated until a desired per-user or total volume of data and/or a maximum recording time (per user or total) is reached… Patterns may be determined individually by an AI layer or collectively by multiple AI layers… Once many actions are obtained from multiple users, the identified actions (e.g., clicked buttons, applications that were used, text that was entered, etc.) may then be fed to AI layers 632 to extract processes therefrom. Alternatively, extracted actions from a single user's screenshots could be used to automatically generate a workflow, which the SME may then edit to ensure it is correct in some embodiments.” (emphasis added) examiner note: action data recorded by listeners may be analyzed to determine patterns of activities for a time period (duration), and
determining a plurality of existing automations according to the patterns. Singh teaches in [0060-0061] “If a similar process already exists, server 630 may identify this similarity and know that the identified process should replace an existing process for a similar automation that works less optimally. For example, similarities between processes may be determined by a common beginning and end and some amount of statistical commonality in the steps taking in between… Server 630 may then automatically generate a workflow including the identified process, generate a robot implementing the workflow (or a replacement robot), and push the generated robot out to user computing systems 602, 604, 606 to be executed thereon.” (emphasis added).
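The similarity test quoted from Singh (a common beginning and end plus some statistical commonality in the steps in between) can be illustrated with a minimal sketch. Jaccard overlap, the 0.5 threshold, and the step names below are assumptions chosen for illustration only; neither Singh nor the application is stated to use them.

```python
from typing import Sequence


def step_overlap(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard overlap of the interior steps (everything but first and last)."""
    inner_a, inner_b = set(a[1:-1]), set(b[1:-1])
    if not inner_a and not inner_b:
        return 1.0  # nothing in between on either side, so nothing differs
    return len(inner_a & inner_b) / len(inner_a | inner_b)


def is_similar(a: Sequence[str], b: Sequence[str], threshold: float = 0.5) -> bool:
    """True when both processes share start and end steps and enough interior steps."""
    if not a or not b:
        return False
    if a[0] != b[0] or a[-1] != b[-1]:
        return False  # no common beginning and end
    return step_overlap(a, b) >= threshold


# Hypothetical step logs, invented for this example.
existing = ["open_invoice", "copy_total", "paste_total", "save", "send_email"]
identified = ["open_invoice", "copy_total", "paste_total", "send_email"]
unrelated = ["open_browser", "search", "close_browser"]

print(is_similar(existing, identified))  # shared endpoints, 2 of 3 interior steps in common
print(is_similar(existing, unrelated))   # different first step
```

Any other measure of statistical commonality, such as edit distance over the step sequence, could be substituted for the overlap function without changing the overall shape of the check.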

Claims 2 and 12. The rejection of the method of claim 1 is incorporated, wherein the computer activity comprises text changes with a desktop or a home screen of a display. Singh teaches in [0041] “Workflows may include user-defined activities 320 and UI automation activities 330. Some embodiments are able to identify non-textual visual components in an image, which is called computer vision (CV) herein. Some CV activities pertaining to such components may include, but are not limited to, click, type, get text, hover, element exists, refresh scope, highlight, etc. Click in some embodiments identifies an element using CV, optical character recognition (OCR), fuzzy text matching, and multi-anchor, for example, and clicks it. Type may identify an element using the above and types in the element. Get text may identify the location of specific text and scan it using OCR. Hover may identify an element and hover over it.” (emphasis added).

Claims 3 and 13. The rejection of the method of claim 1 is incorporated, wherein the computer activity comprises one or more events comprising actions or occurrences generated or triggered within the computer environment and recognized by software of the computer environment. Examples of the actions or occurrences include, but are not limited to, application errors, application initiation, application closing, changing or switching between UIs, saving instances, selecting an icon, cutting, pasting, activities externally originating asynchronously from the computing environment. Singh teaches in [0041] “Workflows may include user-defined activities 320 and UI automation activities 330. Some embodiments are able to identify non-textual visual components in an image, which is called computer vision (CV) herein. Some CV activities pertaining to such components may include, but are not limited to, click, type, get text, hover, element exists, refresh scope, highlight, etc. Click in some embodiments identifies an element using CV, optical character recognition (OCR), fuzzy text matching, and multi-anchor, for example, and clicks it. Type may identify an element using the above and types in the element. Get text may identify the location of specific text and scan it using OCR. Hover may identify an element and hover over it.” (emphasis added).

Claims 4 and 14. The rejection of the method of claim 1 is incorporated, wherein the computer activity includes one or more screenshots of a display. Singh teaches in [0074-0075 and 0082] “The process begins with a listener capturing screenshots from a user's computing system while the user interacts with the computing system at 910. In some embodiments, the screenshots may be captured with a predetermined frequency, when a user takes a certain action, or a combination thereof.” (emphasis added).

Claims 5 and 15. The rejection of the method of claim 1 is incorporated, wherein the generative AI model comprises one or more of a generative pre-trained transformer (GPT), an AI agent, and a large language model (LLM) gateway. Singh teaches in [0022, 0026, 0059 and 0070] “Once a workflow is developed in designer 110, execution of business processes is orchestrated by conductor 120, which orchestrates one or more robots 130 that execute the workflows developed in designer 110… Robots 130 are execution agents that run workflows built in designer 110… AI layers 632 process the log data and identify one or more potential processes therein. AI layers 632 may perform statistical modeling (e.g., hidden Markov models (HMMs)) and utilize deep learning techniques (e.g., long short term memory (LSTM) deep learning, encoding of previous hidden states, etc.) and perform case identification to identify an atomic instance of a process… Each AI layer is an algorithm (or model) that runs on the log data, and the AI model itself may be deep learning neural networks (DLNNs) of trained artificial “neurons” that are trained in training data. Layers may be run in series or in parallel.” (emphasis added) examiner note: the generative AI model may be an execution AI agent.

Claims 7 and 17. The rejection of the method of claim 1 is incorporated, wherein the changes comprise changes in one or more of colors, themes, languages, resolution, screen size, and browser across the UIs. Singh teaches in [0030-0031] “Agents may be Windows® Presentation Foundation (WPF) applications that display the available jobs in the system tray window. Agents may be a client of the service. Agents may request to start or stop jobs and change settings. The command line is a client of the service. The command line is a console application that can request to start jobs and waits for their output… Special behaviors may be configured per component this way, such as setting up different firewall rules for the executor and the service. The executor may always be aware of DPI settings per monitor in some embodiments. As a result, workflows may be executed at any DPI, regardless of the configuration of the computing system on which they were created. Projects from designer 110 may also be independent of browser zoom level in some embodiments. For applications that are DPI-unaware or intentionally marked as unaware, DPI may be disabled in some embodiments.” (emphasis added) examiner note: changing the screen DPI setting may be changing the screen resolution.

Claims 8 and 18. The rejection of the method of claim 1 is incorporated, wherein the patterns identify one or more controls on individual screens that are similar or resilient to changes and groups a plurality of screens according to the one or more controls as the portions of the computer activity. Singh teaches in [0057-0059] “Listeners 610 may be robots generated via an RPA designer application, part of an operating system, a downloadable application for a personal computer (PC) or smart phone, or any other software and/or hardware without deviating from the scope of the invention… Listeners 610 generate logs of user interactions with the respective computing system 602, 604, 606 and send the log data via a network 620… to a server 630. The data that is logged may include, but is not limited to, which buttons were clicked, where a mouse was moved, the text that was entered in a field, that one window was minimized and another was opened, the application associated with a window, etc.…. the log data may be sent to server 630 once a predetermined amount of log data has been collected, after a predetermined time period has elapsed, or both… server 630 accesses log data collected from various users by listeners 610 from database 640 and runs the log data through multiple AI layers 632. AI layers 632 process the log data and identify one or more potential processes therein. AI layers 632 may perform statistical modeling (e.g., hidden Markov models (HMMs)) and utilize deep learning techniques (e.g., long short term memory (LSTM) deep learning, encoding of previous hidden states, etc.) and perform case identification to identify an atomic instance of a process. For invoice processing, for example, completion of one invoice may be a case.” (emphasis added) examiner note: text entered in a field may be a control that is resilient to change such that text field may be designed to accept changes by entering textual characters. 
Collecting and sending log data once a predetermined amount of log data has been collected may be grouping user interactions as portions of computer activity.  
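For orientation, one possible reading of the claim 8 language (grouping screens by controls that stay stable under appearance changes) is sketched below. This is purely illustrative: the control dictionaries, the `role`/`name`/`color` keys, and the screen identifiers are hypothetical, and the sketch is not taken from Singh or from the application.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Control = Dict[str, str]                 # hypothetical shape: role, name, color
Signature = FrozenSet[Tuple[str, str]]   # appearance-independent key for a screen


def signature(controls: List[Control]) -> Signature:
    """Keep only role and name, dropping appearance attributes such as color."""
    return frozenset((c["role"], c["name"]) for c in controls)


def group_screens(screens: Dict[str, List[Control]]) -> Dict[Signature, List[str]]:
    """Group screen ids whose controls share the same signature."""
    groups: Dict[Signature, List[str]] = defaultdict(list)
    for screen_id, controls in screens.items():
        groups[signature(controls)].append(screen_id)
    return dict(groups)


# Invented sample data: the same login screen in two themes, plus one other screen.
screens = {
    "login_light": [{"role": "textbox", "name": "user", "color": "white"},
                    {"role": "button", "name": "submit", "color": "blue"}],
    "login_dark": [{"role": "textbox", "name": "user", "color": "black"},
                   {"role": "button", "name": "submit", "color": "gray"}],
    "settings": [{"role": "checkbox", "name": "notifications", "color": "white"}],
}
groups = group_screens(screens)
for members in groups.values():
    print(members)
```

Under this reading the two login screens fall into one group despite the theme change, which differs from grouping log data by collection batch.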

Claims 9 and 19. The rejection of the method of claim 1 is incorporated, wherein the plurality of automations comprises new automations or robotic process automations (RPAs). Singh teaches in [0057-0061] “Each computing system 602, 604, 606 has a listener 610 installed thereon. Listeners 610 may be robots generated via an RPA designer application… Listeners 610 generate logs of user interactions with the respective computing system 602, 604, 606 and send the log data via a network 620… to a server 630… server 630 accesses log data collected from various users by listeners 610 from database 640 and runs the log data through multiple AI layers 632. AI layers 632 process the log data and identify one or more potential processes therein… identified processes may be listed for a user to peruse, and may be sorted by various factors including, but not limited to, an RPA score indicating how suitable a given process is to RPA (e.g., based on complexity of the automation, execution time, perceived benefit to key performance indicators such as revenue generated, revenue saved, time saved, etc.), process name, total recording time, the number of users who executed the process, process execution time (e.g., least or most time), etc.…. If a similar process already exists, server 630 may identify this similarity and know that the identified process should replace an existing process for a similar automation that works less optimally.” (emphasis added) examiner note: identified process (RPA) may be compared to existing processes (RPAs) to determine similarities between new identified process (RPA) and existing processes (RPAs). 

Claims 10 and 20. The rejection of the method of claim 1 is incorporated, wherein the interface engine searches a repository of existing automations or RPAs according to descriptions or titles associated with the existing automations or RPAs to find one or more matches to the patterns. Singh teaches in [0060-0061] “identified processes may be listed for a user to peruse, and may be sorted by various factors including, but not limited to, an RPA score indicating how suitable a given process is to RPA (e.g., based on complexity of the automation, execution time, perceived benefit to key performance indicators such as revenue generated, revenue saved, time saved, etc.), process name, total recording time, the number of users who executed the process, process execution time (e.g., least or most time), etc. The process workflow may be displayed when a user clicks on a given process, including steps, parameters, and interconnections. In certain embodiments, only process activities that appear to be important from a clustering perspective may be used… If a similar process already exists, server 630 may identify this similarity and know that the identified process should replace an existing process for a similar automation that works less optimally.” (emphasis added) examiner note: identified processes may be sorted based on process name, indicating the process (RPA) title associated with the process.
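To make the claimed limitation concrete, a search of a repository by titles and descriptions might look like the sketch below. The repository contents, the word-overlap score, and the 0.3 cutoff are all assumptions made for illustration; the office action does not say how either Singh or the application performs the match.

```python
import re
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(repository: Dict[str, str], pattern_desc: str,
           min_score: float = 0.3) -> List[Tuple[str, float]]:
    """Rank automations whose title and description share words with the pattern."""
    query = tokens(pattern_desc)
    if not query:
        return []
    hits = []
    for title, description in repository.items():
        # Fraction of query words found in the title or description.
        score = len(query & tokens(title + " " + description)) / len(query)
        if score >= min_score:
            hits.append((title, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical repository mapping automation title -> description.
repository = {
    "Invoice entry": "Copy invoice totals from email into the accounting system",
    "Password reset": "Reset a user password from a help desk ticket",
}
matches = search(repository, "copy invoice total into accounting")
print(matches)
```

Sorting a list by process name, as in the cited Singh passage, orders results that are already known; the sketch instead retrieves candidates by comparing text, which is the distinction the claim language turns on.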

Claim 11. The claim is directed towards a computing system to implement the method of claim 1, therefore, the claim is similarly rejected as claim 1. Further, the computing system comprising a memory storing a program code of an interface engine for executing detection and automation of computer activity; and at least one processor executing the program code to cause the interface engine. Singh teaches in [0046-0047] “Computing system 500 includes a bus 505 or other communication mechanism for communicating information, and processor(s) 510 coupled to bus 505 for processing information… Computing system 500 further includes a memory 515 for storing information and instructions to be executed by processor(s) 510.” (emphasis added).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Singh as applied to claim 1 above, and further in view of MURAKAWA (US 2024/0028188, published 1/25/2024).

Claims 6 and 16. The rejection of the method of claim 1 is incorporated, Singh does not explicitly disclose wherein the patterns extracted by the interface engine are language agnostic. However, MURAKAWA, in an analogous art, teaches in [0041-0046 and 0126] “the command COM1 is a command conforming to the specifications of the programming language of the software 131 dedicated to camera control. The command COM4 is a command conforming to a programming language of the software 134 dedicated to robot control that is different from the programming language of the software 131. The programming language used in the software 131 is an example of a first programming language. The programming language used in the software 134 is an example of a second programming language. The software 134 dedicated to robot control is, for example, a robot program. The user can easily perform programming without learning these programming languages, and thus the convenience of the programming is further improved.” (emphasis added) examiner note: actions performed via a user interface, as in fig. 4, may be performed by different robotic devices configured to perform commands specified in different programming languages.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Singh with the teaching of MURAKAWA because “the source programs PR1 and PR2 may be created in different programming languages… The source programs PR1 and PR2 are created in advance by a designer familiar with a predetermined programming language, for example, C#, and the user does not need to create the source programs PR1 and PR2. As described above, the user can program the control flow program 101 by combining modules even if the user does not have knowledge for creating the source programs PR1 and PR2, and thus the convenience of programming is improved.” MURAKAWA [0129-0132].

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AHAMED I NAZAR whose telephone number is (571)270-3174. The examiner can normally be reached 10 am to 7 pm Mon-Fri.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong, can be reached at 571-272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AHAMED I NAZAR/Examiner, Art Unit 2178
4/28/2026

/STEPHEN S HONG/Supervisory Patent Examiner, Art Unit 2178

Prosecution Timeline

Apr 12, 2024

Application Filed

May 07, 2026

Non-Final Rejection mailed — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/899,130

Patent 12619796

VIRTUAL ENVIRONMENT FOR LARGE-SCALE CAPITAL PROJECTS

3y 8m to grant Granted May 05, 2026

17/761,532

Patent 12564342

METHODS, SYSTEMS, AND DEVICES FOR THE DIAGNOSIS OF BEHAVIORAL DISORDERS, DEVELOPMENTAL DELAYS, AND NEUROLOGIC IMPAIRMENTS

3y 11m to grant Granted Mar 03, 2026

17/566,782

Patent 12548333

DYNAMIC NETWORK QUANTIZATION FOR EFFICIENT VIDEO INFERENCE

4y 1m to grant Granted Feb 10, 2026

18/383,839

Patent 12549503

INFORMATION INTERACTION METHOD AND APPARATUS, AND ELECTRONIC DEVICE

2y 3m to grant Granted Feb 10, 2026

17/771,649

Patent 12539042

Multi-Modal Imaging System and Method Therefor

3y 9m to grant Granted Feb 03, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

53%

Grant Probability

86%

With Interview (+32.8%)

4y 1m (~2y remaining)

Median Time to Grant

Low

PTA Risk

Based on 383 resolved cases by this examiner. Grant probability derived from career allowance rate.
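The projection figures can be reproduced with simple arithmetic. The sketch below assumes the interview lift is added to the career allowance rate as percentage points, which is inferred from the displayed numbers (53% plus 32.8 points is about 86%) and is not a documented formula; the function names are hypothetical.

```python
def allowance_rate(granted: int, resolved: int) -> float:
    """Share of resolved cases that were granted, as a percentage."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * granted / resolved


def with_interview(base_pct: float, lift_points: float) -> float:
    """Add an interview lift in percentage points, capped at 100 (assumed additive)."""
    return min(100.0, base_pct + lift_points)


base = allowance_rate(204, 383)       # 204 granted / 383 resolved, from the profile
boosted = with_interview(base, 32.8)  # +32.8 point interview lift
print(f"{base:.1f}% base, {boosted:.1f}% with interview")
```

Rounded to whole percentages, these values match the 53% grant probability and the 86% with-interview figure shown above.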