DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the Office Action mailed August 15, 2025, applicant filed an amendment on November 17, 2025, in which applicant amended the claims and requested reconsideration.
Response to Arguments
Applicants argue that the prior art cited fails to teach the claims as amended. Applicants’ arguments are persuasive, but are moot in view of new grounds of rejection.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-7, 9-16 and 18-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Folkens et al. (PGPUB 2016/0283595), hereinafter referenced as Folkens, in view of Klein et al. (PGPUB 2024/0241624), hereinafter referenced as Klein.
Regarding claim 1, Folkens discloses a method, comprising:
obtaining a screenshot of an application window rendered by an application (capture screen; fig. 2 with p. 0128-0129);
obtaining an application window metadata (p. 0097, 0112-0115);
identifying an image within the screenshot (image of interest; fig. 2 element 230 with p. 0128-0129);
generating, with a multi-modal model, a caption of the image (tag; fig. 2, element 240 with p. 0128-0129); and
generating a label of the screenshot based on the caption of the image and the application window metadata (tag; fig. 2, element 240 with p. 0128-0129, 0097, 0112-0115), but does not specifically teach training a machine learning model to predict human-computer interactions from screenshots by providing the machine learning model with the screenshot and the label of the screenshot, wherein another application personalizes a user experience by using the machine learning model to predict, from an individual screenshot, what a user is doing on a computing device.
Klein discloses a method comprising training a machine learning model to predict human-computer interactions from screenshots by providing the machine learning model (training a machine learning by learning past data of user interactions; p. 0156) with the screenshot and the label of the screenshot (screenshot as well as context data in the screenshot), wherein another application personalizes a user experience by using the machine learning model to predict (predict user intent from screenshot), from an individual screenshot, what a user is doing on a computing device (p. 0107-0115), to determine, detect and predict user intent.
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method of Folkens as described above with the teachings of Klein, to assist with seamlessly performing native functions and to support multi-modal input.
Regarding claim 2, Folkens discloses a method further comprising:
identifying a region of text within the screenshot (fig. 2, element 230); and
extracting text from the region of text (OCR; p. 0113). In addition, Klein discloses generating the label of the screenshot at least in part based on the extracted text (OCR; p. 0115-0118).
Regarding claim 3, Folkens discloses a method further comprising:
a title of the application window, wherein the title of the application window is displayed in a title bar of the application window (title; figs. 2 and 3 with p. 0113, 0248).
Regarding claim 4, Folkens discloses a method further comprising:
automatically navigating the application to a website and causing an automated agent to interact with the website in accordance with a usage history of the website (generating results automatically; fig. 3 with p. 0130). In addition, Klein discloses a method wherein the screenshot is captured after the automated agent interacts with the website (p. 0107-0115).
Regarding claim 5, it is interpreted and rejected for similar reasons as set forth above. In addition, Folkens discloses a method further comprising:
generating an activity set by grouping screenshots taken while causing an automated agent to navigate to frequently visited locations of the application; and
validating a feature of an individual application that uses an individual machine learning model trained with the activity set (measure of confidence assigned to correctly characterized content; p. 0021, 0037, 0067, 0086-0089, 0099).
Regarding claim 6, Folkens discloses a method further comprising:
generating an activity set by grouping screenshots taken while causing an automated agent to navigate through a stream of locations within the application window (classification; p. 0094-0099, 0107-0108, 0032, 0068-0069); and
training a machine learning model with the activity set (retag/retrain for improved processing; p. 0115-0123, 0127, 0166).
Regarding claim 7, Folkens discloses a method further comprising:
providing an individual screenshot to a feature of an application that uses the machine learning model (fig. 3); and
validating the feature by comparing an output of the feature with a label of the individual screenshot (comparing data; p. 0212).
Regarding claim 9, it is interpreted and rejected for similar reasons as set forth in claim 1. In addition, Folkens discloses a system comprising:
a processing unit (p. 0094-0096); and
a memory storing computer-executable instructions, which, when executed by the processing unit (p. 0094-0096), cause the processing unit to:
receive a source text derived from a screenshot of an application window (OCR; p. 0112-0115);
receive a label that annotates the screenshot (tag; fig. 2, element 240 with p. 0128-0129);
determine a level of correctness of the label in relation to the source text (measure of confidence assigned to correctly characterized content; p. 0021, 0037, 0067, 0086-0089);
determine that the label satisfies a quality criteria (measure of confidence assigned to correctly characterized content; p. 0021, 0037, 0067, 0086-0089);
determine a label grade of the label based on the level of correctness and the determination that the label satisfies the quality criteria (measure of confidence assigned to correctly characterized content; p. 0021, 0037, 0067, 0086-0089); and
determine that the label grade exceeds a defined threshold and validate a feature of an application that uses a machine learning model trained on labeled screenshots of application windows, wherein validating comprises providing the label to the machine learning model and verifying that an output of the machine learning model matches the screenshot (measure of confidence assigned to correctly characterized content; p. 0021, 0037, 0067, 0086-0089, 0099).
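For illustration only, the grading limitations recited in claim 9 can be sketched as follows. This sketch is not drawn from Folkens, Klein, or the application. The token-overlap correctness measure, the word-count quality criterion, the 0.5 threshold, and all function names are hypothetical assumptions chosen to make the steps concrete.

```python
def level_of_correctness(label: str, source_text: str) -> float:
    """Assumed metric: fraction of label words that also occur in the source text."""
    label_tokens = set(label.lower().split())
    source_tokens = set(source_text.lower().split())
    if not label_tokens:
        return 0.0
    return len(label_tokens & source_tokens) / len(label_tokens)


def satisfies_quality_criteria(label: str, min_words: int = 2, max_words: int = 20) -> bool:
    """Assumed criterion: the label is neither trivially short nor overly long."""
    return min_words <= len(label.split()) <= max_words


def label_grade(label: str, source_text: str) -> float:
    """Combine correctness with the quality determination into one grade."""
    if not satisfies_quality_criteria(label):
        return 0.0
    return level_of_correctness(label, source_text)


def exceeds_threshold(label: str, source_text: str, threshold: float = 0.5) -> bool:
    """Gate any downstream validation step on the label grade."""
    return label_grade(label, source_text) > threshold
```

Under these assumptions, a label whose words all appear in the OCR-derived source text receives a grade of 1.0 and passes the gate, while a label with no overlapping words receives 0.0 and does not.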
Regarding claim 10, Folkens discloses a system wherein the computer-executable instructions further cause the processing unit to:
identify a portion of the source text that satisfies a usefulness criteria, wherein the level of correctness of the label is determined in relation to the identified portion of the source text (measure of confidence assigned to correctly characterized content; p. 0021, 0037, 0067, 0086-0089).
Regarding claim 11, Folkens discloses a system wherein the label is one of a plurality of labels, and wherein the computer-executable instructions further cause the processing unit to:
compute a diversity score of the plurality of labels, wherein the label grade is additionally based on the diversity score (weightings/uniqueness/numeric value; p. 0099-0101, 0122, 0127).
Regarding claim 12, Folkens discloses a system wherein the diversity score is computed based on distances between embedding scores computed for each of the plurality of labels, and wherein the label grade (weightings) is proportional to the diversity score (uniqueness; p. 0099-0101, 0122, 0127).
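For illustration only, one way to read the diversity limitation of claims 11 and 12 is as a mean pairwise distance between label embeddings. This sketch is not drawn from the cited references or the application. The choice of Euclidean distance, the averaging, the proportionality constant, and the function names are hypothetical assumptions.

```python
import math
from itertools import combinations


def diversity_score(embeddings):
    """Mean Euclidean distance over all pairs of label embeddings (assumed metric)."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0  # a single label has no pairwise diversity
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def diversity_weighted_grade(diversity, constant=1.0):
    """Label grade proportional to the diversity score; the constant is assumed."""
    return constant * diversity
```

Under these assumptions, a set of identical label embeddings yields a diversity score of zero, and the grade grows linearly as the labels spread apart in embedding space.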
Regarding claim 13, Folkens discloses a system, wherein the source text is processed by a short text clustering engine before the portion of the source text is determined to satisfy the usefulness criteria (association/classification; p. 0107-0108, 0032, 0068-0069, 0094-0099).
Regarding claim 14, Folkens discloses a system wherein the computer-executable instructions further cause the processing unit to:
identify, across a plurality of labels applied to a plurality of screenshots, clusters of related explanations of label incorrectness or low label quality (lower weight; p. 0094-0099, 0107-0108, 0032, 0068-0069).
Regarding claim 15, it is interpreted and rejected for similar reasons as set forth in claim 1. In addition, Folkens teaches obtaining metadata of the application (p. 0097, 0112-0115, 0122, 0241).
Regarding claim 16, Folkens discloses a medium wherein the metadata comprises a tree of properties of application windows of a desktop that includes the application (list/rank/order/relevance; p. 0097, 0241-0242, 0258-0259).
Regarding claim 18, it is interpreted and rejected for similar reasons as set forth above. In addition, Klein discloses a medium wherein the metadata includes a description of an image displayed in the application, wherein the description of the image is obtained from the application and wherein the label of the screenshot is generated in part based on the description of the image (p. 0115-0118).
Regarding claim 19, Folkens discloses a medium wherein the label is generated by a large language model based on a prompt that instructs the large language model to label the annotated screenshot with a particular use (large library; p. 0213).
Claim(s) 17 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Folkens in view of Klein and in further view of Winn et al. (PGPUB 2020/0150832), hereinafter referenced as Winn.
Regarding claim 17, Folkens and Klein disclose a medium as described above, but fail to teach wherein the screenshot is of a desktop, and wherein the screenshot is cropped to the application based on a location and a size of the application window obtained from the metadata.
Winn discloses a medium wherein the screenshot is of a desktop, and wherein the screenshot is cropped to the application based on a location and a size of the application window obtained from the metadata (automatically enhance, crop, reduce file size; p. 0079, 0135), to provide a high-quality output.
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the medium of Folkens and Klein as described above with the teachings of Winn, to refresh the user interface.
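For illustration only, the cropping limitation of claim 17 can be sketched as slicing a desktop image to a window rectangle taken from metadata. This sketch is not drawn from Winn or the application. The nested-list pixel representation, the `x`, `y`, `width`, and `height` parameter names, and the clamping behavior are hypothetical assumptions.

```python
def crop_to_window(desktop, x, y, width, height):
    """Return the part of a desktop screenshot covered by an application window.

    desktop is a list of pixel rows; x and y give the window's top-left corner
    and width and height give its size, all as reported by (assumed) metadata.
    The rectangle is clamped to the desktop bounds.
    """
    rows = len(desktop)
    cols = len(desktop[0]) if rows else 0
    top, left = max(0, y), max(0, x)
    bottom, right = min(rows, y + height), min(cols, x + width)
    return [row[left:right] for row in desktop[top:bottom]]
```

A window that extends partly off the desktop is cropped to its visible portion under this clamping assumption, and a window entirely off the desktop yields an empty result.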
Regarding claim 20, it is interpreted and rejected for similar reasons as set forth above. In addition, Folkens discloses a medium wherein the screenshot is generated by a computing device configured with a language (p. 0113). In addition, Winn teaches a medium wherein a screen resolution and a user interface theme are selected to create screenshots under a diversity of computing environments (image resolution; p. 0054, 0066-0067, 0079, 0135, 0022-0023).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. This information has been detailed in the attached PTO-892 (Notice of References Cited).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAKIEDA R JACKSON whose telephone number is (571)272-7619. The examiner can normally be reached Mon - Fri 6:30a-2:30p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at 571.272.5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JAKIEDA R JACKSON/Primary Examiner, Art Unit 2657