Prosecution Insights
Last updated: April 19, 2026
Application No. 18/743,275

Producing and Using a Graph Neural Network that Represents Relationships among Screenshots

Status: Final Rejection (§103)
Filed: Jun 14, 2024
Examiner: MORRIS, JOHN J
Art Unit: 2152
Tech Center: 2100 — Computer Architecture & Software
Assignee: Microsoft Technology Licensing, LLC
OA Round: 2 (Final)

Grant Probability: 61% (Moderate) • Expected OA Rounds: 3-4 • Time to Grant: 4y 0m • Grant Probability with Interview: 81%

Examiner Intelligence

Career Allow Rate: 61% (grants 61% of resolved cases; 167 granted / 273 resolved; +6.2% vs TC avg)
Interview Lift: +20.1% (strong; allowance rate for resolved cases with vs. without interview)
Typical Timeline: 4y 0m avg prosecution; 21 currently pending
Career History: 294 total applications across all art units

Statute-Specific Performance

§101: 11.6% (-28.4% vs TC avg)
§103: 62.0% (+22.0% vs TC avg)
§102: 11.1% (-28.9% vs TC avg)
§112: 5.8% (-34.2% vs TC avg)
Tech Center averages are estimates. Based on career data from 273 resolved cases.
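The headline examiner figures above can be reproduced from the underlying counts. A minimal sketch, assuming the interview lift is simply the with-interview allowance rate minus the career baseline (the 81% with-interview figure is taken from the dashboard, not recomputed):

```python
# Reproduce the dashboard's headline examiner statistics from raw counts.
granted, resolved = 167, 273

career_allow_rate = granted / resolved
print(f"Career allow rate: {career_allow_rate:.1%}")  # ~61.2%, shown as 61%

# Interview lift = allowance rate with an interview minus the baseline rate.
# The dashboard's +20.1% presumably uses unrounded internal inputs.
with_interview_rate = 0.81
interview_lift = with_interview_rate - career_allow_rate
print(f"Interview lift: {interview_lift:+.1%}")       # ~+19.8%
```

The small gap between the recomputed lift and the displayed +20.1% is consistent with rounding in the published figures.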

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

This Office Action corresponds to application 18/743,275, which was filed on 6/14/2024.

Response to Amendment

In the reply filed 11/8/2025, claims 1-9, 14-15, and 18-20 have been amended. Claims 10-11 and 17 have been cancelled, and claims 21-23 have been added. Accordingly, claims 1-9, 12-16, and 18-23 are currently pending.

Response to Arguments

Applicant’s arguments filed 11/8/2025 have been fully considered but are moot in view of the new grounds of rejection.

The applicant argues that Wang does not teach “receiving by the local computing device a trained graph neural network from one or more servers”. After further review of the reference, the examiner respectfully disagrees. Wang teaches, in figures 1-2 and paragraphs 45 and 92, that the computer vision system, which comprises the graph neural network, may be stored on the computing devices. Wang further teaches that the graph neural network architecture may be a pre-trained or preexisting neural network architecture, which means the computing device received a trained graph neural network from one or more servers. Additionally, newly cited Anorga teaches, in paragraph 174, that the trained model may be generated on a different device and then provided to the application. Therefore, the examiner is not persuaded.

The applicant argues that Wang does not teach “generating a plurality of target embeddings associated with nodes in the graph that represent the plurality of screenshots using the trained graph neural network”. After further review of the reference, the examiner respectfully disagrees. Wang teaches, in figure 8 and paragraphs 18-22, generating target embeddings associated with nodes in the graph that represent the images.
Wang further teaches, in figures 1-2 and paragraphs 45 and 92, that the graph neural network architecture may be a pre-trained or preexisting neural network architecture, which means the computing device received a trained graph neural network from one or more servers. Additionally, newly cited Anorga teaches, in paragraph 174, that the trained model may be generated on a different device and then provided to the application. Therefore, the examiner is not persuaded.

The applicant argues that Bhat does not teach “a first machine-trained model that produces an image embedding based on the at least one image region, a second machine-trained model that produces a text embedding based on text content extracted from the least one text region” and “the edges including a first edge of a first edge type that represents common image embeddings produced by the first machine-trained model for two previously captured screenshots, a second edge of a second edge type that represents common text embeddings produced by the second machine-trained model for the two previously captured screenshots”, stating that Bhat does not discriminate between images and text. The examiner respectfully disagrees. Bhat teaches, in paragraph 6, that each embedding may identify a different UI element, e.g., a text element or an image element, and that the embeddings may be used to generate a graph with nodes and edges. Bhat further states that each node may correspond to an element of the UI, which is interpreted to mean separate text and image nodes for the corresponding text and image elements. Therefore, the examiner is not persuaded.
Examiner’s Note

Claims 19-20 fall within a statutory category because paragraph 142 of the instant specification recites “The specific term ‘computer-readable storage medium’ or ‘storage device’ expressly excludes propagated signals per se; a computer-readable storage medium or storage device is ‘non-transitory’ in this regard.”

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8 and 19-22 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 2021/0081677, previously cited in ‘892), hereinafter Wang, in view of Hansmann et al. (US 2022/0321508, previously cited in ‘892), hereinafter Hansmann, Bhat et al. (US 2025/0156199, previously cited in ‘892), hereinafter Bhat, and Anorga et al. (US 2018/0336226), hereinafter Anorga.
Regarding Claim 1: Wang teaches: A method for generating a graph in a local computing device, comprising:

receiving by the local computing device a trained graph neural network from one or more servers (Wang, figures 1-2, [0045, 0092], note the computer vision system, which comprises the graph neural network, may be stored on the computing devices; note the graph neural network architecture may be a pre-trained or preexisting neural network architecture, which means the computing device received a trained graph neural network from one or more servers);

capturing and storing a plurality of screenshots at different respective times at the local computing device, the screenshots being images (Wang, figures 1-2 and 8, [0020, 0045, 0048, 0090-0091], note receiving a collection of images; note the images may be at different respective times, e.g., a video sequence of images; note the images may be captured and stored; note screenshots are images; note the computer vision system may be stored on the computing devices, e.g., the local computing device);

using machine-trained models to identify features associated with the plurality of screenshots, the machine-trained models including, for the particular screenshot: a first machine-trained model that produces an image embedding based on the at least one image region (Wang, figure 8, [0020, 0092, 0097], note the feature extraction component identifies features using pre-trained or preexisting neural networks to generate a node embedding for the image);

determining relationships among pairs of screenshots based on the features (Wang, figure 8, [0019-0021, 0093-0097], note determining relationships among nodes; note nodes are associated with the node embeddings of the images; note screenshots are images);

assigning nodes in the graph to represent the plurality of screenshots, and assigning edges to the pairs of nodes having relationships that are determined to satisfy one or more prescribed similarity tests, the edges including a first edge of a first edge type that represents common image embeddings produced by the first machine-trained model for two previously captured screenshots, a second edge of a second edge type that represents common text embeddings, and a third edge of a third edge type that represents a common classification result (Wang, figure 8, [0003, 0019-0021, 0071, 0083, 0093-0097], note determining relationships among nodes; note nodes are associated with the node embeddings of the images; note determining/deriving the edges/relationships is interpreted as a prescribed similarity test; note identifying common objects, which is interpreted to include common image content, common text, and common entities since images may comprise text and entities);

generating a plurality of target embeddings associated with nodes in the graph that represent the plurality of screenshots using the trained graph neural network (Wang, figures 1-2 and 8, [0018-0022, 0045, 0092], note using the graph to train/update the graph neural network and obtain high-order relationship information and spatial information to perform segmentation functions on image content, such as identifying and segmenting target objects or common objects, e.g., target embeddings; note screenshots are images; note the computer vision system, which comprises the graph neural network, may be stored on the computing devices; note the graph neural network architecture may be a pre-trained or preexisting neural network architecture, which means the computing device received a trained graph neural network from one or more servers); and

storing the plurality of target embeddings in a data store as an index (Wang, figure 8, [0018-0022, 0048], note using the graph to train/update the graph neural network and obtain high-order relationship information and spatial information to perform segmentation functions on image content, such as identifying and segmenting target objects or common objects, e.g., target embeddings; note the segmentation results are stored in the database, e.g., the data store).

While Wang teaches the creation and use of graph neural networks for images, Wang does not specifically teach partitioning the plurality of screenshots into regions, including, for a particular screenshot, at least one text region and at least one image region; extracting text from text image regions produced by the partitioning using optical character recognition; or using a plurality of machine-trained models to identify features associated with the plurality of screenshots. However, Hansmann is in the same field of endeavor, data analysis, and Hansmann teaches:

using a plurality of different machine-trained models to identify features associated with the plurality of screenshots, the different machine-trained models including, for the particular screenshot: a first machine-trained model that produces an image embedding based on the at least one image region, a second machine-trained model that produces a text embedding based on text content extracted from the at least one text region (Hansmann, figure 5, [0036, 0076], note images may be screenshots with associated text; note using a CNN-based encoder to analyze and extract image embeddings and using an RNN/BERT-based transformer to analyze and extract text embeddings, which are a plurality of machine-trained models used to identify features associated with the images. When combined with the previous reference, this would be for the machine learning models used to extract features as taught by Wang).
assigning nodes in the graph to represent the plurality of screenshots, and assigning edges to the pairs of nodes having relationships that are determined to satisfy one or more prescribed similarity tests, the edges including a first edge of a first edge type that represents common image embeddings produced by the first machine-trained model for two previously captured screenshots, a second edge of a second edge type that represents common text embeddings produced by the second machine-trained model for the two previously captured screenshots (Hansmann, figure 5, [0036, 0041-0043, 0076], note extracting image and text embeddings; note assigning nodes and edges; note edges are indicative of the common or similar relationships between two nodes; note determining similarities of represented objects and identifying commonly occurring intents/entities);

storing the plurality of target embeddings in a data store as an index (Hansmann, [0053], note creation of a knowledge graph with indices. When combined with the previously cited references, this would be for the database storing teachings of Wang).

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis, which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis.
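To make the mapped graph-construction step concrete, here is a minimal sketch of assigning typed edges from separate image and text embeddings. Cosine similarity, the 0.95 threshold, and all names are illustrative assumptions, not anything recited in the claims or disclosed by the cited references:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Two screenshots, each with an image embedding (e.g., from a CNN encoder)
# and a text embedding (e.g., from a BERT-style encoder). Values are toy data.
shots = {
    "s1": {"img": [0.90, 0.10], "txt": [0.20, 0.80]},
    "s2": {"img": [0.88, 0.12], "txt": [0.70, 0.30]},
}

THRESHOLD = 0.95  # illustrative "prescribed similarity test"
edges = []
for a, b in [("s1", "s2")]:
    if cosine(shots[a]["img"], shots[b]["img"]) >= THRESHOLD:
        edges.append((a, b, "common_image_embedding"))  # first edge type
    if cosine(shots[a]["txt"], shots[b]["txt"]) >= THRESHOLD:
        edges.append((a, b, "common_text_embedding"))   # second edge type

# Here only the image embeddings are similar enough, so a single
# first-type edge connects the two screenshot nodes.
```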
While Wang as modified teaches the creation and use of graph neural networks for images, to further support Wang as teaching using machine-trained models to identify features associated with the plurality of screenshots, the machine-trained models including, for the particular screenshot: a machine-trained model that produces an image embedding based on the at least one image region, and a machine-trained model that produces a text embedding based on text content extracted from the at least one text region, the examiner further cites Bhat. Bhat is in the same field of endeavor, data analysis, and Bhat teaches:

using machine-trained models to identify features associated with the plurality of screenshots, the machine-trained models including, for the particular screenshot: a machine-trained model that produces an image embedding based on the at least one image region, and a machine-trained model that produces a text embedding based on text content extracted from the at least one text region (Bhat, figure 1, [0006, 0021, 0025-0027], note using a machine learning model to generate text and image embeddings from text and image elements of an image/screenshot; note embeddings may be nodes on a graph representation and that each element of the UI, e.g., text and image elements, may be an individual node).

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Bhat because all references are directed to data analysis and because Bhat would expand upon the teachings of the previously cited references in image and text analysis, which would improve the effectiveness and accessibility of the system by utilizing a generative model for user interface analysis and improvement (Bhat, [0007-0008]).
While Wang as modified teaches the creation and use of graph neural networks for images, Wang as modified does not specifically teach partitioning the plurality of screenshots into regions, including, for a particular screenshot, at least one text region and at least one image region, and extracting text from text image regions produced by the partitioning using optical character recognition. However, Anorga is in the same field of endeavor, data analysis, and Anorga teaches:

receiving by the local computing device a trained graph neural network from one or more servers (Anorga, figure 9, [0174-0175], note the trained model may be generated on a different device and then provided to the application; note the trained model may be any type of neural network, e.g., a graph neural network. When combined with the previously cited references, this would be for the trained graph neural network as taught above);

capturing and storing a plurality of screenshots at different respective times at the local computing device, the screenshots being images (Anorga, [0006, 0012, 0033, 0060], note capturing and storing an image to local memory; note images may be screenshots);

partitioning the plurality of screenshots into regions, including, for a particular screenshot, at least one text region and at least one image region (Anorga, [0007, 0033, 0054, 0056], note performing image segmentation to identify portions of the image that include text, e.g., text regions, which means the other portions are image regions);

extracting text from text image regions produced by the partitioning using optical character recognition (Anorga, [0057], note performing optical character recognition to generate text extracts from the image);

using machine-trained models to identify features associated with the plurality of screenshots, the machine-trained models including, for the particular screenshot: a third machine-trained model that produces a classification result based on the at least one text region and/or the at least one image region (Anorga, [0006, 0008-0012, 0033, 0059, 0174-0175], note using the trained model to determine categories of images; note the images may be screenshots); and

a third edge of a third edge type that represents a common classification result produced by the third machine-trained model for the two previously captured screenshots (Anorga, [0008-0012, 0059-0062, 0092-0093, 0174-0175], note using the trained model to determine common/similar categories for multiple images. When combined with the previously cited references, this would be for the creation of the nodes and edges for the graph neural network as taught above, e.g., Wang and Hansmann teaching that nodes of the graph represent the entity and the edges represent the relationship/similarity between the nodes).

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Anorga because all references are directed to data analysis and because Anorga would expand upon the teachings of the previously cited references in image and text analysis, which would improve the accuracy and speed of the system by identifying and utilizing different regions of the image and using trained models for analysis (Anorga, [0063]).
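The per-screenshot pipeline that the claim 1 mapping stitches together (partition into regions, OCR, three machine-trained models, then typed graph edges) can be sketched end to end. Every function below is a hypothetical toy stub standing in for the corresponding trained component; nothing here is an API from Wang, Hansmann, Bhat, or Anorga:

```python
# Runnable sketch of the claimed per-screenshot pipeline. All functions are
# toy stubs; a screenshot is modeled as a dict of pre-labeled regions.

def partition(screenshot):
    """Split a screenshot into text regions and image regions
    (cf. the image segmentation mapped to Anorga)."""
    return screenshot["text_regions"], screenshot["image_regions"]

def ocr(text_regions):
    """Stand-in for optical character recognition over the text regions."""
    return " ".join(text_regions)

def image_model(image_regions):
    """First machine-trained model: image embedding (stub)."""
    return [float(len(r)) for r in image_regions]

def text_model(text):
    """Second machine-trained model: text embedding (stub)."""
    return [float(len(text))]

def classifier(text):
    """Third machine-trained model: classification result (stub)."""
    return "settings_screen" if "settings" in text.lower() else "other"

def process(screenshot):
    text_regions, image_regions = partition(screenshot)
    text = ocr(text_regions)
    return {
        "image_embedding": image_model(image_regions),
        "text_embedding": text_model(text),
        "classification": classifier(text),
    }

node = process({"text_regions": ["Settings", "Wi-Fi"],
                "image_regions": ["toggle_icon"]})
# node holds the three per-screenshot features from which the first-,
# second-, and third-type edges would be derived.
```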
Regarding Claim 2: Wang as modified shows the method as disclosed above. Wang as modified further teaches: wherein a node in the graph that describes the particular screenshot describes an entirety of contents presented on a user interface presentation at a particular time (Wang, figure 8, [0019-0021, 0092, 0097], note the nodes represent the images; note screenshots are images which may describe an entirety of contents presented on a user interface; note the image describing an entirety of contents presented on a user interface is nonfunctional descriptive material as explained in section 2111.05 of the MPEP and does not hold patentable weight) (Hansmann, [0036], note images may be screenshots of a user interface).

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis, which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis.

Regarding Claim 3: Wang as modified shows the method as disclosed above. Wang as modified further teaches: wherein a node in the graph that describes the particular screenshot describes a portion of an entirety of contents presented on a user interface presentation at a particular time, the portion being less than the entirety (Wang, figure 8, [0019-0021, 0092, 0097], note the nodes represent the images; note screenshots are images which may describe a portion of contents presented on a user interface; note the image describing a portion of contents presented on a user interface is nonfunctional descriptive material as explained in section 2111.05 of the MPEP and does not hold patentable weight) (Hansmann, [0036], note images may be screenshots of a user interface).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis, which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis.

Regarding Claim 4: Wang as modified shows the method as disclosed above. Wang as modified further teaches: wherein the graph also includes: text nodes associated with instances of text, each instance of text being associated with at least one of the plurality of screenshots (Hansmann, figure 5, [0036, 0041-0043, 0076], note extracting image and text embeddings; note assigning nodes and edges; note edges are indicative of the common or similar relationships between two nodes; note determining similarities of represented objects and identifying commonly occurring intents/entities) (Bhat, figure 1, [0006, 0021, 0025-0027], note using a machine learning model to generate text and image embeddings from text and image elements of an image/screenshot; note embeddings may be nodes on a graph representation and that each element of the UI, e.g., text and image elements, may be an individual node); and fourth edges that connect the text nodes to nodes that represent the plurality of screenshots (Wang, figure 8, [0003, 0019-0021, 0071, 0083, 0093-0097], note determining relationships among nodes; note nodes are associated with the node embeddings of the images; note determining/deriving the edges/relationships is interpreted as a prescribed similarity test; note identifying common objects, which is interpreted to include common image content, common text, and common entities since images may comprise text and entities) (Hansmann, figure 5, [0036, 0041-0043, 0076], note extracting image and text embeddings; note determining and linking text and image relationships; note assigning nodes and edges; note edges are indicative of the common or similar relationships between two nodes; note determining similarities of represented objects and identifying commonly occurring intents/entities) (Bhat, figure 1, [0006, 0021, 0025-0027], note using a machine learning model to generate text and image embeddings from text and image elements of an image/screenshot; note embeddings may be nodes on a graph representation and that each element of the UI, e.g., text and image elements, may be an individual node).

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis, which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Bhat because all references are directed to data analysis and because Bhat would expand upon the teachings of the previously cited references in image and text analysis, which would improve the effectiveness and accessibility of the system by utilizing a generative model for user interface analysis and improvement (Bhat, [0007-0008]).
Regarding Claim 5: Wang as modified shows the method as disclosed above. Wang as modified further teaches: identifying entities associated with the plurality of screenshots based on the features using the third machine-trained model (Wang, figure 8, [0019-0021, 0091-0097], note the feature extraction component identifies features to generate node embeddings for the images, e.g., identifying entities) (Hansmann, [0040, 0043, 0052], note identifying entities and associating them with other nodes) (Anorga, [0006, 0008-0012, 0033, 0059-0062, 0174-0175], note using the trained model to determine common/similar categories for multiple images; note the images may be screenshots); and linking nodes associated with the screenshots that are associated with common entities (Wang, figure 8, [0019-0021, 0091-0097], note determining relationships among nodes and linking the nodes) (Hansmann, [0040, 0043, 0052], note identifying entities and associating them with other nodes) (Anorga, [0006, 0008-0012, 0033, 0059-0062, 0174-0175], note using the trained model to determine common/similar categories for multiple images; note the images may be screenshots. When combined with the previously cited references, this would be for the creation of the nodes and edges for the graph neural network as taught above, e.g., Wang and Hansmann teaching that nodes of the graph represent the entity and the edges represent the relationship/similarity between the nodes).

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis, which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Anorga because all references are directed to data analysis and because Anorga would expand upon the teachings of the previously cited references in image and text analysis, which would improve the accuracy and speed of the system by identifying and utilizing different regions of the image and using trained models for analysis (Anorga, [0063]).

Regarding Claim 6: Wang as modified shows the method as disclosed above. Wang as modified further teaches: wherein the third machine-trained model is: a classification machine-trained model that classifies a topic expressed by the particular screenshot; or a classification machine-trained model that classifies a named entity expressed by the particular screenshot; or a classification machine-trained model that classifies an activity expressed by the particular screenshot (Anorga, [0006, 0008-0012, 0033, 0059, 0174-0175], note using the trained model to determine categories of images; note the images may be screenshots; note that image categories may be interpreted as a topic, name, or activity).

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis, which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Anorga because all references are directed to data analysis and because Anorga would expand upon the teachings of the previously cited references in image and text analysis, which would improve the accuracy and speed of the system by identifying and utilizing different regions of the image and using trained models for analysis (Anorga, [0063]).

Regarding Claim 7: Wang as modified shows the method as disclosed above. Wang as modified further teaches: wherein the third edge type represents: a common occurrence of at least one topic in the two previously captured screenshots; or a common occurrence of at least one named entity in the two previously captured screenshots; or a common activity associated with the two previously captured screenshots (Wang, figure 8, [0003, 0019-0021, 0071, 0083, 0091-0097], note identifying common objects; note determining relationships among nodes and linking the nodes) (Hansmann, [0039-0043, 0052], note determining similarities of represented objects; note identifying commonly occurring intents/entities; note identifying entities and associating them with other nodes) (Anorga, [0008-0012, 0059-0062, 0092-0093, 0174-0175], note using the trained model to determine common/similar categories for multiple images; note that common image categories may be interpreted as a common topic, name, or activity. When combined with the previously cited references, this would be for the creation of the nodes and edges for the graph neural network as taught above, e.g., Wang and Hansmann teaching that nodes of the graph represent the entity and the edges represent the relationship/similarity between the nodes).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis, which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Anorga because all references are directed to data analysis and because Anorga would expand upon the teachings of the previously cited references in image and text analysis, which would improve the accuracy and speed of the system by identifying and utilizing different regions of the image and using trained models for analysis (Anorga, [0063]).

Regarding Claim 8: Wang as modified shows the method as disclosed above. Wang as modified further teaches: wherein the one or more servers have produced the trained graph neural network by: producing a pretrained model by performing pretraining based on a first set of training examples that include images and instances of text associated with the images, wherein the first set of training examples includes images that are not screenshot images (Wang, figure 8, [0092], note the feature extraction component may represent a pre-trained or preexisting neural network architecture to extract feature information from images; note the images do not have to be screenshots) (Bhat, figure 1, [0006-0007, 0021, 0025-0027], note generating text and image embeddings from elements of an image/screenshot; note the images do not have to be explicit screenshots. When combined with the previously cited references, this would be for the features extracted from images and therefore the pretrained and finetuned models as well) (Anorga, figure 9, [0174-0175], note the trained model may be generated on a different device and then provided to the application; note the trained model may be any type of neural network, e.g., a graph neural network; note the images used for the model do not have to be screenshots. When combined with the previously cited references, this would be for the trained graph neural network as taught above); and producing a finetuned model by performing finetuning based on a second set of training examples that describe example screenshots and instances of text associated with the example screenshots, wherein the trained graph neural network that is provided to the local computing device is the finetuned model (Wang, figure 8, [0018-0022, 0091-0097], note producing a finetuned graph neural network for images/screenshots) (Hansmann, figure 5, [0036, 0076], note images may be screenshots of a user interface) (Bhat, figure 1, [0006-0007, 0021, 0025-0027], note generating text and image embeddings from elements of an image/screenshot. When combined with the previously cited references, this would be for the features extracted from images and therefore the pretrained and finetuned models as well) (Anorga, figure 9, [0174-0175], note the trained model may be generated on a different device and then provided to the application; note the trained model may be any type of neural network, e.g., a graph neural network. When combined with the previously cited references, this would be for the trained graph neural network as taught above).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis, which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Bhat because all references are directed to data analysis and because Bhat would expand upon the teachings of the previously cited references in image and text analysis, which would improve the effectiveness and accessibility of the system by utilizing a generative model for user interface analysis and improvement (Bhat, [0007-0008]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Anorga because all references are directed to data analysis and because Anorga would expand upon the teachings of the previously cited references in image and text analysis, which would improve accuracy and speed of the system by identifying and utilizing different regions of the image and using trained models for analysis (Anorga, [0063]).
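The pretrain-then-finetune sequence recited in claim 8 can be illustrated with a minimal sketch. This is a hypothetical toy example, not taken from the cited references or the application: a small linear embedding model stands in for the graph neural network, it is first pretrained on generic (non-screenshot) feature vectors and then finetuned on screenshot feature vectors, and all names and data are illustrative.

```python
# Hypothetical sketch of a two-stage (pretrain, then finetune) workflow.
# A linear model stands in for the claimed graph neural network.
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mse(w, data):
    """Mean squared error of the linear model w over (features, target) pairs."""
    return sum((dot(w, x) - y) ** 2 for x, y in data) / len(data)

def train(w, data, lr=0.05, steps=300):
    """One training stage: batch gradient descent on squared error."""
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = dot(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

rng = random.Random(0)
# Stage 1: pretraining on generic (non-screenshot) image features.
generic = [([rng.uniform(-1, 1) for _ in range(4)], rng.uniform(-1, 1))
           for _ in range(32)]
w_pretrained = train([0.0] * 4, generic)

# Stage 2: finetuning on screenshot features; under the claim language, the
# finetuned parameters are what would be provided to the local device.
screenshots = [([rng.uniform(-1, 1) for _ in range(4)], rng.uniform(-1, 1))
               for _ in range(8)]
w_finetuned = train(w_pretrained, screenshots, lr=0.02, steps=150)
```

The design point is that the two stages share one set of parameters: stage 2 starts from the stage-1 weights rather than from scratch, which is what distinguishes a finetuned model from an independently trained one.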
Regarding Claim 19: Wang teaches: A computer-readable storage medium for storing computer-readable instructions, a processing system executing the computer-readable instructions to perform operations, the operations comprising: using a plurality of machine-trained models to identify features associated with the plurality of images in a first set of training examples, the first set of training examples including images that are not screenshot images (Wang, figure 8, [0020, 0092, 0097], note feature extraction component identifies features using pre-trained or preexisting neural networks to generate a node embedding for the image; note the images do not have to be screenshots); determining relationships among pairs of images based on the features (Wang, figure 8, [0019-0021, 0093-0097], note determining relationships among nodes; note nodes are associated with the node embeddings of the images; note screenshots are images); assigning nodes in the graph to represent the plurality of images, and assigning edges to the pairs of nodes having relationships that are determined to satisfy one or more prescribed similarity tests (Wang, figure 8, [0019-0021, 0093-0097], note determining relationships among nodes; note nodes are associated with the node embeddings of the images; note determining/deriving the edges/relationships is interpreted as a prescribed similarity test); training a graph neural network based on the graph, to produce a pretrained graph neural network, the training including generating a plurality of target embeddings that represent the plurality of images in the first set of training examples (Wang, figure 8, [0018-0022, 0092], note using the graph to train/update the graph neural network and obtain high-order relationship information and spatial information to perform segmentation functions on image content such as identifying and segmenting target objects or common objects, e.g., target embeddings; note the feature extraction component may represent a 
pre-trained or preexisting neural network architecture to extract feature information from images; note the images do not have to be screenshots); finetuning the graph neural network based on a second set of training examples, to produce a finetuned graph neural network, the second set of training examples describing screenshots produced by plural local computing devices, and instances of text associated with the screenshots (Wang, figure 8, [0018-0022, 0091-0097], note using the graph to train/update the graph neural network and producing a finetuned graph neural network for images/screenshots); and the finetuned graph neural network is used to produce target embeddings for screenshots locally captured by the local computing device (Wang, figures 1-2 and 8, [0018-0022, 0045, 0092], note using the graph to train/update the graph neural network and obtain high-order relationship information and spatial information to perform segmentation functions on image content such as identifying and segmenting target objects or common objects, e.g., target embeddings; note screenshots are images; note the computer vision system, which comprises the graph neural network, may be stored on the computing devices; note the graph neural network architecture may be a pre-trained or preexisting neural network architecture, which means the computing device received a trained graph neural network from one or more servers). While Wang teaches the creation and use of graph neural networks for images, Wang doesn’t specifically teach using a plurality of machine-trained models to identify features associated with the plurality of images in a first set of training examples and instances of text associated with the images, the first set of training examples including images that are not screenshot images.
However, Hansmann is in the same field of endeavor, data analysis, and Hansmann teaches: using a plurality of machine-trained models to identify features associated with the plurality of images in a first set of training examples and instances of text associated with the images, the first set of training examples including images that are not screenshot images (Hansmann, figure 5, [0036, 0076], note images may be screenshots with associated text; note using a CNN based encoder to analyze and extract image embeddings and using an RNN/BERT based transformer to analyze and extract text embeddings, which are a plurality of machine-trained models used to identify features associated with the images. When combined with the previous reference this would be for the machine learning models used to extract features as taught by Wang); and the second set of training examples describing screenshots produced by plural local computing devices, and instances of text associated with the screenshots (Hansmann, figure 5, [0036, 0076], note images may be screenshots with associated text; note using a CNN based encoder to analyze and extract image embeddings and using an RNN/BERT based transformer to analyze and extract text embeddings, which are a plurality of machine-trained models used to identify features associated with the images. When combined with the previous reference this would be for the machine learning models used to extract features as taught by Wang). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis, which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis.
While Wang as modified teaches the creation and use of graph neural networks for images, to further support Wang as modified teaching training a graph neural network based on the graph, to produce a pretrained graph neural network, the training including generating a plurality of target embeddings that represent the plurality of images in the first set of training examples, Bhat, which is in the same field of endeavor, data analysis, teaches: using a plurality of machine-trained models to identify features associated with the plurality of images in a first set of training examples and instances of text associated with the images, the first set of training examples including images that are not screenshot images (Bhat, figure 1, [0006-0007, 0021, 0025-0027], note generating text and image embeddings from elements of an image/screenshot; note the images do not have to be an explicit screenshot. When combined with the previously cited references this would be for the features extracted from images and therefore the pretrained and finetuned models as well). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Bhat because all references are directed to data analysis and because Bhat would expand upon the teachings of the previously cited references in image and text analysis, which would improve the effectiveness and accessibility of the system by utilizing a generative model for user interface analysis and improvement (Bhat, [0007-0008]). While Wang as modified teaches the creation and use of graph neural networks for images, Wang as modified doesn’t specifically teach transferring parameters of the finetuned graph neural network to a particular local computing device, where the finetuned graph neural network is used to produce target embeddings for screenshots locally captured by the local computing device.
However, Anorga is in the same field of endeavor, data analysis, and Anorga teaches: transferring parameters of the finetuned graph neural network to a particular local computing device, where the finetuned graph neural network is used to produce target embeddings for screenshots locally captured by the local computing device (Anorga, figure 9, [0006, 0008-0012, 0033, 0059, 0174-0175], note the trained model may be generated on a different device and then provided to the application; note the trained model may be any type of neural network, e.g., graph neural network; note using the trained model to determine categories of images; note the images may be screenshots. When combined with the previously cited references this would be for the trained graph neural network as taught above). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Anorga because all references are directed to data analysis and because Anorga would expand upon the teachings of the previously cited references in image and text analysis, which would improve accuracy and speed of the system by identifying and utilizing different regions of the image and using trained models for analysis (Anorga, [0063]).
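The "transferring parameters … to a particular local computing device" limitation can be pictured as a serialize-transfer-deserialize round trip. The sketch below is illustrative only and does not come from the cited references; the function names, the JSON payload format, and the toy weight matrix are all assumptions.

```python
# Hypothetical sketch of providing trained model parameters to a local
# device: the server serializes the parameters, the device deserializes
# them and reproduces the same embeddings locally.
import json

def serialize_params(weights):
    """Server side: encode model parameters for transfer."""
    return json.dumps({"weights": weights})

def deserialize_params(payload):
    """Device side: rebuild the parameters from the transferred payload."""
    return json.loads(payload)["weights"]

def embed(weights, features):
    """Produce a target embedding for one screenshot's feature vector."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

server_weights = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # 3 x 2 weight matrix
payload = serialize_params(server_weights)             # sent over the network
local_weights = deserialize_params(payload)            # received on the device

embedding = embed(local_weights, [3.0, 4.0])           # local screenshot features
```

Because only the parameters travel, the device ends up with a model that produces the same embeddings the server's copy would, without the device ever seeing the training data.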
Regarding Claim 20: Wang as modified shows the computer-readable storage medium as disclosed above; Wang as modified further teaches: wherein the plurality of different machine-trained models includes: a first machine-trained model that produces an image embedding based on image content of a particular image; or a second machine-trained model that produces a text embedding based on text content of the particular image; or a third machine-trained model that identifies a topic expressed by the particular image; or a fourth machine-trained model that identifies a named entity expressed by the particular image; or a fifth machine-trained model that identifies an activity expressed by the particular image; or any combination of the first, second, third, fourth, and fifth machine-trained models (Wang, figures 1 and 8, [0019-0021, 0091-0097], note a machine-trained model that produces an image embedding based on image content) (Hansmann, figure 5, [0036, 0076], note images may be screenshots with associated text; note using a CNN based encoder to analyze and extract image embeddings and using an RNN/BERT based transformer to analyze and extract text embeddings, which are a plurality of machine-trained models used to identify features associated with the images. When combined with the previous reference this would be for the machine learning models used to extract features as taught by Wang) (Bhat, figure 1, [0006, 0021, 0025-0027], note using a machine learning model to generate text and image embeddings from text and image elements of an image/screenshot; note embeddings may be nodes on a graph representation and that each element of the UI, e.g., text and image elements, may be an individual node) (Anorga, [0006, 0008-0012, 0033, 0059, 0174-0175], note using the trained model to determine categories of images; note the images may be screenshots).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis, which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Bhat because all references are directed to data analysis and because Bhat would expand upon the teachings of the previously cited references in image and text analysis, which would improve the effectiveness and accessibility of the system by utilizing a generative model for user interface analysis and improvement (Bhat, [0007-0008]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Anorga because all references are directed to data analysis and because Anorga would expand upon the teachings of the previously cited references in image and text analysis, which would improve accuracy and speed of the system by identifying and utilizing different regions of the image and using trained models for analysis (Anorga, [0063]).
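Claim 20's "plurality of different machine-trained models" can be sketched as separate featurizers whose outputs are combined into one node feature vector. The two stand-in "models" below are deliberately trivial and purely illustrative, not the CNN/BERT encoders of Hansmann or any model from the record.

```python
# Illustrative sketch of using a plurality of models to featurize one
# screenshot: one toy stand-in embeds image pixels, another embeds the
# screenshot's text, and their outputs are concatenated into a single
# node feature vector for the graph.

def image_model(pixels):
    """Toy image-embedding stand-in: mean and range of pixel intensities."""
    flat = [p for row in pixels for p in row]
    return [sum(flat) / len(flat), max(flat) - min(flat)]

def text_model(text):
    """Toy text-embedding stand-in: length and vowel count of the text."""
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(len(text)), float(vowels)]

def node_features(pixels, text):
    """Concatenate per-model outputs into one node feature vector."""
    return image_model(pixels) + text_model(text)

features = node_features([[0.0, 1.0], [1.0, 0.0]], "Settings screen")
```

The "any combination" language of the claim corresponds to concatenating whichever subset of model outputs is available for a given image.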
Regarding Claim 21: Wang as modified shows the method as disclosed above; Wang as modified further teaches: wherein the third machine-trained model is a classification machine-trained model that classifies a topic expressed by the particular screenshot (Anorga, [0006, 0008-0012, 0033, 0059, 0174-0175], note using the trained model to determine categories of images; note the images may be screenshots; note that image categories may be interpreted as a topic, name, or activity), and wherein the plurality of machine-trained models also includes: a classification machine-trained model that classifies a named entity expressed by the particular screenshot (Anorga, [0006, 0008-0012, 0033, 0059, 0174-0175], note using the trained model to determine categories of images; note the images may be screenshots; note that image categories may be interpreted as a topic, name, or activity); and a classification machine-trained model that classifies an activity expressed by the particular screenshot (Anorga, [0006, 0008-0012, 0033, 0059, 0174-0175], note using the trained model to determine categories of images; note the images may be screenshots; note that image categories may be interpreted as a topic, name, or activity). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Anorga because all references are directed to data analysis and because Anorga would expand upon the teachings of the previously cited references in image and text analysis, which would improve accuracy and speed of the system by identifying and utilizing different regions of the image and using trained models for analysis (Anorga, [0063]).
Regarding Claim 22: Wang as modified shows the method as disclosed above; Wang as modified further teaches: wherein the third edge type represents a common occurrence of at least one topic in the two previously captured screenshots (Wang, figure 8, [0003, 0019-0021, 0071, 0083, 0091-0097], note identifying common objects; note determining relationships amongst nodes and linking the nodes) (Hansmann, [0039-0043, 0052], note determining similarities of represented objects; note identifying common occurring intents/entities; note identifying entities and associating them with other nodes) (Anorga, [0008-0012, 0059-0062, 0092-0093, 0174-0175], note using the trained model to determine common/similar categories for multiple images; note that common image categories may be interpreted as a common topic, name, or activity. When combined with the previously cited references this would be for the creation of the nodes and edges for the graph neural network as taught above, e.g., Wang and Hansmann teaching nodes of the graph represent the entity and the edges represent the relationship/similarity between the nodes), and wherein the edges also include: a fourth edge type that represents a common occurrence of at least one named entity in the two previously captured screenshots (Wang, figure 8, [0003, 0019-0021, 0071, 0083, 0091-0097], note identifying common objects; note determining relationships amongst nodes and linking the nodes) (Hansmann, [0039-0043, 0052], note determining similarities of represented objects; note identifying common occurring intents/entities; note identifying entities and associating them with other nodes) (Anorga, [0008-0012, 0059-0062, 0092-0093, 0174-0175], note using the trained model to determine common/similar categories for multiple images; note that common image categories may be interpreted as a common topic, name, or activity.
When combined with the previously cited references this would be for the creation of the nodes and edges for the graph neural network as taught above, e.g., Wang and Hansmann teaching nodes of the graph represent the entity and the edges represent the relationship/similarity between the nodes); and a fifth edge type that represents a common activity associated with the two previously captured screenshots (Wang, figure 8, [0003, 0019-0021, 0071, 0083, 0091-0097], note identifying common objects; note determining relationships amongst nodes and linking the nodes) (Hansmann, [0039-0043, 0052], note determining similarities of represented objects; note identifying common occurring intents/entities; note identifying entities and associating them with other nodes) (Anorga, [0008-0012, 0059-0062, 0092-0093, 0174-0175], note using the trained model to determine common/similar categories for multiple images; note that common image categories may be interpreted as a common topic, name, or activity. When combined with the previously cited references this would be for the creation of the nodes and edges for the graph neural network as taught above, e.g., Wang and Hansmann teaching nodes of the graph represent the entity and the edges represent the relationship/similarity between the nodes). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis, which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Anorga because all references are directed to data analysis and because Anorga would expand upon the teachings of the previously cited references in image and text analysis, which would improve accuracy and speed of the system by identifying and utilizing different regions of the image and using trained models for analysis (Anorga, [0063]). Claim Rejections - 35 USC § 103 Claim(s) 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang, in view of Hansmann, Bhat, Anorga, and Malhotra et al. (US2025/0068910), hereinafter Malhotra. Regarding Claim 9: Wang as modified shows the method as disclosed above; Wang as modified further teaches: wherein the pretraining uses supervised learning by: masking nodes in a graph that describes the first set of training examples, to produce masked nodes (Wang, [0022], note masks that identify semantically similar objects in a collection of images); predicting identities of the masked nodes using the pretrained model (Wang, [0056], note prediction masks); While Wang as modified teaches the creation and use of graph neural networks, Wang as modified doesn’t specifically teach generating loss information based on an extent to which the predicted identities of the masked nodes accurately match actual identities of the masked nodes; and updating parameters of the pretrained model based on the loss information.
However, Malhotra is in the same field of endeavor, data analysis, and Malhotra teaches: wherein the pretraining uses supervised learning by: masking nodes in a graph that describes the first set of training examples, to produce masked nodes (Malhotra, [0036], note masking features corresponding to nodes of a graph); predicting identities of the masked nodes using the pretrained model (Malhotra, [0036], note predicting the identities of the masked nodes); generating loss information based on an extent to which the predicted identities of the masked nodes accurately match actual identities of the masked nodes (Malhotra, [0036], note calculating reconstruction loss); and updating parameters of the pretrained model based on the loss information (Malhotra, [0036], note fine-tuning the models based on the loss information). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Malhotra because all references are directed to data analysis and because Malhotra would expand upon the teachings of the previously cited references in image and text analysis, which would improve the performance of the machine learning models by learning from more information than conventionally possible (Malhotra, [0042-0043]). Claim Rejections - 35 USC § 103 Claim(s) 12-16, 18, and 23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang in view of Hansmann, Bhat, Anorga, and Choubey et al. (US2025/0225370), hereinafter Choubey.
Regarding Claim 12: Wang as modified shows the method as disclosed above; While Wang as modified teaches the creation and use of graph neural networks, Wang as modified doesn’t specifically teach adding a query node that describes the query to the first graph to produce a second graph that represents an updated version of the first graph; generating a query embedding using the graph neural network based on the second graph; identifying a target embedding associated with the query node that matches the query embedding; and retrieving a previously captured screenshot that is associated with the target embedding. However, Choubey is in the same field of endeavor, data analysis, and Choubey teaches: wherein the graph is a first graph, and wherein the method further includes using the first graph to perform a retrieval operation by: adding a query node that describes the query to the first graph to produce a second graph that represents an updated version of the first graph (Choubey, figures 4A-4B, [0019, 0074-0079], note adding a query node to the graph, which would produce a second/updated graph); generating a query embedding using the graph neural network based on the second graph (Choubey, figures 4A-4B, [0019, 0074-0079], note generating a query representation, e.g., query embedding); identifying a target embedding associated with the query node that matches the query embedding (Choubey, figures 4A-4B, [0019, 0074-0081], note determining a portion of the graph associated with the query, e.g., identifying a target embedding); and retrieving a previously captured screenshot that is associated with the target embedding (Choubey, figures 4A-4B, [0019, 0077-0083], note transmitting the results to a computer device; when combined with the previously cited references this would be the images/screenshots as taught by Wang and Hansmann).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Choubey because all references are directed to data analysis and because Choubey would expand upon the teachings of the previously cited references in data analysis, which would improve the accuracy of the machine learned models by utilizing techniques that expose data that is relevant to a user’s input (Choubey, [0001, 0010-0011]). Regarding Claim 13: Wang as modified shows the method as disclosed above; Wang as modified further teaches: wherein the adding the query node comprises: using the plurality of machine-trained models to identify query features of the query (Wang, figure 8, [0020, 0092, 0097], note feature extraction component identifies features using pre-trained or preexisting neural networks to generate a node embedding for the image) (Hansmann, figure 5, [0036, 0076], note images may be screenshots with associated text; note using a CNN based encoder to analyze and extract image embeddings and using an RNN/BERT based transformer to analyze and extract text embeddings, which are a plurality of machine-trained models used to identify features associated with the images.
When combined with the previous reference this would be for the machine learning models used to extract features as taught by Wang) (Choubey, figures 4A-4B, [0019, 0074-0079], note adding a query node to the graph) and using the query features to identify one or more links that connect the query node to one or more other nodes in the first graph (Choubey, figures 4A-4B, [0019, 0074-0081], note determining a portion of the graph associated with the query); and adding one or more edges to the first graph associated with the one or more links (Choubey, figures 4A-4B, [0019, 0074-0081], note adding a query node to the graph; note linking the node by adding edges to the other associated nodes; note determining a portion of the graph associated with the query). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis, which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Choubey because all references are directed to data analysis and because Choubey would expand upon the teachings of the previously cited references in data analysis, which would improve the accuracy of the machine learned models by utilizing techniques that expose data that is relevant to a user’s input (Choubey, [0001, 0010-0011]).
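The query-then-retrieve flow addressed in claims 12-13 reduces to matching a query embedding against stored target embeddings. The sketch below is a minimal illustration under assumed data structures; the identifiers, the dictionary layout, and the use of cosine similarity as the matching test are all assumptions, not details from the cited references.

```python
# Hypothetical sketch of the retrieval step: compare a query embedding
# against stored target embeddings and return the identifier of the
# best-matching previously captured screenshot.
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.hypot(*a) * math.hypot(*b))

def retrieve(query_embedding, target_embeddings):
    """Return the screenshot id whose target embedding best matches."""
    return max(target_embeddings,
               key=lambda sid: cosine(query_embedding, target_embeddings[sid]))

# Assumed target embeddings keyed by screenshot identifier.
targets = {
    "screenshot_a": [1.0, 0.0, 0.0],
    "screenshot_b": [0.0, 1.0, 0.0],
}
best = retrieve([0.9, 0.1, 0.0], targets)
```

In the claimed arrangement the query embedding would come from running the graph neural network over the second graph (the first graph plus the query node); here it is simply supplied as a vector to keep the matching step in focus.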
Regarding Claim 14: Wang further teaches: A computing system for accessing screenshot information, comprising: an instruction data store for storing computer-readable instructions (Wang, figures 1-2, [0027-0031], note computing devices, memory, and instructions); and a processing system for executing the computer-readable instructions in the data store (Wang, figure 1, [0027-0031], note processing units), to perform operations including: receiving a trained graph neural network from one or more servers (Wang, figures 1-2, [0045, 0092], note the computer vision system, which comprises the graph neural network, may be stored on the computing devices; note the graph neural network architecture may be a pre-trained or preexisting neural network architecture, which means the computing device received a trained graph neural network from one or more servers); generating a first graph having screenshot nodes that are associated with a plurality of previously captured screenshots and instances of text that are associated with the previously captured screenshots, the screenshots being images and the screenshot nodes being associated with respective target embeddings produced by the trained graph neural network (Wang, figures 1 and 8, [0019-0022, 0045, 0048, 0090-0097], note receiving a collection of images; note the images may be captured and stored; note screenshots are images; note feature extraction component identifies features to generate node embeddings for a graph neural network; note determining relationships among nodes; note nodes are associated with the node embeddings of the images); receiving a query (Wang, [0017-0025], note identifying target objects is interpreted as a query for the targeted objects); While Wang teaches the creation and use of graph neural networks for images, Wang doesn’t specifically teach a plurality of previously captured screenshots and instances of text associated with the previously captured screenshots; adding a query node that describes
the query to a first graph to produce a second graph that represents an updated version of the first graph, generating a query embedding associated with the query node using the graph neural network based on the second graph; identifying a target embedding that matches the query embedding; and retrieving a previously captured screenshot that is associated with the target embedding. However, Hansmann is in the same field of endeavor, data analysis, and Hansmann teaches: generating a first graph having screenshot nodes that are associated with a plurality of previously captured screenshots and text nodes associated with instances of text that are associated with the previously captured screenshots, the screenshots being images and the screenshot nodes and text nodes being associated with respective target embeddings produced by the trained graph neural network (Hansmann, figure 5, [0036, 0041-0043, 0076], note images may be screenshots with associated text; note using a CNN based encoder to analyze and extract image embeddings and using an RNN/BERT based transformer to analyze and extract text embeddings, which are a plurality of machine-trained models used to identify features associated with the images; note assigning nodes and edges; note edges are indicative of the common or similar relationships between two nodes; note determining similarities of represented objects and identifying common occurring intents/entities. When combined with the previous reference this would be for the features and graph nodes as taught by Wang).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis, which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis. While Wang as modified teaches the creation and use of graph neural networks for images, to further support Wang as modified teaching generating a first graph having screenshot nodes that are associated with a plurality of previously captured screenshots and text nodes associated with instances of text that are associated with the previously captured screenshots, the screenshots being images and the screenshot nodes and text nodes being associated with respective target embeddings produced by the trained graph neural network, Bhat, which is in the same field of endeavor, data analysis, teaches: generating a first graph having screenshot nodes that are associated with a plurality of previously captured screenshots and text nodes associated with instances of text that are associated with the previously captured screenshots, the screenshots being images and the screenshot nodes and text nodes being associated with respective target embeddings produced by the trained graph neural network (Bhat, figure 1, [0006, 0021, 0025-0027], note using a machine learning model to generate text and image embeddings from text and image elements of an image/screenshot; note embeddings may be nodes on a graph representation and that each element of the UI, e.g., text and image elements, may be an individual node). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the cited references to incorporate the teachings of Bhat because all references are directed to data
analysis and because Bhat would expand upon the teachings of the previously cited references in image and text analysis which would improve the effectiveness and accessibility of the system by utilizing a generative model for user interface analysis and improvement (Bhat, [0007-0008]). While Wang as modified teaches the creation and use of graph neural networks for images, to further support Wang as modified teaching receiving a trained graph neural network from one or more servers, Anorga is in the same field of endeavor, data analysis, and Anorga teaches: receiving a trained graph neural network from one or more servers (Anorga, figure 9, [0174-0175], note the trained model may be generated on a different device and then provided to the application; note the trained model may be any type of neural network, e.g., graph neural network; When combined with the previously cited references this would be for the trained graph neural network as taught above); It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Anorga because all references are directed to data analysis and because Anorga would expand upon the teachings of the previously cited references in image and text analysis which would improve accuracy and speed of the system by identifying and utilizing different regions of the image and using trained models for analysis (Anorga, [0063]). 
While Wang as modified teaches the creation and use of graph neural networks, Wang as modified does not specifically teach adding a query node that describes the query to a first graph to produce a second graph that represents an updated version of the first graph, generating a query embedding associated with the query node using the graph neural network based on the second graph; identifying a target embedding that matches the query embedding; and retrieving a previously captured screenshot that is associated with the target embedding. However, Choubey is in the same field of endeavor, data analysis, and Choubey teaches: receiving a query (Choubey, figure 4A-4B, [0018-0019, 0074-0079], note receiving a query); adding a query node that describes the query to a first graph to produce a second graph that represents an updated version of the first graph (Choubey, figure 4A-4B, [0019, 0074-0079], note adding a query node to the graph, which would produce a second/updated graph); generating a query embedding associated with the query node using the trained graph neural network based on the second graph (Choubey, figures 4A-4B, [0019, 0074-0079], note generating a query representation, e.g., query embedding); identifying a target embedding that matches the query embedding (Choubey, figures 4A-4B, [0019, 0074-0081], note determining a portion of the graph associated with the query, e.g., identifying a target embedding); and retrieving a previously captured screenshot that is associated with the target embedding (Choubey, figures 4A-4B, [0019, 0077-0083], note transmitting the results to a computer device, when combined with the previously cited references this would be the images/screenshots as taught by Wang and Hansmann). 
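The disputed limitation is a retrieval loop: embed the query, find the closest stored target embedding, and return the screenshot tied to it. A minimal sketch of that matching step follows; the filenames, vectors, and the use of cosine similarity are illustrative assumptions for this page, not features attributed to Choubey or the other references.

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Target embeddings for previously captured screenshots, as a trained
# graph neural network would have produced them (values are made up).
targets = {
    "shot1.png": [0.9, 0.1, 0.0],
    "shot2.png": [0.1, 0.8, 0.3],
}

def answer_query(query_embedding):
    # Identify the target embedding that best matches the query
    # embedding, then retrieve the associated screenshot.
    return max(targets, key=lambda k: cosine(targets[k], query_embedding))

print(answer_query([0.85, 0.15, 0.05]))  # prints "shot1.png"
```

In the claimed system the query embedding itself would come from running the GNN over the second graph (the first graph plus the new query node); here it is just supplied directly to isolate the matching step.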
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Choubey because all references are directed to data analysis and because Choubey would expand upon the teachings of the previously cited references in data analysis which would improve the accuracy of the machine learned models by utilizing techniques that expose data that is relevant to a user's input (Choubey, [0001, 0010-0011]). Regarding Claim 15: Wang as modified shows the system as disclosed above; Wang as modified further teaches: wherein the adding the query node comprises: using a plurality of machine-trained models to identify query features of the query (Wang, figure 8, [0020, 0092, 0097], note feature extraction component identifies features using pre-trained or preexisting neural networks to generate a node embedding for the image) (Hansmann, figure 5, [0036, 0076], note images may be screenshots with associated text; note using a CNN-based encoder to analyze and extract image embeddings and using an RNN/BERT-based transformer to analyze and extract text embeddings, which are a plurality of machine-trained models used to identify features associated with the images. 
When combined with the previous reference this would be for the machine learning models used to extract features as taught by Wang) (Choubey, figure 4A-4B, [0019, 0074-0079], note adding a query node to the graph); using the query features to identify one or more links that connect the query node to one or more of the screenshot nodes in the first graph (Wang, figure 8, [0020, 0092, 0097], note feature extraction component identifies features using pre-trained or preexisting neural networks to generate a node embedding for the image) (Hansmann, figure 5, [0036, 0076], note images may be screenshots with associated text) (Choubey, figure 4A, [0019, 0074-0081], note determining a portion of the graph associated with the query); and adding one or more edges to the first graph associated with the one or more links (Choubey, figure 4A, [0019, 0074-0081], note adding a query node to the graph, note linking the node by adding edges to the other associated nodes; note determining a portion of the graph associated with the query). It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Choubey because all references are directed to data analysis and because Choubey would expand upon the teachings of the previously cited references in data analysis which would improve the accuracy of the machine learned models by utilizing techniques that expose data that is relevant to a user's input (Choubey, [0001, 0010-0011]). 
Regarding Claim 16: Wang as modified shows the system as disclosed above; Wang as modified further teaches: wherein the previously captured screenshot is associated with a target node in the second graph, and wherein the operations further include identifying neighbor nodes of the target node and retrieving information regarding one or more other screenshots that are associated with the neighbor nodes (Wang, figures 1 and 8, [0019-0022, 0045, 0048, 0090-0097], note receiving a collection of images; note the images may be captured and stored; note screenshots are images; note feature extraction component identifies features to generate node embeddings for a graph neural network; note determining relationships among nodes; note nodes are associated with the node embeddings of the images) (Choubey, figures 3 and 4A-4B, [0019, 0067, 0077-0083], note determining a portion of the graph associated with the query, e.g., identifying a target embedding, includes identifying neighbor nodes which is interpreted as information regarding one or more other screenshots associated with the neighbor nodes when combined with the previously cited references; note transmitting the results to a computer device. When combined with the previously cited references this would be the images/screenshots as taught by Wang and Hansmann). It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Choubey because all references are directed to data analysis and because Choubey would expand upon the teachings of the previously cited references in data analysis which would improve the accuracy of the machine learned models by utilizing techniques that expose data that is relevant to a user's input (Choubey, [0001, 0010-0011]). 
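Claim 16 adds a graph-traversal step: once the target node is matched, its neighbor nodes are identified so related screenshots can also be surfaced. A minimal sketch of that neighbor lookup follows, assuming an undirected edge set; the node names and the "text:" prefix convention are hypothetical, not taken from the cited references.

```python
# Undirected edge set for a toy "second graph"; names are illustrative.
edges = {("shotA", "shotB"), ("shotA", "text:invoice"), ("shotB", "shotC")}

def neighbors(node):
    # Collect every node sharing an edge with the given node.
    out = set()
    for a, b in edges:
        if a == node:
            out.add(b)
        elif b == node:
            out.add(a)
    return out

# After retrieving the screenshot for the matched target node ("shotA"),
# pull its neighbor nodes and keep only the screenshot nodes to surface
# information regarding the other, related screenshots.
related = {n for n in neighbors("shotA") if not n.startswith("text:")}
```

Here `related` contains only "shotB": the text neighbor is filtered out, leaving the one other screenshot connected to the target.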
Regarding Claim 18: Wang as modified shows the system as disclosed above; Wang as modified further teaches: wherein the trained graph neural network has parameters that have been trained by the one or more servers by: producing a pretrained model by performing pretraining based on a first set of training examples that include images and instances of text associated with the images, the first set of training examples including images that are not screenshot images (Wang, figure 8, [0092], note the feature extraction component may represent a pre-trained or preexisting neural network architecture to extract feature information from images; note the images do not have to be screenshots) (Bhat, figure 1, [0006-0007, 0021, 0025-0027], note generating text and image embeddings from elements of an image/screenshot; note the images do not have to be an explicit screenshot. When combined with the previously cited references this would be for the features extracted from images and therefore the pretrained and finetuned models as well) (Anorga, figure 9, [0174-0175], note the trained model may be generated on a different device and then provided to the application; note the trained model may be any type of neural network, e.g., graph neural network; note the images used for the model do not have to be screenshots. 
When combined with the previously cited references this would be for the trained graph neural network as taught above); and producing a finetuned model by performing finetuning on the pretrained model based on a second set of training examples that describe screenshots and instances of text associated with the screenshots, wherein the finetuned model is the graph neural network that is received by the computing system (Wang, figure 8, [0018-0022, 0091-0097], note producing a finetuned graph neural network for images/screenshots) (Hansmann, figure 5, [0036, 0076], note images may be screenshots of a user interface) (Bhat, figure 1, [0006-0007, 0021, 0025-0027], note generating text and image embeddings from elements of an image/screenshot. When combined with the previously cited references this would be for the features extracted from images and therefore the pretrained and finetuned models as well) (Anorga, figure 9, [0174-0175], note the trained model may be generated on a different device and then provided to the application; note the trained model may be any type of neural network, e.g., graph neural network. When combined with the previously cited references this would be for the trained graph neural network as taught above). It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis. 
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Bhat because all references are directed to data analysis and because Bhat would expand upon the teachings of the previously cited references in image and text analysis which would improve the effectiveness and accessibility of the system by utilizing a generative model for user interface analysis and improvement (Bhat, [0007-0008]). It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Anorga because all references are directed to data analysis and because Anorga would expand upon the teachings of the previously cited references in image and text analysis which would improve accuracy and speed of the system by identifying and utilizing different regions of the image and using trained models for analysis (Anorga, [0063]). Regarding Claim 23: Wang as modified shows the system as disclosed above; Wang as modified further teaches: wherein the first graph has first edges that connect screenshot nodes associated with screenshots having common image content, second graph edges that connect screenshot nodes associated with screenshots with common text content, and third edges that connect text nodes and associated screenshot nodes (Wang, figure 8, [0003, 0019-0021, 0071, 0083, 0093-0097], note determining relationships among nodes; note nodes are associated with the node embeddings of the images; note determining/deriving the edges/relationships is interpreted as a prescribed similarity test; note identifying common objects, which is interpreted to include common image content, common text, and common entities since images may comprise text and entities) (Hansmann, figure 5, [0036, 0041-0043, 0076], note determining and linking text and image relationships; note extracting image and text embeddings; note 
assigning nodes and edges; note edges are indicative of the common or similar relationships between two nodes; note determining similarities of represented objects and identifying commonly occurring intents/entities) (Bhat, figure 1, [0006, 0021, 0025-0027], note using a machine learning model to generate text and image embeddings from text and image elements of an image/screenshot; note embeddings may be nodes on a graph representation and that each element of the UI, e.g., text and image elements, may be an individual node) (Anorga, [0008-0012, 0059-0062, 0092-0093, 0174-0175], note using the trained model to determine common/similar categories for multiple images. When combined with the previously cited references this would be for the creation of the nodes and edges for the graph neural network as taught above, e.g., Wang and Hansmann teaching nodes of the graph represent the entity and the edges represent the relationship/similarity between the nodes). It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Hansmann because all references are directed to data analysis and because Hansmann would expand upon the teachings of the previously cited references in image and text analysis which would improve the analysis of the system by utilizing systems that are optimized for the particular type of analysis. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Bhat because all references are directed to data analysis and because Bhat would expand upon the teachings of the previously cited references in image and text analysis which would improve the effectiveness and accessibility of the system by utilizing a generative model for user interface analysis and improvement (Bhat, [0007-0008]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the cited references to incorporate the teachings of Anorga because all references are directed to data analysis and because Anorga would expand upon the teachings of the previously cited references in image and text analysis which would improve accuracy and speed of the system by identifying and utilizing different regions of the image and using trained models for analysis (Anorga, [0063]). Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Yao et al. (US2020/0394499) teaches training a graph neural network. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN J MORRIS whose telephone number is (571)272-3314. The examiner can normally be reached M-F 6:00-2:00 PM EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. 
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil can be reached at 571-270-0474. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /JOHN J MORRIS/Examiner, Art Unit 2152 2/13/2026 /NEVEEN ABEL JALIL/Supervisory Patent Examiner, Art Unit 2152

Prosecution Timeline

Jun 14, 2024
Application Filed
Aug 08, 2025
Non-Final Rejection — §103
Nov 06, 2025
Applicant Interview (Telephonic)
Nov 06, 2025
Examiner Interview Summary
Nov 08, 2025
Response Filed
Feb 13, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585666
CLOUD ENVIRONMENT DATA DISTRIBUTION
2y 5m to grant Granted Mar 24, 2026
Patent 12585630
METHOD AND APPARATUS FOR ANALYZING COVERAGE, BIAS, AND MODEL EXPLANATIONS IN LARGE DIMENSIONAL MODELING DATA
2y 5m to grant Granted Mar 24, 2026
Patent 12536137
VALIDATING DATA FOR INTEGRATION
2y 5m to grant Granted Jan 27, 2026
Patent 12530369
RESUME BACKUP OF EXTERNAL STORAGE DEVICE USING MULTI-ROOT SYSTEM
2y 5m to grant Granted Jan 20, 2026
Patent 12524397
AUTOMATED BATCH GENERATION AND SUBSEQUENT SUBMISSION AND MONITORING OF BATCHES PROCESSED BY A SYSTEM
2y 5m to grant Granted Jan 13, 2026
Based on 5 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
61%
Grant Probability
81%
With Interview (+20.1%)
4y 0m
Median Time to Grant
Moderate
PTA Risk
Based on 273 resolved cases by this examiner. Grant probability derived from career allow rate.
