DETAILED ACTION
This action is in response to the application filed 01/14/2026. Claims 1-20 are pending and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-3, 6-7, 11-13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Karimi et al. (“Spatiotemporal Graph Neural Network for Performance Prediction of Photovoltaic Power Systems”) (hereafter referred to as Karimi) in view of Jin et al. (“Spatio-Temporal Dual Graph Neural Networks for Travel Time Estimation”) (hereafter referred to as Jin), and further in view of Yang et al. (“NENN: Incorporate Node and Edge Features in Graph Neural Networks”) (hereafter referred to as Yang).
Regarding claim 1, Karimi teaches
A computing system, comprising: a processor; and a memory storing instructions executable by the processor to, during a run-time phase (Karimi, page 3, 2nd column, 4th paragraph, “The TensorFlow2 (Abadi et al. 2016) library was used for building the GNN model and the model was trained and tested on Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz, 48 GB memory, 12 CPU cores, and 12 GB Nvidia GeForce RTX 2080 GPU card” where “Acquisition and processing of real-time data at regular intervals are critical for power forecasting, so the deployment of a production pipeline will need to have an integrated and automated connection with the data management and inference modules” (Karimi, page 5, 1st column, 2nd paragraph). Examiner notes that the deployment of the pipeline is the run-time phase.),
receive run-time input data that includes time series data indicating a state of a graph network at each of a series of time steps, the graph network including a plurality of nodes, and at least one edge connecting pairs of the nodes (Karimi, page 3, 2nd column, 3rd paragraph, “The GNN model’s input data dimension is R^(H×N×C), where H is the number of previous data points of the timeseries used in the model, N is the total number of PV systems and C is the number of channels in the input” where “Figure 1: The distribution of PV systems across the United States and the graph model used to represent the relationships between the PV systems. (a) All PV systems in our dataset are shown by red circles on the map. (b,c) Graph showing the proximity of PV systems in which edges are included between PV systems that are proximate according to a low/moderate proximity threshold (ϵ = 0, ϵ = 0.5)” where “Acquisition and processing of real-time data at regular intervals are critical for power forecasting, so the deployment of a production pipeline will need to have an integrated and automated connection with the data management and inference modules” (Karimi, page 5, 1st column, 2nd paragraph). Examiner notes that the deployment of the pipeline is the run-time phase that uses similar input data to the training described above.),
and input the run-time input data into a trained graph neural network to thereby cause the graph neural network to output a predicted state of the graph network at one or more future time steps (Karimi, page 3, 2nd column, 3rd paragraph, “The GNN model’s input data dimension is R^(H×N×C), where H is the number of previous data points of the timeseries used in the model, N is the total number of PV systems and C is the number of channels in the input. The output of the model is of the form R^(M×N) where M is the number of future time points for which model will forecast” where “Acquisition and processing of real-time data at regular intervals are critical for power forecasting, so the deployment of a production pipeline will need to have an integrated and automated connection with the data management and inference modules” (Karimi, page 5, 1st column, 2nd paragraph). Examiner notes that the deployment of the pipeline is the run-time phase that uses similar input data to the training described above.),
wherein the graph neural network includes: a node spatial layer configured to receive, as input, the state of the graph network, and to output, for each node, an aggregate representation of a node neighborhood of the node (Karimi, page 3, 1st column, 3rd paragraph, “The convolutional layers in the GNN adopt a neighborhood aggregation architecture to learn a discriminative vector representation h(v) for each node v (called “node embedding”) across multiple transformation layers. The new layer hi(v) takes a node embedding hi-1(v) (h0(v) is X(v) from input feature matrix) and updates its embedding hi(v) by aggregating the embeddings from its neighbors.” Examiner notes that hi(v) is the node spatial layer.)
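For illustration only (not part of any cited reference or of the claims), the neighborhood-aggregation update quoted above from Karimi — a new embedding hi(v) formed by combining a node's prior embedding hi-1(v) with an aggregate of its neighbors' embeddings — can be sketched as follows. The mean aggregator and the 0.5/0.5 combination weights are illustrative assumptions, not Karimi's stated choices.

```python
import numpy as np

def aggregate_neighborhood(h, adjacency):
    """Mean-aggregate each node's neighbor embeddings, then combine
    with the node's own embedding (a generic GNN update pattern)."""
    # adjacency: (N, N) binary matrix; h: (N, D) node embeddings
    deg = adjacency.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                  # avoid division by zero for isolated nodes
    neighbor_mean = adjacency @ h / deg  # aggregate the neighbors' embeddings
    return 0.5 * (h + neighbor_mean)    # combine self and neighborhood (assumed weights)

# Tiny 3-node path graph: 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
h0 = np.array([[1.0], [2.0], [3.0]])  # initial embeddings h0(v) = X(v)
h1 = aggregate_neighborhood(h0, A)    # updated embeddings h1(v)
print(h1.ravel())
```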
Karimi does not teach
an edge spatial layer configured to receive, as input for each edge of the at least one edge, a representation of embedded edge features, from the node spatial layer, an aggregate representation of a first node neighborhood of a first node connected by the edge, and from the node spatial layer, an aggregate representation of a second node neighborhood of a second node connected by the edge,
and wherein the edge spatial layer is configured to output an aggregate representation of an edge neighborhood of the edge,
and a fully connected layer configured to receive output data from the node spatial layer and the edge spatial layer via a temporal gate,
and to combine the output data from the node spatial layer and the edge spatial layer with an input temporal state of the network to predict the state of the graph network at the one or more future time steps;
the node spatial layer comprises a sigmoidal function that operates on a weighted sum of the first node and the aggregate representation of the first node neighborhood;
or the edge spatial layer comprises a sigmoidal function that operates on a weighted sum of the edge and the aggregate representation of a second edge connecting a third and a fourth node.
However, Jin discloses
an edge spatial layer configured to receive, as input for each edge of the at least one edge, a representation of embedded edge features, from the node spatial layer, an aggregate representation of a first node neighborhood of a first node connected by the edge, and from the node spatial layer, an aggregate representation of a second node neighborhood of a second node connected by the edge (Jin, page 5, “Thus the dual graph convolution is formulated as: [equation image: media_image1.png] where θn*G and θe*G are respectively the node-wise and edge-wise graph convolution operation, Zn and Ze are respectively the node-wise and edge-wise latent representation, and P ∈ R^(|Vn|×|En|) is the incidence matrix that encodes the connections between nodes and edges, defined as: Pi,(i→j) = Pj,(i→j) = 1 and 0 otherwise” and Jin, page 4, Fig. 2 [media_image2.png], and Jin, page 5, Fig. 3 [media_image3.png]. Examiner notes that Pi is the aggregate representation of a first node neighborhood of a first node connected by the edge and Pj is the aggregate representation of a second node neighborhood of a second node connected by the edge. Examiner further notes that the edge spatial layer is the Edge-wise GCN and the temporal convolution from Figure 3 in the highlighted box, and that the node spatial layer is the Node-wise GCN and the temporal convolution from Figure 3 in the highlighted circle. Examiner additionally notes that the DGCN sends the representations to the edge-wise GCN in Figure 2, which is part of the edge spatial layer, and that the link representations in Figure 3 are the representation of embedded edge features for each edge.);
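For illustration only (not from Jin and not part of the claims), the role of the incidence matrix P quoted above — routing node-wise representations to the edges they touch — can be sketched as follows. The graph, feature values, and the use of a plain sum P.T @ Zn are illustrative assumptions; Jin's actual dual graph convolution involves learned convolution operations.

```python
import numpy as np

# Hypothetical 3-node graph with two directed edges: 0->1 and 1->2.
# P is a node-edge incidence matrix as described in the quoted passage:
# P[i, e] = P[j, e] = 1 when edge e connects nodes i and j, 0 otherwise.
P = np.array([[1, 0],
              [1, 1],
              [0, 1]], dtype=float)

Zn = np.array([[1.0], [2.0], [4.0]])  # node-wise latent representations

# P.T @ Zn sums the two endpoint representations for each edge,
# giving the edge-wise layer its node-derived input.
edge_input = P.T @ Zn
print(edge_input.ravel())  # edge 0->1 receives 1+2, edge 1->2 receives 2+4
```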
and a fully connected layer configured to receive output data from the node spatial layer and the edge spatial layer via a temporal gate (Jin, page 4, Figure 2 [media_image2.png]. Examiner notes that the DGCN is the fully connected layer and the TCN is the temporal gate.),
and to combine the output data from the node spatial layer and the edge spatial layer with an input temporal state of the network to predict the state of the graph network at the one or more future time steps (Jin, page 4, Figure 2, and Jin, page 3, Figure 1 [media_image2.png, media_image4.png]. Examiner notes that the spatio-temporal dual graph learning layer in Figures 1 and 2 combines the output data. Examiner further notes that the historical trajectories in Figure 1 are the input temporal state of the network, and the travel time of each link/intersection and the travel time of the entire path are the state of the graph network at the one or more future time steps.).
the node spatial layer comprises a sigmoidal function that operates on a weighted sum of the first node and the aggregate representation of the first node neighborhood (Jin, page 5, “Thus the dual graph convolution is formulated as: [equation image: media_image1.png] where θn*G and θe*G are respectively the node-wise and edge-wise graph convolution operation, Zn and Ze are respectively the node-wise and edge-wise latent representation, and P ∈ R^(|Vn|×|En|) is the incidence matrix that encodes the connections between nodes and edges, defined as: Pi,(i→j) = Pj,(i→j) = 1 and 0 otherwise” where “to capture the spatial dependencies, we select the simple graph convolution approach to aggregate information of 1-hop neighbors” (Jin, page 5, 1st column, last paragraph) and where “we adopt a dual directional graph convolution to make full [use] of directed graphs which is defined as [equation image: media_image5.png] Then, we use temporal convolution network to capture temporal dependency of both intersections and links respectively as shown in Fig. 3. We adopt gated mechanism into the temporal convolution network which is proven an effective approach to control the reserved information [40]. The temporal convolution operation is formulated as follows: [equation image: media_image6.png] …. Empirically, Tanh function can be selected as σ1(•) and Sigmoid function is usually selected as σ2(•) to control the ratio of information passed” (Jin, page 5, 2nd column, 1st-2nd paragraphs) and Jin, page 5, Fig. 3 [media_image3.png]. Examiner notes that Pi is the aggregate representation of a first node neighborhood of a first node. Examiner further notes that equation 7 is the weighted sum, with W being the weights. Additionally, Examiner notes that the sigmoid function in the temporal convolution operates on the outputs of the node-wise GCN and the intersection representations, and thus is a sigmoid function operating on the weighted sum and the aggregate representation of the first node. Examiner further notes that the node spatial layer is the node-wise GCN and the temporal convolution highlighted in a box in Figure 3.)
or the edge spatial layer comprises a sigmoidal function that operates on a weighted sum of the edge and the aggregate representation of a second edge connecting a third node and a fourth node (Jin, page 5, “Thus the dual graph convolution is formulated as: [equation image: media_image1.png] where θn*G and θe*G are respectively the node-wise and edge-wise graph convolution operation, Zn and Ze are respectively the node-wise and edge-wise latent representation, and P ∈ R^(|Vn|×|En|) is the incidence matrix that encodes the connections between nodes and edges, defined as: Pi,(i→j) = Pj,(i→j) = 1 and 0 otherwise” where “to capture the spatial dependencies, we select the simple graph convolution approach to aggregate information of 1-hop neighbors” (Jin, page 5, 1st column, last paragraph) and where “we adopt a dual directional graph convolution to make full [use] of directed graphs which is defined as [equation image: media_image5.png] Then, we use temporal convolution network to capture temporal dependency of both intersections and links respectively as shown in Fig. 3. We adopt gated mechanism into the temporal convolution network which is proven an effective approach to control the reserved information [40]. The temporal convolution operation is formulated as follows: [equation image: media_image6.png] …. Empirically, Tanh function can be selected as σ1(•) and Sigmoid function is usually selected as σ2(•) to control the ratio of information passed” (Jin, page 5, 2nd column, 1st-2nd paragraphs) and Jin, page 5, Fig. 3 [media_image3.png]. Examiner notes that Pi,(i→j) is the aggregate representation of a second edge connecting a third node and a fourth node. Examiner further notes that equation 7 is the weighted sum, with W being the weights. Additionally, Examiner notes that the sigmoid function in the temporal convolution operates on the outputs of the edge-wise GCN and the link representations, and thus is a sigmoid function operating on the weighted sum and the aggregate representation of a second edge connecting a third node and a fourth node. Examiner further notes that the edge spatial layer is the edge-wise GCN and the temporal convolution highlighted in a circle in Figure 3.)
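For illustration only (not from Jin and not part of the claims), the gated temporal convolution quoted above — a tanh branch multiplied elementwise by a sigmoid gate that controls the ratio of information passed — can be sketched as follows. The linear maps standing in for the convolution filters, the shapes, and the random inputs are illustrative assumptions.

```python
import numpy as np

def gated_temporal_block(x, W1, W2):
    """Gated activation as in the quoted passage: a tanh branch (sigma1)
    carries content, a sigmoid branch (sigma2) gates how much passes."""
    sigmoid = 1.0 / (1.0 + np.exp(-(x @ W2)))
    return np.tanh(x @ W1) * sigmoid

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # 4 time steps, 3 features (assumed shapes)
W1 = rng.normal(size=(3, 3))  # stand-in for the tanh-branch filter
W2 = rng.normal(size=(3, 3))  # stand-in for the gate-branch filter
y = gated_temporal_block(x, W1, W2)

# Because tanh lies in (-1, 1) and the sigmoid gate in (0, 1),
# every gated output stays strictly inside (-1, 1).
assert np.all(np.abs(y) < 1.0)
```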
Karimi and Jin are considered analogous to the claimed invention because they both describe a spatio-temporal graph neural network. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Karimi to include an edge spatial layer to perform a prediction like in Jin. Doing so is advantageous because they “fully exploit multi-level spatio-temporal information to improve the quality of the final representations from spatio-temporal learning module” (Jin, page 2, 1st column, point (2)).
Karimi and Jin do not teach, but Yang does teach
and wherein the edge spatial layer is configured to output an aggregate representation of an edge neighborhood of the edge (Yang, page 8, Algorithm 1 [media_image7.png]. Examiner notes that lines 8-9 of Algorithm 1 show outputting an aggregate representation for each edge.).
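For illustration only (not Yang's Algorithm 1 and not part of the claims), the idea of an "edge neighborhood" — the other edges sharing an endpoint with a given edge — and a mean aggregation over it can be sketched as follows. The graph, features, and mean aggregator are illustrative assumptions.

```python
import numpy as np

def aggregate_edge_neighborhood(edge_feats, edges):
    """For each edge, mean-aggregate the features of every other edge
    that shares an endpoint with it (its edge neighborhood)."""
    num_edges = len(edges)
    out = np.zeros_like(edge_feats)
    for a in range(num_edges):
        # neighbors of edge a: distinct edges sharing at least one node
        nbrs = [b for b in range(num_edges)
                if b != a and set(edges[a]) & set(edges[b])]
        out[a] = edge_feats[nbrs].mean(axis=0) if nbrs else edge_feats[a]
    return out

edges = [(0, 1), (1, 2), (2, 3)]        # edges of a 4-node path graph
feats = np.array([[1.0], [3.0], [7.0]])  # one feature per edge
agg = aggregate_edge_neighborhood(feats, edges)
print(agg.ravel())  # middle edge averages its two neighbors: (1+7)/2
```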
Karimi, Jin, and Yang are considered analogous to the claimed invention because they incorporate node and edge features into a graph neural network. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Karimi in view of Jin to include an edge spatial layer like in Yang. Doing so is advantageous because they “incorporate[] node and [edge] features to enhance the node and edge embeddings across neural network layer[s]” (Yang, page 13, Conclusion).
Regarding claim 2, Karimi in view of Jin and Yang teaches the computing system of claim 1. Karimi further teaches
wherein the instructions are further executable to, during a training phase: receive training data that includes time series data indicating a state of the graph network at each of a series of historical time steps (Karimi, page 3, 2nd column, 3rd paragraph, “The GNN model’s input data dimension is R^(H×N×C), where H is the number of previous data points of the timeseries used in the model, N is the total number of PV systems and C is the number of channels in the input” where “Figure 1: The distribution of PV systems across the United States and the graph model used to represent the relationships between the PV systems. (a) All PV systems in our dataset are shown by red circles on the map. (b,c) Graph showing the proximity of PV systems in which edges are included between PV systems that are proximate according to a low/moderate proximity threshold (ϵ = 0, ϵ = 0.5)”);
and train the graph neural network using the training data to output the predicted state of the graph network at the one or more future time steps (Karimi, page 3, 2nd column, 3rd paragraph, “The GNN model’s input data dimension is R^(H×N×C), where H is the number of previous data points of the timeseries used in the model, N is the total number of PV systems and C is the number of channels in the input. The output of the model is of the form R^(M×N) where M is the number of future time points for which model will forecast.”).
Regarding claim 3, Karimi in view of Jin and Yang teaches the computing system of claim 1. Karimi further teaches
wherein the graph network comprises an energy distribution graph network, wherein the nodes represent a plurality of energy generation and/or energy consumption subsystems, and wherein the at least one edge represents an energy distribution linkage between the respective subsystems of each node (Karimi, page 3, 2nd column, 3rd paragraph, “The GNN model’s input data dimension is R^(H×N×C), where H is the number of previous data points of the timeseries used in the model, N is the total number of PV systems and C is the number of channels in the input” where “Figure 1: The distribution of PV systems across the United States and the graph model used to represent the relationships between the PV systems. (a) All PV systems in our dataset are shown by red circles on the map. (b,c) Graph showing the proximity of PV systems in which edges are included between PV systems that are proximate according to a low/moderate proximity threshold (ϵ = 0, ϵ = 0.5).” Examiner notes that the nodes represent the PV systems, which generate energy, and the edges represent the proximity to another PV system.).
Regarding claim 6, Karimi in view of Jin and Yang teaches the computing system of claim 1. Karimi does not teach, but Jin does teach
wherein each state of the graph network includes a plurality of node features and a plurality of edge features, which are variable between each state (Jin, page 2, “We build node-wise and edge-wise graphs to respectively characterize the features of both intersections and road segments and present a spatio-temporal dual graph learning layer to capture the independence and interactive latent correlations from them. Further, we propose a novel multi-scale architecture to fully exploit multi-level spatio-temporal information to improve the quality of the final representations from spatio-temporal learning module” where “cn(l) and ce(l) are respectively the hidden state of node-wise graph and edge-wise graph after we processed the information from (l-1)-th layer, and its initial value cn(-1) = 0 and ce(-1) = 0.” Examiner notes that the hidden states are the states of the graph network and are dependent on the variable features of intersections, or nodes, and road segments, or edges.).
Karimi and Jin are considered analogous to the claimed invention because they both describe a spatio-temporal graph neural network. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Karimi to include variable states of the node features and edge features like in Jin. Doing so is advantageous because they “respectively characterize the features of both intersections and road segments and present a spatio-temporal dual graph learning layer to capture the independence and interactive latent correlations from them” (Jin, page 2, 1st column, point (2)).
Regarding claim 7, Karimi in view of Jin and Yang teaches the computing system of claim 1. Karimi further teaches
wherein each state of the graph network further comprises adjacency information (Karimi, page 3, 2nd column, 3rd paragraph, “The GNN model’s input data dimension is R^(H×N×C), where H is the number of previous data points of the timeseries used in the model, N is the total number of PV systems and C is the number of channels in the input” where “Figure 1: The distribution of PV systems across the United States and the graph model used to represent the relationships between the PV systems. (a) All PV systems in our dataset are shown by red circles on the map. (b,c) Graph showing the proximity of PV systems in which edges are included between PV systems that are proximate according to a low/moderate proximity threshold (ϵ = 0, ϵ = 0.5).” Examiner notes that by showing the proximity of PV systems, the graph network provides adjacency information.).
Regarding claim 11, Karimi teaches
At a computing device, a method for predicting a future state of a graph neural network, the method comprising (Karimi, page 3, 2nd column, 3rd paragraph, “The GNN model’s input data dimension is R^(H×N×C), where H is the number of previous data points of the timeseries used in the model, N is the total number of PV systems and C is the number of channels in the input. The output of the model is of the form R^(M×N) where M is the number of future time points for which model will forecast”.):
During a run-time phase, receiving run-time input data that includes time series data indicating a state of a graph network at each of a series of time steps, the graph network including a plurality of nodes, and at least one edge connecting pairs of the nodes (Karimi, page 3, 2nd column, 3rd paragraph, “The GNN model’s input data dimension is R^(H×N×C), where H is the number of previous data points of the timeseries used in the model, N is the total number of PV systems and C is the number of channels in the input” where “Figure 1: The distribution of PV systems across the United States and the graph model used to represent the relationships between the PV systems. (a) All PV systems in our dataset are shown by red circles on the map. (b,c) Graph showing the proximity of PV systems in which edges are included between PV systems that are proximate according to a low/moderate proximity threshold (ϵ = 0, ϵ = 0.5)” where “Acquisition and processing of real-time data at regular intervals are critical for power forecasting, so the deployment of a production pipeline will need to have an integrated and automated connection with the data management and inference modules” (Karimi, page 5, 1st column, 2nd paragraph). Examiner notes that the deployment of the pipeline is the run-time phase that uses similar input data to the training described above.),
and inputting the run-time input data into a trained graph neural network to thereby cause the graph neural network to output a predicted state of the graph network at one or more future time steps (Karimi, page 3, 2nd column, 3rd paragraph, “The GNN model’s input data dimension is R^(H×N×C), where H is the number of previous data points of the timeseries used in the model, N is the total number of PV systems and C is the number of channels in the input. The output of the model is of the form R^(M×N) where M is the number of future time points for which model will forecast” where “Acquisition and processing of real-time data at regular intervals are critical for power forecasting, so the deployment of a production pipeline will need to have an integrated and automated connection with the data management and inference modules” (Karimi, page 5, 1st column, 2nd paragraph). Examiner notes that the deployment of the pipeline is the run-time phase that uses similar input data to the training described above.),
wherein the graph neural network includes: a node spatial layer configured to receive, as input, the state of the graph network, and to output, for each node, an aggregate representation of a node neighborhood of the node (Karimi, page 3, 1st column, 3rd paragraph, “The convolutional layers in the GNN adopt a neighborhood aggregation architecture to learn a discriminative vector representation h(v) for each node v (called “node embedding”) across multiple transformation layers. The new layer hi(v) takes a node embedding hi-1(v) (h0(v) is X(v) from input feature matrix) and updates its embedding hi(v) by aggregating the embeddings from its neighbors.” Examiner notes that hi(v) is the node spatial layer.);
Karimi does not teach
an edge spatial layer configured to receive, as input for each edge of the at least one edge, a representation of embedded edge features, from the node spatial layer, an aggregate representation of a first node neighborhood of a first node connected by the edge, and from the node spatial layer, an aggregate representation of a second node neighborhood of a second node connected by the edge,
and wherein the edge spatial layer is configured to output an aggregate representation of an edge neighborhood of the edge;
and a fully connected layer configured to receive output data from the node spatial layer and the edge spatial layer via a temporal gate,
and to combine the output data from the node spatial layer and the edge spatial layer with an input temporal state of the network to predict the state of the graph network at the one or more future time steps;
the node spatial layer comprises a sigmoidal function that operates on a weighted sum of the first node and the aggregate representation of the first node neighborhood;
or the edge spatial layer comprises a sigmoidal function that operates on a weighted sum of the edge and the aggregate representation of a second edge connecting a third and a fourth node.
Jin, however, discloses
an edge spatial layer configured to receive, as input for each edge of the at least one edge, a representation of embedded edge features, from the node spatial layer, an aggregate representation of a first node neighborhood of a first node connected by the edge, and from the node spatial layer, an aggregate representation of a second node neighborhood of a second node connected by the edge (Jin, page 5, “Thus the dual graph convolution is formulated as: [equation image: media_image1.png] where θn*G and θe*G are respectively the node-wise and edge-wise graph convolution operation, Zn and Ze are respectively the node-wise and edge-wise latent representation, and P ∈ R^(|Vn|×|En|) is the incidence matrix that encodes the connections between nodes and edges, defined as: Pi,(i→j) = Pj,(i→j) = 1 and 0 otherwise” and Jin, page 4, Fig. 2 [media_image2.png], and Jin, page 5, Fig. 3 [media_image3.png]. Examiner notes that Pi is the aggregate representation of a first node neighborhood of a first node connected by the edge and Pj is the aggregate representation of a second node neighborhood of a second node connected by the edge. Examiner further notes that the edge spatial layer is the Edge-wise GCN and the temporal convolution from Figure 3 in the highlighted box, and that the node spatial layer is the Node-wise GCN and the temporal convolution from Figure 3 in the highlighted circle. Examiner additionally notes that the DGCN sends the representations to the edge-wise GCN in Figure 2, which is part of the edge spatial layer, and that the link representations in Figure 3 are the representation of embedded edge features for each edge.);
and a fully connected layer configured to receive output data from the node spatial layer and the edge spatial layer via a temporal gate (Jin, page 4, Figure 2 [media_image2.png]. Examiner notes that the DGCN is the fully connected layer and the TCN is the temporal gate.),
and to combine the output data from the node spatial layer and the edge spatial layer with an input temporal state of the network to predict the state of the graph network at the one or more future time steps (Jin, page 4, Figure 2, and Jin, page 3, Figure 1 [media_image2.png, media_image4.png]. Examiner notes that the spatio-temporal dual graph learning layer in Figures 1 and 2 combines the output data. Examiner further notes that the historical trajectories in Figure 1 are the input temporal state of the network, and the travel time of each link/intersection and the travel time of the entire path are the state of the graph network at the one or more future time steps.).
the node spatial layer comprises a sigmoidal function that operates on a weighted sum of the first node and the aggregate representation of the first node neighborhood (Jin, page 5, “Thus the dual graph convolution is formulated as: [equation image: media_image1.png] where θn*G and θe*G are respectively the node-wise and edge-wise graph convolution operation, Zn and Ze are respectively the node-wise and edge-wise latent representation, and P ∈ R^(|Vn|×|En|) is the incidence matrix that encodes the connections between nodes and edges, defined as: Pi,(i→j) = Pj,(i→j) = 1 and 0 otherwise” where “to capture the spatial dependencies, we select the simple graph convolution approach to aggregate information of 1-hop neighbors” (Jin, page 5, 1st column, last paragraph) and where “we adopt a dual directional graph convolution to make full [use] of directed graphs which is defined as [equation image: media_image5.png] Then, we use temporal convolution network to capture temporal dependency of both intersections and links respectively as shown in Fig. 3. We adopt gated mechanism into the temporal convolution network which is proven an effective approach to control the reserved information [40]. The temporal convolution operation is formulated as follows: [equation image: media_image6.png] …. Empirically, Tanh function can be selected as σ1(•) and Sigmoid function is usually selected as σ2(•) to control the ratio of information passed” (Jin, page 5, 2nd column, 1st-2nd paragraphs) and Jin, page 5, Fig. 3 [media_image3.png]. Examiner notes that Pi is the aggregate representation of a first node neighborhood of a first node. Examiner further notes that equation 7 is the weighted sum, with W being the weights. Additionally, Examiner notes that the sigmoid function in the temporal convolution operates on the outputs of the node-wise GCN and the intersection representations, and thus is a sigmoid function operating on the weighted sum and the aggregate representation of the first node. Examiner further notes that the node spatial layer is the node-wise GCN and the temporal convolution highlighted in a box in Figure 3.)
or the edge spatial layer comprises a sigmoidal function that operates on a weighted sum of the edge and the aggregate representation of a second edge connecting a third node and a fourth node (Jin, page 5, “Thus the dual graph convolution is formulated as:
[Equation image: media_image1.png]
where θn*G and θe*G are respectively the node-wise and edge-wise graph convolution operation, Zn and Ze are respectively the node-wise and edge-wise latent representation, and
P ∈ R^(|Vn| × |En|)
is the incidence matrix that encodes the connections between nodes and edges, defined as: Pi, (i→j) = Pj, (i→j) = 1 and 0 otherwise” where “to capture the spatial dependencies, we select the simple graph convolution approach to aggregate information of 1-hop neighbors” (Jin, page 5, 1st column, last paragraph) and where “we adopt a dual directional graph convolution to make full of directed graphs which is defined as
[Equation image: media_image5.png]
Then, we use temporal convolution network to capture temporal dependency of both intersections and links respectively as shown in Fig. 3. We adopt gated mechanism into the temporal convolution network which is proven an effective approach to control the reserved information [40]. The temporal convolution operation is formulated as follows:
[Equation image: media_image6.png]
….Empirically, Tanh function can be selected as σ1(•) and Sigmoid function is usually selected as σ2(•) to control the ratio of information passed” (Jin, page 5, 2nd column, 1st-2nd paragraph) and Jin, page 5, Fig. 3
[Figure image: media_image3.png]
Examiner notes that the Pi, (i→j) is the aggregate representation of a second edge connecting a third node and a fourth node. Examiner further notes equation 7 is the weighted sum with W being the weights. Additionally, Examiner notes that the sigmoid function which is in the temporal convolution operates on the outputs of edge-wise GCN and link representations which thus is the sigmoid function operating on the weighted sum and aggregate representation of a second edge connecting a third node and a fourth node. Examiner further notes that the edge spatial layer is the edge-wise GCN and the temporal convolution highlighted in a circle in Figure 3.)
Karimi and Jin are considered analogous to the claimed invention because they both describe a spatio-temporal graph neural network. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Karimi to include an edge spatial layer to perform a prediction like in Jin. Doing so is advantageous because they “fully exploit multi-level spatio-temporal information to improve the quality of the final representations from spatio-temporal learning module” (Jin, page 2, 1st column, point (2)).
Karimi and Jin do not teach, but Yang does teach
and wherein the edge spatial layer is configured to output an aggregate representation of an edge neighborhood of the edge (Yang, page 8, Algorithm 1,
[Figure image: media_image7.png]
Examiner notes that lines 8-9 of Algorithm 1 show outputting an aggregate representation for each edge.)
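For illustration only (not Yang's actual update rule from Algorithm 1), outputting an aggregate representation for each edge over its edge neighborhood — the set of edges sharing an endpoint with it — may be sketched as follows; the mean aggregator is a hypothetical choice:

```python
def edge_neighborhood_aggregate(edges, edge_feats):
    """For each edge (u, v), aggregate (element-wise mean) the features of
    all other edges sharing an endpoint with it -- its edge neighborhood."""
    out = {}
    for i, (u, v) in enumerate(edges):
        nbr = [edge_feats[j] for j, (a, b) in enumerate(edges)
               if j != i and {a, b} & {u, v}]        # edges incident to u or v
        if nbr:
            out[(u, v)] = [sum(col) / len(nbr) for col in zip(*nbr)]
        else:
            out[(u, v)] = list(edge_feats[i])        # isolated edge: keep own features
    return out

# toy path graph 0-1-2-3 with 2-dimensional edge features
edges = [(0, 1), (1, 2), (2, 3)]
feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
agg = edge_neighborhood_aggregate(edges, feats)
```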
Karimi, Jin, and Yang are considered analogous to the claimed invention because they incorporate node and edge features into a graph neural network. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Karimi in view of Jin to include an edge spatial layer like in Yang. Doing so is advantageous because they “incorporate[] node and features to enhance the node and edge embeddings across neural network layer” (Yang, page 13, Conclusion).
Regarding claim 12, claim 12 recites substantially similar limitations to claim 2, and is therefore rejected under the same analysis.
Regarding claim 13, claim 13 recites substantially similar limitations to claim 3, and is therefore rejected under the same analysis.
Regarding claim 15, claim 15 recites substantially similar limitations to claim 7, and is therefore rejected under the same analysis.
Claim(s) 4-5, 14, and 19-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Karimi in view of Jin in further view of Yang, Liu et al. (“Graph Neural Networks for Learning Real-Time Prices in Electricity Market”) (hereafter referred to as Liu), and Mehta et al. (“Graph Theory Based Online Optimal Power Flow Control of Power Grid With Distributed Flexible AC Transmission Systems (D-FACTS) Devices”) (hereafter referred to as Mehta).
Regarding claim 4, Karimi in view of Jin and Yang teach the computing system of claim 3. Karimi in view of Jin and Yang do not teach
for each node, an energy price and a rate of energy generation or energy consumption at that node;
and for each edge, an energy transmission rate and an energy transmission capacity
Liu, however, does teach
for each node, an energy price and a rate of energy generation or energy consumption at that node; (Liu, page 2, first column, first paragraph “Consider a power grid modeled by an undirected graph G = (V; E). The node set V consists of N nodes, each connected to loads or generators, while the edge set E ∈ V x V includes transmission lines or transformers. Let p, q ∈ R^N collect the nodal active and reactive power injections, respectively; and similarly for voltage v ∈ R^N” and “LMPs [locational marginal prices] are market signals used by each generator or demand to determine the flexible power injection in order to minimize its own cost” (Liu, page 2, 2nd column, 2nd paragraph). Examiner notes that the LMP is the energy price and the flexible power injection is the rate of energy generation.)
Karimi, Jin, Yang, and Liu are considered analogous to the claimed invention because they predict using graph neural networks. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Karimi in view of Jin and Yang to have each node represent an energy price and a rate of energy generation. Doing so is advantageous because “This physics-aware approach not only capitalizes on the locality property of LMPs, but also motivates meaningful regularization on the feasibility of OPF line limits” (Liu, page 1, 2nd column, last paragraph).
Karimi, Jin, Yang, and Liu do not teach, but Mehta does teach
and for each edge, an energy transmission rate and an energy transmission capacity (Mehta, page 2, 1st column, 1st paragraph, “We define a flow to be a function on the edges of the graph that satisfies the capacity constraint 0 ≤ f(u,v) ≤ c(u,v).”).
Karimi, Jin, Yang, Liu, and Mehta are considered analogous to the claimed invention because they predict using graph networks. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Karimi in view of Jin, Yang, and Liu to have each edge represent an energy transmission rate. Doing so is advantageous because “the architecture optimizes power flow losses and maintains the voltage margin as well as the optimal line capacity” (Mehta, page 1, 2nd column, 2nd to last paragraph).
Regarding claim 5, Karimi in view of Jin, Yang, Liu, and Mehta teach the computing system of claim 4. Karimi in view of Jin, Yang, Liu, and Mehta further teach
wherein the energy transmission rate is constrained by the energy transmission capacity (Mehta, page 2, 1st column, 1st paragraph, “We define a flow to be a function on the edges of the graph that satisfies the capacity constraint 0 ≤ f(u,v) ≤ c(u,v).”).
Karimi, Jin, Yang, Liu, and Mehta are considered analogous to the claimed invention because they predict using graph networks. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Karimi in view of Jin, Yang, and Liu to have each edge represent an energy transmission rate constrained by the energy transmission capacity. Doing so is advantageous because “the architecture optimizes power flow losses and maintains the voltage margin as well as the optimal line capacity” (Mehta, page 1, 2nd column, 2nd to last paragraph).
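For illustration only, Mehta's quoted capacity constraint, 0 ≤ f(u,v) ≤ c(u,v), reduces to a simple per-edge feasibility check; the function and data below are hypothetical:

```python
def flow_feasible(flow, capacity):
    """Mehta-style capacity constraint: the transmission rate f(u, v) on every
    edge must satisfy 0 <= f(u, v) <= c(u, v), the transmission capacity."""
    return all(0.0 <= flow[e] <= capacity[e] for e in flow)

# toy two-line network
capacity = {("a", "b"): 10.0, ("b", "c"): 5.0}
ok = flow_feasible({("a", "b"): 7.0, ("b", "c"): 5.0}, capacity)    # within limits
bad = flow_feasible({("a", "b"): 12.0, ("b", "c"): 1.0}, capacity)  # exceeds c(a, b)
```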
Regarding claim 14, claim 14 recites substantially similar limitations to claim 4, and is therefore rejected under the same analysis.
Claim(s) 8 and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Karimi in view of Jin in further view of Yang and Zhao et al. (“T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction”) (hereafter referred to as Zhao).
Regarding claim 8, Karimi, Jin, and Yang teach the computing system of claim 1. Karimi, Jin, and Yang do not teach, but Zhao does teach
wherein the temporal gate comprises a gated recurrent unit (GRU) or a long short-term memory (LSTM) (Zhao, page 4, Overview “In this section, we describe how to use the T-GCN model to realize the traffic forecasting task based on the urban roads. Specifically, the T-GCN model consists of two parts: the graph convolutional network and the gated recurrent unit. As shown in Figure 3, we first use the historical n time series data as input and the graph convolution network for obtaining the spatial features. Second, the obtained time series with spatial features are input into the gated recurrent unit model and the dynamic change is obtained by information transmission between the units, to capture temporal features. Finally we get results through the fully connected layer” and Zhao, page 4, Figure 3,
[Figure image: media_image8.png]
Examiner notes that the temporal feature layer in Figure 3 is the temporal gate comprising a GRU.)
Karimi, Jin, Yang and Zhao are considered analogous to the claimed invention because they use temporal gates in graph neural networks. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Karimi, Jin, and Yang to use a GRU in the temporal gate. Doing so is advantageous because “the graph convolutional network is used to capture the topological structure of the road network for modeling spatial dependence. The gated recurrent unit is used to capture the dynamic change of traffic data on the roads for temporal dependence. The T-GCN model can also be applied to other spatio-temporal forecasting tasks” (Zhao, page 2, 1st column, point (1)).
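For illustration only (not Zhao's T-GCN implementation), the quoted pattern — a graph convolution for spatial features feeding a GRU that serves as the temporal gate — may be sketched with scalar per-node states; the mean-aggregation convolution and unit gate weights are hypothetical choices:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def graph_conv(adj, x):
    """Mean-aggregation graph convolution step: each node averages its own
    signal with its neighbors' signals (spatial feature extraction)."""
    n = len(x)
    return [(x[i] + sum(x[j] for j in adj[i])) / (1 + len(adj[i])) for i in range(n)]

def gru_step(h, x, wz, wr, wh):
    """Scalar GRU cell acting as the temporal gate for one node."""
    z = sigmoid(wz * (x + h))                 # update gate
    r = sigmoid(wr * (x + h))                 # reset gate
    h_tilde = math.tanh(wh * (x + r * h))     # candidate state
    return (1 - z) * h + z * h_tilde

# two nodes joined by an edge; spatial step, then temporal gate, per time step
adj = {0: [1], 1: [0]}
series = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]   # node signals at 3 time steps
h = [0.0, 0.0]
for x_t in series:
    s = graph_conv(adj, x_t)                                     # spatial features
    h = [gru_step(h[i], s[i], 1.0, 1.0, 1.0) for i in range(2)]  # temporal gate
```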
Regarding claim 16, claim 16 recites substantially similar limitations to claim 8, and is therefore rejected under the same analysis.
Claim(s) 19-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Karimi in view of Liu, Jin, and Yang.
Regarding claim 19, Karimi teaches
A computing system, comprising: a processor; and a memory storing instructions executable by the processor to, during a run-time phase (Karimi, page 3, 2nd column, 4th paragraph, “The TensorFlow2 (Abadi et al. 2016) library was used for building the GNN model and the model was trained and tested on Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz, 48 GB memory, 12 CPU cores, and 12 GB Nvidia GeForce RTX 2080 GPU card” where “Acquisition and processing of real-time data at regular intervals are critical for power forecasting, so the deployment of a production pipeline will need to have an integrated and automated connection with the data management and inference modules” (Karimi, page 5, 1st column, 2nd paragraph). Examiner notes that the deployment of the pipeline is the run-time phase.),
receive run-time input data that includes time series data indicating a state of an energy distribution graph network at each of a series of time steps, the energy distribution graph network including nodes representing a plurality of energy generation and/or energy consumption subsystems, and at least one edge connecting pairs of the nodes, the edge representing an energy distribution linkage between the respective subsystems of each node (Karimi, page 3, 2nd column, 3rd paragraph, “The GNN model’s input data dimension is
R^(H×N×C), where H is the number of previous data points of the timeseries used in the model, N is the total number of PV systems and C is the number of channels in the input” where “Figure 1: The distribution of PV systems across the United States and the graph model used to represent the relationships between the PV systems. (a) All PV systems in our dataset are shown by red circles on the map. (b,c) Graph showing the proximity of PV systems in which edges are included between PV systems that are proximate according to a low/moderate proximity threshold (ϵ = 0, ϵ = 0.5)” where “Acquisition and processing of real-time data at regular intervals are critical for power forecasting, so the deployment of a production pipeline will need to have an integrated and automated connection with the data management and inference modules” (Karimi, page 5, 1st column, 2nd paragraph). Examiner notes that the deployment of the pipeline is the run-time phase that uses similar input data to the training described above. Examiner further notes that the nodes represent the PV systems which generate energy and the edges represent the proximity to another PV system.),
and input the run-time input data into a trained graph neural network to thereby cause the graph neural network to output a predicted state of the energy distribution graph network at one or more future time steps (Karimi, page 3, 2nd column, 3rd paragraph, “The GNN model’s input data dimension is
R^(H×N×C), where H is the number of previous data points of the timeseries used in the model, N is the total number of PV systems and C is the number of channels in the input. The output of the model is of the form R^(M×N) where M is the number of future time points for which model will forecast” where “Acquisition and processing of real-time data at regular intervals are critical for power forecasting, so the deployment of a production pipeline will need to have an integrated and automated connection with the data management and inference modules” (Karimi, page 5, 1st column, 2nd paragraph). Examiner notes that the deployment of the pipeline is the run-time phase that uses similar input data to the training described above.),
wherein the graph neural network includes; a node spatial layer configured to receive, as input, the state of the graph network, and to output, for each node, an aggregate representation of a node neighborhood of the node (Karimi, page 3, 1st column, 3rd paragraph, “The convolutional layers in the GNN adopt a neighborhood aggregation architecture to learn a discriminative vector representation h(v) for each node v (called “node embedding”) across multiple transformation layers. The new layer hi(v) takes a node embedding hi-1(v) (h0(v) is X(v) from input feature matrix) and updates its embedding hi(v) by aggregating the embeddings from its neighbors” where “Acquisition and processing of real-time data at regular intervals are critical for power forecasting, so the deployment of a production pipeline will need to have an integrated and automated connection with the data management and inference modules” (Karimi, page 5, 1st column, 2nd paragraph). Examiner notes that the deployment of the pipeline is the run-time phase that uses similar input data to the training described above. Examiner further notes that hi(v) is the node spatial layer.);
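For illustration only (Karimi's exact transformation is not reproduced here), the cited neighborhood-aggregation architecture — where each layer forms hi(v) from hi-1(v) and the embeddings of v's neighbors, with h0(v) taken from the input feature matrix X(v) — may be sketched with a hypothetical element-wise mean update:

```python
def gnn_layer(adj, h):
    """One neighborhood-aggregation layer: each node's new embedding is the
    element-wise mean of its previous embedding and its neighbors' embeddings."""
    new_h = {}
    for v, nbrs in adj.items():
        vecs = [h[v]] + [h[u] for u in nbrs]
        new_h[v] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return new_h

adj = {0: [1], 1: [0, 2], 2: [1]}                    # a 3-node path graph
h0 = {0: [1.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 1.0]}   # h_0(v) = X(v)
h1 = gnn_layer(adj, h0)                              # h_1(v): 1-hop information
h2 = gnn_layer(adj, h1)                              # h_2(v): 2-hop information
```

After two layers, node 2's embedding reflects node 0's features even though they are two hops apart, which is the stacking behavior the quoted passage describes.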
Karimi does not teach
wherein the predicted state of the network at each future time step includes, for each node, an energy price and a rate of energy generation or energy consumption at that node
and for each edge, a predicted energy transmission rate at the future time
an edge spatial layer configured to receive, as input for each edge of the at least one edge, a representation of embedded edge features, from the node spatial layer, an aggregate representation of a first node neighborhood of a first node connected by the edge, and from the node spatial layer, an aggregate representation of a second node neighborhood of a second node connected by the edge,
and wherein the edge spatial layer is configured to output an aggregate representation of an edge neighborhood of the edge;
and a fully connected layer configured to receive output data from the node spatial layer and the edge spatial layer via a temporal gate,
and to combine the output data from the node spatial layer and the edge spatial layer with an input temporal state of the network to predict the state of the graph network at the one or more future time steps.
the node spatial layer comprises a sigmoidal function that operates on a weighted sum of the first node and the aggregate representation of the first node neighborhood;
or the edge spatial layer comprises a sigmoidal function that operates on a weighted sum of the edge and the aggregate representation of a second edge connecting a third and a fourth node.
However, Liu does teach
wherein the predicted state of the network at each future time step includes, for each node, an energy price and a rate of energy generation or energy consumption at that node; (Liu, page 2, first column, first paragraph “Consider a power grid modeled by an undirected graph G = (V; E). The node set V consists of N nodes, each connected to loads or generators, while the edge set E ∈ V x V includes transmission lines or transformers. Let p, q ∈ R^N collect the nodal active and reactive power injections, respectively; and similarly for voltage v ∈ R^N” and “LMPs [locational marginal prices] are market signals used by each generator or demand to determine the flexible power injection in order to minimize its own cost” (Liu, page 2, 2nd column, 2nd paragraph) and where “we instead advocate to predict the actual OPF [optimal power flow] outputs for electricity market, namely the locational marginal prices (LMPs) known as the real-time market signals.” Examiner notes that the LMP is the energy price and the flexible power injection is the rate of energy generation.)
and for each edge, a predicted energy transmission rate at the future time (Liu, page 2, 1st column, 1st paragraph, “Consider a power grid modeled by an undirected graph G = (V; E). The node set V consists of N nodes, each connected to loads or generators, while the edge set E ∈ V x V includes transmission lines or transformers. Let p, q ∈ R^N collect the nodal active and reactive power injections, respectively; and similarly for voltage v ∈ R^N” where “we advocate the following chain to generate the corresponding nodal injection and line flow solutions to the predicted LMP” (Liu, page 3, 2nd column, 5th paragraph). Examiner notes that each edge has a transmission line and the line flow on the transmission line solution that is based on the predicted LMP is the predicted energy transmission rate at the future time.).
Karimi and Liu are considered analogous to the claimed invention because they predict using energy graph neural networks. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Karimi to have each node have a predicted energy price and a rate of energy generation and each edge to have a predicted transmission rate. Doing so is advantageous because “This physics-aware approach not only capitalizes on the locality property of LMPs, but also motivates meaningful regularization on the feasibility of OPF line limits” (Liu, page 1, 2nd column, last paragraph).
Karimi in view of Liu does not teach
an edge spatial layer configured to receive, as input for each edge of the at least one edge, a representation of embedded edge features, from the node spatial layer, an aggregate representation of a first node neighborhood of a first node connected by the edge, and from the node spatial layer, an aggregate representation of a second node neighborhood of a second node connected by the edge,
and wherein the edge spatial layer is configured to output an aggregate representation of an edge neighborhood of the edge;
and a fully connected layer configured to receive output data from the node spatial layer and the edge spatial layer via a temporal gate,
and to combine the output data from the node spatial layer and the edge spatial layer with an input temporal state of the network to predict the state of the graph network at the one or more future time steps.
the node spatial layer comprises a sigmoidal function that operates on a weighted sum of the first node and the aggregate representation of the first node neighborhood;
or the edge spatial layer comprises a sigmoidal function that operates on a weighted sum of the edge and the aggregate representation of a second edge connecting a third and a fourth node.
However, Jin discloses
an edge spatial layer configured to receive, as input for each edge of the at least one edge, a representation of embedded edge features, from the node spatial layer, an aggregate representation of a first node neighborhood of a first node connected by the edge, and from the node spatial layer, an aggregate representation of a second node neighborhood of a second node connected by the edge (Jin, page 5, “Thus the dual graph convolution is formulated as:
[Equation image: media_image1.png]
where θn*G and θe*G are respectively the node-wise and edge-wise graph convolution operation, Zn and Ze are respectively the node-wise and edge-wise latent representation, and
P ∈ R^(|Vn| × |En|)
is the incidence matrix that encodes the connections between nodes and edges, defined as: Pi, (i→j) = Pj, (i→j) = 1 and 0 otherwise” and Jin, page 4, Fig.2 and Jin, page 5, Fig. 3
[Figure image: media_image2.png]
[Figure image: media_image3.png]
Examiner notes that the Pi is the aggregate representation of a first node neighborhood of a first node connected by the edge and Pj is the aggregate representation of a second node neighborhood of a second node connected by the edge. Examiner further notes that the edge spatial layer is the Edge-wise GCN and the temporal convolution from Figure 3 in the highlighted box. Examiner also further notes that the node spatial layer is Node-wise GCN and the temporal convolution from Figure 3 in the highlighted circle. Examiner additionally notes that the DGCN sends the representations to the edge-wise GCN in Figure 2 which is part of the edge spatial layer. Examiner further notes that the link representations in Figure 3 is the representation of embedded edge features for each edge.);
and a fully connected layer configured to receive output data from the node spatial layer and the edge spatial layer via a temporal gate (Jin, page 4, Figure 2,
[Figure image: media_image2.png]
Examiner notes that DGCN is the fully connected layer and the TCN is the temporal gate.),
and to combine the output data from the node spatial layer and the edge spatial layer with an input temporal state of the network to predict the state of the graph network at the one or more future time steps (Jin, page 4, Figure 2, and Jin page 3, Figure 1,
[Figure image: media_image2.png]
[Figure image: media_image4.png]
Examiner notes that the spatio-temporal dual graph learning layer in Figures 1 and 2 combines the output data. Examiner further notes that the historical trajectories in Figure 1 is the input temporal state of the network and the travel time of each link/intersection and travel time of entire path is the state of the graph network at the one or more future time steps.).
the node spatial layer comprises a sigmoidal function that operates on a weighted sum of the first node and the aggregate representation of the first node neighborhood (Jin, page 5, “Thus the dual graph convolution is formulated as:
[Equation image: media_image1.png]
where θn*G and θe*G are respectively the node-wise and edge-wise graph convolution operation, Zn and Ze are respectively the node-wise and edge-wise latent representation, and
P ∈ R^(|Vn| × |En|)
is the incidence matrix that encodes the connections between nodes and edges, defined as: Pi, (i→j) = Pj, (i→j) = 1 and 0 otherwise” where “to capture the spatial dependencies, we select the simple graph convolution approach to aggregate information of 1-hop neighbors” (Jin, page 5, 1st column, last paragraph) and where “we adopt a dual directional graph convolution to make full of directed graphs which is defined as
[Equation image: media_image5.png]
Then, we use temporal convolution network to capture temporal dependency of both intersections and links respectively as shown in Fig. 3. We adopt gated mechanism into the temporal convolution network which is proven an effective approach to control the reserved information [40]. The temporal convolution operation is formulated as follows:
[Equation image: media_image6.png]
….Empirically, Tanh function can be selected as σ1(•) and Sigmoid function is usually selected as σ2(•) to control the ratio of information passed” (Jin, page 5, 2nd column, 1st-2nd paragraph) and Jin, page 5, Fig. 3
[Figure image: media_image3.png]
Examiner notes that the Pi is the aggregate representation of a first node neighborhood of a first node. Examiner further notes equation 7 is the weighted sum with W being the weights. Additionally, Examiner notes that the sigmoid function which is in the temporal convolution operates on the outputs of node-wise GCN and intersection representations which thus is the sigmoid function operating on the weighted sum and aggregate representation of the first node. Examiner further notes that the node spatial layer is the node-wise GCN and the temporal convolution highlighted in a box in Figure 3.)
or the edge spatial layer comprises a sigmoidal function that operates on a weighted sum of the edge and the aggregate representation of a second edge connecting a third node and a fourth node (Jin, page 5, “Thus the dual graph convolution is formulated as:
[Equation image: media_image1.png]
where θn*G and θe*G are respectively the node-wise and edge-wise graph convolution operation, Zn and Ze are respectively the node-wise and edge-wise latent representation, and
P ∈ R^(|Vn| × |En|)
is the incidence matrix that encodes the connections between nodes and edges, defined as: Pi, (i→j) = Pj, (i→j) = 1 and 0 otherwise” where “to capture the spatial dependencies, we select the simple graph convolution approach to aggregate information of 1-hop neighbors” (Jin, page 5, 1st column, last paragraph) and where “we adopt a dual directional graph convolution to make full of directed graphs which is defined as
[Equation image: media_image5.png]
Then, we use temporal convolution network to capture temporal dependency of both intersections and links respectively as shown in Fig. 3. We adopt gated mechanism into the temporal convolution network which is proven an effective approach to control the reserved information [40]. The temporal convolution operation is formulated as follows:
[Equation image: media_image6.png]
….Empirically, Tanh function can be selected as σ1(•) and Sigmoid function is usually selected as σ2(•) to control the ratio of information passed” (Jin, page 5, 2nd column, 1st-2nd paragraph) and Jin, page 5, Fig. 3
[Figure image: media_image3.png]
Examiner notes that the Pi, (i→j) is the aggregate representation of a second edge connecting a third node and a fourth node. Examiner further notes equation 7 is the weighted sum with W being the weights. Additionally, Examiner notes that the sigmoid function which is in the temporal convolution operates on the outputs of edge-wise GCN and link representations which thus is the sigmoid function operating on the weighted sum and aggregate representation of a second edge connecting a third node and a fourth node. Examiner further notes that the edge spatial layer is the edge-wise GCN and the temporal convolution highlighted in a circle in Figure 3.)
Karimi, Liu, and Jin are considered analogous to the claimed invention because they describe a spatio-temporal graph neural network. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Karimi in view of Liu to include an edge spatial layer to perform a prediction like in Jin. Doing so is advantageous because they “fully exploit multi-level spatio-temporal information to improve the quality of the final representations from spatio-temporal learning module” (Jin, page 2, 1st column, point (2)).
Karimi, Liu, and Jin do not teach, but Yang does teach
and wherein the edge spatial layer is configured to output an aggregate representation of an edge neighborhood of the edge (Yang, page 8, Algorithm 1,
[Figure image: media_image7.png]
Examiner notes that lines 8-9 of Algorithm 1 show outputting an aggregate representation for each edge.)
Karimi, Liu, Jin, and Yang are considered analogous to the claimed invention because they incorporate node and edge features into a graph neural network. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Karimi in view of Liu and Jin to include an edge spatial layer like in Yang. Doing so is advantageous because they “incorporate[] node and features to enhance the node and edge embeddings across neural network layer” (Yang, page 13, Conclusion).
Regarding claim 20, Karimi in view of Liu, Jin, and Yang teaches the computing system of claim 19. Karimi further teaches
wherein the instructions are further executable to, during a training phase: receive training data that includes time series data indicating a state of the energy distribution graph network at each of a series of historical time steps (Karimi, page 3, 2nd column, 3rd paragraph, “The GNN model’s input data dimension is
R^(H×N×C), where H is the number of previous data points of the timeseries used in the model, N is the total number of PV systems and C is the number of channels in the input” where “Figure 1: The distribution of PV systems across the United States and the graph model used to represent the relationships between the PV systems. (a) All PV systems in our dataset are shown by red circles on the map. (b,c) Graph showing the proximity of PV systems in which edges are included between PV systems that are proximate according to a low/moderate proximity threshold (ϵ = 0, ϵ = 0.5)”);
and train the graph neural network using the training data to output the predicted state of the energy distribution graph network at the one or more future time steps (Karimi, page 3, 2nd column, 3rd paragraph, “The GNN model’s input data dimension is
R^(H×N×C), where H is the number of previous data points of the timeseries used in the model, N is the total number of PV systems and C is the number of channels in the input. The output of the model is of the form R^(M×N) where M is the number of future time points for which model will forecast.”).
Allowable Subject Matter
Claim 9-10 and 17-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Specifically regarding claim 9 “wherein the sigmoidal function of the node spatial layer is defined as a sigmoidal function
σ(Wn^l(xi + AGG(xj, eij)), xj) where Wn^l is a nodewise weight at level l, AGG(xj, eij) is an aggregate of a representation of a node xj connected to a node xi, and eij is a representation of an edge connecting the node xi and the node xj” in conjunction with the other limitations of the claim are not taught by the prior art of record. The closest prior art of record is Karimi and Jin.
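For illustration only (the claim does not specify a particular aggregator, and none of the cited references is being reproduced), the claim-9 expression — a sigmoid over Wn^l applied to xi plus an aggregate AGG(xj, eij) that depends jointly on neighbor features and connecting-edge features — may be sketched as follows. The element-wise sum inside the aggregator and the identity weight matrix are hypothetical choices:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def agg(neighbors):
    """AGG(xj, eij): combine each neighbor's features with the features of the
    connecting edge (here: element-wise sum), then average over neighbors."""
    combined = [[xj_k + e_k for xj_k, e_k in zip(xj, eij)] for xj, eij in neighbors]
    return [sum(col) / len(combined) for col in zip(*combined)]

def node_update(x_i, neighbors, W):
    """sigma(W_n^l (x_i + AGG(x_j, e_ij))) -- one node-spatial update."""
    a = agg(neighbors)
    s = [xi + ak for xi, ak in zip(x_i, a)]
    return [sigmoid(sum(w * v for w, v in zip(row, s))) for row in W]

# toy example: two (x_j, e_ij) neighbor/edge pairs, identity weights
x_i = [0.0, 0.0]
neighbors = [([1.0, 0.0], [0.0, 1.0]),
             ([0.0, 1.0], [1.0, 0.0])]
W = [[1, 0], [0, 1]]
out = node_update(x_i, neighbors, W)
```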
Karimi discloses the node spatial layer (Karimi, page 3, 1st column, 3rd paragraph) and a sigmoidal function (Karimi, page 3, 2nd column, 2nd paragraph). Karimi fails to explicitly disclose that the sigmoidal function of the node spatial layer is defined as a sigmoidal function σ(Wnl(xi + AGG(xj, eij)), xj) where Wnl is a nodewise weight at level l, AGG(xj, eij) is an aggregate of a representation of a node xj connected to a node xi, and eij is a representation of an edge connecting the node xi and the node xj.
Jin discloses a sigmoidal function of a node spatial layer (Jin, page 5, Figure 3). Jin fails to explicitly disclose a sigmoidal function σ(Wnl(xi + AGG(xj, eij)), xj) where Wnl is a nodewise weight at level l, AGG(xj, eij) is an aggregate of a representation of a node xj connected to a node xi, and eij is a representation of an edge connecting the node xi and the node xj.
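For clarity of the record, the form of the recited node spatial update can be sketched in code. This is a hypothetical NumPy illustration only: the mean aggregator, the elementwise application of the sigmoid, and the handling of the second argument xj are all assumptions, and the sketch is not drawn from the claims or from any cited reference.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def node_spatial_layer(X, E, adj, W):
    """Sketch of the recited node update: sigma(Wnl(xi + AGG(xj, eij))).

    X:   (N, d) node representations xi
    E:   (N, N, d) edge representations eij
    adj: (N, N) boolean adjacency
    W:   (d, d) nodewise weight Wnl at level l
    """
    N, d = X.shape
    out = np.zeros_like(X)
    for i in range(N):
        nbrs = np.flatnonzero(adj[i])
        # AGG(xj, eij): mean over neighbors of node-plus-edge features
        # (the mean aggregator is an assumption; the claim leaves AGG open)
        agg = np.mean([X[j] + E[i, j] for j in nbrs], axis=0) if len(nbrs) else np.zeros(d)
        out[i] = sigmoid(W @ (X[i] + agg))
    return out

# toy 3-node path graph with zero edge features
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
E = np.zeros((3, 3, 2))
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=bool)
out = node_spatial_layer(X, E, adj, np.eye(2))
```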
Specifically regarding claim 10, “wherein the sigmoidal function of the edge spatial layer is defined as a sigmoidal function σ(Wel(eij + AGG(ekl)), ekl) where Wel is a nodewise weight at level l, eij is a representation of a first edge connecting a node (i) and a node (j), and AGG(ekl) is an aggregate of a representation of a second edge connecting a node (k) and a node (l),” in conjunction with the other limitations of the claim, is not taught by the prior art of record. The closest prior art of record is Jin.
Jin discloses a sigmoidal function of an edge spatial layer (Jin, page 5, Figure 3). Jin fails to explicitly disclose a sigmoidal function σ(Wel(eij + AGG(ekl)), ekl) where Wel is a nodewise weight at level l, eij is a representation of a first edge connecting a node (i) and a node (j), and AGG(ekl) is an aggregate of a representation of a second edge connecting a node (k) and a node (l).
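Likewise, the form of the recited edge spatial update can be sketched in code. This is a hypothetical NumPy illustration only: the rule that the second edges (k, l) are those sharing an endpoint with (i, j), the mean aggregator, and the elementwise application of the sigmoid are assumptions, and the sketch is not drawn from the claims or from any cited reference.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def edge_spatial_layer(E, adj, W):
    """Sketch of the recited edge update: sigma(Wel(eij + AGG(ekl))).

    E:   (N, N, d) edge representations eij
    adj: (N, N) boolean adjacency
    W:   (d, d) edge weight Wel at level l
    """
    N = adj.shape[0]
    out = np.zeros_like(E)
    edges = [(i, j) for i in range(N) for j in range(N) if adj[i, j]]
    for (i, j) in edges:
        # AGG(ekl): mean over other edges sharing an endpoint with (i, j)
        nbr_edges = [E[k, l] for (k, l) in edges
                     if (k, l) != (i, j) and {k, l} & {i, j}]
        agg = np.mean(nbr_edges, axis=0) if nbr_edges else np.zeros(E.shape[-1])
        out[i, j] = sigmoid(W @ (E[i, j] + agg))
    return out

# toy 3-node star graph with random edge features
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=bool)
E = np.random.default_rng(0).normal(size=(3, 3, 2))
out = edge_spatial_layer(E, adj, np.eye(2))
```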
Claim 17 recites substantially similar limitations as claim 9 and would therefore be allowable under the same rationale if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim 18 recites substantially similar limitations as claim 10 and would therefore be allowable under the same rationale if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Response to Arguments
On pages 12-14, Applicant argues:
In contrast, Karimi discloses a spatiotemporal graph neural network architecture comprising graph convolutional layers for spatial modeling, specifically for node neighborhood aggregation. Karimi further describes temporal convolution layers using 1-D convolutions and sigmoid gated units for time series modeling, where a spatial layer is sandwiched between two temporal convolutional layers (see page 3, second column, paragraphs 2 and 3 and Figure 2). While Karimi does disclose a sigmoid gated linear unit, it is used in the temporal convolution layers, and not in the graph convolutional layers, which the Examiner maps to the node spatial layer of the claim (see page 5 of the Office action). Furthermore, Karimi is silent regarding an edge spatial layer and does not disclose the use of a sigmoidal function in the edge spatial layer.
Yang describes NENN, a graph neural network architecture with alternating aggregation of node and edge embeddings using a hierarchical dual-level attention mechanism (see Abstract). However, Yang is silent about the use of a sigmoidal function in the spatial layers.
Jin describes dual graph modeling with separate node-wise and edge-wise graphs, along with spatio-temporal dual graph learning layers and dual graph interaction mechanisms (see Abstract and page 2, first column, paragraph 2). Jin further discloses use of a sigmoid function in the temporal convolution operation. Specifically, page 5, second column, paragraph 2 of Jin discloses:
…
According to the above disclosure, the Temporal Convolution Network (TCN) operation implements the gated mechanism described in Equation (8), where Sigmoid controls the ratio of information passed within the temporal gating process.
However, Jin does not disclose or suggest the use of a sigmoidal function within the node spatial layer or the edge spatial layer as recited in amended claim 1. Jin applies ReLU as the activation function for graph convolution operations in both node-wise and edge-wise spatial layers (see page 5, first column, paragraph 2; Equation (6)). The only mention of a sigmoid function in Jin is confined to the temporal convolution network (TCN) for gating purposes (see page 5, second column, paragraph 2; Equation (8)), which is separate and distinct from the spatial layers.
Further, Jin does not teach or suggest applying a sigmoidal function to operate on (i) a weighted sum of a node and its neighborhood aggregate or (ii) a weighted sum of an edge and an aggregate representation of a second edge, as specified by amended claim 1. Jin's spatial layers perform standard graph convolution aggregations using adjacency based propagation and ReLU activation, without any indication of sigmoidal gating or weighted summation in the manner claimed.
Therefore, Jin, Karimi, and Yang fail to disclose or suggest at least the above newly recited features of amended claim 1. Other cited references also fail to disclose or suggest at least the recited features of the subject claim.
Accordingly, Applicant respectfully submits that claim 1 is not anticipated by or obvious over the disclosure of Karimi in view of Jin, Yang, and/or other cited references and thus claim 1 is in condition for allowance.
Regarding the Applicant’s argument that the prior art does not teach the newly amended limitations, Examiner respectfully disagrees and notes that Jin teaches these limitations. The edge spatial layer is mapped to the edge-wise GCN together with the temporal convolution (TCN), and the node spatial layer is mapped to the node-wise GCN together with the temporal convolution (TCN). Because the sigmoid function is applied in the temporal convolution, the node spatial layer and the edge spatial layer each comprise a sigmoidal function. Examiner further notes that the sigmoidal function notated by σ2 (Jin, page 5, equation 8) operates on the weighted sum Z(l+1) (Jin, page 5, equation 7), and that the Z(l) in equation 7 is the neighborhood aggregate or the aggregate representation of a second edge, since Z is found by aggregating information of 1-hop neighbors. Thus, because the sigmoidal function in Jin operates on the weighted sum Z(l+1), and thereby on Z(l), the sigmoidal function operates on the weighted sum and the neighborhood aggregate or aggregate representation of a second edge.
Examiner respectfully notes that the newly amended limitations read broader than claims 9 and 10 and thus, the newly amended limitations in claims 1, 11, and 19 are mapped to Jin. As stated in the allowable subject matter, Jin does not disclose the specific sigmoidal function defined in claims 9 and 10.
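The gated mechanism mapped above (Jin, equations 7 and 8) can be sketched generically as a transform whose output is modulated elementwise by a sigmoid gate that controls the ratio of information passed. The code below is a hypothetical NumPy illustration of that gating pattern only; the shapes, the tanh filter branch, and the weight names are assumptions and are not Jin’s implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_temporal_conv(Z, W_filter, W_gate):
    """Generic gated mechanism: a tanh filter branch modulated elementwise
    by a sigmoid gate that controls how much information passes through."""
    return np.tanh(Z @ W_filter) * sigmoid(Z @ W_gate)

# toy input: 4 time steps, 3 features (hypothetical sizes)
Z = np.random.default_rng(1).normal(size=(4, 3))
H = gated_temporal_conv(Z, np.eye(3), np.eye(3))
```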
On pages 14-15, Applicant argues:
Claims 11 and 19 have been amended in a similar manner as claim 1, and thus Applicant submits that claims 11 and 19 are in condition for allowance for similar reasons.
Claims 2-8 depend from claim 1 and thus are also in condition for allowance at least for the reason of dependence from an allowable claim.
Claims 12-16 depend from claim 11 and thus are also in condition for allowance at least for the reason of dependence from an allowable claim.
Claim 20 depends from claim 19 and thus is also in condition for allowance at least for the reason of dependence from an allowable claim.
In view of the above, Applicant respectfully requests the withdrawal of the rejection of claims 1-8, 11-16, 19, and 20 under 35 U.S.C. 103.
Regarding the Applicant’s argument that the dependent claims are allowable at least in part due to their dependency on the independent claims, the Examiner respectfully disagrees and notes the instant rejections and the response to arguments regarding the independent claims above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Owerko et al. (Optimal Power Flow Using Graph Neural Networks) also describes using a graph neural network to model energy distribution.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KAITLYN R HAEFNER whose telephone number is (571)272-1429. The examiner can normally be reached Monday - Thursday: 7:15 am - 5:15 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold can be reached at (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/K.R.H./Examiner, Art Unit 2148 /MICHELLE T BECHTOLD/Supervisory Patent Examiner, Art Unit 2148