Prosecution Insights
Last updated: April 19, 2026
Application No. 17/731,550

SYSTEM AND METHOD FOR MOLECULAR PROPERTY PREDICTION USING HIERARCHICAL LAYER-WISE PROPAGATION OF GRAPH POOLING LAYER

Final Rejection: §103, §112
Filed: Apr 28, 2022
Examiner: BALAKRISHNAN, VIJAY MURALI
Art Unit: 2143
Tech Center: 2100 — Computer Architecture & Software
Assignee: Tata Consultancy Services Limited
OA Round: 2 (Final)
Grant Probability: 43% (Moderate)
OA Rounds: 3-4
To Grant: 3y 12m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 43% of resolved cases (6 granted / 14 resolved; -12.1% vs TC avg)
Interview Lift: +85.7% (strong; allowance of resolved cases with interview vs. without)
Avg Prosecution: 3y 12m (typical timeline); 26 applications currently pending
Total Applications: 40 (career history, across all art units)

Statute-Specific Performance

§101: 26.4% (-13.6% vs TC avg)
§103: 31.5% (-8.5% vs TC avg)
§102: 13.2% (-26.8% vs TC avg)
§112: 24.3% (-15.7% vs TC avg)
Tech Center averages are estimates, based on career data from 14 resolved cases.

Office Action

§103 §112
DETAILED ACTION

This final action is in response to the amendment and remarks filed on 10/07/2025 for application 17/731,550. Claims 1-10 and 12-18 have been amended. Claim 11 is cancelled. Claim 19 is a newly added claim. Claims 1-10 and 12-19 are pending in the application. Claims 1, 7, and 13 are independent claims.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

The amendment filed 10/07/2025 has been entered. Applicant's amendment to the claims with respect to resolving claim objections and indefiniteness rejections under 35 U.S.C. 112(b) has been considered, and overcomes the objections and 112(b) rejections set forth in the nonfinal office action mailed 07/09/2025. Consequently, the previous objections and rejections have been withdrawn.

Claim Objections

Claims 1-3, 7-9, and 13-15 are objected to because of the following informalities:

In claims 1, 7, and 13, "wherein the database include annotated independent and identically distributed molecular graphs" should read "wherein the database includes annotated independent and identically distributed molecular graphs" to improve grammatical clarity.

In claims 2, 8, and 14, the claim term should be amended to be consistently recited as either "spatial graph convolution" (without hyphen) or "spatial-graph convolution" (with hyphen).

In claims 3, 9, and 15, the claim term should be amended to be consistently recited as either "spatial dynamic neighborhood aggregation" (without hyphen) or "spatial-dynamic neighborhood aggregation" (with hyphen).

Appropriate corrections are required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-10 and 12-19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 1, it recites the limitation "wherein the hierarchical layer-wise propagation of the graph pooling layer is represented by, [media_image1.png]". However, the meaning of the reference characters [media_image2.png] and [media_image3.png] in the recited equation is not clear. The claim does previously recite "a projection vector [media_image4.png]", but does not recite any indices or terms "i / j" in relation to the recited projection vector. It is entirely unclear if the claim is thereby reciting multiple projection vectors, subcomponents of the same projection vector [media_image4.png], or any other potential configuration. The specification also does not provide any further clarification to ascertain an intended meaning. Consequently, one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For purposes of examination and as best understood in light of the specification, the limitation is interpreted as providing a representation of the previous limitation ("obtaining a hierarchical layer-wise propagation of a graph pooling layer of the molecular graph by taking a product of the edge-information aware node attributes [media_image5.png] and a unit vector associated with the projection vector [media_image6.png], wherein the direction of the unit vector is same as the direction of the projection vector [media_image6.png]") in equation form.

Claim 1 further recites "performing graph-pooling on varying input graph sizes". There is insufficient antecedent basis for the term "input graph" in the claim, rendering the scope of the term uncertain. The claim previously recites performing the claimed down-sampling procedure on "molecular graphs"; it is thereby uncertain whether the recited input graphs refer to the same molecular graphs or instead to an entirely different set of graphs. Consequently, one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For purposes of examination and as best understood in light of the specification, "performing graph-pooling on varying input graph sizes" is interpreted as "performing graph-pooling on varying molecular graph sizes".

Regarding claim 19, it recites the limitation "augmenting the receptive field". There is insufficient antecedent basis for the term "the receptive field" in the claim; parent claim 1 previously recites "augmenting a node-local receptive field", but does not recite "a receptive field". Consequently, one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For purposes of examination and as best understood in light of the specification, "augmenting the receptive field" is interpreted as "augmenting a receptive field".

Regarding claims 7 and 13, they have the same deficiencies as those found in claim 1 above. Consequently, they are rejected for the same reasons as claim 1 and are likewise interpreted as detailed above.

Regarding claims 2-6, 8-10, 12, and 14-18, they inherit the deficiencies of their parent claims. Consequently, they are also rejected under 35 U.S.C. 112(b) as being indefinite for depending on an indefinite parent claim.
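Note on the examiner's interpretation: the claimed equation is available only as an image, but the interpretation tracks the scalar-projection identity that also underlies Gao's gPool layer (quoted in the §103 rejection below). As a sketch, with x_i a node feature vector and p the projection vector (symbols follow Gao's notation, not necessarily the application's):

$$y_i \;=\; \frac{\mathbf{x}_i \cdot \mathbf{p}}{\lVert \mathbf{p} \rVert} \;=\; \mathbf{x}_i \cdot \frac{\mathbf{p}}{\lVert \mathbf{p} \rVert}$$

That is, computing a scalar projection is the same operation as taking the product of the node attributes with the unit vector in the direction of p, which is why the examiner treats the two claim limitations as interrelated.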
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5, 7-8, 13-14, 17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wei et al. ("Pooling Architecture Search for Graph Classification", available arXiv 24 Aug 2021), hereinafter Wei, in view of Jiang et al. ("Graph Neural Network Architecture Search for Molecular Property Prediction", available arXiv 27 Aug 2020), hereinafter Jiang, and Gao et al. ("Graph U-Nets", published 2019), hereinafter Gao.

Regarding claim 1, Wei teaches A processor-implemented method ("In this work, to the best of our knowledge, we made the first attempt to address the two aforementioned problems and propose an efficient NAS method to obtain data-specific pooling architectures for graph classification. Firstly, by revisiting various existing human-designed pooling architectures, we propose a unified pooling framework consisting of four key modules for graph classification, which covers both the global and hierarchical pooling methods. Then based on the unified framework, a customized and effective search space is designed... In this way, data-specific architectures are obtained, and the proposed method is dubbed PAS (Pooling Architecture Search)" [Wei page 2 Introduction]; "All models are implemented with Pytorch [31] on a GPU 2080Ti (Memory: 12GB, Cuda version: 10.2). Thus, for consistent comparisons of baseline models, we use the implementation of all GNN baselines by the popular GNN library: Pytorch Geometric (PyG) (version 1.6.1) [9], which provides a unifying code framework for various GNN models" [Wei page 12 The implementation details of PAS])

comprising: accessing, via one or more hardware processors, a database comprising a plurality of molecular graphs associated with a plurality of molecules and a plurality of labels indicative of chemical properties of the plurality of the molecular graphs, wherein each molecular graph of the plurality of molecular graphs comprises a plurality of nodes and a plurality of edges connecting the plurality of nodes ("To demonstrate the effectiveness of PAS, we conduct extensive experiments on six real-world datasets from three domains, and experimental results show that the searched architectures can outperform various baselines for graph classification." [Wei page 2 Introduction]; "Notations. We represent a graph as G = (A, H), where A ∈ R^{N×N} is the adjacency matrix of this graph and H ∈ R^{N×d} is the node features. N is the node number. N(ṽ) = {v} ∪ {u | A_uv ≠ 0} represents the set of the self-contained first-order neighbors of node v. Given a dataset D = {(G_1, y_1), ..., (G_M, y_M)}, (G_i, y_i) is the i-th graph of this dataset. M is the number of total graphs, y ∈ Y is the graph label. In an L-layer GNN, for clear presentation, the input graph is denoted by G^0 = (A^0, H^0), the input of the l-th layer is G^{l-1} = (A^{l-1}, H^{l-1}), and the output is G^l = (A^l, H^l). The features of node v in the l-th layer are denoted by h_v^l" [Wei page 2 Introduction]; "Datasets. In this paper, we use six datasets as shown in Table 3. D&D and PROTEINS datasets, provided by [6], are both protein graphs. In the D&D dataset, nodes represent the amino acids and two nodes are connected if the distance is less than 6 Å. In the PROTEINS dataset, nodes are secondary structure elements and edges represent nodes are in an amino acid or in a close 3D space" [Wei page 6 Experimental Settings])

performing, via the one or more hardware processors, a first iteration to down-sample a molecular graph from amongst the plurality of molecular graphs into a coarsened molecular graph ("In an L-layer GNN, for clear presentation, the input graph is denoted by G^0 = (A^0, H^0), the input of the l-th layer is G^{l-1} = (A^{l-1}, H^{l-1}), and the output is G^l = (A^l, H^l)" [Wei page 2 Introduction]; "We define a unified framework that consists of four key modules for learning graph-level representation derived from existing pooling architectures: Aggregation, Pooling, Readout and Merge Module, respectively. In general, one Pooling Module is placed after each Aggregation Module in each layer, and Merge Module is utilized to incorporate the intermediate graph representations produced by Readout Module. In Figure 2(b), we use a 2-layer architecture backbone as an illustrative example of the unified framework. With the input graph G^0, Aggregation Module updates node embeddings and produces the graph G^{1a} = (A^0, H^{1a}); Pooling Module generates the coarse graph G^1 = (A^1, H^1) behind. 3 Readout Modules are used to capture the graph representations z in all layers, and Merge Module generates the final graph representation z_F. Based on this framework, we can unify most existing pooling methods including global and hierarchical ones" [Wei page 3 The Unified Framework]),

the first iteration comprising: obtaining a real-valued feature matrix [media_image7.png] of the molecular graph from amongst the plurality of molecular graphs, wherein each row vector of the real-valued feature matrix [media_image7.png] represents a feature attribute [media_image8.png] associated with a node [i] of the plurality of nodes of the molecular graph ("Notations. We represent a graph as G = (A, H), where A ∈ R^{N×N} is the adjacency matrix of this graph and H ∈ R^{N×d} is the node features..." [Wei page 2 Introduction], as reproduced above; Under broadest reasonable interpretation in light of the instant specification, "real-valued feature matrix" is interpreted as merely a feature matrix comprising node features. Each row vector (of dimension d) of matrix H represents the features of a node (i.e., a feature attribute))
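To make Wei's G = (A, H) notation concrete, here is a toy molecular graph as a pair of NumPy arrays; the values are illustrative only and do not come from any cited reference:

```python
import numpy as np

# Toy 3-node molecular graph in Wei's G = (A, H) notation.
A = np.array([[0., 1., 0.],   # adjacency matrix, A in R^{N x N}
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.array([[0.1, 0.9],     # real-valued feature matrix, H in R^{N x d};
              [0.4, 0.6],     # row i is the feature attribute of node i
              [0.8, 0.2]])
N, d = H.shape                # N = 3 nodes, d = 2 features per node
```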
transforming feature attribute [media_image8.png] associated with the node [i] by taking a product of the feature attribute [media_image8.png] with a feed-forward layer [media_image9.png], wherein the feed-forward layer comprises a parametrized neural network function [media_image10.png] ("We define a unified framework that consists of four key modules for learning graph-level representation derived from existing pooling architectures: Aggregation, Pooling, Readout and Merge Module... Aggregation Module updates node embeddings and produces the graph G^{1a} = (A^0, H^{1a})" [Wei page 3 The Unified Framework]; "Aggregation Module. We add five widely used GNNs: GCN [21], GAT [40], GraphSAGE [16] with mean aggregator, GIN [46] and GraphConv [30], which are denoted as GCN, GAT, SAGE, GIN and GRAPHCONV. Besides, we incorporate the operation MLP, which applies a two-layer MLP (Multilayer Perceptrons) to update node embeddings without using the graph structure" [Wei page 4 The Design of the Search Space]; A graph neural network (GNN) or multilayer perceptron (MLP) (i.e., types of neural networks) inherently comprises parameters and matrix multiplication operations via layer-wise propagation (i.e., taking a product of node embeddings (i.e., feature attributes) with weights of neurons in feed-forward layers))

obtaining a hierarchical layer-wise propagation of a graph pooling layer of the molecular graph ("On the other hand, hierarchical pooling methods are proposed to solve this problem by aggregating messages on coarser and coarser graphs, e.g., from G^0 to G^L as shown in Figure 2(a). It is achieved by applying a pooling operation to reduce the size of a graph after an aggregation operation in each layer. For these hierarchical pooling methods, SAGPool [23], Graph U-Net [11] and ASAP [34] sample a set of nodes based on diverse node score functions and form corresponding coarse graphs; DiffPool [49] and STRUCTPOOL [52] focus on grouping nodes into clusters with different assignment functions, and re-generate the edges among these clusters" [Wei page 2 GNN for Graph Classification]; "We define a unified framework that consists of four key modules... With the input graph G^0, Aggregation Module updates node embeddings and produces the graph G^{1a} = (A^0, H^{1a}); Pooling Module generates the coarse graph G^1 = (A^1, H^1) behind... Based on this framework, we can unify most existing pooling methods including global and hierarchical ones" [Wei page 3 The Unified Framework])

down-sampling the molecular graph using the hierarchical layer-wise propagation of the graph pooling layer, wherein the down-sampling of the molecular graph comprises performing a m-max-pooling operation on the molecular graph to sample a subset of m top-ranked nodes to form a coarsened molecular graph, wherein the down-sampling of the molecular graph results in rejecting a first set of nodes and retaining a second set of nodes from amongst the plurality of nodes of the molecular graph based on a ranking of the plurality of nodes ("Pooling Module. The pooling operations in our search space can be unified by a computation process as [equation 3], [equation 4]. We firstly calculate a node score matrix S ∈ R^{N×1} with a score function f_s, which is used to evaluate the importance of nodes with different metrics, then generate the coarse graph by selecting top-k nodes idx with the function TOP_k, and formulating the coarse graph according to Eq. (4). Three existing pooling operations TOPKPOOL [11], SAGPOOL [23] and ASAP [34] are incorporated in our search space" [Wei page 4 The Design of the Search Space]; see Figures 2(a), 2(b), and 2(c) -- "Figure 2: (a) In general, hierarchical methods use one aggregation and one pooling operation in each layer, which is responsible for updating node embeddings and generating the coarse graph... (b) We choose a 2-layer supernet as an illustrative example of the unified framework. Each layer contains 1 Aggregation Module and 1 Pooling Module. Merge Module is used to incorporate 3 intermediate graph representations generated by Readout Module. (c) The coarsening strategy we used. For unselected nodes and edges (in grey), we set the features and weights to 0 so different coarse graphs G_i^l can be summarized directly" [Wei page 4])

determining a first adjacency matrix of the coarsened molecular graph using the second set of nodes; and determining a first feature matrix of the coarsened molecular graph using the second set of nodes, wherein each row of the first feature matrix corresponds to hidden state node attributes of the coarsened molecular graph ("...then generate the coarse graph by selecting top-k nodes idx with the function TOP_k, and formulating the coarse graph according to Eq. (4)." [Wei page 4 The Design of the Search Space]; "Notations. We represent a graph as G = (A, H), where A ∈ R^{N×N} is the adjacency matrix of this graph and H ∈ R^{N×d} is the node features..." [Wei page 2 Introduction], as reproduced above; Under broadest reasonable interpretation in light of the instant specification, "hidden state node attributes" are interpreted as any representation of attributes of nodes in the intermediate/hidden layers (i.e., hidden nodes) of a neural architecture)
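The score-then-select pooling that the examiner maps to the claimed m-max-pooling can be sketched in a few lines of NumPy. This is an illustrative reconstruction of the TOP_k selection described in Wei (function and variable names are ours, not from the reference):

```python
import numpy as np

def top_k_pool(A, H, scores, k):
    """Keep the k top-ranked nodes and form the coarsened graph.

    A: (N, N) adjacency matrix; H: (N, d) node features;
    scores: (N,) node importances from some score function f_s.
    """
    idx = np.argsort(scores)[::-1][:k]  # indices of the k highest-scoring nodes
    A_coarse = A[np.ix_(idx, idx)]      # "first adjacency matrix" of the coarsened graph
    H_coarse = H[idx]                   # "first feature matrix": rows = retained nodes
    return A_coarse, H_coarse, idx      # rejected nodes are simply dropped
```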
performing, via the one or more hardware processors, one or more second iterations, wherein each iteration of the one or more second iterations comprises performing, on the coarsened molecular graph of an immediately preceding iteration of the one or more second iterations, obtaining a real-valued feature matrix of the coarsened molecular graph, transforming feature attribute associated with each node of the second set of nodes, obtaining the hierarchical layer-wise propagation of a graph pooling layer of the coarsened molecular graph, down-sampling the coarsened molecular graph using the hierarchical layer-wise propagation of the graph pooling layer, and determining a second adjacency matrix and a second feature matrix of the coarsened molecular graph ("In an L-layer GNN, for clear presentation, the input graph is denoted by G^0 = (A^0, H^0), the input of the l-th layer is G^{l-1} = (A^{l-1}, H^{l-1}), and the output is G^l = (A^l, H^l)" [Wei page 2 Introduction]; "On the other hand, hierarchical pooling methods are proposed to solve this problem by aggregating messages on coarser and coarser graphs, e.g., from G^0 to G^L as shown in Figure 2(a). It is achieved by applying a pooling operation to reduce the size of a graph after an aggregation operation in each layer" [Wei page 2 GNN for Graph Classification]; see Figures 2(a) and 2(b) [Wei page 4]; In a hierarchical pooling procedure, the aggregation and pooling modules (which perform the steps of the claimed procedure, as detailed above) are continually executed in one or more second iterations to generate coarser and coarser graphs, as shown in example 2(b) -- Layer 2 receives input G^1 (i.e., G^{l-1}, the outputted coarsened graph of the immediately preceding layer) and repeats (i.e., iterates) the aggregation (G^{1a}) and pooling (G^1) performed in layer 1 via its own aggregation (G^{2a}) and pooling (G^2) modules, wherein the pooling (G^2) module outputs a further coarsened graph)

computing, via the one or more hardware processors, an average of the hidden state node attributes of the coarsened molecular graph obtained after performing the one or more second iterations to obtain a graph level representation vector of the molecular graph; and determining, via the one or more hardware processors, one or more molecular properties using a linear layer from the graph level representation vector ("We define a unified framework that consists of four key modules for learning graph-level representation derived from existing pooling architectures: Aggregation, Pooling, Readout and Merge Module, respectively... 3 Readout Modules are used to capture the graph representations z in all layers, and Merge Module generates the final graph representation z_F" [Wei page 3 The Unified Framework]; "Readout Module. We provide 7 global pooling functions to obtain the graph representation vector z ∈ R^d: 3 existing methods GLOBAL_SORT [53], GLOBAL_ATT [25] and SET2SET [41]; simple global mean, max and sum functions denoted as GLOBAL_MEAN, GLOBAL_MAX and GLOBAL_SUM respectively; and the ZERO operation, which generates a zero vector, indicating the graph embeddings in this layer are not used for the final representation." [Wei page 5 The Design of the Search Space]; "Merge Module. Motivated by [4, 47] that intermediate layers help to formulate expressive embeddings, we add 5 merge functions to incorporate the graph representations in each layer: LSTM, concatenation, max, mean and sum, which are denoted as M_LSTM, M_CONCAT, M_MAX, M_MEAN and M_SUM in our search space" [Wei page 5 The Design of the Search Space]; see Figures 2(a) and 2(b) [Wei page 4]; see Table 4 -- "Table 4: Performance comparisons of PAS and all baselines. We report the mean test accuracy and the standard deviation by 10-fold cross-validation. The best results in different groups of baselines are underlined, and the best result on each dataset is in boldface" [Wei page 7]; Under broadest reasonable interpretation in light of the instant specification, "using a linear layer" is interpreted as merely performing a linear operation. As detailed above, in a hierarchical pooling procedure, the aggregation and pooling modules (which perform all steps of the claimed procedure, as detailed above) are continually executed in one or more second iterations to generate coarser and coarser graphs, as shown in example 2(b). Further, the readout module can be used to produce intermediate graph representation vectors after any iteration (e.g., a second iteration) of the aggregation and pooling modules, as shown in example 2(b) -- z^2 is an intermediate representation after two iterations (G^{1a} and G^1, G^{2a} and G^2) of the aggregation and pooling modules. The readout module can therefore produce a graph representation vector z of any intermediate layer (and its associated nodes) in the PAS framework based on an average (e.g., GLOBAL_MEAN) function, and the merge module can perform linear operations (e.g., M_MEAN, M_SUM) on readout module outputs (i.e., graph representation vectors) to determine a final graph representation vector used for classification inference (e.g., for predicting molecular properties - see the D&D and PROTEINS datasets [Wei page 6 Experimental Settings] and Table 4 reporting mean test accuracy of the PAS framework on the D&D and PROTEINS datasets))
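The readout-plus-linear-layer mapping above amounts to mean pooling followed by an affine map. A minimal sketch, assuming hypothetical linear-layer parameters W and b (not taken from any cited reference):

```python
import numpy as np

def predict_properties(H_coarse, W, b):
    """Mean readout (GLOBAL_MEAN) followed by a linear layer.

    H_coarse: (k, d) hidden state node attributes after the final
    pooling iteration; W: (d, p) and b: (p,) are linear-layer weights.
    """
    z = H_coarse.mean(axis=0)  # graph-level representation vector
    return z @ W + b           # p molecular property predictions
```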
However, Wei does not expressly teach wherein the database include annotated independent and identically distributed molecular graphs, or taking a product of the feature attribute with a feed-forward layer to obtain edge-information aware node attributes [media_image11.png], wherein the feed-forward layer [media_image12.png] comprises a parametrized neural network function [θ] performed on edge features [media_image13.png] of the plurality of edges.

In the same field of endeavor, Jiang teaches a neural architecture search (NAS) approach for automating the design of graph neural networks (GNNs) for classification inference (e.g., molecular property prediction) ("Neural architecture search (NAS) is a promising approach to discover high-performing neural network architectures automatically. To that end, we develop an NAS approach to automate the design and development of GNNs for molecular property prediction. Specifically, we focus on automated development of message-passing neural networks (MPNNs) to predict the molecular properties of small molecules in quantum mechanics and physical chemistry datasets from the MoleculeNet benchmark" [Jiang Abstract]), wherein the database includes annotated independent and identically distributed molecular graphs ("For a molecular property dataset, the amount of training data is often limited by the expense of simulation and/or experiments. The molecular datasets are diverse in terms of their molecular structure and properties. Consequently, a model trained for one molecular dataset cannot be transferred to another because of the non-Euclidean characteristics of the molecular structure data... The manually designed MPNNs not only require a substantial number of experiments in the design space but also tend to be lower-performing when applied to a new dataset. The data-specific nature of MPNN design means that it urgently needs an automated MPNN search to identify the best task-specific architecture for a given dataset... We focus on developing NAS for MPNNs that incorporates both the node and edge features to predict molecular properties... We use the small molecule datasets provided by MoleculeNet [10] that include two groups of datasets" [Jiang pages 1-2 Introduction]; "For the split of training, validation, and test data, we followed the MoleculeNet implementation and used the given stratified splitter to split the QM7 dataset and the random splitter to split other datasets using fixed random seeds." [Jiang page 5 Evaluation Strategy]; The NAS method is utilized to identify an architecture particularly corresponding to a given molecular dataset; it is thereby assumed that when the dataset is split into training and testing sets, the data is drawn from the same underlying distribution (i.e., independent and identically distributed), thereby allowing the model to learn correlations from the training set and apply them to the test set), and that takes a product of a feature attribute [media_image14.png] with a feed-forward layer [media_image12.png] to obtain edge-information aware node attributes [media_image11.png], wherein the feed-forward layer [media_image12.png] comprises a parametrized neural network function [θ] performed on edge features [media_image13.png] of the plurality of edges ("To update the hidden feature of a node v, the message function M_t at step t takes as inputs the node v feature h_v^t, the neighboring node feature h_w^t for w ∈ N(v), and the edge feature e_vw between nodes v and w... The update function U_t at step t combines the node feature h_v^t and the intermediate hidden feature m_v^{t+1} to create the new hidden feature of step t+1, h_v^{t+1}. The detailed structures of the message function M_t and update function U_t are as follows. [equation 3] [equation 4] Typically, the message function has a multilayer perceptron (MLP or edge network) to handle the edge feature e_vw. The processed edge feature is multiplied with h_w^t to yield a message from node w to v. In this case, the processed edge feature MLP(e_vw) can be viewed as a weight for h_w^t" [Jiang page 3 Stacked MPNN Search Space]).
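Jiang's edge network can be sketched as follows. This is a simplification in which MLP(e_vw) acts as an elementwise weight on h_w (Gilmer-style edge networks more generally produce a d x d matrix), and all names are ours:

```python
import numpy as np

def edge_conditioned_messages(h, e, mlp):
    """One message-passing step with an edge network.

    h: (N, d) node features; e: (N, N, d_e) edge features;
    mlp: callable mapping a (d_e,) edge feature to a (d,) weight vector.
    """
    N, d = h.shape
    m = np.zeros((N, d))
    for v in range(N):
        for w in range(N):
            if w != v:
                # MLP(e_vw) weights h_w: the message carries edge information
                m[v] += mlp(e[v, w]) * h[w]
    return m  # edge-information aware messages, one per node
```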
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein the database include annotated independent and identically distributed molecular graphs, and taking a product of a feature attribute [media_image14.png] with a feed-forward layer [media_image12.png] to obtain edge-information aware node attributes [media_image11.png], wherein the feed-forward layer [media_image12.png] comprises a parametrized neural network function [θ] performed on edge features [media_image13.png] of the plurality of edges, as taught by Jiang, into Wei, because they are both directed towards NAS approaches for automating the design of GNNs for classification inference. Wei already discusses reference Jiang as teaching a related NAS framework in the art ("Apart from designing aggregation layers, RE-MPNN [19] learns adaptive global pooling functions additionally. However, these methods fail to obtain the data-specific pooling architectures because the pooling operations which are essential to graph classification are not considered" [Wei page 3 Graph Neural Architecture Search]; see Table 1 including RE-MPNN [19] -- "Table 1: Comparing existing human-designed and NAS based pooling methods with PAS. We set the search algorithm of hand-designed methods as '-'. A: Aggregation, P: Pooling, R: Readout, M: Merge" [Wei page 3]; see [19] in References [Wei page 11]), and also further discusses a variety of aggregator functions being included in the architecture search space (see Aggregation Module [Wei page 4]). Wei also discusses application of the PAS framework to molecular graph data ("Datasets. In this paper, we use six datasets as shown in Table 3. D&D and PROTEINS datasets, provided by [6], are both protein graphs" [Wei page 6 Experimental Settings]). Therefore, one of ordinary skill in the art would recognize that incorporating the teachings of Jiang (e.g., incorporating the message passing function of Jiang into the aggregation module of the PAS framework) would enable consideration of edge features in molecular property prediction ("We focus on developing NAS for MPNNs that incorporates both the node and edge features to predict molecular properties" [Jiang page 2 Introduction]), as is traditional for message passing neural networks (MPNNs), which are well-known and widely used in the art ("...the information passing weight between nodes is governed by the edge feature in traditional MPNNs" [Jiang page 3 Stacked MPNN Search Space]; "MPNNs have been widely used to study molecular properties [8]" [Jiang page 2 Introduction]).
However, the combination of Wei and Jiang does not explicitly teach computing a scalar projection [zi] of the real-valued feature matrix [media_image15.png] on a projection vector [media_image16.png], and wherein the scalar projection [zi] further measures a feature information of the node [i] to be retained when projected in the direction of the projection vector [media_image16.png]; obtaining the hierarchical layer-wise propagation of a graph pooling layer of the molecular graph by taking a product of the edge-information aware node attributes [media_image17.png] and a unit vector associated with the projection vector [media_image16.png], wherein the direction of the unit vector is same as the direction of the projection vector [media_image16.png], wherein the hierarchical layer-wise propagation of the graph pooling layer is represented by [media_image18.png]; or wherein the ranking of the plurality of nodes is performed by utilizing scalar projection scores to sample indexes of the second set of nodes, wherein the ranking augments a node-local receptive field and enables a high-level feature information encoding by performing graph-pooling on various input graph sizes, wherein the hierarchical layer-wise propagation of the graph pooling layer considers both the edge-information aware node attributes [media_image19.png] and a graph topology to perform the down-sampling on the molecular graph, wherein selection of the first set of nodes and the second set of nodes is performed based on a pooling ratio.

In the same field of endeavor, Gao teaches a method of using a hierarchical pooling framework for classification inference on graph data ("Based on the gPool and gUnpool layers, we develop graph U-Nets, which allow high-level feature encoding and decoding for network embedding. Experimental results on node classification and graph classification tasks demonstrate the effectiveness of our proposed methods as compared to previous methods" [Gao page 2 Introduction]) that computes a scalar projection [zi] of the real-valued feature matrix [media_image15.png] on a projection vector [media_image16.png], and wherein the scalar projection [zi] further measures a feature information of the node [i] to be retained when projected in the direction of the projection vector [media_image16.png]; obtains the hierarchical layer-wise propagation of a graph pooling layer of the molecular graph by taking a product of the node attributes and a unit vector associated with the projection vector [media_image16.png], wherein the direction of the unit vector is same as the direction of the projection vector [media_image16.png], wherein the hierarchical layer-wise propagation of the graph pooling layer is represented by [media_image18.png] ("In this section, we propose the graph pooling (gPool) layer to enable down-sampling on graph data. In this layer, we adaptively select a subset of nodes to form a new but smaller graph. To this end, we employ a trainable projection vector p. By projecting all node features to 1D, we can perform k-max pooling for node selection. Since the selection is based on the 1D footprint of each node, the connectivity in the new graph is consistent across nodes. Given a node i with its feature vector x_i, the scalar projection of x_i on p is y_i = x_i p / ||p||. Here, y_i measures how much information of node i can be retained when projected onto the direction of p. By sampling nodes, we wish to preserve as much information as possible from the original graph. To achieve this, we select nodes with the largest scalar projection values on p to form a new graph... The layer-wise propagation rule of the graph pooling layer ℓ is defined as: y = X^ℓ p^ℓ / ||p^ℓ||; idx = rank(y, k); ỹ = sigmoid(y(idx)); X̃^ℓ = X^ℓ(idx, :); A^{ℓ+1} = A^ℓ(idx, idx); X^{ℓ+1} = X̃^ℓ ⊙ (ỹ 1_C^T)" [Gao pages 2-3 Graph Pooling Layer]; The examiner notes that the claimed "computing a scalar projection" step is interpreted as being interrelated with "taking a product of the edge-information aware node attributes [media_image17.png] and a unit vector associated with the projection vector [media_image23.png]" based on the mathematical definition of a scalar projection (see Oregon State University ("Dot Products and Projections") [pages 1-2] -- the scalar projection of vector b onto vector a is functionally equivalent to the dot product of b and a/||a|| (the unit vector of a, i.e., a divided by its magnitude))) and rejects a first set of nodes and retains a second set of nodes from amongst a plurality of nodes of the molecular graph based on a ranking of the plurality of nodes, wherein the ranking of the plurality of nodes is performed by utilizing scalar projection scores to sample indexes of the second set of nodes ("The layer-wise propagation rule of the graph pooling layer ℓ is defined as [equation 2, reproduced above], where k is the number of nodes selected in the new graph. rank(y; k) is the operation of node ranking, which returns indices of the k-largest values in y. The idx returned by rank(y; k) contains the indices of nodes selected for the new graph... X^ℓ is the feature matrix with row vectors x^ℓ_1, x^ℓ_2, ..., x^ℓ_N, each of which corresponds to a node in the graph. We first compute the scalar projection of X^ℓ on p^ℓ, resulting in y = [y_1, y_2, ..., y_N]^T with each y_i measuring the scalar projection value of each node on the projection vector p^ℓ. Based on the scalar projection vector y, the rank(·) operation ranks values and returns the k-largest values in y" [Gao pages 2-3 Graph Pooling Layer])

wherein the ranking augments a node-local receptive field ("After the graph embedding layer, we build the encoder by stacking several encoding blocks, each of which contains a gPool layer followed by a GCN layer. gPool layers reduce the size of the graph to encode higher-order features, while GCN layers are responsible for aggregating information from each node's first-order information" [Gao page 4 Graph Unpooling Layer]; see Figure 2 including the gPool layer producing a coarsened graph with 4 nodes (based on ranking, as explained above [Gao pages 2-3 Graph Pooling Layer]), followed by a GCN layer aggregating node first-order information (i.e., a node-local receptive field) [Gao page 4]) and enables a high-level feature information encoding by performing graph-pooling on various input graph sizes ("Suppose there are N nodes in a graph G, each of which contains C features. The graph can be represented by two matrices; those are the adjacency matrix A^ℓ ∈ R^{N×N} and the feature matrix X^ℓ ∈ R^{N×C}" [Gao page 2 Graph Pooling Layer]; see Figure 1 -- "An illustration of the proposed graph pooling layer" -- including the input matrix [media_image28.png] (representing a graph of variable size) processed to produce the pooled feature map [media_image29.png] [Gao page 3]), wherein the hierarchical layer-wise propagation of the graph pooling layer considers both the node attributes and a graph topology to perform the down-sampling on the molecular graph (see Figure 1 -- "An illustration of the proposed graph pooling layer" -- including Top k Node Selection, which as explained above [Gao pages 2-3 Graph Pooling Layer] is directly based on scores drawn from node attributes [Gao page 3]; "Figure 3 provides an illustration of a sample g-U-Nets with two blocks in encoder and decoder. Notably, there is a GCN layer before each gPool layer, thereby enabling gPool layers to capture the topological information in graphs implicitly" [Gao page 4 Graph U-Nets Architecture]), wherein selection of the first set of nodes and the second set of nodes is performed based on a pooling ratio ("Suppose there are N nodes in a graph G, each of which contains C features... The layer-wise propagation rule of the graph pooling layer ℓ is defined as: [equation 2], where k is the number of nodes selected in the new graph" [Gao pages 2-3 Graph Pooling Layer]; The examiner notes that under a broadest reasonable interpretation in light of the specification [0045-0047], selection "based on a pooling ratio" amounts to selecting a particular number of nodes (e.g., k) out of the total number of nodes (e.g., N) for inclusion in the coarsened graph (i.e., forming a "ratio" of rejected nodes (i.e., the first set of nodes, N - k) to retained nodes (i.e., the second set of nodes, k) of N - k : k)).
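Gao's gPool propagation rule, quoted above, is compact enough to reconstruct end to end. A minimal NumPy sketch (names ours; the sigmoid gating follows Gao's Eq. 2):

```python
import numpy as np

def gpool(A, X, p, k):
    """Gao's gPool layer: project, rank, select, gate.

    A: (N, N) adjacency; X: (N, C) node features;
    p: (C,) trainable projection vector; k: nodes to retain.
    """
    y = X @ p / np.linalg.norm(p)         # scalar projection of each node on p
    idx = np.argsort(y)[::-1][:k]         # rank(y, k): indices of the k largest values
    gate = 1.0 / (1.0 + np.exp(-y[idx]))  # sigmoid gate from projection scores
    X_new = X[idx] * gate[:, None]        # X^{l+1} = X~ * (y~ 1_C^T)
    A_new = A[np.ix_(idx, idx)]           # A^{l+1} = A^l(idx, idx)
    return A_new, X_new, idx
```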
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated computing a scalar projection [zi] of the real-valued feature matrix [media_image15.png] on a projection vector [media_image16.png], and wherein the scalar projection [zi] further measures a feature information of the node [i] to be retained when projected in the direction of the projection vector [media_image16.png]; obtaining the hierarchical layer-wise propagation of a graph pooling layer of the molecular graph by taking a product of the node attributes and a unit vector associated with the projection vector [media_image16.png], wherein the direction of the unit vector is same as the direction of the projection vector [media_image16.png], wherein the hierarchical layer-wise propagation of the graph pooling layer is represented by [media_image18.png]; and rejecting a first set of nodes and retaining a second set of nodes from amongst a plurality of nodes of the molecular graph based on a ranking of the plurality of nodes, wherein the ranking of the plurality of nodes is performed by utilizing scalar projection scores to sample indexes of the second set of nodes, wherein the ranking augments a node-local receptive field and enables a high-level feature information encoding by performing graph-pooling on various input graph sizes, wherein the hierarchical layer-wise propagation of the graph pooling layer considers both the node attributes and a graph topology to perform the down-sampling on the molecular graph, wherein selection of the first set of nodes and the second set of nodes is performed based on a pooling ratio, as taught by Gao, into Wei, because they are both directed towards using hierarchical pooling frameworks for classification inference on graph data. Wei already teaches use of the Graph U-Net framework of Gao within the pooling module of the PAS framework to generate coarse graph representations (i.e., to perform down-sampling) (see TOPKPOOL [11] in the Pooling Module -- "Three existing pooling operations TOPKPOOL [11], SAGPOOL [23] and ASAP [34] are incorporated in our search space" [Wei page 4]; see [11] in References [Wei page 11]). Therefore, one of ordinary skill in the art would recognize that incorporating the teachings of Gao would further enable the PAS framework to include the Graph U-Nets framework within the pooling module of the neural architecture search space (subsequent to the modified aggregation module as taught by Jiang), which would be desirable given that Graph U-Nets are a widely used ("For hierarchical pooling methods, we use 5 popular ones: Graph U-Net [11], DiffPool [49], SAGPool [23], ASAP [34] and MinCutPool [2]" [Wei page 6 Experimental Settings]) hierarchical pooling framework useful for classification inference ("Our experimental results on node classification and graph classification tasks demonstrate that our methods achieve consistently better performance than previous models" [Gao Abstract]).

Regarding claim 7, it is a system/apparatus claim that corresponds to the method of claim 1, which is taught by the combination of Wei, Jiang, and Gao. Wei further teaches A system, comprising: a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to perform the claimed functions ("All models are implemented with Pytorch [31] on a GPU 2080Ti (Memory: 12GB, Cuda version: 10.2). Thus, for consistent comparisons of baseline models, we use the implementation of all GNN baselines by the popular GNN library: Pytorch Geometric (PyG) (version 1.6.1) [9], which provides a unifying code framework for various GNN models" [Wei page 12 The implementation details of PAS]). Consequently, claim 7 is also taught by the combination of Wei, Jiang, and Gao, and is therefore rejected for the same reasons as claim 1.

Regarding claim 13, it is a product claim that corresponds to the method of claim 1, which is taught by the combination of Wei, Jiang, and Gao. Wei further teaches One or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause the claimed functions ([Wei page 12 The implementation details of PAS], as reproduced above). Consequently, claim 13 is also taught by the combination of Wei, Jiang, and Gao, and is therefore rejected for the same reasons as claim 1.

Regarding claim 2, the combination of Wei, Jiang, and Gao teaches the limitations of parent claim 1, and Wei further teaches wherein obtaining the graph level representation vector of the molecular graph comprises: feeding the coarsened molecular graph as an input to a node-ordering invariant read-out function to determine the graph-level representation vector (see Readout Module [Wei page 5 The Design of the Search Space], as detailed above).
Gao further teaches wherein obtaining the graph level representation vector of the molecular graph comprises: performing spatial-graph convolution on the coarsened molecular graph to transform and update each hidden state node attribute of the hidden state node attributes; performing a first predetermined number of down-sampling and subsequent spatial-graph convolution on the coarsened molecular graph; performing a second number, the second number being equivalent to the first predetermined number, of up-sampling and subsequent spatial-graph convolution to reinstate the coarsened molecular graph to an isomorphic clone of the molecular graph, wherein the isomorphic clone of the molecular graph is a node-information transformed isomorphic clone; and feeding the node-information transformed isomorphic clone of the molecular graph as an input for further classification inference ("Inspired by the first-order graph Laplacian methods, (Kipf & Welling, 2017) proposed graph convolutional networks (GCNs), which achieved promising performance on graph node classification tasks. The layer-wise forward-propagation operation of GCNs is defined as: [equation 1], where Â = A + I is used to add self-loops to the input adjacency matrix A, and X^ℓ is the feature matrix of layer ℓ. The GCN layer uses the diagonal node degree matrix D̂ to normalize Â. W^ℓ is a trainable weight matrix that applies a linear transformation to feature vectors. GCNs essentially perform aggregation and transformation on node features without learning trainable filters" [Gao page 2 Related Work]; "After the graph embedding layer, we build the encoder by stacking several encoding blocks, each of which contains a gPool layer followed by a GCN layer. gPool layers reduce the size of the graph to encode higher-order features, while GCN layers are responsible for aggregating information from each node's first-order information. In the decoder part, we stack the same number of decoding blocks as in the encoder part. Each decoder block is composed of a gUnpool layer and a GCN layer. The gUnpool layer restores the graph into its higher resolution structure, and the GCN layer aggregates information from the neighborhood... Notably, there is a GCN layer before each gPool layer, thereby enabling gPool layers to capture the topological information in graphs implicitly" [Gao page 4 Graph U-Nets Architecture]; see Figure 3 -- "Figure 3. An illustration of the proposed graph U-Nets (g-U-Nets). In this example, each node in the input graph has two features. The input feature vectors are transformed into low-dimensional representations using a GCN layer. After that, we stack two encoder blocks, each of which contains a gPool layer and a GCN layer. In the decoder part, there are also two decoder blocks. Each block consists of a gUnpool layer and a GCN layer. For blocks in the same level, the encoder block uses a skip connection to fuse the low-level spatial features from the encoder block. The output feature vectors of nodes in the last layer are the network embedding, which can be used for various tasks such as node classification and link prediction" [Gao page 5]), wherein the spatial graph convolution takes as input the coarsened molecular graph and each node in the coarsened molecular graph during the spatial graph convolution receives and dispatches feature information embedded local-graph messages to respective local-graph neighbors, and each node in the coarsened molecular graph transforms its hidden state node attribute based on neural-information messages perceived from its local neighborhood ("After the graph embedding layer, we build the encoder by stacking several encoding blocks, each of which contains a gPool layer followed by a GCN layer. gPool layers reduce the size of the graph to encode higher-order features, while GCN layers are responsible for aggregating information from each node's first-order information" [Gao page 4 Graph U-Nets Architecture]; see Figure 2 including the gPool layer producing a coarsened graph, followed by a GCN layer aggregating node first-order information (i.e., the local neighborhood) [Gao page 4]; As is typical behavior for message-passing algorithms, the GCN layer receives and dispatches information across nodes, wherein each node aggregates information from its first-order neighbors (i.e., local-graph neighbors)).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein obtaining the graph level representation vector of the molecular graph comprises: performing spatial-graph convolution on the coarsened molecular graph to transform and update each hidden state node attribute of the hidden state node attributes; performing a first predetermined number of down-sampling and subsequent spatial-graph convolution on the coarsened molecular graph; performing a second number, the second number being equivalent to the first predetermined number, of up-sampling and subsequent spatial-graph convolution to reinstate the coarsened molecular graph to an isomorphic clone of the molecular graph, wherein the isomorphic clone of the molecular graph is a node-information transformed isomorphic clone; and feeding the node-information transformed isomorphic clone of the molecular graph as an input for further classification inference, wherein the spatial graph convolution takes as input the coarsened molecular graph and each node in the coarsened molecular graph during the spatial graph convolution receives and dispatches feature information embedded local-graph messages to respective local-graph neighbors, and each node in the coarsened molecular graph transforms its hidden state node attribute based on neural-information messages perceived from its local neighborhood, as taught by Gao, into Wei, because they are both directed towards using hierarchical pooling frameworks for classification inference on graph data. Wei already teaches use of the Graph U-Net framework of Gao within the pooling module of the PAS framework to generate coarse graph representations (i.e., to perform down-sampling) (see TOPKPOOL [11] in the Pooling Module [Wei page 4] and [11] in References [Wei page 11], as detailed above). Additionally, similarly to Gao, Wei already teaches use of graph convolutional networks (GCNs) within the aggregation module of the PAS framework ("Aggregation Module. We add five widely used GNNs: GCN [21], GAT [40], GraphSAGE [16] with mean aggregator, GIN [46] and GraphConv [30], which are denoted as GCN, GAT, SAGE, GIN and GRAPHCONV" [Wei page 5 The Design of the Search Space]; see [21] in References [Wei page 11]). Therefore, one of ordinary skill in the art would recognize that incorporating the teachings of Gao would further enable the PAS framework to include the Graph U-Nets framework within the neural architecture search space, which would be desirable given that Graph U-Nets are a widely used ([Wei page 6 Experimental Settings], as detailed above) hierarchical pooling framework useful for classification inference ([Gao Abstract], as detailed above).
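For reference, the GCN forward propagation that Gao's Related Work section cites as "[equation 1]" is, in the Kipf & Welling formulation (a standard result, reproduced here rather than extracted from the Office action):

$$X^{\ell+1} = \sigma\left(\hat{D}^{-\frac{1}{2}}\,\hat{A}\,\hat{D}^{-\frac{1}{2}}\,X^{\ell}\,W^{\ell}\right), \qquad \hat{A} = A + I$$

where D̂ is the diagonal degree matrix of Â and W^ℓ is the trainable weight matrix of layer ℓ.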
Gao further teaches wherein obtaining the graph level representation vector of the molecular graph comprises: performing spatial-Identity Mapping Convolution Networks on the coarsened molecular graph to transform and update each hidden state node attribute of the hidden state node attributes; performing an first predetermined number of down-sampling and subsequent spatial-Identity Mapping Convolution Networks on the coarsened molecular graph; performing a second number, the second number being equivalent to the first predetermined number, of up-sampling and subsequent spatial-Identity Mapping Convolution Networks to reinstate the coarsened molecular graph to an isomorphic clone of the molecular graph, wherein the isomorphic clone of the molecular graph is a node- information transformed isomorphic clone; and feeding the node-information transformed isomorphic clone of the molecular graph as an input for further classification inference ([Gao page 2 Related Work] and [Gao page 4 Graph U-Nets Architecture] and Figure 3 [Gao page 5], as detailed above; “For all layers in the model, we use identity activation function (Gao et al., 2018) after each GCN layer” [Gao page 6 Experimental Setup]), wherein the spatial-Identity Mapping Convolution Networks overhauls the each hidden state node attribute by perceiving neural messages from local-graph neighbors, wherein the neural messages characterize local-graph neighborhood feature information transformed by the node attributes as given by a local-graph connectivity (“After the graph embedding layer, we build the encoder by stacking several encoding blocks, each of which contains a gPool layer followed by a GCN layer. gPool layers reduce the size of graph to encode higher-order features, while GCN layers are responsible for aggregating information from each node’s first-order information” [Gao page 4 Graph U-Nets Architecture]; see Figure 2 including gPool layer producing coarsened graph, followed by GCN layer aggregating node first-order information (i.e. 
local-graph neighborhood) [Gao page 4]; As is typical behavior for message-passing algorithms, the GCN layer receives and dispatches information across nodes wherein each node aggregates feature information from its first-order neighbors (i.e., local-graph neighbors)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein obtaining the graph level representation vector of the molecular graph comprises: performing spatial-Identity Mapping Convolution Networks on the coarsened molecular graph to transform and update each hidden state node attribute of the hidden state node attributes; performing a first predetermined number of down-sampling and subsequent spatial-Identity Mapping Convolution Networks on the coarsened molecular graph; performing a second number, the second number being equivalent to the first predetermined number, of up-sampling and subsequent spatial-Identity Mapping Convolution Networks to reinstate the coarsened molecular graph to an isomorphic clone of the molecular graph, wherein the isomorphic clone of the molecular graph is a node-information transformed isomorphic clone; and feeding the node-information transformed isomorphic clone of the molecular graph as an input for further classification inference, wherein the spatial-Identity Mapping Convolution Networks overhauls the each hidden state node attribute by perceiving neural messages from local-graph neighbors, wherein the neural messages characterize local-graph neighborhood feature information transformed by the node attributes as given by a local-graph connectivity as taught by Gao into Wei because they are both directed towards using hierarchical pooling frameworks for classification inference on graph data. Wei already teaches use of the Graph U-Net framework of Gao within the pooling module of the PAS framework to generate coarse graph representations (i.e., perform down-sampling) (see TOPKPOOL [11] in Pooling Module [Wei page 4] and [11] in References [Wei page 11], as detailed above). Additionally, similarly to Gao, Wei already teaches use of graph convolution networks (GCNs) within the aggregation module of the PAS framework (see Aggregation Module [Wei page 5 The Design of the Search Space] and [21] in References [Wei page 11], as detailed above). Therefore, one of ordinary skill in the art would recognize that incorporating the teachings of Gao would further enable the PAS framework to include the Graph U-Nets framework within the neural architecture search space, which would be desirable given that Graph U-Nets are a widely used ([Wei page 6 Experimental Settings], as detailed above) hierarchical pooling framework useful for classification inference ([Gao Abstract], as detailed above). Regarding claims 8 and 14, they are system/apparatus and product claims that correspond to the method of claim 2, which is taught by the combination of Wei, Jiang, and Gao. Consequently, claims 8 and 14 are also taught by the combination of Wei, Jiang, and Gao and are therefore rejected for the same reasons as claim 2. Regarding claim 17, it is a product claim that corresponds to the method of claim 5, which is taught by the combination of Wei, Jiang, and Gao. Consequently, claim 17 is also taught by the combination of Wei, Jiang, and Gao, and is therefore rejected for the same reasons as claim 5.
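For illustration only: the encode (down-sample) / decode (up-sample) flow that Gao describes can be sketched as below. This is a minimal numpy sketch under stated assumptions, not Gao's implementation: gcn() stands in for any graph convolution, a single weight matrix is shared across layers for brevity, the gPool scoring follows the projection-vector formulation discussed above, and the skip-connection uses addition.

import numpy as np

def gcn(adj, x, w):
    # Stand-in GCN layer: self-loop + neighborhood aggregation, linear map, ReLU.
    return np.maximum((adj + np.eye(len(adj))) @ x @ w, 0.0)

def gpool(adj, x, p, k):
    # Down-sampling: score nodes by projection onto the unit vector p/|p|,
    # keep the top-k scorers, and gate their features by tanh of the score.
    scores = x @ (p / np.linalg.norm(p))
    idx = np.argsort(scores)[-k:]
    return adj[np.ix_(idx, idx)], x[idx] * np.tanh(scores[idx])[:, None], idx

def gunpool(n, x_small, idx):
    # Up-sampling: restore the original node count; dropped nodes get zeros.
    x = np.zeros((n, x_small.shape[1]))
    x[idx] = x_small
    return x

rng = np.random.default_rng(0)
n, d = 6, 4
adj = np.triu((rng.random((n, n)) > 0.5).astype(float), 1)
adj = adj + adj.T
x, w, p = rng.random((n, d)), rng.random((d, d)), rng.random(d)

x = gcn(adj, x, w)                        # convolution on the input graph
adj_c, x_c, idx = gpool(adj, x, p, k=3)   # down-sample (coarsen)
x_c = gcn(adj_c, x_c, w)                  # convolution on the coarsened graph
x_up = gunpool(n, x_c, idx)               # up-sample back to the original size
x_out = gcn(adj, x_up + x, w)             # addition skip-connection, then GCN
assert x_out.shape == (n, d)              # same node set, transformed attributes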
Regarding claim 19, the combination of Wei, Jiang, and Gao teaches the limitations of parent claim 1, and Gao further teaches wherein the graph pooling layer is an Edge-Conditioned Hierarchical Graph Pooling Layer (EC-GPL) and an encoder of an Edge-Conditioned Graph U-Nets architecture consists of a plurality of encoding blocks and a decoder of the Edge-Conditioned Graph U-Nets architecture consists of a plurality of decoding blocks, (“After the graph embedding layer, we build the encoder by stacking several encoding blocks, each of which contains a gPool layer followed by a GCN layer. gPool layers reduce the size of graph to encode higher-order features, while GCN layers are responsible for aggregating information from each node’s first-order information. In the decoder part, we stack the same number of decoding blocks as in the encoder part” [Gao page 4 Graph U-Nets Architecture]; see Figure 3 – “An illustration of the proposed graph U-Nets (g-U-Nets)” including two encoder blocks and two decoder blocks [Gao page 5]) wherein each encoding block of the plurality of encoding blocks is composed of a downsampling (gPool) layer and an Edge-Conditioned Graph Convolutional Network (ECGCN) layer (“…we build the encoder by stacking several encoding blocks, each of which contains a gPool layer followed by a GCN layer” [Gao page 4 Graph U-Nets Architecture]), wherein the gPool layer reduces a feature map size by encoding higher-order features and augmenting the receptive field, wherein the EC-GCN transforms node embeddings of the coarsened graph by neural-message passing schemes which encapsulate an hierarchical structure by exploiting the graph topology, (“…gPool layers reduce the size of graph to encode higher-order features, while GCN layers are responsible for aggregating information from each node’s first-order information…Notably, there is a GCN layer before each gPool layer, thereby enabling gPool layers to capture the topological information in graphs implicitly” [Gao page 4 Graph U-Nets Architecture]) wherein each decoder block of the plurality of decoding blocks consists of an Upsampling (gUnpool) layer and accompanied subsequently by the EC-GCN layer, (“Each decoder block is composed of a gUnpool layer and a GCN layer. The gUnpool layer restores the graph into its higher resolution structure, and the GCN layer aggregates information from the neighborhood” [Gao page 4 Graph U-Nets Architecture]) wherein the Edge-Conditioned Graph U-Nets architecture includes skip-connections to pass on the node attributes information by feature map summation between resembling and coinciding blocks of the encoder and the decoder layers on different granularities of the coarsened molecular graph, and wherein the skip-connections retains neural embedded information of the nodes in the molecular graph obtained from former intermediate message passing schemes (“There are skip-connections between corresponding blocks of encoder and decoder layers, which transmit spatial information to decoders for better performance. The skip-connection can be either feature map addition or concatenation” [Gao page 4 Graph U-Nets Architecture]; “In the encoder part, we stack four blocks, each of which consists of a gPool layer and a GCN layer. We sample 2000, 1000, 500, 200 nodes in the four gPool layers, respectively. Correspondingly, the decoder part also contains four blocks. Each decoder block is composed of a gUnpool layer and a GCN layer. 
We use addition operation in skip connections between blocks of encoder and decoder parts” [Gao page 6 Experimental Setup]). Claims 3, 9, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Wei, Jiang, and Gao, as applied to claims 1, 7, and 13 above, further in view of Hamilton et al., ("Inductive Representation Learning on Large Graphs", available arXiv 10 Sep 2018), hereinafter Hamilton. Regarding claim 3, the combination of Wei, Jiang, and Gao teaches the limitations of parent claim 1, and Wei further teaches wherein obtaining the graph level representation vector of the molecular graph comprises: feeding the coarsened molecular graph as an input to a node-ordering invariant read-out function to determine the graph-level representation vector; (see Readout Module [Wei page 5 The Design of the Search Space], as detailed above). Gao further teaches wherein obtaining the graph level representation vector of the molecular graph comprises: performing aggregation on the coarsened molecular graph to transform and update each hidden state node attribute of the hidden state node attributes; performing a first predetermined number of down-sampling and subsequent aggregation on the coarsened molecular graph; performing a second number, the second number being equivalent to the first predetermined number, of up-sampling and subsequent aggregation to reinstate the coarsened molecular graph to an isomorphic clone of the molecular graph, wherein the isomorphic clone of the molecular graph is a node-information transformed isomorphic clone; and feeding the node-information transformed isomorphic clone of the molecular graph as an input for further classification inference ([Gao page 2 Related Work] and [Gao page 4 Graph U-Nets Architecture] and Figure 3 [Gao page 5], as detailed above). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein obtaining the graph level representation vector of the molecular graph comprises: performing aggregation on the coarsened molecular graph to transform and update each hidden state node attribute of the hidden state node attributes; performing a first predetermined number of down-sampling and subsequent aggregation on the coarsened molecular graph; performing a second number, the second number being equivalent to the first predetermined number, of up-sampling and subsequent aggregation to reinstate the coarsened molecular graph to an isomorphic clone of the molecular graph, wherein the isomorphic clone of the molecular graph is a node-information transformed isomorphic clone; and feeding the node-information transformed isomorphic clone of the molecular graph as an input for further classification inference as taught by Gao into Wei because they are both directed towards using hierarchical pooling frameworks for classification inference on graph data. Wei already teaches use of the Graph U-Net framework of Gao within the pooling module of the PAS framework to generate coarse graph representations (i.e., perform down-sampling) (see TOPKPOOL [11] in Pooling Module [Wei page 4] and [11] in References [Wei page 11], as detailed above).
Therefore, one of ordinary skill in the art would recognize that incorporating the teachings of Gao would further enable the PAS framework to include the Graph U-Nets framework within the neural architecture search space, which would be desirable given that Graph U-Nets are a widely used ([Wei page 6 Experimental Settings], as detailed above) hierarchical pooling framework useful for classification inference ([Gao Abstract], as detailed above). However, the combination of Wei, Jiang, and Gao does not explicitly teach using spatial dynamic neighborhood aggregation as an aggregation function, wherein the spatial-dynamic neighborhood aggregation transforms the each hidden state node attribute by revisiting a previous iteration step local-graph neighborhood hidden states during a message-passing phase through attention mechanism. In the same field of endeavor, Hamilton teaches a means of transforming node embeddings in a graph neural network for classification inference (“Here we present GraphSAGE, a general inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node’s local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions” [Hamilton Abstract]) by using spatial dynamic neighborhood aggregation as an aggregation function, wherein the spatial-dynamic neighborhood aggregation transforms the each hidden state node attribute by revisiting a previous iteration step local-graph neighborhood hidden states during a message-passing phase through attention mechanism (Algorithm 1 describes the embedding generation process in the case where the entire graph, G = (V, E), and features for all nodes x_v, ∀v ∈ V, are provided as input. We describe how to generalize this to the minibatch setting below. Each step in the outer loop of Algorithm 1 proceeds as follows, where k denotes the current step in the outer loop (or the depth of the search) and h^k denotes a node’s representation at this step: First, each node v ∈ V aggregates the representations of the nodes in its immediate neighborhood, {h_u^(k−1), ∀u ∈ N(v)}, into a single vector h_N(v)^(k−1). Note that this aggregation step depends on the representations generated at the previous iteration of the outer loop (i.e., k − 1), and the k = 0 (“base case”) representations are defined as the input node features. After aggregating the neighboring feature vectors, GraphSAGE then concatenates the node’s current representation, h_v^(k−1), with the aggregated neighborhood vector, h_N(v)^(k−1), and this concatenated vector is fed through a fully connected layer with nonlinear activation function σ, which transforms the representations to be used at the next step of the algorithm (i.e., h_v^k, ∀v ∈ V). [Hamilton page 4 Embedding generation (i.e., forward propagation) algorithm]).
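For illustration only: one step of the Hamilton update just quoted, assuming a mean aggregator, can be sketched as follows. All names are hypothetical and the dense layer is a bare matrix product; this is a sketch of the Algorithm 1 inner update under those assumptions, not Hamilton's released code.

import numpy as np

def sage_step(neighbors, h_prev, w):
    # One outer-loop step k: each node aggregates its neighbors' step k-1
    # states, concatenates with its own k-1 state, and applies dense + ReLU.
    h_next = {}
    for v, nbrs in neighbors.items():
        h_nbr = np.mean([h_prev[u] for u in nbrs], axis=0)  # aggregate N(v)
        h_next[v] = np.maximum(np.concatenate([h_prev[v], h_nbr]) @ w, 0.0)
    return h_next

d = 4
neighbors = {0: [1, 2], 1: [0], 2: [0]}                  # adjacency lists
h0 = {v: np.random.rand(d) for v in neighbors}           # k = 0: input features
h1 = sage_step(neighbors, h0, np.random.rand(2 * d, d))  # k = 1 representations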
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated using spatial dynamic neighborhood aggregation as an aggregation function, wherein the spatial-dynamic neighborhood aggregation transforms the each hidden state node attribute by revisiting a previous iteration step local-graph neighborhood hidden states during a message-passing phase through attention mechanism as taught by Hamilton into Gao (and thereby, the combination of Wei, Jiang, and Gao) because they are both directed towards transforming node embeddings in a graph neural network for classification inference. Gao already discusses reference Hamilton as a related aggregation technique to GCNs (“GCNs essentially perform aggregation and transformation on node features without learning trainable filters. (Hamilton et al., 2017) tried to sample a fixed number of neighboring nodes to keep the computational footprint consistent”), and Wei already teaches use of the GraphSAGE framework of Hamilton within the aggregation module of the PAS framework (“Aggregation Module. We add five widely used GNNs: GCN [21], GAT [40], GraphSAGE [16] with mean aggregator, GIN [46] and GraphConv [30], which denoted as GCN, GAT, SAGE, GIN and GRAPHCONV” [Wei page 5 The Design of the Search Space]; see [16] in References [Wei page 11]). Therefore, one of ordinary skill in the art would recognize that incorporating the teachings of Hamilton would further enable the PAS framework to include the GraphSAGE framework within the aggregation module of the neural architecture search space as an alternative to GCNs, which would be desirable given the demonstrated usefulness of GraphSAGE for classification inference (“Our algorithm outperforms strong baselines on three inductive node-classification benchmarks” [Hamilton Abstract]). Regarding claims 9 and 15, they are system/apparatus and product claims that correspond to the method of claim 3, which is taught by the combination of Wei, Jiang, Gao, and Hamilton. Consequently, claims 9 and 15 are also taught by the combination of Wei, Jiang, Gao, and Hamilton, and are therefore rejected for the same reasons as claim 3. Claims 4, 10, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Wei, Jiang, and Gao, as applied to claims 1, 7, and 13 above, further in view of Velickovic et al., ("Graph Attention Networks", available arXiv 4 Feb 2018), hereinafter Velickovic. Regarding claim 4, the combination of Wei, Jiang, and Gao teaches the limitations of parent claim 1, and Wei further teaches wherein obtaining the graph level representation vector of the molecular graph comprises: feeding the coarsened molecular graph as an input to a node-ordering invariant read-out function to determine the graph-level representation vector; (see Readout Module [Wei page 5 The Design of the Search Space], as detailed above).
Gao further teaches wherein obtaining the graph level representation vector of the molecular graph comprises: performing aggregation on the coarsened molecular graph to transform and update each hidden state node attribute of the hidden state node attributes; performing a first predetermined number of down-sampling and subsequent aggregation on the coarsened molecular graph; performing a second number, the second number being equivalent to the first predetermined number, of up-sampling and subsequent aggregation to reinstate the coarsened molecular graph to an isomorphic clone of the molecular graph, wherein the isomorphic clone of the molecular graph is a node-information transformed isomorphic clone; and feeding the node-information transformed isomorphic clone of the molecular graph as an input for further classification inference ([Gao page 2 Related Work] and [Gao page 4 Graph U-Nets Architecture] and Figure 3 [Gao page 5], as detailed above). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein obtaining the graph level representation vector of the molecular graph comprises: performing aggregation on the coarsened molecular graph to transform and update each hidden state node attribute of the hidden state node attributes; performing a first predetermined number of down-sampling and subsequent aggregation on the coarsened molecular graph; performing a second number, the second number being equivalent to the first predetermined number, of up-sampling and subsequent aggregation to reinstate the coarsened molecular graph to an isomorphic clone of the molecular graph, wherein the isomorphic clone of the molecular graph is a node-information transformed isomorphic clone; and feeding the node-information transformed isomorphic clone of the molecular graph as an input for further classification inference as taught by Gao into Wei because they are both directed towards using hierarchical pooling frameworks for classification inference on graph data. Wei already teaches use of the Graph U-Net framework of Gao within the pooling module of the PAS framework to generate coarse graph representations (i.e., perform down-sampling) (see TOPKPOOL [11] in Pooling Module [Wei page 4] and [11] in References [Wei page 11], as detailed above). Therefore, one of ordinary skill in the art would recognize that incorporating the teachings of Gao would further enable the PAS framework to include the Graph U-Nets framework within the neural architecture search space, which would be desirable given that Graph U-Nets are a widely used ([Wei page 6 Experimental Settings], as detailed above) hierarchical pooling framework useful for classification inference ([Gao Abstract], as detailed above). However, the combination of Wei, Jiang, and Gao does not explicitly teach using spatial graph-attention feed-forward propagation layer mechanism as an aggregation function, wherein the spatial graph-attention feed-forward propagation layer mechanism overhauls the each hidden state node attribute by weighing local-graph neighborhood nodes of importance, wherein the local-graph neighborhood nodes importance to a sink node is determined through attention mechanism.
In the same field of endeavor, Velickovic teaches a means of transforming node embeddings in a graph neural network for classification inference (“Inspired by this recent work, we introduce an attention-based architecture to perform node classification of graph-structured data. The idea is to compute the hidden representations of each node in the graph, by attending over its neighbors, following a self-attention strategy" [Velickovic page 2 Introduction]) by using spatial graph-attention feed-forward propagation layer mechanism as an aggregation function, wherein the spatial graph-attention feed-forward propagation layer mechanism overhauls the each hidden state node attribute by weighing local-graph neighborhood nodes of importance, wherein the local-graph neighborhood nodes importance to a sink node is determined through attention mechanism. (“We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods’ features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems” [Velickovic Abstract]; The disclosed architecture incorporates specifying weights of nodes (i.e., importance) in a given neighborhood into the message passing procedure via self-attentional layers (i.e., attention mechanism). The examiner notes that the term “sink node” is interpreted as referring to the node currently receiving information from its neighbors for any given neighborhood). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated using spatial graph-attention feed-forward propagation layer mechanism as an aggregation function, wherein the spatial graph-attention feed-forward propagation layer mechanism overhauls the each hidden state node attribute by weighing local-graph neighborhood nodes of importance, wherein the local-graph neighborhood nodes importance to a sink node is determined through attention mechanism as taught by Velickovic into Gao (and thereby, the combination of Wei, Jiang, and Gao) because they are both directed towards transforming node embeddings in a graph neural network for classification inference. Gao already discusses reference Velickovic as a related aggregation technique to GCNs (“GCNs essentially perform aggregation and transformation on node features without learning trainable filters… (Velickovic et al., 2017) proposed to use attention mechanisms to enable different weights for neighboring nodes” [Gao page 2 Related Work]), and Wei already teaches use of the graph attention network (GAT) framework of Velickovic within the aggregation module of the PAS framework (“Aggregation Module. We add five widely used GNNs: GCN [21], GAT [40], GraphSAGE [16] with mean aggregator, GIN [46] and GraphConv [30], which denoted as GCN, GAT, SAGE, GIN and GRAPHCONV” [Wei page 5 The Design of the Search Space]; see [40] in References [Wei page 11]). 
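For illustration only (an aside, not part of the grounds of rejection): the attention-weighted aggregation Velickovic describes can be sketched with a single attention head as below. W and a are hypothetical learnable parameters; the softmax over each neighborhood yields the per-neighbor importance weights applied at the receiving ("sink") node.

import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def gat_layer(neighbors, h, W, a):
    # Single-head GAT-style update: attention logits from concatenated
    # transformed features, softmax-normalized over self + neighbors.
    h_next = {}
    for i, nbrs in neighbors.items():
        cand = [i] + nbrs
        raw = np.array([np.concatenate([h[i] @ W, h[j] @ W]) @ a for j in cand])
        alpha = softmax(leaky_relu(raw))   # importance weight per neighbor
        h_next[i] = sum(w_ij * (h[j] @ W) for w_ij, j in zip(alpha, cand))
    return h_next

d = 4
neighbors = {0: [1, 2], 1: [0], 2: [0]}
h = {v: np.random.rand(d) for v in neighbors}
out = gat_layer(neighbors, h, np.random.rand(d, d), np.random.rand(2 * d))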
Therefore, one of ordinary skill in the art would recognize that incorporating the teachings of Velickovic would further enable the PAS framework to include the GAT framework within the aggregation module of the neural architecture search space as an alternative to GCNs, which would be desirable given the demonstrated usefulness of GAT for classification inference (“For the transductive tasks, we report the mean classification accuracy (with standard deviation) on the test nodes of our method after 100 runs, and reuse the metrics already reported in Kipf & Welling (2017) and Monti et al. (2016) for state-of-the-art techniques… Our results successfully demonstrate state-of-the-art performance being achieved or matched across all four datasets—in concordance with our expectations, as per the discussion in Section 2.2” [Velickovic pages 7-8 Results]). Regarding claims 10 and 16, they are system/apparatus and product claims that correspond to the method of claim 4, which is taught by the combination of Wei, Jiang, Gao, and Velickovic. Consequently, claims 10 and 16 are also taught by the combination of Wei, Jiang, Gao, and Velickovic, and are therefore rejected for the same reasons as claim 4. Claims 6, 12, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Wei, Jiang, and Gao, as applied to claims 1, 7, and 13 above, further in view of Thekumparampil et al., ("Attention-based Graph Neural Network for Semi-supervised Learning", available arXiv 13 Mar 2018), hereinafter Thekumparampil. Regarding claim 6, the combination of Wei, Jiang, and Gao teaches the limitations of parent claim 1, and Wei further teaches wherein obtaining the graph level representation vector of the molecular graph comprises: feeding the coarsened molecular graph as an input to a node-ordering invariant read-out function to determine the graph-level representation vector; (see Readout Module [Wei page 5 The Design of the Search Space], as detailed above). Gao further teaches wherein obtaining the graph level representation vector of the molecular graph comprises: performing aggregation on the coarsened molecular graph to transform and update each hidden state node attribute of the hidden state node attributes; performing a first predetermined number of down-sampling and subsequent aggregation on the coarsened molecular graph; performing a second number, the second number being equivalent to the first predetermined number, of up-sampling and subsequent aggregation to reinstate the coarsened molecular graph to an isomorphic clone of the molecular graph, wherein the isomorphic clone of the molecular graph is a node-information transformed isomorphic clone; and feeding the node-information transformed isomorphic clone of the molecular graph as an input for further classification inference ([Gao page 2 Related Work] and [Gao page 4 Graph U-Nets Architecture] and Figure 3 [Gao page 5], as detailed above).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated wherein obtaining the graph level representation vector of the molecular graph comprises: performing aggregation on the coarsened molecular graph to transform and update each hidden state node attribute of the hidden state node attributes; performing a first predetermined number of down-sampling and subsequent aggregation on the coarsened molecular graph; performing a second number, the second number being equivalent to the first predetermined number, of up-sampling and subsequent aggregation to reinstate the coarsened molecular graph to an isomorphic clone of the molecular graph, wherein the isomorphic clone of the molecular graph is a node-information transformed isomorphic clone; and feeding the node-information transformed isomorphic clone of the molecular graph as an input for further classification inference as taught by Gao into Wei because they are both directed towards using hierarchical pooling frameworks for classification inference on graph data. Wei already teaches use of the Graph U-Net framework of Gao within the pooling module of the PAS framework to generate coarse graph representations (i.e., perform down-sampling) (see TOPKPOOL [11] in Pooling Module [Wei page 4] and [11] in References [Wei page 11], as detailed above). Therefore, one of ordinary skill in the art would recognize that incorporating the teachings of Gao would further enable the PAS framework to include the Graph U-Nets framework within the neural architecture search space, which would be desirable given that Graph U-Nets are a widely used ([Wei page 6 Experimental Settings], as detailed above) hierarchical pooling framework useful for classification inference ([Gao Abstract], as detailed above). However, the combination of Wei, Jiang, and Gao does not explicitly teach using spatial Graph Attentional Propagation as an aggregation function, wherein the spatial Graph Attentional Propagation transforms the each hidden state node attribute by determining local-graph neighborhood nodes of importance through attention mechanism, wherein attention mechanism includes a cosine similarity to determine attention coefficients to weigh the local-graph neighborhood nodes of importance to transform each hidden state node attribute. In the same field of endeavor, Thekumparampil teaches a means of transforming node embeddings in a graph neural network for classification inference (“This further motivates us to design a new way of aggregating neighborhood information through attention mechanism since, intuitively, neighbors might not be equally important.
This proposed attention-based graph neural network captures this intuition and (a) greatly reduces the model complexity, with only a single scalar parameter at each intermediate layer; (b) discovers dynamically and adaptively which nodes are relevant to the target node for classification; and (c) improves upon state-of-the-art methods in terms of accuracy on standard benchmark datasets" [Thekumparampil page 2 Introduction]) by using spatial Graph Attentional Propagation as an aggregation function, wherein the spatial Graph Attentional Propagation transforms the each hidden state node attribute by determining local-graph neighborhood nodes of importance through attention mechanism, wherein attention mechanism includes a cosine similarity to determine attention coefficients to weigh the local-graph neighborhood nodes of importance to transform each hidden state node attribute ("To this end, we introduce a novel Attention-based Graph Neural Network (AGNN). AGNN is simple; it only has a single scalar parameter β^(t) at each intermediate layer. AGNN captures relevance; the proposed attention mechanism over neighbors in (5) learns which neighbors are more relevant and weighs their contributions accordingly" [Thekumparampil page 4 Attention-based Graph Neural Network (AGNN)]; “We start with a word-embedding layer that maps a bag-of-words representation of a document into an averaged word embedding...This is followed by layers of attention-guided propagation layers parameterized by β^(t) ∈ R at each layer,...The softmax function at attention ensures that the propagation layer P^(t) row-sums to one. The attention from node j to node i is [7] with C = Σ_{j∈N(i)∪{i}} e^(β^(t) cos(H_i^(t), H_j^(t))), which captures how relevant j is to i, as measured by the cosine of the angle between the corresponding hidden states” [Thekumparampil pages 4-5 AGNN Model]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated using spatial Graph Attentional Propagation as an aggregation function, wherein the spatial Graph Attentional Propagation transforms the each hidden state node attribute by determining local-graph neighborhood nodes of importance through attention mechanism, wherein attention mechanism includes a cosine similarity to determine attention coefficients to weigh the local-graph neighborhood nodes of importance to transform each hidden state node attribute as taught by Thekumparampil into Wei because they are both directed towards transforming node embeddings in a graph neural network for classification inference. One of ordinary skill in the art would recognize that incorporating the teachings of Thekumparampil would further enable the PAS framework of Wei to include the Attention-based Graph Neural Network (AGNN) framework within the aggregation module of the neural architecture search space as an alternative to GCNs, which would be desirable given the computational simplicity of AGNN and its demonstrated usefulness for classification inference (“This proposed attention-based graph neural network captures this intuition and (a) greatly reduces the model complexity, with only a single scalar parameter at each intermediate layer; (b) discovers dynamically and adaptively which nodes are relevant to the target node for classification; and (c) improves upon state-of-the-art methods in terms of accuracy on standard benchmark datasets" [Thekumparampil page 2 Introduction]).
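For illustration only: the cosine-similarity attention in the passage just quoted can be sketched as below, assuming a single propagation layer with one scalar parameter beta; all other names are hypothetical.

import numpy as np

def cos(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def agnn_layer(neighbors, h, beta):
    # Attention from j to i is softmax over beta * cos(h_i, h_j), taken
    # over N(i) ∪ {i}, so each propagation row sums to one.
    h_next = {}
    for i, nbrs in neighbors.items():
        cand = [i] + nbrs
        logits = np.array([beta * cos(h[i], h[j]) for j in cand])
        e = np.exp(logits - logits.max())
        alpha = e / e.sum()
        h_next[i] = sum(a * h[j] for a, j in zip(alpha, cand))
    return h_next

neighbors = {0: [1, 2], 1: [0], 2: [0]}
h = {v: np.random.rand(4) for v in neighbors}
h_new = agnn_layer(neighbors, h, beta=1.0)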
Regarding claims 12 and 18, they are system/apparatus and product claims that correspond to the method of claim 6, which is taught by the combination of Wei, Jiang, Gao, and Thekumparampil. Consequently, claims 12 and 18 are also taught by the combination of Wei, Jiang, Gao, and Thekumparampil, and are therefore rejected for the same reasons as claim 6. Response to Arguments The remarks filed 10/07/2025 have been fully considered. Applicant’s remarks [Remarks pages 28-47] traversing the obviousness rejections under 35 U.S.C. 103 set forth in the office action mailed 07/09/2025, in view of claims 1-10 and 12-19 as amended, have been considered but are not persuasive. The examiner respectfully notes that applicant’s arguments against the prior art of record largely amount to copying or summarizing portions of references cited in the previous office action, copying portions of the specification corresponding to newly added limitations, and then concluding arguments with mere assertions that the cited references are “silent on”, or “fail to teach or suggest” limitations at issue, without any further analysis or explanation provided. The arguments provided fail to utilize specific evidence or reasoning to explain why the cited prior art allegedly does not teach or suggest the limitations at issue. Applicant is directed towards the updated grounds of rejection under 35 U.S.C. 103 set forth above. Arguments of note are further summarized and addressed below. Applicant argues [Remarks pages 32-33] that the “specific representation of a hierarchical layer-wise propagation of the graph pooling layer” ([equation image]), as recited in amended claim 1 (and thereby corresponding claims 7 and 13) is not taught or suggested by the prior art of record. The examiner respectfully disagrees, and notes that as best understood in light of the specification [see 112(b) rejection of claim 1 set forth above], the recited representation appears to do no more than simply repeat previous limitations (“obtaining a hierarchical layer-wise propagation of a graph pooling layer of the molecular graph by taking a product of the edge-information aware node attributes [symbol image] and a unit vector associated with the projection vector [symbol image], wherein the direction of the unit vector is same as the direction of the projection vector [symbol image]”) in equation form. Applicant argues [Remarks pages 35-37] that the prior art of record fails to teach “selection of the first set of nodes and the second set of nodes [being] performed based on a pooling ratio”, as recited in amended claim 1 (and thereby corresponding claims 7 and 13), because Wei and Gao merely suggest selecting “top-k nodes” or a “smaller subset of nodes” to form the coarse graph. The examiner respectfully disagrees. Under a broadest reasonable interpretation in light of the specification [0045-0047], selection “based on a pooling ratio” amounts to simply selecting a particular number of nodes (e.g., k) out of the total number of nodes (e.g., n) for inclusion in the coarsened graph (i.e., a “ratio” of rejected nodes (i.e., first set of nodes) to retained nodes (i.e., second set of nodes) of (n – k) : k), which is functionally equivalent to the top-k procedure of Wei and Gao, as further explained in the rejection set forth above.
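For illustration only: the functional equivalence asserted above between selection "based on a pooling ratio" and a top-k selection reduces to deriving k from the ratio. A minimal sketch, with hypothetical names:

import math
import numpy as np

def select_by_ratio(scores, ratio):
    # A pooling ratio r over n nodes fixes k = ceil(r * n); the k highest
    # scorers are retained and the other n - k are rejected, i.e. the
    # (n - k) : k split of rejected (first set) and retained (second set) nodes.
    n = len(scores)
    k = math.ceil(ratio * n)
    order = np.argsort(scores)
    return order[-k:], order[:-k]          # (retained, rejected)

scores = np.random.rand(10)
kept, dropped = select_by_ratio(scores, ratio=0.5)   # identical to top-5
assert len(kept) == 5 and len(dropped) == 5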
Applicant argues [Remarks pages 37-39] that the prior art of record fails to teach an “Edge-Conditioned Hierarchical Graph Pooling Layer (EC-GPL)” and specific “Edge-Conditioned Graph U-Nets Architecture” as recited in newly added claim 19. The examiner respectfully disagrees, and notes that claim limitations are considered under a broadest reasonable interpretation in light of the specification, and that limitations from the specification (i.e., specific embodiments or architectures) are not read into the claims. Unless applicant clearly sets forth or designates a special definition of a claim term in the specification, all claim terms (e.g., Edge-Conditioned Hierarchical Graph Pooling Layer, Edge-Conditioned Graph U-Nets Architecture) are given their meaning as it would be understood by one of ordinary skill in the art. The limitations of claim 19 thereby appear to do no more than recite the known operating procedure of the widely used Graph U-Nets architecture and its subcomponents therein, as disclosed in Gao, without any further distinctive features provided (see 103 rejection of claim 19 set forth above). The recitations of said architecture as being “Edge-Conditioned” do not provide any apparent distinction over the combination of cited references, given that Jiang is already explained to teach the recited “edge-information aware node attributes” via inclusion of edge features (see 103 rejection of claim 1 set forth above). Applicant argues [Remarks pages 39-46] that the prior art of record fails to teach newly added limitations to dependent claims 2-6 (and thereby corresponding claims 8-10, 12, and 14-18). The examiner respectfully disagrees, and notes that the newly added limitations appear to do no more than either use variations in terminology to append onto the claims a typical procedure of information aggregation through message passing, commonly inherent to graph neural network architectures (incl. GCN layer of Gao – see 103 rejection of claims 2 and 5 set forth above), or add limitations that were already taught and/or suggested by the previously cited art (see 103 rejection of claims 3-4 and 6 set forth above). Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Hamilton et al. (“Embedding Logical Queries on Knowledge Graphs”, available arXiv 29 Oct 2019) discloses a framework for efficiently making predictions about conjunctive logical queries—a flexible but tractable subset of first-order logic—on incomplete knowledge graphs. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIJAY M BALAKRISHNAN whose telephone number is (571) 272-0455. The examiner can normally be reached 10am-5pm EST Mon-Thurs. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JENNIFER WELCH can be reached on (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /V.M.B./ Examiner, Art Unit 2143 /JENNIFER N WELCH/Supervisory Patent Examiner, Art Unit 2143

Prosecution Timeline

Apr 28, 2022
Application Filed
Jul 03, 2025
Non-Final Rejection — §103, §112
Oct 07, 2025
Response Filed
Jan 07, 2026
Final Rejection — §103, §112
Apr 09, 2026
Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585912
GATED LINEAR CONTEXTUAL BANDITS
2y 5m to grant Granted Mar 24, 2026
Patent 12468967
METHOD AND SYSTEM FOR GENERATING A SOCIO-TECHNICAL DECISION IN RESPONSE TO AN EVENT
2y 5m to grant Granted Nov 11, 2025

Prosecution Projections

3-4
Expected OA Rounds
43%
Grant Probability
99%
With Interview (+85.7%)
3y 12m
Median Time to Grant
Moderate
PTA Risk
Based on 14 resolved cases by this examiner. Grant probability derived from career allow rate.
