Prosecution Insights
Last updated: May 29, 2026
Application No. 17/421,358

FEATURE INTERACTION VIA EDGE SEARCH

Final Rejection §101§103
Filed
Jul 07, 2021
Priority
Jul 14, 2020 — nonprovisional of PCTCN2020101902
Examiner
DAY, ROBERT N
Art Unit
2122
Tech Center
2100 — Computer Architecture & Software
Assignee
Alibaba Group Holding Limited
OA Round
6 (Final)
21%
Grant Probability
At Risk
7-8
OA Rounds
0m
Est. Remaining
41%
With Interview

Examiner Intelligence

Grants only 21% of cases
21%
Career Allowance Rate
5 granted / 24 resolved
-34.2% vs TC avg
Strong +20% interview lift
Without
With
+20.0%
Interview Lift
resolved cases with interview
Typical timeline
4y 1m
Avg Prosecution
16 currently pending
Career history
61
Total Applications
across all art units

Statute-Specific Performance

§101
4.6%
-35.4% vs TC avg
§103
85.7%
+45.7% vs TC avg
§102
9.7%
-30.3% vs TC avg
Black line = Tech Center average estimate • Based on career data from 24 resolved cases

Office Action

§101 §103
DETAILED ACTION Notice of Pre-AIA or AIA Status The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . Continued Examination Under 37 CFR 1.114 A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 26 November 2025 has been entered. Claims 1, 11, and 16 are amended. Claim 3 was previously canceled. Claims 1, 2, and 4-20 are pending and have been examined. Response to Arguments Applicant's arguments, see pages 10-15, filed 26 November 2025, with respect to the rejections of Claims 1, 2, and 4-20 under 35 U.S.C. 101 for abstract idea have been fully considered and are persuasive. The rejections of Claims 1, 2, and 4-20 under 35 U.S.C. 101 for abstract idea have been withdrawn. APPLICANT'S ARGUMENT: Applicant argues (page 10, paragraph 5) that "Applicant respectfully submits that the pending claims are directed to improving existing technologies." Applicant argues (page 14, continued paragraph) that "As the predictive model is less complicated than the initial neural network and uses a number of interactive features that is fewer than the number of distinct features originally associated with the application, fewer computational and memory resources are required. Furthermore, the use of adjacency matrices of a same size to represent connections of respective interactive features of a plurality of feature graphs of different orders avoids mathematical complexity and incompatibility involving operations such as additions and multiplications between matrices of any different orders, thus further reducing the computational and memory resources, and making an improvement to the existing technologies." EXAMINER'S RESPONSE: Examiner agrees. The rejections of Claims 1, 2, and 4-20 under 35 U.S.C. 101 for abstract idea have been withdrawn in light of arguments and/or amendments. Applicant's arguments, see pages 10-15, filed 26 November 2025, with respect to the rejections of Claims 1, 2, and 4-20 under 35 U.S.C. 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. APPLICANT'S ARGUMENT: Applicant argues (page 16, paragraph 2) that "Without conceding propriety of the asserted combination, Velickovic, Leel, Lee2, and Guilak do not render claim 1 unpatentable." Applicant argues (page 17, paragraph 2) that "Claims 2 and 6-10 ultimately depend from independent claim 1. As discussed above, claim 1 is allowable over the cited documents. Therefore, claims 2 and 6-10 are allowable over the cited documents of record for at least their dependency from an allowable base claim, and also for the additional features that each recites." Applicant argues (page 18, paragraph 3) that "For at least reasons similar to those discussed above with respect to claim 1, Velickovic, Leel, Lee2, and Guilak fail to teach each and every feature of amended claim 11." Applicant argues (page 19, paragraph 2) that "claims 13-15 are allowable over the cited documents of record for at least their dependency from an allowable base claim." Applicant argues (page 19, paragraph 5) that "Without conceding propriety of the asserted combination, Velickovic, Leel, Lee2, and Guilak do not render claim 16 unpatentable." Applicant argues (page 20, paragraph 4) that "claims 18-20 are allowable over the cited documents of record for at least their dependency from an allowable base claim, and also for the additional features that each recites." Applicant argues (page 21, paragraph 3) that "Without conceding the propriety of the combination of Velickovic, Leel, Lee2, Guilak, and LINQS, the combination fails to teach or suggest one or more features of independent claim 1." Applicant argues (page 22, paragraph 2) that "claim 5, 12, and 17 are allowable over the cited documents of record for at least their dependency from an allowable base claim, and also for the additional features that each recites." EXAMINER'S RESPONSE: Examiner notes that Applicant's arguments pertain to newly claimed limitations. Further, Applicant's arguments are now moot. In the 35 U.S.C. 103 rejection of amended Claim 1 below, the claimed invention is made obvious under a new ground of rejection in view of Veličković in view of Lee in view of Leskovec in view of Chen in view of Lee in view of Guilak. Dependent claims inherit the rejection of their independent claims and are made obvious as indicated below. Claim Rejections - 35 USC § 101 The rejections of Claims 1, 2, and 4-20 under 35 U.S.C. 101 for abstract idea have been withdrawn in light of arguments and/or amendments. Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness. Claims 1, 2, 6, 7-11, 13-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Veličković, et al., "Graph Attention Networks" (hereinafter "Veličković") in view of Lee, et al. (US 2020/0285944 A1, hereinafter "Lee1") in view of Leskovec, et al. "Kronecker graphs: an approach to modeling networks" (hereinafter "Leskovec") in view of Chen, Hanting, et al. "Learning Student Networks via Feature Embedding" (hereinafter "Chen") in view of Lee, et al. (US 2021/0294840 A1, hereinafter "Lee2"), in further view of Guilak (US 6,292,577 B1, hereinafter "Guilak"). Regarding Claim 1, Veličković teaches: a method (Veličković, p. 7, 3.4 Results: "For the transductive tasks, we report the mean classification accuracy (with standard deviation) on the test nodes of our method after 100 runs") implemented by one or more computing devices (Veličković, p. 5, 2.2 Comparisons to related work: "Depending on the regularity of the graph structure in place, GPUs may not be able to offer major performance benefits compared to CPUs in these sparse scenarios ... parallelization across all the graph edges, especially in a distributed manner, may involve a lot of redundant computation, as the neighborhoods will often highly overlap in graphs of interest"), the method comprising: ... stages including a transformation stage (Veličković, p. 3, 2.1 Graph attentional layer: "We will start by describing a single graph attentional layer.... The input to our layer is a set of node features," where Veličković's suggested collection of node features as inputs corresponds to the instant transformation), an edge search stage (Veličković, p. 3, 2.1 Graph attentional layer: "to obtain sufficient expressive power to transform the input features into higher-level features .... We then perform self-attention on the nodes," where Veličković's self-attention transformation of features corresponds to the instant edge search), a propagation stage (Veličković, p. 7, 3.3 Experimental Setup, Transductive learning: "For the transductive learning tasks, we apply a two-layer GAT model) ..., wherein: the transformation stage includes at least: associating a plurality of nodes in a feature graph of a first order (Veličković, p. 3, 2.1 Graph attentional layer: "We inject the graph structure into the mechanism by performing masked attention--we only compute e i j for nodes j ∈ N i , where N i is some neighborhood of node i in the graph. In all our experiments, these will be exactly the first-order neighbors of i (including i )," where Veličković's first order neighborhood corresponds to the instant first order) with a plurality of distinct features (Veličković, p. 3, 2.1 Graph attentional layer: "We will start by describing a single graph attentional layer.... The input to our layer is a set of node features, h = { h 1 → , h 2 → , … , h N → } , h i → ∈ R F , where N is the number of nodes, and F is the number of features in each node") that is associated with an application (Veličković, p. 5, 3.3 Experimental setup: "For the transductive learning tasks, we apply a two-layer GAT model. Its architectural hyperparameters have been optimized on the Cora dataset and are then reused for Citeseer. ... The second layer is used for classification," where the instant application corresponds to Veličković's classification); the edge search stage includes at least: iteratively generating interactive features of a higher order from interactive features of a lower order (Veličković, p. 3, 2.1 Graph attentional layer: "to obtain sufficient expressive power to transform the input features into higher-level features, at least one learnable linear transformation is required.... We then perform self-attention on the nodes—a shared attentional mechanism ... that indicate the importance of node j ’s features to node i ," where Veličković's transform input into higher-level corresponds to the instant generate) to form a plurality of feature graphs of different orders (Veličković, p. 3, 2.1 Graph attentional layer: "The layer produces a new set of node features (of potentially different cardinality F ' ), h ' = { h 1 ' → , h 2 → , … , h N → } , h i → ∈ R F ' , as its output," where Veličković's output new set of node features corresponds to the instant feature graph of higher-order features, where the output node is also depicted as node h' on p. 4, Figure 1, "Right: An illustration of multihead attention (with K = 3 heads) by node 1 on its neighborhood"), wherein: ... matrices ... are used to represent connections of respective interactive features of the plurality of feature graphs of different orders (Veličković, p. 5, 2.2 Comparisons To Related Work: "We were able to produce a version of the GAT layer that leverages sparse matrix operations, reducing the storage complexity to linear in the number of nodes and edges and enabling the execution of GAT models on larger graph datasets"), a binarization function is applied with a tunable threshold value ... to obtain binary adjacency ... at an end of the edge search stage (Veličković, p. 7, 3.3 Experimental Setup, Transductive learning: "During training ... dropout ... with p = 0.6 is applied to both layers' inputs, as well as to the normalized attention coefficients (critically, this means that at each training iteration, each node is exposed to a stochastically sampled neighborhood)," where Veličković's neighborhood edge sampling corresponds to the instant binarization function, and dropout rate corresponds to the instant tunable threshold); the propagation stage includes at least: propagating the respective interactive features of the plurality of feature graphs of the different orders to a neural network (Veličković, p. 7, 3.3 Experimental Setup, Transductive learning: "For the transductive learning tasks, we apply a two-layer GAT model. ... The first layer consists of K = 8 attention heads computing F ' = 8 features each (for a total of 64 features), followed by an exponential linear unit (ELU) ... nonlinearity. The second layer is used for classification: a single attention head that computes C features (where C is the number of classes), followed by a softmax activation," where Veličković's first layer is a neural network, as in p. 3, 2.1 Graph Attentional Layer: "the attention mechanism a is a single-layer feedforward neural network," and where Veličković's second layer corresponds to the instant neural network) to determine a number of interactive features of one or more orders for the application (Veličković, p. 9, 4 Conclusions: "Our models leveraging attention have successfully achieved or matched state-of-the-art performance across four well-established node classification benchmarks, both transductive and inductive (especially, with completely unseen graphs used for testing). ... Moreover, extending the method to perform graph classification instead of node classification would also be relevant from the application perspective," where Veličković's node classification corresponds to the instant application), the number of interactive features of the one or more orders being fewer than a number of the plurality of distinct features (Veličković, p. 7, 3.3 Experimental Setup, Transductive learning: "For the transductive learning tasks, we apply a two-layer GAT model. ... During training ... dropout ... with p = 0.6 is applied to both layers' inputs, as well as to the normalized attention coefficients (critically, this means that at each training iteration, each node is exposed to a stochastically sampled neighborhood)," where Veličković's sampled neighborhood corresponds to the instant a number of interactive features). Veličković teaches a method of associating nodes in a feature graph and iteratively generating interactive features of a higher order from interactive features of a lower order to form feature graphs of different orders. Veličković does not explicitly teach wherein adjacency matrices of a same size are used to represent connections of respective interactive features of the plurality of feature graphs of different orders. However, Lee1 teaches: wherein adjacency matrices of a same size are used to represent connections of respective interactive features of the plurality of feature graphs of different orders (Lee1, [0085]: "for a given network G = v , ϵ with N = v nodes and M = ϵ edges, and a set of T types of motifs H = { H 1 , … , H t } (such as the set of motifs described with respect to FIG. 6), a set of T different motif-induced adjacency matrices A = { A 1 , … , A t } is determined, where the motif-induced adjacency matrix A t for motif t is defined as: A t , i , j =-number of motifs of type H t which each include nodes i and j . ," where Lee1's motif-induced adjacency matrices corresponds to the instant adjacency matrices of a same size, as in [0090]: "the motif-induced adjacency matrix A t ∈ A , (such as one-hop edge-based adjacency matrix 950 and one-hop motif-induced adjacency matrix 1050 described above) is transformed using a function Ψ : R N × N → R N × N to determine motif-induced adjacency matrices A ~ t for use in graph convolution using equation (9)," and where Lee1's motif is used to represent features, as in [0044]: "Some GCNs use multiple motifs to select multiple neighborhoods for each node in the graph, perform information integration in each of the multiple neighborhoods, and then combine the results (e.g., as a weighted sum) from the multiple neighborhoods to determine a feature embedding for the node") It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Veličković regarding a method of associating nodes in a feature graph and iteratively generating interactive features of a higher order from interactive features of a lower order to form feature graphs of different orders with those of Lee1 regarding wherein adjacency matrices of a same size are used to represent connections of respective interactive features of the plurality of feature graphs of different orders. The motivation to do so would be to facilitate improving the relevance of node's neighborhoods by including information from more relevant higher-order neighborhoods while processing graphs (Lee1, [0045]: "graph convolutional networks with motif-based attention ... capture information from more relevant neighboring nodes for each individual node. In some embodiments, multi-hop motifs are used to capture information from more relevant higher-order neighborhoods (e.g., within a longer distance) of individual nodes. In some embodiments, a motif attention mechanism is used to select the most relevant motif-induced neighborhoods for information integration at each respective node in a graph, where different motifs and/or different distances ( e.g., number of hops) are selected and used to determine the most relevant neighborhoods for different nodes"). The Veličković/Lee1 combination teaches matrices are used to represent connections of respective interactive features of the plurality of feature graphs of different orders, and a binarization function is applied with a tunable threshold value to obtain binary adjacency at an end of the edge search stage. The Veličković/Lee1 combination does not explicitly teach adjacency matrices ... and are regarded as Bernoulli random variables parameterized by an edge state. However, Leskovec teaches: adjacency matrices ... are regarded as Bernoulli random variables parameterized by an edge state (Leskovec, p. 1010, 5.2 Problem Formulation: "by using Θ we create the Stochastic Kronecker graph adjacency matrix P = P k = Θ k . Permutation σ defines the mapping of nodes of G to the rows and columns of stochastic adjacency matrix P . ... ¶ We then model edges as independent Bernoulli random variables parameterized by the parameter matrix Θ . So, each entry p u v of P gives exactly the probability of edge u , v appearing"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Veličković/Lee1 combination regarding matrices are used to represent connections of respective interactive features of the plurality of feature graphs of different orders, and a binarization function is applied with a tunable threshold value to obtain binary adjacency at an end of the edge search stage with those of Leskovec regarding adjacency matrices ... and are regarded as Bernoulli random variables parameterized by an edge state. The motivation to do so would be to facilitate modeling with feature graphs that accurately represent properties of real networks (Leskovec, p. 1034, 8. Conclusion: "the main contribution of this work is a family of models of network structure that uses a non-traditional matrix operation, the Kronecker product. The resulting graphs (a) have all the static properties (heavy-tailed degree distribution, small diameter, etc.), (b) all the temporal properties (densification, shrinking diameter) that are found in real networks. ... ¶ Currently Stochastic Kronecker graphs use a Bernoulli edge generation model, that is, an entry of big matrix P encodes the parameter of a Bernoulli coin"). The Veličković/Lee1/Leskovec combination teaches training a predictive model for the application based at least in part on the determined number of interactive features of the one or more orders. The Veličković/Lee1/Leskovec combination does not explicitly teach the method comprising: ... a training stage, wherein: ... the training stage includes at least: training a lightweight predictive model for the application based at least in part on the determined number of interactive features of the one or more orders. However, Chen teaches: the method comprising: ... a training stage, wherein: ... the training stage includes at least: training a lightweight (Chen, p. 4, Algorithm 1, Learning student network by exploiting the proposed locality preserving loss, line 1: "Initialize the student network N S , whose number of parameters is significantly fewer than that in N T " where Chen's significantly fewer number of parameters corresponds to the instant lightweight) predictive model for the application (Chen, p. 3, A. Teacher-Student Interactions: "The student network is then learned using the following loss function ...[Eq. 2] ... where H is the cross-entropy loss, y is the ground-truth label and λ is a Trade-off parameter. The first term denotes the classical classification objective," where Chen's classification corresponds to the instant predictive) based at least in part on the determined number of interactive features of the one or more orders (Chen, p. 4, Algorithm 1, Learning student network by exploiting the proposed locality preserving loss, lines 4-10, where student weights are updated according to the teacher network batch results, as in p. 5, Eq. 8). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Veličković/Lee1/Leskovec combination regarding training a predictive model for the application based at least in part on the determined number of interactive features of the one or more orders with those of Chen regarding the method comprising: a training stage, wherein: the training stage includes at least: training a lightweight predictive model for the application based at least in part on the determined number of interactive features of the one or more orders. The motivation to do so would be to facilitate training of networks requiring fewer resources resulting in greater model portability (Chen, p. 1, I. Introduction: "As deeper networks often have higher performance than that of shallow networks [18], the teacher-student learning paradigm ... has emerged to learn portable networks (student network) of deeper architecture yet fewer parameters and convolution filters from the original network (teacher network). Compared with other approaches, portable networks generated by the teacher-student paradigm are much more flexible since they are exactly regular neural networks which do not need any additional supports for implementing online inference"). The Veličković/Lee1/Leskovec/Chen combination teaches training a lightweight predictive model for the application based at least in part on the determined number of interactive features of the one or more orders. The Veličković/Lee1/Leskovec/Chen combination does not explicitly teach storing information of the determined number of interactive features of the one or more orders in a database. However, Lee2 teaches: storing information of the determined number of interactive features of the one or more orders in a database (Lee2, [0025]: "The music search system can ... store the feature vectors in the database"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Veličković/Lee1/Leskovec/Chen combination regarding training a lightweight predictive model for the application based at least in part on the determined number of interactive features of the one or more orders with those of Lee2 regarding storing information of the interactive features in a database. The motivation to do so would be to support application scenarios where user input can be used to generate features according to similarity with stored features (Lee2, [0038]: "the music search system does not require a user-supplied query ... and instead can receive ... a musical attribute from a user... The music search system can generate a prototype feature vector for each possible ... attribute from stored feature vectors in a database ... averaging these feature vectors to form a prototype feature vector"). The Veličković/Lee1/Leskovec/Chen/Lee2 combination teaches training a lightweight predictive model for the application based at least in part on the determined number of interactive features of the one or more orders. The Veličković/Lee1/Leskovec/Chen/Lee2 combination does not explicitly teach storing parameters of the predictive model in the database. However, Guilak teaches: storing parameters of the ... predictive model in the database (Guilak, [0045]: "Each weight in the set of parameters is assigned an initial value, and both the weight and the assigned value are stored in a parameter weight database 314. Initial weighting values stored in the parameter weight database 314 are adjusted by analyzing the test set 312 of training documents"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Veličković/Lee1/Leskovec/Chen/Lee2 combination regarding training a lightweight predictive model for the application based at least in part on the determined number of interactive features of the one or more orders with those of Guilak regarding storing parameters of the predictive model in the database. The motivation to do so would be enable a training process that incrementally improves model performance to be highly effective (Guilak, [0045]: "Initial weighting values stored in the parameter weight database 314 are adjusted by analyzing the test set 312 of training documents. ... The error information accumulated over a large set of training data 304 ...is then used to incrementally adjust the initial parameter weightings stored in the parameter weight database 314. ... This process is repeated in an iterative fashion to arrive at a set of feature weightings that are highly predictive of the selected type of content"). Regarding Claim 11, Veličković teaches: one or more computer readable media storing executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts (Veličković, p. 5, 2.2 Comparisons to related work: "Depending on the regularity of the graph structure in place, GPUs may not be able to offer major performance benefits compared to CPUs in these sparse scenarios ... parallelization across all the graph edges, especially in a distributed manner, may involve a lot of redundant computation, as the neighborhoods will often highly overlap in graphs of interest") comprising: precisely those steps recited by the method of Claim 1. Claim 11 is rejected under the same rationale as Claim 1. Regarding Claim 16, Veličković teaches: a system comprising: one or more processors; memory storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts (Veličković, p. 5, 2.2 Comparisons to related work: "Depending on the regularity of the graph structure in place, GPUs may not be able to offer major performance benefits compared to CPUs in these sparse scenarios ... parallelization across all the graph edges, especially in a distributed manner, may involve a lot of redundant computation, as the neighborhoods will often highly overlap in graphs of interest") comprising: : precisely those steps recited by the method of Claim 1. Claim 11 is rejected under the same rationale as Claim 1. Regarding Claim 2, the rejection of Claim 1 is incorporated. The Veličković/Lee1/Leskovec/Chen/Lee2/Guilak combination teaches: wherein iteratively generating the interactive features of the higher order from the interactive features of the lower order to form the plurality of feature graphs of the different orders comprises determining whether to connect two interactive features of the lower order to form an interactive feature of the higher order based at least in part on a ... function (Veličković, p. 3, 2.1 Graph attentional layer: "a shared attentional mechanism a ... computes attention coefficients e i j ... We inject the graph structure into the mechanism by performing masked attention—we only compute e i j for nodes j ∈ N i , where N i is some neighborhood of node i in the graph," where the instant function corresponds to Veličković's masked attention). Lee1 further teaches: determining whether to connect two interactive features of the lower order to form an interactive feature of the higher order based at least in part on a reward function (Lee1, [0105]: "In some embodiments, the attention mechanism is trained using a second loss function based on reinforcement learning. The second loss function, which is referred to as an attention loss function L A , is defined as: [Eq. 22] where R v is the reward given to the system (e.g., R v = 1 if the MCN classifies node v correctly, otherwise R v = - 1 )"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Veličković/Lee1/Leskovec/Chen/Lee2/Guilak combination regarding combining first-order features based on attention masking with the further teachings of Lee1 regarding use of a reward function. The motivation to do so would be to ensure that determining node neighborhood by way of an attention mechanism properly rewards interactions of node neighborhoods across layers of the network (Lee1, [0105]: "While the cross-entropy loss function described in equation (21) is sufficient for training the MCN to classify nodes in input graphs, it may not be sufficient for training the attention mechanism that selects the best motif and step-size for each node at each layer. In some embodiments, the attention mechanism is trained using a second loss function based on reinforcement learning. ... The attention loss function L A is used to reward the actions of the classified nodes at the last layer; and then reward the actions of the neighbors of the classified nodes at the previous layer (if there is one) because their actions affect the outcome"). Regarding Claim 6, the rejection of Claim 1 is incorporated. The Veličković/Lee1/Leskovec/Chen/Lee2/Guilak combination teaches: associating the plurality of nodes in the feature graph of the first order with the plurality of distinct features (Veličković, p. 3, 2.1 Graph attentional layer: "We will start by describing a single graph attentional layer.... The input to our layer is a set of node features, h = { h 1 → , h 2 → , … , h N → } , h i → ∈ R F , where N is the number of nodes, and F is the number of features in each node") that are associated with the application (Veličković, p. 5, 3.3 Experimental setup: "For the transductive learning tasks, we apply a two-layer GAT model. Its architectural hyperparameters have been optimized on the Cora dataset and are then reused for Citeseer. ... The second layer is used for classification," where the instant application corresponds to Veličković's classification) comprises: modeling each distinct feature of the plurality of distinct features as a respective node of the plurality of nodes in the feature graph (Veličković, p. 3, 2.1 Graph attentional layer: "The input to our layer is a set of node features ... where N is the number of nodes, and F is the number of features in each node" and p. 5, 2.2 Comparisons to related work: "The time complexity of a single GAT attention head computing F0 features may be expressed as O V F F ' + E F ' , where F is the number of input features, and V and E are the numbers of nodes and edges in the graph, respectively"), and an interaction between two distinct features of the plurality of distinct features as an edge between corresponding nodes of the plurality of nodes in the feature graph (Veličković, p. 3, 2.1 Graph attentional layer: "We then perform self-attention on the nodes—a shared attentional mechanism ... computes attention coefficients ... that indicate the importance of node j ’s features to node i ," where the instant interaction between two distinct features corresponds to Veličković's attention coefficients, as depicted p. 4, Figure 1). Claims 13 and 18 recite substantively the features of Claim 6 in computer readable media and system forms, respectively, and are rejected under the same rationale. Regarding Claim 7, the rejection of Claim 1 is incorporated. The Veličković/Lee1/Leskovec/Chen/Lee2/Guilak combination teaches: iteratively generating (Veličković, p. 4, 2.1 Graph attentional layer: "To stabilize the learning process of self-attention, we have found extending our mechanism to employ multi-head attention to be beneficial.... Specifically, K independent attention mechanisms execute the transformation of Equation 4, and then their features are concatenated") the interactive features of the higher order from the interactive features of the lower order (Veličković, p. 3, 2.1 Graph attentional layer: "to obtain sufficient expressive power to transform the input features into higher-level features, at least one learnable linear transformation is required.... We then perform self-attention on the nodes—a shared attentional mechanism ... that indicate the importance of node j ’s features to node i ," where the instant interactive corresponds to Veličković's importance to) to form the plurality of feature graphs of different orders (Veličković, p. 3, 2.1 Graph attentional layer: "We will start by describing a single graph attentional layer.... The input to our layer is a set of node features, h = { h 1 → , h 2 → , … , h N → } , h i → ∈ R F , where N is the number of nodes, and F is the number of features in each node," where Veličković's input set of node features corresponds to the instant feature graph of first-order features; and "The layer produces a new set of node features (of potentially different cardinality F ' ), h ' = { h 1 ' → , h 2 → , … , h N → } , h i → ∈ R F ' , as its output," where the instant feature graph of higher-order features corresponds to Veličković's output new set of node features, also depicted as h' on p. 4, Figure 1) comprises: crossing an interactive feature of the lower order with a feature in the feature graph of the first order (Veličković, p. 3, 2.1 Graph attentional layer: "We then perform self-attention on the nodes—a shared attentional mechanism a : R F ' × R F ' → R computes attention coefficients e i j = a W h → i , W h → j (1) that indicate the importance of node j ’s features to node i ," where the instant crossing an interactive feature corresponds to Veličković's shared attentional mechanism a ) to generate an interactive feature of the higher order through an edge search (Veličković, p. 3, 2.1 Graph attentional layer: "We inject the graph structure into the mechanism by performing masked attention—we only compute e i j for nodes j ∈ N i , where N i is some neighborhood of node i in the graph"). Claims 14 and 19 recite substantively the features of Claim 7 in computer readable media and system forms, respectively, and are rejected under the same rationale. Regarding Claim 8, the rejection of Claim 7 is incorporated. Lee1 further teaches: wherein the edge search comprises an edge search through a Markov Decision Process (Lee1, [0133]: "the computing system selects, for each node in a set of nodes in the graph representing the graph-structured dataset, a type of motif t and a step size k for determining the most relevant neighborhood. As described above, the computing system selects the type of motif and/or the step size using the attention mechanisms described above ... To select the step size of the selected type of motif for each respective node in the graph, the computer system applies a second trainable function (e.g., f'1) to the state matrix and the probability value associated with each type of motif in the multiple types of motifs for the respective node to determine a probability value associated with each respective step size in K step sizes, and then select, for the respective node, a step size corresponding to a highest probability value among the K step sizes as the step size k. The trainable functions for selecting the type of motif and the step size are trained in a reinforcement learning process," where Lee1's reinforcement learning process corresponds to the instant Markov Decision Process, and where Lee1's determining the neighborhood by way of an attention mechanism corresponds to the instant the edge search). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Veličković/Lee1/Leskovec/Chen/Lee2/Guilak combination regarding generating an interactive feature of the higher order through an edge search with those of Lee1 regarding the edge search comprising an edge search through a Markov Decision Process. The motivation to do so would be to ensure that determining node neighborhood by way of an attention mechanism properly rewards interactions of node neighborhoods across layers of the network (Lee1, [0105]: "While the cross-entropy loss function described in equation (21) is sufficient for training the MCN to classify nodes in input graphs, it may not be sufficient for training the attention mechanism that selects the best motif and step-size for each node at each layer. In some embodiments, the attention mechanism is trained using a second loss function based on reinforcement learning. ... The attention loss function L A is used to reward the actions of the classified nodes at the last layer; and then reward the actions of the neighbors of the classified nodes at the previous layer (if there is one) because their actions affect the outcome"). Regarding Claim 9, the rejection of Claim 1 is incorporated. The Veličković/Lee1/Leskovec/Chen/Lee2/Guilak combination teaches: wherein an interactive feature of an order of k comprises a crossing product of k distinct features, wherein k is an integer greater than or equal to one (Veličković, p. 3, 2.1 Graph attentional layer: "The input to our layer is a set of node features, h = { h 1 → , h 2 → , … , h N → } , h i → ∈ R F , where N is the number of nodes, and F is the number of features in each node"). Regarding Claim 10, the rejection of Claim 1 is incorporated. The Veličković/Lee1/Leskovec/Chen/Lee2/Guilak combination teaches: collecting data for the determined number of interactive features of the one or more orders (Veličković, p. 6, 3.1 Datasets: "We utilize three standard citation network benchmark datasets—Cora, Citeseer and Pubmed.... The Cora dataset contains 2708 nodes, 5429 edges, 7 classes and 1433 features per node"); and making new inferences for the application based on the collected data using the lightweight predictive model (Veličković, p. 8, 3.4 Results: "Table 2: Summary of results in terms of classification accuracies, for Cora, Citeseer and Pubmed"). Claims 15 and 20 recite substantively the features of Claim 10 in computer readable media and system forms, respectively, and are rejected under the same rationale. Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Veličković, et al., "Graph Attention Networks" (hereinafter "Veličković") in view of Lee, et al. (US 2020/0285944 A1, hereinafter "Lee1") in view of Leskovec, et al. "Kronecker graphs: an approach to modeling networks" (hereinafter "Leskovec") in view of Chen, Hanting, et al. "Learning Student Networks via Feature Embedding" (hereinafter "Chen") in view of Lee, et al. (US 2021/0294840 A1, hereinafter "Lee2"), in further view of Guilak (US 6,292,577 B1, hereinafter "Guilak") in view of the LINQS Cora dataset. Regarding Claim 4, the rejection of Claim 1 is incorporated. The Veličković/Lee1/Leskovec/Chen/Lee2/Guilak combination teaches: receiving the plurality of distinct features (Veličković, p. 6, 3.1 Datasets: "We utilize three standard citation network benchmark datasets—Cora, Citeseer and Pubmed.... the training algorithm has access to all of the nodes’ feature vectors.... The Cora dataset contains 2708 nodes, 5429 edges, 7 classes and 1433 features per node"). The Veličković/Lee1/Leskovec/Chen/Lee2/Guilak combination may not explicitly teach receiving the plurality of distinct features in a tabular format. However, the LINQS Cora dataset teaches: receiving the plurality of distinct features in a tabular format (Cora dataset). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Veličković/Lee1/Leskovec/Chen/Lee2/Guilak combination regarding using a graph attention network to perform classification with those of with those of LINQS Cora data regarding receiving the plurality of distinct features in a tabular format. The motivations to do so would be to provide performance statistics against an established benchmark (Veličković p. 1, Abstract: "Our GAT models have achieved or matched state-of-theart results across four established transductive and inductive graph benchmarks: the Cora" and p. 2, Introduction: "We validate the proposed approach on four challenging benchmarks: Cora, Citeseer and Pubmed citation networks as well as an inductive protein-protein interaction dataset, achieving or matching state-of-the-art results that highlight the potential of attention-based models when dealing with arbitrarily structured graphs"). Claims 5, 12, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Veličković, et al., "Graph Attention Networks" (hereinafter "Veličković") in view of Lee, et al. (US 2020/0285944 A1, hereinafter "Lee1") in view of Leskovec, et al. "Kronecker graphs: an approach to modeling networks" (hereinafter "Leskovec") in view of Chen, Hanting, et al. "Learning Student Networks via Feature Embedding" (hereinafter "Chen") in view of Lee, et al. (US 2021/0294840 A1, hereinafter "Lee2"), in further view of Guilak (US 6,292,577 B1, hereinafter "Guilak") in view of Bojchevski, et al., "Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking," hereinafter Bojchevski. Regarding Claim 5, the rejection of Claim 1 is incorporated. The Veličković/Lee1/Leskovec/Chen/Lee2/Guilak combination teaches: wherein associating the plurality of nodes in the feature graph of the first order with the plurality of distinct features (Veličković, p. 3, 2.1 Graph attentional layer: "We will start by describing a single graph attentional layer.... The input to our layer is a set of node features, h = { h 1 → , h 2 → , … , h N → } , h i → ∈ R F , where N is the number of nodes, and F is the number of features in each node") that are associated with the application comprises (Veličković, p. 5, 3.3 Experimental setup: "For the transductive learning tasks, we apply a two-layer GAT model. Its architectural hyperparameters have been optimized on the Cora dataset and are then reused for Citeseer. ... The second layer is used for classification," where the instant application corresponds to Veličković's classification): ... mapping the feature representation into feature ... vectors, the feature ... vectors being treated as the plurality of nodes in the feature graph of the first order (Veličković, p. 3, 2.1 Graph attentional layer: "The input to our layer is a set of node features, h = { h 1 → , h 2 → , … , h N → } , h i → ∈ R F , where N is the number of nodes, and F is the number of features in each node," where Veličković's input features are transformed into higher-level features, as in p. 3, 2.1 Graph attentional layer: "to transform the input features into higher-level features, at least one learnable linear transformation is required"). The Veličković/Lee1/Leskovec/Chen/Lee2/Guilak combination does not explicitly teach converting the plurality of distinct features into a feature representation using an one-hot encoding and mapping the feature representation into feature embedding vectors. However, Bojchevski teaches: converting the plurality of distinct features into a feature representation using an one-hot encoding (Bojchevski, 3.4, Discussion: "Even though attributed graphs are often found in the real-world, sometimes it is desirable to analyze plain graphs. As already discussed, our method easily handles plain graphs, when the attributes are not available, by using one-hot encoding of the nodes instead"); and mapping the feature representation into feature embedding vectors (Bojchevski, p. 3, 3 Deep Gaussian embedding: "Let G = (A,X) be a directed attributed graph, where ... X ... collects the attribute information for each node where x i is a D dimensional attribute vector of the ith node.... We aim to find a lower-dimensional Gaussian distribution embedding ... such that nodes similar w.r.t. attributes and network structure are also similar in the embedding space"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Veličković/Lee1/Leskovec/Chen/Lee2/Guilak combination regarding associating nodes distinct features with first-order nodes of the feature graph, with those of Bojchevski regarding mapping features to embedding vectors and using a one-hot encoding. The motivations to do so would be to take advantage of proved learning techniques (Bojchevski p. 1, Abstract: "By operating in the embedding space, one can employ proved learning techniques and bypass the difficulty of incorporating the complex node interactions"). Claims 12 and 17 recite substantively the features of Claim 5 in computer readable media and system forms, respectively, and are rejected under the same rationale. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Lai, et al., "Policy-GNN: Aggregation Optimization for Graph Neural Networks," teach a method of training a policy network to adaptively aggregate node features of graph data. Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT N DAY whose telephone number is (703)756-1519. The examiner can normally be reached M-F 9-5. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /R.N.D./Examiner, Art Unit 2122 /KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122
Read full office action

Prosecution Timeline

Show 6 earlier events
Jun 17, 2025
Non-Final Rejection mailed — §101, §103
Jul 24, 2025
Response Filed
Nov 05, 2025
Final Rejection mailed — §101, §103
Nov 26, 2025
Request for Continued Examination
Dec 02, 2025
Response after Non-Final Action
Jan 13, 2026
Non-Final Rejection mailed — §101, §103
Jan 29, 2026
Response Filed
May 26, 2026
Final Rejection mailed — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12632783
FEDERATED CONTINUAL LEARNING
3y 10m to grant Granted May 19, 2026
Patent 12406181
METHOD, DEVICE, AND COMPUTER PROGRAM PRODUCT FOR UPDATING MODEL
4y 5m to grant Granted Sep 02, 2025
Patent 12229685
MODEL SUITABILITY COEFFICIENTS BASED ON GENERATIVE ADVERSARIAL NETWORKS AND ACTIVATION MAPS
4y 0m to grant Granted Feb 18, 2025
Study what changed to get past this examiner. Based on 3 most recent grants.

Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing

Prosecution Projections

7-8
Expected OA Rounds
21%
Grant Probability
41%
With Interview (+20.0%)
4y 1m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 24 resolved cases by this examiner. Grant probability derived from career allowance rate.

Sign in with your work email

Enter your email to receive a magic link. No password needed.

Personal email addresses (Gmail, Yahoo, etc.) are not accepted.

Free tier: 3 strategy analyses per month