Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/06/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 for reciting an abstract idea without significantly more.
Regarding claim 1
Claim 1 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a process.
Claim 1 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
“performing multiple rounds of updating on a graph neural network based on a relational graph, a round of the multiple rounds including:” – this limitation recites an iterative training scheme for a mathematical model (GNN), i.e., an algorithmic workflow. The “multiple rounds of updating” is a way of organizing computation over iterations, which is a mental process and/or mathematical concept under MPEP § 2106.04(a)(2)(III) and MPEP § 2106.04(a)(2)(I).
“processing the relational graph by using a current graph neural network, to obtain multiple classification prediction vectors corresponding to multiple nodes in the relational graph;” – obtaining prediction vectors is applying a mathematical model to input data to produce numeric outputs (classification probabilities/vectors). This is a mathematical concept (evaluating a function/model on data) and also a mental process (evaluation/analysis that can be performed conceptually). See MPEP § 2106.04(a)(2)(I); § 2106.04(a)(2)(III).
“allocating a corresponding pseudo classification label to a first quantity of unlabeled nodes in the multiple nodes based on the multiple classification prediction vectors;” – this limitation is selecting/assigning labels based on numeric outputs (e.g., choosing a category based on prediction probabilities). This is merely an evaluation plus a decision rule over data (a mental process), implemented as a classification rule. See MPEP § 2106.04(a)(2)(III).
“determining, for each of the first quantity of unlabeled nodes, an information gain generated by training the current graph neural network by using the unlabeled node;” – the recited “determining … information gain” is explicitly an information-theoretic/mathematical measurement (information gain/entropy) applied to the training process. This is a mathematical concept under MPEP § 2106.04(a)(2)(I).
“and updating a model parameter in the current graph neural network according to a classification prediction vector and a real classification label that are corresponding to each labeled node in the multiple nodes, and a classification prediction vector, a pseudo classification label, and an information gain that are corresponding to each unlabeled node.” – updating model parameters “according to” prediction vectors/labels/information gain is mathematical optimization (adjusting parameters via a loss/gradient/backprop-type update), which is a mathematical concept under MPEP § 2106.04(a)(2)(I).
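For context on the mathematical character of the limitations addressed above, the recited round reduces to ordinary numerical computation. The following is a minimal NumPy sketch over toy data; the one-layer model and all names and values are the examiner's illustrative assumptions, not the applicant's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict(W, A, X):
    # Evaluate a toy one-layer "GNN": aggregate node features, then classify.
    return softmax(A @ X @ W)

n, d, c = 6, 4, 3
A = np.eye(n)                           # toy normalized adjacency (self-loops only)
X = rng.normal(size=(n, d))             # node features from the relational graph
W = rng.normal(size=(d, c))             # the model parameter to be updated

P = predict(W, A, X)                    # per-node classification prediction vectors
pseudo = P.argmax(axis=1)               # pseudo classification labels
entropy = -(P * np.log(P)).sum(axis=1)  # per-node entropy, an info-gain ingredient
```

A full round would additionally apply a gradient step to W using the labeled and pseudo-labeled loss terms.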
Claim 1 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 1 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 2
Claim 2 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a process.
Claim 2 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
“wherein the multiple nodes comprise a second quantity of unlabeled nodes,” – this limitation is merely data characterization (identifying a set of unlabeled nodes). It is part of an algorithmic data model and can be performed mentally/conceptually as classifying items in a set. See MPEP § 2106.04(a)(2)(III).
“and classification prediction vectors comprise multiple prediction probabilities corresponding to multiple categories;” – this limitation is a mathematical representation of data (probabilities per category), i.e., numeric values describing likelihoods. See MPEP § 2106.04(a)(2)(I).
“wherein the allocating the corresponding pseudo classification label to the first quantity of unlabeled nodes in the multiple nodes based on the multiple classification prediction vectors includes: for each node in the second quantity of unlabeled nodes, in response to that a maximum prediction probability included in a classification prediction vector corresponding to the node reaches a threshold, classifying the node into the first quantity of unlabeled nodes, and determining a category corresponding to the maximum prediction probability as a pseudo classification label of the node.” – this is a decision rule based on comparing a value to a threshold and selecting the argmax category, i.e., an evaluation plus a judgment/selection that can be performed mentally and is also a mathematical operation over probabilities. See MPEP § 2106.04(a)(2)(III); § 2106.04(a)(2)(I).
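For illustration only, the recited threshold-and-argmax allocation amounts to the following computation (toy probabilities; the function name and threshold value are the examiner's hypothetical choices):

```python
import numpy as np

def allocate_pseudo_labels(P, threshold):
    """Threshold-and-argmax rule: select nodes whose maximum predicted
    probability reaches the threshold, and label each selected node with
    the category of that maximum probability."""
    max_p = P.max(axis=1)
    selected = np.where(max_p >= threshold)[0]   # the "first quantity" of nodes
    labels = P[selected].argmax(axis=1)          # pseudo classification labels
    return selected, labels

P = np.array([[0.9, 0.05, 0.05],
              [0.4, 0.35, 0.25],
              [0.1, 0.8, 0.1]])
selected, labels = allocate_pseudo_labels(P, threshold=0.7)
# selected -> [0, 2]; labels -> [0, 1]
```

Nodes whose maximum probability falls below the threshold simply remain unlabeled.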
Claim 2 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 2 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 3
Claim 3 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a process.
Claim 3 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
“wherein the determining, for each of the first quantity of unlabeled nodes, the information gain generated by training the current graph neural network by using the unlabeled node includes: for a first unlabeled node of the first quantity of unlabeled nodes, training the current graph neural network by using a first classification prediction vector and a pseudo classification label that are corresponding to the first unlabeled node, and determining a second classification prediction vector of the first unlabeled node based on a trained first graph neural network;” – this recites mathematical model training and evaluation, which are mathematical concepts under MPEP § 2106.04(a)(2)(I).
“determining first information entropy according to the first classification prediction vector; determining second information entropy according to the second classification prediction vector; and obtaining the information gain based on a difference between the second information entropy and the first information entropy.” – computing information entropies from prediction vectors and taking their difference is a mathematical calculation. See MPEP § 2106.04(a)(2)(I).
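For illustration only, the recited entropy difference is a direct mathematical calculation (toy prediction vectors; all values are the examiner's hypothetical):

```python
import numpy as np

def entropy(p, eps=1e-12):
    # Shannon entropy of one classification prediction vector.
    return float(-(p * np.log(p + eps)).sum())

# First prediction vector (before training on the node) and second prediction
# vector (after training with the node's pseudo label); values are toy data.
p_first = np.array([0.4, 0.35, 0.25])
p_second = np.array([0.7, 0.2, 0.1])

first_entropy = entropy(p_first)
second_entropy = entropy(p_second)
# Per the claim, the information gain is based on the difference between the
# second information entropy and the first information entropy.
info_gain = second_entropy - first_entropy
```

Here the second vector is more peaked, so the entropy (uncertainty) decreases after the additional training step.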
Claim 3 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 3 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 4
Claim 4 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a process.
Claim 4 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
“performing, at an aggregation layer in the multiple aggregation layers, random zeroing processing on vector elements in multiple aggregation vectors for the multiple nodes that are output by an upper aggregation layer, and determining, based on the multiple aggregation vectors after the random zeroing processing, multiple aggregation vectors that are output by the aggregation layer for the multiple nodes;” – this limitation recites operations on vectors within layers of a neural network (dropout-like masking / stochastic zeroing) and computing new vectors based thereon. These are mathematical operations (matrix/vector manipulation with a random mask) and/or a mathematical approximation technique used in model inference/training. See MPEP § 2106.04(a)(2)(I).
“and processing, at the output layer, an aggregation vector output by a last aggregation layer for the first unlabeled node, to obtain the second classification prediction vector.” – this is the application of a mathematical layer function to an input vector to produce prediction vector (again, evaluating a mathematical model). See MPEP § 2106.04(a)(2)(I).
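For illustration only, the recited random zeroing of vector elements is a conventional dropout-style masking operation (a minimal sketch; the inverted-dropout rescaling is an assumed, standard convention):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_zeroing(H, p):
    """Dropout-style stochastic masking: zero each vector element with
    probability p, then rescale the survivors (inverted dropout) so the
    expected value of each element is unchanged."""
    mask = rng.random(H.shape) >= p
    return H * mask / (1.0 - p)

H = np.ones((4, 8))            # aggregation vectors output by the upper layer
H_masked = random_zeroing(H, p=0.5)
# Each element is now either 0 or 2; the expectation of each element is still 1.
```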
Claim 4 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 4 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 5
Claim 5 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a process.
Claim 5 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
“performing, at an aggregation layer in the multiple aggregation layers, random zeroing processing on a matrix element in an adjacency matrix corresponding to the relational graph, and determining, based on the adjacency matrix after the random zeroing processing and multiple aggregation vectors that are output by an upper aggregation layer for the multiple nodes, multiple aggregation vectors for the multiple nodes that are output by the aggregation layer;” – this limitation recites manipulating an adjacency matrix (randomly zeroing an element) and computing new vectors based on that matrix and prior vectors. These are mathematical operations on abstract data structures (matrices/vectors), i.e., a stochastic masking/perturbation of the graph representation followed by matrix/vector-based aggregation. See MPEP § 2106.04(a)(2)(I).
“and processing, at the output layer, an aggregation vector output by a last aggregation layer for the first unlabeled node, to obtain the second classification prediction vector.” – applying an output layer function to an aggregation vector to obtain a prediction vector is evaluation of a mathematical model (vector transformation). See MPEP § 2106.04(a)(2)(I).
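For illustration only, randomly zeroing adjacency-matrix elements and then aggregating is a DropEdge-style matrix computation (toy graph; the mean-aggregation rule is the examiner's illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

def drop_edges(A, p):
    """Randomly zero adjacency-matrix entries (DropEdge-style masking).
    Note: masking entries independently may break symmetry; that is
    acceptable for this illustration."""
    mask = rng.random(A.shape) >= p
    return A * mask

def aggregate(A, H):
    # One aggregation step: each node averages over retained neighbors + itself.
    A_hat = A + np.eye(A.shape[0])
    deg = A_hat.sum(axis=1, keepdims=True)
    return (A_hat @ H) / deg

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
H = np.eye(3)                    # toy node features
H_next = aggregate(drop_edges(A, p=0.5), H)
```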
Claim 5 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 5 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 6
Claim 6 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a process.
Claim 6 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
“performing for multiple times an operation of determining the second classification prediction vector to correspondingly obtain multiple second classification prediction vectors;” – this is repeating a model-evaluation operation multiple times to generate multiple outputs. Repetition/iteration of a mathematical model evaluation (especially in the context of the random zeroing/sampling in claim 4) is a mathematical sampling procedure. See MPEP § 2106.04(a)(2)(I).
“determining an average value of multiple pieces of information entropy respectively corresponding to the multiple second classification prediction vectors as the second information entropy.” – computing entropies and then taking an average is a straightforward mathematical calculation. See MPEP § 2106.04(a)(2)(I).
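For illustration only, repeating a stochastic model evaluation multiple times and averaging the per-pass entropies is a simple sampling calculation (the stochastic forward pass below is a stand-in for the claimed network, and all values are the examiner's hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

def entropy(p, eps=1e-12):
    return -(p * np.log(p + eps)).sum()

def stochastic_predict(logits, p_drop=0.3):
    # Stand-in for one stochastic forward pass (random zeroing of the logits).
    mask = rng.random(logits.shape) >= p_drop
    z = logits * mask
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])
T = 16
# Repeat the evaluation T times and average the per-pass entropies, as in the
# claim's "average value of multiple pieces of information entropy".
second_entropy = np.mean([entropy(stochastic_predict(logits)) for _ in range(T)])
```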
Claim 6 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 6 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 7
Claim 7 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a process.
Claim 7 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
“determining a first loss term according to the classification prediction vector and the real classification label that are corresponding to each labeled node;” – computing a “loss term” from prediction vectors and labels is a mathematical calculation (objective function evaluation). See MPEP § 2106.04(a)(2)(I).
“determining a second loss term for each unlabeled node according to the classification prediction vector and the pseudo classification label that are corresponding to each unlabeled node, and weighting the second loss term by using the information gain corresponding to the unlabeled node;” – computing a second loss term and applying a weight (information gain) is a mathematical operation (weighted objective function / weighted summation). See MPEP § 2106.04(a)(2)(I).
“and updating the model parameter according to the first loss term and the weighted second loss term.” – updating model parameters based on loss terms is mathematical optimization (e.g., gradient-based update) and is a mathematical concept under MPEP § 2106.04(a)(2)(I).
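For illustration only, the recited weighted two-term objective is a standard loss computation (toy vectors, labels, and gains; cross-entropy is an assumed loss form, since the claim does not fix one):

```python
import numpy as np

def cross_entropy(p, label, eps=1e-12):
    return -np.log(p[label] + eps)

# Labeled nodes: prediction vectors and real classification labels.
P_lab = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
y_lab = [0, 1]
first_loss = sum(cross_entropy(p, y) for p, y in zip(P_lab, y_lab))

# Unlabeled nodes: prediction vectors, pseudo labels, and information gains
# used as per-node weights on the second loss term.
P_unl = np.array([[0.6, 0.3, 0.1], [0.4, 0.5, 0.1]])
y_pseudo = [0, 1]
gains = np.array([0.9, 0.2])
second_loss = sum(g * cross_entropy(p, y)
                  for g, p, y in zip(gains, P_unl, y_pseudo))

total_loss = first_loss + second_loss   # parameters are then updated w.r.t. this
```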
Claim 7 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 7 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 8
Claim 8 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a process.
Claim 8 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
“normalizing the information gain of each unlabeled node by using a first quantity of information gains corresponding to the first quantity of unlabeled nodes, to obtain a corresponding weighting coefficient;” – normalization and computing a weighting coefficient from a set of values is a mathematical calculation (scaling/normalizing a set of numbers). See MPEP § 2106.04(a)(2)(I).
“and performing weighting processing by using the weighting coefficient.” – applying a computed weight to a quantity (here, the second loss term per claim 7) is a mathematical operation (multiplication/scaling or weighted combination). See MPEP § 2106.04(a)(2)(I).
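For illustration only, normalizing a set of information gains into weighting coefficients is elementary arithmetic (sum-to-one normalization is one plausible choice; the claim does not fix the formula):

```python
import numpy as np

def normalize_gains(gains):
    """Normalize each node's information gain against the full set of gains
    (the "first quantity of information gains") to obtain a per-node
    weighting coefficient; sum-to-one normalization is assumed here."""
    g = np.asarray(gains, dtype=float)
    return g / g.sum()

gains = [2.0, 1.0, 1.0]
weights = normalize_gains(gains)
# weights -> [0.5, 0.25, 0.25]; each weight then scales that node's loss term
```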
Claim 8 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 8 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claims 9-16 (apparatus / system claims)
Each of claims 9-16 is the apparatus/system analog of an already-analyzed method claim (9 → 1; 10 → 2; 11 → 3; 12 → 4; 13 → 5; 14 → 6; 15 → 7; 16 → 8). In each of claims 9-16, the functional language associated with “at least one memory storing executable instructions” and “at least one processor configured to execute the executable instructions” merely requires generic computing components (processor and memory) configured to perform the same abstract operations already identified in method claims 1-8 (e.g., processing a relational graph with a graph neural network to obtain per-node prediction vectors; allocating pseudo labels to unlabeled nodes based on prediction vectors (including thresholding/argmax); determining an information gain/information measure (including entropy-based computation, stochastic masking such as random zeroing of aggregation vectors or adjacency matrix elements, and repeated sampling/averaging where applicable); and updating model parameters using labeled loss and weighted/normalized unlabeled loss terms). The apparatus/system form does not materially change the § 101 analysis. Under Step 2A (Prong One / Prong Two), recasting the same abstract mathematical concepts and mental processes as operations performed by a generic processor executing instructions stored in generic memory does not add a different judicial exception and does not integrate the abstract idea into a practical application; the processor/memory are used in their ordinary, result-oriented fashion to execute the same abstract training computations.
Under Step 2B, the recited processor and memory are generic computing hardware operating in a conventional manner in a generic computing environment (well-understood, routine, and conventional (WURC)) and thus do not provide an inventive concept. Because each of claims 9-16 is the system/apparatus analog of a corresponding method claim and merely implements the same abstract mathematical-concept limitations on generic computing units without adding a practical application or inventive concept, claims 9-16 are rejected under 35 U.S.C. § 101 for the same reasons discussed above for claims 1-8, respectively.
Regarding claims 17-20 (computer-readable medium / computer program product claims)
Each of claims 17-20 is the non-transitory computer-readable medium (CRM) analog of an already-analyzed method claim (17 → 1; 18 → 2; 19 → 3; 20 → 4). For each of claims 17-20, the functional language associated with “computer-executable instructions” and “when executed by a processor, configure the processor to perform actions comprising …” merely causes a generic processor, when executing instructions stored on a generic non-transitory medium, to perform the same abstract operations already identified in method claims 1-4 (e.g., processing a relational graph with a graph neural network to obtain prediction vectors; allocating pseudo labels to a subset of unlabeled nodes based on prediction vectors (including thresholding/argmax); determining information gain/information measures, including entropy-based calculations and retraining/re-prediction where applicable; and generating second prediction vectors using stochastic masking such as random zeroing at aggregation/output layers, as applicable). This CRM form does not materially change the eligibility analysis. Under Step 2A (Prong One / Prong Two), encoding the abstract method as instructions on a non-transitory storage medium does not add a different judicial exception and does not integrate the abstract idea into a practical application. Under Step 2B, the instructions simply implement the same abstract mathematical computations on generic processors operating in a conventional networked or stand-alone computing environment (well-understood, routine, and conventional (WURC)) and thus do not provide an inventive concept.
Because each of claims 17-20 is the computer-program-product analog of a corresponding method claim and merely implements the same abstract mathematical concept limitations on generic computer-readable media and processors without adding a practical application or inventive concept, claims 17-20 are rejected under 35 U.S.C. § 101 for the same reasons discussed above for claims 1-4, respectively.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yayong Li et al. (Informative Pseudo-Labeling for Graph Neural Networks with Few Labels) in view of Yarin Gal et al. (Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning) and further in view of Huiyuan Chen et al. (US11,966,832B2).
Regarding claim 1, Li in view of Gal teaches a method for training a graph neural network, comprising:
“performing multiple rounds of updating on a graph neural network based on a relational graph, a round of the multiple rounds including:” – Li teaches this limitation. Li discloses iterative self-training:
“Our proposed InfoGNN framework is given by Algorithm 1, which consists of one pre-training phase and one formal training phase.” (Li, p. 11, § 4.6 Model Training and Computational Complexity)
“… for t = 0; t < epoches; t = t + 1 do” (Li, p. 12, § Algorithm 1: Training InfoGNN with few labels)
The explicit training loop “for t … epoches” and the two-phase regime (“pre-training phase” and “formal training phase”) establish multiple rounds of updating on the graph-based model (GNN), meeting the claimed iterative updating requirement.
“processing the relational graph by using a current graph neural network, to obtain multiple classification prediction vectors corresponding to multiple nodes in the relational graph;” – Li teaches this limitation. Li defines the graph G={V, E, X} and uses a GNN encoder (GCN) that outputs class prediction probabilities per node, including a confidence score derived from the maximum predicted probability:
“Let G = {V, E, X} represents an undirected graph, where V = {v_1, v_2, …, v_n} denotes a set of n nodes, and E denotes the set of edges connecting nodes.” (Li, p. 5, § 3 Problem Statement)
“The generated node embeddings can then be used as input to any differentiable prediction layer, for example, a softmax layer for node classification.” (Li, p. 4, § 2.1 Graph Learning with Few Labels)
“We then produce the pseudo labels for U_p utilizing the GNN encoder f_θ(∙): ŷ_v = argmax_j f_θ(x_v)_j; v ∈ U_p” (Li, p. 10, § 4.3 Candidate Selection for Pseudo Labelling)
A POSITA would understand f_θ(x_v) in Eq. (11) as the per-node classification prediction vector (e.g., softmax outputs over classes). Processing the graph G = {V, E, X} with the current GNN encoder f_θ(∙) yields such per-node prediction vectors for nodes v ∈ V, satisfying this limitation.
“allocating a corresponding pseudo classification label to a first quantity of unlabeled nodes in the multiple nodes based on the multiple classification prediction vectors;” – Li teaches this limitation. Li teaches:
“U_p = {v ∈ U | (s_r^v + s_c^v)/2 > k, s.t. s_c^v > k}” (Li, p. 9, § 4.3 Candidate Selection for Pseudo Labelling)
“We then produce the pseudo labels for U_p utilizing the GNN encoder f_θ(∙): ŷ_v = argmax_j f_θ(x_v)_j; v ∈ U_p” (Li, p. 10, § 4.3 Candidate Selection for Pseudo Labelling)
Equation (10) defines U_p as a subset (first quantity) of the unlabeled nodes U, selected using model-derived scoring (including the confidence s_c^v). Equation (11) then allocates pseudo labels ŷ_v based on the GNN output f_θ(x_v) (the prediction vector). This directly meets the limitation.
“determining, for each of the first quantity of unlabeled nodes, an information gain generated by training the current graph neural network by using the unlabeled node;” – Li teaches this limitation in part. Li teaches determining, per unlabeled node, an information-related quantity (informativeness/representativeness) used in pseudo-label selection and retraining:
“assess node informativeness based on MI estimation maximization.” (Li, p. 18, § 6 Conclusion)
“more informative unlabeled nodes are selected for pseudo labeling.” (Li, p. 18, § 6 Conclusion)
And Li’s algorithm explicitly generates informativeness scores for nodes and then retrains the GNN using pseudo labels:
“generate prediction probabilities and informativeness score for each node” (Li, p. 12, § 4.6 Model Training and Computational Complexity)
“both given labels and pseudo labels are used to re-train the GNN” (Li, p. 12, § 4.6 Model Training and Computational Complexity)
Gal teaches a well-known technique (MC dropout) for quantifying predictive uncertainty/information from the network via repeated stochastic forward passes, which a POSITA would have used to implement Li’s per-node information metric (information gain/informativeness) in the context of Li’s pseudo-labeled retraining pipeline:
“equivalent to performing T stochastic forward passes through the network and averaging the results.” (Gal, p. 4, § 4. Obtaining Model Uncertainty)
A POSITA would have understood Li’s “informativeness score for each node” (computed for nodes and used in pseudo-label selection) as an information-based metric associated with incorporating those unlabeled nodes into training, because Li expressly (i) “generate[s] … informativeness score for each node” and (ii) uses the resulting pseudo labels such that “both given labels and pseudo labels are used to re-train the GNN”, and a POSITA would have been motivated to implement that per-node information metric using Gal’s MC-dropout technique to quantify predictive uncertainty/information attributable to training that includes the unlabeled node, with a reasonable expectation of success.
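For context, the MC-dropout computation quoted from Gal can be sketched as follows (toy single-layer model; all names and values are illustrative, not Gal's implementation):

```python
import numpy as np

rng = np.random.default_rng(3)

def forward_with_dropout(x, W, p=0.5):
    # One stochastic forward pass: a dropout mask applied to the input features,
    # followed by a softmax over the resulting logits.
    mask = rng.random(x.shape) >= p
    z = W @ (x * mask / (1.0 - p))
    e = np.exp(z - z.max())
    return e / e.sum()

x = np.array([1.0, -0.5, 2.0])
W = rng.normal(size=(3, 3))
T = 32
# "performing T stochastic forward passes through the network and averaging
# the results" (MC dropout, per Gal)
mean_pred = np.mean([forward_with_dropout(x, W) for _ in range(T)], axis=0)
```

The spread of the per-pass predictions (not shown) is what serves as the uncertainty/information estimate.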
“and updating a model parameter in the current graph neural network according to a classification prediction vector and a real classification label that are corresponding to each labeled node in the multiple nodes, and a classification prediction vector, a pseudo classification label, and an information gain that are corresponding to each unlabeled node.” – Li teaches this limitation. Li teaches:
“The pre-training phase (Step 2-4) is used to train a parameterized GNN with given labels. Accordingly, network parameters are updated by: l_pre = l_L + α·l_I” (Li, p. 11, § 4.6 Model Training and Computational Complexity)
“At the beginning of the formal training phase, the pre-trained GNN is applied to generate prediction probabilities and informativeness score for each node, which are then used to produce pseudo labels (Step 6-8). Finally, both given labels and pseudo labels are used to re-train the GNN by minimizing the following loss function (Step 9): l = l_L + l_T + α·l_I + β·l_KL” (Li, p. 12, § 4.6 Model Training and Computational Complexity)
“Compared with the SCE loss, it actually weighs each gradient by an additional f_θ(x_i)_j^q, which reduces the gradient descending on those unreliable pseudo labels with lower prediction probabilities.” (Li, p. 10, § 4.4 Mitigating Noisy Pseudo Labels)
The labeled-node update is supported by the pre-training disclosure explicitly stating that “network parameters are updated” during training with given labels (Eq. 18). The unlabeled-node update is supported by the formal training disclosure that the model generates prediction probabilities and informativeness scores, produces pseudo labels, and “re-train[s] the GNN” using “both given labels and pseudo labels” with a combined loss (Eq. 19).
Li already uses prediction confidence and an MI-based (mutual information) informativeness measure to select and incorporate unlabeled nodes into pseudo-label retraining. Gal teaches a well-known technique (MC dropout) to estimate predictive uncertainty/information via “T stochastic forward passes … and averaging”. A POSITA would have been motivated to use Gal’s MC-dropout uncertainty estimation within the InfoGNN pseudo-labeling/training pipeline to quantify information/uncertainty associated with unlabeled nodes and thereby determine an information-based value (information gain/informativeness) for use in the training process, with a reasonable expectation of success, because both references address measuring predictive uncertainty/information and improving learning under limited labels.
Regarding claim 2, Li teaches the method according to claim 1, wherein
“the multiple nodes comprise a second quantity of unlabeled nodes,” – Li teaches this limitation. Li expressly defines U as the set of unlabeled nodes and then selects a subset U_p ⊆ U:
“U_p = {v ∈ U | (s_r^v + s_c^v)/2 > k, s.t. s_c^v > k}” (Li, p. 9, § 4.3 Candidate Selection for Pseudo Labelling)
“and classification prediction vectors comprise multiple prediction probabilities corresponding to multiple categories;” – Li teaches this limitation. Li discloses:
“The generated node embeddings can then be used as input to any differentiable prediction layer, for example, a softmax layer for node classification.” (Li, p. 4, § 2 Related Works)
“Finally, according to the class prediction probabilities, we can obtain the confidence score for each node v: s_c^v = max_j f_θ(x_v)_j” (Li, p. 7, § 4.2 The GNN Encoder)
A “softmax layer for node classification” produces, for each node, a vector of class prediction probabilities across multiple categories, and Li expressly refers to “class prediction probabilities” and takes the maximum over j of f_θ(x_v)_j, which necessarily implies multiple category probabilities.
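For illustration only, a softmax output and Li's confidence score s_c^v relate as follows (toy logits; the values are the examiner's hypothetical):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))  # one node's class prediction probabilities
confidence = p.max()                    # Li's s_c^v = max_j f_theta(x_v)_j
label = p.argmax()                      # category of the maximum probability
```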
“wherein the allocating the corresponding pseudo classification label to the first quantity of unlabeled nodes in the multiple nodes based on the multiple classification prediction vectors includes: for each node in the second quantity of unlabeled nodes, in response to that a maximum prediction probability included in a classification prediction vector corresponding to the node reaches a threshold, classifying the node into the first quantity of unlabeled nodes,” – Li teaches this limitation. Li explicitly defines the “maximum prediction probability” and applies a threshold as a condition for selecting nodes:
“s_c^v = max_j f_θ(x_v)_j” (Li, p. 7, § 4.2 The GNN Encoder)
“U_p = {v ∈ U | (s_r^v + s_c^v)/2 > k, s.t. s_c^v > k}” (Li, p. 9, § 4.3 Candidate Selection for Pseudo Labelling)
“Hyperparameter k is the threshold for which controls how many unlabeled nodes are selected for pseudo labeling.” (Li, p. 18, § 5.6 Sensitivity Analysis)
“and determining a category corresponding to the maximum prediction probability as a pseudo classification label of the node.” – Li teaches this limitation. Li expressly teaches determining the pseudo label as the argmax category of the prediction probability vector:
“We then produce the pseudo labels for U_p utilizing the GNN encoder f_θ(∙): ŷ_v = argmax_j f_θ(x_v)_j; v ∈ U_p” (Li, p. 10, § 4.3 Candidate Selection for Pseudo Labelling)
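For illustration only, the thresholded selection and argmax pseudo-labeling described above can be sketched as follows (this sketch is not code from Li; all names and values are hypothetical):

```python
def select_and_pseudo_label(nodes, predict, k):
    # For each unlabeled node, take the maximum prediction probability; if it
    # exceeds the threshold k, keep the node and assign the argmax category
    # as its pseudo classification label.
    pseudo = {}
    for v in nodes:
        probs = predict(v)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] > k:
            pseudo[v] = best
    return pseudo

# Hypothetical prediction vectors for three unlabeled nodes.
preds = {"a": [0.9, 0.05, 0.05], "b": [0.4, 0.35, 0.25], "c": [0.1, 0.1, 0.8]}
labels = select_and_pseudo_label(preds, lambda v: preds[v], k=0.7)
```

In this sketch, node "b" fails the threshold and receives no pseudo label, while "a" and "c" are labeled with their argmax categories.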
Claim 2 depends from claim 1; therefore, the same motivation to combine applied to claim 1 is also applied to claim 2.
Regarding claim 3, Li in view of Gal teaches the method according to claim 1, wherein the determining, for each of the first quantity of unlabeled nodes, the information gain generated by training the current graph neural network by using the unlabeled node includes:
“for a first unlabeled node of the first quantity of unlabeled nodes, training the current graph neural network by using a first classification prediction vector and a pseudo classification label that are corresponding to the first unlabeled node,” – Li teaches this limitation. Li expressly defines training on pseudo labels for unlabeled nodes as part of the training objective:
“U denotes the set of unlabeled nodes.” (Li, p. 6, § 3 Problem Statement)
“Y(∙) for generating reliable pseudo labels” (Li, p. 6, §§ 3 Problem Statement - Problem 1)
“We then produce the pseudo labels for U_p utilizing the GNN encoder f_θ(∙): ŷ_v = argmax_j f_θ(x_v)_j; v ∈ U_p” (Li, p. 10, § 4.3 Candidate Selection for Pseudo Labelling)
And Li’s training objective explicitly includes a loss term over unlabeled nodes using pseudo labels:
“min_θ J = Σ_{x_i ∈ L} l_L(y_i, f_θ(x_i)) + Σ_{x_i ∈ U} l_U(Y(x_i), f_θ(x_i)) · J(x_i)” (Li, p. 6, §§ 3 Problem Statement - Problem 1)
“and determining a second classification prediction vector of the first unlabeled node based on a trained first graph neural network;” – Li teaches this limitation. Li teaches a pre-train phase, then a formal training (re-training) phase where parameters are updated, and the model generates prediction probabilities:
“The pre-training phase (Step 2-4) is used to train a parameterized GNN with given labels. Accordingly, network parameters are updated by:” (Li, p. 11, § 4.6 Model Training and Computational Complexity)
“At the beginning of the formal training phase, the pre-trained GNN is applied to generate prediction probabilities …” (Li, p. 12, § 4.6 Model Training and Computational Complexity)
“Finally, both given labels and pseudo labels are used to re-train the GNN by minimizing the following loss function … (19)” (Li, p. 12, § 4.6 Model Training and Computational Complexity)
“” – Li teaches this limitation in part. Li explicitly discloses the model output used for pseudo-labeling as f_θ(x_v) and that the system generates “prediction probabilities”, which is exactly a classification prediction vector (a per-node vector over classes):
“At the beginning of the formal training phase, the pre-trained GNN is applied to generate prediction probabilities …” (Li, p. 12, § 4.6 Model Training and Computational Complexity)
“ŷ_v = argmax_j f_θ(x_v)_j;” (Li, p. 10, § 4.3 Candidate Selection for Pseudo Labelling)
“… a softmax layer for node classification.” (Li, p. 4, § 2.1 Graph Learning with Few Labels)
“” - Li teaches this limitation in part. Li discloses that after pseudo labels are produced, the GNN is re-trained, and the model generates prediction probabilities:
“Finally, both given labels and pseudo labels are used to re-train the GNN …” (Li, p. 12, § 4.6 Model Training and Computational Complexity)
“the pre-trained GNN is applied to generate prediction probabilities … ” (Li, p. 12, § 4.6 Model Training and Computational Complexity)
“” – Li teaches this limitation in part. Li discloses two model states enabling two entropy values:
“… pre-trained GNN” (Li, p. 12, § 4.6 Model Training and Computational Complexity)
“… re-train the GNN” (Li, p. 12, § 4.6 Model Training and Computational Complexity)
These disclosures establish before/after training states. Once first and second information entropy values are determined (as taught by applying Gal’s entropy measure to Li’s prediction probabilities before and after the retraining step), a POSITA would have found it obvious to obtain an “information gain” by computing a difference between the two entropy values (i.e., change in predictive uncertainty/information attributable to training/re-training), because the references teach entropy as the quantitative measure of predictive uncertainty/information and Li provides distinct model states (pre-trained vs retrained) that yield distinct prediction probability vectors for which entropy may be computed.
Li does not teach these limitations:
“determining first information entropy according to the …”
“determining second information entropy according to the …”
“and obtaining the information gain based on a difference between”
Gal, however, teaches these limitations:
“determining first information entropy according to the …” – Gal explicitly teaches using entropy of the model prediction to quantify uncertainty/information:
“Model uncertainty in such cases can be quantified by looking at the entropy or variation ratios of the model prediction.” (Gal, p. 6, § 5.2 Model Uncertainty in Classification Tasks)
“determining second information entropy according to the …” – Gal explicitly teaches using entropy of the model prediction to quantify uncertainty/information:
“Model uncertainty in such cases can be quantified by looking at the entropy or variation ratios of the model prediction.” (Gal, p. 6, § 5.2 Model Uncertainty in Classification Tasks)
“and obtaining the information gain based on a difference between” – Gal teaches this limitation. Gal teaches that entropy is the information/uncertainty quantity of interest:
“… quantified by looking at the entropy or variation ratios of the model prediction.” (Gal, p. 6, § 5.2 Model Uncertainty in Classification Tasks)
Li teaches a pseudo-labeling GNN framework that (i) generates prediction probabilities, (ii) produces pseudo labels from those predictions, and (iii) re-trains the GNN using pseudo labels, thereby producing updated prediction probabilities after training updates. Gal teaches that model uncertainty/information can be quantified by “looking at the entropy … of the model prediction”. A POSITA would have been motivated to apply Gal’s entropy-based uncertainty/information quantification to Li’s pre-training and post-retraining prediction probabilities to quantify information/uncertainty before and after training with pseudo-labeled unlabeled nodes, and to compute the change (difference) between these entropy values as an “information gain” associated with training that includes the unlabeled node, with a reasonable expectation of success.
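For illustration only, the entropy-difference computation described in the combination above can be sketched as follows (this sketch is not code from Li or Gal; all names and values are hypothetical):

```python
import math

def entropy(probs):
    # Shannon entropy of a classification prediction vector.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def information_gain(probs_before, probs_after):
    # Entropy of the pre-trained model's prediction minus entropy of the
    # re-trained model's prediction for the same node: the change in
    # predictive uncertainty attributable to the re-training step.
    return entropy(probs_before) - entropy(probs_after)

# A node whose prediction sharpens after re-training yields a positive gain.
before = [0.4, 0.3, 0.3]  # hypothetical pre-trained GNN output
after = [0.8, 0.1, 0.1]   # hypothetical re-trained GNN output
gain = information_gain(before, after)
```

The two prediction vectors correspond to Li's two model states (pre-trained vs. re-trained), and the entropy measure corresponds to Gal's uncertainty quantification.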
Regarding claim 4, Li in view of Chen teaches the method according to claim 3, wherein
“the trained first graph neural network comprises multiple aggregation layers and an output layer;” – Li teaches this limitation. Li expressly teaches a GNN/GCN with multiple layers performing aggregation/message passing (“aggregates and transforms”), and an output/prediction layer (“softmax layer for node classification”):
“Most modern GNNs rely on an iterative message passing procedure that aggregates and transforms the features of neighboring nodes to learn node embeddings, which are then used for node classification.” (Li, p. 2, § 1 Introduction)
“graph convolutional networks (GCNs) typically with two layers)” (Li, p. 2, § 1 Introduction)
“The generated node embeddings can then be used as input to any differentiable prediction layer, for example, a softmax layer for node classification.” (Li, p. 4, § 2.1 Graph Learning with Few Labels)
This discloses “multiple aggregation layers” plus an “output layer”.
“and the determining the second classification prediction vector of the first unlabeled node based on the trained first graph neural network includes: performing, at an aggregation layer in the multiple aggregation layers, ” – Li teaches this limitation in part. Li explicitly teaches representing the relational/graph structure using an adjacency matrix A with matrix elements A(i, j):
“The graph structure is represented by the adjacent matrix A ∈ R^{n×n}, where A_{i,j} ∈ {0,1}” (Li, p. 6, § 3 Problem Statement)
This discloses edge relationships, thus satisfying this limitation.
“and determining, ” – Li teaches this limitation in part. Li teaches that a propagation/aggregation layer aggregates and transforms neighbor features/messages to produce updated node embeddings (i.e., the layer output vectors for multiple nodes):
“Most modern GNNs rely on an iterative message passing procedure that aggregates and transforms the features of neighboring nodes to learn node embeddings …” (Li, p. 2, § 1 Introduction)
“and processing, at the output layer, an aggregation vector output by a last aggregation layer for the first unlabeled node, to obtain the second classification prediction vector.” – Li teaches this limitation. Li discloses that the node embeddings produced by the (last) message-passing/aggregation layers are input to a prediction layer (softmax) for classification:
“The generated node embeddings can then be used as input to any differentiable prediction layer, for example, a softmax layer for node classification.” (Li, p. 4, § 2.1 Graph Learning with Few Labels)
Li does not teach these limitations:
“… random zeroing processing on vector elements in multiple aggregation vectors …”
“… based on the multiple aggregation vectors after the random zeroing processing, …”
Chen, however, teaches these limitations:
“… random zeroing processing on vector elements in multiple aggregation vectors …” – Chen teaches this limitation. Chen discloses that message dropout randomly drops outgoing messages in each propagation layer, i.e., randomly zeroing/omitting components of the message/feature vectors propagated from an upper layer during aggregation:
“Message Dropout … randomly drops the outgoing messages in each propagation layers to refine representations.” (Chen, col. 9, lines 40-42)
“Vanilla Dropout … randomly masks out the elements of weight matrix” (Chen, col. 12, lines 10-11)
This confirms that dropout is understood in the art as randomly masking elements, which corresponds to random zeroing of vector elements.
“… based on the multiple aggregation vectors after the random zeroing processing, …” – Chen teaches that after dropout/message-dropout, the messages/vectors used in the propagation layer are modified by random dropping/masking:
“Message Dropout … randomly drops the outgoing messages in each propagation layers to refine representations.” (Chen, col. 9, lines 40-42)
Li describes a GNN/GCN architecture in which multiple message-passing/aggregation layers produce node embeddings that are then input to a prediction/output layer (e.g., softmax) for node classification. Chen teaches applying dropout within graph propagation layers, including “Message Dropout” that “randomly drops the outgoing messages in each propagation layers”, i.e., random masking/zeroing of propagated feature/message vectors during aggregation to refine representations. A POSITA would have been motivated to apply Chen’s message-dropout masking within Li’s multi-layer GNN message-passing pipeline (at an aggregation layer using outputs of an upper layer) to obtain masked intermediate vectors and then compute the layer outputs based on the masked vectors, before applying the output layer to obtain the classification prediction vector, with a reasonable expectation of success.
Regarding claim 5, Li in view of Chen teaches the method according to claim 3, wherein
“the trained first graph neural network comprises multiple aggregation layers and an output layer;” – Li teaches this limitation. Li expressly teaches a GNN/GCN with multiple layers performing aggregation/message passing (“aggregates and transforms”), and an output/prediction layer (“softmax layer for node classification”):
“Most modern GNNs rely on an iterative message passing procedure that aggregates and transforms the features of neighboring nodes to learn node embeddings, which are then used for node classification.” (Li, p. 2, § 1 Introduction)
“graph convolutional networks (GCNs) typically with two layers)” (Li, p. 2, § 1 Introduction)
“The generated node embeddings can then be used as input to any differentiable prediction layer, for example, a softmax layer for node classification.” (Li, p. 4, § 2.1 Graph Learning with Few Labels)
This discloses “multiple aggregation layers” plus an “output layer”.
“and the determining the second classification prediction vector of the first unlabeled node based on the trained first graph neural network includes: performing, at an aggregation layer in the multiple aggregation layers, ” – Li teaches this limitation in part. Li explicitly teaches representing the relational/graph structure using an adjacency matrix A with matrix elements A(i, j):
“The graph structure is represented by the adjacent matrix A ∈ R^{n×n}, where A_{i,j} ∈ {0,1}” (Li, p. 6, § 3 Problem Statement)
This discloses edge relationships, thus satisfying this limitation.
“and determining, ” – Li teaches this limitation. Li teaches a multi-layer message passing / aggregation architecture,
“graph convolutional networks (GCNs) typically with two layers)” (Li, p. 2, § 1 Introduction)
This discloses that “upper aggregation layer” outputs node embeddings/vectors that serve as inputs to a subsequent layer.
And Li’s message passing layers compute updated node embeddings by aggregating/transformation of neighbor features/messages:
“aggregates and transforms the features of neighboring nodes to learn node embeddings …” (Li, p. 2, § 1 Introduction)
“and processing, at the output layer, an aggregation vector output by a last aggregation layer for the first unlabeled node, to obtain the second classification prediction vector.” – Li teaches this limitation. Li discloses that the node embeddings produced by the (last) message-passing/aggregation layers are input to a prediction layer (softmax) for classification:
“The generated node embeddings can then be used as input to any differentiable prediction layer, for example, a softmax layer for node classification.” (Li, p. 4, § 2.1 Graph Learning with Few Labels)
Li does not teach these limitations:
“… random zeroing processing on a matrix element …”
“… based on the adjacency matrix after the random zeroing processing… ”
Chen, however, teaches these limitations:
“… random zeroing processing on a matrix element …” – Chen teaches this limitation. Chen discloses:
“DropEdge … randomly removes a certain number of edges from the input graphs” (Chen, col. 9, lines 37-39)
“the stochastic binary masks (i.e., 1 is sampled and 0 is dropped)” (Chen, col. 7, lines 7-8)
Randomly removing edges or applying binary masks where “0 is dropped” corresponds to randomly setting one or more adjacency matrix elements to zero (i.e., “random zeroing processing on a matrix element”).
“… based on the adjacency matrix after the random zeroing processing… ” – Chen teaches that after dropout/message-dropout, the messages/vectors used in the propagation layer are modified by random dropping/masking:
“Message Dropout … randomly drops the outgoing messages in each propagation layers to refine representations.” (Chen, col. 9, lines 40-42)
Li provides the GNN architecture and representation pipeline (aggregation/message passing layers producing embeddings, then a prediction/output layer). Chen teaches DropEdge and layerwise binary masking that “randomly removes … edges” and “0 is dropped”, i.e., random zeroing of adjacency/edge entries used in message passing. A POSITA would have been motivated to apply Chen’s edge-dropping/masking during Li’s message-passing aggregation to regularize training and improve robustness, with a reasonable expectation of success.
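For illustration only, the random zeroing of adjacency-matrix elements described above can be sketched as follows (this sketch is not code from Li or Chen; all names and values are hypothetical, and real implementations of undirected graphs would typically drop symmetric entries together):

```python
import random

def drop_edges(adj, p, rng=None):
    # Randomly zero out adjacency-matrix entries with probability p,
    # a DropEdge-style masking sketch over a dense 0/1 adjacency matrix.
    rng = rng or random.Random(0)
    return [[0 if a and rng.random() < p else a for a in row] for row in adj]

adj = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
all_dropped = drop_edges(adj, p=1.0)   # every edge zeroed
none_dropped = drop_edges(adj, p=0.0)  # adjacency unchanged
```

Message passing computed over the masked matrix then uses only the surviving edges, which is the regularization effect Chen attributes to edge dropping.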
Regarding claim 6, Gal teaches the method according to claim 4, wherein the determining the second classification prediction vector of the unlabeled node based on the trained first graph neural network includes:
“performing for multiple times an operation of determining the second classification prediction vector to correspondingly obtain multiple second classification prediction vectors;” – Gal teaches:
“We refer to this Monte Carlo estimate as MC dropout. In practice this is equivalent to performing T stochastic forward passes through the network and averaging the results.” (Gal, p. 4, § 4. Obtaining Model Uncertainty)
Gal’s disclosed “performing for multiple times” corresponds to “performing T stochastic forward passes”, where each stochastic forward pass yields a prediction output (i.e., a classification vector). Thus, repeating the prediction operation produces “multiple second classification prediction vectors”.
“wherein the determining the second information entropy according to the second classification prediction vector includes:” – Gal expressly teaches determining entropy of the model prediction to quantify uncertainty/information:
“Model uncertainty in such cases can be quantified by looking at the entropy or variation ratios of the model prediction.” (Gal, p. 6, § 5.2 Model Uncertainty in Classification Tasks)
Accordingly, entropy is determined “according to” each prediction vector produced by the forward pass.
“determining an average value of multiple pieces of information entropy respectively corresponding to the multiple second classification prediction vectors as the second information entropy.” – Gal teaches (i) repeated stochastic forward passes and (ii) averaging across the results, and separately teaches using entropy of predictions as the information/uncertainty measure:
“… performing T stochastic forward passes through the network and averaging the results.” (Gal, p. 4, § 4. Obtaining Model Uncertainty)
“… quantified by looking at the entropy … of the model prediction.” (Gal, p. 6, § 5.2 Model Uncertainty in Classification Tasks)
A POSITA would have found it obvious to compute the entropy for each prediction vector obtained from the multiple stochastic forward passes and to determine an average of those entropy values as an aggregated (“second”) entropy estimate, consistent with Gal’s MC-dropout averaging approach and entropy-based uncertainty quantification.
Claim 6 depends from claim 4, which already provides generating a “second classification prediction vector” using stochastic masking in the GNN. Gal’s MC dropout explicitly teaches repeating stochastic forward passes multiple times and averaging results, and teaches entropy of model predictions as a quantitative uncertainty/information measure. Thus, applying Gal’s repeated stochastic evaluation and averaging to the claim 4 second-prediction-vector generation yields claim 6’s “multiple second classification prediction vectors” and “average value of multiple pieces of information entropy”, with a reasonable expectation of success.
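For illustration only, the repeated stochastic evaluation and entropy averaging described above can be sketched as follows (this sketch is not code from Gal; all names and values are hypothetical):

```python
import math
import random

def entropy(probs):
    # Shannon entropy of one classification prediction vector.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def averaged_entropy(stochastic_forward_pass, T):
    # Perform T stochastic forward passes (dropout left active), compute the
    # entropy of each resulting prediction vector, and average the entropies.
    return sum(entropy(stochastic_forward_pass()) for _ in range(T)) / T

def noisy_softmax_pass(rng=random.Random(0)):
    # Hypothetical stochastic pass: logits jittered to mimic dropout noise,
    # then normalized by softmax into a prediction vector.
    logits = [2.0 + rng.gauss(0, 0.1), 0.5, 0.1]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

second_entropy = averaged_entropy(noisy_softmax_pass, T=50)
```

Each stochastic pass yields one prediction vector; averaging the per-pass entropies gives the aggregated (“second”) entropy estimate discussed above.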
Regarding claim 7, Li teaches the method according to claim 1, wherein the updating the model parameter in the current graph neural network according to the classification prediction vector and the real classification label that are corresponding to each labeled node in the multiple nodes, and the classification prediction vector, the pseudo classification label, and the information gain that are corresponding to each unlabeled node includes:
“determining a first loss term according to the classification prediction vector and the real classification label that are corresponding to each labeled node;” – Li teaches:
“We use the SCE loss to optimize GCN for node classification: l_L(y, f_θ(x)) = −Σ_{i ∈ L} y_i log f_θ(x_i).” (Li, p. 7, § 4.2 The GNN Encoder)
Li’s equation (3) is a loss term computed from the model output f_θ(x_i) (the classification prediction vector) and the true label (y_i) for labeled nodes (the labeled-node loss l_L), meeting the “first loss term” limitation.
“determining a second loss term for each unlabeled node according to the classification prediction vector and the pseudo classification label that are corresponding to each unlabeled node,” – Li teaches a pseudo-labeling paradigm with an unlabeled-node loss term:
“U denotes the set of unlabeled nodes.” (Li, p. 6, § 3 Problem Statement)
“min_θ J = Σ_{x_i ∈ L} l_L(y_i, f_θ(x_i)) + Σ_{x_i ∈ U} l_U(Y(x_i), f_θ(x_i)) · J(x_i)” (Li, p. 6, §§ 3 Problem Statement - Problem 1)
Li also ties pseudo labels to model predictions:
“We then produce the pseudo labels for U_p utilizing the GNN encoder f_θ(∙): ŷ_v = argmax_j f_θ(x_v)_j; v ∈ U_p” (Li, p. 10, § 4.3 Candidate Selection for Pseudo Labelling)
Li’s disclosed equation (1) expressly includes a second loss l_U(∙) over unlabeled nodes x_i ∈ U based on the pseudo label Y(x_i) and the prediction vector f_θ(x_i). The pseudo-label generation step (argmax of model outputs) provides the “pseudo classification label”.
“and weighting the second loss term by using the information gain corresponding to the unlabeled node;” – Li’s equation (1) explicitly weights the unlabeled/pseudo-label loss by a per-node factor:
“min_θ J = Σ_{x_i ∈ L} l_L(y_i, f_θ(x_i)) + Σ_{x_i ∈ U} l_U(Y(x_i), f_θ(x_i)) · J(x_i)” (Li, p. 6, §§ 3 Problem Statement - Problem 1)
Li also teaches a per-node “informativeness/representativeness” score used for pseudo-labeled decisions (i.e., an information-based quantity per node):
“We utilize this affinity to define the informativeness score for each node: s_r^v = D(φ(h_v), ϕ(H_{N_v}))” (Li, p. 9, §§ 4.3 Candidate Selection for Pseudo Labeling – Pseudo Labeling)
And explains this is based on MI maximization:
“we employ MI maximization techniques … to estimate the MI” (Li, p. 8, §§ 4.3 Candidate Selection for Pseudo Labelling - Informativeness Measure by MI Maximization)
Li expressly teaches weighting the unlabeled-node loss term by a per-node weighting factor (J(x_i)). Li also expressly teaches determining, for each node, an informativeness/representativeness score based on MI maximization, which is an information-based quantity per node. A POSITA would have understood Li’s per-node informativeness/representativeness score as a form of “information gain/information measure” associated with the unlabeled node and would have found it obvious to implement the per-node weighting coefficient J(x_i) as (or derived from) such informativeness/representativeness (information gain) values, because Li’s framework explicitly uses node informativeness for pseudo-labeling and explicitly weights the unlabeled-node loss by a per-node factor.
“and updating the model parameter according to the first loss term and the weighted second loss term.” – Li expressly describes optimizing/training using combined losses including labeled loss and pseudo-label loss:
“Finally, both given labels and pseudo labels are used to re-train the GNN by minimizing the following loss function (Step 9): l = l_L + l_T + α·l_I + β·l_KL” (Li, p. 12, § 4.6 Model Training and Computational Complexity)
Li teaches model training/retraining by minimizing a loss function that includes a labeled-node loss component and pseudo-label components, which inherently updates model parameters according to those loss terms.
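For illustration only, the combined objective described above (labeled-node loss plus a per-node-weighted pseudo-label loss) can be sketched as follows (this sketch is not code from Li; all names and values are hypothetical):

```python
import math

def cross_entropy(one_hot, probs):
    # Standard cross-entropy between a (pseudo) label and a prediction.
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if p > 0.0)

def combined_loss(labeled, pseudo_labeled):
    # labeled: (true one-hot label, prediction vector) pairs.
    # pseudo_labeled: (pseudo one-hot label, prediction vector, weight)
    # triples, where the per-node weight stands in for the information-gain
    # factor that scales each unlabeled node's loss term.
    first = sum(cross_entropy(y, p) for y, p in labeled)
    second = sum(w * cross_entropy(y, p) for y, p, w in pseudo_labeled)
    return first + second

loss = combined_loss(
    labeled=[([1, 0], [0.9, 0.1])],
    pseudo_labeled=[([0, 1], [0.2, 0.8], 0.5)],
)
```

Minimizing this combined quantity with respect to the model parameters is what updates the GNN according to both loss terms.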
Claim 7 depends from claim 1; therefore, the same motivation to combine applied to claim 1 is also applied to claim 7.
Regarding claim 8, Li teaches the method according to claim 7, wherein the weighting the second loss term by using the information gain corresponding to the unlabeled node includes:
“normalizing the information gain of each unlabeled node by using a first quantity of information gains corresponding to the first quantity of unlabeled nodes, to obtain a corresponding weighting coefficient;” – Li teaches that a per-node weighting coefficient exists (unlabeled loss is multiplied by a coefficient):
“min_θ J = Σ_{x_i ∈ L} l_L(y_i, f_θ(x_i)) + Σ_{x_i ∈ U} l_U(Y(x_i), f_θ(x_i)) · J(x_i).” (Li, p. 6, §§ 3 Problem Statement - Problem 1)
Li’s disclosed framework teaches normalization/averaging over a set, where f̄(X)_j is the mean value of the prediction probability distribution over pseudo labels, which is calculated as follows:
“f̄(X)_j = (1/|U_p|) Σ_{x_i ∈ U_p} f(x_i)_j” (Li, p. 11, § 4.5 Class-balanced Regularization)
Li expressly teaches (i) a per-unlabeled-node weighting coefficient (J(x_i)) applied to the unlabeled/pseudo-label loss and (ii) computing a normalized/averaged quantity over a set (mean over pseudo labels). While Li does not explicitly use the word “normalize” specifically for “information gain” scores, a POSITA would have found it obvious to normalize per-node information-based scores (e.g., representativeness/informativeness scores) across the selected unlabeled-node set to generate stable weighting coefficients, because normalization of weights across a set is a routine stabilizing technique in optimization and Li itself uses set-based averaging/normalization operations (equation (17)) within the same training framework.
“and performing weighting processing by using the weighting coefficient.” – Li expressly performs weighting by multiplying the unlabeled-node (pseudo-label) loss term l_U(∙) by a per-unlabeled-node weighting factor J(x_i):
“min_θ J = Σ_{x_i ∈ L} l_L(y_i, f_θ(x_i)) + Σ_{x_i ∈ U} l_U(Y(x_i), f_θ(x_i)) · J(x_i)” (Li, p. 6, §§ 3 Problem Statement - Problem 1)
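For illustration only, the normalization of per-node information gains into weighting coefficients, and the subsequent weighting of each node's loss term, can be sketched as follows (this sketch is not code from Li; sum-to-one normalization is an assumed example scheme, and all names and values are hypothetical):

```python
def normalize_gains(gains):
    # Normalize per-node information gains across the selected unlabeled set
    # so they can serve as weighting coefficients; sum-to-one normalization
    # is one reasonable choice among several.
    total = sum(gains.values())
    return {v: g / total for v, g in gains.items()}

def weight_losses(node_losses, gains):
    # Multiply each node's pseudo-label loss by its normalized coefficient.
    coeffs = normalize_gains(gains)
    return {v: coeffs[v] * node_losses[v] for v in node_losses}

gains = {"a": 2.0, "b": 1.0, "c": 1.0}
coeffs = normalize_gains(gains)
weighted = weight_losses({"a": 1.0, "b": 1.0, "c": 1.0}, gains)
```

The normalized coefficients sum to one, so nodes with larger information gains contribute proportionally more to the unlabeled-node loss.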
Claim 8 depends from claim 7; therefore, the same motivation to combine applied to claim 1 is also applied to claim 8.
Regarding claims 9-16 (apparatus / computing system claims)
Each of claims 9-16 is the apparatus/system analog of a corresponding method claim previously mapped, namely (9 ↔ 1), (10 ↔ 2), (11 ↔ 3), (12 ↔ 4), (13 ↔ 5), (14 ↔ 6), (15 ↔ 7), and (16 ↔ 8). As such, the “actions” recited as being implemented by “at least one processor” executing “executable instructions” stored in “at least one memory” correspond to the same limitations previously addressed for method claims 1-8, respectively, and are taught for the same reasons by the same disclosures cited for those method claims. Accordingly, because the system claims merely recast the same limitations as the corresponding method claims in terms of a processor and memory executing instructions, and because the functional actions required by the system claims are taught and would have been obvious for the same reasons set forth for method claims 1-8, claims 9-16 are unpatentable over Li in view of Gal and further in view of Chen, as applied.
Regarding claims 17-20 (non-transitory computer-readable storage medium claims)
Each of claims 17-20 is the CRM analog of a corresponding method claim previously mapped, namely (17 ↔ 1), (18 ↔ 2), (19 ↔ 3), and (20 ↔ 4). The recited “non-transitory computer-readable storage medium” storing “computer executable instructions” which, when executed by a processor, cause the processor to perform the claimed actions, corresponds to implementing the same method limitations in software form. Therefore, the “actions comprising” in claims 17-20 are taught/obvious for the same reasons as the corresponding method claims. Accordingly, because claims 17-20 merely store and execute instructions to carry out the same limitations as the corresponding method claims, and because those limitations are taught/obvious for the same reasons set forth for method claims 1-4, claims 17-20 are unpatentable over Li in view of Gal and further in view of Chen, as applied.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Paul Coleman whose telephone number is (571)272-4687. The examiner can normally be reached Mon-Fri.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PAUL COLEMAN/ Examiner, Art Unit 2126
/DAVID YI/ Supervisory Patent Examiner, Art Unit 2126