Last updated: May 29, 2026
Application No. 18/096,021
METHOD AND APPARATUS FOR LEARNING GRAPH REPRESENTATION FOR OUT-OF-DISTRIBUTION GENERALIZATION, DEVICE AND STORAGE MEDIUM

Non-Final OA §103
Filed
Jan 12, 2023
Priority
Mar 08, 2022 — CN 202210227151.5
Examiner
LEY, SALLY THI
Art Unit
2147
Tech Center
2100 — Computer Architecture & Software
Assignee
Tsinghua University
OA Round
1 (Non-Final)
This examiner grants 19% of cases after interview

— +33.3% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 36 resolved cases, 2023–2026
Examiner Intelligence

LEY, SALLY THI View full profile →
Grants only 19% of cases
Career Allowance Rate
7 granted / 36 resolved
-35.6% vs TC avg
Strong +33% interview lift
Without
With
+33.3%
Interview Lift
resolved cases with interview
Typical timeline
4y 8m
Avg Prosecution
17 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101
10.3%
-29.7% vs TC avg
§103
83.2%
+43.2% vs TC avg
§102
3.8%
-36.2% vs TC avg
§112
2.7%
-37.3% vs TC avg
Black line = Tech Center average estimate • Based on career data from 36 resolved cases
Office Action

§103
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Status of Claims
	This Office Action is in response to the communication filed on 12 Jan 2023.
	Claims 1-13 are being considered on the merits.

Drawings
	The drawings filed on 12 Jan 2023 are accepted.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-8 and 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, et. al. (arXiv:2202.05441v1 [cs.LG] 11 Feb 2022; hereinafter, “Chen”) in view of Huang, et. al (arXiv:2006.07889v4 [cs.LG] 8 Jan 2021; hereinafter, “Huang”)

Regarding claim 1, Chen teaches:
A method for learning graph representations for out-of-distribution generalization, comprising: inputting an original graph dataset into a graph structured data representation network, (Chen, sec. 2: “In graph classification, we are given a set of N graphs {Gi}N i=1 ⊆ G and their labels {Yi}N i=1 ⊆ Y = Rc from c classes. Then, we train a GNN hθ ◦ ρ with an encoder hθ : G → Rh that learns a meaningful representation rG for each graph G to help predict their labels yG = ρ(rG) with a downstream classifier ρ : Rh → Y.” Examiner notes Chen teaches a graph dataset input into a graph neural network that produces representations).
wherein the graph structured data representation network comprises a first graph neural network and a second graph neural network; (Chen Fig. 1 and sec. 4.2: “Inspired by the rationales of GNN reasoning uncovered by Xu et al. (2020), we propose the GOOD framework that explicitly aligns with the two sub-processes in Eq. 4, by decoupling the model into a featurizer and a classifier. Specifically, the featurizer g : G → Gc aims to identify the underlying Gc, and the classifier fc : Gc → Y will predict the label Y based on the estimated Gc.” Examiner notes Chen teaches a first graph neural work as a featurizer g and a second graph neural network as a classifier fc)
identifying a stable subgraph and a noise subgraph in each original graph structured data in the original graph dataset by performing an identification on the original graph structured data via the first graph neural network, and obtaining identified graph structured data; (Chen, assumption 3.1 and figure 2: “In Assumption 3.1, C and S control the generation of the adjacency matrices and features of the invariant subgraph Gc and spurious subgraph Gs through two pairs of latent variables (Zc A,Zc X) and (Zs A, Zs X), respectively”  Examiner notes Chen teaches identifying a stable subgraph as an “invariant subgraph” and the remaining subpart of a graph as the noise subgraph Gs)
obtaining a vectorized representation of the stable subgraph and a vectorized representation of the noise subgraph by performing representation processing on the identified graph structured data via the second graph neural network; (Chen, sec. 3.3: “Moreover, the architecture of CIGA can have multiple other implementations for both the featurizer and classifier, such as identifying Gc at the latent space” Examiner notes Chen teaches vectorized representations of all subgraphs as implicitly required for use in the classification neural network). 
simulating a multi-distribution environment according to the vectorized representation of the noise subgraph, and obtaining a corresponding prediction result by predicting m the multi-distribution environment according to the vectorized representation of the stable subgraph; (Chen, sec. 2: “Invariant learning typically considers a supervised learning setting based on the data D = {De}e collected from multiple environments Eall, where De = {Ge i,ye i}ne i=1 is the dataset from environment e ∈ Eall, ne is the number of instances in environment e, and Ge i ∈ G and ye i ∈ Y correspond to the input graph and the label for the i-th instance from De.”)
for each original graph structured data in the original graph dataset, calculating a loss function based on the prediction result and a label of the original graph structured data, (Chen, sec. 4.2: “Optimization objective. To train the model end-to-end, we merge I( ˆ Gc,Y ) the empirical risk via the variational bound…[equations omitted]… where R( ˆGc,fc) is the empirical loss of fc based on ˆ Gc.”)
performing a parameter optimization on the graph structured data representation network, and (Chen, sec. 4.2: “. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc). Once the underlying Gc is successfully disentangled, the predictions made by the classifier are invariant to distribution shifts introduced by E, in the sense of Proposition 4.1” Examiner notes Chen teaches a classifier which is optimizable via change in its parameters)
obtaining a graph structured data representation model; and (Chen, sec. 4.2: “Specifically, the featurizer g : G → Gc aims to identify the underlying Gc, and the classifier fc : Gc → Y will predict the label Y based on the estimated Gc.”)
executing a graph data-related task by using the graph structured data representation model, and obtaining a target result of the graph data-related task. (Chen, sec. 4.2: “Classifier. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc).” Examiner notes Chen teaches a data-related task of graph classification). 

It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Huang into Chen. Chen teaches generalizing the invariance principle to enable OOD generalization of GNNs; Huang teaches a meta-learning algorithm for graphs. One of ordinary skill would have been motivated to combine the teachings of Huang into Chen in order to enable scalable meta learning problems on graph-structured data (Huang, sec. 1). 

Regarding claim 4, Chen as modified teaches:
The method according to claim 1, wherein simulating the multi-distribution environment according to the vectorized representation of the noise subgraph and obtaining the corresponding prediction result by predicting in the multi-distribution environment according to the vectorized representation of the stable subgraph comprises: (Chen, sec. 2: “Invariant learning typically considers a supervised learning setting based on the data D = {De}e collected from multiple environments Eall, where De = {Ge i,ye i}ne i=1 is the dataset from environment e ∈ Eall, ne is the number of instances in environment e, and Ge i ∈ G and ye i ∈ Y correspond to the input graph and the label for the i-th instance from De.”) 
simulating the multi-distribution environment by performing clustering calculation on the vectorized representation of the noise subgraph; and (Huang, sec. 6: “Then, we compute the Graphlet Distribution Vector [31] for each node, which characterizes the local graph structures and then we apply spectral clustering on this vector to generate the labels.”) 
obtaining the corresponding prediction result by executing, according to the vectorized representation of the stable subgraph, a corresponding prediction task in the multi-distribution environment. (Chen, sec. 4.2: “Specifically, the featurizer g : G → Gc aims to identify the underlying Gc, and the classifier fc : Gc → Y will predict the label Y based on the estimated Gc.” Examiner notes Chen teaches a corresponding classifier prediction to a featurizer where vectorized representations of all subgraphs is implicitly required for use in the classification neural network). 
It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Huang into Chen as set forth above with respect to claim 1. 

Regarding claim 5, Chen as modified teaches: 
The method according to claim 1, wherein for each original graph structured data in the original graph dataset, calculating the loss function based on the prediction result and the label of the original graph structured data, (Chen, sec. 4.2: “Optimization objective. To train the model end-to-end, we merge I( ˆ Gc,Y ) the empirical risk via the variational bound…[equations omitted]… where R( ˆGc,fc) is the empirical loss of fc based on ˆ Gc.”) performing the parameter optimization on the graph structured data representation network, (Chen, sec. 4.2: “. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc). Once the underlying Gc is successfully disentangled, the predictions made by the classifier are invariant to distribution shifts introduced by E, in the sense of Proposition 4.1” Examiner notes Chen teaches a classifier which is optimizable via change in its parameters) and obtaining the graph structured data representation model comprises: (Chen, sec. 4.2: “Specifically, the featurizer g : G → Gc aims to identify the underlying Gc, and the classifier fc : Gc → Y will predict the label Y based on the estimated Gc.”)
for each original graph structured data in the original graph dataset, calculating the loss function based on the prediction result and the label of the original graph structured data, and obtaining a corresponding loss value; and (Chen, sec. 4.2: “Optimization objective. To train the model end-to-end, we merge I( ˆ Gc,Y ) the empirical risk via the variational bound…[equations omitted]… where R( ˆGc,fc) is the empirical loss of fc based on ˆ Gc.”)
obtaining the graph structured data representation model by performing gradient updating on the graph structured data representation network according to the loss value. (Huang, sec. 5: “During meta-training inner loop, we perform the regular stochastic gradient descent on the support loss for each task Ti: θj = θj−1−α∇Lsupport.”)

It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Huang into Chen as set forth above with respect to claim 1. 


Regarding claim 6, Chen as modified teaches: 
The method according to claim 1, wherein executing the graph data-related task by using the graph structured data representation model and obtaining the target result of the graph data-related task comprises: (Chen, sec. 4.2: “Classifier. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc).” Examiner notes Chen teaches a data-related task of graph classification).
using the graph structured data representation model to receive a graph dataset corresponding to the graph data related task; (Chen, sec. 2: “In graph classification, we are given a set of N graphs {Gi}N i=1 ⊆ G and their labels {Yi}N i=1 ⊆ Y = Rc from c classes. Then, we train a GNN hθ ◦ ρ with an encoder hθ : G → Rh that learns a meaningful representation rG for each graph G to help predict their labels yG = ρ(rG) with a downstream classifier ρ : Rh → Y.” Examiner notes Chen teaches a graph dataset input into a graph neural network that produces representations).
obtaining graph representation vectors (Huang, sec. 6: “Then, we compute the Graphlet Distribution Vector [31] for each node, which characterizes the local graph structures and then we apply spectral clustering on this vector to generate the labels.”) corresponding to each graph structured data in the graph dataset by performing representation on the graph dataset; and (Chen, sec. 3.3: “Moreover, the architecture of CIGA can have multiple other implementations for both the featurizer and classifier, such as identifying Gc at the latent space” Examiner notes Chen teaches vectorized representations of all subgraphs as implicitly required for use in the classification neural network).
obtaining the target result by predicting, based on the graph representation vectors, with respect to a corresponding task target. (Chen, sec. 4.2: “Classifier. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc).” Examiner notes Chen teaches a task of graph classification where the target result is a classification corresponds to the task of classifying).

It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Huang into Chen as set forth above with respect to claim 1. 

Regarding claim 7, Chen teaches: 
A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements operations of: (Chen, sec. E6: “We implement our methods with PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey and Lenssen, 2019). We ran our experiments on Linux Servers with 40 cores Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz, 256 GB Memory, and Ubuntu 18.04 LTS installed. GPU environments are varied from 4 NVIDIA RTX 2080Ti graphics cards with CUDA 10.2, 2 NVIDIA RTX 2080Ti and 2 NVIDIA RTX 3090Ti graphics cards with CUDA 11.3, and NVIDIA TITAN series with CUDA 11.3.”)
inputting an original graph dataset into a graph structured data representation network, (Chen, sec. 2: “In graph classification, we are given a set of N graphs {Gi}N i=1 ⊆ G and their labels {Yi}N i=1 ⊆ Y = Rc from c classes. Then, we train a GNN hθ ◦ ρ with an encoder hθ : G → Rh that learns a meaningful representation rG for each graph G to help predict their labels yG = ρ(rG) with a downstream classifier ρ : Rh → Y.” Examiner notes Chen teaches a graph dataset input into a graph neural network that produces representations) wherein the graph structured data representation network comprises a first graph neural network and a second graph neural network; (Chen Fig. 1 and sec. 4.2: “Inspired by the rationales of GNN reasoning uncovered by Xu et al. (2020), we propose the GOOD framework that explicitly aligns with the two sub-processes in Eq. 4, by decoupling the model into a featurizer and a classifier. Specifically, the featurizer g : G → Gc aims to identify the underlying Gc, and the classifier fc : Gc → Y will predict the label Y based on the estimated Gc.” Examiner notes Chen teaches a first graph neural work as a featurizer g and a second graph neural network as a classifier fc)
identifying a stable subgraph and a noise subgraph in each original graph structured data in the original graph dataset by performing an identification on the original graph structured data via the first graph neural network, and obtaining identified graph structured data; (Chen, assumption 3.1 and figure 2: “In Assumption 3.1, C and S control the generation of the adjacency matrices and features of the invariant subgraph Gc and spurious subgraph Gs through two pairs of latent variables (Zc A,Zc X) and (Zs A, Zs X), respectively”  Examiner notes Chen teaches identifying a stable subgraph as an “invariant subgraph” and the remaining subpart of a graph as the noise subgraph Gs)
obtaining a vectorized representation (Huang, sec. 6: “Then, we compute the Graphlet Distribution Vector [31] for each node, which characterizes the local graph structures and then we apply spectral clustering on this vector to generate the labels.”) of the stable subgraph and a vectorized representation of the noise subgraph by performing representation processing on the identified graph structured data via the second graph neural network; (Chen, sec. 3.3: “Moreover, the architecture of CIGA can have multiple other implementations for both the featurizer and classifier, such as identifying Gc at the latent space” Examiner notes Chen teaches vectorized representations of all subgraphs as implicitly required for use in the classification neural network).
simulating a multi-distribution environment according to the vectorized representation of the noise subgraph, and obtaining a corresponding prediction result by predicting m the multi-distribution environment according to the vectorized representation of the stable subgraph; (Chen, sec. 2: “Invariant learning typically considers a supervised learning setting based on the data D = {De}e collected from multiple environments Eall, where De = {Ge i,ye i}ne i=1 is the dataset from environment e ∈ Eall, ne is the number of instances in environment e, and Ge i ∈ G and ye i ∈ Y correspond to the input graph and the label for the i-th instance from De.”)
for each original graph structured data in the original graph dataset, calculating a loss function based on the prediction result and a label of the original graph structured data, (Chen, sec. 4.2: “Optimization objective. To train the model end-to-end, we merge I( ˆ Gc,Y ) the empirical risk via the variational bound…[equations omitted]… where R( ˆGc,fc) is the empirical loss of fc based on ˆ Gc.”) performing a parameter optimization on the graph structured data representation network, and  (Chen, sec. 4.2: “. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc). Once the underlying Gc is successfully disentangled, the predictions made by the classifier are invariant to distribution shifts introduced by E, in the sense of Proposition 4.1” Examiner notes Chen teaches a classifier which is optimizable via change in its parameters) obtaining a graph structured data representation model; and (Chen, sec. 4.2: “Specifically, the featurizer g : G → Gc aims to identify the underlying Gc, and the classifier fc : Gc → Y will predict the label Y based on the estimated Gc.”)
executing a graph data-related task by using the graph structured data representation model, and obtaining a target result of the graph data-related task. (Chen, sec. 4.2: “Classifier. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc).” Examiner notes Chen teaches a data-related task of graph classification).
It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Huang into Chen as set forth above with respect to claim 1. 

Regarding claim 8, Chen teaches: 
An electronic device, comprising a memory, a processor, and a computer program that is stored in the memory and is executable in the processor, wherein the computer program, when executed by the processor, causing the electronic device to implement operations comprising: (Chen, sec. E6: “We implement our methods with PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey and Lenssen, 2019). We ran our experiments on Linux Servers with 40 cores Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz, 256 GB Memory, and Ubuntu 18.04 LTS installed. GPU environments are varied from 4 NVIDIA RTX 2080Ti graphics cards with CUDA 10.2, 2 NVIDIA RTX 2080Ti and 2 NVIDIA RTX 3090Ti graphics cards with CUDA 11.3, and NVIDIA TITAN series with CUDA 11.3.”)
inputting an original graph dataset into a graph structured data representation network, (Chen, sec. 2: “In graph classification, we are given a set of N graphs {Gi}N i=1 ⊆ G and their labels {Yi}N i=1 ⊆ Y = Rc from c classes. Then, we train a GNN hθ ◦ ρ with an encoder hθ : G → Rh that learns a meaningful representation rG for each graph G to help predict their labels yG = ρ(rG) with a downstream classifier ρ : Rh → Y.” Examiner notes Chen teaches a graph dataset input into a graph neural network that produces representations).
wherein the graph structured data representation network comprises a first graph neural network and a second graph neural network; (Chen Fig. 1 and sec. 4.2: “Inspired by the rationales of GNN reasoning uncovered by Xu et al. (2020), we propose the GOOD framework that explicitly aligns with the two sub-processes in Eq. 4, by decoupling the model into a featurizer and a classifier. Specifically, the featurizer g : G → Gc aims to identify the underlying Gc, and the classifier fc : Gc → Y will predict the label Y based on the estimated Gc.” Examiner notes Chen teaches a first graph neural work as a featurizer g and a second graph neural network as a classifier fc)
identifying a stable subgraph and a noise subgraph in each original graph structured data in the original graph dataset by performing an identification on the original graph structured data via the first graph neural network, and obtaining identified graph structured data; (Chen, assumption 3.1 and figure 2: “In Assumption 3.1, C and S control the generation of the adjacency matrices and features of the invariant subgraph Gc and spurious subgraph Gs through two pairs of latent variables (Zc A,Zc X) and (Zs A, Zs X), respectively”  Examiner notes Chen teaches identifying a stable subgraph as an “invariant subgraph” and the remaining subpart of a graph as the noise subgraph Gs)
obtaining a vectorized representation (Huang, sec. 6: “Then, we compute the Graphlet Distribution Vector [31] for each node, which characterizes the local graph structures and then we apply spectral clustering on this vector to generate the labels.”) of the stable subgraph and a vectorized representation of the noise subgraph by performing representation processing on the identified graph structured data via the second graph neural network; (Chen, sec. 3.3: “Moreover, the architecture of CIGA can have multiple other implementations for both the featurizer and classifier, such as identifying Gc at the latent space” Examiner notes Chen teaches vectorized representations of all subgraphs as implicitly required for use in the classification neural network).
simulating a multi-distribution environment according to the vectorized representation of the noise subgraph, and obtaining a corresponding prediction result by predicting m the multi-distribution environment according to the vectorized representation of the stable subgraph; (Chen, sec. 2: “Invariant learning typically considers a supervised learning setting based on the data D = {De}e collected from multiple environments Eall, where De = {Ge i,ye i}ne i=1 is the dataset from environment e ∈ Eall, ne is the number of instances in environment e, and Ge i ∈ G and ye i ∈ Y correspond to the input graph and the label for the i-th instance from De.”)
for each original graph structured data in the original graph dataset, calculating a loss function based on the prediction result and a label of the original graph structured data, (Chen, sec. 4.2: “Optimization objective. To train the model end-to-end, we merge I( ˆ Gc,Y ) the empirical risk via the variational bound…[equations omitted]… where R( ˆGc,fc) is the empirical loss of fc based on ˆ Gc.”) performing a parameter optimization on the graph structured data representation network, and (Chen, sec. 4.2: “. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc). Once the underlying Gc is successfully disentangled, the predictions made by the classifier are invariant to distribution shifts introduced by E, in the sense of Proposition 4.1” Examiner notes Chen teaches a classifier which is optimizable via change in its parameters) obtaining a graph structured data representation model; and (Chen, sec. 4.2: “Specifically, the featurizer g : G → Gc aims to identify the underlying Gc, and the classifier fc : Gc → Y will predict the label Y based on the estimated Gc.”)
executing a graph data-related task by using the graph structured data representation model, and obtaining a target result of the graph data-related task. (Chen, sec. 4.2: “Classifier. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc).” Examiner notes Chen teaches a data-related task of graph classification).

It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Huang into Chen as set forth above with respect to claim 1. 

Regarding claim 11, Chen as modified teaches: 
The electronic device according to claim 8, wherein the processor is further configured to perform operations of: simulating the multi-distribution environment by performing clustering calculation on the vectorized representation of the noise subgraph; and  (Huang, sec. 6: “Then, we compute the Graphlet Distribution Vector [31] for each node, which characterizes the local graph structures and then we apply spectral clustering on this vector to generate the labels.”)
obtaining the corresponding prediction result by executing, according to the vectorized representation of the stable subgraph, a corresponding prediction task in the multi-distribution environment. (Chen, sec. 4.2: “Specifically, the featurizer g : G → Gc aims to identify the underlying Gc, and the classifier fc : Gc → Y will predict the label Y based on the estimated Gc.” Examiner notes Chen teaches a corresponding classifier prediction to a featurizer where vectorized representations of all subgraphs is implicitly required for use in the classification neural network).

It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Huang into Chen, as modified, as set forth above with respect to claim 1. 

Regarding claim 12, Chen as modified teaches: 
The electronic device according to claim 8, wherein the processor is further configured to perform operations of: for each original graph structured data in the original graph dataset, calculating the loss function based on the prediction result and the label of the original graph structured data, and obtaining a corresponding loss value; and (Chen, sec. 4.2: “Optimization objective. To train the model end-to-end, we merge I( ˆ Gc,Y ) the empirical risk via the variational bound…[equations omitted]… where R( ˆGc,fc) is the empirical loss of fc based on ˆ Gc.”)
obtaining the graph structured data representation model by performing gradient updating on the graph structured data representation network according to the loss value. (Huang, sec. 5: “During meta-training inner loop, we perform the regular stochastic gradient descent on the support loss for each task Ti: θj = θj−1−α∇Lsupport.”)

It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Huang into Chen, as modified, as set forth above with respect to claim 1. 

Regarding claim 13, Chen as modified teaches: 
The electronic device according to claim 8, wherein the processor is further configured to perform operations of: using the graph structured data representation model to receive a graph dataset corresponding to the graph data related task; (Chen, sec. 2: “In graph classification, we are given a set of N graphs {Gi}N i=1 ⊆ G and their labels {Yi}N i=1 ⊆ Y = Rc from c classes. Then, we train a GNN hθ ◦ ρ with an encoder hθ : G → Rh that learns a meaningful representation rG for each graph G to help predict their labels yG = ρ(rG) with a downstream classifier ρ : Rh → Y.” Examiner notes Chen teaches a graph dataset input into a graph neural network that produces representations).
obtaining graph representation vectors corresponding to each graph structured data in the graph dataset by performing representation on the graph dataset; and (Chen, sec. 3.3: “Moreover, the architecture of CIGA can have multiple other implementations for both the featurizer and classifier, such as identifying Gc at the latent space” Examiner notes Chen teaches vectorized representations of all subgraphs as implicitly required for use in the classification neural network).
obtaining the target result by predicting, based on the graph representation vectors, with respect to a corresponding task target. (Chen, sec. 4.2: “Classifier. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc).” Examiner notes Chen teaches a task of graph classification where the target result is a classification corresponds to the task of classifying).


Claims 2-3 and 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Huang, and further in view of Shang, et. al. (arXiv:1802.06189v1 [cs.SI] 17 Feb 2018; hereinafter, “Shang”)

Regarding claim 2, Chen as modified teaches:
The method according to claim 1, wherein identifying the stable subgraph and the noise subgraph in each original graph structured data in the original graph dataset by performing the identification on the original graph structured data via the first graph neural network comprises: (Chen, assumption 3.1 and figure 2: “In Assumption 3.1, C and S control the generation of the adjacency matrices and features of the invariant subgraph Gc and spurious subgraph Gs through two pairs of latent variables (Zc A,Zc X) and (Zs A, Zs X), respectively”  Examiner notes Chen teaches identifying a stable subgraph as an “invariant subgraph” and the remaining subpart of a graph as the noise subgraph Gs)
obtaining graph structured data having an updated node representation by updating node information of the original graph structured data; (Chen, sec. 4.2: “Classifier. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc).” Examiner notes Chen teaches updating the graph structured data input from a node by passing it through the featurizer which would update the node).  
obtaining, by calculating a similarity between nodes of the graph structured data having the updated node representation, similarities between each node and neighborhood nodes in the graph structured data; and (Shang, sec. 1 and algorithm 1: “To avoid such numerous but meaning less subgraphs, we propose to induce contrast subgraphs from the coherent cores, i.e., a subset of nodes with similar edge structures in GA and GB.” “Update ˆ g by the current S − T cut”)  
selecting, according to the similarities, nodes having similarities greater than a preset similarity threshold and edges between the nodes to form the stable subgraph, and using remaining nodes and edges to form the noise subgraph. (Shang, sec. 1: “In this paper, we derive a polynomial-time algorithm to efficiently identify coherent subgraph cores and then extract contrast subgraphs. More specifically, we apply a binary search on the coherent/contrast score and construct a network such that whether the current score is achievable is equivalent to whether the min S − T cut in the network is above a certain threshold”)

It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Shang into Chen, as Shang. Shang teaches identification and use of coherent cores to extract subgraphs. One of ordinary skill would have been motivated to combine the teachings of Shang into Chen as modified in order to efficiently extract contrast subgraphs (Shang, abstract). 

Regarding claim 3, Chen as modified teaches:
The method according to claim 2, wherein obtaining the graph structured data having the updated node representation by updating node information of the original graph structured data comprises: (Chen, sec. 4.2: “Classifier. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc).” Examiner notes Chen teaches updating the graph structured data input from a node by passing it through the featurizer which would update the node).  
acquiring node information of each node in the original graph structured data; and (Chen, sec. 4.2: “Classifier. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc).” Examiner notes Chen teaches updating the graph structured data input from a node by passing the original graph information through the featurizer which would update the node).  
obtaining the graph structured data having updated node representation by performing neighborhood aggregation on each node according to the node information of each node. (Shang, sec. 1 and algorithm 1: “To avoid such numerous but meaning less subgraphs, we propose to induce contrast subgraphs from the coherent cores, i.e., a subset of nodes with similar edge structures in GA and GB.” “Update ˆ g by the current S − T cut”)  

It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Shang into Chen as modified as set forth above with respect to claim 2.

Regarding claim 9, Chen as modified teaches: 
The electronic device according to claim 8, wherein the processor is further configured to perform operations of: obtaining graph structured data having an updated node representation by updating node information of the original graph structured data; (Chen, sec. 4.2: “Classifier. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc).” Examiner notes Chen teaches updating the graph structured data input from a node by passing it through the featurizer which would update the node).  
obtaining, by calculating a similarity between nodes of the graph structured data having the updated node representation, similarities between each node and neighborhood nodes in the graph structured data; and (Shang, sec. 1 and algorithm 1: “To avoid such numerous but meaning less subgraphs, we propose to induce contrast subgraphs from the coherent cores, i.e., a subset of nodes with similar edge structures in GA and GB.” “Update ˆ g by the current S − T cut”)  
selecting, according to the similarities, nodes having similarities greater than a preset similarity threshold and edges between the nodes to form the stable subgraph, and using remaining nodes and edges to form the noise subgraph. (Shang, sec. 1: “In this paper, we derive a polynomial-time algorithm to efficiently identify coherent subgraph cores and then extract contrast subgraphs. More specifically, we apply a binary search on the coherent/contrast score and construct a network such that whether the current score is achievable is equivalent to whether the min S − T cut in the network is above a certain threshold”)

It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Shang into Chen as modified as set forth above with respect to claim 2.

Regarding claim 10, Chen as modified teaches: 
The electronic device according to claim 9, wherein the processor is further configured to perform operations of: acquiring node information of each node in the original graph structured data; and  (Chen, sec. 4.2: “Classifier. Given the estimated ˆ ˆ Gc from the featurizer, the classifier fc will predict the label y =fc( ˆGc).” Examiner notes Chen teaches updating the graph structured data input from a node by passing the original graph information through the featurizer which would update the node).  
obtaining the graph structured data having updated node representation by performing neighborhood aggregation on each node according to the node information of each node. (Shang, sec. 1 and algorithm 1: “To avoid such numerous but meaning less subgraphs, we propose to induce contrast subgraphs from the coherent cores, i.e., a subset of nodes with similar edge structures in GA and GB.” “Update ˆ g by the current S − T cut”)  

It would have obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Shang into Chen as modified as set forth above with respect to claim 2.

Prior Art
Shen, et. al. (arXiv:2108.13624v1 [cs.LG] 31 Aug 2021) teaches a systemic and comprehensive discussion of the OOD generalization problem, from the definition, methodology, evaluation to the implications and future directions.
Knyazev (arXiv:1905.02850v3 [cs.LG] 28 Oct 2019) teaches graph reasoning tasks that allow us to study attention in a controlled environment. 
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sally T. Ley whose telephone number is (571)272-3406. The examiner can normally be reached Monday - Thursday, 10:00am - 6:00pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/STL/Examiner, Art Unit 2147                                                                                                                                                                                                        
/VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147
Read full office action
Prosecution Timeline

Jan 12, 2023
Application Filed
Feb 24, 2026
Non-Final Rejection mailed — §103
Apr 21, 2026
Response Filed
Precedent Cases

Applications granted by this same examiner with similar technology

17/981,796
Patent 12632746
A METHOD AND APPARATUS FOR DISPLAYING CATEGORIZED CARBON EMISSIONS
3y 6m to grant Granted May 19, 2026
16/733,393
Patent 12443830
COMPRESSED WEIGHT DISTRIBUTION IN NETWORKS OF NEURAL PROCESSORS
5y 9m to grant Granted Oct 14, 2025
16/835,892
Patent 12135927
EXPERT-IN-THE-LOOP AI FOR MATERIALS DISCOVERY
4y 7m to grant Granted Nov 05, 2024
17/992,958
Patent 11880776
GRAPH NEURAL NETWORK (GNN)-BASED PREDICTION SYSTEM FOR TOTAL ORGANIC CARBON (TOC) IN SHALE
1y 2m to grant Granted Jan 23, 2024
Study what changed to get past this examiner. Based on 4 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

1-2
Expected OA Rounds
19%
Grant Probability
53%
With Interview (+33.3%)
4y 8m (~1y 4m remaining)
Median Time to Grant
Low
PTA Risk
Based on 36 resolved cases by this examiner. Grant probability derived from career allowance rate.