Prosecution Insights
Last updated: April 19, 2026
Application No. 17/231,289

KNOWLEDGE GRAPH COMPRESSION

Final Rejection — §103
Filed: Apr 15, 2021
Examiner: WU, NICHOLAS S
Art Unit: 2148
Tech Center: 2100 — Computer Architecture & Software
Assignee: International Business Machines Corporation
OA Round: 5 (Final)
Grant Probability: 47% (Moderate)
OA Rounds: 6-7
To Grant: 3y 9m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 47% (grants 47% of resolved cases; 18 granted / 38 resolved; -7.6% vs TC avg)
Interview Lift: +43.1% (allowance among resolved cases with interview vs. without)
Typical Timeline: 3y 9m avg prosecution; 44 currently pending
Career History: 82 total applications across all art units

Statute-Specific Performance

§101: 26.7% (-13.3% vs TC avg)
§103: 52.6% (+12.6% vs TC avg)
§102: 3.1% (-36.9% vs TC avg)
§112: 17.4% (-22.6% vs TC avg)

Deltas are vs. the Tech Center average estimate • Based on career data from 38 resolved cases
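For readers checking the arithmetic, the headline figures reduce to simple ratios over the displayed counts. The inputs below are the rounded numbers shown on this page, not recomputed USPTO data, so the results may differ slightly from the tool's (e.g. its +43.1% lift):

```python
# Reproducing the dashboard arithmetic from the displayed figures
# (18 granted / 38 resolved; 90% allowance with an interview).
granted, resolved = 18, 38
career_allow_rate = granted / resolved            # ~0.474, shown as "47%"
with_interview = 0.90                             # displayed allowance with interview
lift = with_interview - career_allow_rate         # percentage-point lift, ~+0.43
print(f"allow rate {career_allow_rate:.1%}, interview lift {lift:+.1%}")
```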

Office Action (§103)
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed 07/29/2025 have been fully considered but they are not fully persuasive.

Regarding the 103 rejections, applicant's arguments filed with respect to the prior art rejections for claims 1 and 15 have been fully considered but they are moot. Applicant has amended the claims to recite new combinations of limitations, and applicant's arguments are directed at the amendment. Please see below for new grounds of rejection, necessitated by amendment.

Regarding the 103 rejections, applicant's arguments filed with respect to the prior art rejections for claim 8 have been fully considered but they are not persuasive.

Alleged lack of teaching of sufficient training based on a performance metric threshold

In Remarks/Arguments pg. 9-10, applicant contends: “Further, Ma and Schlichtkrull, viewed individually or in combination, do not relate to ‘determining whether there is sufficient training for the KG compression ML model based on a performance metric threshold, wherein the KG compression ML model is determined to have sufficient training based on a performance metric score of the KG compression ML model exceeding the performance metric threshold’ of claim 8, as Ma and Schlichtkrull do not ‘determine whether there is sufficient training,’ let alone ‘based on a performance metric score.’”

The relevant limitation of claim 8 recites: determining whether there is sufficient training for the KG compression ML model based on a performance metric threshold, wherein the KG compression ML model is determined to have sufficient training based on a performance metric score of the KG compression ML model exceeding the performance metric threshold.

Ma teaches: (Ma, pg. 11, “We evaluate all methods using 10-fold cross validation.
For training, we use the Adam optimizer (Kingma & Ba, 2014) with a tuned initial learning rate and a fixed decay rate 0.5 for every 50 epochs. We perform unsupervised training for a maximum of 200 epochs and choose the model at the best validation loss.”).

In other words, Ma teaches determining whether there is sufficient training for a compression model based on a performance metric score exceeding a performance threshold. Ma uses 10-fold cross validation to choose the best trained coarsening model by comparing validation losses. Under the broadest reasonable interpretation, a validation loss is interpreted as a performance metric score because it reflects a model's error rate, and choosing the best validation loss is interpreted as a performance metric score exceeding a threshold, the threshold being set by every other candidate's loss value. Therefore, applicant's arguments regarding claim 8 are not persuasive.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-6, and 21-23 are rejected under 35 U.S.C.
103 as being unpatentable over Ma, et al., Non-Patent Literature “Unsupervised Learning of Graph Hierarchical Abstractions with Differentiable Coarsening and Optimal Transport” (“Ma”) in view of Schlichtkrull, et al., Non-Patent Literature “Modeling Relational Data with Graph Convolutional Networks” (“Schlichtkrull”) and further in view of Akyildiz, et al., Non-Patent Literature “Understanding Coarsening for Embedding Large-Scale Graphs” (“Akyildiz”).

Regarding claim 1, Ma discloses:

A method comprising: encoding,…an input knowledge graph (KG) to receive a first set of node embeddings; (Ma, pg. 5 col. 1, “To extend the definition of optimal transport of two probability measures to that of two graphs G and Gc, we treat the node features from each graph as atoms of an empirical measure”; G is interpreted as the input KG and node features are interpreted as node embeddings (i.e. encoding,…an input knowledge graph (KG) to receive a first set of node embeddings;).)

compressing, by a graph coarsening machine learning (ML) model, the input KG into an output KG, (Ma, pg. 2 col. 1, “OTCOARSENING consists of two ingredients: a parameterized graph coarsening strategy in the algebraic multigrid (AMG) style; and an optimal transport that minimizes the structural transportation between two consecutive graphs in the hierarchy. The “OT” part of the name comes from Optimal Transport. We show that this unsupervised approach produces meaningful coarse graphs that are structure preserving [compressing, by a graph coarsening machine learning (ML) model, the input KG into an output KG,]”).

wherein the graph coarsening machine learning model is trained to select top ranked nodes of the input KG to include in the output KG; (Ma, pg. 4 col. 2, “For a coarsening into m nodes, we pick the top m values of α and list them in the sorted order. Denote by α_s ∈ R^(m×1) such a vector, where the subscript s means sorted and picked. We similarly denote by Ab_s ∈ R^(n×m) the column-sorted and picked version of Ab [wherein the graph coarsening machine learning model is trained to select top ranked nodes of the input KG to include in the output KG;].”).

encoding,…the output KG to receive a second set of node embeddings; (Ma, pg. 5 col. 2, “Denote by GNN(A, X) a generic graph neural network architecture that takes the graph adjacency matrix A and node feature matrix X as input and produces as output a transformed feature matrix. We produce the feature matrix Xc of the coarse graph through the following encoder-decoder-like architecture: Z = GNN(A, X), Zc = S^T Z, Xc = GNN(Ac, Zc). (8) The encoder produces an embedding matrix Zc of the coarse graph [encoding,…the output KG to receive a second set of node embeddings;]”).

training a KG compression ML model based on optimal transport of a distance matrix between the first set of node embeddings and the second set of node embeddings; (Ma, pg. 4-5 section 3.2, “The second ingredient of the proposed OTCOARSENING uses optimal transport for unsupervised learning [training a KG compression ML model based on optimal transport]…In our setting, we extend this framework for defining the distance of the original graph G and its coarsened version Gc [of a distance matrix between the first set of node embeddings and the second set of node embeddings;]. Then, the distance constitutes the coarsening loss, from which model parameters are learned in an unsupervised manner.”).

and compressing a new input KG using the trained KG compression ML model, wherein the KG compression ML model includes updated parameters resulting from training based on optimal transport (Ma, pg. 5 col. 2, “After training, for each graph G [and compressing a new input KG using the trained KG compression ML model,] we obtain a coarsening sequence and the corresponding node embedding matrices Zc for each graph in the sequence. These node embeddings may be used for a downstream task”; the mention of after training is interpreted as using a trained model with updated parameters (i.e. wherein the KG compression ML model includes updated parameters resulting from training based on optimal transport).)

and wherein compressing the new input KG by the KG compression ML model generates a new output KG, the generation of the new output KG dependent on the updated parameters of the KG compression ML model resulting from training based on optimal transport of the distance matrix between the first set of node embeddings and the second set of node embeddings. (Ma, pg. 8 col. 2, “Coarsening is a common approach for solving large-scale graph problems in various scientific disciplines. It generates a sequence of successively smaller graphs, each one being a structural summary of the prior one [and wherein compressing the new input KG by the KG compression ML model generates a new output KG,]” and Ma, pg. 5 col. 2, “After training, for each graph G we obtain a coarsening sequence and the corresponding node embedding matrices Zc for each graph in the sequence. These node embeddings may be used for a downstream task”; the mention of after training is interpreted as using a trained model with updated parameters (i.e. the generation of the new output KG dependent on the updated parameters of the KG compression ML model resulting from training based on optimal transport of the distance matrix between the first set of node embeddings and the second set of node embeddings).)
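The top-m node selection mapped above to the “top ranked nodes” limitation can be illustrated with a short sketch. This is a generic illustration of score-and-keep coarsening, not Ma's implementation; the scores are random stand-ins for learned α values:

```python
import numpy as np

# Illustrative top-m coarsening step: score every node, keep the
# top-m ranked nodes, and induce the coarse adjacency over them.
rng = np.random.default_rng(0)
n, m = 8, 3
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1)
A = A + A.T                          # symmetric adjacency, no self-loops
alpha = rng.random(n)                # stand-ins for learned node scores
keep = np.argsort(alpha)[-m:]        # indices of the top-m ranked nodes
Ac = A[np.ix_(keep, keep)]           # induced coarse-graph adjacency
assert Ac.shape == (m, m)
```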
While Ma teaches a system for compressing a knowledge graph into a coarse graph using optimal transport, Ma does not explicitly teach:

encoding by a graph convolutional neural network (GCN),

determining whether there is sufficient training for the KG compression ML model based on a training cycle threshold, wherein the KG compression ML model is determined to have sufficient training based on a number of compression cycles the KG compression ML model is trained on satisfying the training cycle threshold;

Schlichtkrull teaches encoding by a graph convolutional neural network (GCN), (Schlichtkrull, pg. 1-2, “Our link prediction model can be regarded as an autoencoder consisting of (1) an encoder: an R-GCN producing latent feature representations of entities [encoding by a graph convolutional neural network (GCN),]”).

Ma and Schlichtkrull are both in the same field of endeavor (i.e. graph-based machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma and Schlichtkrull to teach the above limitation(s). The motivation for doing so is that using an R-GCN improves node embeddings by considering neighborhood relationships (cf. Schlichtkrull, pg. 2 col. 1, “This result demonstrates that explicit modeling of neighborhoods in R-GCNs is beneficial for recovering missing facts in knowledge bases.”).
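For reference, the “encoding by a GCN” limitation corresponds to the standard graph-convolution update. A minimal single-layer sketch follows (mean-aggregation variant rather than Schlichtkrull's relation-specific R-GCN; the weights are random stand-ins):

```python
import numpy as np

# One GCN layer: add self-loops, mean-aggregate neighbor features,
# apply a linear map and a ReLU nonlinearity.
def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])              # adjacency with self-loops
    deg = A_hat.sum(axis=1, keepdims=True)      # node degrees for averaging
    return np.maximum(A_hat @ H @ W / deg, 0.0) # aggregation + ReLU

rng = np.random.default_rng(1)
A = (rng.random((5, 5)) > 0.5).astype(float)
A = np.triu(A, 1)
A = A + A.T                                     # symmetric adjacency
H = rng.random((5, 4))                          # input node features
W = rng.random((4, 2))                          # layer weights
Z = gcn_layer(A, H, W)                          # node embeddings, shape (5, 2)
```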
While Ma in view of Schlichtkrull teaches a system for compressing a knowledge graph into a coarse graph using optimal transport and a GCN, the combination does not explicitly teach: determining whether there is sufficient training for the KG compression ML model based on a training cycle threshold, wherein the KG compression ML model is determined to have sufficient training based on a number of compression cycles the KG compression ML model is trained on satisfying the training cycle threshold;

Akyildiz teaches determining whether there is sufficient training for the KG compression ML model based on a training cycle threshold, wherein the KG compression ML model is determined to have sufficient training based on a number of compression cycles the KG compression ML model is trained on satisfying the training cycle threshold; (Akyildiz, pg. 3 col. 2, “The embedding methods in the literature work in epochs, where an epoch is simply a pass over all the vertices. Usually, an epoch budget, i.e., the number of such passes, is given [determining whether there is sufficient training for the KG compression ML model based on a training cycle threshold,]. The budget distribution in GOSH, i.e., the number of epochs assigned to each level, follows a hybrid distribution. A fraction of p < 1 epochs are distributed uniformly across levels, while the remaining (1−p)×e epochs are distributed geometrically such that level i is assigned e_i epochs, where e_i is half of e_(i+1). Smaller values of p mean that more work is done at the coarser levels which will make the embedding faster, and larger values put more work on the finer levels resulting in more fine-tuned embeddings [wherein the KG compression ML model is determined to have sufficient training based on a number of compression cycles the KG compression ML model is trained on satisfying the training cycle threshold;].”).

Ma, Schlichtkrull, and Akyildiz are all in the same field of endeavor (i.e. graph coarsening).
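The hybrid epoch-budget distribution quoted from Akyildiz can be sketched as follows. The per-level formula is garbled in this copy of the record, so the uniform-plus-geometric-halving split below is an assumed reading rather than Akyildiz's exact scheme:

```python
# Assumed reading of the hybrid budget: a fraction p of the total
# budget e is spread uniformly over D levels; the remaining (1-p)*e
# epochs are spread geometrically, each level getting half the
# geometric share of the next one.
def epoch_budget(e, D, p):
    uniform = [p * e / D] * D
    weights = [2 ** i for i in range(D)]            # doubling shares per level
    total = sum(weights)
    geometric = [(1 - p) * e * w / total for w in weights]
    return [u + g for u, g in zip(uniform, geometric)]

budget = epoch_budget(e=100, D=4, p=0.3)
assert abs(sum(budget) - 100) < 1e-9                # budget is conserved
```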
It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma, in view of Schlichtkrull, with Akyildiz to teach the above limitation(s). The motivation for doing so is that using an epoch budget improves the coarsening learning at different coarsening levels (cf. Akyildiz, pg. 3 col. 2, “A fraction of p < 1 epochs are distributed uniformly across levels, while the remaining (1−p)×e epochs are distributed geometrically such that level i is assigned e_i epochs, where e_i is half of e_(i+1). Smaller values of p mean that more work is done at the coarser levels which will make the embedding faster, and larger values put more work on the finer levels resulting in more fine-tuned embeddings.”).

Regarding claim 4, Ma in view of Schlichtkrull and Akyildiz teaches the method of claim 1. Ma further teaches wherein the graph coarsening ML model utilizes algebraic multigrid (AMG). (Ma, pg. 3 col. 1, “The first ingredient coarsens a graph G into a smaller one Gc [wherein the graph coarsening ML model]. For a differentiable parameterization, an operator will need to be defined that transforms the corresponding graph adjacency matrix A ∈ R^(n×n) into Ac ∈ R^(m×m), where n and m are the number of nodes of G and Gc respectively, with m < n. We motivate the definition by algebraic multigrid [utilizes algebraic multigrid (AMG).] (Ruge & Stuben, 1987), because of the hierarchical connection and a graph-theoretic interpretation.”).

Regarding claim 5, Ma in view of Schlichtkrull and Akyildiz teaches the method of claim 1. Ma further teaches wherein the graph coarsening ML model utilizes Attention Based Top-K Pooling. (Ma, pg. 2 col. 1, “In the second approach, the top nodes according to some parameterized ordering are selected and the induced subgraph becomes the next graph in the hierarchy.
The third approach is similar to the second one, except that the ordering is computed through self-attention [wherein the graph coarsening ML model utilizes Attention Based Top-K Pooling.].”).

Regarding claim 6, Ma in view of Schlichtkrull and Akyildiz teaches the method of claim 1. Ma further teaches wherein training the KG compression ML model includes minimizing a Wasserstein distance. (Ma, pg. 5 col. 1, “If the two measures lie on the same metric space and if the infinitesimal mass transportation cost is a distance metric, then optimal transport is the same as the Wasserstein-1 distance. In our setting, we extend this framework for defining the distance of the original graph G and its coarsened version Gc. Then, the distance constitutes the coarsening loss from which model parameters are learned in an unsupervised manner”; the Wasserstein distance being equivalent to the coarsening loss is interpreted as training to minimize the Wasserstein distance (i.e. wherein training the KG compression ML model includes minimizing a Wasserstein distance).)

Regarding claim 21, Ma in view of Schlichtkrull and Akyildiz teaches the method of claim 1. Ma further teaches wherein the training is unsupervised, wherein the KG compression ML model for KG compression is a neural network. (Ma, abstract, “In this work, we propose an unsupervised approach [wherein the training is unsupervised,], coined OTCOARSENING, with the use of optimal transport.”, and Ma, pg. 5 col. 2, “Z = GNN(A, X), Zc = S^T Z, Xc = GNN(Ac, Zc). (8) The encoder produces an embedding matrix Zc of the coarse graph through a combination of GNN [wherein the KG compression ML model for KG compression is a neural network.] transformation and aggregation S^T, whereas the decoder maps Zc to the original feature space so that the resulting Xc lies in the same metric space as X.”).

Regarding claim 22, Ma in view of Schlichtkrull and Akyildiz teaches the method of claim 21. Ma further teaches wherein the unsupervised training based on optimal transport includes an objective function that minimizes a Wasserstein distance of the distance matrix. (Ma, pg. 4-5 section 3.2, “The second ingredient of the proposed OTCOARSENING uses optimal transport for unsupervised learning [wherein the unsupervised training based on optimal transport]…In our setting, we extend this framework for defining the distance of the original graph G and its coarsened version Gc. Then, the distance constitutes the coarsening loss, from which model parameters are learned in an unsupervised manner”; the Wasserstein distance being equivalent to the coarsening loss is interpreted as training to minimize the Wasserstein distance (i.e. includes an objective function that minimizes a Wasserstein distance of the distance matrix).)

Regarding claim 23, Ma in view of Schlichtkrull and Akyildiz teaches the method of claim 22. Ma further teaches wherein the Wasserstein distance is entropy regularized. (Ma, pg. 5 col. 1, “We define the distance of two graphs as W_γ(G, Gc) [wherein the Wasserstein distance] := min_{P ∈ U(a,b)} ⟨P, M⟩ − γE(P), (4) where P, a matrix of the same size as M, denotes the joint probability distribution constrained to the space U(a, b) := {P ∈ R^(n×m)_+ | P1 = a, P^T 1 = b} characterized by marginals a and b; E is the entropic regularization E(P) := −Σ_{i,j} P_ij (log P_ij − 1) [is entropy regularized.]; and γ > 0 is the regularization magnitude.”).

Claims 8 and 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Ma, et al., Non-Patent Literature “Unsupervised Learning of Graph Hierarchical Abstractions with Differentiable Coarsening and Optimal Transport” (“Ma”) in view of Schlichtkrull, et al., Non-Patent Literature “Modeling Relational Data with Graph Convolutional Networks” (“Schlichtkrull”).

Regarding claim 8, Ma discloses:

encoding,…an input knowledge graph (KG) to receive a first set of node embeddings; (Ma, pg. 5 col.
1, “To extend the definition of optimal transport of two probability measures to that of two graphs G and Gc, we treat the node features from each graph as atoms of an empirical measure”; G is interpreted as the input KG and node features are interpreted as node embeddings (i.e. encoding,…an input knowledge graph (KG) to receive a first set of node embeddings;).)

compressing, by a graph coarsening machine learning (ML) model, the input KG into an output KG, (Ma, pg. 2 col. 1, “OTCOARSENING consists of two ingredients: a parameterized graph coarsening strategy in the algebraic multigrid (AMG) style; and an optimal transport that minimizes the structural transportation between two consecutive graphs in the hierarchy. The “OT” part of the name comes from Optimal Transport. We show that this unsupervised approach produces meaningful coarse graphs that are structure preserving [compressing, by a graph coarsening machine learning (ML) model, the input KG into an output KG,]”).

wherein the graph coarsening machine learning model is trained to select top ranked nodes of the input KG to include in the output KG; (Ma, pg. 4 col. 2, “For a coarsening into m nodes, we pick the top m values of α and list them in the sorted order. Denote by α_s ∈ R^(m×1) such a vector, where the subscript s means sorted and picked. We similarly denote by Ab_s ∈ R^(n×m) the column-sorted and picked version of Ab [wherein the graph coarsening machine learning model is trained to select top ranked nodes of the input KG to include in the output KG;].”).

encoding,…the output KG to receive a second set of node embeddings; (Ma, pg. 5 col. 2, “Denote by GNN(A, X) a generic graph neural network architecture that takes the graph adjacency matrix A and node feature matrix X as input and produces as output a transformed feature matrix. We produce the feature matrix Xc of the coarse graph through the following encoder-decoder-like architecture: Z = GNN(A, X), Zc = S^T Z, Xc = GNN(Ac, Zc). (8) The encoder produces an embedding matrix Zc of the coarse graph [encoding,…the output KG to receive a second set of node embeddings;]”).

training a KG compression ML model based on optimal transport of a distance matrix between the first set of node embeddings and the second set of node embeddings; (Ma, pg. 4-5 section 3.2, “The second ingredient of the proposed OTCOARSENING uses optimal transport for unsupervised learning [training a KG compression ML model based on optimal transport]…In our setting, we extend this framework for defining the distance of the original graph G and its coarsened version Gc [of a distance matrix between the first set of node embeddings and the second set of node embeddings;]. Then, the distance constitutes the coarsening loss, from which model parameters are learned in an unsupervised manner.”).

determining whether there is sufficient training for the KG compression ML model based on a performance metric threshold, wherein the KG compression ML model is determined to have sufficient training based on a performance metric score of the KG compression ML model exceeding the performance metric threshold; (Ma, pg. 11, “We evaluate all methods using 10-fold cross validation [determining whether there is sufficient training for the KG compression ML model based on a performance metric threshold,]. For training, we use the Adam optimizer (Kingma & Ba, 2014) with a tuned initial learning rate and a fixed decay rate 0.5 for every 50 epochs. We perform unsupervised training for a maximum of 200 epochs and choose the model at the best validation loss”; selecting the model based on the best validation loss is interpreted as exceeding a performance metric threshold (i.e. wherein the KG compression ML model is determined to have sufficient training based on a performance metric score of the KG compression ML model exceeding the performance metric threshold;).)
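The examiner's “best validation loss” reading amounts to a simple selection rule over per-epoch scores. A minimal sketch, with synthetic losses standing in for real training curves:

```python
# Train up to a maximum number of epochs and keep the checkpoint
# with the best (lowest) validation loss; under the examiner's
# reading, the kept score "exceeds" every other candidate's.
val_losses = [0.91, 0.64, 0.52, 0.58, 0.49, 0.55]   # one entry per epoch
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
best_loss = val_losses[best_epoch]
assert best_loss <= min(val_losses)                 # no candidate does better
```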
and compressing a new input KG using the trained KG compression ML model, wherein the KG compression ML model includes updated parameters resulting from training based on optimal transport (Ma, pg. 5 col. 2, “After training, for each graph G [and compressing a new input KG using the trained KG compression ML model,] we obtain a coarsening sequence and the corresponding node embedding matrices Zc for each graph in the sequence. These node embeddings may be used for a downstream task”; the mention of after training is interpreted as using a trained model with updated parameters (i.e. wherein the KG compression ML model includes updated parameters resulting from training based on optimal transport).)

and wherein compressing the new input KG by the KG compression ML model generates a new output KG, the generation of the new output KG dependent on the updated parameters of the KG compression ML model resulting from training based on optimal transport of the distance matrix between the first set of node embeddings and the second set of node embeddings. (Ma, pg. 8 col. 2, “Coarsening is a common approach for solving large-scale graph problems in various scientific disciplines. It generates a sequence of successively smaller graphs, each one being a structural summary of the prior one [and wherein compressing the new input KG by the KG compression ML model generates a new output KG,]” and Ma, pg. 5 col. 2, “After training, for each graph G we obtain a coarsening sequence and the corresponding node embedding matrices Zc for each graph in the sequence. These node embeddings may be used for a downstream task”; the mention of after training is interpreted as using a trained model with updated parameters (i.e. the generation of the new output KG dependent on the updated parameters of the KG compression ML model resulting from training based on optimal transport of the distance matrix between the first set of node embeddings and the second set of node embeddings).)
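The entropy-regularized optimal transport that recurs through these mappings can be illustrated with generic Sinkhorn iterations; here gamma plays the role of the regularization magnitude γ in Ma's Eq. (4), and the embeddings are random stand-ins, not code from Ma:

```python
import numpy as np

# Entropy-regularized OT via Sinkhorn iterations: M is the pairwise
# distance matrix between fine and coarse embeddings, a and b are
# the (uniform) marginals, P the resulting transport plan.
def sinkhorn(M, a, b, gamma=0.1, iters=200):
    K = np.exp(-M / gamma)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                # match column marginals
        u = a / (K @ v)                  # match row marginals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(2)
X, Xc = rng.random((6, 3)), rng.random((4, 3))      # fine / coarse embeddings
M = np.linalg.norm(X[:, None] - Xc[None, :], axis=-1)
a, b = np.full(6, 1 / 6), np.full(4, 1 / 4)
P = sinkhorn(M, a, b)
loss = (P * M).sum()                                # OT coarsening loss <P, M>
```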
While Ma teaches a system for compressing a knowledge graph into a coarse graph using optimal transport, Ma does not explicitly teach:

A system comprising: one or more processors; and one or more computer-readable storage media storing program instructions which, when executed by the one or more processors, are configured to cause the one or more processors to perform a method comprising:

encoding by a graph convolutional neural network (GCN),

Schlichtkrull teaches A system comprising: one or more processors; and one or more computer-readable storage media storing program instructions which, when executed by the one or more processors, are configured to cause the one or more processors to perform a method comprising: (Schlichtkrull, pg. 4 col. 2, “experiments were run on CPU nodes with 64GB of memory [A system comprising: one or more processors; and one or more computer-readable storage media storing program instructions which, when executed by the one or more processors, are configured to cause the one or more processors to perform a method comprising:].”).

Ma and Schlichtkrull are both in the same field of endeavor (i.e. graph-based machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma and Schlichtkrull to teach the above limitation(s). The motivation for doing so is that a CPU and memory are required in order to run a machine learning process.

Schlichtkrull also teaches encoding by a graph convolutional neural network (GCN), (Schlichtkrull, pg. 1-2, “Our link prediction model can be regarded as an autoencoder consisting of (1) an encoder: an R-GCN producing latent feature representations of entities [encoding by a graph convolutional neural network (GCN),]”). Ma and Schlichtkrull are both in the same field of endeavor (i.e. graph-based machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma and Schlichtkrull to teach the above limitation(s). The motivation for doing so is that using an R-GCN improves node embeddings by considering neighborhood relationships (cf. Schlichtkrull, pg. 2 col. 1, “This result demonstrates that explicit modeling of neighborhoods in R-GCNs is beneficial for recovering missing facts in knowledge bases.”).

Regarding claim 11, Ma in view of Schlichtkrull teaches the system of claim 8. Ma further teaches wherein the graph coarsening ML model utilizes algebraic multigrid (AMG). (Ma, pg. 3 col. 1, “The first ingredient coarsens a graph G into a smaller one Gc [wherein the graph coarsening ML model]. For a differentiable parameterization, an operator will need to be defined that transforms the corresponding graph adjacency matrix A ∈ R^(n×n) into Ac ∈ R^(m×m), where n and m are the number of nodes of G and Gc respectively, with m < n. We motivate the definition by algebraic multigrid [utilizes algebraic multigrid (AMG).] (Ruge & Stuben, 1987), because of the hierarchical connection and a graph-theoretic interpretation.”).

Regarding claim 12, Ma in view of Schlichtkrull teaches the system of claim 8. Ma further teaches wherein the graph coarsening ML model utilizes Attention Based Top-K Pooling. (Ma, pg. 2 col. 1, “In the second approach, the top nodes according to some parameterized ordering are selected and the induced subgraph becomes the next graph in the hierarchy. The third approach is similar to the second one, except that the ordering is computed through self-attention [wherein the graph coarsening ML model utilizes Attention Based Top-K Pooling.].”).

Regarding claim 13, Ma in view of Schlichtkrull teaches the system of claim 8. Ma further teaches wherein training the KG compression ML model includes minimizing a Wasserstein distance. (Ma, pg. 5 col. 1, “If the two measures lie on the same metric space and if the infinitesimal mass transportation cost is a distance metric, then optimal transport is the same as the Wasserstein-1 distance. In our setting, we extend this framework for defining the distance of the original graph G and its coarsened version Gc. Then, the distance constitutes the coarsening loss from which model parameters are learned in an unsupervised manner”; the Wasserstein distance being equivalent to the coarsening loss is interpreted as training to minimize the Wasserstein distance (i.e. wherein training the KG compression ML model includes minimizing a Wasserstein distance).)

Claims 15 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Ma, et al., Non-Patent Literature “Unsupervised Learning of Graph Hierarchical Abstractions with Differentiable Coarsening and Optimal Transport” (“Ma”) in view of Schlichtkrull, et al., Non-Patent Literature “Modeling Relational Data with Graph Convolutional Networks” (“Schlichtkrull”) and further in view of Safavi, et al., Non-Patent Literature “Personalized Knowledge Graph Summarization: From the Cloud to Your Pocket” (“Safavi”) and Kim, et al., Non-Patent Literature “Optimization of Associative Knowledge Graph using TF-IDF based Ranking Score” (“Kim”).

Regarding claim 15, Ma discloses:

encoding,…an input knowledge graph (KG) to receive a first set of node embeddings…; (Ma, pg. 5 col. 1, “To extend the definition of optimal transport of two probability measures to that of two graphs G and Gc, we treat the node features from each graph as atoms of an empirical measure”; G is interpreted as the input KG and node features are interpreted as node embeddings (i.e. encoding,…an input knowledge graph (KG) to receive a first set of node embeddings;).)

compressing, by a graph coarsening machine learning (ML) model, the input KG into an output KG, (Ma, pg. 2, col.
1, “OTCOARSENING consists of two ingredients: a parameterized graph coarsening strategy in the algebraic multigrid (AMG) style; and an optimal transport that minimizes the structural transportation between two consecutive graphs in the hierarchy. The “OT” part of the name comes from Optimal Transport. We show that this unsupervised approach produces meaningful coarse graphs that are structure preserving [compressing, by a graph coarsening machine learning (ML) model, the input KG into an output KG,]”).

wherein the graph coarsening machine learning model is trained to select top ranked nodes of the input KG to include in the output KG…; (Ma, pg. 4 col. 2, “For a coarsening into m nodes, we pick the top m values of α and list them in the sorted order. Denote by α_s ∈ R^(m×1) such a vector, where the subscript s means sorted and picked. We similarly denote by Ab_s ∈ R^(n×m) the column-sorted and picked version of Ab [wherein the graph coarsening machine learning model is trained to select top ranked nodes of the input KG to include in the output KG;].”).

encoding,…the output KG to receive a second set of node embeddings; (Ma, pg. 5 col. 2, “Denote by GNN(A, X) a generic graph neural network architecture that takes the graph adjacency matrix A and node feature matrix X as input and produces as output a transformed feature matrix. We produce the feature matrix Xc of the coarse graph through the following encoder-decoder-like architecture: Z = GNN(A, X), Zc = S^T Z, Xc = GNN(Ac, Zc). (8) The encoder produces an embedding matrix Zc of the coarse graph [encoding,…the output KG to receive a second set of node embeddings;]”).

training a KG compression ML model based on optimal transport of a distance matrix between the first set of node embeddings and the second set of node embeddings; (Ma, pg. 4-5 section 3.2, “The second ingredient of the proposed OTCOARSENING uses optimal transport for unsupervised learning [training a KG compression ML model based on optimal transport]…In our setting, we extend this framework for defining the distance of the original graph G and its coarsened version Gc [of a distance matrix between the first set of node embeddings and the second set of node embeddings;]. Then, the distance constitutes the coarsening loss, from which model parameters are learned in an unsupervised manner.”).

and compressing a new input KG using the trained KG compression ML model, wherein the KG compression ML model includes updated parameters resulting from training based on optimal transport (Ma, pg. 5 col. 2, “After training, for each graph G [and compressing a new input KG using the trained KG compression ML model,] we obtain a coarsening sequence and the corresponding node embedding matrices Zc for each graph in the sequence. These node embeddings may be used for a downstream task”; the mention of after training is interpreted as using a trained model with updated parameters (i.e. wherein the KG compression ML model includes updated parameters resulting from training based on optimal transport).)

and wherein compressing the new input KG by the KG compression ML model generates a new output KG, the generation of the new output KG dependent on the updated parameters of the KG compression ML model resulting from training based on optimal transport of the distance matrix between the first set of node embeddings and the second set of node embeddings…. (Ma, pg. 8 col. 2, “Coarsening is a common approach for solving large-scale graph problems in various scientific disciplines. It generates a sequence of successively smaller graphs, each one being a structural summary of the prior one [and wherein compressing the new input KG by the KG compression ML model generates a new output KG,]” and Ma, pg. 5 col. 2, “After training, for each graph G we obtain a coarsening sequence and the corresponding node embedding matrices Zc for each graph in the sequence. These node embeddings may be used for a downstream task”; the mention of after training is interpreted as using a trained model with updated parameters (i.e. the generation of the new output KG dependent on the updated parameters of the KG compression ML model resulting from training based on optimal transport of the distance matrix between the first set of node embeddings and the second set of node embeddings…).)

While Ma teaches a system for compressing a knowledge graph into a coarse graph using optimal transport, Ma does not explicitly teach:

A computer program product comprising one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising instructions configured to cause one or more processors to perform a method comprising:

encoding by a graph convolutional neural network (GCN),

the input KG including a set of nodes and a set of edges that link nodes of the set of nodes, wherein respective nodes of the set of nodes correspond to words of a document and respective edges of the set of edges correspond to term frequency-inverse document frequency (tf-idf) values

wherein the output KG includes a subset of nodes of the set of nodes and a subset of edges of the set of edges, wherein the subset of nodes signify relevant words within the document

wherein the new output KG includes a second subset of nodes of the new input KG that signify relevant words within a new document the new input KG was generated from

Schlichtkrull teaches A computer program product comprising one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising instructions configured to cause one or more processors to
perform a method comprising: (Schlichtkrull, pg. 4 col. 2, “experiments were run on CPU nodes with 64GB of memory [A computer program product comprising one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising instructions configured to cause one or more processors to perform a method comprising:].”).

Ma and Schlichtkrull are both in the same field of endeavor (i.e. graph-based machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma and Schlichtkrull to teach the above limitation(s). The motivation for doing so is that a CPU and memory are required in order to run a machine learning process.

Schlichtkrull also teaches encoding by a graph convolutional neural network (GCN), (Schlichtkrull, pg. 1-2, “Our link prediction model can be regarded as an autoencoder consisting of (1) an encoder: an R-GCN producing latent feature representations of entities [encoding by a graph convolutional neural network (GCN),]”).

Ma and Schlichtkrull are both in the same field of endeavor (i.e. graph-based machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma and Schlichtkrull to teach the above limitation(s). The motivation for doing so is that using an R-GCN improves node embeddings by considering neighborhood relationships (cf. Schlichtkrull, pg. 2 col. 1, “This result demonstrates that explicit modeling of neighborhoods in R-GCNs is beneficial for recovering missing facts in knowledge bases.”).
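As context for the GCN-encoding limitation: in its simplest non-relational form, the graph-convolution layer that Schlichtkrull's R-GCN encoder generalizes computes relu(D^-1/2 (A + I) D^-1/2 H W). The sketch below is our illustration, not code from any cited reference; the graph, features, and weights are invented toy values.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: relu(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))   # degree normalization
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(0.0, A_norm @ H @ W)          # ReLU activation

# Toy 4-node graph with 3-dim node features, embedded into 2 dims.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 3))   # input node features
W = rng.normal(size=(3, 2))   # weight matrix (learned in practice, random here)
Z = gcn_layer(A, H, W)        # node embeddings, shape (4, 2)
```

Schlichtkrull's R-GCN additionally keeps a separate weight matrix per relation type; this single-relation layer is only meant to make the "encoding … to receive node embeddings" step concrete.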
While Ma in view of Schlichtkrull teaches a system for compressing a knowledge graph into a coarse graph using optimal transport and a GCN, the combination does not explicitly teach:

the input KG including a set of nodes and a set of edges that link nodes of the set of nodes, wherein respective nodes of the set of nodes correspond to words of a document and respective edges of the set of edges correspond to term frequency-inverse document frequency (tf-idf) values

wherein the output KG includes a subset of nodes of the set of nodes and a subset of edges of the set of edges, wherein the subset of nodes signify relevant words within the document

wherein the new output KG includes a second subset of nodes of the new input KG that signify relevant words within a new document the new input KG was generated from

Safavi teaches: the input KG including a set of nodes and a set of edges that link nodes of the set of nodes, wherein respective nodes of the set of nodes correspond to words of a document and respective edges of the set of edges… (Safavi, pg. 528 col. 1 and Figure 1, “Encyclopedic knowledge graphs, which store facts about the world by connecting entities via semantically meaningful relations [including a set of nodes and a set of edges that link nodes of the set of nodes,], have shown to be useful tools for AI tasks including question answering, item recommendation, query expansion, language modeling, and more [38], [40], [37], [9]. Modern knowledge graphs (KGs) contain up to billions of entities and relationships, and are continually being augmented with new facts [26]. These increasingly large stores of world knowledge necessitate summarization, which reduces large KGs [the input KG] to more concise but still interpretable and query-able forms [22];” Figure 1 shows that words are connected to other words using relationships (i.e. wherein respective nodes of the set of nodes correspond to words of a document and respective edges of the set of edges…).).

wherein the output KG includes a subset of nodes of the set of nodes and a subset of edges of the set of edges, wherein the subset of nodes signify relevant words within the document (Safavi, pg. 528 col. 1 and Figure 1, “This paper proposes the new task of personalized knowledge graph summarization, the goal of which is to find a sparse summary of a large KG—in essence, a “mini-KG”— containing the facts most relevant to each individual user’s interests and queries [wherein the output KG includes a subset of nodes of the set of nodes and a subset of edges of the set of edges, wherein the subset of nodes signify relevant words within the document].”).

wherein the new output KG includes a second subset of nodes of the new input KG that signify relevant words within a new document the new input KG was generated from (Safavi, pg. 528 col. 1 and Figure 1, “This paper proposes the new task of personalized knowledge graph summarization, the goal of which is to find a sparse summary of a large KG—in essence, a “mini-KG”— containing the facts most relevant to each individual user’s interests and queries [wherein the new output KG includes a second subset of nodes of the new input KG that signify relevant words within a new document the new input KG was generated from].”).

Ma, in view of Schlichtkrull, and Safavi are all in the same field of endeavor (i.e. graph-based machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma, in view of Schlichtkrull, and Safavi to teach the above limitation(s). The motivation for doing so is that identifying relevant word entities in a knowledge graph allows for personalized graph compression (cf. Safavi, pg. 529 col. 1, “Some graph summarization techniques create summaries preserving specific classes or types of subgraph queries (e.g., path or star queries), though again toward summarizing the entire graph [6], [23]. A few recent approaches also consider user- or task-specific input [1], [19], [15]. That said, no existing work handles the actual content of user queries. This calls for new summarization approaches tailored to personalization, as a personal summary cannot be the same for individuals with different interests (as expressed by their past queries).”).

While Ma in view of Schlichtkrull and Safavi teaches a system for compressing a knowledge graph into a coarse graph using optimal transport and a GCN while considering personalized values, the combination does not explicitly teach: correspond to term frequency-inverse document frequency (tf-idf) values

Kim teaches correspond to term frequency-inverse document frequency (tf-idf) values (Kim, pg. 15-16, “For example, in the optimized associative knowledge graph of Figure 9, the weight of {Traffic}→{Congestion} is 1.877. Having the word ’congestion’ in a traffic circumstance is a general association rule so that the value of lift is low. On the contrary, the lift value of {Traffic}→{Depression} is 7.542. The appearance of ’depression’ in a traffic circumstance is not general. In short, it is an association rule of providing new information. For this reason, the lift weight is high. Accordingly, the TF-IDF based knowledge graph generates more optimized knowledge than a conventional knowledge graph [correspond to term frequency-inverse document frequency (tf-idf) values].”).

Ma, in view of Schlichtkrull and Safavi, and Kim are all in the same field of endeavor (i.e. knowledge graph analysis). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma, in view of Schlichtkrull and Safavi, and Kim to teach the above limitation(s). The motivation for doing so is that using TF-IDF on a knowledge graph improves the knowledge graph’s information by showing the importance of each word of a knowledge graph (cf. Kim, abstract, “the proposed method utilizes TF-IDF based ranking scores to remove terms with low scores and recreate transactions. Based on the result, the association rule algorithm is applied to create an optimized knowledge model.”).

Regarding claim 18, Ma in view of Schlichtkrull, Safavi, and Kim teaches the computer program product of claim 15. Ma further teaches wherein the graph coarsening ML model utilizes algebraic multigrid (AMG). (Ma, pg. 3 col. 1, “The first ingredient coarsens a graph G into a smaller one Gc [wherein the graph coarsening ML model]. For a differentiable parameterization, an operator will need be defined that transforms the corresponding graph adjacency matrix A ∈ R n×n into Ac ∈ R m×m, where n and m are the number of nodes of G and Gc respectively, with m < n. We motivate the definition by algebraic multigrid [utilizes algebraic multigrid (AMG).] (Ruge & Stuben, 1987), because of the hierarchical connection and a graph-theoretic interpretation.”).

Regarding claim 19, Ma in view of Schlichtkrull, Safavi, and Kim teaches the computer program product of claim 15. Ma further teaches wherein training the KG compression ML model includes minimizing a Wasserstein distance. (Ma, pg. 5 col. 1, “If the two measures lie on the same metric space and if the infinitesimal mass transportation cost is a distance metric, then optimal transport is the same as the Wasserstein-1 distance. In our setting, we extend this framework for defining the distance of the original graph G and its coarsened version Gc. Then, the distance constitutes the coarsening loss from which model parameters are learned in an unsupervised manner;” the Wasserstein distance being equivalent to the coarsening loss is interpreted as training to minimize the Wasserstein distance (i.e. wherein training the KG compression ML model includes minimizing a Wasserstein distance.).).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.
Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS S WU whose telephone number is (571)270-0939. The examiner can normally be reached Monday - Friday 8:00 am - 4:00 pm EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold, can be reached on 571-431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/N.S.W./ Examiner, Art Unit 2148
/MICHELLE T BECHTOLD/ Supervisory Patent Examiner, Art Unit 2148
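For readers mapping the rejection to the cited art: the two OTCOARSENING ingredients the action relies on for claims 15 and 19 (picking the top-m ranked nodes so that Zc = S^T Z, then treating an optimal-transport distance between original and coarsened embeddings as the training loss) can be sketched as follows. This is a hedged illustration, not Ma's implementation: it uses entropy-regularized Sinkhorn iterations as a stand-in for the exact Wasserstein-1 computation, and every name, shape, and value is invented.

```python
import numpy as np

def sinkhorn_cost(X, Y, reg=0.1, n_iter=200):
    """Entropy-regularized OT cost between two point clouds (uniform weights)."""
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)  # pairwise distances
    K = np.exp(-C / reg)                                        # Gibbs kernel
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    u, v = np.ones(len(X)), np.ones(len(Y))
    for _ in range(n_iter):                                     # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                             # transport plan
    return float((P * C).sum())                                 # approx. Wasserstein-1

def coarsen(Z, scores, m):
    """Keep the top-m ranked nodes; S is the selection matrix, so Zc = S^T Z."""
    keep = np.argsort(scores)[::-1][:m]   # indices of the top-m scores
    S = np.zeros((len(Z), m))
    S[keep, np.arange(m)] = 1.0
    return S.T @ Z

rng = np.random.default_rng(1)
Z = rng.normal(size=(8, 4))        # embeddings of the input graph's 8 nodes
scores = rng.random(8)             # ranking scores (Ma's alpha; random placeholders)
Zc = coarsen(Z, scores, m=4)       # embeddings of the 4-node coarse graph
loss = sinkhorn_cost(Z, Zc)        # coarsening loss a trainer would minimize
```

In OTCOARSENING the ranking scores and the GNN producing Z are learned jointly by backpropagating through this loss; here both are random placeholders used only to make the pipeline concrete.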

Prosecution Timeline

Apr 15, 2021 — Application Filed
Apr 03, 2024 — Non-Final Rejection — §103
Jun 26, 2024 — Interview Requested
Jul 03, 2024 — Examiner Interview Summary
Jul 03, 2024 — Applicant Interview (Telephonic)
Jul 08, 2024 — Response Filed
Oct 17, 2024 — Non-Final Rejection — §103
Jan 14, 2025 — Examiner Interview Summary
Jan 14, 2025 — Applicant Interview (Telephonic)
Jan 21, 2025 — Response Filed
May 15, 2025 — Final Rejection — §103
Jul 22, 2025 — Applicant Interview (Telephonic)
Jul 22, 2025 — Examiner Interview Summary
Jul 29, 2025 — Request for Continued Examination
Aug 04, 2025 — Response after Non-Final Action
Sep 23, 2025 — Non-Final Rejection — §103
Dec 08, 2025 — Examiner Interview Summary
Dec 08, 2025 — Applicant Interview (Telephonic)
Dec 16, 2025 — Response Filed
Feb 17, 2026 — Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12488244 — APPARATUS AND METHOD FOR DATA GENERATION FOR USER ENGAGEMENT — Granted Dec 02, 2025 (2y 5m to grant)
Patent 12423576 — METHOD AND APPARATUS FOR UPDATING PARAMETER OF MULTI-TASK MODEL, AND STORAGE MEDIUM — Granted Sep 23, 2025 (2y 5m to grant)
Patent 12361280 — METHOD AND DEVICE FOR TRAINING A MACHINE LEARNING ROUTINE FOR CONTROLLING A TECHNICAL SYSTEM — Granted Jul 15, 2025 (2y 5m to grant)
Patent 12354017 — ALIGNING KNOWLEDGE GRAPHS USING SUBGRAPH TYPING — Granted Jul 08, 2025 (2y 5m to grant)
Patent 12333425 — HYBRID GRAPH NEURAL NETWORK — Granted Jun 17, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 6-7
Grant Probability: 47%
With Interview: 90% (+43.1%)
Median Time to Grant: 3y 9m
PTA Risk: High

Based on 38 resolved cases by this examiner. Grant probability derived from career allow rate.
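The headline figures above appear to follow directly from the examiner's career counts. The arithmetic below is our reconstruction; in particular, reading the +43.1% interview lift as additive on top of the career allow rate is an assumption, not something the page states explicitly.

```python
granted, resolved = 18, 38              # career counts shown on this page
allow_rate = granted / resolved         # career allow rate
interview_lift = 0.431                  # +43.1 percentage points with an interview
with_interview = allow_rate + interview_lift

print(f"{allow_rate:.0%}")       # 47%
print(f"{with_interview:.0%}")   # 90%
```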
