DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 07/29/2025 have been fully considered but they are not fully persuasive.
Regarding the 103 rejections, applicant's arguments filed with respect to the prior art rejections for claims 1 and 15 have been fully considered, but they are moot because they are directed to limitations newly added by amendment. Applicant has amended the claims to recite new combinations of limitations. Please see below for the new grounds of rejection necessitated by the amendment.
Regarding the 103 rejections, applicant's arguments filed with respect to the prior art rejections for claim 8 have been fully considered but they are not persuasive.
Alleged failure to teach determining sufficient training based on a performance metric threshold
In Remarks/Arguments pg. 9-10, applicant contends:
“Further, Ma and Schlichtkrull, viewed individually or in combination, do not relate to ‘determining whether there is sufficient training for the KG compression ML model based on a performance metric threshold, wherein the KG compression ML model is determined to have sufficient training based on a performance metric score of the KG compression ML model exceeding the performance metric threshold’ of claim 8, as Ma and Schlichtkrull do not ‘determine whether there is sufficient training,’ let alone ‘based on a performance metric score.’”
The relevant limitation of claim 8 recites: determining whether there is sufficient training for the KG compression ML model based on a performance metric threshold, wherein the KG compression ML model is determined to have sufficient training based on a performance metric score of the KG compression ML model exceeding the performance metric threshold. Ma teaches:
(Ma, pg. 11, “We evaluate all methods using 10-fold cross validation. For training, we use the Adam optimizer (Kingma & Ba, 2014) with a tuned initial learning rate and a fixed decay rate 0.5 for every 50 epochs. We perform unsupervised training for a maximum of 200 epochs and choose the model at the best validation loss.”).
In other words, Ma teaches determining whether there is sufficient training for a compression model based on a performance metric score exceeding a performance threshold. Ma uses 10-fold cross validation to choose the best trained coarsening model by comparing validation losses. Under the broadest reasonable interpretation, a validation loss is interpreted as a performance metric score because it reflects a model’s error rate. Choosing the model with the best validation loss is interpreted as a performance metric score exceeding a performance metric threshold, as the selected model’s score exceeds the threshold set by every other candidate’s loss value. Therefore, the applicant’s arguments regarding claim 8 are not persuasive.
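For illustration only (not part of Ma’s disclosure), the model-selection procedure described in the cited passage can be sketched in Python; `train_step` and `validate` are hypothetical stand-ins for Ma’s training and validation routines:

```python
def train_with_validation_selection(train_step, validate, max_epochs=200):
    """Run up to max_epochs of training and keep the model state whose
    validation loss is best, mirroring "choose the model at the best
    validation loss" in the cited passage."""
    best_loss = float("inf")
    best_state = None
    for epoch in range(max_epochs):
        state = train_step(epoch)   # one epoch of (unsupervised) training
        loss = validate(state)      # validation loss as a performance metric
        if loss < best_loss:        # the current best acts as the moving threshold
            best_loss, best_state = loss, state
    return best_state, best_loss

# Toy demonstration: "training" just returns the epoch index and the
# validation loss is minimized at epoch 5, so that state is retained.
state, loss = train_with_validation_selection(
    train_step=lambda epoch: epoch,
    validate=lambda s: (s - 5) ** 2,
    max_epochs=10,
)
```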
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4-6, and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Ma, et al., Non-Patent Literature “Unsupervised Learning of Graph Hierarchical Abstractions with Differentiable Coarsening and Optimal Transport” (“Ma”) in view of Schlichtkrull, et al., Non-Patent Literature “Modeling Relational Data with Graph Convolutional Networks” (“Schlichtkrull”) and further in view of Akyildiz, et al., Non-Patent Literature “Understanding Coarsening for Embedding Large-Scale Graphs” (“Akyildiz”).
Regarding claim 1, Ma discloses:
A method comprising: encoding,…an input knowledge graph (KG) to receive a first set of node embeddings; (Ma, pg. 5 col. 1, “To extend the definition of optimal transport of two probability measures to that of two graphs G and Gc, we treat the node features from each graph as atoms of an empirical measure; G is interpreted as the input KG and node features are interpreted as node embeddings (i.e. A method comprising: encoding,…an input knowledge graph (KG) to receive a first set of node embeddings;).”).
compressing, by a graph coarsening machine learning (ML) model, the input KG into an output KG, (Ma, pg. 2, col. 1, “OTCOARSENING consists of two ingredients: a parameterized graph coarsening strategy in the algebraic multigrid (AMG) style; and an optimal transport that minimizes the structural transportation between two consecutive graphs in the hierarchy. The “OT” part of the name comes from Optimal Transport. We show that this unsupervised approach produces meaningful coarse graphs that are structure preserving [compressing, by a graph coarsening machine learning (ML) model, the input KG into an output KG,]”).
wherein the graph coarsening machine learning model is trained to select top ranked nodes of the input KG to include in the output KG; (Ma, pg. 4 col. 2, “For a coarsening into m nodes, we pick the top m values of α and list them in the sorted order. Denote by αs ∈ ℝ^(m×1) such a vector, where the subscript s means sorted and picked. We similarly denote by Âs ∈ ℝ^(n×m) the column-sorted and picked version of Â [wherein the graph coarsening machine learning model is trained to select top ranked nodes of the input KG to include in the output KG;].”).
encoding,…the output KG to receive a second set of node embeddings; (Ma, pg. 5 col. 2, “Denote by GNN(A, X) a generic graph neural network architecture that takes the graph adjacency matrix A and node feature matrix X as input and produces as output a transformed feature matrix. We produce the feature matrix Xc of the coarse graph through the following encoder-decoder-like architecture: Z = GNN(A, X), Zc = SᵀZ, Xc = GNN(Ac, Zc). (8) The encoder produces an embedding matrix Zc of the coarse graph [encoding,…the output KG to receive a second set of node embeddings;]”).
training a KG compression ML model based on optimal transport of a distance matrix between the first set of node embeddings and the second set of node embeddings; (Ma, pg. 4-5 section 3.2, “The second ingredient of the proposed OTCOARSENING uses optimal transport for unsupervised learning [training a KG compression ML model based on optimal transport]…In our setting, we extend this framework for defining the distance of the original graph G and its coarsened version Gc [of a distance matrix between the first set of node embeddings and the second set of node embeddings;]. Then, the distance constitutes the coarsening loss, from which model parameters are learned in an unsupervised manner.”).
and compressing a new input KG using the trained KG compression ML model, wherein the KG compression ML model includes updated parameters resulting from training based on optimal transport (Ma, pg. 5 col. 2, “After training, for each graph G [and compressing a new input KG using the trained KG compression ML model,] we obtain a coarsening sequence and the corresponding node embedding matrices Zc for each graph in the sequence. These node embeddings may be used for a downstream task; the mention of after training is interpreted as using a trained model with updated parameters (i.e. wherein the KG compression ML model includes updated parameters resulting from training based on optimal transport).”).
and wherein compressing the new input KG by the KG compression ML model generates a new output KG, the generation of the new output KG dependent on the updated parameters of the KG compression ML model resulting from training based on optimal transport of the distance matrix between the first set of node embeddings and the second set of node embeddings. (Ma, pg. 8 col. 2, “Coarsening is a common approach for solving large-scale graph problems in various scientific disciplines. It generates a sequence of successively smaller graphs, each one being a structural summary of the prior one [and wherein compressing the new input KG by the KG compression ML model generates a new output KG,]” and Ma, pg. 5 col. 2, “After training, for each graph G we obtain a coarsening sequence and the corresponding node embedding matrices Zc for each graph in the sequence. These node embeddings may be used for a downstream task; the mention of after training is interpreted as using a trained model with updated parameters (i.e. the generation of the new output KG dependent on the updated parameters of the KG compression ML model resulting from training based on optimal transport of the distance matrix between the first set of node embeddings and the second set of node embeddings.).”).
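For illustration only, the coarsening pipeline the cited passages describe (encode, select the top-m ranked nodes, re-encode the coarse graph per Z = GNN(A, X), Zc = SᵀZ, Xc = GNN(Ac, Zc)) can be sketched in Python with NumPy; the `gnn` layer and the node scores `alpha` are toy stand-ins for Ma’s learned components, and the hard top-m selection approximates Ma’s sorted-and-picked operator:

```python
import numpy as np

def gnn(adj, feats, weight):
    """Toy stand-in for a generic GNN layer: aggregate neighbors, transform."""
    return np.tanh(adj @ feats @ weight)

def coarsen_top_m(adj, feats, alpha, m):
    """Keep the top-m nodes ranked by score alpha and return the induced
    sub-adjacency matrix and the corresponding feature rows."""
    keep = np.argsort(alpha)[::-1][:m]          # indices of the top-m scores
    return adj[np.ix_(keep, keep)], feats[keep]

rng = np.random.default_rng(0)
n, m, d = 6, 3, 4
A = (rng.random((n, n)) < 0.5).astype(float)    # input KG adjacency
X = rng.standard_normal((n, d))                 # input node features
W = rng.standard_normal((d, d))
alpha = rng.random(n)                           # node ranking scores

Z = gnn(A, X, W)                                # first set of node embeddings
Ac, Zc = coarsen_top_m(A, Z, alpha, m)          # compress the input KG to m nodes
Xc = gnn(Ac, Zc, W)                             # second set of node embeddings
```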
While Ma teaches a system for compressing a knowledge graph into a coarse graph using optimal transport, Ma does not explicitly teach:
encoding by a graph convolutional neural network (GCN),
determining whether there is sufficient training for the KG compression ML model based on a training cycle threshold, wherein the KG compression ML model is determined to have sufficient training based on a number of compression cycles the KG compression ML model is trained on satisfying the training cycle threshold;
Schlichtkrull teaches encoding by a graph convolutional neural network (GCN), (Schlichtkrull, pg. 1-2, “Our link prediction model can be regarded as an autoencoder consisting of (1) an encoder: an R-GCN producing latent feature representations of entities [encoding by a graph convolutional neural network (GCN),]”).
Ma and Schlichtkrull are both in the same field of endeavor (i.e. graph-based machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma and Schlichtkrull to teach the above limitation(s). The motivation for doing so is that using an R-GCN improves node embeddings by considering neighborhood relationships (cf. Schlichtkrull, pg. 2 col. 1, “This result demonstrates that explicit modeling of neighborhoods in R-GCNs is beneficial for recovering missing facts in knowledge bases.”).
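For illustration only, a single (non-relational) graph-convolution step of the kind Schlichtkrull’s encoder builds on can be sketched as follows; this is the standard GCN propagation rule with self-loops and symmetric normalization, not Schlichtkrull’s relation-specific R-GCN, and all values are toy inputs:

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: add self-loops, symmetrically normalize
    the adjacency, propagate node features, then apply ReLU."""
    a_hat = adj + np.eye(adj.shape[0])              # adjacency with self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # D^(-1/2) of a_hat
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)

A = np.array([[0.0, 1.0], [1.0, 0.0]])              # two connected nodes
H = np.eye(2)                                       # one-hot node features
W = np.eye(2)                                       # identity weights
emb = gcn_layer(A, H, W)                            # each node mixes both features equally
```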
While Ma in view of Schlichtkrull teaches a system for compressing a knowledge graph into a coarse graph using optimal transport and a GCN, the combination does not explicitly teach:
determining whether there is sufficient training for the KG compression ML model based on a training cycle threshold, wherein the KG compression ML model is determined to have sufficient training based on a number of compression cycles the KG compression ML model is trained on satisfying the training cycle threshold;
Akyildiz teaches determining whether there is sufficient training for the KG compression ML model based on a training cycle threshold, wherein the KG compression ML model is determined to have sufficient training based on a number of compression cycles the KG compression ML model is trained on satisfying the training cycle threshold; (Akyildiz, pg. 3 col. 2, “The embedding methods in the literature work in epochs, where an epoch is simply a pass over all the vertices. Usually, an epoch budget, i.e., the number of such passes, is given [determining whether there is sufficient training for the KG compression ML model based on a training cycle threshold,]. The budget distribution in GOSH, i.e., the number of epochs assigned to each level, follows a hybrid distribution. A fraction of p < 1 epochs are distributed uniformly across levels, while the remaining (1−p)×e epochs are distributed geometrically such that level i is assigned e_i epochs, where e_i is half of e_(i+1). Smaller values of p mean that more work is done at the coarser levels which will make the embedding faster, and larger values put more work on the finer levels resulting in more fine-tuned embeddings [wherein the KG compression ML model is determined to have sufficient training based on a number of compression cycles the KG compression ML model is trained on satisfying the training cycle threshold;].”).
Ma, in view of Schlichtkrull, and Akyildiz are in the same field of endeavor (i.e. graph coarsening). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma, in view of Schlichtkrull, with Akyildiz to teach the above limitation(s). The motivation for doing so is that using an epoch budget improves coarsening learning at the different coarsening levels (cf. Akyildiz, pg. 3 col. 2, “A fraction of p < 1 epochs are distributed uniformly across levels, while the remaining (1−p)×e epochs are distributed geometrically such that level i is assigned e_i epochs, where e_i is half of e_(i+1). Smaller values of p mean that more work is done at the coarser levels which will make the embedding faster, and larger values put more work on the finer levels resulting in more fine-tuned embeddings.”).
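For illustration only, the hybrid epoch-budget distribution described in the quoted GOSH passage can be sketched as follows; the orientation (level 0 as the finest level) and the exact split formula are assumptions, since the quotation does not fully specify them:

```python
def distribute_epochs(total, levels, p=0.3):
    """Split a total epoch budget across coarsening levels: a fraction p of
    the budget uniformly, the remainder geometrically so each coarser level
    receives double the epochs of the level below it (level 0 = finest;
    assumed orientation)."""
    uniform = [p * total / levels] * levels
    weights = [2 ** i for i in range(levels)]       # doubles toward the coarsest level
    geometric = [(1 - p) * total * w / sum(weights) for w in weights]
    return [u + g for u, g in zip(uniform, geometric)]

budget = distribute_epochs(total=100, levels=4)     # the whole budget is preserved
```

Smaller `p` shifts more of the budget to the geometric share and hence to the coarser levels, matching the trade-off the quotation describes.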
Regarding claim 4, Ma in view of Schlichtkrull and Akyildiz teaches the method of claim 1. Ma further teaches wherein the graph coarsening ML model utilizes algebraic multigrid (AMG). (Ma, pg. 3 col. 1, “The first ingredient coarsens a graph G into a smaller one Gc [wherein the graph coarsening ML model]. For a differentiable parameterization, an operator will need be defined that transforms the corresponding graph adjacency matrix A ∈ ℝ^(n×n) into Ac ∈ ℝ^(m×m), where n and m are the number of nodes of G and Gc respectively, with m < n. We motivate the definition by algebraic multigrid [utilizes algebraic multigrid (AMG).] (Ruge & Stuben, 1987), because of the hierarchical connection and a graph-theoretic interpretation.”).
Regarding claim 5, Ma in view of Schlichtkrull and Akyildiz teaches the method of claim 1. Ma further teaches wherein the graph coarsening ML model utilizes Attention Based Top-K Pooling. (Ma, pg. 2 col. 1, “In the second approach, the top nodes according to some parameterized ordering are selected and the induced subgraph becomes the next graph in the hierarchy. The third approach is similar to the second one, except that the ordering is computed through self-attention [wherein the graph coarsening ML model utilizes Attention Based Top-K Pooling.].”).
Regarding claim 6, Ma in view of Schlichtkrull and Akyildiz teaches the method of claim 1. Ma further teaches wherein training the KG compression ML model includes minimizing a Wasserstein distance. (Ma, pg. 5 col. 1, “If the two measures lie on the same metric space and if the infinitesimal mass transportation cost is a distance metric, then optimal transport is the same as the Wasserstein-1 distance. In our setting, we extend this framework for defining the distance of the original graph G and its coarsened version Gc. Then, the distance constitutes the coarsening loss from which model parameters are learned in an unsupervised manner; the Wasserstein distance being equivalent to the coarsening loss is interpreted as training to minimize the Wasserstein distance (i.e. wherein training the KG compression ML model includes minimizing a Wasserstein distance.).”).
Regarding claim 21, Ma in view of Schlichtkrull and Akyildiz teaches the method of claim 1. Ma further teaches wherein the training is unsupervised, wherein the KG compression ML model for KG compression is a neural network. (Ma, abstract, “In this work, we propose an unsupervised approach [wherein the training is unsupervised,], coined OTCOARSENING, with the use of optimal transport.”, and Ma, pg. 5 col. 2, “Z = GNN(A, X), Zc = SᵀZ, Xc = GNN(Ac, Zc). (8) The encoder produces an embedding matrix Zc of the coarse graph through a combination of GNN [wherein the KG compression ML model for KG compression is a neural network.] transformation and aggregation Sᵀ, whereas the decoder maps Zc to the original feature space so that the resulting Xc lies in the same metric space as X.”).
Regarding claim 22, Ma in view of Schlichtkrull and Akyildiz teaches the method of claim 21. Ma further teaches wherein the unsupervised training based on optimal transport includes an objective function that minimizes a Wasserstein distance of the distance matrix. (Ma, pg. 4-5 section 3.2, “The second ingredient of the proposed OTCOARSENING uses optimal transport for unsupervised learning [wherein the unsupervised training based on optimal transport]…In our setting, we extend this framework for defining the distance of the original graph G and its coarsened version Gc. Then, the distance constitutes the coarsening loss, from which model parameters are learned in an unsupervised manner; the Wasserstein distance being equivalent to the coarsening loss is interpreted as training to minimize the Wasserstein distance (i.e. includes an objective function that minimizes a Wasserstein distance of the distance matrix.).”).
Regarding claim 23, Ma in view of Schlichtkrull and Akyildiz teaches the method of claim 22. Ma further teaches wherein the Wasserstein distance is entropy regularized. (Ma, pg. 5 col. 1, “We define the distance of two graphs as Wγ(G, Gc) [wherein the Wasserstein distance] := min_{P ∈ U(a,b)} ⟨P, M⟩ − γE(P), (4) where P, a matrix of the same size as M, denotes the joint probability distribution constrained to the space U(a, b) := {P ∈ ℝ₊^(n×m) | P1 = a, Pᵀ1 = b} characterized by marginals a and b; E is the entropic regularization E(P) := −Σ_{i,j} P_ij (log P_ij − 1) [is entropy regularized.]; and γ > 0 is the regularization magnitude.”).
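For illustration only, the entropy-regularized objective in Ma’s Eq. (4) can be approximated with the standard Sinkhorn iteration; this is a generic sketch, not Ma’s implementation, and the cost matrix and marginals below are toy values:

```python
import numpy as np

def sinkhorn(M, a, b, gamma=0.1, iters=200):
    """Approximately solve min_{P in U(a,b)} <P, M> - gamma*E(P) by
    alternately rescaling the rows and columns of K = exp(-M / gamma)."""
    K = np.exp(-M / gamma)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)               # enforce the column marginals b
        u = a / (K @ v)                 # enforce the row marginals a
    P = u[:, None] * K * v[None, :]     # transport plan in U(a, b)
    return P, float((P * M).sum())      # plan and transport cost <P, M>

# Toy distance matrix between 3 original and 2 coarse node embeddings,
# with uniform marginals over both graphs.
M = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
a = np.full(3, 1 / 3)
b = np.full(2, 1 / 2)
P, cost = sinkhorn(M, a, b)
```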
Claims 8 and 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Ma, et al., Non-Patent Literature “Unsupervised Learning of Graph Hierarchical Abstractions with Differentiable Coarsening and Optimal Transport” (“Ma”) in view of Schlichtkrull, et al., Non-Patent Literature “Modeling Relational Data with Graph Convolutional Networks” (“Schlichtkrull”).
Regarding claim 8, Ma discloses:
encoding,…an input knowledge graph (KG) to receive a first set of node embeddings; (Ma, pg. 5 col. 1, “To extend the definition of optimal transport of two probability measures to that of two graphs G and Gc, we treat the node features from each graph as atoms of an empirical measure; G is interpreted as the input KG and node features are interpreted as node embeddings (i.e. encoding,…an input knowledge graph (KG) to receive a first set of node embeddings;).”).
compressing, by a graph coarsening machine learning (ML) model, the input KG into an output KG, (Ma, pg. 2, col. 1, “OTCOARSENING consists of two ingredients: a parameterized graph coarsening strategy in the algebraic multigrid (AMG) style; and an optimal transport that minimizes the structural transportation between two consecutive graphs in the hierarchy. The “OT” part of the name comes from Optimal Transport. We show that this unsupervised approach produces meaningful coarse graphs that are structure preserving [compressing, by a graph coarsening machine learning (ML) model, the input KG into an output KG,]”).
wherein the graph coarsening machine learning model is trained to select top ranked nodes of the input KG to include in the output KG; (Ma, pg. 4 col. 2, “For a coarsening into m nodes, we pick the top m values of α and list them in the sorted order. Denote by αs ∈ ℝ^(m×1) such a vector, where the subscript s means sorted and picked. We similarly denote by Âs ∈ ℝ^(n×m) the column-sorted and picked version of Â [wherein the graph coarsening machine learning model is trained to select top ranked nodes of the input KG to include in the output KG;].”).
encoding,…the output KG to receive a second set of node embeddings; (Ma, pg. 5 col. 2, “Denote by GNN(A, X) a generic graph neural network architecture that takes the graph adjacency matrix A and node feature matrix X as input and produces as output a transformed feature matrix. We produce the feature matrix Xc of the coarse graph through the following encoder-decoder-like architecture: Z = GNN(A, X), Zc = SᵀZ, Xc = GNN(Ac, Zc). (8) The encoder produces an embedding matrix Zc of the coarse graph [encoding,…the output KG to receive a second set of node embeddings;]”).
training a KG compression ML model based on optimal transport of a distance matrix between the first set of node embeddings and the second set of node embeddings; (Ma, pg. 4-5 section 3.2, “The second ingredient of the proposed OTCOARSENING uses optimal transport for unsupervised learning [training a KG compression ML model based on optimal transport]…In our setting, we extend this framework for defining the distance of the original graph G and its coarsened version Gc [of a distance matrix between the first set of node embeddings and the second set of node embeddings;]. Then, the distance constitutes the coarsening loss, from which model parameters are learned in an unsupervised manner.”).
determining whether there is sufficient training for the KG compression ML model based on a performance metric threshold, wherein the KG compression ML model is determined to have sufficient training based on a performance metric score of the KG compression ML model exceeding the performance metric threshold; (Ma, pg. 11, “We evaluate all methods using 10-fold cross validation [determining whether there is sufficient training for the KG compression ML model based on a performance metric threshold,]. For training, we use the Adam optimizer (Kingma & Ba, 2014) with a tuned initial learning rate and a fixed decay rate 0.5 for every 50 epochs. We perform unsupervised training for a maximum of 200 epochs and choose the model at the best validation loss; selecting the model based on the best validation loss is interpreted as exceeding a performance metric threshold (i.e. wherein the KG compression ML model is determined to have sufficient training based on a performance metric score of the KG compression ML model exceeding the performance metric threshold;).”).
and compressing a new input KG using the trained KG compression ML model, wherein the KG compression ML model includes updated parameters resulting from training based on optimal transport (Ma, pg. 5 col. 2, “After training, for each graph G [and compressing a new input KG using the trained KG compression ML model,] we obtain a coarsening sequence and the corresponding node embedding matrices Zc for each graph in the sequence. These node embeddings may be used for a downstream task; the mention of after training is interpreted as using a trained model with updated parameters (i.e. wherein the KG compression ML model includes updated parameters resulting from training based on optimal transport).”).
and wherein compressing the new input KG by the KG compression ML model generates a new output KG, the generation of the new output KG dependent on the updated parameters of the KG compression ML model resulting from training based on optimal transport of the distance matrix between the first set of node embeddings and the second set of node embeddings. (Ma, pg. 8 col. 2, “Coarsening is a common approach for solving large-scale graph problems in various scientific disciplines. It generates a sequence of successively smaller graphs, each one being a structural summary of the prior one [and wherein compressing the new input KG by the KG compression ML model generates a new output KG,]” and Ma, pg. 5 col. 2, “After training, for each graph G we obtain a coarsening sequence and the corresponding node embedding matrices Zc for each graph in the sequence. These node embeddings may be used for a downstream task; the mention of after training is interpreted as using a trained model with updated parameters (i.e. the generation of the new output KG dependent on the updated parameters of the KG compression ML model resulting from training based on optimal transport of the distance matrix between the first set of node embeddings and the second set of node embeddings.).”).
While Ma teaches a system for compressing a knowledge graph into a coarse graph using optimal transport, Ma does not explicitly teach:
A system comprising: one or more processors; and one or more computer-readable storage media storing program instructions which, when executed by the one or more processors, are configured to cause the one or more processors to perform a method comprising:
encoding by a graph convolutional neural network (GCN),
Schlichtkrull teaches A system comprising: one or more processors; and one or more computer-readable storage media storing program instructions which, when executed by the one or more processors, are configured to cause the one or more processors to perform a method comprising: (Schlichtkrull, pg. 4 col. 2, “experiments were run on CPU nodes with 64GB of memory [A system comprising: one or more processors; and one or more computer-readable storage media storing program instructions which, when executed by the one or more processors, are configured to cause the one or more processors to perform a method comprising:].”).
Ma and Schlichtkrull are both in the same field of endeavor (i.e. graph-based machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma and Schlichtkrull to teach the above limitation(s). The motivation for doing so is that a processor and memory are required to run a machine learning process.
Schlichtkrull also teaches encoding by a graph convolutional neural network (GCN), (Schlichtkrull, pg. 1-2, “Our link prediction model can be regarded as an autoencoder consisting of (1) an encoder: an R-GCN producing latent feature representations of entities [encoding by a graph convolutional neural network (GCN),]”).
Ma and Schlichtkrull are both in the same field of endeavor (i.e. graph-based machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma and Schlichtkrull to teach the above limitation(s). The motivation for doing so is that using an R-GCN improves node embeddings by considering neighborhood relationships (cf. Schlichtkrull, pg. 2 col. 1, “This result demonstrates that explicit modeling of neighborhoods in R-GCNs is beneficial for recovering missing facts in knowledge bases.”).
Regarding claim 11, Ma in view of Schlichtkrull teaches the system of claim 8. Ma further teaches wherein the graph coarsening ML model utilizes algebraic multigrid (AMG). (Ma, pg. 3 col. 1, “The first ingredient coarsens a graph G into a smaller one Gc [wherein the graph coarsening ML model]. For a differentiable parameterization, an operator will need be defined that transforms the corresponding graph adjacency matrix A ∈ ℝ^(n×n) into Ac ∈ ℝ^(m×m), where n and m are the number of nodes of G and Gc respectively, with m < n. We motivate the definition by algebraic multigrid [utilizes algebraic multigrid (AMG).] (Ruge & Stuben, 1987), because of the hierarchical connection and a graph-theoretic interpretation.”).
Regarding claim 12, Ma in view of Schlichtkrull teaches the system of claim 8. Ma further teaches wherein the graph coarsening ML model utilizes Attention Based Top-K Pooling. (Ma, pg. 2 col. 1, “In the second approach, the top nodes according to some parameterized ordering are selected and the induced subgraph becomes the next graph in the hierarchy. The third approach is similar to the second one, except that the ordering is computed through self-attention [wherein the graph coarsening ML model utilizes Attention Based Top-K Pooling.].”).
Regarding claim 13, Ma in view of Schlichtkrull teaches the system of claim 8. Ma further teaches wherein training the KG compression ML model includes minimizing a Wasserstein distance. (Ma, pg. 5 col. 1, “If the two measures lie on the same metric space and if the infinitesimal mass transportation cost is a distance metric, then optimal transport is the same as the Wasserstein-1 distance. In our setting, we extend this framework for defining the distance of the original graph G and its coarsened version Gc. Then, the distance constitutes the coarsening loss from which model parameters are learned in an unsupervised manner; the Wasserstein distance being equivalent to the coarsening loss is interpreted as training to minimize the Wasserstein distance (i.e. wherein training the KG compression ML model includes minimizing a Wasserstein distance.).”).
Claims 15 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Ma, et al., Non-Patent Literature “Unsupervised Learning of Graph Hierarchical Abstractions with Differentiable Coarsening and Optimal Transport” (“Ma”) in view of Schlichtkrull, et al., Non-Patent Literature “Modeling Relational Data with Graph Convolutional Networks” (“Schlichtkrull”) and further in view of Safavi, et al., Non-Patent Literature “Personalized Knowledge Graph Summarization: From the Cloud to Your Pocket” (“Safavi”) and Kim, et al., Non-Patent Literature “Optimization of Associative Knowledge Graph using TF-IDF based Ranking Score” (“Kim”).
Regarding claim 15, Ma discloses:
encoding,…an input knowledge graph (KG) to receive a first set of node embeddings…; (Ma, pg. 5 col. 1, “To extend the definition of optimal transport of two probability measures to that of two graphs G and Gc, we treat the node features from each graph as atoms of an empirical measure; G is interpreted as the input KG and node features are interpreted as node embeddings (i.e. encoding,…an input knowledge graph (KG) to receive a first set of node embeddings;).”).
compressing, by a graph coarsening machine learning (ML) model, the input KG into an output KG, (Ma, pg. 2, col. 1, “OTCOARSENING consists of two ingredients: a parameterized graph coarsening strategy in the algebraic multigrid (AMG) style; and an optimal transport that minimizes the structural transportation between two consecutive graphs in the hierarchy. The “OT” part of the name comes from Optimal Transport. We show that this unsupervised approach produces meaningful coarse graphs that are structure preserving [compressing, by a graph coarsening machine learning (ML) model, the input KG into an output KG,]”).
wherein the graph coarsening machine learning model is trained to select top ranked nodes of the input KG to include in the output KG…; (Ma, pg. 4 col. 2, “For a coarsening into m nodes, we pick the top m values of α and list them in the sorted order. Denote by αs ∈ ℝ^(m×1) such a vector, where the subscript s means sorted and picked. We similarly denote by Âs ∈ ℝ^(n×m) the column-sorted and picked version of Â [wherein the graph coarsening machine learning model is trained to select top ranked nodes of the input KG to include in the output KG;].”).
encoding,…the output KG to receive a second set of node embeddings; (Ma, pg. 5 col. 2, “Denote by GNN(A, X) a generic graph neural network architecture that takes the graph adjacency matrix A and node feature matrix X as input and produces as output a transformed feature matrix. We produce the feature matrix Xc of the coarse graph through the following encoder-decoder-like architecture: Z = GNN(A, X), Zc = SᵀZ, Xc = GNN(Ac, Zc). (8) The encoder produces an embedding matrix Zc of the coarse graph [encoding,…the output KG to receive a second set of node embeddings;]”).
training a KG compression ML model based on optimal transport of a distance matrix between the first set of node embeddings and the second set of node embeddings; (Ma, pg. 4-5 section 3.2, “The second ingredient of the proposed OTCOARSENING uses optimal transport for unsupervised learning [training a KG compression ML model based on optimal transport]…In our setting, we extend this framework for defining the distance of the original graph G and its coarsened version Gc [of a distance matrix between the first set of node embeddings and the second set of node embeddings;]. Then, the distance constitutes the coarsening loss, from which model parameters are learned in an unsupervised manner.”).
and compressing a new input KG using the trained KG compression ML model, wherein the KG compression ML model includes updated parameters resulting from training based on optimal transport (Ma, pg. 5 col. 2, “After training, for each graph G [and compressing a new input KG using the trained KG compression ML model,] we obtain a coarsening sequence and the corresponding node embedding matrices Zc for each graph in the sequence. These node embeddings may be used for a downstream task; the mention of after training is interpreted as using a trained model with updated parameters (i.e. wherein the KG compression ML model includes updated parameters resulting from training based on optimal transport).”).
and wherein compressing the new input KG by the KG compression ML model generates a new output KG, the generation of the new output KG dependent on the updated parameters of the KG compression ML model resulting from training based on optimal transport of the distance matrix between the first set of node embeddings and the second set of node embeddings…. (Ma, pg. 8 col. 2, “Coarsening is a common approach for solving large-scale graph problems in various scientific disciplines. It generates a sequence of successively smaller graphs, each one being a structural summary of the prior one [and wherein compressing the new input KG by the KG compression ML model generates a new output KG,]” and Ma, pg. 5 col. 2, “After training, for each graph G we obtain a coarsening sequence and the corresponding node embedding matrices Zc for each graph in the sequence. These node embeddings may be used for a downstream task; the mention of after training is interpreted as using a trained model with updated parameters (i.e. the generation of the new output KG dependent on the updated parameters of the KG compression ML model resulting from training based on optimal transport of the distance matrix between the first set of node embeddings and the second set of node embeddings….).”).
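For illustration only, an optimal-transport distance between the two sets of node embeddings, of the kind that constitutes Ma's coarsening loss, can be sketched with entropy-regularized (Sinkhorn) optimal transport under uniform marginals (a standard approximation of the Wasserstein distance; hypothetical names, not Ma's implementation):

```python
import numpy as np

def sinkhorn_distance(Z, Zc, eps=0.1, n_iter=200):
    """Entropy-regularized optimal transport between two uniform point clouds.
    Z: (n, d) original-node embeddings; Zc: (m, d) coarse-node embeddings."""
    n, m = len(Z), len(Zc)
    # pairwise Euclidean cost matrix between the two embedding sets
    C = np.linalg.norm(Z[:, None, :] - Zc[None, :, :], axis=-1)
    K = np.exp(-C / eps)
    a = np.full(n, 1.0 / n)  # uniform mass on original nodes
    b = np.full(m, 1.0 / m)  # uniform mass on coarse nodes
    v = np.ones(m)
    for _ in range(n_iter):  # Sinkhorn scaling iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]  # transport plan
    return float((P * C).sum())

Z = np.array([[0.0, 0.0], [1.0, 0.0]])
d_same = sinkhorn_distance(Z, Z)        # near zero: clouds coincide
d_far = sinkhorn_distance(Z, Z + 5.0)   # roughly the translation distance
```

Minimizing such a distance with respect to the model parameters that produce Zc is the unsupervised training signal Ma describes.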
While Ma teaches a system for compressing a knowledge graph into a coarse graph using optimal transport, Ma does not explicitly teach:
A computer program product comprising one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising instructions configured to cause one or more processors to perform a method comprising:
encoding by a graph convolutional neural network (GCN),
the input KG including a set of nodes and a set of edges that link nodes of the set of nodes, wherein respective nodes of the set of nodes correspond to words of a document and respective edges of the set of edges correspond to term frequency-inverse document frequency (tf-idf) values
wherein the output KG includes a subset of nodes of the set of nodes and a subset of edges of the set of edges, wherein the subset of nodes signify relevant words within the document
wherein the new output KG includes a second subset of nodes of the new input KG that signify relevant words within a new document the new input KG was generated from
Schlichtkrull teaches A computer program product comprising one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising instructions configured to cause one or more processors to perform a method comprising: (Schlichtkrull, pg. 4 col. 2, “experiments were run on CPU nodes with 64GB of memory [A computer program product comprising one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising instructions configured to cause one or more processors to perform a method comprising:].”).
Ma and Schlichtkrull are both in the same field of endeavor (i.e. graph-based machine learning). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma and Schlichtkrull to teach the above limitation(s). The motivation for doing so is that a CPU and memory are required to run a machine learning process.
Schlichtkrull also teaches encoding by a graph convolutional neural network (GCN), (Schlichtkrull, pg. 1-2, “Our link prediction model can be regarded as an autoencoder consisting of (1) an encoder: an R-GCN producing latent feature representations of entities [encoding by a graph convolutional neural network (GCN),]”).
Ma and Schlichtkrull are both in the same field of endeavor (i.e. graph-based machine learning). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma and Schlichtkrull to teach the above limitation(s). The motivation for doing so is that using an R-GCN improves node embeddings by considering neighborhood relationships (cf. Schlichtkrull, pg. 2 col. 1, “This result demonstrates that explicit modeling of neighborhoods in R-GCNs is beneficial for recovering missing facts in knowledge bases.”).
While Ma in view of Schlichtkrull teaches a system for compressing a knowledge graph into a coarse graph using optimal transport and a GCN, the combination does not explicitly teach:
the input KG including a set of nodes and a set of edges that link nodes of the set of nodes, wherein respective nodes of the set of nodes correspond to words of a document and respective edges of the set of edges correspond to term frequency-inverse document frequency (tf-idf) values
wherein the output KG includes a subset of nodes of the set of nodes and a subset of edges of the set of edges, wherein the subset of nodes signify relevant words within the document
wherein the new output KG includes a second subset of nodes of the new input KG that signify relevant words within a new document the new input KG was generated from
Safavi teaches:
the input KG including a set of nodes and a set of edges that link nodes of the set of nodes, wherein respective nodes of the set of nodes correspond to words of a document and respective edges of the set of edges… (Safavi, pg. 528 col. 1 and Figure 1, “Encyclopedic knowledge graphs, which store facts about the world by connecting entities via semantically meaningful relations [including a set of nodes and a set of edges that link nodes of the set of nodes,], have shown to be useful tools for AI tasks including question answering, item recommendation, query expansion, language modeling, and more [38], [40], [37], [9]. Modern knowledge graphs (KGs) contain up to billions of entities and relationships, and are continually being augmented with new facts [26]. These increasingly large stores of world knowledge necessitate summarization, which reduces large KGs [the input KG] to more concise but still interpretable and query-able forms [22]; Figure 1 shows that words are connected to other words using relationships (i.e. wherein respective nodes of the set of nodes correspond to words of a document and respective edges of the set of edges…).”).
wherein the output KG includes a subset of nodes of the set of nodes and a subset of edges of the set of edges, wherein the subset of nodes signify relevant words within the document (Safavi, pg. 528 col. 1 and Figure 1, “This paper proposes the new task of personalized knowledge graph summarization, the goal of which is to find a sparse summary of a large KG—in essence, a “mini-KG”— containing the facts most relevant to each individual user’s interests and queries [wherein the output KG includes a subset of nodes of the set of nodes and a subset of edges of the set of edges, wherein the subset of nodes signify relevant words within the document].”).
wherein the new output KG includes a second subset of nodes of the new input KG that signify relevant words within a new document the new input KG was generated from (Safavi, pg. 528 col. 1 and Figure 1, “This paper proposes the new task of personalized knowledge graph summarization, the goal of which is to find a sparse summary of a large KG—in essence, a “mini-KG”— containing the facts most relevant to each individual user’s interests and queries [wherein the new output KG includes a second subset of nodes of the new input KG that signify relevant words within a new document the new input KG was generated from].”).
Ma, in view of Schlichtkrull, and Safavi are all in the same field of endeavor (i.e. graph-based machine learning). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma, in view of Schlichtkrull, with Safavi to teach the above limitation(s). The motivation for doing so is that identifying relevant word entities in a knowledge graph allows for personalized graph compression (cf. Safavi, pg. 529 col. 1, “Some graph summarization techniques create summaries preserving specific classes or types of subgraph queries (e.g., path or star queries), though again toward summarizing the entire graph [6], [23]. A few recent approaches also consider user- or task-specific input [1], [19], [15]. That said, no existing work handles the actual content of user queries. This calls for new summarization approaches tailored to personalization, as a personal summary cannot be the same for individuals with different interests (as expressed by their past queries).”).
While Ma in view of Schlichtkrull and Safavi teaches a system for compressing a knowledge graph into a coarse graph using optimal transport and a GCN while accounting for personalized relevance, the combination does not explicitly teach:
correspond to term frequency-inverse document frequency (tf-idf) values
Kim teaches correspond to term frequency-inverse document frequency (tf-idf) values (Kim, pg. 15-16, “For example, in the optimized associative knowledge graph of Figure 9, the weight of {Traffic}→{Congestion} is 1.877. Having the word ’congestion’ in a traffic circumstance is a general association rule so that the value of lift is low. On the contrary, the lift value of {Traffic}→{Depression} is 7.542. The appearance of ’depression’ in a traffic circumstance is not general. In short, it is an association rule of providing new information. For this reason, the lift weight is high. Accordingly, the TF-IDF based knowledge graph generates more optimized knowledge than a conventional knowledge graph [correspond to term frequency-inverse document frequency (tf-idf) values].”).
Ma, in view of Schlichtkrull and Safavi, and Kim are all in the same field of endeavor (i.e. knowledge graph analysis). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Ma, in view of Schlichtkrull and Safavi, with Kim to teach the above limitation(s). The motivation for doing so is that using TF-IDF on a knowledge graph improves the knowledge graph’s information by showing the importance of each word of a knowledge graph (cf. Kim, abstract, “the proposed method utilizes TF-IDF based ranking scores to remove terms with low scores and recreate transactions. Based on the result, the association rule algorithm is applied to create an optimized knowledge model.”).
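For illustration only, the tf-idf weighting Kim applies can be sketched as follows (hypothetical names; words that appear in every document receive a zero score, mirroring Kim's point that common terms carry little information):

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Per-document tf-idf scores: tf = term count / document length,
    idf = log(number of documents / number of documents containing the term)."""
    n = len(docs)
    df = Counter()                 # document frequency of each word
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append({w: (tf[w] / total) * math.log(n / df[w]) for w in tf})
    return scores

docs = [["traffic", "congestion", "traffic"], ["traffic", "depression"]]
scores = tfidf_scores(docs)
# "traffic" appears in every document, so its idf (and thus its score) is zero
```

Scores like these could then serve as node or edge weights when assembling a knowledge graph from a document, in the manner the claim recites.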
Regarding claim 18, Ma in view of Schlichtkrull, Safavi, and Kim teaches the computer program product of claim 15. Ma further teaches wherein the graph coarsening ML model utilizes algebraic multigrid (AMG). (Ma, pg. 3 col. 1, “The first ingredient coarsens a graph G into a smaller one Gc [wherein the graph coarsening ML model]. For a differentiable parameterization, an operator will need be defined that transforms the corresponding graph adjacency matrix A ∈ R n×n into Ac ∈ R m×m, where n and m are the number of nodes of G and Gc respectively, with m < n. We motivate the definition by algebraic multigrid [utilizes algebraic multigrid (AMG).] (Ruge & Stuben, 1987), because of the hierarchical connection and a graph-theoretic interpretation.”).
Regarding claim 19, Ma in view of Schlichtkrull, Safavi, and Kim teaches the computer program product of claim 15. Ma further teaches wherein training the KG compression ML model includes minimizing a Wasserstein distance. (Ma, pg. 5 col. 1, “If the two measures lie on the same metric space and if the infinitesimal mass transportation cost is a distance metric, then optimal transport is the same as the Wasserstein-1 distance. In our setting, we extend this framework for defining the distance of the original graph G and its coarsened version Gc. Then, the distance constitutes the coarsening loss from which model parameters are learned in an unsupervised manner; the Wasserstein distance being equivalent to the coarsening loss is interpreted as training to minimize the Wasserstein distance (i.e. wherein training the KG compression ML model includes minimizing a Wasserstein distance.).”).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS S WU whose telephone number is (571)270-0939. The examiner can normally be reached Monday - Friday 8:00 am - 4:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold can be reached on 571-431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/N.S.W./Examiner, Art Unit 2148 /MICHELLE T BECHTOLD/ Supervisory Patent Examiner, Art Unit 2148