Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The office action is responsive to the amendment filed on 01/20/2026. The status of the claims is as follows: claims 1-2, 12-13, and 20 have been amended; claims 9-11, 15, and 19 have been cancelled. Thus, claims 1-8, 12-14, 16-18, and 20 are pending for examination.
Response to Arguments
Regarding objection to the abstract:
Applicant’s arguments, see pg. 8, section I, filed 01/20/2026, with respect to the objection to the abstract have been fully considered and are persuasive. The objection to the abstract has been withdrawn.
Regarding objection to the claims:
Applicant’s arguments, see pg. 8-9, section II, filed 01/20/2026, with respect to the objection to claims 1 and 20 have been fully considered and are persuasive. The objection to claims 1 and 20 has been withdrawn.
Regarding the 35 U.S.C. § 112(b) Rejection:
Applicant’s arguments, see pg. 9, section III, filed 01/20/2026, have been fully considered but they are not persuasive.
APPLICANT ARGUMENTS:
“Applicant submits that the amendments to Claims 1, 2, 12, 13, and 20 obviate the objections. Furthermore, Claims 10-11, and 19 have been cancelled, rendering their rejections moot. Therefore, Applicant respectfully requests the rejections of Claims 1 and 2 be withdrawn”.
EXAMINER RESPONSE: Examiner respectfully disagrees; Applicant’s arguments are not persuasive.
While claims 10-11 and 19 have been cancelled, rendering their rejections moot, the amendments to claims 1, 2, 12, 13, and 20 do not obviate the 35 U.S.C. § 112(b) rejection. For example, amended claim 1 recites the limitation “....receiving a trained machine learning (ML) model, the trained ML model comprising: input graph data; and a graph neural network (GNN) having GNN neurons and associated node embeddings, wherein the node embeddings comprise N scalar values representing features the GNN identifies for each neuron, a neuron identity, an edge identity, and a weight of each edge; extracting, from the received GNN...” and similarly amended claim 12 recites “...extract, from a machine learning (ML) model: a count of a number of neurons in a penultimate layer of the ML model; node embeddings for each input graph node in graph neural network (GNN) neurons in the penultimate layer of the ML model, the node embeddings comprising N scalar values representing features the GNN identifies for each neuron, a neuron identity, an edge identity, and a weight of each edge; and scalar values from an output of the ML model...”; however, it is not clear what the difference is between the ML model and the GNN. Are the ML model and the GNN two distinct models from which different information is being extracted, or is the ML model itself a GNN? Paragraph [0065] of the instant application states “the ML model may correspond to, or be represented by a GNN, similar to the GNN 100 shown in FIG. 1”, thus suggesting the ML model and the GNN are the same. Therefore, for purposes of examination, the Examiner is viewing the ML model itself as being a GNN.
Further, even though the rejection of claims 2 and 13 for reciting “input graph” in line 2 is being withdrawn, claims 2 and 13 remain rejected under 35 U.S.C. § 112(b) as being dependent on claims 1 and 12, and thus are rejected for the reasons set forth in the rejection of claims 1 and 12.
Accordingly, claims 2-8, 13-14, and 16-18 are dependent on claims 1 and 12, and thus are rejected for the reasons set forth in the rejection of claims 1 and 12.
The rejection of claim 20 for reciting “the penultimate layer node count” in line 9 has been withdrawn.
Regarding the 35 U.S.C. § 101 Rejection:
Applicant’s arguments, see pg. 9, section IV, filed 01/20/2026, with respect to claims 1-8, 12-14, 16-18, and 20 have been fully considered and are persuasive. The rejection of claims 1-8, 12-14, 16-18, and 20 under 35 U.S.C. § 101 has been withdrawn.
Regarding the 35 U.S.C. § 103 Rejection:
Applicant’s arguments with respect to claims 1-8, 12-14, 16-18, and 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-8, 12-14, 16-18, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Amended independent claim 1 recites the limitation in lines 6-7 "wherein the node embeddings comprises ...a neuron identity, an edge identity..."; however, it is not clear what constitutes a “neuron identity” and an “edge identity”. Specifically, the specification does not provide a description of these terms. Under the broadest reasonable interpretation (BRI), a “neuron identity” and an “edge identity” can be viewed as values that identify the node and the edge, which can be just the embedding. Therefore, for purposes of examination, a weight or the N scalar values in each node of a machine learning model can be viewed as a “neuron identity” or an “edge identity”.
Furthermore, amended claim 1 recites the limitation “....receiving a trained machine learning (ML) model, the trained ML model comprising: input graph data; and a graph neural network (GNN) having GNN neurons and associated node embeddings, wherein the node embeddings comprise N scalar values representing features the GNN identifies for each neuron, a neuron identity, an edge identity, and a weight of each edge; extracting, from the received GNN...” and similarly amended claim 12 recites “...extract, from a machine learning (ML) model: a count of a number of neurons in a penultimate layer of the ML model; node embeddings for each input graph node in graph neural network (GNN) neurons in the penultimate layer of the ML model, the node embeddings comprising N scalar values representing features the GNN identifies for each neuron, a neuron identity, an edge identity, and a weight of each edge; and scalar values from an output of the ML model...”; however, it is not clear what the difference is between the ML model and the GNN. Are the ML model and the GNN two distinct models from which different information is being extracted, or is the ML model itself a GNN? The claim as presented recites “extracting from the received GNN”; however, the claim limitation in line 3 recites “receiving a trained machine learning (ML) model” and not a GNN. Also, paragraph [0065] of the instant application states “the ML model may correspond to, or be represented by a GNN, similar to the GNN 100 shown in FIG. 1”; thus the specification and the claim as presented suggest the ML model and the GNN are the same. Therefore, for purposes of examination, the Examiner is viewing the ML model itself as being a GNN.
Claim 12 recites the limitation “the penultimate layer neuron count” in line 21. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, this will be viewed as “the penultimate layer neurons count”.
Claim 20 recites the limitation “the penultimate layer neuron count” in line 9. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, this will be viewed as “the penultimate layer neurons count”.
Claims 2-8, 13-14, and 16-18 are dependent on claims 1 and 12, and thus are rejected for the reasons set forth in the rejection of claims 1 and 12.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 12-14, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Ruichi Yu et al., NISP: Pruning Networks using Neuron Importance Score Propagation (hereinafter Ruichi) in view of Muller et al., DT+GNN: A Fully Explainable Graph Neural Network using Decision Trees (hereinafter Muller), in further view of Afsar Ahamed Asaraf Ali, A study of effect of pruning in a fully connected layer of a CNN architecture used for classification (hereinafter Ali), in further view of Wong et al., US 2023/0118240 A1 (hereinafter Wong).
Regarding claim 1:
A computer-implemented method for reducing or selecting embedding dimensions in a machine learning model, the method comprising: (Ruichi pg. 9201, right col., para. 1, teaches using “GPU resources... to search the optimal hyper-parameters by trying different pruning ratio combinations on a validation set”, which inherently necessitates a computer, and Ruichi pg. 9196, left col., sec: 3.1. Feature Ranking on the Final Response Layer, para. 1, teaches applying feature ranking (filtering) on the penultimate layer, which one of ordinary skill in the art would recognize can be used to reduce embedding dimensions).
receiving a trained machine learning (ML) model, the trained ML model comprising: (Ruichi pg. 9195, left col., para. 2, teaches receiving a pre-trained model (machine learning (ML) model)).
a (Ruichi Fig. 2 teaches embeddings with weights of the neural network. Also, Ruichi pg. 9194, right col., last paragraph, which continues on pg. 9195, teaches the pre-trained model (i.e., neural network) with feature importance that has been identified in the final response layer based on response importance, which is treated as a feature. Thus, this suggests such features can be viewed as a “neuron identity, an edge identity, and a weight of each edge” as shown in Fig. 2).
receiving, as input: an importance threshold input for filtering the node embeddings; and (Ruichi Fig. 1, teaches receiving a pre-defined pruning ratio input and pg. 9198, sec: 3.3. Pruning Networks Using NISP, para. 1, teaches the pre-defined pruning ratio input can be used to “remove neurons with prune indicator value 0”).
determining, using the model, an importance metric of each of the node embedding dimensions from the penultimate layer neurons, the determining comprising: (Ruichi pg. 9194, Abstract, teaches using a model (Neuron Importance Score Propagation (NISP) algorithm) to “propagate the importance scores of final responses to every neuron in the network” and pg. 9196, left col., sec: 3 Our Approach, para. 1, teaches applying a feature ranking algorithm (NISP) on the final response layer (penultimate layer) of the network to obtain an importance score (importance metric) of each neuron).
summing node embedding dimension importance value outputs of the model; and ( Ruichi pg. 9198, left col. sec: 3.2.4 Algorithm, para. 1 teaches obtaining an importance score “of every neuron in the final response layer of the network” which is computed by weighted sum in equation 19).
filtering the summed node embedding dimension importance value outputs to produce highest importance node embedding dimensions from the penultimate layer neurons by applying the importance threshold input to the summed and normalized node embedding dimension importance value outputs; (Ruichi pg. 9196, left col., sec: 3.1. Feature Ranking on the Final Response Layer, para. 1, teaches applying feature ranking (filtering) on the penultimate layer and Ruichi pg. 9198, left col., sec: 3.2.4 Algorithm, para. 1, teaches obtaining the importance score of every neuron in the final response layer of the network. Further, Ruichi pg. 9198, left col., sec: 3.3. Pruning Networks Using NISP, para. 1, teaches “we propagate the importance scores, compute the prune indicator of neurons based on their importance scores and remove neurons with prune indicator value 0”).
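For illustration only, the summing, normalization, and threshold filtering described in this limitation can be sketched as follows; this is a minimal hypothetical sketch, and the function and variable names are the Examiner's illustration rather than drawn from Ruichi or the claims:

```python
import numpy as np

def filter_embedding_dimensions(importance_values, threshold):
    """Hypothetical sketch: sum per-sample importance values for each
    embedding dimension, normalize the sums into [0, 1], and keep only
    the dimensions whose normalized importance meets the threshold."""
    summed = importance_values.sum(axis=0)           # one sum per dimension
    normalized = summed / summed.max()               # scale into [0, 1]
    kept = np.flatnonzero(normalized >= threshold)   # highest-importance dimensions
    return kept, normalized

# Example: two samples, three embedding dimensions; dimension 1 dominates.
scores = np.array([[0.2, 0.9, 0.1],
                   [0.3, 0.8, 0.2]])
kept, norm = filter_embedding_dimensions(scores, threshold=0.5)  # kept -> [1]
```

In such a sketch, the threshold plays the role of the claimed importance threshold input applied to the summed and normalized outputs.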
restricting the penultimate layer of the GNN model to correspond to a number of the highest importance node embedding dimensions; and
[Image: media_image1.png, greyscale, 403 x 582]
(Ruichi pg. 9196, left col., sec: 3.2. Neuron Importance Score Propagation (NISP) teaches restricting/pruning intermediate neurons based on the importance scores of the penultimate layer (i.e., final response layer (FRL)) and Fig. 2 teaches the penultimate layer corresponding to “a number of the high importance node embedding dimensions”).
training the ML model using the restricted penultimate layer (Ruichi Abstract, teaches “The CNN is pruned by removing neurons with least importance, and it is then fine-tuned to recover its predictive power”; this suggests the ML model (i.e., CNN) is trained using the pruned (restricted) layers, including the penultimate layer (i.e., final response layer (FRL))).
Ruichi does not teach or suggest input graph data; and a graph neural network (GNN) having GNN neurons and associated node embeddings, wherein the node embeddings comprise N scalar values representing features the GNN identifies for each neuron, a neuron identity, an edge identity, and a weight of each edge; extracting, from the received GNN model: node embeddings for each input graph node in GNN neurons in penultimate layer nodes; and scalar values from an output of the GNN model; receiving, as input: identification of a tree-based model configured to return feature importance values; inputting, into the tree-based model, the node embeddings extracted from the neurons in the penultimate layer of the GNN model; determine, using a tree-based model, an importance metric of at least one embedding dimension of node embeddings from the penultimate layer neurons, comprising: processing the tree-based model; after training the tree-based model... normalizing the node embedding dimension importance value outputs of the tree-based model... ; extracting, from the received GNN model: a count of a number of neurons in a penultimate layer of the GNN model; restrict the penultimate layer node count of the GNN model to correspond to a number of the high importance node embedding dimensions.
Nevertheless, Muller teaches the following:
input graph data; and (Muller pg. 15, sec: B - Datasets, teaches input graph data datasets).
a graph neural network (GNN) having GNN neurons and associated node embeddings, wherein the node embeddings comprise N scalar values representing features the GNN identifies for each neuron, a neuron identity, an edge identity, and a weight of each edge; (Muller teaches a GNN having GNN nodes (neurons) and associated states (node embeddings) per node for the current layer, and teaches the GNN having weights (i.e., neuron identity & edge identity) connecting two vertices/nodes/neurons (see pg. 13, Appendix: Using the Tool, Fig. 7). In addition, Muller pg. 14, Fig. 8, teaches scalar values such as node-level importance scores for each node within the GNN layer).
extracting, from the received GNN model: node embeddings for each input graph node in GNN neurons in penultimate layer nodes; and (
[Image: media_image2.png, greyscale, 558 x 839]
Muller Fig. 2 in pg. 2, teaches extracting node embedding (states S) for each node in a Decision Tree Graph Neural Network (DT+GNN) layer. Further, pg. 13, Appendix: Using the Tool, Fig. 7 teaches a web tool where the DT+GNN can be seen; specifically, it is possible to extract from the GNN the states (embeddings) per node (neuron) for the current layer. This can include extracting the node embeddings for each input graph node in graph neural network (GNN) neurons in the penultimate layer of the ML model, since the tool allows the user to “switch between the layers” of the GNN, which can include the penultimate layer of the GNN).
scalar values from an output of the GNN model; (Muller pg. 13, Appendix: Using the Tool, Fig. 7 teaches a web tool where the DT+GNN can be seen; specifically, it is possible to extract from the GNN (ML model) the node-level importance scores (scalar values) for a specific clicked node from the output layer of the GNN).
receiving, as input: identification of a tree-based model configured to return feature importance values; (As would be familiar to one skilled in the art, a decision tree is a tree-based model. Muller pg. 5, sec: 3.4 Generating Explanations, para. 2, teaches deriving feature importance from a decision tree, thus indicating the ability of the decision tree (i.e., tree-based model) to return feature importance as a quantified measurement).
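For illustration only, the ability of a trained tree-based model to return per-dimension feature importance values can be sketched with scikit-learn; the synthetic data and all names below are hypothetical and are not drawn from Muller or the claims:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical node embeddings: 200 samples, 4 embedding dimensions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 4))
labels = (embeddings[:, 2] > 0).astype(int)   # class driven by dimension 2 only

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(embeddings, labels)

# After training, the tree exposes one importance value per embedding dimension.
importances = tree.feature_importances_
```

In such a sketch, the importance vector is the kind of quantified per-dimension feature importance a tree-based model returns after training.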
inputting, into the tree-based model, the node embeddings extracted from the neurons in the penultimate layer of the GNN model; (
[Image: media_image3.png, greyscale, 285 x 932]
Muller Fig. 3 teaches the “Decision tree” (i.e., tree-based model) receiving States S (embeddings) from the layer of a GNN, and pg. 4, sec: 3.2 Distilling the DT+GNN, para. 1, teaches model distillation being performed in order to enable the decision tree to predict an output from the input States S).
determine, using a tree-based model, an importance metric of at least one embedding dimension of node embeddings from the penultimate layer neurons, comprising: (Muller pg. 5, sec: 3.4 Generating Explanations, para. 1-2, teaches “DT+GNN to create node importances” such that an explanation “assigns each node a real-valued importance, how much it contributed to node v being in state t in layer l”).
processing the tree-based model; (Muller Fig. 3 teaches processing a Decision Tree, which a person skilled in the relevant art will recognize is a tree-based model. In addition, Muller pg. 2, 3rd bullet point, teaches pruning the decision tree without compromising accuracy, leading to a smaller tree, by performing “Lossless Pruning” or “Lossy Pruning” on both training and validation data to avoid overfitting and maintain generalizability (pg. 5, sec: 3.3 Postprocessing the DT+GNN, para. 1-4). Specifically, Muller teaches “Lossless Pruning” allows for the removal of nodes that can be replaced by leaves without affecting the model accuracy (see para. 2) and “Lossy Pruning” allows pruning nodes of the tree such that the tree becomes smaller even though it leads to accuracy deterioration (see para. 4). Additionally, Muller pg. 13, Appendix: Using the Tool, Fig. 7 teaches a “Prune Trees” slider that enables the user to apply the lossy pruning).
Muller is also in the same field of endeavor as Ruichi (machine learning). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of decision tree graph neural network (DT+GNN), as being disclosed and taught by Muller, in the system taught by Ruichi to yield the predictable results of “discover problems in existing explanation benchmarks and to find interesting insights into real-world datasets” (Muller pg. 2, 5th bullet point).
Ruichi and Muller do not teach or suggest extracting, from the received ML model: a count of a number of neurons in a penultimate layer of the ML model; restricting the penultimate layer neuron count of the ML model to correspond to a number of the highest importance node embedding dimensions; and after training the tree-based model... normalizing the node embedding dimension importance value outputs of the tree-based model; and filtering the ...normalized node embedding dimension importance value outputs to produce highest importance node embedding dimensions...
However, Ali teaches the following:
extracting, from the received GNN model: (Ali pg. 3, sec: 1.4 Model, teaches extracting information from the model).
a count of a number of neurons in a penultimate layer of the GNN model; ( Ali Abstract and Fig. 2, teaches extracting a count of a number of neurons, (i.e., “84 neurons”) in a penultimate layer (i.e., “sixth layer”) of the model).
restrict the penultimate layer neuron count of the GNN model to correspond to a number of the high importance node; and (Ali Abstract, teaches implementing pruning (restriction) techniques in “the penultimate fully connected layer” of a machine learning model. Further, Ali pg. 3-4, sec: 1.5 Pruning, para. 1-3, teaches pruning (restricting) the fully connected layer (penultimate layer) of the model in order to remove neurons based on redundancy and to spare only the most significant (high importance) neurons).
Ali is also in the same field of endeavor as Ruichi and Muller (Machine Learning). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of extracting information from a machine learning model, as being disclosed and taught by Ali, in the system taught by Ruichi and Muller to yield the predictable results of developing an understanding “about how pruning can affect performance of a chosen CNN architecture and to what extent it can be pruned so the performance of the pruned model is not significantly different comparing to the baseline in terms of performance” ( Ali, pg. 1, sec: 1.1 Motivation and goal).
Ruichi, Muller, and Ali do not teach after training the tree-based model... normalizing the node embedding dimension importance value outputs of the tree-based model; and filtering the ...normalized node embedding dimension importance value outputs to produce highest importance node embedding dimensions...
Nonetheless, Wong teaches the following:
after training the tree-based model... normalizing the importance value outputs of the tree-based model; (Wong teaches a machine learning model that comprises a decision tree (i.e., tree-based model), see [0055]. Further, [0056] teaches a “machine learning model platform” that implements a decision tree in order to output feature importance, and [0055] teaches the “machine learning model platform” is configured to provide at least a single scalar output (i.e., importance value outputs), which can be normalized within a pre-defined range (e.g., 0 and 1)).
filtering the ...normalized importance value outputs... by applying threshold... to the ...normalized importance value outputs (Wong [0057] teaches filtering (e.g., approve or decline) the scalar output (i.e., importance value outputs) by applying a threshold to the scalar output).
Wong is also in the same field of endeavor as Ruichi, Muller, and Ali (machine learning). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of normalizing the output of the tree based model, as being disclosed and taught by Wong, in the system taught by Ruichi, Muller, and Ali to yield the predictable results of “provide dynamically updating machine learning models despite large transaction flows and/or despite the need for segregation of different data sources” (see Wong [0026]).
Regarding claim 2:
Ruichi, Muller, Ali, and Wong teach The method of claim 1. Muller specifically teaches wherein the node embeddings are extractable from the GNN's neurons for each node in the input graph data ( Muller Fig. 7 teaches node embedding (states) being extracted from neurons/nodes of the GNN).
Regarding claim 3:
Ruichi, Muller, Ali, and Wong teach The method of claim 1. Ruichi specifically teaches wherein the importance threshold input comprises a range between 0 and 1 (Ruichi Fig. 1, teaches a pre-defined pruning ratio input and pg. 9198, sec: 3.3. Pruning Networks Using NISP, para. 1, teaches the pre-defined pruning ratio input can be used to “remove neurons with prune indicator value 0”).
Regarding claim 4:
Ruichi, Muller, Ali, and Wong teach The method of claim 1. Muller specifically teaches wherein the tree-based model is configured to return feature importance values after the tree-based model has been trained (Muller pg. 5, sec: 3.4 Generating Explanations, para. 2, teaches deriving feature importance from a decision tree, thus indicating the ability of the decision tree (tree-based model) to return feature importance as a quantified measurement after it has been trained).
Regarding claim 5:
Ruichi, Muller, Ali, and Wong teach The method of claim 1. Muller specifically teaches wherein the tree-based model is trained to map the node embeddings to one or more of node classes and target values (Muller pg. 5, sec: 3.3 Postprocessing the DT+GNN, para. 1 teaches a decision tree such as a MLP (tree-based model). Further, Muller pg. 4, para. 2, teaches how the MLP can be used for node classification by concatenating node states (embeddings) and pg. 5, sec: 3.4 Generating Explanations, para. 2, teaches “a DT+GNN layer l that maps from the input state space S to the target space T”; this suggests the Decision Tree (DT) is trained to map the node states (node embeddings) to target values (target space T)).
Regarding claim 6:
Ruichi, Muller, Ali, and Wong teach The method of claim 1. Ruichi specifically teaches wherein the importance threshold input for filtering the node embeddings is based on the determined importance metric (Ruichi Fig. 1, teaches a pre-defined pruning ratio input (determined importance metric) and pg. 9198, sec: 3.3. Pruning Networks Using NISP, para. 1, teaches the pre-defined pruning ratio input can be used to “remove neurons with prune indicator value 0”; such prune indicator value can be seen as the importance threshold input for filtering/pruning the node embeddings).
Regarding claim 12:
A system, comprising: a processor; and memory comprising instructions that when executed by the processor cause the processor to: (Ruichi pg. 9201, right col., para. 1, teaches using “GPU resources... to search the optimal hyper-parameters by trying different pruning ratio combinations on a validation set”, which inherently requires a system, a processor, and memory).
determine, using a model, an importance metric of at least one embedding dimension of node embeddings from the penultimate layer neurons, comprising: (Ruichi pg. 9194, Abstract, teaches using a model (Neuron Importance Score Propagation (NISP) algorithm) to “propagate the importance scores of final responses to every neuron in the network” and pg. 9196, left col., sec: 3 Our Approach, para. 1, teaches applying a feature ranking algorithm (NISP) on the final response layer (penultimate layer) of the network to obtain an importance score (importance metric) of each neuron).
summing node embedding dimension importance value outputs of the model; and ( Ruichi pg. 9198, left col. sec: 3.2.4 Algorithm, para. 1 teaches obtaining an importance score “of every neuron in the final response layer of the network” which is computed by weighted sum in equation 19).
filtering the summed node embedding dimension importance value outputs to produce high importance node embedding dimensions from penultimate layer nodes by applying an importance threshold input to the summed and normalized node embedding dimension importance value outputs; (Ruichi pg. 9196, left col., sec: 3.1. Feature Ranking on the Final Response Layer, para. 1, teaches applying feature ranking (filtering) on the penultimate layer and Ruichi pg. 9198, left col., sec: 3.2.4 Algorithm, para. 1, teaches obtaining the importance score of every neuron in the final response layer of the network. Further, Ruichi pg. 9198, left col., sec: 3.3. Pruning Networks Using NISP, para. 1, teaches “we propagate the importance scores, compute the prune indicator of neurons based on their importance scores and remove neurons with prune indicator value 0”).
restrict the penultimate layer of the ML model to correspond to a number of the high importance node embedding dimensions; and
[Image: media_image1.png, greyscale, 403 x 582]
(Ruichi pg. 9196, left col., sec: 3.2. Neuron Importance Score Propagation (NISP) teaches restricting/pruning intermediate neurons based on the importance scores of the penultimate layer (i.e., final response layer (FRL)) and Fig. 2 teaches the penultimate layer corresponding to “a number of the high importance node embedding dimensions”).
train the ML model using the restricted penultimate layer (Ruichi Abstract, teaches “The CNN is pruned by removing neurons with least importance, and it is then fine-tuned to recover its predictive power”; this suggests the ML model (i.e., CNN) is trained using the pruned (restricted) layers, including the penultimate layer (i.e., final response layer (FRL))).
Ruichi does not teach or suggest extracting a count of a number of neurons in a penultimate layer of the ML model and node embedding from the layer of a GNN, and scalar values from an output of the ML model. Further, Ruichi does not teach or suggest determining an importance metric using a tree-based model, processing a tree-based model, after training the tree-based model... normalizing the node embedding dimension importance value outputs of the tree-based model; and filtering the ...normalized node embedding dimension importance value outputs to produce highest importance node embedding dimensions...and restrict the penultimate layer node count of the ML model to correspond to a number of the high importance node embedding dimensions.
Nevertheless, Muller teaches the following:
extract, from a machine learning (ML) model: node embeddings for each input graph node in graph neural network (GNN) neurons in the penultimate layer of the ML model, the node embeddings comprising N scalar values representing features the GNN identifies for each neuron, a neuron identity, an edge identity, and a weight of each edge; and (
[Image: media_image4.png, greyscale, 677 x 1019]
Muller Fig. 2 in pg. 2, teaches extracting node embedding (states S) for each node in a Decision Tree Graph Neural Network (DT+GNN) layer. Further, pg. 13, Appendix: Using the Tool, Fig. 7 teaches a web tool where the DT+GNN can be seen; specifically, it is possible to extract from the GNN the states (embeddings) per node (neuron) for the current layer. This can include extracting the node embeddings for each input graph node in graph neural network (GNN) neurons in the penultimate layer of the ML model, since the tool allows the user to “switch between the layers” of the GNN, which can include the penultimate layer of the GNN. Furthermore, Muller teaches a GNN having GNN nodes (neurons) and associated states (node embeddings) per node for the current layer, and teaches the GNN having weights (i.e., neuron identity & edge identity) connecting two vertices/nodes/neurons (see pg. 13, Appendix: Using the Tool, Fig. 7). In addition, Muller pg. 14, Fig. 8, teaches scalar values such as node-level importance scores for each node within the GNN layer).
scalar values from an output of the ML model; (Muller pg. 13, Appendix: Using the Tool, Fig. 7 teaches a web tool where the DT+GNN can be seen; specifically, it is possible to extract from the GNN (ML model) the node-level importance scores (scalar values) for a specific clicked node from the output layer of the GNN).
determine, using a tree-based model, an importance metric of at least one embedding dimension of node embeddings from the penultimate layer neurons, comprising: (Muller pg. 5, sec: 3.4 Generating Explanations, para. 1-2, teaches “DT+GNN to create node importances” such that an explanation “assigns each node a real-valued importance, how much it contributed to node v being in state t in layer l”).
processing the tree-based model; (Muller Fig. 3 teaches processing a Decision Tree, which a person skilled in the relevant art will recognize is a tree-based model. In addition, Muller pg. 2, 3rd bullet point, teaches pruning the decision tree without compromising accuracy, leading to a smaller tree by performing “Lossless Pruning” or “Lossy Pruning” on both training and validation data to avoid overfitting and maintain generalizability (pg. 5, sec: 3.3 Postprocessing the DT+GNN, para. 1-4). Specifically, Muller teaches “Lossless Pruning” allows for the removal of nodes that can be replaced by leaves without affecting the model accuracy (see para. 2), and “Lossy Pruning” allows pruning nodes of the tree such that the tree becomes smaller even though it leads to accuracy deterioration (see para. 4). Additionally, Muller pg. 13, Appendix: Using the Tool, Fig. 7 teaches a “Prune Trees” slider that enables the user to apply the lossy pruning).
Muller is also in the same field of endeavor as Ruichi (machine learning). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of decision tree graph neural network (DT+GNN), as being disclosed and taught by Muller, in the system taught by Ruichi to yield the predictable results of “discover problems in existing explanation benchmarks and to find interesting insights into real-world datasets” (Muller pg. 2, 5th bullet point).
Muller does not suggest or teach extract, from a machine learning (ML) model: a count of a number of neurons in a penultimate layer of the ML model; and after training the tree-based model... normalizing the node embedding dimension importance value outputs of the tree-based model; and filtering the ...normalized node embedding dimension importance value outputs to produce highest importance node embedding dimensions... restrict the penultimate layer node count of the ML model to correspond to a number of the high importance node embedding dimensions; and
However, Ali teaches the following:
extract, from a machine learning (ML) model: (Ali pg. 3, sec: 1.4 Model, teaches extracting information from the model).
a count of a number of neurons in a penultimate layer of the ML model; (Ali Abstract and Fig. 2, teaches extracting a count of a number of neurons (i.e., “84 neurons”) in a penultimate layer (i.e., “sixth layer”) of the ML model).
restrict the penultimate layer neuron count of the ML model to correspond to a number of the high importance node embedding dimensions; and (Ali Abstract, teaches implementing pruning (restriction) techniques in “the penultimate fully connected layer” of a machine learning model. Further, Ali pg. 3-4, sec: 1.5 Pruning, para. 1-3, teaches pruning (restricting) the fully connected layer (penultimate layer) of the model in order to remove neurons based on redundancy and to spare only the most significant (high importance) neurons).
Ali is also in the same field of endeavor as Ruichi and Muller (Machine Learning - Pruning). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of extracting information from a machine learning model, as being disclosed and taught by Ali, in the system taught by Ruichi and Muller to yield the predictable results of developing an understanding “about how pruning can affect performance of a chosen CNN architecture and to what extent it can be pruned so the performance of the pruned model is not significantly different comparing to the baseline in terms of performance” ( Ali, pg. 1, sec: 1.1 Motivation and goal).
Neither Ruichi, Muller, nor Ali teaches after training the tree-based model... normalizing the node embedding dimension importance value outputs of the tree-based model; and filtering the ...normalized node embedding dimension importance value outputs to produce highest importance node embedding dimensions...
Nonetheless, Wong teaches the following:
after training the tree-based model... normalizing the importance value outputs of the tree-based model; (Wong teaches a machine learning model that comprises a decision tree (i.e., tree-based model), see [0055]. Further, [0056] teaches a “machine learning model platform” that implements a decision tree in order to output feature importance, and [0055] teaches the “machine learning model platform” is configured to provide at least a single scalar output (i.e., importance value outputs), which can be normalized within a pre-defined range (e.g., 0 and 1)).
filtering the ...normalized importance value outputs... by applying threshold... to the ...normalized importance value outputs (Wong [0057] teaches filtering (e.g., approve or decline) the scalar output (i.e., importance value outputs) by applying a threshold to the scalar output).
Wong is also in the same field of endeavor as Ruichi, Muller, and Ali (machine learning). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of normalizing the output of the tree based model, as being disclosed and taught by Wong, in the system taught by Ruichi, Muller, and Ali to yield the predictable results of “provide dynamically updating machine learning models despite large transaction flows and/or despite the need for segregation of different data sources” (see Wong [0026]).
Regarding claim 13: claim 13 is a system type claim comprising limitations similar to those of claim 2; therefore, it is rejected under the same rationale as claim 2.
Regarding claim 14: claim 14 is a system type claim comprising limitations similar to those of claim 3; therefore, it is rejected under the same rationale as claim 3.
Regarding claim 16: claim 16 is a system type claim comprising limitations similar to those of claim 5; therefore, it is rejected under the same rationale as claim 5.
Regarding claim 17: claim 17 is a system type claim comprising limitations similar to those of claim 6; therefore, it is rejected under the same rationale as claim 6.
Claims 7-8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Ruichi, Muller, Ali, Wong in further view of Reese et al. US 11,093,864 B1 (hereinafter Reese).
Regarding claim 7:
Ruichi, Muller, Ali and Wong teach The method of claim 1. None of Ruichi, Muller, Ali, or Wong specifically teaches receiving an integer value specifying repetitions for training the tree-based model.
Nevertheless, Reese teaches the following:
further comprising: receiving an integer value specifying repetitions for training the tree-based model (Reese col. 10:66-67 & col. 11:1, 10-13 teaches receiving hyperparameters such as an iteration value for training the tree model).
Reese is also in the same field of endeavor as Ruichi, Muller, Ali and Wong (machine learning). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of hyperparameters such as iteration values for training the tree-based model, as being disclosed and taught by Reese, in the system taught by Ruichi, Muller, Ali and Wong to yield the predictable results of improved model accuracy with significantly faster computing times (see Reese col. 27:11-13).
Regarding claim 8:
Ruichi, Muller, Ali, Wong and Reese teach The method of claim 7. Reese specifically teaches wherein processing the tree-based model comprises performing a number of runs as specified by the received integer value (Reese col. 10:64-67 and col. 11:1 teaches that the defined values (integer value) received for the hyperparameters are used to describe the training and validating process; therefore, this suggests such hyperparameters are used to perform the iterations (number of runs) as specified by the input).
Regarding claim 18:
Ruichi, Muller, Ali and Wong teach The system of claim 12. Ruichi, Muller, Ali and Wong do not disclose or teach wherein the system is further configured to process the tree-based model by performing a number of runs as specified by a received integer value specifying repetitions for training the tree-based model.
However, Reese teaches the following:
wherein the system is further configured to process the tree-based model by performing a number of runs as specified by a received integer value specifying repetitions for training the tree-based model (Reese col. 10:66-67 & col. 11:1, 10-13 teaches receiving hyperparameters such as an iteration value for training the tree model. Further, Reese col. 10:64-67 and col. 11:1 teaches that the defined values (integer value) received for the hyperparameters are used to describe the training and validating process; therefore, this suggests such hyperparameters are used to perform the iterations (number of runs) as specified by the input).
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Ruichi, Muller, Reese and Wong.
Regarding claim 20:
At least one non-transitory computer-readable medium comprising a set of instructions that, in response to being executed by a processor circuit, cause the processor circuit to: (Ruichi pg. 9201, right col., para. 1, teaches using “GPU resources... to search the optimal hyper-parameters by trying different pruning ratio combinations on a validation set,” which inherently requires a computer-readable medium comprising a set of instructions that are executed by a processor circuit).
determine, using a neural network, ...the determining comprising: (Ruichi pg. 9196, left col. sec: 3 Our Approach, para. 1, teaches applying a feature ranking algorithm on the final response layer (penultimate layer) of the network to obtain an importance score (importance metric) of each neuron).
summing node embedding dimension importance value outputs of the tree-based model, and filtering the summed node embedding dimension importance value outputs by applying an importance threshold input to produce high importance node embedding dimensions associated with the penultimate layer neurons; (Ruichi pg. 9198, left col. sec: 3.2.4 Algorithm, para. 1, teaches obtaining an importance score “of every neuron in the final response layer of the network,” which is computed by a weighted sum in equation 17. Further, Ruichi pg. 9196, left col., sec: 3.1. Feature Ranking on the Final Response Layer, para. 1, teaches applying feature ranking (filtering) on the penultimate layer. Further, Ruichi pg. 9198, left col. sec: 3.3. Pruning Networks using NISP, para. 1, teaches “we propagate the importance scores, compute the prune indicator of neurons based on their importance scores and remove neurons with prune indicator value 0”).
restrict the penultimate layer neuron count to correspond to the high importance node embedding dimensions; and (Ruichi pg. 9196, left col., sec: 3.2. Neuron Importance Score Propagation (NISP), teaches restricting/pruning intermediate neurons based on the importance scores of the penultimate layer (i.e., final response layer (FRL)), and Fig. 2 teaches the penultimate layer corresponding to “a number of the high importance node embedding dimensions”).
train a machine learning model using the restricted penultimate layer (Ruichi Abstract, teaches “The CNN is pruned by removing neurons with least importance, and it is then fine-tuned to recover its predictive power”; this suggests the ML model (i.e., CNN) is trained using the pruned (restricted) layers, including the penultimate layer (i.e., final response layer (FRL))).
Ruichi does not suggest determine an importance metric using a tree based model of a GNN and training a machine learning model associated with the GNN using... penultimate layer...the node embeddings comprising N scalar values representing features the GNN identifies for each neuron, a neuron identity, an edge identity, and a weight of each edge,...; processing the tree-based model by performing a number of runs as specified by a received integer value specifying repetitions for training the tree-based model; and after training the tree-based model... normalizing the node embedding dimension importance value outputs of the tree-based model, and filtering the ...normalized node embedding dimension importance value outputs to produce highest importance node embedding dimensions...
Nevertheless, Muller teaches the following:
determine, using a tree-based model, an importance metric of each dimension of node embeddings from penultimate layer neurons of a trained graph neural network (GNN), the node embeddings comprising N scalar values representing features the GNN identifies for each neuron, a neuron identity, an edge identity, and a weight of each edge, the determining comprising: (Muller pg. 5, sec: 3.4 Generating Explanations, para. 1-2, teaches “DT+GNN to create node importances” such that an explanation “assigns each node a real-valued importance, how much it contributed to node v being in state t in layer l”. Further, pg. 13, Appendix: Using the Tool, Fig. 7 teaches a web tool that can be used to determine the importance metric of each embedding (states) from the penultimate layer neurons of a trained graph neural network, as the tool enables the user to “switch” between the layers of the GNN, which can include the penultimate layer. Furthermore, Muller teaches a GNN having GNN nodes (neurons) and associated states (node embeddings) per node for the current layer, and teaches the GNN having weights (i.e., neuron identity & edge identity) connecting two vertices/nodes/neurons (see pg. 13, Appendix: Using the Tool, Fig. 7). In addition, Muller pg. 14, Fig. 8, teaches scalar values such as node-level importance scores for each node within the GNN layer).
processing the tree-based model... (Muller Fig. 3 teaches processing a Decision Tree, which a person skilled in the relevant art will recognize is a tree-based model. In addition, Muller pg. 2, 3rd bullet point, teaches pruning the decision tree without compromising accuracy, leading to a smaller tree by performing “Lossless Pruning” or “Lossy Pruning” on both training and validation data to avoid overfitting and maintain generalizability (pg. 5, sec: 3.3 Postprocessing the DT+GNN, para. 1-4). Specifically, Muller teaches “Lossless Pruning” allows for the removal of nodes that can be replaced by leaves without affecting the model accuracy (see para. 2), and “Lossy Pruning” allows pruning nodes of the tree such that the tree becomes smaller even though it leads to accuracy deterioration (see para. 4). Additionally, Muller pg. 13, Appendix: Using the Tool, Fig. 7 teaches a “Prune Trees” slider that enables the user to apply the lossy pruning).
train a machine learning model associated with the GNN using penultimate layer (Muller pg. 7, para. 1, teaches the GNN being trained for 1500 epochs. A person skilled in the relevant art will recognize that the penultimate layer of a GNN is inherently used during training).
Muller is also in the same field of endeavor as Ruichi (machine learning). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of decision tree graph neural network (DT+GNN), as being disclosed and taught by Muller, in the system taught by Ruichi to yield the predictable results of “discover problems in existing explanation benchmarks and to find interesting insights into real-world datasets” (Muller pg. 2, 5th bullet point).
While Muller teaches processing a tree-based model, neither Ruichi nor Muller explicitly teaches processing the tree-based model by performing a number of runs as specified by a received integer value specifying repetitions for training the tree-based model; and after training the tree-based model... normalizing the node embedding dimension importance value outputs of the tree-based model, and filtering the ...normalized node embedding dimension importance value outputs to produce highest importance node embedding dimensions...
Nonetheless, Reese teaches the following:
processing the tree-based model by performing a number of runs as specified by a received integer value specifying repetitions for training the tree-based model; (Reese col. 10:66-67 & col. 11:1, 10-13 teaches receiving hyperparameters such as an iteration value for training the tree model. Further, Reese col. 10:64-67 and col. 11:1 teaches that the defined values (integer value) received for the hyperparameters are used to describe the training and validating process; therefore, this suggests such hyperparameters are used to perform the iterations (number of runs) as specified by the input).
Reese is also in the same field of endeavor as Ruichi and Muller (machine learning). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of processing a tree-based model, as being disclosed and taught by Reese, in the system taught by Ruichi and Muller to yield the predictable results of improved model accuracy with significantly faster computing times (Reese col. 27:11-13).
Neither Ruichi, Muller, nor Reese teaches or suggests after training the tree-based model... normalizing the node embedding dimension importance value outputs of the tree-based model, and filtering the ...normalized node embedding dimension importance value outputs to produce highest importance node embedding dimensions...
However, Wong teaches the following:
after training the tree-based model... normalizing the importance value outputs of the tree-based model and filtering the ...normalized importance value outputs... by applying threshold... to the ...normalized importance value outputs (Wong teaches a machine learning model that comprises a decision tree (i.e., tree-based model), see [0055]. Further, [0056] teaches a “machine learning model platform” that implements a decision tree in order to output feature importance, and [0055] teaches the “machine learning model platform” is configured to provide at least a single scalar output (i.e., importance value outputs), which can be normalized within a pre-defined range (e.g., 0 and 1). Furthermore, Wong [0057] teaches filtering (e.g., approve or decline) the scalar output (i.e., importance value outputs) by applying a threshold to the scalar output).
Wong is also in the same field of endeavor as Ruichi, Muller, and Reese (machine learning). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the functionality of normalizing the output of the tree based model, as being disclosed and taught by Wong, in the system taught by Ruichi, Muller, and Reese to yield the predictable results of “provide dynamically updating machine learning models despite large transaction flows and/or despite the need for segregation of different data sources” (Wong [0026]).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GISEL G FACCENDA whose telephone number is (703)756-1919. The examiner can normally be reached Monday - Friday 8:00 am - 4:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar can be reached at (571) 270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/G.G.F./Examiner, Art Unit 2127
/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127