Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This Action is responsive to the Request for Continued Examination and the Amendments and Remarks filed on 2/26/2026. Claims 1-2, 4-12, and 14-20 are pending. Claims 1, 11, and 19 are written in independent form. Claims 3 and 13 have been cancelled.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-2, 5-12, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (U.S. Pre-Grant Publication No. 2022/0343329) and further in view of Widmann et al. (U.S. Pre-Grant Publication No. 2019/0378051, hereinafter referred to as Widmann).
Regarding Claim 1:
Wang teaches a computer-implemented method, comprising:
Accessing, by a server system, an entity-related dataset from a database associated with the server system, the entity-related dataset comprising information related to a plurality of entities;
Wang teaches a preprocessing module that "receives training set 132 of transactions (from separation module 130) and performs one or more preprocessing techniques on transactions in the training set in order to generate a matrix 232 of feature vectors from the transactions' attributes" (Para. [0032]), where "transactions include various different attributes, such as a transaction location, currency, one or more entities involved in the transaction, internet protocol (IP) addresses of devices involved in the transaction, etc." (Para. [0020]).
Generating, by the server system, a set of entity-specific features corresponding to each of the plurality of entities based, at least in part, on the entity-related dataset, the set of entity-specific features comprising a subset of numerical features and a subset of high-cardinality categorical features;
Wang teaches “numerical attributes (e.g., transaction amount, a number of days a user account has been registered, account balance, etc.) included in transactions” and “categorical attributes (e.g., funding source, customer region, consumer segmentation, merchant segmentation, currency code, etc.) of transactions” (Para. [0032]) and generating a transaction network graph 122 that “includes a plurality of edges connecting one or more pairs of nodes included in the plurality of nodes, where the edges represent electronic transactions between the multiple different entities. For example, the nodes are vertices representing consumers, merchants, etc. while the edges are transactions that occurred between the consumers, merchants, etc.” (Para. [0053]), thereby teaching including the features being organized specific to each of the entities (nodes) but still including numerical and categorical attributes.
Generating, by the server system, corresponding labels comprising the subset of high-cardinality categorical features by shifting out the subset of high-cardinality categorical features from the set of entity-specific features;
Wang teaches “preprocessing module 230 may perform a first data transformation routine on numerical attributes (e.g., transaction amount, a number of days a user account has been registered, account balance, etc.) included in transactions. Preprocessing module 230 may perform a second data transformation routine on categorical attributes (e.g., funding source, customer region, consumer segmentation, merchant segmentation, currency code, etc.) of transactions included in training set 132” (Para. [0032]), thereby teaching shifting out/separating the numerical attributes and the categorical attributes. Wang further teaches the categorical attributes as high-cardinality by teaching “categorical attributes (e.g., funding source, customer region, consumer segmentation, merchant segmentation, currency code, etc.)” (Para. [0032]). It is noted that Fig. 4 of the Present Specification shows that all categorical features are determined to be high-cardinality, and the scope of the claims does not indicate how high-cardinality categorical features are determined or how they differ from any other categorical features.
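By way of illustration only (this sketch is not drawn from Wang or the present application; the record fields, values, and column names are invented), the claimed "shifting out" of a categorical subset to serve as labels, while the numerical subset remains as features, can be sketched as:

```python
import numpy as np

# Hypothetical transaction records; attribute names are assumptions for
# this sketch only.
records = [
    {"amount": 120.5, "days_registered": 310, "currency_code": "USD"},
    {"amount": 89.0,  "days_registered": 45,  "currency_code": "EUR"},
]

numerical_keys = ["amount", "days_registered"]
categorical_keys = ["currency_code"]  # treated here as high-cardinality

# The numerical subset stays behind as the feature matrix ...
features = np.array([[r[k] for k in numerical_keys] for r in records])
# ... while the categorical subset is shifted out to serve as labels.
labels = [tuple(r[k] for k in categorical_keys) for r in records]
```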
Generating a first set of entity-specific embeddings corresponding to each of the plurality of entities based, at least in part, on the corresponding subset of numerical features;
Wang teaches generating “a first embedded matrix 142” which is the output of Matrix Module 250 in TopkPPR Module 210. (Para. [0024] & Fig. 2A). Wang further teaches “TopkPPR module 210 receives matrix 232 of feature vectors from preprocessing module 230 and training set 132 of transactions (from separation module 130)” (Para. [0033]), thereby teaching generating the first set of embeddings corresponding to each of the plurality of entities based, at least in part, on the features (including at least the numerical attribute features taught in Para. [0032]) and the known labels.
Generating a second set of entity-specific embeddings corresponding to each of the plurality of entities based on the corresponding subset of categorical features; and
Wang teaches generating “the second embedded matrix 144” which is the output of Aggregation Module 280 in Neighborhood Module 220 (Para. [0025] & Fig. 2A). Wang further teaches “Aggregation module 280 aggregates attributes of transactions at different nodes based on their assigned anomaly scores 262.” (Para. [0039]) thereby teaching generating the second set of embeddings corresponding to each of the plurality of entities based on the features (including at least the categorical attribute features taught in Para. [0032]).
Wherein generating the second set of entity-specific embeddings, further comprises:
Generating, by the server system, a set of intermediate entity-specific embeddings corresponding to each of the plurality of entities based, at least in part, on the corresponding subset of categorical features; and
Wang teaches “Anomaly module 260 determines anomaly scores 262 for nodes in the transaction network graph 122 at which transactions in the training set 132 occur. For example, anomaly module 260 aggregates attributes for all transactions associated with a particular node on that node and repeats this process for each node in the transaction network graph that is associated with transactions in the training set 132. Then, anomaly module 260 determines, using a histogram-based outlier score (HBOS), numerical values (i.e., anomaly scores 262) for each attribute. These numerical values are stored the proper node based on the aggregated attributes of these nodes.” (Para. [0037]). Therefore, Wang teaches creating intermediate entity-specific embeddings based at least in part on the features (including at least the categorical attribute features taught in Para. [0032]).
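For illustration of the histogram-based outlier scoring that Wang's Para. [0037] names, a minimal generic HBOS-style sketch for one numerical attribute follows. This is an assumed simplification, not Wang's anomaly module 260; the bin count, epsilon, and sample values are invented.

```python
import numpy as np

def hbos_scores(values, bins=5):
    # Build a histogram over the attribute values and score each value by
    # the negative log of its bin's relative density: rarer bins score higher.
    counts, edges = np.histogram(values, bins=bins)
    density = counts / counts.sum()
    # Map each value back to its bin index (epsilon avoids log(0)).
    idx = np.clip(np.digitize(values, edges[1:-1]), 0, bins - 1)
    return -np.log(density[idx] + 1e-9)

values = np.array([10.0, 11.0, 10.5, 10.2, 95.0])  # one clear outlier
scores = hbos_scores(values)
```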
Generating, by the server system, a set of optimal embeddings corresponding to each of the plurality of entities based, at least in part, on concatenating the corresponding first set of entity-specific embeddings and the corresponding second set of entity-specific embeddings.
Wang teaches “Combination module 160 receives first matrix 142 and second matrix 144 from embedding module 140 and generates a final embedded matrix 162. Combination module 160 generates final matrix 162 by concatenating the first matrix 142 with the second matrix 144.” (Para. [0027]).
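The matrix concatenation described in Wang's Para. [0027] can be sketched generically as follows. The embedding dimensions and random contents are assumptions for illustration; this does not reproduce Wang's combination module 160.

```python
import numpy as np

n_entities = 4
first_embeddings = np.random.rand(n_entities, 8)   # e.g., numerical-based
second_embeddings = np.random.rand(n_entities, 6)  # e.g., categorical-based

# Row-wise concatenation: each entity ends up with one combined
# embedding vector drawn from both matrices.
final_embeddings = np.concatenate(
    [first_embeddings, second_embeddings], axis=1)
```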
Wang explicitly teaches all of the elements of the claimed invention as recited above except:
Implementing, by the server system, a loss function based on the corresponding labels generated based on the subset of categorical features with high-cardinality;
Generating, by the server system via a first machine learning model, embeddings based at least in part on the loss function;
Generating, by the server system via a second machine learning model, embeddings based at least in part on the corresponding first set of entity-specific embeddings (output of the first machine learning model); and
Determining, by the server system via a self-learning attention process, the second set of entity-specific embeddings corresponding to each of the plurality of entities by updating the corresponding set of intermediate entity-specific embeddings based, at least in part, on the corresponding first set of entity-specific embeddings and the subset of categorical features;
However, in the related field of endeavor of machine learning systems coupled to a graph structure, Widmann teaches:
Implementing, by the server system, a loss function based on the corresponding labels generated based on the subset of categorical features with high-cardinality;
Widmann teaches machine learning models as outputting embeddings by teaching “the system may use machine learning to determine an output. The output may include anomaly scores, heat scores/values, confidence values, and/or classification output. The system may use any machine learning model including xgboosted decision trees, auto-encoders, perceptron, decision trees, support vector machines, regression, and/or a neural network. The neural network may be any type of neural network including a feed forward network, radial basis network, recurrent neural network, long/short term memory, gated recurrent unit, auto encoder, variational autoencoder, convolutional network, residual network, Kohonen network, and/or other type. In one example, the output data in the machine learning system may be represented as multi-dimensional arrays, an extension of two-dimensional tables (such as matrices) to data with higher dimensionality.” (Para. [0056]) where “the neural network may include a loss function. A loss function may, in some examples, measure a number of missed positives; alternatively, it may also measure a number of false positives. The loss function may be used to determine error when comparing an output value and a target value. For example, when training the neural network the output of the output layer may be used as a prediction and may be compared with a target value of a training instance to determine an error. The error may be used to update weights in each layer of the neural network.” (Para. [0057]). Therefore, Widmann teaches implementing a loss function based on labels from input features when generating embeddings using a neural network.
Generating, by the server system via a first machine learning model, embeddings based at least in part on the loss function;
Widmann teaches using at least first and second machine learning models to perform operations on entity-specific features by teaching “Each entities' associated machine learning model may have different output. For example, in a transaction involving a merchant and a credit card, a machine learning model associated with the merchant may output a low fraud score, whereas a second machine learning model associated with the credit card may output a high fraud score. In some examples, the anomaly score may share a purpose with or be the same as the confidence score described herein.” (Para. [0078]).
Widmann further teaches the machine learning models as outputting embeddings by teaching “the system may use machine learning to determine an output. The output may include anomaly scores, heat scores/values, confidence values, and/or classification output. The system may use any machine learning model including xgboosted decision trees, auto-encoders, perceptron, decision trees, support vector machines, regression, and/or a neural network. The neural network may be any type of neural network including a feed forward network, radial basis network, recurrent neural network, long/short term memory, gated recurrent unit, auto encoder, variational autoencoder, convolutional network, residual network, Kohonen network, and/or other type. In one example, the output data in the machine learning system may be represented as multi-dimensional arrays, an extension of two-dimensional tables (such as matrices) to data with higher dimensionality.” (Para. [0056]) where “the neural network may include a loss function. A loss function may, in some examples, measure a number of missed positives; alternatively, it may also measure a number of false positives. The loss function may be used to determine error when comparing an output value and a target value. For example, when training the neural network the output of the output layer may be used as a prediction and may be compared with a target value of a training instance to determine an error. The error may be used to update weights in each layer of the neural network.” (Para. [0057]).
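The loss-function training loop Widmann's Para. [0057] describes, in which an output is compared with a target value to determine an error that updates weights, can be sketched generically as below. The linear model, squared-error loss, learning rate, and data are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))            # input features
y = X @ np.array([1.0, -2.0, 0.5])      # target values for each instance
w = np.zeros(3)                         # model weights, initially zero

for _ in range(500):
    pred = X @ w                        # output used as a prediction
    error = pred - y                    # compare output vs. target value
    w -= 0.1 * (X.T @ error) / len(X)   # use the error to update weights

final_loss = float(np.mean((X @ w - y) ** 2))
```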
Generating, by the server system via a second machine learning model, embeddings based at least in part on the corresponding first set of entity-specific embeddings (output of the first machine learning model).
Widmann teaches using at least first and second machine learning models to perform operations on entity-specific features by teaching “Each entities' associated machine learning model may have different output. For example, in a transaction involving a merchant and a credit card, a machine learning model associated with the merchant may output a low fraud score, whereas a second machine learning model associated with the credit card may output a high fraud score. In some examples, the anomaly score may share a purpose with or be the same as the confidence score described herein.” (Para. [0078]).
Widmann further teaches the machine learning models as outputting embeddings by teaching “the system may use machine learning to determine an output. The output may include anomaly scores, heat scores/values, confidence values, and/or classification output. The system may use any machine learning model including xgboosted decision trees, auto-encoders, perceptron, decision trees, support vector machines, regression, and/or a neural network. The neural network may be any type of neural network including a feed forward network, radial basis network, recurrent neural network, long/short term memory, gated recurrent unit, auto encoder, variational autoencoder, convolutional network, residual network, Kohonen network, and/or other type. In one example, the output data in the machine learning system may be represented as multi-dimensional arrays, an extension of two-dimensional tables (such as matrices) to data with higher dimensionality.” (Para. [0056]).
Widmann also teaches using the output of the first machine learning model as input to the second machine learning model by teaching “Multiple machine learning models may be used together to make decisions. The output of one machine learning model may be used as the input of another machine learning model” (Para. [0161]).
Determining, by the server system via a self-learning attention process, the second set of entity-specific embeddings corresponding to each of the plurality of entities by updating the corresponding set of intermediate entity-specific embeddings based, at least in part, on the corresponding first set of entity-specific embeddings and the subset of categorical features;
Wang teaches “Aggregation module 280 aggregates attributes of transactions at different nodes based on their assigned anomaly scores 262. For example, aggregation module 280 combines, for a given node assigned an anomaly label (e.g., a node label of 1), attributes of transactions occurring at nodes within a neighborhood of nodes of the given node and attributes of transactions occurring at the given node. Further, aggregation module 280 combines, for a given node assigned an approved label (e.g., a node label of 0), attributes of transaction occurring at the approved labeled node.” (Para. [0039]) where the second matrix 144 is the output of Aggregation module 280 in Neighborhood Module 220 (Fig. 2A).
Widmann teaches using the output of the first machine learning model as input to the second machine learning model by teaching “Multiple machine learning models may be used together to make decisions. The output of one machine learning model may be used as the input of another machine learning model” (Para. [0161]), thereby teaching the first set of entity-specific embeddings from the TopkPPR Module 210 (Fig. 2A of Wang) as input to the Neighborhood Module 220 (Fig. 2A of Wang) that affects the output embeddings of Neighborhood Module 220.
Widmann further teaches using a self-learning attention process by teaching “Machine learning tasks are sometimes broadly categorized as either unsupervised learning or supervised learning. In unsupervised learning, a machine learning algorithm is left to generate any output (e.g., to label as desired) without feedback. The machine learning algorithm may teach itself (e.g., observe past output), but otherwise operates without (or mostly without) feedback from, for example, a human administrator. An embodiment involving unsupervised machine learning is described herein.” (Para. [0050]) and “self-learning may be used in a so-called unsupervised machine learning model, wherein the machine learning model reconfigures nodes and/or edges without external feedback.” (Para. [0100]).
Thus, it would have been obvious to one of ordinary skill in the art, having the teachings of Widmann and Wang before the effective filing date of the claimed invention, to have combined the use of supervised machine learning models connected with a graph module, as taught by Widmann, with the systems and methods for generating a concatenated final matrix for training a machine learning model to detect anomalous transactions, as taught by Wang.
One would have been motivated to make such a combination because Widmann teaches “A supervised machine learning model may provide an indication to the graph module that output from the machine learning model was correct and/or incorrect. In response to that indication, the graph module may modify one or more nodes and/or edges to improve output.” and “modifications to the nodes and/or edges may be based on a prediction, by the machine learning model and/or the graph module, of a change that may result an improvement” (Para. [0136]), and it would have been obvious to a person having ordinary skill in the art that using the machine learning models, taught by Widmann, in combination with the graph module, taught by Wang, would improve at least the graph module 120 taught by Wang.
Regarding Claim 2:
Widmann and Wang further teach:
Wherein generating the first set of entity-specific embeddings, further comprises:
Extracting, by the server system, a set of important features from the subset of numerical features based, at least in part, on a task and a set of pre-defined rules;
Widmann teaches “Each of the nodes [of the artificial network in Fig. 1] may be connected to one or more other nodes. The connections may connect the output of a node to the input of another node. A connection may be correlated with a weighting value. For example, one connection may be weighted as more important or significant than another, thereby influencing the degree of further processing as input traverses across the artificial neural network.” (Para. [0061]), thereby teaching extracting only important features, “for influencing the degree of further processing as input traverses across the artificial neural network”, based on a task and pre-defined rules related to the weighting value and the connections of nodes in the artificial neural network.
Computing, by the server system, a new set of weights for each of the plurality of entities based, at least in part, on adjusting weights associated with the set of important features; and
Wang teaches “weighting module 240 generates multiple weight values 242 for each node” where each node represents an entity (Para. [0033]). Therefore, Wang teaches adjusting original/default weights for the features with the generated weight values.
Generating, by the server system via the first machine learning model, the first set of entity-specific embeddings based, at least in part, on the new set of weights calculated for each of the plurality of entities and the corresponding label.
Wang teaches “after generating weight matrix 252 in the illustrated embodiment matrix module 250 multiplies the weight matrix 252 by the feature vector matrix 232 to emphasize (or deemphasize) attributes of certain transactions in the training set 132” (Para. [0035]) where the first matrix 142 is output from the Matrix Module 250 (Fig. 2A).
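The weighting operation Wang's Para. [0035] describes, multiplying a weight matrix by a feature-vector matrix to emphasize or deemphasize certain attributes, can be sketched as below. The element-wise form and all values are assumptions for illustration, not Wang's matrix module 250.

```python
import numpy as np

feature_vectors = np.array([[2.0, 1.0, 4.0],
                            [3.0, 5.0, 1.0]])  # one row per transaction
weights = np.array([[1.5, 0.5, 1.0],
                    [1.0, 2.0, 0.1]])          # generated weight values

# Element-wise multiplication scales each attribute up (emphasize)
# or down (deemphasize) before embedding.
weighted = weights * feature_vectors
```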
Regarding Claim 5:
Widmann and Wang further teach:
Generating, by the server system, an entity-specific graph based, at least in part, on the set of optimal embeddings corresponding to each of the plurality of entities.
Widmann teaches “system including: a graph module configured to store and update a graph including nodes and edges, where each node represents an entity, where each entity is associated with one or more classifications, and where each edge represents a relationship between two entities; one or more machine learning engines configured to perform a respective decision-making process, where each of the one or more machine learning engines is associated with at least one of the one or more classifications” (Para. [0032]). Widmann further teaches “graph module may train the initial node configurations, in effect allowing the computational graph to dynamically reconfigure to effectuate learning. This may be effectuated by input and output associated with one or more machine learning models. In one example, based on data processed by the nodes and edges with respect to a machine learning module, the graph module may reconfigure one or more nodes and/or the edges.” (Para. [0098]) thereby teaching using the outputs of the machine learning models/modules to reconfigure, and thus generate a new, entity-specific graph based on the set of optimal embeddings output from machine learning models/modules taught by Wang and Widmann.
Regarding Claim 6:
Widmann and Wang further teach:
Determining, by the server system, a similarity range for an entity from the plurality of entities based, at least in part, on a task; and
Widmann teaches “an unsupervised machine learning model may analyze a graph representation using definitional functions” where “a definitional function may be, for example, descriptive (e.g., describing a characteristic of a device), quantitative (e.g., describing numerically the degree of similarity of two devices)” (Para. [0148]).
Generating, by the server system, a peer set for the entity based, at least in part, on the entity-specific graph, and the similarity range, the peer set indicating a set of entities from the entity-specific graph within the similarity range from the entity.
Widmann teaches “the UMLE may use a clustering technique to cluster the unsupervised feature vectors to determine whether any of the entities are exhibiting unusual behavior… The clustering algorithm may use a distance metric such as Euclidean distance, Manhattan distance, cosine distance, etc. to determine distances between unsupervised feature vectors when clustering.” (Para. [0140]), thereby teaching generating a peer set/cluster for the entity based, at least in part, on the graph features used by the unsupervised machine learning engine (UMLE) and the degree of similarity, indicating whether similar entities are clustered together using a distance metric.
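The claimed peer-set construction, grouping entities whose embeddings fall within a similarity range of a target entity using a distance metric such as the Euclidean distance Widmann's Para. [0140] mentions, can be sketched as follows. The entity names, embeddings, and threshold are invented for this illustration.

```python
import numpy as np

embeddings = {
    "merchant_a": np.array([0.0, 0.0]),
    "merchant_b": np.array([0.1, 0.2]),
    "merchant_c": np.array([5.0, 5.0]),
}

def peer_set(target, similarity_range):
    # Entities whose embedding lies within the similarity range of the
    # target entity's embedding form its peer set.
    anchor = embeddings[target]
    return {name for name, vec in embeddings.items()
            if name != target
            and np.linalg.norm(vec - anchor) <= similarity_range}

peers = peer_set("merchant_a", similarity_range=1.0)
```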
Regarding Claim 7:
Widmann and Wang further teach:
Wherein the first machine learning model and the second machine learning model are graph neural network (GNN) based machine learning models.
Wang teaches “various types of machine learning models including neural networks (e.g., graph convolutional network (GCN)), isolation forests, logistic regression, decision trees (e.g., XGBoost), etc.” (Para. [0029]).
Regarding Claim 8:
Widmann and Wang further teach:
Wherein the entity-related dataset is a historical transaction dataset,
Widmann teaches “where the historical data store includes historical transaction data corresponding to the plurality of nodes” (Para. [0009]).
the historical transaction dataset comprising information related to plurality of historical payment transactions performed between a plurality of cardholders and a plurality of merchants.
Widmann teaches “One or more financial transactions, such as a sender sending money to a receiver, may be understood in a graph form, such as that depicted in FIG. 2a. Though FIG. 2a depicts a single financial transaction, multiple transactions may be associated in a single graph. A graph representation may depict various entities, such as individuals, financial accounts and/or tools, computing devices, and the like, and the relationships between those entities. An individual, such as sender 201, may have a debit card 202 and a personal computer 203. The debit card may be issued by a financial institution 204. A transaction, such as transaction 205, may be attempted wherein the sender 201 purports to use the debit card 202 to transmit, via a website accessed using the personal computer 203, money to a checking account 206 associated with receiver 207. The transaction may be processed by a merchant 208. Furthermore, data available may suggest that the receiver 207 is associated with smartphone 209 and credit card 210.” (Para. [0103]) thereby teaching the transaction dataset comprising information related to historical payments between cardholders, such as debit card or credit card holders, and merchants.
Regarding Claim 9:
Widmann and Wang further teach:
Wherein the plurality of entities is one of a plurality of merchants, a plurality of cardholders, a plurality of issuers, and a plurality of acquirers.
Wang teaches “Within the transaction network graph 122, each node includes an entity identifier (ID) for the entity associated with that node. As one specific example, a consumer ID or merchant ID is assigned to each of two nodes based on the consumer or merchant involved in a given transaction between the two nodes.” (Para. [0022]) and “nodes (e.g., user, consumers, merchants, etc.) in transaction network graph 122” (Para. [0036]) thereby teaching a plurality of merchants as entities.
Regarding Claim 10:
Widmann and Wang further teach:
Wherein the server system is a payment server associated with a payment network.
Widmann teaches “the transaction data may comprise merchant data (e.g., name, identifier, merchant type, Boolean value), location data (e.g., IP address, ISP, MAC address, device identifier, UUID), amount data (e.g., monetary amount, currency type, tender type (e.g., credit card, mobile payment, debit card, online payment merchant, cryptocurrency)), and/or other characteristics.” (Para. [0108]) and “the data may correspond to one or more transactions, such as log-in transactions, computer network transactions, financial transactions, and/or the like.” (Para. [0070]), thereby teaching a payment server associated with a payment network.
Wang also teaches “This mapped information can then be used to train a transaction classifier to identify such patterns prior to processing transactions.” (Para. [0018]).
Regarding Claim 11:
Some of the limitations herein are similar to some or all of the limitations of Claim 2.
Widmann and Wang further teach a server system, comprising:
A memory configured to store instructions (Wang – Para. [0073]);
A communication interface (Wang – “input/output (I/O) interface” – Para. [0069]); and
A processor in communication with the memory and the communication interface, the processor configured to execute the instructions stored in the memory and thereby cause the server system to perform steps (Wang – Para. [0073]).
Regarding Claim 12:
All of the limitations herein are similar to some or all of the limitations of Claim 2.
Regarding Claim 15:
All of the limitations herein are similar to some or all of the limitations of Claims 5 and 6.
Regarding Claim 16:
All of the limitations herein are similar to some or all of the limitations of Claim 7.
Regarding Claim 17:
All of the limitations herein are similar to some or all of the limitations of Claim 8.
Regarding Claim 18:
All of the limitations herein are similar to some or all of the limitations of Claim 9.
Regarding Claim 19:
Some of the limitations herein are similar to some or all of the limitations of Claim 1.
Widmann and Wang further teach:
A non-transitory computer-readable storage medium comprising computer-executable instructions that, when executed by at least a processor of a server system, cause the server system to perform a method (Wang – Para. [0073]).
Regarding Claim 20:
All of the limitations herein are similar to some or all of the limitations of Claim 2.
Claim(s) 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Widmann and Wang and further in view of Nouri et al. (U.S. Pre-Grant Publication No. 2022/0114456, hereinafter referred to as Nouri).
Regarding Claim 4:
Widmann and Wang further teach:
Generating, by the server system, an entity graph based, at least in part, on the entity-related dataset and the set of entity-specific features for each of the plurality of entities, the homogenous entity graph comprising a plurality of nodes, each of the plurality of nodes corresponding to each of the plurality of entities, wherein each of the plurality of nodes comprises the corresponding set of entity-specific features.
Wang teaches “numerical attributes (e.g., transaction amount, a number of days a user account has been registered, account balance, etc.) included in transactions” and “categorical attributes (e.g., funding source, customer region, consumer segmentation, merchant segmentation, currency code, etc.) of transactions” (Para. [0032]) and generating a transaction network graph 122 that “includes a plurality of edges connecting one or more pairs of nodes included in the plurality of nodes, where the edges represent electronic transactions between the multiple different entities. For example, the nodes are vertices representing consumers, merchants, etc. while the edges are transactions that occurred between the consumers, merchants, etc.” (Para. [0053]), thereby teaching including the features being organized specific to each of the entities (nodes) but still including numerical and categorical attributes.
Widmann and Wang explicitly teach all of the elements of the claimed invention as recited above except:
a homogenous entity graph;
However, in the related field of endeavor of knowledge graph based embedding, explainability, and multi-task learning, Nouri teaches:
a homogenous entity graph;
Nouri teaches “for homogeneous graphs…meaning that all nodes belong to one type and all edges belong to one type” (Para. [0370]).
Thus, it would have been obvious to one of ordinary skill in the art, having the teachings of Nouri, Widmann, and Wang before the effective filing date of the claimed invention, to have combined the use of homogeneous graphs, as taught by Nouri, with the use of supervised machine learning models connected with a graph module, as taught by Widmann, and the systems and methods for generating a concatenated final matrix for training a machine learning model to detect anomalous transactions, as taught by Wang.
One would have been motivated to make such a combination because Nouri teaches “a high-performance CPU-GPU hybrid system for graph embeddings learning that is suitable for arbitrary types of graphs and easily scales to hundreds of millions of nodes, a novel objective function for heterogeneous graphs that overcomes limitations tied to a particular sampling strategy and provides parameters to tune explored samples, and/or an edge selection process for improving or optimizing a quality of embedding for low and high frequent nodes that efficiently optimizes a graph-aware, neighborhood preserving objective using SGD.” (Para. [0403]).
Regarding Claim 14:
All of the limitations recited herein are similar to some or all of the limitations of Claim 4 and are rejected under the same rationale set forth above.
Response to Amendment
Applicant’s Amendments, filed on 2/26/2026, are acknowledged and accepted.
In light of the Amendments filed on 2/26/2026, the claim objection to claims 1, 11, and 19 has been withdrawn.
In light of the Amendments filed on 2/26/2026, the 112(b) rejection of claims 1-20 has been withdrawn.
Response to Arguments
On pages 9-10 of the Remarks filed on 2/26/2026, Applicant argues that “the motivation to combine these references in the precise manner claimed is absent from the prior art and is instead guided by Applicant's own disclosure” because “Wang does not teach this method of label generation. As noted in the previous response, Wang utilizes pre-existing "known labels" (Wang, par. [0020]) or generates labels from "anomaly scores" (Wang, par. [0038]). Wang's separation of numerical and categorical features (Wang, par. [0032]) is for distinct processing, not for generating labels from the categorical features themselves to be used in a supervised loss function.” Applicant further argues that “Widmann does not provide any teaching or suggestion to generate these target values (labels) in the specific manner claimed by Applicant, nor does it motivate applying such a loss function to the first of a two-model architecture as in Wang to solve the specific technical problem of preserving high-cardinality categorical feature information.” and that “a person of ordinary skill in the art, presented with Wang and Widmann, would not have been motivated to modify Wang's system to discard its own labeling methodology and instead invent the claimed process of shifting out categorical features to create labels for a loss function in the first model.”
Upon further review of the prior art and consideration of Applicant’s argument, the argument is not found persuasive to overcome the previously cited prior art. It is noted that the motivation recited is a motivation to combine the two references: using the machine learning models taught by Widmann (“modifications to the nodes and/or edges may be based on a prediction, by the machine learning model and/or the graph module, of a change that may result an improvement” - Para. [0136]) in combination with the graph module taught by Wang would improve at least the graph module 120 taught by Wang. It is further noted that the claims do not recite a scope of a supervised loss function, or the loss function performing any preservation of high-cardinality categorical feature information, as Applicant appears to be arguing.
On page 10 of the Remarks filed on 2/26/2026, Applicant argues, with respect to the limitations amended from claim 3 into the independent claims, that “the cited combination of Wang and Widmann fails to teach or suggest this specific [amended] two-step process involving a self-learning attention mechanism” because “The rejection of original claim 3 asserted that Wang's "anomaly scores" (Wang, par. [0037]) constitute the claimed "intermediate entity-specific embeddings." Respectfully, this is a misinterpretation. An anomaly score, as disclosed by Wang, is a single scalar value indicating an outlier, which is structurally and functionally distinct from an "embedding," which is understood in the art to be a multi-dimensional vector representation. Wang does not teach generating an intermediate vector representation that is subsequently updated as claimed.”
Upon further review of the prior art and consideration of Applicant’s argument, the argument is not found persuasive to overcome the previously cited prior art because an embedding, when given its broadest reasonable interpretation, may include a single- or multi-dimensional vector representation.
On pages 10-11 of the Remarks filed on 2/26/2026, Applicant argues that “it is asserted that Widmann's general discussion of "unsupervised machine learning" or "self- learning" (Widmann, pars. [0050], [0100]) teaches the claimed "self-learning attention process." This is unsupported. To a person of ordinary skill in the art, a "self-learning attention process" or "self-attention mechanism" is a specific, distinct technical mechanism used in machine learning to weigh the importance of different input features (Specification, pars. [0061], [0118]). It is not synonymous with the broad concept of unsupervised or self-learning. Widmann provides no disclosure of an attention mechanism, let alone one that updates an intermediate embedding using a first embedding and categorical features as claimed. The combination of references is therefore deficient as it is missing this key element.”
Applicant’s argument is not found persuasive because attention is necessarily understood as a component of self-learning, which is taught by Widmann (Para. [0100]). Further, for argument's sake, even if it were not understood as a necessary component, self-attention was a well-known mechanism at the effective filing date of the present application (September 20, 2023), as it was popularized in the 2017 research paper “Attention Is All You Need.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Onoro Rubio et al. (U.S. Pre-Grant Publication No. 2019/0205964) teaches collecting a dataset containing a plurality of entities and attributes for the entities. Relationships between the plurality of entities are generated. The plurality of entities, attributes and relationships are stored in a knowledge graph. A representation of the plurality of entities, attributes and relationships stored in the knowledge graph is learned. Zero-shot learning is performed for a new entity and attributes for the new entity. The new entity and attributes for the new entity are stored in the knowledge graph. A recommendation for a user is generated based on the knowledge graph.
Ma et al. (U.S. Pre-Grant Publication No. 2020/0201908) teaches a system for processing data. During operation, the system applies a first set of hash functions to a first entity identifier (ID) for a first entity to generate a first set of hash values. Next, the system produces a first set of intermediate vectors from the first set of hash values and a first set of lookup tables by matching each hash value in the first set of hash values to an entry in a corresponding lookup table in the first set of lookup tables. The system then performs an element-wise aggregation of the first set of intermediate vectors to produce a first embedding. Finally, the system outputs the first embedding for use by a machine learning model.
Tan et al. (U.S. Pre-Grant Publication No. 2024/0330718) teaches generating a knowledge graph database representing relationships between entities in the online system. The online system generates the knowledge graph database by at least obtaining descriptions for an item. The online system generates one or more prompts to a machine-learned language model, where a prompt includes a request to extract a set of attributes for the item from the description of the item. The online system receives a response generated from executing the machine-learned language model on the prompts. The online system parses the response to extract the set of attributes for the item. For each extracted attribute, the online system generates connections between an item node representing the item and a set of attribute nodes for the extracted set of attributes in the database.
Liu et al. (U.S. Pre-Grant Publication No. 2024/0419942) teaches an entity tag association prediction method, device, system, and a computer readable storage medium. The method includes: determining an entity relationship network, a tag relationship network and an entity tag association network; constructing an entity similarity graph according to the entity relationship network, constructing a tag similarity graph according to the tag relationship network and the entity tag association network, and constructing an entity tag association bipartite graph according to the entity tag association network; extracting an entity feature, and constructing a tag feature according to the tag similarity graph; integrating the entity similarity graph, the tag similarity graph, and the entity tag association bipartite graph into a graph convolutional network to construct a prediction model; inputting the entity feature and the tag feature into the prediction model for training until the model converges, and outputting a prediction result of the prediction model.
Charania et al. (U.S. Pre-Grant Publication No. 2024/0112075) teaches predicting differentiating features from tabular data for two different populations. In some aspects, the systems and methods provide for receiving first data entries and second data entries and generating a first graph based on a data entry from the first data entries and a second graph based on a data entry from the second data entries. A first node in the first graph is determined to correspond to a second node in the second graph. A first set of graph embeddings is generated based on the first graph. A second set of graph embeddings is generated based on the second graph. Using a machine learning model, the first set of graph embeddings and the second set of graph embeddings are processed to identify at least one feature indicative of a difference between the first data entries and the second data entries.
Bruss et al. (U.S. Patent No. 10,789,530) teaches neural embeddings of transaction data. A network graph of transaction data based on a plurality of transactions may be received. The network graph of transaction data may define relationships between the transactions, each transaction associated with at least a merchant and an account. A neural network may be trained based on training data comprising a plurality of positive entity pairs and a plurality of negative entity pairs. An embedding function may then encode transaction data for a first new transaction. An embeddings layer of the neural network may determine a vector for the first new transaction based on the encoded transaction data for the first new transaction. A similarity between the vectors for the transactions may be determined. The first new transaction may be determined to be related to the second transaction based on the similarity.
Newman (U.S. Patent No. 11,481,603) teaches receiving data from a computing device requesting approval of a loan application; accessing time-series data associated with the user from a knowledge graph; building a feature vector based on the accessed time-series data; inputting the feature vector into a machine learning model; receiving a response from the output from the machine learning mode, the output indicating a level of approval for the user with respect to the loan application; and transmitting a response to the request based on the level of approval.
Foreign Patent Document CN 115860748 A teaches a transaction risk identification method and system based on edge feature enhancement, the method comprising: constructing a heterogeneous graph according to transaction action data between different entities; determining the initial node feature of each node in the heterogeneous graph and the initial edge feature of the edge between each pair of nodes; for the initial node feature of each node, updating the node feature of each node according to an update network using edge feature enhancement to obtain the updated node feature of each node; for the initial edge feature of each edge, updating the edge feature of each edge according to the update network, which is based on edge feature enhancement, to obtain the updated edge feature of each edge; and sending the updated node feature of each node and the updated edge feature of each edge into a classifier to identify the transaction risk.
Verma et al. (U.S. Pre-Grant Publication No. 2023/0289610) teaches unsupervised representation learning for bipartite graphs. Method performed by server system includes accessing historical transaction data from database. Method includes generating a bipartite graph based on historical transaction data. Bipartite graph represents a computer-based graph representation of a plurality of cardholders as first nodes and a plurality of merchants as second nodes and payment transactions between first nodes and second nodes as edges. Method includes sampling direct neighbor nodes and skip neighbor nodes associated with a node based on neighborhood sampling method and executing direct neighborhood aggregation method and skip neighborhood aggregation method to obtain direct neighborhood embedding and skip neighborhood embedding associated with node, respectively. Method includes optimizing combination of direct and skip neighborhood embeddings for obtaining final node representation associated with the node and executing graph context prediction tasks based on final node representations of first nodes and second nodes. It is noted that this reference shares the same assignee as the present application and was published (September 14, 2023) less than a year from the effective filing date (September 20, 2023) of the present application.
Matskevich et al. (U.S. Patent No. 10,437,931) teaches extracting facts from natural language texts. An example method of information extraction comprises extracting, from a natural language text, a first plurality of information objects; extracting, from the natural language text, a second plurality of information objects; identifying a set of conflicting information objects, such that a first information object of the set of conflicting information objects belongs to the first plurality of information objects and a second information object of the set of conflicting information objects belongs to the second plurality of information objects; and producing a final list of information objects extracted from the natural language text, by applying, to the set of conflicting information objects, a conflict arbitration function which performs at least one of: modifying the first information object, deleting the first information object, or merging two or more information objects of the set of conflicting information objects.
Zhang et al. (U.S. Pre-Grant Publication No. 2021/0097367) teaches a system for processing data. During operation, the system performs processing related to a first set of features for a first entity using a first series of embedding layers, wherein the processing includes applying each embedding layer in the first series of embedding layers to a concatenation of all outputs of one or more layers preceding the embedding layer. Next, the system obtains a first embedding as an output of a first final layer in the first series of embedding layers. The system then outputs the first embedding for use by a machine learning model. The reference further teaches “To improve the efficiency of the input layer, the dimensionality of a given input embedding is selected to be proportional to the cardinality of the feature used to produce the input embedding. For example, the input layer generates input embeddings as fixed-length vector representations of one-hot encoded categorical entity features such as skills, companies, titles, educational attributes (e.g., degrees, schools, fields of study, etc.), seniorities, and/or functions of members or jobs. An entity feature for skills includes tens of thousands of possible values, which are converted into an input embedding with a dimensionality in the hundreds. On the other hand, an entity feature for seniority includes around 10 values, which are converted into an input embedding with a dimensionality of around 4.” (Para. [0014]).
Friede et al. (U.S. Pre-Grant Publication No. 2022/0300831) teaches “Self-attention (also be known in the art as intra-attention) is an attention mechanism relating different positions of a single sequence (or other data collection) in order to compute a representation of the sequence. Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations. A self-attention model allows inputs to interact with each other (i.e. calculate attention of all other inputs with respect to one input).” (Para. [0014])
Nguyen et al. (U.S. Pre-Grant Publication No. 2023/0186319) teaches “As those having ordinary skill in the art will appreciate, application of self-attention to the set of numerical representations 210 can cause of each of the set of numerical representations 210 to become weighted according to a corresponding level of importance (e.g., each numerical representation can query and/or interact with all of the others in the set of numerical representations 210 so as to derive a set of weights, and the set of numerical representations 210 can thus be updated and/or scaled according to the set of weights).” (Para. [0079]).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT F MAY whose telephone number is (571)272-3195. The examiner can normally be reached Monday-Friday 9:30am to 6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Boris Gorney can be reached on 571-270-5626. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ROBERT F MAY/Examiner, Art Unit 2154 3/20/2026
/BORIS GORNEY/Supervisory Patent Examiner, Art Unit 2154