DETAILED ACTION
Status of Claims
Claim(s) 1-20 are pending and are examined herein.
Claim(s) 1-20 are rejected under 35 U.S.C. §§ 101 and 103.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement(s) (IDS) submitted on March 28, 2023, May 14, 2024, and December 30, 2025 are in compliance with the provisions of 37 CFR 1.97 and have been considered by the examiner.
Claim Objections
Claim 9 is objected to because of the following informalities:
Regarding claim 9: the claim recites “generating a graph using the feature vector and other feature vectors that are respectively obtained from a reference set of objects, the reference set of objects respectively labeled with the first classification or the second classification, the feature vector for the object corresponding to a center node of a set of nodes of the graph, wherein determining the graph comprises:” (lines 8-11). The claim limitation should read as “generating a graph using the feature vector and other feature vectors that are respectively obtained from a reference set of objects, the reference set of objects respectively labeled with the first classification or the second classification, the feature vector for the object corresponding to a center node of a set of nodes of the graph, wherein generating the graph comprises:”.
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
When considering subject matter eligibility under 35 U.S.C. 101, it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter (Step 1). If the claim does fall within one of the statutory categories, the second step in the analysis is to determine whether the claim is directed to a judicial exception (Step 2A). The Step 2A analysis is broken into two prongs. In the first prong (Step 2A, Prong 1), it is determined whether or not the claims recite a judicial exception (e.g., mathematical concepts, mental processes, certain methods of organizing human activity). If it is determined in Step 2A, Prong 1 that the claims recite a judicial exception, the analysis proceeds to the second prong (Step 2A, Prong 2), where it is determined whether or not the claims integrate the judicial exception into a practical application. If it is determined at step 2A, Prong 2 that the claims do not integrate the judicial exception into a practical application, the analysis proceeds to determining whether the claim is a patent-eligible application of the exception (Step 2B). If an abstract idea is present in the claim, any element or combination of elements in the claim must be sufficient to ensure that the claim integrates the judicial exception into a practical application, or else amounts to significantly more than the abstract idea itself. Applicant is advised to consult MPEP 2106 for more details of the analysis.
Under Step 1 analysis,
Claims 1-8 recite a method (representing a process); and
Claims 9-20 recite a method (representing a process).
Therefore, each set of the claims falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more, and hence is not patent-eligible subject matter.
Regarding Claim 1,
Step 2A Prong 1: The claim recites an abstract idea enumerated in the 2019 PEG.
determining a graph for each input sample of a set of input samples, a respective input sample corresponding to a center node of a set of nodes of the graph, (The “determining” step is an abstract idea of a mental process and/or a mathematical concept. Examiner’s note: the “determining” step, as drafted, and under its broadest reasonable interpretation (BRI), covers concepts that can be practically performed in the human mind with the aid of pen and paper. This limitation involves constructing a graph using previously generated vectors, a process that can be manually performed by an individual using pen and paper. See MPEP § 2106.04(a)(2)(III).)
wherein determining the graph comprises: selecting, using a distance metric associated with the center node of the graph, neighbor nodes that neighbor the center node and are to be included in the set of nodes of the graph, each neighbor node labeled as either a benign training sample or an adversarial training sample of the set of training samples; (That is part of the abstract idea of a mental process and a mathematical concept. The “selecting” step, as drafted, and under its broadest reasonable interpretation, covers concepts that fall under the mental process and mathematical concept groupings. Examiner’s note: this step involves selecting the k nearest neighbor nodes using a distance measure. The distance metric is a mathematical calculation used to determine the nearest neighbor nodes to select. This step, as currently drafted, encompasses mental processes and mathematical concepts that can be practically performed manually by an individual with the aid of pen and paper. See MPEP § 2106.04(a)(2)(I) & (III).)
determining an edge weight of an edge between a first node and a second node of the set of nodes of the graph based on a distance between respective feature vectors of the first node and the second node; (That is part of the abstract idea of a mental process and a mathematical concept. The “determining” step, as drafted, and under its broadest reasonable interpretation, covers concepts that fall under the mental process and mathematical concept groupings. The process of determining edges between node pairings of the set of nodes via edge estimation using a Euclidean distance or a cosine distance encompasses a mathematical calculation that can be performed manually by an individual with physical tools (e.g., pen and paper). See MPEP § 2106.04(a)(2)(III).)
Step 2A Prong 2: Under this prong, we evaluate whether the claim recites additional elements that integrate the abstract idea into a practical application by considering the claim as a whole. The judicial exception is not integrated into a practical application.
Additional Elements Analysis:
The claim recites the limitation: “storing a set of training samples that comprises a first set of benign training samples and a second set of adversarial training samples, each training sample having a known classification from a plurality of classifications, the second set of adversarial training samples being generated using an adversarial sample generation method;”. (The “storing” step amounts to no more than adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP § 2106.05(g). This step represents a generic computer function that is recited at a high level of generality (e.g., data gathering). Such a data gathering step does not meaningfully limit the claim (i.e., all uses of the recited judicial exception require such data gathering or data output).)
The claim recites the limitation of: “obtaining, with a pre-trained classification model, a feature vector for each training sample of the first set and the second set;”. (The “obtaining” step amounts to no more than adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP § 2106.05(g). The generic recitation of obtaining feature vectors represents a pre-solution activity incidental to the primary process or product that is merely a nominal or tangential addition to the claim. The recitation of using “a pre-trained classification model” to obtain a feature vector represents invoking a computer component to perform a generic computer function. In other words, the claim invokes computers or other machinery in their ordinary capacity to perform an existing process. See MPEP § 2106.05(f). The Examiner notes that the high-level recitation of using a pre-trained model to perform feature extraction represents a generic computer function that is well-known and conventional in the art. As evidenced by Mendes Rodrigues et al. (Pub. No.: US 20190323993 A1), a standard machine learning (ML) model may utilize a pre-trained convolutional neural network to perform feature vector extraction and apply a simple classifier, like k-Nearest Neighbors, to identify similar image patches based on the similarity of the extracted feature vectors. See Mendes [0051].)
The claim further recites the limitation of: “training, using each determined graph, a graph discriminator to differentiate between benign samples and adversarial samples, the training using (I) the feature vectors associated with nodes of the graph and (II) the edge weights between the nodes of the graph.” (The “training” step amounts to no more than reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, as discussed in MPEP § 2106.05(f). Examiner’s note: this represents, at a high level, training a model/classifier on previously obtained data. In other words, the claim invokes computers or other machinery in their ordinary capacity to perform an existing process.)
Step 2B: Under this step, the claim must include additional elements that amount to significantly more than the judicial exception. These elements must not be well-understood, routine, or conventional in the relevant field. When viewed individually and as an ordered combination, the claim does not include any such additional elements that are sufficient to amount to significantly more (i.e., an inventive concept).
Additional Elements Analysis:
As explained above, the additional elements such as “storing” and “obtaining” amount to insignificant extra-solution activities to the judicial exception. These additional elements merely represent generic computer functions (i.e., data gathering) in conjunction with the abstract idea. The courts have recognized computer functions such as “receiving or transmitting data” or “storing and retrieving information” as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. See MPEP § 2106.05(d). Additionally, the recitation of using a “pre-trained classification model” to obtain feature vectors represents a generic computer function. The concept of using a pre-trained classification model to obtain feature vectors is also well-known and conventional in the field, as evidenced by Mendes, which describes that a standard machine learning (ML) model may utilize a pre-trained convolutional neural network to perform feature vector extraction and apply a simple classifier, like k-Nearest Neighbors, to identify similar image patches based on the similarity of the extracted feature vectors. See Mendes Rodrigues et al. (Pub. No.: US 20190323993 A1), e.g., [0051].
Furthermore, the training limitation amounts to no more than reciting a computer component performing a generic computer function. Additional elements that invoke computers or other machinery merely as a tool to perform an existing process will generally not amount to significantly more than a judicial exception. See MPEP § 2106.05(f).
Accordingly, the additional limitations, whether considered individually or in combination with the judicial exception, are not sufficient to integrate the judicial exception into a practical application or to amount to significantly more.
Therefore, claim 1 does not recite patent-eligible subject matter.
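Purely for illustration of the characterization above, the graph-determination steps (selecting the k nearest labeled neighbors of a center node using a distance metric, then weighting each edge by feature-vector distance) can be sketched as follows; the toy data, the value of k, and the inverse-distance weight formula are illustrative assumptions, not claim language:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def determine_graph(center, labeled_samples, k=2):
    """Select the k nearest labeled neighbors of the center node, then
    weight each edge inversely to the feature-vector distance."""
    # Rank candidate (vector, label) pairs by distance from the center node.
    ranked = sorted(labeled_samples, key=lambda s: euclidean(center, s[0]))
    neighbors = ranked[:k]
    # Edge weight decreases as the distance grows (illustrative choice).
    return [(label, 1.0 / (1.0 + euclidean(center, vec)))
            for vec, label in neighbors]

# Toy feature vectors labeled with their known classification.
samples = [([0.0, 1.0], "benign"),
           ([5.0, 5.0], "adversarial"),
           ([0.5, 0.9], "benign")]
edges = determine_graph([0.0, 0.0], samples, k=2)
```

On the toy vectors, the sketch selects the two nearest labeled samples and assigns each edge a weight in (0, 1] that decreases with distance, mirroring the manual distance-sort-select procedure described in the analysis above.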
Regarding Claim 2,
Step 2A Prong 1: Claim 2, which incorporates the rejection of claim 1, recites further limitations such as:
wherein the graph comprises an embedding matrix and an adjacency matrix, wherein the embedding matrix comprises a plurality of feature vectors for the set of nodes of the graph, a feature vector corresponding to an embedding of the embedding matrix, and wherein the adjacency matrix comprises edge weights for edges that connect the nodes of the graph. (This limitation is part of the abstract idea recited in claim 1. Claim 2 merely introduces the terms “embedding matrix” and “adjacency matrix” as a formal way of representing the graph. The embedding matrix broadly defines a numerical representation of the feature vectors of the nodes, where each row corresponds to a node’s feature vector. The adjacency matrix broadly defines a numerical representation of the edge weights between nodes, where each entry represents the weight (distance/similarity) of an edge between two nodes. That is part of the abstract idea of a mathematical concept and a mental process. See MPEP § 2106.04(a)(2)(I) & (III).)
Step 2A Prong 2: The claim does not recite additional elements that integrate the judicial exception into a practical application.
Step 2B: The claim does not recite additional elements that amount to significantly more than the judicial exception.
Therefore, claim 2 is ineligible.
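For illustration, the matrix representation characterized above (an embedding matrix whose rows are node feature vectors and an adjacency matrix of pairwise edge weights) can be sketched as follows; the helper names and the inverse-distance weight used in the usage lines are illustrative assumptions, not claim language:

```python
import math

def build_matrices(vectors, edge_weight):
    """Represent a graph as (I) an embedding matrix whose rows are the
    nodes' feature vectors and (II) an adjacency matrix whose entries
    are the pairwise edge weights (zero on the diagonal)."""
    n = len(vectors)
    embedding = [list(v) for v in vectors]
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                adjacency[i][j] = edge_weight(vectors[i], vectors[j])
    return embedding, adjacency

# Two toy nodes; the edge weight decreases with Euclidean distance.
emb, adj = build_matrices([[0.0, 0.0], [3.0, 4.0]],
                          lambda u, v: 1.0 / (1.0 + math.dist(u, v)))
```

Because the weight function is symmetric, the resulting adjacency matrix is symmetric with a zero diagonal, which is the generic matrix form of the graph described in the claim 2 analysis.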
Regarding Claim 3,
Step 2A Prong 1: Claim 3, which incorporates the rejection of claim 1, recites further limitations such as:
wherein selecting the neighbor nodes for inclusion within the graph comprises generating a k-nearest neighbor graph that determines the neighbor nodes as being nearest neighbors from the center node, the distance metric being associated with parameters for generating the k-nearest neighbor graph. (This limitation is part of the abstract idea recited in claim 1. Claim 3 merely introduces the “k-nearest neighbor graph” process of selecting the neighbor nodes. The “k-nearest neighbors” selection is an algorithm that defines a sequence of steps to determine and select neighbor nodes using a distance measure. Thus, the concept of finding and selecting neighbor nodes by computing the distance between a given center node and all other nodes, sorting the nodes by distance, and selecting the top k nodes as neighbors of the center node is a process that, under its broadest reasonable interpretation, covers mental processes and mathematical concepts. See MPEP § 2106.04(a)(2)(I) & (III).)
Step 2A Prong 2: The claim does not recite additional elements that integrate the judicial exception into a practical application.
Step 2B: The claim does not recite additional elements that amount to significantly more than the judicial exception.
Therefore, claim 3 is ineligible.
Regarding Claim 4,
Step 2A Prong 1: Claim 4, which incorporates the rejection of claim 1, recites further limitations such as:
wherein the edge weight between the first node and the second node of the graph is inversely correlated with the distance between the respective feature vectors of the first node and the second node. (This limitation is part of the abstract idea recited in claim 1. Claim 4 merely defines that the edge weight is determined based on the distance between two nodes. That is part of the abstract idea of mental processes and mathematical concepts. See MPEP § 2106.04(a)(2)(I) & (III).)
Step 2A Prong 2: The claim does not recite additional elements that integrate the judicial exception into a practical application.
Step 2B: The claim does not recite additional elements that amount to significantly more than the judicial exception.
Therefore, claim 4 is ineligible.
Regarding Claim 5,
Step 2A Prong 1: Claim 5, which incorporates the rejection of claim 1, recites further limitations such as:
wherein the edge weight is determined based on a function that maps the distance between the respective feature vectors of the first node and the second node from a first space to a second space. (This limitation is part of the abstract idea of a mental process and a mathematical concept. Claim 5 merely defines an edge estimation function that maps a distance (e.g., a Euclidean distance) between two nodes (e.g., between two embeddings of a node pair) of the graph. The claim does not define the technical implementation of the edge weight estimation. This broad recitation covers a concept that can be manually derived by an individual with the aid of pen and paper. See MPEP § 2106.04(a)(2)(I) & (III).)
Step 2A Prong 2: The claim does not recite additional elements that integrate the judicial exception into a practical application.
Step 2B: The claim does not recite additional elements that amount to significantly more than the judicial exception.
Therefore, claim 5 is ineligible.
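For illustration only, one common function of the kind claim 5 broadly encompasses is a Gaussian (RBF) kernel, which maps a raw distance in the first space to a bounded similarity in the second space; the kernel choice and the sigma value below are illustrative assumptions, not claim language:

```python
import math

def edge_weight(dist, sigma=1.0):
    """Map a raw feature-space distance (first space) to a bounded
    similarity in (0, 1] (second space) via a Gaussian kernel:
    weight = exp(-d^2 / (2 * sigma^2))."""
    return math.exp(-dist ** 2 / (2.0 * sigma ** 2))

# Close node pairs map near 1; distant pairs map toward 0.
w_near, w_far = edge_weight(0.5), edge_weight(3.0)
```

The mapping is a single closed-form calculation, consistent with the characterization above that the recited function covers a concept derivable by hand.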
Regarding Claim 6,
Step 2A Prong 1: Claim 6, which incorporates the rejection of claim 5, recites further limitations such as:
determining parameters for the function, the parameters determined using each of the input samples. (This limitation is part of the abstract idea recited in claim 5. Claim 6 merely defines the parameters of the function for estimating the edge weight. The claim does not define the technical implementation of determining the parameters. The parameters of the function broadly represent the distance function. This broad recitation covers a concept that can be manually determined by an individual with the aid of pen and paper. See MPEP § 2106.04(a)(2)(I) & (III).)
Step 2A Prong 2: The claim does not recite additional elements that integrate the judicial exception into a practical application.
Step 2B: The claim does not recite additional elements that amount to significantly more than the judicial exception.
Therefore, claim 6 is ineligible.
Regarding Claim 7,
Step 2A Prong 1: Claim 7, which incorporates the rejection of claim 1, does not recite a further abstract idea.
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
wherein a particular input sample of the input samples is associated with a ground truth label, and wherein the graph discriminator comprises a neural network that outputs a class prediction of whether the particular input sample is benign or adversarial, and wherein the neural network is trained by minimizing a loss between the class prediction of the graph discriminator and the ground truth label. (This limitation amounts to merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, as discussed in MPEP § 2106.05(f). Examiner’s note: the limitations of claim 7 merely define a generic machine learning training process that is recited at a high level of generality. Techniques for DNN model training, such as gradient-based optimization or an unsupervised greedy layer-wise training procedure, are known in the art, as evidenced by Li et al. (Pub. No.: US 20160078339 A1), e.g., [0050]. Accordingly, these limitations invoke computers or other machinery in their ordinary capacity merely as a tool to perform an existing process.)
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As explained above, the additional element of using conventional machine learning functions amounts to no more than applying a generic computer component to perform generic computer functions. These generic training recitations do not amount to significantly more and do not provide an inventive concept.
Therefore, claim 7 is ineligible.
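For illustration of the generic loss-minimizing training process characterized above, a single logistic unit fit by gradient descent can stand in for the claimed neural network discriminator; the toy features, learning rate, epoch count, and the logistic simplification are illustrative assumptions, not the claimed implementation:

```python
import math

def train_discriminator(examples, lr=0.5, epochs=200):
    """Minimal stand-in for loss-minimizing training: a logistic unit
    fit by gradient descent on the cross-entropy between the class
    prediction and the ground-truth label (0 = benign, 1 = adversarial)."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # class prediction
            g = p - y                        # cross-entropy gradient
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(model, x):
    """Probability that the sample is adversarial."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy graph-level features: benign (y=0) vs. adversarial (y=1).
data = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.9, 0.8], 1), ([0.8, 0.9], 1)]
model = train_discriminator(data)
```

The loop repeatedly nudges the parameters to reduce the loss between prediction and label, which is the high-level, conventional training pattern the analysis above refers to.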
Regarding Claim 8,
Step 2A Prong 1: Claim 8, which incorporates the rejection of claim 7, does not recite a further abstract idea.
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
wherein the graph discriminator receives as input, for a given training iteration, an adjacency matrix and an embedding matrix that are associated with the particular input sample. (This limitation amounts to adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP § 2106.05(g). Examiner’s note: the limitation defines a data gathering step of the previously determined representation of the data. Such a data gathering or outputting step does not impose a meaningful limitation on the claim.)
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As explained above, the additional element remains insignificant extra-solution activity. The courts have recognized computer functions such as “receiving or transmitting data” or “storing and retrieving information” as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. See MPEP § 2106.05(d).
Therefore, claim 8 is ineligible.
Regarding Claim 9,
Step 2A Prong 1: The claim recites an abstract idea enumerated in the 2019 PEG.
generating a graph using the feature vector and other feature vectors that are respectively obtained from a reference set of objects, the reference set of objects respectively labeled with the first classification or the second classification, the feature vector for the object corresponding to a center node of a set of nodes of the graph, (The “generating” step is an abstract idea of a mental process and/or a mathematical concept. Examiner’s note: the “generating” step, as drafted, and under its broadest reasonable interpretation (BRI), covers concepts that can be practically performed in the human mind with the aid of pen and paper. This limitation involves constructing a graph using previously generated vectors, a process that can be manually performed by an individual using pen and paper. See MPEP § 2106.04(a)(2)(III).)
wherein determining the graph comprises: selecting, using a distance metric associated with the center node of the graph, neighbor nodes that neighbor the center node and are to be included in the set of nodes of the graph, each neighbor node corresponding to an object of the reference set of objects and having the first classification or the second classification; (That is part of the abstract idea of a mental process and a mathematical concept. The “selecting” step, as drafted, and under its broadest reasonable interpretation, covers concepts that fall under the mental process and mathematical concept groupings. Examiner’s note: this step involves selecting the k nearest neighbor nodes using a distance measure. The distance metric is a mathematical calculation used to determine the nearest neighbor nodes to select. This step, as currently drafted, encompasses mental processes and mathematical concepts that can be practically performed manually by an individual with the aid of pen and paper. See MPEP § 2106.04(a)(2)(I) & (III).)
determining an edge weight of an edge between a first node and a second node of the set of nodes of the graph based on a distance between respective feature vectors of the first node and the second node; (That is part of the abstract idea of a mental process and a mathematical concept. The “determining” step, as drafted, and under its broadest reasonable interpretation, covers concepts that fall under the mental process and mathematical concept groupings. The process of determining edges between node pairings of the set of nodes via edge estimation using a Euclidean distance or a cosine distance encompasses a mathematical calculation that can be performed manually by an individual with physical tools (e.g., pen and paper). See MPEP § 2106.04(a)(2)(III).)
determine whether the sample data of the object is to be classified with the first classification or the second classification, (An abstract idea of a mental process. The process of determining whether a sample data object is a benign object or an adversarial object falls within the mental process grouping. Labeling an object as benign or adversarial is a decision-making process that can be practically performed in the human mind. See MPEP § 2106.04(a)(2)(III).)
Step 2A Prong 2: Under this prong, we evaluate whether the claim recites additional elements that integrate the abstract idea into a practical application by considering the claim as a whole. The judicial exception is not integrated into a practical application.
Additional Elements Analysis:
The claim recites the limitation: “receiving sample data of an object to be classified;”. (The “receiving” step amounts to no more than adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP § 2106.05(g). This step represents a generic computer function that is recited at a high level of generality (e.g., data gathering). Such a data gathering step does not meaningfully limit the claim (i.e., all uses of the recited judicial exception require such data gathering or data output).)
The claim recites the limitation of: “executing, using the sample data, a classification model to obtain a feature vector, the classification model trained to assign a classification of a plurality of classifications to the sample data, the plurality of classifications including a first classification and a second classification;” (This amounts to no more than invoking a computer component as a tool to perform a generic computer function. The generic recitation of obtaining feature vectors represents a pre-solution activity incidental to the primary process or product that is merely a nominal or tangential addition to the claim. See MPEP § 2106.05(g). The recitation of executing “a classification model” to obtain a feature vector represents invoking a computer component to perform a generic computer function. In other words, the claim invokes computers or other machinery in their ordinary capacity to perform an existing process. See MPEP § 2106.05(f). The Examiner notes that the high-level recitation of using the classification model to perform feature extraction represents a generic computer function that is well-known and conventional in the art. As evidenced by Mendes Rodrigues et al. (Pub. No.: US 20190323993 A1), a standard machine learning (ML) model may utilize a pre-trained convolutional neural network to perform feature vector extraction and apply a simple classifier, like k-Nearest Neighbors, to identify similar image patches based on the similarity of the extracted feature vectors. See Mendes [0051].)
The claim further recites the limitation of: “applying a graph discriminator to the graph to determine whether the sample data of the object is to be classified with the first classification or the second classification, the graph discriminator trained using (I) the feature vectors associated with nodes of the graph and (II) the edge weights between the nodes of the graph.” (The “applying” step amounts to no more than reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, as discussed in MPEP § 2106.05(f). Examiner’s note: this represents a high-level recitation of using the pretrained model to perform a prediction. In other words, the claim invokes computers or other machinery in their ordinary capacity to perform an existing process.)
Step 2B: Under this step, the claim must include additional elements that amount to significantly more than the judicial exception. These elements must not be well-understood, routine, or conventional in the relevant field. When viewed individually and as an ordered combination, the claim does not include any such additional elements that are sufficient to amount to significantly more (i.e., an inventive concept).
Additional Elements Analysis:
As explained above, the additional elements such as “receiving” and “obtaining” amount to insignificant extra-solution activities to the judicial exception. These additional elements merely represent generic computer functions (i.e., data gathering) in conjunction with the abstract idea. The courts have recognized computer functions such as “receiving or transmitting data” or “storing and retrieving information” as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. See MPEP § 2106.05(d). Additionally, the recitation of “executing a classification model” to obtain feature vectors represents a generic computer function. The concept of using a classification model to obtain feature vectors is also well-known and conventional in the field, as evidenced by Mendes, which describes that a standard machine learning (ML) model may utilize a pre-trained convolutional neural network to perform feature vector extraction and apply a simple classifier, like k-Nearest Neighbors, to identify similar image patches based on the similarity of the extracted feature vectors. See Mendes Rodrigues et al. (Pub. No.: US 20190323993 A1), e.g., [0051].
Furthermore, the applying limitation amounts to no more than reciting a computer component configured to perform the abstract idea. Additional elements that invoke computers or other machinery merely as a tool to perform an existing process will generally not amount to significantly more than a judicial exception. See MPEP § 2106.05(f).
Accordingly, the additional limitations, whether considered individually or in combination with the judicial exception, are not sufficient to integrate the judicial exception into a practical application or to amount to significantly more.
Therefore, claim 9 does not recite patent-eligible subject matter.
Regarding Claim 10,
Step 2A Prong 1: Claim 10, which incorporates the rejection of claim 9, does not recite a further abstract idea.
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
wherein the reference set of objects comprises a first subset of objects respectively labeled with the first classification and a second subset of objects respectively labeled with the second classification. (This limitation amounts to no more than adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP § 2106.05(g). It broadly defines the received subset of objects as labeled information. This recitation does not impose a meaningful limitation on the abstract idea.)
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the limitation remains insignificant extra-solution activity to the judicial exception, as it merely defines the received data at a high level of generality.
Therefore, claim 10 is ineligible.
Regarding Claim 11,
Step 2A Prong 1: Claim 11, which incorporates the rejection of claim 10, recites further limitations such as:
selecting a first candidate nearest neighbor for a particular node of a current graph, the first candidate nearest neighbor selected from the first subset of objects having the first classification and associated with a first candidate feature vector of the feature vectors obtained from the reference set of objects; selecting a second candidate nearest neighbor for the particular node of the current graph, the second candidate nearest neighbor selected from the second subset of objects having the second classification and associated with a second candidate feature vector of the feature vectors obtained from the reference set of objects; ... connecting a candidate selected by the neural network to the particular node of the current graph. (These limitations are part of the abstract idea recited in claim 9. Claim 11 merely introduces the “candidate nearest neighbor” selection process to identify and select neighbor nodes. The “k-nearest neighbors” selection is an algorithm that defines a sequence of steps to determine and select neighbor nodes using a distance measure. Thus, the concept of finding and selecting candidate neighbor nodes by computing the distance between a given center node and all other nodes, sorting the nodes by distance, and selecting the top k nodes as neighbors of the center node is a process that, under its broadest reasonable interpretation, covers mental processes and mathematical concepts. See MPEP § 2106.04(a)(2)(I) & (III).)
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
The recitation of “inputting (I) a current adjacency matrix of the current graph, (II) a current embedding matrix of the current graph, (III) the first candidate feature vector, and (IV) the second candidate feature vector into a neural network, the neural network trained to select one of the candidates;” amounts to no more than invoking a computer component as a tool to perform the abstract idea. Merely reciting the words "apply it" (or an equivalent) with the judicial exception, merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea does not integrate the abstract idea into a practical application, as discussed in MPEP § 2106.05(f).
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As explained above in Step 2A, Prong Two, the additional element merely invokes a computer component (i.e., a neural network) as a tool to perform the abstract idea of selecting candidate neighbor nodes. Mere instructions to apply an exception cannot provide an inventive concept.
Therefore, claim 11 is ineligible.
Regarding Claim 12,
Step 2A Prong 1: Claim 12, which incorporates the rejection of claim 9, recites further limitations such as:
wherein selecting the neighbor nodes comprises determining nearest neighbors from the center node of the graph, the distance metric associated with parameters for determining the nearest neighbors. (This limitation is part of the abstract idea recited in claim 9. Claim 12 merely defines the distance measure as a parameter used to determine the nearest neighbor nodes. This is part of the mental processes and mathematical concepts. See MPEP § 2106.04(a)(2)(I) & (III).)
Step 2A Prong 2: The claim does not recite additional element that integrates the judicial exception into a practical application.
Step 2B: The claim does not recite additional elements that amount to significantly more than the judicial exception.
Therefore, claim 12 is ineligible.
Regarding Claim 13,
Step 2A Prong 1: Claim 13, which incorporates the rejection of claim 9, recites further limitations such as:
wherein the graph comprises an embedding matrix and an adjacency matrix, wherein the embedding matrix comprises a plurality of feature vectors for the set of nodes of the graph, and wherein the adjacency matrix comprises edge weights for edges that connect the nodes of the graph. (This limitation is part of the abstract idea recited in claim 9. Claim 13 merely introduces the terms “embedding matrix” and “adjacency matrix” representing the graph. The embedding matrix broadly defines a numerical representation of the feature vectors of the nodes, where each row corresponds to a node’s feature vector. The adjacency matrix broadly defines a numerical representation of the edge weights between nodes, where each entry represents the weight (distance/similarity) of an edge between two nodes. This is part of the abstract ideas of mathematical concepts and mental processes. See MPEP § 2106.04(a)(2)(I) & (III).)
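For illustration only, the relationship characterized above (an embedding matrix whose rows are node feature vectors, and an adjacency matrix whose entries are distance-derived edge weights) can be sketched as follows; the names and the distance threshold are hypothetical and are not drawn from the claims or cited references:

```python
import math

def build_graph(features, threshold=2.0):
    """Build an embedding matrix (row i = node i's feature vector) and a
    weighted adjacency matrix (entry [i][j] = weight of edge i-j)."""
    n = len(features)
    embedding = [list(f) for f in features]
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(features[i], features[j])
            if d < threshold:           # connect only sufficiently close nodes
                w = 1.0 / (1.0 + d)     # smaller distance -> larger edge weight
                adjacency[i][j] = adjacency[j][i] = w
    return embedding, adjacency
```

Both matrices are ordinary numerical tables, consistent with the characterization of this limitation as a mathematical representation of the graph.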
Step 2A Prong 2: The claim does not recite additional element that integrates the judicial exception into a practical application.
Step 2B: The claim does not recite additional elements that amount to significantly more than the judicial exception.
Therefore, claim 13 is ineligible.
Regarding Claim 14,
Step 2A Prong 1: Claim 14, which incorporates the rejection of claim 9, doesn’t recite an abstract idea.
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
wherein the first classification corresponds to a benign object and the second classification corresponds to an adversarial object. (This limitation amounts to no more than adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP § 2106.05(g). This broadly defines the received objects as being labeled benign or adversarial. This does not impose a meaningful limitation on the abstract idea as it generally links the use of a judicial exception to a particular technological environment or field of use, as discussed in MPEP § 2106.05(h).)
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, it is noted that a claim directed to a judicial exception cannot be made eligible simply by limiting the exception to a particular technological use. See MPEP § 2106.05(h).
Therefore, claim 14 is ineligible.
Regarding Claim 15,
Step 2A Prong 1: Claim 15, which incorporates the rejection of claim 14, recites further limitations such as:
wherein the adversarial object is generated by perturbing the benign object using entropy data. (This is an abstract idea of a mental process. The recitation of perturbing the benign object using entropy data is a process that can be practically performed by an individual with the aid of pen and paper. The claim does not define the technical implementation of the adversarial object generation and it is recited at a high level of generality such that it encompasses a process that can be manually performed. See MPEP § 2106.04(a)(2)(I) & (III).)
Step 2A Prong 2: The claim does not recite additional element that integrates the judicial exception into a practical application.
Step 2B: The claim does not recite additional elements that amount to significantly more than the judicial exception.
Therefore, claim 15 is ineligible.
Regarding Claim 16,
Step 2A Prong 1: Claim 16, which incorporates the rejection of claim 9, doesn’t recite an abstract idea.
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
wherein object corresponds to an adversarial object, wherein the classification model initially assigns the classification of the object as being a benign object, and wherein the graph discriminator uses the feature vector obtained from the classification model to determine that the object is adversarial. (This limitation amounts to no more than using a generic computer component to perform generic computer functions, as discussed in MPEP § 2106.05(f). The high-level recitation of using the classification model to classify objects and perform feature extraction represents generic computer functions that are well-known and conventional in the art. As evidenced by Mendes Rodrigues et al. (Pub. No.: US 20190323993 A1), a standard machine learning (ML) model may utilize a pre-trained convolutional neural network to perform feature vector extraction and apply a simple classifier, such as k-Nearest Neighbors, to identify similar image patches based on the similarity of the extracted feature vectors. See Mendes [0051].)
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the use of a computer component (i.e., a classification model or discriminator) to perform conventional machine learning functions does not amount to an inventive concept. See MPEP § 2106.05(d).
Therefore, claim 16 is ineligible.
Regarding Claim 17,
Step 2A Prong 1: Claim 17, which incorporates the rejection of claim 9, doesn’t recite an abstract idea.
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
wherein the object comprises a credential of a user, the credential operable for validating whether the user has authorization to access a resource. (This limitation amounts to generally linking the use of a judicial exception to a particular technological environment or field of use, as discussed in MPEP § 2106.05(h).)
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, it is noted that a claim directed to a judicial exception cannot be made eligible simply by limiting the exception to a particular technological use. See MPEP § 2106.05(h).
Therefore, claim 17 is ineligible.
Regarding Claim 18,
Step 2A Prong 1: Claim 18, which incorporates the rejection of claim 9, recites further limitations such as:
aggregate the feature vector with the other feature vectors of the graph, the aggregation performed using the edge weights between nodes of the graph. (This is an abstract idea of a mental process. The process of aggregating feature vectors using weights covers a concept that can be performed in the human mind with the aid of pen and paper. See MPEP § 2106.04(a)(2)(I) & (III).)
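For illustration only, aggregating a center node's feature vector with its neighbors' vectors using edge weights can be sketched as a weighted average; the names are hypothetical and not drawn from the record:

```python
def aggregate(center_vec, neighbor_vecs, edge_weights):
    """Combine a center node's feature vector with its neighbors' vectors,
    weighting each neighbor by its edge weight (a weighted average)."""
    total_w = 1.0 + sum(edge_weights)   # the center itself contributes weight 1
    agg = list(center_vec)
    for vec, w in zip(neighbor_vecs, edge_weights):
        for d in range(len(agg)):
            agg[d] += w * vec[d]        # accumulate each weighted neighbor vector
    return [x / total_w for x in agg]   # normalize by the total weight
```

The computation reduces to multiplications, additions, and a division per dimension, consistent with the pen-and-paper characterization above.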
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
The recitation of “a neural network that is trained to” perform the abstract idea of aggregating feature vectors amounts to no more than invoking a computer component as a tool to perform the abstract idea. Merely reciting the words "apply it" (or an equivalent) with the judicial exception, merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea does not integrate the abstract idea into a practical application, as discussed in MPEP § 2106.05(f).
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As explained above in Step 2A, Prong Two, the recitation of using generic computer software (i.e., a neural network) as a tool to perform the abstract idea (i.e., selecting and aggregating feature vectors using weights) does not provide an inventive concept.
Therefore, claim 18 is ineligible.
Regarding Claim 19,
Step 2A Prong 1: Claim 19, which incorporates the rejection of claim 9, recites further limitations such as:
wherein generating the graph comprises performing a fine-tuning process that selects nodes for inclusion into the graph ... select, for each iteration, a candidate nearest neighbor from either a first subset of the reference set of objects or a second subset of the reference set of objects, the first subset associated with the first classification and the second subset associated with the second classification. (This is an abstract idea of a mental process. The process of selecting candidate nearest neighbor nodes is a process that, under its broadest reasonable interpretation, covers concepts that can be performed in the human mind. See MPEP § 2106.04(a)(2)(III).)
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
The recitation of “using the neural network” to perform the abstract idea of selecting candidate neighbor nodes amounts to no more than invoking a computer component as a tool to perform the abstract idea. Merely reciting the words "apply it" (or an equivalent) with the judicial exception, merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea does not integrate the abstract idea into a practical application, as discussed in MPEP § 2106.05(f).
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As explained above in Step 2A, Prong Two, the use of a generic computer component (i.e., a neural network) as a tool to perform the abstract idea (i.e., selecting candidate nearest neighbor nodes) does not provide an inventive concept.
Therefore, claim 19 is ineligible.
Regarding Claim 20,
Step 2A Prong 1: Claim 20, which incorporates the rejection of claim 9, recites further limitations such as:
labeling the object using the classification determined by the graph discriminator; (This is an abstract idea of a mental process. The process of assigning a label based on a decision result covers a process that can be practically performed in the human mind. See MPEP § 2106.04(a)(2)(III).)
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
updating the reference set of objects to include a new reference object that corresponds to the labeled object. (This amounts to adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP § 2106.05(g). This represents a generic computer function (i.e., storing or retrieving the results).)
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As explained above in Step 2A, Prong Two, the updating step merely represents a generic computer function (i.e., outputting or storing the results). The courts have recognized computer functions such as “storing and retrieving information” as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. See MPEP § 2106.05(d).
Therefore, claim 20 is ineligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1-3 and 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al., (Pub. No.: US 20210067549 A1) in view of Dong et al., (NPL: "Efficient k-nearest neighbor graph construction for generic similarity measures." (2011)), and further in view of Cohen et al., (NPL: "Sub-image anomaly detection with deep pyramid correspondences." (May 2020)).
Regarding Claim 1,
Chen discloses the following:
A method for classifying an input sample as adversarial or benign, the method comprising: (Chen, [0020] “Embodiments of the present invention provide an effective adversarial defense system to protect against adversarial attacks. In particular, graph neural networks (GNNs) may be trained to automatically detect adversarial sample attacks, improving the robustness of the GNNs by making it possible to separate adversarial samples from meaningful samples.”)
storing a set of training samples that comprises a first set of benign training samples and a second set of adversarial training samples, each training sample having a known classification from a plurality of classifications, the second set of adversarial training samples being generated using an adversarial sample generation method; (Chen, [0030] “To train an adversarial sample detector on graph data, speculated adversarial instances may be included as part of the training data. ... To address this, the present embodiments may use a classifier that is built upon a large number of known-benign samples, and a smaller number of known-malicious samples.” [0034] “The classifier may thus be trained with a mixture of original samples and adversarial samples,” [0035] “Adversarial samples can be generated, ...” [0087] “The present embodiments can then classify the nodes into an anomalous class and a normal class, using the labeled historical data. The labeled network graph can then be used to identify anomalous behavior that may, for example, be indicative of a network failure or intrusion.”) [Examiner’s Note: Chen clearly obtains/stores both benign (normal/original) and adversarial (malicious/generated perturbed) samples. These samples (nodes) have known labels (benign or anomalous).]
obtaining, ..., a feature vector for each training sample of the first set and the second set; (Chen, [0076]-[0077] “Referring now to FIG. 9, the overall structure of the GNN is shown. The refined graph encoder 902 accepts graph samples (including original samples and perturbed adversarial samples) as an input. ... The refined graph encoder 902 is used on a variety of samples, including original samples and perturbed adversarial samples. The refined graph encoder 902 generates respective representations for the samples, and these representations are provided together to adversarial contrastive learning 920.” [0013] “FIG. 6 is a block/flow diagram directed to a graph encoder refining process to improve a model for generating representations of graph samples,”) [Examiner’s Note: Chen teaches encoding the original and adversarial samples to generate a graph-based representation. The encoded representation includes features of the training samples.]
determining a graph for ... a set of input samples, (Chen, [0077] “The refined graph encoder 902 is used on a variety of samples, including original samples and perturbed adversarial samples. The refined graph encoder 902 generates respective representations for the samples, and these representations are provided together to adversarial contrastive learning 920.” [0087] “Given a network's historical records, a sequence of communication graphs can be constructed, where each node is a computational device and each edge indicates a communication. ... The labeled network graph can then be used to identify anomalous behavior that may, for example, be indicative of a network failure or intrusion.” )
training, using each determined graph, a graph discriminator to differentiate between benign samples and adversarial samples, the training using (I) the feature vectors associated with nodes of the graph and (II) the edge weights between the nodes of the graph. (Chen, [0028] “ a GNN can be a neural network parameterized function that leverages both graph structure and node features to learn the representation of a node/graph.” [0035] “... given a graph g(0)=(X(0), A(0)), with node features X(0), corresponding node labels y(0) and A, a classifier f(0) can be trained to distinguish between benign and anomalous nodes. Adversarial samples can be generated, and a perturbed graph G1=(X(1), A(1)) can be constructed. The labels of an attacked node are changed from benign to anomaly. Based on the perturbed graph G1, a new classifier f(1) may be trained. After a number of iterations (e.g., up to a predetermined maximum), the final classifier is selected as the adversary-resistant classifier.” [0077] “A readout layer 922 obtains the graph embedding 924 of each of the input final representations from the respective samples. An MIE 926 jointly models local and global representations, based on both the input final representation and the graph embedding. The output of the MIE 926 is processed by discriminator to determine whether the unlabeled node v is normal or anomalous, thereby providing a node label 930.” Further see [0028] and [0042]-[0044].) [Examiner’s Note: Chen teaches training a classifier/discriminator using GNN (graph-based representation of node features and weighted edges) to distinguish between benign and adversarial/anomalous samples.]
While Chen teaches the method/system of an adversarial sample detection with graph adversarial training including determining a graph representation of a set of input samples (benign and adversarial) to train a classifier/discriminator to distinguish between benign samples and adversarial samples, Chen does not appear to explicitly teach the following:
obtaining, with a pre-trained classification model, a feature vector for each training sample of the first set and the second set;
determining a graph for each input sample of a set of input samples, a respective input sample corresponding to a center node of a set of nodes of the graph, wherein determining the graph comprises:
selecting, using a distance metric associated with the center node of the graph, neighbor nodes that neighbor the center node and are to be included in the set of nodes of the graph, each neighbor node labeled as either a benign training sample or an adversarial training sample of the set of training samples; and
determining an edge weight of an edge between a first node and a second node of the set of nodes of the graph based on a distance between respective feature vectors of the first node and the second node;
However, Chen in view of Dong teaches the following:
obtaining, ... , a feature vector for each training sample of the first set and the second set; (Dong, [Pp. 4-5, Section: 3)] “Corel: This dataset contains features extracted from 66,000 Corel stock images. Each image is segmented into about 10 regions, and a feature is extracted from each region. We treat region features as individual objects. ... A feature is extracted from each model.”) [Examiner’s Note: Dong teaches extracting feature vectors from input samples (images, audio, shapes) and using these feature vectors as the basis for k-NN graph construction.]
determining a graph for each input sample of a set of input samples, a respective input sample corresponding to a center node of a set of nodes of the graph, wherein determining the graph comprises: (Dong, [P. 1, Section: 1] “The K-Nearest Neighbor Graph (K-NNG) for a set of objects V is a directed graph with vertex set V and an edge from each v ∈ V to its K most similar objects in V under a given similarity measure, e.g. cosine similarity for text, l2 distance of color histograms for images, etc.” [P. 2, Section: 2] “Let V be a dataset of size N = |V|, and let σ : V × V → R be a similarity measure. For each v ∈ V, let BK(v) be v’s K-NN, i.e. the K objects in V (other than v) most similar to v. Let RK(v) = {u ∈ V | v ∈ BK(u)} be v’s reverse K-NN. In the algorithm, we use B[v] and R[v] to store the approximation of BK(v) and RK(v), together with the similarity values, and let B[v] = B[v] ∪ R[v], referred to as the general neighbors of v. B[v] is organized as a heap, so updates cost O(log K). ... Our method is based on the following simple principle: a neighbor of a neighbor is also likely to be a neighbor. In other words, if we have an approximation of the K-NN for each point, then we can improve that approximation by exploring each point’s neighbors’ neighbors as defined by the current approximation.”) [Examiner’s Note: Dong teaches constructing a K-NN graph in which each object is connected to its K most similar objects, found by exploring its neighbors’ neighbors under the given distance measure. Thus, each point v is processed as a center with its own k-NN neighborhood structure.]
selecting, using a distance metric associated with the center node of the graph, neighbor nodes that neighbor the center node and are to be included in the set of nodes of the graph, (Dong, [P. 1, Section: 1] “The K-Nearest Neighbor Graph (K-NNG) for a set of objects V is a directed graph with vertex set V and an edge from each v ∈ V to its K most similar objects in V under a given similarity measure, e.g. cosine similarity for text, l2 distance of color histograms for images, etc.” [P. 2, Sections: 2.1 and 2.2] “Let V be a dataset of size N = |V|, and let σ : V × V → R be a similarity measure. For each v ∈ V, let BK(v) be v’s K-NN, i.e. the K objects in V (other than v) most similar to v. Let RK(v) = {u ∈ V | v ∈ BK(u)} be v’s reverse K-NN. In the algorithm, we use B[v] and R[v] to store the approximation of BK(v) and RK(v), together with the similarity values, and let B[v] = B[v] ∪ R[v], referred to as the general neighbors of v. B[v] is organized as a heap, so updates cost O(log K). We are particularly interested in the case when V is a metric space with a distance metric d. ... Our method is based on the following simple principle: a neighbor of a neighbor is also likely to be a neighbor. In other words, if we have an approximation of the K-NN for each point, then we can improve that approximation by exploring each point’s neighbors’ neighbors as defined by the current approximation. ... In other words, we expect to halve the maximal distance to the set of approximate K nearest neighbors by exploring B′[v] for every v ∈ V. ... The approximate K-NNG can be viewed as K × N functions, each being the distance between one of the N objects and its k-th NN.” [P. 4, Section: 2.8] “Our algorithm can be easily implemented under the MapReduce framework. A record consists of a key object and a list of (candidate) neighbors each attached with its distances to the key object.”) [Examiner’s Note: Dong teaches the selection process by performing a local search to find and store the K candidate neighbor nodes (most similar neighbors) via a heap. The process is performed for a center node v: explore candidates via local search, compute the distance/similarity, and update the NN list, i.e., whether a candidate u2 is included in B[v] depends on whether its distance l makes it one of the top K (UpdateNN(B[v], u2, l)).]

each neighbor node labeled as either a benign training sample or an adversarial training sample of the set of training samples; (Chen, [0035] “Adversarial samples can be generated, and a perturbed graph G1=(X(1), A(1)) can be constructed. The labels of an attacked node are changed from benign to anomaly.” [0084] “Each node 1104 in the network 1100 includes one or more attributes or labels. These labels identify some characteristic of the node 1104. For example, in a complex molecule, individual atoms and ions may be labeled as contributing to a pharmacological effect of the molecule, with some nodes 1104 being labeled as contributing, and other nodes 1104 being labeled as not contributing. In a computer network environment, the nodes 1104 may be labeled according to roles within the network (e.g., server vs workstation), or according to conformance to expected behavior (e.g., normal vs anomalous). The labels may include, for example, an attribute vector, denoting multiple attributes of the respective node 1104.”)
determining an edge weight of an edge between a first node and a second node of the set of nodes of the graph based on a distance between respective feature vectors of the first node and the second node; (Dong, [P. 1, Section: 1] “The K-Nearest Neighbor Graph (K-NNG) for a set of objects V is a directed graph with vertex set V and an edge from each v ∈ V to its K most similar objects in V under a given similarity measure, e.g. cosine similarity for text, l2 distance of color histograms for images, etc.” [P. 5, Section: 3.1.3] “The Earth Mover’s Distance (EMD) [20] is a measure of distance between two weighted sets of feature vectors and has been shown effective for content-based image retrieval [16]. Let P = {⟨wi, vi⟩} and Q = {⟨wj, vj⟩} be two sets of features with normalized weights ∑i wi = ∑j wj = 1, and let d be the feature distance. ... Evaluating EMD is costly as it involves solving a linear programming. We use EMD to show the generality of our method. We set the weights of the feature vectors to be proportional to the number of pixels in their corresponding image regions. Following the original paper [16], we use l1 as feature distance.”)
Accordingly, at the effective filing date, it would have been prima facie obvious to one ordinarily skilled in the art of machine learning to modify Chen to incorporate the K-NNG graph construction method as taught by Dong. One would have been motivated to make such a combination in order to obtain an efficient method for approximate K-Nearest Neighbor graph construction with arbitrary similarity measures, thereby enabling fast and accurate analysis of large-scale datasets (Dong [Conclusion]).
While Chen in view of Dong teaches extracting/encoding original and adversarial samples to generate respective original and adversarial graph representations, Chen in view of Dong does not appear to explicitly teach:
obtaining, with a pre-trained classification model, a feature vector for each training sample of the first set and the second set;
However, Cohen, in combination with Chen and Dong, teaches the limitation:
obtaining, with a pre-trained classification model, a feature vector for each training sample of the first set and the second set; (Cohen, [P. 5, Sections: 3.1 & 3.2] “We therefore used a ResNet feature extractor pre-trained on the ImageNet dataset. As image-level features we used the feature vector obtained ... For a given test image y, we retrieve its K nearest normal images from the training set, Nk(fy). The distance is measured using the Euclidean metric between the image-level feature representations. ... Images are labelled at this stage as normal or anomalous. Positive classification is determined by verifying if the kNN distance is larger than a threshold τ. It is expected that most images are normal, and only few images are designated as anomalous. Let us denote the global feature extractor F; for a given image xi, we denote the extracted features fi: fi = F(xi) (1).” [Pp. 5-6, Section: 3.3] “We extract deep features at every pixel location p ∈ P using feature extractor F(xi, p) of the relevant test and normal training images. The details of the feature extractor will be described in Sec. 3.4. We construct a gallery of features at all pixel locations of the K nearest neighbors G = {F(x1, p)|p ∈ P} ∪ {F(x2, p)|p ∈ P} ∪ ... ∪ {F(xK, p)|p ∈ P}. The anomaly score at pixel p is given by the average distance between the features F(y, p) and its κ nearest features from the gallery G. ... For a given threshold θ, a pixel is determined as anomalous if d(y, p) > θ, that is, if we cannot find a closely corresponding pixel in the K nearest neighbor normal images.”)
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Chen, Dong, and Cohen before them, to incorporate the pre-trained feature extraction method for anomaly detection as taught by Cohen. One would have been motivated to make such a combination in order to improve and simplify anomaly detection by using the K nearest neighbors of pixel-level feature pyramids extracted by pre-trained deep features. Doing so would achieve high accuracy and reasonable computational complexity (Cohen [Conclusion]).
Regarding Claim 2, the combination of Chen, Dong, and Cohen teaches the elements of claim 1 as outlined above, and further teaches:
wherein the graph comprises an embedding matrix and an adjacency matrix, wherein the embedding matrix comprises a plurality of feature vectors for the set of nodes of the graph, a feature vector corresponding to an embedding of the embedding matrix, and wherein the adjacency matrix comprises edge weights for edges that connect the nodes of the graph. (Chen, [0035], [0037] “Given a graph g(0)=(X(0), A(0)), with node features X(0), corresponding node labels y(0) and A, a classifier f(0) can be trained to distinguish between benign and anomalous nodes. Adversarial samples can be generated, and a perturbed graph G1=(X(1), A(1)) can be constructed. ... the attack direction for a target node t includes the adjacency matrix A and the feature matrix X. The integrated gradients can be determined for the priority function J(A, X, t) for a node t, with respect to an input I. For a feature attack, then, I=X, and for a structure attack, I=A.” [0044]-[0045] “Global-level representations, such as graph embedding, are less sensitive to adversarial perturbation than local-level representations, such as node embedding. As a result, the high-level graph representation can be used to guide the robust learning of local node representations ... A readout function can be used to obtain the graph embedding. Given the node representations {hv (final)}vϵV of graph G, ... where σ represents the sigmoid function, Wr represents a trainable linear mapping matrix, and n represents the number of nodes within the graph.”)
Regarding Claim 3, the combination of Chen, Dong, and Cohen teaches the elements of claim 1 as outlined above, and further teaches:
wherein selecting the neighbor nodes for inclusion within the graph comprises generating a k-nearest neighbor graph that determines the neighbor nodes as being nearest neighbors from the center node, the distance metric being associated with parameters for generating the k-nearest neighbor graph. (Dong, [p. 1, Section: 1] “The K-Nearest Neighbor Graph (K-NNG) for a set of objects V is a directed graph with vertex set V and an edge from each v ∈ V to its K most similar objects in V under a given similarity measure, e.g. cosine similarity for text, l2 distance of color histograms for images, etc.” [Pp. 2-3, Section: 2.2] “Let V be a dataset of size N = |V |, and let σ : V ×V → R be a similarity measure. For each v ∈ V , let BK(v) be v’s K-NN, i.e. the K objects in V (other than v) most similar to v. Let RK(v) = {u ∈ V | v ∈ BK(u)} be v’s reverse K-NN. In the algorithm, we use B[v] and R[v] to store the approximation of BK(v) and RK(v), together with the similarity values, and let B[v] = B[v]∪ R[v], referred to as the general neighbors of v. B[v] is organized as a heap, so updates cost O(log K). We are particularly interested in the case when V is a metric space with a distance metric d .... Our method is based on the following simple principle: a neighbor of a neighbor is also likely to be a neighbor. In other words, if we have an approximation of the K-NN for each point, then we can improve that approximation by exploring each point’s neighbors’ neighbors as defined by the current approximation. ... In other words, we expect to halve the maximal distance to the set of approximate K nearest neighbors by exploring B ′ [v] for every v ∈ V . ... Our basic NN-Descent algorithm, as shown in Algorithm 1, is just a repeated application of this observation. 
We start by picking a random approximation of K-NN for each object, iteratively improve that approximation by comparing each object against its current neighbors’ neighbors, including both K-NN and reverse K-NN, and stop when no improvement can be made. The approximate K-NNG can be viewed as K × N functions, each being the distance between one of the N objects and its k-th NN.”) [Examiner’s Note: Algorithm 1 produces, as its result, the K-NN list B of candidate neighbor nodes that are most similar to node v; the associated parameters are broadly interpreted as the parameters of the k-nearest neighbor algorithm.]
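Purely for illustration (this sketch is the examiner's own and is not drawn from Dong's disclosure; the point set, metric, and K below are hypothetical), the NN-Descent principle quoted above can be expressed as:

```python
import math
import random

def nn_descent(points, K, iters=10):
    """Toy NN-Descent sketch: start from a random K-NN approximation B[v]
    and repeatedly improve it by comparing each point against its current
    neighbors' neighbors (including reverse neighbors), per Dong's principle
    that 'a neighbor of a neighbor is also likely to be a neighbor'."""
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    # Random initial approximation: K candidate neighbors per point.
    B = {v: sorted(random.sample([u for u in range(n) if u != v], K),
                   key=lambda u: dist(v, u)) for v in range(n)}
    for _ in range(iters):
        updated = False
        for v in range(n):
            R = [u for u in range(n) if v in B[u]]      # reverse K-NN of v
            candidates = set()
            for u in set(B[v]) | set(R):                # general neighbors of v
                candidates.update(B[u])                 # neighbors' neighbors
            candidates.discard(v)
            best = sorted(candidates | set(B[v]), key=lambda u: dist(v, u))[:K]
            if best != B[v]:
                B[v], updated = best, True
        if not updated:                                 # no improvement made
            break
    return B
```

The returned mapping B corresponds to the K-NN list of candidate neighbor nodes referenced in the Examiner's Note above.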
Regarding Claim 7, the combination of Chen, Dong, and Cohen teaches the elements of claim 1 as outlined above, and further teaches:
wherein a particular input sample of the input samples is associated with a ground truth label, and wherein the graph discriminator comprises a neural network that outputs a class prediction of whether the particular input sample is benign or adversarial, and wherein the neural network is trained by minimizing a loss between the class prediction of the graph discriminator and the ground truth label. (Chen, [0034]-[0035] “The classifier may thus be trained with a mixture of original samples and adversarial samples, with an overall loss function that may be defined as: ... where L and l measure dataset-level and instance-level classification errors, respectively, where β controls trade-off between original samples and adversarial samples. The label in the second term is fixed as 1, since these refer to the anomalies. The generated adversarial samples are expressed as v* in a set of such samples V*. ... In some embodiments, given a graph g(0)=(X(0), A(0)), with node features X(0), corresponding node labels y(0) and A, a classifier f(0) can be trained to distinguish between benign and anomalous nodes. Adversarial samples can be generated, and a perturbed graph G1=(X(1), A(1)) can be constructed. The labels of an attacked node are changed from benign to anomaly. Based on the perturbed graph G1, a new classifier f(1) may be trained. After a number of iterations (e.g., up to a predetermined maximum), the final classifier is selected as the adversary-resistant classifier.” [0077] “The refined graph encoder 902 is used on a variety of samples, including original samples and perturbed adversarial samples. The refined graph encoder 902 generates respective representations for the samples, and these representations are provided together to adversarial contrastive learning 920. A readout layer 922 obtains the graph embedding 924 of each of the input final representations from the respective samples. 
An MIE 926 jointly models local and global representations, based on both the input final representation and the graph embedding. The output of the MIE 926 is processed by discriminator to determine whether the unlabeled node v is normal or anomalous, thereby providing a node label 930.” Further see [0068].)
Regarding Claim 8, the combination of Chen, Dong, and Cohen teaches the elements of claim 7 as outlined above, and further teaches:
wherein the graph discriminator receives as input, for a given training iteration, an adjacency matrix and an embedding matrix that are associated with the particular input sample. (Chen, [0035] “given a graph g(0)=(X(0), A(0)), with node features X(0), corresponding node labels y(0) and A, a classifier f(0) can be trained to distinguish between benign and anomalous nodes. Adversarial samples can be generated, and a perturbed graph G1=(X(1), A(1)) can be constructed. The labels of an attacked node are changed from benign to anomaly. Based on the perturbed graph G1, a new classifier f(1) may be trained. After a number of iterations (e.g., up to a predetermined maximum), the final classifier is selected as the adversary-resistant classifier.” [0046] “For the discriminator, a mutual information estimator (MIE) can be used to jointly model the local and global graph representations via a bilinear scoring function: S(h v (final) |h G)=MIE(h v (final) , h G)=σ(h v (final) W D h G) where MIE(hv (K), hG) represents the mutual information between the node embedding and the graph embedding, WD is a trainable scoring matrix, K is a number of hops, and σ is the sigmoid function.” [0077] “A readout layer 922 obtains the graph embedding 924 of each of the input final representations from the respective samples. An MIE 926 jointly models local and global representations, based on both the input final representation and the graph embedding. The output of the MIE 926 is processed by discriminator to determine whether the unlabeled node v is normal or anomalous, thereby providing a node label 930.”)
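For illustration only, the bilinear scoring function S(h_v(final), h_G) = σ(h_v(final) W_D h_G) quoted from Chen above may be sketched as follows (the examiner's simplified sketch; the vector dimensions used below are hypothetical):

```python
import numpy as np

def mie_score(h_v, h_G, W_D):
    """Bilinear mutual-information estimator per Chen's quoted formula:
    MIE(h_v, h_G) = sigmoid(h_v @ W_D @ h_G), where W_D is a trainable
    scoring matrix relating a node embedding h_v to the graph embedding h_G."""
    return 1.0 / (1.0 + np.exp(-(h_v @ W_D @ h_G)))
```

A score near 1 indicates high estimated mutual information between the local (node) and global (graph) representations.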
Claim(s) 4-6 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Chen, Dong, and Cohen as described above, and further in view of Bhowan et al., (Pub. No.: US 20170286866 A1); claim 6 is rejected further in view of Fatemi et al.
Regarding Claim 4, the combination of Chen, Dong, and Cohen teaches the elements of claim 1 as outlined above.
While the combination of Chen, Dong, and Cohen teaches determining the edge weight between nodes of the graph using the distance, and Dong further describes that a smaller distance means a higher similarity between nodes, the combination of Chen, Dong, and Cohen does not appear to explicitly suggest:
wherein the edge weight between the first node and the second node of the graph is inversely correlated with the distance between the respective feature vectors of the first node and the second node.
However, Bhowan, in combination with Chen, Dong, and Cohen, teaches:
wherein the edge weight between the first node and the second node of the graph is inversely correlated with the distance between the respective feature vectors of the first node and the second node. (Bhowan, [0062] “In particular, the edge weight represents a relationship between the pair of features mapped to the two nodes connected by the edge. In example implementations, the edge weight represents a normalized distance between the nodes, which represents a measure of a relationship between the feature pair such as the inter-dependence of the feature pair. In example implementations, the edge weight for each edge may be calculated using a predetermined formula. The predetermined formula may be based on an inverse of the ranking position difference between the feature pair. Examples of a suitable predetermined formula are discussed in more detail below. ... determine an edge weight for an edge between a pair of nodes representing a feature pair based on a formula in which the edge weight is inversely proportional to the distance between the feature pair.”)
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Chen, Dong, and Cohen before them, to incorporate the method/system for generating the merged feature graph as taught by Bhowan. One would have been motivated to make such a combination in order to refine or further develop the model. Doing so would lead to a model with improved performance, e.g., in terms of accuracy, evaluation speed, etc. (Bhowan [0009]).
Regarding Claim 5, the combination of Chen, Dong, and Cohen teaches the elements of claim 1 as outlined above.
While the combination of Chen, Dong, and Cohen describes a nonlinear mapping function for representing the weight matrix and nodes within the graph, and describes the EMD distance measure between weighted sets of feature vectors, the combination of Chen, Dong, and Cohen does not appear to explicitly teach:
wherein the edge weight is determined based on a function that maps the distance between the respective feature vectors of the first node and the second node from a first space to a second space.
However, Bhowan, in combination with Chen, Dong, and Cohen, teaches:
wherein the edge weight is determined based on a function that maps the distance between the respective feature vectors of the first node and the second node from a first space to a second space. (Bhowan, [0062] “In particular, the edge weight represents a relationship between the pair of features mapped to the two nodes connected by the edge. In example implementations, the edge weight represents a normalized distance between the nodes, which represents a measure of a relationship between the feature pair such as the inter-dependence of the feature pair. In example implementations, the edge weight for each edge may be calculated using a predetermined formula. The predetermined formula may be based on an inverse of the ranking position difference between the feature pair. Examples of a suitable predetermined formula are discussed in more detail below. ... For each edge, the last column in Tables 1A and 1B shows the edge weight or strength value calculated using the following formula: Edge Weight=[1/(1+Pos. Diff)]*[W*ê(−Decay/5)] (equation 1). ” [0067] “ the edge weight for an edge between nodes representing features that are q positions apart, ... may be inversely proportional to q.”)
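For illustration only, Bhowan's equation 1 as quoted above can be sketched as follows (the examiner's sketch; the W and Decay values used below are hypothetical placeholders, not values from Bhowan):

```python
import math

def edge_weight(pos_diff, W=1.0, decay=0.0):
    """Bhowan's equation 1: Edge Weight = [1/(1+Pos. Diff)] * [W * e^(-Decay/5)].
    The weight is inversely proportional to the ranking-position difference
    (a distance in ranking space), mapped through an exponential decay factor."""
    return (1.0 / (1.0 + pos_diff)) * (W * math.exp(-decay / 5.0))
```

As the position difference (distance) grows, the edge weight falls, consistent with the inverse correlation addressed in claims 4 and 5.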
The same motivation that was utilized for combining Chen, Dong, Cohen, and Bhowan as set forth in claim 4 is equally applicable to claim 5.
Regarding Claim 6, the combination of Chen, Dong, Cohen, and Bhowan teaches the elements of claim 5 as outlined above.
While Chen teaches training the graph discriminator using the generated graph representation, the combination of Chen, Dong, Cohen, and Bhowan does not appear to explicitly teach:
determining parameters for the function, the parameters determined using each of the input samples.
However, Fatemi, in combination with Chen, Dong, Cohen, and Bhowan, teaches the limitation:
wherein the training the graph discriminator further comprises: determining parameters for the function, the parameters determined using each of the input samples. (Fatemi, [0075] “generator 142, θG corresponds to the weights of a multi-layer perceptron (MLP) and Ã=GMLP(X; θG)=kNN(MLP(X)), where node features 120 are input to generator 142, and the k value specifies what k to use for k nearest neighbor. In some embodiments, MLP: is an MLP that produces a matrix with updated node features X′ and kNN: produces a sparse matrix à based on X′ where Ãij=sim(X′i,X′j) if vi is among the top k similar nodes to vi based on a similarity function sim; otherwise Ãij=0. Recall that vi and vj correspond to the ith and jth nodes in the graph respectively.” [0085] “classifier 146 can be implemented using a function GNNC: .. with parameters θGNN C . GNN C 146 takes the node features 120 as represented by X and the normalized adjacency matrix A as input, and provides, for each node, the logits for each class 147.” [0106] “The parameters of generator 142 are thus trained to generate adjacency that is a good adjacency for both of generator 142 and classifier 146.”) [Examiner’s Note: Fatemi describes jointly training the Generator and classifier (i.e., discriminator) to determine the graph structure for training the classifier.]
Accordingly, at the effective filing date, it would have been prima facie obvious to one ordinarily skilled in the art of machine learning to modify the combination of Chen, Dong, Cohen, and Bhowan to incorporate the method for structure learning for graph neural networks as taught by Fatemi. One would have been motivated to make such a combination in order to solve a supervision starvation problem in latent graph learning. Doing so would improve the performance of graph structure learning (Fatemi [0056]).
Claim(s) 9-10 and 12-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al., (Pub. No.: US 20210067549 A1) in view of Dou et al., (IDS: “Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters.” (August 2020)), and further in view of Liu et al., (NPL: “Learning to propagate labels: Transductive propagation network for few-shot learning.” (2019)).
Regarding Claim 9, Chen discloses the following:
A method of using a machine learning model, the method comprising: (Chen, [0017] “FIG. 10 is a block diagram of an intrusion detection system that uses a graph neural network intrusion detection model, trained using adversarial samples, to identify and respond to anomalous network nodes in a manner that is resistant to adversarial training attacks;” [0080] “A network monitor 1006 collects information from agents at various hosts using the network interface 1005. This can include historical event information, including process-level events and network events, as well as real-time event information. A GNN-based intrusion detection model 1010, according to the present embodiments, determines whether a particular host, system, or agent on the network is behaving in an anomalous fashion, using inputs from the network monitor 1006.”)
receiving sample data of an object to be classified; (Chen, [] “Given a sequence of attributed graphs 1100 for a set of nodes 1104, where each node 1104 has a unique class label over a period of time, the present embodiments predict the labels of unlabeled nodes 1102 by learning from the labeled ones. ... The present embodiments can then classify the nodes into an anomalous class and a normal class, using the labeled historical data. The labeled network graph can then be used to identify anomalous behavior that may, for example, be indicative of a network failure or intrusion. The anomalous behavior can then be corrected.” [0030] “To train an adversarial sample detector on graph data, speculated adversarial instances may be included as part of the training data. However, these samples may be limited in number, and hard to obtain, due to the unknown discrete adversarial space. To address this, the present embodiments may use a classifier that is built upon a large number of known-benign samples, and a smaller number of known-malicious samples.” [0034].)
executing, using the sample data, a classification model to obtain a feature vector, (Chen, [Abstract] “The original and adversarial samples are encoded to generate respective original and adversarial graph representations, based on node neighborhood aggregation.” [0077] “The refined graph encoder 902 is used on a variety of samples, including original samples and perturbed adversarial samples. The refined graph encoder 902 generates respective representations for the samples, and these representations are provided together to adversarial contrastive learning 920.”)
.... and
applying a graph discriminator to the graph to determine whether the sample data of the object is to be classified with the first classification or the second classification, the graph discriminator trained using (I) the feature vectors associated with nodes of the graph and (II) the edge weights between the nodes of the graph. (Chen, [0034]-[0035] “The classifier may thus be trained with a mixture of original samples and adversarial samples, with an overall loss function that may be defined as:{(L)}=βL(f GNN , G, y)+(1−β)Σv*ϵV* l(f GNN , v*, 1) where L and l measure dataset-level and instance-level classification errors, respectively, where β controls trade-off between original samples and adversarial samples. ... given a graph g(0)=(X(0), A(0)), with node features X(0), corresponding node labels y(0) and A, a classifier f(0) can be trained to distinguish between benign and anomalous nodes. Adversarial samples can be generated, and a perturbed graph G1=(X(1), A(1)) can be constructed. ... Based on the perturbed graph G1, a new classifier f(1) may be trained. After a number of iterations (e.g., up to a predetermined maximum), the final classifier is selected as the adversary-resistant classifier.” [0044] “A readout function can be used to obtain the graph embedding. Given the node representations {hv (final)}vϵV of graph G, the readout function ... where σ represents the sigmoid function, Wr represents a trainable linear mapping matrix, and n represents the number of nodes within the graph.” [0077] “The refined graph encoder 902 is used on a variety of samples, including original samples and perturbed adversarial samples. The refined graph encoder 902 generates respective representations for the samples, and these representations are provided together to adversarial contrastive learning 920. A readout layer 922 obtains the graph embedding 924 of each of the input final representations from the respective samples. 
An MIE 926 jointly models local and global representations, based on both the input final representation and the graph embedding. The output of the MIE 926 is processed by discriminator to determine whether the unlabeled node v is normal or anomalous, thereby providing a node label 930.”) [Examiner’s Note: Chen teaches the trained classifier/discriminator applied to graph embedding learned using node features and edge weights to distinguish between original/normal and adversarial/anomalous object.]
Chen does not appear to explicitly teach the following:
executing, using the sample data, a classification model to obtain a feature vector, the classification model trained to assign a classification of a plurality of classifications to the sample data, the plurality of classifications including a first classification and a second classification;
generating a graph using the feature vector and other feature vectors that are respectively obtained from a reference set of objects, the reference set of objects respectively labeled with the first classification or the second classification, the feature vector for the object corresponding to a center node of a set of nodes of the graph, wherein determining the graph comprises:
selecting, using a distance metric associated with the center node of the graph, neighbor nodes that neighbor the center node and are to be included in the set of nodes of the graph, each neighbor node corresponding to an object of the reference set of objects and having the first classification or the second classification; and
determining an edge weight of an edge between a first node and a second node of the set of nodes of the graph based on a distance between respective feature vectors of the first node and the second node;
However, Chen in view of Dou teaches the following:
obtain a feature vector, ... of a plurality of classifications to the sample data, the plurality of classifications including a first classification and a second classification; (Dou, [p. 2, Section: 2] “Definition 2.1. Multi-relation Graph. We define a multi-relation graph as G = V, X, {Er }|R r=1 ,Y , where V is the set of nodes {v1, . . . ,vn }. Each node vi has a d-dimensional feature vector xi ∈ R d and X = {x1, . . . , xn } represents a set of all node features. e r i,j = (vi ,vj) ∈ Er is an edge between vi and vj with a relation r ∈ {1, · · · , R}. Note that an edge can be associated with multiple relations and there are R different types of relations. Y is the a set of labels for each node in V. ... the node v represents the target entity whose suspiciousness needs to be justified. For example, it can be a review on the review website [19, 29] or a transaction in the trading system [23, 37]. The node has a label yv ∈ {0, 1} ∈ Y where 0 represents benign and 1 represents suspicious.”) [Examiner’s Note: the embeddings feature vector obtained from the input sample x and Y is the label for each entity.]
generating a graph using the feature vector and other feature vectors that are respectively obtained from a reference set of objects, the reference set of objects respectively labeled with the first classification or the second classification, the feature vector for the object corresponding to a center node of a set of nodes of the graph, (Dou, [p. 1, Section: 1] “To demonstrate the challenges induced by camouflaged fraudsters during the neighbor aggregation of GNNs, as shown in Figure 1, we construct a graph with two relations and two types of entities. ... we propose three neural modules to enhance the GNNs against the camouflages. 1) For the feature camouflage, we propose a label-aware similarity measure to find the most similar neighbors based on node features. Specifically, we design a neural classifier as a similarity measure, which is directly optimized according to experts with domain knowledge (i.e., annotated data). 2) For the relation camouflage, we devise a similarity-aware neighbor selector to select the similar neighbors of a center node within a relation.” [Pp. 2-3, Section: 2] “Definition 2.1. Multi-relation Graph. We define a multi-relation graph as G = V, X, {Er }|R r=1 ,Y , where V is the set of nodes {v1, . . . ,vn }. Each node vi has a d-dimensional feature vector xi ∈ R d and X = {x1, . . . , xn } represents a set of all node features. e r i,j = (vi ,vj) ∈ Er is an edge between vi and vj with a relation r ∈ {1, · · · , R}. Note that an edge can be associated with multiple relations and there are R different types of relations. Y is the a set of labels for each node in V. Definition 2.2. Fraud Detection on Graph. For the fraud detection problem, the node v represents the target entity whose suspiciousness needs to be justified. For example, it can be a review on the review website [19, 29] or a transaction in the trading system [23, 37]. The node has a label yv ∈ {0, 1} ∈ Y where 0 represents benign and 1 represents suspicious. ... 
Definition 2.3. GNN-based Fraud Detection. A Graph Neural Network (GNN) is a deep learning framework to embed graph-structured data via aggregating the information from its neighboring nodes [12, 17, 34]. ... For a center node v, h (l) v is the hidden embedding at l-th layer and h (0) v = xi is the input feature. E (l) r denotes edges under relation r at the l-th layer. h (l−1) v′ ,r is the embedding of neighboring node v ′ under relation r. ... For fraud detection problems, we first construct a multi-relation graph based on domain knowledge. Then, the GNN is trained with partially labeled nodes supervised by binary classification loss functions.”)
wherein generating the graph comprises: selecting, using a distance metric associated with the center node of the graph, neighbor nodes that neighbor the center node and are to be included in the set of nodes of the graph, each neighbor node corresponding to an object of the reference set of objects and having the first classification or the second classification; (Dou, [Pp. 3-4, Sections: 3.2 & 3.3] “Label-aware Similarity Measure. we employ a one-layer MLP as the node label predictor at each layer and use the l1-distance between the prediction results of two nodes as their similarity measure. For a center node v under relation r at the l-th layer and edge (v,v ′ ) ∈ E(l−1) r , the distance between v and v ′ is the l1-distance of two embeddings: .... and we can define the similarity measure as: S (l) (v,v ′ ) = 1 − D(l) (v,v ′ ), (3) where each layer has its own similarity measure. The input of MLP at the l-th layer is the node embedding at the previous layer, and the output of MLP is a scalar which is then fed into a nonlinear activation function σ (we use tanh in our work). ... Given the similarity scores between the center node and its neighbors with Eq. (3), we should select similar neighbors (i.e., filter camouflaged ones) to improve the capability of GNNs. ... We should devise an adaptive filtering/sampling criteria to select an optimal amount of similar neighbors automatically. Thus, we design a similarity-aware neighbor selector. It selects similar neighbors under each relation using top-p sampling with an adaptive filtering threshold. ... Specifically, during the training phase, for a node v in current batch under relation r, we first compute a set of similarity scores {S (l) (v,v ′ )} using Eq. (3) at the l-th layer where (v,v ′ ) ∈ E(l) r . E (l) r is a set of edges under relation r at the l-th layer. 
Then we rank its neighbors based on {S (l) (v,v ′ )} in descending order and take the first p (l) r · |{S (l) (v,v ′ )}| neighbors as the selected neighbors at the l-th layer.”) [Examiner’s Note: the similarity distance is computed between center node and neighbors node v’, and the neighbor selector uses the similarity measure to select candidate nodes. Each node in the graph either represents benign or fraudulent] and
determining an edge weight of an edge between a first node and a second node of the set of nodes of the graph based on a distance between respective feature vectors of the first node and the second node; (Dou, [P. 3, Section: 2] “For a center node v, h (l) v is the hidden embedding at l-th layer and h (0) v = xi is the input feature. E (l) r denotes edges under relation r at the l-th layer. h (l−1) v′ ,r is the embedding of neighboring node v ′ under relation r.” [] “For a center node v under relation r at the l-th layer and edge (v,v ′ ) ∈ E(l−1) r , the distance between v and v ′ is the l1-distance of two embeddings ... “ [p. 4, Section: 3.3] “Specifically, during the training phase, for a node v in current batch under relation r, we first compute a set of similarity scores {S (l) (v,v ′ )} using Eq. (3) at the l-th layer where (v,v ′ ) ∈ E(l) r . E (l) r is a set of edges under relation r at the l-th layer. Then we rank its neighbors based on {S (l) (v,v ′ )} in descending order and take the first p (l) r · |{S (l) (v,v ′ )}| neighbors as the selected neighbors at the l-th layer.”) [Examiner’s Note: an edge weight is determined based on the similarity distance measure between two embeddings/nodes.]
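For illustration only, Dou's label-aware similarity (Eq. (3)) and top-p neighbor selection can be sketched as follows (the examiner's simplified sketch, treating the label predictor's output as a precomputed scalar score per node; the node names and scores below are hypothetical):

```python
def label_aware_similarity(score_v, score_u):
    """Dou's Eq. (3) in scalar form: S(v, v') = 1 - D(v, v'), where D is the
    l1-distance between the label-predictor outputs of the two nodes."""
    return 1.0 - abs(score_v - score_u)

def select_neighbors(center_score, neighbor_scores, p):
    """Similarity-aware neighbor selector (Dou, Section 3.3): rank neighbors
    by similarity to the center node in descending order and keep the first
    p fraction; the remainder are discarded from aggregation."""
    ranked = sorted(neighbor_scores,
                    key=lambda ns: label_aware_similarity(center_score, ns[1]),
                    reverse=True)
    return [node for node, _ in ranked[:int(p * len(ranked))]]
```

With p = 0.5 and four candidate neighbors, for example, only the two neighbors most similar to the center node survive for aggregation.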
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention to modify the system/method of Chen to incorporate the proposed CARE-GNN framework as taught by Dou. One would have been motivated to make such a combination in order to enhance the GNN aggregation process with three unique modules against camouflages. Doing so would improve the GNN performance and reduce model complexity (Dou [Conclusion]).
Chen in view of Dou does not appear to explicitly teach:
executing, using the sample data, a classification model to obtain a feature vector, the classification model trained to assign a classification of a plurality of classifications to the sample data,
However, Liu, in combination with Chen and Dou, teaches the following:
executing, using the sample data, a classification model to obtain a feature vector, the classification model trained to assign a classification of a plurality of classifications to the sample data, the plurality of classifications including a first classification and a second classification; (Liu, [p. 4, Section: 3.1] “Given a relatively large labeled dataset with a set of classes Ctrain, the objective of this setting is to train classifiers for an unseen set of novel classes Ctest, for which only a few labeled examples are available. Specifically, in each episode, a small subset of N classes are sampled from Ctrain to construct a support set and a query set.” [pp. 4-5, Section: 3.2.1] “We employ a convolutional neural network fϕ to extract features of an input xi , where fϕ(xi ; ϕ) refers to the feature map and ϕ indicates a parameter of the network. Despite the generality, we adopt the same architecture used in several recent works (Snell et al., 2017; Sung et al., 2018; Vinyals et al., 2016). ... We use the same embedding function fϕ for both the support set S and the query set Q.”)
generating a graph using the feature vector and other feature vectors that are respectively obtained from a reference set of objects, the reference set of objects respectively labeled with the first classification or the second classification, the feature vector for the object corresponding to a center node of a set of nodes of the graph, wherein determining the graph comprises: selecting, using a distance metric associated with the center node of the graph, neighbor nodes that neighbor the center node and are to be included in the set of nodes of the graph, each neighbor node corresponding to an object of the reference set of objects and having the first classification or the second classification; and determining an edge weight of an edge between a first node and a second node of the set of nodes of the graph based on a distance between respective feature vectors of the first node and the second node (Liu, [p. 5, Section: 3.2.2] “Manifold learning ... discovers the embedded low-dimensional subspace in the data, where it is critical to choose an appropriate neighborhood graph. A common choice is Gaussian similarity function: (See Eq. (3)) where d(·, ·) is a distance measure (e.g., Euclidean distance) and σ is the length scale parameter. The neighborhood structure behaves differently with respect to various σ, which means that it needs to carefully select the optimal σ for the best performance of label propagation. ... To obtain a proper neighborhood graph in meta-learning, we propose a graph construction module built on the union set of support set and query set: S ∪ Q. This module is composed of a convolutional neural network gφ which takes the feature map fϕ(xi) for xi ∈ S ∪ Q to produce an example-wise length-scale parameter σi = gφ(fϕ(xi)). Note that the scale parameter is determined example-wisely and learned in an episodic training procedure, which adapts well to different tasks and makes it suitable for few-shot learning. ... 
We only keep the k-max values in each row of W to construct a k-nearest neighbor graph. Then we apply the normalized graph Laplacians (Chung and Graham, 1997) on W, that is, S = D−1/2W D−1/2 , where D is a diagonal matrix with its (i, i)-value to be the sum of the i-th row of W.”) [Examiner’s Note: Liu teaches the convolutional neural network of the ResNet employed to extract feature vectors from input samples of different classes. The extracted feature vectors are provided to a graph construction module to construct a k-nearest neighbor graph to exploit the manifold structure of the sample data from different classes. The similarity distance function is used to select the nearest neighbors of the constructed graph. The normalized graph Laplacian represents normalized adjacency matrix, where edge weight Wij is determined based on the distance between xi and xj.]
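For illustration only, Liu's graph construction (Gaussian similarity, k-max sparsification per row, and normalized graph Laplacian S = D^(-1/2) W D^(-1/2)) can be sketched as follows (the examiner's simplified sketch using a single global length-scale sigma instead of Liu's example-wise, learned sigma_i; the data below are hypothetical):

```python
import numpy as np

def build_knn_graph(X, sigma, k):
    """Sketch of Liu's graph construction: Gaussian similarities
    W_ij = exp(-d(x_i, x_j)^2 / (2 sigma^2)), keep only the k largest
    entries per row (k-NN graph), then normalize S = D^-1/2 W D^-1/2."""
    d2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)  # squared distances
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                               # no self-loops
    keep = np.argsort(W, axis=1)[:, -k:]                   # k-max per row
    mask = np.zeros_like(W, dtype=bool)
    mask[np.arange(W.shape[0])[:, None], keep] = True
    W = np.where(mask | mask.T, W, 0.0)                    # symmetric support
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^-1/2 W D^-1/2
```

The returned matrix S corresponds to the normalized adjacency whose entries reflect edge weights determined by the distances between feature vectors, as noted in the Examiner's Note above.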
Accordingly, before the effective filing date of the claimed invention, it would have been prima facie obvious to one of ordinary skill in the art to modify the combination of Chen, Dou, and Liu to incorporate the proposed Transductive Propagation Network as taught by Liu. One would have been motivated to make such a combination in order to exploit the underlying manifold structure in few-shot classification tasks by jointly learning feature embeddings and graph construction parameters, thereby improving classification performance (Liu [Conclusion]).
Regarding Claim 10, the combination of Chen, Dou, and Liu teaches the elements of claim 9 as outlined above, and further teaches:
wherein the reference set of objects comprises a first subset of objects respectively labeled with the first classification and a second subset of objects respectively labeled with the second classification. (Dou, [p. 2, Section: 2] “The node has a label yv ∈ {0, 1} ∈ Y where 0 represents benign and 1 represents suspicious. The relations R are rules, interactions, or shared attributes between nodes, e.g., two reviews from the same user [25] or transactions from the same devices [24]. ... Graph-based fraud detectors are trained based on the labeled node information along with the graph composed of multi-relations. The trained models are then used to predict the suspiciousness of unlabeled nodes.”) [Examiner’s Note: the sample set contains nodes labeled as benign and suspicious/adversarial entities.]
Regarding Claim 12, the combination of Chen, Dou, and Liu teaches the elements of claim 9 as outlined above, and further teaches:
wherein selecting the neighbor nodes comprises determining nearest neighbors from the center node of the graph, the distance metric associated with parameters for determining the nearest neighbors. (Dou, [P. 4, Section: 3.2] “For a center node v under relation r at the l-th layer and edge (v,v ′ ) ∈ E(l−1) r , the distance between v and v ′ is the l1-distance of two embeddings ... where each layer has its own similarity measure. ... During the training process, the similarity measure parameters are directly updated through the above loss function.” [p. 4, Section: 3.3] “Specifically, during the training phase, for a node v in current batch under relation r, we first compute a set of similarity scores {S (l) (v,v ′ )} using Eq. (3) at the l-th layer where (v,v ′ ) ∈ E(l) r . E (l) r is a set of edges under relation r at the l-th layer. Then we rank its neighbors based on {S (l) (v,v ′ )} in descending order and take the first p (l) r · |{S (l) (v,v ′ )}| neighbors as the selected neighbors at the l-th layer. All other nodes are discarded at the current batch and will not attend the aggregation process. The top-p sampling process is applied to the center node at every layer for each relation.”)
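[Examiner’s Note: for illustration only, the top-p neighbor selection quoted from Dou above may be sketched as follows. This is a hedged sketch: the scoring here is a plain negative l1-distance between embeddings, whereas Dou uses a learned, label-aware similarity measure with its own parameters at each layer.]

```python
import numpy as np

def top_p_neighbors(center_emb, neighbor_embs, p):
    """Illustrative top-p sampling in the spirit of Dou's similarity-aware
    neighbor selector: score each neighbor by negative l1-distance to the
    center node's embedding, rank the scores in descending order, and keep
    the first p-fraction of neighbors; the rest are discarded."""
    # l1-distance between center node and each neighbor embedding.
    dist = np.abs(neighbor_embs - center_emb).sum(axis=1)
    scores = -dist                       # higher score = more similar
    order = np.argsort(-scores)          # rank neighbors, most similar first
    keep = int(p * len(order))           # top-p fraction, threshold in [0, 1]
    return order[:keep]
```

With p = 1 all neighbors are kept and with p = 0 all are discarded, matching the closed interval described in the quoted passage.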
Regarding Claim 13, the combination of Chen, Dou, and Liu teaches the elements of claim 9 as outlined above, and further teaches:
wherein the graph comprises an embedding matrix and an adjacency matrix, wherein the embedding matrix comprises a plurality of feature vectors for the set of nodes of the graph, and wherein the adjacency matrix comprises edge weights for edges that connect the nodes of the graph. (Chen, [0035, 0037] “Given a graph g(0)=(X(0), A(0)), with node features X(0), corresponding node labels y(0) and A, a classifier f(0) can be trained to distinguish between benign and anomalous nodes. Adversarial samples can be generated, and a perturbed graph G1=(X(1), A(1)) can be constructed. ... the attack direction for a target node t includes the adjacency matrix A and the feature matrix X. The integrated gradients can be determined for the priority function J(A, X, t) for a node t, with respect to an input I. For a feature attack, then, I=X, and for a structure attack, I=A.” [0044]-[0045] Global-level representations, such as graph embedding, are less sensitive to adversarial perturbation than local-level representations, such as node embedding. As a result, the high-level graph representation can be used to guide the robust learning of local node representations ... A readout function can be used to obtain the graph embedding. Given the node representations {hv (final)}vϵV of graph G, ... where σ represents the sigmoid function, Wr represents a trainable linear mapping matrix, and n represents the number of nodes within the graph.”)
Regarding Claim 14, the combination of Chen, Dou, and Liu teaches the elements of claim 9 as outlined above, and further teaches:
wherein the first classification corresponds to a benign object and the second classification corresponds to an adversarial object. (Chen, [0030] “To address this, the present embodiments may use a classifier that is built upon a large number of known-benign samples, and a smaller number of known-malicious samples.” [0042] “the present generator function G may generate adversarial samples as hard negative samples. These hard negative samples can help the system to learn a better discriminator for distinguishing the positive and negative pairs.”)
Dou also teaches: the first classification corresponds to a benign object and the second classification corresponds to an adversarial object. (Dou, [P. 2, Section: 2] “... the node v represents the target entity whose suspiciousness needs to be justified. For example, it can be a review on the review website [19, 29] or a transaction in the trading system [23, 37]. The node has a label yv ∈ {0, 1} ∈ Y where 0 represents benign and 1 represents suspicious.”)
Regarding Claim 15, the combination of Chen, Dou, and Liu teaches the elements of claim 14 as outlined above, and further teaches:
wherein the adversarial object is generated by perturbing the benign object using entropy data. (Chen, [0004] “A method for detecting and responding to an intrusion in a computer network includes generating an adversarial training data set that includes original samples and adversarial samples, by perturbing one or more of the original samples with an integrated gradient attack to generate the adversarial samples.” [0029] “During an adversarial attack, the graph is maliciously transformed through a perturbation function to create a perturbed graph G′, such that the performance of the trained GNN model drops significantly.” [0035] “Adversarial samples can be generated, and a perturbed graph G1=(X(1), A(1)) can be constructed. The labels of an attacked node are changed from benign to anomaly. Based on the perturbed graph G1, a new classifier f(1) may be trained.”)
Regarding Claim 16, the combination of Chen, Dou, and Liu teaches the elements of claim 9 as outlined above, and further teaches:
wherein object corresponds to an adversarial object, wherein the classification model initially assigns the classification of the object as being a benign object, and wherein the graph discriminator uses the feature vector obtained from the classification model to determine that the object is adversarial. (Chen, [0035] “... given a graph g(0)=(X(0), A(0)), with node features X(0), corresponding node labels y(0) and A, a classifier f(0) can be trained to distinguish between benign and anomalous nodes. Adversarial samples can be generated, and a perturbed graph G1=(X(1), A(1)) can be constructed. The labels of an attacked node are changed from benign to anomaly. Based on the perturbed graph G1, a new classifier f(1) may be trained. After a number of iterations (e.g., up to a predetermined maximum), the final classifier is selected as the adversary-resistant classifier.” [0077] “The output of the MIE 926 is processed by discriminator to determine whether the unlabeled node v is normal or anomalous, thereby providing a node label 930.”)
Regarding Claim 17, the combination of Chen, Dou, and Liu teaches the elements of claim 9 as outlined above, and further teaches:
wherein the object comprises a credential of a user, the credential operable for validating whether the user has authorization to access a resource. (Dou, [p. 1, Section: 1] “we construct a graph with two relations and two types of entities. The relation can be any common attributes supposing to be shared by similar entities (e.g., the User-IP-User relation connects entities with the same IP address). There are two types of camouflages as follows: 1) Feature camouflage: smart fraudsters may adjust their behaviors [8, 10], add special characters in reviews [19, 41] (so-called spamouflage), or employ deep language generation models [15] to gloss over explicit suspicious outcomes.” [p. 2, Section: 2] “Definition 2.2. Fraud Detection on Graph. For the fraud detection problem, the node v represents the target entity whose suspiciousness needs to be justified. For example, it can be a review on the review website [19, 29] or a transaction in the trading system [23, 37]. The node has a label yv ∈ {0, 1} ∈ Y where 0 represents benign and 1 represents suspicious. The relations R are rules, interactions, or shared attributes between nodes, e.g., two reviews from the same user [25] or transactions from the same devices [24].”)
Regarding Claim 18, the combination of Chen, Dou, and Liu teaches the elements of claim 9 as outlined above, and further teaches:
wherein the graph discriminator comprises a neural network that is trained to aggregate the feature vector with the other feature vectors of the graph, the aggregation performed using the edge weights between nodes of the graph. (Chen, [0045]-[0046] “A readout function can be used to obtain the graph embedding. Given the node representations {hv (final)}vϵV of graph G, ... where σ represents the sigmoid function, Wr represents a trainable linear mapping matrix, and n represents the number of nodes within the graph. “ For the discriminator, a mutual information estimator (MIE) can be used to jointly model the local and global graph representations via a bilinear scoring function: ... where MIE(hv (K), hG) represents the mutual information between the node embedding and the graph embedding, WD is a trainable scoring matrix, K is a number of hops, and σ is the sigmoid function. ... Neighborhood aggregators of GNNs may include sum, max, and mean functions. The sum aggregator sums up the features within the neighborhood N0 to capture the full neighborhood. ” [0077] “The refined graph encoder 902 is used on a variety of samples, including original samples and perturbed adversarial samples. The refined graph encoder 902 generates respective representations for the samples, and these representations are provided together to adversarial contrastive learning 920. A readout layer 922 obtains the graph embedding 924 of each of the input final representations from the respective samples. An MIE 926 jointly models local and global representations, based on both the input final representation and the graph embedding. The output of the MIE 926 is processed by discriminator to determine whether the unlabeled node v is normal or anomalous, thereby providing a node label 930.”)
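[Examiner’s Note: for illustration only, an edge-weight-based neighborhood aggregation of the kind cited from Chen above may be sketched as a weighted-mean aggregator; the function and the mean normalization are illustrative assumptions, as Chen also names sum and max aggregators among the GNN neighborhood aggregators.]

```python
import numpy as np

def weighted_aggregate(H, A):
    """One aggregation step over a graph: each node's new feature vector is
    the weighted mean of its neighbors' feature vectors, with the weights
    taken from the (weighted) adjacency matrix A. H holds one feature
    vector per row; A[i, j] is the edge weight between nodes i and j."""
    deg = A.sum(axis=1, keepdims=True)
    deg = np.maximum(deg, 1e-12)         # guard against isolated nodes
    return (A @ H) / deg
```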
Regarding Claim 19, the combination of Chen, Dou, and Liu teaches the elements of claim 9 as outlined above, and further teaches:
wherein the neural network is trained to select, for each iteration, a candidate nearest neighbor from either a first subset of the reference set of objects or a second subset of the reference set of objects, the first subset associated with the first classification and the second subset associated with the second classification. (Dou, [pp. 4-5, Section: 3] “We should devise an adaptive filtering/sampling criteria to select an optimal amount of similar neighbors automatically. Thus, we design a similarity-aware neighbor selector. It selects similar neighbors under each relation using top-p sampling with an adaptive filtering threshold. We also devise a reinforcement learning (RL) algorithm to find optimal thresholds during the GNN training process. ... We formulate the RL process as a Bernoulli Multiarmed Bandit (BMAB) B(A, f ,T ) between the neighbor selector and the GNN with the similarity measure. A is the action space, f is the reward function, and T is the terminal condition [36]. Given an initial p (l) r , the neighbor selector choose to increase or decrease p (l) r as actions and the reward is dependent on the average distance differences between two consecutive epochs. ... The reward is positive when the average distance of newly selected neighbors at epoch e is less than that of the previous epoch, and vice versa. It is not easy to estimate the cumulative reward. Thus, we use the immediate reward to update the action greedily without exploration. Concretely, we increase p (l) r with a positive reward and decrease it vice versa. ... It means that the RL converges in the recent ten epochs and indicates an optimal threshold p (l) r is discovered.
After the RL module terminates, the filtering thresholds are fixed as the optimal one until the convergence of GNN.”) [Examiner’s Note: the reinforcement learning (RL) to find the optimal amounts of neighbors to be selected is broadly interpreted as the fine-tuning, where the most similar neighbor nodes are being selected during each epoch.]
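[Examiner’s Note: for illustration only, the greedy bandit-style threshold update quoted from Dou above may be sketched as follows; the step size and the clipping of p to [0, 1] are assumed details not specified in the quoted passage.]

```python
def update_threshold(p, avg_dist_now, avg_dist_prev, step=0.05):
    """Greedy immediate-reward update in the spirit of Dou's BMAB module:
    the reward is positive when the newly selected neighbors at the current
    epoch are closer on average than at the previous epoch; the filtering
    threshold p is increased on a positive reward and decreased otherwise,
    then clipped to the closed interval [0, 1]."""
    reward = 1 if avg_dist_now < avg_dist_prev else -1
    p = p + step if reward > 0 else p - step
    return min(max(p, 0.0), 1.0)
```

Once the update stabilizes across recent epochs (RL convergence), the threshold would be fixed at the discovered optimum for the remainder of GNN training, per the quoted passage.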
Regarding Claim 20, the combination of Chen, Dou, and Liu teaches the elements of claim 9 as outlined above, and further teaches:
labeling the object using the classification determined by the graph discriminator; and updating the reference set of objects to include a new reference object that corresponds to the labeled object. (Chen, [0077] “The output of the MIE 926 is processed by discriminator to determine whether the unlabeled node v is normal or anomalous, thereby providing a node label 930.” [0028] “The parameters of a GNN may be trained in a supervised manner, with the aim being to learn a new representation of a node, which can be used to predict the node label.” [0087] “Given a sequence of attributed graphs 1100 for a set of nodes 1104, where each node 1104 has a unique class label over a period of time, the present embodiments predict the labels of unlabeled nodes 1102 by learning from the labeled ones. ... The present embodiments can then classify the nodes into an anomalous class and a normal class, using the labeled historical data. The labeled network graph can then be used to identify anomalous behavior that may, for example, be indicative of a network failure or intrusion. The anomalous behavior can then be corrected.”)
Claim(s) 11 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Chen, Dou, and Liu as described above, and further in view of Li et al., (Pub. No.: US 20200279151 A1).
Regarding Claim 11, the combination of Chen, Dou, and Liu teaches the elements of claim 10 as outlined above, and further teaches:
selecting a first candidate nearest neighbor for a particular node of a current graph, the first candidate nearest neighbor selected from the first subset of objects having the first classification and associated with a first candidate feature vector of the feature vectors obtained from the reference set of objects; selecting a second candidate nearest neighbor for the particular node of the current graph, the second candidate nearest neighbor selected from the second subset of objects having the second classification and associated with a second candidate feature vector of the feature vectors obtained from the reference set of objects; ... and connecting a candidate selected by the neural network to the particular node of the current graph. (Dou, [pp. 4-5, Section: 3.3] “3.3 Similarity-aware Neighbor Selector Given the similarity scores between the center node and its neighbors with Eq. (3), we should select similar neighbors (i.e., filter camouflaged ones) to improve the capability of GNNs. According to the relation camouflage, fraudsters may connect to different amounts of benign entities under different relations [44]. ... . We should devise an adaptive filtering/sampling criteria to select an optimal amount of similar neighbors automatically. Thus, we design a similarity-aware neighbor selector. It selects similar neighbors under each relation using top-p sampling with an adaptive filtering threshold. ... 3.3.1 Top-p Sampling. We employ top-p sampling to filter camouflaged neighbors under each relation. The filtering threshold for relation r at the l-th layer is p (l) r ∈ [0, 1]. The closed interval means we could discard or keep all neighbors of a node under a relation. Specifically, during the training phase, for a node v in current batch under relation r, we first compute a set of similarity scores {S (l) (v,v ′ )} using Eq. (3) at the l-th layer where (v,v ′ ) ∈ E(l) r . E (l) r is a set of edges under relation r at the l-th layer. 
Then we rank its neighbors based on {S (l) (v,v ′ )} in descending order and take the first p (l) r · |{S (l) (v,v ′ )}| neighbors as the selected neighbors at the l-th layer. All other nodes are discarded at the current batch and will not attend the aggregation process. The top-p sampling process is applied to the center node at every layer for each relation.” [P. 5, Section: 3.4] “After filtering neighbors under each relation, the next step is to aggregate the neighbor information from different relations. Previous methods adopt attention mechanism [23, 37, 48] or devise weighting parameters [24] to learn the relation weights during aggregating information from different relations. However, supposing we have selected the most similar neighbors under each relation, the attention coefficients or weighting parameters should be similar among different relations. ... after applying the top-p sampling, for node v, we define the intra-relation neighbor aggregation .... ”) [Examiner’s Note: Dou teaches the process of selecting similar neighbor under each relation and uses label-aware similarity measure to select the top ranking nodes as neighbors to the center node. The selection of top-p nodes from different samples of benign and fraud/adversarial class labels. The inter-relation neighbor aggregation represents the connection of the selected top-p nodes by the RL module.]
While Dou teaches an RL module consisting of a Neighbor Selector and an Intra-relation AGG that takes graph embeddings and edge weights, performs top-p selection, and then aggregates the selected neighbors across relations to obtain an aggregated graph, the combination of Chen, Dou, and Liu does not appear to explicitly teach:
inputting (I) a current adjacency matrix of the current graph, (II) a current embedding matrix of the current graph, (III) the first candidate feature vector, and (IV) the second candidate feature vector into a neural network, the neural network trained to select one of the candidates;
However, Li, in combination with Chen, Dou, and Liu, teaches the limitations:
inputting (I) a current adjacency matrix of the current graph, (II) a current embedding matrix of the current graph, (III) the first candidate feature vector, and (IV) the second candidate feature vector into a neural network, the neural network trained to select one of the candidates; and connecting a candidate selected by the neural network to the particular node of the current graph. (Li, [0069]-[0070] “The node selection neural network 103 is configured to receive the input graph 105 and an indication of the candidate node 107. The node selection neural network 103 is further configured to process the input graph 105 and the indication of the candidate node 107 to output one or more probabilities 109 of adding an edge to the graph between the candidate node 107 and each of the nodes of the graph 105. That is, the node selection neural network 103 is configured to represent a probability distribution for determining which node of a particular graph a new edge from a candidate node is to be connected to. ... The node selection neural network 103 may be configured to process a node state vector associated with the candidate node 107 and a node state vector associated with each node of the input graph 105 to generate a score associated with each pairing of the candidate node 107 and a respective node of the input graph 105. The node selection neural network 103 may comprise a final softmax layer to output the one or more probabilities 109 of adding an edge between the candidate node 107 and each node of the graph 105 based upon the generated scores. [0089] “The graph is then updated to include an edge between the candidate node 107 and the node selected at step S406.” [0023] “The information associated with a node may be encoded as a state vector.”)
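[Examiner’s Note: for illustration only, the node selection neural network cited from Li above may be sketched as scoring each pairing of the candidate node with an existing graph node and applying a final softmax over the scores, per Li [0069]-[0070]; the bilinear parameterization used here is an assumption standing in for Li's unspecified scoring network.]

```python
import numpy as np

def edge_probabilities(candidate_state, node_states, W):
    """Sketch of the node selection step described in Li: compute a score
    for each pairing of the candidate node's state vector with each graph
    node's state vector (here via an assumed bilinear form with trainable
    matrix W), then softmax the scores into probabilities of adding an
    edge between the candidate node and each node of the graph."""
    scores = node_states @ W @ candidate_state   # one score per graph node
    scores = scores - scores.max()               # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()
```

The graph would then be updated to include an edge between the candidate node and the node selected from this distribution, consistent with Li [0089].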
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Chen, Dou, and Liu before them, to incorporate the method of generating a graph representation of an object using a graph neural network as taught by Li. One would have been motivated to make such a combination in order to generate graph representations of objects using graph neural networks, thereby enabling efficient and accurate graph representation (Li [0053]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
(Pub. No.: US 20210109973 A1) to Georgia Olympia Brikis relates to “Generation of graph-structured representations of brownfield systems.”
(Pub. No.: US 20210034737 A1) to Sakif Hossain Khan relates to “Detection of adversarial attacks on graphs and graph subsets.”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SADIK ALSHAHARI whose telephone number is (703) 756-4749. The examiner can normally be reached Monday - Friday, 9 a.m. - 6 p.m. ET.
Examiner interviews are available via telephone, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/S.A.A./Examiner, Art Unit 2121
/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121