DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 6 and 15 are objected to because of the following informalities:
“wherein the attacker model comprises a search algorithm is a Monte-Carlo Tree Search (MCTS)” appears to be a typo of “wherein the attacker model comprises a search algorithm that is a Monte-Carlo Tree Search (MCTS)”.
“wherein the search algorithm that identifies” is considered a typo of “wherein the search algorithm identifies” as there is only one search algorithm in the claims.
“exhaustive list of transformation” is considered a typo of “exhaustive list of transformations”.
Appropriate correction is required.
Claim 5 is objected to because of the following informalities: the claim recites duplicate elements within its listing of options (512 and 1024 each appear more than once). Appropriate correction is required.
Claims 7 and 16 are objected to because of the following informalities: the claims recite “neutral network” instead of “neural network”. Appropriate correction is required.
Claim Rejections - 35 USC § 112
Regarding 112(b):
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 2-5 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
In regards to Claim 2:
Claim 2 recites the limitation "wherein the learning algorithm is multi-layer perceptron function, and wherein learning parameters of the learning algorithm are updated according to a stochastic gradient descent-based optimization model". There is insufficient antecedent basis for this limitation in the claim. Claim 1 recites “updating, by the computing system, the machine learning model according to a learning algorithm with the loss”. This conflicts with the limitations of claim 2, as claim 2 does not recite updating the machine learning model but instead updating the learning algorithm’s parameters. The meaning of “learning algorithm” also appears to change between claims 1 and 2. Claim 1 indicates the use of a learning algorithm in the conventional sense of an algorithm used to update machine learning models; that interpretation is supported by claim 3 reciting the “Adam optimizer function”, as Adam is a variant of an existing learning algorithm. However, updating a machine learning model using a “learning algorithm” that is itself a “multilayer perceptron” indicates the learning algorithm is a machine learning model. One of ordinary skill in the art would be confused by a machine learning model being referred to as a learning algorithm. This confusion suggests that the learning algorithm may not have been the intended element to be the multilayer perceptron, and that the intended element is possibly one of the models recited in claim 1. However, claim 1 recites both an adversary model and a machine learning model for source code classification, so it is not known which of these, if any, could be the multilayer perceptron.
The indefiniteness of claim 2 makes claims 2-5 difficult to relate to prior art, as the claim limitations cannot be clearly construed. As a result, the prior art applied to claims 2-5 mostly teaches the subjects the claims refer to (such as claim 2’s multilayer perceptron and stochastic gradient descent) while disregarding the requirement that the learning algorithm supposedly be a machine learning model. Amending the claims to correct the confusion and to clarify the intended meaning is appreciated.
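For illustration only (not part of the record; all names and values are hypothetical), the conventional relationship relied on in the above analysis can be sketched as follows: the learning algorithm (here, plain stochastic gradient descent) updates the parameters of the machine learning model, and is not itself a model such as a multilayer perceptron.

```python
def model_predict(weights, x):
    # A trivial linear model standing in for the claimed classifier.
    return sum(w * xi for w, xi in zip(weights, x))

def sgd_step(weights, grad, lr=0.01):
    # The "learning algorithm": it updates the MODEL's parameters
    # using the loss gradient; it is not itself a model.
    return [w - lr * g for w, g in zip(weights, grad)]

weights = [0.5, -0.3]
x, y = [1.0, 2.0], 1.0
pred = model_predict(weights, x)
grad = [2 * (pred - y) * xi for xi in x]   # gradient of a squared-error loss
weights = sgd_step(weights, grad)          # the update recited in claim 1
```

Under this conventional reading, "learning algorithm" names the update rule (SGD, Adam, etc.), which is consistent with claim 3's recitation of the Adam optimizer.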
In regards to Claim 5:
Claim 5 recites the limitation “wherein the five-layer perceptron comprises kernel sizes of 1024, 512, 256, 512, and 1024”. There is insufficient antecedent basis for this limitation in the claim. Claim 4 introduces a five-layer perceptron. Claim 5 attempts to recite kernel sizes for the perceptron, but a multilayer perceptron is understood to be a series of fully connected layers of perceptrons. How fully connected layers could have kernel sizes, let alone kernel sizes of differing values, is not understood. Clarification is appreciated. Appropriate correction is required.
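For illustration only (hypothetical sketch, not part of the record): in ordinary usage, a multilayer perceptron is characterized by layer widths, i.e., the shapes of its fully connected weight matrices, whereas "kernel size" describes a convolutional layer. The recited numbers 1024, 512, 256, 512, 1024 read naturally as layer widths rather than kernel sizes.

```python
# Fully connected (perceptron) layers: each layer is a dense weight matrix
# whose shape is (input_width, output_width) -- no kernel size is involved.
widths = [1024, 512, 256, 512, 1024]
fc_shapes = [(a, b) for a, b in zip(widths, widths[1:])]

# By contrast, a 1-D convolutional layer has a kernel size, e.g. a kernel
# of size 3 sliding over the input -- a concept with no counterpart in a
# fully connected layer.
kernel_size = 3
conv_kernel = [0.0] * kernel_size
```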
In regards to dependent claims:
Claims that depend upon a claim rejected under 112(b) are rejected under 112(b) for being dependent upon a rejected claim.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 14-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed towards an abstract idea without significantly more.
In regards to Claim 14:
Step 1: Is the claim directed towards a process, machine, manufacture, or composition of matter?
Yes, the claim is directed towards a method, i.e., a process.
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 14 recites the following abstract ideas:
generating, by the computing system, a predicted outcome for the approximation of the transformation of the feature vector, with a machine learning model for source code classification
This limitation is directed towards the abstract idea of a mental process, or a concept performed in the human mind, including observation, evaluation, judgment, or opinion (see MPEP 2106.04(a)(2) subsection 3). Here the limitation is seen as evaluation.
calculating, by the computing system, a loss by a loss function that compares the predicted outcome to the ground-truth of interest for the input program
This limitation is directed towards the abstract idea of a mental process, or a concept performed in the human mind, including observation, evaluation, judgment, or opinion (see MPEP 2106.04(a)(2) subsection 3). Here the limitation is seen as evaluation.
This abstract idea is not seen as integrated into a practical application, as the loss does not appear to be utilized in some way, such as updating a machine learning model using a learning algorithm.
selecting, by the computing system, a subsequent transformation of the feature vector of the input program based on the loss with a search algorithm
This limitation is directed towards the abstract idea of a mental process, or a concept performed in the human mind, including observation, evaluation, judgment, or opinion (see MPEP 2106.04(a)(2) subsection 3). Here the limitation is seen as evaluation, as it appears to indicate selecting something where the selection is modified by a loss and a search algorithm. Without particular details, the limitation is broad enough to be performable by a person of ordinary skill in the art in the mind or with pen and paper.
determining, by the computing system, a sequence of transformations that meets a threshold loss over the plurality of training pairs in the input dataset, and wherein the sequence of transformations comprises a plurality of transformations that prevents the machine learning model from correctly classifying an adversarial example
This limitation is directed towards the abstract idea of a mental process, or a concept performed in the human mind, including observation, evaluation, judgment, or opinion (see MPEP 2106.04(a)(2) subsection 3). Here the limitation is seen as evaluation and judgment. The broad nature of the determining enables the method to be performable by a person in the mind or using pen and paper.
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 14 recites the following additional elements:
A method for training a neural network for generating adversarial examples, the method comprising: receiving, by a computing system, an input dataset, wherein the input dataset comprises a plurality of training pairs, wherein each training pair comprises an input program and a ground-truth of interest corresponding to the input program
This limitation is directed towards the insignificant extra solution activity of mere data gathering (see MPEP § 2106.05(g)).
generating, by the computing system, a feature vector of the input program with a feature extractor
At a high level of generality, this is an activity of using a feature extractor as an “apply it” use (see MPEP 2106.05(f)).
generating, by the computing system, an approximation of a transformation of the feature vector of the input program, wherein the approximation of the transformation of the feature vector is approximated with a mapping function in a feature-space
At a high level of generality, this is an activity of using a mapping function as an “apply it” use (see MPEP 2106.05(f)).
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 14 recites the following additional elements:
A method for training a neural network for generating adversarial examples, the method comprising: receiving, by a computing system, an input dataset, wherein the input dataset comprises a plurality of training pairs, wherein each training pair comprises an input program and a ground-truth of interest corresponding to the input program
This limitation is directed towards the insignificant extra solution activity of mere data gathering (see MPEP § 2106.05(g)). This is a well-understood, routine, conventional activity of receiving or transmitting data (see MPEP 2106.05(d), example i of computer functions).
generating, by the computing system, a feature vector of the input program with a feature extractor
At a high level of generality, this is an activity of using a feature extractor as an “apply it” use (see MPEP 2106.05(f)). At said high level of generality, a generic recitation of “generating… a feature vector” using a feature extractor does not integrate the abstract idea into a practical application and is seen as a variation of the phrase “apply it”.
generating, by the computing system, an approximation of a transformation of the feature vector of the input program, wherein the approximation of the transformation of the feature vector is approximated with a mapping function in a feature-space
At a high level of generality, this is an activity of using a mapping function as an “apply it” use (see MPEP 2106.05(f)). At said high level of generality, a generic recitation of “generating… an approximation of a transformation” using a mapping function does not integrate the abstract idea into a practical application and is seen as a variation of the phrase “apply it”.
In regards to Claim 15:
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 15 recites the following abstract ideas:
wherein the search algorithm is a Monte-Carlo Tree Search (MCTS),
This limitation is directed towards the abstract idea of a mathematical concept (see MPEP 2106.04(a)(2) subsection 1).
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 15 recites the following additional elements:
and wherein the search algorithm that identifies an exhaustive list of transformation for the sequence of transformations
At a high level of generality, this is an activity of using a search algorithm as an “apply it” use (see MPEP 2106.05(f)).
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 15 recites the following additional elements:
and wherein the search algorithm that identifies an exhaustive list of transformation for the sequence of transformations
At a high level of generality, this is an activity of using a search algorithm as an “apply it” use (see MPEP 2106.05(f)). At said high level of generality, a generic recitation of “identifies… list of transformations” using a search algorithm does not integrate the abstract idea into a practical application and is seen as a variation of the phrase “apply it”.
In regards to Claim 16:
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 16 recites the following additional elements:
wherein the feature extractor is a convolutional neutral network, and wherein the feature extractor is a function of a classification task and the input program
At a high level of generality, this is a continuation of an activity of using a feature extractor from claim 14 and is seen as an “apply it” use (see MPEP 2106.05(f)).
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 16 recites the following additional elements:
wherein the feature extractor is a convolutional neutral network, and wherein the feature extractor is a function of a classification task and the input program
At a high level of generality, this is a continuation of an activity of using a feature extractor from claim 14 and is seen as an “apply it” use (see MPEP 2106.05(f)). At said high level of generality, the additional recitation of a convolutional neural network and classification does not integrate the abstract idea into a practical application and is seen as a variation of the phrase “apply it”.
In regards to Claim 17:
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 17 recites the following additional elements:
wherein the classification task is a functionality prediction of source code
This limitation is directed towards linking or indicating a field of use or technological environment (see MPEP 2106.05(h)).
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 17 recites the following additional elements:
wherein the classification task is a functionality prediction of source code
This limitation is directed towards linking or indicating a field of use or technological environment (see MPEP 2106.05(h)). Limitations that amount to merely linking the exception to a field of use or technological environment, such as a computer environment (see MPEP 2106.05(h), examples iv and x), do not amount to significantly more than the exception itself.
In regards to claim 18:
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim recites an abstract idea.
Claim 18 recites the following abstract ideas:
wherein the threshold loss is a cumulative maximum loss of the plurality of transformations in the sequence of transformations
This limitation is directed towards the continuation of the abstract idea of a mental process, or a concept performed in the human mind, including observation, evaluation, judgment, or opinion (see MPEP 2106.04(a)(2) subsection 3). Here the limitation is seen as a continuation of the evaluation and judgment from claim 14 related to determining a sequence of transformations that meets a threshold loss. The broad nature of the determining enables the method to be performable by a person in the mind or using pen and paper.
In regards to Claim 19:
Step 2A Prong 2: Does the claim recite additional elements that integrate the exception into a practical application of the exception?
No, the claim does not recite any additional elements that would integrate the abstract idea into a practical application.
Claim 19 recites the following additional elements:
wherein the input dataset is a large-scale dataset comprising more than 500 million of the training pairs
This limitation is directed towards the continuation of insignificant extra solution activity of mere data gathering (see MPEP § 2106.05(g)).
Step 2B: Does the claim as a whole amount to significantly more than the judicial exception?
No, the claim as a whole does not amount to significantly more than the judicial exception. All elements of the claim, viewed individually or holistically, do not provide an inventive concept or otherwise significantly more than the abstract idea itself.
Claim 19 recites the following additional elements:
wherein the input dataset is a large-scale dataset comprising more than 500 million of the training pairs
This limitation is directed towards the continuation of insignificant extra solution activity of mere data gathering (see MPEP § 2106.05(g)). This is a well-understood, routine, conventional activity of receiving or transmitting data (see MPEP 2106.05(d), example i of computer functions).
In regards to claim 20:
Step 2A Prong 1: Does the claim recite a law of nature, a natural phenomenon, or an abstract idea?
Yes, the claim does recite a(n) abstract idea.
Claim 20 recites the following abstract ideas:
wherein the loss function is a mean squared error function
This limitation is directed towards the abstract idea of a mathematical concept (see MPEP 2106.04(a)(2) subsection 1).
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1, 9-14, and 18-20 is/are rejected under 35 U.S.C. 102(a)(1) and (a)(2) as being anticipated by Yefet et al. (“Adversarial examples for models of code”), referred to as Yefet in this document.
Regarding Claim 1:
Yefet teaches:
A method for training an adversarial neural network for source code classification, the method comprising:
[Yefet Abstract page 2]: “Neural models of code have shown impressive results when performing tasks such as predicting method names and identifying certain kinds of bugs [A method for training an adversarial neural network for source code classification, the method comprising]. We show that these models are vulnerable to adversarial examples, and introduce a novel approach for attacking trained models of code using adversarial examples.”
receiving, by a computing system, an input dataset, wherein the input dataset comprises a plurality of training pairs, wherein each training pair comprises an input program and a ground-truth of interest corresponding to the input program;
[Yefet 6.1.3 Dataset page 16]: “We evaluated our proposed attack and defense on code2vec using the Java-large dataset [Alon et al. 2019a]. This dataset contains more than 16M Java methods and their labels [receiving, by a computing system, an input dataset, wherein the input dataset comprises a plurality of training pairs, wherein each training pair comprises an input program and a ground-truth of interest corresponding to the input program], taken from 9500 top-starred Java projects in GitHub that have been created since January 2007. It contains 9000 projects for training, 200 distinct projects for validation, and 300 distinct projects for test.”
generating, by the computing system, a feature vector of the input program with a feature extractor;
generating, by the computing system, adversarial examples with an attacker model, wherein the adversarial examples are based on the feature vector of the input program,
[Yefet 1.2 Our Approach page 4]: “Given a desired adversarial label ybad and an existing variable name, we derive the loss of the model with ybad as the correct label, with respect to the one-hot vector [generating, by the computing system, a feature vector of the input program with a feature extractor as the claim notes the inputs are interpreted as vectors] of the input variable. We then take the argmax of the resulting gradient to select an alternative variable name, rename the original variable to the alternative name, check whether this modification changes the output label to the desired adversarial label [generating, by the computing system, adversarial examples with an attacker model, wherein the adversarial examples are based on the feature vector of the input program, as the process here is describing making an adversarial example], and continue iteratively.”
wherein the attacker model comprises a mapping function that approximates a sequence of transformations of the input program in a feature-space,
[Yefet 4.5 Search page 12]: “Sometimes, adversarial examples can be found by applying the adversarial step of Equation (4) once. At other times, multiple steps are needed, i.e., replacing v with v′ and computing the gradient again with respect to v′. We limit the number of times we apply the adversarial step [wherein the attacker model comprises a mapping function that approximates a sequence of transformations of the input program in a feature-space, as the quote notes finding the adversarial examples and notes performing steps (thus sequence of transformations)] by a depth hyperparameter. Additionally, instead of taking the argmax from the distribution over candidate names, we can try all “top-k” candidates. These define a Breadth-First Search (BFS), where the width parameter is defined by the “top-k”. Checking whether a choice of variable name results in the desired output is low cost; it just requires computing the output of the model given the modified input. Thus, it is conveniently feasible to perform multiple adversarial steps and check multiple top-k candidates.”
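For illustration only (a simplified, hypothetical sketch of the depth-limited, top-k breadth-first search Yefet describes in section 4.5; `is_adversarial` is a stand-in for the low-cost model query, and the real attack ranks candidates by gradient rather than enumerating names):

```python
def bfs_attack(candidates, is_adversarial, top_k=2, depth=2):
    """Try sequences of up to `depth` renamings, `top_k` candidates per step."""
    frontier = [[]]                           # renaming sequences tried so far
    for _ in range(depth):
        next_frontier = []
        for seq in frontier:
            for name in candidates[:top_k]:   # "top-k" = BFS width
                new_seq = seq + [name]
                if is_adversarial(new_seq):   # low-cost check: query the model
                    return new_seq            # sequence of transformations found
                next_frontier.append(new_seq)
        frontier = next_frontier
    return None

# Toy check: the "model" flips its label once both renamings are applied.
found = bfs_attack(["tmp", "idx", "val"],
                   lambda seq: seq == ["tmp", "idx"])
```

The `depth` parameter corresponds to Yefet's depth hyperparameter limiting how many adversarial steps are applied, and `top_k` to the BFS width over candidate names.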
and wherein the adversarial examples are semantically equivalent to the input program;
[Yefet 1.1 Goal page 3]: “For example, we can: (i) rename variables and (ii) add dead code. There are clearly many other semantic preserving transformations [and wherein the adversarial examples are semantically equivalent to the input program] (e.g., re-ordering independent statements), but their application would require a deeper analysis of the program to guarantee that they are indeed semantic preserving. In this paper, therefore, we focus on the above two semantic-preserving transformations, which can be safely applied without any semantic analysis.”
generating, by the computing system, a predicted outcome for the adversarial examples with a machine learning model for source code classification;
[Yefet 1.2 Our Approach page 4]: “Given a desired adversarial label ybad and an existing variable name, we derive the loss of the model with ybad as the correct label, with respect to the one-hot vector of the input variable. We then take the argmax of the resulting gradient to select an alternative variable name, rename the original variable to the alternative name, check whether this modification changes the output label to the desired adversarial label [generating, by the computing system, a predicted outcome for the adversarial examples with a machine learning model for source code classification as notes the checking if the label from the model has changed], and continue iteratively.”
calculating, by the computing system, a loss by a loss function that compares the predicted outcome to the ground-truth of interest for the input program;
[Yefet 4.3 Non-targeted Attack page 12]: “In a non-targeted attack, our goal is to update v to v′ in a way that will increase the loss in any direction, instead of decreasing it, as in the training process. Thus, we compute the gradient with respect to v and use Gradient Ascent… In Equation (5), our goal for the non-targeted attack is to find the direction of the higher loss with respect to the original label [calculating, by the computing system, a loss by a loss function that compares the predicted outcome to the ground-truth of interest for the input program]. This is exactly what the gradient gives us.”
updating, by the computing system, the machine learning model according to a learning algorithm with the loss;
[Yefet 4.2 Targeted Attack page 10]: “Let θ be the learned parameters of a model, c be the input code snippet to the model, y be the target label, and J(θ,c,y) be the loss function used to train the neural model [updating, by the computing system, the machine learning model according to a learning algorithm with the loss]. As explained in Section 3.1, when training using gradient descent, the following rule is used to update the model parameters and minimize J”
and determining, by the computing system, a quality metric for the machine learning model, wherein the quality metric is a number of correctly predicted outcome based on the ground-truth of interest for the plurality of training pairs;
[Yefet 6.2.1 Metrics page 17]: “In non-targeted attacks, the goal of the adversary is to change the prediction of the model to any label other than the correct label. We thus define robustness as the percentage of examples in which the correctly predicted label was not changed to any label other than the correct label [and determining, by the computing system, a quality metric for the machine learning model, wherein the quality metric is a number of correctly predicted outcome based on the ground-truth of interest for the plurality of training pairs as a metric indicative of quality related to the prediction of a model based on the ground truth is noted].”
and determining, by the computing system, the quality metric meets a training threshold, wherein the training threshold indicates a peak accuracy of the predicted outcomes for the adversarial examples.
[Yefet 5.2 Defense with Re-training page 14]: “Instead of minimizing only the expected loss of the original distribution, every example from the training set (x,y) ∈ T contributes both to the original loss J(θ,x,y) and to an adversarial loss Jadv(θ,x′,y). [and determining, by the computing system, the quality metric meets a training threshold, wherein the training threshold indicates a peak accuracy of the predicted outcomes for the adversarial examples where the minimizing loss is seen as a form of peak accuracy thus matching the description in the claim of the training threshold] Here, x′ is a perturbed version of x, which was created using a single BFS step of our non-targeted attack (Section 4.3). During training, we minimized J + Jadv simultaneously.”
The quality metric meets the training threshold because the quality metric changes with the loss, as the loss alters the accuracy of the model.
Regarding claim 9:
The method of claim 1 is taught by Yefet.
Yefet teaches:
wherein the training threshold is a model accuracy to place the machine learning model in a state of equilibrium between overfitting the training pairs and underfitting the training pairs
[Yefet 5.2 Defense with Re-training page 14]: “Instead of minimizing only the expected loss of the original distribution, every example from the training set (x,y) ∈ T contributes both to the original loss J(θ,x,y) and to an adversarial loss Jadv(θ,x′,y). Here, x′ is a perturbed version of x, which was created using a single BFS step of our non-targeted attack (Section 4.3). During training, we minimized J + Jadv simultaneously [wherein the training threshold is a model accuracy to place the machine learning model in a state of equilibrium between overfitting the training pairs and underfitting the training pairs where the two losses are seen as helping preventing underfitting and overfitting since the original training set and adversarial examples are being managed, thus helping the generalization of the model (which generalization focuses on preventing under and over fitting)].”
Regarding claim 10:
The method of claim 1 is taught by Yefet.
Yefet teaches:
wherein the mapping function is based on the sequence of transformations and the feature extractor
[Yefet 4.5 Search page 12]: “Sometimes, adversarial examples can be found by applying the adversarial step of Equation (4) once. At other times, multiple steps are needed, i.e., replacing v with v′ and computing the gradient again with respect to v′. We limit the number of times we apply the adversarial step [wherein the mapping function is based on the sequence of transformations and the feature extractor where the feature extractor is given in more depth by the teaching of a feature extractor in claim 1 by Yefet] by a depth hyperparameter. Additionally, instead of taking the argmax from the distribution over candidate names, we can try all “top-k” candidates. These define a Breadth-First Search (BFS), where the width parameter is defined by the “top-k”. Checking whether a choice of variable name results in the desired output is low cost; it just requires computing the output of the model given the modified input. Thus, it is conveniently feasible to perform multiple adversarial steps and check multiple top-k candidates.”
Regarding claim 11:
The method of claim 1 is taught by Yefet.
Yefet teaches:
wherein the input dataset is a large-scale dataset comprising more than 500 million of the training pairs
[Yefet 6.1.3 Dataset page 16]: “We evaluated our proposed attack and defense on code2vec using the Java-large dataset [Alon et al. 2019a]. This dataset contains more than 16M Java methods and their labels [wherein the input dataset is a large-scale dataset comprising more than 500 million of the training pairs where the size is not the same number, but the premise of a large dataset is taught and one of ordinary skill in the art would understand that datasets can be of different sizes], taken from 9500 top-starred Java projects in GitHub that have been created since January 2007. It contains 9000 projects for training, 200 distinct projects for validation, and 300 distinct projects for test.”
The dataset given by Yefet is also seen as being similar to or larger in size than the dataset noted in the specification ([Current Application 0039]: “In one example, Project CodeNet dataset is a large-scale dataset that comprises over 14 million code samples, over 500 million lines of code, in 55 different programming languages.”); thus, the recitation of “500 million of the training pairs” is seen as met, as the dataset in Yefet notes more samples than CodeNet (16M vs. 14M).
Regarding claim 12:
The method of claim 1 is taught by Yefet
Yefet teaches:
wherein the loss function is a mean squared error function
[Yefet 3.1 Training Neural Networks]: “The process of adjusting θ (i.e., training) is done by solving an optimization problem defined by a certain loss function J(θ,x,y); this is usually mean squared error (MSE) [wherein the loss function is a mean squared error function] or cross entropy and is used to estimate the model’s generalization ability”
Regarding claim 13:
The method of claim 1 is taught by Yefet
wherein the attacker model is configured to maximize the loss between an expected outcome and the predicted outcome, wherein the expected outcome is the ground-truth of interest for the input program
[Yefet 4.3 Non-targeted Attack page 12]: “In a non-targeted attack, our goal is to update v to v′ in a way that will increase the loss in any direction [wherein the attacker model is configured to maximize the loss between an expected outcome and the predicted outcome, wherein the expected outcome is the ground-truth of interest for the input program], instead of decreasing it, as in the training process. Thus, we compute the gradient with respect to v and use Gradient Ascent… In Equation (5), our goal for the non-targeted attack is to find the direction of the higher loss with respect to the original label. This is exactly what the gradient gives us.”
Regarding Claim 14:
Yefet teaches:
A method for training a neural network for generating adversarial examples, the method comprising
[Yefet Abstract page 2]: “Neural models of code have shown impressive results when performing tasks such as predicting method names and identifying certain kinds of bugs. We show that these models are vulnerable to adversarial examples, and introduce a novel approach for attacking trained models of code using adversarial examples [A method for training a neural network for generating adversarial examples, the method comprising].”
receiving, by a computing system, an input dataset, wherein the input dataset comprises a plurality of training pairs, wherein each training pair comprises an input program and a ground-truth of interest corresponding to the input program
[Yefet 6.1.3 Dataset page 16]: “We evaluated our proposed attack and defense on code2vec using the Java-large dataset [Alon et al. 2019a]. This dataset contains more than 16M Java methods and their labels [receiving, by a computing system, an input dataset, wherein the input dataset comprises a plurality of training pairs, wherein each training pair comprises an input program and a ground-truth of interest corresponding to the input program], taken from 9500 top-starred Java projects in GitHub that have been created since January 2007. It contains 9000 projects for training, 200 distinct projects for validation, and 300 distinct projects for test.”
generating, by the computing system, a feature vector of the input program with a feature extractor;
[Yefet 1.2 Our Approach page 4]: “Given a desired adversarial label ybad and an existing variable name, we derive the loss of the model with ybad as the correct label, with respect to the one-hot vector [generating, by the computing system, a feature vector of the input program with a feature extractor as the claim notes the inputs are interpreted as vectors] of the input variable. We then take the argmax of the resulting gradient to select an alternative variable name, rename the original variable to the alternative name, check whether this modification changes the output label to the desired adversarial label, and continue iteratively.”
generating, by the computing system, an approximation of a transformation of the feature vector of the input program, wherein the approximation of the transformation of the feature vector is approximated with a mapping function in a feature-space
[Yefet 4.5 Search page 12]: “Sometimes, adversarial examples can be found by applying the adversarial step of Equation (4) once. At other times, multiple steps are needed, i.e., replacing v with v′ and computing the gradient again with respect to v′. We limit the number of times we apply the adversarial step [generating, by the computing system, an approximation of a transformation of the feature vector of the input program, wherein the approximation of the transformation of the feature vector is approximated with a mapping function in a feature-space as an approximation of a transformation of the feature vector is noted utilizing a search/mapping function] by a depth hyperparameter. Additionally, instead of taking the argmax from the distribution over candidate names, we can try all “top-k” candidates. These define a Breadth-First Search (BFS), where the width parameter is defined by the “top-k”. Checking whether a choice of variable name results in the desired output is low cost; it just requires computing the output of the model given the modified input. Thus, it is conveniently feasible to perform multiple adversarial steps and check multiple top-k candidates.”
generating, by the computing system, a predicted outcome for the approximation of the transformation of the feature vector, with a machine learning model for source code classification
[Yefet 1.2 Our Approach page 4]: “Given a desired adversarial label ybad and an existing variable name, we derive the loss of the model with ybad as the correct label, with respect to the one-hot vector of the input variable. We then take the argmax of the resulting gradient to select an alternative variable name, rename the original variable to the alternative name, check whether this modification changes the output label to the desired adversarial label [generating, by the computing system, a predicted outcome for the approximation of the transformation of the feature vector, with a machine learning model for source code classification as notes the checking if the label from the model has changed], and continue iteratively.”
calculating, by the computing system, a loss by a loss function that compares the predicted outcome to the ground-truth of interest for the input program
[Yefet 4.3 Non-targeted Attack page 12]: “In a non-targeted attack, our goal is to update v to v′ in a way that will increase the loss in any direction, instead of decreasing it, as in the training process. Thus, we compute the gradient with respect to v and use Gradient Ascent… In Equation (5), our goal for the non-targeted attack is to find the direction of the higher loss with respect to the original label [calculating, by the computing system, a loss by a loss function that compares the predicted outcome to the ground-truth of interest for the input program]. This is exactly what the gradient gives us.”
selecting, by the computing system, a subsequent transformation of the feature vector of the input program based on the loss with a search algorithm
[Yefet 4.3 Non-targeted Attack page 12]: “In a non-targeted attack, our goal is to update v to v′ in a way that will increase the loss [selecting, by the computing system, a subsequent transformation of the feature vector of the input program based on the loss with a search algorithm where the selecting is the choosing the update for v to v’] in any direction, instead of decreasing it, as in the training process. Thus, we compute the gradient with respect to v and use Gradient Ascent… In Equation (5), our goal for the non-targeted attack is to find the direction of the higher loss with respect to the original label. This is exactly what the gradient gives us.”
and determining, by the computing system, a sequence of transformations that meets a threshold loss over the plurality of training pairs in the input dataset
[Yefet 5.2 Defense with Re-training page 14]: “Instead of minimizing only the expected loss of the original distribution, every example from the training set (x,y) ∈ T contributes both to the original loss J(θ,x,y) and to an adversarial loss Jadv(θ,x′,y). [and determining, by the computing system, a sequence of transformations that meets a threshold loss over the plurality of training pairs in the input dataset as notes minimizing loss] Here, x′ is a perturbed version of x, which was created using a single BFS step of our non-targeted attack (Section 4.3). During training, we minimized J + Jadv simultaneously.”
and wherein the sequence of transformations comprises a plurality of transformations that prevents the machine learning model from correctly classifying an adversarial example
[Yefet 6.2.1 Metrics page 17]: “In non-targeted attacks, the goal of the adversary is to change the prediction of the model to any label other than the correct label [and wherein the sequence of transformations comprises a plurality of transformations that prevents the machine learning model from correctly classifying an adversarial example]. We thus define robustness as the percentage of examples in which the correctly predicted label was not changed to any label other than the correct label.”
Regarding Claim 18:
The method of claim 14 is taught by Yefet
Yefet teaches:
wherein the threshold loss is a cumulative maximum loss of the plurality of transformations in the sequence of transformations
[Yefet 4.3 Non-targeted Attack page 12]: “In a non-targeted attack, our goal is to update v to v′ in a way that will increase the loss in any direction, instead of decreasing it, as in the training process. Thus, we compute the gradient with respect to v and use Gradient Ascent… In Equation (5), our goal for the non-targeted attack is to find the direction of the higher loss [wherein the threshold loss is a cumulative maximum loss of the plurality of transformations in the sequence of transformations where moving in the direction of higher loss for the transformations is seen as maximizing loss] with respect to the original label. This is exactly what the gradient gives us.”
Regarding Claim 19:
Claim 19 recites analogous limitations as claim 11.
Regarding Claim 20:
Claim 20 recites analogous limitations as claim 12.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 2, 4, and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Yefet et al (“Adversarial examples for models of code”), referred to as Yefet in this document, in combination with Jones et al (US 20210335039 A1), referred to as Jones in this document.
Regarding claim 2:
The method of claim 1 is taught by Yefet.
and wherein learning parameters of the learning algorithm are updated according to a stochastic gradient descent-based optimization model
[Yefet 2.2 Discrete Adversarial Manipulation of Programs (DAMP)]: “In standard stochastic gradient descent [and wherein learning parameters of the learning algorithm are updated according to a stochastic gradient descent-based optimization model] (SGD)-based training of neural networks, the weights of the network are updated to minimize the loss function”
Yefet does not explicitly teach:
wherein the learning algorithm is multi-layer perceptron function,
Jones teaches:
wherein the learning algorithm is multi-layer perceptron function,
[Jones 0100]: “In some implementations, the trained machine learning model may include an artificial neural network (ANN), e.g., a feedforward ANN such as a multilayer perceptron [wherein the learning algorithm is multi-layer perceptron function] (MLP)”
The interpretation issue under 112(b) results in the claim mapping requiring the teaching of multi-layer perceptrons.
One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Yefet and Jones. Yefet and Jones are in the same field of endeavor of machine learning. One of ordinary skill in the art would have been motivated to combine Yefet and Jones to utilize multi-layer perceptron models as multi-layer perceptron models are a known method of implementation for a trained machine learning model ([Jones 0100]: “In some implementations, the trained machine learning model may include an artificial neural network (ANN), e.g., a feedforward ANN such as a multilayer perceptron (MLP)”).
Regarding claim 4:
The method of claim 2 is taught by Yefet and Jones
Jones teaches:
wherein the learning algorithm comprises a five-layer perceptron
[Jones 0114]: “In some implementations/embodiments, a decoder for texturing may include five [wherein the learning algorithm comprises a five-layer perceptron where additional support for perceptron are noted in claim 2] fully-connected layers with output sizes of 1024, 512, 256, 128, and 2, respectively.”
The motivation is the same motivation to combine with Jones as claim 2.
Regarding claim 5:
The method of claim 4 is taught by Yefet and Jones
Jones teaches:
wherein the five-layer perceptron comprises kernel sizes of 1024, 512, 256, 512, and 1024
[Jones 0114]: “In some implementations/embodiments, a decoder for texturing may include five fully-connected layers with output sizes of 1024, 512, 256, [wherein the five-layer perceptron comprises kernel sizes of 1024, 512, 256, 512, and 1024] 128, and 2, respectively.”
The claim is interpreted such that the recited sizes of a layer should be possible, as the 112(b) rejection regarding claims 2-5 renders the meaning of kernel sizes for a multi-layer perceptron unclear. Thus, Jones teaches that a multi-layer perceptron can include layer sizes of 1024, 512, and 256.
Motivation to combine with Jones is the same motivation to combine as claim 2.
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Yefet et al (“Adversarial examples for models of code”), referred to as Yefet in this document, in combination with Jones et al (US 20210335039 A1), referred to as Jones in this document, and further in combination with Zhang et al (“Improved Adam Optimizer for Deep Neural Networks”), referred to as Zhang in this document.
Regarding claim 3:
The method of claim 2 is taught by Yefet and Jones.
Yefet does not explicitly teach:
wherein the learning algorithm comprises an adaptive learning rate according to an Adam optimizer function
Zhang teaches:
wherein the learning algorithm comprises an adaptive learning rate according to an Adam optimizer function
[Zhang Introduction page 1]: “To tackle this problem, several adaptive variants of SGD were developed, including Adagrad [4], Adadelta [5], RM-Sprop [6], Adam [7]. These algorithms aim to adapt the learning rate [wherein the learning algorithm comprises an adaptive learning rate according to an Adam optimizer function] to different parameters automatically, based on the statistics of gradient. Although they usually simplify learning rate settings, and lead to faster convergence…”
Zhang seeks to create an improved Adam optimizer (as indicated by the title of Zhang), as the alternative to Adam is SGD with momentum, which can be seen as better in some cases when the focus is accuracy. Zhang teaches reasoning for the use and improvement of the Adam optimizer and thus is seen as teaching what is wanted in the claim limitations.
One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Yefet and Zhang. Yefet and Zhang are in the same field of endeavor of machine learning. One of ordinary skill in the art would have been motivated to combine Yefet and Zhang to utilize adaptive learning rate like the Adam optimizer does as optimizers like Adam help simplify settings and converge models faster ([Zhang Introduction page 1]: “To tackle this problem, several adaptive variants of SGD were developed, including Adagrad [4], Adadelta [5], RM-Sprop [6], Adam [7]. These algorithms aim to adapt the learning rate to different parameters automatically, based on the statistics of gradient. Although they usually simplify learning rate settings, and lead to faster convergence…”).
Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Yefet et al (“Adversarial examples for models of code”), referred to as Yefet in this document, in combination with Quiring et al (“Misleading Authorship Attribution of Source Code using Adversarial Learning”), referred to as Quiring in this document.
Regarding claim 6:
The method of claim 1 is taught by Yefet
Yefet does not explicitly teach:
wherein the attacker model comprises a search algorithm is a Monte-Carlo Tree Search (MCTS), and wherein the search algorithm that identifies an exhaustive list of transformation for the sequence of transformations of the input program
Yefet teaches a search algorithm and the sequence of transformations of the input program, as noted earlier in claim 1. The exhaustive-list aspect is treated as being related to the Monte-Carlo Tree Search, as a definition does not appear within the specification.
Quiring teaches:
wherein the attacker model comprises a search algorithm is a Monte-Carlo Tree Search (MCTS),
[Quiring Introduction page 1]: “Our attack proceeds by iteratively transforming the source code of a program, such that stylistic patterns are changed while the underlying semantics are preserved. To determine these transformations, we interpret the attack as a game against the attribution method and develop a variant of Monte-Carlo tree search [wherein the attacker model comprises a search algorithm is a Monte-Carlo Tree Search (MCTS)] [24] for constructing a sequence of adversarial but plausible transformations. This black-box strategy enables us to construct untargeted attacks that thwart a correct attribution as well as targeted attacks that imitate the stylistic patterns of a developer.”
and wherein the search algorithm that identifies an exhaustive list of transformation for the sequence of transformations of the input program
[Quiring 5 Monte-Carlo Tree Search]: “As a remedy, we construct our attack algorithm around the concept of Monte-Carlo tree search (MCTS)—a strong search algorithm that has proven effective in AI gaming with AlphaGo [24]. Similar to a game tree, our variant of MCTS creates a search tree for the attack, where each node represents a state of the source code and the edges correspond to transformations. By moving in this tree, we can evaluate the impact of different transformation sequences [and wherein the search algorithm that identifies an exhaustive list of transformation for the sequence of transformations of the input program] before deciding on the next move. Figure 10 depicts the four basic steps of our algorithm: selection, simulation, expansion and backpropagation.”
One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Yefet and Quiring. Yefet and Quiring are in the same field of endeavor of machine learning. One of ordinary skill in the art would have been motivated to combine Yefet and Quiring to utilize Monte-Carlo Tree Search as the algorithm is considered a good candidate as a result of previous cases of effectiveness ([Quiring 5 Monte-Carlo Tree Search]: “As a remedy, we construct our attack algorithm around the concept of Monte-Carlo tree search (MCTS)—a strong search algorithm that has proven effective in AI gaming with AlphaGo [24]. Similar to a game tree, our variant of MCTS creates a search tree for the attack, where each node represents a state of the source code and the edges correspond to transformations. By moving in this tree, we can evaluate the impact of different transformation sequences before deciding on the next move. Figure 10 depicts the four basic steps of our algorithm: selection, simulation, expansion and backpropagation.”).
Regarding Claim 15:
Claim 15 recites analogous limitations as claim 6.
Claims 7, 8, 16, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Yefet et al (“Adversarial examples for models of code”), referred to as Yefet in this document, in combination with Wang et al (“A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning”), referred to as Wang in this document.
Regarding claim 7:
The method of claim 1 is taught by Yefet
Yefet teaches:
and wherein the feature extractor is a function of a classification task and the input program
[Yefet Figure 1 page 3] shows the adversarial example hurting the prediction [and wherein the feature extractor is a function of a classification task and the input program] of what a function of code is doing.
Yefet does not explicitly teach:
wherein the feature extractor is a convolutional neutral network,
Wang teaches:
wherein the feature extractor is a convolutional neutral network,
[Wang Introduction page 1]: “With the rapid development and superior performance achieved in various vision tasks, deep convolutional neural networks (CNNs) [wherein the feature extractor is a convolutional neutral network where a feature extraction is shown in claim 1 by Yefet and use of CNNs for models is taught here] have eventually led to pervasive and dominant applications in many industries. However, most deep CNN models could be easily misled by natural images with imperceptible but deceptive perturbations. These crafted images are known as adversarial examples, which have become one of the biggest threats in real-world applications with security-sensitive purposes [1], [2], [3]. Devising an effective algorithm to generate such deceptive examples can not only help to evaluate the robustness of deep models, but also promote better understanding about deep learning for the future community development.”
One of ordinary skill in the art, prior to the effective filing date, would have been motivated to combine Yefet and Wang. Yefet and Wang are in the same field of endeavor of machine learning. One of ordinary skill in the art would have been motivated to combine Yefet and Wang to utilize CNNs, as such networks have shown superior performance in some tasks and applications in many industries ([Wang Introduction page 1]: “With the rapid development and superior performance achieved in various vision tasks, deep convolutional neural networks (CNNs) have eventually led to pervasive and dominant applications in many industries.”).
Regarding claim 8:
The method of claim 7 is taught by Yefet and Wang
Yefet teaches:
wherein the classification task is a functionality prediction of source code
[Yefet Figure 1 page 3] shows the adversarial example hurting the prediction [wherein the classification task is a functionality prediction of source code] of what a function of code is doing.
This is also supported by the known examples given in [Yefet Introduction page 2]: “Neural models of code have achieved state-of-the-art performance on various tasks such as the prediction of variable names and types [Allamanis et al. 2018; Alon et al. 2018; Bielik et al. 2016; Raychev et al. 2015], code summarization [Allamanis et al. 2016; Alon et al. 2019a; Fernandes et al. 2019], code generation [Alon et al. 2019b; Brockschmidt et al. 2019; Murali et al. 2017], code search [Cambronero et al. 2019; Liu et al. 2019; Sachdev et al. 2018], and bug finding [Pradel and Sen 2018; Rice et al. 2017; Scott et al. 2019].”
Regarding Claim 16:
Claim 16 recites analogous limitations as claim 7.
Regarding Claim 17:
Claim 17 recites analogous limitations as claim 8.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Vedaldi et al (“Efficient Additive Kernels via Explicit Feature Maps”) is relevant art that notes mapping aspects, such as kernels, in a feature-map approximation to speed up training and test times, thus enabling larger datasets. This improvement aligns with an improvement aspect of the mapping part of the current invention.
Sontakke et al (“Code Summarization: Do Transformers Really Understand Code?”) is relevant art that discusses model summarization of code, adversarial examples, large datasets, and other aspects relevant to the current invention.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHRISTOPHER D DEVORE whose telephone number is (703)756-1234. The examiner can normally be reached Monday-Friday 7:30 am - 5 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.D.D./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129