Prosecution Insights
Last updated: April 19, 2026
Application No. 17/853,518

METHODS, APPARATUS, AND ARTICLES OF MANUFACTURE TO IMPROVE PERFORMANCE OF AN ARTIFICIAL INTELLIGENCE BASED MODEL ON DATASETS HAVING DIFFERENT DISTRIBUTIONS

Final Rejection (§101, §103)
Filed
Jun 29, 2022
Examiner
MARU, MATIYAS T
Art Unit
2148
Tech Center
2100 — Computer Architecture & Software
Assignee
Intel Corporation
OA Round
2 (Final)
Grant Probability: 58% (Moderate)
OA Rounds: 3-4
To Grant: 4y 6m
With Interview: 70%

Examiner Intelligence

Career Allow Rate: 58% (grants 58% of resolved cases; 23 granted / 40 resolved; +2.5% vs TC avg)
Interview Lift: +12.5% for resolved cases with interview (moderate, ~+12% lift)
Typical Timeline: 4y 6m avg prosecution; 39 currently pending
Career History: 79 total applications across all art units

Statute-Specific Performance

§101: 35.9% (-4.1% vs TC avg)
§103: 50.9% (+10.9% vs TC avg)
§102: 1.9% (-38.1% vs TC avg)
§112: 11.3% (-28.7% vs TC avg)
Based on career data from 40 resolved cases; Tech Center averages are estimates.

Office Action

§101 §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed 12/26/2025 ("Arguments/Remarks") have been fully considered but they are not persuasive.

Argument – 1: (pg. 13) Applicant contends: “Like Director Squires's explanation in Desjardins, where he explained that the court in Enfish articulated that "the claims are not simply directed to any form of storing tabular data, but instead are specifically directed to a self-referential table for a computer database," Enfish, 822 F.3d at 1336 (emphasis in original) … In comparison, known approaches to training machine learning models to recognize adversarial attacks "suffer from increased training due to the additional overhead during backpropagation resulting from generating perturbed images, as well as additional storage requirements." (Id. at para. [0029]). Given that known methods of performing such training "sacrifice performance on clean images, often resulting in a significant loss in performance (e.g., an NN [neural network] will perform well when classifying adversarial images but perform poorly when classifying clean images)" (Id. at para. [0026]), methods and apparatus disclosed herein improve training of machine learning models to recognize adversarial attacks by using a fast learnable once-for-all adversarial training (FLOAT) that includes a configurable scaled noise tensor that is added to a parameter tensor for a layer of the machine learning model when processing adversarial data (Id. at para. [0030]). Clearly, claim 1 as a whole is not directed to an abstract idea.
On the contrary, one can only arrive at the incorrect conclusion in the Office action through analysis done at the impermissibly high level of granularity precluded by Director Squires.”

Regarding the above argument, the Examiner notes that in Ex parte Desjardins, the claim is directed to a specific enhancement in how a machine learning model trains: how the model estimates the importance of parameters learned from a first task and then updates those parameters during training on a second task using a penalty term, and how performance improves on the second task while preserving performance on the first task, rather than an abstract idea or mathematical concept. The specification provides the required technical details explaining how model parameters are adjusted to optimize performance on a new task while protecting performance on a prior task, which addresses the technical problem of knowledge degradation. The claim also reflects the disclosed improvement in the specification. In the instant application, the specification (¶[0030]) describes improvements such as achieving state-of-the-art robustness to adversarial inputs, incorporating a configurable scaled noise tensor into model parameters during adversarial processing, simultaneously training on clean and adversarial data, and improving memory efficiency through non-iterative parameter pruning. However, these asserted technical improvements are not reflected in the claims. The claims do not recite the addition of a scaled noise tensor to parameter tensors, the simultaneous adversarial and clean training, or the specific non-iterative pruning mechanism that allegedly improves memory efficiency. It is noted that the features upon which applicant relies are not recited in the rejected claims. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims; see MPEP 2106.05(a).

Argument – 2: (pg.
14) Applicant contends: “The Office Action also alleges that claim 1 recites a mental process practically performed in the human mind. For example, the Office Action alleges that claim 1 is directed to "a mental process" (see Office Action, p. 2). However, claim 1 does not recite a mental process because the steps are not practically performed in the mind. Training of the machine learning model… As such, performing training of the machine learning model as disclosed in claim 1 is associated with extensive databases that cannot be practically performed in the mind. For example, training of a machine learning model includes modifications of data structures associated with memory that cannot be practically performed in the human mind, as the respective formats and quantities of the data would be impossible for a human mind to process, even with the help of pencil and paper. Claim 1 disclosed herein includes the training of a machine learning model that requires the accessing of computer memory, given that training machine learning models is memory-intensive (e.g., since the model's parameters and the data used for training need to be stored in memory). Therefore, claim 1 does not recite a mental process because it does not contain limitations that can practically be performed in the human mind and/or the human mind is not equipped to perform the claim limitations.”

Regarding the above argument, the Examiner notes that claim 1 includes limitations that fall under the abstract-idea category of mental processes, such as: …to determine whether the data is to be processed as adversarial data (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mental process: evaluating information and making a judgment about classifying data as adversarial or not);
…to, based on whether the adversarial evaluation circuitry indicates that the data is to be processed as the adversarial data, determine a convolution of an input tensor corresponding to the data and (1) a parameter tensor corresponding to a layer of the AI-based model or (2) a noisy parameter tensor generated based on the parameter tensor (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mental process: determining whether to apply a convolution between an input tensor and a parameter tensor based on an evaluation of whether the data is adversarial, which is deciding whether to process in one way or another; see the Claim Rejections - 35 USC § 101 section). In addition, the Examiner emphasizes that limitations directed to training a machine learning model to classify inputs, including related data processing or optimization steps, have not been characterized as mental processes under Step 2A, prong one. Rather, such limitations have been treated as reciting the use of a generic computer to implement an abstract idea, i.e., "apply it," where the computer merely performs data analysis or classification without reciting a specific technological improvement in computer functionality (see MPEP § 2106.05(f)). Accordingly, these training-related limitations fall within the category of implementing an abstract idea on a computer.

Argument – 3: (pg. 15) Applicant contends: “Lastly, claim 1 is grounded in the practical application of improving model performance on both clean and adversarial images while also meeting a target global parameter density for the machine learning model (Id. at para. [00176]). Claim 1 applies the claimed elements in a meaningful way, which makes the claim, as a whole, more than a mere drafting effort that monopolizes a judicial exception.
For example, claim 1 improves detection of adversarial attacks by, inter alia, training a machine learning model to classify input data, the input data including clean data having a first distribution and adversarial data having a second distribution, by identifying, based on a conditional parameter, ….”

Regarding the above argument, the Examiner notes that claim 1 lacks the required technical specificity to demonstrate how the alleged improvement (i.e., improved performance on both clean and adversarial images while meeting a target global parameter density) is achieved. The claim does not recite any particular training architecture, adversarial perturbation mechanism, parameter pruning mechanism, or other technical details that would link the recited steps to the asserted performance gains. Accordingly, the claim does not integrate the alleged improvement into a practical application, but merely recites a desired result without specifying the technological means for achieving it.

Argument – 4: (pg. 15 – 17) Applicant contends: “In fact, the instant application discloses multiple examples of how the claimed subject matter provides improvements over known methods of performing machine learning model training for recognition of an adversarial attack. As disclosed in the present specification, "during training, the machine learning platform reduces the total amount of parameters required to implement the CNN model by implementing pruning [such that] models trained by the machine learning platform can operate in resource constrained environments (e.g., where there is a limited supply of resources, such as compute resources, memory resources, network resources, power resources, and/or storage resources)" (Id. at para.
[0037], emphasis added)…”

Regarding the above argument, the Examiner notes that the cited paragraphs ([0025], [0037] and [0180]) describe specific implementation details such as implementing pruning to reduce total parameter count, achieving quantified improvements in robust and clean accuracy, reducing storage requirements, and lowering latency in resource-constrained environments. However, these technical details are not reflected in claim 1. The claim does not recite any particular pruning algorithm, parameter reduction mechanism, or resource-constrained configuration. Instead, it generically recites training a model using clean and adversarial data and performing convolutions with parameter or noisy parameter tensors. Accordingly, the asserted technological improvements are not reflected in the claim.

Applicant's arguments regarding the § 103 rejection (pg. 17 – 20) with respect to the amended claim(s) have been considered but are moot, because the arguments/remarks are directed to amended claim limitations “wherein the input tensor is an input feature map (IFM) of a perturbed input image for a first layer of the machine learning model, the IFM of the perturbed input image different from an IFM of a clean input image for convolution with a weight tensor;” that were not previously examined by the examiner. The rejections are noted in the current office action to address the amended claim limitations.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim(s) 1 – 4, 6 – 14, 16 – 24 and 61 – 63 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without significantly more.
In Step 1 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, falls within one or more statutory categories (processes). In Step 2A Prong 1 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components:

Regarding claim 1: identifying, based on a conditional parameter, when the input data is to be processed as the adversarial data (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mental process: it involves evaluating conditional parameters and making a judgment about when to classify data as adversarial or not; see MPEP 2106.04); determining a convolution of an input tensor corresponding to the input data and (1) a parameter tensor corresponding to a layer of the machine learning model or (2) a noisy parameter tensor generated based on the parameter tensor (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mental process: it involves determining whether to apply a convolution between an input tensor and a parameter tensor based on an evaluation of whether the data is adversarial, which is deciding whether to process in one way or another; see MPEP 2106.04).

If the claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process but for the recitation of generic computer components, then they fall within the mental processes grouping. Accordingly, the claim recites an abstract idea.
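The two limitations analyzed above describe a conditional convolution: a layer's parameter tensor is used as-is for clean data, or perturbed with scaled noise for adversarial data. As an illustrative sketch only, not the claimed implementation (the names `alpha`, `noisy_weights`, and `conv2d_valid`, and the additive-noise form, are assumptions for illustration):

```python
import numpy as np

def noisy_weights(w, noise, alpha, adversarial):
    """Return the parameter tensor, optionally perturbed by scaled noise
    when the data is flagged as adversarial (illustrative form only)."""
    return w + alpha * noise if adversarial else w

def conv2d_valid(x, w):
    """Minimal 2-D 'valid' convolution (cross-correlation) of an input
    feature map with a weight tensor."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5))        # input tensor (IFM)
w = rng.normal(size=(3, 3))        # parameter tensor for one layer
noise = rng.normal(size=w.shape)   # noise tensor

clean_out = conv2d_valid(x, noisy_weights(w, noise, 0.1, adversarial=False))
adv_out = conv2d_valid(x, noisy_weights(w, noise, 0.1, adversarial=True))
```

The sketch shows why the examiner treats the step as a decision between two processing paths: the only branch is which weight tensor feeds the same convolution.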
In Step 2A Prong 2 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application, as evaluated below:

• The preamble is deemed insufficient to transform the judicial exception into a patentable invention because the preamble generally links the use of a judicial exception to a particular technological environment or field of use; see MPEP 2106.05(h).

• interface circuitry; machine-readable instructions; and at least one processor circuit to be programmed by the machine readable instructions (i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation which does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer; see MPEP 2106.05(f)).

• train a machine learning model to classify input data, the input data including clean data having a first distribution and adversarial data having a second distribution (i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation which does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer; see MPEP 2106.05(f)).

• wherein the input tensor is an input feature map (IFM) of a perturbed input image for a first layer of the machine learning model, the IFM of the perturbed input image different from an IFM of a clean input image for convolution with a weight tensor (i.e., deemed insufficient to transform the judicial exception into a patentable invention because the limitation simply links the judicial exception to a field of use and/or technology environment; see MPEP 2106.05(h)).
outputting a classification of the input data based on the convolution, the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device (i.e., deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere data outputting, and the claimed elements are considered insignificant extra-solution activity; see MPEP 2106.05(g)).

In Step 2B of the 101 analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Limitations (I) and (II) recite mere application of the abstract idea or mere instructions to implement an abstract idea on a computer, which are deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply the use of a generic computer and/or process with the judicial exception; see MPEP 2106.05(f). Limitation (III) recites additional elements deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to the technology environment; see MPEP 2106.05(h). Limitation (IV) recites additional elements considered extra/post-solution activity, as analyzed above, which are activities that are well-understood, routine, and conventional; specifically, the courts have recognized such computer functions (data gathering and outputting) as well-understood, routine, and conventional: see Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015).
As analyzed above, the additional elements do not integrate the noted judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.

Regarding claim 11: the remaining limitations recite subject matter similar to claim 1 and are rejected under the same rationale. The claim additionally recites: A server to distribute first instructions on a network, the server comprising: at least one storage device including second instructions (i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation which does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer; see MPEP 2106.05(f)); and processor circuitry to execute the second instructions to cause transmission of the first instructions over the network, the first instructions, when executed, to cause at least one device to: (i.e., deemed insufficient for the same reason; see MPEP 2106.05(f)). In Step 2B of the 101 analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception: limitations (I) and (II) recite mere application of the abstract idea or mere instructions to implement an abstract idea on a computer, which are deemed insufficient to transform the judicial exception into a patentable invention because they generally apply the use of a generic computer and/or process with the judicial exception; see MPEP 2106.05(f).
Regarding claim 21: the remaining limitations recite subject matter similar to claim 1 and are rejected under the same rationale. The claim additionally recites: A non-transitory machine readable storage medium comprising instructions that, when executed, cause processor circuitry to at least: (i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation which does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer; see MPEP 2106.05(f)). In Step 2B of the 101 analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception: limitation (I) recites mere application of the abstract idea or mere instructions to implement an abstract idea on a computer, which is deemed insufficient to transform the judicial exception into a patentable invention because it generally applies the use of a generic computer and/or process with the judicial exception; see MPEP 2106.05(f).

Claim 2 depends upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. The claim recites: generate a noise tensor (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mental process: it involves generating a noise tensor (an array of random numbers or values); see MPEP 2106.04); apply at least one of a noise scaling factor or the conditional parameter to the noise tensor (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mathematical concept: it involves applying a noise scaling factor or a conditional parameter to generated random values, which is scaling or conditioning a number array; see MPEP 2106.04);
the conditional parameter indicating that the input data is to be processed as the adversarial data (the recitation in this additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h); limitations directed to a field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B); combine the noise tensor with the parameter tensor to generate the noisy parameter tensor (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mathematical concept: it involves combining the noise tensor (an array of random numbers or values) with a parameter tensor to generate a noisy parameter tensor; see MPEP 2106.04). Claims 12 and 22 recite similar subject matter as claim 2, so they are rejected under the same rationale.

Claim 3 depends upon claim 2 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. The claim recites: adjust, based on at least one of a gradient for the parameter tensor or a bitmask tensor for the parameter tensor, at least one of the parameter tensor for the layer of the machine learning model or the noise scaling factor (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mental process: it involves modifying values based on other determined values (gradients or masks); see MPEP 2106.04). Claims 13 and 23 recite similar subject matter as claim 3, so they are rejected under the same rationale.

Claim 4 depends upon claim 2 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception.
The claim recites: wherein to combine the noise tensor with the parameter tensor, [ ] is to perform element-wise addition using first elements of the parameter tensor and second elements of the noise tensor (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mathematical concept: it involves combining two tensors by applying element-wise addition (adding corresponding elements from each tensor); see MPEP 2106.04). The limitation "the one or more of the at least one processor circuit" is deemed insufficient to transform the judicial exception into a patentable invention because it is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f). Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B. Claims 14 and 24 recite similar subject matter as claim 4, so they are rejected under the same rationale.

Claim 6 depends upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. The claim recites: apply a bitmask tensor to the parameter tensor (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mathematical concept: it involves performing an element-wise masking operation between two numerical arrays (tensors), where values in one tensor are selectively retained, modified, or nullified based on corresponding values in the bitmask tensor; see MPEP 2106.04). Claim 16 recites similar subject matter as claim 6, so it is rejected under the same rationale.
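The limitations of claims 2, 4 and 6, as characterized above, reduce to element-wise tensor arithmetic. A minimal sketch of those operations (illustrative only; the variable names and the magnitude-based mask criterion are assumptions, not taken from the claims):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=(4, 4))        # parameter tensor for a layer
noise = rng.normal(size=w.shape)   # generated noise tensor (claim 2)
alpha, cond = 0.05, 1.0            # noise scaling factor / conditional parameter

# claims 2 and 4: scale the noise, then combine by element-wise addition
noisy_w = w + cond * alpha * noise

# claim 6: apply a bitmask tensor, selectively retaining or nullifying
# parameters (an illustrative magnitude-based keep criterion)
bitmask = (np.abs(w) > 0.5).astype(w.dtype)
pruned_w = noisy_w * bitmask
```

Each step maps a tensor to a tensor of the same shape, which is why the examiner characterizes them as mathematical concepts rather than technological improvements.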
Claim 7 depends upon claim 6 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. The claim recites: wherein to apply the bitmask tensor to the parameter tensor, the one or more of the at least one processor circuit is to perform element-wise multiplication using first elements of the parameter tensor and second elements of the bitmask tensor (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mathematical concept: it involves applying a bitmask tensor to a parameter tensor by performing element-wise multiplication between the tensor elements; see MPEP 2106.04). Claim 17 recites similar subject matter as claim 7, so it is rejected under the same rationale.

Claim 8 depends upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. The claim recites: determine a ranking of the first layer and a second layer of the machine learning model (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mental process: it involves determining a ranking of the first layer and the second layer of a model, which is comparing properties or scores of layers to determine their rank; see MPEP 2106.04);
based on the ranking and a constraint associated with a total amount of parameters of the machine learning model, determine (1) that at least one of a first bitmask tensor corresponding to the first parameter tensor or a second bitmask tensor corresponding to a second parameter tensor is to be adjusted, the second parameter tensor corresponding to the second layer, and (2) one or more adjustments to the at least one of the first bitmask tensor or the second bitmask tensor that is to be adjusted (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mental process: it involves determining, based on the ranking and a constraint on the total number of parameters, whether a bitmask tensor should be adjusted and what adjustments should be made; see MPEP 2106.04); update the at least one of the first bitmask tensor or the second bitmask tensor based on the one or more adjustments (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mental process: it involves updating one or more bitmask tensors based on previously determined adjustments; see MPEP 2106.04). Claim 18 recites similar subject matter as claim 8, so it is rejected under the same rationale.

Claim 9 depends upon claim 8 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. The claim recites: wherein one or more of the at least one processor circuit is to determine the ranking of the first layer and the second layer based on at least one of: a first momentum of the first layer and a second momentum of the second layer; or a first Frobenius norm of the first layer and a second Frobenius norm of the second layer.
(i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mathematical concept: it involves determining a ranking of neural network layers based on calculated mathematical values (momentum or Frobenius norm); see MPEP 2106.04). Claim 19 recites similar subject matter as claim 9, so it is rejected under the same rationale.

Claim 10 depends upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. The claim recites: adjust the parameter tensor based on a slimming factor for the machine learning model (i.e., under the broadest reasonable interpretation, the claim recites the abstract idea of a mental process: it involves applying a previously determined scaling factor to modify values in a dataset or model representation; see MPEP 2106.04); process a tensor output with adversarial normalization for the slimming factor (deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f); limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B). Claim 20 recites similar subject matter as claim 10, so it is rejected under the same rationale.

Claim 61 depends upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception.
The claim recites: execute an adversarial batch-normalization sub-layer to generate an adversarial tensor when the data is to be processed as the adversarial data (deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words "apply it" (or an equivalent) to the judicial exception; see MPEP 2106.05(f); limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept at Step 2B).

Claim 62 depends upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. The claim recites: execute a clean batch-normalization sub-layer to generate a clean tensor when the data is to be processed as the clean data (deemed insufficient to transform the judicial exception into a patentable invention for the same reasons; see MPEP 2106.05(f)).

Claim 63 depends upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception.
The claim recites: generate a noisy weight tensor for convolution with the IFM of the perturbed input image (i.e., under the broadest reasonable interpretation, the claim recites an abstract idea (a mathematical concept): it involves generating a modified numerical array (weight tensor) by introducing noise into existing weight values and preparing it for use in a convolution operation. See MPEP 2106.04). Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1, 11, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Esmaeilzadeh et al., Pub. No. US20220269928A1, in view of Wang et al., Pub. No. US20230297823A1, Mathews et al., Pub. No. US20210097176A1, and Condessa, Pub. No. US20210182394A1. Regarding claim 1, Esmaeilzadeh teaches: An apparatus comprising: interface circuitry; machine-readable instructions; and at least one processor circuit to be programmed by the machine-readable instructions to: (Esmaeilzadeh, “[0116] Computing system 800 may include one or more processors (e.g., processors 810 a-810 n) coupled to system memory 820, an input/output I/O device interface 830, and a network interface 840 via an input/output (I/O) interface 850. 
A processor may include a single processor or a plurality of processors [interface circuitry; machine-readable instructions; and at least one processor circuit to be programmed by the machine-readable instructions to] (e.g., distributed processors).”) identifying, based on a conditional parameter, when the input data is to be processed as the adversarial data; (Esmaeilzadeh, “[0075] The vulnerability of the trained model to adversarial attacks and/or other input [is to be processed as the adversarial data] corruption can be an important measure of the robustness of the model and can be leveraged to determine how closely input should be guarded and/or filtered, to determine a deployment strategy, to determine how often or when the model is tested [identifying, based on a conditional parameter, when the input data], etc. A measure of vulnerability or susceptibility for a model can be determined based on a determination of the minimum or smallest change to the input which causes a change in the output—where the change in the output is therefore a mistaken output and/or classification.”) determining a convolution of an input tensor corresponding to the input data (Esmaeilzadeh, “[0079] In order to determine a vulnerability analysis, the minimum perturbation (i.e., minimum μ and minimum σ) for which a tensor of random samples drawn from the distributions defined by the minimum perturbation parameters and added to the input tensor X [determining a convolution of an input tensor corresponding to the input data] leads to misclassification is determined. For simplicity, the mean of the probability distribution can be set to zero (i.e., μ=0) in an example case. The mean of the probability distribution can be non-zero, and further the probability distribution can be non-symmetric. If the mean of the probability distribution is zero, then the minimum perturbation is given by the minimum standard deviation (i.e., min σ). 
In order to find the minimum σ, a noise or perturbed input tensor can be generated for which an additive noise vector with a covariance matrix of Σ is injected. The perturbed input tensor can be given by X̃ = X + N, where N is the noise vector which has a covariance matrix of Σ. Then the minimum noise can be found for a vulnerability analysis based on an optimization of the covariance matrix Σ, such as that described in Equation 19, below:”) (1) a parameter tensor corresponding to a layer of the machine learning model or (2) a noisy parameter tensor generated based on the parameter tensor; and (Esmaeilzadeh, “[0078] In some embodiments, a loss function (which can instead be a gain function or an optimization function) can be defined for a pretrained neural network. For a neural network, the input can be represented by an input tensor X and the weights can be represented by a weight tensor W. In some embodiments, it can be assumed that perturbation or noise is added to the input tensor X. For example, the perturbation can be a tensor where the tensor elements are parameters of one or more probability distribution. In a specific example, the noise can be randomly sampled from a normal distribution, which can be represented as N(μ^ϕ, σ^ϕ). In this example, the perturbation tensor can be a tensor with the dimensions of the input tensor X. The elements of the perturbation tensor can be probability distribution parameters, such as (0, 0) [a parameter tensor corresponding to a layer of the AI-based model], which represent a normal distribution, such as N^o(μ^ϕ, σ^ϕ). 
Alternatively, the perturbation tensor can have different dimensions than the input tensor X. For example, the perturbation tensor can be applied to the input tensor X multiple times or in specific regions.”) Esmaeilzadeh does not teach: train a machine learning model to classify input data, the input data including clean data having a first distribution and adversarial data having a second distribution, by: wherein the input tensor is an input feature map (IFM) of a perturbed input image for a first layer of the machine learning model, the IFM of the perturbed input image different from an IFM of a clean input image for convolution with a weight tensor; outputting a classification of the input data based on the convolution, the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device. Wang teaches: train a machine learning model to classify input data, the input data including clean data having a first distribution and adversarial data having a second distribution, by: (Wang, “[0015] Accordingly, one embodiment discloses a computer-implemented method for training a neural network. The method includes collecting a plurality of data samples comprising clean data samples and adversarial data samples. The training of the neural network includes training of a probabilistic encoder to encode the plurality of data samples into a probabilistic distribution over a latent space representation. In addition, the training of the neural network comprising training of a classifier to classify an instance of the latent space representation to produce a classification result [train a machine learning model to classify input data]. 
In addition, the method includes training shared parameters of a first instance of the neural network using the clean data samples [the input data including clean data having a first distribution] and a second instance of the neural network using the adversarial data samples [and adversarial data having a second distribution]. Further, the method includes outputting the shared parameters of the first instance of the neural network and the second instance of the neural network.”) Wang and Esmaeilzadeh are related to the same field of endeavor (i.e., adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Wang with the teachings of Esmaeilzadeh to learn shared parameters for improved adversarial robustness (Wang, Abstract). Esmaeilzadeh in view of Wang does not teach: wherein the input tensor is an input feature map (IFM) of a perturbed input image for a first layer of the machine learning model, the IFM of the perturbed input image different from an IFM of a clean input image for convolution with a weight tensor; outputting a classification of the input data based on the convolution, the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device. Mathews teaches: wherein the input tensor is an input feature map (IFM) of a perturbed input image for a first layer of the machine learning model, the IFM of the perturbed input image different from an IFM of a clean input image for convolution with a weight tensor; (Mathews, “[0033] To determine which layers to replace with tree-based structures, the example model modifier 206 of FIG. 2 inputs (e.g., provides) a file into the example trainer classification model 204 after training/testing. 
Additionally, the model modifier 206 adds an adversarial perturbation to the input [the IFM of the perturbed input image different from an IFM of a clean input image for convolution with a weight tensor] and inputs the perturbed input into the model modifier 206. The example difference determiner 208 determines the difference (e.g., a Euclidean distance) between the input and the perturbed input at each layer [wherein the input tensor is an input feature map (IFM) of a perturbed input image for a first layer of the machine learning model] of the trainer classification model 204. After the Euclidean distances between the two inputs have been determined for the layers, the difference determiner 208 determines a difference between the determined Euclidean distance at each layer with the determined Euclidean distance of the layer immediately subsequent to each layer. As described above, because the amount of change between layers is smaller at later layers, the distance between the Euclidean distance of a layer to the Euclidean distance of a layer immediately subsequent will decrease with the depth of the layers.”) Mathews, Esmaeilzadeh and Wang are related to the same field of endeavor (i.e., adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Mathews with the teachings of Esmaeilzadeh and Wang to analyze and selectively replace a convolution layer to improve the adversarial robustness of the trained classification model (Mathews, Abstract). 
Esmaeilzadeh in view of Wang and Mathews does not teach: outputting a classification of the input data based on the convolution, the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device. Condessa teaches: outputting a classification of the input data based on the convolution, the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device. (Condessa, “[0025] The adversarial defense system 220, via the processing system 160, is advantageously configured to ensure that at least one other system (e.g., actuator system 130) is protected from directly or indirectly receiving a sequence of output data from the machine learning system 200A [outputting a classification of the input data based on the convolution] that has been generated based on a sequence of inputs that is deemed to be an adversarial sequence by the detector 210. More specifically, upon receiving an adversarial label from the detector 210, the adversarial defense system 220, via the processing system 160, is configured to take defensive action with respect to the identified sequence of output data from the machine learning system 200A that corresponds to the adversarial sequence [the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device].”) Condessa, Esmaeilzadeh, Wang and Mathews are related to the same field of endeavor (i.e., adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Condessa with the teachings of Esmaeilzadeh, Wang and Mathews to use the machine learning model’s output to classify inputs and to use performance feedback to update the network parameters (Condessa, Abstract). 
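For readers tracing the technical mapping rather than the legal analysis: the Esmaeilzadeh passages cited in the claim 1 rejection describe forming a perturbed input tensor X̃ = X + N, with N drawn from a zero-mean normal distribution, and the claim maps this to convolving an input feature map (IFM) with a weight tensor. The following NumPy sketch is purely illustrative of that mechanism; the array sizes, the σ value, and the `conv2d_valid` helper are assumptions of this note, not anything taken from the cited references or the claims.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-channel input feature map (IFM) and a 3x3 weight tensor.
X = rng.standard_normal((8, 8))
W = rng.standard_normal((3, 3))

# Perturbed input per the cited [0079] passage: X_tilde = X + N, with N sampled
# from a zero-mean normal distribution (sigma = 0.1 is an arbitrary choice here).
sigma = 0.1
N = rng.normal(loc=0.0, scale=sigma, size=X.shape)
X_tilde = X + N

def conv2d_valid(ifm, w):
    """Plain 'valid' 2-D convolution (no padding, stride 1)."""
    kh, kw = w.shape
    oh, ow = ifm.shape[0] - kh + 1, ifm.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(ifm[i:i + kh, j:j + kw] * w)
    return out

# The clean IFM and the perturbed IFM yield different output feature maps,
# which is the distinction the claim draws between the two convolutions.
clean_ofm = conv2d_valid(X, W)
adv_ofm = conv2d_valid(X_tilde, W)
print(bool(np.abs(clean_ofm - adv_ofm).max() > 0))
```

With an 8x8 input and a 3x3 kernel the 'valid' output is 6x6; the nonzero noise tensor propagates linearly through the convolution, so the two output feature maps differ.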
Regarding claim 11, Esmaeilzadeh teaches: A server to distribute first instructions on a network, the server comprising: (Esmaeilzadeh, “[0101] The ML system 602 may include one or more computing devices described above and may include any type of mobile terminal, fixed terminal, or other device. For example, the ML system 602 may be implemented as a cloud computing system and may feature one or more component devices. Users may, for example, utilize one or more other devices to interact with devices, one or more servers, or other components of system 600.”) at least one storage device including second instructions; and (Esmaeilzadeh, “[0120] System memory 820 may include a tangible program carrier having program instructions stored thereon. A tangible program carrier may include a non-transitory computer readable storage medium. A non-transitory computer readable storage medium [at least one storage device including second instructions] may include a machine readable storage device, a machine readable storage substrate, a memory device, or any combination thereof.”) processor circuitry to execute the second instructions to cause transmission of the first instructions over the network, the first instructions, when executed, to cause at least one device to: (Esmaeilzadeh, “[0116] Computing system 800 may include one or more processors (e.g., processors 810 a-810 n) coupled to system memory 820, an input/output I/O device interface 830, and a network interface 840 via an input/output (I/O) interface 850. 
A processor may include a single processor or a plurality of processors [processor circuitry to execute the second instructions to cause transmission of the first instructions over the network, the first instructions, when executed, to cause at least one device to] (e.g., distributed processors).”) identifying, based on a conditional parameter, when the input data is to be processed as the adversarial data; (Esmaeilzadeh, “[0075] The vulnerability of the trained model to adversarial attacks and/or other input [is to be processed as the adversarial data] corruption can be an important measure of the robustness of the model and can be leveraged to determine how closely input should be guarded and/or filtered, to determine a deployment strategy, to determine how often or when the model is tested [identifying, based on a conditional parameter, when the input data], etc. A measure of vulnerability or susceptibility for a model can be determined based on a determination of the minimum or smallest change to the input which causes a change in the output—where the change in the output is therefore a mistaken output and/or classification.”) determining a convolution of an input tensor corresponding to the input data (Esmaeilzadeh, “[0079] In order to determine a vulnerability analysis, the minimum perturbation (i.e., minimum μ and minimum σ) for which a tensor of random samples drawn from the distributions defined by the minimum perturbation parameters and added to the input tensor X [determining a convolution of an input tensor corresponding to the input data] leads to misclassification is determined. For simplicity, the mean of the probability distribution can be set to zero (i.e., μ=0) in an example case. The mean of the probability distribution can be non-zero, and further the probability distribution can be non-symmetric. If the mean of the probability distribution is zero, then the minimum perturbation is given by the minimum standard deviation (i.e., min σ). 
In order to find the minimum σ, a noise or perturbed input tensor can be generated for which an additive noise vector with a covariance matrix of Σ is injected. The perturbed input tensor can be given by X̃ = X + N, where N is the noise vector which has a covariance matrix of Σ. Then the minimum noise can be found for a vulnerability analysis based on an optimization of the covariance matrix Σ, such as that described in Equation 19, below:”) (1) a parameter tensor corresponding to a layer of the machine learning model or (2) a noisy parameter tensor generated based on the parameter tensor; and (Esmaeilzadeh, “[0078] In some embodiments, a loss function (which can instead be a gain function or an optimization function) can be defined for a pretrained neural network. For a neural network, the input can be represented by an input tensor X and the weights can be represented by a weight tensor W. In some embodiments, it can be assumed that perturbation or noise is added to the input tensor X. For example, the perturbation can be a tensor where the tensor elements are parameters of one or more probability distribution. In a specific example, the noise can be randomly sampled from a normal distribution, which can be represented as N(μ^ϕ, σ^ϕ). In this example, the perturbation tensor can be a tensor with the dimensions of the input tensor X. The elements of the perturbation tensor can be probability distribution parameters, such as (0, 0) [a parameter tensor corresponding to a layer of the AI-based model], which represent a normal distribution, such as N^o(μ^ϕ, σ^ϕ). 
Alternatively, the perturbation tensor can have different dimensions than the input tensor X. For example, the perturbation tensor can be applied to the input tensor X multiple times or in specific regions.”) Esmaeilzadeh does not teach: train a machine learning model to classify input data, the input data including clean data having a first distribution and adversarial data having a second distribution, by: wherein the input tensor is an input feature map (IFM) of a perturbed input image for a first layer of the machine learning model, the IFM of the perturbed input image different from an IFM of a clean input image for convolution with a weight tensor; outputting a classification of the input data based on the convolution, the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device. Wang teaches: train a machine learning model to classify input data, the input data including clean data having a first distribution and adversarial data having a second distribution, by: (Wang, “[0015] Accordingly, one embodiment discloses a computer-implemented method for training a neural network. The method includes collecting a plurality of data samples comprising clean data samples and adversarial data samples. The training of the neural network includes training of a probabilistic encoder to encode the plurality of data samples into a probabilistic distribution over a latent space representation. In addition, the training of the neural network comprising training of a classifier to classify an instance of the latent space representation to produce a classification result [train a machine learning model to classify input data]. 
In addition, the method includes training shared parameters of a first instance of the neural network using the clean data samples [the input data including clean data having a first distribution] and a second instance of the neural network using the adversarial data samples [and adversarial data having a second distribution]. Further, the method includes outputting the shared parameters of the first instance of the neural network and the second instance of the neural network.”) Wang and Esmaeilzadeh are related to the same field of endeavor (i.e., adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Wang with the teachings of Esmaeilzadeh to learn shared parameters for improved adversarial robustness (Wang, Abstract). Esmaeilzadeh in view of Wang does not teach: wherein the input tensor is an input feature map (IFM) of a perturbed input image for a first layer of the machine learning model, the IFM of the perturbed input image different from an IFM of a clean input image for convolution with a weight tensor; outputting a classification of the input data based on the convolution, the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device. Mathews teaches: wherein the input tensor is an input feature map (IFM) of a perturbed input image for a first layer of the machine learning model, the IFM of the perturbed input image different from an IFM of a clean input image for convolution with a weight tensor; (Mathews, “[0033] To determine which layers to replace with tree-based structures, the example model modifier 206 of FIG. 2 inputs (e.g., provides) a file into the example trainer classification model 204 after training/testing. 
Additionally, the model modifier 206 adds an adversarial perturbation to the input [the IFM of the perturbed input image different from an IFM of a clean input image for convolution with a weight tensor] and inputs the perturbed input into the model modifier 206. The example difference determiner 208 determines the difference (e.g., a Euclidean distance) between the input and the perturbed input at each layer [wherein the input tensor is an input feature map (IFM) of a perturbed input image for a first layer of the machine learning model] of the trainer classification model 204. After the Euclidean distances between the two inputs have been determined for the layers, the difference determiner 208 determines a difference between the determined Euclidean distance at each layer with the determined Euclidean distance of the layer immediately subsequent to each layer. As described above, because the amount of change between layers is smaller at later layers, the distance between the Euclidean distance of a layer to the Euclidean distance of a layer immediately subsequent will decrease with the depth of the layers.”) Mathews, Esmaeilzadeh and Wang are related to the same field of endeavor (i.e., adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Mathews with the teachings of Esmaeilzadeh and Wang to analyze and selectively replace a convolution layer to improve the adversarial robustness of the trained classification model (Mathews, Abstract). 
Esmaeilzadeh in view of Wang and Mathews does not teach: outputting a classification of the input data based on the convolution, the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device. Condessa teaches: outputting a classification of the input data based on the convolution, the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device. (Condessa, “[0025] The adversarial defense system 220, via the processing system 160, is advantageously configured to ensure that at least one other system (e.g., actuator system 130) is protected from directly or indirectly receiving a sequence of output data from the machine learning system 200A [outputting a classification of the input data based on the convolution] that has been generated based on a sequence of inputs that is deemed to be an adversarial sequence by the detector 210. More specifically, upon receiving an adversarial label from the detector 210, the adversarial defense system 220, via the processing system 160, is configured to take defensive action with respect to the identified sequence of output data from the machine learning system 200A that corresponds to the adversarial sequence [the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device].”) Condessa, Esmaeilzadeh, Wang and Mathews are related to the same field of endeavor (i.e., adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Condessa with the teachings of Esmaeilzadeh, Wang and Mathews to use the machine learning model’s output to classify inputs and to use performance feedback to update the network parameters (Condessa, Abstract). 
Regarding claim 21, Esmaeilzadeh teaches: A non-transitory machine readable storage medium comprising instructions that, when executed, cause processor circuitry to at least: (Esmaeilzadeh, “[0120] System memory 820 may include a tangible program carrier having program instructions stored thereon. A tangible program carrier may include a non-transitory computer readable storage medium. A non-transitory computer readable storage medium [A non-transitory machine readable storage medium comprising instructions that] may include a machine readable storage device, a machine readable storage substrate, a memory device, or any combination thereof.”) identifying, based on a conditional parameter, when the input data is to be processed as the adversarial data; (Esmaeilzadeh, “[0075] The vulnerability of the trained model to adversarial attacks and/or other input [is to be processed as the adversarial data] corruption can be an important measure of the robustness of the model and can be leveraged to determine how closely input should be guarded and/or filtered, to determine a deployment strategy, to determine how often or when the model is tested [identifying, based on a conditional parameter, when the input data], etc. A measure of vulnerability or susceptibility for a model can be determined based on a determination of the minimum or smallest change to the input which causes a change in the output—where the change in the output is therefore a mistaken output and/or classification.”) determining a convolution of an input tensor corresponding to the input data (Esmaeilzadeh, “[0079] In order to determine a vulnerability analysis, the minimum perturbation (i.e., minimum μ and minimum σ) for which a tensor of random samples drawn from the distributions defined by the minimum perturbation parameters and added to the input tensor X [determining a convolution of an input tensor corresponding to the input data] leads to misclassification is determined. 
For simplicity, the mean of the probability distribution can be set to zero (i.e., μ=0) in an example case. The mean of the probability distribution can be non-zero, and further the probability distribution can be non-symmetric. If the mean of the probability distribution is zero, then the minimum perturbation is given by the minimum standard deviation (i.e., min σ). In order to find the minimum σ, a noise or perturbed input tensor can be generated for which an additive noise vector with a covariance matrix of Σ is injected. The perturbed input tensor can be given by X̃ = X + N, where N is the noise vector which has a covariance matrix of Σ. Then the minimum noise can be found for a vulnerability analysis based on an optimization of the covariance matrix Σ, such as that described in Equation 19, below:”) (1) a parameter tensor corresponding to a layer of the machine learning model or (2) a noisy parameter tensor generated based on the parameter tensor; and (Esmaeilzadeh, “[0078] In some embodiments, a loss function (which can instead be a gain function or an optimization function) can be defined for a pretrained neural network. For a neural network, the input can be represented by an input tensor X and the weights can be represented by a weight tensor W. In some embodiments, it can be assumed that perturbation or noise is added to the input tensor X. For example, the perturbation can be a tensor where the tensor elements are parameters of one or more probability distribution. In a specific example, the noise can be randomly sampled from a normal distribution, which can be represented as N(μ^ϕ, σ^ϕ). In this example, the perturbation tensor can be a tensor with the dimensions of the input tensor X. The elements of the perturbation tensor can be probability distribution parameters, such as (0, 0) [a parameter tensor corresponding to a layer of the AI-based model], which represent a normal distribution, such as N^o(μ^ϕ, σ^ϕ). 
Alternatively, the perturbation tensor can have different dimensions than the input tensor X. For example, the perturbation tensor can be applied to the input tensor X multiple times or in specific regions.”) Esmaeilzadeh does not teach: train a machine learning model to classify input data, the input data including clean data having a first distribution and adversarial data having a second distribution, by: wherein the input tensor is an input feature map (IFM) of a perturbed input image for a first layer of the machine learning model, the IFM of the perturbed input image different from an IFM of a clean input image for convolution with a weight tensor; outputting a classification of the input data based on the convolution, the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device. Wang teaches: train a machine learning model to classify input data, the input data including clean data having a first distribution and adversarial data having a second distribution, by: (Wang, “[0015] Accordingly, one embodiment discloses a computer-implemented method for training a neural network. The method includes collecting a plurality of data samples comprising clean data samples and adversarial data samples. The training of the neural network includes training of a probabilistic encoder to encode the plurality of data samples into a probabilistic distribution over a latent space representation. In addition, the training of the neural network comprising training of a classifier to classify an instance of the latent space representation to produce a classification result [train a machine learning model to classify input data]. 
In addition, the method includes training shared parameters of a first instance of the neural network using the clean data samples [the input data including clean data having a first distribution] and a second instance of the neural network using the adversarial data samples [and adversarial data having a second distribution]. Further, the method includes outputting the shared parameters of the first instance of the neural network and the second instance of the neural network.”) Wang and Esmaeilzadeh are related to the same field of endeavor (i.e., adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Wang with the teachings of Esmaeilzadeh to learn shared parameters for improved adversarial robustness (Wang, Abstract). Esmaeilzadeh in view of Wang does not teach: wherein the input tensor is an input feature map (IFM) of a perturbed input image for a first layer of the machine learning model, the IFM of the perturbed input image different from an IFM of a clean input image for convolution with a weight tensor; outputting a classification of the input data based on the convolution, the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device. Mathews teaches: wherein the input tensor is an input feature map (IFM) of a perturbed input image for a first layer of the machine learning model, the IFM of the perturbed input image different from an IFM of a clean input image for convolution with a weight tensor; (Mathews, “[0033] To determine which layers to replace with tree-based structures, the example model modifier 206 of FIG. 2 inputs (e.g., provides) a file into the example trainer classification model 204 after training/testing. 
Additionally, the model modifier 206 adds an adversarial perturbation to the input [the IFM of the perturbed input image different from an IFM of a clean input image for convolution with a weight tensor] and inputs the perturbed input into the model modifier 206. The example difference determiner 208 determines the difference (e.g., a Euclidean distance) between the input and the perturbed input at each layer [wherein the input tensor is an input feature map (IFM) of a perturbed input image for a first layer of the machine learning model] of the trainer classification model 204. After the Euclidean distances between the two inputs have been determined for the layers, the difference determiner 208 determines a difference between the determined Euclidean distance at each layer with the determined Euclidean distance of the layer immediately subsequent to each layer. As described above, because the amount of change between layers is smaller at later layers, the distance between the Euclidean distance of a layer to the Euclidean distance of a layer immediately subsequent will decrease with the depth of the layers.”) Mathews, Esmaeilzadeh and Wang are related to the same field of endeavor (i.e.: adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Mathews with teachings of Esmaeilzadeh and Wang to analyze and selectively replace a convolution layer to improve the adversarial robustness of the trained classification model, (Mathews, Abstract).
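For reference, the per-layer comparison Mathews describes in paragraph [0033] can be reduced to a short sketch. This is an illustrative sketch only, not text from Mathews: the four-layer network, the layer sizes, the ReLU stand-in for convolution, and the perturbation magnitude are all assumptions.

```python
import numpy as np

def layer_outputs(x, weights):
    """Return the activation (feature map) produced at each layer."""
    outs = []
    for w in weights:
        x = np.maximum(x @ w, 0.0)  # simple ReLU layer as a stand-in for convolution
        outs.append(x)
    return outs

rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 8)) for _ in range(4)]  # hypothetical 4-layer model

clean = rng.normal(size=(1, 8))
perturbed = clean + 0.01 * rng.normal(size=(1, 8))     # adversarial-style perturbation

# Euclidean distance between the clean and perturbed activations at each layer
dists = [float(np.linalg.norm(c - p))
         for c, p in zip(layer_outputs(clean, weights),
                         layer_outputs(perturbed, weights))]

# Difference between each layer's distance and that of the immediately
# subsequent layer, as paragraph [0033] describes
deltas = [dists[i] - dists[i + 1] for i in range(len(dists) - 1)]
```

Per Mathews, the change between successive entries of `dists` is expected to shrink with depth, which is what the layer-selection logic keys on.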
Esmaeilzadeh in view of Wang and Mathews do not teach: outputting a classification of the input data based on the convolution, the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device. Condessa teaches: outputting a classification of the input data based on the convolution, the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device. (Condessa, “[0025] The adversarial defense system 220, via the processing system 160, is advantageously configured to ensure that at least one other system (e.g., actuator system 130) is protected from directly or indirectly receiving a sequence of output data from the machine learning system 200A [outputting a classification of the input data based on the convolution] that has been generated based on a sequence of inputs that is deemed to be an adversarial sequence by the detector 210. More specifically, upon receiving an adversarial label from the detector 210, the adversarial defense system 220, via the processing system 160, is configured to take defensive action with respect to the identified sequence of output data from the machine learning system 200A that corresponds to the adversarial sequence [the classification identifying the input data as adversarial data or clean data as part of training the machine learning model for deployment to an endpoint device].”) Condessa, Esmaeilzadeh, Wang and Mathews are related to the same field of endeavor (i.e.: adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Condessa with teachings of Esmaeilzadeh, Wang and Mathews to use the machine learning model’s output to classify inputs and to update the network parameters based on performance feedback, (Condessa, Abstract).
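For orientation only, the overall scheme recited in claim 1 — training one model on both a clean distribution and an adversarial distribution so that it labels an input as clean or adversarial — can be reduced to a minimal sketch. The data shapes, the uniform +0.5 perturbation, the logistic classifier, and the learning rate are illustrative assumptions and do not come from the cited references.

```python
import numpy as np

rng = np.random.default_rng(1)

# Clean data (first distribution) and adversarially shifted data (second distribution)
clean = rng.normal(size=(200, 4))
adversarial = clean + 0.5                          # hypothetical uniform perturbation

x = np.vstack([clean, adversarial])
y = np.concatenate([np.zeros(200), np.ones(200)])  # 0 = clean, 1 = adversarial

# Train a single shared set of parameters on samples from both distributions
w = np.zeros(4)
b = 0.0
for _ in range(500):                               # plain gradient descent on logistic loss
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    w -= 0.5 * (x.T @ (p - y)) / len(y)
    b -= 0.5 * float(np.mean(p - y))

# Classify each input as adversarial (1) or clean (0)
pred = (1.0 / (1.0 + np.exp(-(x @ w + b))) > 0.5).astype(float)
accuracy = float(np.mean(pred == y))
```

Because the two distributions overlap, the classifier separates them only imperfectly; the point of the sketch is the shape of the task (shared parameters, two distributions, clean/adversarial labels), not any particular accuracy figure.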
Claim(s) 2, 12 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Esmaeilzadeh in view of Wang, Mathews and Condessa, and in further view of Murray, Pub. No.: US20210089903A1. Regarding claim 2, Esmaeilzadeh in view of Wang, Mathews and Condessa teach the method of claim 1. Esmaeilzadeh further teaches: generate a noise tensor; (Esmaeilzadeh, “[0069] A loss function can be defined as L(X; W) for a neural network where X represents the inputs to the neural network and W represents a vector of the weights of the neural network. Stochastic noise can be added to the neural network by a perturbation of the weights of the vector W to create a vector W+N where N represents a zero-mean additive noise vector. The noise vector N can be any appropriate probability distribution [generate a noise tensor;]. The noise vector N has a covariance matrix Σ. If the weight vector W is perturbed to W+N, this is equivalent to sampling weights from a distribution with the mean of W and the variance of Σ.”) combine the noise tensor with the parameter tensor to generate the noisy parameter tensor. (Esmaeilzadeh, “[0078] In some embodiments, a loss function (which can instead be a gain function or an optimization function) can be defined for a pretrained neural network. For a neural network, the input can be represented by an input tensor X and the weights can be represented by a weight tensor W [combine the noise tensor with the parameter tensor to generate the noisy parameter tensor]. In some embodiments, it can be assumed that perturbation or noise is added to the input tensor X. For example, the perturbation can be a tensor where the tensor elements are parameters of one or more probability distribution. In a specific example, the noise can be randomly sampled from a normal distribution.”) Condessa further teaches: the conditional parameter indicating that the input data is to be processed as the adversarial data; and (Condessa, “[0045] FIG.
7 illustrates a flow diagram of an example of the training process 500 (FIG. 5) for generating the detector 210 according to an example embodiment. This training process 500 includes a method 700 for training at least one machine learning system 210A of the detector 210 to differentiate between at least one sequence of nominal data and at least one sequence of adversarial data. Advantageously, this method 700 provides training data 412 that includes both a set of nominal sequences 412A and a set of adversarial sequences 412B while also optimizing parameters of the machine learning system 210A [the conditional parameter] of the detector 210 based on results obtained from this training data 412. Accordingly, upon undergoing the training process 500 with this method 700, the detector 210 becomes operable to identify a sequence, predict whether or not the sequence is nominal/adversarial [indicating that the input data is to be processed as the adversarial data], and provide a label indicative of its prediction.”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Condessa with teachings of Esmaeilzadeh, Wang and Mathews for the same reasons disclosed for claim 1.
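To make the mapped limitations concrete, a rough sketch of the claim-2 sequence — generate a noise tensor, gate and scale it with a conditional parameter and a noise scaling factor, and combine it element-wise with the layer’s parameter tensor — might look like the following. The tensor shapes, the normal distribution, and the 0.1 scale are assumptions of the sketch, not details drawn from Esmaeilzadeh or Condessa.

```python
import numpy as np

rng = np.random.default_rng(42)

def noisy_parameters(weight_tensor, noise_scale, is_adversarial):
    """Return the parameter tensor combined element-wise with a scaled,
    zero-mean noise tensor, gated by a conditional parameter.

    is_adversarial plays the role of the conditional parameter: True routes
    the data through the noisy weights, False leaves the clean weights intact.
    """
    noise = rng.normal(loc=0.0, scale=1.0, size=weight_tensor.shape)  # noise tensor N
    conditional = 1.0 if is_adversarial else 0.0
    return weight_tensor + conditional * noise_scale * noise  # element-wise addition

w = rng.normal(size=(3, 3))          # parameter (weight) tensor for one layer
w_clean = noisy_parameters(w, noise_scale=0.1, is_adversarial=False)
w_adv = noisy_parameters(w, noise_scale=0.1, is_adversarial=True)
```

With the conditional parameter off, the weights come back unchanged; with it on, every element of `w` receives an independently drawn, scaled noise sample — the element-wise combination also recited in claims 4, 14 and 24.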
Esmaeilzadeh in view of Wang, Mathews and Condessa do not teach: apply at least one of a noise scaling factor or the conditional parameter to the noise tensor. Murray teaches: apply at least one of a noise scaling factor or a conditional parameter to the noise tensor, (Murray, “[0143] In an embodiment, a generative adversarial network, GAN, implemented by one or more computers to generate images, comprises: a generator neural network configured to process an input comprising a noise vector and a pair of conditioning variables to generate an image according to the conditioning variables [apply, … the conditional parameter to the noise tensor], wherein a pair of conditioning variables comprises a first conditioning variable and a second conditioning variable, and wherein the generator neural network comprises a mixed-conditional batch normalization, MCBN, layer between a first generator neural network layer and a second generator neural network layer, and wherein the mixed-conditional batch normalization layer is configured to, during processing of the noise vector by the generator neural network: receive a first layer output generated by the first generator neural network layer and the pair of conditioning variables; normalize the first layer output to generate a normalized layer output, comprising transforming the first layer output in accordance with mixed-conditional batch normalization layer parameters to generate the normalized layer output, wherein the mixed-conditional batch normalization layer parameters are computed by applying an affine transformation to the second conditioning variable; and provide the mixed-conditional batch normalization layer output as an input to the second neural network layer.”) Murray, Esmaeilzadeh, Wang, Mathews and Condessa are related to the same field of endeavor (i.e.: adversarial training).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Murray with teachings of Esmaeilzadeh, Wang, Mathews and Condessa to add specialized layer operations that adjust and normalize the network’s outputs according to these conditions, (Murray, Abstract). Regarding claim 12, Esmaeilzadeh in view of Wang, Mathews and Condessa teach the method of claim 11. Esmaeilzadeh further teaches: wherein the first instructions, when executed, cause the at least one device to, in response to a determination that the data is to be processed as the adversarial data: generate a noise tensor; (Esmaeilzadeh, “[0069] A loss function can be defined as L(X; W) for a neural network where X represents the inputs to the neural network and W represents a vector of the weights of the neural network. Stochastic noise can be added to the neural network by a perturbation of the weights of the vector W to create a vector W+N where N represents a zero-mean additive noise vector. The noise vector N can be any appropriate probability distribution [generate a noise tensor]. The noise vector N has a covariance matrix Σ. If the weight vector W is perturbed to W+N, this is equivalent to sampling weights from a distribution with the mean of W and the variance of Σ.”) generate the noisy parameter tensor as a combination of the noise tensor and the parameter tensor. (Esmaeilzadeh, “[0078] In some embodiments, a loss function (which can instead be a gain function or an optimization function) can be defined for a pretrained neural network. For a neural network, the input can be represented by an input tensor X and the weights can be represented by a weight tensor W [generate the noisy parameter tensor as a combination of the noise tensor and the parameter tensor]. 
In some embodiments, it can be assumed that perturbation or noise is added to the input tensor X. For example, the perturbation can be a tensor where the tensor elements are parameters of one or more probability distribution. In a specific example, the noise can be randomly sampled from a normal distribution.”) Condessa further teaches: the conditional parameter indicative of whether the input data is to be processed as the adversarial data; and (Condessa, “[0045] FIG. 7 illustrates a flow diagram of an example of the training process 500 (FIG. 5) for generating the detector 210 according to an example embodiment. This training process 500 includes a method 700 for training at least one machine learning system 210A of the detector 210 to differentiate between at least one sequence of nominal data and at least one sequence of adversarial data. Advantageously, this method 700 provides training data 412 that includes both a set of nominal sequences 412A and a set of adversarial sequences 412B while also optimizing parameters of the machine learning system 210A [the conditional parameter] of the detector 210 based on results obtained from this training data 412. Accordingly, upon undergoing the training process 500 with this method 700, the detector 210 becomes operable to identify a sequence, predict whether or not the sequence is nominal/adversarial [indicative of whether the input data is to be processed as the adversarial data], and provide a label indicative of its prediction.”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Condessa with teachings of Esmaeilzadeh for the same reasons disclosed for claim 11.
Esmaeilzadeh in view of Condessa do not teach: apply, to the noise tensor, at least one of a noise scaling factor or the conditional parameter. Murray teaches: apply, to the noise tensor, at least one of a noise scaling factor or the conditional parameter, (Murray, “[0143] In an embodiment, a generative adversarial network, GAN, implemented by one or more computers to generate images, comprises: a generator neural network configured to process an input comprising a noise vector and a pair of conditioning variables to generate an image according to the conditioning variables [apply, to the noise tensor … a conditional parameter], wherein a pair of conditioning variables comprises a first conditioning variable and a second conditioning variable, and wherein the generator neural network comprises a mixed-conditional batch normalization, MCBN, layer between a first generator neural network layer and a second generator neural network layer, and wherein the mixed-conditional batch normalization layer is configured to, during processing of the noise vector by the generator neural network: receive a first layer output generated by the first generator neural network layer and the pair of conditioning variables; normalize the first layer output to generate a normalized layer output, comprising transforming the first layer output in accordance with mixed-conditional batch normalization layer parameters to generate the normalized layer output, wherein the mixed-conditional batch normalization layer parameters are computed by applying an affine transformation to the second conditioning variable; and provide the mixed-conditional batch normalization layer output as an input to the second neural network layer.”) Murray, Esmaeilzadeh, Wang, Mathews and Condessa are related to the same field of endeavor (i.e.: adversarial training).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Murray with teachings of Esmaeilzadeh, Wang, Mathews and Condessa to add specialized layer operations that adjust and normalize the network’s outputs according to these conditions, (Murray, Abstract). Claim 22 recites limitations analogous to claim 2, so it is rejected under the same rationale. Claim(s) 3, 13 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Esmaeilzadeh in view of Wang, Mathews, Condessa, Murray and in further view of SUCH et al., Pub. No.: US20200234144A1. Regarding claim 3, Esmaeilzadeh in view of Wang, Mathews, Condessa and Murray teach the method of claim 2. Esmaeilzadeh in view of Wang, Mathews, Condessa and Murray do not teach: wherein one or more of the at least one processor circuit is to adjust, based on at least one of a gradient for the parameter tensor or a bitmask tensor for the parameter tensor, at least one of the parameter tensor for the layer of the machine learning model or the noise scaling factor. SUCH teaches: wherein one or more of the at least one processor circuit is to adjust, based on at least one of a gradient for the parameter tensor or a bitmask tensor for the parameter tensor, at least one of the parameter tensor for the layer of the machine learning model or the noise scaling factor (SUCH, “[0037] FIG. 5 shows the details of the step of training the learner model using the generated training dataset, according to an embodiment. The machine learning module 112 repeats the steps 510 and 520 a plurality of times, for example, until a metric indicating the performance of the learner model indicates more than a threshold performance or until the improvement in performance of the learner model with successive iterations is below a threshold value.
The machine learning module 112 determines 510 a loss Lx based on a function of the output of the learner model when provided with input corresponding to the generated training dataset. The machine learning module 112 adjusts 520 [adjust, based on at least one of a gradient for the parameter tensor] the parameters of the learner model [at least one of the parameter tensor for the layer of the machine learning model] based on the loss Lx, for example, using gradient descent techniques or other optimization techniques. In an embodiment, the machine learning module 112 adjusts parameters of the learner model using stochastic gradient techniques.”) SUCH, Esmaeilzadeh, Wang, Mathews, Condessa and Murray are related to the same field of endeavor (i.e.: adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of SUCH with teachings of Esmaeilzadeh, Wang, Mathews, Condessa and Murray to introduce a feedback loop, where the generated datasets and model evaluation are used to adjust parameters, (SUCH, Abstract). Regarding claim 13, Esmaeilzadeh in view of Wang, Mathews, Condessa and Murray teach the method of claim 12. Esmaeilzadeh in view of Wang, Mathews, Condessa and Murray do not teach: wherein the at least one storage device includes third instructions, and the processor circuitry is to execute the third instructions to adjust, based on at least one of a gradient for the parameter tensor or a bitmask tensor for the parameter tensor, at least one of the parameter tensor for the layer of the machine learning model or the noise scaling factor.
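A minimal sketch of the loop SUCH describes in paragraph [0037] — compute a loss (step 510), adjust the parameters by gradient descent (step 520), and repeat until the improvement between successive iterations falls below a threshold — is given below. The least-squares loss, the synthetic data, the learning rate, and the stopping tolerance are illustrative assumptions, not details from SUCH.

```python
import numpy as np

def train_until_converged(w, x, y, lr=0.1, tol=1e-6, max_steps=10_000):
    """Repeat SUCH-style steps 510/520: compute a loss, adjust the parameter
    tensor by gradient descent, and stop once the improvement in the loss
    between successive iterations falls below a threshold value."""
    def loss(w):
        return float(np.mean((x @ w - y) ** 2))      # step 510: determine loss Lx
    prev = loss(w)
    cur = prev
    for _ in range(max_steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)      # gradient for the parameter tensor
        w = w - lr * grad                            # step 520: adjust the parameters
        cur = loss(w)
        if prev - cur < tol:                         # improvement below threshold
            break
        prev = cur
    return w, cur

rng = np.random.default_rng(7)
x = rng.normal(size=(50, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w                                       # noiseless synthetic targets
w_fit, final_loss = train_until_converged(np.zeros(3), x, y)
```

The stopping rule mirrors the quoted passage: iteration ends either when `max_steps` is exhausted or when the loss improvement between successive iterations drops below `tol`.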
SUCH teaches: wherein the at least one storage device includes third instructions, and the processor circuitry is to execute the third instructions to adjust, based on at least one of a gradient for the parameter tensor or a bitmask tensor for the parameter tensor, at least one of the parameter tensor for the layer of the machine learning model or the noise scaling factor. (SUCH, “[0037] FIG. 5 shows the details of the step of training the learner model using the generated training dataset, according to an embodiment. The machine learning module 112 repeats the steps 510 and 520 a plurality of times, for example, until a metric indicating the performance of the learner model indicates more than a threshold performance or until the improvement in performance of the learner model with successive iterations is below a threshold value. The machine learning module 112 determines 510 a loss Lx based on a function of the output of the learner model when provided with input corresponding to the generated training dataset. The machine learning module 112 adjusts 520 [adjust, based on at least one of a gradient for the parameter tensor] the parameters of the learner model [at least one of the parameter tensor for the layer of the artificial intelligence based model or the noise scaling factor] based on the loss Lx, for example, using gradient descent techniques or other optimization techniques. In an embodiment, the machine learning module 112 adjusts parameters of the learner model using stochastic gradient techniques.”) SUCH, Esmaeilzadeh, Wang, Mathews, Condessa and Murray are related to the same field of endeavor (i.e.: adversarial training). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of SUCH with teachings of Esmaeilzadeh, Wang, Mathews, Condessa and Murray to introduce a feedback loop, where the generated datasets and model evaluation are used to adjust parameters, (SUCH, Abstract). Claim 23 recites limitations analogous to claim 3, so it is rejected under the same rationale. Claim(s) 4, 14 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Esmaeilzadeh in view of Wang, Mathews, Condessa and Murray. Regarding claim 4, Esmaeilzadeh in view of Wang, Mathews, Condessa and Murray teach the method of claim 2. Esmaeilzadeh further teaches: wherein to combine the noise tensor with the parameter tensor, the one or more of the at least one processor circuit is to perform element-wise addition using first elements of the parameter tensor and second elements of the noise tensor. [Equation 19] (Esmaeilzadeh, “[0079] Then the minimum noise can be found for a vulnerability analysis based on an optimization of the covariance matrix Σ, such as that described in Equation 19, below: where ∥Σ∥ is the determinant of the covariance matrix Σ of the noise matrix N. XN represents the input (i.e., the input tensor X) plus the noise samples drawn from the probability distributions contained within the noise tensor N.
α ∈ (0,1) [the noise tensor with the parameter tensor, the one or more of the at least one processor circuit is to perform element-wise addition using first elements of the parameter tensor and second elements of the noise tensor] (i.e.: X is the input tensor and N is a noise tensor/vector drawn from a probability distribution) and α is a hyper-parameter that determines how much emphasis is given to the perturbations (i.e., ∥Σ∥) versus the degradation of the network performance represented by the loss function of the neural network with perturbed inputs L(XN; W+N). The network’s susceptibility to adversarial attacks is given by the low variance noise Σα−, where the noise degrades the neural network performance significantly:”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Esmaeilzadeh with teachings of Wang, Mathews, Condessa and Murray for the same reasons disclosed for claim 2. Regarding claim 14, Esmaeilzadeh in view of Wang, Mathews, Condessa and Murray teach the method of claim 12. Esmaeilzadeh further teaches: wherein the first instructions, when executed, cause the at least one device to generate the noisy parameter tensor by performing element-wise addition using first elements of the parameter tensor and second elements of the noise tensor. [Equation 19] (Esmaeilzadeh, “[0079] Then the minimum noise can be found for a vulnerability analysis based on an optimization of the covariance matrix Σ, such as that described in Equation 19, below: where ∥Σ∥ is the determinant of the covariance matrix Σ of the noise matrix N. XN represents the input (i.e., the input tensor X) plus the noise samples drawn from the probability distributions contained within the noise tensor N.
α ∈ (0,1) [to generate the noisy parameter tensor by performing element-wise addition using first elements of the parameter tensor and second elements of the noise tensor] (i.e.: X is the input tensor and N is a noise tensor/vector drawn from a probability distribution) and α is a hyper-parameter that determines how much emphasis is given to the perturbations (i.e., ∥Σ∥) versus the degradation of the network performance represented by the loss function of the neural network with perturbed inputs L(XN; W+N). The network’s susceptibility to adversarial attacks is given by the low variance noise Σα−, where the noise degrades the neural network performance significantly:”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Esmaeilzadeh with teachings of Wang, Mathews, Condessa and Murray for the same reasons disclosed for claim 12. Regarding claim 24, Esmaeilzadeh in view of Wang, Mathews, Condessa and Murray teach the method of claim 22. Esmaeilzadeh further teaches: wherein the instructions cause the processor circuitry to combine the noise tensor and the parameter tensor by performing element-wise addition based on first elements of the parameter tensor and second elements of the noise tensor. [Equation 19] (Esmaeilzadeh, “[0079] Then the minimum noise can be found for a vulnerability analysis based on an optimization of the covariance matrix Σ, such as that described in Equation 19, below: where ∥Σ∥ is the determinant of the covariance matrix Σ of the noise matrix N. XN represents the input (i.e., the input tensor X) plus the noise samples drawn from the probability distributions contained within the noise tensor N.
α ∈ (0,1) [to combine the noise tensor and the parameter tensor by performing element-wise addition based on first elements of the parameter tensor and second elements of the noise tensor] and α is a hyper-parameter that determines how much emphasis is given to the perturbations (i.e., ∥Σ∥) versus the degradation of the network performance represented by the loss function of the neural network with perturbed inputs L(XN; W+N). The network’s susceptibility to adversarial attacks is given by the low variance noise Σα−, where the noise degrades the neural network performance significantly:”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Esmaeilzadeh with teachings of Wang, Mathews, Condessa and Murray for the same reasons disclosed for claim 22. Claim(s) 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Esmaeilzadeh in view of Wang, Mathews, Condessa and in further view of Sandler et al., Pub. No.: US20190279092A1. Regarding claim 6, Esmaeilzadeh in view of Wang, Mathews and Condessa teach the method of claim 1. Esmaeilzadeh in view of Wang, Mathews and Condessa do not teach: one or more of the at least one processor circuit is to apply a bitmask tensor to the parameter tensor. Sandler teaches: one or more of the at least one processor circuit is to apply a bitmask tensor to the parameter tensor (Sandler, “[0023] In some implementations, the CNN architecture can be implemented by activating connections between input and output filters according to their likelihood from the uniform distribution. In addition, the activation can be performed such that there are no connections going in or coming out of dead filters. In this manner, any connections must have a path to the input image and a path to the final prediction.
All the connections in any fully connected layers associated with the CNN are maintained. In some implementations, the CNN architecture can be implemented by randomly deactivating a fraction α of connections having parameters that connect at least two filters on each layer. A fraction √α of connections can be randomly deactivated if the associated parameters connect layers having only one filter left. In some implementations, the connections can be activated and/or deactivated by selectively applying masks to parameter tensors associated with the appropriate filters [to apply a bitmask tensor to the parameter tensor].”) Sandler, Esmaeilzadeh, Wang, Mathews and Condessa are related to the same field of endeavor (i.e.: adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Sandler with teachings of Esmaeilzadeh, Wang, Mathews and Condessa to reduce redundancy while maintaining performance, supporting more compact and optimized neural network design, (Sandler, Abstract). Regarding claim 16, Esmaeilzadeh in view of Condessa teach the method of claim 11. Esmaeilzadeh in view of Wang, Mathews and Condessa do not teach: wherein the at least one storage device includes third instructions, and the processor circuitry is to execute the third instructions to apply, to the parameter tensor, a bitmask tensor. Sandler teaches: wherein the at least one storage device includes third instructions, and the processor circuitry is to execute the third instructions to apply, to the parameter tensor, a bitmask tensor. (Sandler, “[0023] In some implementations, the CNN architecture can be implemented by activating connections between input and output filters according to their likelihood from the uniform distribution. In addition, the activation can be performed such that there are no connections going in or coming out of dead filters.
In this manner, any connections must have a path to the input image and a path to the final prediction. All the connections in any fully connected layers associated with the CNN are maintained. In some implementations, the CNN architecture can be implemented by randomly deactivating a fraction α of connections having parameters that connect at least two filters on each layer. A fraction √{square root over (α)} of connections can be randomly deactivated if the associated parameters connect layers having only one filter left. In some implementations, the connections can be activated and/or deactivated by selectively applying masks to parameter tensors associated with the appropriate filters [to apply, to the parameter tensor, a bitmask tensor].”) Sandler, Esmaeilzadeh, Wang, Mathews and Condessa are related to the same field of endeavor (i.e.: adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Sandler with teachings of Esmaeilzadeh, Wang, Mathews and Condessa to reduce redundancy while maintaining performance, supporting more compact and optimized neural network design, (Sandler, Abstract). Claim(s) 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Esmaeilzadeh in view of Wang, Mathews, Condessa, Sandler and in further view of Dave et al., "Hardware acceleration of sparse and irregular tensor computations of ml models: A survey and insights.". Regarding claim 7, Esmaeilzadeh in view of Wang, Mathews, Condessa and Sandler teach the method of claim 6. Esmaeilzadeh in view of Wang, Mathews, Condessa and Sandler do not teach: wherein to apply the bitmask tensor to the parameter tensor, the one or more of the at least one processor circuit is to perform element-wise multiplication using first elements of the parameter tensor and second elements of the bitmask tensor. 
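For reference, the operation recited in claim 7 — applying the bitmask tensor to the parameter tensor by element-wise multiplication, consistent with Sandler’s selective deactivation of connections — is simply a Hadamard product. The tensor values and mask pattern below are illustrative assumptions:

```python
import numpy as np

parameter_tensor = np.array([[0.3, -1.2, 0.8],
                             [2.1,  0.7, -0.4]])
bitmask_tensor = np.array([[1, 0, 1],
                           [0, 1, 1]])   # 1 keeps a connection, 0 deactivates it

# Element-wise multiplication of first elements of the parameter tensor
# with second elements of the bitmask tensor
masked = parameter_tensor * bitmask_tensor
```

Each zero in the bitmask tensor zeroes the corresponding parameter, deactivating that connection while leaving the tensor’s shape (and hence the layer’s structure) unchanged.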
Dave teaches: wherein to apply the bitmask tensor to the parameter tensor, the one or more of the at least one processor circuit is to perform element-wise multiplication using first elements of the parameter tensor and second elements of the bitmask tensor. (Dave, pages 3–4, “As Fig. 2 illustrates, the accelerator comprises an array of PEs that may contain private register files (RFs) and shared buffers or a scratchpad memory. PEs are simple in design (functional units with little local control), and the shared scratchpad is non-coherent with software directed execution. Therefore, these accelerators are a few orders of magnitude more power-efficient than out-of-order CPU or GPU cores [20]–[22]. They lead to highly energy-efficient execution of ML models that are compute-intensive and memory-intensive. Performance-critical tensor computations of ML models are relatively simple operations like element-wise or tensor additions and multiplications [to perform element-wise multiplication using first elements of the parameter tensor and second elements of the bitmask tensor]. So, they can be processed efficiently with structured computations on the PE-array.”) Dave, Esmaeilzadeh, Wang, Mathews, Condessa and Sandler are related to the same field of endeavor (i.e.: adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Dave with teachings of Esmaeilzadeh, Wang, Mathews, Condessa and Sandler to handle sparse, quantized and irregular tensors in order to reduce computational, memory and communication costs, (Dave, Abstract). Regarding claim 17, Esmaeilzadeh in view of Wang, Mathews, Condessa and Sandler teach the method of claim 16.
Esmaeilzadeh in view of Wang, Mathews, Condessa and Sandler do not teach: wherein the third instructions, when executed, cause the processor circuitry to apply, to the parameter tensor, the bitmask tensor by performing element-wise multiplication using first elements of the parameter tensor and second elements of the bitmask tensor. Dave teaches: wherein the third instructions, when executed, cause the processor circuitry to apply, to the parameter tensor, the bitmask tensor by performing element-wise multiplication using first elements of the parameter tensor and second elements of the bitmask tensor. (Dave, page: 3 – 4, “As Fig. 2 illustrates, the accelerator comprises an array of PEs that may contain private register files (RFs) and shared buffers or a scratchpad memory. PEs are simple in design (functional units with little local control), and the shared scratchpad is non-coherent with software directed execution. Therefore, these accelerators are a few orders of magnitude more power-efficient than out-of-order CPU or GPU cores [20]–[22]. They lead to highly energy-efficient execution of ML models that are compute-intensive and memory-intensive. Performance-critical tensor computations of ML models are relatively simple operations like element-wise or tensor additions and multiplications [by performing element-wise multiplication using first elements of the parameter tensor and second elements of the bitmask tensor]. So, they can be processed efficiently with structured computations on the PE-array.”) Dave, Esmaeilzadeh, Wang, Mathews, Condessa and Sandler are related to the same field of endeavor (i.e.: adversarial training). 
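The element-wise masking operation recited in claims 7 and 17 amounts to a Hadamard product of the parameter tensor and the bitmask tensor. A minimal sketch (illustrative only, not the claimed implementation; plain Python, function name hypothetical):

```python
def apply_bitmask(parameter_tensor, bitmask_tensor):
    """Element-wise multiplication: first elements come from the parameter
    tensor, second elements from the bitmask tensor; a 0 bit zeroes (and so
    deactivates) the corresponding parameter."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(parameter_tensor, bitmask_tensor)]

weights = [[0.5, 1.2], [0.3, 0.8]]
mask    = [[1, 0], [0, 1]]             # keep two of the four parameters
masked  = apply_bitmask(weights, mask)  # [[0.5, 0.0], [0.0, 0.8]]
```

On the accelerators Dave describes, this same per-element multiply maps directly onto the PE array as a structured computation.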
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Dave with teachings of Esmaeilzadeh, Wang, Mathews, Condessa and Sandler to handle sparse, quantized, and irregular tensors in order to reduce computational, memory and communication costs, (Dave, Abstract). Claim(s) 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Esmaeilzadeh in view of Wang, Mathews, Condessa, and in further view of Li et al., Pub. No.: US20210248459A1, and GRIGORIEVSKIY et al., Pub. No.: WO2022037756A1. Regarding claim 8, Esmaeilzadeh in view of Wang, Mathews, Condessa teach the method of claim 1. Esmaeilzadeh in view of Wang, Mathews, Condessa do not teach: determine a ranking of the first layer and a second layer of the machine learning model; based on the ranking and a constraint associated with a total amount of parameters of the machine learning model, determine (1) that at least one of a first bitmask tensor corresponding to the first parameter tensor or a second bitmask tensor corresponding to a second parameter tensor is to be adjusted, the second parameter tensor corresponding to the second layer and (2) one or more adjustments to the at least one of the first bitmask tensor or the second bitmask tensor that is to be adjusted; and update the at least one of the first bitmask tensor or the second bitmask tensor based on the one or more adjustments.
Li teaches: determine a ranking of the first layer and a second layer of the machine learning model; (Li, “[0118] In some embodiments, a non-transitory computer-readable storage medium having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations including transforming a first neural network into a binary neural network by processing layers of the first neural network in a composite binary decomposition process, the first neural network having floating-point values, the composite binary decomposition process including a composite operation to expand real matrices or tensors of the first neural network into a first group of a plurality of binary matrices or tensors of the binary neural network, and a decompose operation to decompose one or more binary matrices or tensors of the first group into a second group of a plurality of low rank binary matrices or tensors [determine a ranking of the first layer and a second layer of the machine learning model], the binary matrices or tensors of the second group having lower rank than the binary matrices or tensors of the first group.”) Li, Esmaeilzadeh, Wang, Mathews and Condessa are related to the same field of endeavor (i.e.: adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Li with teachings of Esmaeilzadeh, Wang, Mathews and Condessa to add the use of composite decomposition of weights to reduce complexity while retaining representational power, (Li, Abstract).
Esmaeilzadeh in view of Wang, Mathews, Condessa and Li do not teach: based on the ranking and a constraint associated with a total amount of parameters of the machine learning model, determine (1) that at least one of a first bitmask tensor corresponding to the first parameter tensor or a second bitmask tensor corresponding to a second parameter tensor is to be adjusted, the second parameter tensor corresponding to the second layer and (2) one or more adjustments to the at least one of the first bitmask tensor or the second bitmask tensor that is to be adjusted; and update the at least one of the first bitmask tensor or the second bitmask tensor based on the one or more adjustments. GRIGORIEVSKIY teaches: based on the ranking and a constraint associated with a total amount of parameters of the machine learning model, determine (1) that at least one of a first bitmask tensor corresponding to the first parameter tensor or a second bitmask tensor corresponding to a second parameter tensor is to be adjusted, the second parameter tensor corresponding to the second layer and (2) one or more adjustments to the at least one of the first bitmask tensor or the second bitmask tensor that is to be adjusted; and (GRIGORIEVSKIY, (page: 3 line [34 – 37] – page:4 line [1 – 2]), “In a further possible implementation form of the first aspect, the input data vector comprises D elements, wherein the first parameter tensor comprises M x R x D elements [determine (1) that at least one of a first bitmask tensor corresponding to the first parameter tensor] and the second parameter tensor comprises R x M x D elements [or a second bitmask tensor corresponding to a second parameter tensor]. 
In a further possible implementation form of the first aspect, the processing circuitry of the data processing apparatus is configured to adjust [is to be adjusted, the second parameter tensor corresponding to the second layer and (2) one or more adjustments to the at least one of the first bitmask tensor or the second bitmask tensor that is to be adjusted;] the integer approximation parameter R.”) update the at least one of the first bitmask tensor or the second bitmask tensor based on the one or more adjustments. (GRIGORIEVSKIY, (page: 4 line [31 – 35] – page: 5 line [1 – 2]), “In a further possible implementation form of the first aspect, each element of the approximation tensor, in particular approximation matrix is associated with a respective item of a plurality of items, e.g. N items, wherein the processing circuitry of the data processing apparatus is further configured to adjust the order of the elements of the approximation tensor [update the at least one of the first bitmask tensor or the second bitmask tensor based on the one or more adjustments], in particular approximation matrix on the basis of information about a respective score of a respective item. In an embodiment, the respective score of a respective item may be a measure of a popularity or rating of an item.”) GRIGORIEVSKIY, Esmaeilzadeh, Wang, Mathews, Condessa and Li are related to the same field of endeavor (i.e.: adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of GRIGORIEVSKIY with teachings of Esmaeilzadeh, Wang, Mathews, Condessa and Li to form approximation tensors that reduce complexity in neural network computations, (GRIGORIEVSKIY, Abstract). Regarding claim 18, Esmaeilzadeh in view of Wang, Mathews, Condessa teach the method of claim 11. 
Esmaeilzadeh in view of Wang, Mathews, Condessa do not teach: determine a ranking of the first layer and a second layer of the machine learning model; based on the ranking and a constraint, determine (1) that at least one of a first bitmask tensor corresponding to the first parameter tensor or a second bitmask tensor corresponding to a second parameter tensor is to be adjusted, the second parameter tensor corresponding to the second layer and (2) one or more adjustments to the at least one of the first bitmask tensor or the second bitmask tensor that is to be adjusted, the constraint associated with a total amount of parameters of the machine learning model; and update the at least one of the first bitmask tensor or the second bitmask tensor based on the one or more adjustments. Li teaches: determine a ranking of the first layer and a second layer of the machine learning model; (Li, “[0118] In some embodiments, a non-transitory computer-readable storage medium having stored thereon executable computer program instructions that, when executed by one or more processors, cause the one or more processors to perform operations including transforming a first neural network into a binary neural network by processing layers of the first neural network in a composite binary decomposition process, the first neural network having floating-point values, the composite binary decomposition process including a composite operation to expand real matrices or tensors of the first neural network into a first group of a plurality of binary matrices or tensors of the binary neural network, and a decompose operation to decompose one or more binary matrices or tensors of the first group into a second group of a plurality of low rank binary matrices or tensors [determine a ranking of the first layer and a second layer of the machine learning model], the binary matrices or tensors of the second group having lower rank than the binary matrices or tensors of the first group.”) Li,
Esmaeilzadeh, Wang, Mathews and Condessa are related to the same field of endeavor (i.e.: adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Li with teachings of Esmaeilzadeh, Wang, Mathews and Condessa to add the use of composite decomposition of weights to reduce complexity while retaining representational power, (Li, Abstract). Esmaeilzadeh in view of Wang, Mathews, Condessa and Li do not teach: based on the ranking and a constraint, determine (1) that at least one of a first bitmask tensor corresponding to the first parameter tensor or a second bitmask tensor corresponding to a second parameter tensor is to be adjusted, the second parameter tensor corresponding to the second layer and (2) one or more adjustments to the at least one of the first bitmask tensor or the second bitmask tensor that is to be adjusted, the constraint associated with a total amount of parameters of the machine learning model; and update the at least one of the first bitmask tensor or the second bitmask tensor based on the one or more adjustments. GRIGORIEVSKIY teaches: based on the ranking and a constraint, determine (1) that at least one of a first bitmask tensor corresponding to the first parameter tensor or a second bitmask tensor corresponding to a second parameter tensor is to be adjusted, the second parameter tensor corresponding to the second layer and (2) one or more adjustments to the at least one of the first bitmask tensor or the second bitmask tensor that is to be adjusted, the constraint associated with a total amount of parameters of the machine learning model; and (GRIGORIEVSKIY, (page: 3 line [34 – 37] – page: 4 line [1 – 2]), “In a further possible implementation form of the first aspect, the input data vector comprises D elements, wherein the first parameter tensor comprises M x R x D elements [based on the ranking and a constraint, determine (1) that at
least one of a first bitmask tensor corresponding to the first parameter tensor] and the second parameter tensor comprises R x M x D elements [or a second bitmask tensor corresponding to a second parameter tensor]. In a further possible implementation form of the first aspect, the processing circuitry of the data processing apparatus is configured to adjust [is to be adjusted, the second parameter tensor corresponding to the second layer and (2) one or more adjustments to the at least one of the first bitmask tensor or the second bitmask tensor that is to be adjusted] the integer approximation parameter R.”) update the at least one of the first bitmask tensor or the second bitmask tensor based on the one or more adjustments. (GRIGORIEVSKIY, (page: 4 line [31 – 35] – page: 5 line [1 – 2]), “In a further possible implementation form of the first aspect, each element of the approximation tensor, in particular approximation matrix is associated with a respective item of a plurality of items, e.g. N items, wherein the processing circuitry of the data processing apparatus is further configured to adjust the order of the elements of the approximation tensor [update the at least one of the first bitmask tensor or the second bitmask tensor based on the one or more adjustments], in particular approximation matrix on the basis of information about a respective score of a respective item. In an embodiment, the respective score of a respective item may be a measure of a popularity or rating of an item.”) GRIGORIEVSKIY, Esmaeilzadeh, Wang, Mathews, Condessa and Li are related to the same field of endeavor (i.e.: adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of GRIGORIEVSKIY with teachings of Esmaeilzadeh, Wang, Mathews, Condessa and Li to form approximation tensors that reduce complexity in neural network computations, (GRIGORIEVSKIY, Abstract). 
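Claims 8 and 18 recite ranking two layers and, under a constraint on the total amount of parameters, deciding which layer's bitmask tensor to adjust. One way to picture that mechanism (an illustrative sketch only, not the claimed method; the Frobenius norm is used here as the ranking metric, anticipating claim 9, and all names are hypothetical):

```python
import math

def frobenius_norm(matrix):
    """Ranking metric for a layer: the Frobenius norm of its weights."""
    return math.sqrt(sum(w * w for row in matrix for w in row))

def adjust_masks(weights, masks, budget):
    """Rank layers by Frobenius norm, then clear mask bits in the
    lowest-ranked layer first until the number of active parameters
    satisfies the total-parameter constraint (budget)."""
    order = sorted(range(len(weights)), key=lambda i: frobenius_norm(weights[i]))
    active = sum(m for mask in masks for row in mask for m in row)
    for i in order:                      # least important layer first
        for row in masks[i]:
            for j, bit in enumerate(row):
                if active <= budget:
                    return masks
                if bit:
                    row[j] = 0
                    active -= 1
    return masks

w = [[[1.0, 2.0], [2.0, 1.0]],   # layer 0: larger norm, left intact
     [[0.1, 0.2], [0.2, 0.1]]]   # layer 1: smaller norm, pruned first
m = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
adjusted = adjust_masks(w, m, budget=6)
# adjusted == [[[1, 1], [1, 1]], [[0, 0], [1, 1]]]
```

The point of the sketch is the two-step structure the claims recite: a ranking is computed first, and the bitmask adjustments are then driven jointly by that ranking and the parameter budget.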
Claim(s) 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Esmaeilzadeh in view of Wang, Mathews, Condessa, Li, GRIGORIEVSKIY and in further view of Yang et al., Pub. No.: US20230237125A1. Regarding claim 9, Esmaeilzadeh in view of Wang, Mathews, Condessa, Li, GRIGORIEVSKIY teach the method of claim 8. Esmaeilzadeh in view of Wang, Mathews, Condessa, Li, GRIGORIEVSKIY do not teach: wherein one or more of the at least one processor circuit is to determine the ranking of the first layer and the second layer based on at least one of: a first momentum of the first layer and a second momentum of the second layer; or a first Frobenius norm of the first layer and a second Frobenius norm of the second layer. Yang teaches: wherein one or more of the at least one processor circuit is to determine the ranking of the first layer and the second layer based on at least one of: a first momentum of the first layer and a second momentum of the second layer; or a first Frobenius norm of the first layer and a second Frobenius norm of the second layer. (Yang, “[0051] In some embodiments, a Frobenius norm may be determined based on a difference between the determined target tensor and the actually obtained target tensor; and an Einstein product of the Frobenius norm may also be determined as the second function [a first Frobenius norm of the first layer and a second Frobenius norm of the second layer]. As an example, E(∥ [Image Omitted] (t)− [Image Omitted] (t)∥F) represents the determined Einstein product of the Frobenius norm.”) Yang, Esmaeilzadeh, Wang, Mathews, Condessa, Li and GRIGORIEVSKIY are related to the same field of endeavor (i.e.: adversarial training).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Yang with teachings of Esmaeilzadeh, Wang, Mathews, Condessa, Li and GRIGORIEVSKIY to add a low-rank tensor decomposition aspect by illustrating how complex tensors can be broken into simpler components to approximate target tensors, (Yang, Abstract). Regarding claim 19, Esmaeilzadeh in view of Wang, Mathews, Condessa, Li, GRIGORIEVSKIY teach the method of claim 18. Esmaeilzadeh in view of Wang, Mathews, Condessa, Li, GRIGORIEVSKIY do not teach: wherein the processor circuitry is to execute the third instructions to determine the ranking of the first layer and the second layer based on at least one of: a first momentum of the first layer and a second momentum of the second layer; or a first Frobenius norm of the first layer and a second Frobenius norm of the second layer. Yang teaches: wherein the processor circuitry is to execute the third instructions to determine the ranking of the first layer and the second layer based on at least one of: a first momentum of the first layer and a second momentum of the second layer; or a first Frobenius norm of the first layer and a second Frobenius norm of the second layer. (Yang, “[0051] In some embodiments, a Frobenius norm may be determined based on a difference between the determined target tensor and the actually obtained target tensor; and an Einstein product of the Frobenius norm may also be determined as the second function [a first Frobenius norm of the first layer and a second Frobenius norm of the second layer]. As an example, E(∥ [Image Omitted] (t)− [Image Omitted] (t)∥F) represents the determined Einstein product of the Frobenius norm.”) Yang, Esmaeilzadeh, Wang, Mathews, Condessa, Li and GRIGORIEVSKIY are related to the same field of endeavor (i.e.: adversarial training).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Yang with teachings of Esmaeilzadeh, Wang, Mathews, Condessa, Li and GRIGORIEVSKIY to add a low-rank tensor decomposition aspect by illustrating how complex tensors can be broken into simpler components to approximate target tensors, (Yang, Abstract). Claim(s) 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Esmaeilzadeh in view of Wang, Mathews, Condessa, and in further view of Chan et al., Pub. No.: US20230289604A1. Regarding claim 10, Esmaeilzadeh in view of Wang, Mathews, Condessa teach the method of claim 1. Esmaeilzadeh in view of Wang, Mathews, Condessa do not teach: adjust the parameter tensor based on a slimming factor for the machine learning model; and process a tensor output with adversarial normalization for the slimming factor. Chan teaches: adjust the parameter tensor based on a slimming factor for the machine learning model; and (Chan, “[0042] As described above, the system 100 supports compression of ML models in a security-aware manner that accounts for cyberattacks or threats to ML and AI services, as compared to conventional ML model compression systems and techniques. For example, in addition to pruning the pre-trained ML model represented by the ML model parameters 172 based on the pruning heuristics 176 (in order to achieve target size, accuracy, or other performance metrics), the server 102 may test pruned ML models (i.e., candidate ML models) using the attack model 118, which represent ML and AI-specific cyberattacks and/or edge computing-specific cyberattacks. 
Based on results of the testing, the server 102 continuously updates the updated heuristics 124 [adjust] and controls the iterative pruning process such that an output ML model represented by the final ML model parameters 130 [the parameter tensor based on a slimming factor for the machine learning model;] not only satisfies one or more performance metrics, but is also robust against (e.g., is secure or prevents/has a decreased likelihood of being exploited by) known cybersecurity threats and attacks, particularly ones designed to exploit ML and AI services.”) process a tensor output with adversarial normalization for the slimming factor. (Chan, [0047], “For example, adversarial and adaptive attacks may exploit the complexity of ML models for malicious behaviors, and the security-aware pruning performed by the model compression container 206 may remove excess complexity [process a tensor output with adversarial normalization for the slimming factor], thereby smoothing decision boundaries and increasing the difficulty of such attacks. As another example, a data poising attack may alter the distribution of data (e.g., training data, testing data, input data, etc.) to create malicious behavior for specific inputs, and the security-aware pruning may remove information from the candidate ML models that is necessary to trigger the malicious behavior. As another example, the attack models 212 may include a membership inference attack model, and as a result of the testing, feedback from the attack model may be utilized to determine how to prune the candidate ML network, by differentially looking at the uniquely identifying parts of the candidate ML network and updating the pruning heuristics to eliminate the uniquely identifying nodes.”) Chan, Esmaeilzadeh, Wang, Mathews and Condessa are related to the same field of endeavor (i.e.: adversarial training). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Chan with teachings of Esmaeilzadeh, Wang, Mathews and Condessa to add security-aware optimization aspects by showing how compression techniques can be guided by threat models to make machine learning systems more resilient, (Chan, Abstract). Regarding claim 20, Esmaeilzadeh in view of Wang, Mathews and Condessa teach the method of claim 11. Esmaeilzadeh in view of Wang, Mathews and Condessa do not teach: adjust the parameter tensor based on a slimming factor for the machine learning model; and in response to a determination that the input data is to be processed as the adversarial data, process a tensor output from the convolution of the input tensor and the noisy parameter tensor with adversarial normalization for the slimming factor. Chan teaches: adjust the parameter tensor based on a slimming factor for the machine learning model; and (Chan, “[0042] As described above, the system 100 supports compression of ML models in a security-aware manner that accounts for cyberattacks or threats to ML and AI services, as compared to conventional ML model compression systems and techniques. For example, in addition to pruning the pre-trained ML model represented by the ML model parameters 172 based on the pruning heuristics 176 (in order to achieve target size, accuracy, or other performance metrics), the server 102 may test pruned ML models (i.e., candidate ML models) using the attack model 118, which represent ML and AI-specific cyberattacks and/or edge computing-specific cyberattacks.
Based on results of the testing, the server 102 continuously updates the updated heuristics 124 [adjust] and controls the iterative pruning process such that an output ML model represented by the final ML model parameters 130 [the parameter tensor based on a slimming factor for the machine learning model;] not only satisfies one or more performance metrics, but is also robust against (e.g., is secure or prevents/has a decreased likelihood of being exploited by) known cybersecurity threats and attacks, particularly ones designed to exploit ML and AI services.”) in response to a determination that the input data is to be processed as the adversarial data, process a tensor output from the convolution of the input tensor and the noisy parameter tensor with adversarial normalization for the slimming factor. (Chan, [0047], “For example, adversarial and adaptive attacks may exploit the complexity of ML models for malicious behaviors, and the security-aware pruning performed by the model compression container 206 may remove excess complexity [in response to a determination that the input data is to be processed as the adversarial data, process a tensor output from the convolution of the input tensor and the noisy parameter tensor with adversarial normalization for the slimming factor], thereby smoothing decision boundaries and increasing the difficulty of such attacks. As another example, a data poising attack may alter the distribution of data (e.g., training data, testing data, input data, etc.) to create malicious behavior for specific inputs, and the security-aware pruning may remove information from the candidate ML models that is necessary to trigger the malicious behavior. 
As another example, the attack models 212 may include a membership inference attack model, and as a result of the testing, feedback from the attack model may be utilized to determine how to prune the candidate ML network, by differentially looking at the uniquely identifying parts of the candidate ML network and updating the pruning heuristics to eliminate the uniquely identifying nodes.”) Chan, Esmaeilzadeh, Wang, Mathews and Condessa are related to the same field of endeavor (i.e.: adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Chan with teachings of Esmaeilzadeh, Wang, Mathews and Condessa to add security-aware optimization aspects by showing how compression techniques can be guided by threat models to make machine learning systems more resilient, (Chan, Abstract). Claim(s) 61 and 62 are rejected under 35 U.S.C. 103 as being unpatentable over Esmaeilzadeh in view of Wang, Mathews, Condessa, and in further view of Xie, et al. "Adversarial examples improve image recognition." Regarding claim 61, Esmaeilzadeh in view of Wang, Mathews, Condessa teach the method of claim 1. Esmaeilzadeh in view of Wang, Mathews, Condessa do not teach: wherein one or more of the at least one processor circuit is to execute an adversarial batch-normalization sub-layer to generate an adversarial tensor when the data is to be processed as the adversarial data. Xie teaches: wherein one or more of the at least one processor circuit is to execute an adversarial batch-normalization sub-layer to generate an adversarial tensor when the data is to be processed as the adversarial data. (Xie, page: 820, “In this paper, we propose AdvProp, short for Adversarial Propagation, a new training scheme that bridges the distribution mismatch with a simple yet highly effective two batchnorm approach.
Specifically, we propose to use two batch norm statistics [an adversarial batch-normalization sub-layer], one for clean images and one auxiliary for adversarial examples [to generate an adversarial tensor when the data is to be processed as the adversarial data]. The two batchnorms properly disentangle the two distributions at normalization layers for accurate statistics estimation. We show this distribution disentangling is crucial, enabling us to successfully improve, rather than degrade, model performance with adversarial examples.”) Xie, Esmaeilzadeh, Wang, Mathews and Condessa are related to the same field of endeavor (i.e.: adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Xie with teachings of Esmaeilzadeh, Wang, Mathews, and Condessa to add separate modeling of clean and adversarial data distribution using distinct normalization statistics to improve robustness and generalization by preventing the mixing of statistically different distributions within the same normalization parameters, (Xie, Abstract). Regarding claim 62, Esmaeilzadeh in view of Wang, Mathews, Condessa teach the method of claim 1. Esmaeilzadeh in view of Wang, Mathews, Condessa do not teach: wherein one or more of the at least one processor circuit is to execute a clean batch-normalization sub-layer to generate a clean tensor when the data is to be processed as the clean data. Xie teaches: wherein one or more of the at least one processor circuit is to execute a clean batch-normalization sub-layer to generate a clean tensor when the data is to be processed as the clean data (Xie, page: 820, “In this paper, we propose AdvProp, short for Adversarial Propagation, a new training scheme that bridges the distribution mismatch with a simple yet highly effective two batchnorm approach. 
Specifically, we propose to use two batch norm statistics [execute a clean batch-normalization sub-layer], one for clean images [to generate a clean tensor when the data is to be processed as the clean data] and one auxiliary for adversarial examples. The two batchnorms properly disentangle the two distributions at normalization layers for accurate statistics estimation. We show this distribution disentangling is crucial, enabling us to successfully improve, rather than degrade, model performance with adversarial examples.”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Xie with teachings of Esmaeilzadeh, Wang, Mathews, Condessa for the same reasons disclosed for claim 61. Claim 63 is rejected under 35 U.S.C. 103 as being unpatentable over Esmaeilzadeh in view of Wang, Mathews, Condessa, and in further view of Karam, et al. Pub. No.: US11030485B2. Esmaeilzadeh in view of Wang, Mathews, Condessa teach the method of claim 1. Esmaeilzadeh in view of Wang, Mathews, Condessa do not teach: wherein the weight tensor is combined with the noise tensor to generate a noisy weight tensor for convolution with the IFM of the perturbed input image. Karam teaches: wherein the weight tensor is combined with the noise tensor to generate a noisy weight tensor for convolution with the IFM of the perturbed input image. (Karam, (col. 23, line [39 – 55]), “FIG. 15 shows how our DeepCorrect framework can be used as a defense against adversarial attacks and adversarial perturbations through resilient feature regeneration. Convolutional filter activations in the baseline DNN (top) are first sorted in order of vulnerability to adversarial noise [wherein the weight tensor is combined with the noise tensor to generate a noisy weight tensor] (e.g., using their respective filter weight norms or other ranking metric).
We deploy feature transformations in the form of defender units, consisting of a residual block with a single skip connection (4 layers) at the output of vulnerable activations (high-noise features) to regenerate input features into noise-resilient features that restore the lost accuracy of the baseline DNN, while leaving the remaining filter activations (low-noise features) unchanged. We train these units on both clean and perturbed images [for convolution with the IFM of the perturbed input image] in every mini-batch using the same target loss as the baseline DNN such that all parameters of the baseline DNN are left unchanged during training.”) Karam, Esmaeilzadeh, Wang, Mathews and Condessa are related to the same field of endeavor (i.e.: adversarial training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Karam with teachings of Esmaeilzadeh, Wang, Mathews and Condessa to add identification and ranking of vulnerable internal features to improve adversarial robustness, (Karam, Abstract). Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Awais, et al., "Towards an adversarially robust normalization approach." (2020). Awais investigates how BatchNorm causes this vulnerability and proposes a new normalization that is robust to adversarial attacks. Awais observes that adversarial images tend to shift the distribution of the BatchNorm input, and this shift makes the train-time estimated population statistics inaccurate. Ioffe, et al., "Batch renormalization: Towards reducing minibatch dependence in batch-normalized models," 2019. Ioffe proposes Batch Renormalization, a simple and effective extension to ensure that the training and inference models generate the same outputs that depend on individual examples rather than the entire minibatch.
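The two-batch-norm (AdvProp) scheme quoted from Xie for claims 61 and 62 keeps separate normalization statistics for clean and adversarial batches. A minimal sketch of that disentangling (illustrative only, not part of the record; per-batch statistics with running averages omitted, class and parameter names hypothetical):

```python
import math

class DualBatchNorm:
    """Maintains two sets of normalization statistics, one for clean
    batches and one auxiliary set for adversarial batches, selected by
    a flag so the two distributions are never mixed."""

    def __init__(self, eps=1e-5):
        self.eps = eps
        self.stats = {"clean": None, "adversarial": None}

    def __call__(self, batch, adversarial=False):
        key = "adversarial" if adversarial else "clean"
        n, d = len(batch), len(batch[0])
        means = [sum(x[j] for x in batch) / n for j in range(d)]
        vars_ = [sum((x[j] - means[j]) ** 2 for x in batch) / n
                 for j in range(d)]
        self.stats[key] = (means, vars_)   # disentangled per-branch stats
        return [[(x[j] - means[j]) / math.sqrt(vars_[j] + self.eps)
                 for j in range(d)] for x in batch]

bn = DualBatchNorm()
clean_out = bn([[1.0, 2.0], [3.0, 4.0]], adversarial=False)
adv_out   = bn([[5.0, 0.0], [9.0, 8.0]], adversarial=True)
# each branch is normalized with its own statistics
```

The single flag selecting between the two statistic sets is the structural point of overlap with the claimed clean and adversarial batch-normalization sub-layers.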
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATIYAS T MARU, whose telephone number is (571) 270-0902, or via email: matiyas.maru@uspto.gov. The examiner can normally be reached Monday through Friday, 8:00 am to 4:00 pm EST.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Michelle Bechtold, can be reached at (571) 431-0762. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.T.M./
Examiner, Art Unit 2148

/MICHELLE T BECHTOLD/
Supervisory Patent Examiner, Art Unit 2148

Prosecution Timeline

Jun 29, 2022
Application Filed
Aug 18, 2022
Response after Non-Final Action
Sep 29, 2025
Non-Final Rejection — §101, §103
Dec 22, 2025
Applicant Interview (Telephonic)
Dec 22, 2025
Examiner Interview Summary
Dec 26, 2025
Response Filed
Feb 17, 2026
Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586114
GENERATING DIGITAL RECOMMENDATIONS UTILIZING COLLABORATIVE FILTERING, REINFORCEMENT LEARNING, AND INCLUSIVE SETS OF NEGATIVE FEEDBACK
2y 5m to grant Granted Mar 24, 2026
Patent 12572796
METHODS AND SYSTEMS FOR GENERATING RECOMMENDATIONS FOR COUNTERFACTUAL EXPLANATIONS OF COMPUTER ALERTS THAT ARE AUTOMATICALLY DETECTED BY A MACHINE LEARNING ALGORITHM
2y 5m to grant Granted Mar 10, 2026
Patent 12567004
METHOD OF MACHINE LEARNING TRAINING FOR DATA AUGMENTATION
2y 5m to grant Granted Mar 03, 2026
Patent 12561588
Methods and Systems for Generating Example-Based Explanations of Link Prediction Models in Knowledge Graphs
2y 5m to grant Granted Feb 24, 2026
Patent 12561584
TEACHING DATA PREPARATION DEVICE, TEACHING DATA PREPARATION METHOD, AND PROGRAM
2y 5m to grant Granted Feb 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
58%
Grant Probability
70%
With Interview (+12.5%)
4y 6m
Median Time to Grant
Moderate
PTA Risk
Based on 40 resolved cases by this examiner. Grant probability derived from career allow rate.
