Prosecution Insights
Last updated: April 19, 2026
Application No. 17/228,126

DEVICE AND METHOD FOR TRAINING A CLASSIFIER AND ASSESSING THE ROBUSTNESS OF A CLASSIFIER

Status: Final Rejection — §103
Filed: Apr 12, 2021
Examiner: YI, HYUNGJUN B
Art Unit: 2146
Tech Center: 2100 — Computer Architecture & Software
Assignee: Robert Bosch GmbH
OA Round: 4 (Final)
Grant Probability: 18% (At Risk)
Expected OA Rounds: 5-6
Time to Grant: 4y 7m
Grant Probability with Interview: 49%

Examiner Intelligence

Career Allow Rate: 18% (3 granted / 17 resolved; -37.4% vs Tech Center average) — this examiner grants only 18% of cases
Interview Lift: strong, +31.7% across resolved cases with an interview
Avg Prosecution: 4y 7m typical timeline; 39 applications currently pending
Total Applications: 56 across all art units (career history)

Statute-Specific Performance

§101: 26.3% (-13.7% vs TC avg)
§103: 53.9% (+13.9% vs TC avg)
§102: 12.9% (-27.1% vs TC avg)
§112: 4.7% (-35.3% vs TC avg)
Tech Center averages are estimates • Based on career data from 17 resolved cases

Office Action

§103
DETAILED ACTION

This action is responsive to the claims filed on 03/03/2025. Claims 1-4 and 6-13 are pending for examination. This action is Final.

Response to Amendments

Applicant's argument that Dong fails to disclose limitation 1(e) because Dong's g_t is an "accumulated gradient" rather than a "perturbation" (Remarks, pages 2-4) is not persuasive. Under the broadest reasonable interpretation, claim 1(e) does not require the "first perturbation" to be labeled in the prior art with identical nomenclature, nor does claim 1(e) require the perturbation to be expressed with a particular mathematical formula. Rather, step 1(e) broadly requires replacing a prior perturbation-related term used in the iterative adversarial-example generation process with a weighted sum of that prior term and a newly obtained term. Dong teaches exactly such an iterative update. In Algorithm 1, Dong first updates g_{t+1} as g_{t+1} = μ · g_t + ∇_x J(x*_t, y) / ||∇_x J(x*_t, y)||_1, and then updates the adversarial example as x*_{t+1} = x*_t + α · sign(g_{t+1}). Dong further explains that g_t gathers the gradients of prior iterations and that the adversarial example is perturbed in the direction of g_t. Thus, although Dong describes g_t as an accumulated gradient, g_t is the operative perturbation-update term from which the perturbation applied to the adversarial example is derived. Accordingly, Dong teaches, at least under a broadest reasonable interpretation, replacing a prior perturbation/update term with a weighted sum of the prior term and a newly computed term, as recited in step 1(e).

Applicant's remarks refer to "new claims 14–19." However, no amendment adding claims 14–19 has been presented. Amendments to claims must be made through a compliant claim listing, and any amendment that adds a new claim must include a complete listing of all claims ever presented, with the status of each claim identified; any claim added by amendment must be indicated with the status "New." Because the present record does not include an entered amendment adding claims 14–19, no claims 14–19 are before the Examiner, and examination is directed to the currently pending claims only.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.

This application currently names joint inventors.
In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention. Claims 1, 3, 6, 7, 11, 12, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Schmidt et al., (US11481681B2) and Dong et al. ((2018). Boosting adversarial attacks with momentum. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 9185-9193).) referred to as Schmidt and Dong respectively. Claim 1: Schmidt teaches the following limitations: A computer-implemented method for training a classifier, wherein the classifier is configured to classify input signals of digital image data and/or audio data, and training of the classifier is based on a perturbed input signal obtained by applying a perturbation provided from a plurality of perturbations to an input signal provided from a training dataset, the method comprising the following steps (Schmidt, col. 1, lines 13-24, “The present invention relates to a system and computer-implemented method for training a classification model, e.g., an image classifier, to be robust against perturbations of multiple perturbation types”): providing a plurality of initial perturbations (Schmidt, col. 11, lines 20-24, “The classification model of Fig. 4 may be trained to be robust against perturbations of multiple perturbation types. Shown in the figure are perturbation types PT1, 461 up to PTn, 462. The number of perturbation types can for example be at least two, at least three, or at most or at least five. A perturbation type may define a set of allowed perturbations.”, perturbation types define a set of allowed perturbations which forms the plurality of initial perturbations.); adapting a perturbation from the plurality of initial perturbations to an input signal, wherein the input signal is randomly drawn from the training dataset and the perturbation is adapted to the input signal such that applying the perturbation to the input signal yields a second input signal, which is classified differently than the first input signal (Schmidt, col. 12, lines 8-20 “In an outer iteration, in operation DSel, 431, a set of training instances TI, 432, may be selected. Perturbation(s) UP, 491 for perturbing training instances TI may be selected among the set of allowed perturbations of the multiple perturbation types to maximize a loss of the classification model for training instances TI. In the example of Fig. 4, this selection comprises operations Desc., 470; Proj., 480; and PSel, 490 which may be repeated in an inner iteration. Based on selected perturbation(s) UP, set of parameters PAR may be updated in an operation POpt, 451, to decrease the loss of the classification model for the training instances TI perturbed by perturbation UP.”, perturbation UP is selected from the plurality of initial perturbations (set of allowed perturbations) in order to adapt it to maximize the loss (a misclassification occurs) for the training instances. Col. 
15, lines 59-63, “If the classification of the perturbed training instance is different from a classification of the training instance, e.g., according to the training dataset TD or according to the classification model, the perturbation may be determined as the current updated perturbation UP”, perturbation UP perturbs the input signal such that it creates a second signal that is classified differently from the original. Col. 10, lines 52-56, “The training instances are typically labelled instances {xi,yi}i=1, . . . n, e.g., a training instance may comprise a feature vector xi, e.g., a real-valued vector, and a label yi, e.g., a binary or otherwise categorical label”, it is noted that training instances comprises of a feature vector x and desired output label y of an image or audio signal (sound signal disclosed in paragraph 7), which is the input signal disclosed in the claim. Col. 12, lines 51-53, “Training instances TI may be selected in various ways known per se for gradient descent or similar methods, e.g., randomly, sequentially, etc”, it is noted that the selection of training instances (input signal) is random.); providing a subset of the plurality of initial perturbations as the plurality of perturbations (Schmidt, col. 13, lines 9-13, “perturbations UP may be selected among the sets of allowed perturbations of the multiple perturbation types PT1, ..., PTn to maximize a loss of the classification model for training instances TI when perturbed by perturbations UP.”, perturbations UP is the subset of the plurality of initial perturbations provided to the classification model for training (the set of allowed perturbations defined by perturbation types.). Furthermore, paragraph 57 discloses providing perturbation(s) UP for training the classification model.); and training the classifier based on the plurality of perturbations (Schmidt, col. 12, lines 21-26, “In other words, training the classification model may comprise (1) backpropagating training instances through the optimization problem, wherein a training instance may be backpropagated by solving the inner maximization to obtain updated perturbation(s) UP and (2) backpropagating the perturbed input instance through the outer optimization.”, the plurality of perturbations (perturbations UP) are used to train the classification model to strengthen it against adversarial responses). wherein the step of training the classifier includes the following steps: a. selecting a first perturbation from the plurality of perturbations and selecting a first input signal and a corresponding desired output signal from the training dataset (Schmidt, col. 12, lines 9-14, “a set of training instances TI, 432, may be selected. Perturbation(s) UP, 491 for perturbing training instances TI may be selected among the set of allowed perturbations of the multiple perturbation types to maximize a loss of the classification model for training instances TI.”, as disclosed previously, training instances TI comprises of a input feature vector x as well as the corresponding label (desired output)); b. obtaining a second perturbation, which is stronger than the first perturbation, by adapting the first perturbation based on the first input signal, the corresponding desired output signal and the classifier (Schmidt, col. 13, lines 51-55, “In operation Desc., 470, to determine an updated perturbation UPi allowed by a perturbation type PTi, a respective update U1, 471, . . . 
, Un, 472 to the perturbation UP may be determined to increase the loss of the classification model for the set of training instances TI.", an updated perturbation UP is adapted from the set of allowed perturbations, and is stronger than the perturbation from the set of allowed perturbations because it is updated to increase the loss of the classification model for the set of training instances TI.);

c. obtaining a first adversarial example by applying the second perturbation to the input signal (Schmidt, col. 13, lines 33-47, "a selection PSel, 490, may be made of an updated perturbation UP1, ..., UPn that most increases the loss of the classification model, e.g., updated perturbations that more increase the loss of the classification model are favoured over updated perturbations that less increase the loss. For example, updated perturbation UP may be selected among updated perturbations UP1, ..., UPn as [selection equation not reproduced], e.g., as perturbation that maximizes a loss over one or more selected training instances of applying the perturbation.", a first adversarial example (perturbation UP) is selected as the determined perturbation UP used in the stochastic gradient descent algorithm for training the classifier. The cited selection criterion is used to determine a perturbation UP, from a set of perturbations UP1, ..., UPn, that maximizes the loss over the selected training instances when applied.);

d. adapting the classifier by training the classifier based on the first adversarial example and the corresponding desired output signal to harden the classifier against the second perturbation (Schmidt, col. 15, lines 44-46, "For example, backpropagation may be performed, e.g., with stochastic gradient descent, to update the set of parameters PAR with gradient [gradient expression not reproduced], where i sums over selected training instances TI and xi + δ'(xi) denotes a selected training instance perturbed by a determined perturbation UP", the classification model is adapted via backpropagation with the first adversarial example (a training instance perturbed by a determined perturbation UP) and the corresponding desired output y. This "hardens" the classification model by making it more robust to adversarial examples perturbed by the second perturbation (perturbation UP).); and

f. repeating steps a. to e. (Schmidt, col. 16, lines 17-39, "Training the classification model ... may comprise, in an outer iteration of the training procedure, performing the following operations: [algorithm listing not reproduced]". The aforementioned steps a-e as disclosed in the limitations above are repeated as an outer iteration in order to train the classification model).

Dong, in the same field of adversarial training using perturbed input, teaches the following limitation which Schmidt fails to teach:

e. replacing the first perturbation in the plurality of perturbations by a weighted sum of the first perturbation and the second perturbation (Dong, page 9187, col. 1, paragraph 1, "Iterative methods [9] iteratively apply fast gradient multiple times with a small step size α. The iterative version of FGSM (I-FGSM) can be expressed as: x*_0 = x, x*_{t+1} = x*_t + α · sign(∇_x J(x*_t, y)). To make the generated adversarial examples satisfy the L∞ (or L2) bound, one can clip x*_t into the ε vicinity of x or simply set α = ε/T with T being the number of iterations. It has been shown that iterative methods are stronger white-box adversaries than one-step methods at the cost of worse transferability [10, 24] ... the momentum variant of the iterative fast gradient method (MI-FGSM) can be written as x*_{t+1} = x*_t + α · sign(g_{t+1})", the above is the formula for updating adversarial examples used in the model; unlike how Schmidt updates its perturbation, Dong uses a weighted sum of previous perturbations accumulated iteratively. In particular, Dong discloses the update formula g_{t+1} = μ · g_t + ∇_x J(x*_t, y) / ||∇_x J(x*_t, y)||_1. Here, g_t is the value of the perturbation from the previous step (i.e., the "first perturbation" in the claim), and the term ∇_x J(x*_t, y) / ||∇_x J(x*_t, y)||_1 is a newly computed perturbation direction for the current input (x), output (y), and classifier (classifier loss function J), serving as the "second perturbation" in the claim. The formula then combines these two using a weighted sum (with weights μ and 1, respectively). The resulting g_{t+1} replaces the previous perturbation for generating new adversarial examples.);

Schmidt and Dong both teach systems or techniques for perturbing the input of a neural network in order to train the model against adversarial examples. Schmidt teaches an optimization formula for updating its perturbed input that does not involve a weighted sum of previous perturbations. Dong does teach a system that iteratively updates perturbations with a weighted sum of previous perturbations. It would have been obvious to a person of ordinary skill in the art to have incorporated the teachings disclosed by Schmidt with the teachings disclosed by Dong (i.e., using a weighted sum for determining a new updated perturbation). A motivation for the combination is to increase the probability of deceiving a classifier in a white-box system, allowing for a higher fully-trained success rate due to the inherent transparency of white-box systems (Dong, page 9186, col. 1, paragraph 1, "We show that the adversarial examples generated by momentum iterative methods have higher success rates in both white-box and black-box attacks. The proposed methods alleviate the trade-off between the white-box attacks and the transferability, and act as a stronger attack algorithm than one-step methods [5] and vanilla iterative methods [9].", a higher deception rate for white-box systems than other techniques, showing that the technique in Dong further improves the chances of a model detecting adversarial examples in data.).

Claim 3: Schmidt and Dong teach the limitations of claim 1. Schmidt further teaches: wherein at least one perturbation from the plurality of initial perturbations is provided by randomly sampling a first input signal from the training dataset or a second dataset, adapting a plurality of values included in the first input signal, and providing the adapted input signal as a perturbation (Schmidt, col. 12, lines 8-21, "In an outer iteration, in operation DSel, 431, a set of training instances TI, 432, may be selected. Perturbation(s) UP, 491 for perturbing training instances TI may be selected among the set of allowed perturbations of the multiple perturbation types to maximize a loss of the classification model for training instances TI. In the example of Fig. 4, this selection comprises operations Desc., 470; Proj., 480; and PSel, 490 which may be repeated in an inner iteration.
Based on selected perturbation(s) UP, set of parameters PAR may be updated in an operation POpt, 451, to decrease the loss of the classification model for the training instances TI perturbed by perturbation UP.”, perturbation UP is selected from the plurality of initial perturbations (set of allowed perturbations) in order to adapt it to maximize the loss (a misclassification occurs) for the selected training instances. The parameters of the input signal are then updated with the updated perturbation UP, through operation POpt 451. Col. 12, lines 51-53, “Training instances TI may be selected in various ways known per se for gradient descent or similar methods, e.g., randomly, sequentially, etc”, it is noted that the selection/sampling of training instances (input signal) is random.). Claim 6: Schmidt teaches the following limitations: training a classifier including: providing a plurality of initial perturbations (Schmidt, col. 11, lines 20-24, “The classification model of Fig. 4 may be trained to be robust against perturbations of multiple perturbation types. Shown in the figure are perturbation types PT1, 461 up to PTn, 462. The number of perturbation types can for example be at least two, at least three, or at most or at least five. A perturbation type may define a set of allowed perturbations.”, perturbation types define a set of allowed perturbations which forms the plurality of initial perturbations.), adapting a perturbation from the plurality of initial perturbations to a first input signal, wherein the first input signal is randomly drawn from the training dataset and the perturbation is adapted to the first input signal such that applying the perturbation to the first input signal yields a second input signal, which is classified differently than the first input signal (Schmidt, col. 12, lines 8-21, “In an outer iteration, in operation DSel, 431, a set of training instances TI, 432, may be selected. Perturbation(s) UP, 491 for perturbing training instances TI may be selected among the set of allowed perturbations of the multiple perturbation types to maximize a loss of the classification model for training instances TI. In the example of Fig. 4, this selection comprises operations Desc., 470; Proj., 480; and PSel, 490 which may be repeated in an inner iteration. Based on selected perturbation(s) UP, set of parameters PAR may be updated in an operation POpt, 451, to decrease the loss of the classification model for the training instances TI perturbed by perturbation UP.”, perturbation UP is selected from the plurality of initial perturbations (set of allowed perturbations) in order to adapt it to maximize the loss (a misclassification occurs) for the training instances.), providing a subset of the plurality of initial perturbations as a plurality of perturbations (Schmidt, col. 13, lines 9-13, “perturbations UP may be selected among the sets of allowed perturbations of the multiple perturbation types PT1, ..., PTn to maximize a loss of the classification model for training instances TI when perturbed by perturbations UP.”, perturbations UP is the subset of the plurality of initial perturbations provided to the classification model for training (the set of allowed perturbations defined by perturbation types. Furthermore, paragraph 57 discloses providing perturbation(s) UP for training the classification model.), and training the classifier based on the plurality of perturbations (Schmidt, col. 
12, lines 21-26, “In other words, training the classification model may comprise (1) backpropagating training instances through the optimization problem, wherein a training instance may be backpropagated by solving the inner maximization to obtain updated perturbation(s) UP and (2) backpropagating the perturbed input instance through the outer optimization.”, the plurality of perturbations (perturbations UP) are used to train the classification model to strengthen it against adversarial responses); providing the classifier in a control system (Schmidt, col. 9, lines 62-63, “For example, vehicle 62 may incorporate the classification system to control the vehicle based on images”); obtaining the output signal from the control system, wherein the control system supplies the input signal to the classifier to obtain the output signal (Schmidt, col. 9, lines 62-67, and col. 10, lines 1-5, “For example, vehicle 62 may incorporate the classification system to control the vehicle based on images obtained from a camera 22. For example, automotive control system 300 may comprise a camera interface (not shown separately) for obtaining an image of an environment 50 of the vehicle from the camera 22. The classification system may be configured to classify the image obtained from camera 22 according to the classification model to detect an object in the environment 50 of the vehicle, for example, a traffic sign or an obstacle with which the vehicle is at risk of colliding.”, the classification model is used in the control system (car) to classify images (output signal) obtained from a camera (input signal)); and providing the output signal for controlling the control system (Schmidt, specification page 5, col. 2, lines 13-19, “Control system 300 may further comprise an actuator interface (not shown separately) for providing, to an actuator, actuator data causing the actuator to effect an action to control vehicle 62. Automotive control system 300 may be configured to determine actuator data to control vehicle 62 based at least on part on this detection; and to provide the actuator data to the actuator via the actuator interface.”, based on the detected image (output signal) the control system may use an actuator to control the vehicle). wherein the step of training the classifier includes the following steps: a. selecting a first perturbation from the plurality of perturbations and selecting a first input signal and a corresponding desired output signal from the training dataset (Schmidt, col. 12, lines 9-14, “a set of training instances TI, 432, may be selected. Perturbation(s) UP, 491 for perturbing training instances TI may be selected among the set of allowed perturbations of the multiple perturbation types to maximize a loss of the classification model for training instances TI.”, as disclosed previously, training instances TI comprises of a input feature vector x as well as the corresponding label (desired output)); b. obtaining a second perturbation, which is stronger than the first perturbation, by adapting the first perturbation based on the first input signal, the corresponding desired output signal and the classifier (Schmidt, col. 13, lines 51-55, “In operation Desc., 470, to determine an updated perturbation UPi allowed by a perturbation type PTi, a respective update U1, 471, . . . 
, Un, 472 to the perturbation UP may be determined to increase the loss of the classification model for the set of training instances TI.", an updated perturbation UP is adapted from the set of allowed perturbations, and is stronger than the perturbation from the set of allowed perturbations because it is updated to increase the loss of the classification model for the set of training instances TI.);

c. obtaining a first adversarial example by applying the second perturbation to the input signal (Schmidt, col. 13, lines 33-47, "a selection PSel, 490, may be made of an updated perturbation UP1, ..., UPn that most increases the loss of the classification model, e.g., updated perturbations that more increase the loss of the classification model are favoured over updated perturbations that less increase the loss. For example, updated perturbation UP may be selected among updated perturbations UP1, ..., UPn as [selection equation not reproduced], e.g., as perturbation that maximizes a loss over one or more selected training instances of applying the perturbation.", a first adversarial example (perturbation UP) is selected as the determined perturbation UP used in the stochastic gradient descent algorithm for training the classifier. The cited selection criterion is used to determine a perturbation UP, from a set of perturbations UP1, ..., UPn, that maximizes the loss over the selected training instances when applied.);

d. adapting the classifier by training the classifier based on the first adversarial example and the corresponding desired output signal to harden the classifier against the second perturbation (Schmidt, col. 15, lines 44-46, "For example, backpropagation may be performed, e.g., with stochastic gradient descent, to update the set of parameters PAR with gradient [gradient expression not reproduced], where i sums over selected training instances TI and xi + δ'(xi) denotes a selected training instance perturbed by a determined perturbation UP", the classification model is adapted via backpropagation with the first adversarial example (a training instance perturbed by a determined perturbation UP) and the corresponding desired output y. This "hardens" the classification model by making it more robust to adversarial examples perturbed by the second perturbation (perturbation UP).); and

f. repeating steps a. to e. (Schmidt, col. 16, lines 17-39, "Training the classification model ... may comprise, in an outer iteration of the training procedure, performing the following operations: [algorithm listing not reproduced]". The aforementioned steps a-e as disclosed in the limitations above are repeated as an outer iteration in order to train the classification model).

wherein a control signal is generated based on the output signal, the control signal configured to control a physical action of a vehicle or a robot (Schmidt, col. 9, lines 62-67, and col. 10, lines 1-5, "For example, vehicle 62 may incorporate the classification system to control the vehicle based on images obtained from a camera 22. For example, automotive control system 300 may comprise a camera interface (not shown separately) for obtaining an image of an environment 50 of the vehicle from the camera 22. The classification system may be configured to classify the image obtained from camera 22 according to the classification model to detect an object in the environment 50 of the vehicle, for example, a traffic sign or an obstacle with which the vehicle is at risk of colliding.", the control system creates control signals to control a vehicle based on classified images (output signal)).

Dong, in the same field of adversarial training using perturbed input, teaches the following limitation which Schmidt fails to teach:

e. replacing the first perturbation in the plurality of perturbations by a linear combination of the first perturbation and the second perturbation (Dong, page 9187, col. 1, paragraph 1, "Iterative methods [9] iteratively apply fast gradient multiple times with a small step size α. The iterative version of FGSM (I-FGSM) can be expressed as: x*_0 = x, x*_{t+1} = x*_t + α · sign(∇_x J(x*_t, y)). To make the generated adversarial examples satisfy the L∞ (or L2) bound, one can clip x*_t into the ε vicinity of x or simply set α = ε/T with T being the number of iterations. It has been shown that iterative methods are stronger white-box adversaries than one-step methods at the cost of worse transferability [10, 24] ... the momentum variant of the iterative fast gradient method (MI-FGSM) can be written as x*_{t+1} = x*_t + α · sign(g_{t+1})", the above is the formula for updating adversarial examples used in the model; unlike how Schmidt updates its perturbation, Dong uses a weighted sum of previous perturbations accumulated iteratively. In particular, Dong discloses the update formula g_{t+1} = μ · g_t + ∇_x J(x*_t, y) / ||∇_x J(x*_t, y)||_1. Here, g_t is the value of the perturbation from the previous step (i.e., the "first perturbation" in the claim), and the term ∇_x J(x*_t, y) / ||∇_x J(x*_t, y)||_1 is a newly computed perturbation direction for the current input (x), output (y), and classifier (classifier loss function J), serving as the "second perturbation" in the claim. The formula then combines these two using a weighted sum (with weights μ and 1, respectively). The resulting g_{t+1} replaces the previous perturbation for generating new adversarial examples.);

Claim 7: Schmidt and Dong teach the limitations of claim 6. Schmidt further teaches: wherein the input signal is obtained based on a signal of a sensor and/or an actuator is controlled based on the output signal and/or a display device is controlled based on the output signal (Schmidt, col. 2, lines 47-49, "Typically, instances are represented as vectors of numbers, e.g., a vector may represent an image, one or more sensor readings, a sound signal, etc.", the input is based on a signal from a sensor. Col. 10, lines 13-19, "Control system 300 may further comprise an actuator interface (not shown separately) for providing, to an actuator, actuator data causing the actuator to effect an action to control vehicle 62. Automotive control system 300 may be configured to determine actuator data to control vehicle 62 based at least on part on this detection; and to provide the actuator data to the actuator via the actuator interface.", an actuator is controlled based on the output (detection) of the classification model).

Claim 11 is directed to a non-transitory machine readable storage medium on which is stored a computer program for performing the method of claim 1. Therefore, the rejection of claim 1 applies to claim 11. Furthermore, Schmidt, col. 18, lines 36-43, "The method(s) may be implemented on a computer as a computer implemented method, as dedicated hardware, or as a combination of both. As also illustrated in Fig. 9, instructions for the computer, e.g., executable code, may be stored on a computer readable medium 900, e.g., in the form of a series 910 of machine-readable physical marks and/or as a series of elements having different electrical, e.g., magnetic, or optical properties or values. The executable code may be stored in a transitory or non-transitory manner.", discloses the aforementioned non-transitory storage medium performing the method.

Claim 12 is directed to a control system configured to control an actuator and/or a display device based on an output signal of a classifier, the control system comprising the classifier to perform the method of claim 1. Therefore, the rejection of claim 1 applies to claim 12.

Claim 13 is directed to a training system configured to train a classifier to perform the method of claim 1. Therefore, the rejection of claim 1 applies to claim 13.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Schmidt, Dong, and Singh et al. (US11468314), hereinafter referred to as Singh.

Claim 2: Schmidt and Dong teach the limitations of claim 1. Singh, in the same field of adversarial training, teaches the following limitation which Schmidt fails to teach: wherein at least one perturbation from the plurality of initial perturbations is provided by randomly drawing a noise signal and providing the noise signal as perturbation (Singh, paragraph 54, "In this type of adversarial attack, a random noise (or perturbation) is added to input x to generate x′", random noise is added to the input as a perturbation).

Schmidt, Dong, and Singh teach systems or techniques for training a neural network with adversarial examples. Although similar, Schmidt and Dong both teach generating and updating perturbations with respect to a loss function; Schmidt and Dong do not teach a way to generate perturbations using a random process. Singh does teach a system that uses a random perturbation generation process using noise. It would have been obvious to a person of ordinary skill in the art to have incorporated the teachings disclosed by Schmidt and Dong with the teachings disclosed by Singh (i.e., randomly drawing a noise signal as the perturbation). A motivation for the combination is to increase the accuracy on adversarial examples compared to conventionally trained neural networks (Singh, paragraph 65, "As can be seen in TABLE III, CNNs trained using the regularizer significantly outperforms conventionally trained CNNs.", CNNs trained with the regularizer are trained on randomly perturbed signals).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Schmidt, Dong, Singh, and Ratner et al. (US11568183), hereinafter referred to as Ratner.

Claim 4: Schmidt, Dong, and Singh teach the limitations of claim 2. Ratner, in the same field of adversarial training, teaches the following limitation which Schmidt and Singh fail to teach: wherein in the step of adapting the perturbation, the perturbation is applied to a region of the input signal for obtaining a perturbed input signal (Ratner, paragraph 9, "the perturbation may be the change in value of a pixel from an input to the perturbed input. As one example, if a pixel has a value of 40 in the input and a value of 60 in the perturbed input, then the perturbation may have a value of 20.", a perturbation is applied to a region (a pixel) of the input signal.).

Schmidt, Dong, Singh, and Ratner teach systems or techniques for training a neural network with adversarial examples. Although similar, Schmidt, Dong, and Singh teach perturbing an entire image; Schmidt, Dong, and Singh do not teach applying perturbations to a region of the image. Ratner does teach a system that incorporates perturbations into regions of the instance of input data. It would have been obvious to a person of ordinary skill in the art to have incorporated the teachings disclosed by Schmidt, Dong, and Singh with the teachings disclosed by Ratner (i.e., adapting the perturbation by applying it to a region of the input signal). A motivation for the combination is to adjust a saliency metric while keeping the parameters of the model constant, highlighting key regions of interest without changing model parameters during observation (Ratner, paragraph 9, "The processor can iteratively generate an adversarial example, referred to herein as a perturbed input, that optimizes a saliency metric including a classification term, a sparsity term, and a smoothness term, while keeping parameters of the model constant.").

Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Schmidt, Dong, Singh, and Bastani et al. (Bastani, O., Ioannou, Y., Lampropoulos, L., Vytiniotis, D., Nori, A., & Criminisi, A. (2016). Measuring neural net robustness with constraints. Advances in neural information processing systems, 29.), hereinafter referred to as Bastani.

Claim 8: Schmidt teaches the following limitations:

b. selecting a first perturbation from the plurality of perturbations and selecting an input signal and a corresponding desired output signal from a test dataset (Schmidt, col. 12, lines 9-14, "a set of training instances TI, 432, may be selected. Perturbation(s) UP, 491 for perturbing training instances TI may be selected among the set of allowed perturbations of the multiple perturbation types to maximize a loss of the classification model for training instances TI.", it is noted that the training instances comprise input signals (feature vector x) and corresponding desired outputs (labels));

c. obtaining a second perturbation, which is stronger than the first perturbation, by adapting the first perturbation based on the input signal, the corresponding desired output signal and the classifier (Schmidt, col. 13, lines 51-55, "In operation Desc., 470, to determine an updated perturbation UPi allowed by a perturbation type PTi, a respective update U1, 471, ..., Un, 472 to the perturbation UP may be determined to increase the loss of the classification model for the set of training instances TI.", an updated perturbation UP (second perturbation) derived from the selected perturbation (perturbation UP) of the plurality of initial perturbations (set of allowed perturbations) is obtained by maximizing the loss function corresponding to the selected training instances TI. The second perturbation (perturbation UP) is stronger than the first perturbation (perturbation from the set of allowed perturbations) because it further increases the loss of the classification model. Maximizing the loss of a classification model incorporates the values of the desired output and the results from the classifier);

e. repeating steps b. to d. for a predefined number of iterations (Schmidt, col. 13, lines 14-18, "As shown in FIG. 4, perturbations UP may themselves be determined in an inner iterative optimization. For example, the number of iterations of the inner optimization may be at most or at least 50, at most or at least 100, or at most or at least 200.");

f. determining a strongest perturbation from the plurality of perturbations with respect to the test dataset after having completed the predefined number of iterations (Schmidt, col. 13, lines 32-38, "In such a case, a selection PSel, 490, may be made of an updated perturbation UP1, ..., UPn that most increases the loss of the classification model, e.g., updated perturbations that more increase the loss of the classification model are favoured over updated perturbations that less increase the loss.", in cases where multiple updated perturbations UP are created, the operation PSel 490 may select the updated perturbation UP from the set of updated perturbations that most increases the loss of the model. This selection is the strongest perturbation since it is the perturbation that maximizes the loss of the classification model.);

Dong, in the same field of adversarial training using perturbed input, teaches the following limitation which Schmidt fails to teach:

d. replacing the first perturbation in the plurality of perturbations by a linear combination of the first perturbation and the second perturbation (Dong, page 9187, col. 1, paragraph 1, "Iterative methods [9] iteratively apply fast gradient multiple times with a small step size α. The iterative version of FGSM (I-FGSM) can be expressed as: x*_0 = x, x*_{t+1} = x*_t + α · sign(∇_x J(x*_t, y)). To make the generated adversarial examples satisfy the L∞ (or L2) bound, one can clip x*_t into the ε vicinity of x or simply set α = ε/T with T being the number of iterations. It has been shown that iterative methods are stronger white-box adversaries than one-step methods at the cost of worse transferability [10, 24] ... the momentum variant of the iterative fast gradient method (MI-FGSM) can be written as x*_{t+1} = x*_t + α · sign(g_{t+1})", the above is the formula for updating adversarial examples used in the model; unlike how Schmidt updates its perturbation, Dong uses a weighted sum of previous perturbations accumulated iteratively. In particular, Dong discloses the update formula g_{t+1} = μ · g_t + ∇_x J(x*_t, y) / ||∇_x J(x*_t, y)||_1. Here, g_t is the value of the perturbation from the previous step (i.e., the "first perturbation" in the claim), and the term ∇_x J(x*_t, y) / ||∇_x J(x*_t, y)||_1 is a newly computed perturbation direction for the current input (x), output (y), and classifier (classifier loss function J), serving as the "second perturbation" in the claim. The formula then combines these two using a weighted sum (with weights μ and 1, respectively). The resulting g_{t+1} replaces the previous perturbation for generating new adversarial examples.);

Singh, in the same field of adversarial training via perturbed input, teaches the following limitation which Schmidt and Dong fail to teach:

a. providing a plurality of initial perturbations via a random generation process (Singh, paragraph 54, "In this type of adversarial attack, a random noise (or perturbation) is added to input x to generate x′", random noise is added to the input as a perturbation);

Bastani, in the same field of adversarial training, teaches the following limitation which Schmidt, Dong, and Singh fail to teach:

g. determining a fraction of input signals in the test dataset for which the strongest perturbation is able to cause a misclassification by the classifier and providing the determined fraction as the robustness value (Bastani, page 3, section 2, paragraph 3, "Given a parameter ε, the adversarial frequency [definition not reproduced] measures how often f fails to be (x∗, ε)-robust. In other words, if f has high adversarial frequency, then it fails to be (x∗, ε)-robust for many inputs x∗.", to measure a robustness value, the adversarial frequency is used to count the inputs x for which an adversarial example is misclassified, i.e., for which the classifier fails to be robust. This fraction of misclassified inputs x forms the adversarial frequency (robustness value).).

Schmidt, Dong, Singh, and Bastani teach systems or techniques for training a neural network with adversarial examples. Although similar, Schmidt, Dong, and Singh do not teach a way to score the model trained on the adversarial examples with a robustness measure. Bastani does teach determining a robustness value (a measure of how well a neural network is trained with adversarial examples) using the fraction of input signals for which a misclassification occurs. It would have been obvious to a person of ordinary skill in the art to have incorporated the teachings disclosed by Schmidt, Dong, and Singh with the teachings disclosed by Bastani (i.e., calculating a fraction of misclassified inputs as the robustness value for an adversarially trained neural network). A motivation for the combination is to measure accuracy on adversarial examples (Bastani, page 3, section 2, paragraph 5, "Frequency is typically the more important metric, since a neural net with low adversarial frequency is robust most of the time. Indeed, adversarial frequency corresponds to the accuracy on adversarial examples used to measure robustness in [5, 20].").

Claim 9: Schmidt, Dong, Singh, and Bastani teach the limitations of claim 8. Singh further teaches: wherein at least one perturbation from the plurality of initial perturbations is provided by randomly drawing a noise signal and providing the noise signal as a perturbation (Singh, paragraph 54, "In this type of adversarial attack, a random noise (or perturbation) is added to input x to generate x′", random noise is added to the input as a perturbation).

Claim 10: Schmidt, Dong, Singh, and Bastani teach the limitations of claim 8. Schmidt further teaches: wherein at least one perturbation from the plurality of initial perturbations is provided by randomly sampling a first input signal from the training dataset or a second dataset, adapting a plurality of values included in the first input signal, and providing the adapted input signal as a perturbation (Schmidt, col. 12, lines 8-21, "In an outer iteration, in operation DSel, 431, a set of training instances TI, 432, may be selected. Perturbation(s) UP, 491 for perturbing training instances TI may be selected among the set of allowed perturbations of the multiple perturbation types to maximize a loss of the classification model for training instances TI.
In the example of Fig. 4, this selection comprises operations Desc., 470; Proj., 480; and PSel, 490 which may be repeated in an inner iteration. Based on selected perturbation(s) UP, set of parameters PAR may be updated in an operation POpt, 451, to decrease the loss of the classification model for the training instances TI perturbed by perturbation UP.”, perturbation UP is selected from the plurality of initial perturbations (set of allowed perturbations) in order to adapt it to maximize the loss (a misclassification occurs) for the selected training instances. The parameters of the input signal are then updated with the updated perturbation UP, through operation POpt 451. Col. 12, “Training instances TI may be selected in various ways known per se for gradient descent or similar methods, e.g., randomly, sequentially, etc”, it is noted that the selection/sampling of training instances (input signal) is random.). Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Chen, J., Zhou, D., Yi, J., & Gu, Q. (2018). A Frank-Wolfe Framework for Efficient and Effective Adversarial Attacks. arXiv preprint arXiv:1811.10828. Tramer, F., & Boneh, D. (2019). Adversarial training and robustness for multiple perturbations. Advances in neural information processing systems, 32. Mądry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. stat, 1050, 9. US 10521718 B1 - Adversarial training of neural networks THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to HYUNGJUN B YI whose telephone number is (703)756-4799. The examiner can normally be reached M-F 9-5. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached on (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. 
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /H.B.Y./Examiner, Art Unit 2124 /USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2146
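The §103 rejections above turn on a small number of concrete techniques. To make them easier to follow, the sketches below illustrate each one in PyTorch; they are illustrative assumptions only, and the toy models, perturbation budgets, and function names are placeholders rather than code or parameters from Schmidt, Dong, Singh, or Bastani. First, the kind of worst-case adversarial training loop the Office Action attributes to Schmidt: an inner step crafts one candidate perturbation per allowed type, the candidate that most increases the loss is selected, and the outer step updates the model on the instance perturbed by that selection.

```python
# Minimal sketch (not Schmidt's actual implementation) of adversarial training
# against the worst of several perturbation types, as characterized in the rejection.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier (assumption)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def linf_perturbation(x, y, eps=0.1):
    """One-step L-infinity (sign-of-gradient) candidate perturbation."""
    x = x.clone().requires_grad_(True)
    loss_fn(model(x), y).backward()
    return eps * x.grad.sign()

def l2_perturbation(x, y, eps=1.0):
    """One-step L2-normalized candidate perturbation."""
    x = x.clone().requires_grad_(True)
    loss_fn(model(x), y).backward()
    g = x.grad
    return eps * g / (g.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)

def training_step(x, y):
    # Inner maximization: pick the candidate perturbation that maximizes the loss.
    candidates = [linf_perturbation(x, y), l2_perturbation(x, y)]
    losses = [loss_fn(model(x + d), y).item() for d in candidates]
    worst = candidates[losses.index(max(losses))]
    # Outer minimization: train on the instance perturbed by the selected perturbation.
    opt.zero_grad()
    loss = loss_fn(model(x + worst), y)
    loss.backward()
    opt.step()
    return loss.item()

# Random data standing in for one training batch.
x_batch = torch.rand(8, 1, 28, 28)
y_batch = torch.randint(0, 10, (8,))
print(training_step(x_batch, y_batch))
```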
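Next, the Dong momentum update the rejection maps to limitation 1(e): the accumulated term g is replaced each iteration by a weighted sum of its previous value (weight μ) and a freshly computed, L1-normalized gradient (weight 1), and the adversarial example takes a step of size α in the direction of sign(g). A minimal sketch of that update, with a toy model and random data standing in for the classifier and training signal:

```python
# Sketch of the MI-FGSM momentum update discussed for limitation 1(e).
# Toy model, data, and hyperparameters are placeholders, not from the references.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()

def mi_fgsm(x, y, eps=0.3, iters=10, mu=1.0):
    alpha = eps / iters               # step size alpha = eps / T, as described in Dong
    x_adv = x.clone()
    g = torch.zeros_like(x)           # accumulated perturbation-update term, g_0 = 0
    for _ in range(iters):
        x_adv = x_adv.detach().requires_grad_(True)
        loss_fn(model(x_adv), y).backward()
        grad = x_adv.grad
        # g_{t+1} = mu * g_t + grad / ||grad||_1  (the "weighted sum" at issue)
        g = mu * g + grad / (grad.abs().flatten(1).sum(dim=1).view(-1, 1, 1, 1) + 1e-12)
        # x*_{t+1} = x*_t + alpha * sign(g_{t+1})
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = torch.clamp(x_adv, x - eps, x + eps)  # stay within the eps vicinity of x
    return x_adv.detach()

x = torch.rand(4, 1, 28, 28)
y = torch.randint(0, 10, (4,))
adv = mi_fgsm(x, y)
print((adv - x).abs().max())          # bounded by eps
```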
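The claim 2 rejection relies on Singh only for providing initial perturbations by randomly drawing a noise signal. A one-function sketch of that idea; the Gaussian distribution, scale, and helper name are assumptions, not values from the Singh reference:

```python
# Hedged sketch of randomly drawn noise signals used as an initial plurality of perturbations.
import torch

def random_initial_perturbations(num, shape, scale=0.1):
    """Return `num` randomly drawn noise signals to serve as initial perturbations."""
    return [scale * torch.randn(shape) for _ in range(num)]

perturbations = random_initial_perturbations(num=5, shape=(1, 28, 28))
print(len(perturbations), perturbations[0].shape)
```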
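Finally, the robustness value of claim 8 step (g): the fraction of test inputs for which the strongest perturbation causes a misclassification, which the rejection maps to Bastani's adversarial frequency. The sketch below computes only that misclassified fraction under the same toy-model assumption; the helper name and the stand-in perturbation are placeholders, and Bastani's actual metric is defined with respect to an ε-robustness property rather than raw label agreement:

```python
# Sketch of a fraction-of-misclassified-inputs robustness value (claim 8, step g).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

def robustness_value(model, x_test, y_test, strongest_perturbation):
    """Fraction of test inputs misclassified when the strongest perturbation is applied."""
    with torch.no_grad():
        preds = model(x_test + strongest_perturbation).argmax(dim=1)
    return (preds != y_test).float().mean().item()

x_test = torch.rand(64, 1, 28, 28)
y_test = torch.randint(0, 10, (64,))
delta = 0.1 * torch.randn(1, 1, 28, 28)   # stand-in for the strongest perturbation found
print(robustness_value(model, x_test, y_test, delta))
```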

Prosecution Timeline

Apr 12, 2021
Application Filed
Mar 05, 2024
Non-Final Rejection — §103
Jun 11, 2024
Response Filed
Aug 27, 2024
Final Rejection — §103
Nov 26, 2024
Response after Non-Final Action
Mar 03, 2025
Request for Continued Examination
Mar 04, 2025
Response after Non-Final Action
Jul 21, 2025
Non-Final Rejection — §103
Dec 24, 2025
Response Filed
Mar 06, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12536429
INTELLIGENTLY MODIFYING DIGITAL CALENDARS UTILIZING A GRAPH NEURAL NETWORK AND REINFORCEMENT LEARNING
2y 5m to grant • Granted Jan 27, 2026
Study what changed to get past this examiner. Based on the 1 most recent grant.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 18%
With Interview: 49% (+31.7%)
Median Time to Grant: 4y 7m
PTA Risk: High
Based on 17 resolved cases by this examiner. Grant probability derived from career allow rate.
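The with-interview figure appears consistent with simply adding the interview lift to the unrounded career allow rate (3/17 ≈ 17.6%, plus 31.7 points ≈ 49%). A small sketch of that arithmetic; the additive model is an assumption, not a documented formula:

```python
# Assumed additive model for the "With Interview" projection shown above.
career_allow_rate = 3 / 17          # ≈ 17.6%, displayed as 18%
interview_lift = 0.317              # +31.7 percentage points

with_interview = career_allow_rate + interview_lift
print(f"{with_interview:.0%}")      # 49%
```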
