Prosecution Insights
Last updated: April 19, 2026
Application No. 17/655,487

Method and system for training a neural network for improving adversarial robustness

Non-Final OA — §103, §112

Filed: Mar 18, 2022
Examiner: AGRAWAL, SHISHIR
Art Unit: 2123
Tech Center: 2100 — Computer Architecture & Software
Assignee: Mitsubishi Electric Research Laboratories Inc.
OA Round: 3 (Non-Final)

Grant Probability: 0% (At Risk)
Expected OA Rounds: 3-4
Time to Grant: 3y 3m
Grant Probability With Interview: 0%

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 13 resolved), -55.0% vs TC avg — grants only 0% of cases
Interview Lift: +0.0% (minimal lift among resolved cases with interview)
Typical Timeline: 3y 3m average prosecution; 31 applications currently pending
Career History: 44 total applications across all art units

Statute-Specific Performance

§101: 26.9% (-13.1% vs TC avg)
§103: 37.6% (-2.4% vs TC avg)
§102: 5.6% (-34.4% vs TC avg)
§112: 29.9% (-10.1% vs TC avg)

Deltas are measured against a Tech Center average estimate • Based on career data from 13 resolved cases
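A quick arithmetic check (ours, not part of the report): each delta equals the examiner's per-statute allowance rate minus the Tech Center average, so the baseline can be recovered from the displayed pairs. The sketch below backs out the same TC average estimate, 40.0%, from all four statutes.

```python
# Back out the implied Tech Center averages from the displayed pairs
# (rate, delta), using delta = examiner_rate - tc_average. Values in percent.
rates  = {"§101": 26.9, "§103": 37.6, "§102": 5.6, "§112": 29.9}
deltas = {"§101": -13.1, "§103": -2.4, "§102": -34.4, "§112": -10.1}

for statute, rate in rates.items():
    tc_avg = rate - deltas[statute]
    print(f"{statute}: examiner {rate:.1f}%, implied TC average {tc_avg:.1f}%")
# All four statutes imply a TC average estimate of 40.0%.
```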

Office Action

§103, §112
DETAILED ACTION

Status of Claims

This Office action is responsive to communications filed on 2025-09-08. Claim(s) 16 was/were cancelled. Claim(s) 1-15 is/are pending and are examined herein. Claim(s) 5 and 12 is/are objected to. Claim(s) 15 is/are rejected under 35 USC 112(b). Claim(s) 1-15 is/are rejected under 35 USC 103.

Notice of Pre-AIA or AIA Status

The present application, filed on or after 2013-03-16, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 2025-09-08 has been entered.

Response to Arguments

Regarding objections for informalities and rejections under 35 USC 112, the applicant’s amendments resolve the issues raised in the previous Office action. Issues in the pending claims are described below.

Regarding rejections under 35 USC 101, the rejections are withdrawn upon further consideration of the claims as a whole.

Regarding rejections under 35 USC 103, the applicant’s arguments have been fully considered but they are unpersuasive:

The applicant asserts that “the models G and F of Zonooz are not instances of the same neural network” [remarks, page 12]. The examiner respectfully disagrees. Zonooz discloses the use of the same architecture for both models (either ResNet-18 for both, or WRN-28-10 for both) [Zonooz, 0029-0034 and table], and models that use the same architecture do in fact fall under the broadest reasonable interpretation of both being “instances of the same neural network”.

The applicant asserts that Zonooz does not disclose “shared parameters” [remarks, page 13]. The examiner respectfully disagrees. As explained in previous Office actions (and below), Zonooz discloses that the use of mimicry loss serves to “align each model with the other one” [Zonooz, 0023], so the network parameters which result from this training procedure fall under the broadest reasonable interpretation of being “shared parameters” as recited by the claim.

The applicant asserts that “Zonooz’s architecture does not require any encoder prior to the classifier” [remarks, page 13]. The examiner respectfully disagrees with this assertion. If, for example, the final layer in the neural network architecture used in Zonooz is mapped to the “classifier” of the claim, then all of the layers prior to the final layer can be regarded as an “encoder” (because these layers “encode” the input into a vector which is used by the final layer of the neural network for classification). The applicant is invited to consult the complete prior art mapping as given below for further details. (The applicant is also invited to consult Lee as cited in the conclusion of this Office action for a reference which may be used to provide an alternative mapping for claim elements which are currently mapped from Zonooz.)

The applicant argues that the combination of Zonooz with Kingma so as to incorporate a probabilistic encoder into the architecture “is not a routine design choice, but instead demands extraordinary skill and inventive insight” [remarks, page 13]. The examiner respectfully disagrees. Variational auto-encoders as disclosed in Kingma are well-known by those of ordinary skill in the art, and the examiner maintains that it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to incorporate a variational auto-encoder into a classification architecture (as in Zonooz) so as to allow for a more robust classification system. (The applicant is invited to consult Hoy, Berlin, and Turcot as cited in the conclusion of this Office action for further documentary support regarding this point.)

The complete prior art mapping, updated in view of the applicant’s amendments, is given below.

Claim Objections

Claim(s) 5 and 12 is/are objected to because of the following informalities: Claims 5 and 12 recite “a first term corresponding to maximizing a mutual information between probability distributions of encodings of pairs of the clean data samples and the adversarial data samples” [emphasis added]. However, the parent claim already introduces a “first probability distribution over the latent space” and a “second probability distribution over the latent space”, and the specification [specification, 0034] (as well as claims 4 and 11) indicates that the mutual information being maximized is in fact the one between these probability distributions. For consistency of nomenclature and clarity of antecedent basis, the examiner suggests amending claims 5 and 12 to recite instead “a first term corresponding to maximizing a mutual information between the first probability distribution over the latent space and the second probability distribution over the latent space”. Appropriate correction is required.

Claim Rejections - 35 USC 112(b)

The following is a quotation of 35 USC 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 USC 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim(s) 15 is/are rejected under 35 USC 112(b) or 35 USC 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 USC 112, the applicant), regards as the invention.

Claim 15 recites numerous terms which lack antecedent basis, including at least “the neural network”, “the probabilistic encoder”, “the classifier”, and “the shared parameters”. The claim is therefore indefinite. For the purpose of compact prosecution, the indefinite claim elements are interpreted by analogy with claims 1 and 8. The examiner suggests amending the preamble of claim 15 to provide antecedent basis for “the neural network”, “the probabilistic encoder”, and “the classifier” (as in claims 1 and 8), and removing the word “the” before “shared parameters” (again, as in claims 1 and 8).
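Before the §103 analysis below, it may help to see the examiner’s encoder/classifier mapping in code: the final layer of a network plays the claimed “classifier” and all preceding layers play the claimed “encoder”. The following PyTorch sketch is purely illustrative; the torchvision model, the 32×32 input size, and the split point are our assumptions, not anything disclosed in Zonooz.

```python
# Minimal sketch (ours, for illustration) of the examiner's mapping: treat
# everything before the network's final layer as the claimed "encoder" and
# the final fully connected layer as the claimed "classifier".
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(num_classes=10)

# All layers except the final fully connected one ("encoder" under the mapping):
encoder = nn.Sequential(*list(model.children())[:-1], nn.Flatten())
# The final fully connected layer ("classifier" under the mapping):
classifier = model.fc

x = torch.randn(4, 3, 32, 32)   # a batch of inputs
z = encoder(x)                  # the "encoding" of the input
logits = classifier(z)          # the classification output
print(z.shape, logits.shape)    # torch.Size([4, 512]) torch.Size([4, 10])
```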
Claim Rejections - 35 USC 103

The following is a quotation of 35 USC 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 USC 102(b)(2)(C) for any potential 35 USC 102(a)(2) prior art against the later invention.

Claim(s) 1-3, 8-10, and 15 is/are rejected under 35 USC 103 as being unpatentable over Zonooz et al. (US20210166123A1, published 2021-06-03; hereafter “Zonooz”) in view of Kingma et al. (Auto-Encoding Variational Bayes, published 2014-05-01; hereafter “Kingma”).

Claim 1

Zonooz discloses:

A computer-implemented method for training a neural network, ([Zonooz, 0002, 0035]: Zonooz discloses an invention “relat[ing] to a method for training a robust deep neural network model” [Zonooz, 0002] and implemented by means of “a general or specific purpose computer or distributed system programmed with computer software” [Zonooz, 0035]. The deep neural network model (or, more precisely, its architecture) maps to the “neural network” of the claim.)

the neural network comprising a [probabilistic] encoder and a classifier, ([Zonooz, 0019, 0029]: Zonooz discloses an embodiment which is directed towards “a binary classification problem” [Zonooz, 0019]. It also discloses the use of one of two possible architectures: ResNet-18 and WRN-28-10 [Zonooz, 0029]. Any proper subset of consecutive layers including the final layer can be mapped to the “classifier” of the claim (since it produces the final classification output), and any preceding subset can be mapped to the “encoder” of the claim (since it “encodes” the input). Merely for the sake of concreteness, the remainder of this mapping assumes the ResNet-18 architecture, and, as this architecture has 18 layers, layer 18 can be mapped to the “classifier” of the claim and layers 1-17 to the “encoder” of the claim. See below for more regarding these mappings. In particular, see the combination with Kingma regarding the probabilistic encoder.)

the method comprising: collecting a plurality of data samples as input for training the neural network, wherein the plurality of data samples comprises clean data samples and adversarial data samples; ([Zonooz, 0020]: Zonooz discloses the use of two models: a “standard model [which] is trained on the original images” and a “robust model [which] is trained on adversarial images” [Zonooz, 0020]. The original and adversarial images map, respectively, to the “clean data samples” and the “adversarial data samples” of the claim. See the next parenthetical for more regarding this mapping.)

jointly training a first instance of the neural network and a second instance of the neural network, wherein the first instance of the neural network is trained using the clean data samples to produce a first output and the second instance of the neural network is trained using the adversarial data samples to produce a second output, ([Zonooz, 0009, 0020, 0023, 0029]: As noted above, Zonooz discloses the use of two models: a “standard model [which] is trained on the original images” and a “robust model [which] is trained on adversarial images” [Zonooz, 0020], with both models using the ResNet-18 architecture [Zonooz, 0029]. Furthermore, Zonooz discloses “collaborative[ly]” training the two models “in conjunction” [Zonooz, 0009]. More precisely, it discloses that the training makes use of a “mimicry loss which is used to align each model with the other one” [Zonooz, 0023]. The standard and robust models map, respectively, to the “first instance of the neural network” and the “second instance of the neural network” of the claim, and their outputs map respectively to the “first output” and the “second output” of the claim. The collaborative training of the two models maps to the “jointly training” step of the claim.)

wherein the training of the first instance of the neural network comprises: training the [probabilistic] encoder of the first instance of the neural network to encode the clean data samples [into a first probability distribution over a latent space;] and training the classifier of the first instance of the neural network to classify a first instance of the latent space to produce a first classification result, ([Zonooz, 0020, 0029]: As described above, the standard model maps to the “first instance of the neural network” of the claim, the first 17 layers of the architecture to the “encoder of the first instance of the neural network” of the claim, and the final layer to the “classifier of the first instance of the neural network” of the claim. The output of the “encoder of the first instance of the neural network” maps to the “first instance of the latent space” of the claim (cf. the combination with Kingma as described below for more about the latent space), and the output of the “classifier of the first instance of the neural network” to the “first classification result” of the claim. With these mappings, the function of the “encoder of the first instance of the neural network” falls under the broadest reasonable interpretation of “encod[ing] the clean data samples” as recited by the claim, and the function of the “classifier of the first instance of the neural network” falls under the broadest reasonable interpretation of “classify[ing the] first instance of the latent space to produce a first classification result” as recited by the claim.)

and wherein the training of the second instance of the neural network comprises: training the [probabilistic] encoder of the second instance of the neural network to encode the adversarial data samples [into a second probability distribution over the latent space;] and training the classifier of the second instance of the neural network to classify a second instance of the latent space to produce a second classification result, ([Zonooz, 0020, 0029]: The mappings from this limitation are essentially the same as those explained in the previous parenthetical, except that it is the robust model that maps to the “second instance of the neural network” of the claim.)

optimizing a multi-objective loss function defined on the first output of the first instance of the neural network and the second output of the second instance of the neural network; ([Zonooz, 0023]: Zonooz discloses that “[e]ach model… is trained with two losses: a task specific loss and a mimicry loss which is used to align each model with the other one” [Zonooz, 0023]. The loss function used to train either model (denoted L_G [Zonooz, 0025] or L_F [Zonooz, 0026]) maps to the “multi-objective loss function” of the claim: it is “multi-objective” since it incorporates both the task-specific loss and the mimicry loss, and it is defined based on both the “first output” and the “second output” as mapped above.)

and outputting shared parameters of the first instance of the neural network and the second instance of the neural network. ([Zonooz, 0023, 0025-0026]: As noted above, Zonooz discloses using a “mimicry loss which is used to align each model with the other one” [Zonooz, 0023]. Since the use of this mimicry loss results in an alignment between the network parameters of the standard model (denoted φ [Zonooz, 0026]) and those of the robust model (denoted θ [Zonooz, 0025]), the network parameters φ and θ fall under the broadest reasonable interpretation of the “shared parameters” of the claim.)

Zonooz does not distinctly disclose the use of a probabilistic encoder. In other words, Zonooz does not distinctly disclose:

[the neural network comprising a] probabilistic [encoder] … [training the] probabilistic [encoder of the first instance of the neural network to encode the clean data samples] into a first probability distribution over a latent space; … wherein the first instance of the latent space corresponds to the first probability distribution over the latent space, … [training the] probabilistic [encoder of the second instance of the neural network to encode the adversarial data samples] into a second probability distribution over the latent space; … wherein the second instance of the latent space corresponds to the second probability distribution over the latent space,

Kingma is also in the field of machine learning. Moreover, Zonooz in view of Kingma discloses:

[the neural network comprising a] probabilistic [encoder] … [training the] probabilistic [encoder of the first instance of the neural network to encode the clean data samples] into a first probability distribution over a latent space; … wherein the first instance of the latent space corresponds to the first probability distribution over the latent space, … [training the] probabilistic [encoder of the second instance of the neural network to encode the adversarial data samples] into a second probability distribution over the latent space; … wherein the second instance of the latent space corresponds to the second probability distribution over the latent space,

([Kingma, section 1]: Kingma discloses a “variational auto-encoder” [Kingma, section 1, second paragraph]. In the combination, a variational auto-encoder is used as a part of the “encoder” from Zonooz as mapped above, thereby resulting in the “probabilistic encoder” of the claim. The variational auto-encoder performs “[e]fficient approximate posterior inference of the latent variable z given an observed value x” [Kingma, page 2]. This posterior (denoted p_θ(z|x) [Kingma, figure 1 caption]) maps to the first/second “probability distribution over the latent space” of the claim, the “latent space” of the claim being the space of values which the latent variable z can take. The probability distribution then falls under the broadest reasonable interpretation of “correspond[ing] to” the first/second “instance of the latent space” as mapped under Zonooz above. The examiner notes that the claim as recited does not require any particular relationship between the first/second “instance of the latent space” and the first/second “probability distribution over the latent space”.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the adversarial training method disclosed by Zonooz with the variational autoencoder disclosed by Kingma because variational autoencoders help to reduce data dimensionality in a way that is well-suited for handling uncertainty, thereby allowing the combination to make more robust classifications.

Claim 2

Zonooz in view of Kingma discloses the elements of the parent claim(s). It also discloses:

[The method of claim 1, wherein the optimizing of the multi-objective loss function comprises] minimizing the multi-objective loss function, wherein the multi-objective loss function measures a difference between the first output and the second output. ([Zonooz, 0012, 0023, 0025-0026]: As noted above, the loss functions disclosed in Zonooz map to the “multi-objective loss function” of the claim. Zonooz specifically indicates that the loss function is “minimize[d]” [Zonooz, 0012; see also, 0023 and 0026]. The loss function uses “Kullback-Leibler Divergence (D_{KL}) as the mimicry loss” [Zonooz, 0023; see also, 0025-0026]. This Kullback-Leibler divergence falls under the broadest reasonable interpretation of “measur[ing] a difference between the first output and the second output” as recited by the claim.)

The same motivation to combine applies.

Claim 3

Zonooz in view of Kingma discloses the elements of the parent claim(s). It also discloses:

[The method of claim 2, wherein] the joint training is performed with a plurality of latent representations for the clean data samples and the adversarial samples that are sampled multiple times. ([Zonooz, 0009, 0020, 0027 algorithm 1]: As noted under the parent claim, Zonooz discloses “collaborative[ly]” training two models [Zonooz, 0009], one on original data and another on adversarial data [Zonooz, 0020]. This process includes multiple iterations of sampling original data and generating adversarial examples, one for each iteration of the “while” loop in [Zonooz, 0027 algorithm 1]. In other words, each iteration of the loop is one of the “multiple times” of the claim, and the results produced by the encoder during all of these iterations map to the “plurality of latent representations” of the claim.)

The same motivation to combine applies.

Claim 8

Zonooz discloses:

An artificial intelligence (AI) system for training a neural network for classifying a plurality of data samples, ([Zonooz, 0002, 0035]: Zonooz discloses an invention “relat[ing] to a method for training a robust deep neural network model” [Zonooz, 0002] and implemented by means of “a general or specific purpose computer or distributed system programmed with computer software” [Zonooz, 0035]. The computer maps to the “AI system” of the claim, and the deep neural network model (or, more precisely, its architecture) maps to the “neural network” of the claim.)

the neural network comprising a [probabilistic] encoder and a classifier, ([Zonooz, 0019, 0029]: Zonooz discloses an embodiment which is directed towards “a binary classification problem” [Zonooz, 0019]. It also discloses the use of one of two possible architectures: ResNet-18 and WRN-28-10 [Zonooz, 0029]. Any proper subset of consecutive layers including the final layer can be mapped to the “classifier” of the claim (since it produces the final classification output), and any preceding subset can be mapped to the “encoder” of the claim (since it “encodes” the input). Merely for the sake of concreteness, the remainder of this mapping assumes the ResNet-18 architecture, and, as this architecture has 18 layers, layer 18 can be mapped to the “classifier” of the claim and layers 1-17 to the “encoder” of the claim. See below for more regarding these mappings. In particular, see the combination with Kingma regarding the probabilistic encoder.)

the AI system comprising: a processor; and a memory having instructions stored thereon, wherein the processor is configured to execute the stored instructions to cause the AI system to: ([Zonooz, 0035]: Zonooz discloses that “one or more processors and/or microcontrollers can operate via instructions of the computer code and the software is preferably stored on one or more tangible non-transitory memory storage devices” [Zonooz, 0035].)

collect a plurality of data samples as input for training the neural network, wherein the plurality of data samples comprises clean data samples and adversarial data samples; ([Zonooz, 0020]: Zonooz discloses the use of two models: a “standard model [which] is trained on the original images” and a “robust model [which] is trained on adversarial images” [Zonooz, 0020]. The original and adversarial images map, respectively, to the “clean data samples” and the “adversarial data samples” of the claim. See the next parenthetical for more regarding this mapping.)

jointly train a first instance of the neural network and a second instance of the neural network, wherein the first instance of the neural network is trained using the clean data samples to produce a first output and the second instance of the neural network is trained using the adversarial data samples to produce a second output, ([Zonooz, 0009, 0020, 0023, 0029]: As noted above, Zonooz discloses the use of two models: a “standard model [which] is trained on the original images” and a “robust model [which] is trained on adversarial images” [Zonooz, 0020], with both models using the ResNet-18 architecture [Zonooz, 0029]. Furthermore, Zonooz discloses “collaborative[ly]” training the two models “in conjunction” [Zonooz, 0009]. More precisely, it discloses that the training makes use of a “mimicry loss which is used to align each model with the other one” [Zonooz, 0023]. The standard and robust models map, respectively, to the “first instance of the neural network” and the “second instance of the neural network” of the claim, and their outputs map respectively to the “first output” and the “second output” of the claim. The collaborative training of the two models maps to the “jointly training” step of the claim.)

wherein the training of the first instance of the neural network comprises: training the [probabilistic] encoder of the first instance of the neural network to encode the clean data samples [into a first probability distribution over a latent space;] and training the classifier of the first instance of the neural network to classify a first instance of the latent space to produce a first classification result, ([Zonooz, 0020, 0029]: As described above, the standard model maps to the “first instance of the neural network” of the claim, the first 17 layers of the architecture to the “encoder of the first instance of the neural network” of the claim, and the final layer to the “classifier of the first instance of the neural network” of the claim. The output of the “encoder of the first instance of the neural network” maps to the “first instance of the latent space” of the claim (cf. the combination with Kingma as described below for more about the latent space), and the output of the “classifier of the first instance of the neural network” to the “first classification result” of the claim. With these mappings, the function of the “encoder of the first instance of the neural network” falls under the broadest reasonable interpretation of “encod[ing] the clean data samples” as recited by the claim, and the function of the “classifier of the first instance of the neural network” falls under the broadest reasonable interpretation of “classify[ing the] first instance of the latent space to produce a first classification result” as recited by the claim.)

and wherein the training of the second instance of the neural network comprises: training the [probabilistic] encoder of the second instance of the neural network to encode the adversarial data samples [into a second probability distribution over the latent space;] and training the classifier of the second instance of the neural network to classify a second instance of the latent space to produce a second classification result, ([Zonooz, 0020, 0029]: The mappings from this limitation are essentially the same as those explained in the previous parenthetical, except that it is the robust model that maps to the “second instance of the neural network” of the claim.)

optimize a multi-objective loss function defined on the first output of the first instance of the neural network and the second output of the second instance of the neural network; ([Zonooz, 0023]: Zonooz discloses that “[e]ach model… is trained with two losses: a task specific loss and a mimicry loss which is used to align each model with the other one” [Zonooz, 0023]. The loss function used to train either model (denoted L_G [Zonooz, 0025] or L_F [Zonooz, 0026]) maps to the “multi-objective loss function” of the claim: it is “multi-objective” since it incorporates both the task-specific loss and the mimicry loss, and it is defined based on both the “first output” and the “second output” as mapped above.)

and output shared parameters of the first instance of the neural network and the second instance of the neural network. ([Zonooz, 0023, 0025-0026]: As noted above, Zonooz discloses using a “mimicry loss which is used to align each model with the other one” [Zonooz, 0023]. Since the use of this mimicry loss results in an alignment between the network parameters of the standard model (denoted φ [Zonooz, 0026]) and those of the robust model (denoted θ [Zonooz, 0025]), the network parameters φ and θ fall under the broadest reasonable interpretation of the “shared parameters” of the claim.)

Zonooz does not distinctly disclose the use of a probabilistic encoder. In other words, Zonooz does not distinctly disclose:

[the neural network comprising a] probabilistic [encoder] … [training the] probabilistic [encoder of the first instance of the neural network to encode the clean data samples] into a first probability distribution over a latent space; … wherein the first instance of the latent space corresponds to the first probability distribution over the latent space, … [training the] probabilistic [encoder of the second instance of the neural network to encode the adversarial data samples] into a second probability distribution over the latent space; … wherein the second instance of the latent space corresponds to the second probability distribution over the latent space,

Kingma is also in the field of machine learning. Moreover, Zonooz in view of Kingma discloses:

[the neural network comprising a] probabilistic [encoder] … [training the] probabilistic [encoder of the first instance of the neural network to encode the clean data samples] into a first probability distribution over a latent space; … wherein the first instance of the latent space corresponds to the first probability distribution over the latent space, … [training the] probabilistic [encoder of the second instance of the neural network to encode the adversarial data samples] into a second probability distribution over the latent space; … wherein the second instance of the latent space corresponds to the second probability distribution over the latent space,

([Kingma, section 1]: Kingma discloses a “variational auto-encoder” [Kingma, section 1, second paragraph]. In the combination, a variational auto-encoder is used as a part of the “encoder” from Zonooz as mapped above, thereby resulting in the “probabilistic encoder” of the claim. The variational auto-encoder performs “[e]fficient approximate posterior inference of the latent variable z given an observed value x” [Kingma, page 2]. This posterior (denoted p_θ(z|x) [Kingma, figure 1 caption]) maps to the first/second “probability distribution over the latent space” of the claim, the “latent space” of the claim being the space of values which the latent variable z can take. The probability distribution then falls under the broadest reasonable interpretation of “correspond[ing] to” the first/second “instance of the latent space” as mapped under Zonooz above. The examiner notes that the claim as recited does not require any particular relationship between the first/second “instance of the latent space” and the first/second “probability distribution over the latent space”.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the adversarial training method disclosed by Zonooz with the variational autoencoder disclosed by Kingma because variational autoencoders help to reduce data dimensionality in a way that is well-suited for handling uncertainty, thereby allowing the combination to make more robust classifications.

Claims 9-10 inherit limitations from claim 8 and recite additional limitations which are substantially similar to those recited by claims 2-3, respectively, so they are rejected by the same rationale.
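The two-instance training scheme that the rejections of claims 1 and 8 attribute to Zonooz can be sketched as follows. This is a hedged reconstruction, not Zonooz’s actual algorithm: the single-step FGSM-style attack, the weight alpha, and all names are illustrative assumptions, and the task-plus-mimicry losses only paraphrase the L_G and L_F equations cited above.

```python
# Sketch (ours) of the two-instance training the rejection describes: a
# standard model trained on clean data and a robust model on adversarial
# data, each with a task loss plus a KL "mimicry" loss aligning the two.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=8 / 255):
    """One gradient-sign step; a simple stand-in for the reference's attack."""
    x_adv = x.clone().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()

def joint_step(standard, robust, opt, x, y, alpha=1.0):
    x_adv = fgsm_attack(robust, x, y)
    logits_std = standard(x)       # "first output" (clean samples)
    logits_rob = robust(x_adv)     # "second output" (adversarial samples)
    log_p_std = F.log_softmax(logits_std, dim=1)
    log_p_rob = F.log_softmax(logits_rob, dim=1)
    # Task-specific losses plus directed KL mimicry terms in both directions,
    # pulling each instance toward the other (cf. L_G and L_F above).
    loss = (
        F.cross_entropy(logits_std, y)
        + F.cross_entropy(logits_rob, y)
        + alpha * F.kl_div(log_p_std, log_p_rob, log_target=True, reduction="batchmean")
        + alpha * F.kl_div(log_p_rob, log_p_std, log_target=True, reduction="batchmean")
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage: a single optimizer over both instances' parameters, e.g.
# opt = torch.optim.SGD(list(standard.parameters()) + list(robust.parameters()), lr=0.1)
```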
Claim 15

Zonooz discloses:

A non-transitory computer-readable medium having stored thereon computer-executable instructions, which when executed by a computer, cause the computer to execute operations, the operations comprising: ([Zonooz, 0002, 0019, 0029, 0035]: Zonooz discloses an invention “relat[ing] to a method for training a robust deep neural network model” [Zonooz, 0002] and implemented by means of “a general or specific purpose computer or distributed system programmed with computer software” [Zonooz, 0035], and that “the software is preferably stored on one or more tangible non-transitory memory storage devices” [Zonooz, 0035]. The deep neural network model (or, more precisely, its architecture) maps to the “neural network” of the claim. Zonooz discloses an embodiment which is directed towards “a binary classification problem” [Zonooz, 0019]. It also discloses the use of one of two possible architectures: ResNet-18 and WRN-28-10 [Zonooz, 0029]. Any proper subset of consecutive layers including the final layer can be mapped to the “classifier” of the claim (since it produces the final classification output), and any preceding subset can be mapped to the “encoder” of the claim (since it “encodes” the input). Merely for the sake of concreteness, the remainder of this mapping assumes the ResNet-18 architecture, and, as this architecture has 18 layers, layer 18 can be mapped to the “classifier” of the claim and layers 1-17 to the “encoder” of the claim. See below for more regarding these mappings. In particular, see the combination with Kingma regarding the probabilistic encoder.)

collecting a plurality of data samples as input for training the neural network, wherein the plurality of data samples comprises clean data samples and adversarial data samples; ([Zonooz, 0020]: Zonooz discloses the use of two models: a “standard model [which] is trained on the original images” and a “robust model [which] is trained on adversarial images” [Zonooz, 0020]. The original and adversarial images map, respectively, to the “clean data samples” and the “adversarial data samples” of the claim. See the next parenthetical for more regarding this mapping.)

jointly training a first instance of the neural network and a second instance of the neural network, wherein the first instance of the neural network is trained using the clean data samples to produce a first output and the second instance of the neural network is trained using the adversarial data samples to produce a second output, ([Zonooz, 0009, 0020, 0023, 0029]: As noted above, Zonooz discloses the use of two models: a “standard model [which] is trained on the original images” and a “robust model [which] is trained on adversarial images” [Zonooz, 0020], with both models using the ResNet-18 architecture [Zonooz, 0029]. Furthermore, Zonooz discloses “collaborative[ly]” training the two models “in conjunction” [Zonooz, 0009]. More precisely, it discloses that the training makes use of a “mimicry loss which is used to align each model with the other one” [Zonooz, 0023]. The standard and robust models map, respectively, to the “first instance of the neural network” and the “second instance of the neural network” of the claim, and their outputs map respectively to the “first output” and the “second output” of the claim. The collaborative training of the two models maps to the “jointly training” step of the claim.)

wherein the training of the first instance of the neural network comprises: training the [probabilistic] encoder of the first instance of the neural network to encode the clean data samples [into a first probability distribution over a latent space;] and training the classifier of the first instance of the neural network to classify a first instance of the latent space to produce a first classification result, ([Zonooz, 0020, 0029]: As described above, the standard model maps to the “first instance of the neural network” of the claim, the first 17 layers of the architecture to the “encoder of the first instance of the neural network” of the claim, and the final layer to the “classifier of the first instance of the neural network” of the claim. The output of the “encoder of the first instance of the neural network” maps to the “first instance of the latent space” of the claim (cf. the combination with Kingma as described below for more about the latent space), and the output of the “classifier of the first instance of the neural network” to the “first classification result” of the claim. With these mappings, the function of the “encoder of the first instance of the neural network” falls under the broadest reasonable interpretation of “encod[ing] the clean data samples” as recited by the claim, and the function of the “classifier of the first instance of the neural network” falls under the broadest reasonable interpretation of “classify[ing the] first instance of the latent space to produce a first classification result” as recited by the claim.)

and wherein the training of the second instance of the neural network comprises: training the [probabilistic] encoder of the second instance of the neural network to encode the adversarial data samples [into a second probability distribution over the latent space;] and training the classifier of the second instance of the neural network to classify a second instance of the latent space to produce a second classification result, ([Zonooz, 0020, 0029]: The mappings from this limitation are essentially the same as those explained in the previous parenthetical, except that it is the robust model that maps to the “second instance of the neural network” of the claim.)

optimizing a multi-objective loss function defined on the first output of the first instance of the neural network and the second output of the second instance of the neural network; ([Zonooz, 0023]: Zonooz discloses that “[e]ach model… is trained with two losses: a task specific loss and a mimicry loss which is used to align each model with the other one” [Zonooz, 0023]. The loss function used to train either model (denoted L_G [Zonooz, 0025] or L_F [Zonooz, 0026]) maps to the “multi-objective loss function” of the claim: it is “multi-objective” since it incorporates both the task-specific loss and the mimicry loss, and it is defined based on both the “first output” and the “second output” as mapped above.)

and outputting the shared parameters of the first instance of the neural network and the second instance of the neural network. ([Zonooz, 0023, 0025-0026]: As noted above, Zonooz discloses using a “mimicry loss which is used to align each model with the other one” [Zonooz, 0023]. Since the use of this mimicry loss results in an alignment between the network parameters of the standard model (denoted φ [Zonooz, 0026]) and those of the robust model (denoted θ [Zonooz, 0025]), the network parameters φ and θ fall under the broadest reasonable interpretation of the “shared parameters” of the claim.)

Zonooz does not distinctly disclose the use of a probabilistic encoder. In other words, Zonooz does not distinctly disclose:

[the neural network comprising a] probabilistic [encoder] … [training the] probabilistic [encoder of the first instance of the neural network to encode the clean data samples] into a first probability distribution over a latent space; … wherein the first instance of the latent space corresponds to the first probability distribution over the latent space, … [training the] probabilistic [encoder of the second instance of the neural network to encode the adversarial data samples] into a second probability distribution over the latent space; … wherein the second instance of the latent space corresponds to the second probability distribution over the latent space,

Kingma is also in the field of machine learning. Moreover, Zonooz in view of Kingma discloses:

[the neural network comprising a] probabilistic [encoder] … [training the] probabilistic [encoder of the first instance of the neural network to encode the clean data samples] into a first probability distribution over a latent space; … wherein the first instance of the latent space corresponds to the first probability distribution over the latent space, … [training the] probabilistic [encoder of the second instance of the neural network to encode the adversarial data samples] into a second probability distribution over the latent space; … wherein the second instance of the latent space corresponds to the second probability distribution over the latent space,

([Kingma, section 1]: Kingma discloses a “variational auto-encoder” [Kingma, section 1, second paragraph]. In the combination, a variational auto-encoder is used as a part of the “encoder” from Zonooz as mapped above, thereby resulting in the “probabilistic encoder” of the claim. The variational auto-encoder performs “[e]fficient approximate posterior inference of the latent variable z given an observed value x” [Kingma, page 2]. This posterior (denoted p_θ(z|x) [Kingma, figure 1 caption]) maps to the first/second “probability distribution over the latent space” of the claim, the “latent space” of the claim being the space of values which the latent variable z can take. The probability distribution then falls under the broadest reasonable interpretation of “correspond[ing] to” the first/second “instance of the latent space” as mapped under Zonooz above. The examiner notes that the claim as recited does not require any particular relationship between the first/second “instance of the latent space” and the first/second “probability distribution over the latent space”.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the adversarial training method disclosed by Zonooz with the variational autoencoder disclosed by Kingma because variational autoencoders help to reduce data dimensionality in a way that is well-suited for handling uncertainty, thereby allowing the combination to make more robust classifications.
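The feature the combination with Kingma is meant to supply, an encoder that outputs a distribution over a latent space rather than a single point, can be sketched as follows; dimensions and names are illustrative assumptions. Sampling the posterior repeatedly also illustrates the “sampled multiple times” language of claims 3 and 10.

```python
# Sketch (ours) of a Kingma-style probabilistic encoder: it emits the
# parameters of a Gaussian posterior q(z|x) over the latent space, and the
# classifier consumes samples z ~ q(z|x) drawn via reparameterization.
import torch
import torch.nn as nn

class ProbabilisticEncoder(nn.Module):
    def __init__(self, in_dim=512, latent_dim=64):
        super().__init__()
        self.mu = nn.Linear(in_dim, latent_dim)      # mean of q(z|x)
        self.logvar = nn.Linear(in_dim, latent_dim)  # log-variance of q(z|x)

    def forward(self, h):
        return self.mu(h), self.logvar(h)            # a distribution, not a point

def sample(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

encoder = ProbabilisticEncoder()
classifier = nn.Linear(64, 10)
h = torch.randn(4, 512)            # features from the preceding layers
mu, logvar = encoder(h)            # a "probability distribution over a latent space"
for _ in range(3):                 # latent representations "sampled multiple times"
    logits = classifier(sample(mu, logvar))
```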
Claim(s) 4-5 and 11-12 is/are rejected under 35 USC 103 as being unpatentable over Zonooz in view of Kingma, further in view of Becker (Mutual information maximization: models of cortical self-organization, published 1996; hereafter, “Becker”).

Claim 4

Zonooz in view of Kingma already discloses the elements of the parent claim(s). It also discloses:

[The method of claim 1, further comprising] parameterizing the multi-objective loss function based on [a mutual information of the first probability distribution over the latent space and the second probability distribution over the latent space] and entropy losses of the first classification result and the second classification result. ([Zonooz, 0023, 0025-0026]: As noted above, the loss functions disclosed by Zonooz make use of a “task specific loss” given by “a natural cross-entropy between the output of the model and the ground truth” [Zonooz, 0023]. The task specific losses map to the “entropy losses” of the claim.)

The same motivation to combine applies.

Zonooz in view of Kingma does not distinctly disclose:

[the multi-objective loss function based on] a mutual information of the first probability distribution over the latent space and the second probability distribution over the latent space

Becker is in the field of neural networks. Moreover, Zonooz in view of Kingma and Becker discloses:

[the multi-objective loss function based on] a mutual information of the first probability distribution over the latent space and the second probability distribution over the latent space

([Becker, section 2]: Becker discloses the use of a “mutual information cost function” [Becker, section 2, pages 11-12, paragraph beginning “To see”] which ensures agreement between the outputs of “two different neural units or neural network modules” by “maximiz[ing] the mutual information between the outputs of two (or more) different network modules” [Becker, section 2, page 11, paragraph beginning “In collaboration”]. In the combination, the neural network modules of Becker are taken to be variational autoencoders disclosed by Kingma, i.e., the “probabilistic encoder[s]” of the claims, so that the outputs whose mutual information is being maximized are the probability distributions over the latent space of the claim.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to incorporate a term for maximizing mutual information into a cost/loss function as disclosed by Becker into the adversarial training method disclosed by Zonooz in view of Kingma because maximizing mutual information “selects features which agree across multiple input channels” [Becker, section 2, pages 11-12, paragraph beginning “To see how”] and “provides a means for information in one channel to modulate learning in another channel” [Becker, section 2, page 12, paragraph beginning “One advantage”], so the combination would ensure that probabilistic encodings produced by the two networks are better aligned.

Claim 5

Zonooz in view of Kingma discloses the elements of the parent claim(s). It also discloses:

[The method of claim 1, wherein] the multi-objective loss function comprises … a second term corresponding to minimizing a symmetrized Kullback-Leibler divergence between encodings of one of the clean data samples or the adversarial data samples in the pair conditioned on another data sample in the pair, ([Zonooz, 0025-0026]: The second terms on the right-hand side of [Zonooz, 0025 equation 1] and [Zonooz, 0026 equation 2] are Kullback-Leibler divergence terms and fall under the broadest reasonable interpretation of “corresponding to… minimizing” a symmetrized Kullback-Leibler divergence: since the loss functions L_F and L_G are minimized, so too is each of those Kullback-Leibler divergences (since the coefficients in front of them are positive), and so too is a corresponding symmetrized Kullback-Leibler divergence (namely, the one defined to be the sum of the two Kullback-Leibler divergences, which is also known as Jeffreys divergence).)

a third term corresponding to a clean cross-entropy loss determined for classifying the clean data samples, ([Zonooz, 0026]: The first term on the right-hand side of [Zonooz, 0026 equation 2] is a cross-entropy between the output of the model on the original/clean data and ground-truth labels. In other words, this term “corresponds to… a clean cross-entropy loss” as recited by the claim.)

and a fourth term corresponding to an adversarial cross-entropy loss determined for classifying the adversarial data samples. ([Zonooz, 0025]: The first term on the right-hand side of [Zonooz, 0025 equation 1] is a cross-entropy between the output of the model on the adversarial examples and ground-truth labels. In other words, this term “corresponds to… an adversarial cross-entropy loss” as recited by the claim.)

Zonooz in view of Kingma does not distinctly disclose:

[the multi-objective loss function comprises] a first term corresponding to maximizing a mutual information between probability distributions of encodings of pairs of the clean data samples and the adversarial data samples,

Becker is in the field of neural networks. Moreover, Zonooz in view of Kingma and Becker discloses:

[the multi-objective loss function comprises] a first term corresponding to maximizing a mutual information between probability distributions of encodings of pairs of the clean data samples and the adversarial data samples,

([Becker, section 2]: Becker discloses the use of a “mutual information cost function” [Becker, section 2, pages 11-12, paragraph beginning “To see”] which ensures agreement between the outputs of “two different neural units or neural network modules” by “maximiz[ing] the mutual information between the outputs of two (or more) different network modules” [Becker, section 2, page 11, paragraph beginning “In collaboration”]. In the combination, the neural network modules of Becker are taken to be variational autoencoders disclosed by Kingma, i.e., the “probabilistic encoder[s]” of the claims, so that the outputs whose mutual information is being maximized are the probability distributions over the latent space of the claim, i.e., the “probability distributions of encodings” of the claim. The mutual information term incorporated into the loss functions of Zonooz then maps to the “first term” of the claim.)
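Assembling the four terms mapped for claims 5 and 12 gives, roughly, the following minimized objective (our reconstruction from the mapping above, not an equation quoted from Zonooz, Kingma, or Becker). The mutual-information term enters negatively because it is to be maximized, and the symmetrized (Jeffreys) divergence D_J is the sum of the two directed KL divergences:

```latex
\mathcal{L} \;=\;
-\,\lambda_{1}\, I\!\left(q(z \mid x),\, q(z \mid x_{\mathrm{adv}})\right)
\;+\; \lambda_{2}\, D_{J}\!\left(q(z \mid x)\,\|\,q(z \mid x_{\mathrm{adv}})\right)
\;+\; \mathrm{CE}\!\left(f(x),\, y\right)
\;+\; \mathrm{CE}\!\left(f(x_{\mathrm{adv}}),\, y\right),
\qquad
D_{J}(P\,\|\,Q) \;=\; D_{\mathrm{KL}}(P\,\|\,Q) + D_{\mathrm{KL}}(Q\,\|\,P).
```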
Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to incorporate a term for maximizing mutual information into a cost/loss function as disclosed by Becker into the adversarial training method disclosed by Zonooz in view of Kingma because maximizing mutual information “selects features which agree across multiple input channels” [Becker, section 2, pages 11-12, paragraph beginning “To see how”] and “provides a means for information in one channel to modulate learning in another channel” [Becker, section 2, page 12, paragraph beginning “One advantage”], so the combination would ensure that probabilistic encodings produced by the two networks are better aligned.

Claims 11-12 inherit limitations from claim 8 and recite additional limitations which are substantially similar to those recited by claims 4-5, respectively, so they are rejected by the same rationale.

Claim(s) 6 and 13 is/are rejected under 35 USC 103 as being unpatentable over Zonooz in view of Kingma, further in view of Xiao et al. (US20190012575A1, published 2019-01-10; hereafter, “Xiao”).

Claim 6

Zonooz in view of Kingma already discloses the elements of the parent claim(s). It also discloses:

[The method of claim 1, wherein the collecting the plurality of data samples comprises: receiving the clean data samples over a communication channel;] and modifying each clean data sample of the clean data samples to generate a corresponding adversarial data sample of the adversarial data samples. ([Zonooz, 0020]: Zonooz discloses adding an “adversarial perturbation, δ” to the original/clean data in order to obtain the adversarial examples. This addition of a perturbation maps to the “modifying” recited by the claim.)

Zonooz in view of Kingma does not distinctly disclose:

receiving the clean data samples over a communication channel,

Xiao is also in the field of machine learning. Moreover, Zonooz in view of Kingma and Xiao discloses:

receiving the clean data samples over a communication channel, ([Xiao, 0047]: Xiao discloses “receiv[ing] a new training data set… through a wired connection or a wireless connection”. In the combination, the training data set corresponds to the original data in Zonooz and the “clean data samples” of the claim.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the adversarial training method disclosed by Zonooz in view of Kingma with the process of receiving training data through a wired or wireless connection as disclosed by Xiao because receiving data through such a connection would be faster than manually copying data from one device or component to another.

Claim 13 inherits limitations from claim 8 and recites additional limitations which are substantially similar to those recited by claim 6, so it is rejected by the same rationale.

Claim(s) 7 and 14 is/are rejected under 35 USC 103 as being unpatentable over Zonooz in view of Kingma and Xiao, further in view of Madry et al. (Towards Deep Learning Models Resistant to Adversarial Attacks, published 2019-09-04; hereafter, “Madry”).

Claim 7

While Zonooz in view of Kingma and Xiao discloses generating adversarial examples by perturbing the original examples, it does not distinctly disclose:

[The method of claim 6, wherein the modifying comprises:] applying an adversarial example generation method on each clean data sample of the clean data samples, wherein the adversarial example generation method is one of a projected gradient descent method, a fast-gradient sign method, a limited-memory Broyden-Fletcher-Goldfarb-Shanno method, a Jacobian-based saliency map attack, or a Carlini & Wagner attack.

Madry is in the field of machine learning. Moreover, Zonooz in view of Kingma, Xiao, and Madry discloses:

[The method of claim 6, wherein the modifying comprises:] applying an adversarial example generation method on each clean data sample of the clean data samples, wherein the adversarial example generation method is one of a projected gradient descent method, a fast-gradient sign method, a limited-memory Broyden-Fletcher-Goldfarb-Shanno method, a Jacobian-based saliency map attack, or a Carlini & Wagner attack. ([Madry, page 2]: Madry discloses the use of “projected gradient descent (PGD)” for generating adversarial examples.)

Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine the adversarial training method disclosed by Zonooz in view of Kingma and Xiao with the use of PGD as disclosed by Madry because PGD is a “reliable first-order adversary” which generates the “strongest attack” (i.e., the most convincing adversarial examples) [Madry, page 2], so the use of examples generated by PGD would result in a more robust system overall.

Claim 14 inherits limitations from claim 8 and recites additional limitations which are substantially similar to those recited by claim 7, so it is rejected by the same rationale.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Jee Hyong LEE et al. (US20220198270A1, effectively filed 2020-12-22; hereafter, “Lee”) discloses two neural networks trained, respectively, on clean and adversarial data, with each having a feature extractor [Lee, figure 2]. The neural networks trained on clean and adversarial data can be mapped, respectively, to the “first instance of the neural network” and the “second instance of the neural network” of the claim, and the feature extractor can be mapped to the “encoder” of the claim.

Michael HOY et al. (Learning to Predict Pedestrian Intention via Variational Tracking Networks, published 2018; hereafter, “Hoy”) discloses a method for “solv[ing] a binary classification problem” [Hoy, section 1, first paragraph] using a “deep learning based system” [Hoy, abstract]. The method disclosed therein includes “newly proposed inference techniques like Variational Recurrent Neural Networks… [which] are extensions of the well known Variational Auto-Encoder” [Hoy, section 1, second paragraph]. In other words, Hoy provides documentary evidence of both the fact that the variational auto-encoders of Kingma were “well known” to those of ordinary skill in the art before the effective filing date of the invention, and the fact that it would have been obvious to those of ordinary skill in the art before the effective filing date of the invention to incorporate a probabilistic encoder into a neural network performing classification (since that is precisely what Hoy does).

Similarly, Konstantin BERLIN et al. (US20180041536A1, published 2018-02-08; hereafter, “Berlin”) describes a classifier including an encoder which is specifically indicated as being a variational autoencoder [Berlin, figure 8a/b and 0043]. In other words, Berlin provides further documentary evidence of the fact that it would have been obvious to those of ordinary skill in the art before the effective filing date of the invention to incorporate a probabilistic encoder into a neural network performing classification.

Similarly, Panu TURCOT et al. (US20180189581A1, published 2018-07-05; hereafter, “Turcot”) describes a classifier including a “bottleneck layer” which produces a “low dimensional representation” and which may be a “variational autoencoder” [Turcot, 0147]. In other words, Turcot provides further documentary evidence of the fact that it would have been obvious to those of ordinary skill in the art before the effective filing date of the invention to incorporate a probabilistic encoder into a neural network performing classification.

Yanfei DONG et al. (US20200210808A1, published 2020-07-02; hereafter, “Dong”) discloses a classifier including an encoder in which data is produced by a sampling process [Dong, abstract; see also, figure 2, figure 4 step 408].

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Shishir AGRAWAL whose telephone number is +1 703-756-1183. The examiner can normally be reached Monday through Thursday, 08:30-14:30 Pacific Time.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey SHMATOV, can be reached on +1 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is +1 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at +1 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call +1 800-786-9199 (IN USA OR CANADA) or +1 571-272-1000.

/S.A./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123

Prosecution Timeline

Mar 18, 2022 — Application Filed
Dec 04, 2024 — Non-Final Rejection — §103, §112
Apr 11, 2025 — Response Filed
Apr 29, 2025 — Final Rejection — §103, §112
Sep 08, 2025 — Request for Continued Examination
Sep 19, 2025 — Response after Non-Final Action
Jan 13, 2026 — Non-Final Rejection — §103, §112 (current)


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 0%
With Interview: 0% (+0.0% lift)
Median Time to Grant: 3y 3m
PTA Risk: High

Based on 13 resolved cases by this examiner. Grant probability derived from career allow rate.
