Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
2. This action is in response to the filings of 06/26/2023 and 02/12/2026. Claims 1-20 are pending and have been considered below.
Information Disclosure Statement
3. The information disclosure statements (IDSs) submitted on 06/26/2023 and 02/01/2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections – 35 USC § 103
4. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
5. Claims 1-2, 7, 9-12, 15, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Xiao et al. (U.S. Patent Application Pub. No. US 20230095092 A1) in view of Bhandary et al. (U.S. Patent Application Pub. No. US 20200319631 A1).
Claim 1: Xiao teaches a computer-implemented method for training adversarial models with improved computational efficiency (i.e. techniques are presented to train and utilize one or more neural networks. A denoising diffusion generative adversarial network (denoising diffusion GAN) reduces a number of denoising steps during a reverse process; abs), the method comprising:
obtaining, by a computing system comprising one or more computing devices (i.e. computing system; para. [0190]), one or more training samples (i.e. FIG. 5 illustrates an example process 500 for training a denoising diffusion GAN. In this example, a sample input is received 502; para. [0108]);
processing, by the computing system, the one or more training samples with an adversarial machine learning model to generate one or more outputs, wherein the adversarial machine learning model comprises at least a first model component and a second model component that are adversarial to each other (i.e. the discriminator of the GAN tries to distinguish whether a denoised sample is generated by the generator or sampled in the previous step of the forward process (in which case it is a true denoising sample), and the generator tries to produce denoised samples conditioned on the same noisy observation … the denoising diffusion GAN may be trained on an input image that undergoes forward diffusion to generate one or more intermediate images that may be compared, by a discriminator, against a generated image, from a generator, that undergoes posterior sampling to add noise. As a result, the discriminator may be comparing noisy images to determine whether an image is real or fake (e.g., whether the noisy image is based on noise added to a real image or the noisy image is based on noise added to a generated image). Adversarial training provides a loss that is then fed back to the generator to improve the network to generate additional images; para. [0057, 0105]). Because Xiao's discriminator tries to distinguish generated samples from real samples while the generator tries to produce denoised samples, the two components are adversarial to each other;
evaluating, by the computing system, a loss function based at least in part on the one or more outputs to determine a current loss value associated with the adversarial machine learning model (i.e. Adversarial training provides a loss that is then fed back to the generator to improve the network to generate additional images … The determinator may then be used to determine a loss 426, which can be used to update and improve the generator; para. [0105, 0107]);
determining, by the computing system, the current loss value associated with the adversarial machine learning model and an ideal loss value for the adversarial machine learning model (i.e. Training may be formulated by matching the conditional GAN generator pθ(xt-1|xt) and q(xt-1|xt) using an adversarial loss that minimizes a divergence Dadv per denoising step, where Dadv can be Wasserstein distance, Jensen-Shannon divergence, or f-divergence depending on the adversarial training setup; para. [0071]);
determining, by the computing system, an adaptive learning rate value for at least one of the first model component and the second model component based at least in part on the current loss value associated with the adversarial machine learning model and the ideal loss value for the adversarial machine learning model (i.e. During training, a randomly sampled integer time step for each datapoint may be used within a batch. Additionally, a regularization term may be added to the objective for the discriminator. The model may then be trained using Adam optimizer, and may use cosine learning rate decay for training both the generator and discriminator; para. [0086]); and
updating, by the computing system, the at least one of the first model component and the second model component (i.e. The determinator may then be used to determine a loss 426, which can be used to update and improve the generator; para. [0055, 0107]) according to the adaptive learning rate value (i.e. During training, a randomly sampled integer time step for each datapoint may be used within a batch. Additionally, a regularization term may be added to the objective for the discriminator. The model may then be trained using Adam optimizer, and may use cosine learning rate decay for training both the generator and discriminator; para. [0086]).
Xiao does not explicitly teach determining a distance between the current loss value and an ideal loss value; determining an adaptive learning rate value based at least in part on the distance between the current loss value and the ideal loss value.
However, Bhandary teaches determining, by the computing system, a distance between the current loss value and an ideal loss value (i.e. calculating, by the processor, a current pattern error based on a vector distance between the machine failure sequence and current predicted sequence; and if the current pattern error is less than or not greater than a predetermined error threshold value, determining, by the processor, an updated learning rate based on the current pattern error, and updating weight values between input and hidden units in RNN based on the updated learning rate; para. [0009, 0010]); determining, by the computing system, an adaptive learning rate value based at least in part on the distance between the current loss value and the ideal loss value (i.e. calculating, by the processor, a current pattern error based on a vector distance between the machine failure sequence and current predicted sequence; and if the current pattern error is less than or not greater than a predetermined error threshold value, determining, by the processor, an updated learning rate based on the current pattern error, and updating weight values between input and hidden units in RNN based on the updated learning rate; para. [0009, 0010]); and
updating, by the computing system, according to the adaptive learning rate value (i.e. calculating, by the processor, a current pattern error based on a vector distance between the machine failure sequence and current predicted sequence; and if the current pattern error is less than or not greater than a predetermined error threshold value, determining, by the processor, an updated learning rate based on the current pattern error, and updating weight values between input and hidden units in RNN based on the updated learning rate; para. [0009, 0010]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Xiao to include the feature of Bhandary. One would have been motivated to make this modification because it improves convergence speed.
Claim 2: Xiao and Bhandary teach the computer-implemented method of claim 1. Xiao further teaches wherein: the loss function comprises a minimax function (i.e. equations 5 and 6, min/max adversarial objective; para. [0073, 0074]); the first model component seeks to minimize the minimax function (i.e. equation 5; para. [0073]); the second model component seeks to maximize the minimax function (i.e. equation 6; para. [0074]); and the ideal loss value comprises a minimum value of the minimax function (i.e. equations 5 and 6, min/max adversarial objective; para. [0073, 0074]).
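For illustration only of the minimax loss and ideal loss value discussed in the rejection of Claim 2, the following Python sketch evaluates the standard GAN minimax objective at the point where the discriminator outputs one half. This sketch is the examiner's illustration, is not drawn from any cited reference, and all function names are hypothetical.

```python
import math

def gan_value(d_real: float, d_fake: float) -> float:
    """Per-sample value of the standard GAN minimax objective,
    V(D, G) = log D(x) + log(1 - D(G(z)))."""
    return math.log(d_real) + math.log(1.0 - d_fake)

# At the ideal operating point the discriminator cannot distinguish
# real from generated samples and outputs 0.5 for both, so the
# objective attains its equilibrium value of -log 4.
ideal_loss = gan_value(0.5, 0.5)   # = -log 4, about -1.386
```

The generator seeks to minimize this value while the discriminator seeks to maximize it; the equilibrium value serves as the ideal loss value referenced in the claim.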
Claim 7: Xiao and Bhandary teach the computer-implemented method of claim 1. Xiao further teaches wherein: determining, by the computing system, the adaptive learning rate value for the at least one of the first model component and the second model component comprises determining, by the computing system, the adaptive learning rate value for the second model component (i.e. During training, a randomly sampled integer time step for each datapoint may be used within a batch. Additionally, a regularization term may be added to the objective for the discriminator. The model may then be trained using Adam optimizer, and may use cosine learning rate decay for training both the generator and discriminator; para. [0086]); and updating, by the computing system, the at least one of the first model component and the second model component according to the adaptive learning rate value comprises updating, by the computing system, the second model component according to the adaptive learning rate value (i.e. During training, a randomly sampled integer time step for each datapoint may be used within a batch. Additionally, a regularization term may be added to the objective for the discriminator. The model may then be trained using Adam optimizer, and may use cosine learning rate decay for training both the generator and discriminator; para. [0086]).
Bhandary further teaches wherein: determining, by the computing system, the adaptive learning rate value; and updating, by the computing system, according to the adaptive learning rate value (i.e. calculating, by the processor, a current pattern error based on a vector distance between the machine failure sequence and current predicted sequence; and if the current pattern error is less than or not greater than a predetermined error threshold value, determining, by the processor, an updated learning rate based on the current pattern error, and updating weight values between input and hidden units in RNN based on the updated learning rate; para. [0009, 0010]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Xiao to include the feature of Bhandary. One would have been motivated to make this modification because it improves convergence speed.
Claim 9: Xiao and Bhandary teach the computer-implemented method of claim 1. Xiao further teaches wherein the first model comprises an image synthesis model (i.e. the noisy image 356 is further provided to the generator 206, which is used to output a generated image 358; para. [0084]).
Claim 10: Xiao and Bhandary teach the computer-implemented method of claim 1. Xiao further teaches wherein determining, by the computing system, the adaptive learning rate value for at least one of the first model component and the second model component based at least in part on the current loss value associated with the adversarial machine learning model and the ideal loss value for the adversarial machine learning model (i.e. During training, a randomly sampled integer time step for each datapoint may be used within a batch. Additionally, a regularization term may be added to the objective for the discriminator. The model may then be trained using Adam optimizer, and may use cosine learning rate decay for training both the generator and discriminator; para. [0086]) comprises: determining, by the computing system, a learning rate value for the at least one of the first model component and the second model component based at least in part on the current loss value associated with the adversarial machine learning model and the ideal loss value for the adversarial machine learning model (i.e. During training, a randomly sampled integer time step for each datapoint may be used within a batch. Additionally, a regularization term may be added to the objective for the discriminator. The model may then be trained using Adam optimizer, and may use cosine learning rate decay for training both the generator and discriminator; para. [0086]).
Xiao does not explicitly teach determining the adaptive learning rate value based at least in part on the distance between the current loss value and the ideal loss value comprises determining a learning rate scaling value based at least in part on the distance between the current loss value and the ideal loss value; and scaling, by the computing system, a base learning rate value by the learning rate scaling value to obtain the adaptive learning rate value.
However, Bhandary further teaches determining the adaptive learning rate value based at least in part on the distance between the current loss value and the ideal loss value (i.e. calculating, by the processor, a current pattern error based on a vector distance between the machine failure sequence and current predicted sequence; and if the current pattern error is less than or not greater than a predetermined error threshold value, determining, by the processor, an updated learning rate based on the current pattern error, and updating weight values between input and hidden units in RNN based on the updated learning rate; para. [0009, 0010]) comprises determining a learning rate scaling value (i.e. the updated learning rate may be calculated according to the following Equation (7): η.sub.updated=η.sub.base*(1+α+β) wherein η.sub.updated refers to the updated learning rate, η.sub.base refers to the base updated learning rate, α refers to the acceleration/deceleration parameter and β refers to the bonus acceleration parameter; para. [0099]) based at least in part on the distance between the current loss value and the ideal loss value (i.e. calculating, by the processor, a current pattern error based on a vector distance between the machine failure sequence and current predicted sequence; and if the current pattern error is less than or not greater than a predetermined error threshold value, determining, by the processor, an updated learning rate based on the current pattern error, and updating weight values between input and hidden units in RNN based on the updated learning rate; para. [0009, 0010]); and scaling, by the computing system, a base learning rate value by the learning rate scaling value (i.e. the updated learning rate may be calculated according to the following Equation (7): η.sub.updated=η.sub.base*(1+α+β) wherein η.sub.updated refers to the updated learning rate, η.sub.base refers to the base updated learning rate, α refers to the acceleration/deceleration parameter and β refers to the bonus acceleration parameter; para. [0099]) to obtain the adaptive learning rate value (i.e. calculating, by the processor, a current pattern error based on a vector distance between the machine failure sequence and current predicted sequence; and if the current pattern error is less than or not greater than a predetermined error threshold value, determining, by the processor, an updated learning rate based on the current pattern error, and updating weight values between input and hidden units in RNN based on the updated learning rate; para. [0009, 0010]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Xiao to include the feature of Bhandary. One would have been motivated to make this modification because it improves convergence speed.
Claim 11: Xiao and Bhandary teach the computer-implemented method of claim 10. Xiao does not explicitly teach wherein: when the current loss value is greater than the ideal loss value, the learning rate scaling value is greater than or equal to one; when the current loss value is less than the ideal loss value, the learning rate scaling value is greater than zero and less than or equal to one.
However, Bhandary further teaches wherein: when the current loss value is greater than the ideal loss value, the learning rate scaling value is greater than or equal to one; when the current loss value is less than the ideal loss value, the learning rate scaling value is greater than zero and less than or equal to one (i.e. From the second predetermined rule in Table 2, it can be seen that the acceleration parameter is always higher than deceleration parameter i.e. reward for learning rate is higher than penalty. Therefore, the mechanism for updating learning rate is optimized in the sense that the learning rate is not decreased as much as it is increased with changes of pattern errors … the updated learning rate may be calculated according to the following Equation (7): η.sub.updated=η.sub.base*(1+α+β) wherein η.sub.updated refers to the updated learning rate, η.sub.base refers to the base updated learning rate, α refers to the acceleration/deceleration parameter and β refers to the bonus acceleration parameter; para. [0094, 0099]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Xiao to include the feature of Bhandary. One would have been motivated to make this modification because it improves convergence speed.
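For illustration only of the loss-distance-based learning rate scaling at issue in Claims 10-11, the following Python sketch produces a scale of at least one when the current loss exceeds the ideal loss and a scale in (0, 1] when it is below. This sketch is the examiner's illustration, is not drawn from any cited reference, and all function names, parameters, and the specific formula are hypothetical.

```python
def learning_rate_scale(current_loss: float, ideal_loss: float,
                        gain: float = 0.5) -> float:
    """Hypothetical scaling rule: the scale is >= 1 when the current
    loss exceeds the ideal loss (train faster) and in (0, 1] when the
    current loss is below it (train slower), with magnitude driven by
    the distance between the two values."""
    distance = abs(current_loss - ideal_loss)
    if current_loss >= ideal_loss:
        return 1.0 + gain * distance        # >= 1
    return 1.0 / (1.0 + gain * distance)    # in (0, 1]

def adaptive_learning_rate(base_lr: float, current_loss: float,
                           ideal_loss: float) -> float:
    """Scale a base learning rate by the distance-dependent factor."""
    return base_lr * learning_rate_scale(current_loss, ideal_loss)
```

When the current loss equals the ideal loss the scale is exactly one, so the base learning rate passes through unchanged.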
Claim 12: Xiao and Bhandary teach the computer-implemented method of claim 10. Xiao does not explicitly teach wherein determining, by the computing system, the learning rate scaling value comprises: when the current loss value is greater than the ideal loss value, evaluating a first scheduling function with an argument of the distance between the current loss value and the ideal loss value; and
when the current loss value is less than the ideal loss value, evaluating a second scheduling function with an argument of the distance between the current loss value associated with the adversarial machine learning model and the ideal loss value for the adversarial machine learning model.
However, Bhandary further teaches wherein determining, by the computing system, the learning rate scaling value comprises: when the current loss value is greater than the ideal loss value, evaluating a first scheduling function (i.e. Table 1 shows an example of the pre-stored table for the first predetermined rule. The base updated learning rates in the table are determined by experimentation; para. [0047, 0084, 0085]) with an argument of the distance between the current loss value and the ideal loss value (i.e. calculating, by the processor, a current pattern error based on a vector distance between the machine failure sequence and current predicted sequence; and if the current pattern error is less than or not greater than a predetermined error threshold value, determining, by the processor, an updated learning rate based on the current pattern error, and updating weight values between input and hidden units in RNN based on the updated learning rate; para. [0009, 0010]); and when the current loss value is less than the ideal loss value, evaluating a second scheduling function with an argument of the distance between the current loss value associated with the adversarial machine learning model and the ideal loss value for the adversarial machine learning model (i.e. In block 301, the processor retrieves an acceleration/deceleration parameter from a pre-stored table based on the current pattern error and whether the current pattern error is less than the immediately preceding pattern error. The pre-stored table, as shown in Table 2, includes a plurality of mappings and each mapping associates a predetermined pattern error range to both an acceleration parameter and a deceleration parameter; para. [0104, 0105]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Xiao to include the feature of Bhandary. One would have been motivated to make this modification because it improves convergence speed.
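For illustration only of the two scheduling functions at issue in Claim 12 (and the interpolation recited in Claim 13), the following Python sketch interpolates a scaling value between one and a maximum value, or between a minimum value and one, as a function of the loss distance. This sketch is the examiner's illustration, is not drawn from any cited reference, and all function names and constants are hypothetical.

```python
def first_schedule(distance: float, max_scale: float = 2.0,
                   d_max: float = 1.0) -> float:
    """Linear interpolation between one and a maximum value as the
    loss distance grows from 0 to d_max (current loss > ideal loss)."""
    t = min(distance / d_max, 1.0)
    return 1.0 + t * (max_scale - 1.0)

def second_schedule(distance: float, min_scale: float = 0.5,
                    d_max: float = 1.0) -> float:
    """Exponential interpolation between a minimum value and one as
    the loss distance grows (current loss < ideal loss)."""
    t = min(distance / d_max, 1.0)
    return min_scale ** t   # 1 at t = 0, min_scale at t = 1
```

The first function is applied when the current loss is above the ideal loss, the second when it is below, each taking the loss distance as its argument.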
Claim 15: Xiao and Bhandary teach the computer-implemented method of claim 1. Xiao further teaches wherein the one or more training samples comprise a batch of a plurality of training samples (i.e. During training, a randomly sampled integer time step for each datapoint may be used within a batch; para. [0086]).
Claim 17 is similar in scope to Claim 1 and is rejected under a similar rationale.
Xiao further teaches one or more processors (i.e. processors; para. [0282]);
at least a first machine learning component, wherein the first machine learning component was trained (i.e. During training, a randomly sampled integer time step for each datapoint may be used within a batch. Additionally, a regularization term may be added to the objective for the discriminator. The model may then be trained using Adam optimizer, and may use cosine learning rate decay for training both the generator and discriminator; para. [0086]) or
was jointly trained with a second machine learning component by performance of training operations (i.e. During training, a randomly sampled integer time step for each datapoint may be used within a batch. Additionally, a regularization term may be added to the objective for the discriminator. The model may then be trained using Adam optimizer, and may use cosine learning rate decay for training both the generator and discriminator; para. [0086]); and
one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors, cause the computer system to run at least the first machine learning component (i.e. code (e.g., executable code or source code) is stored on a set of one or more non-transitory computer-readable storage media having stored thereon executable instructions (or other memory to store executable instructions) that, when executed (i.e., as a result of being executed) by one or more processors of a computer system, cause computer system to perform operations described herein; para. [0437]).
Claim 18 is similar in scope to Claim 2 and is rejected under a similar rationale.
6. Claims 3-5 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Xiao in view of Bhandary and further in view of Chen (U.S. Patent Application Pub. No. US 20210366126 A1).
Claim 3: Xiao and Bhandary teach the computer-implemented method of claim 1. Xiao further teaches wherein: the first model component is configured to generate a first output (i.e. The generator 206 may then provide the output 106; para. [0080]); the second model component comprises a discriminator model configured to generate a second output comprising a probability that the first output belongs to a first distribution (i.e. the discriminator 204 is a time-dependent discriminator designed with a convolution network with ResNet blocks, where the design of ResNet blocks … By way of non-limiting example, network structures for the discriminator 204 may include a 1×1 conv2d, 128, a ResBlock 128, a ResBlock down 256, a ResBlock 512, a minibatch std layer, Global Sum Pooling, and a FC layer→scalar; para. [0073, 0074, 0079]).
Xiao does not explicitly teach the ideal loss value occurs when the probability output by the discriminator model is equal to one half.
However, Chen teaches the ideal loss value occurs when the probability output by the discriminator model is equal to one half (i.e. the training stop condition may also be considered as a convergence condition, which can be that the loss function of the discriminator no longer drops; the output of the discriminator is stable at about (0.5, 0.5), and the discriminator cannot distinguish the difference between the optical flow feature map and the CNN feature map; para. [0177]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Xiao and Bhandary to include the feature of Chen. One would have been motivated to make this modification because it improves training stability and efficiency.
Claim 4: Xiao, Bhandary, and Chen teach the computer-implemented method of claim 3. Xiao further teaches wherein: the adversarial machine learning model comprises a generative adversarial network (i.e. each denoising step may be modeled with a conditional Generative Adversarial Network (GAN); para. [0057]); the first model component comprises a generator network configured to generate the first output (i.e. The generator 206 may then provide the output 106; para. [0080]); and the second model component comprises a discriminator network (i.e. the discriminator of the GAN; para. [0057]).
Claim 5: Xiao, Bhandary, and Chen teach the computer-implemented method of claim 4. Xiao further teaches wherein the generator network is configured to generate a synthetic image (i.e. the noisy image 356 is further provided to the generator 206, which is used to output a generated image 358; para. [0084]).
Claims 19-20 are similar in scope to Claims 3-4 and are rejected under a similar rationale.
7. Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Xiao in view of Bhandary, Chen, and further in view of Sohn et al. (U.S. Patent Application Pub. No. US 20190066493 A1).
Claim 6: Xiao, Bhandary, and Chen teach the computer-implemented method of claim 3. Xiao does not explicitly teach wherein the adversarial machine learning model comprises a domain adversarial neural network; the first model component comprises a feature extraction network configured to generate the first output comprising extracted features; the second model component comprises a discriminator network; and the domain adversarial neural network comprises a third model component configured to generate a task output based on the extracted features.
However, Sohn teaches wherein the adversarial machine learning model comprises a domain adversarial neural network (i.e. the domain adaptation module 520 is, e.g., a domain adversarial neural network (DANN); para. [0059]); the first model component comprises a feature extraction network configured to generate the first output comprising extracted features (i.e. the feature extractor 300 extracts features 20 and 23 for feature-level domain adaptation with the help of the domain adaptation training unit 400; para. [0059]); the second model component comprises a discriminator network (i.e. While the domain adaptation training unit 400 can perform classification and discrimination with, e.g., separate classifiers and discriminators, a joint parameterizated structure can be used instead such that classifiers without separate discriminators are used; para. [0062-0064]); and the domain adversarial neural network comprises a third model component configured to generate a task output based on the extracted features (i.e. the classifiers 401 and 402 can each generate outputs that include, e.g., entries for class scores corresponding to each classification as well as an additional entry for a domain classification score corresponding to domain discrimination; para. [0064]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Xiao, Bhandary, and Chen to include the feature of Sohn. One would have been motivated to make this modification because it reduces performance degradation under domain shift.
8. Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Xiao in view of Bhandary and further in view of Cooper (U.S. Patent Application Pub. No. US 20230041290 A1).
Claim 8: Xiao and Bhandary teach the computer-implemented method of claim 7. Xiao further teaches comprising: updating, by the computing system, the first model component according to a learning rate value (i.e. During training, a randomly sampled integer time step for each datapoint may be used within a batch. Additionally, a regularization term may be added to the objective for the discriminator. The model may then be trained using Adam optimizer, and may use cosine learning rate decay for training both the generator and discriminator; para. [0086]).
Bhandary likewise teaches comprising: updating, by the computing system, the first model component according to a learning rate value (i.e. calculating, by the processor, a current pattern error based on a vector distance between the machine failure sequence and current predicted sequence; and if the current pattern error is less than or not greater than a predetermined error threshold value, determining, by the processor, an updated learning rate based on the current pattern error, and updating weight values between input and hidden units in RNN based on the updated learning rate; para. [0009, 0010]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Xiao to include the feature of Bhandary. One would have been motivated to make this modification because it improves convergence speed.
Xiao does not explicitly teach a fixed learning rate value.
However, Cooper teaches updating, by the computing system, the first model component according to a fixed learning rate value (i.e. when a step size switch is activated, the step size or the learning rate used during the training may be modified. Notably, when the step size switch is deactivated, the step size or the learning rate may be unchanged; para. [0069]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Xiao and Bhandary to include the feature of Cooper. One would have been motivated to make this modification because it improves training efficiency.
9. Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Xiao in view of Bhandary and further in view of Singh et al. (U.S. Patent Application Pub. No. US 20200234110 A1).
Claim 13: Xiao and Bhandary teach the computer-implemented method of claim 12. Xiao does not explicitly teach wherein: the first scheduling function comprises linear or exponential interpolation between one and a maximum value; and the second scheduling function comprises linear or exponential interpolation between a minimum value and one.
However, Singh teaches wherein: the first scheduling function comprises linear (i.e. the adversarially-robust neural-network training system changes the learning rate in a linear fashion; para. [0043]) or exponential interpolation between one and a maximum value (i.e. To elaborate, as illustrated in FIG. 8, the adversarially-robust neural-network training system 102 oscillates the learning rate associated with the neural network 304 between a first learning rate α1 and a second learning rate α2. Indeed, for a given period c, the adversarially-robust neural-network training system 102 oscillates the learning rate from the first learning rate α1 to the second learning rate α2 and back to the first learning rate α1. In one or more embodiments, the first learning rate α1 is a threshold amount different than the second learning rate α2. For example, the first learning rate α1 can be at least 1.25, 1.5, 1.75, or 2 times the second learning rate; para. [0089, 0092]); and the second scheduling function comprises linear (i.e. the adversarially-robust neural-network training system changes the learning rate in a linear fashion; para. [0043]) or exponential interpolation between a minimum value and one (i.e. To elaborate, as illustrated in FIG. 8, the adversarially-robust neural-network training system 102 oscillates the learning rate associated with the neural network 304 between a first learning rate α1 and a second learning rate α2. Indeed, for a given period c, the adversarially-robust neural-network training system 102 oscillates the learning rate from the first learning rate α1 to the second learning rate α2 and back to the first learning rate α1. In one or more embodiments, the first learning rate α1 is a threshold amount different than the second learning rate α2. For example, the first learning rate α1 can be at least 1.25, 1.5, 1.75, or 2 times the second learning rate; para. [0089, 0092]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Xiao and Bhandary to include the feature of Singh. One would have been motivated to make this modification because it stabilizes training dynamics.
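By way of illustration only (not part of the record), the claimed scheduling functions could be sketched as follows. This is a minimal, hypothetical implementation assuming a step index t out of T total steps; the function names are illustrative and do not appear in the cited references:

```python
def first_schedule(t, T, max_value):
    """Linear interpolation from 1.0 up to max_value over T steps."""
    return 1.0 + (max_value - 1.0) * (t / T)

def second_schedule(t, T, min_value):
    """Linear interpolation from min_value up to 1.0 over T steps."""
    return min_value + (1.0 - min_value) * (t / T)
```

An exponential variant would replace the linear ramp with a geometric one; either way the endpoints match the claim language (one to a maximum, and a minimum to one).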
10. Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Xiao in view of Bhandary and further in view of Yi et al. (U.S. Patent Application Pub. No. US 20210150678 A1).
Claim 14: Xiao and Bhandary teach the computer-implemented method of claim 1. Xiao does not explicitly teach wherein the ideal loss value comprises the loss for the machine learning system when an output of first model component is indistinguishable, by the second model component, from a target distribution.
However, Yi teaches wherein the ideal loss value comprises the loss for the machine learning system when an output of first model component is indistinguishable, by the second model component, from a target distribution (i.e. The discriminator is a network that is learning to discriminate whether a photo is a real-world photo. The discriminator receives the input x, where x represents a possible photo. An output D(x) generated by the discriminator represents the probability that x is a real-world photo. If D(x) is 1, it indicates that x is absolutely a real-world photo. If D(x) is 0, it indicates that x absolutely is not a real-world photo. In training the GAN, an objective of the generator is to generate a photo as real as possible (to avoid detection by discriminator), and an objective of the discriminator is to try to discriminate between a real-world photo and the photo generated by the generator. Thus, training constitutes a dynamic adversarial process between the generator and the discriminator. The aim of the training is for the generator to learn to generate a photo that the discriminator cannot discriminate from a real-world photo (ideally, D(G(z))=0.5). The trained generator is then used for model application, which is generation of a synthetic photo in this example; para. [0056]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Xiao and Bhandary to include the feature of Yi. One would have been motivated to make this modification because it helps avoid overtraining.
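For illustration only (not part of the record), the "ideal loss value" at the equilibrium Yi describes, where D(G(z)) = 0.5, can be computed for a standard binary cross-entropy discriminator. This sketch assumes that loss formulation; the function name is illustrative:

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy loss: -log D(x) - log(1 - D(G(z)))."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

# At equilibrium the discriminator cannot distinguish generated output
# from the target distribution, so D(x) = D(G(z)) = 0.5 and the loss
# reaches its ideal value of 2*ln(2).
ideal = discriminator_loss(0.5, 0.5)
```

Comparing the current loss against this fixed ideal value gives a concrete signal for when the adversarial game has converged.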
11. Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Xiao in view of Bhandary and further in view of Cresswell et al. (U.S. Patent Application Pub. No. US 20230385694 A1).
Claim 16: Xiao and Bhandary teach the computer-implemented method of claim 1. Xiao does not explicitly teach wherein the current loss value comprises an exponential moving average of a model loss over a number of batches.
However, Cresswell teaches wherein the current loss value comprises an exponential moving average of a model loss (i.e. the loss may be represented as a moving average, such as an exponential moving average, that adjusts the loss at each iteration according to a momentum hyperparameter β. The stored set of losses may store the moving average, such that the moving average may be updated based on the loss evaluated at this iteration; para. [0036]) over a number of batches (i.e. the models may be trained in one or more training iterations based on batches of training data from the training data store 170; para. [0024]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Xiao and Bhandary to include the feature of Cresswell. One would have been motivated to make this modification because it reduces variance from batch sampling.
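For illustration only (not part of the record), the exponential moving average of a per-batch loss with momentum hyperparameter β, as Cresswell describes in para. [0036], could be sketched as follows; the function name and loss values are hypothetical:

```python
def update_ema_loss(ema, batch_loss, beta=0.9):
    """Update the exponential moving average of the loss with momentum beta."""
    if ema is None:  # first batch initializes the average
        return batch_loss
    return beta * ema + (1.0 - beta) * batch_loss

# Running the update over a sequence of per-batch losses smooths out
# batch-to-batch sampling noise in the tracked "current loss value".
ema = None
for batch_loss in [1.2, 0.9, 1.1, 0.8]:
    ema = update_ema_loss(ema, batch_loss)
```

A larger β weights the history more heavily, which is what yields the claimed variance reduction across batches.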
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Acuna Marrero et al. (Pub. No. US 20220383073 A1) teaches that the MLMs may be trained using adversarial learning, for example, by jointly training the MLMs.
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. In re Heck, 699 F.2d 1331, 1332-33, 216 U.S.P.Q. 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 U.S.P.Q. 275, 277 (C.C.P.A. 1968)).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAN TRAN whose telephone number is (303)297-4266. The examiner can normally be reached Monday through Thursday, 8:00 am - 5:00 pm MT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Matt Ell, can be reached at 571-270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TAN H TRAN/Primary Examiner, Art Unit 2141