DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 01/19/2026 have been fully considered but they are not persuasive.
Regarding applicant’s remarks directed to the rejection of claims under 35 USC § 101, the applicant argues that the amended claims are directed to a technical solution. Examiner respectfully agrees and withdraws the rejection of claims under 35 USC § 101.
Regarding applicant’s remarks directed to the rejection of claims under 35 USC § 102, the arguments are directed to newly amended limitations that were not previously examined by the examiner. Therefore, applicant's arguments are rendered moot. The examiner refers to the rejection under 35 USC § 103 in the current Office action for more details.
Claim Rejections - 35 USC § 112(a)
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-9, 11-21 and 23 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 1 and analogous claims 12 and 23 recite “for each of a plurality of batch data sets of the training data, generating an augmented data set including the batch and a set of random data points sampled from the data space; generating mixtures of pairs of the augmented data set wherein selection of the pairs enables batch-batch and batch-random data point pair mixtures.” Applicant submits on pg. 1 of Remarks filed 01/19/2026, “Support for these amendments can be found throughout the originally-filed application.”
Examiner respectfully points out that only extrapolation [for each of a plurality of batch data sets of the training data, generating an augmented data set ie new batch of size 2N including the batch and a set of random data points sampled from the data space] is supported to generate mixtures from batch and random data points. See para. [0014], [0028], [0039], and [0149] of the specification of the instant application, “In some embodiments, extrapolation of the training data and the random data comprises generating new data points for training by, for each batch of training data with size N>1: augmenting the batch of training data with data points from the random data to obtain a new batch of mixed up training data of size 2N.”
However, there does not appear to be sufficient written description support for “generating mixtures of pairs of the augmented data set wherein selection of the pairs enables batch-batch and batch-random data point pair mixtures.” While there appears to be support for “batch-random data point pair mixtures,” there does not appear to be support for “batch-batch… data point pair mixtures.”
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-6, 8-18, 20-21 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Liu, Xingchao, et al. "Certified Monotonic Neural Networks." arXiv preprint arXiv:2011.10219 (2020). (“Liu”) in view of Zhang, Hongyi, et al. "mixup: Beyond empirical risk minimization." arXiv preprint arXiv:1710.09412 (2017). (“Zhang”).
In regards to claim 1,
Liu teaches A computer-implemented system for training a neural network with enforced monotonicity, the system comprising: at least one processor; and memory in communication with said at least one processor, wherein the memory stores instructions for providing a data model representing a neural network for predicting an outcome based on input data, the instructions when executed at said at least one processor causes said system to:
(Liu, Section 4.1, “Computational Time for Monotonicity Verification: Because our monotonicity verification involves solving MILP problems, we evaluate the time cost of two-layer verification in Fig. 3. All the results are averaged over 3 networks trained with different random seeds on COMPAS. The verification can be done in less than 4 seconds with 100 neurons in the first layer. Our computer has 48 cores and 192GB memory.”)
Liu teaches receive a feature data as input data, wherein the feature data comprises monotonic feature data;
(Liu, Section 2, “Individual Monotonicity and Monotonicity Attacking In fields where fairness and security are of critical importance, it is highly desirable to enforce monotonicity over certain features in the deployed ML models [11, 17, 28]. Otherwise, the system may be subject to attacks that exploit the non-monotonicity within it. Consider, for example, a program for predicting a product price (e.g., house) based on the product features. Let xα be the features that people naturally expect to be monotonic (such as the quantity or quality of the product) [receive a feature data as input data, wherein the feature data comprises monotonic feature data]. For a product with feature x = [xα, x¬α], if the function is not monotonic w.r.t. xα, then we can find another testing example xˆ = [xˆα, xˆ¬α], which satisfies
[media_image1.png]
”)
Liu teaches train the neural network with training data to encourage monotonicity across different parts of a data space of the feature data,
(Liu, Abstract, “This provides a new general approach for learning monotonic neural networks with arbitrary model structures. Our method allows us to train neural networks with heuristic monotonicity regularizations, and we can gradually increase the regularization magnitude until the learned network is certified monotonic.”)
Liu teaches the training including: for each of a plurality of batch data sets of the training data,
(Liu, Section 3.3, “The exact value of R(f) is intractable, and we approximate it by drawing samples of size 1024 [for each of a plurality of batch data sets of the training data ie samples of size 1024] uniformly from the input domain during iterations of the gradient descent.”)
Liu teaches computing a loss function based on the predicted outcome and an expected outcome associated with the input data, the loss function
[media_image2.png]
being dependent on a monotonicity penalty
[media_image3.png]
computed based on the mixtures of the pairs of the augmented data set; and updating weights of the neural network based on the loss function; and storing the updated weights of the neural network in the memory.
(Liu, Section 3.3, “We now introduce our simple procedure for learning monotonic neural networks with verification. Our learning algorithm works by training a typical network with a data-driving monotonicity regularization, and gradually increase the regularization magnitude until the network passes the monotonicity verification in (6). Precisely, it alternates between the following two steps: Step 1: Training a neural network f by [updating weights of the neural network based on the loss function; wherein training the neural network involves updating weights and obtaining a trained neural network includes storing the updated weights]
[media_image4.png]
where L(f) is the typical training loss [compute a loss function based on the predicted outcome and an expected outcome associated with the input data], and R(f) is a penalty [the loss function
[media_image2.png]
being dependent on a monotonicity penalty
[media_image3.png]
ie R(f) computed based on the mixtures of the pairs of the augmented data set; wherein the augmented batch of Liu and Zhang would be substituted for Uni(X) to teach the mixtures of the pairs of the augmented data set] that characterizes the violation of monotonicity; here λ is the corresponding coefficient and Uni(X ) denotes the uniform distribution on X . R(f) can be defined heuristically in other ways. R(f) = 0 implies that f is monotonic w.r.t. xα, but it has to be computationally efficient. For example, Uα in (6) is not suitable because it is too computationally expensive to be evaluated at each iteration of training.”)
(Liu, Section 4.1, “Our computer has 48 cores and 192GB memory [and storing the updated weights of the neural network in the memory].”)
However, Liu does not explicitly teach generating an augmented data set including the batch and a set of random data points sampled from the data space; generating mixtures of pairs of the augmented data set wherein selection of the pairs enables batch-batch and batch-random data point pair mixtures;
Zhang teaches generating an augmented data set including the batch and a set of random data points sampled from the data space; generating mixtures of pairs of the augmented data set wherein selection of the pairs enables batch-batch and batch-random data point pair mixtures;
(Zhang, Section 1, “Contribution Motivated by these issues, we introduce a simple and data-agnostic data augmentation routine, termed mixup (Section 2). In a nutshell, mixup constructs virtual training examples
[media_image5.png]
[ generating an augmented data set including the batch ie the batch provided by Liu and a set of random data points sampled from the data space]
(xi , yi) and (xj , yj ) are two examples drawn at random from our training data, and λ ∈ [0, 1]. Therefore, mixup extends the training distribution [generating mixtures of pairs of the augmented data set wherein selection of the pairs enables batch-batch and batch-random data point pair mixtures; wherein batch-batch and batch-random are interpreted to be the same in the context of Liu in view of Zhang as Liu provides a batch to be augmented and Zhang constructs the augmented new batch from random sampling on the batch] by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets. mixup can be implemented in a few lines of code, and introduces minimal computation overhead.”)
Liu is considered to be analogous to the claimed invention because it is in the same field of monotonic neural networks. Zhang is considered analogous to the claimed invention because it is reasonably pertinent to the problem the inventor faced (data augmentation). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liu to incorporate the teachings of Zhang in order to regularize neural networks and prevent memorization through augmenting examples (Zhang, Abstract, “Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.”)
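For illustration of the combination's mechanics, the mixup construction relied upon from Zhang can be sketched as follows. This is a minimal sketch only: the Beta(α, α) sampling of λ follows Zhang's Section 2, but the function name, shapes, and defaults are illustrative assumptions, not drawn from either reference.

```python
import numpy as np

def mixup_pair(x_i, y_i, x_j, y_j, alpha=0.2, rng=None):
    """Construct one virtual training example per Zhang's mixup:
    x = lam * x_i + (1 - lam) * x_j, and likewise for the labels,
    with lam drawn from Beta(alpha, alpha) so that lam is in [0, 1]."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x_i + (1.0 - lam) * x_j
    y = lam * y_i + (1.0 - lam) * y_j
    return x, y
```

Because the output is a convex combination, each mixed point lies between the two inputs, which is the "linear interpolations of feature vectors" behavior Zhang describes.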
In regards to claim 2,
Liu and Zhang teach The system of claim 1,
Liu teaches wherein the set of random data points excludes the training data.
(Liu, Section 3.3, “The exact value of R(f) is intractable, and we approximate it by drawing samples of size 1024 uniformly from the input domain during iterations of the gradient descent. Note that the samples we draw vary from iteration to iteration [wherein the set of random data points excludes the training data; wherein since a sample of size 1024 is drawn from the overall training data, it excludes the (overall) training data].”)
In regards to claim 3,
Liu and Zhang teach The system of claim 2,
Zhang teaches wherein the monotonicity penalty
[media_image3.png]
is determined based on at least one of: interpolation of the training data and extrapolation of the training data and the random data.
(Zhang, Section 1, “Contribution Motivated by these issues, we introduce a simple and data-agnostic data augmentation routine, termed mixup (Section 2). In a nutshell, mixup constructs virtual training examples
[media_image5.png]
(xi , yi) and (xj , yj ) are two examples drawn at random from our training data, and λ ∈ [0, 1]. Therefore, mixup extends the training distribution [interpolation of the training data] by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets. mixup can be implemented in a few lines of code, and introduces minimal computation overhead.”; wherein Zhang is relied upon to augment the training data)
In regards to claim 4,
Liu and Zhang teach The system of claim 3,
Zhang teaches wherein interpolation of the training data comprises mixing up data points from the training data.
(Zhang, Section 1, “Contribution Motivated by these issues, we introduce a simple and data-agnostic data augmentation routine, termed mixup (Section 2). In a nutshell, mixup constructs virtual training examples
[media_image5.png]
(xi , yi) and (xj , yj ) are two examples drawn at random from our training data [mixing up data points from the training data], and λ ∈ [0, 1]. Therefore, mixup extends the training distribution [interpolation of the training data] by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets. mixup can be implemented in a few lines of code, and introduces minimal computation overhead.”; wherein Zhang is relied upon to augment the training data)
In regards to claim 5,
Liu and Zhang teach The system of claim 4,
Zhang teaches wherein interpolation of a pair of data points (x', y'), (x", y") from the training data comprises generating new data points for training based on
[media_image6.png]
, and
[media_image7.png]
.
(Zhang, Section 1, “Contribution Motivated by these issues, we introduce a simple and data-agnostic data augmentation routine, termed mixup (Section 2). In a nutshell, mixup constructs virtual training examples
[media_image5.png]
(xi , yi) and (xj , yj ) are two examples [a pair of data points (x', y'), (x", y")] drawn at random from our training data, and λ ∈ [0, 1]. Therefore, mixup extends the training distribution by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets. mixup can be implemented in a few lines of code, and introduces minimal computation overhead.”; wherein Zhang is relied upon to augment the training data)
In regards to claim 6,
Liu and Zhang teach The system of claim 3,
Zhang teaches wherein extrapolation of the training data and the random data comprises mixing up data points from the training data and the random data.
(Zhang, Section 1, “Contribution Motivated by these issues, we introduce a simple and data-agnostic data augmentation routine, termed mixup (Section 2). In a nutshell, mixup constructs virtual training examples
[media_image5.png]
(xi , yi) and (xj , yj ) are two examples drawn at random from our training data [mixing up data points from the training data and the random data], and λ ∈ [0, 1]. Therefore, mixup extends the training distribution [extrapolation of the training data and the random data] by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets. mixup can be implemented in a few lines of code, and introduces minimal computation overhead.”; wherein Zhang is relied upon to augment the training data)
In regards to claim 8,
Liu and Zhang teach The system of claim 3,
Liu teaches wherein monotonic predictor is represented by
[media_image8.png]
[media_image9.png]
, and
[media_image10.png]
is the monotonicity penalty configured to measure the monotonicity of the monotonic predictor h*_M relative to input dimensions indicated by M, M
∈
{1,... d} being indicative of a subset of the input dimensions and comprising at least some of the monotonic feature data from the input data, where h represents a predictor of a class of predictors H for data from input x and output y data spaces, and
λ
is a hyperparameter weighting the monotonicity penalty.
(Liu, Section 3.3, “We now introduce our simple procedure for learning monotonic neural networks with verification. Our learning algorithm works by training a typical network with a data-driving monotonicity regularization, and gradually increase the regularization magnitude until the network passes the monotonicity verification in (6) [monotonic predictor]. Precisely, it alternates between the following two steps: Step 1: Training a neural network f by
[media_image13.png]
where L(f) is the typical training loss, and R(f) is a penalty that characterizes the violation of monotonicity; here λ is the corresponding coefficient and Uni(X ) denotes the uniform distribution on X . R(f) can be defined heuristically in other ways. R(f) = 0 implies that f is monotonic w.r.t. xα, but it has to be computationally efficient. For example, Uα in (6) is not suitable because it is too computationally expensive to be evaluated at each iteration of training.
The exact value of R(f) is intractable, and we approximate it by drawing samples of size 1024 uniformly from the input domain during iterations of the gradient descent. Note that the samples we draw vary from iteration to iteration [input dimensions indicated by M, M
∈
{1,... d} being indicative of a subset of the input dimensions and comprising at least some of the monotonic feature data from the input data]. By the theory of stochastic gradient descent, we can expect to minimize the object function well at convergence. Also, training NNs requires more than thousands of steps, therefore the overall size of samples can well cover the input domain. In practice, we use a modified regularization R(f) = Ex∼Uni(X ) hP `∈α max(b, −∂x` f(x))2 i , where b is a small positive constant, because we find the original version will always lead to a Uα that is slightly smaller than zero.
Step 2: Calculate Uα or a lower bound of it. If it is sufficient to verify that Uα ≥ 0, then f is monotonic and the algorithm terminates, otherwise, increase λ and repeat step 1.”)
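As an aid in understanding the mapping, Liu's Step 1 regularization R(f) = E[Σ max(b, −∂xℓ f(x))²] can be sketched numerically as follows. This is an illustrative sketch only: central finite differences stand in for the gradients Liu computes by backpropagation, and all names and defaults are assumptions, not drawn from Liu.

```python
import numpy as np

def monotonicity_penalty(f, X, mono_idx, b=1e-3, eps=1e-4):
    """Approximate R(f) = E_x[ sum_l max(b, -df/dx_l)^2 ] over a
    sample X, estimating each partial derivative on the monotonic
    coordinates mono_idx with a central finite difference."""
    total = 0.0
    for x in X:
        for l in mono_idx:
            e = np.zeros_like(x)
            e[l] = eps
            grad_l = (f(x + e) - f(x - e)) / (2 * eps)
            # A positive slope contributes only the small b**2 floor;
            # a negative slope (monotonicity violation) is penalized.
            total += max(b, -grad_l) ** 2
    return total / len(X)
```

In Liu, X is drawn uniformly from the input domain (samples of size 1024 per iteration); under the proposed combination, the augmented batch of Liu and Zhang would be substituted for that uniform sample.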
In regards to claim 9,
Liu and Zhang teach The system of claim 8,
Liu teaches wherein
[media_image14.png]
[media_image15.png]
indicates the gradients of h*_M relative to the input dimensions i
∈
M,
(Liu, Section 3.3, “We now introduce our simple procedure for learning monotonic neural networks with verification. Our learning algorithm works by training a typical network with a data-driving monotonicity regularization, and gradually increase the regularization magnitude until the network passes the monotonicity verification in (6). Precisely, it alternates between the following two steps: Step 1: Training a neural network f by
[media_image17.png]
”; wherein Uni(X) is replaced by the augmented data given by Zhang)
(Liu, Section 3.2, “In addition to the individual monotonicity around a given point x, it is important to check the global monotonicity for all the points in the input domain as well. It turns out that we can also address this problem through an optimization approach. For a differentiable function f, it is monotonic w.r.t. xα on X if and only if ∂x` f(x) ≥ 0 for all ` ∈ α, x ∈ X . We can check this by solving
[media_image18.png]
If Uα ≥ 0, then monotonicity is verified. Again, we can turn this optimization into a MILP for the ReLU networks. Consider the ReLU network in (4). Its gradient equals
[media_image19.png]
”)
However, Liu does not explicitly teach wherein D comprises data points generated by the interpolation of the training data and by the extrapolation of the training data and the random data
Zhang teaches wherein D comprises data points generated by the interpolation of the training data and by the extrapolation of the training data and the random data.
(Zhang, Section 1, “Therefore, mixup extends the training distribution [D ie extended training data comprises data points generated by the interpolation of the training data and by the extrapolation of the training data and the random data] by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets.”)
In regards to claim 11,
Liu and Zhang teach The system of claim 1,
Liu teaches wherein the feature data comprises non-monotonic feature data.
(Liu, Section 2, “In other words, while xˆ has the same values on the non-monotonic features [the feature data comprises non-monotonic feature data; wherein the feature data comprises of monotonic and non-monotonic feature data originally] with x, and smaller values on the monontonic features than x, f(xˆ) is larger than f(x). If such case is possible, the fairness of the system would be cast in doubt. Addressing this kind of problems is critical for many real-world scenarios such as criminal judgment, loan applications, as well as hiring/administration decisions. In light of this, we call f to be individually monotonic on x if there exists no adversarial example as described in (2). The non-monotonicity is hard to detect through a simple sanity check, unless the model is monotonic by construction. For example, Figure 1 shows a data instance x we found on COMPAS [16], a recidivism risk score dataset. In this example, a trained neural network is monotonic with respect to the monotonic features (i.e., f([xi , x¬i ]) w.r.t. each xi with x¬i fixed on the instance), but there exists an adversarial example xˆ that violates the monotonicity in the sense of (2). In this case, checking the monotonicity requires us to eliminate all the combinations of features on the input domain. To do so, we need a principled optimization framework, which can eliminate the existence of any possible monotonicity violations.”)
Claims 12 and 23 are substantially similar to and thus rejected under 35 USC § 103 as claim 1.
Claim 13 is substantially similar to and thus rejected under 35 USC § 103 as claim 2.
Claim 14 is substantially similar to and thus rejected under 35 USC § 103 as claim 11.
Claim 15 is substantially similar to and thus rejected under 35 USC § 103 as claim 3 (dependent on claim 14).
Claim 16 is substantially similar to and thus rejected under 35 USC § 103 as claim 4.
Claim 17 is substantially similar to and thus rejected under 35 USC § 103 as claim 5.
Claim 18 is substantially similar to and thus rejected under 35 USC § 103 as claim 6.
Claim 20 is substantially similar to and thus rejected under 35 USC § 103 as claim 8.
Claim 21 is substantially similar to and thus rejected under 35 USC § 103 as claim 9.
Claim(s) 7 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Zhang in further view of Audiomason, comment on “How do I get the total number of unique pairs of a set in the database?” (June 15, 2016), Stack Overflow, https://web.archive.org/web/20181228130041/https://stackoverflow.com/questions/18859430/how-do-i-get-the-total-number-of-unique-pairs-of-a-set-in-the-database (“Audiomason”).
In regards to claim 7,
Liu and Zhang teach The system of claim 6,
wherein extrapolation of the training data and the random data comprises generating new data points for training by,
for each batch of training data with size N > 1:
(Liu, Section 3.3, “The exact value of R(f) is intractable, and we approximate it by drawing samples of size 1024 [for each batch ie samples of size 1024 of training data with size N > 1] uniformly from the input domain during iterations of the gradient descent.”)
However, Liu does not explicitly teach augmenting the batch of training data with data points from the random data to obtain a new batch of 2N mixed up training data points (xm, ym);
and out of the
[media_image20.png]
possible pairs of data points from the new batch of mixed up training data,
selecting a random sample of k pairs of data points, wherein for each pair of data points (xm', ym'), (xm", ym") from the k pairs:
generating new data points for training based on
[media_image21.png]
, wherein
[media_image22.png]
, and
λ
is independently drawn.
Zhang teaches augmenting the batch of training data with data points from the random data to obtain a new batch of 2N mixed up training data points (xm, ym);
and out of the … possible pairs of data points from the new batch of mixed up training data,
selecting a random sample of k pairs of data points, wherein for each pair of data points (xm', ym'), (xm", ym") from the k pairs:
generating new data points for training based on
[media_image21.png]
, wherein
[media_image22.png]
, and
λ
is independently drawn.
(Zhang, Section 1, “Contribution Motivated by these issues, we introduce a simple and data-agnostic data augmentation routine, termed mixup (Section 2). In a nutshell, mixup constructs virtual training examples
[media_image5.png]
[ generating new data points for training]
(xi , yi) and (xj , yj ) are two examples drawn at random from our training data [selecting a random sample of k pairs of data points], and λ ∈ [0, 1]. Therefore, mixup extends the training distribution [augmenting the batch of training data with data points from the random data to obtain a new batch of mixed up training data of size 2N; wherein the random data is a randomly drawn sample of the training data as previously taught by Liu (and thus an example drawn from the subset (that sample) is the data point given by the random data); further instead of providing the overall training data to augment using the method of Zhang, only the batch (samples of size 1024) is provided by Liu wherein augmenting the batch for each sample in the batch and concatenating the augmented batch with the original batch (thus extending the training distribution) provides a new batch of training data wherein the size is 2N ie 2 x 1024] by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets. mixup can be implemented in a few lines of code, and introduces minimal computation overhead.”)
However, Zhang does not explicitly teach
[media_image20.png]
possible pairs of data points from the new batch of mixed up training data,
Audiomason teaches
[media_image20.png]
possible pairs of data points from the new batch of mixed up training data,
(Audiomason, “TLDR; The formula is n(n-1)/2 where n is the number of items in the set [Examiner’s note: n = 2N as that is the new size of the batch; the provided equation of Audiomason is equivalent to 2N(2N-1)/2].
Explanation:
To find the number of unique pairs in a set, where the pairs are subject to the commutative property (AB = BA), you can calculate the summation of 1 + 2 + ... + (n-1) where n is the number of items in the set.
The reasoning is as follows, say you have 4 items:
A
B
C
D
The number of items that can be paired with A is 3, or n-1:
AB
AC
AD
It follows that the number of items that can be paired with B is n-2 (because B has already been paired with A):
BC
BD
and so on...
(n-1) + (n-2) + ... + (n-(n-1))
which is the same as
1 + 2 + ... + (n-1)
or
n(n-1)/2
“)
Audiomason is considered to be analogous to the claimed invention because it is reasonably pertinent to the problem the inventor faced (determining the number of pairs from a set). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liu and Zhang to incorporate the teachings of Audiomason in order to provide an explanation of how many possible pairs can be obtained from the new batch provided by the augmentation of Zhang.
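To tie the mapping for claim 7 together, the combined reading of Liu, Zhang, and Audiomason can be sketched as follows. This is an illustrative sketch only: all function and variable names are hypothetical, and a uniform draw stands in for the claim's independently drawn λ.

```python
import random

def augment_and_mix(batch, random_points, k, rng=None):
    """Sketch of the claimed extrapolation: concatenate a batch of N
    (x, y) points with N random points to obtain a new batch of size
    2N, then mix k randomly selected pairs out of the n(n-1)/2
    possible pairs (Audiomason's count, with n = 2N)."""
    rng = rng or random.Random()
    new_batch = list(batch) + list(random_points)   # size 2N
    n = len(new_batch)
    n_pairs = n * (n - 1) // 2                      # unique unordered pairs
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    assert len(pairs) == n_pairs
    mixed = []
    for i, j in rng.sample(pairs, min(k, n_pairs)):
        lam = rng.random()                          # lam in [0, 1], drawn per pair
        (x1, y1), (x2, y2) = new_batch[i], new_batch[j]
        mixed.append((lam * x1 + (1 - lam) * x2,
                      lam * y1 + (1 - lam) * y2))
    return mixed
```

Because pairs are selected from the whole concatenated batch, a selected pair may consist of two original batch points or of a batch point and a random point, corresponding to the batch-batch and batch-random pair mixtures recited in the claims.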
Claim 19 is substantially similar to and thus rejected under 35 USC § 103 as claim 7.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
NPL: Gupta, Akhil, et al. "How to incorporate monotonicity in deep networks while preserving flexibility?." arXiv preprint arXiv:1909.10662 (2019).
NPL: E. Hoffer, T. Ben-Nun, I. Hubara, N. Giladi, T. Hoefler and D. Soudry, "Augment Your Batch: Improving Generalization Through Instance Repetition," 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 8126-8135, doi: 10.1109/CVPR42600.2020.00815
US Pub No. US20210295175A1: Kennel et al. teaches Training artificial neural networks with constraints
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASMINE THAI whose telephone number is (703)756-5904. The examiner can normally be reached M-F 8-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.T.T./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129