DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 01/19/2026 have been fully considered but they are not persuasive.
Regarding applicant’s remarks directed to the rejection of claims under 35 USC § 101, the applicant argues that the amended claims are directed to a technical solution. Examiner respectfully agrees and withdraws the rejection of claims under 35 USC § 101.
Regarding applicant’s remarks directed to the rejection of claims under 35 USC § 102, the arguments are directed to newly amended limitations that were not previously examined by the examiner. Therefore, applicant's arguments are rendered moot. The examiner refers to the rejection under 35 USC § 103 in the current Office action for more details.
Claim Rejections - 35 USC § 112(a)
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-9, 11-21 and 23 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 1 and analogous claims 12 and 23 recite “for each of a plurality of batch data sets of the training data, generating an augmented data set including the batch and a set of random data points sampled from the data space; generating mixtures of pairs of the augmented data set wherein selection of the pairs enables batch-batch and batch-random data point pair mixtures.” Applicant submits on pg. 1 of Remarks filed 01/19/2026, “Support for these amendments can be found throughout the originally-filed application.”
Examiner respectfully points out that only extrapolation [for each of a plurality of batch data sets of the training data, generating an augmented data set ie new batch of size 2N including the batch and a set of random data points sampled from the data space] is supported to generate mixtures from batch and random data points. See para. [0014], [0028], [0039], and [0149] of the specification of the instant application, “In some embodiments, extrapolation of the training data and the random data comprises generating new data points for training by, for each batch of training data with size N>1: augmenting the batch of training data with data points from the random data to obtain a new batch of mixed up training data of size 2N.”
However, there does not appear to be sufficient written description support for “generating mixtures of pairs of the augmented data set wherein selection of the pairs enables batch-batch and batch-random data point pair mixtures.” While there appears to be support for “batch-random data point pair mixtures,” there does not appear to be support for “batch-batch… data point pair mixtures.”
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-6, 8-18, 20-21 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Liu, Xingchao, et al. "Certified Monotonic Neural Networks." arXiv preprint arXiv:2011.10219 (2020). (“Liu”) in view of Zhang, Hongyi, et al. "mixup: Beyond empirical risk minimization." arXiv preprint arXiv:1710.09412 (2017). (“Zhang”).
In regards to claim 1,
Liu teaches A computer-implemented system for training a neural network with enforced monotonicity, the system comprising: at least one processor; and memory in communication with said at least one processor, wherein the memory stores instructions for providing a data model representing a neural network for predicting an outcome based on input data, the instructions when executed at said at least one processor causes said system to:
(Liu, Section 4.1, “Computational Time for Monotonicity Verification: Because our monotonicity verification involves solving MILP problems, we evaluate the time cost of two-layer verification in Fig. 3. All the results are averaged over 3 networks trained with different random seeds on COMPAS. The verification can be done in less than 4 seconds with 100 neurons in the first layer. Our computer has 48 cores and 192GB memory.”)
Liu teaches receive a feature data as input data, wherein the feature data comprises monotonic feature data;
(Liu, Section 2, “Individual Monotonicity and Monotonicity Attacking In fields where fairness and security are of critical importance, it is highly desirable to enforce monotonicity over certain features in the deployed ML models [11, 17, 28]. Otherwise, the system may be subject to attacks that exploit the non-monotonicity within it. Consider, for example, a program for predicting a product price (e.g., house) based on the product features. Let xα be the features that people naturally expect to be monotonic (such as the quantity or quality of the product) [receive a feature data as input data, wherein the feature data comprises monotonic feature data]. For a product with feature x = [xα, x¬α], if the function is not monotonic w.r.t. xα, then we can find another testing example xˆ = [xˆα, xˆ¬α], which satisfies
[media_image1.png]
”)
Liu teaches train the neural network with training data to encourage monotonicity across different parts of a data space of the feature data,
(Liu, Abstract, “This provides a new general approach for learning monotonic neural networks with arbitrary model structures. Our method allows us to train neural networks with heuristic monotonicity regularizations, and we can gradually increase the regularization magnitude until the learned network is certified monotonic.”)
Liu teaches the training including: for each of a plurality of batch data sets of the training data,
(Liu, Section 3.3, “The exact value of R(f) is intractable, and we approximate it by drawing samples of size 1024 [for each of a plurality of batch data sets of the training data ie samples of size 1024] uniformly from the input domain during iterations of the gradient descent.”)
Liu teaches computing a loss function based on the predicted outcome and an expected outcome associated with the input data, the loss function
[media_image2.png]
being dependent on a monotonicity penalty
[media_image3.png]
computed based on the mixtures of the pairs of the augmented data set; and updating weights of the neural network based on the loss function; and storing the updated weights of the neural network in the memory.
(Liu, Section 3.3, “We now introduce our simple procedure for learning monotonic neural networks with verification. Our learning algorithm works by training a typical network with a data-driving monotonicity regularization, and gradually increase the regularization magnitude until the network passes the monotonicity verification in (6). Precisely, it alternates between the following two steps: Step 1: Training a neural network f by [updating weights of the neural network based on the loss function; wherein training the neural network involves updating weights and obtaining a trained neural network includes storing the updated weights]
[media_image4.png]
where L(f) is the typical training loss [compute a loss function based on the predicted outcome and an expected outcome associated with the input data], and R(f) is a penalty [the loss function
[media_image2.png]
being dependent on a monotonicity penalty
[media_image3.png]
ie R(f) computed based on the mixtures of the pairs of the augmented data set; wherein the augmented batch of Liu and Zhang would be substituted for Uni(X) to teach the mixtures of the pairs of the augmented data set] that characterizes the violation of monotonicity; here λ is the corresponding coefficient and Uni(X ) denotes the uniform distribution on X . R(f) can be defined heuristically in other ways. R(f) = 0 implies that f is monotonic w.r.t. xα, but it has to be computationally efficient. For example, Uα in (6) is not suitable because it is too computationally expensive to be evaluated at each iteration of training.”)
(Liu, Section 4.1, “Our computer has 48 cores and 192GB memory [and storing the updated weights of the neural network in the memory].”)
However, Liu does not explicitly teach generating an augmented data set including the batch and a set of random data points sampled from the data space; generating mixtures of pairs of the augmented data set wherein selection of the pairs enables batch-batch and batch-random data point pair mixtures;
Zhang teaches generating an augmented data set including the batch and a set of random data points sampled from the data space; generating mixtures of pairs of the augmented data set wherein selection of the pairs enables batch-batch and batch-random data point pair mixtures;
(Zhang, Section 1, “Contribution Motivated by these issues, we introduce a simple and data-agnostic data augmentation routine, termed mixup (Section 2). In a nutshell, mixup constructs virtual training examples
[media_image5.png]
[ generating an augmented data set including the batch ie the batch provided by Liu and a set of random data points sampled from the data space]
(xi , yi) and (xj , yj ) are two examples drawn at random from our training data, and λ ∈ [0, 1]. Therefore, mixup extends the training distribution [generating mixtures of pairs of the augmented data set wherein selection of the pairs enables batch-batch and batch-random data point pair mixtures; wherein batch-batch and batch-random are interpreted to be the same in the context of Liu in view of Zhang as Liu provides a batch to be augmented and Zhang constructs the augmented new batch from random sampling on the batch] by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets. mixup can be implemented in a few lines of code, and introduces minimal computation overhead.”)
Liu is considered to be analogous to the claimed invention because it is in the same field of monotonic neural networks. Zhang is considered analogous to the claimed invention because it is reasonably pertinent to the problem the inventor faced (data augmentation). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liu to incorporate the teachings of Zhang in order to regularize neural networks and prevent memorization through augmenting examples (Zhang, Abstract, “Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.”)
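For illustration of the combination's mechanics, the mixup construction relied upon from Zhang can be sketched as follows. This is a minimal sketch only: the Beta(α, α) sampling of λ follows Zhang's Section 2, but the function name, shapes, and defaults are illustrative assumptions, not drawn from either reference.

```python
import numpy as np

def mixup_pair(x_i, y_i, x_j, y_j, alpha=0.2, rng=None):
    """Construct one virtual training example per Zhang's mixup:
    x = lam * x_i + (1 - lam) * x_j, and likewise for the labels,
    with lam drawn from Beta(alpha, alpha) so that lam is in [0, 1]."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x_i + (1.0 - lam) * x_j
    y = lam * y_i + (1.0 - lam) * y_j
    return x, y
```

Because the output is a convex combination, each mixed point lies between the two inputs, which is the "linear interpolations of feature vectors" behavior Zhang describes.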
In regards to claim 2,
Liu and Zhang teach The system of claim 1,
Liu teaches wherein the set of random data points excludes the training data.
(Liu, Section 3.3, “The exact value of R(f) is intractable, and we approximate it by drawing samples of size 1024 uniformly from the input domain during iterations of the gradient descent. Note that the samples we draw vary from iteration to iteration [wherein the set of random data points excludes the training data; wherein since a sample of size 1024 is drawn from the overall training data, it excludes the (overall) training data].”)
In regards to claim 3,
Liu and Zhang teach The system of claim 2,
Zhang teaches wherein the monotonicity penalty
[media_image3.png]
is determined based on at least one of: interpolation of the training data and extrapolation of the training data and the random data.
(Zhang, Section 1, “Contribution Motivated by these issues, we introduce a simple and data-agnostic data augmentation routine, termed mixup (Section 2). In a nutshell, mixup constructs virtual training examples
[media_image5.png]
(xi , yi) and (xj , yj ) are two examples drawn at random from our training data, and λ ∈ [0, 1]. Therefore, mixup extends the training distribution [interpolation of the training data] by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets. mixup can be implemented in a few lines of code, and introduces minimal computation overhead.”; wherein Zhang is relied upon to augment the training data)
In regards to claim 4,
Liu and Zhang teach The system of claim 3,
Zhang teaches wherein interpolation of the training data comprises mixing up data points from the training data.
(Zhang, Section 1, “Contribution Motivated by these issues, we introduce a simple and data-agnostic data augmentation routine, termed mixup (Section 2). In a nutshell, mixup constructs virtual training examples
[media_image5.png]
(xi , yi) and (xj , yj ) are two examples drawn at random from our training data [mixing up data points from the training data], and λ ∈ [0, 1]. Therefore, mixup extends the training distribution [interpolation of the training data] by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets. mixup can be implemented in a few lines of code, and introduces minimal computation overhead.”; wherein Zhang is relied upon to augment the training data)
In regards to claim 5,
Liu and Zhang teach The system of claim 4,
Zhang teaches wherein interpolation of a pair of data points (x', y'), (x", y") from the training data comprises generating new data points for training based on
[media_image6.png]
, and
[media_image7.png]
.
(Zhang, Section 1, “Contribution Motivated by these issues, we introduce a simple and data-agnostic data augmentation routine, termed mixup (Section 2). In a nutshell, mixup constructs virtual training examples
[media_image5.png]
(xi , yi) and (xj , yj ) are two examples [a pair of data points (x', y'), (x", y")] drawn at random from our training data, and λ ∈ [0, 1]. Therefore, mixup extends the training distribution by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets. mixup can be implemented in a few lines of code, and introduces minimal computation overhead.”; wherein Zhang is relied upon to augment the training data)
In regards to claim 6,
Liu and Zhang teach The system of claim 3,
Zhang teaches wherein extrapolation of the training data and the random data comprises mixing up data points from the training data and the random data.
(Zhang, Section 1, “Contribution Motivated by these issues, we introduce a simple and data-agnostic data augmentation routine, termed mixup (Section 2). In a nutshell, mixup constructs virtual training examples
[media_image5.png]
(xi , yi) and (xj , yj ) are two examples drawn at random from our training data [mixing up data points from the training data and the random data], and λ ∈ [0, 1]. Therefore, mixup extends the training distribution [extrapolation of the training data and the random data] by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets. mixup can be implemented in a few lines of code, and introduces minimal computation overhead.”; wherein Zhang is relied upon to augment the training data)
In regards to claim 8,
Liu and Zhang teach The system of claim 3,
Liu teaches wherein monotonic predictor is represented by
[media_image8.png]
[media_image9.png]
, and
[media_image10.png]
is the monotonicity penalty configured to measure the monotonicity of the monotonic predictor h*_M relative to input dimensions indicated by M, M
∈
{1,... d} being indicative of a subset of the input dimensions and comprising at least some of the monotonic feature data from the input data, where h represents a predictor of a class of predictors H for data from input x and output y data spaces, and
λ
is a hyperparameter weighting the monotonicity penalty.
(Liu, Section 3.3, “We now introduce our simple procedure for learning monotonic neural networks with verification. Our learning algorithm works by training a typical network with a data-driving monotonicity regularization, and gradually increase the regularization magnitude until the network passes the monotonicity verification in (6) [monotonic predictor]. Precisely, it alternates between the following two steps: Step 1: Training a neural network f by
[media_image13.png]
where L(f) is the typical training loss, and R(f) is a penalty that characterizes the violation of monotonicity; here λ is the corresponding coefficient and Uni(X ) denotes the uniform distribution on X . R(f) can be defined heuristically in other ways. R(f) = 0 implies that f is monotonic w.r.t. xα, but it has to be computationally efficient. For example, Uα in (6) is not suitable because it is too computationally expensive to be evaluated at each iteration of training.
The exact value of R(f) is intractable, and we approximate it by drawing samples of size 1024 uniformly from the input domain during iterations of the gradient descent. Note that the samples we draw vary from iteration to iteration [input dimensions indicated by M, M
∈
{1,... d} being indicative of a subset of the input dimensions and comprising at least some of the monotonic feature data from the input data]. By the theory of stochastic gradient descent, we can expect to minimize the object function well at convergence. Also, training NNs requires more than thousands of steps, therefore the overall size of samples can well cover the input domain. In practice, we use a modified regularization R(f) = Ex∼Uni(X ) hP `∈α max(b, −∂x` f(x))2 i , where b is a small positive constant, because we find the original version will always lead to a Uα that is slightly smaller than zero.
Step 2: Calculate Uα or a lower bound of it. If it is sufficient to verify that Uα ≥ 0, then f is monotonic and the algorithm terminates, otherwise, increase λ and repeat step 1.”)
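As an aid in understanding the mapping, Liu's Step 1 regularization R(f) = E[Σ max(b, −∂xℓ f(x))²] can be sketched numerically as follows. This is an illustrative sketch only: central finite differences stand in for the gradients Liu computes by backpropagation, and all names and defaults are assumptions, not drawn from Liu.

```python
import numpy as np

def monotonicity_penalty(f, X, mono_idx, b=1e-3, eps=1e-4):
    """Approximate R(f) = E_x[ sum_l max(b, -df/dx_l)^2 ] over a
    sample X, estimating each partial derivative on the monotonic
    coordinates mono_idx with a central finite difference."""
    total = 0.0
    for x in X:
        for l in mono_idx:
            e = np.zeros_like(x)
            e[l] = eps
            grad_l = (f(x + e) - f(x - e)) / (2 * eps)
            # A positive slope contributes only the small b**2 floor;
            # a negative slope (monotonicity violation) is penalized.
            total += max(b, -grad_l) ** 2
    return total / len(X)
```

In Liu, X is drawn uniformly from the input domain (samples of size 1024 per iteration); under the proposed combination, the augmented batch of Liu and Zhang would be substituted for that uniform sample.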
In regards to claim 9,
Liu and Zhang teach The system of claim 8,
Liu teaches wherein
[media_image14.png]
[media_image15.png]
indicates the gradients of h*_M relative to the input dimensions i
∈
M,
(Liu, Section 3.3, “We now introduce our simple procedure for learning monotonic neural networks with verification. Our learning algorithm works by training a typical network with a data-driving monotonicity regularization, and gradually increase the regularization magnitude until the network passes the monotonicity verification in (6). Precisely, it alternates between the following two steps: Step 1: Training a neural network f by
[media_image17.png]
”; wherein Uni(X) is replaced by the augmented data given by Zhang)
(Liu, Section 3.2, “In addition to the individual monotonicity around a given point x, it is important to check the global monotonicity for all the points in the input domain as well. It turns out that we can also address this problem through an optimization approach. For a differentiable function f, it is monotonic w.r.t. xα on X if and only if ∂x` f(x) ≥ 0 for all ` ∈ α, x ∈ X . We can check this by solving
[media_image18.png]
If Uα ≥ 0, then monotonicity is verified. Again, we can turn this optimization into a MILP for the ReLU networks. Consider the ReLU network in (4). Its gradient equals
[media_image19.png]
”)
However, Liu does not explicitly teach wherein D comprises data points generated by the interpolation of the training data and by the extrapolation of the training data and the random data
Zhang teaches wherein D comprises data points generated by the interpolation of the training data and by the extrapolation of the training data and the random data.
(Zhang, Section 1, “Therefore, mixup extends the training distribution [D ie extended training data comprises data points generated by the interpolation of the training data and by the extrapolation of the training data and the random data] by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets.”)
In regards to claim 11,
Liu and Zhang teach The system of claim 1,
Liu teaches wherein the feature data comprises non-monotonic feature data.
(Liu, Section 2, “In other words, while xˆ has the same values on the non-monotonic features [the feature data comprises non-monotonic feature data; wherein the feature data comprises of monotonic and non-monotonic feature data originally] with x, and smaller values on the monontonic features than x, f(xˆ) is larger than f(x). If such case is possible, the fairness of the system would be cast in doubt. Addressing this kind of problems is critical for many real-world scenarios such as criminal judgment, loan applications, as well as hiring/administration decisions. In light of this, we call f to be individually monotonic on x if there exists no adversarial example as described in (2). The non-monotonicity is hard to detect through a simple sanity check, unless the model is monotonic by construction. For example, Figure 1 shows a data instance x we found on COMPAS [16], a recidivism risk score dataset. In this example, a trained neural network is monotonic with respect to the monotonic features (i.e., f([xi , x¬i ]) w.r.t. each xi with x¬i fixed on the instance), but there exists an adversarial example xˆ that violates the monotonicity in the sense of (2). In this case, checking the monotonicity requires us to eliminate all the combinations of features on the input domain. To do so, we need a principled optimization framework, which can eliminate the existence of any possible monotonicity violations.”)
Claims 12 and 23 are substantially similar to and thus rejected under 35 USC § 103 as claim 1.
Claim 13 is substantially similar to and thus rejected under 35 USC § 103 as claim 2.
Claim 14 is substantially similar to and thus rejected under 35 USC § 103 as claim 11.
Claim 15 is substantially similar to and thus rejected under 35 USC § 103 as claim 3 (dependent on claim 14).
Claim 16 is substantially similar to and thus rejected under 35 USC § 103 as claim 4.
Claim 17 is substantially similar to and thus rejected under 35 USC § 103 as claim 5.
Claim 18 is substantially similar to and thus rejected under 35 USC § 103 as claim 6.
Claim 20 is substantially similar to and thus rejected under 35 USC § 103 as claim 8.
Claim 21 is substantially similar to and thus rejected under 35 USC § 103 as claim 9.
Claim(s) 7 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Zhang in further view of Audiomason, comment on “How do I get the total number of unique pairs of a set in the database?” (June 15, 2016), Stack Overflow, https://web.archive.org/web/20181228130041/https://stackoverflow.com/questions/18859430/how-do-i-get-the-total-number-of-unique-pairs-of-a-set-in-the-database (“Audiomason”).
In regards to claim 7,
Liu and Zhang teach The system of claim 6,
wherein extrapolation of the training data and the random data comprises generating new data points for training by,
for each batch of training data with size N > 1:
(Liu, Section 3.3, “The exact value of R(f) is intractable, and we approximate it by drawing samples of size 1024 [for each batch ie samples of size 1024 of training data with size N > 1] uniformly from the input domain during iterations of the gradient descent.”)
However, Liu does not explicitly teach augmenting the batch of training data with data points from the random data to obtain a new batch of 2N mixed up training data points (xm, ym);
and out of the
[media_image20.png]
possible pairs of data points from the new batch of mixed up training data,
selecting a random sample of k pairs of data points, wherein for each pair of data points (xm', ym'), (xm", ym") from the k pairs:
generating new data points for training based on
[media_image21.png]
, wherein
[media_image22.png]
, and
λ
is independently drawn.
Zhang teaches augmenting the batch of training data with data points from the random data to obtain a new batch of 2N mixed up training data points (xm, ym);
and out of the … possible pairs of data points from the new batch of mixed up training data,
selecting a random sample of k pairs of data points, wherein for each pair of data points (xm', ym'), (xm", ym") from the k pairs:
generating new data points for training based on
[media_image21.png]
, wherein
[media_image22.png]
, and
λ
is independently drawn.
(Zhang, Section 1, “Contribution Motivated by these issues, we introduce a simple and data-agnostic data augmentation routine, termed mixup (Section 2). In a nutshell, mixup constructs virtual training examples
[media_image5.png]
[ generating new data points for training]
(xi , yi) and (xj , yj ) are two examples drawn at random from our training data [selecting a random sample of k pairs of data points], and λ ∈ [0, 1]. Therefore, mixup extends the training distribution [augmenting the batch of training data with data points from the random data to obtain a new batch of mixed up training data of size 2N; wherein the random data is a randomly drawn sample of the training data as previously taught by Liu (and thus an example drawn from the subset (that sample) is the data point given by the random data); further instead of providing the overall training data to augment using the method of Zhang, only the batch (samples of size 1024) is provided by Liu wherein augmenting the batch for each sample in the batch and concatenating the augmented batch with the original batch (thus extending the training distribution) provides a new batch of training data wherein the size is 2N ie 2 x 1024] by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets. mixup can be implemented in a few lines of code, and introduces minimal computation overhead.”)
However, Zhang does not explicitly teach
[media_image20.png]
possible pairs of data points from the new batch of mixed up training data,
Audiomason teaches
[media_image20.png]
possible pairs of data points from the new batch of mixed up training data,
(Audiomason, “TLDR; The formula is n(n-1)/2 where n is the number of items in the set [Examiner’s note: n = 2N as that is the new size of the batch; the provided equation of Audiomason is equivalent to 2N(2N-1)/2].
Explanation:
To find the number of unique pairs in a set, where the pairs are subject to the commutative property (AB = BA), you can calculate the summation of 1 + 2 + ... + (n-1) where n is the number of items in the set.
The reasoning is as follows, say you have 4 items:
A
B
C
D
The number of items that can be paired with A is 3, or n-1:
AB
AC
AD
It follows that the number of items that can be paired with B is n-2 (because B has already been paired with A):
BC
BD
and so on...
(n-1) + (n-2) + ... + (n-(n-1))
which is the same as
1 + 2 + ... + (n-1)
or
n(n-1)/2
“)
Audiomason is considered to be analogous to the claimed invention because it is reasonably pertinent to the problem the inventor faced (determining the number of pairs from a set). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Liu and Zhang to incorporate the teachings of Audiomason in order to provide an explanation of how many possible pairs can be obtained from the new batch provided by the augmentation of Zhang.
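To tie the mapping for claim 7 together, the combined reading of Liu, Zhang, and Audiomason can be sketched as follows. This is an illustrative sketch only: all function and variable names are hypothetical, and a uniform draw stands in for the claim's independently drawn λ.

```python
import random

def augment_and_mix(batch, random_points, k, rng=None):
    """Sketch of the claimed extrapolation: concatenate a batch of N
    (x, y) points with N random points to obtain a new batch of size
    2N, then mix k randomly selected pairs out of the n(n-1)/2
    possible pairs (Audiomason's count, with n = 2N)."""
    rng = rng or random.Random()
    new_batch = list(batch) + list(random_points)   # size 2N
    n = len(new_batch)
    n_pairs = n * (n - 1) // 2                      # unique unordered pairs
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    assert len(pairs) == n_pairs
    mixed = []
    for i, j in rng.sample(pairs, min(k, n_pairs)):
        lam = rng.random()                          # lam in [0, 1], drawn per pair
        (x1, y1), (x2, y2) = new_batch[i], new_batch[j]
        mixed.append((lam * x1 + (1 - lam) * x2,
                      lam * y1 + (1 - lam) * y2))
    return mixed
```

Because pairs are selected from the whole concatenated batch, a selected pair may consist of two original batch points or of a batch point and a random point, corresponding to the batch-batch and batch-random pair mixtures recited in the claims.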
Claim 19 is substantially similar to and thus rejected under 35 USC § 103 as claim 7.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
NPL: Gupta, Akhil, et al. "How to incorporate monotonicity in deep networks while preserving flexibility?." arXiv preprint arXiv:1909.10662 (2019).
NPL: E. Hoffer, T. Ben-Nun, I. Hubara, N. Giladi, T. Hoefler and D. Soudry, "Augment Your Batch: Improving Generalization Through Instance Repetition," 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 8126-8135, doi: 10.1109/CVPR42600.2020.00815
US Pub No. US20210295175A1: Kennel et al. teaches Training artificial neural networks with constraints
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASMINE THAI whose telephone number is (703)756-5904. The examiner can normally be reached M-F 8-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.T.T./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129