DETAILED ACTION
This action is responsive to the application filed on 03/23/2023. Claims 1-20 are pending and have been examined.
This action is Non-final.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition
of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the
conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1,
Step 1: The claim is directed to a method, which is one of four statutory categories. Therefore, claim 1 satisfies step 1.
Step 2A Prong 1:
“generating, by the one or more processors and using an anomaly detection machine learning model and the plurality of labeled training data objects, (a) a normal prediction loss parameter associated with the normal classification label and (b) an anomaly prediction loss parameter associated with the anomaly classification label;” -- The limitation is directed to generating (a) a normal prediction loss parameter associated with the normal classification label and (b) an anomaly prediction loss parameter associated with the anomaly classification label. The limitation recites the use of a mathematical concept/calculation/operation, and thus is considered math.
Step 2A Prong 2 and Step 2B:
“A computer-implemented method comprising: receiving, by one or more processors, a plurality of labeled training data objects, wherein (a) a labeled training data object of the plurality of labeled training data objects is associated with one of a plurality of labeled classification parameters, and (b) a labeled classification parameter of the plurality of labeled classification parameters indicates at least one of a normal classification label or an anomaly classification label;” -- The limitation recites receiving a plurality of labeled training data objects, each associated with a labeled classification parameter, where a parameter indicates at least one of a normal classification label or an anomaly classification label. The limitation is directed to an insignificant, extra-solution activity by way of mere data gathering/obtaining data for use in later manipulation and calculation, and it does not integrate the judicial exception into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, sending/receiving mere gathered data over a network/computer is a well-understood, routine, and conventional (WURC) activity that cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
“generating, by the one or more processors and using a classification prediction machine learning model, a global classification loss parameter based on the plurality of labeled training data objects, the normal prediction loss parameter, and the anomaly prediction loss parameter; generating, by the one or more processors, a composite loss parameter based on the normal prediction loss parameter, the global classification loss parameter, a normal prediction weight parameter, and a global classification weight parameter; and initiating, by the one or more processors, the performance of one or more prediction-based operations based on the anomaly detection machine learning model and the composite loss parameter.” -- The limitation recites generating loss parameters based on gathered data, initiating the performance of prediction-based operations, and using generic computer elements such as the processor(s) and the classification prediction machine learning model. The limitation amounts to mere instructions to apply the judicial exception using a computer; it does not integrate the exception into a practical application, nor does it provide significantly more than the judicial exception (see MPEP 2106.05(f)).
Thus, claim 1 is not patent eligible. Claims 8 and 15 are analogous to claim 1, differing only in claim type and preamble; their substantive limitations are analogous and are treated as such, and thus the same rejection applies.
Regarding claim 2,
Step 1: The claim is directed to a method, which is one of the four statutory categories. Therefore, claim 2 satisfies Step 1.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The computer-implemented method of claim 1, further comprising selecting a plurality of normal training data objects and a plurality of anomaly training data objects from the plurality of labeled training data objects, wherein (a) a normal training data object of the plurality of normal training data objects is associated with the normal classification label, and (b) an anomaly training data object of the plurality of anomaly training data objects is associated with the anomaly classification label.” -- The limitation recites further selecting normal and anomaly training data objects from the labeled training data and describes the association of each selected object with its label. The limitation is directed to an insignificant, extra-solution activity that does not integrate the judicial exception into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, the act of selecting data and merely describing the data’s relationships is a well-understood, routine, and conventional (WURC) activity that cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
Thus, claim 2 is not patent eligible. Claims 9 and 16 are analogous to claim 2, and thus face the same rejection as set forth above.
Regarding claim 3,
Step 1: The claim is directed to a method, which is one of the four statutory categories. Therefore, claim 3 satisfies Step 1.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The computer-implemented method of claim 2, wherein a normal training data object count associated with the plurality of normal training data objects is larger than an anomaly training data object count associated with the plurality of anomaly training data objects.” -- The limitation recites that the count of the normal training data objects introduced in claims 1 and 2 is larger than the count of the anomaly training data objects. The limitation amounts to no more than further limiting the judicial exception to a field of use/environment; it does not integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Thus, claim 3 is not patent eligible. Claims 10 and 17 are analogous to claim 3, and thus face the same rejection as set forth above.
Regarding claim 4,
Step 1: The claim is directed to a method, which is one of the four statutory categories. Therefore, claim 4 satisfies Step 1.
Step 2A Prong 1:
“The computer-implemented method of claim 1, wherein the normal prediction loss parameter indicates a reconstruction loss measure associated with the anomaly detection machine learning model in reconstructing a plurality of normal training data objects from the plurality of labeled training data objects that is associated with the normal classification label.” -- The limitation is directed to a normal prediction loss parameter that indicates mathematical relationships/concepts between one set of data and another, and thus the limitation is directed to math.
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B.
Thus, claim 4 is not patent eligible. Claims 11 and 18 are analogous to claim 4, and thus face the same rejection as set forth above. Furthermore, claim 6 is analogous to claim 4, the main difference being anomaly training data rather than normal training data, which does not affect the § 101 analysis.
Regarding claim 5,
Step 1: The claim is directed to a method, which is one of the four statutory categories. Therefore, claim 5 satisfies Step 1.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The computer-implemented method of claim 4, further comprising: generating, by the one or more processors, a plurality of encoded normal training data objects based on the plurality of normal training data objects; generating, by the one or more processors, a plurality of reconstructed normal training data objects based on the plurality of encoded normal training data objects; and generating, by the one or more processors, the normal prediction loss parameter based on the plurality of normal training data objects and the plurality of reconstructed normal training data objects.” -- The limitation recites generating encoded training data based on the gathered normal training data objects, reconstructed normal training data objects based on the encoded normal training data objects, and the normal prediction loss parameter based on the normal and reconstructed training data objects. The limitation is directed to an insignificant, extra-solution activity of generating new data from mere gathered/past information, which does not integrate the judicial exception into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, generating data from past, gathered information on a computer/network is a well-understood, routine, and conventional (WURC) activity that cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
Thus, claim 5 is not patent eligible. Claims 12 and 19 are analogous to claim 5, and thus face the same rejection as set forth above. Furthermore, claims 7 and 14 are analogous to claim 5, the main difference being anomaly training data rather than normal training data, which does not affect the § 101 analysis.
Regarding claim 6,
Step 1: The claim is directed to a method, which is one of the four statutory categories. Therefore, claim 6 satisfies Step 1.
Step 2A Prong 1:
“The computer-implemented method of claim 1, wherein the anomaly prediction loss parameter indicates a reconstruction loss measure associated with the anomaly detection machine learning model in reconstructing a plurality of anomaly training data objects from the plurality of labeled training data objects that is associated with the anomaly classification label.” -- The limitation is directed to a computed loss measure and what it indicates in relation to the model when reconstructing the data objects. The limitation recites a mathematical concept/calculation, and thus the limitation is directed to math.
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B.
Thus, claim 6 is not patent eligible. Claims 13 and 20 are analogous to claim 6, and thus face the same rejection as set forth above.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this
Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not
identically disclosed as set forth in section 102, if the differences between the claimed invention and the
prior art are such that the claimed invention as a whole would have been obvious before the effective filing
date of the claimed invention to a person having ordinary skill in the art to which the claimed invention
pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are
summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 6-10, and 13-17 are rejected under 35 U.S.C. 103 as being unpatentable over NPL reference “Deep semi-supervised anomaly detection” by Ruff et al. (referred to herein as Ruff) in view of NPL reference “ESAD: End-to-end deep semi-supervised anomaly detection” by Huang et al. (referred to herein as Huang).
Regarding claim 1, Ruff teaches:
A computer-implemented method comprising: receiving, by one or more processors, a plurality of labeled training data objects, wherein (a) a labeled training data object of the plurality of labeled training data objects is associated with one of a plurality of labeled classification parameters, and (b) a labeled classification parameter of the plurality of labeled classification parameters indicates at least one of a normal classification label or an anomaly classification label; ([Ruff, page 4] “We now introduce our method for deep semi-supervised anomaly detection: Deep SAD. Assume that, in addition to the n unlabeled samples x1, . . . , xn ∈ X with X ⊆ R^D, we also have access to m labeled samples (x˜1, y˜1), . . . ,(x˜m, y˜m) ∈ X × Y with Y = {-1, +1} where y˜ = +1 denotes known normal samples and y˜ = -1 known anomalies...We employ the same loss term as Deep SVDD for the unlabeled data in our Deep SAD objective and thus recover Deep SVDD (3) as the special case when there is no labeled training data available (m = 0). In doing this we also incorporate the assumption that most of the unlabeled data is normal.”, wherein the examiner interprets “m labeled samples (x˜₁, y˜₁), . . . ,(x˜ₘ, y˜ₘ)” to be the same as a plurality of labeled training data objects, because they are both directed to a finite collection of training examples each individually paired with an assigned class label, and further interprets “Y = {-1, +1} where y˜ = +1 denotes known normal samples and y˜ = -1 known anomalies” to be the same as a plurality of labeled classification parameters indicating at least one of a normal classification label or an anomaly classification label, because they are both directed to a binary labeling scheme in which each discrete label value designates a sample as belonging to either the normal class or the anomalous class).
… and the plurality of labeled training data objects, (a) a normal prediction loss parameter associated with the normal classification label and (b) an anomaly prediction loss parameter associated with the anomaly classification label; ([Ruff, page 5] “For the labeled data, we introduce a new loss term that is weighted via the hyperparameter η > 0 which controls the balance between the labeled and the unlabeled term. Setting η > 1 puts more emphasis on the labeled data whereas η < 1 emphasizes the unlabeled data. For the labeled normal samples (y˜ = +1), we also impose a quadratic loss on the distances of the mapped points to the center c, thus intending to overall learn a latent distribution which concentrates the normal data. Again, one might consider η > 1 to emphasize labeled normal over unlabeled samples. For the labeled anomalies (y˜ = -1) in contrast, we penalize the inverse of the distances such that anomalies must be mapped further away from the center. Note that this is in line with the common assumption that anomalies are not concentrated”, wherein the examiner interprets “quadratic loss on the distances of the mapped points to the center c” to be the same as a “normal prediction loss parameter associated with the normal classification label”, because they are both directed to a scalar loss quantity computed exclusively from training samples bearing the normal class label that measures and penalizes the deviation of those samples’ latent representations from the center of the learned normal distribution.
The examiner further interprets “penalize the inverse of the distances such that anomalies must be mapped further away from the center” to be the same as an anomaly prediction loss parameter associated with the anomaly classification label, because they are both directed to a separate scalar loss quantity computed exclusively from training samples bearing the anomaly class label that penalizes proximity of those samples' latent representations to the normal region by applying an inverse distance penalty.)
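For context, the Deep SAD objective described in the passages quoted above can be sketched in Ruff’s notation as follows (the examiner’s summary of the cited disclosure, with the final term being Ruff’s weight-decay regularizer with coefficient λ):

```latex
\min_{\mathcal{W}}\;
\frac{1}{n+m}\sum_{i=1}^{n}\bigl\|\phi(x_i;\mathcal{W})-\mathbf{c}\bigr\|^{2}
\;+\;
\frac{\eta}{n+m}\sum_{j=1}^{m}\Bigl(\bigl\|\phi(\tilde{x}_j;\mathcal{W})-\mathbf{c}\bigr\|^{2}\Bigr)^{\tilde{y}_j}
\;+\;
\frac{\lambda}{2}\sum_{\ell=1}^{L}\bigl\|\mathcal{W}^{\ell}\bigr\|_{F}^{2}
```

For ỹⱼ = +1 the labeled term reduces to the quadratic distance penalty (mapped above to the claimed normal prediction loss parameter); for ỹⱼ = -1 it becomes the inverse of the squared distance (mapped above to the claimed anomaly prediction loss parameter).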
Ruff does not teach generating, by the one or more processors and using an anomaly detection machine learning model … generating, by the one or more processors and using a classification prediction machine learning model, a global classification loss parameter based on the plurality of labeled training data objects, the normal prediction loss parameter, and the anomaly prediction loss parameter; generating, by the one or more processors, a composite loss parameter based on the normal prediction loss parameter, the global classification loss parameter, a normal prediction weight parameter, and a global classification weight parameter; and initiating, by the one or more processors, the performance of one or more prediction-based operations based on the anomaly detection machine learning model and the composite loss parameter.
Huang teaches generating, by the one or more processors and using an anomaly detection machine learning model ([Huang, page 2] “we introduce ESAD, an end-to-end method for semi-supervised anomaly detection”, wherein the examiner interprets “end-to-end method for semi-supervised anomaly detection” to be the same as using an anomaly detection machine learning model because they are both directed to a trained model that learns to detect anomalous samples in data);
generating, by the one or more processors and using a classification prediction machine learning model, ([Huang, page 2] “we introduce ESAD, an end-to-end method for semi-supervised anomaly detection”, wherein the examiner interprets “end-to-end method for semi-supervised anomaly detection” to be the same as using a classification prediction machine learning model because they are both directed to a trained model that learns to distinguish between normal and anomalous classes, which is a form of classification prediction);
a global classification loss parameter based on the plurality of labeled training data objects, ([Huang, page 6] “Finally, we define our training loss as follow: Lsemi = Lrec-semi + λ1Lnorm-semi + λ2Lass” and [Huang, page 4] “m labeled samples (x^l_1, y_1), ···, (x^l_m, y_m) ∈ X × Y with Y = {-1, 1} where y = 1 denotes normal samples and y = -1 denotes anomalous samples”, wherein the examiner interprets “Lsemi” to be the same as a “global classification loss parameter” because they are both directed to a single combined loss value that aggregates all individual loss components for training the model. The examiner further interprets “m labeled samples (x^l_1, y_1), ···, (x^l_m, y_m) ∈ X × Y with Y = {-1, 1}” to be the same as the “plurality of labeled training data objects” because they are both directed to a set of multiple training samples that have been assigned known class labels);
the normal prediction loss parameter, and the anomaly prediction loss parameter; ([Huang, page 5] “where yj = -1 for the labeled anomalous data while yj = 1 for the labeled normal data. This loss enforces the compacted representation for the normal data and scattered representation for the labeled anomalous data”, wherein the examiner interprets “compacted representation for the normal data” when “yj = 1 for the labeled normal data” to be the same as “the normal prediction loss parameter” because they are both directed to a loss component computed specifically over samples known to belong to the normal class, penalizing deviations from a compact encoding. The examiner further interprets “scattered representation for the labeled anomalous data” when “yj = -1 for the labeled anomalous data” to be the same as “the anomaly prediction loss parameter” because they are both directed to a loss component computed specifically over samples known to be anomalous, penalizing representations that fail to be dispersed away from the normal cluster);
generating, by the one or more processors, a composite loss parameter based on the normal prediction loss parameter, the global classification loss parameter, a normal prediction weight parameter, and a global classification weight parameter; and ([Huang, page 6] “Finally, we define our training loss as follow: Lsemi = Lrec-semi + λ1Lnorm-semi + λ2Lass, where λ1 and λ2 are two hyperparameters.”, wherein the examiner interprets “Lsemi = Lrec-semi + λ1Lnorm-semi + λ2Lass, where λ1 and λ2 are two hyperparameters” to be the same as “a composite loss parameter based on the normal prediction loss parameter, the global classification loss parameter, a normal prediction weight parameter, and a global classification weight parameter” because they are both a single weighted sum that combines individual component losses (L_rec-semi acting as the normal-data mutual-information loss, L_norm-semi as the entropy/classification loss, L_ass as the consistency/global constraint term) scaled by weighting hyperparameters (λ₁, λ₂), exactly as the claim recites a composite formed from individual loss parameters and their corresponding weight parameters.)
initiating, by the one or more processors, the performance of one or more prediction-based operations based on the anomaly detection machine learning model and the composite loss parameter. ([Huang, page 6] “Anomaly Score Measurement. We discuss how we calculate the anomaly score in the test phase. Since both the mutual information and the entropy are related to the performance of anomaly detection, we use both Lrec-semi and Lnorm-semi to measure the anomaly score for the given samples, which are related to the mutual information and the entropy, respectively. We calculate the reconstruction error of each input sample x and the value of L2 norm for its representation zˆ for anomaly detection. The anomaly score is formulated as: S_test = ‖x̂ - x‖² + λ₁‖ẑ‖², where λ₁ is the same as the setting in the training process… considering both the terms of the mutual information and the entropy for the anomaly score measurement”, wherein the examiner interprets “S_test = ‖x̂ - x‖² + λ₁‖ẑ‖²” to be the same as “initiating the performance of one or more prediction-based operations based on the anomaly detection machine learning model and the composite loss parameter” because they are both using the trained model (encoder-decoder-encoder) together with the same weighted combination of loss terms (reconstruction error + λ₁ * norm) that constitutes the composite training loss to produce a test-time anomaly prediction score; i.e., the composite loss structure directly determines how inference/prediction is performed.)
Ruff, Huang, and the instant application are analogous art because they are all directed to semi-supervised anomaly detection to improve detection of normal and anomalous samples.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method for labeling normal/anomaly data disclosed by Ruff to include the technique for loss calculation disclosed by Huang. One would be motivated to do so to effectively improve anomaly detection performance by combining multiple complementary loss components into a single weighted objective that balances reconstruction accuracy and latent representation structure, as suggested by Huang ([Huang, page 6] “Finally, we define our training loss as follow: Lsemi = Lrec-semi + λ1Lnorm-semi + λ2Lass, where λ1 and λ2 are two hyperparameters.”). Claims 8 and 15 are analogous to claim 1, aside from claim type and mild differences, and thus will face the same rejection as set forth above.
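For clarity, the training objective of Huang relied upon in the mapping above can be written as:

```latex
\mathcal{L}_{\text{semi}}
  \;=\;
  \mathcal{L}_{\text{rec-semi}}
  \;+\;
  \lambda_{1}\,\mathcal{L}_{\text{norm-semi}}
  \;+\;
  \lambda_{2}\,\mathcal{L}_{\text{ass}}
```

Here λ₁ and λ₂ are the weighting hyperparameters (read as the claimed weight parameters), and the weighted sum Lsemi is read as the claimed composite loss parameter; at test time Huang scores samples with S_test = ‖x̂ - x‖² + λ₁‖ẑ‖², reusing the same λ₁ from training.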
Regarding claim 2, Ruff and Huang teach The computer-implemented method of claim 1, (see rejection of claim 1).
Ruff further teaches:
further comprising selecting a plurality of normal training data objects and a plurality of anomaly training data objects from the plurality of labeled training data objects, ([Ruff, page 6] “In every setup, we set one of the ten classes to be the normal class and let the remaining nine classes represent anomalies. We use the original training data of the respective normal class as the unlabeled part of our training set. Thus we start with a clean AD setting that fulfills the assumption that most (in this case all) unlabeled samples are normal. The training data of the respective nine anomaly classes then forms the data pool from which we draw anomalies for training to create different scenarios.”, wherein the examiner interprets “use the original training data of the respective normal class” to be the same as “selecting a plurality of normal training data objects …” because they are both directed to choosing a set of training samples that are treated as normal training data. The examiner further interprets “data pool from which we draw anomalies for training” to be the same as “selecting … a plurality of anomaly training data objects …” because they are both directed to selecting/choosing anomaly training samples from a larger set for use in training.)
Huang further teaches wherein (a) a normal training data object of the plurality of normal training data objects is associated with the normal classification label, and (b) an anomaly training data object of the plurality of anomaly training data objects is associated with the anomaly classification label ([Huang, page 4] “Given the input space X consisting of normal data X_N and anomalous data X_A, where X = X_N ∪ X_A. For semi-supervised anomaly detection (AD), we are given n unlabeled samples x^u_1, ···, x^u_n ∈ X and m labeled samples (x^l_1, y_1), ···, (x^l_m, y_m) ∈ X × Y with Y = {-1, 1} where y = 1 denotes normal samples and y = -1 denotes anomalous samples.”, wherein the examiner interprets “y = 1 denotes normal samples” to be the same as “a normal training data object … is associated with the normal classification label” because they are both directed to normal samples being explicitly associated with a normal-indicating label value. The examiner further interprets “y = -1 denotes anomalous samples” to be the same as “an anomaly training data object … is associated with the anomaly classification label” because they are both directed to anomalous samples being explicitly associated with an anomaly-indicating label value.)
Ruff, Huang, and the instant application are analogous art because they are all directed to semi-supervised anomaly detection methods that utilize labeled normal and anomalous training samples.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 1 as disclosed by Ruff and Huang to include the normal and anomaly data labeling technique disclosed by Huang. One would be motivated to do so to effectively ensure explicit association between selected training samples and their corresponding classification labels, thereby improving the reliability and interpretability of the semi-supervised anomaly detection framework, as suggested by Huang ([Huang, page 4] “m labeled samples (x^l_1, y_1), ···, (x^l_m, y_m) ∈ X × Y with Y = {-1, 1} where y = 1 denotes normal samples and y = -1 denotes anomalous samples”). Claims 9 and 16 are analogous to claim 2, aside from claim type and mild differences, and thus face the same rejection as set forth above.
Regarding claim 3, Ruff and Huang teach The computer-implemented method of claim 2, (see rejection of claim 2).
Ruff further teaches wherein a normal training data object count associated with the plurality of normal training data objects is larger than an anomaly training data object count associated with the plurality of anomaly training data objects. ([Ruff, page 1] “Anomaly detection (AD) is the task of identifying unusual samples in data. Typically AD methods attempt to learn a “compact” description of the data in an unsupervised manner assuming that most of the samples are normal (i.e., not anomalous)...Shallow unsupervised AD methods ... often require manual feature engineering to be effective on high-dimensional data and are limited in their scalability to large datasets. These limitations have sparked great interest in developing novel deep approaches to unsupervised AD.” wherein the examiner interprets “assuming that most of the samples are normal (i.e., not anomalous)” to be the same as “normal training data object count … is larger than … anomaly training data object count …” because they are both directed to anomaly-detection settings where the normal class forms the majority relative to anomalies.). Claims 10 and 17 are analogous to claim 3, aside from claim type and mild differences, and thus will face the same rejection as set forth above.
Regarding claim 6, Ruff and Huang teach The computer-implemented method of claim 1, (see rejection of claim 1).
Huang further teaches:
wherein the anomaly prediction loss parameter indicates a reconstruction loss measure ([Huang, page 5] “The reconstruction loss is defined as follows: Lrec-semi = (1/n) ∑ ‖x̂^u_i - x^u_i‖² + (1/m) ∑ ‖x̂^l_j - Φ(x^l_j)‖²”, wherein the examiner interprets “reconstruction loss is defined as follows” to be the same as “the anomaly prediction loss parameter indicates a reconstruction loss measure” because they are both defining a loss parameter that includes a reconstruction loss term computed over the labeled samples, including the anomalous ones -- the second summation term covers the labeled data, which includes anomalous samples governed by the Φ function.)
associated with the anomaly detection machine learning model in reconstructing a plurality of anomaly training data objects ([Huang, page 5] “With unlabeled samples … and labeled samples … we want the autoencoder to well reconstruct the normal data but erroneously reconstruct the labeled anomalous data …”, wherein the examiner interprets “we want the autoencoder …” to be the same as “associated with the anomaly detection machine learning model” because they are both directed to an anomaly-detection model implemented using an autoencoder architecture that is trained/assessed via reconstruction behavior. The examiner further interprets “erroneously reconstruct the labeled anomalous data” to be the same as “reconstructing a plurality of anomaly training data objects” because they are both directed to reconstructing anomalous (anomaly-class) training samples using the model’s reconstruction process.)
from the plurality of labeled training data objects that is associated with the anomaly classification label. ([Huang, page 4] “Given the input space X consisting of normal data X_N and anomalous data X_A, where X = X_N ∪ X_A. For semi-supervised anomaly detection (AD), we are given n unlabeled samples x^u_1, ···, x^u_n ∈ X and m labeled samples (x^l_1, y_1), ···, (x^l_m, y_m) ∈ X × Y with Y = {-1, 1} where y = 1 denotes normal samples and y = -1 denotes anomalous samples.”, wherein the examiner interprets “y = -1 denotes anomalous samples” to be the same as “plurality of labeled training data objects that is associated with the anomaly classification label” because they are both directed to anomalous samples being explicitly associated with an anomaly-indicating label value.)
Ruff, Huang, and the instant application are analogous art because they are all directed to machine learning-based semi-supervised anomaly detection systems that generate and optimize loss functions.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 1 as disclosed by Ruff and Huang to include the reconstruction loss function disclosed by Huang. One would be motivated to do so to effectively improve anomaly discrimination performance by incorporating a reconstruction-based loss component for labeled anomalous samples, as suggested by Huang ([Huang, page 5] “The reconstruction loss is defined as follows: Lrec-semi = (1/n) ∑ ‖x̂^u_i - x^u_i‖² + (1/m) ∑ ‖x̂^l_j - Φ(x^l_j)‖².”). Claims 13 and 20 are analogous to claim 6, aside from claim type and mild differences, and thus face the same rejection as set forth above.
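For clarity, the reconstruction loss of Huang relied upon above can be written as:

```latex
\mathcal{L}_{\text{rec-semi}}
  \;=\;
  \frac{1}{n}\sum_{i=1}^{n}\bigl\|\hat{x}^{u}_{i}-x^{u}_{i}\bigr\|^{2}
  \;+\;
  \frac{1}{m}\sum_{j=1}^{m}\bigl\|\hat{x}^{l}_{j}-\Phi(x^{l}_{j})\bigr\|^{2}
```

The first summation covers the n unlabeled samples; the second covers the m labeled samples, including the anomalous ones handled through the Φ function.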
Regarding claim 7, Ruff and Huang teach The computer-implemented method of claim 6 (see rejection of claim 6).
Huang further teaches:
further comprising: generating, by the one or more processors, a plurality of encoded anomaly training data objects based on the plurality of anomaly training data objects that is associated with the anomaly classification label; ([Huang, page 4] “Given the input space X consisting of normal data XN and anomalous data XA, where X = XN ∪ XA. For semi-supervised anomaly detection (AD), we are given n unlabeled samples x^u_1, ···, x^u_n ∈ X and m labeled samples (x^l_1, y_1), ···, (x^l_m, y_m) ∈ X × Y with Y = {-1, 1} where y = 1 denotes normal samples and y = -1 denotes anomalous samples.” and [Huang, page 5] “Enc1(·) emphasizes mutual information optimization and the second encoder Enc2(·) focuses on entropy optimization, and in the meanwhile, the two encoders are enforced to share similar encoding via a consistent constraint on their latent representations. The encoder-decoder-encoder architecture can be expressed as: z = Enc1(x), x̂ = Dec(z), ẑ = Enc2(x̂), (Equation 4) where x̂ is the output of the decoder, and z and ẑ are the latent representations from the first and second encoders, respectively.”, wherein the examiner interprets “m labeled samples … where y = -1 denotes anomalous samples” and “z = Enc1(x)” to be the same as “generating … a plurality of encoded anomaly training data objects … associated with the anomaly classification label” because they are both directed to processing multiple labeled anomalous (y = -1) training samples and encoding each such input x into an encoded representation z using an encoder (Enc1).)
generating, by the one or more processors, a plurality of reconstructed anomaly training data objects based on the plurality of encoded anomaly training data objects; ([Huang, page 5] “Enc1(·) emphasizes mutual information optimization and the second encoder Enc2(·) focuses on entropy optimization, and in the meanwhile, the two encoders are enforced to share similar encoding via a consistent constraint on their latent representations. The encoder-decoder-encoder architecture can be expressed as: z = Enc1(x), x̂ = Dec(z), ẑ = Enc2(x̂), (Equation 4) where x̂ is the output of the decoder, and z and ẑ are the latent representations from the first and second encoders, respectively.”, wherein the examiner interprets “x̂ = Dec(z)” to be the same as “generating … a plurality of reconstructed anomaly training data objects based on the plurality of encoded anomaly training data objects” because they are both directed to producing reconstructed outputs (x̂) by decoding encoded/latent representations (z) using a decoder (Dec).)
and generating, by the one or more processors, the anomaly prediction loss parameter based on the plurality of anomaly training data objects and the plurality of reconstructed anomaly training data objects. ([Huang, page 5] “Lrec-semi = 1/n ∑ ||x̂^u_i - x^u_i||² + 1/m ∑ ||x̂^l_j - Φ(x^l_j)||², where Φ(x^l_j) = φ(x^l_j) if yj = -1 (Equation 5)”, and [Huang, page 5] “we want the autoencoder to well reconstruct the normal data but erroneously reconstruct the labeled anomalous data, thus the reconstruction likelihood is maximized for the normal data and minimized for the labeled anomalous data. A straight-forward loss definition for the anomalous data is the negative squared norm loss.”, wherein the examiner interprets “Lrec-semi = 1/n ∑ ||x̂^u_i - x^u_i||² + 1/m ∑ ||x̂^l_j - Φ(x^l_j)||², where Φ(x^l_j) = φ(x^l_j) if yj = -1 (Equation 5)” and “reconstruction likelihood … minimized for the labeled anomalous data” to be the same as “generating … the anomaly prediction loss parameter based on the plurality of anomaly training data objects and the plurality of reconstructed anomaly training data objects” because they are both describing the generation of a reconstruction-based loss specifically computed over the anomalous training data and their reconstructions; namely, both compute a loss (the Lrec-semi term for labeled samples) using the reconstructed outputs (x̂^l_j) and the labeled anomalous training inputs (x^l_j with yj = -1, via Φ(·) selecting φ(x^l_j) for anomalies).)
Ruff, Huang, and the instant application are analogous art because they are all directed to semi-supervised anomaly detection systems that process labeled anomalous training samples using machine learning architectures.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 6 disclosed by Ruff and Huang to include the end-to-end encoding technique disclosed by Huang. One would be motivated to do so to effectively enhance anomaly discrimination by leveraging structured latent representations and reconstruction-based supervision, as suggested by Huang ([Huang, page 5] “the encoder-decoder-encoder architecture can be expressed as: z = Enc1(x), x̂ = Dec(z) …”). Claim 14 is analogous to claim 7, aside from claim type and mild differences, and thus will face the same rejection as set forth above.
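For clarity of the record, the data flow of the quoted encoder-decoder-encoder architecture (Equation 4) can be illustrated with the following shape-level numpy sketch. This is the examiner's illustration only; Huang's Enc1, Dec, and Enc2 are deep networks, and the linear maps and dimensions below are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear stand-ins for Enc1, Dec, and Enc2 (Equation 4);
# dimensions (input 8, latent 4) are chosen only for illustration.
W_enc1 = rng.normal(size=(8, 4))   # input dim 8 -> latent dim 4
W_dec  = rng.normal(size=(4, 8))   # latent dim 4 -> input dim 8
W_enc2 = rng.normal(size=(8, 4))   # reconstruction -> second latent

def enc_dec_enc(x):
    z = x @ W_enc1          # z = Enc1(x)
    x_hat = z @ W_dec       # x_hat = Dec(z)
    z_hat = x_hat @ W_enc2  # z_hat = Enc2(x_hat)
    return z, x_hat, z_hat

x = rng.normal(size=(5, 8))        # batch of 5 training samples
z, x_hat, z_hat = enc_dec_enc(x)
```

Per the quoted passage, training would additionally enforce a consistency constraint between the two latent representations z and ẑ, which the sketch above deliberately omits.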
Claims 4, 5, 11-12, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Ruff in view of Huang, further in view of NPL reference “Semi-supervised anomaly detection with dual prototypes autoencoder for industrial surface inspection” by Liu et al. (referred to herein as Liu).
Regarding claim 4, Ruff and Huang teach The computer-implemented method of claim 1 (see rejection of claim 1).
Huang further teaches from the plurality of labeled training data objects that is associated with the normal classification label. ([Huang, page 4] “m labeled samples (x^l_1, y_1), ···, (x^l_m, y_m) ∈ X × Y with Y = {-1, 1} where y = 1 denotes normal samples and y = -1 denotes anomalous samples”, wherein the examiner interprets “m labeled samples … normal samples” to be the same as “the plurality of labeled training data objects that is associated with the normal classification label” because they are both describing labeled training data where the label y = 1 identifies the normal class, from which the normal training data objects are selected for reconstruction.)
Ruff and Huang do not teach wherein the normal prediction loss parameter indicates a reconstruction loss measure associated with the anomaly detection machine learning model in reconstructing a plurality of normal training data objects.
Liu teaches wherein the normal prediction loss parameter indicates a reconstruction loss measure ([Liu, page 4] “Reconstruction Loss. … we firstly consider the distance between the input image x and its reconstruction x̂. The reconstruction error on each sample is minimized as follows: Lrec(x, x̂) = ||x - x̂||² … where the l2-norm is used to measure the reconstruction error.”, wherein the examiner interprets “The reconstruction error on each sample” to be the same as “normal prediction loss parameter indicates a reconstruction loss measure” because they are both a loss computed by measuring the error between an input and its reconstruction.)
associated with the anomaly detection machine learning model ([Liu, page 2] “In the anomaly detection task, AE is usually trained by minimizing the reconstruction error of defect-free samples, and then the reconstruction error is adopted as an indicator of anomalies.”, wherein the examiner interprets “AE is usually trained by minimizing the reconstruction error” to be the same as “associated with the anomaly detection machine learning model” because they are both stating that the reconstruction loss is inherently tied to the anomaly detection model and its training.)
in reconstructing a plurality of normal training data objects ([Liu, page 4] “anomaly detection models are expected to minimize the reconstruction error on normal images during training”, wherein the examiner interprets “anomaly detection models are expected to minimize the reconstruction error on normal images” to be the same as “reconstructing a plurality of normal training data” because they are both specifying that the reconstruction is performed on the normal (defect-free) training samples.)
Liu does not teach “from the plurality of labeled training data objects that is associated with the normal classification label”; Huang teaches this element, as set forth above.
Ruff, Huang, Liu, and the instant application are analogous art because they are all directed to machine learning-based systems that use an anomaly detection framework.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 1 disclosed by Ruff and Huang to include the reconstruction loss measure disclosed by Liu. One would be motivated to do so to improve model generalization and anomaly detection performance by minimizing reconstruction error on normal training samples, as suggested by Liu ([Liu, page 4] “anomaly detection models are expected to minimize the reconstruction error on normal images during training”). Claims 11 and 18 are analogous to claim 4, aside from claim type and mild differences, and thus will face the same rejection as set forth above.
Regarding claim 5, Ruff, Huang, and Liu teach The computer-implemented method of claim 4 (see rejection of claim 4).
Liu further teaches further comprising: generating, by the one or more processors, a plurality of encoded normal training data objects based on the plurality of normal training data objects; ([Liu, page 3] “Suppose the training set D = {xi | i = 1, 2, ⋯, M} is given, where xi ∈ X is the ith normal sample (totally M samples) of the sample space X ... Given an image sample x ∈ X, the encoder-1 network encodes it as a latent vector z ∈ Z”, wherein the examiner interprets “Suppose the training set D = {xi …} … where xi … is the ith normal sample” and “the encoder-1 network encodes it as a latent vector z” to be the same as “generating … a plurality of encoded normal training data objects based on the plurality of normal training data objects” because they are both directed to taking multiple normal training samples and encoding each normal sample into an encoded (latent) representation.)
generating, by the one or more processors, a plurality of reconstructed normal training data objects based on the plurality of encoded normal training data objects; and ([Liu, pages 2-3] “given an input, we first encode it as a latent vector using encoder-1, then the decoder is applied to get the reconstructed image … Decoder Network. The decoder network usually cooperates with the encoder network to reconstruct the input image from the latent vector z. And the details of our decoder network are presented in Table 1. Firstly, a linear layer followed by batch normalization is applied to upscale the latent vector z. Then, five blocks comprised by deconvolutional transpose layers, batch normalization, and leaky ReLU activation are utilized to reconstruct the original image x as x̂.”, wherein the examiner interprets “reconstruct the input image from the latent vector z … reconstruct the original image x as x̂” and “the decoder is applied to get the reconstructed image” to be the same as “generating … a plurality of reconstructed normal training data objects based on the plurality of encoded normal training data objects” because they are both directed to decoding encoded/latent vectors to produce reconstructed outputs corresponding to the (normal) inputs.)
generating, by the one or more processors, the normal prediction loss parameter based on the plurality of normal training data objects and the plurality of reconstructed normal training data objects. ([Liu, page 4] “Reconstruction Loss. Given the training set D = {xi | i = 1, 2, ⋯, M} containing M samples, we firstly consider the distance between the input image x and its reconstruction x̂. The reconstruction error on each sample is minimized as follows: Lrec(x, x̂) = ||x - x̂||²₂, where the ℓ2-norm is used to measure the reconstruction error ... Overall, the final loss function of our DPAE model is formed as a combination of the reconstruction loss and dual prototype loss.”, wherein the examiner interprets “consider the distance between the input image x and its reconstruction x̂ … Lrec(x, x̂) = ||x - x̂||²” and “∑ … ||xi - x̂i||²” to be the same as “generating … the normal prediction loss parameter based on the plurality of normal training data objects and the plurality of reconstructed normal training data objects” because they are both directed to computing a loss/error value from differences between normal training inputs (xi) and their reconstructions (x̂i).)
Ruff, Huang, Liu, and the instant application are analogous art because they are all directed to machine learning-based anomaly detection methods.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 4 disclosed by Ruff, Huang, and Liu to include the reconstruction loss calculation disclosed by Liu. One would be motivated to do so to more effectively quantify reconstruction error over normal training samples to improve model training stability and anomaly discrimination performance, as suggested by Liu ([Liu, page 4] “The reconstruction error on each sample is minimized as follows: Lrec(x, x̂) = ||x - x̂||²₂.”). Claims 12 and 19 are analogous to claim 5, aside from claim type and mild differences, and thus will face the same rejection as set forth above.
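For clarity of the record, the per-sample reconstruction loss quoted from Liu, and its dual role as a training objective and an anomaly indicator (Liu, page 2), can be illustrated with the following numpy sketch. This is the examiner's illustration only, with hypothetical two-dimensional toy inputs, not Liu's DPAE network.

```python
import numpy as np

def recon_error(x, x_hat):
    # Lrec(x, x_hat) = ||x - x_hat||^2_2, computed per sample (Liu, page 4)
    return np.sum((x - x_hat) ** 2, axis=-1)

# Training minimizes the mean error over the M normal samples; at test
# time the same error serves as the indicator of anomalies (Liu, page 2).
x     = np.array([[0.0, 0.0], [1.0, 1.0]])   # toy inputs
x_hat = np.array([[0.1, 0.0], [0.0, 0.0]])   # toy reconstructions
scores = recon_error(x, x_hat)               # larger error -> more anomalous
train_loss = scores.mean()                   # objective minimized on normal data
```

The first toy sample reconstructs nearly perfectly and scores low; the second reconstructs poorly and scores high, which is the behavior the claimed normal prediction loss parameter captures.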
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DEVAN KAPOOR whose telephone number is (703)756-1434. The examiner can normally be reached Monday - Friday: 9:00AM - 5:00 PM EST (times may vary).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DEVAN KAPOOR/Examiner, Art Unit 2126
/DAVID YI/Supervisory Patent Examiner, Art Unit 2126