DETAILED ACTION
This action is in response to the application filed on 3/30/2023. Claims 1-20 are pending in the application and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Independent Claims 1, 11, and 18,
Regarding Claim 1,
(Step 1): Claim 1 recites A method comprising, thus a process, one of the four statutory categories of patentable subject matter.
(Step 2A Prong 1): However, Claim 1 further recites generating, based on the historical outlier data and using a minority oversampling technique, synthetic outlier data associated with the plurality of user accounts, which constitutes the evaluation of the historical outlier data to determine synthetic outlier data, thus corresponding to a mental process which can be done mentally or by pen and paper;
combining the historical outlier data, at least a subset of the synthetic outlier data, and historical non-outlier data associated with the plurality of user accounts into a unified dataset which constitutes the evaluation of the subsets of data to determine a unified dataset, thus corresponding to a mental process which can be done mentally or by pen and paper;
and classifying … new data as either outlier data or non-outlier data in the plurality of user accounts which constitutes the evaluation of new data to determine its label as outlier or non-outlier data, thus corresponding to a mental process which can be done mentally or by pen and paper;
Thus, Claim 1 recites an abstract idea.
(Step 2A Prong 2): The claim does not recite any additional elements which integrate the abstract idea into a practical application because the additional elements consist of:
accessing historical outlier data corresponding to a plurality of user accounts, wherein the historical outlier data comprises a plurality of parameters, which is insignificant extra-solution activity of data gathering (MPEP 2106.05(g))
training a machine learning model with the unified dataset, which merely recites the idea of a solution or outcome in addition to reciting limitations of broad applicability (MPEP 2106.05(f)(1, 3))
based on the trained machine learning model, which is implementing an abstract idea on generic computer components (MPEP 2106.05(f))
and thus, the claim is directed to the abstract idea of generating synthetic outlier data and combining the historical data with the synthetic outlier data to classify new data as outlier or non-outlier data.
(Step 2B) The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because element a) is further well-understood, routine, and conventional activity of “transmitting or receiving data over a network” (MPEP 2106.05(d)), which cannot provide significantly more than the abstract idea itself; element b) (via MPEP 2106.05(f)(1)) recites no particular steps disclosing how the improvement in technology is to be achieved, and so does not integrate the judicial exception into a practical application nor provide significantly more than the abstract idea itself; element b) (via MPEP 2106.05(f)(3)) generally recites an effect of the judicial exception with no description of how the abstract idea steps are to be executed, which likewise cannot integrate the judicial exception into a practical application nor provide significantly more; and element c) (via MPEP 2106.05(f), “apply it on a computer”) cannot provide an inventive concept. Thus, Claim 1 is subject-matter ineligible.
Regarding Claim 11,
(Step 1): Claim 11 recites A system comprising: a processor; and a non-transitory computer-readable medium having stored thereon instructions that are executable by the processor to cause the system to perform operations, thus a machine, one of the four statutory categories of patentable subject matter.
(Step 2A Prong 1): However, Claim 11 further recites accessing second data corresponding to a plurality of synthetically-generated outlier events in the specified environment, wherein the synthetically-generated outlier events are generated at least in part by applying a minority oversampling technique on the first data, which constitutes the evaluation of the first data to determine synthetic outlier second data, thus corresponding to a mental process which can be done mentally or by pen and paper;
combining at least subsets of the first data, the second data, and the third data into a combined dataset which constitutes the evaluation of the subsets of data to determine a combined dataset, thus corresponding to a mental process which can be done mentally or by pen and paper;
and determining … whether a new event in the specified environment is an outlier event which constitutes the evaluation of a new event to determine its label as outlier or non-outlier data, thus corresponding to a mental process which can be done mentally or by pen and paper;
Thus, Claim 11 recites an abstract idea.
(Step 2A Prong 2): The claim does not recite any additional elements which integrate the abstract idea into a practical application because the additional elements consist of:
accessing first data corresponding to a plurality of historical outlier events that have occurred in a specified environment, wherein each of the historical outlier events is associated with a plurality of different types of parameters, which is insignificant extra-solution activity of data gathering (MPEP 2106.05(g))
accessing third data corresponding to a plurality of historical non-outlier events that have occurred in the specified environment, which is insignificant extra-solution activity of data gathering (MPEP 2106.05(g))
training a machine learning model with the combined dataset which merely recites the idea of a solution or outcome in addition to reciting limitations of broad applicability (MPEP 2106.05(f)(1, 3))
based on the trained machine learning model, which is implementing an abstract idea on generic computer components (MPEP 2106.05(f))
and thus, the claim is directed to the abstract idea of generating synthetic outlier data and combining the historical data with the synthetic outlier data to classify new data as outlier or non-outlier data.
(Step 2B) The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because elements a) and b) are further well-understood, routine, and conventional activity of “transmitting or receiving data over a network” (MPEP 2106.05(d)), which cannot provide significantly more than the abstract idea itself; element c) (via MPEP 2106.05(f)(1)) recites no particular steps disclosing how the improvement in technology is to be achieved, and so does not integrate the judicial exception into a practical application nor provide significantly more than the abstract idea itself; element c) (via MPEP 2106.05(f)(3)) generally recites an effect of the judicial exception with no description of how the abstract idea steps are to be executed, which likewise cannot integrate the judicial exception into a practical application nor provide significantly more; and element d) (via MPEP 2106.05(f), “apply it on a computer”) cannot provide an inventive concept. Thus, Claim 11 is subject-matter ineligible.
Regarding Claim 18,
(Step 1): Claim 18 recites A non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations comprising, thus an article of manufacture, one of the four statutory categories of patentable subject matter.
(Step 2A Prong 1): However, Claim 18 further recites generating, based on the historical outlier data and using a minority oversampling technique, synthetic outlier data associated with the plurality of user accounts, which constitutes the evaluation of the historical outlier data to determine synthetic outlier data, thus corresponding to a mental process which can be done mentally or by pen and paper;
generating an aggregated dataset based on at least a subset of the historical outlier data, at least a subset of the synthetic outlier data, and at least a subset of historical non-outlier data associated with the plurality of user accounts which constitutes the evaluation of the subsets of data to determine an aggregated dataset, thus corresponding to a mental process which can be done mentally or by pen and paper;
and determining … whether the new data should be classified as outlier data or non-outlier data which constitutes the evaluation of the new data to determine its label as outlier or non-outlier data, thus corresponding to a mental process which can be done mentally or by pen and paper;
Thus, Claim 18 recites an abstract idea.
(Step 2A Prong 2): The claim does not recite any additional elements which integrate the abstract idea into a practical application because the additional elements consist of:
accessing historical outlier data corresponding to a plurality of user accounts, wherein the historical outlier data comprises a plurality of parameters, which is insignificant extra-solution activity of data gathering (MPEP 2106.05(g))
and wherein an amount of the historical outlier data is insufficient to train a machine learning model, which merely recites the particular technological environment or field of use in which the abstract idea is to be performed (MPEP 2106.05(h))
training a machine learning model with the aggregated dataset, which merely recites the idea of a solution or outcome in addition to reciting limitations of broad applicability (MPEP 2106.05(f)(1, 3))
accessing new data after the machine learning model has been trained, the new data containing a request to access a resource, which is insignificant extra-solution activity of data gathering (MPEP 2106.05(g))
based on the trained machine learning model, which is implementing an abstract idea on generic computer components (MPEP 2106.05(f))
and thus, the claim is directed to the abstract idea of generating synthetic outlier data and combining the historical data with the synthetic outlier data to classify new data as outlier or non-outlier data.
(Step 2B) The additional elements, taken alone or in combination, cannot provide significantly more than the abstract idea itself because elements a) and d) are further well-understood, routine, and conventional activity of “transmitting or receiving data over a network” (MPEP 2106.05(d)), which cannot provide significantly more than the abstract idea itself; element b) (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself; element c) (via MPEP 2106.05(f)(1)) recites no particular steps disclosing how the improvement in technology is to be achieved, and so does not integrate the judicial exception into a practical application nor provide significantly more than the abstract idea itself; element c) (via MPEP 2106.05(f)(3)) generally recites an effect of the judicial exception with no description of how the abstract idea steps are to be executed, which likewise cannot integrate the judicial exception into a practical application nor provide significantly more; and element e) (via MPEP 2106.05(f), “apply it on a computer”) cannot provide an inventive concept. Thus, Claim 18 is subject-matter ineligible.
Regarding Dependent Claims 2-10, 12-17, 19-20,
Claims 2; 13; 20 recite additional steps of the abstract idea (mental processes) but merely recite the idea of a solution or outcome in addition to reciting limitations of broad applicability (MPEP 2106.05(f)(1, 3)) and thus (via MPEP 2106.05(f)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claims 2; 13; 20 are subject-matter ineligible.
Claims 3; 14 recite additional steps of the abstract idea (mental processes) but merely recite:
instances of implementing an abstract idea on generic computer components and thus (via MPEP 2106.05(f), “apply it on a computer”) cannot provide an inventive concept.
the particular technological environment or field of use in which the abstract idea is to be performed and thus (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself.
Thus, Claims 3; 14 are subject-matter ineligible.
Claims 4; 12 merely recite the particular technological environment or field of use in which the abstract idea is to be performed and thus (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claims 4; 12 are subject-matter ineligible.
Claims 5; 15 recite additional steps of the abstract idea but do not recite any additional elements which integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claims 5; 15 are subject-matter ineligible.
Claims 6, 7, 8, 9; 16; 19 recite additional steps of the abstract idea (mental processes) but merely recite the particular technological environment or field of use in which the abstract idea is to be performed and thus (via MPEP 2106.05(h)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claims 6, 7, 8, 9; 16; 19 are subject-matter ineligible.
Claims 10; 17 merely recite insignificant extra-solution activity of data outputting (MPEP 2106.05(g)) and thus (via MPEP 2106.05(g)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself. Thus, Claims 10; 17 are subject-matter ineligible.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 11, 13, 15-17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Tajima et al. (US20210224599A1, hereinafter “Tajima”).
Regarding Claim 11,
Tajima discloses A system comprising: a processor; and a non-transitory computer-readable medium having stored thereon instructions that are executable by the processor to cause the system to perform operations comprising: accessing first data corresponding to a plurality of historical outlier events that have occurred in a specified environment, wherein each of the historical outlier events is associated with a plurality of different types of parameters, (Tajima [0046]; “Operation data 100 is data collected from the equipment 10 by the controller 11 and is managed by the local data management unit 113, and is, specifically, data relating to sensor values of the sensors attached to the equipment 10 and control signals to be sent to the equipment 10, for example. As depicted in FIG. 3, the operation data 100 includes items for a date and time 101, an item name 102, and a value 103. The date and time 101 is date and time when operation data is generated or collected. The item name 102 is a name for identifying operation data and is, for example, a sensor number or a control signal number. The value 103 is a value of operation data of the date and time and the item.
It is noted that also the operation data managed by the integrated data management unit 123 of the data management server 12 is similar in contents and is integration of the operation data 100 of the local data management unit 113 of the controllers 11.”
Tajima [0053]; “First, the collection unit 111 of the controller 11 collects the operation data 100 in a normal condition from both or one of the equipment 10 and the controller 11, and stores the collected operation data 100 in a normal condition into the local data management unit 113 (S101). Note that it is assumed that, in the present embodiment, the period of data collected by the collection unit 111 is fixed. If the period is not fixed, the operation data 100 is converted into operation data adjusted in period by interpolation or the like and then stored into the local data management unit 113.
Then, the collection and delivery unit 121 of the data management server 12 aggregates the operation data 100 stored in the local data management unit 113 of the controllers 11 and stores the aggregated operation data 100 into the integrated data management unit 123 of the data management server 12 (S102).
Then, the learning unit 122 of the data management server 12 constructs (learns) an anomaly detection model using the operation data 100 associated, in item name, with the model ID in the monitoring unit definition data 200 (S103). Note that it is assumed that, prior to this processing, appropriate monitoring unit definition data 200 is registered and association between the model ID and the operation data 100 is completed already. It is noted that the process of constructing (learning) an anomaly detection model is hereinafter described in detail”)
accessing second data corresponding to a plurality of synthetically-generated outlier events in the specified environment, wherein the synthetically-generated outlier events are generated at least in part by applying a minority oversampling technique on the first data (Tajima [0062]; “Then, the learning unit 122 of the data management server 12 creates new training data X.sub.t by bootstrap sampling of the training data (S204). At this time, the sampling is performed according to the probability P(x) given by the (Formula 2) below representing the anomaly score at time t−1 as S.sub.t-1. Here, x, x.sub.j∈X.sub.t-1, X.sub.t-1 are training data before X.sub.t, and the index j of the sum total of the denominator of the (Formula 2) moves all elements of X.sub.t-1. In other words, as the anomaly score becomes higher, it is sampled at a higher probability. This process makes it possible to create a new anomaly detection model that decreases the dispersion efficiently by a process described hereinafter. It is noted that, although, in the present embodiment, sampling is performed using a ratio of the anomaly score simply, the sampling may be performed otherwise based on some other distribution such as random distribution. Further, when sampling is performed, not only extracting data from within existing operation data, but also using interpolation values or estimated values may be possible. For example, an oversampling method such as SMOTE (Synthetic Minority Over-sampling Technique, including interpolation using neighborhood points) or a method of learning a creation model such as GAN (Generative Adversarial Networks) from operation data and then performing sampling from within the creation model may be used. This makes it possible to construct an anomaly detection model including information that is not included in operation data, and as a result, in some cases, the detection performance can be improved.” wherein a minority oversampling technique performed to generate new training data for anomaly detection reads on generating synthetic outlier data second data)
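Examiner's note (illustrative only): the following Python sketch illustrates the SMOTE-style minority oversampling that Tajima [0062] references, in which a synthetic minority (outlier) sample is interpolated between an existing outlier sample and one of its nearest outlier neighbors. All function names, parameters, and the library choice are the examiner's own illustration under stated assumptions, not code from the Tajima reference.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def smote_oversample(outliers, n_synthetic, k=5, seed=0):
        # SMOTE-style sketch: interpolate synthetic points between an outlier
        # sample and one of its k nearest outlier neighbors (assumes at least
        # two historical outlier samples; names are hypothetical).
        rng = np.random.default_rng(seed)
        nn = NearestNeighbors(n_neighbors=min(k + 1, len(outliers))).fit(outliers)
        _, idx = nn.kneighbors(outliers)   # idx[:, 0] is each point itself
        synthetic = []
        for _ in range(n_synthetic):
            i = rng.integers(len(outliers))   # pick a historical outlier
            j = rng.choice(idx[i, 1:])        # pick one of its neighbors
            gap = rng.random()                # interpolation factor in [0, 1)
            synthetic.append(outliers[i] + gap * (outliers[j] - outliers[i]))
        return np.vstack(synthetic)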
accessing third data corresponding to a plurality of historical non-outlier events that have occurred in the specified environment (Tajima [0062]; “Then, the learning unit 122 of the data management server 12 creates new training data X.sub.t by bootstrap sampling of the training data (S204). At this time, the sampling is performed according to the probability P(x) given by the (Formula 2) below representing the anomaly score at time t−1 as S.sub.t-1. Here, x, x.sub.j∈X.sub.t-1, X.sub.t-1 are training data before X.sub.t, and the index j of the sum total of the denominator of the (Formula 2) moves all elements of X.sub.t-1. In other words, as the anomaly score becomes higher, it is sampled at a higher probability. This process makes it possible to create a new anomaly detection model that decreases the dispersion efficiently by a process described hereinafter. It is noted that, although, in the present embodiment, sampling is performed using a ratio of the anomaly score simply, the sampling may be performed otherwise based on some other distribution such as random distribution. Further, when sampling is performed, not only extracting data from within existing operation data, but also using interpolation values or estimated values may be possible. For example, an oversampling method such as SMOTE (Synthetic Minority Over-sampling Technique, including interpolation using neighborhood points) or a method of learning a creation model such as GAN (Generative Adversarial Networks) from operation data and then performing sampling from within the creation model may be used. This makes it possible to construct an anomaly detection model including information that is not included in operation data, and as a result, in some cases, the detection performance can be improved.” wherein the training data before X.sub.t representative of both anomalous and non-anomalous data samples reads on a third data corresponding to a plurality of historical non-outlier events)
combining at least subsets of the first data, the second data, and the third data into a combined dataset; training a machine learning model with the combined dataset (Tajima [0063]; “Then, the learning unit 122 of the data management server 12 creates a new anomaly detection model using the training data X.sub.t (S205). This procedure is similar to that in S205. The anomaly score S.sub.new of this anomaly detection model is given by the (Formula 3) below. It is noted that, when the distance calculation becomes a bottleneck, a method of approximating distance calculation such as binary hashing or a Product Quantization method (PQ) may be used. By this, the load of the distance calculation can be reduced significantly. Then, the learning unit 122 of the data management server 12 combines the anomaly detection model at time t−1 and the newly created anomaly detection model to create the anomaly detection model at time t whose dispersion (variance) is small (S206). The anomaly score S.sub.t of this anomaly detection model is given by the weighted linear sum of the newly created anomaly detection model and the anomaly detection model at time t−1. S.sub.t is given by the following formula.” wherein the construction of a new anomaly detection model including information that is not included in operation data reads on training a machine learning model with a new data set comprised in part of the new training data and original training data (unified dataset))
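Examiner's note (illustrative only): the mapped "combining"/"training" limitations can be pictured with the hypothetical sketch below, which stacks historical outlier, synthetic outlier, and historical non-outlier data into one labeled dataset and fits a k-nearest neighbor classifier (the k-NN choice tracks the model family mentioned in Tajima [0025]; the function and variable names are the examiner's, not the reference's).

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def train_on_combined(hist_outliers, synth_outliers, non_outliers):
        # Label all outlier rows 1 and non-outlier rows 0, then train one
        # model on the combined (unified) dataset.
        X = np.vstack([hist_outliers, synth_outliers, non_outliers])
        y = np.concatenate([np.ones(len(hist_outliers) + len(synth_outliers)),
                            np.zeros(len(non_outliers))])
        return KNeighborsClassifier(n_neighbors=5).fit(X, y)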
and determining, based on the trained machine learning model, whether a new event in the specified environment is an outlier event (Tajima [0073]; “Next, processing of the monitoring phase in the anomaly detection system is described with reference to FIG. 9. Note that it is assumed that, prior to the processing of the monitoring phase, operation data in the equipment 10 is collected in advance.
First, the detection unit 112 of the controller 11 calculates an anomaly score in an initial stage (referred to as a “initial anomaly score”) using an anomaly detection model whose sub model ID 302 is 0, that is, a first anomaly detection model (S301).
Then, the detection unit 112 of the controller 11 calculates an anomaly score using an anomaly detection model whose sub model ID 302 is −1, that is, the last anomaly detection model (S302). It is noted that, when an anomaly detection model whose sub model ID 302 is −1 is not found, an anomaly score is calculated by a procedure similar to that when the anomaly score of the (Formula 4) given hereinabove is calculated.
Then, the detection unit 112 of the controller 11 registers the initial anomaly score and the anomaly score into the anomaly detection result data 400. Further, the detection unit 112 of the controller 11 registers similar data into the integrated data management unit 123 of the data management server 12 through the collection and delivery unit 121 of the data management server 12 (S303).
Then, the detection unit 112 of the controller 11 decides whether or not the anomaly score is higher than a threshold value determined in advance (S304). When the anomaly score is higher than the threshold value (S304: YES), the processing advances to S305. In the other case (S304: NO), the present processing is ended.” wherein the anomaly detection model used for classification of new data as anomalous reads on classifying new data as either outlier or non-outlier data)
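Examiner's note (illustrative only): the monitoring-phase decision quoted above (Tajima [0073]-[0077]) amounts to comparing a model-derived anomaly score against a predetermined threshold, as in the hypothetical sketch below; the scoring interface and all names are assumptions of the examiner, not code from the reference.

    def is_outlier_event(score_fn, new_event, threshold):
        # score_fn is an assumed callable returning the model's anomaly score.
        return score_fn(new_event) > threshold   # S304: YES -> anomaly path (S305)

    def handle_request(score_fn, new_event, threshold):
        # Claim 16 mapping: deny the request on an outlier event, else grant.
        return "deny" if is_outlier_event(score_fn, new_event, threshold) else "grant"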
Regarding Claim 13,
Tajima teaches the system of Claim 11 (and thus the rejection of Claim 11 is incorporated). Tajima further discloses retraining the machine learning model based on additional data corresponding to new outlier events that have occurred after an initial training of the machine learning model (Tajima [0025]; “In the learning phase, an anomaly detection model is first learned from operation data collected from various apparatus and equipment. Although various models of mechanical learning can be adopted as the anomaly detection model, in the description of the present embodiment, an example that uses a model based on the k-nearest neighbor method is described. It is noted that also it is possible to use a model of different mechanical learning or statistics. In the learning process, a first anomaly detection model based on the k-nearest neighbor method is learned first using normal operation data as training data. The k-nearest neighbor method is referred to as lazy learning and merely stores, from its nature, training data into a memory without processing fetched data.
Then, bootstrap sampling is performed for the training data to create one or a plurality pieces of new training data. Here, the bootstrap sampling is a statistical sampling method of extracting n pieces of data from n pieces of data of a target forgiving duplication. Then, one or a plurality of new anomaly detection models are created using the created training data. In the case where an anomaly detection model (referred to as “ensemble model”) created by combination of the original anomaly detection model and the newly created anomaly detection model indicates a variance of the anomaly scores that is small in comparison with the original anomaly detection model, the ensemble model is replaced with the original anomaly detection model. In the present embodiment, as a specific method for the combination, a method based on the weighted linear sum is used. At this time, the balance (weight) of the combination is determined searching for a best one by line search or the like. Such a sequence of processes is repeated by the predetermined number of times to configure an anomaly detection model having a minimum variance. It is noted that the variance of the anomaly scores is one index that provides a dispersion, and some other index, for example, IQR (Inter Quartile Range) may be used”
Tajima [0028]; “Therefore, a different model may be learned using ensemble models. For example, so-called self-taught learning of learning an anomaly score for a training data set of an ensemble model with a regression model (a model that represents certain two variables using an estimate formula by a statistical method) may be performed. Further, an anomaly detection model may be constructed using training data created by sampling data sets configuring individual ensemble models according to the weights of the models. This makes it possible to re-construct an anomaly detection model that has similar natures and is comparatively light in calculation amount” wherein creation of a plurality of new anomaly detection models based on the original anomaly detection model reads on retraining of the machine learning model based on the additional outlier data)
comparing a performance of the retrained machine learning model with a performance of the initially trained machine learning model (Tajima [0026]; “Then, bootstrap sampling is performed for the training data to create one or a plurality pieces of new training data. Here, the bootstrap sampling is a statistical sampling method of extracting n pieces of data from n pieces of data of a target forgiving duplication. Then, one or a plurality of new anomaly detection models are created using the created training data. In the case where an anomaly detection model (referred to as “ensemble model”) created by combination of the original anomaly detection model and the newly created anomaly detection model indicates a variance of the anomaly scores that is small in comparison with the original anomaly detection model, the ensemble model is replaced with the original anomaly detection model. In the present embodiment, as a specific method for the combination, a method based on the weighted linear sum is used. At this time, the balance (weight) of the combination is determined searching for a best one by line search or the like. Such a sequence of processes is repeated by the predetermined number of times to configure an anomaly detection model having a minimum variance. It is noted that the variance of the anomaly scores is one index that provides a dispersion, and some other index, for example, IQR (Inter Quartile Range) may be used.” wherein the comparison of variance between the original anomaly detection model and the ensemble retrained anomaly detection model reads on comparing a performance of the retrained outlier machine learning model with a performance of the initially trained machine learning model)
and selecting, based on a result of the comparing, one of the retrained machine learning model or the initially trained machine learning model to perform the determining (Tajima [0026]; “Then, bootstrap sampling is performed for the training data to create one or a plurality pieces of new training data. Here, the bootstrap sampling is a statistical sampling method of extracting n pieces of data from n pieces of data of a target forgiving duplication. Then, one or a plurality of new anomaly detection models are created using the created training data. In the case where an anomaly detection model (referred to as “ensemble model”) created by combination of the original anomaly detection model and the newly created anomaly detection model indicates a variance of the anomaly scores that is small in comparison with the original anomaly detection model, the ensemble model is replaced with the original anomaly detection model. In the present embodiment, as a specific method for the combination, a method based on the weighted linear sum is used. At this time, the balance (weight) of the combination is determined searching for a best one by line search or the like. Such a sequence of processes is repeated by the predetermined number of times to configure an anomaly detection model having a minimum variance. It is noted that the variance of the anomaly scores is one index that provides a dispersion, and some other index, for example, IQR (Inter Quartile Range) may be used.” wherein the configuration of the anomaly model having a minimum variance reads on selecting, based on a result of the comparing, one of either the retrained or original machine learning models to perform the anomaly (outlier) classification)
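Examiner's note (illustrative only): Tajima [0026] combines the prior model and the newly created model by a weighted linear sum of their anomaly scores and keeps whichever configuration yields the smaller dispersion (variance). A hypothetical sketch of that comparison and selection follows; the names and score arrays are the examiner's assumptions, not the reference's code.

    import numpy as np

    def select_model(old_scores, new_scores, w):
        # Ensemble score: weighted linear sum of the new model's and prior
        # model's anomaly scores, per Tajima [0026]; keep the ensemble only
        # if it reduces the variance of the anomaly scores.
        ensemble = w * new_scores + (1.0 - w) * old_scores
        return "ensemble" if ensemble.var() < old_scores.var() else "original"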
Regarding Claim 15,
Tajima teaches the system of Claim 11 (and thus the rejection of Claim 11 is incorporated). Tajima further discloses adjusting the subsets of the first data, the second data, or the third data until a specified ratio is achieved between the second data and the first data (Tajima [0062]; “Then, the learning unit 122 of the data management server 12 creates new training data X.sub.t by bootstrap sampling of the training data (S204). At this time, the sampling is performed according to the probability P(x) given by the (Formula 2) below representing the anomaly score at time t−1 as S.sub.t-1. Here, x, x.sub.j∈X.sub.t-1, X.sub.t-1 are training data before X.sub.t, and the index j of the sum total of the denominator of the (Formula 2) moves all elements of X.sub.t-1. In other words, as the anomaly score becomes higher, it is sampled at a higher probability. This process makes it possible to create a new anomaly detection model that decreases the dispersion efficiently by a process described hereinafter.” wherein sampling at a probability proportional to the anomaly score reads on adjusting the data until a specified ratio is achieved between the outlier and non-outlier data)
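Examiner's note (illustrative only): as quoted above, Tajima [0062] resamples training points with probability proportional to their anomaly scores at time t−1; from the quoted description, Formula 2 can be reconstructed as P(x) = S_{t-1}(x) / Σ_j S_{t-1}(x_j). A hypothetical sketch of that score-proportional bootstrap sampling (names assumed by the examiner):

    import numpy as np

    def score_proportional_sample(X, scores, n, seed=0):
        # Formula 2 as quoted: higher anomaly score -> higher draw probability
        # (assumes nonnegative scores with a positive sum).
        rng = np.random.default_rng(seed)
        p = scores / scores.sum()
        idx = rng.choice(len(X), size=n, replace=True, p=p)  # weighted bootstrap
        return X[idx]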
Regarding Claim 16,
Tajima teaches the system of Claim 11 (and thus the rejection of Claim 11 is incorporated). Tajima further discloses wherein the determining is performed in response to the new event issuing a request to access a resource in the specified environment, and wherein the operations further comprise: denying the request when the new event is determined to be the outlier event; or granting the request when the new event is determined to not be the outlier event (Tajima [0077]; “Then, the detection unit 112 of the controller 11 decides whether or not the anomaly score is higher than a threshold value determined in advance (S304). When the anomaly score is higher than the threshold value (S304: YES), the processing advances to S305. In the other case (S304: NO), the present processing is ended.” wherein the processing is determined to advance or end according to the classification of the anomaly score as anomalous when compared against a threshold value, thus interpreted as denying or granting access to the specified processing resource when new data is classified as anomalous or non-anomalous (outlier or non-outlier, respectively) data)
Regarding Claim 17,
Tajima teaches the system of Claim 16 (and thus the rejection of Claim 16 is incorporated). Tajima further discloses wherein the operations further comprise: when the request is denied, providing, to an entity associated with the new event, a reason why the request is denied, wherein the reason is based at least in part on the trained machine learning model (Tajima [0078]; “When the anomaly score is higher than the threshold value in S304, the detection unit 112 of the controller 11 notifies the display unit 131 of the client terminal 13 that an anomaly has been found. In response to this, the display unit 131 of the client terminal 13 presents information for allowing the user to know a situation of the anomaly of the operation data 100 or the anomaly detection result data 400 to the user (S305).”)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 4-7, 9-10; 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Tajima et al. (US20210224599A1, hereinafter “Tajima”) in view of Drozhak et al. (US20230004486A1, hereinafter “Drozhak,” cited in the IDS).
Regarding Claim 1,
Tajima discloses A method, comprising: accessing historical outlier data …, wherein the historical outlier data comprises a plurality of parameters (Tajima [0046]; “Operation data 100 is data collected from the equipment 10 by the controller 11 and is managed by the local data management unit 113, and is, specifically, data relating to sensor values of the sensors attached to the equipment 10 and control signals to be sent to the equipment 10, for example. As depicted in FIG. 3, the operation data 100 includes items for a date and time 101, an item name 102, and a value 103. The date and time 101 is date and time when operation data is generated or collected. The item name 102 is a name for identifying operation data and is, for example, a sensor number or a control signal number. The value 103 is a value of operation data of the date and time and the item.
It is noted that also the operation data managed by the integrated data management unit 123 of the data management server 12 is similar in contents and is integration of the operation data 100 of the local data management unit 113 of the controllers 11.”
Tajima [0053]; “First, the collection unit 111 of the controller 11 collects the operation data 100 in a normal condition from both or one of the equipment 10 and the controller 11, and stores the collected operation data 100 in a normal condition into the local data management unit 113 (S101). Note that it is assumed that, in the present embodiment, the period of data collected by the collection unit 111 is fixed. If the period is not fixed, the operation data 100 is converted into operation data adjusted in period by interpolation or the like and then stored into the local data management unit 113.
Then, the collection and delivery unit 121 of the data management server 12 aggregates the operation data 100 stored in the local data management unit 113 of the controllers 11 and stores the aggregated operation data 100 into the integrated data management unit 123 of the data management server 12 (S102).
Then, the learning unit 122 of the data management server 12 constructs (learns) an anomaly detection model using the operation data 100 associated, in item name, with the model ID in the monitoring unit definition data 200 (S103). Note that it is assumed that, prior to this processing, appropriate monitoring unit definition data 200 is registered and association between the model ID and the operation data 100 is completed already. It is noted that the process of constructing (learning) an anomaly detection model is hereinafter described in detail”)
generating, based on the historical outlier data and using a minority oversampling technique, synthetic outlier data associated with the plurality of user accounts; (Tajima [0062]; “Then, the learning unit 122 of the data management server 12 creates new training data X.sub.t by bootstrap sampling of the training data (S204). At this time, the sampling is performed according to the probability P(x) given by the (Formula 2) below representing the anomaly score at time t−1 as S.sub.t-1. Here, x, x.sub.j∈X.sub.t-1, X.sub.t-1 are training data before X.sub.t, and the index j of the sum total of the denominator of the (Formula 2) moves all elements of X.sub.t-1. In other words, as the anomaly score becomes higher, it is sampled at a higher probability. This process makes it possible to create a new anomaly detection model that decreases the dispersion efficiently by a process described hereinafter. It is noted that, although, in the present embodiment, sampling is performed using a ratio of the anomaly score simply, the sampling may be performed otherwise based on some other distribution such as random distribution. Further, when sampling is performed, not only extracting data from within existing operation data, but also using interpolation values or estimated values may be possible. For example, an oversampling method such as SMOTE (Synthetic Minority Over-sampling Technique, including interpolation using neighborhood points) or a method of learning a creation model such as GAN (Generative Adversarial Networks) from operation data and then performing sampling from within the creation model may be used. This makes it possible to construct an anomaly detection model including information that is not included in operation data, and as a result, in some cases, the detection performance can be improved.” wherein a minority oversampling technique performed to generate new training data for anomaly detection reads on generating synthetic outlier data)
combining the historical outlier data, at least a subset of the synthetic outlier data, and historical non-outlier data associated with the plurality of user accounts into a unified dataset; training a machine learning model with the unified dataset; (Tajima [0063]; “Then, the learning unit 122 of the data management server 12 creates a new anomaly detection model using the training data X.sub.t (S205). This procedure is similar to that in S205. The anomaly score S.sub.new of this anomaly detection model is given by the (Formula 3) below. It is noted that, when the distance calculation becomes a bottleneck, a method of approximating distance calculation such as binary hashing or a Product Quantization method (PQ) may be used. By this, the load of the distance calculation can be reduced significantly. Then, the learning unit 122 of the data management server 12 combines the anomaly detection model at time t−1 and the newly created anomaly detection model to create the anomaly detection model at time t whose dispersion (variance) is small (S206). The anomaly score S.sub.t of this anomaly detection model is given by the weighted linear sum of the newly created anomaly detection model and the anomaly detection model at time t−1. S.sub.t is given by the following formula.” wherein the construction of a new anomaly detection model including information that is not included in operation data reads on training a machine learning model with a new data set comprised in part of the new training data and original training data (unified dataset))
and classifying, based on the trained machine learning model, new data as either outlier data or non-outlier data in the plurality of user accounts (Tajima [0073]; “Next, processing of the monitoring phase in the anomaly detection system is described with reference to FIG. 9. Note that it is assumed that, prior to the processing of the monitoring phase, operation data in the equipment 10 is collected in advance.
First, the detection unit 112 of the controller 11 calculates an anomaly score in an initial stage (referred to as a “initial anomaly score”) using an anomaly detection model whose sub model ID 302 is 0, that is, a first anomaly detection model (S301).
Then, the detection unit 112 of the controller 11 calculates an anomaly score using an anomaly detection model whose sub model ID 302 is −1, that is, the last anomaly detection model (S302). It is noted that, when an anomaly detection model whose sub model ID 302 is −1 is not found, an anomaly score is calculated by a procedure similar to that when the anomaly score of the (Formula 4) given hereinabove is calculated.
Then, the detection unit 112 of the controller 11 registers the initial anomaly score and the anomaly score into the anomaly detection result data 400. Further, the detection unit 112 of the controller 11 registers similar data into the integrated data management unit 123 of the data management server 12 through the collection and delivery unit 121 of the data management server 12 (S303).
Then, the detection unit 112 of the controller 11 decides whether or not the anomaly score is higher than a threshold value determined in advance (S304). When the anomaly score is higher than the threshold value (S304: YES), the processing advances to S305. In the other case (S304: NO), the present processing is ended.” wherein the anomaly detection model used for classification of new data as anomalous reads on classifying new data as either outlier or non-outlier data)
Tajima does not explicitly disclose, but Drozhak discloses, outlier data corresponding to a plurality of user accounts (Drozhak [0054]; “At requirements definition step 120, the application's specifications (e.g., functionality, performance characteristics, etc.) may be defined. For example, a social media application's functional requirements may include the ability to connect with a friend. In some examples, defining the project's requirements may also include defining the resources to be used to develop the software”
Drozhak [0066]; “The data preparation operations may include, for example, characterizing the input data. Characterizing the input data may include detecting missing observations, detecting missing variable values, and/or identifying outlying variable values”
Drozhak [0068]; “The model-fitting steps may include, without limitation, algorithm selection, parameter estimation, hyperparameter tuning, scoring, diagnostics, etc. The model creation and evaluation module 360 may perform model fitting operations on any suitable type of model, including (without limitation) decision trees, neural networks, support vector machine models, regression models, boosted trees, random forests, deep learning neural networks, k-nearest neighbors models, naïve Bayes models, etc.” interpreted as the identification of outlying variable values in the input data associated with the plurality of social media user accounts wherein the variable values are read on as a plurality of parameters)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Tajima such that the obtained historical outlier data corresponds to a plurality of user accounts, as taught by Drozhak. One would have been motivated to apply Tajima's outlier detection methodology to Drozhak's user accounts to “help reduce the number of bugs and glitches that users encounter, resulting in higher user satisfaction and/or increased adoption of the software” (Drozhak [0057]).
Regarding Claim 2,
The combination of Tajima/Drozhak teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Tajima further discloses retraining the machine learning model based on additional outlier data received after an initial training of the machine learning model (Tajima [0025]; “In the learning phase, an anomaly detection model is first learned from operation data collected from various apparatus and equipment. Although various models of mechanical learning can be adopted as the anomaly detection model, in the description of the present embodiment, an example that uses a model based on the k-nearest neighbor method is described. It is noted that also it is possible to use a model of different mechanical learning or statistics. In the learning process, a first anomaly detection model based on the k-nearest neighbor method is learned first using normal operation data as training data. The k-nearest neighbor method is referred to as lazy learning and merely stores, from its nature, training data into a memory without processing fetched data.
Then, bootstrap sampling is performed for the training data to create one or a plurality pieces of new training data. Here, the bootstrap sampling is a statistical sampling method of extracting n pieces of data from n pieces of data of a target forgiving duplication. Then, one or a plurality of new anomaly detection models are created using the created training data. In the case where an anomaly detection model (referred to as “ensemble model”) created by combination of the original anomaly detection model and the newly created anomaly detection model indicates a variance of the anomaly scores that is small in comparison with the original anomaly detection model, the ensemble model is replaced with the original anomaly detection model. In the present embodiment, as a specific method for the combination, a method based on the weighted linear sum is used. At this time, the balance (weight) of the combination is determined searching for a best one by line search or the like. Such a sequence of processes is repeated by the predetermined number of times to configure an anomaly detection model having a minimum variance. It is noted that the variance of the anomaly scores is one index that provides a dispersion, and some other index, for example, IQR (Inter Quartile Range) may be used”
Tajima [0028]; “Therefore, a different model may be learned using ensemble models. For example, so-called self-taught learning of learning an anomaly score for a training data set of an ensemble model with a regression model (a model that represents certain two variables using an estimate formula by a statistical method) may be performed. Further, an anomaly detection model may be constructed using training data created by sampling data sets configuring individual ensemble models according to the weights of the models. This makes it possible to re-construct an anomaly detection model that has similar natures and is comparatively light in calculation amount” wherein creation of a plurality of new anomaly detection models based on the original anomaly detection model reads on retraining of the machine learning model based on the additional outlier data)
comparing a performance of the retrained machine learning model with a performance of the initially trained machine learning model (Tajima [0026]; “Then, bootstrap sampling is performed for the training data to create one or a plurality pieces of new training data. Here, the bootstrap sampling is a statistical sampling method of extracting n pieces of data from n pieces of data of a target forgiving duplication. Then, one or a plurality of new anomaly detection models are created using the created training data. In the case where an anomaly detection model (referred to as “ensemble model”) created by combination of the original anomaly detection model and the newly created anomaly detection model indicates a variance of the anomaly scores that is small in comparison with the original anomaly detection model, the ensemble model is replaced with the original anomaly detection model. In the present embodiment, as a specific method for the combination, a method based on the weighted linear sum is used. At this time, the balance (weight) of the combination is determined searching for a best one by line search or the like. Such a sequence of processes is repeated by the predetermined number of times to configure an anomaly detection model having a minimum variance. It is noted that the variance of the anomaly scores is one index that provides a dispersion, and some other index, for example, IQR (Inter Quartile Range) may be used.” wherein the comparison of variance between the original anomaly detection model and the ensemble retrained anomaly detection model reads on comparing a performance of the retrained outlier machine learning model with a performance of the initially trained machine learning model)
and selecting, based on a result of the comparing, one of the retrained machine learning model or the initially trained machine learning model to perform the classifying (Tajima [0026]; “Then, bootstrap sampling is performed for the training data to create one or a plurality pieces of new training data. Here, the bootstrap sampling is a statistical sampling method of extracting n pieces of data from n pieces of data of a target forgiving duplication. Then, one or a plurality of new anomaly detection models are created using the created training data. In the case where an anomaly detection model (referred to as “ensemble model”) created by combination of the original anomaly detection model and the newly created anomaly detection model indicates a variance of the anomaly scores that is small in comparison with the original anomaly detection model, the ensemble model is replaced with the original anomaly detection model. In the present embodiment, as a specific method for the combination, a method based on the weighted linear sum is used. At this time, the balance (weight) of the combination is determined searching for a best one by line search or the like. Such a sequence of processes is repeated by the predetermined number of times to configure an anomaly detection model having a minimum variance. It is noted that the variance of the anomaly scores is one index that provides a dispersion, and some other index, for example, IQR (Inter Quartile Range) may be used.” wherein the configuration of the anomaly model having a minimum variance reads on selecting, based on a result of the comparing, one of either the retrained or original machine learning models to perform the anomaly (outlier) classification)
Regarding Claim 4,
The combination of Tajima/Drozhak teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Tajima further discloses that generating the synthetic outlier data is performed based on a determination that an outlier trend in the plurality of user accounts has occurred or is occurring, but that an amount of the historical outlier data is insufficient to train the machine learning model (Tajima [0027]; “In the monitoring phase, an anomaly score is calculated using operation data at the time of monitoring and an anomaly detection model. In the case where the anomaly score exceeds a predetermined threshold value, it is determined that an anomaly or an omen of an anomaly has occurred, and a notification of the anomaly situation is issued to the user. At this time, the dissociation in anomaly score between the original anomaly detection model and the final anomaly detection model is presented additionally. This makes it possible for the system to give a suggestion to the user whether the detection result is based on low density or minority operation data in a normal condition or is based on high density or majority operation data in a normal condition”
Tajima [0055]; “Then, the learning unit 122 of the data management server 12 constructs (learns) an anomaly detection model using the operation data 100 associated, in item name, with the model ID in the monitoring unit definition data 200 (S103). Note that it is assumed that, prior to this processing, appropriate monitoring unit definition data 200 is registered and association between the model ID and the operation data 100 is completed already. It is noted that the process of constructing (learning) an anomaly detection model is hereinafter described in detail”
Tajima [0086]; “In the score ratio displaying pane 605, a ratio of an anomaly score to an initial anomaly score that are calculated with the anomaly detection model of the selected model ID (score ratio=anomaly score/initial anomaly score) are displayed. The axis of abscissa of a graph displayed indicates time, and the axis of ordinate indicates the anomaly score. By viewing the initial anomaly score in the initial anomaly score displaying pane 604 described above and the score ratio, the user can grasp a low-density portion or minority portion. For example, the score ratio corresponding to portions of broken line frames 602x2 and 602x3 is low. This indicates that the activation state, ending state, and so forth of the system at the portions are minorities in comparison with those in the stop state and a steady operation state. In particular, in the present embodiment, it is indicated that an anomaly score is not grasped as an anomaly with respect to the initial anomaly score of a minority portion. Accordingly, when the user analyzes an anomaly of the system, by observing a portion at which the score ratio is low, the user can obtain a suggestion about the portion at which the training data is insufficient”)
Regarding Claim 5,
The combination of Tajima/Drozhak teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Tajima further discloses that generating the synthetic outlier data is performed at least in part by varying values of one or more of the plurality of parameters of the historical outlier data (Tajima [0062]; “Then, the learning unit 122 of the data management server 12 creates new training data X.sub.t by bootstrap sampling of the training data (S204). At this time, the sampling is performed according to the probability P(x) given by the (Formula 2) below representing the anomaly score at time t−1 as S.sub.t-1. Here, x, x.sub.j∈X.sub.t-1, X.sub.t-1 are training data before X.sub.t, and the index j of the sum total of the denominator of the (Formula 2) moves all elements of X.sub.t-1. In other words, as the anomaly score becomes higher, it is sampled at a higher probability. This process makes it possible to create a new anomaly detection model that decreases the dispersion efficiently by a process described hereinafter. It is noted that, although, in the present embodiment, sampling is performed using a ratio of the anomaly score simply, the sampling may be performed otherwise based on some other distribution such as random distribution. Further, when sampling is performed, not only extracting data from within existing operation data, but also using interpolation values or estimated values may be possible. For example, an oversampling method such as SMOTE (Synthetic Minority Over-sampling Technique, including interpolation using neighborhood points) or a method of learning a creation model such as GAN (Generative Adversarial Networks) from operation data and then performing sampling from within the creation model may be used. This makes it possible to construct an anomaly detection model including information that is not included in operation data, and as a result, in some cases, the detection performance can be improved.” wherein the bootstrap sampling of the original training data to generate new synthetic outlier training data reads on varying values of the parameters of the original data)
Regarding Claim 6,
The combination of Tajima/Drozhak teaches the method of Claim 5 (and thus the rejection of Claim 5 is incorporated). Tajima/Drozhak already discloses wherein the varying comprises: selecting an instance of the historical outlier data corresponding to a particular historical event; and generating a hypothetical event at least in part by varying a value of a first parameter of the selected instance while maintaining values of at least a subset of remaining parameters of the selected instance, wherein the hypothetical event is different from any actual historical event corresponding to the historical outlier data (Tajima [0062]; “Then, the learning unit 122 of the data management server 12 creates new training data X.sub.t by bootstrap sampling of the training data (S204). At this time, the sampling is performed according to the probability P(x) given by the (Formula 2) below representing the anomaly score at time t−1 as S.sub.t-1. Here, x, x.sub.j∈X.sub.t-1, X.sub.t-1 are training data before X.sub.t, and the index j of the sum total of the denominator of the (Formula 2) moves all elements of X.sub.t-1. In other words, as the anomaly score becomes higher, it is sampled at a higher probability. This process makes it possible to create a new anomaly detection model that decreases the dispersion efficiently by a process described hereinafter.” wherein the bootstrap sampling to generate new training data is conducted for data corresponding to timestamps in a time series historical dataset, thus reading on the varying being performed through selection of a particular historical event (associated at some time t-1) and generating of a hypothetical synthetic event through varying some parameter value through bootstrap sampling)
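For illustration only, the claimed single-parameter variation can be pictured with a short sketch (illustrative values only; not how either reference implements it): one historical outlier instance is selected, one parameter is perturbed, and the remaining parameters are held fixed, yielding a hypothetical event distinct from the actual one:

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical historical outlier event: [amount, hour_of_day, velocity]
instance = np.array([5000.0, 3.0, 0.92])

hypothetical = instance.copy()
hypothetical[0] += rng.normal(scale=250.0)   # vary only the first parameter
# hypothetical[1:] is untouched, i.e. the remaining parameters keep their values
print(instance, hypothetical)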
Regarding Claim 7,
The combination of Tajima/Drozhak teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Tajima/Drozhak already discloses determining an amount of the synthetic outlier data to be used as the subset of the synthetic outlier data to be combined, such that a specified ratio is achieved between a total amount of outlier data and a total amount of non-outlier data (Tajima [0062]; “Then, the learning unit 122 of the data management server 12 creates new training data X.sub.t by bootstrap sampling of the training data (S204). At this time, the sampling is performed according to the probability P(x) given by the (Formula 2) below representing the anomaly score at time t−1 as S.sub.t-1. Here, x, x.sub.j∈X.sub.t-1, X.sub.t-1 are training data before X.sub.t, and the index j of the sum total of the denominator of the (Formula 2) moves all elements of X.sub.t-1. In other words, as the anomaly score becomes higher, it is sampled at a higher probability. This process makes it possible to create a new anomaly detection model that decreases the dispersion efficiently by a process described hereinafter.” wherein the anomaly score, which determines the sampling probability, reads on the specified ratio achieved between the total amounts of outlier and non-outlier data)
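For illustration only, the amount-for-a-specified-ratio determination can be sketched with a hypothetical helper (the function name and the 1:10 target ratio are assumptions, not drawn from either reference):

def synthetic_count_for_ratio(n_outlier, n_non_outlier, target_ratio):
    """Return how many synthetic outliers to add so that
    (historical + synthetic outliers) / non-outliers == target_ratio."""
    needed = round(target_ratio * n_non_outlier) - n_outlier
    return max(needed, 0)

# e.g. 50 historical outliers, 10_000 non-outliers, desired 1:10 ratio
print(synthetic_count_for_ratio(50, 10_000, 0.10))   # 950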
Regarding Claim 9,
The combination of Tajima/Drozhak teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Tajima/Drozhak already discloses wherein the classifying is performed in response to a request to access a specified resource, wherein the new data is associated with the request, and wherein the method further comprises: denying access to the specified resource when the new data is classified as the outlier data; or granting access to the specified resource when the new data is classified as the non-outlier data (Tajima [0077]; “Then, the detection unit 112 of the controller 11 decides whether or not the anomaly score is higher than a threshold value determined in advance (S304). When the anomaly score is higher than the threshold value (S304: YES), the processing advances to S305. In the other case (S304: NO), the present processing is ended.” wherein the processing is determined to advance or end according to the classification of the anomaly score as anomalous when compared against a threshold value, thus interpreted as denying or granting access to the specified processing resource when new data is classified as anomalous or non-anomalous (outlier or non-outlier, respectively) data)
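For illustration only, the threshold decision quoted above, under the interpretation applied in this rejection, can be sketched as follows (the threshold value and function name are hypothetical):

THRESHOLD = 0.7   # hypothetical value fixed in advance, as in Tajima S304

def handle_request(anomaly_score: float) -> str:
    """Deny access when the score classifies the request as an outlier,
    grant it otherwise (mirrors the quoted advance/end decision)."""
    if anomaly_score > THRESHOLD:
        return "denied"     # outlier data -> processing advances to anomaly handling
    return "granted"        # non-outlier data -> processing ends normally

print(handle_request(0.91), handle_request(0.12))   # denied granted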
Regarding Claim 10,
The combination of Tajima/Drozhak teaches the method of Claim 9 (and thus the rejection of Claim 9 is incorporated). Tajima/Drozhak already discloses when the access to the specified resource is denied, providing, to an entity that issued the request, a reason why the access is denied, wherein the providing is based at least in part on the trained machine learning model (Tajima [0078]; “When the anomaly score is higher than the threshold value in S304, the detection unit 112 of the controller 11 notifies the display unit 131 of the client terminal 13 that an anomaly has been found. In response to this, the display unit 131 of the client terminal 13 presents information for allowing the user to know a situation of the anomaly of the operation data 100 or the anomaly detection result data 400 to the user (S305).” wherein the information presented to allow the user to know the situation of the anomaly reads on providing a reason why the access is denied)
Regarding Claim 18,
Tajima discloses accessing historical outlier data …, wherein the historical outlier data comprises a plurality of parameters (Tajima [0046]; “ Operation data 100 is data collected from the equipment 10 by the controller 11 and is managed by the local data management unit 113, and is, specifically, data relating to sensor values of the sensors attached to the equipment 10 and control signals to be sent to the equipment 10, for example. As depicted in FIG. 3, the operation data 100 includes items for a date and time 101, an item name 102, and a value 103. The date and time 101 is date and time when operation data is generated or collected. The item name 102 is a name for identifying operation data and is, for example, a sensor number or a control signal number. The value 103 is a value of operation data of the date and time and the item.
It is noted that also the operation data managed by the integrated data management unit 123 of the data management server 12 is similar in contents and is integration of the operation data 100 of the local data management unit 113 of the controllers 11.”
Tajima [0053]; “First, the collection unit 111 of the controller 11 collects the operation data 100 in a normal condition from both or one of the equipment 10 and the controller 11, and stores the collected operation data 100 in a normal condition into the local data management unit 113 (S101). Note that it is assumed that, in the present embodiment, the period of data collected by the collection unit 111 is fixed. If the period is not fixed, the operation data 100 is converted into operation data adjusted in period by interpolation or the like and then stored into the local data management unit 113.
Then, the collection and delivery unit 121 of the data management server 12 aggregates the operation data 100 stored in the local data management unit 113 of the controllers 11 and stores the aggregated operation data 100 into the integrated data management unit 123 of the data management server 12 (S102).
Then, the learning unit 122 of the data management server 12 constructs (learns) an anomaly detection model using the operation data 100 associated, in item name, with the model ID in the monitoring unit definition data 200 (S103). Note that it is assumed that, prior to this processing, appropriate monitoring unit definition data 200 is registered and association between the model ID and the operation data 100 is completed already. It is noted that the process of constructing (learning) an anomaly detection model is hereinafter described in detail”)
and wherein an amount of the historical outlier data is insufficient to train a machine learning model; (Tajima [0027]; “In the monitoring phase, an anomaly score is calculated using operation data at the time of monitoring and an anomaly detection model. In the case where the anomaly score exceeds a predetermined threshold value, it is determined that an anomaly or an omen of an anomaly has occurred, and a notification of the anomaly situation is issued to the user. At this time, the dissociation in anomaly score between the original anomaly detection model and the final anomaly detection model is presented additionally. This makes it possible for the system to give a suggestion to the user whether the detection result is based on low density or minority operation data in a normal condition or is based on high density or majority operation data in a normal condition“
Tajima [0055]; “Then, the learning unit 122 of the data management server 12 constructs (learns) an anomaly detection model using the operation data 100 associated, in item name, with the model ID in the monitoring unit definition data 200 (S103). Note that it is assumed that, prior to this processing, appropriate monitoring unit definition data 200 is registered and association between the model ID and the operation data 100 is completed already. It is noted that the process of constructing (learning) an anomaly detection model is hereinafter described in detail”
Tajima [0086]; “In the score ratio displaying pane 605, a ratio of an anomaly score to an initial anomaly score that are calculated with the anomaly detection model of the selected model ID (score ratio=anomaly score/initial anomaly score) are displayed. The axis of abscissa of a graph displayed indicates time, and the axis of ordinate indicates the anomaly score. By viewing the initial anomaly score in the initial anomaly score displaying pane 604 described above and the score ratio, the user can grasp a low-density portion or minority portion. For example, the score ratio corresponding to portions of broken line frames 602x2 and 602x3 is low. This indicates that the activation state, ending state, and so forth of the system at the portions are minorities in comparison with those in the stop state and a steady operation state. In particular, in the present embodiment, it is indicated that an anomaly score is not grasped as an anomaly with respect to the initial anomaly score of a minority portion. Accordingly, when the user analyzes an anomaly of the system, by observing a portion at which the score ratio is low, the user can obtain a suggestion about the portion at which the training data is insufficient”)
generating, based on the historical outlier data and using a minority oversampling technique, synthetic outlier data associated with the plurality of user accounts; (Tajima [0062]; “Then, the learning unit 122 of the data management server 12 creates new training data X.sub.t by bootstrap sampling of the training data (S204). At this time, the sampling is performed according to the probability P(x) given by the (Formula 2) below representing the anomaly score at time t−1 as S.sub.t-1. Here, x, x.sub.j∈X.sub.t-1, X.sub.t-1 are training data before X.sub.t, and the index j of the sum total of the denominator of the (Formula 2) moves all elements of X.sub.t-1. In other words, as the anomaly score becomes higher, it is sampled at a higher probability. This process makes it possible to create a new anomaly detection model that decreases the dispersion efficiently by a process described hereinafter. It is noted that, although, in the present embodiment, sampling is performed using a ratio of the anomaly score simply, the sampling may be performed otherwise based on some other distribution such as random distribution. Further, when sampling is performed, not only extracting data from within existing operation data, but also using interpolation values or estimated values may be possible. For example, an oversampling method such as SMOTE (Synthetic Minority Over-sampling Technique, including interpolation using neighborhood points) or a method of learning a creation model such as GAN (Generative Adversarial Networks) from operation data and then performing sampling from within the creation model may be used. This makes it possible to construct an anomaly detection model including information that is not included in operation data, and as a result, in some cases, the detection performance can be improved.” wherein a minority oversampling technique performed to generate new training data for anomaly detection reads on generating synthetic outlier data)
generating an aggregated dataset based on at least a subset of the historical outlier data, at least a subset of the synthetic outlier data, and at least a subset of historical non-outlier data associated with the plurality of user accounts; training a machine learning model with the aggregated dataset (Tajima [0063]; “Then, the learning unit 122 of the data management server 12 creates a new anomaly detection model using the training data X.sub.t (S205). This procedure is similar to that in S205. The anomaly score S.sub.new of this anomaly detection model is given by the (Formula 3) below. It is noted that, when the distance calculation becomes a bottleneck, a method of approximating distance calculation such as binary hashing or a Product Quantization method (PQ) may be used. By this, the load of the distance calculation can be reduced significantly. Then, the learning unit 122 of the data management server 12 combines the anomaly detection model at time t−1 and the newly created anomaly detection model to create the anomaly detection model at time t whose dispersion (variance) is small (S206). The anomaly score S.sub.t of this anomaly detection model is given by the weighted linear sum of the newly created anomaly detection model and the anomaly detection model at time t−1. S.sub.t is given by the following formula.” wherein the construction of a new anomaly detection model including information that is not included in operation data reads on training a machine learning model with a new aggregated data set comprised in part of the new training data (unified dataset))
accessing new data after the machine learning model has been trained, the new data containing a request to access a resource (Tajima [0077]; “Then, the detection unit 112 of the controller 11 decides whether or not the anomaly score is higher than a threshold value determined in advance (S304). When the anomaly score is higher than the threshold value (S304: YES), the processing advances to S305. In the other case (S304: NO), the present processing is ended.” wherein the operation data evaluated against the threshold during the monitoring phase, after the anomaly detection model has been constructed, reads on new data accessed after the model has been trained, and wherein the decision for the processing to advance or end is interpreted as that data containing a request to access the specified processing resource)
and determining, based on the trained machine learning model, whether the new data should be classified as outlier data or non-outlier data (Tajima [0073]; “Next, processing of the monitoring phase in the anomaly detection system is described with reference to FIG. 9. Note that it is assumed that, prior to the processing of the monitoring phase, operation data in the equipment 10 is collected in advance.
First, the detection unit 112 of the controller 11 calculates an anomaly score in an initial stage (referred to as a “initial anomaly score”) using an anomaly detection model whose sub model ID 302 is 0, that is, a first anomaly detection model (S301).
Then, the detection unit 112 of the controller 11 calculates an anomaly score using an anomaly detection model whose sub model ID 302 is −1, that is, the last anomaly detection model (S302). It is noted that, when an anomaly detection model whose sub model ID 302 is −1 is not found, an anomaly score is calculated by a procedure similar to that when the anomaly score of the (Formula 4) given hereinabove is calculated.
Then, the detection unit 112 of the controller 11 registers the initial anomaly score and the anomaly score into the anomaly detection result data 400. Further, the detection unit 112 of the controller 11 registers similar data into the integrated data management unit 123 of the data management server 12 through the collection and delivery unit 121 of the data management server 12 (S303).
Then, the detection unit 112 of the controller 11 decides whether or not the anomaly score is higher than a threshold value determined in advance (S304). When the anomaly score is higher than the threshold value (S304: YES), the processing advances to S305. In the other case (S304: NO), the present processing is ended.” wherein the anomaly detection model used for classification of new data as anomalous reads on classifying new data as either outlier or non-outlier data)
Tajima does not explicitly disclose but Drozhak discloses historical outlier data corresponding to a plurality of user accounts (Drozhak [0054]; “At requirements definition step 120, the application's specifications (e.g., functionality, performance characteristics, etc.) may be defined. For example, a social media application's functional requirements may include the ability to connect with a friend. In some examples, defining the project's requirements may also include defining the resources to be used to develop the software”
Drozhak [0066]; “The data preparation operations may include, for example, characterizing the input data. Characterizing the input data may include detecting missing observations, detecting missing variable values, and/or identifying outlying variable values”
Drozhak [0068]; “The model-fitting steps may include, without limitation, algorithm selection, parameter estimation, hyperparameter tuning, scoring, diagnostics, etc. The model creation and evaluation module 360 may perform model fitting operations on any suitable type of model, including (without limitation) decision trees, neural networks, support vector machine models, regression models, boosted trees, random forests, deep learning neural networks, k-nearest neighbors models, naïve Bayes models, etc.” interpreted as the identification of outlying variable values in the input data associated with the plurality of social media user accounts wherein the variable values are read on as a plurality of parameters)
It would have been obvious for Tajima’s historical outlier data to correspond to user accounts such as Drozhak’s. One would have been motivated to perform Tajima’s outlier detection methodology on Drozhak’s user accounts to “help reduce the number of bugs and glitches that users encounter, resulting in higher user satisfaction and/or increased adoption of the software” (Drozhak [0057]).
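For illustration only, and not as an assertion about either reference's implementation, the aggregated-dataset limitation mapped in this rejection (combining historical outliers, synthetic outliers, and non-outliers, then training a model on the unified set) can be pictured as a minimal scikit-learn sketch; all data values are hypothetical:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

# Hypothetical data: historical outliers, synthetic outliers (e.g. from SMOTE),
# and historical non-outliers, all sharing the same parameter columns.
hist_outliers  = rng.normal(5.0, 1.0, size=(20, 3))
synth_outliers = rng.normal(5.0, 1.2, size=(80, 3))
non_outliers   = rng.normal(0.0, 1.0, size=(900, 3))

# Unified / aggregated dataset with outlier (1) and non-outlier (0) labels.
X = np.vstack([hist_outliers, synth_outliers, non_outliers])
y = np.array([1] * 100 + [0] * 900)

model = RandomForestClassifier(random_state=0).fit(X, y)

# Classify new data accessed after training.
print(model.predict(rng.normal(5.0, 1.0, size=(1, 3))))   # likely [1], i.e. outlier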
Regarding Claim 19,
The combination of Tajima/Drozhak teaches the method of Claim 18 (and thus the rejection of Claim 18 is incorporated). Tajima/Drozhak further discloses wherein the resource comprises a user account of the plurality of user accounts, and wherein the operations further comprise: denying, based on a determination that the new data should be classified as outlier data, the request to access the resource; and providing, to an entity associated with the new data, an explanation why the request is denied, wherein the explanation is generated at least in part based on the trained machine learning model (Tajima [0078]; “When the anomaly score is higher than the threshold value in S304, the detection unit 112 of the controller 11 notifies the display unit 131 of the client terminal 13 that an anomaly has been found. In response to this, the display unit 131 of the client terminal 13 presents information for allowing the user to know a situation of the anomaly of the operation data 100 or the anomaly detection result data 400 to the user (S305).” wherein the information presented about the anomaly situation reads on an explanation why the request is denied)
Regarding Claim 20,
The combination of Tajima/Drozhak teaches the method of Claim 18 (and thus the rejection of Claim 18 is incorporated). Tajima/Drozhak further discloses retraining the machine learning model based on additional outlier data received after an initial training of the machine learning model; (Tajima [0025]; “In the learning phase, an anomaly detection model is first learned from operation data collected from various apparatus and equipment. Although various models of mechanical learning can be adopted as the anomaly detection model, in the description of the present embodiment, an example that uses a model based on the k-nearest neighbor method is described. It is noted that also it is possible to use a model of different mechanical learning or statistics. In the learning process, a first anomaly detection model based on the k-nearest neighbor method is learned first using normal operation data as training data. The k-nearest neighbor method is referred to as lazy learning and merely stores, from its nature, training data into a memory without processing fetched data.
Then, bootstrap sampling is performed for the training data to create one or a plurality pieces of new training data. Here, the bootstrap sampling is a statistical sampling method of extracting n pieces of data from n pieces of data of a target forgiving duplication. Then, one or a plurality of new anomaly detection models are created using the created training data. In the case where an anomaly detection model (referred to as “ensemble model”) created by combination of the original anomaly detection model and the newly created anomaly detection model indicates a variance of the anomaly scores that is small in comparison with the original anomaly detection model, the ensemble model is replaced with the original anomaly detection model. In the present embodiment, as a specific method for the combination, a method based on the weighted linear sum is used. At this time, the balance (weight) of the combination is determined searching for a best one by line search or the like. Such a sequence of processes is repeated by the predetermined number of times to configure an anomaly detection model having a minimum variance. It is noted that the variance of the anomaly scores is one index that provides a dispersion, and some other index, for example, IQR (Inter Quartile Range) may be used”
Tajima [0028]; “Therefore, a different model may be learned using ensemble models. For example, so-called self-taught learning of learning an anomaly score for a training data set of an ensemble model with a regression model (a model that represents certain two variables using an estimate formula by a statistical method) may be performed. Further, an anomaly detection model may be constructed using training data created by sampling data sets configuring individual ensemble models according to the weights of the models. This makes it possible to re-construct an anomaly detection model that has similar natures and is comparatively light in calculation amount” wherein creation of a plurality of new anomaly detection models based on the original anomaly detection model reads on retraining of the machine learning model based on the additional outlier data)
comparing a performance of the retrained machine learning model with a performance of the initially trained machine learning model (Tajima [0026]; “Then, bootstrap sampling is performed for the training data to create one or a plurality pieces of new training data. Here, the bootstrap sampling is a statistical sampling method of extracting n pieces of data from n pieces of data of a target forgiving duplication. Then, one or a plurality of new anomaly detection models are created using the created training data. In the case where an anomaly detection model (referred to as “ensemble model”) created by combination of the original anomaly detection model and the newly created anomaly detection model indicates a variance of the anomaly scores that is small in comparison with the original anomaly detection model, the ensemble model is replaced with the original anomaly detection model. In the present embodiment, as a specific method for the combination, a method based on the weighted linear sum is used. At this time, the balance (weight) of the combination is determined searching for a best one by line search or the like. Such a sequence of processes is repeated by the predetermined number of times to configure an anomaly detection model having a minimum variance. It is noted that the variance of the anomaly scores is one index that provides a dispersion, and some other index, for example, IQR (Inter Quartile Range) may be used.” wherein the comparison of variance between the original anomaly detection model and the ensemble retrained anomaly detection model reads on comparing a performance of the retrained outlier machine learning model with a performance of the initially trained machine learning model)
and selecting, based on a result of the comparing, one of the retrained machine learning model or the initially trained machine learning model to perform the determining (Tajima [0026]; “Then, bootstrap sampling is performed for the training data to create one or a plurality pieces of new training data. Here, the bootstrap sampling is a statistical sampling method of extracting n pieces of data from n pieces of data of a target forgiving duplication. Then, one or a plurality of new anomaly detection models are created using the created training data. In the case where an anomaly detection model (referred to as “ensemble model”) created by combination of the original anomaly detection model and the newly created anomaly detection model indicates a variance of the anomaly scores that is small in comparison with the original anomaly detection model, the ensemble model is replaced with the original anomaly detection model. In the present embodiment, as a specific method for the combination, a method based on the weighted linear sum is used. At this time, the balance (weight) of the combination is determined searching for a best one by line search or the like. Such a sequence of processes is repeated by the predetermined number of times to configure an anomaly detection model having a minimum variance. It is noted that the variance of the anomaly scores is one index that provides a dispersion, and some other index, for example, IQR (Inter Quartile Range) may be used.” wherein the configuration of the anomaly model having a minimum variance reads on selecting, based on a result of the comparing, one of either the retrained or original machine learning models to perform the anomaly (outlier) classification)
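For illustration only, the weighted-linear-sum combination, line search over the combination weight, and minimum-variance selection described in the quoted passages can be sketched as follows (hypothetical scores; not Tajima's code):

import numpy as np

rng = np.random.default_rng(3)

# Hypothetical anomaly scores from the original model and a retrained model,
# evaluated on the same validation points.
s_orig = rng.gamma(2.0, 0.20, size=500)
s_new  = rng.gamma(2.0, 0.15, size=500)

# Line search over the weight w of the weighted linear sum
# w * s_new + (1 - w) * s_orig, keeping the weight that minimizes variance.
weights = np.linspace(0.0, 1.0, 101)
variances = [np.var(w * s_new + (1 - w) * s_orig) for w in weights]
best_w = weights[int(np.argmin(variances))]

# Keep the ensemble only if it reduces the dispersion relative to the original.
use_ensemble = min(variances) < np.var(s_orig)
print(best_w, use_ensemble)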
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Tajima et al. (US20210224599A1, hereinafter “Tajima”) in view of Drozhak et al. (US20230004486A1, hereinafter “Drozhak” as disclosed in IDS) further in view of Umakanth et al. (US20220038332A1, hereinafter “Umakanth”).
Regarding Claim 3,
The combination of Tajima/Drozhak teaches the method of Claim 18 (and thus the rejection of Claim 18 is incorporated). Tajima/Drozhak fails to explicitly disclose but Umakanth discloses evaluating, based on a Shapley Additive Explanation (SHAP) model, an importance of each parameter of the plurality of parameters in the training of the machine learning model (Umakanth [0027]; “Additionally, the anomaly detection system may determine a root cause KPI parameter for an identified anomaly. The root cause KPI parameter may be determined using the anomaly model that has been retrained after filtering out the potential anomalies associated with desirable behavior. As an example, for a tree-based anomaly detection model, such as an isolation forest model, the anomaly detection system may calculate Shapley Additive Explanations (SHAP) values for an identified anomaly to determine the root cause KPI parameter.” wherein the KPI parameters evaluated through SHAP reads on evaluating importance of parameters for training of an anomaly detection model)
It would have been obvious to perform Umakanth’s parameter importance evaluation through a Shapley Additive Explanation (SHAP) model during Tajima/Drozhak’s machine learning model training. One would have been motivated to do so in order “to remove anomalies associated with the particular KPI parameter that do not meet the significance threshold” (Umakanth [0062]).
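For illustration only, the cited SHAP-based parameter-importance evaluation can be sketched as follows (a minimal example assuming the open-source shap package is installed; the synthetic regression data and the mean-absolute-SHAP summary are assumptions, not Umakanth's implementation):

import numpy as np
import shap                                # assumes the 'shap' package is available
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))              # hypothetical parameters
y = 2.0 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(random_state=0).fit(X, y)

# TreeExplainer computes SHAP values for tree ensembles; the mean absolute
# SHAP value per column is a common per-parameter importance summary.
shap_values = shap.TreeExplainer(model).shap_values(X)
importance = np.abs(shap_values).mean(axis=0)
print(importance)   # the first parameter should dominate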
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Tajima et al. (US20210224599A1, hereinafter “Tajima”) in view of Drozhak et al. (US20230004486A1, hereinafter “Drozhak” as disclosed in IDS) further in view of Ye et al. (US20220382622A1, hereinafter “Ye”).
Regarding Claim 8,
The combination of Tajima/Drozhak teaches the method of Claim 7 (and thus the rejection of Claim 7 is incorporated). Tajima/Drozhak fails to explicitly disclose but Ye discloses evaluating a precision percentage or a recall percentage of the trained machine learning model, wherein the specified ratio is determined based on the precision percentage or the recall percentage of the trained machine learning model (Ye [0048]; “Thus, the threshold variance value 412 (or optionally, plurality of threshold variance values 412) defines criteria for determining the anomalous point data value 152A. For example, the detector 410 determines whether the variance value 154 is below a lower bound threshold value or above an upper bound threshold value (i.e., outside the bounds of an acceptable distribution for the variance value 154). The point data anomaly detector 160 may receive user input to determine the threshold variance value 412. For example, the point data anomaly detector 160 receives a recall target 414 and/or a precision target 416 from the user 12 (FIG. 4A). The recall target 414, in some implementations, represents a percentage or portion of the determined or identified anomalous point data values 152A out of the total number of anomalous point data values 152A present in the set of point data values 152. The precision target 416 may represent a percentage or portion of the determined or identified anomalous point data values 152A that are true anomalous point data values 152A and not false positives. Generally, there is a tradeoff between a high recall target 414 (i.e., catching anomalous point data values 152A) and a high precision target 416 (i.e., reducing false positives). Based on the use case, the user 12 may configure the tradeoff appropriately. For example, when diagnosing a disease, a large number of false positives are acceptable to ensure that most anomalies are detected. In this case, the user 12 may pick a threshold between 0.5 and 3.0 to ensure that at least 80% of the fraud can be detected.”)
It would have been obvious to modify Tajima/Drozhak’s trained machine learning model to evaluate its precision or recall percentage and to determine the specified ratio between outlier and non-outlier data based on that evaluation. One would have been motivated to do so in order to “represent a percentage or portion of the determined or identified anomalous point data values 152A that are true anomalous point data values 152A and not false positives” (Ye [0048]).
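For illustration only, evaluating precision and recall and adjusting the specified ratio can be sketched as follows (the labels, the recall target, and the ratio-adjustment policy are hypothetical assumptions, not the references' method):

from sklearn.metrics import precision_score, recall_score

# Hypothetical labels and predictions from the trained model.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

precision = precision_score(y_true, y_pred)   # 2 / 3
recall = recall_score(y_true, y_pred)         # 2 / 3

# One possible policy (an assumption): raise the outlier-to-non-outlier
# ratio when recall falls short of its target, so more synthetic outliers
# are included in the next training round.
RECALL_TARGET, ratio = 0.80, 0.10
if recall < RECALL_TARGET:
    ratio *= 1.5
print(precision, recall, ratio)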
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Tajima et al. (US20210224599A1, hereinafter “Tajima”) further in view of Porwal et al. (“Credit Card Fraud Detection in e-commerce: An outlier detection approach” [2018], hereinafter “Porwal”).
Regarding Claim 12,
Tajima teaches the method of Claim 11 (and thus the rejection of Claim 11 is incorporated). Tajima fails to explicitly disclose but Porwal discloses the specified environment comprises an electronic commerce environment, an information security environment, a healthcare environment, a natural phenomenon environment, a financial markets environment, or an electrical power grid environment; and the historical outlier events comprise a fraudulent transaction in the electronic commerce environment, a cyber-attack in the information security environment, a disease in the healthcare environment, a natural disaster in the natural phenomenon environment, a volatility exceeding a first threshold in the financial markets environment, or an unexpected surge exceeding a second threshold in the electrical power grid environment (Porwal [Abstract]; “Often the challenge associated with tasks like fraud and spam detection is the lack of all likely patterns needed to train suitable supervised learning models. This problem accentuates when the fraudulent patterns are not only scarce, they also change over time. Change in fraudulent pattern is because fraudsters continue to innovate novel ways to circumvent measures put in place to prevent fraud. Limited data and continuously changing patterns makes learning significantly difficult. We hypothesize that good behavior does not change with time and data points representing good behavior have consistent spatial signature under different groupings. Based on this hypothesis we are proposing an approach that detects outliers in large data sets by assigning a consistency score to each data point using an ensemble of clustering methods. Our main contribution is proposing a novel method that can detect outliers in large datasets and is robust to changing patterns. We also argue that area under the ROC curve, although a commonly used metric to evaluate outlier detection methods is not the right metric. Since outlier detection problems have a skewed distribution of classes, precision-recall curves are better suited because precision compares false positives to true positives (outliers) rather than true negatives (inliers) and therefore is not affected by the problem of class imbalance. We show empirically that area under the precision-recall curve is a better than ROC as an evaluation metric.”)
It would have been obvious to perform Tajima’s outlier detection specifically in Porwal’s electronic commerce environment wherein the historical outlier events comprise fraudulent transactions. One would have been motivated to do so to create an outlier detection model that “is robust to changing patterns” (Porwal [Abstract]) to account for how “Change in fraudulent pattern is because fraudsters continue to innovate novel ways to circumvent measures put in place to prevent fraud” (Porwal [Abstract]).
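For illustration only, Porwal's point about precision-recall versus ROC under class imbalance can be pictured with a minimal sketch (the 1% outlier rate and score construction are hypothetical):

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(5)

# Heavily imbalanced hypothetical labels (about 1% outliers) and model scores.
y_true = (rng.random(10_000) < 0.01).astype(int)
scores = y_true * rng.random(10_000) + rng.random(10_000) * 0.6

# With skewed classes, the area under the precision-recall curve is typically
# far lower than ROC AUC, making class-imbalance effects visible, as Porwal argues.
print(f"PR  AUC: {average_precision_score(y_true, scores):.3f}")
print(f"ROC AUC: {roc_auc_score(y_true, scores):.3f}")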
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Tajima et al. (US20210224599A1, hereinafter “Tajima”) in view of Umakanth et al. (US20220038332A1, hereinafter “Umakanth”).
Regarding Claim 14,
Tajima teaches the method of Claim 11 (and thus the rejection of Claim 11 is incorporated). Tajima fails to explicitly disclose but Umakanth discloses assigning, based on a Shapley Additive Explanation (SHAP) model, an importance to each of the plurality of different types of parameters in the training of the machine learning model (Umakanth [0027]; “Additionally, the anomaly detection system may determine a root cause KPI parameter for an identified anomaly. The root cause KPI parameter may be determined using the anomaly model that has been retrained after filtering out the potential anomalies associated with desirable behavior. As an example, for a tree-based anomaly detection model, such as an isolation forest model, the anomaly detection system may calculate Shapley Additive Explanations (SHAP) values for an identified anomaly to determine the root cause KPI parameter.” wherein the KPI parameters evaluated through SHAP reads on evaluating importance of parameters for training of an anomaly detection model)
It would have been obvious to perform Umakanth’s parameter importance evaluation through a Shapley Additive Explanation (SHAP) model during Tajima’s machine learning model training. One would have been motivated to do so in order “to remove anomalies associated with the particular KPI parameter that do not meet the significance threshold” (Umakanth [0062]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
“LEARNING HYPER-PARAMETER SCALING MODELS FOR UNSUPERVISED ANOMALY DETECTION” (US20240095604A1) which discloses precision and recall metrics for quantifying performance of model by threshold ratio
“Automated generation of anomaly scenarios for testing machine learned anomaly detection models” (US12314385B1) which discloses model learning for anomaly detection scenarios
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN J KIM whose telephone number is (571) 272-0523.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kieu Vu can be reached on (571) 272-4057. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JONATHAN J KIM/Examiner, Art Unit 2141
/MATTHEW ELL/Supervisory Patent Examiner, Art Unit 2141