Prosecution Insights
Last updated: April 19, 2026
Application No. 17/656,263

SYSTEM AND METHOD FOR OUTAGE FORECASTING

Status: Non-Final OA (§101, §103, §112)
Filed: Mar 24, 2022
Examiner: HUANG, YAO D
Art Unit: 2124
Tech Center: 2100 — Computer Architecture & Software
Assignee: Adobe Inc.
OA Round: 1 (Non-Final)
Grant Probability: 63% (Moderate)
OA Rounds: 1-2
Time to Grant: 3y 11m
Grant Probability with Interview: 95%

Examiner Intelligence

Career Allow Rate: 63% (grants 63% of resolved cases; 78 granted / 124 resolved; +7.9% vs TC avg)
Interview Lift: +31.9% (strong; allowance rate for resolved cases with an interview vs. without)
Avg Prosecution (typical timeline): 3y 11m
Currently Pending: 18
Career History: 142 total applications, across all art units

Statute-Specific Performance

§101: 17.6% (-22.4% vs TC avg)
§103: 47.1% (+7.1% vs TC avg)
§102: 9.5% (-30.5% vs TC avg)
§112: 22.9% (-17.1% vs TC avg)
Tech Center averages are estimates. Figures based on career data from 124 resolved cases.
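For readers reconciling the headline figures, the sketch below shows how metrics of this kind are typically derived from the raw counts reported above. Only the 78-granted / 124-resolved counts appear on this page; the Tech Center average and the with/without-interview rates used below are placeholders, not data from the report.

    # Illustrative arithmetic behind the dashboard figures (Python).
    granted, resolved = 78, 124
    allow_rate = granted / resolved            # 0.629 -> shown as 63%

    tc_average = 0.550                         # placeholder Tech Center average
    delta_vs_tc = allow_rate - tc_average      # ~ +0.079 -> "+7.9% vs TC avg"

    rate_with = 0.95                           # placeholder: allowance rate with an interview
    rate_without = 0.63                        # placeholder: allowance rate without
    interview_lift = rate_with - rate_without  # ~ +0.32 percentage points -> "+32% interview lift"

    print(f"Allow rate {allow_rate:.1%}, vs TC {delta_vs_tc:+.1%}, interview lift {interview_lift:+.1%}")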

Office Action

Grounds of rejection: §101, §103, §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement filed on 03/24/2022 fails to comply with the provisions of 37 CFR 1.97, 1.98 and MPEP § 609 because the author names of the NPL documents are not correctly given in the IDS form. Specifically, the author names are incorrectly identified with a number (e.g., “1Moss et al.” and “2Aggarwal et al.” instead of “Moss et al.” and “Aggarwal et al.”). As such, the IDS is considered to be non-compliant with the requirement of 37 CFR 1.98(b)(5), which requires that each publication be identified by author. It has been placed in the application file, but the information referred to therein has not been considered as to the merits.

Applicant is advised that the date of any re-submission of any item of information contained in this information disclosure statement or the submission of any missing element(s) will be the date of submission for purposes of determining compliance with the requirements based on the time of filing the statement, including all certification requirements for statements under 37 CFR 1.97(e). See MPEP § 609.05(a). Applicant is invited to correct this issue.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Claims 16-20 include limitations that invoke § 112(f). Such claim limitations are:

“forecasting component configured to generate…” in claim 16;
“training component configured to update…” in claim 17;
“data collection component configured to collect…” in claim 19; and
“change attribution component configured to filter…” in claim 20.

Here, “component” is a generic placeholder equivalent to “means” and “configured to” is a linking phrase equivalent to “for.” See MPEP § 2181. Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. Support is found in paragraphs [0096], [0106], [0108], and [0110] of the specification, which describe a general-purpose computer implementing the components using software.

It is also noted that claim 18 does not negate the § 112(f) interpretation of the forecasting component in claim 16, since claim 18 only generally associates the operation of the forecasting component with the processor and memory rather than specifically defining the forecasting component as software executed by the recited processor. In other words, claim 18 does not exclude the forecasting component from being a device separate from the memory and processor.

If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid interpretation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 14-15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. In claims 14 and 15, “the outage label data” lacks antecedent basis, and is therefore indefinite. For purposes of examination, “the” has been treated as “a.”

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 11 and 13-15 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to an abstract idea without significantly more.

Independent Claims

Step 2A Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, independent claim 11 recites an abstract idea in the form of mental processes. A mental process is a process that “can be performed in the human mind, or by a human using a pen and paper” (MPEP § 2106.04(a)(2)(III), paragraph 1). Examples of mental processes include “observations, evaluations, judgments, and opinions” (MPEP § 2106.04(a)(2)(III), paragraph 2). The following limitations of claim 11 are mental processes:

“generating […] probability distribution information for the service metric based on the time series data” [This step is a mental process that can be performed by observations, evaluations, judgments, and opinions, since the claim does not specifically define the content or data structure of the probability distribution information, nor does it require a specific algorithm of generating it other than the use of a generic machine learning model.]

“generating […] threshold outage information based on the time series data” [This step is a mental process that can be performed by observations, evaluations, judgments, and opinions, since the claim does not specifically define the content or data structure of the threshold outage information, nor does it require a specific algorithm of generating it other than the use of a generic machine learning model.]

Therefore, independent claim 11 recites a judicial exception.

Step 2A Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The judicial exception recited in the above discussed claims is not integrated into a practical application. Independent claim 11 recites the following additional elements, but these additional elements are not sufficient to integrate the judicial exception into a practical application:

“receiving, by a training component, training data including time series data for a service metric of a computer network and outage data for the computer network;” [This element constitutes “adding insignificant extra-solution activity to the judicial exception” (MPEP § 2106.05(g)) since it merely amounts to necessary data gathering or outputting, which is identified in MPEP § 2106.05(g) as a form of extra-solution activity. Furthermore, the specific part of “by a training component” constitutes no more than mere instructions to apply the judicial exception using generic computer functions (MPEP § 2106.04(d)(I)), namely the generic computer function of software that possesses a generic training function.]

“training, by the training component, parameters of the machine learning model based on the probability distribution information, the threshold outage information, and the outage data” [These elements constitute no more than mere instructions to apply the judicial exception using generic computer functions (MPEP § 2106.04(d)(I)), namely the generic computer function of machine learning. These additional elements merely invoke the use of generic machine learning as a tool to apply an abstract idea.]

“by a/the machine learning model” [These elements constitute no more than mere instructions to apply the judicial exception using generic computer functions (MPEP § 2106.04(d)(I)), namely the generic computer function of machine learning. These additional elements merely invoke the use of generic machine learning as a tool to apply an abstract idea.]

Therefore, under MPEP 2106.04(d), the additional elements of the claim do not integrate the judicial exception into a practical application.

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. The claims do not include additional elements that are sufficient for the claims to amount to significantly more than the judicial exception. Additional elements that are mere instructions to apply an exception do not constitute significantly more than a judicial exception under MPEP § 2106.05(I)(A). Therefore, those additional elements identified above in the Prong Two analysis as mere instructions to apply an exception do not constitute significantly more. Additional elements that are considered to be extra-solution activity do not amount to significantly more if, upon their reevaluation in Step 2B, they are also merely appending “well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception” (MPEP § 2106.05(I)(A)). Here, the additional elements that were previously identified as extra-solution activity are reevaluated as follows:

“receiving, by a training component, training data including time series data for a service metric of a computer network and outage data for the computer network” [This element is well-understood, routine, conventional activity because it is merely a limitation of “receiving or transmitting data over a network” or “storing and retrieving information in memory,” which MPEP § 2106.05(d)(II) identifies as an example of well-understood, routine, and conventional computer functions.]
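As context for the machine-learning operations the rejection characterizes as generic, and for the distribution and binary cross-entropy losses addressed in the dependent-claim analysis below, the following is a minimal illustrative sketch of a training step of the kind recited: a model produces probability distribution information and an outage logit from a metric time series, and its parameters are updated from a distribution loss and a binary cross-entropy loss against outage data. The architecture, loss choices, and all names (OutageForecaster, dist_head, outage_head) are assumptions for illustration only, not the applicant's disclosed implementation or the cited references' models.

    # Illustrative only: a generic training step combining a distribution loss and a BCE loss (PyTorch).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class OutageForecaster(nn.Module):
        def __init__(self, hidden=32):
            super().__init__()
            self.encoder = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
            self.dist_head = nn.Linear(hidden, 2)    # mean and log-std of a Gaussian over the next metric value
            self.outage_head = nn.Linear(hidden, 1)  # logit for an outage / threshold-exceedance label

        def forward(self, series):                   # series: (batch, time, 1)
            _, h = self.encoder(series)
            h = h[-1]                                # final hidden state
            mean, log_std = self.dist_head(h).chunk(2, dim=-1)
            return mean, log_std.exp(), self.outage_head(h)

    model = OutageForecaster()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    series = torch.randn(8, 24, 1)                       # stand-in metric histories
    next_value = torch.randn(8, 1)                       # stand-in observed next metric values
    outage_label = torch.randint(0, 2, (8, 1)).float()   # stand-in outage data

    mean, std, logit = model(series)
    dist_loss = -torch.distributions.Normal(mean, std).log_prob(next_value).mean()  # "distribution loss"
    bce_loss = F.binary_cross_entropy_with_logits(logit, outage_label)              # "binary cross-entropy loss"
    optimizer.zero_grad()
    (dist_loss + bce_loss).backward()    # parameters updated based on both losses
    optimizer.step()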
Dependent Claims 13-15

The remaining dependent claims being rejected do not recite additional elements, whether considered individually or in combination, that are sufficient to integrate the judicial exception into a practical application or amount to significantly more than the judicial exception.

Dependent Claim 13:

“computing […] a distribution loss based on the probability distribution information and the outage data” [This element is an abstract idea in the category of mathematical concepts, specifically a mathematical concept in the form of a mathematical calculation.]

“by the training component” [This element is an additional element besides the abstract idea, but constitutes no more than mere instructions to apply the judicial exception using generic computer functions, for the same reasons given for the corresponding limitation in the parent independent claim.]

“wherein the parameters of the machine learning model are updated based on the distribution loss.” [This element is an additional element besides the abstract idea, but constitutes no more than mere instructions to apply the judicial exception using generic computer functions, for the same reasons given for the “training” limitation in the parent independent claim. The Examiner notes that the term “updated based on” does not require any specific method of applying the recited elements for the training. Thus, this statement merely recites, at a high degree of generality, some unspecified association between the computed loss and the parameter update process.]

Dependent Claim 14:

“computing […] a binary cross-entropy loss based on the threshold outage information and the outage label data” [This element is an abstract idea in the category of mathematical concepts, specifically a mathematical concept in the form of a mathematical calculation.]

“by the training component” [This element is an additional element besides the abstract idea, but constitutes no more than mere instructions to apply the judicial exception using generic computer functions, for the same reasons given for the corresponding limitation in the parent independent claim.]

“wherein the parameters of the machine learning model are updated based on the binary cross-entropy loss.” [This element is an additional element besides the abstract idea, but constitutes no more than mere instructions to apply the judicial exception using generic computer functions, for the same reasons given for the “training” limitation in the parent independent claim. The Examiner notes that the term “updated based on” does not require any specific method of applying the recited elements for the training. Thus, this statement merely recites, at a high degree of generality, some unspecified association between the computed loss and the parameter update process.]

Dependent Claim 15:

“computing […] an extreme value loss based on the threshold outage information and the outage label data” [This element is an abstract idea in the category of mathematical concepts, specifically a mathematical concept in the form of a mathematical calculation.]

“by the training component” [This element is an additional element besides the abstract idea, but constitutes no more than mere instructions to apply the judicial exception using generic computer functions, for the same reasons given for the corresponding limitation in the parent independent claim.]

“wherein the parameters of the machine learning model are updated based on the extreme value loss.” [This element is an additional element besides the abstract idea, but constitutes no more than mere instructions to apply the judicial exception using generic computer functions, for the same reasons given for the “training” limitation in the parent independent claim. The Examiner notes that the term “updated based on” does not require any specific method of applying the recited elements for the training. Thus, this statement merely recites, at a high degree of generality, some unspecified association between the computed loss and the parameter update process.]

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1. Claims 1-3, 5-6, 11-14, and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al., “Outage Prediction and Diagnosis for Cloud Service Systems,” WWW '19: World Wide Web Conference (May 2019) (“Chen”) in view of Gugulothu et al. (US 2020/0401888 A1) (“Gugulothu”).

As to claim 1, Chen teaches a method for outage forecasting, comprising: receiving, by a machine learning model, time series data for a service metric of a computer network; [§ 4.1.1, paragraph 2: “Given alerting signals across the cloud system, we predict whether a component or service outage will happen. For evaluation, we choose the service and component outages that occurred frequently in the last year…” The alerting signal is a time series, as defined in § 3.1: “A single alerting signal Ai can be represented by Ai …, where each component A_i^t indicates the strength of this signal at time t ∈ (1, T) …The input feature At …are the m alerting signals used for prediction.” The model is a machine learning model, which includes a “Bayesian network” and a “gradient boosting tree” as described in § 3.2 and § 3.3.]
[…] wherein the machine learning model is trained using […] a classification loss based on a classification output of the machine learning model; [§ 3.3, paragraph 2: “The XGBoost is optimized to achieve the best prediction results by minimizing the following loss… where the first term …is the summation of cross-entropy loss measuring whether the classification model and features can well perform for the prediction task across all timestamp.” Note that in this context, “optimized” refers to the training of the model (see also § 4.1.1, paragraph 1, which describes “training data” for the “training phase”).]

and generating, by a forecasting component, outage forecasting information for the computer network […]. [§ 3.3, paragraph 2: “In the CART tree, each node is one of the alerting signals. The prediction result is the sum of scores predicted by K decision trees…” § 4.1.1, paragraph 2: “Given alerting signals across the cloud system, we predict whether a component or service outage will happen.”]

Chen does not teach “generating, by the machine learning model, probability distribution information for the service metric based on the time series data,” the machine learning model being trained also using “a distribution loss based on a distribution output of the machine learning model,” and the generation of the outage forecasting information being “based on the probability distribution information.”

Gugulothu, which generally relates to time series forecasting using machine learning models (see title and abstract), teaches “generating, by the machine learning model, probability distribution information for the service metric based on the time series data,” [[0042]: “The outputs of the MDN (represented as 218 in FIG. 2A/258 in FIG. 2B) as formulated in (2) model the conditional distribution of the future values yt+1, …, t+p to be predicted given the latent representation zt expressed as follows.” [0043]: “Thus, the MDN layer outputs a well-defined joint probability distribution obtained for all the time steps in the forecast time horizon.” Note that the latent representation is based on a time series. See [0032]: “Given the input time series the encoder learns a latent representation zt of the time series.”], a machine learning model trained using “a distribution loss based on a distribution output of the machine learning model,” [[0043]: “The model parameters are learned by minimizing the negative log-likelihood of the distribution in (3) as shown below”] and the generation of the forecasting information “based on the probability distribution information.” [See parts cited above. See also [0023]: “Finally, mixture density networks are used to model the trend shifts and variability present in the data and provide a confidence estimate of the prediction.” That is, the confidence estimate is forecasting information.]

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Chen with the teachings of Gugulothu by implementing the training and the use of a mixture density network (MDN) for confidence estimation, as taught in Gugulothu, for the outage forecasting application taught in Chen, so as to arrive at the claimed invention.
The motivation would have been to use a type of model, suitable for time series forecasting in general, that enables modeling of variability and trend shifts present in the input and determination of a confidence estimate of a prediction, as suggested by Gugulothu (see, e.g., [0023]: “Finally, mixture density networks are used to model the trend shifts and variability present in the data and provide a confidence estimate of the prediction.”). As to claim 2, the combination of Chen and Gugulothu teaches the method of claim 1, further comprising: identifying, by a data collection component, a plurality of service metrics; [Chen, § 4.1.1, paragraph 2: “Given alerting signals across the cloud system, we predict whether a component or service outage will happen.” As defined in Chen, § 3.1, A is a multivariate time series (“For all alerting signals, we denote a multivariate time series of length T as A… for all signals”), i.e., a plurality of service metrics. Note that examples of signals are shown in Chen, FIG. 2(a). In regards to the limitation of a “data collection component,” this is taught in the form of a software implementation of the functions stated in the instant claim. Chen implies that its method is implemented using software, since it involves complex operations on large datasets.] computing, by the data collection component, correlation information for the plurality of service metrics; [Chen, § 3.2, paragraph 2: “In our method, the conditional dependence between the alerting signal and outage given a set of other alerting signals are obtained by calculating the Pearson correlation. The influence of the conditional signals needs to be regressed out first. Then, the Fisher’s z-transform is performed as in [5]. For example, given the time series sequence of an alerting signal Ai and an outage Oi, and the conditional set only contains the alerting signal Ai2, the correlation between Ai and Oi given Ai2 is… z is the correlation score for the alerting signal Ai and outage Oi give Ai2 . The significant test of z is then used to check whether the independence assumption can be accepted that Ai is independent of Oi given Ai2.” That is, the correlation score is a benchmark indicator that constitutes correlation information for the respective Ai, across the plurality of service metrics in A.] and filtering, by the data collection component, the plurality of service metrics based on the correlation information to obtain the service metric. [Chen, § 3.2, last paragraph: “In our work, the Bayesian network mainly works as a diagnositic tool to infer the relationship between the alerting signals and the outage. We can also use it to select the most relevant features and feed them as the inputs of the outage prediction model.” Chen, § 4.1.2, paragraph 2: “This observation shows that Bayesian network can help us find the most representative and dependent alerting signals. Other signals which are not found to be highly related to outages in the network would not significantly improve the performance.” That is, the signals selected by the Bayesian network are considered to be “filtered” in that it is a sub selection of the original set.] 
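To make the correlation-based screening discussed for claims 2 and 3 concrete, here is a simplified sketch of Pearson-correlation plus Fisher z-transform feature selection. It omits the conditioning step the Office Action quotes from Chen (regressing out the influence of other signals before correlating), uses synthetic data, and the function and variable names are assumptions for illustration, not drawn from Chen or the application.

    # Illustrative only: screen service metrics by correlation with the outage series (Python).
    import numpy as np
    from scipy import stats

    def select_metrics(metrics, outage, alpha=0.05):
        """Keep metrics whose correlation with the outage series is statistically significant."""
        n = len(outage)
        selected = {}
        for name, series in metrics.items():
            r, _ = stats.pearsonr(series, outage)    # correlation information for this metric
            z = np.arctanh(r)                        # Fisher z-transform
            z_stat = z * np.sqrt(n - 3)              # approximately standard normal when r = 0
            p = 2 * (1 - stats.norm.cdf(abs(z_stat)))
            if p < alpha:
                selected[name] = series              # metric retained as an input to the forecaster
        return selected

    rng = np.random.default_rng(0)
    metrics = {f"metric_{i}": rng.normal(size=500) for i in range(5)}
    outage = (rng.random(500) > 0.95).astype(float)
    kept = select_metrics(metrics, outage)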
As to claim 3, the combination of Chen and Gugulothu teaches the method of claim 1, further comprising: identifying, by a data collection component, a plurality of service metrics; [Chen, § 4.1.1, paragraph 2: “Given alerting signals across the cloud system, we predict whether a component or service outage will happen.” As defined in Chen, § 3.1, A is a multivariate time series (“For all alerting signals, we denote a multivariate time series of length T as A… for all signals”), i.e., a plurality of service metrics. Note that examples of signals are shown in Chen, FIG. 2(a). In regards to the limitation of a “data collection component,” this is taught in the form of a software implementation of the functions stated in the instant claim. Chen implies that its method is implemented using software, since it involves complex operations on large datasets.] identifying, by the data collection component, one or more benchmark indicators; [Chen, § 3.2, paragraph 2: “In our method, the conditional dependence between the alerting signal and outage given a set of other alerting signals are obtained by calculating the Pearson correlation. The influence of the conditional signals needs to be regressed out first. Then, the Fisher’s z-transform is performed as in [5]. For example, given the time series sequence of an alerting signal Ai and an outage Oi, and the conditional set only contains the alerting signal Ai2, the correlation between Ai and Oi given Ai2 is… z is the correlation score for the alerting signal Ai and outage Oi give Ai2 . The significant test of z is then used to check whether the independence assumption can be accepted that Ai is independent of Oi given Ai2.” That is, the correlation score is a benchmark indicator.] and selecting, by the data collection component, the service metric from the plurality of service metrics based on the one or more benchmark indicators. [Chen, § 3.2, last paragraph: “In our work, the Bayesian network mainly works as a diagnositic tool to infer the relationship between the alerting signals and the outage. We can also use it to select the most relevant features and feed them as the inputs of the outage prediction model.” Chen, § 4.1.2, paragraph 2: “This observation shows that Bayesian network can help us find the most representative and dependent alerting signals. Other signals which are not found to be highly related to outages in the network would not significantly improve the performance.” That is, the signals selected by the Bayesian network are considered to be “filtered” in that it is a sub selection of the original set.] As to claim 5, the combination of Chen and Gugulothu teaches the method of claim 1, as set forth above. Gugulothu further teaches further comprising: encoding the time series data using a recurrent neural network to obtain encoded data; [[0025]: “In an embodiment, the sparse RMDN may include long-short term memory (LSTM) or encoder-decoder (ED) as the underlying recurrent architectures.” See also [0026]. [0032]: “An ED is a seq2seq learning model that contains a pair of RNNs (called encoder and decoder) which are trained simultaneously. Given the input time series the encoder learns a latent representation zt of the time series.”] and decoding the encoded data using a mixture density network to obtain mixture parameters for a plurality of distributions, [[0032]: “The decoder, which has the same structure as the encoder, decodes the hidden state zt to predict y′t+1, . . . , t+p.” The decoder is part of a mixture density network. 
See [0002]: “time series prediction using a sparse recurrent mixture density network (RMDN), such as sparse LSTM-MDN and a sparse ED-MDN, for accurate forecasting of a high variability time series.”; “The outputs of the MDN (represented as 218 in FIG. 2A/258 in FIG. 2B) as formulated in (2) model the conditional distribution of the future values yt+1, …, t+p to be predicted given the latent representation zt expressed as follows.”] wherein the probability distribution information is based on the mixture parameters. [[0036]: “Every forecasted point of the time series is associated with its own mixture of Gaussians. Let K be the total number of mixtures, then each component k∈{1, . . . , K} in the mixture is associated with coefficient ρk, mean μk and standard deviation σk.” [0047]: “To get prediction at time t, a Gaussian mixture k with the one having highest value of probability ρt,k is selected at 218 in FIG. 2A and 258 in FIG. 2B. The selected Gaussian mixture's mean μt,k is selected as the prediction and the standard deviation σt,k confidence estimate of the prediction at 218 in FIG. 2A and 258 in FIG. 2B.”] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Chen and Gugulothu to have also arrived at the limitations of the instant claim. The motivation for doing so is the same as the one given for the teachings of Gugulothu in the rejection of the parent claim, since the techniques of Gugulothu discussed for the instant dependent claim are part of those discussed in the rejection of the parent independent claim. As to claim 6, the combination of Chen and Gugulothu teaches the method of claim 5, as set forth above. Gugulothu further teaches further comprising: generating a mixing coefficient using the mixture density network, wherein the probability distribution information is based on the mixing coefficient. [[0036]: “Every forecasted point of the time series is associated with its own mixture of Gaussians. Let K be the total number of mixtures, then each component k∈{1, . . . , K} in the mixture is associated with coefficient ρk, mean μk and standard deviation σk.”] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Chen and Gugulothu to have also arrived at the limitations of the instant claim. The motivation for doing so is the same as the one given for the teachings of Gugulothu in the rejection of the parent claim, since the techniques of Gugulothu discussed for the instant dependent claim are part of those discussed in the rejection of the parent independent claim. As to claim 11, Chen teaches a method for training a machine learning model, comprising: receiving, by a training component, training data including time series data for a service metric of a computer network and outage data for the computer network; [§ 4.1.1, paragraph 1: “We acquired data at the time step of one hour and obtained over 8,000 samples in total (24hrs*365days). 
…To better deal with the imbalanced data, the SMOTE [2] over-sampling strategy is used for generating the training data from system database so that the positive and negative samples in the training phase can be balanced.”] generating, by the machine learning model, threshold outage information based on the time series data; [Chen, § 3.2, paragraphs 2-3: “In our method, the conditional dependence between the alerting signal and outage given a set of other alerting signals are obtained by calculating the Pearson correlation. The influence of the conditional signals needs to be regressed out first. Then, the Fisher’s z-transform is performed as in [5]. For example, given the time series sequence of an alerting signal Ai and an outage Oi, and the conditional set only contains the alerting signal Ai2, the correlation between Ai and Oi given Ai2 is… z is the correlation score for the alerting signal Ai and outage Oi give Ai2 . The significant test of z is then used to check whether the independence assumption can be accepted that Ai is independent of Oi given Ai2…In our work, the Bayesian network mainly works as a diagnositic tool to infer the relationship between the alerting signals and the outage. We can also use it to select the most relevant features and feed them as the inputs of the outage prediction model.” That is, the correlation score is used as information to determine whether a particular signal meets a relevance threshold.] and training, by the training component, parameters of the machine learning model based on […], the threshold outage information, and the outage data. [Chen, § 3.3, paragraph 2: “The XGBoost is optimized to achieve the best prediction results by minimizing the following loss… where the first term …is the summation of cross-entropy loss measuring whether the classification model and features can well perform for the prediction task across all timestamp.” Note that in this context, “optimized” refers to the training of the model (see also § 4.1.1, paragraph 1, which describes “training data” for the “training phase”).] Chen does not teach: “generating, by a machine learning model, probability distribution information for the service metric based on the time series data” and the training being “based on the probability distribution information.” Gugulothu, which generally relates to time series forecasting using machine learning models (see title and abstract), teaches “generating, by a machine learning model, probability distribution information for the service metric based on the time series data” [[0042]: “The outputs of the MDN (represented as 218 in FIG. 2A/258 in FIG. 2B) as formulated in (2) model the conditional distribution of the future values yt+1, …, t+p to be predicted given the latent representation zt expressed as follows.” [0043]: “Thus, the MDN layer outputs a well-defined joint probability distribution obtained for all the time steps in the forecast time horizon.” Note that the latent representation is based on a time series. 
See [0032]: “Given the input time series the encoder learns a latent representation zt of the time series.”] and the training being “based on the probability distribution information.” [[0043]: “The model parameters are learned by minimizing the negative log-likelihood of the distribution in (3) as shown below”] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Chen with the teachings of Gugulothu by implementing the training and the use of a mixture density network (MDN) for confidence estimation, as taught in Gugulothu, for the outage forecasting application taught in Chen, so as to arrive at the claimed invention. The motivation would have been to use a type of model, suitable for time series forecasting in general, that enables modeling of variability and trend shifts present in the input and determination of a confidence estimate of a prediction, as suggested by Gugulothu (see, e.g., [0023]: “Finally, mixture density networks are used to model the trend shifts and variability present in the data and provide a confidence estimate of the prediction.”). As to claim 12, the combination of Chen and Gugulothu teaches the method of claim 11, as set forth above. Gugulothu further teaches further comprising: encoding, by a recurrent neural network of the machine learning model, the time series data to obtain encoded data; [[0025]: “In an embodiment, the sparse RMDN may include long-short term memory (LSTM) or encoder-decoder (ED) as the underlying recurrent architectures.” See also [0026]. [0032]: “An ED is a seq2seq learning model that contains a pair of RNNs (called encoder and decoder) which are trained simultaneously. Given the input time series the encoder learns a latent representation zt of the time series.”] and decoding, by a mixture density network of the machine learning model, the encoded data to obtain mixture parameters for a plurality of distributions, [[0032]: “The decoder, which has the same structure as the encoder, decodes the hidden state zt to predict y′t+1, . . . , t+p.” The decoder is part of a mixture density network. See [0002]: “time series prediction using a sparse recurrent mixture density network (RMDN), such as sparse LSTM-MDN and a sparse ED-MDN, for accurate forecasting of a high variability time series.”; “The outputs of the MDN (represented as 218 in FIG. 2A/258 in FIG. 2B) as formulated in (2) model the conditional distribution of the future values yt+1, …, t+p to be predicted given the latent representation zt expressed as follows.”] wherein the probability distribution information is based on the mixture parameters. [[0036]: “Every forecasted point of the time series is associated with its own mixture of Gaussians. Let K be the total number of mixtures, then each component k∈{1, . . . , K} in the mixture is associated with coefficient ρk, mean μk and standard deviation σk.” [0047]: “To get prediction at time t, a Gaussian mixture k with the one having highest value of probability ρt,k is selected at 218 in FIG. 2A and 258 in FIG. 2B. The selected Gaussian mixture's mean μt,k is selected as the prediction and the standard deviation σt,k confidence estimate of the prediction at 218 in FIG. 2A and 258 in FIG. 2B.”] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Chen and Gugulothu to have also arrived at the limitations of the instant claim. 
The motivation for doing so is the same as the one given for the teachings of Gugulothu in the rejection of the parent claim, since the techniques of Gugulothu discussed for the instant dependent claim are part of those discussed in the rejection of the parent independent claim. As to claim 13, the combination of Chen and Gugulothu teaches the method of claim 11, as set forth above. Gugulothu further teaches “further comprising: computing, by the training component, a distribution loss based on the probability distribution information and the outage data, wherein the parameters of the machine learning model are updated based on the distribution loss.” [[0043]: “The model parameters are learned by minimizing the negative log-likelihood of the distribution in (3) as shown below.” Note that “learned by minimizing” refers to the training process of the model.] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Chen and Gugulothu to have also arrived at the limitations of the instant claim. The motivation for doing so is the same as the one given for the teachings of Gugulothu in the rejection of the parent claim, since the techniques of Gugulothu discussed for the instant dependent claim are part of those discussed in the rejection of the parent independent claim. As to claim 14, the combination of Chen and Gugulothu teaches the method of claim 11, further comprising: computing, by the training component, a binary cross-entropy loss based on the threshold outage information and the outage label data, wherein the parameters of the machine learning model are updated based on the binary cross-entropy loss. [Chen, § 3.3, paragraph 2: “The XGBoost is optimized to achieve the best prediction results by minimizing the following loss… where the first term …is the summation of cross-entropy loss measuring whether the classification model and features can well perform for the prediction task across all timestamp.” Note that in this context, “optimized” refers to the training of the model (see also § 4.1.1, paragraph 1, which describes “training data” for the “training phase”). The limitation of “outage label data” is taught in § 3.1, paragraph 2: “The outage sequence is a binary time series of length T as O = (O1,O2, ...,OT ) ∈ RT , where for each t ∈ {1, 2, ...,T }, Ot ∈ {1, 0} indicating whether an outage happens at time t or not.”] As to claim 16, Chen teaches an apparatus for outage forecasting, [Chen teaches that its method is implemented on a computer apparatus. See, e.g., § 2, paragraph 2: “When an outage is detected, the management tool is expected to automatically notify, mitigate, and diagnose the outage…our work uses the failure signals from monitors (which we call, alerting signals) to predict critical failures (which we call, outages). In current real-time system, outage prediction is an important issue that bothers many systems.”] comprising: a machine learning model configured to generate […] for a service metric of a computer network based on time series data, [§ 4.1.1, paragraph 2: “Given alerting signals across the cloud system, we predict whether a component or service outage will happen. 
For evaluation, we choose the service and component outages that occurred frequently in the last year…” The alerting signal is a time series, as defined in § 3.1: “A single alerting signal Ai can be represented by Ai …, where each component A_i^t indicates the strength of this signal at time t ∈ (1, T) …The input feature At …are the m alerting signals used for prediction.” The model is a machine learning model, which includes a “Bayesian network” and a “gradient boosting tree” as described in § 3.2 and § 3.3.]

wherein the machine learning model is trained using […] and a classification loss based on a classification output of the machine learning model; [§ 3.3, paragraph 2: “The XGBoost is optimized to achieve the best prediction results by minimizing the following loss… where the first term …is the summation of cross-entropy loss measuring whether the classification model and features can well perform for the prediction task across all timestamp.” Note that in this context, “optimized” refers to the training of the model (see also § 4.1.1, paragraph 1, which describes “training data” for the “training phase”).]

and a forecasting component configured to generate outage forecasting information for the computer network […]. [§ 3.3, paragraph 2: “In the CART tree, each node is one of the alerting signals. The prediction result is the sum of scores predicted by K decision trees…” § 4.1.1, paragraph 2: “Given alerting signals across the cloud system, we predict whether a component or service outage will happen.”]

Chen does not teach the information being “probability distribution information”, the model being trained also using “a distribution loss based on a distribution output of the machine learning model,” and the generation of the outage forecasting information being “based on the probability distribution information.”

Gugulothu, which generally relates to time series forecasting using machine learning models (see title and abstract), teaches generating, by a machine learning model, “probability distribution information” [[0042]: “The outputs of the MDN (represented as 218 in FIG. 2A/258 in FIG. 2B) as formulated in (2) model the conditional distribution of the future values yt+1, …, t+p to be predicted given the latent representation zt expressed as follows.” [0043]: “Thus, the MDN layer outputs a well-defined joint probability distribution obtained for all the time steps in the forecast time horizon.” Note that the latent representation is based on a time series. See [0032]: “Given the input time series the encoder learns a latent representation zt of the time series.”], the machine learning model trained “using a distribution loss based on a distribution output of the machine learning model,” [[0043]: “The model parameters are learned by minimizing the negative log-likelihood of the distribution in (3) as shown below”] and the generation of the forecasting information “based on the probability distribution information.” [See parts cited above. See also [0023]: “Finally, mixture density networks are used to model the trend shifts and variability present in the data and provide a confidence estimate of the prediction.” That is, the confidence estimate is forecasting information.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Chen with the teachings of Gugulothu by implementing the training and the use of a mixture density network (MDN) for confidence estimation, as taught in Gugulothu, for the outage forecasting application taught in Chen, so as to arrive at the claimed invention. The motivation would have been to use a type of model, suitable for time series forecasting in general, that enables modeling of variability and trend shifts present in the input and determination of a confidence estimate of a prediction, as suggested by Gugulothu (see, e.g., [0023]: “Finally, mixture density networks are used to model the trend shifts and variability present in the data and provide a confidence estimate of the prediction.”). As to claim 17, the combination of Chen and Gugulothu teaches the apparatus of claim 16, further comprising: a training component configured to update parameters of the machine learning model. [Chen, § 3.3, paragraph 2: “The XGBoost is optimized to achieve the best prediction results by minimizing the following loss… where the first term …is the summation of cross-entropy loss measuring whether the classification model and features can well perform for the prediction task across all timestamp.” Note that in this context, “optimized” refers to the training of the model (see also § 4.1.1, paragraph 1, which describes “training data” for the “training phase”).] As to claim 18, the combination of Chen and Gugulothu teaches the apparatus of claim 16, further comprising: a memory; and a processor configured to cause the machine learning model and the forecasting component to operate based on instructions stored in the memory. [As noted in the rejection of the parent claim, Chen teaches that its method is implemented on a computer system. Since the instant limitations are generic components of a general-purpose computer, the instant limitations are implied by Chen.] As to claim 19, the combination of Chen and Gugulothu teaches the apparatus of claim 16, further comprising: a data collection component configured to collect the time series data for the service metric. [Chen, § 4.1.1, paragraph 2: “Given alerting signals across the cloud system, we predict whether a component or service outage will happen.” As defined in Chen, § 3.1, A is a multivariate time series (“For all alerting signals, we denote a multivariate time series of length T as A… for all signals”), i.e., a plurality of service metrics. Note that examples of signals are shown in Chen, FIG. 2(a). In regards to the limitation of a “data collection component,” this is taught in the form of a software implementation of the functions stated in the instant claim. Chen implies that its method is implemented using software, since it involves complex operations on large datasets.] 2. Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Gugulothu, and further in view of Wang et al. (US 2012/0136909 A1) (“Wang”). As to claim 4, the combination of Chen and Gugulothu teaches the method of claim 1, but does not teach the further limitations of the instant dependent claim. Wang teaches further comprising: collecting, by a data collection component, data for the service metric for a plurality of instances; [[0025]: “FIG. 4 is a diagram of an example EbAT framework 400. Shown are the data center 108 and various example metrics generated by this data center 108. 
These example metrics include service level metrics 401, system level metrics 402, and platform metrics 403. These metrics are provided to an EbAT enabled device or series of devices.”] and computing, by the data collection component, an aggregate value for the service metric over the plurality of instances at each of a plurality of time steps, wherein the time series data is based on the aggregate value. [[0026]: “Operatively connected to the processor 501 is an aggregation module 506 to generate an entropy time series through transforming the at least one vector value into an entropy value to be displayed as part of a look-back window.” [0031]: “Operation 1005 is executed to generate an aggregated entropy time series for the parent through transforming all of the vector values for all of the children into the aggregated entropy time series.” [0020]: “The resulting entropy distributions aggregate raw metric data across the utility cloud to form “entropy time series,” and in addition, each hierarchy of the cloud (e.g., H-Crossing components such as data center, container, rack, enclosure, node, socket, core) can generate higher level from lower level entropy time series.”] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Wang by implementing the aggregation technique so as to arrive at the claimed invention of the instant claim. The motivation for doing so would have been to analyze entropy distributions aggregate raw metric data across a cloud, which is a form of information that is indicative of anomalies (see Wang, [0020]: “The resulting entropy distributions aggregate raw metric data across the utility cloud to form “entropy time series,”… Tools, such as spike detecting (e.g., visually or using time series analysis), signal processing or subspace method, are used to identify anomalies in entropy time series in general and at each level of the hierarchy.”). 3. Claims 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Gugulothu, and further in view of Yang et al. (US 2022/0382856 A1) (“Yang”). As to claim 7, the combination of Chen and Gugulothu teaches the method of claim 5, further comprising: decoding the encoded data [As noted above in the rejection of claim 5, the step of decoding is taught as part of the model structure of Gugulothu incorporated in the combined teachings of the references.] The combination of references thus far does not teach “using a classification network to obtain threshold outage information, wherein the outage forecasting information is based on the threshold outage information.” Yang teaches “using a classification network to obtain threshold outage information, wherein the outage forecasting information is based on the threshold outage information.” [[0046]: “A point is labeled as an anomaly if its anomaly score is larger than a certain threshold, and thus the anomaly output 255 can be generated.” Since the anomaly score is compared to a threshold, it is considered to be a threshold information (note that the context of “outage” is already taught by the existing references), and the label is analogous to information that is based on threshold outage information.] 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Yang by implementing the model of Chen, as modified by Gugulothu, to include a classification network so as to arrive at the claimed invention. The motivation for doing so would have been to determine a causal graph from a multivariate time series input, which can be used to train a model for time series analysis (see Yang, [0018]: “…Specifically, a causal graph is determined from a multivariate time series input, which describes the causal relationship between a plurality of causal variables in the multivariate time series input… The causal structure can then be served as a condition for a machine learning model to predict a conditional distribution of the multivariate time series, which is used to train the machine learning model.”), particularly in a manner that decomposes the training complexity, which improves computational efficiency in training for real world applications (see Yang, [0019]: “The decomposability of the anomaly detection problem also helps to decompose the training complexity, which improves computational efficiency in training for real world applications and in particular for root cause analysis.”).

As to claim 8, the combination of Chen and Gugulothu teaches the method of claim 1, but does not teach the further limitations of the instant dependent claim. Yang teaches further comprising: generating, by a change attribution component, a causal graph based on the time series data for the service metric and deployment data related to the service metric; [[0018]: “Specifically, a causal graph is determined from a multivariate time series input, which describes the causal relationship between a plurality of causal variables in the multivariate time series input. A subset of causal variables are than derived as having no causal parents in the causal graph. A causal structure of the multivariate time series can then be determined based on the subset of causal variables.” Note that the “multivariate time series input” is analogous to both the service metric and deployment data, since deployment data can also be another service metric that is related to the first service metric. See also [0050]: “At step 302, the system may receive, via a data interface, training data containing a multivariate time series input (e.g., the multivariate time series data 230 in FIG. 2A) over a period of time.” [0016]: “For example, some existing systems may treat each performance metric individually using univariate time series anomaly detection algorithms, or alternatively treat all the performance metrics as an entity using multivariate time series anomaly detection algorithms.”]

and aggregating, by the change attribution component, causality information for a service based on the causal graph, wherein the outage forecasting information is based on the aggregated causality information. [[0018]: “Specifically, a causal graph is determined from a multivariate time series input, which describes the causal relationship between a plurality of causal variables in the multivariate time series input. A subset of causal variables are than derived as having no causal parents in the causal graph. A causal structure of the multivariate time series can then be determined based on the subset of causal variables.
4. Claims 9 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Gugulothu and Yang, and further in view of Fattu et al. (US 2020/0162342 A1) (“Fattu”).

As to claim 9, the combination of Chen, Gugulothu, and Yang teaches the method of claim 8, as set forth above. Fattu teaches “further comprising: filtering, by the change attribution component, a set of service changes corresponding to the deployment data based on the aggregated causality information.” [[0004]: “The preventative action module communicates a ranked set of recommended actions for mitigating a predicted failure of one or more of the replaceable devices.” [0095]: “For example, in response to a recommended action to replace one of the replaceable devices 118, the visual map displays a symbol with an easily visible color such as red, that illustrates the location or coordinates of replaceable device 118 which is to be replaced.” Replacement of a device corresponds to a service change, and ranking a set of recommended actions for such replacements constitutes an act of filtering. Additionally, the concept of being based on a predicted failure is analogous to “based on the aggregated causality information” when applied to the existing combination of references, since a recommended action based on a prediction is also based on the causality information used to train the model. The limitation of “corresponding to deployment data” is met in the combination of references because Chen already teaches the service metric, as discussed above, and the data corresponding to a service metric can also correspond to deployment data.]

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Fattu by implementing its technique of determining recommended actions in response to a predicted failure, so as to arrive at the limitations of the instant dependent claim. Doing so would enable mitigation of a predicted failure, as suggested by Fattu ([0004]: “A preventative action module is also included in the apparatus. The preventative action module communicates a ranked set of recommended actions for mitigating a predicted failure of one or more of the replaceable devices, the failure predicted based on a comparison between current weighted performance and environmental measurements and the baseline correlations”).

As to claim 20, the combination of Chen and Gugulothu teaches the apparatus of claim 16, but does not teach the further limitations of the instant dependent claim. Fattu teaches “further comprising: a change attribution component configured to filter a set of service changes corresponding to deployment data based on the probability distribution information.” [[0004]: “The preventative action module communicates a ranked set of recommended actions for mitigating a predicted failure of one or more of the replaceable devices.” [0095]: “For example, in response to a recommended action to replace one of the replaceable devices 118, the visual map displays a symbol with an easily visible color such as red, that illustrates the location or coordinates of replaceable device 118 which is to be replaced.” Replacement of a device corresponds to a service change, and ranking a set of recommended actions for such replacements constitutes an act of filtering. Additionally, the concept of being based on a predicted failure is analogous to “based on the probability distribution information” when applied to the existing combination of references. The limitation of “corresponding to deployment data” is met in the combination of references because Chen already teaches the service metric, as discussed above.]

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Fattu by implementing its technique of determining recommended actions in response to a predicted failure, so as to arrive at the limitations of the instant dependent claim. Doing so would enable mitigation of a predicted failure, as suggested by Fattu ([0004]: “A preventative action module is also included in the apparatus. The preventative action module communicates a ranked set of recommended actions for mitigating a predicted failure of one or more of the replaceable devices, the failure predicted based on a comparison between current weighted performance and environmental measurements and the baseline correlations”).
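The filtering limitation mapped to Fattu's ranked recommended actions can be pictured as scoring candidate service changes against the degraded metric and keeping the high scorers in ranked order. The change names, the scores, and the 0.5 cutoff in the sketch below are assumptions made purely for illustration, not details from Fattu or the other cited references.

```python
# Hypothetical recent service changes paired with an aggregated causality score
# for the degraded metric; both the changes and the scores are illustrative.
service_changes = [
    {"change": "deploy build 1432", "causal_score": 0.82},
    {"change": "config flag flip",  "causal_score": 0.10},
    {"change": "cert rotation",     "causal_score": 0.03},
]

CUTOFF = 0.5  # assumed relevance cutoff

# Filter to the changes most plausibly linked to the predicted outage, then rank
# them, loosely mirroring Fattu's "ranked set of recommended actions".
suspects = sorted(
    (c for c in service_changes if c["causal_score"] >= CUTOFF),
    key=lambda c: c["causal_score"],
    reverse=True,
)

print(suspects)  # [{'change': 'deploy build 1432', 'causal_score': 0.82}]
```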
5. Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Gugulothu, and further in view of Mehta et al. (US 2020/0184355 A1) (“Mehta”).

As to claim 10, the combination of Chen and Gugulothu teaches the method of claim 1, but does not teach the further limitations of the instant dependent claim. Mehta teaches “further comprising: determining, by the forecasting component, that a likelihood of an outage in the computer network exceeds a threshold based on the probability distribution information; and transmitting, by the forecasting component, an alert based on the determination.” [[0073]: “An alert is generated at the dashboard and/or disseminated via the network 120 when an outage prediction score exceeds a predefined prediction threshold. The prediction threshold may be customized by the application owner such that an alert is generated when the probability of an outage reaches an unacceptable level. The alert is intended to inform the application owner that the probability of an outage based on the input log data has reached the unacceptable level, thereby affording the application owner an opportunity to take prophylactic measures.” Note that the score is a classification of “likelihood” (probability). See [0069]: “As a result of applying the classifier algorithm to the several data inputs, the incident prediction engine 360 generates incident prediction scores classifying the likelihood of any known incidents occurring.” Therefore, the basis for the score as disclosed here is analogous to the probability distribution information of the instant claim.]

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Mehta by implementing the alerting technique as taught by Mehta so as to arrive at the limitations of the instant dependent claim. The motivation for doing so would have been to enable “real-time diagnostic monitoring as well as alerts and outage counter-measures” (Mehta, [0065]: “The process flow involves inputting various data, which collectively make up decision criteria, to the incident prediction engine 360, and, based on the input data, providing real-time diagnostic monitoring as well as alerts and outage counter-measures.”).
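Mehta's alert-on-threshold behavior amounts to comparing a forecast outage probability against a configurable cutoff and alerting when it is exceeded. A minimal sketch of that comparison follows; the 0.7 threshold and the service name are assumptions for illustration only.

```python
import logging

logging.basicConfig(level=logging.INFO)

ALERT_THRESHOLD = 0.7  # assumed "unacceptable level"; Mehta lets the owner tune this

def maybe_alert(outage_probability: float, service: str) -> bool:
    """Emit an alert when the forecast outage probability crosses the threshold."""
    if outage_probability > ALERT_THRESHOLD:
        logging.warning("Outage risk %.0f%% for %s exceeds threshold, alerting",
                        100 * outage_probability, service)
        return True
    return False

maybe_alert(0.85, "checkout-api")  # triggers an alert
maybe_alert(0.20, "checkout-api")  # does not
```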
6. Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Gugulothu, and further in view of Zhang et al., “Modeling Extreme Events in Time Series Prediction,” KDD ’19, August 4–8, 2019, Anchorage, AK, USA (“Zhang”).

As to claim 15, the combination of Chen and Gugulothu teaches the method of claim 11, further comprising: computing, by the training component, an […] loss based on the threshold outage information and the outage label data, wherein the parameters of the machine learning model are updated based on the extreme value loss. [Chen, § 3.3, paragraph 2: “The XGBoost is optimized to achieve the best prediction results by minimizing the following loss… where the first term …is the summation of cross-entropy loss measuring whether the classification model and features can well perform for the prediction task across all timestamp.” Note that in this context, “optimized” refers to the training of the model (see also § 4.1.1, paragraph 1, which describes “training data” for the “training phase”). The limitation of “outage label data” is taught in § 3.1, paragraph 2: “The outage sequence is a binary time series of length T as O = (O1, O2, ..., OT) ∈ R^T, where for each t ∈ {1, 2, ..., T}, Ot ∈ {1, 0} indicating whether an outage happens at time t or not.”] The combination of references thus far does not teach the limitation of the loss including an “extreme value loss.” Zhang teaches an “extreme value loss.” [Abstract: “To address this issue, we take inspirations from the Extreme Value Theory, developing a new form of loss called Extreme Value Loss (EVL) for detecting the future occurrence of extreme events.”]

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Zhang by implementing an extreme value loss so as to arrive at the limitations of the instant claim. The motivation would have been to implement a form of loss that enables time series prediction with extreme events (see Zhang, Abstract: “By incorporating EVL with an adapted memory network module, we achieve an end-to-end framework for time series prediction with extreme events”).
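For claim 15, the cross-entropy term in Chen's training objective is modified to include an extreme value loss that emphasizes rare extreme events. The sketch below uses a plain class-weighted binary cross-entropy that up-weights the rare outage class as a simplified stand-in; it is not Zhang's actual EVL formulation, whose weights are derived from Extreme Value Theory, and the weights, predictions, and labels are illustrative assumptions.

```python
import math

def weighted_bce(p_outage, label, w_outage=5.0, w_normal=1.0, eps=1e-9):
    """Class-weighted binary cross-entropy that up-weights the rare outage class.

    A simplified stand-in for an outage-sensitive loss term, not Zhang's exact EVL.
    """
    p = min(max(p_outage, eps), 1 - eps)  # clamp to avoid log(0)
    if label == 1:
        return -w_outage * math.log(p)
    return -w_normal * math.log(1 - p)

# Toy predictions scored against the binary outage sequence O = (O1, ..., OT).
predictions = [0.05, 0.10, 0.90, 0.20]
outage_labels = [0, 0, 1, 0]

loss = sum(weighted_bce(p, y) for p, y in zip(predictions, outage_labels)) / len(outage_labels)
print(f"mean loss: {loss:.4f}")
```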
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following documents depict the state of the art. US 20210304037 A1 teaches conventional techniques in outage detection. Yang et al., “Joint Image Emotion Classification and Distribution Learning via Deep Convolutional Neural Network,” Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), teaches the use of both a classification loss and a distribution loss.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG, whose telephone number is (571) 270-1764. The examiner can normally be reached Monday - Friday, 9:00 am - 5:30 pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Y.D.H./Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124

Prosecution Timeline

Mar 24, 2022
Application Filed
Feb 07, 2026
Non-Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12536455
Method for Early Warning Brandish of Transmission Wire Based on Improved Bayes-Adaboost Algorithm
2y 5m to grant Granted Jan 27, 2026
Patent 12517958
SYSTEM AND METHOD FOR NEXT STEP PREDICTION OF ICS FLOW USING ARTIFICIAL INTELLIGENCE/MACHINE LEARNING
2y 5m to grant Granted Jan 06, 2026
Patent 12518218
DYNAMICALLY SCALABLE MACHINE LEARNING MODEL GENERATION AND RETRAINING THROUGH CONTAINERIZATION
2y 5m to grant Granted Jan 06, 2026
Patent 12488279
DOMAIN-SPECIFIC CONSTRAINTS FOR PREDICTIVE MODELING
2y 5m to grant Granted Dec 02, 2025
Patent 12475373
INFORMATION PROCESSING APPARATUS AND METHOD AND PROGRAM FOR GENERATING INTEGRATED MODEL
2y 5m to grant Granted Nov 18, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2
Expected OA Rounds
63%
Grant Probability
95%
With Interview (+31.9%)
3y 11m
Median Time to Grant
Low
PTA Risk
Based on 124 resolved cases by this examiner. Grant probability derived from career allow rate.
