Last updated: May 29, 2026
Application No. 17/393,130
Learning Causal Relationships

Non-Final OA §103
Filed
Aug 03, 2021
Examiner
VANWORMER, SKYLAR K
Art Unit
2146
Tech Center
2100 — Computer Architecture & Software
Assignee
International Business Machines Corporation
OA Round
4 (Non-Final)
Interview Optional

Interview lift: +22.5%. An interview has already been conducted in this application's prosecution history. This examiner has a 39% grant rate. Since an interview has already been tried, recommend a written response with narrowed claims based on precedent claim evolution patterns.
Based on 28 resolved cases, 2023–2026
Examiner Intelligence

VANWORMER, SKYLAR K
Grants only 39% of cases
Career Allowance Rate
11 granted / 28 resolved
-15.7% vs TC avg
Strong +22% interview lift
+22.5%
Interview Lift
resolved cases with interview
Typical timeline
4y 0m
Avg Prosecution
14 currently pending
Statute-Specific Performance

§101
2.7%
-37.3% vs TC avg
§103
96.6%
+56.6% vs TC avg
§112
0.7%
-39.3% vs TC avg
Black line = Tech Center average estimate • Based on career data from 28 resolved cases
Office Action

§103
DETAILED ACTION
Claims 1-25 are pending.
Claims 1, 7, 13, 19 and 24 are independent.
Claims 1, 6-7, 12-13, 18-19 and 24 are amended.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-25 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. The newly used prior art Ma et al (ServiceRank: Root Cause Identification of Anomaly in Large-Scale Microservice Architectures, “Ma”) is used in combination with previously cited art Aggarwal, Briancon, Bender and Zhou. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-2, 5, 7-8, 11, 13-14, 17, 19, 21-23 and 24-25 are rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal et al (Localization of Operational Faults in Cloud Applications by Mining Causal Dependencies in Logs Using Golden Signals, "Aggarwal"), in view of Briancon et al (US Published Patent Application No. 20200380417, "Briancon"), Bender et al (Lowest common ancestors in trees and directed acyclic graphs, "Bender") and in further view of Zhou et al (Latent Error Prediction and Fault Localization for Microservice Applications by Learning from System Trace Logs, "Zhou") and Ma et al (ServiceRank: Root Cause Identification of Anomaly in Large-Scale Microservice Architectures, "Ma").

In regard to claim 1 and analogous claims 7, 13, and 19, Aggarwal teaches a staging manager that operates in a pre-deployment environment to learn causal relationships between two or more application micro-services, including: (Aggarwal, pg. 141, paragraph 1, “Next, we use two Granger causality techniques: regression based and independence testing based to infer the causal relationship among micro-services [learn causal relationships between two or more application micro-services]. Causal dependencies indicate the strength of the correlation between the errors in various micro-services.” And Fig. 1,
[Figure: media_image1.png], pre-processing being interpreted as the pre-deployment environment.)
collecting first micro-service error log data generated offline, the first micro-service error log data corresponding to one or more selectively injected errors; (Aggarwal, pg. 144, Dataset Details, paragraph 1, “We use the TrainTicket application [2], an open-source micro-service application, to inject faults and generate log data to evaluate the effectiveness of our proposed approach. The application contains 41 micro-services. Service ts-ui-dashboard acts as the gateway service which records the status of each incoming and outgoing service. The error signals emitted by this service and capturing the failure of a request are considered as golden signals. We use Istio [1] to inject HTTP abort fault in multiple services. In abort fault, incoming request is intercepted by Istio and returns 500 error status. Users can also specify the percentage of requests that should be failed. We injected faults in 17 services that cover the main flow of the train ticket application. We generated data by running a scenario where 100% incoming requests are failed.”). Aggarwal does not expressly teach that the log data is [generated offline]. However, Zhou teaches this limitation (Zhou, pg. 685, Col. 1, paragraph 3, “Our system implementation uses Istio to manage asynchronous microservice interactions during offline training and collect trace logs during both offline training and online prediction.”). Aggarwal and Zhou are both related to the same field of endeavor (i.e. error detection). In view of the teachings of Zhou, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Zhou to Aggarwal before the effective filing date of the claimed invention in order to achieve high accuracy of prediction of errors. (Zhou, Col. 2, abstract, paragraph 3, “The results indicate that MEPFL can achieve high accuracy in intraapplication prediction of latent errors, faulty microservices, and fault types, and outperforms a state-of-the-art approach of failure diagnosis for distributed systems.”).
Aggarwal further discloses generating a learned causal graph based on the collected first micro-service error log data, the learned causal graph representing dependency of application micro-services effected by the selective error injection; (Aggarwal, pg. 142, Causal Inference, paragraph 1, “We infer causal relationships among the error signals emitted by individual micro-services and the golden signal errors, after modeling the log data as multiple time series. We assume that the anomalous behavior of a faulty component is likely to result in error signals being emitted by neighboring components (micro-services), which are components that interact with the faulty component either directly or indirectly. Different from association and correlation, causality is used to represent a direct “cause-effect” relation. Figure 1 shows a sample graph where the nodes correspond to micro-services and edges represent the cause and depends on relationship. The direction of causality is reverse of the direction of dependency.” And pg. 143, paragraph 2, “The inputs to the PageRank algorithm are the graphs (causal and dependency), the golden signal errors and the causal score of each node with the golden signal errors [based on the collected first micro-service error log data, the learned causal graph representing dependency of application micro-services effected by the selective error injection;]. Let CSi define the causal score of node i with respect to the golden signal errors. We derive the anomalous sub-graph from both dependency and causal graphs [generate a learned causal graph] by preserving the nodes that cause golden signal errors (candidate nodes) and their direct connections. Considering that the request flow is from node i to node j, the weight of each edge eij is assigned to the value of CSj, the weight of each added self-edge eii is assigned to the value of CSi, and the weight of each added backward edge eji is assigned to the value of ρCSi, where ρ ∈ [0, 1]. We set ρ to a high value if the causal graph represents the true dependency graph. As error propagation happens in the opposite direction of request flow, we reverse the direction of the edges when applying the PageRank algorithm.”).
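The edge-weighting scheme in the quoted Aggarwal passage can be illustrated with a small sketch. It assigns CSj to each forward edge, CSi to each self-edge and ρCSi to each backward edge, reverses the edges, and then ranks nodes with a plain power-iteration PageRank. The service names, causal scores, ρ value, damping factor and function names are hypothetical and do not come from Aggarwal or the application. Aggarwal relies on the Personalized PageRank method of its reference [11], which this simplified version does not reproduce.

```python
def build_weighted_edges(request_edges, causal_score, rho):
    """Weight edges as in the quoted passage, then reverse their direction.

    request_edges: (i, j) pairs meaning requests flow from i to j.
    causal_score:  node -> CS value (hypothetical numbers).
    rho:           backward-edge factor in [0, 1].
    """
    weights = {}
    for i, j in request_edges:
        weights[(i, j)] = causal_score[j]        # forward edge e_ij gets CS_j
        weights[(j, i)] = rho * causal_score[i]  # backward edge e_ji gets rho * CS_i
    for node, score in causal_score.items():
        weights[(node, node)] = score            # self-edge e_ii gets CS_i
    # Errors propagate against the request flow, so every edge is reversed.
    return {(dst, src): w for (src, dst), w in weights.items()}


def pagerank(weights, nodes, damping=0.85, iterations=100):
    """Plain weighted PageRank by power iteration (not the personalized variant)."""
    rank = {n: 1.0 / len(nodes) for n in nodes}
    out_total = {n: 0.0 for n in nodes}
    for (src, _dst), w in weights.items():
        out_total[src] += w
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for (src, dst), w in weights.items():
            if out_total[src] > 0:
                nxt[dst] += damping * rank[src] * w / out_total[src]
        # Nodes without outgoing weight spread their rank uniformly.
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0)
        for n in nodes:
            nxt[n] += damping * dangling / len(nodes)
        rank = nxt
    return rank


# Hypothetical three-service chain: gateway -> order -> payment.
causal_score = {"gateway": 0.2, "order": 0.5, "payment": 0.9}
edges = build_weighted_edges(
    [("gateway", "order"), ("order", "payment")], causal_score, rho=0.1
)
ranks = pagerank(edges, list(causal_score))
for service in sorted(ranks, key=ranks.get, reverse=True):
    print(service, round(ranks[service], 3))
```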
Aggarwal further discloses a director, operatively coupled to the production manager, configured to identify the micro-service associated with the identified error source. (Aggarwal, pg. 142, paragraph 4, “In order to track the causal dependencies among time series instantly, [28] developed a novel Bayesian Lasso-Granger method, BLasso, which conducts the causal inference from the Bayesian perspective [19] in a sequential online mode. We use BLinear and BLasso regression based methods to infer the causality graph of micro-services. For conditional independence based causal inference, we use the PC − Algorithm [10,20]. The algorithm starts from a complete, undirected graph and deletes recursively the edges based on conditional independence decisions. We leverage a cross entropy based metric, namely G2, to test whether two services are dependent on one another or not. The micro-services which cause golden signal errors are identified as potential source of fault [to identify the micro-service associated with the identified error source.].”).
Aggarwal does not explicitly teach an artificial intelligence (AI) platform in communication with the computer processor and memory, the AI platform comprising:
a production manager operatively coupled to the staging manager, the production manager operates in a production environment to dynamically localize a source of an application error, including:
collecting second micro-service error log data generated online, the second micro-service error log data corresponding to the application error; 
building a correlation matrix based on the learned causal graph and the collected second micro-service error log data, wherein the correlation matrix indicates whether a first application micro-service represented in the causal graph is an ancestor of a second application micro-service represented in the second micro-service error log; 
leveraging the correlation matrix to identify the source of the application error based on an ancestry of the application micro-services indicated in the correlation matrix; and
However, Zhou further teaches a production manager operatively coupled to the staging manager, the production manager operates in a production environment to dynamically localize a source of an application error, including: (Zhou, pg. 684, Col. 1, paragraph 2, “To allow developers to resolve microservice application failures efficiently in the production environment [a production manager operatively coupled to the staging manager], it is desirable and yet challenging that these microservice application failures can be detected and the faults can be located at runtime of the production environment, e.g., based on application logs or system logs [the production manager operates in a production environment to dynamically localize a source of an application error,]. Application logs record the internal status and events during the execution of an application.”)
collecting second micro-service error log data generated online, the second micro-service error log data corresponding to the application error; (Zhou, pg. 685, Col. 1, paragraph 3, “Our system implementation uses Istio to manage asynchronous microservice interactions during offline training and collect trace logs during both offline training and online prediction.”)
Aggarwal and Zhou are both related to the same field of endeavor (i.e. error detection). In view of the teachings of Zhou, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Zhou to Aggarwal before the effective filing date of the claimed invention in order to achieve high accuracy of prediction of errors. (Zhou, Col. 2, abstract, paragraph 3, “The results indicate that MEPFL can achieve high accuracy in intraapplication prediction of latent errors, faulty microservices, and fault types, and outperforms a state-of-the-art approach of failure diagnosis for distributed systems.”).
Although Aggarwal, figure 1 and pg. 142-143, Causal Inference, teaches generating causal and dependency graphs [learned causal graph] based on micro-service log data, where the nodes correspond to micro-services and edges represent the cause and depends on relationship where the direction of causality is reverse of the direction of dependency, and provides a list of nodes which are potentially faulty [source of the error], Aggarwal and Zhou do not explicitly teach an artificial intelligence (AI) platform in communication with the computer processor and memory, the AI platform comprising:
building a correlation matrix based on the learned causal graph and the collected second micro-service error log data, wherein the correlation matrix indicates whether a first application micro-service represented in the causal graph is an ancestor of a second application micro-service represented in the second micro-service error log; 
leveraging the correlation matrix to identify the source of the application error based on an ancestry of the application micro-services indicated in the correlation matrix; and
However, Bender teaches leveraging the correlation matrix to identify the source of the application error based on an ancestry of the application micro-services indicated in the correlation matrix; and (Bender, pg. 85, 4. Finding all-pairs LCA in DAGS, “In our algorithms we compute the answers to all (n choose 2) queries in the preprocessing stage. Then we answer queries by performing table lookups. We show how to build the binary common-ancestor-existence matrix in Õ(n^ω) operations and the representative-LCA matrix in Õ(n^((ω+3)/2)) operations [leverage the ancestral matrix to identify the source of the error]. The fastest known matrix-multiplication algorithm to date runs in O(n^ω) where ω ≈ 2.376 [10]. Thus, our all-pairs-common-ancestor-existence algorithm runs in time Õ(n^2.376), and our all-pairs-representative-LCA algorithm runs in time Õ(n^2.688) [based on an ancestry of the application micro-services indicated in the correlation matrix;].”)
Aggarwal, Zhou and Bender are all related to the same field of endeavor (i.e. error detection). In view of the teachings of Bender, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Bender to Aggarwal and Zhou before the effective filing date of the claimed invention in order to provide efficient optimal algorithms in trees. (Bender, pg. 87, paragraph 5, “Observe that the L has no edge weights greater than n since n is the depth of the deepest node in the graph. We set ε = n^((ω−3)/2) which leads to a running time of Õ(n^((ω+3)/2)) for the approximate-shortest-path computation. The greatest possible distance error is n^((ω−1)/2). Thus the algorithm identifies 2n^((ω−1)/2) possible LCAh candidates for every pair. We perform a transitive closure operation (using fast matrix multiplication) to allow for fast access to a node’s ancestors.”).
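To make the ancestor-lookup idea in the Bender passages concrete, the sketch below precomputes a Boolean ancestor matrix for a small DAG and then answers common-ancestor queries by table lookup. It uses Warshall's cubic-time transitive closure rather than the fast matrix multiplication Bender describes, so it illustrates only the lookup structure, not Bender's running times. The node names and function names are hypothetical.

```python
def ancestor_matrix(nodes, edges):
    """Return (index, reach) where reach[a][b] is True when a is a proper ancestor of b."""
    index = {name: k for k, name in enumerate(nodes)}
    size = len(nodes)
    reach = [[False] * size for _ in range(size)]
    for parent, child in edges:
        reach[index[parent]][index[child]] = True
    # Warshall's algorithm: allow node k as an intermediate step on a path i -> j.
    for k in range(size):
        for i in range(size):
            if reach[i][k]:
                for j in range(size):
                    if reach[k][j]:
                        reach[i][j] = True
    return index, reach


def common_ancestors(index, reach, a, b):
    """List every node that is a proper ancestor of both a and b (table lookups only)."""
    return sorted(
        name for name, k in index.items()
        if reach[k][index[a]] and reach[k][index[b]]
    )


# Hypothetical diamond-shaped DAG: db -> auth -> ui and db -> order -> ui.
nodes = ["db", "auth", "order", "ui"]
edges = [("db", "auth"), ("db", "order"), ("auth", "ui"), ("order", "ui")]
index, reach = ancestor_matrix(nodes, edges)
print(common_ancestors(index, reach, "auth", "order"))
```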
Although Aggarwal teaches using the TrainTicket web application to inject faults and generate log data in order to infer causal relationships among the error signals emitted by individual micro-services and the golden signal errors, by modeling the log data as multiple time series, see Aggarwal p. 142 (Causal Inference), p. 144 (Dataset Details), Aggarwal, Zhou and Bender do not explicitly teach an artificial intelligence (AI) platform in communication with the computer processor and memory, the AI platform comprising:
building a correlation matrix based on the learned causal graph and the collected second micro-service error log data, wherein the correlation matrix indicates whether a first application micro-service represented in the causal graph is an ancestor of a second application micro-service represented in the second micro-service error log;
However, Briancon teaches an artificial intelligence (AI) platform in communication with the computer processor and memory, the AI platform comprising: (Briancon, paragraph 0057, “A data domain that transforms data into objects, and an AI (artificial intelligence) domain [an artificial intelligence (AI) platform] that transforms these objects into results.” And paragraph 0263, “Computing system 1000 may include one or more processors (e.g., processors 1010a-1010n) coupled to system memory 1020, an input/output I/O device interface 1030, and a network interface 1040 via an input/output (I/O) interface 1050. A processor may include a single processor or a plurality of processors (e.g., distributed processors). A processor may be any suitable processor capable of executing or otherwise performing instructions. A processor may include a central processing unit (CPU) that carries out program instructions to perform the arithmetical, logical, and input/output operations of computing system 1000. A processor may execute code (e.g., processor firmware, a protocol stack, a database management system, an operating system, or a combination thereof) that creates an execution environment for program instructions. A processor may include a programmable processor. A processor may include a Graphic Processing Unit (GPU). A processor may include general or special purpose microprocessors. A processor may receive instructions and data from a memory [communication with the computer processor and memory,] (e.g., system memory 1020).”)
Aggarwal, Zhou, Bender and Briancon are all related to the same field of endeavor (i.e. error detection). In view of the teachings of Briancon, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Briancon to Aggarwal, Zhou and Bender before the effective filing date of the claimed invention in order to deploy an entire pipeline efficiently. (Briancon, paragraph 0062, “In some embodiments, a sheet may be compiled into a single archive allowing runtime to efficiently deploy an entire pipeline ( or a selected portion,), including its classes and their associated resources, in a single request.”).
However, Aggarwal, Zhou, Bender and Briancon do not explicitly teach building a correlation matrix based on the learned causal graph and the collected second micro-service error log data, wherein the correlation matrix indicates whether a first application micro-service represented in the causal graph is an ancestor of a second application micro-service represented in the second micro-service error log;
Ma teaches building a correlation matrix based on the learned causal graph and the collected second micro-service error log data, wherein the correlation matrix indicates whether a first application micro-service represented in the causal graph is an ancestor of a second application micro-service represented in the second micro-service error log; (Ma, pg. 3095, Col. 1, 4.6, paragraph 1, “Let G(V, E) be the constructed impact graph from T [learned causal graph], where each node v_i ∈ V indicates a service, each edge e_ij ∈ E is set to 1 when service v_i is impacted by v_j [the collected second micro-service error log data]. Other inputs of the root cause identification stage include a front-end service v_fe and correlation score matrix C [building a correlation matrix]. For simplicity, we use the notation of the original correlation score c_i,j in the following, regardless of whether calibrated or not. Our algorithm proposed to find the root cause is inspired by the random walk algorithm, which is analogous to the human behavior during the manual investigation. Assuming that we have no domain knowledge about the anomaly, only the discovered impact graph G(V, E) and correlation score C, one of the natural diagnosis methods is to randomly traverse services following G with preferentially looking for a high correlation score c_i,j regard to the anomaly in frontend service v_fe. The random walk should consider not only the correlation of currently-visiting service, but also the correlation between the previously visited service and v_fe [an ancestor]. For example, when the correlation of the previous and current node is both high, moving forward may find the root cause.”)
Aggarwal, Zhou, Bender, Briancon and Ma are all related to the same field of endeavor (i.e. error detection). In view of the teachings of Ma, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Ma to Aggarwal, Zhou, Bender and Briancon before the effective filing date of the claimed invention in order to improve the accuracy of diagnosis of an anomaly. (Ma, pg. 3091, Col. 1, paragraph 1, “On the other hand, ServiceRank can effectively utilize various expert knowledge to further improve the accuracy and efficiency of diagnosis.”)
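The correlation-guided traversal described in the Ma passage can be sketched as a random walk whose next step is chosen in proportion to each neighbour's correlation score with the front-end service. This is a deliberately simplified illustration: it ignores the previously visited service, which Ma says the walk should also consider, and it is not Ma's ServiceRank algorithm. The graph, scores, restart rule and function names are all hypothetical.

```python
import random


def transition_probabilities(graph, correlation, node):
    """Weight each neighbour of `node` by its correlation score with the front end."""
    neighbours = graph.get(node, [])
    total = sum(correlation[n] for n in neighbours)
    if total == 0:
        return {}
    return {n: correlation[n] / total for n in neighbours}


def random_walk(graph, correlation, start, steps, seed=0):
    """Count visits per service; restart at `start` on a dead end (an assumption)."""
    rng = random.Random(seed)
    visits = {n: 0 for n in correlation}
    node = start
    for _ in range(steps):
        probs = transition_probabilities(graph, correlation, node)
        if not probs:
            node = start
        else:
            names = list(probs)
            node = rng.choices(names, weights=[probs[n] for n in names])[0]
        visits[node] += 1
    return visits


# Hypothetical impact graph: an edge a -> b means a is impacted by b.
graph = {"frontend": ["cart", "search"], "cart": ["db"], "search": [], "db": []}
correlation = {"frontend": 1.0, "cart": 0.9, "search": 0.1, "db": 0.8}
visits = random_walk(graph, correlation, "frontend", steps=1000)
print(visits)
```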


In regard to claim 2 and analogous claims 8 and 14, Aggarwal, Briancon, Bender, Zhou and Ma teach the system of claim 1. 
Briancon further teaches wherein the causal relationship learning between two or more application micro-services, further comprises the staging manager to filter the collected first micro-service error log data to selectively remove a subset of first error log data. (Briancon, paragraph 0379, “The medium of any one of embodiments 15D-22D, wherein the plurality of object-orientation modelors comprise: ingestion modelors used to control schema drift of the datasets and add version numbers to the datasets; landing modelors used to clean error records in the datasets and update the version numbers of the datasets [filter the collected first micro-service error log data to selectively remove a subset of first error log data.]; curation modelors used to normalize the datasets, by adding primary surrogate keys, and update the version numbers of the datasets; dimensional modelors used to encode the datasets in dimensional star schema and update the version numbers of the datasets; and feature and label modelors used to: change the datasets from dimensional star schema to denormalized flat table; adjust granularity of the datasets; and update the version numbers of the datasets.”)
Aggarwal and Briancon are combinable for the same rationale as set forth above with respect to claim 1.

In regard to claim 5 and analogous claims 11 and 17, Aggarwal, Briancon, Bender, Zhou and Ma teach the system of claim 1. 
Aggarwal further teaches further comprising the staging manager configured to apply transitive reduction to the learned causal graph. (Aggarwal, pg. 143, paragraph 1 and 2, “We experiment with both dependency and causal graphs and use the extracted anomalous sub-graph (nodes having causal scores with golden signal and their connections) to rank the nodes using the Personalized PageRank method proposed in [11]. We assign higher weights to the nodes which cause golden signal error.
The inputs to the PageRank algorithm are the graphs (causal and dependency), the golden signal errors and the causal score of each node with the golden signal errors. Let CSi define the causal score of node i with respect to the golden signal errors. We derive the anomalous sub-graph from both dependency and causal graphs by preserving the nodes that cause golden signal errors (candidate nodes) and their direct connections. Considering that the request flow is from node i to node j, the weight of each edge eij is assigned to the value of CSj, the weight of each added self-edge eii is assigned to the value of CSi, and the weight of each added backward edge eji is assigned to the value of ρCSi, where ρ ∈ [0, 1] [to apply transitive reduction to the learned causal graph, examiner is interpreting the dependency graph as the transitive reduction of the causal graph]. We set ρ to a high value if the causal graph represents the true dependency graph. As error propagation happens in the opposite direction of request flow, we reverse the direction of the edges when applying the PageRank algorithm.”)
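For reference when weighing the examiner's interpretation, transitive reduction in the usual graph-theoretic sense removes every edge of a DAG that is already implied by a longer path. The generic sketch below shows that operation on a hypothetical three-node graph. It is not taken from Aggarwal or from the application, and the function and node names are illustrative only.

```python
def transitive_reduction(edges):
    """Drop every edge (u, v) of a DAG for which another u -> v path exists."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)
        successors.setdefault(v, set())

    def reachable(start, target, skip_edge):
        # Depth-first search from start to target that never uses skip_edge.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in successors[node]:
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == target:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return sorted(
        (u, v) for u, v in set(edges) if not reachable(u, v, (u, v))
    )


# Hypothetical causal graph: a -> c is implied by a -> b -> c and is removed.
reduced = transitive_reduction([("a", "b"), ("b", "c"), ("a", "c")])
print(reduced)
```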

In regard to claim 21 and analogous claim 25, Aggarwal, Briancon, Bender, Zhou and Ma teach the method of claim 19. 
Aggarwal further teaches wherein dynamically localizing the application fault further comprises applying a distance based thresholding to estimate the source of one or more possible application faults. (Aggarwal, pg. 145, 5.1 Results, paragraph 1, “We use a time bin of size 10ms as the inter-arrival time between error logs in this dataset. We have considered threshold for the number of golden signals errors as 15. To calculate precision and recall we use a graph-based approach. If the localized node does not exactly match the ground truth node we calculate match based on the distance (in number of hops) of the returned node n according to the following equation: S = 1 − h_n/(H + 1) (Eq. 3). Here S is the final match score for the returned node n, h_n is the distance (in hops) of this node from the ground truth node and H is a pre-configured threshold for the maximum number of hops allowed [applying a distance based thresholding to estimate the source of one or more possible application faults.]. For our experiments we use H = 3. In all the PageRank based methods, we measure precision and recall for Top3.”)
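The match score in the quoted equation is simple enough to work through directly. The sketch below computes the hop distance with a breadth-first search and then applies S = 1 − h_n/(H + 1) with H = 3, the value Aggarwal reports using. Scoring nodes beyond H hops (or unreachable nodes) as 0 is an assumption, since the quoted passage does not say how they are handled. The graph and function names are hypothetical.

```python
from collections import deque


def hops(adjacency, start, goal):
    """Breadth-first hop count between two nodes, or None if unreachable."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def match_score(h_n, max_hops=3):
    """S = 1 - h_n / (H + 1); beyond H hops the score is taken as 0 (assumption)."""
    if h_n is None or h_n > max_hops:
        return 0.0
    return 1.0 - h_n / (max_hops + 1)


# Hypothetical undirected service chain: a - b - c.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
distance = hops(adjacency, "a", "c")
print(distance, match_score(distance))  # 2 hops gives S = 0.5
```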

In regard to claim 22, Aggarwal, Briancon, Bender, Zhou and Ma teach the method of claim 19. 
Aggarwal further teaches wherein the training the AI model further comprises controlling fault injection and estimating ancestral edges for the micro-service in receipt of the fault injection. (Aggarwal, pg. 143, paragraph 2, “The inputs to the PageRank algorithm are the graphs (causal and dependency), the golden signal errors and the causal score of each node with the golden signal errors [fault injection]. Let CSi define the causal score of node i with respect to the golden signal errors. We derive the anomalous sub-graph from both dependency and causal graphs by preserving the nodes that cause golden signal errors (candidate nodes) and their direct connections. Considering that the request flow is from node i to node j, the weight of each edge eij is assigned to the value of CSj, the weight of each added self-edge eii is assigned to the value of CSi, and the weight of each added backward edge eji is assigned to the value of ρCSi, where ρ ∈ [0, 1] [estimating ancestral edges for the micro-service in receipt of the fault injection]. We set ρ to a high value if the causal graph represents the true dependency graph. As error propagation happens in the opposite direction of request flow, we reverse the direction of the edges when applying the PageRank algorithm.”)

In regard to claim 23, Aggarwal, Briancon, Bender, Zhou and Ma teach the method of claim 19. 
Aggarwal further teaches wherein training the AI model further comprises applying transitive reduction to the learned causal graph, the transitive reduction combining estimated ancestral edges from two or more controlled fault injections. (Aggarwal, pg. 143, paragraph 2, “The inputs to the PageRank algorithm are the graphs (causal and dependency), the golden signal errors and the causal score of each node with the golden signal errors. Let CSi define the causal score of node i with respect to the golden signal errors. We derive the anomalous sub-graph from both dependency and causal graphs by preserving the nodes that cause golden signal errors (candidate nodes) and their direct connections. Considering that the request flow is from node i to node j, the weight of each edge eij is assigned to the value of CSj, the weight of each added self-edge eii is assigned to the value of CSi, and the weight of each added backward edge eji is assigned to the value of ρCSi, where ρ ∈ [0, 1] [the transitive reduction combining estimated ancestral edges from two or more controlled fault injections.]. We set ρ to a high value if the causal graph represents the true dependency graph. As error propagation happens in the opposite direction of request flow, we reverse the direction of the edges when applying the PageRank algorithm.”)

In regard to claim 24, Aggarwal teaches a staging manager that operates in a pre-deployment environment to train an AI model, including: (Aggarwal, pg. 139, 2. Causal dependency model, “Causal dependency model which is mined from golden signals viz. error logs in application and gateway at the log template level. Unlike state-of-the art approaches which need weeks of training data our system can compute causal relationship [to train an AI model] among anomalous micro-services from only a few minutes of logs.” And pg. 141, paragraph 1, “Next, we use two Granger causality techniques: regression based and independence testing based to infer the causal relationship among micro-services. Causal dependencies indicate the strength of the correlation between the errors in various micro-services.” And Fig. 1,
[Figure: media_image1.png], pre-processing being interpreted as the pre-deployment environment.)
collecting first micro-service error log data generated offline, the first micro-service error log data corresponding to one or more selectively injected errors; (Aggarwal, pg. 144, Dataset Details, paragraph 1, “We use the TrainTicket application [2], an open-source micro-service application, to inject faults and generate log data to evaluate the effectiveness of our proposed approach. The application contains 41 micro-services. Service ts-ui-dashboard acts as the gateway service which records the status of each incoming and outgoing service. The error signals emitted by this service and capturing the failure of a request are considered as golden signals. We use Istio [1] to inject HTTP abort fault in multiple services. In abort fault, incoming request is intercepted by Istio and returns 500 error status. Users can also specify the percentage of requests that should be failed. We injected faults in 17 services that cover the main flow of the train ticket application. We generated data by running a scenario where 100% incoming requests are failed.”). Aggarwal does not expressly teach that the log data is generated offline. However, Zhou teaches this limitation (Zhou, pg. 685, Col. 1, paragraph 3, “Our system implementation uses Istio to manage asynchronous microservice interactions during offline training and collect trace logs during both offline training and online prediction.”). Aggarwal and Zhou are both related to the same field of endeavor (i.e. error detection). In view of the teachings of Zhou, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Zhou to Aggarwal before the effective filing date of the claimed invention in order to achieve high accuracy of prediction of errors. (Zhou, Col. 2, abstract, paragraph 3, “The results indicate that MEPFL can achieve high accuracy in intraapplication prediction of latent errors, faulty microservices, and fault types, and outperforms a state-of-the-art approach of failure diagnosis for distributed systems.”).
a production manager, operatively coupled to the staging manager, the production manager operates in a production environment, to dynamically localize an application fault, including: (Aggarwal, pg. 143, Last Mile Fault Localization, “The output from the previous step namely, Personalized PageRank, provides a ranked list of potential nodes (or microservices) which are faulty. However, this information is not sufficient to identify the precise location of the fault. To get the precise location, we use the Last Mile Fault Localization technique which examines the service emitting the error as well as it’s neighbourhood to identify the correct fault location [dynamically localize an application fault].” And Fig. 1 [media_image1.png], examiner would like to point out that the production manager is interpreted as the first spot where the alert happens and then continuing through the process.)
However, Aggarwal does not explicitly teach an artificial intelligence (AI) platform in communication with the computer processor and memory, the AI platform comprising: 
collecting first micro-service error log data generated offline, the first micro-service error log data corresponding to one or more selectively injected errors; and
collecting second micro-service error log data generated online, the second micro-service error log data corresponding to the application error; 
building a correlation matrix based on the learned causal graph and the collected second error data, wherein the correlation matrix indicates whether a first application micro-service represented in the causal graph is an ancestor of a second application micro-service represented in the second micro-service error log data; and
leveraging the correlation matrix to identify a source of the application fault based on an ancestry of the application micro-services indicated in the correlation matrix.
Zhou teaches collecting first micro-service error log data generated offline, the first micro-service error log data corresponding to one or more selectively injected errors; and (Zhou, pg. 685, Col. 1, paragraph 3, “Our system implementation uses Istio to manage asynchronous microservice interactions during offline training and collect trace logs during both offline training and online prediction.”)
collecting second micro-service error log data generated online, the second micro-service error log data corresponding to the application error; (Zhou, pg. 685, Col. 1, paragraph 3, “Our system implementation uses Istio to manage asynchronous microservice interactions during offline training and collect trace logs during both offline training and online prediction.”)
Aggarwal and Zhou are combinable for the same rationale as set forth above with respect to claim 1.
However, Aggarwal and Zhou do not explicitly teach an artificial intelligence (AI) platform in communication with the computer processor and memory, the AI platform comprising: 
building a correlation matrix based on the learned causal graph and the collected second error data, wherein the correlation matrix indicates whether a first application micro-service represented in the causal graph is an ancestor of a second application micro-service represented in the second micro-service error log data; and
leveraging the correlation matrix to identify a source of the application fault based on an ancestry of the application micro-services indicated in the correlation matrix.
Bender teaches leveraging the correlation matrix to identify a source of the application fault based on an ancestry of the application micro-services indicated in the correlation matrix. (Bender, pg. 85, 4. Finding all-pairs LCA in DAGS, “In our algorithms we compute the answers to all $\binom{n}{2}$ queries in the preprocessing stage. Then we answer queries by performing table lookups. We show how to build the binary common-ancestor-existence matrix in $\tilde{O}(n^{\omega})$ operations and the representative-LCA matrix in $\tilde{O}(n^{(\omega+3)/2})$ operations [leveraging the ancestral matrix to identify the source of the application fault]. The fastest known matrix-multiplication algorithm to date runs in $O(n^{\omega})$ where $\omega \approx 2.376$ [10]. Thus, our all-pairs-common-ancestor-existence algorithm runs in time $\tilde{O}(n^{2.376})$, and our all-pairs-representative-LCA algorithm runs in time $\tilde{O}(n^{2.688})$.”)
Aggarwal, Zhou and Bender are combinable for the same rationale as set forth above with respect to claim 1.
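For orientation, the table-lookup idea in the Bender passage can be sketched as follows. This is a minimal illustration, not Bender's algorithm: it swaps in a plain cubic Warshall transitive closure in place of the fast matrix multiplication the paper relies on, and the service names, edges, and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def ancestor_matrix(nodes: Sequence[str], edges: Sequence[Tuple[str, str]]) -> List[List[bool]]:
    """reach[i][j] is True when nodes[i] is a strict ancestor of nodes[j]."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    reach = [[False] * n for _ in range(n)]
    for src, dst in edges:
        reach[index[src]][index[dst]] = True
    # Warshall transitive closure (cubic time; an assumption made for brevity).
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def common_ancestor_exists(reach: List[List[bool]]) -> List[List[bool]]:
    """Binary matrix: entry [u][v] is True when u and v share an ancestor.

    A node is treated as its own ancestor here (an assumption).
    """
    n = len(reach)
    anc = [[reach[w][v] or w == v for v in range(n)] for w in range(n)]
    return [
        [any(anc[w][u] and anc[w][v] for w in range(n)) for v in range(n)]
        for u in range(n)
    ]


# Hypothetical causal graph: an edge points from cause to affected service.
nodes = ["gateway", "orders", "payments", "inventory", "audit"]
edges = [("gateway", "orders"), ("orders", "payments"), ("gateway", "inventory")]
reach = ancestor_matrix(nodes, edges)
common = common_ancestor_exists(reach)
```

Once both matrices are precomputed, an ancestry query between two services is a single index lookup, which is the preprocessing-then-lookup pattern the quoted passage describes.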
However, Aggarwal, Zhou and Bender do not explicitly teach an artificial intelligence (AI) platform in communication with the computer processor and memory, the AI platform comprising: 
building a correlation matrix based on the learned causal graph and the collected second error data, wherein the correlation matrix indicates whether a first application micro-service represented in the causal graph is an ancestor of a second application micro-service represented in the second micro-service error log data; and
Briancon teaches an artificial intelligence (AI) platform in communication with the computer processor and memory, the AI platform comprising: (Briancon, paragraph 0057, “A data domain that transforms data into objects, and an AI (artificial intelligence) domain [artificial intelligence (AI) platform] that transforms these objects into results.” And paragraph 0263, “Computing system 1000 may include one or more processors (e.g., processors 1010a-1010n) coupled to system memory 1020, an input/output I/O device interface 1030, and a network interface 1040 via an input/output (I/O) interface 1050. A processor may include a single processor or a plurality of processors (e.g., distributed processors). A processor may be any suitable processor capable of executing or otherwise performing instructions. A processor may include a central processing unit (CPU) that carries out program instructions to perform the arithmetical, logical, and input/output operations of computing system 1000. A processor may execute code (e.g., processor firmware, a protocol stack, a database management system, an operating system, or a combination thereof) that creates an execution environment for program instructions. A processor may include a programmable processor. A processor may include a Graphic Processing Unit (GPU). A processor may include general or special purpose microprocessors. A processor may receive instructions and data from a memory [processor and memory] (e.g., system memory 1020).”)
Aggarwal, Zhou, Bender and Briancon are combinable for the same rationale as set forth above with respect to claim 1.
However, Aggarwal, Zhou, Bender and Briancon do not explicitly teach building a correlation matrix based on the learned causal graph and the collected second error data, wherein the correlation matrix indicates whether a first application micro-service represented in the causal graph is an ancestor of a second application micro-service represented in the second micro-service error log data; and
Ma teaches building a correlation matrix based on the learned causal graph and the collected second error data, wherein the correlation matrix indicates whether a first application micro-service represented in the causal graph is an ancestor of a second application micro-service represented in the second micro-service error log data; and (Ma, pg. 3095, Col. 1, 4.6, paragraph 1, “Let $G(V, E)$ be the constructed impact graph from T [learned causal graph], where each node $v_i \in V$ indicates a service, each edge $e_{ij} \in E$ is set to 1 when service $v_i$ is impacted by $v_j$ [the collected second micro-service error log data]. Other inputs of the root cause identification stage include a front-end service $v_{fe}$ and correlation score matrix C [building a correlation matrix]. For simplicity, we use the notation of the original correlation score $c_{i,j}$ in the following, regardless of whether calibrated or not. Our algorithm proposed to find the root cause is inspired by the random walk algorithm, which is analogous to the human behavior during the manual investigation. Assuming that we have no domain knowledge about the anomaly, only the discovered impact graph $G(V, E)$ and correlation score C, one of the natural diagnosis methods is to randomly traverse services following G with preferentially looking for a high correlation score $c_{i,j}$ regard to the anomaly in frontend service $v_{fe}$. The random walk should consider not only the correlation of currently-visiting service, but also the correlation between the previously visited service and $v_{fe}$ [an ancestor]. For example, when the correlation of the previous and current node is both high, moving forward may find the root cause.”)
Aggarwal, Zhou, Bender, Briancon and Ma are combinable for the same rationale as set forth above with respect to claim 1.
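The correlation-guided traversal quoted from Ma can be pictured with the sketch below. It is an assumption-laden simplification and not Ma's published algorithm: it computes the stationary distribution of a random walk with restart by power iteration, weights each forward move by the target service's correlation with the front-end anomaly, and adds a self-loop weighted by the current service's own correlation. The graph, scores, restart probability, and all names are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_root_causes(
    impacted_by: Dict[str, List[str]],
    corr: Dict[str, float],
    front_end: str,
    restart: float = 0.15,
    iters: int = 200,
) -> List[Tuple[str, float]]:
    """Rank services by visit probability of a correlation-biased walk."""
    nodes = list(corr)
    trans: Dict[str, Dict[str, float]] = {}
    for node in nodes:
        # Forward moves follow "is impacted by" edges, weighted by correlation.
        weights = {nxt: corr[nxt] for nxt in impacted_by.get(node, [])}
        if node != front_end:
            # Self-loop: a highly correlated service tends to hold the walker.
            weights[node] = weights.get(node, 0.0) + corr[node]
        total = sum(weights.values())
        if total > 0:
            trans[node] = {k: w / total for k, w in weights.items()}
        else:
            trans[node] = {front_end: 1.0}  # dead end: go back to the start

    score = {n: (1.0 if n == front_end else 0.0) for n in nodes}
    for _ in range(iters):
        nxt_score = {n: 0.0 for n in nodes}
        nxt_score[front_end] += restart  # total mass is 1, so restart mass is constant
        for node, mass in score.items():
            for target, p in trans[node].items():
                nxt_score[target] += (1.0 - restart) * mass * p
        score = nxt_score

    candidates = [(n, s) for n, s in score.items() if n != front_end]
    return sorted(candidates, key=lambda kv: kv[1], reverse=True)


# Hypothetical impact graph: frontend is impacted by cart and search; cart by db.
impacted_by = {"frontend": ["cart", "search"], "cart": ["db"]}
corr = {"frontend": 1.0, "cart": 0.9, "search": 0.1, "db": 0.95}
ranking = rank_root_causes(impacted_by, corr, front_end="frontend")
```

In this toy graph the walk concentrates on `db`, the deepest service whose anomaly score tracks the front-end anomaly, which matches the intuition in the quoted passage that moving forward through highly correlated nodes tends to reach the root cause.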

Claims 3-4, 9-10, 15-16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal, in view of Briancon, Bender, Zhou and Ma and in further view of Zhang et al (An Influence-Based Approach for Root Cause Alarm Discovery in Telecom Networks, "Zhang").

In regard to claim 3 and analogous claims 9 and 15, Aggarwal, Briancon, Bender, Zhou and Ma teach the system of claim 1. 
However, Aggarwal, Briancon, Bender, Zhou and Ma do not explicitly teach wherein the causal relationship learning between application micro-services and the causal graph generation occurs offline.  
Zhang teaches wherein the causal relationship learning between application micro-services and the causal graph generation occurs offline. (Zhang, pg. 127, Fig. 1 [media_image2.png], Examiner would like to point out that in Fig. 1 the offline process on the bottom is being interpreted as the construction of the causal graph.)
Aggarwal, Briancon, Bender, Zhou, Ma and Zhang are all related to the same field of endeavor (i.e. placement systems). In view of the teachings of Zhang, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Zhang to Aggarwal, Briancon, Bender, Zhou and Ma before the effective filing date of the claimed invention in order to use real-time data to discover root cause alarms. (Zhang, Abstract, “We subsequently discover root cause alarms in a real-time data stream by applying an influence maximization algorithm on the weighted graph.”)

In regard to claim 4 and analogous claims 10 and 16, Aggarwal, Briancon, Bender, Zhou and Ma teach the system of claim 1. 
However, Aggarwal, Briancon, Bender, Zhou and Ma do not explicitly teach wherein fault localization occurs online in real-time.  
Zhang teaches wherein fault localization occurs online in real-time. (Zhang, pg. 127, Fig. 1 [media_image2.png], Examiner would like to point out that the online process at the top of Fig. 1 (Root Cause Alarm Prediction) is being interpreted as the fault localization process happening online.)
Aggarwal and Zhang are combinable for the same rationale as set forth above with respect to claim 3.

In regard to claim 20, Aggarwal, Briancon, Bender, Zhou and Ma teach the method of claim 19. 
However, Aggarwal, Briancon, Bender, Zhou and Ma do not explicitly teach wherein training the AI model occurs offline and localizing the application fault occurs in real-time.  
Zhang teaches wherein training the AI model occurs offline and localizing the application fault occurs in real-time. (Zhang, pg. 127, Fig. 1 [media_image2.png], the online and offline processes are being interpreted by examiner as the AI model being trained offline and the fault localization occurring online.)
Aggarwal and Zhang are combinable for the same rationale as set forth above with respect to claim 3.

Claims 6, 12, 18 are rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal, in view of Briancon, Bender, Zhou and Ma and in further view of Wu et al (US Published Patent Application No. 20140181618, "Wu").

In regard to claim 6, and analogous claims 12 and 18, Aggarwal, Briancon, Bender, Zhou and Ma teach the system of claim 1. 
However, Aggarwal, Briancon, Bender, Zhou and Ma do not explicitly teach wherein the leveraging of the correlation matrix includes the production manager to identify a plurality of potential sources of the error, and further comprising the production manager configured to apply a distance metric to estimate the error source, wherein the distance metric comprises a Hamming distance or cosine similarity.  
Wu teaches wherein the leveraging of the correlation matrix includes the production manager to identify a plurality of potential sources of the error, and further comprising the production manager configured to apply a distance metric to estimate the error source, wherein the distance metric comprises a Hamming distance or cosine similarity. (Wu, paragraph 0042, “The H-matrix used in error-detection module 402 may be further configured to assure that all burst patterns in FIG. 3 are detectable and correctable. In embodiments, the H-matrix may be configured to produce a non-zero value for a codeword with at least one error [identify a plurality of potential sources of the error,]. That happens to be the case because DECTED has Hamming distance [a Hamming distance] of 6, thus having an error alias to a valid word requires at least 6 bit of difference. However, burst patterns in FIG. 3 may have a maximum bit weight of 4.”)
Aggarwal, Briancon, Bender, Zhou, Ma and Wu are all related to the same field of endeavor (i.e. error detection). In view of the teachings of Wu, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Wu to Aggarwal, Briancon, Bender, Zhou and Ma before the effective filing date of the claimed invention in order to make error patterns detectable and correctable. (Wu, paragraph 0042, “The H-matrix used in error-detection module 402 may be further configured to assure that all burst patterns in FIG. 3 are detectable and correctable.”)
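To make the claim 6 limitation concrete, the two named metrics can be sketched as below. This is only one hypothetical reading of how a distance metric might pick among several candidate sources: each candidate is given a 0/1 vector of the services it would be expected to affect, and the observed error pattern is compared against each. The vectors, service names, and selection rule are assumptions and are not drawn from the claims, the specification, or Wu.

```python
import math
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical expected-impact vectors, one per candidate error source.
candidates = {
    "orders": [1, 1, 0, 1],
    "payments": [0, 1, 0, 0],
    "inventory": [0, 0, 1, 0],
}
# Hypothetical observed pattern: which services emitted error logs online.
observed = [1, 0, 0, 1]

best_by_hamming = min(candidates, key=lambda n: hamming_distance(candidates[n], observed))
best_by_cosine = max(candidates, key=lambda n: cosine_similarity(candidates[n], observed))
```

Both metrics select the same candidate in this toy case; they can disagree when vectors differ mainly in magnitude, since cosine similarity ignores vector length while Hamming distance counts every mismatched position.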

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SKYLAR K VANWORMER whose telephone number is (703)756-1571. The examiner can normally be reached M-F 6:00am to 3:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached on (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/S.K.V./ Examiner, Art Unit 2146
/USMAAN SAEED/ Supervisory Patent Examiner, Art Unit 2146
Prosecution Timeline

Oct 22, 2025: Response Filed
Mar 03, 2026: Final Rejection mailed (§103)
Mar 30, 2026: Interview Requested
Apr 14, 2026: Examiner Interview Summary
Apr 14, 2026: Applicant Interview (Telephonic)
Apr 17, 2026: Response after Non-Final Action
May 18, 2026: Request for Continued Examination
May 20, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

17/331,475, Patent 12591789: Knowledge distillation in multi-arm bandit, neural network models for real-time online optimization (4y 10m to grant, granted Mar 31, 2026)
17/169,083, Patent 12541680: REDUCED COMPUTATION REAL TIME RECURRENT LEARNING (4y 12m to grant, granted Feb 03, 2026)
17/383,132, Patent 12524655: ARTIFICIAL NEURAL NETWORK PROCESSING METHODS AND SYSTEM (4y 5m to grant, granted Jan 13, 2026)
17/350,840, Patent 12511554: Complex System for End-to-End Causal Inference (4y 6m to grant, granted Dec 30, 2025)
17/514,512, Patent 12505358: Methods and Systems for Approximating Embeddings of Out-Of-Knowledge-Graph Entities for Link Prediction in Knowledge Graph (4y 1m to grant, granted Dec 23, 2025)
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 4-5
Grant Probability: 39%
Grant Probability With Interview (+22.5%): 62%
Median Time to Grant: 4y 0m (~0m remaining)
PTA Risk: High
Based on 28 resolved cases by this examiner. Grant probability derived from career allowance rate.