Prosecution Insights
Last updated: April 18, 2026
Application No. 17/322,184

DATA DRIFT MITIGATION IN MACHINE LEARNING FOR LARGE-SCALE SYSTEMS

Final Rejection (§102, §103)
Filed: May 17, 2021
Examiner: YI, HYUNGJUN B
Art Unit: 2146
Tech Center: 2100 — Computer Architecture & Software
Assignee: Microsoft Technology Licensing, LLC
OA Round: 4 (Final)
Grant Probability: 18% (At Risk)
OA Rounds: 5-6
To Grant: 4y 7m
With Interview: 49%

Examiner Intelligence

Career Allow Rate: 18% (3 granted / 17 resolved; -37.4% vs TC avg)
Interview Lift: +31.7% for resolved cases with interview
Avg Prosecution: 4y 7m (39 currently pending)
Total Applications: 56 across all art units

Statute-Specific Performance

§101: 26.3% (-13.7% vs TC avg)
§103: 53.9% (+13.9% vs TC avg)
§102: 12.9% (-27.1% vs TC avg)
§112: 4.7% (-35.3% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 17 resolved cases

Office Action

DETAILED ACTION

This action is responsive to the claims filed on 01/06/2026. Claims 1-20 are pending for examination. This action is Final.

Response to Arguments

Applicant’s arguments traversing the rejection under 35 U.S.C. 103 have been fully considered but are not persuasive.

Regarding Applicant’s argument that the rejection is based on improper piecemeal examination and hindsight reconstruction (Remarks, page 9), this argument is not persuasive. The rejection does not rely on hindsight, but instead on the express and complementary teachings of the applied references. Sarkar teaches the claimed cloud-based multi-model environment and batched training data, e.g., “configured as a service in a cloud storage application … for communication with and data transfer to and from a model server” (Sarkar, paragraph 87), and further teaches operating on “training data sets 224 or batches thereof” to generate trained model instances (Sarkar, paragraph 50). Rao teaches grouping the training data and using a decision-tree meta-model for model selection, e.g., “the time series categorization module 212 … generates grouped time series training data 214” (Rao, paragraph 54), “the time series training data 210 may be categorized … into three groups of grouped time series training data 214” (Rao, paragraph 56), and “a decision tree is generated by the meta-model decision engine training module 230 … a meta-model is built for model selection” (Rao, paragraph 90). Tan further provides an express reason for using decision-tree classification logic in large datasets, namely that “Techniques developed for constructing decision trees are computationally inexpensive … and … classifying a test record is extremely fast” (Tan, p. 169, paragraph 2). Thus, the rejection is based on articulated teachings and express reasons grounded in the prior art itself, rather than improper hindsight.
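For orientation only (not part of the record), the combined scheme the references are cited for, a single decision tree trained over labeled batches that routes a test point to the model of the majority batch at the reached leaf, can be sketched roughly as follows. All names, thresholds, and data here are hypothetical illustrations, not drawn from the cited references.

```python
# Illustrative sketch only: a one-split decision-tree "meta-model" that routes
# a test point to the model trained on the batch it most resembles. Every
# identifier and value below is invented for illustration.
from collections import Counter

# Two training batches; the batch id doubles as the class label that a
# Tan-style Classify() step would count at a leaf.
batches = {
    0: [0.5, 1.0, 1.5, 2.0],   # batch 0 -> model 0
    1: [8.0, 8.5, 9.0, 9.5],   # batch 1 -> model 1
}

# "Train" a single one-split tree: threshold at the midpoint of batch means,
# giving two leaves (left, right).
means = {b: sum(v) / len(v) for b, v in batches.items()}
threshold = sum(means.values()) / len(means)

def leaf_counts(side):
    """Count training points from each batch that fall in a given leaf."""
    return Counter(
        b for b, vals in batches.items() for x in vals
        if (x <= threshold) == (side == "left")
    )

def route(test_point):
    """Follow the tree to a leaf, then pick the majority batch at that leaf."""
    side = "left" if test_point <= threshold else "right"
    return leaf_counts(side).most_common(1)[0][0]

models = {0: lambda x: x * 2, 1: lambda x: x + 100}  # stand-in ML models
batch = route(8.7)            # majority batch at the reached leaf
inference = models[batch](8.7)  # process the test point with that model
```

The sketch compresses the mapping the examiner asserts: batching (Sarkar), a single selector tree (Rao), and a count-based leaf decision (Tan).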
Regarding Applicant’s argument that the prior art fails to teach “training a single decision tree” and fails to teach that data points from two or more batches are present in at least one leaf node (Remarks, page 9), Applicant’s argument is not persuasive. To the extent Applicant’s remarks are directed to Khiari’s bagged multi-tree framework, such argument is not commensurate with the present rejection, which relies on Rao rather than Khiari for the decision-tree selector. Rao expressly teaches that “a decision tree is generated by the meta-model decision engine training module 230” and that “an algorithm that is utilized to train the machine learning algorithm is a decision tree” (Rao, paragraph 90), which directly teaches the claimed single decision tree. Further, Tan expressly teaches that a tree node/leaf may contain records from more than one class, stating: “If Dt contains records that belong to more than one class, an attribute test condition is selected to partition the records into smaller subsets” (Tan, p. 152, paragraph 3). Tan additionally teaches that, at a leaf, “for each leaf node t, let p(i|t) denote the fraction of training records from class i associated with the node t,” and that “the leaf node is assigned to the class that has the majority number of training records” (Tan, p. 165, paragraph 3). Thus, Tan expressly teaches that multiple labeled records may be present at a leaf prior to majority assignment. In the present combination, those labels reasonably correspond to batch/model identifiers. Accordingly, the applied art teaches both a single decision tree and a leaf containing data points from two or more claimed batches.

Regarding Applicant’s argument that the cited art does not teach “identifying a leaf node for a test data point including following a path through the decision tree to reach a corresponding leaf node” (Remarks, page 10), this argument is not persuasive. Rao teaches this limitation expressly and in nearly identical terms.
Specifically, Rao states that “traversal of the decision tree represented by the diagram 1300 begins at the root node 1310, proceeds through a path of the branch nodes 1320, until a leaf node 1330 is reached” (Rao, paragraph 147). Thus, Rao expressly teaches the claimed act of following a path through the decision tree for an input/test data point until a corresponding leaf node is reached. Therefore, Applicant’s argument that the art does not teach the leaf-identification/path-following step is not persuasive.

Regarding Applicant’s argument that Tan merely teaches majority-class voting for ordinary classification and does not teach associating the test data point with a batch having the highest number of data points in the same leaf node (Remarks, page 10), this argument is not persuasive. Tan expressly teaches the exact count-based rule relied upon in the rejection. Tan states that “for each leaf node t, let p(i|t) denote the fraction of training records from class i associated with the node t,” and that “the leaf node is assigned to the class that has the majority number of training records” (Tan, p. 165, paragraph 3). Tan therefore teaches selecting the label having the highest count among the training records associated with the reached leaf. Further, Tan explains that “If Dt contains records that belong to more than one class,” the records are partitioned into subsets (Tan, p. 152, paragraph 3), thereby confirming that multiple labels may coexist in the same leaf/node before majority assignment. Applicant’s argument that Tan concerns “classification” rather than “batch similarity” is not persuasive because the claim language requires only that the test data point be associated with the batch having the highest number of data points in the same leaf node, not that any separate or specialized similarity metric be computed.
In the present combination, the relevant class/label is the batch/model identifier, and Rao supplies the prior step of sorting the test data point to a leaf, i.e., traversal “proceeds through a path of the branch nodes 1320, until a leaf node 1330 is reached” (Rao, paragraph 147). Thus, once the test data point is in the leaf, Tan teaches the claimed highest-count association rule.

Regarding Applicant’s argument that the applied art fails to teach that each leaf node corresponds to a cluster of data points (Remarks, page 10), this argument is not persuasive. Landwehr expressly teaches that “the tree structure gives a disjoint subdivision of S into regions St, and every region is represented by a leaf in the tree” (Landwehr, p. 173, last para.). Under the broadest reasonable interpretation, the claimed “cluster” reads on such a region, partition cell, or grouped subset of data points defined by the decision-tree subdivision of the instance space. Thus, Landwehr directly supports the mapping that each leaf corresponds to a region/subset of the data, i.e., a claimed cluster. Accordingly, Applicant’s argument that the leaf/cluster correspondence is absent from the applied art is not persuasive.

Regarding Applicant’s argument that Landwehr merely executes a local model at a leaf and does not teach routing the test data point to the ML model associated with the batch having the highest number of data points in that same leaf (Remarks, page 11), this argument is also not persuasive. Landwehr is not relied upon for the majority-count determination; Tan supplies that rule, as discussed above, by teaching that the operative label at a leaf is the one with the “majority number of training records” (Tan, p. 165, paragraph 3).
Landwehr is relied upon for the routing/use of the model associated with the reached leaf, which Landwehr teaches expressly: “The final model tree consists of a tree with linear regression functions at the leaves … and the prediction for an instance is obtained by sorting it down to a leaf and using the prediction of the linear model associated with that leaf” (Landwehr, p. 169, sec. 3.1, paragraph 2). Thus, Landwehr teaches routing an input to a leaf-associated model for prediction, while Tan teaches which label predominates at that leaf. The rejection therefore does not rely on Landwehr alone to teach the count-based selection rule, but rather on the combined teachings of Tan and Landwehr for that portion of the claim. Applicant’s argument does not address this combined rationale and is therefore not persuasive.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.

This application currently names joint inventors.
In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Sarkar et al. (US 20210224684 A1), hereafter referred to as Sarkar, in view of Rao et al. (US 20200242483 A1), hereafter referred to as Rao, and further in view of Tan et al. (Tan, P. N., Steinbach, M., & Kumar, V. (2006). Classification: basic concepts, decision trees, and model evaluation. Introduction to Data Mining, 1, 145-205.), hereafter referred to as Tan, and Landwehr et al. (Landwehr, N., Hall, M., & Frank, E. (2005). Logistic model trees. Machine Learning, 59(1), 161-205.), hereafter referred to as Landwehr.

Claim 1: Sarkar teaches the following limitations:

A method for selecting a machine learning (ML) model for processing test data from a plurality of ML models in a cloud environment, the method comprising: (Sarkar, paragraph 87, “Model manager 452 may include an interface protocol and/or set of functions and parameters for enabling a model server to access training model instances stored in data store 490. For example, model manager 452 may be configured as a service in a cloud storage application or API in a distributed storage system for communication with and data transfer to and from a model server.”, the model storage manager is used to select machine learning models for further processing through cloud storage application interfaces.)
batching training data into a plurality of batches each being associated with a ML model of the plurality of ML models; (Sarkar, paragraph 50, “For example, model trainer 240 may be operating on one or more training data sets 224 in primary storage system 220 to generate trained model instances 228. In some embodiments, model trainer 240 may be hosted by a GPU compute cluster configured to access the training data from primary storage system 220 without using any local flash storage or similar permanent storage resource. For example, model trainer 240 may use remote direct memory access (RDMA) or another remote memory access protocol to operate directly on training data sets 224 or batches thereof in primary storage system 220 without first transferring training data sets 224 into local permanent storage.”, a plurality of data sets associated with the plurality of models is processed in batches.)

Rao, in the same field of machine learning model selection, teaches the following limitations which Sarkar fails to teach:

organizing the plurality of batches of training data into a plurality of clusters of data points of the training data; (Rao, paragraph 54, “the time series categorization module 212 receives the time series training data 210 and categorizes the degree of sparsities for each time series of the time series training data 210.
In one embodiment, the time series categorization module 212 generates grouped time series training data 214 based on the categorizations.”; Rao, paragraph 56, “the time series training data 210 may be categorized by the time series categorization module 212 into three groups of grouped time series training data 214.”; Rao, paragraph 115, “In one embodiment, the time series training data 210 is grouped into grouped time series training data 214 representing buckets of time series”, The claimed “organizing … into a plurality of clusters of data points” reads on Rao’s categorization of training data into grouped time series training data and buckets of time series. A “cluster” is reasonably met by Rao’s grouped subsets of training instances formed according to shared characteristics. The “data points” read on Rao’s time series training data instances, and the claimed plurality of clusters reads on Rao’s express disclosure of multiple groups, e.g., “three groups.” Thus, Rao teaches organizing the training data into grouped/clustered subsets.)

training a single decision tree on the plurality of batches of training data, (Rao, paragraph 90, “a decision tree is generated by the meta-model decision engine training module 230… a meta-model is built for model selection. In this example, an algorithm that is utilized to train the machine learning algorithm is a decision tree.”, Rao expressly teaches generating and building a decision tree as the meta-model used for model selection. The claimed “single” decision tree reads on Rao’s disclosure of one selector-tree meta-model built for a given grouping/selection task. The “plurality of batches of training data” maps to the grouped training data supplied to Rao’s meta-model training pipeline. Thus, Rao teaches training a single decision-tree selector over the organized training data.)
identifying a leaf node for a test data point including following a path through the decision tree to reach a corresponding leaf node of the test data point; (Rao, paragraph 147, “traversal of the decision tree represented by the diagram 1300 begins at the root node 1310, proceeds through a path of the branch nodes 1320, until a leaf node 1330 is reached”, The claimed “identifying a leaf node for a test data point” maps directly to Rao’s traversal of the decision tree for an input time series. The “following a path through the decision tree” is expressly taught by Rao’s “proceeds through a path of the branch nodes,” and the claimed “corresponding leaf node” is taught by Rao’s statement that traversal continues “until a leaf node … is reached.”)

and processing the test data point by the ML model the test data point was routed to, resulting in an inference. (Rao, paragraph 97, “the prediction module 430 generates a prediction based on the time series input data 310 and the selected models 340”; Rao, paragraph 127, “the selected prediction algorithms 344 are applied to the transformed time series input data 310 to form the prediction output data 440”, The claimed “processing the test data point by the ML model” reads on Rao’s application of the selected prediction algorithm/model to the input data. The “resulting in an inference” reads on Rao’s “prediction” / “prediction output data.” Thus, Rao expressly teaches processing the routed input with the selected model to obtain the claimed inference.)

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Sarkar’s cloud-based multi-model training and deployment environment with Rao’s decision-tree-based dynamic model-selection framework, because both references address selection and use of an appropriate predictive model from among multiple candidate models using characteristics of training/input data.
Rao explains that dynamic model selection is performed by creating a meta-model from historical sets of time series and dynamically selecting a model based on time series attributes, which would have predictably improved model selection in Sarkar’s multi-model cloud environment.

Tan, in the same field of decision tree processing, teaches the following which the above prior art fails to teach:

wherein data points of the training data from two or more of the plurality of batches are present in at least one leaf node (Tan, page 165, paragraph 3, “The Classify() function determines the class label to be assigned to a leaf node. For each leaf node t, let p(i|t) denote the fraction of training records from class i associated with the node t. In most cases, the leaf node is assigned to the class that has the majority number of training records”; Tan, page 152, paragraph 3, “Step 1: If all the records in Dt belong to the same class Yt, then t is a leaf node labeled as Yt. Step 2: If Dt contains records that belong to more than one class, an attribute test condition is selected to partition the records into smaller subsets.”, Tan expressly teaches that a leaf node can contain training records belonging to different classes, and that the leaf label is then determined by the class having the majority number of training records. In the present combination, the relevant “class” is the batch/model identifier label used in Rao’s selector framework. Tan teaches that multiple labels may be represented among the training records associated with a node/leaf before majority assignment. Therefore, Tan teaches that at least one leaf may contain data points from two or more claimed batches.)
associating the test data point with a batch of the two or more of the plurality of batches that has a highest number of data points of the training data in the same leaf node as the test data point; (Tan, page 165, paragraph 3, “The Classify() function determines the class label to be assigned to a leaf node. For each leaf node t, let p(i|t) denote the fraction of training records from class i associated with the node t. In most cases, the leaf node is assigned to the class that has the majority number of training records: [equation image not reproduced]”, at a leaf, the decision is made by selecting the most frequent category among the training records that reach that leaf (i.e., the label corresponds to the arg-max of the per-leaf class proportions p(i|t)). The argmax explicitly selects the class with the highest proportion at the leaf, i.e., associating the test data point with the batch having the highest number of data points in the same leaf node, as claimed.)

It would have been obvious to a person of ordinary skill in the art before the effective filing date to have incorporated the teachings disclosed by Sarkar (i.e., batching training data associated with ML models over a cloud environment) and Rao with the teachings disclosed by Tan (i.e., decision tree model classification selection). A motivation for the combination is lower computational cost on large datasets (Tan, page 169, paragraph 2, “Techniques developed for constructing decision trees are computationally inexpensive, making it possible to quickly construct models even when the training set size is very large. Furthermore, once a decision tree has been built, classifying a test record is extremely fast, with a worst-case complexity of O(w), where w is the maximum depth of the tree.”).
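As a purely illustrative sketch (not part of the record), the count-based leaf rule quoted from Tan, with batch identifiers standing in for class labels, amounts to computing p(i|t) over the records in a leaf and taking the argmax. The labels and counts below are invented for illustration.

```python
# Hedged illustration of a Tan-style leaf assignment: the leaf's label is
# argmax over i of p(i|t), i.e., the batch with the most training records
# in that leaf. Record labels here are hypothetical.
from collections import Counter

# Training records that reached one leaf t, tagged by batch identifier.
# Note the leaf holds points from more than one batch, as Tan permits
# prior to majority assignment.
leaf_records = ["batch_1", "batch_2", "batch_2", "batch_2", "batch_1"]

counts = Counter(leaf_records)
n = len(leaf_records)
p = {label: c / n for label, c in counts.items()}  # p(i|t), per Tan's notation

# A test point sorted to this leaf is associated with the majority batch.
assigned_batch = max(p, key=p.get)
```

Under this reading, "classification" and "batch association" coincide: the arg-max over per-leaf proportions is exactly a highest-count selection.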
Landwehr, in the same field of decision tree processing, teaches the following limitation which the above prior art fails to teach:

wherein each leaf node of the decision tree corresponds to a cluster of the plurality of clusters of data points of the training data; (Landwehr, page 173, last paragraph, “More formally, a logistic model tree consists of a tree structure that is made up of a set of inner or non-terminal nodes N and a set of leaves or terminal nodes T. Let S denote the whole instance space, spanned by all attributes that are present in the data. Then the tree structure gives a disjoint subdivision of S into regions St, and every region is represented by a leaf in the tree”, The claimed “corresponds to a cluster” is reasonably met by Landwehr’s “region” or partition cell of the instance space. Under a broadest reasonable interpretation, each Landwehr leaf corresponds to a subset/region of the training data points, i.e., a claimed “cluster.”)

routing the test data point to the ML model associated with a batch of training data that has the highest number of data points in the same leaf node as the test data point; (Landwehr, page 169, section 3.1, paragraph 2, “The final model tree consists of a tree with linear regression functions at the leaves (Frank et al., 1998), and the prediction for an instance is obtained by sorting it down to a leaf and using the prediction of the linear model associated with that leaf.”, the input is routed along the tree and, at the leaf, the ML model associated with that leaf is executed to produce the prediction. In combination, Tan is relied upon to decide which batch prevails at the leaf, as set forth for the preceding limitation (the batch with the most training points in that leaf). It is the examiner’s interpretation that once Tan’s rule sorts a test record to a leaf, Landwehr’s system uses the model attached to that leaf.
So the test record is routed to the ML model associated with the batch that has the highest number of points in that same leaf.)

It would have been obvious to a person of ordinary skill in the art before the effective filing date to have incorporated the teachings disclosed by Sarkar (i.e., batching training data associated with ML models over a cloud environment), Rao, and Tan with the teachings disclosed by Landwehr (i.e., selection of the machine learning model associated with leaf node routing). A motivation for the combination is to allow explicit probabilities for each class to be produced rather than a single classification (Landwehr, page 162, paragraph 2, “A more natural way to deal with classification tasks is to use a combination of a tree structure and logistic regression models resulting in a single tree. Another advantage of using logistic regression is that explicit class probability estimates are produced rather than just a classification.”).

Claim 11 has limitations substantially similar to claim 1 and as such a similar analysis applies.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Sarkar in view of Rao and further in view of Tan, Landwehr, and David Segev (US 11886994), hereafter referred to as Segev.

Claim 2: Sarkar, Rao, Tan, and Landwehr teach the limitations of claim 1.

Segev, in the same field of machine learning model processing and training, teaches: wherein the batching of the training data is performed offline before the test data is received by the cloud environment (Segev, col. 20, lines 33-39, “OL (Online Detection) can involve a single offline initial (or “first”) training phase (330 or 330′ in FIG. 3B) for a pre-determined training period, followed by a second training phase and detection according to one embodiment (FIG. 4A) or another embodiment (FIG. 4B).
The two training phases generate the infrastructure for the detection (the “normal clusters”).”, online detection of the model involves an initial offline training of the model to determine “normal clusters”. Col. 10, lines 64-67, “An interface 59 may ingest input data by employing a variety of mechanisms including, for example, push/pull protocol (mechanism), in real time and/or in batches (historical data)”, input data into the network may be deployed in batches. Therefore, the batching of training data is performed offline.)

Segev is analogous to the present invention because it discloses systems and methods for training and selecting machine learning pipelines. It would have been obvious to a person of ordinary skill in the art before the effective filing date to have incorporated the teachings disclosed by Sarkar, Rao, Tan, and Landwehr with the teachings disclosed by Segev (i.e., offline processing of training data). A motivation for the combination is to allow the user to retrain the machine learning models in a steady environment for moments when an online model (a model using a live stream of data) deviates from the original training profile (Segev, col. 19, lines 56-61, “The training process clusters the data into “normal” clusters. Since the training process is always done offline, it can be updated in the background all the time. Therefore, it supports steady online construction of training data to replace current training data, if the latter deviate from the current training profile.”).

Claims 3, 12, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Sarkar in view of Rao, Tan, Landwehr, and Segev, as applied to claim 2 above, and further in view of Khan et al. (US 20200380312 A1), hereafter referred to as Khan.

Claim 3: Sarkar, Rao, Tan, Landwehr, and Segev teach the limitations of claim 2.
Khan, in the same field of machine learning clustering, teaches the following limitations that Sarkar, Rao, Tan, Landwehr, and Segev fail to teach:

calculating a similarity metric for each test data point that represents a similarity of the test data point to the training data, wherein the similarity metric includes a distance for the test data point relative to the training data as part of a spatial nearness factor and also includes a temporal nearness factor based on a time that the test data is received by the cloud environment (Khan, paragraph 30, “Further, the categorization module 302 may categorize the plurality of images into a set of clusters based on a temporal and spatial similarities. It should be noted that the temporal and spatial similarity may be computed based on at least one of frame mapping, background subtraction, distance measure, or the like.”, clustering of input data (images) is based on a spatial and temporal distance from clusters learned during the training phase or beforehand.)

Khan is analogous to the present invention because it discloses systems and methods in the field of machine learning clustering. It would have been obvious to a person of ordinary skill in the art to have incorporated the teachings disclosed by Sarkar, Rao, Tan, Landwehr, and Segev with the teachings disclosed by Khan (i.e., using spatial and temporal distance to determine clusters). A motivation for the combination is to account for time-sequenced data, such as a video stream, in the clustering (Khan, paragraph 56, “By way of an example, the plurality of images may be a sequence of frames associated with a video. Hence, the video across the dataset may be aligned based on temporal frame sequences and the sequence of frames may be sorted and categorized into the same set of cluster.”).

Claim 12 has limitations substantially similar to claim 3 and as such a similar analysis applies.
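As a hedged illustration only (not part of the record), a similarity metric of the kind claim 3 recites, combining a spatial nearness factor (distance of the test point to the training data) with a temporal nearness factor (based on when the test data is received), might be sketched as below. The weights, the exponential decay form, and all values are assumptions introduced for illustration; they are not taken from the claims or from Khan.

```python
# Illustrative sketch: a combined spatial + temporal similarity score.
# The specific functional forms (inverse-distance nearness, exponential
# time decay) and weights are hypothetical choices, not from the record.
import math

def similarity(test_point, train_points, train_time, test_time,
               w_spatial=0.5, w_temporal=0.5, decay=0.1):
    # Spatial nearness factor: closer to the nearest training point -> higher.
    spatial_dist = min(abs(test_point - p) for p in train_points)
    spatial_nearness = 1.0 / (1.0 + spatial_dist)
    # Temporal nearness factor: more recently received data -> higher.
    temporal_nearness = math.exp(-decay * abs(test_time - train_time))
    return w_spatial * spatial_nearness + w_temporal * temporal_nearness

# A batch whose data is both close in value and recent scores higher.
recent_close = similarity(5.0, [4.9, 6.0], train_time=9.0, test_time=10.0)
old_far = similarity(5.0, [0.0, 1.0], train_time=0.0, test_time=10.0)
```

The point of the sketch is only that one scalar can jointly reflect spatial and temporal nearness, which is the shape of the claimed metric.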
Claim 15: Sarkar teaches the following limitations:

One or more computer-storage memory devices embodied with executable operations that, when executed by one or more processors, are configured to select a machine learning (ML) model for processing test data from a plurality of ML models in a cloud environment, comprising: (Sarkar, paragraph 87, “Model manager 452 may include an interface protocol and/or set of functions and parameters for enabling a model server to access training model instances stored in data store 490. For example, model manager 452 may be configured as a service in a cloud storage application or API in a distributed storage system for communication with and data transfer to and from a model server.”, the model storage manager is used to select machine learning models for further processing through cloud storage application interfaces.)

batching training data into a plurality of batches each being associated with a ML model of the plurality of ML models; (Sarkar, paragraph 50, “For example, model trainer 240 may be operating on one or more training data sets 224 in primary storage system 220 to generate trained model instances 228. In some embodiments, model trainer 240 may be hosted by a GPU compute cluster configured to access the training data from primary storage system 220 without using any local flash storage or similar permanent storage resource. For example, model trainer 240 may use remote direct memory access (RDMA) or another remote memory access protocol to operate directly on training data sets 224 or batches thereof in primary storage system 220 without first transferring training data sets 224 into local permanent storage.”, a plurality of data sets associated with the plurality of models is processed in batches.)
Rao, in the same field of machine learning model selection, teaches the following limitations which Sarkar fails to teach:

an offline training pipeline executable for: (Rao, paragraph 51, “In one embodiment, the meta-model training system 120 receives time series training data 210. In one embodiment, the time series training data 210 is received from the data management system 150. It is to be understood that, under one embodiment, time series training data includes sequence data as discussed herein, or as known in the art at the time of filing, or as developed, or becomes available, after the time of filing”, the training process (the pipeline) relies solely on pre-existing training data, which is manipulated and resampled offline. No online or real-time streams of data are involved in the training.)

organizing the plurality of batches of training data into a plurality of clusters of data points of the training data; (Rao, paragraph 54, “the time series categorization module 212 receives the time series training data 210 and categorizes the degree of sparsities for each time series of the time series training data 210. In one embodiment, the time series categorization module 212 generates grouped time series training data 214 based on the categorizations.”; Rao, paragraph 56, “the time series training data 210 may be categorized by the time series categorization module 212 into three groups of grouped time series training data 214.”; Rao, paragraph 115, “In one embodiment, the time series training data 210 is grouped into grouped time series training data 214 representing buckets of time series”, The claimed “organizing … into a plurality of clusters of data points” reads on Rao’s categorization of training data into grouped time series training data and buckets of time series. A “cluster” is reasonably met by Rao’s grouped subsets of training instances formed according to shared characteristics.
The “data points” are read on Rao’s time series training data instances, and the claimed plurality of clusters reads on Rao’s express disclosure of multiple groups, e.g., “three groups.” Thus, it teaches organizing the training data into grouped/clustered subsets.) training a single decision tree on the plurality of batches of training data, (Rao, paragraph 90, “a decision tree is generated by the meta-model decision engine training module 230… a meta-model is built for model selection. In this example, an algorithm that is utilized to train the machine learning algorithm is a decision tree.”, Rao expressly teaches generating and building a decision tree as the meta-model used for model selection. The claimed “single” decision tree reads on Rao’s disclosure of one selector-tree meta-model built for a given grouping/selection task. The “plurality of batches of training data” maps to the grouped training data supplied to Rao’s meta-model training pipeline. Thus, Rao teaches training a single decision-tree selector over the organized training data.) identifying a leaf node for a test data point including following a path through the decision tree to reach a corresponding leaf node of the test data point; (Rao, paragraph 147, “traversal of the decision tree represented by the diagram 1300 begins at the root node 1310, proceeds through a path of the branch nodes 1320, until a leaf node 1330 is reached”, The claimed “identifying a leaf node for a test data point” maps directly to Rao’s traversal of the decision tree for an input time series. The “following a path through the decision tree” is expressly taught by Rao’s “proceeds through a path of the branch nodes,” and the claimed “corresponding leaf node” is taught by Rao’s statement that traversal continues “until a leaf node … is reached.”) The rationale for combining Sarkar with Rao is similar to that applied for claim 1 above. 
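The mapped steps (train a decision tree over the batched training data, route a test point to a leaf, then associate it with the batch best represented in that leaf) can be sketched in a few lines of Python. The one-split "tree" and all names here are hypothetical stand-ins, not the claimed implementation:

```python
# Sketch of the mapped steps: a single decision tree (reduced to one
# hard-coded split for brevity), a test point routed to a leaf, and the
# point associated with the batch having the most training points in
# that leaf. Illustrative only.
def leaf_of(x):
    """Toy one-split 'decision tree': route x to leaf 'L' or 'R'."""
    return "L" if x < 5.0 else "R"

def majority_batch(leaf, training_points):
    """Pick the batch with the most training points in this leaf."""
    counts = {}
    for batch_id, x in training_points:
        if leaf_of(x) == leaf:
            counts[batch_id] = counts.get(batch_id, 0) + 1
    return max(counts, key=counts.get)

train = [("A", 1.0), ("A", 2.0), ("B", 3.0), ("B", 7.0), ("B", 8.0)]
leaf = leaf_of(4.0)                  # test point 4.0 lands in leaf "L"
batch = majority_batch(leaf, train)  # in "L": batch "A" has 2 points, "B" has 1
```

The `majority_batch` step is the same arg-max-over-leaf-membership idea the examiner later reads onto Tan's per-leaf class proportions.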
Tan, in the same field of decision tree processing, teaches the following which the above prior art fails to teach: associating the test data point with a batch of the two or more of the plurality of batches that has a highest number of data points of the training data in the same leaf node as the test data point; (Tan, page 165, paragraph 3, “The Classify() function determines the class label to be assigned to a leaf node. For each leaf node t, let p(i|t) denote the fraction of training records from class i associated with the node t. In most cases, the leaf node is assigned to the class that has the majority number of training records: [image of equation: leaf label = argmax_i p(i|t)]”, at a leaf, the decision is made by selecting the most frequent category among the training records that reach that leaf (i.e., the label corresponds to the arg-max of the per-leaf class proportions p(i|t)). The arg-max explicitly chooses the class with the highest proportion of training records at the leaf, i.e., associating the test data point with the batch having the highest number of data points in the same leaf node as claimed.) The rationale for combining Sarkar with Tan is similar to that applied for claim 1 above. Landwehr, in the same field of decision tree processing, teaches the following limitation which the above prior art fails to teach: selecting the ML model for processing the test data based on the similarity metric, and routing the test data point to the selected ML model; (Landwehr, page 169, section 3.1, paragraph 2, “The final model tree consists of a tree with linear regression functions at the leaves (Frank et al., 1998), and the prediction for an instance is obtained by sorting it down to a leaf and using the prediction of the linear model associated with that leaf.”, the input is routed along the tree and, at the leaf, the ML model associated with that leaf is executed to produce the prediction.)
The rationale for combining Sarkar with Landwehr is similar to that applied for claim 1 above. Khan, in the same field of machine learning clustering, teaches the following limitations that Rao and Sarkar fail to teach: calculating a similarity metric for each test data point that represents a similarity of the test data point to the training data, wherein the similarity metric includes a distance for the test data point relative to the training data as part of a spatial nearness factor and also includes a temporal nearness factor based on a time that the test data is received by the cloud environment (Khan, paragraph 30, “Further, the categorization module 302 may categorize the plurality of images into a set of clusters based on a temporal and spatial similarities. It should be noted that the temporal and spatial similarity may be computed based on at least one of frame mapping, background subtraction, distance measure, or the like.”, clustering of input data (images) is based on a spatial and temporal distance from clusters learned during the training phase or beforehand.). The rationale for combining Sarkar with Khan is similar to that applied for claim 3 above. Segev, in the same field of offline/online machine learning pipeline execution, teaches the following limitations which Sarkar, Khiari, and Khan fail to teach: an online matching pipeline configured for (Segev, col. 9, lines 30-33, “The rest of the data (commonly referred to as “testing data”) is sensed/streamed/captured constantly in real-time, and classification of each NAMDDP (Newly arrived Multi-dimensional data point) as being either normal or abnormal is done online.”, testing data is compared to the “normal” clusters to determine abnormality.) The rationale for combining Sarkar with Segev is similar to that applied for claim 2 above. Claims 4, 13, and 16 are rejected under 35 U.S.C.
103 as being unpatentable over Sarkar in view of Rao, Tan, Landwehr, Segev, and Khan, as applied to claims 3, 12, and 15 above, and further in view of Khiari et al. (Khiari, J., Moreira-Matias, L., Shaker, A., Ženko, B., & Džeroski, S. (2019). Metabags: Bagged meta-decision trees for regression. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018, Dublin, Ireland, September 10–14, 2018, Proceedings, Part I 18 (pp. 637-652). Springer International Publishing.), hereafter referred to as Khiari, and Gomes et al. (Gomes, H. M., Barddal, J. P., Enembreck, F., & Bifet, A. (2017). A survey on ensemble learning for data stream classification. ACM Computing Surveys (CSUR), 50(2), 1-36.), hereafter referred to as Gomes. Claim 4: Sarkar, Rao, Tan, Landwehr, Segev, and Khan teach the limitations of claim 3. Khiari, in the same field of machine learning model selection, teaches the following limitations which Sarkar as modified fails to teach: The method of claim 3, wherein identifying the leaf node for the test data point comprises identifying a plurality of leaf nodes for a test data sample of test data points, and the method further comprising: (Khiari, Abstract, “Each meta-decision tree is learned on a different data bootstrap sample, and, given a new example, selects a suitable base model that computes a prediction”, each test example is routed to a leaf node in each decision tree in order to derive a set of base models to select. Therefore a plurality of leaf nodes are identified for a given test data point.) It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Sarkar, Rao, Tan, Landwehr, Segev, and Khan with the teachings disclosed by Khiari (i.e. meta-learning ensemble of decision trees).
A motivation for the combination is to identify regions of input space where some models outperform others, (Khiari, page 4, section 3.1, “However, even a model that is weak in the whole input space may be strong in some subregion. In our approach we rely on classic tree-based isothetic boundaries to identify contexts (e.g. subregions of the input space) where some models may outperform others, and by using only strong experts within each context, we improve the overall model.”). Gomes, in the same field of machine learning ensemble learning, teaches the following limitations which the previous prior art fails to teach: ranking the plurality of batches of training data based on a spatial nearness of each batch to the test data sample including applying a Borda Count algorithm resulting in a spatial score of each batch, wherein the test data is processed by the ML model associated with a highest ranking batch (Gomes, page 23:11, paragraph 1, “For example, if the base learner prediction is a sorted list of class labels, then the Borda count method can be used. Borda count [de Borda 1781] is a preferential voting system introduced in 1770 by Jean Charles de Borda. In ensemble learning, the overall decision when using Borda count is the class label with the highest rank sum. An example of batch ensemble that uses rank-based voting (Borda count) is the Nearest Neighbor Ensemble [Domeniconi and Yan 2004].”, a spatial score derived from the Borda Count algorithm is calculated and ranked with other scores.). It would have been obvious to a person of ordinary skill in the art to have incorporated the teachings disclosed by Sarkar, Rao, Khiari, Tan, Landwehr, Segev, and Khan with the teachings disclosed by Gomes (i.e. using the Borda Count algorithm to calculate spatial scores).
A motivation for the combination is to have a classification method where multiple labels or classes need to be predicted, (Gomes, page 23:11, paragraph 1, “In situations where the base learner can output more than one class label per prediction, a voting method, which is similar to the weighted majority approach, can be used to combine all predictions. For example, if the base learner prediction is a sorted list of class labels, then the Borda count method can be used. Borda count [de Borda 1781] is a preferential voting system introduced in 1770 by Jean Charles de Borda. In ensemble learning, the overall decision when using Borda count is the class label with the highest rank sum. An example of batch ensemble that uses rank-based voting (Borda count) is the Nearest Neighbor Ensemble [Domeniconi and Yan 2004].”, a spatial score derived from the Borda Count algorithm is calculated and ranked with other scores.). Claim 13 has limitations substantially similar to claim 4 and as such a similar analysis applies. Claim 16: Sarkar, Rao, Khiari, Tan, Landwehr, Segev, and Khan teach the limitations of claim 15. Gomes, in the same field of machine learning ensemble learning, teaches the following limitations which the previous prior art fails to teach: The one or more computer-storage memory devices of claim 15, wherein the offline training pipeline is further configured for: ranking the plurality of batches of training data based on a spatial nearness of each batch to a test data sample of test data including applying a Borda Count algorithm resulting in a spatial score of each batch, and wherein the test data is processed by the ML model associated with the highest ranking batch (Gomes, page 23:11, paragraph 1, “For example, if the base learner prediction is a sorted list of class labels, then the Borda count method can be used. Borda count [de Borda 1781] is a preferential voting system introduced in 1770 by Jean Charles de Borda.
In ensemble learning, the overall decision when using Borda count is the class label with the highest rank sum. An example of batch ensemble that uses rank-based voting (Borda count) is the Nearest Neighbor Ensemble [Domeniconi and Yan 2004].”, a spatial score derived from the Borda Count algorithm is calculated and ranked with other scores.) Khiari further teaches: wherein identifying the leaf node for the test data point comprises identifying a plurality of leaf nodes for the test data sample, (Khiari, Abstract, “Each meta-decision tree is learned on a different data bootstrap sample, and, given a new example, selects a suitable base model that computes a prediction”, each test example is routed to a leaf node in each decision tree in order to derive a set of base models to select. Therefore a plurality of leaf nodes are identified for a given test data point.) Claims 5-8, 14, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Sarkar in view of Rao and further in view of Tan, Landwehr, Segev, Khan, Khiari, and Gomes, as applied to claims 4, 13, and 16 above, and further in view of Hier et al. (Hier, D. B., Kopel, J., Brint, S. U., Wunsch, D. C., Olbricht, G. R., Azizi, S., & Allen, B. (2020). Evaluation of standard and semantically-augmented distance metrics for neurology patients. BMC medical informatics and decision making, 20, 1-15.), hereafter referred to as Hier. Claim 5: Sarkar, Rao, Tan, Landwehr, Segev, Khan, Khiari, and Gomes teach the limitations of claim 4. Hier teaches the following limitations that the above prior art fails to teach: The method of claim 4, further comprising: caching the spatial scores of each batch to the test data sample in a lookup table; (Hier, page 4, col. 1, paragraph 1, “Based on eq. (7), the dist (a, b) for each inter-concept distance was stored as a nxn lookup table where the number of possible concepts was n = 1204.”, distance or spatial scores are stored in a lookup table.)
And based on samples of incoming test data matching the test data sample in the lookup table, retrieving the test data sample spatial score (Hier, page 4, col. 1, paragraph 1, “Values from this lookup table were used in eqs. (5) and (6) to iteratively find the minimum inter-concept distance for each concept from patient A compared to the concepts in patient B. Cosine distances between patients (1 – cosine similarity) were calculated by standard methods (eq. 8). If patient A and patient B are represented as vectors of findings from a1 to an and from b1 to bn, the vector is binarized, so that ai or bi is 1 if the finding is present and 0 if the finding is absent”, the new patient’s spatial score is calculated using the lookup table to determine how close or far the new patient is from other patients in a semantic space.). Hier is analogous to the current invention because it provides systems and methods for computing and storing machine learning cluster scores in a lookup table data structure. It would have been obvious to a person of ordinary skill in the art to have incorporated the teachings disclosed by Sarkar, Rao, Khiari, Tan, Landwehr, Segev, Khan, and Gomes with the teachings disclosed by Hier (i.e. storing spatial scores of clusters in a lookup table). A motivation for the combination is to have a data structure for fast storage and retrieval of scores to avoid the cost of re-computation, (Hier, page 4, col. 1, paragraph 1, “Values from this lookup table were used in eqs. (5) and (6) to iteratively find the minimum inter-concept distance for each concept from patient A compared to the concepts in patient B. Cosine distances between patients (1 – cosine similarity) were calculated by standard methods (eq. 8).
If patient A and patient B are represented as vectors of findings from a1 to an and from b1 to bn, the vector is binarized, so that ai or bi is 1 if the finding is present and 0 if the finding is absent”, storing spatial scores in a lookup table avoids recalculation during comparisons, which saves computational resources.). Claim 6: Sarkar, Rao, Khiari, Tan, Landwehr, Segev, Khan, Gomes, and Hier teach the limitations of claim 5. Khiari further teaches: The method of claim 5, wherein training the decision tree on the plurality of batches of training data further comprises training a random forest on the plurality of batches of training data, the random forest being a random ensemble of decision trees (Khiari, page 7, section 6, “MetaBags uses meta-decision trees that perform on-demand selection of base learners at test time based on a series of innovative meta-features. These meta-decision trees are learned over data bootstrap samples, whereas the outputs of the selected models are combined by average.”, the meta-learning method for selecting a best ML model integrates multiple decision trees into an ensemble, where each tree contributes to the final prediction. Page 4, section 3.2, “Bagging [4] is a popular ensemble learning technique. It consists of forming multiple d replicate datasets D (B) ⊂ D by drawning s << N examples from D at random, but with replacement, forming bootstrap samples. Next, d base models φ(xi , D (B) ) are learned with a selected method on each D (B) , and the final prediction φA(xi) is obtained by averaging the predictions of all d base model”, the method’s use of bootstrap sampling to create subsets of training data for the individual models introduces randomness by training each decision tree on a unique, randomly sampled subset of the training data. This ensemble of decision trees with bootstrap sampling is analogous to a random forest model, since bootstrap sampling is a defining characteristic of random forests.)
Claim 7: Sarkar, Rao, Khiari, Tan, Landwehr, Segev, Khan, Gomes, and Hier teach the limitations of claim 6. Sarkar further teaches: The method of claim 6, wherein batching the training data into the plurality of batches is based on time, content, data type, user profiles, or applications (Sarkar, paragraph 67, “In some cases, the retraining request may identify new data element 302 (and/or a batch of data elements of a similar data type and/or ingestion period) for preparation and inclusion in the training data set for a next iteration of model trainer 332 to generate a new (retrained) model instance.”, the batching of training data is based on data type or ingestion period (time).) Claim 8: Sarkar, Rao, Khiari, Tan, Landwehr, Segev, Khan, Gomes, and Hier teach the limitations of claim 7. Khiari further teaches: The method of claim 7, further comprising: profiling the plurality of ML models offline with an offline training pipeline and the training data; (Khiari, page 5, col. 2, paragraph 1, “To generate this set of meta-features, we start by creating one landmarking model per each available method over the entire training set. Then, we design a small artificial neighborhood of size ψ of each training example xi as X ′ i = {x ′ i,1, x ′ i,2..x ′ i,ψ } by perturbing xi with gaussian noise as follows”, the profiling of different ML models includes organizing different meta-features of each model. Page 4, section 3.2, “Bagging [4] is a popular ensemble learning technique. It consists of forming multiple d replicate datasets D (B) ⊂ D by drawning s << N examples from D at random, but with replacement, forming bootstrap samples. Next, d base models φ(xi , D (B) ) are learned with a selected method on each D (B) , and the final prediction φA(xi) is obtained by averaging the predictions of all d base model”, the training process relies solely on pre-existing training data, which is manipulated and resampled offline.
No online or real-time streams of data are involved in the training.) And filtering each data point of the incoming test data including matching a data point of the incoming test data with the closest ML model of the plurality of ML models for the data point (Khiari, page 4, section 3.1.1, paragraph 3, “we aim to build a classification tree that, for a given instance x and its supporting meta-features {z1, . . . , zQ }, dynamically selects the expert that should be chosen for prediction, i.e., Fˆ(x, z1, . . . , zQ ; ˆ f1, . . . , ˆ fM ) = ˆ fj(x).”, the meta-learning model of Khiari dynamically matches incoming test examples (x) by comparing their features with existing examples to determine a best expert (an expert is analogous to an ML model as described in the abstract)) Claims 14 and 17 have limitations substantially similar to claim 5 and as such a similar analysis applies. Claim 18 has limitations substantially similar to claim 6 and as such a similar analysis applies. Claim 19 has limitations substantially similar to claim 7 and as such a similar analysis applies. Claims 9, 10, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sarkar in view of Rao, Tan, Landwehr, Segev, Khan, Khiari, Gomes, Hier, as applied to claims 5-8, 14, and 17-19 above, and further in view of Hazelwood et al. (Hazelwood, K., Bird, S., Brooks, D., Chintala, S., Diril, U., Dzhulgakov, D., ... & Wang, X. (2018, February). Applied machine learning at facebook: A datacenter infrastructure perspective. In 2018 IEEE international symposium on high performance computer architecture (HPCA) (pp. 620-629). IEEE.), hereafter referred to as Hazelwood. Claim 9: Sarkar, Rao, Khiari, Tan, Landwehr, Segev, Khan, Gomes, and Hier teach the limitations of claim 8. Hazelwood teaches the following limitations which the above prior art fails to teach: The method of claim 8, wherein the similarity metric is calculated in a data center (Hazelwood, page 624, col.
1, paragraph 3, “Big Basin is the successor to our earlier Big Sur GPU server, which was the first widely deployed, high-performance AI compute platform in our data centers, designed to support NVIDIA M40 GPUs, which was developed in 2015 and released via the Open Compute Project. Compared with Big Sur, the newer V100 Big Basin platform enables much better gains on performance per watt, benefiting from single-precision floating-point arithmetic per GPU increasing from 7 teraflops to 15.7 teraflops, and high-bandwidth memory (HBM2) providing 900 GB/s bandwidth (3.1x of Big Sur). Half-precision was also doubled with this new architecture to further improve throughput. Big Basin can train models that are 30 percent larger because of the availability of greater arithmetic throughput and a memory increase from 12 GB to 16 GB.”, Hazelwood outlines the methods and infrastructure used to compute machine learning models in Facebook datacenters. A GPU server, Big Basin, is used in such data centers to perform model computations. Page 622, col. 2, paragraph 3, “FBLearner Flow is Facebook’s machine learning platform for model training [8]. Flow is a pipeline management system that executes a workflow describing the steps to train and/or evaluate a model and the resources required to do so… Flow handles the scheduling and resource management to execute the workflow. Flow also has tooling for experiment management and a simple user interface which keeps track of all of the artifacts and metrics generated by each workflow execution or experiment.”, metrics (such as similarity metrics) during the machine learning process are calculated in the workflow which resides in the Facebook data centers.). Hazelwood is analogous to the current invention because it provides systems and methods for computing machine learning operations in data centers.
It would have been obvious to a person of ordinary skill in the art to have incorporated the teachings disclosed by Sarkar, Khiari, Tan, Landwehr, Segev, Khan, Gomes, and Hier with the teachings disclosed by Hazelwood (i.e. computing machine learning metrics in a data center). A motivation for the combination is to perform machine learning calculations on scalable hardware, allowing for increased throughput in processing and improved data security through redundancy, (Hazelwood, page 620, col. 2, paragraph 1, “Diurnal load cycles leave a significant number of CPUs available for distributed training algorithms during off-peak periods. With Facebook’s compute fleet spread over ten datacenter locations, scale also provides disaster recovery capability. Disaster recovery planning is essential as timely delivery of new machine learning models is important to Facebook’s operations.”, leveraging off-peak CPU cycles over a distributed infrastructure (having multiple connected centers) allows Facebook to maximize resource utilization for its machine learning purposes.). Claim 10: Sarkar, Rao, Khiari, Tan, Landwehr, Segev, Khan, Gomes, Hier, and Hazelwood teach the limitations of claim 9. Khiari further teaches: The method of claim 9, wherein: each ML model of the plurality of ML models is trained on a different set of training data (Khiari, page 4, section 3.2, “Bagging [4] is a popular ensemble learning technique. It consists of forming multiple d replicate datasets D (B) ⊂ D by drawning s << N examples from D at random, but with replacement, forming bootstrap samples. Next, d base models φ(xi , D (B) ) are learned with a selected method on each D (B).”, each base model of the plurality is trained on a unique bootstrap sample of the training data. Each base model is trained on a different subset of the data.) Claim 20 has limitations substantially similar to claim 10 and as such a similar analysis applies.
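The bagging idea the examiner reads onto "each ML model ... trained on a different set of training data" can be sketched as follows; the trivial mean-predictor "model" is a placeholder for any base learner, and all names are illustrative:

```python
# Sketch of bagging: each base model is trained on its own bootstrap
# sample (drawn with replacement), and predictions are averaged.
import random

def bootstrap_sample(data, rng):
    """Draw len(data) examples from data at random, with replacement."""
    return [rng.choice(data) for _ in data]

def train_mean_model(sample):
    """Trivial 'model': always predicts the mean of its training sample."""
    mu = sum(sample) / len(sample)
    return lambda x: mu

rng = random.Random(0)                # fixed seed for reproducibility
data = [1.0, 2.0, 3.0, 4.0]
models = [train_mean_model(bootstrap_sample(data, rng)) for _ in range(5)]
prediction = sum(m(0.0) for m in models) / len(models)  # averaged output
```

Because each bootstrap sample differs, each model sees a different subset of the data, which is the property mapped to the claim above.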
Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Britto Jr, A. S., Sabourin, R., & Oliveira, L. E. (2014). Dynamic selection of classifiers—a comprehensive review. Pattern recognition, 47(11), 3665-3680. Cruz, R. M., Sabourin, R., & Cavalcanti, G. D. (2014, August). On meta-learning for dynamic ensemble selection. In 2014 22nd international conference on pattern recognition (pp. 1230-1235). IEEE. Jordan, M. I., & Jacobs, R. A. (1994). Hierarchical mixtures of experts and the EM algorithm. Neural computation, 6(2), 181-214. Todorovski, L., & Džeroski, S. (2003). Combining classifiers with meta decision trees. Machine learning, 50(3), 223-249. Olston, C., Fiedel, N., Gorovoy, K., Harmsen, J., Lao, L., Li, F., ... & Soyke, J. (2017). Tensorflow-serving: Flexible, high-performance ml serving. arXiv preprint arXiv:1712.06139. Baylor, D., Breck, E., Cheng, H.-T., et al. (2017). TFX: A TensorFlow-based production-scale machine learning platform. KDD ’17. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to HYUNGJUN B YI whose telephone number is (703)756-4799. 
The examiner can normally be reached M-F 9-5. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached on (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /H.B.Y./Examiner, Art Unit 2124 /USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2146

Prosecution Timeline

May 17, 2021
Application Filed
May 28, 2024
Non-Final Rejection — §102, §103
Jul 10, 2024
Interview Requested
Oct 04, 2024
Response Filed
Jan 09, 2025
Final Rejection — §102, §103
Feb 07, 2025
Interview Requested
Feb 19, 2025
Applicant Interview (Telephonic)
Feb 19, 2025
Examiner Interview Summary
Mar 11, 2025
Response after Non-Final Action
Jun 12, 2025
Notice of Allowance
Jun 12, 2025
Response after Non-Final Action
Aug 04, 2025
Response after Non-Final Action
Oct 08, 2025
Non-Final Rejection — §102, §103
Jan 16, 2026
Response Filed
Apr 03, 2026
Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12536429
INTELLIGENTLY MODIFYING DIGITAL CALENDARS UTILIZING A GRAPH NEURAL NETWORK AND REINFORCEMENT LEARNING
2y 5m to grant Granted Jan 27, 2026
Study what changed to get past this examiner. Based on the 1 most recent grant.


Prosecution Projections

5-6
Expected OA Rounds
18%
Grant Probability
49%
With Interview (+31.7%)
4y 7m
Median Time to Grant
High
PTA Risk
Based on 17 resolved cases by this examiner. Grant probability derived from career allow rate.
