Prosecution Insights
Last updated: April 19, 2026
Application No. 18/175,693

Method and Apparatus for Determining a Robustness of a Data-Based Sensor Model

Non-Final OA: §101, §102, §103

Filed: Feb 28, 2023
Examiner: SHINE, NICHOLAS B
Art Unit: 2126
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Robert Bosch GmbH
OA Round: 1 (Non-Final)

Grant Probability: 38% (At Risk)
OA Rounds: 1-2
To Grant: 5y 1m
With Interview: 82%

Examiner Intelligence

Career Allow Rate: 38% (14 granted / 37 resolved; -17.2% vs TC avg)
Interview Lift: +44.6% among resolved cases with interview
Avg Prosecution: 5y 1m; 25 applications currently pending
Total Applications: 62, across all art units
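The headline figures above appear mutually consistent if "interview lift" is read as percentage points added to the allow rate, which is an assumption about the dashboard's methodology, not a documented formula:

```python
# Consistency check for the dashboard's examiner statistics.
# Assumption (hypothetical): "interview lift" = percentage points added
# to the career allow rate when an interview is held.
base_allow = 14 / 37          # career allow rate: ~37.8%, displayed as 38%
interview_lift = 0.446        # displayed as +44.6%

implied_with_interview = base_allow + interview_lift
# ~0.824, consistent with the displayed "With Interview: 82%"
```

Under this reading, the 82% with-interview figure is simply the 38% base rate plus the 44.6-point lift, with display rounding accounting for the small residuals.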

Statute-Specific Performance

§101: 34.9% (-5.1% vs TC avg)
§102: 5.3% (-34.7% vs TC avg)
§103: 46.0% (+6.0% vs TC avg)
§112: 13.4% (-26.6% vs TC avg)
Tech Center averages are estimates. Based on career data from 37 resolved cases.

Office Action

Rejections: §101, §102, §103
DETAILED ACTION

This action is responsive to claims filed 02/28/2023. Claims 1–9 are pending for examination.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 08/09/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered and attached by the examiner.

Claim Objections

Claims 1 and 2 are objected to because of the following informalities:

Regarding claim 1, “a computer-implemented method for determining a degree of robustness for a robustness of a provided, trained, data-based sensor model for evaluating an input dataset” should be “a computer-implemented method for determining a degree of robustness of a provided, trained, data-based sensor model for evaluating an input dataset”.

Regarding claim 2, “a modified validation input dataset falls below a first threshold value specified by a first threshold” should be “a modified validation input dataset falls below a first threshold value specified by [[a]] the first threshold”.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1–9 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding Claim 1:

Step 1 — Is the claim to a process, machine, manufacture, or composition of matter? Yes, claim 1 is directed to a method, i.e., a process.

Step 2A — Prong 1 — Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim recites an abstract idea.
“determining a plurality of robust validation input datasets of the plurality of unlabeled validation input datasets that satisfy a first robustness criterion and/or a second robustness criterion”

“determining a proportion of the plurality of robust validation input datasets out of the plurality of unlabeled validation input datasets in order to obtain the degree of robustness”

These limitations, under their broadest reasonable interpretation, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, opinion). See MPEP 2106.04(a)(2). In particular, with the aid of pen and paper, a human can decide that a set of data satisfies a criterion and decide that a portion of the sets of data yields a degree of measure.

Step 2A — Prong 2 — Does the claim recite additional elements that integrate the judicial exception into a practical application? No, there are no additional elements that integrate the judicial exception into a practical application. The additional elements:

“A computer-implemented method for determining a degree of robustness for a robustness of a provided, trained, data-based sensor model for evaluating an input dataset having at least one signal time series in order to determine a model output representing a change-point time, the method comprising” — This limitation recites only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, such that it amounts to no more than mere instructions to apply the exception. See MPEP 2106.05(f); see also Electric Power Group, LLC v. Alstom, S.A., 830 F.3d 1350, 1356, 119 USPQ2d 1739 (Fed. Cir. 2016).

“providing a plurality of unlabeled validation input datasets to the sensor model” — This limitation is insignificant extra-solution activity and is merely data gathering. See MPEP 2106.05(g).

Step 2B — Does the claim recite additional elements that amount to significantly more than the judicial exception? No, there are no additional elements that amount to significantly more than the judicial exception.

“providing a plurality of unlabeled validation input datasets to the sensor model” — This limitation is directed to the activity of data gathering and outputting, which is not an inventive concept because it is insignificant extra-solution activity of mere data gathering. See Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015); MPEP 2106.05(g)(3). This limitation is also well-understood, routine, and conventional because it involves transmitting information over a network. See MPEP 2106.05(d)(II).

Regarding Claim 2:

The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1, which included an abstract idea (see rejection for claim 1 above). This claim merely recites a further limitation on the first robustness criterion, which is directed to an abstract idea that can be performed in the human mind. The additional limitations:

“the first robustness criterion indicates that a corresponding unlabeled validation input dataset is robust when a distance between a first model output of the sensor model for the corresponding unlabeled validation input dataset and a second model output of the sensor model for a modified validation input dataset falls below a first threshold value specified by a first threshold” — These limitations are merely a continuation of the abstract idea in claim 1. Under their broadest reasonable interpretation, they cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, opinion). See MPEP 2106.04(a)(2). In particular, with the aid of pen and paper, a human can decide that a set of data satisfies a criterion and that the criterion can indicate that a distance between model outputs falls below a threshold.
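Read as a computational test rather than a mental step, claim 1's proportion and claim 2's distance-below-threshold criterion can be sketched as follows. This is a minimal illustration with hypothetical names and parameters (`model`, `threshold`, `shift`, circular boundary handling), not the applicant's actual implementation:

```python
def shift_series(x, k=1):
    """Element-wise temporal shift of a signal time series.

    Illustrative only: the claim does not fix boundary handling,
    so circular wrap-around is an assumption here.
    """
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)

def degree_of_robustness(model, datasets, threshold=0.1, shift=1):
    """Claim 1 sketch: a dataset counts as robust when the model's
    change-point output for the original and a temporally shifted copy
    differ by less than a threshold (claim 2's first criterion); the
    degree of robustness is the proportion of robust datasets."""
    robust = sum(
        1
        for x in datasets
        if abs(model(x) - model(shift_series(x, shift))) < threshold
    )
    return robust / len(datasets)
```

For example, with a toy change-point model `model = lambda x: float(x.index(max(x)))` (argmax time), a constant series is robust under the shift while an impulse series is not, so a two-dataset validation set yields a degree of robustness of 0.5.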
“the modified validation input dataset corresponds to a temporal shift of the signal time series in the validation input dataset through an element-wise shift” — These limitations are merely a continuation of the abstract idea in claim 1. Under their broadest reasonable interpretation, they cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, opinion). See MPEP 2106.04(a)(2). In particular, with the aid of pen and paper, a human can decide that a set of data satisfies a criterion and that the criterion can indicate that a distance between model outputs falls below a threshold corresponding to a shift in time.

Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B.

Regarding Claim 3:

The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 2, which included an abstract idea (see rejection for claim 2 above). This claim merely recites a further limitation on the second robustness criterion, which is directed to an abstract idea that can be performed in the human mind. The additional limitations:

“the second robustness criterion indicates that a corresponding unlabeled validation input dataset is robust when a maximum distance between a minimum threshold value or maximum threshold value of the first model output and a minimum threshold value or maximum threshold value of the second model output falls below a second threshold value specified by a second threshold” — These limitations are merely a continuation of the abstract idea in claim 2. Under their broadest reasonable interpretation, they cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, opinion). See MPEP 2106.04(a)(2). In particular, with the aid of pen and paper, a human can decide that a set of data satisfies a criterion and that the criterion can indicate that a distance between model outputs falls below a threshold corresponding to a shift in time.

“the minimum or maximum threshold value results from a distribution of model outputs from a specified epsilon environment of the corresponding unlabeled validation input dataset and the modified validation input dataset by sampling from the epsilon environment of the corresponding unlabeled validation input dataset or the modified validation input” — These limitations are merely a continuation of the abstract idea in claim 2. Under their broadest reasonable interpretation, they cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, opinion). See MPEP 2106.04(a)(2). In particular, with the aid of pen and paper, a human can decide that a set of data satisfies a criterion and that the criterion can indicate that a distance between model outputs falls below a threshold, where the threshold has been derived from model output relationships.

Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B.

Regarding Claim 4:

The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 2, which included an abstract idea (see rejection for claim 2 above). This claim merely recites a further limitation on the first and second robustness criteria, which are directed to an abstract idea that can be performed in the human mind. The additional limitations:

“wherein the first threshold value and the second threshold value, respectively, consider or depend on the temporal shift of the signal time series of the corresponding unlabeled validation input dataset for creating the modified validation input dataset” — These limitations are merely a continuation of the abstract idea in claim 2. Under their broadest reasonable interpretation, they cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, opinion). See MPEP 2106.04(a)(2). In particular, with the aid of pen and paper, a human can decide that a set of data satisfies a criterion and that the criterion can indicate that a distance between model outputs falls below a threshold corresponding to a shift in time.

Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B.

Regarding Claim 5:

Step 1 — Is the claim to a process, machine, manufacture, or composition of matter? Yes, claim 5 depends from claim 1 (see analysis of claim 1 above), which is directed to a method, i.e., a process.

Step 2A — Prong 1 — Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim recites an abstract idea.
“the second robustness criterion indicates that a corresponding unlabeled validation input dataset is robust when a maximum distance between a minimum threshold value or maximum threshold value of a first model output of a first model evaluation for the corresponding unlabeled validation input dataset and a minimum threshold value or the maximum threshold value of a second model evaluation for the relevant validation input dataset falls below a second threshold value specified by a second threshold, and the at least one signal time series of the validation input dataset is enlarged by a predetermined number of elements”

These limitations, under their broadest reasonable interpretation, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, opinion). See MPEP 2106.04(a)(2). In particular, with the aid of pen and paper, a human can decide that a set of data satisfies a criterion and that the criterion can indicate that a distance between model outputs falls below a threshold indicating that a dataset has been enlarged by a predetermined number. Examiner notes these limitations are merely a continuation of the abstract idea in claim 1, adding additional passive limitations that limit the types of indications determined from the criterion, similar to claims 2–3.

Step 2A — Prong 2 — Does the claim recite additional elements that integrate the judicial exception into a practical application? No, there are no additional elements that integrate the judicial exception into a practical application. The additional limitations:

“the trained sensor model includes a deep neural network comprising multiple neuron layers with neurons that are calibrated using model parameters” — This limitation recites generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function), such that it amounts to no more than mere instructions to apply the exception using a generic computer component. See MPEP 2106.05(f).

“the sensor model is configured as an input neuron layer having a number of additional neurons, such that the input neuron layer has a number of elements corresponding to the number of elements of the corresponding unlabeled validation input dataset” — This limitation amounts to no more than mere instructions to apply the exception and is equivalent to mere instructions to implement the abstract idea on a computer. See MPEP 2106.05(f). Configuring neural networks for inputs merely invokes computers or other machinery as a tool to perform an existing process.

“the second model evaluation occurs by shifting the model parameters of the neurons in the input neuron layer” — This limitation amounts to no more than mere instructions to apply the exception and is equivalent to mere instructions to implement the abstract idea on a computer. See MPEP 2106.05(f). Shifting parameters in a neural network merely invokes computers or other machinery as a tool to perform an existing process.

Step 2B — Does the claim recite additional elements that amount to significantly more than the judicial exception? No, there are no additional elements that amount to significantly more than the judicial exception.

Regarding Claim 6:

Step 1 — Is the claim to a process, machine, manufacture, or composition of matter? Yes, claim 6 depends from claim 2 (see analysis of claim 2 above), which is directed to a method, i.e., a process.

Step 2A — Prong 1 — Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.

“wherein the distance is determined using an L2 standard, L-infinity standard, or as a difference between the corresponding change-point times represented by the model outputs”

These limitations, under their broadest reasonable interpretation, cover mathematical concepts (including mathematical relationships, mathematical formulas or equations, mathematical calculations). See MPEP 2106.04(a)(2). In particular, the distance-determining step involves using an L2 standard (wherein the BRI of L2 standard is any regularization technique that adds a penalty term to the loss function (i.e., a subtraction step added to a math formula) to prevent overfitting by shrinking weights), an L-infinity standard (wherein the BRI of L-infinity standard is any technique that measures the maximum absolute value of a vector's components), or the difference between outputs (i.e., subtraction). Therefore, these limitations cover mathematical relationships, formulas or equations, and calculations.

Step 2A — Prong 2 — Does the claim recite additional elements that integrate the judicial exception into a practical application? No, there are no additional elements that integrate the judicial exception into a practical application.

Step 2B — Does the claim recite additional elements that amount to significantly more than the judicial exception? No, there are no additional elements that amount to significantly more than the judicial exception.

Regarding claim 7:

The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1, which included an abstract idea (see rejection for claim 1 above).
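For reference, the three distance options recited in claim 6, read in their conventional mathematical sense as distance measures (an interpretive assumption, narrower than the examiner's BRI of "L2 standard" as a regularization penalty), could be computed as:

```python
import math

def l2_distance(a, b):
    # Euclidean (L2) distance between two model-output vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def linf_distance(a, b):
    # L-infinity distance: largest absolute component-wise difference
    return max(abs(x - y) for x, y in zip(a, b))

def changepoint_distance(t1, t2):
    # difference between the change-point times represented by the outputs
    return abs(t1 - t2)
```

These are standard definitions, shown only to make concrete what "a difference between outputs (i.e., subtraction)" amounts to computationally; the claim itself does not specify implementation details.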
This claim merely recites a further limitation on the method limitation, which is directed to only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, such that it amounts to no more than mere instructions to apply the exception. The additional limitations:

“wherein an apparatus” — This limitation recites generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function), such that it amounts to no more than mere instructions to apply the exception using a generic computer component. See MPEP 2106.05(f).

“is configured to carry out the method” — This limitation amounts to no more than mere instructions to apply the exception and is equivalent to mere instructions to implement the abstract idea on a computer. See MPEP 2106.05(f). Configuring an apparatus to perform a method merely invokes computers or other machinery as a tool to perform an existing process.

Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B.

Regarding claim 8:

The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1, which included an abstract idea (see rejection for claim 1 above). This claim merely recites a further limitation on the method limitation, which is directed to only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, such that it amounts to no more than mere instructions to apply the exception. The additional limitations:

“wherein a computer program product comprises instructions which” — This limitation recites generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function), such that it amounts to no more than mere instructions to apply the exception using a generic computer component. See MPEP 2106.05(f).

“when the computer program product is executed by a computer, prompt the computer to perform the method” — This limitation amounts to no more than mere instructions to apply the exception and is equivalent to mere instructions to implement the abstract idea on a computer. See MPEP 2106.05(f). A computer program with instructions to perform a method merely invokes computers or other machinery as a tool to perform an existing process.

Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B.

Regarding claim 9:

The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1, which included an abstract idea (see rejection for claim 1 above). This claim merely recites a further limitation on the method limitation, which is directed to only the idea of a solution or outcome; i.e., the claim fails to recite details of how a solution to a problem is accomplished, such that it amounts to no more than mere instructions to apply the exception. The additional limitations:

“wherein a non-transitory machine-readable storage medium comprises” — This limitation recites generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function), such that it amounts to no more than mere instructions to apply the exception using a generic computer component. See MPEP 2106.05(f).

“instructions that, when executed by a computer, prompt the computer to carry out the method” — This limitation amounts to no more than mere instructions to apply the exception and is equivalent to mere instructions to implement the abstract idea on a computer. See MPEP 2106.05(f). A storage medium with instructions to perform a method merely invokes computers or other machinery as a tool to perform an existing process.

Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d)(I)), failing Step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under Step 2B.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1–2, 4–5, and 7–9 are rejected under 35 U.S.C.
102(a)(1) as being anticipated by Schmidt et al. (US 20200104200 A1), hereinafter “Schmidt”.

Regarding claim 1, Schmidt teaches:

a computer-implemented method for determining a degree of robustness for a robustness of a provided, trained, data-based sensor model for evaluating an input dataset having at least one signal time series in order to determine a model output representing a change-point time, the method comprising (Schmidt, Abstract: “Techniques are described herein for predicting disk drive failure using a machine learning model. The framework involves receiving disk drive sensor attributes as training data, preprocessing the training data to select a set of enhanced feature sequences, and using the enhanced feature sequences to train a machine learning model to predict disk drive failures from disk drive sensor monitoring data. Prior to the training phase, the RNN LSTM model is tuned using a set of predefined hyper-parameters. The preprocessing, which is performed during the training and evaluation phase as well as later during the prediction phase, involves using predefined values for a set of parameters to generate the set of enhanced sequences from raw sensor reading. The enhanced feature sequences are generated to maintain a desired healthy/failed disk ratio, and only use samples leading up to a last-valid-time sample in order to honor a pre-specified heads-up-period alert requirement”—[wherein the trained machine learning model predicts disk drive failure (i.e., a degree of robustness) based on sensor data readings (i.e., time series signals) based on samples leading up to a last-valid-time sample (i.e., a change-point time)]);

providing a plurality of unlabeled validation input datasets to the sensor model (Schmidt Fig. 4, ¶0062: “At step 402, raw sensor readings are received from disk drive sensors. These readings may represent disk drive attribute values generated by disk drive sensors that are monitoring the disk drives. Sets raw data readings are used to form one or more training data set, test sets as well as validation sets for training a RNN LSTM deep learning model”—[wherein raw sensor readings are unlabeled datasets]);

determining a plurality of robust validation input datasets of the plurality of unlabeled validation input datasets that satisfy a first robustness criterion and/or a second robustness criterion (Schmidt Fig. 4, ¶¶0061–0068, 0108–0110: “In step 408, the raw sensor readings received in step 402 are preprocessed based on the preprocessing specifications received in step 406 to generate the preprocessed sequence training data that will be used to train the deep learning model … The trained model is evaluated using the preprocessed test and validation data sets” and “gradient of the error … vanishes beneath a threshold … Model training may be supervised or unsupervised”—[wherein during training, either supervised or unsupervised, sequence training data is generated (i.e., determined from unlabeled validation input datasets) based on a gradient of error and/or a threshold (i.e., a first and/or second criterion)]); and

determining a proportion of the plurality of robust validation input datasets out of the plurality of unlabeled validation input datasets in order to obtain the degree of robustness (Schmidt Fig. 5, ¶¶0071–0072: “During the training phase, the output of the Preprocessing Module 503 is fed to the RNN LSTM Training Module 505 for training the RNN LSTM deep learning model. Prior to training, the Training Module 505 tunes the RNN LSTM model using hyper-parameter specifications 504 that may be provided by a user to the System 500. The Model Evaluation Module 506 is responsible for testing and validating the RNN LSTM model and establishing the trained RNN LSTM model that will then be used for analyzing the disk drive sensor data. Once the RNN LSTM model has been trained, the Analysis and Prediction Module 507 will receive the preprocessed input data from disk drive sensor reading, and analyze the data using the trained RNN LSTM model and provide as output, predictions regarding impending disk failures”—[wherein after the machine learning model is trained on the unlabeled datasets, evaluation datasets are used to determine datasets that best train the model to make predictions regarding impending disk failures (i.e., determine a proportion of the input datasets to obtain the degree of robustness)]).

Regarding claim 2, Schmidt teaches all the limitations of claim 1. Schmidt further teaches:

the first robustness criterion indicates that a corresponding unlabeled validation input dataset is robust when a distance between a first model output of the sensor model for the corresponding unlabeled validation input dataset and a second model output of the sensor model for a modified validation input dataset falls below a first threshold value specified by a first threshold (Schmidt ¶¶0107–0108: “An ANN's output may be more or less correct. For example, an ANN that recognizes letters may mistake a I as an L because those letters have similar features. Correct output may have particular value(s), while actual output may have somewhat different values. The arithmetic or geometric difference between correct and actual outputs may be measured as error according to a loss function, such that zero represents error free (i.e. completely accurate) behavior. For any edge in any layer, the difference between correct and actual outputs is a delta value … Backpropagation entails distributing the error backward through the layers of the ANN in varying amounts to all of the connection edges within the ANN … Gradient of an edge is calculated by multiplying the edge's error delta times the activation value of the upstream neuron.
When the gradient is negative, the greater the magnitude of error contributed to the network by an edge, the more the edge's weight should be reduced, which is negative reinforcement … Training may cease when the error stabilizes (i.e. ceases to reduce) or vanishes beneath a threshold (i.e. approaches zero). Example mathematical formulae and techniques for feedforward multilayer perceptrons (MLP), including matrix operations and backpropagation, are taught in related reference “EXACT CALCULATION OF THE HESSIAN MATRIX FOR THE MULTI-LAYER PERCEPTRON,” by Christopher M. Bishop”—[(emphasis added) wherein the BRI of distance between a first model output and a second model output is simply a value indicating the difference (i.e., subtraction; see present disclosure ¶0051), and wherein the outputs of the trained model are evaluated for an error delta (i.e., distance) to determine a gradient for thresholding that helps the model stabilize (i.e., indicates robustness of the dataset)]); and

the modified validation input dataset corresponds to a temporal shift of the signal time series in the validation input dataset through an element-wise shift (Schmidt ¶¶0112–0117: “One form of contextual encoding is graph embedding, which constructs and prunes (i.e. limits the extent of) a logical graph of (e.g. temporally or semantically) related events or records. The graph embedding may be used as a contextual encoding and input stimulus to an ANN … Hidden state (i.e. memory) is a powerful ANN enhancement for (especially temporal) sequence processing. Sequencing may facilitate prediction and operational anomaly detection, which can be important techniques. A recurrent neural network (RNN) is a stateful MLP that is arranged in topological steps that may operate more or less as stages of a processing pipeline … A sequence of inputs may be simultaneously or sequentially applied to respective steps of an RNN to cause analysis of the whole sequence. For each input in the sequence, the RNN predicts a next sequential input based on all previous inputs in the sequence … The way LSTM arranges neurons is different from how transistors are arranged in a flip flop, but a same theme of a few control gates that are specially arranged to be stateful is a goal shared by LSTM and digital logic”—[wherein the BRI of element-wise shift is any change in an output element's value (see present disclosure ¶¶0018, 0020), and when the model uses encoded contextual embeddings as input (i.e., modified validation input dataset) it corresponds to the sequential arrangement (i.e., temporal shift) of each element-wise neuron]).

Regarding claim 4, Schmidt teaches all the limitations of claim 2. Schmidt further teaches:

wherein the first threshold value and the second threshold value, respectively, consider or depend on the temporal shift of the signal time series of the corresponding unlabeled validation input dataset for creating the modified validation input dataset (Schmidt ¶¶0114–0116: “The other input is an output of the adjacent previous step that may embed details from some or all previous steps, which achieves sequential history (i.e. temporal context). The other output is a predicted next item in the sequence. Example mathematical formulae and techniques for RNNs and LSTM are taught in related U.S. patent application Ser. No. 15/347,501, entitled “MEMORY CELL UNIT AND RECURRENT NEURAL NETWORK INCLUDING MULTIPLE MEMORY CELL UNITS”; see also Schmidt claim 1: “A method for predicting multivariate time-series data and identifying anomalies in the time-series data, comprising: receiving input data for a disk drive, wherein the input data comprises time-stamped sensor attribute values from sensors monitoring the disk drive; automatically preprocessing the input data to generate one or more enhanced feature sequences, said one or more enhanced feature sequences including values generated by applying statistical functions to said input data; providing the one or more enhanced feature sequences to a trained machine learning model; and receiving, from the trained machine learning model, predictions regarding impending failures in the disk drive in threshold period of time in the future; wherein the method is performed by one or more computing devices”—[wherein the threshold period of time in the future depends on and/or considers the temporal shift from the training data sequences]).

Regarding claim 5, Schmidt teaches all the limitations of claim 1. Schmidt further teaches:

the trained sensor model includes a deep neural network comprising multiple neuron layers with neurons that are calibrated using model parameters (Schmidt ¶¶0029–0038: “The terms machine learning and deep learning are both used interchangeably in this description … Some examples of machine learning and deep learning models that are used for time-series prediction are … A feedforward neural network with multiple fully connected layers used for supervised learning.
The network is trained by back-propagating errors to update the weights (connections) between neurons based on the prediction of normal or anomalous behavior relative to the actual classification”), and the second robustness criterion indicates that a corresponding unlabeled validation input dataset is robust when a maximum distance between a minimum threshold value or maximum threshold value of a first model output of a first model evaluation for the corresponding unlabeled validation input dataset and a minimum threshold value or the maximum threshold value of a second model evaluation for the relevant validation input dataset falls below a second threshold value specified by a second threshold (Schmidt ¶0053: “Some embodiments described herein include generating the enhanced sequence by performing the enhanced feature addition to the time series sensor data. In the pseudo-code depicting the Preprocessing with Enhanced Feature Sequence Creation Algorithm, the add_enhanced_features routine implements this functionality. The input to this routine is a sample, which consists of a sequence that belongs to single disk. On a given sample, the enhance function, which is a window function, is applied. A window function involves a function that is zero-valued outside of an interval. As just noted, the applied enhancements may be any of (i) simple moving averages, (ii) exponential moving averages, (iii) statistical variance, maximum, minimum, standard deviation, etc. Applying a window function to an enhancement may be, for example, deriving simple moving average values for just a specified time-interval of the sensor data. Furthermore, the enhance function is applied multiple times with different window sizes, i.e., different time-intervals, defined by a specified parameter that is labeled: enhancement factor (efactor)”; see also Schmidt ¶¶0107–0108: “An ANN's output may be more or less correct. 
For example, an ANN that recognizes letters may mistake a I as an L because those letters have similar features. Correct output may have particular value(s), while actual output may have somewhat different values. The arithmetic or geometric difference between correct and actual outputs may be measured as error according to a loss function, such that zero represents error free (i.e. completely accurate) behavior. For any edge in any layer, the difference between correct and actual outputs is a delta value … Backpropagation entails distributing the error backward through the layers of the ANN in varying amounts to all of the connection edges within the ANN … Gradient of an edge is calculated by multiplying the edge's error delta times the activation value of the upstream neuron. When the gradient is negative, the greater the magnitude of error contributed to the network by an edge, the more the edge's weight should be reduced, which is negative reinforcement … Training may cease when the error stabilizes (i.e. ceases to reduce) or vanishes beneath a threshold (i.e. approaches zero). Example mathematical formulae and techniques for feedforward multilayer perceptrons (MLP), including matrix operations and backpropagation, are taught in related reference “EXACT CALCULATION OF THE HESSIAN MATRIX FOR THE MULTI-LAYER PERCEPTRON,” by Christopher M. 
Bishop”—[(emphasis added) wherein the outputs of the trained model are evaluated for an error delta (i.e., distance) to determine a gradient for thresholding that helps the model stabilize (i.e., indicates robustness of the dataset), and wherein the different time windows represent different thresholds for input data (i.e., more than one threshold value)]), and the at least one signal time series of the validation input dataset is enlarged by a predetermined number of elements (Schmidt ¶0053–0054: “Applying a window function to an enhancement may be, for example, deriving simple moving average values for just a specified time-interval of the sensor data. Furthermore, the enhance function is applied multiple times with different window sizes, i.e., different time-intervals, defined by a specified parameter that is labeled: enhancement factor (efactor) … This sequence increase is defined by the parameter: largewindow in the pseudo-code”—[wherein the sequence is increased by the predefined parameter]), the sensor model is configured as an input neuron layer having a number of additional neurons, such that the input neuron layer has a number of elements corresponding to the number of elements of the corresponding unlabeled validation input dataset (Schmidt ¶0059, ¶¶0062–0063, ¶¶0075–0076, ¶¶0090–0096, ¶0105: “The GUI 600 may include a RNN/LSTM Hyper-Parameter Specification component 630 for specifying hyper-parameter specifications 504 as input to the RNN LTSM training module 505. Some of the hyper-parameters that may be specified include, without limitation, Number of Layers 631 and Number of Neurons per layer 632” and “Properties of matrices used to implement a neural network correspond neurons and edges. A cell in a matrix W represents a particular edge from a node in layer L−1 to L. An activation neuron represents an activation function for the layer that includes the activation function. 
An activation neuron in layer L corresponds to a row of weights in a matrix W for the edges between layer L and L−1 and a column of weights in matrix W for edges between layer L and L+1. During execution of a neural network, a neuron also corresponds to one or more activation values stored in matrix A for the layer and generated by an activation function”), and the second model evaluation occurs by shifting the model parameters of the neurons in the input neuron layer (Schmidt ¶0040: “ Multiple existing time-series analysis techniques, both statistical and machine learning based, have identified the benefits of using averages, or similar operations, for time-series forecasting and anomaly detection. For example, moving averages, weighted moving averages, exponential smoothing, or other filter-based techniques may be used to smooth time-series to evaluate statistical deviations of data points from the smoothed time-series or as filtered inputs to other models. Sequential probability ratio testing (SPRT) evaluates level shifts (changes in the average) of the residuals generated based on the time-series predictions. Seasonal trend decomposition techniques, such as Sequential Trend Decomposition using Loess (STL), use a form of moving averages to separate out the long-term trend in the time-series from the seasonal and residual components in the time-series. The trend information can then be used to determine if the time-series is stationary or generally increasing/decreasing”; see also Schmidt ¶0059: “Embodiments described herein present approaches using the LSTM RNN model for predicting disk failures. Table 1 shows a three-layer bi-directional LSTM design followed with an additional activation layer. When created sequences are input to the LSTM RNN, the input layer first applies masks sequences of un-aligned sizes, then applies activation operations. 
The bidirectional LSTM layers, train incoming sequences first in the input order, then also in reverse order to fully understand the context within the sequence. At each layer, there are a predefined number of neurons, which is a hyper-parameter that is presently tuned in embodiments described herein. Similarly, the number of layers are subject to hyper-parameter tuning”—[(emphasis added)]). Regarding claim 7, Schmidt teaches all the limitations of claim 1. Schmidt teaches: wherein an apparatus is configured to carry out the method (Schmidt ¶0132). Regarding claim 8, Schmidt teaches all the limitations of claim 1. Schmidt teaches: wherein a computer program product comprises instructions which, when the computer program product is executed by a computer, prompt the computer to perform the method (Schmidt ¶0069, ¶0132, ¶0136). Regarding claim 9, Schmidt teaches all the limitations of claim 1. Schmidt teaches: wherein a non-transitory machine-readable storage medium comprises instructions that, when executed by a computer, prompt the computer to carry out the method (Schmidt ¶0133). Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. 
Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness. Claims 3 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Schmidt (as applied above in the §102 rejections) in view of Freiesleben et al., "The Intriguing Relation Between Counterfactual Explanations and Adversarial Examples," Minds & Machines 32, 77–109 (2022), https://doi.org/10.1007/s11023-021-09580-9 (Year: 2021), hereinafter “Freiesleben”. Regarding claim 3, Schmidt teaches all the limitations of claim 2. Schmidt teaches: the second robustness criterion indicates that a corresponding unlabeled validation input dataset is robust when a maximum distance between a minimum threshold value or maximum threshold value of the first model output and a minimum threshold value or maximum threshold value of the second model output falls below a second threshold value specified by a second threshold (Schmidt ¶0046, ¶0053: “preprocessing stage … creates enhanced sequences of data, and outputs the enhanced sequences. The output enhanced sequences, in turn, are the input sequences received by the machine learning model” and “In the pseudo-code depicting the Preprocessing with Enhanced Feature Sequence Creation Algorithm, the add_enhanced_features routine implements this functionality. The input to this routine is a sample, which consists of a sequence that belongs to single disk. On a given sample, the enhance function, which is a window function, is applied. A window function involves a function that is zero-valued outside of an interval. 
As just noted, the applied enhancements may be any of (i) simple moving averages, (ii) exponential moving averages, (iii) statistical variance, maximum, minimum, standard deviation, etc.”—[wherein the BRI of when a maximum distance between a minimum threshold value or maximum threshold value of the first model output and a minimum threshold value or maximum threshold value of the second model output is an absolute value, and wherein the applied enhancements may be any of (i) simple moving averages, (ii) exponential moving averages, (iii) statistical variance, maximum, minimum, standard deviation, etc.]). Schmidt does not appear to explicitly teach: the minimum or maximum threshold value results from a distribution of model outputs from a specified epsilon environment of the corresponding unlabeled validation input dataset and the modified validation input dataset by sampling from the epsilon environment of the corresponding unlabeled validation input dataset or the modified validation input. However, Freiesleben teaches: the minimum or maximum threshold value results from a distribution of model outputs from a specified epsilon environment of the corresponding unlabeled validation input dataset and the modified validation input dataset by sampling from the epsilon environment of the corresponding unlabeled validation input dataset or the modified validation input (Freiesleben Pgs. 91–93: “Machine Learning Algorithms and Models assume we consider the relation of variables X ∶= X1 ×⋯× Xn and a (often one-dimensional) variable Y A (supervised) ML algorithm - is a procedure that based on a set of models M , a labeled training dataset DTr ∶= {(x1, y1),…, (xn, yn)} with n ∈ ℕ , some hyperparameters H , an optimization method O , and a loss function L outputs a model f ∈M . This procedure - intuitively speaking searches for a model f in the set M , using method O and hyperparameters H , that has a low prediction loss L on the training dataset DTr ... 
We can think of as a step away from x for which we cross a decision boundary of the model but stay within a local ε-environment around x … a counterfactual describes the shortest step that crosses a decision boundary. Notice that this closest vector does not have to be unique, there might exist a variety of vectors in equal distance … The second definitional difference we introduce is that counterfactuals must be maximally close data-points, while adversarials need only be within an ε-environment around the original input x”—[wherein the machine learning algorithm outputs values all derived within a local ε-environment around x (i.e., model outputs from a specified epsilon environment)]). The methods of Schmidt, the teachings of Freiesleben, and the instant application are analogous art because they pertain to statistical modeling using machine learning models. It would be obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the methods of Schmidt with the teachings of Freiesleben to provide metes and bounds for acceptable input datasets. One would be motivated to do so to increase the probability that the classifications produced by the model are accurate (Freiesleben Pg. 89: “We must look at image-classification models to answer why solutions to Eq. 1 are mostly misclassified in that scenario. Complex image classifiers perform reasonably well on training data and highly similar inputs. In “unseen regions”, on the other hand, they have to extrapolate and therefore perform worse. Since the input space is incredibly high-dimensional, the training data and therefore the data-manifold the algorithm approximates is comparably tiny. That means, there are many more meaningless, unrealistic, and unseen inputs than there are points in the training-data. The assignment of these inputs is not trustworthy and does not necessarily match the assignment of other nearby inputs. 
At the same time, there is usually a strongly limited number of classes that inputs are assigned to. Moreover, the training-data assigned to different classes have great distances. Hence, if we search for an input from another class but close to a given input, the probability is high that it is an input the algorithm has not seen, is unrealistic, or is meaningless and therefore where the algorithm is not reliable. Thus, the model will with high probability misclassify this input”). Regarding claim 6, Schmidt teaches all the limitations of claim 2. Schmidt teaches: wherein the distance is determined using … an L-infinity standard or as a difference between the corresponding change-point times represented by the model outputs (Schmidt ¶¶0107–0108: “For any edge in any layer, the difference between correct and actual outputs is a delta value … Propagation of error causes adjustments to edge weights, which depends on the gradient of the error at each edge. Gradient of an edge is calculated by multiplying the edge's error delta times the activation value of the upstream neuron. When the gradient is negative, the greater the magnitude of error contributed to the network by an edge, the more the edge's weight should be reduced, which is negative reinforcement”; see also Schmidt ¶0056–0058: “Performing the feature enhancement as described has two benefits: Easier change-point detection by smoothened attributes: Applying an enhancement function such as a simple moving average has a significant impact on the accuracy of the LSTM RNN. This is because the functions reduce the effects of minor fluctuations on the sequence, smoothen the attribute values, and hence, make the attribute change point detection easier. Given that the change point differs across attributes, applying the same enhancement function with various window sizes helps significantly in determining the accurate change point for the various attributes. 
Easier sequence length selection: Applying the enhancement functions involves applying window functions. Thus, performing feature enhancements enable the generated sequence to carry additional information, which is normally outside of the input time series sensor sequence. As a result, creating a representative sequence is less sensitive to the sequence length because it is less likely to miss out an important but-not-recent change in an attribute”—[(emphasis added) wherein the BRI of L-infinity standard is any technique that measures the maximum absolute value of a vector's components, and wherein the greater the magnitude of edge error (i.e., the L-infinity standard used to determine the delta) the more the edge's weight should be reduced]). Schmidt does not appear to explicitly teach: wherein the distance is determined using an L2 standard, [L-infinity standard, or as a difference between the corresponding change-point times represented by the model outputs]. However, Freiesleben teaches: wherein the distance is determined using an L2 standard (Freiesleben Pg. 91, Pg. 99: “A (supervised) ML algorithm _ is a procedure that based on a set of models M , a labeled training dataset DTr ∶= {(x1, y1),…, (xn, yn)} with n ∈ ℕ , some hyperparameters H , an optimization method O , and a loss function L outputs a model f ∈M . 
This procedure _ intuitively speaking searches for a model f in the set M , using method O and hyperparameters H , that has a low prediction loss L on the training dataset DTr” and “In addition, we have highlighted similarities and differences between the two fields in terms of use cases, solution methods, and distance metrics”—[wherein the BRI of L2 standard is any regularization technique incorporating a penalty term to the loss function (i.e., subtraction step added to a math formula) to prevent overfitting by shrinking weights), and wherein the lowest prediction loss L is a distance metric determined by the loss function L after intuitively searching through the models]). The methods of Schmidt, the teachings of Freiesleben, and the instant application are analogous art because they pertain to statistical modeling using machine learning models. It would be obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the methods of Schmidt with the teachings of Freiesleben to provide for regularization techniques including specific bounding techniques. One would be motivated to do so to increase the probability that the classifications produced by the model are accurate (Freiesleben Pg. 89: “We must look at image-classification models to answer why solutions to Eq. 1 are mostly misclassified in that scenario. Complex image classifiers perform reasonably well on training data and highly similar inputs. In “unseen regions”, on the other hand, they have to extrapolate and therefore perform worse. Since the input space is incredibly high-dimensional, the training data and therefore the data-manifold the algorithm approximates is comparably tiny. That means, there are many more meaningless, unrealistic, and unseen inputs than there are points in the training-data. The assignment of these inputs is not trustworthy and does not necessarily match the assignment of other nearby inputs. 
At the same time, there is usually a strongly limited number of classes that inputs are assigned to. Moreover, the training-data assigned to different classes have great distances. Hence, if we search for an input from another class but close to a given input, the probability is high that it is an input the algorithm has not seen, is unrealistic, or is meaningless and therefore where the algorithm is not reliable. Thus, the model will with high probability misclassify this input”). Prior Art of Record The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Anderson et al., (“DISK DRIVE FAILURE PREDICTION WITH NEURAL NETWORKS”) discloses systems and methods for developing training data and validating datasets for machine learning predictions “Operational events associated with a target physical device can be detected for mitigation by implementing some aspects described herein. For example, a system can apply a sliding window to received sensor measurements at successive time intervals to generate a set of data windows. The system can determine a set of eigenvectors associated with the set of data windows by performing principal component analysis on a set of data points in the set of data windows. The system can determine a set of angle changes between pairs of eigenvectors. The system can generate a measurement profile by executing an integral transform on the set of angle changes. One or more trained machine-learning models are configured to detect an operational event associated with the target physical device based on the measurement profile and generate an output indicating the operational event”. Anderson Abstract. 
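The Anderson pipeline summarized above (sliding windows over sensor measurements, PCA eigenvectors per window, angle changes between eigenvector pairs, an integral transform over those angles) can be sketched in a few lines. This is an illustrative reconstruction, not code from the reference: every function name and parameter value is an assumption, and a cumulative sum is used only as a stand-in for the unspecified integral transform.

```python
import numpy as np

def measurement_profile(samples, window=32, step=8):
    """Sketch of a sliding-window -> PCA -> angle-change -> profile pipeline.

    `samples` is a (time, channels) array of sensor measurements. All names
    and defaults here are hypothetical; the cumulative sum merely stands in
    for the integral transform described in the reference.
    """
    # 1. Slide a fixed-length window over the measurements at successive intervals.
    windows = [samples[i:i + window]
               for i in range(0, len(samples) - window + 1, step)]
    # 2. Dominant eigenvector of each window via PCA (covariance eigendecomposition).
    eigvecs = []
    for w in windows:
        centered = w - w.mean(axis=0)
        cov = np.cov(centered, rowvar=False)
        _, vecs = np.linalg.eigh(cov)       # eigenvalues ascending
        eigvecs.append(vecs[:, -1])         # principal component
    # 3. Angle change between consecutive windows' eigenvectors
    #    (abs() ignores sign flips, which PCA does not determine).
    angles = [np.arccos(np.clip(abs(np.dot(a, b)), 0.0, 1.0))
              for a, b in zip(eigvecs, eigvecs[1:])]
    # 4. Running sum of angle changes as a stand-in integral transform.
    return np.cumsum(angles)
```

A downstream model would then consume this profile to flag operational events; the sketch only shows how the windowed eigenvector geometry is reduced to a one-dimensional signal.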
Anisimov et al., (“Method of and system for evaluating consumption of visual information displayed to a user by analyzing user's eye tracking and bioresponse data”) discloses systems and methods for processing sequences of training data to make predictions “The analysis may consist of applying several steps of the training machine learning model to the calibrated target data. The EEG data is analyzed when the target recording session data contains an EEG channel and is analyzed as follows. It is understood that when no EEG data is available, the steps relating to the EEG analysis data may be omitted. The data recorded by the EEG may be considered as a sequence of overlapping time interval windows (window_1, window_2, . . . window_n), wherein the length of each window is similar to each other, is typically no less than 500 ms., and the time shift between the windows may be chosen from interval 0 ms to the value of the measurement of window length/2. Approximately 600 windows may be recorded during a 10 minutes recording of a target session. Each window_k consists of fragments of multichannel EEG time series, the number of channels is equal to the number of EEG channels provided by the EEG recording device. A feature extraction method based on fast Fourier transform (FFT) is applied to each window, generating a tensor of extracted features for each considered window of the EEG signal, producing a sequence (feat_1, feat_2, . . . feat_n). For each of the (window_1, window_2, . . . , window_n) there may be 256 tensors of the sequence (feat_1, feat_2, . . . feat_256), whereby each of the tensors of (feat_1, feat_2, . . . feat_256) are real numbers from 0 to 1000. These tensors are used as an input for the convolutional neural network of the training machine learning model, producing a sequence of the activation tensors (activation_1, activation_2, . . . , activation_k). 
In some examples, there may be approximately twice as many windows as activations, and the activations are floating point numbers, which are used to generate a sequence of EEG embeddings, such as (embedding_1, embedding_2, . . . , embedding_m) using a suitable pooling approach such as averaging or statistics pooling. Each (embedding_k) is a tensor of real values of a fixed dimension, for example, the dimension may be 100. For each (embedding_k), a (timestep t_k,start) and a (timesteps t_k,stop) are stored, representing specific time coordinates of the input data that was used to produce the (embedding_k)”. Anisimov ¶223. Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS SHINE whose telephone number is (571)272-2512. The examiner can normally be reached M-F, 11a-7p ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached on (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /N.B.S./Examiner, Art Unit 2126 /VAN C MANG/Primary Examiner, Art Unit 2126

Prosecution Timeline

Feb 28, 2023: Application Filed
Dec 27, 2025: Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579449: HYDROCARBON OIL FRACTION PREDICTION WHILE DRILLING (granted Mar 17, 2026; 2y 5m to grant)
Patent 12572440: AUTOMATICALLY DETECTING WORKLOAD TYPE-RELATED INFORMATION IN STORAGE SYSTEMS USING MACHINE LEARNING TECHNIQUES (granted Mar 10, 2026; 2y 5m to grant)
Patent 12561554: ERROR IDENTIFICATION FOR AN ARTIFICIAL NEURAL NETWORK (granted Feb 24, 2026; 2y 5m to grant)
Patent 12533800: TRAINING REINFORCEMENT LEARNING AGENTS TO LEARN FARSIGHTED BEHAVIORS BY PREDICTING IN LATENT SPACE (granted Jan 27, 2026; 2y 5m to grant)
Patent 12536428: KNOWLEDGE GRAPHS IN MACHINE LEARNING DECISION OPTIMIZATION (granted Jan 27, 2026; 2y 5m to grant)
Based on this examiner's 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 38%
With Interview: 82% (+44.6%)
Median Time to Grant: 5y 1m
PTA Risk: Low
Based on 37 resolved cases by this examiner. Grant probability derived from career allow rate.
