DETAILED ACTION
This action is responsive to Applicant’s reply filed 03 December 2025. This action is made final.
Status of the Claims
Claims 1, 5, 7-9, 12-13, 15-17 and 19 are currently amended.
Claims 1-20 are pending and under examination, of which claims 1, 9, and 16 are independent.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant’s arguments regarding the claims being interpreted under 35 U.S.C. 112(f) are persuasive. Accordingly, each and every rejection under 35 U.S.C. 112(b) previously set forth in the Non-Final Office Action mailed September 11, 2025 is withdrawn, and the claims as amended are no longer interpreted under 35 U.S.C. 112(f).
Applicant’s arguments regarding the art rejections are moot in view of the new grounds of rejection necessitated by applicant’s amendment.
In regards to the rejection of claims 1-20 under 35 U.S.C. 101 for being directed to an abstract idea without significantly more, Applicant argues the claims are not directed to a judicial exception but rather to an improvement addressing the technical problem of validating inferential models without using labeled data (see Applicant’s response, page 12).
On page 12, Applicant argues that the “extracting a first distribution”, “extracting a second distribution”, and “determining” steps cannot be practically performed in the human mind. This argument is not persuasive because the steps are recited at a high level of generality, without any specific level of complexity or execution requirement that would preclude the steps from being practically performed entirely in the human mind or with the use of a physical aid.
On page 12, Applicant further argues that “determining a distance between extracted distributions from inference and validating data sets and comparing the distance to a validity-determining criterion” provides a technical solution to the technical problem of validating inferential models without using labeled data. This argument is not persuasive because the asserted improvement of validating inferential models without using labeled data is not reflected in the claims.
On page 14, Applicant argues the claims provide an improvement to the problem of validating inferential models without labeled data. This argument is not persuasive because the asserted improvements of enabling real-time deployment and reducing latency are not reflected in the claims.
Accordingly, the rejections of claims 1-20 as being directed to an abstract idea without significantly more are maintained.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent Claims 1, 9, and 16
Step 2A Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, independent claim 1, under the broadest reasonable interpretation, recites the following limitations that are abstract ideas:
extracting a first distribution of values of a first parameter from the inference data set; (mental process)
extracting a second distribution of values of the first parameter from a validating data set previously used to validate, in a first validation operation, the trained inferential model; (mental process)
determining a first parameter distance between the extracted first distribution and the extracted second distribution by statistically measuring a similarity between the extracted first distribution and the extracted second distribution; (mental process)
and validating, in a second validating operation, the trained inferential model for operation on the inference data set based on satisfaction of a validation condition, the satisfaction of the validation condition being based on the determined first parameter distance (mental process)
The “extracting” steps involve identifying distributions from sets of data which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the steps of extracting a distribution at a high degree of generality, thus the steps are not required to have any specific level of complexity that would preclude the steps from being mental processes. Therefore, the “extracting” steps are considered to be mental processes, see MPEP § 2106.04(a)(2)(III).
The “determining” step involves calculating a distance between two distributions which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of determining a first parameter distance at a high degree of generality, thus the step is not required to have any specific level of complexity that would preclude the step from being a mental process. Therefore, the “determining” step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III).
The “validating” step involves determining if an inferential model is suitable for operation based on the calculated distance which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of validating the inferential model at a high degree of generality, thus the step is not required to have any specific level of complexity that would preclude the step from being a mental process. Therefore, the “validating” step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III).
Therefore, the independent claims recite a judicial exception. Independent claims 9 and 16 recite similar limitations corresponding to claim 1, therefore the same subject matter eligibility analysis is applied.
Step 2A Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the judicial exception recited above is not integrated into a practical application. The claims recite the following additional elements, but these additional elements are not sufficient to integrate the judicial exception into a practical application:
one or more hardware processors configured to execute instructions stored in memory; (claim 9) (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea)
and a data set distance model validator executable by the one or more hardware processors, the data set distance model validator including (claim 9) (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
one or more tangible processor-readable storage media embodied with instructions for executing on one or more processors and circuits of a computing device a process of validating an inferential model for operation on an inference data set, the process comprising (claim 16) (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea)
the trained inferential model being trained using a training data set different from the validating dataset;
The “and a data set distance model validator executable …” limitation is recited at a high level of generality such that it amounts to no more than mere instructions to “apply” the judicial exception on a computer. It can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, see MPEP § 2106.05(f).
The “trained inferential model being trained …” limitation is recited at a high level of generality such that it amounts to no more than mere instructions to “apply” the judicial exception on a computer. It can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, see MPEP § 2106.05(f).
The remaining additional elements are recited at a high level of generality such that they amount to no more than mere instructions to “apply” an exception using a generic component, i.e., adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, see MPEP § 2106.05(f).
Therefore, the above limitations do not integrate the judicial exception into a practical application.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The claims do not include additional elements that are sufficient for the claims to amount to significantly more than the judicial exception.
In regards to the “and a data set distance model validator executable …” and the “trained inferential model being trained …” limitations and the remaining additional elements, the limitations are recited so generically that they amount to no more than mere instructions to “apply” the judicial exception on a computer using generic computer components. Mere instructions to apply a judicial exception cannot provide an inventive concept. See MPEP § 2106.05(f).
Therefore, independent claims 1, 9, and 16 are not patent eligible.
Dependent Claims 2-8, 10-15, and 17-20
The remaining dependent claims being rejected do not recite additional elements, whether considered individually or in combination, that are sufficient to integrate the judicial exception into a practical application or amount to significantly more than a judicial exception.
Dependent claim 2 recites the following limitations:
“wherein the inference data set is collected under a first condition represented in the inference data set and the validating data set is collected under a second condition represented in the validating data set, the first condition and the second condition at least partially differ” (MPEP § 2106.05(g) necessary data gathering and insignificant extra-solution activity to the judicial exception)
“and satisfaction of the validation condition is based on the first condition and the second condition” (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
The “inference data set is collected …” limitation represents mere necessary data gathering and is recited at a high level of generality, thus adding insignificant extra-solution activity to the judicial exception, see MPEP § 2106.05(g). The extra-solution activity is a well-understood, routine and conventional (WURC) activity per MPEP § 2106.05(d)(II): “the courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network, e.g., using the Internet to gather data.” The limitation does not integrate the judicial exception into a practical application and does not amount to significantly more.
The “satisfaction of the validation condition …” limitation is recited at a high level of generality such that it amounts to no more than mere instructions to “apply” the judicial exception on a computer. It can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, i.e., adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, see MPEP § 2106.05(f). The limitation does not integrate the judicial exception into a practical application and does not amount to significantly more.
Dependent claims 3, 11, and 18 recite the following limitations:
Step 2A Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, dependent claim 3, under the broadest reasonable interpretation, recites the following limitations that are abstract ideas:
normalizing the determined first parameter distance to generate a normalized first parameter distance, the normalized first parameter distance normalized to a predetermined range of values; (mental process)
and applying a relative weight to the normalized first parameter distance to generate a weighted first parameter distance, the relative weight based on a predetermined correlative value of the first parameter relative to a second parameter represented in the inference data set and the validating data set, (mental process)
The “normalizing” step involves scaling a distance to a range of values which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of normalizing a first parameter distance at a high degree of generality, thus the step is not required to have any specific level of complexity that would preclude the step from being a mental process. Therefore, the “normalizing” step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III).
The “applying” step involves multiplying a distance with a weight which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of applying a relative weight at a high degree of generality, thus the step is not required to have any specific level of complexity that would preclude the step from being a mental process. Therefore, the “applying” step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III).
Therefore, dependent claim 3 recites a judicial exception. Dependent claims 11 and 18 recite similar limitations corresponding to claim 3, therefore the same subject matter eligibility analysis is applied.
Step 2A Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the judicial exception recited above is not integrated into a practical application. The claims recite the following additional elements, but these additional elements are not sufficient to integrate the judicial exception into a practical application:
wherein the satisfaction of the validation condition is based on the weighted first parameter distance (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
The “satisfaction of the validation condition …” limitation is recited at a high level of generality such that it amounts to no more than mere instructions to “apply” the judicial exception on a computer. It can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, see MPEP § 2106.05(f).
Therefore, the above limitations do not integrate the judicial exception into a practical application.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The claims do not include additional elements that are sufficient for the claims to amount to significantly more than the judicial exception.
In regards to the “satisfaction of the validation condition …” limitation, it is recited so generically that it amounts to no more than mere instructions to “apply” the judicial exception on a computer using generic computer components. Mere instructions to apply a judicial exception cannot provide an inventive concept. See MPEP § 2106.05(f).
Therefore, dependent claims 3, 11, and 18 are not patent eligible.
Dependent claims 4, 12, and 19 recite the following limitations:
Step 2A Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, dependent claim 4, under the broadest reasonable interpretation, recites the following limitations that are abstract ideas:
extracting a third distribution of values of a second parameter from the inference data set; (mental process)
extracting a fourth distribution of values of the second parameter from the validating data set; (mental process)
determining a second parameter distance between the extracted third distribution and the extracted fourth distribution; (mental process)
and determining an aggregate distance based on the determined first parameter distance and the determined second parameter distance, (mental process)
The “extracting” steps involve identifying distributions from sets of data which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the steps of extracting distributions at a high degree of generality, thus the steps are not required to have any specific level of complexity that would preclude the steps from being mental processes. Therefore, the “extracting” steps are considered to be mental processes, see MPEP § 2106.04(a)(2)(III).
The “determining a second parameter distance” step involves calculating a distance between two distributions which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of determining a second parameter distance at a high degree of generality, thus the step is not required to have any specific level of complexity that would preclude the step from being a mental process. Therefore, the step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III).
The “determining an aggregate distance” step involves adding or combining two distances which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of determining an aggregate distance at a high degree of generality, thus the step is not required to have any specific level of complexity that would preclude the step from being a mental process. Therefore, the step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III).
Therefore, dependent claim 4 recites a judicial exception. Dependent claims 12 and 19 recite similar limitations corresponding to claim 4 therefore the same subject matter eligibility analysis is applied.
Step 2A Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the judicial exception recited above is not integrated into a practical application. The claims recite the following additional elements, but these additional elements are not sufficient to integrate the judicial exception into a practical application:
wherein the satisfaction of the validation condition is based on the aggregate distance (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
wherein operation of the inferential model on the validating data set generates validated data results (claim 12) (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
The “wherein the satisfaction of the validation condition …” limitation is recited at a high level of generality such that it amounts to no more than mere instructions to “apply” the judicial exception on a computer. It can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, see MPEP § 2106.05(f).
The “wherein operation of the inferential model …” limitation is recited at a high level of generality such that it amounts to no more than mere instructions to “apply” the judicial exception on a computer. It can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, see MPEP § 2106.05(f).
Therefore, the above limitations do not integrate the judicial exception into a practical application.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The claims do not include additional elements that are sufficient for the claims to amount to significantly more than the judicial exception.
In regards to the “wherein the satisfaction of the validation condition …” and “wherein operation of the inferential model …” limitations, they are recited so generically that they amount to no more than mere instructions to “apply” the judicial exception on a computer using generic computer components. Mere instructions to apply a judicial exception cannot provide an inventive concept. See MPEP § 2106.05(f).
Therefore, dependent claims 4, 12, and 19 are not patent eligible.
Dependent claim 5 recites the following limitations:
wherein the first parameter represents output from the trained inferential model, (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
the extracted first distribution including a distribution of inference output parameter values output from the trained inferential model responsive to input of inference input data from the inference data set into the trained inferential model, (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
the extracted second distribution including a distribution of validating output parameter values output from the trained inferential model responsive to input of validating model input data from the validating data set into the trained inferential model (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
The “wherein the first parameter represents …”, “the extracted first distribution including …”, and “the extracted second distribution including …” limitations are recited at a high level of generality such that they amount to no more than mere instructions to “apply” the judicial exception on a computer. They can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, i.e., adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, see MPEP § 2106.05(f). The limitations do not integrate the judicial exception into a practical application and do not amount to significantly more.
Dependent claim 6 recites the following limitations:
wherein the first parameter represents a metadata parameter, (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
the extracted first distribution including a distribution of values of the metadata parameter from the inference data set, (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
and the extracted second distribution including a distribution of values of the metadata parameter of the validating data set (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
The “wherein the first parameter represents …”, “the extracted first distribution including …”, and “the extracted second distribution including …” limitations are recited at a high level of generality such that they amount to no more than mere instructions to “apply” the judicial exception on a computer. They can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, i.e., adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, see MPEP § 2106.05(f). The limitations do not integrate the judicial exception into a practical application and do not amount to significantly more.
Dependent claim 7 recites the following limitations:
wherein the first parameter represents a sensor data parameter, (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
the extracted first distribution including a distribution of values of the sensor data parameter from the inference data set, (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
and the extracted second distribution including a distribution of values of the sensor data parameter of validating model input data of the validating data set configured to be input into the trained inferential model (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
The “wherein the first parameter represents …”, “the extracted first distribution including …”, and “the extracted second distribution including …” limitations are recited at a high level of generality such that they amount to no more than mere instructions to “apply” the judicial exception on a computer. They can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, i.e., adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, see MPEP § 2106.05(f). The limitations do not integrate the judicial exception into a practical application and do not amount to significantly more.
Dependent claim 8 recites the following limitations:
wherein the first parameter includes a reduced representation parameter of raw sensor data output by a sensor, (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
the extracted first distribution including a distribution of values of the reduced representation parameter of raw sensor data of the inference data set, (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
the extracted second distribution including a distribution of values of the reduced representation parameter of raw sensor data of validating model input data of the validating data set configured to be input into the trained inferential model (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
The “wherein the first parameter includes …”, “the extracted first distribution including …”, and “the extracted second distribution including …” limitations are recited at a high level of generality such that they amount to no more than mere instructions to “apply” the judicial exception on a computer. They can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, i.e., adding the words “apply it” (or an equivalent) to the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, see MPEP § 2106.05(f). The limitations do not integrate the judicial exception into a practical application and do not amount to significantly more.
Dependent claims 10 and 17 recite similar limitations corresponding to claim 2, therefore the same subject matter eligibility analysis is applied.
Dependent claim 13 recites similar limitations corresponding to claim 5, therefore the same subject matter eligibility analysis is applied.
Dependent claim 14 recites similar limitations corresponding to claim 6, therefore the same subject matter eligibility analysis is applied.
Dependent claims 15 and 20 recite similar limitations corresponding to claim 7, therefore the same subject matter eligibility analysis is applied.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 6, 9-10, 14, 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Agarwal et al. (“Accountability in AI”), hereinafter Agarwal, in view of Sun et al. (“Double Window Concept Drift Detection Method Based on Sample Distribution Statistical Test”), hereinafter Sun.
With respect to claim 1, Agarwal teaches:
a method of validating a trained inferential model for operation on an inference data set, the method comprising (Agarwal discloses “to be able to monitor a model’s performance, the first and the foremost challenge is quantifying the model degradation. Identifying the parameters to track the model performance and defining the thresholds that if breached should raise an alert are fundamental components of model monitoring. This is important as it allows us to avoid spending more effort than required on retraining the models and yet ensure that we are not performing below the standards we have set for ourselves … it is imperative to have a process for detecting, alerting and addressing any kind of drift that may occur post go live. Eventually a model that exhibits any kind of degradation will need to be examined further either for recalibration, retraining or, in the worst case, replacement” (P. 124, ¶ 1-3).
Agarwal discloses “the data distribution of the live data shifts over a period of time. This is known as data drift (or model drift), and if left unattended, it leads to performance degradation of the model. … Another common reason for the poor performance on the production environment is production skew – which is the difference in the model performance between training and production environments. Production skew can happen because of errors in training, bugs in the production environment or because the training data and the live data do not follow the same distribution” (P. 123, ¶2).
The distributions of training data (‘inference data set’) and live data are compared for data shifts to determine if model retraining is required. Therefore, when comparing the training data set and live data distributions to determine if a shift has occurred, the trained model is validated for operation on the training data set.):
extracting a first distribution of values of a first parameter from the inference data set (Agarwal discloses a programming function used to calculate a covariate drift between a training data set (‘inference data set’) and a live data set (‘validating data set’) on P. 126 (reproduced below). The function obtains (‘extracts’) distributions from the training data set of variables (‘parameters’) that are present in both the training data set and the live data set.
[media_image1.png: Agarwal’s calculate_covariate_drift() code listing, P. 126]
Agarwal further discloses “each independent variable is binned to form i bins (commonly 20 equal bins) from both the actual distribution and live distribution taken together. Then, the shift in the variable contribution to each bin is calculated. The function below picks the common variables between the two datasets and calls the calculate_distance function to get the values for each variable” (P. 126, ¶2-3).);
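For illustration only, the following is a minimal Python sketch consistent with Agarwal’s description of that function (a hypothetical reconstruction, not Agarwal’s verbatim code from the reproduced image; the function signature, pandas usage, and 20-bin default are assumptions; calculate_distance is sketched further below):

    import pandas as pd

    def calculate_covariate_drift(train_df: pd.DataFrame, live_df: pd.DataFrame, bins: int = 20) -> dict:
        """Compute a per-variable drift value for every variable common to the
        training ('inference') and live ('validating') data sets (cf. Agarwal, P. 126)."""
        common_vars = sorted(set(train_df.columns) & set(live_df.columns))
        return {var: calculate_distance(train_df[var].to_numpy(),
                                        live_df[var].to_numpy(), bins)
                for var in common_vars}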
extracting a second distribution of values of the first parameter from a validating data set previously used to validate, in a first validation operation, the trained inferential model (Agarwal discloses the calculate_covariate_drift() programming function on P. 126 (reproduced above). The function obtains (‘extracts’) distributions from the live data set (‘validating data set’) of variables (‘parameters’) that are present in both the training data set and the live data set.
Agarwal discloses a live data set is used to detect drifts, “detecting drift requires continuous monitoring to gauge if the drift is one-time, sporadic or regular one. The data drift, however, relatively speaking is the easier one to detect. Since data drift is caused by the change in the underlying distribution, we just need to monitor the distribution of the live input data and compare it with the training data to identify any drift” (P. 125, Last Paragraph).
Agarwal discloses “Validity of a model is highly dependent on the similarity between the data distribution on which it is trained and the live data on which it makes its predictions. As the live data distribution changes, the validity of the model can come under the scanner” (P. 132, First Paragraph).
Agarwal discloses “Continuing with the same dataset, for the feature “Applied Amount”, the covariate drift over 30 days (Fig. 7.2) using the above concept shows a need for investigation as the drift is close to the High Drift level on a large number of days” (P. 127, Last Paragraph).
Agarwal discloses Figure 7.2 on P. 128 (reproduced below) depicting covariate drifts calculated for the feature (‘first parameter’) Applied Amount for a period of 30 days. Each covariate drift measures the difference in distributions between the training data set (‘inference data set’) and the live data set (‘validating data set’) for a specific day. A covariate drift is used to determine if model retraining is needed, therefore model validation is performed (and therefore, a covariate drift that is calculated for each of the 30 days is an individual/distinct validation operation for the trained model).
[media_image2.png: Agarwal’s Figure 7.2, covariate drift for the “Applied Amount” feature over 30 days, P. 128]
),
the trained inferential model being trained using a training data set different from the validating dataset (Agarwal discloses “Validity of a model is highly dependent on the similarity between the data distribution on which it is trained and the live data on which it makes its predictions. As the live data distribution changes, the validity of the model can come under the scanner” (P. 132, First Paragraph).
A model is trained on a training data set that has a different data distribution than that of a live data set (‘validating dataset’). Therefore, since the training data set and the live data set have different data distributions, the datasets must be different from each other.);
determining a first parameter distance between the extracted first distribution and the extracted second distribution … (Agarwal discloses the programming function calculate_distance() on P. 126 (reproduced below) is used to calculate a distance (‘first parameter distance’) between two distributions, “the function [calculate_covariate_drift] picks the common variables between the two datasets and calls the calculate_distance function to get the values for each variable. We begin by sorting the distributions by data rank and then creating bins of equal sizes. For a given feature, the percentage of observations falling into each bin is computed separately for the training and live data. The distance calculation after that is straightforward – the sum of the minimum percentage across each bin is calculated and then subtracted from 1” (P. 126, Last Two Paragraphs).
[media_image3.png: Agarwal’s calculate_distance() code listing, P. 126]
);
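For illustration only, the following is a hedged Python sketch of that distance calculation (a hypothetical reconstruction, not Agarwal’s verbatim code; the quantile-based bin edges are an assumption inferred from “sorting the distributions by data rank and then creating bins of equal sizes”):

    import numpy as np

    def calculate_distance(train_values, live_values, bins=20):
        """Distance per Agarwal (P. 126): bin both distributions on a common
        rank-based grid, compute the percentage of observations per bin for each
        data set, and subtract the sum of the per-bin minimums from 1."""
        pooled = np.concatenate([train_values, live_values])
        edges = np.quantile(pooled, np.linspace(0.0, 1.0, bins + 1))  # equal-size rank bins
        p, _ = np.histogram(train_values, bins=edges)
        q, _ = np.histogram(live_values, bins=edges)
        p = p / p.sum()  # percentage of training observations in each bin
        q = q / q.sum()  # percentage of live observations in each bin
        return 1.0 - float(np.minimum(p, q).sum())  # non-intersection of the distributions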
and validating, in a second validation operation, the trained inferential model for operation on the inference data set based on satisfaction of a validation condition, the satisfaction of the validation condition being based on the determined first parameter distance (Agarwal discloses programming function cal_threshold() on P. 217 (reproduced below) that determines if an alert should be raised to indicate the severity of a data drift.
[media_image4.png: Agarwal’s cal_threshold() code listing, P. 217]
Agarwal discloses data drifts are indicative of model degradation, “to be able to monitor a model’s performance, the first and the foremost challenge is quantifying the model degradation. Identifying the parameters to track the model performance and defining the thresholds that if breached should raise an alert are fundamental components of model monitoring. This is important as it allows us to avoid spending more effort than required on retraining the models and yet ensure that we are not performing below the standards we have set for ourselves … it is imperative to have a process for detecting, alerting and addressing any kind of drift that may occur post go live. Eventually a model that exhibits any kind of degradation will need to be examined further either for recalibration, retraining or, in the worst case, replacement” (P. 124, ¶ 1-3).
Agarwal further discloses “using a predefined threshold, alerts are generated by passing the distance calculated above for each batch of new or live data.” (P. 127, First Paragraph).
Agarwal discloses Figure 7.2 on P. 128 (reproduced above) depicting covariate drifts calculated for the feature (‘first parameter’) Applied Amount for a period of 30 days. Each covariate drift measures the difference in distributions between the training data set (‘inference data set’) and the live data set (‘validating data set’) for a specific day. A covariate drift is used to determine if model retraining is needed, therefore model validation is performed (and therefore, a covariate drift that is calculated for each of the 30 days is an individual/distinct validation operation for the trained model).
Agarwal discloses covariate drifts (‘validation operations’) close to the High Drift level indicate a model investigation is needed, “Continuing with the same dataset, for the feature “Applied Amount”, the covariate drift over 30 days (Fig. 7.2) using the above concept shows a need for investigation as the drift is close to the High Drift level on a large number of days” (P. 127, Last Paragraph).).
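For illustration only, the following is a minimal Python sketch of such threshold-based alerting (hypothetical; the cutoff values and labels are illustrative assumptions, not Agarwal’s cal_threshold() code):

    def drift_alert(distance, medium=0.1, high=0.2):
        """Generate an alert from a predefined threshold (cf. Agarwal, P. 127:
        'using a predefined threshold, alerts are generated by passing the
        distance calculated above for each batch of new or live data')."""
        if distance >= high:
            return "High Drift: investigate; recalibrate, retrain, or replace the model"
        if distance >= medium:
            return "Medium Drift: continue monitoring"
        return "No significant drift: model remains valid for operation"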
However, Agarwal does not teach statistically measuring a similarity between the extracted first distribution and the extracted second distribution, which is taught by Sun:
determining a first parameter distance between the extracted first distribution and the extracted second distribution by statistically measuring a similarity between the extracted first distribution and the extracted second distribution (Sun discloses “this paper proposes a method to identify whether there is concept drift in new samples by employing three hypothesis test types and the distribution similarity of Euclidean distance between samples” (P. 2085, Sec. 1, Last Paragraph).
Sun discloses “Hypothesis testing is a method of studying how to infer the overall quantitative characteristics from sample data [19], which is usually used to determine the difference between a sample and a sample, a sample and the population. … The purpose of the hypothesis test is to exclude the influence of the sampling error and to determine whether the difference between the samples is statistically valid. …Common types of test hypotheses include F-test, t-test, and rank sum test” (P. 2086, Sec. II-C, ¶1).
Sun discloses “When the outlier detection window detects anomalies and sends an alarm signal, the distribution detection window would immediately match the samples in the window with the historical data to confirm that the alarm is caused by concept drift. The specific method is to first calculate the Euclidean distance between samples in historical data and the Euclidean distance between samples in the window and historical samples” (P. 2087, Sec. III-B, ¶1).
Sun discloses “calculate the Euclidean distance between the sample in the variable window and the historical sample, and use the F-test to observe whether the two sets of distance data have similarity in the variance. When the variances are similar, the t-test is performed on the two sets of distance data, and the correlation and distribution between the two samples are judged by the similarity of averages” (P. 2088, Sec. III-C, ¶1).
Sun discloses “The return value of the t-test is … when λ_testt = 0, the distance between the two groups is considered to have the same distribution; otherwise, the distribution is considered to be different” (P. 2087, Sec. III-B, ¶1).).
Sun teaches performing a t-test to determine distribution similarity between new and historical samples is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the method of Agarwal with the t-test disclosed by Sun to use a t-test to determine distribution similarity. By using a t-test to determine distribution similarity, it can be determined if distribution similarity is statistically valid, thereby ensuring that differences between distributions are not due to random chance.
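For illustration only, the following is a hedged Python sketch of the two-stage test Sun describes (hypothetical; the pooled-distance construction, the two-sided F-test, and the 0.05 significance level are assumptions, not Sun’s verbatim procedure):

    import numpy as np
    from scipy import stats
    from scipy.spatial.distance import cdist, pdist

    def same_distribution(historical, window, alpha=0.05):
        """Sun's two-stage check (P. 2087-2088): F-test on the variances of two
        sets of Euclidean distances, then, if variances are similar, a t-test on
        the means; True when the window appears to share the historical distribution."""
        d_hist = pdist(historical)                 # distances between historical samples
        d_win = cdist(window, historical).ravel()  # window-to-historical distances
        f = np.var(d_hist, ddof=1) / np.var(d_win, ddof=1)
        dfn, dfd = d_hist.size - 1, d_win.size - 1
        p_f = 2.0 * min(stats.f.cdf(f, dfn, dfd), stats.f.sf(f, dfn, dfd))
        if p_f < alpha:
            return False  # variances differ: treat as concept drift
        _, p_t = stats.ttest_ind(d_hist, d_win, equal_var=True)
        return bool(p_t >= alpha)  # similar means: same distribution, no drift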
With respect to claim 2, the combination of Agarwal in view of Sun teaches:
the method of claim 1, wherein the inference data set is collected under a first condition represented in the inference data set and the validating data set is collected under a second condition represented in the validating data set, the first condition and the second condition at least partially differ (Agarwal discloses “Covariate drift is the change in the distribution of the input feature set … This is one of the most common causes of model drifts, and the distribution drift can happen due to many reasons. The drifts can happen slowly over a period of time … A lot of time drift in data distribution occur because of latent changes, macro-economic changes or demographic changes … As the covariate drift happens, the distribution of the live data shifts from that of the data used for the training and testing. The distance between the non-intersection of the two distributions is a very good measure of the drift. … P and Q are training and live distributions, respectively. The larger the distance, the bigger is the drift” (P. 126, ¶ 1-2).
It is implied that training data (‘inference data set’) and live data (‘validating data set’) are collected during different periods of time (‘conditions’) since the live data consists of new incoming data. Comparing two datasets with differing distributions also implies that the data sets were collected under different conditions.),
and satisfaction of the validation condition is based on the first condition and the second condition (The Examiner interprets the satisfaction of the validation condition as encompassing the exact same step as the validation step in Claim 1.).
With respect to claim 6, the combination of Agarwal in view of Sun teaches:
the method of claim 1, wherein the first parameter represents a metadata parameter, the extracted first distribution including a distribution of values of the metadata parameter from the inference data set, and the extracted second distribution including a distribution of values of the metadata parameter of the validating data set (The Examiner interprets “metadata parameter” according to its broadest reasonable interpretation (in view of Applicant’s specification at Paragraph 0055) as encompassing a Kullback-Leibler divergence and a covariate drift, each describing the distributions of the two datasets being compared.
Agarwal discloses “mathematically, given two population sets P and Q (indicating the trained data and the live data, respectively), the stability index can be defined using Kullback-Leibler divergence (DKL). To over[come] the issue of its being non-symmetric, the sum of K-L divergence … is used as the base metric” (P. 133, First Paragraph).
To compute the Kullback-Leibler divergence, the distributions of populations sets P (‘inference data set’) and Q (‘validating data set’) must be known, therefore the descriptions that describe these distributions are the metadata for these population sets.
Agarwal further discloses “Covariate drift is the change in the distribution of the input feature set … As the covariate drift happens, the distribution of the live data shifts from that of the data used for the training and testing. The distance between the non-intersection of the two distributions is a very good measure of the drift” (P. 126, ¶ 1-2).
To compute the Covariate drift, the distributions of training data (‘inference data set’) and live data (‘validating data set’) must be known, therefore the descriptions that describe these distributions are the metadata for these data sets.).
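For illustration only, over binned distributions the sum of the two K-L divergences simplifies to Σ(pᵢ − qᵢ)·ln(pᵢ/qᵢ); the following is a minimal Python sketch (hypothetical; the smoothing constant guarding against empty bins is an assumption, not from Agarwal):

    import numpy as np

    def symmetric_kl(p, q, eps=1e-6):
        """Sum of K-L divergences D_KL(P||Q) + D_KL(Q||P) between binned
        distributions P (training) and Q (live), cf. Agarwal, P. 133."""
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        p, q = p / p.sum(), q / q.sum()
        # D_KL(P||Q) + D_KL(Q||P) simplifies to sum((p - q) * ln(p / q)).
        return float(np.sum((p - q) * np.log(p / q)))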
With respect to claim 9, the rejection of claim 1 is incorporated. The difference in scope being:
A system, comprising: one or more hardware processors configured to execute instructions stored in memory (A computer is implied by calculating the covariate drift for a data set as disclosed in Figure 7.2 on P. 128 by Agarwal. A hardware processor configured to execute instructions stored in memory is further implied by the use of a computer.);
and a data set distance model validator executable by the one or more hardware processors, the data set distance model validator including (The Examiner interprets a data set distance model validator according to its broadest reasonable interpretation as encompassing the programming instructions used to detect covariate drifts disclosed above in claim 1.).
With respect to claims 10 and 17, the claims recite similar limitations corresponding to claim 2, therefore the same rationale of rejection is applicable.
With respect to claim 14, the claim recites similar limitations corresponding to claim 6, therefore the same rationale of rejection is applicable.
With respect to claim 16, the rejection of claim 1 is incorporated. The difference in scope being:
One or more tangible processor-readable storage media embodied with instructions for executing on one or more processors and circuits of a computing device a process of validating a trained inferential model for operation on an inference data set, the process comprising (A computer is implied by calculating the covariate drift for a data set as disclosed in Figure 7.2 on P. 128 by Agarwal. A tangible processor-readable storage media embodied with instructions for executing on one or more processors and circuits of a computing device is further implied by the use of a computer.).
Claims 3-4, 11-12, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Agarwal in view of Sun, further in view of Lee et al. (“Unsupervised model drift estimation with batch normalization …”), hereinafter Lee.
With respect to claim 3, the combination of Agarwal in view of Sun teaches: the method of claim 1, however the combination does not teach normalizing a distance or applying a weight, which is taught by Lee:
further comprising: normalizing the determined first parameter distance to generate a normalized first parameter distance (Lee discloses Equation 4 on P. 4 (reproduced below) for calculating a drift score (‘first parameter distance’) between a target data set (‘validating data set’) and a source data set using the statistics of a batch normalization layer.
[media_image5.png: Lee’s Equation 4, the per-layer drift score, P. 4]
Lee discloses “Dist(a, b) is a distance metric between vectors a and b. From the Equation 4, the right element of distance metric contains the information of the source in the BN layer and the left element that contains the information of the target, and hence we can implicitly compute the discrepancy of the source dataset from target dataset using only model parameters” (P. 4, Sec. 3.2, First Paragraph).
Lee further discloses a drift score can be normalized by using a cosine distance, “Gaussian random variable, the drift score (4) of BN layer l can be computed with conventional distance metrics such as cosine distance CosDist(a, b) = (1 − a·b / (||a|| ||b||)) / 2, which is bounded in [0, 1]” (P. 4, Sec. 3.2, Last Paragraph).),
the normalized first parameter distance normalized to a predetermined range of values (Lee discloses “Gaussian random variable, the drift score (4) of BN layer l can be computed with conventional distance metrics such as cosine distance CosDist(a, b) = (1 − a·b / (||a|| ||b||)) / 2, which is bounded in [0, 1]” (P. 4, Sec. 3.2, Last Paragraph).);
and applying a relative weight to the normalized first parameter distance to generate a weighted first parameter distance (Lee discloses Equation 5 (reproduced below) on P. 4 for calculating overall model discrepancy. The equation applies a weight to a calculated drift score (‘first parameter distance’).
[media_image6.png: Lee’s Equation 5, overall model discrepancy, P. 4]
Lee discloses “overall model discrepancy can be calculated by taking average of all layers … where weight w(l) ∈ [0, 1] indicates the relative importance of BN layer l compared to others. The weights can be set proportional to magnitude of gradient during training phase [17] or ratio the L2-norm of weight and gradient” (P. 4, Sec. 3.2, First Paragraph).),
the relative weight based on a predetermined correlative value of the first parameter relative to a second parameter represented in the inference data set and the validating data set (Lee discloses a weight can be set proportional to magnitude of gradient during training (‘predetermined correlative value of the first parameter’): “overall model discrepancy can be calculated by taking average of all layers … where weight w(l) ∈ [0, 1] indicates the relative importance of BN layer l compared to others. The weights can be set proportional to magnitude of gradient during training phase [17] or ratio the L2-norm of weight and gradient” (P. 4, Sec. 3.2, First Paragraph). A weight is then applied to a drift score calculated using model parameters that represents the discrepancy (relationship) between a source data set (‘inference data set’) and a target data set (‘validating data set’).),
wherein the satisfaction of the validation condition is based on the weighted first parameter distance (Lee discloses that drift scores are used to select a model with the smallest drift, “Figure 1 shows an example of practical uses of the MDE [(model drift estimation)] where the dataset shift occurs every 30 epoch. When dataset shift happens, the drift score of MDE bounces upward and we select the model having the smallest drift from 20 different model candidates, which are trained with different datasets, for automatic recovery from dataset shift” (P. 2, Sec. 1, Last Paragraph). See also Figure 4 on P. 8 depicting the drift scores calculated for each model by using a target data set.).
Lee teaches calculating a normalized and weighted drift score (‘parameter distance’) to determine dataset drift is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the method of Agarwal with the drift score disclosed by Lee to remove biases. Normalizing and weighting a score would give more importance to parameters that significantly affect drift and would ensure that all scores are on the same scale, thereby creating fair comparisons and reducing biases. By reducing or removing biases, more accurate and fair machine learning models can be developed and used to make fair decisions.
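For illustration only, the following is a minimal Python sketch of the normalization and weighting Lee describes (hypothetical; treating each layer’s source and target BN statistics as vectors and normalizing the weights in the average are assumptions about Equations 4 and 5, not Lee’s verbatim implementation):

    import numpy as np

    def cos_dist(a, b):
        """Cosine distance (1 - a.b / (||a|| ||b||)) / 2, bounded in [0, 1] (Lee, P. 4)."""
        return (1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))) / 2.0

    def model_discrepancy(source_stats, target_stats, layer_weights):
        """Weighted average of normalized per-layer drift scores (cf. Lee, Eq. 5);
        each weight w(l) in [0, 1] reflects the relative importance of BN layer l."""
        scores = [cos_dist(s, t) for s, t in zip(source_stats, target_stats)]
        w = np.asarray(layer_weights, dtype=float)
        return float(np.sum(w * np.asarray(scores)) / np.sum(w))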
With respect to claim 4, the combination of Agarwal in view of Sun teaches:
the method of claim 1, further comprising: extracting a third distribution of values of a second parameter from the inference data set (Agarwal discloses the programming function calculate_covariate_drift() used to calculate a covariate drift between a training data set (‘inference data set’) and a live data set (‘validating data set’) on P. 126 (reproduced above). The function obtains (‘extracts’) distributions from the training data set of each variable (‘parameter’) that is present in both the training data set and the live data set.);
extracting a fourth distribution of values of the second parameter from the validating data set (Agarwal discloses the calculate_covariate_drift() programming function on P. 126 (reproduced above). The function obtains (‘extracts’) distributions from the live data set (‘validating data set’) of each variable that is present in both the training data set and the live data set.);
determining a second parameter distance between the extracted third distribution and the extracted fourth distribution (Agarwal discloses the programming function calculate_distance() on P. 126 (reproduced above) is used to calculate a distance (‘second parameter distance’) between two distributions, “the function [calculate_covariate_drift] picks the common variables between the two datasets and calls the calculate_distance function to get the values for each variable. We begin by sorting the distributions by data rank and then creating bins of equal sizes. For a given feature, the percentage of observations falling into each bin is computed separately for the training and live data. The distance calculation after that is straightforward – the sum of the minimum percentage across each bin is calculated and then subtracted from 1” (P. 126, Last Two Paragraphs). (Emphasis added).);
However, the combination does not teach determining an aggregate distance, which is taught by Lee:
and determining an aggregate distance based on the determined first parameter distance and the determined second parameter distance (Lee discloses Equation 5 (reproduced above) on P. 4 for calculating overall model discrepancy. The equation averages (‘aggregates’) all the drift scores (‘parameter distances’) calculated for each layer to find the overall model drift. Lee discloses “overall model discrepancy can be calculated by taking average of all layers … where weight w(l) ∈ [0, 1] indicates the relative importance of BN layer l compared to others” (P. 4, Sec. 3.2, First Paragraph).),
wherein the satisfaction of the validation condition is based on the aggregate distance (Lee discloses that drift scores are used to select a model with the smallest drift, “Figure 1 shows an example of practical uses of the MDE [(model drift estimation)] where the dataset shift occurs every 30 epoch. When dataset shift happens, the drift score of MDE bounces upward and we select the model having the smallest drift from 20 different model candidates, which are trained with different datasets, for automatic recovery from dataset shift” (P. 2, Sec. 1, Last Paragraph).).
Lee teaches calculating an aggregate drift score (‘aggregate distance’) to determine dataset drift is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the method of Agarwal with the aggregate drift score disclosed by Lee to determine overall model drift. By focusing on multiple parameters, an aggregate drift score is able to reflect real-world data drifts that affect multiple parameters in a model simultaneously, thus leading to a more comprehensive and accurate understanding of distribution changes that can assist in model development.
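For illustration only, the following is a short Python sketch of the combined teaching as applied to claim 4 (hypothetical; the equal-weight average and the 0.2 threshold are illustrative assumptions, not from Agarwal, Sun, or Lee):

    def validate_model(parameter_distances, threshold=0.2):
        """Aggregate the per-parameter distances (e.g., the first and second
        parameter distances) and validate the trained model when the aggregate
        distance satisfies the validation condition."""
        aggregate = sum(parameter_distances) / len(parameter_distances)
        return aggregate < threshold  # True: model validated for the inference data set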
With respect to claims 11 and 18, the claims recite similar limitations corresponding to claim 3, therefore the same rationale of rejection is applicable.
With respect to claim 12, the rejection of claim 4 is incorporated. The difference in scope is as follows:
wherein operation of the trained inferential model on the validating data set generates validated data results (Agarwal discloses predicted probabilities (‘validated data results’), “the same concept can be even replicated for the outcome – the predicted probabilities. The same techniques applied on the predicted probabilities is called prior probability shift (Fig. 7.3). This refers to the change in the distribution of the target variable in the training data and the live data. The target variable is binned to form 𝑖 bins from both the distribution. Then, the shift in the variable contribution to each bin is calculated” (P. 128, Last Paragraph).).
With respect to claim 19, the claim recites similar limitations corresponding to claim 4, therefore the same rationale of rejection is applicable.
Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Agarwal in view of Sun.
With respect to claim 5, the combination of Agarwal in view of Sun teaches the method of claim 1. Agarwal further teaches the remaining limitations in a different embodiment:
wherein the first parameter represents output from the trained inferential model (Agarwal discloses “the same concept can be even replicated for the outcome – the predicted probabilities. The same techniques applied on the predicted probabilities is called prior probability shift (Fig. 7.3). This refers to the change in the distribution of the target variable in the training data and the live data. The target variable is binned to form 𝑖 bins from both the distribution. Then, the shift in the variable contribution to each bin is calculated” (P. 128, Last Paragraph).),
the extracted first distribution including a distribution of inference output parameter values output from the trained inferential model responsive to input of inference input data from the inference data set into the inferential model (A distribution is obtained from the predicted probabilities (‘inference output parameter values’) generated using training data (‘inference data set’) for a target variable (‘first parameter’). Agarwal discloses “the same concept can be even replicated for the outcome – the predicted probabilities. The same techniques applied on the predicted probabilities is called prior probability shift (Fig. 7.3). This refers to the change in the distribution of the target variable in the training data and the live data. The target variable is binned to form 𝑖 bins from both the distribution. Then, the shift in the variable contribution to each bin is calculated” (P. 128, Last Paragraph).),
the extracted second distribution including a distribution of validating output parameter values output from the trained inferential model responsive to input of validating model input data from the validating data set into the trained inferential model (A distribution is obtained from the predicted probabilities (‘validating output parameter values’) generated using live data (‘validating data set’) for a target variable (‘first parameter’). Agarwal discloses “the same concept can be even replicated for the outcome – the predicted probabilities. The same techniques applied on the predicted probabilities is called prior probability shift (Fig. 7.3). This refers to the change in the distribution of the target variable in the training data and the live data. The target variable is binned to form 𝑖 bins from both the distribution. Then, the shift in the variable contribution to each bin is calculated” (P. 128, Last Paragraph).).
Agarwal teaches that detecting drifts in the distribution of predicted probabilities (‘output parameter values’) is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the method of Agarwal for detecting drifts in the distribution of input variables with the method of Agarwal for detecting drifts in the distribution of outcomes because predictions that do not reflect the real-world prevalence of outcomes are biased and unfair. By detecting these biases, machine learning engineers can develop ways to remove them and make their models fairer.
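For illustration, the prior probability shift Agarwal describes on P. 128 may be sketched as binning the model's predicted probabilities from the training and live data and comparing each bin's share. This is a minimal sketch, not Agarwal's actual code; the function name and ten-bin default are hypothetical:

```python
import numpy as np

def prior_probability_shift_sketch(train_probs, live_probs, num_bins=10):
    """Per-bin shift in the distribution of predicted probabilities."""
    edges = np.linspace(0.0, 1.0, num_bins + 1)  # probabilities lie in [0, 1]
    train_counts, _ = np.histogram(train_probs, bins=edges)
    live_counts, _ = np.histogram(live_probs, bins=edges)
    train_share = train_counts / len(train_probs)
    live_share = live_counts / len(live_probs)
    # The shift in each bin's contribution between the two distributions.
    return live_share - train_share
```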
With respect to claim 13, the claim recites similar limitations corresponding to claim 5, therefore the same rationale of rejection is applicable.
Claims 7-8, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Agarwal in view of Sun, further in view of Yi (“Discriminative dimensionality reduction for sensor drift …”).
With respect to claim 7, the combination of Agarwal in view of Sun teaches the method of claim 1; however, Agarwal does not teach a first parameter that represents a sensor data parameter, which Yi does:
wherein the first parameter represents a sensor data parameter (Yi discloses “the UCSD sensor drift dataset was collected by Vergara et al. (2012). There are 13,910 samples in total collected using an electronic nose with 16 gas sensors. The collection period lasted for 36 months starting from January 2018 … According to the sample acquisition time, the sample set was split into ten batches. … Eight Features were extracted from sensor signals for each sensor” (P. 5-6, Sec. 4.1, First Paragraph).),
the extracted first distribution including a distribution of values of the sensor data parameter from the inference data set (Yi discloses “the sample set was split into ten batches. … Eight Features were extracted from sensor signals for each sensor. Thus, the feature vector was 128-dimensional for each sample … batch 1 is adopted as the samples in the source domain. The label information in the source domain is available. Other batches are adopted as the samples in the target domains whose labels need to be predicted. Fig. 1 shows the 2D projection of the samples in batch 1~10. It is easy to observe that the sensor signals are time-varying, i.e., the distribution difference between the source domain and target domain is time-dependent” (P. 5-6, Sec. 4.1, First Paragraph). See Figure 1 on P. 4 depicting samples (‘distribution of values’) of Batch 1 (‘first distribution’) projected into a 2D subspace.
See Figure 3 on P. 7 depicting classification performance obtained by tuning model parameters and using the UCSD dataset that makes up the source (‘inference data set’) and target data.),
and the extracted second distribution including a distribution of values of the sensor data parameter of validating model input data of the validating data set configured to be input into the trained inferential model (Yi discloses “the sample set was split into ten batches. … Eight Features were extracted from sensor signals for each sensor. Thus, the feature vector was 128-dimensional for each sample … batch 1 is adopted as the samples in the source domain. The label information in the source domain is available. Other batches are adopted as the samples in the target domains whose labels need to be predicted. Fig. 1 shows the 2D projection of the samples in batch 1~10. It is easy to observe that the sensor signals are time-varying, i.e., the distribution difference between the source domain and target domain is time-dependent” (P. 5-6, Sec. 4.1, First Paragraph). See Figure 1 on P. 4 depicting samples (‘distribution of values’) of a target domain (‘second distribution’) projected into a 2D subspace.
See Figure 3 on P. 7 depicting classification performance obtained by tuning model parameters and using the UCSD dataset that makes up the source and target data (‘validating data set’).).
Yi teaches that obtaining (‘extracting’) samples (‘distributions’) from sensor data to detect distribution drifts is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the method of Agarwal for detecting drifts with the sensor data samples disclosed by Yi because sensor data is prone to drift. By detecting drifts in sensor data, machine learning engineers can determine whether a sensor has degraded or whether environmental changes have occurred, and can therefore take steps to replace sensors or update a model to reflect real-world measurements.
With respect to claim 8, the combination of Agarwal in view of Sun teaches the method of claim 1, however, the combination does not teach a reduced representation parameter of raw sensor data, which is taught by Yi:
wherein the first parameter includes a reduced representation parameter of raw sensor data output by a sensor (Yi discloses “employ a robust, low-rank, and sparse representation of sensor signals to address the sensor drift problem. The low-rank property guarantees that each target sample can be approximately represented by its neighbors in the common subspace, while the sparse property ensures that each target sample can be represented by a few samples in the source domain. Specifically, both the source and target data are projected into a common subspace, where each target sample is assumed to be represented by a linear combination of all the source samples via a reconstruction coefficient matrix. The distribution discrepancy between source data and target data is alleviated” (P. 2, Sec. 1, Second Paragraph).
Yi further discloses “the UCSD sensor drift dataset was collected by Vergara et al. (2012). There are 13,910 samples in total collected using an electronic nose with 16 gas sensors. The collection period lasted for 36 months starting from January 2018 … According to the sample acquisition time, the sample set was split into ten batches. … Eight Features were extracted from sensor signals for each sensor” (P. 5-6, Sec. 4.1, First Paragraph).),
the extracted first distribution including a distribution of values of the reduced representation parameter of raw sensor data of the inference data set (Yi discloses “the sample set was split into ten batches. … Eight Features were extracted from sensor signals for each sensor. Thus, the feature vector was 128-dimensional for each sample … batch 1 is adopted as the samples in the source domain. The label information in the source domain is available. Other batches are adopted as the samples in the target domains whose labels need to be predicted. Fig. 1 shows the 2D projection of the samples in batch 1~10. It is easy to observe that the sensor signals are time-varying, i.e., the distribution difference between the source domain and target domain is time-dependent” (P. 5-6, Sec. 4.1, First Paragraph). See Figure 1 on P. 4 depicting samples (‘distribution of values’) of Batch 1 (‘first distribution’) projected into a 2D subspace.
See Figure 3 on P. 7 depicting classification performance obtained by tuning model parameters and using the UCSD dataset that makes up the source (‘inference data set’) and target data.),
the extracted second distribution including a distribution of values of the reduced representation parameter of raw sensor data of validating model input data of the validating data set configured to be input into the trained inferential model (Yi discloses “the sample set was split into ten batches. … Eight Features were extracted from sensor signals for each sensor. Thus, the feature vector was 128-dimensional for each sample … batch 1 is adopted as the samples in the source domain. The label information in the source domain is available. Other batches are adopted as the samples in the target domains whose labels need to be predicted. Fig. 1 shows the 2D projection of the samples in batch 1~10. It is easy to observe that the sensor signals are time-varying, i.e., the distribution difference between the source domain and target domain is time-dependent” (P. 5-6, Sec. 4.1, First Paragraph). See Figure 1 on P. 4 depicting samples (‘distribution of values’) of a target domain (‘second distribution’) projected into a 2D subspace.
See Figure 3 on P. 7 depicting classification performance obtained by tuning model parameters and using the UCSD dataset that makes up the source and target data (‘validating data set’).).
Yi teaches that projecting sensor data into a common subspace to create a low-rank and sparse representation (‘reduced representation parameter’) of sensor signals (‘raw sensor data’) is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the method of Agarwal for detecting drifts with the sensor data samples disclosed by Yi because high-dimensional sensor data is complex. By reducing the dimensionality of sensor data, machine learning engineers can make models less complex, resulting in more interpretable and accurate models.
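For illustration, a reduced representation of raw sensor data may be sketched as follows. This is a minimal sketch only and not Yi's method: PCA is used here as a generic stand-in for projecting raw sensor features into a lower-dimensional common subspace, after which drift can be assessed on the reduced representation parameters; the function name and component count are hypothetical:

```python
from sklearn.decomposition import PCA

def reduced_representation_sketch(source_features, target_features, n_components=2):
    """Project source (inference) and target (validating) sensor features
    into a shared low-dimensional subspace fit on the source data."""
    pca = PCA(n_components=n_components).fit(source_features)
    return pca.transform(source_features), pca.transform(target_features)

# Each column of the returned arrays is a reduced representation parameter
# whose source and target distributions can then be compared for drift.
```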
With respect to claims 15 and 20, the claims recite similar limitations corresponding to claim 7, therefore the same rationale of rejection is applicable.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PEDRO J MORALES whose telephone number is (571)272-6106. The examiner can normally be reached 8:30 AM - 6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA M HUANG can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PEDRO J MORALES/Examiner, Art Unit 2124
/VINCENT GONZALES/Primary Examiner, Art Unit 2124