Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This action is in response to the application filed 03 March 2023. Claims 1-20 are pending and have been examined.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 12 May 2023 has been considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1
Step 1
Claim 1 recites a method, and thus the claimed process falls within a statutory category of invention.
Step 2A Prong 1
The claim recites "identifying ... a synthetic data object of the plurality of synthetic data objects that corresponds to the input data object based on one or more corresponding input feature values shared by the synthetic data object and the input data object," which is a mental process. The claim recites "in response to identifying the synthetic data object: modifying ... a holistic evaluation score for the target machine learning model," which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2
The additional element "computer-implemented" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "receiving, by one or more processors, a request" amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering"). The additional element "to process an input data object with a target machine learning model" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "wherein the target machine learning model is previously trained using a training dataset comprising a plurality of synthetic data objects and a plurality of historical data objects" does not amount to more than generally linking the use of a judicial exception to a particular field of use (see MPEP 2106.05(h), "limit the use of the abstract idea to a particular technological environment"). The additional element "by the one or more processors" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "initiating ... the performance of a labeling process for assigning a ground truth label to the input data object" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "augmenting ... a supplemental training dataset with the input data object and the ground truth label" amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering and outputting").
Step 2B
The additional element "computer-implemented" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "receiving, by one or more processors, a request" amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering"). The additional element "to process an input data object with a target machine learning model" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "wherein the target machine learning model is previously trained using a training dataset comprising a plurality of synthetic data objects and a plurality of historical data objects" does not amount to more than generally linking the use of a judicial exception to a particular field of use (see MPEP 2106.05(h), "limit the use of the abstract idea to a particular technological environment"). The additional element "by the one or more processors" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "initiating ... the performance of a labeling process for assigning a ground truth label to the input data object" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "augmenting ... a supplemental training dataset with the input data object and the ground truth label" is well-understood, routine, conventional activity (see MPEP 2106.05(d), "storing and retrieving information in memory").
Considered individually and in combination, the additional elements do not integrate the abstract idea into a practical application or amount to significantly more. The claim is therefore directed to an abstract idea and is ineligible.
Regarding Claim 2
Step 1
The rejection of Claim 1 is incorporated. As a method claim depending from Claim 1, Claim 2 falls within the same statutory category of invention.
Step 2A Prong 1
The claim recites "identifying a performance degradation for the target machine learning model based on the holistic evaluation score for the target machine learning model," which is a mental process. The claim recites "identifying an influencing feature value corresponding to the performance degradation," which is a mental process. The claim recites "modifying the target machine learning model based on the influencing feature value," which is a mental process. The claim recites "determining an updated holistic evaluation score for the target machine learning model," which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2, Step 2B
The claim recites no further additional elements that integrate the judicial exception into a practical application or provide significantly more; it is therefore directed to an abstract idea and is ineligible.
Regarding Claim 3
Step 1
The rejection of Claim 2 is incorporated. As a method claim depending from Claim 2, Claim 3 falls within the same statutory category of invention.
Step 2A Prong 1
The claim recites "identifying an influencing feature value corresponding to the performance degradation" (as recited by Claim 2), "wherein the influencing feature value is based on one or more counterfactual proposals for a plurality of predictive outputs generated by the target machine learning model," which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2, Step 2B
The claim recites no further additional elements that integrate the judicial exception into a practical application or provide significantly more; it is therefore directed to an abstract idea and is ineligible.
Regarding Claim 4
Step 1
The rejection of Claim 1 is incorporated. As a method claim depending from Claim 1, Claim 4 falls within the same statutory category of invention.
Step 2A Prong 1
The claim recites "wherein modifying the holistic evaluation score comprises reducing the holistic evaluation score," which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2, Step 2B
The claim recites no further additional elements that integrate the judicial exception into a practical application or provide significantly more; it is therefore directed to an abstract idea and is ineligible.
Regarding Claim 5
Step 1
The rejection of Claim 1 is incorporated. As a method claim depending from Claim 1, Claim 5 falls within the same statutory category of invention.
Step 2A Prong 1
The claim recites "detecting a threshold augmentation stimulus based on the supplemental training dataset," which is a mental process. The claim recites "in response to the threshold augmentation stimulus, generating an augmented training dataset by augmenting the training dataset with the supplemental training dataset," which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2, Step 2B
The claim recites no further additional elements that integrate the judicial exception into a practical application or provide significantly more; it is therefore directed to an abstract idea and is ineligible.
Regarding Claim 6
Step 1
The rejection of Claim 5 is incorporated. As a method claim depending from Claim 5, Claim 6 falls within the same statutory category of invention.
Step 2A Prong 1
The claim recites "identifying ... a synthetic data object ... based on one or more corresponding input feature values" (as recited by Claim 1), "wherein the one or more corresponding input feature values are associated with an evaluation feature of the training dataset," which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2
The additional element "wherein the target machine learning model is previously trained using a training dataset comprising a plurality of synthetic data objects and a plurality of historical data objects" (as recited by Claim 1), "wherein the plurality of synthetic data objects comprise one or more synthetic data objects associated with the evaluation feature" does not amount to more than generally linking the use of a judicial exception to a particular field of use (see MPEP 2106.05(h), "limit the use of the abstract idea to a particular technological environment"). The additional element "wherein augmenting the training dataset comprises: replacing the one or more synthetic data objects with the supplemental training dataset" amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering").
Step 2B
The additional element "wherein the target machine learning model is previously trained using a training dataset comprising a plurality of synthetic data objects and a plurality of historical data objects" (as recited by Claim 1), "wherein the plurality of synthetic data objects comprise one or more synthetic data objects associated with the evaluation feature" does not amount to more than generally linking the use of a judicial exception to a particular field of use (see MPEP 2106.05(h), "limit the use of the abstract idea to a particular technological environment"). The additional element "wherein augmenting the training dataset comprises: replacing the one or more synthetic data objects with the supplemental training dataset" is well-understood, routine, conventional activity (see MPEP 2106.05(d), "storing and retrieving information in memory").
Considered individually and in combination, the additional elements do not integrate the abstract idea into a practical application or amount to significantly more. The claim is therefore directed to an abstract idea and is ineligible.
Regarding Claim 7
Step 1
The rejection of Claim 5 is incorporated. As a method claim depending from Claim 5, Claim 7 falls within the same statutory category of invention.
Step 2A Prong 1
The claim recites "identifying a performance degradation for the target machine learning model based on the holistic evaluation score for the target machine learning model," which is a mental process. The claim recites "in response to the performance degradation, modifying the target machine learning model based on the augmented training dataset," which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2, Step 2B
The claim recites no further additional elements that integrate the judicial exception into a practical application or provide significantly more; it is therefore directed to an abstract idea and is ineligible.
Regarding Claim 8
Step 1
The rejection of Claim 5 is incorporated. As a method claim depending from Claim 5, Claim 8 falls within the same statutory category of invention.
Step 2A Prong 1
The claim recites "detecting a threshold augmentation stimulus based on the supplemental training dataset" (as recited by Claim 5), "wherein the threshold augmentation stimulus is based on a threshold number of supplemental input data objects in the supplemental training dataset," which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2, Step 2B
The claim recites no further additional elements that integrate the judicial exception into a practical application or provide significantly more; it is therefore directed to an abstract idea and is ineligible.
Regarding Claim 9
Step 1
Claim 9 recites a computing apparatus, and thus the claimed machine falls within a statutory category of invention.
Step 2A Prong 1
The claim recites "identify a synthetic data object of the plurality of synthetic data objects that corresponds to the input data object based on one or more corresponding input feature values shared by the synthetic data object and the input data object," which is a mental process. The claim recites "in response to identifying the synthetic data object: modify a holistic evaluation score for the target machine learning model," which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2
The additional element "memory and one or more processors communicatively coupled to the memory, the one or more processors configured" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "receive a request" amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering"). The additional element "to process an input data object with a target machine learning model" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "wherein the target machine learning model is previously trained using a training dataset comprising a plurality of synthetic data objects and a plurality of historical data objects" does not amount to more than generally linking the use of a judicial exception to a particular field of use (see MPEP 2106.05(h), "limit the use of the abstract idea to a particular technological environment"). The additional element "initiate the performance of a labeling process for assigning a ground truth label to the input data object" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "augment a supplemental training dataset with the input data object and the ground truth label" amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering and outputting").
Step 2B
The additional element "memory and one or more processors communicatively coupled to the memory, the one or more processors configured" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "receive a request" amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering"). The additional element "to process an input data object with a target machine learning model" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "wherein the target machine learning model is previously trained using a training dataset comprising a plurality of synthetic data objects and a plurality of historical data objects" does not amount to more than generally linking the use of a judicial exception to a particular field of use (see MPEP 2106.05(h), "limit the use of the abstract idea to a particular technological environment"). The additional element "initiate the performance of a labeling process for assigning a ground truth label to the input data object" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "augment a supplemental training dataset with the input data object and the ground truth label" is well-understood, routine, conventional activity (see MPEP 2106.05(d), "storing and retrieving information in memory").
Considered individually and in combination, the additional elements do not integrate the abstract idea into a practical application or amount to significantly more. The claim is therefore directed to an abstract idea and is ineligible.
Claims 10-16, dependent on Claim 9, incorporate the rejection of Claim 9. Claims 10-16 incorporate substantively all the limitations of Claims 2-8, respectively, in computing apparatus form and are rejected under the same rationales.
Regarding Claim 17
Step 1
Claim 17 recites "one or more non-transitory computer-readable storage media including instructions that, when executed by one or more processors, cause the one or more processors" to perform recited operations, and thus the claimed manufacture falls within a statutory category of invention.
Step 2A Prong 1
The claim recites "identify a synthetic data object of the plurality of synthetic data objects that corresponds to the input data object based on one or more corresponding input feature values shared by the synthetic data object and the input data object," which is a mental process. The claim recites "in response to identifying the synthetic data object: modify a holistic evaluation score for the target machine learning model," which is a mental process.
Thus, the claim recites an abstract idea.
Step 2A Prong 2
The additional element "receive a request" amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering"). The additional element "to process an input data object with a target machine learning model" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "wherein the target machine learning model is previously trained using a training dataset comprising a plurality of synthetic data objects and a plurality of historical data objects" does not amount to more than generally linking the use of a judicial exception to a particular field of use (see MPEP 2106.05(h), "limit the use of the abstract idea to a particular technological environment"). The additional element "initiate the performance of a labeling process for assigning a ground truth label to the input data object" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "augment a supplemental training dataset with the input data object and the ground truth label" amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering and outputting").
Step 2B
The additional element "receive a request" amounts to insignificant extra-solution activity (see MPEP 2106.05(g), "mere data gathering"). The additional element "to process an input data object with a target machine learning model" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "wherein the target machine learning model is previously trained using a training dataset comprising a plurality of synthetic data objects and a plurality of historical data objects" does not amount to more than generally linking the use of a judicial exception to a particular field of use (see MPEP 2106.05(h), "limit the use of the abstract idea to a particular technological environment"). The additional element "initiate the performance of a labeling process for assigning a ground truth label to the input data object" invokes a computer or other machinery merely as a tool to perform an existing process (see MPEP 2106.05(f), "apply it"). The additional element "augment a supplemental training dataset with the input data object and the ground truth label" is well-understood, routine, conventional activity (see MPEP 2106.05(d), "storing and retrieving information in memory").
Considered individually and in combination, the additional elements do not integrate the abstract idea into a practical application or amount to significantly more. The claim is therefore directed to an abstract idea and is ineligible.
Claims 18-20, dependent on Claim 17, incorporate the rejection of Claim 17. Claims 18-20 incorporate substantively all the limitations of Claims 2-4, respectively, in non-transitory computer-readable storage media form and are rejected under the same rationales.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US 2021/0406727 A1, hereinafter "Wang") in view of Wan et al., "Protein function prediction is improved by creating synthetic feature samples with generative adversarial networks" (hereinafter "Wan").
Regarding Claim 1, Wang teaches:
A computer-implemented method (Wang, [0003]: "A computerized method for managing defects in a model training pipeline is described") comprising:
receiving, by one or more processors, a request (Wang, [0043]: "In some examples, changes made to the pipeline may be detected and the process may proceed to 408 automatically as a result. Alternatively, or additionally, a user that makes a change to the pipeline may manually trigger the process to proceed to 408 based on the change," where Wang's manual trigger corresponds to the instant received request) to process an input data object with a target machine learning model (Wang, [0045]: "At 410, test performance metrics are collected based on the test model. As with the baseline model described above, the test performance metrics may be based on the accuracy of the performance of the model after it is trained and/or based on the performance of various operations of the pipeline during the training of the test model," where Wang's test model corresponds to the instant target model), wherein the target machine learning model is previously trained (Wang, [0085]: "based on comparing the baseline performance metrics to the test performance metrics and the test performance metrics exceeding the baseline performance metrics, providing an indication that the code change improves the model training pipeline with respect to the defect type, whereby a baseline version of the model training pipeline is updated to include the code change and the baseline performance metrics are updated based on the test performance metrics," where Wang's updating of the baseline version to the test version corresponds to the instant prior training) using a training dataset comprising a plurality of synthetic data objects and a plurality of historical data objects (Wang, Fig. 5, block 504, "Select a real data set", block 506, "Generate a synthetic data set based on the selected real data set and the selected defect type," and block 514, "Train a model using the baseline pipeline and the selected synthetic data set," where Wang's real data set of block 504 corresponds to the instant historical data);
identifying, by the one or more processors, a synthetic data object of the plurality of synthetic data objects that corresponds to the input data object (Wang, [0047]: "The defect indicator may be provided to another component of the system, to a user interface for viewing by a user, to a storage component for storage thereon, or the like. The defect indicator may include the defect type of the detected defect, the associated lifecycle stage of the detected defect, and any other associated data that may be of use in addressing the defect. Further, the specific values and types of the compared performance metrics that triggered the detection of the defect may be provided for additional context") based on one or more corresponding input feature values shared by the synthetic data object and the input data object (Wang, [0062]: "a model training pipeline is configured to train models in a healthy baseline state. The model pipeline includes four real data sets for use as training data and a user of the pipeline has defined four different defect types that may commonly be introduced when changes are made to the pipeline. Each of the four defect types is associated with a different data feature of training data. The pipeline is configured with data set converters for each of the defect types that are applied to each of the real data sets to generate synthetic data sets that include the data features with which the defect types are associated. As a result, a synthetic data set is generated for each pair of a defect type and a real data set, resulting in sixteen synthetic data sets"); and
in response to identifying the synthetic data object: modifying, by the one or more processors, a holistic evaluation score for the target machine learning model (Wang, [0085]: "based on comparing the baseline performance metrics to the test performance metrics and the test performance metrics exceeding the baseline performance metrics, providing an indication that the code change improves the model training pipeline with respect to the defect type, whereby a baseline version of the model training pipeline is updated to include the code change and the baseline performance metrics are updated based on the test performance metrics," where Wang's updated baseline performance metrics corresponds to the instant modified holistic evaluation score),
initiating, by the one or more processors, the performance of a labeling process for assigning a ... label to the input data object (Wang, [0022]: "The performance metrics collected by the model performance module 120 may be metrics based on the performance of operations at each of the lifecycle stages 104-108 and/or performance of operations of a trained model 110 after it is trained by the model training pipeline 102. For instance, in some examples, performance metrics measuring the accuracy with which the trained model 110 classifies input data may be collected by the model performance module 120 after the trained model 110 is trained by the model training pipeline," where Wang's testing of a classifier corresponds to the instant label assignment), and
augmenting, by the one or more processors, a ... training dataset with the input data object and the ... label (Wang, [0022]: "in some examples, performance metrics measuring the accuracy with which the trained model 110 classifies input data may be collected by the model performance module 120 after the trained model 110 is trained by the model training pipeline 102. Additionally, or alternatively, performance metrics of specific lifecycle stages or operations associated with the lifecycle stages may be measured, such as time taken to complete an operation of a lifecycle stage," where Wang's updating classification performance metrics for a training dataset corresponds to the instant augmentation).
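For context, the baseline-versus-test metric comparison that Wang describes may be illustrated by the following minimal sketch (Python; the function name, the dictionary representation of metrics, and the higher-is-better assumption are the examiner's illustrative assumptions, not Wang's code):

    # Illustrative sketch of Wang's baseline/test pipeline comparison
    # (cf. Wang, [0047]-[0048], [0085]): metrics from a model trained by
    # the baseline pipeline are compared against metrics from the changed
    # (test) pipeline; a regression yields a defect indicator, while an
    # across-the-board improvement promotes the test version to baseline.
    def compare_pipeline_metrics(baseline: dict, test: dict) -> dict:
        # Assumes higher values are better for every metric.
        regressions = {name: (baseline[name], value)
                       for name, value in test.items()
                       if name in baseline and value < baseline[name]}
        if regressions:
            return {"defect_indicator": regressions}
        return {"update_baseline": test}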
Wang thus teaches receiving a request to process an input data object with a target machine learning model, identifying a synthetic data object corresponding to the input data object, modifying a model score, labeling the input data, and augmenting a training dataset.
Wang does not explicitly teach a labeling process for assigning a ground truth label to the input data and augmenting ... a supplemental training dataset with the input data object and the ground truth label.
However, Wan teaches:
a labeling process for assigning a ground truth label to the input data (Wan, p. 21, Method Details: "In this work, we conduct the CTST by using the 1-nearest neighbour classification algorithm due to its simplicity on hyper-parameter tuning. The real and generated synthetic protein feature samples are merged as a union set of protein feature samples with being assigned the binary labels respectively, e.g. the label 1 for the real samples and the label 0 for the synthetic samples") and
augmenting ... a supplemental training dataset with the input data object and the ground truth label (Wan, p. 3, Results, Overview of FFPred-GAN: "On the last step, FFPred-GAN uses the Classifier Two-Sample Tests (CTST) [33] to select the optimal synthetic training protein feature samples, which are used to augment the original training samples. During the down-stream machine learning classifier training stage, the optimal synthetic samples are expected to derive ... classifiers," where Wan's downstream training data corresponds to the instant supplemental dataset).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Wang regarding receiving a request to process an input data object with a target machine learning model, identifying a synthetic data object corresponding to the input data object, modifying a model score, labeling the input data, and augmenting a training dataset using the input data object and the ground truth label with those of Wan regarding a labeling process for assigning a ground truth label to the input data and augmenting a supplemental training dataset with the input data object and the ground truth label.
The motivation to do so would be to facilitate the training of classifiers with higher predictive accuracy (Wan, p. 3, Results, Overview of FFPred-GAN: "On the last step, FFPred-GAN uses the Classifier Two-Sample Tests (CTST) [33] to select the optimal synthetic training protein feature samples, which are used to augment the original training samples. During the down-stream machine learning classifier training stage, the optimal synthetic samples are expected to derive better classifiers, leading to higher predictive accuracy").
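For context, the classifier two-sample test that Wan describes may be illustrated by the following minimal sketch (Python with scikit-learn; the function and variable names are the examiner's illustrative assumptions, not Wan's code):

    # Illustrative sketch of Wan's CTST: real and synthetic samples are
    # merged with binary labels (1 for real, 0 for synthetic) and scored
    # with a 1-nearest-neighbour classifier under leave-one-out
    # cross-validation; accuracy near 0.500 indicates synthetic samples
    # that are indistinguishable from the real ones.
    import numpy as np
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def ctst_loocv_accuracy(real: np.ndarray, synthetic: np.ndarray) -> float:
        X = np.vstack([real, synthetic])
        y = np.concatenate([np.ones(len(real)), np.zeros(len(synthetic))])
        knn = KNeighborsClassifier(n_neighbors=1)
        return cross_val_score(knn, X, y, cv=LeaveOneOut()).mean()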
Regarding Claim 9, Wang teaches:
A computing apparatus (Wang, Fig. 6, block 600) comprising memory (Wang, Fig. 6, block 622, MEMORY) and one or more processors communicatively coupled to the memory (Wang, Fig. 6, block 619, PROCESSOR), the one or more processors configured to perform the steps recited by the method of Claim 1. Claim 9 is rejected under the same rationale as Claim 1.
Regarding Claim 17, Wang teaches:
One or more non-transitory computer-readable storage media including instructions that, when executed by one or more processors (Wang, [0078]: "One or more non-transitory computer storage media have computer-executable instructions for managing defects in a model training pipeline"), cause the one or more processors to perform the steps recited by the method of Claim 1. Claim 17 is rejected under the same rationale as Claim 1.
Regarding Claim 2, the rejection of Claim 1 is incorporated. The Wang/Wan combination teaches:
identifying a performance degradation for the target machine learning model based on the holistic evaluation score for the target machine learning model (Wang, [0047]: "The defect indicator may be provided to another component of the system, to a user interface for viewing by a user, to a storage component for storage thereon, or the like. The defect indicator may include the defect type of the detected defect, the associated lifecycle stage of the detected defect, and any other associated data that may be of use in addressing the defect. Further, the specific values and types of the compared performance metrics that triggered the detection of the defect may be provided for additional context");
identifying an influencing feature value corresponding to the performance degradation (Wang, [0062]: "The pipeline is configured with data set converters for each of the defect types that are applied to each of the real data sets to generate synthetic data sets that include the data features with which the defect types are associated. As a result, a synthetic data set is generated for each pair of a defect type and a real data set, resulting in sixteen synthetic data sets");
modifying the target machine learning model based on the influencing feature value (Wang, [0085]: "based on comparing the baseline performance metrics to the test performance metrics and the test performance metrics exceeding the baseline performance metrics, providing an indication that the code change improves the model training pipeline with respect to the defect type, whereby a baseline version of the model training pipeline is updated to include the code change and the baseline performance metrics are updated based on the test performance metrics"); and
determining an updated holistic evaluation score for the target machine learning model
(Wang, [0085]: "based on comparing the baseline performance metrics to the test performance metrics and the test performance metrics exceeding the baseline performance metrics, providing an indication that the code change improves the model training pipeline with respect to the defect type, whereby ... the baseline performance metrics are updated based on the test performance metrics").
Claims 10 and 18 incorporate substantively all the limitations of Claim 2 in computing apparatus and non-transitory computer-readable storage media forms, respectively, and are rejected under the same rationale.
Regarding Claim 3, the rejection of Claim 2 is incorporated. The Wang/Wan combination teaches:
wherein the influencing feature value is based on one or more counterfactual proposals for a plurality of predictive outputs generated by the target machine learning model (Wang, [0062]: "The pipeline is configured with data set converters for each of the defect types that are applied to each of the real data sets to generate synthetic data sets that include the data features with which the defect types are associated. As a result, a synthetic data set is generated for each pair of a defect type and a real data set, resulting in sixteen synthetic data sets," where Wang's generated defects correspond to the instant counterfactual proposals).
Claims 11 and 19 incorporate substantively all the limitations of Claim 3 in computing apparatus and non-transitory computer-readable storage media forms, respectively, and are rejected under the same rationale.
Regarding Claim 4, the rejection of Claim 1 is incorporated. The Wang/Wan combination teaches:
wherein modifying the holistic evaluation score comprises reducing the holistic evaluation score (Wang, [0022]: "performance metrics of specific lifecycle stages or operations associated with the lifecycle stages may be measured, such as time taken to complete an operation of a lifecycle stage," where Wang's time measurement reasonably suggests a reduction in duration of an operation, as in [0048]: "if the comparison of the metrics indicates that the test model has improved performance over the baseline model, the associated change to the pipeline may be considered an improvement of the pipeline and the provided test success indicator may include a recommendation that the change be considered the new baseline version of the pipeline").
Claims 12 and 20 incorporate substantively all the limitations of Claim 4 in computing apparatus and non-transitory computer-readable storage media forms, respectively, and are rejected under the same rationale.
Regarding Claim 5, the rejection of Claim 1 is incorporated. Wan further teaches:
detecting a threshold augmentation stimulus based on the supplemental training dataset (Wan, p. 4: "The training quality of FFPred-GAN continues to improve with more iterations of training, with the LOOCV [Leave One Out Cross-Validation] accuracy reaching 0.515 after another 10,000 iterations. Finally, after 29,601 iterations’ training, FFPred-GAN has been successfully trained due to the desired LOOCV accuracy of 0.500," where Wan's desired LOOCV accuracy corresponds to the instant threshold); and
in response to the threshold augmentation stimulus, generating an augmented training dataset by augmenting the training dataset with the supplemental training dataset (Wan, p. 3, Results: "the FFPred-GAN framework consists of three steps to generate high-quality synthetic training protein feature samples.... On the last step, FFPred-GAN uses the Classifier Two-Sample Tests (CTST) [33] to select the optimal synthetic training protein feature samples, which are used to augment the original training samples").
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Wang/Wan combination regarding augmenting a supplemental training dataset with the further teachings of Wan regarding detecting a threshold augmentation stimulus based on the supplemental training dataset and in response to the threshold augmentation stimulus, generating an augmented training dataset by augmenting the training dataset with the supplemental training dataset.
The motivation to do so would be to facilitate training of classifiers with higher predictive performance (Wan, p. 6, Results, Overview of FFPred-GAN: "the synthetic protein feature samples successfully improve the predictive performance of the original combination of training protein feature samples, and lead to the overall highest accuracy for predicting all three domains of GO terms with an SVM classification algorithm. For predicting Biological Process (BP) domain of GO terms, the combination of Synthetic Positive + Real Positive + Real Negative obtains the overall best average ranks of 5.88 and 4.84, respectively according to MCC and AUROC values by using SVM as the classification algorithm").
Claim 13 incorporates substantively all the limitations of Claim 5 in computing apparatus form and is rejected under the same rationale.
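For context, treating attainment of the desired LOOCV accuracy as the claimed threshold augmentation stimulus may be illustrated by the following minimal sketch (Python; the tolerance parameter is the examiner's assumption, since Wan trains until the accuracy of 0.500 is reached):

    # Illustrative sketch: the threshold augmentation stimulus is detected
    # when the CTST LOOCV accuracy reaches the desired value of 0.500
    # (within a tolerance); in response, the original training set is
    # augmented with the selected synthetic samples.
    import numpy as np

    def augment_on_stimulus(train_X, train_y, synth_X, synth_y,
                            loocv_accuracy, target=0.500, tol=0.005):
        if abs(loocv_accuracy - target) > tol:
            return train_X, train_y  # no stimulus detected; set unchanged
        return (np.vstack([train_X, synth_X]),
                np.concatenate([train_y, synth_y]))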
Regarding Claim 6, the rejection of Claim 5 is incorporated. The Wang/Wan combination teaches:
wherein the one or more corresponding input feature values are associated with an evaluation feature of the training dataset, wherein the plurality of synthetic data objects comprise one or more synthetic data objects associated with the evaluation feature (Wang, [0062]: "a model training pipeline is configured to train models in a healthy baseline state. The model pipeline includes four real data sets for use as training data and a user of the pipeline has defined four different defect types that may commonly be introduced when changes are made to the pipeline. Each of the four defect types is associated with a different data feature of training data. The pipeline is configured with data set converters for each of the defect types that are applied to each of the real data sets to generate synthetic data sets that include the data features with which the defect types are associated. As a result, a synthetic data set is generated for each pair of a defect type and a real data set, resulting in sixteen synthetic data sets"), and wherein augmenting the training dataset comprises:
replacing the one or more synthetic data objects with the supplemental training dataset (Wang, Fig. 5, the "YES" edge of blocks 504, 512, and 520, indicating cycling through newly generated synthetic data with subsequent training sets).
Claim 14 incorporates substantively all the limitations of Claim 6 in computing apparatus form and is rejected under the same rationale.
Regarding Claim 7, the rejection of Claim 5 is incorporated. The Wang/Wan combination teaches:
identifying a performance degradation for the target machine learning model based on the holistic evaluation score for the target machine learning model (Wang, [0047]: "The defect indicator may be provided to another component of the system, to a user interface for viewing by a user, to a storage component for storage thereon, or the like. The defect indicator may include the defect type of the detected defect, the associated lifecycle stage of the detected defect, and any other associated data that may be of use in addressing the defect. Further, the specific values and types of the compared performance metrics that triggered the detection of the defect may be provided for additional context"); and
in response to the performance degradation, modifying the target machine learning model based on the augmented training dataset (Wang, Fig. 5, blocks 516 and 524, indicating collection of performance metrics, and the corresponding "YES" back-edges, indicating a response).
Claim 15 incorporates substantively all the limitations of Claim 7 in computing apparatus form and is rejected under the same rationale.
Regarding Claim 8, the rejection of Claim 5 is incorporated. Wan further teaches:
wherein the threshold augmentation stimulus is based on a threshold number of supplemental input data objects in the supplemental training dataset (Wan, p. 4, Results, Overview of FFPred-GAN: "we adopt the 1-nearest neighbour classification algorithm and the Leave One Out Cross-Validation (LOOCV) to conduct the classifier two-sample tests, which is used for evaluating the quality of synthetic protein feature samples. The closer the value of LOOCV accuracy is to 0.500, the higher the quality of synthetic samples. ... [A]t the begin of FFPred-GAN training (i.e. after the 1st iteration), the real positive protein feature samples (green dots) are distributed distantly from the synthetic ones (red dots), leading to a LOOCV accuracy of 1.000, suggesting obvious differences between the real and synthetic sets of protein feature samples. ... Finally, after 29,601 iterations' training, FFPred-GAN has been successfully trained due to the desired LOOCV accuracy of 0.500" and p. 21: "the synthetic protein feature samples that obtain the best LOOCV accuracy (i.e. closest to 50.0%) are selected as the optimal synthetic feature samples," where Wan's desired 0.500 accuracy corresponds to the threshold number).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the Wang/Wan combination regarding detecting a threshold augmentation stimulus with the further teachings of Wan regarding wherein the threshold augmentation stimulus is based on a threshold number of supplemental input data objects in the supplemental training dataset.
The motivation to do so would be to ensure generating samples of sufficient quality to improve prediction accuracy of trained models (Wan p. 4, Results, Overview of FFPred-GAN: "The closer the value of LOOCV accuracy is to 0.500, the higher the quality of synthetic samples" and p. 18, Discussion: "we have presented a novel generative adversarial networks-based method that successfully generates high-quality synthetic feature samples, which significantly improve the accuracy on predicting all three domains of GO terms through augmenting the original training data").
Claim 16 incorporates substantively all the limitations of Claim 8 in computing apparatus form and is rejected under the same rationale.
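For context, Wan's selection of the optimal synthetic samples (the candidate set whose LOOCV accuracy lies closest to 50.0%) may be illustrated by the following minimal sketch (Python; names are the examiner's illustrative assumptions, not Wan's code):

    # Illustrative sketch of Wan's selection rule: among candidate sets of
    # synthetic samples, keep the one whose CTST LOOCV accuracy is closest
    # to 0.500, i.e. the set least distinguishable from the real samples.
    def select_optimal_synthetic(candidates, loocv_accuracies):
        best = min(range(len(candidates)),
                   key=lambda i: abs(loocv_accuracies[i] - 0.500))
        return candidates[best]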
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Mothilal et al., "Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations," teaches a framework for generating and evaluating feasible and diverse sets of hypothetical examples that show people how to obtain differing algorithmic predictions, for purposes of improved model explainability.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT N DAY whose telephone number is (703) 756-1519. The examiner can normally be reached M-F 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/R.N.D./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122