DETAILED ACTION
This communication is in response to the application filed 01/04/2023, in which claims 1-8 were presented for examination.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 01/04/23 and 01/13/23 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
Claims 1 and 3-8 are rejected under 35 U.S.C. 103 as being unpatentable over Bilalli, Besim, et al. "PRESISTANT: Learning based assistant for data pre-processing." Data & Knowledge Engineering 123 (2019): 101727 (“Bilalli”) in view of Tran (US 2019/0354849 A1; published Nov. 21, 2019).
Regarding claim 1, Bilalli discloses […] a process, the process comprising:
obtaining first change information, which indicates a change in a feature of a first dataset when first preprocessing is performed on the first dataset; (see Bilalli Section 5.1 Pruning phase (“Given that there is an overwhelming number of different transformations that can be applied to a dataset, in Section 2.2.1, we argued that simple rules can help on discarding transformations that have no impact. This translates to having a repository of Expert Rules (cf. Fig. 6), that can be extended to contain any types of rules (e.g., the types of rules used in IBM SPSS Modeler7), that may help on reducing the number of potential transformations to be applied on datasets. Our first basic set of rules are derived from the experiments whose results were shown in Fig. 3, where for instance we define rules in order to exclude Standardization and Normalization when considering algorithms like, IBk, Logistic, J48, and PART.”) [the transformation(s) remaining after pruning are interpreted as the claimed first preprocessing]); see Bilalli Section 5.3 (“The recommending phase starts when a user wants to analyze a dataset [first dataset]. She selects an algorithm to be used for the analysis and the system automatically recommends transformations to be applied, such that the final result is improved. This phase is described in Algorithm 3. In Algorithm 3, first the meta-features and the performance of the classification algorithm are extracted from the original non-transformed dataset in lines 3 and 4, respectively. Next, different transformations [first preprocessing] are applied to the dataset and from each transformed version of the dataset the necessary features (i.e., meta-features, delta meta-features) [first change information] are computed — see lines 5–9.”))
inputting the first change information to a trained machine learning model that outputs an inference result regarding preprocessing information in response to an input of the first change information, (see Bilalli Section 5.3 (“These features [first change information] are then fed to the predictor [trained machine learning model] in line 10. The predictor in line 10, applies the meta-model to the extracted features in order to find the predicted impact [inference result] of a transformation [preprocessing information] on the performance of the algorithm.”)) the preprocessing information identifying each of a plurality of pieces of second preprocessing for a second dataset, (see Bilalli Section 5.3 (“After the predicted impacts are obtained for all the transformations [pieces of second preprocessing], they are ranked in descending order using the probabilities of being positive, which are provided by the model, in line 11.”)) the trained machine learning model being trained by machine learning using training data in which the preprocessing information as an objective variable is associated with second change information as an explanatory variable, (see Bilalli Section 5.2 (“Two important activities are performed in the learning phase. First, a meta-database (i.e., set of meta-datasets) [training data] is generated for all the classification algorithms considered (cf. Algorithm 1), and then on top of it, a learning algorithm is applied (cf. Algorithm 2). As a result, a statistical model (meta-model) [trained machine learning model] is generated for every classification algorithm considered. The inputs required to construct the meta-database are datasets, transformations [preprocessing information]— that are likely to improve the performance of classification algorithms, and the classification algorithms in consideration. For the sake of simplicity, let us consider that we want to create the meta-dataset for a single classification algorithm. 
In line 8 of Algorithm 1, we first extract the dataset characteristics (i.e., meta-features from the original non-transformed datasets). Next, we apply all the available transformations to all the datasets and hence obtain transformed datasets, see line 11. We extract the meta-features from the transformed datasets in line 12, and take the difference between them and the meta-features from the original non-transformed datasets in line 13. Like this, we obtain the delta meta-features [second change information]. Furthermore, to both original non-transformed datasets — line 9, and the transformed ones — line 14, we apply the classification algorithm and then take the relative difference between their corresponding performance measures (e.g., predictive accuracy) — line 15. The latter is the meta-response, which together with the meta-features of the non-transformed version of the dataset, the delta meta-features, and the performance measure of the original dataset compile the complete set of metadata (list of features that will be used in the learning phase) — see line 16. Once a meta-dataset is obtained for each classification algorithm, next a learning algorithm (i.e., meta-learner) is applied — line 6 of Algorithm 2. As a result, a meta-model (i.e., statistical model) [trained machine learning model] for each of the classification algorithms is obtained.”)) the second change information indicating a change in a feature of the second dataset when each of the plurality of pieces of second preprocessing is performed (see Bilalli Section 4.1 (“Note that the ultimate goal is to predict the impact of transformations, and the impact per se is measured as the relative change of the performance of the algorithm before and after the transformation is applied. 
To this end, to the set of meta-features we consider, we attach also the base performance of the classification algorithm (i.e., the performance before the transformation is applied) and in addition we add features that capture the difference [change in a feature of the second dataset] between the meta-features before and after the transformation is applied. We call these features delta meta-features. As a result, every meta-feature has its corresponding delta meta-feature. For instance, let us say that in a given dataset, before applying a transformation, the number of continuous attributes is 5. Assume we apply a transformation that is discretizing only one continuous attribute, then, the number of continuous attributes becomes 4 and thus the delta of this feature is −1 (i.e., the delta of the number of continuous attributes).”)); and
identifying, among the plurality of pieces of second preprocessing, one or more pieces of recommended preprocessing that correspond to the first preprocessing based on the inference result that is output in response to the input of the first change information (see Bilalli Section 5.3 (“The recommending phase starts when a user wants to analyze a dataset. She selects an algorithm to be used for the analysis and the system automatically recommends transformations [pieces of recommended preprocessing] to be applied, such that the final result is improved. This phase is described in Algorithm 3. In Algorithm 3, first the meta-features and the performance of the classification algorithm are extracted from the original non-transformed dataset in lines 3 and 4, respectively. Next, different transformations [pieces of second preprocessing] are applied to the dataset and from each transformed version of the dataset the necessary features (i.e., meta-features, delta meta-features) are computed — see lines 5–9. These features are then fed to the predictor in line 10. The predictor in line 10, applies the meta-model to the extracted features in order to find the predicted impact of a transformation on the performance of the algorithm. After the predicted impacts are obtained for all the transformations, they are ranked in descending order using the probabilities of being positive, which are provided by the model, in line 11.”)).
Although Bilalli teaches algorithms to assist the user by reducing the number of preprocessing options to only a set of relevant ones, Bilalli does not expressly disclose [a] non-transitory computer-readable recording medium storing a program for causing a computer to execute (but see Tran ¶ 73 (“Turning now to FIG. 8, a method 800 for automatic learning the automatic data preprocessing for a machine learning operation by a processor is depicted, in which various aspects of the illustrated embodiments may be implemented. That is, in association with block 704 of FIG. 7, method 800 may be executed. That is, method 800 may be one or more sub-steps of with block 704 of FIG. 7. The functionality 800 may be implemented as a method executed as instructions on a machine, where the instructions are included on at least one computer readable medium or one non-transitory machine-readable storage medium.”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilalli to incorporate the teachings of Tran to implement the PRESISTANT tool algorithms as instructions stored on a non-transitory storage medium. Doing so would enable the functionality of the algorithms to be executed on a machine.
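For clarity of the record, the recommending-phase flow quoted from Bilalli Section 5.3 (extract meta-features, apply each transformation, compute delta meta-features, feed them to the predictor, and rank transformations by predicted probability of a positive impact) can be sketched as follows. This is an illustrative sketch only; the function names, dataset representation, and toy predictor are hypothetical and are not taken from Bilalli's implementation.

```python
# Illustrative sketch of Bilalli's recommending phase (Section 5.3).
# Hypothetical names; not the PRESISTANT implementation.

def meta_features(dataset):
    """Extract simple meta-features from a dataset (list of rows)."""
    n_rows = len(dataset)
    n_cols = len(dataset[0]) if dataset else 0
    n_numeric = sum(
        all(isinstance(row[j], (int, float)) for row in dataset)
        for j in range(n_cols)
    )
    return {"rows": n_rows, "cols": n_cols, "numeric_cols": n_numeric}

def delta_meta_features(before, after):
    """Delta meta-features: per-feature difference, after minus before."""
    return {k: after[k] - before[k] for k in before}

def recommend(dataset, transformations, predictor, top_k=3):
    """Apply each transformation, compute delta meta-features, score each
    transformation with the (meta-)model, and rank in descending order of
    predicted probability of a positive impact."""
    base = meta_features(dataset)
    scored = []
    for name, transform in transformations:
        transformed = transform(dataset)
        delta = delta_meta_features(base, meta_features(transformed))
        scored.append((name, predictor(base, delta)))
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top_k]
```

Consistent with Bilalli's example in Section 4.1, discretizing one continuous attribute of a two-attribute dataset would yield a delta of -1 for the number-of-continuous-attributes meta-feature.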
Claim 7 is a method claim corresponding to claim 1 and, therefore, is similarly rejected.
Claim 8 is an apparatus claim corresponding to claim 1 and, therefore, is similarly rejected. Bilalli does not expressly disclose [a]n information processing device, comprising:
a memory; and (but see Tran FIG. 1 (memory))
a processor coupled to the memory and the processor configured to: (but see Tran FIG. 1 (processing unit 16); ¶ 42 (“As shown in FIG. 1, computer system/server 12 in cloud computing node 10 is shown in the form of a general-purpose computing device. The components of computer system/server 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including system memory 28 to processor 16.”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilalli to incorporate the teachings of Tran to implement the PRESISTANT tool algorithms on a computer system including a processor coupled to a memory. Doing so would enable the functionality of the algorithms to be executed on a machine.
Regarding claim 3, Bilalli, in view of Tran, discloses the invention of claim 1 as discussed above. Bilalli further discloses wherein
the first change information includes a difference between the feature of the first dataset before the first preprocessing is performed and the feature of the first dataset after the first preprocessing is performed, and (see Bilalli Section 4.1 (“Note that the ultimate goal is to predict the impact of transformations, and the impact per se is measured as the relative change of the performance of the algorithm before and after the transformation is applied. To this end, to the set of meta-features we consider, we attach also the base performance of the classification algorithm (i.e., the performance before the transformation is applied) and in addition we add features that capture the difference between the meta-features before and after the transformation is applied. We call these features delta meta-features. As a result, every meta-feature has its corresponding delta meta-feature. For instance, let us say that in a given dataset, before applying a transformation, the number of continuous attributes is 5. Assume we apply a transformation that is discretizing only one continuous attribute, then, the number of continuous attributes becomes 4 and thus the delta of this feature is −1 (i.e., the delta of the number of continuous attributes).”))
the second change information includes a difference between the feature of the second dataset before each of the plurality of pieces of second preprocessing is performed and the feature of the second dataset after each of the plurality of pieces of second preprocessing is performed (see Bilalli Section 5.3 (“The recommending phase starts when a user wants to analyze a dataset. She selects an algorithm to be used for the analysis and the system automatically recommends transformations to be applied, such that the final result is improved. This phase is described in Algorithm 3. In Algorithm 3, first the meta-features and the performance of the classification algorithm are extracted from the original non-transformed dataset in lines 3 and 4, respectively. Next, different transformations are applied to the dataset and from each transformed version of the dataset the necessary features (i.e., meta-features, delta meta-features) are computed — see lines 5–9. These features are then fed to the predictor in line 10.”)).
Regarding claim 4, Bilalli, in view of Tran, discloses the invention of claim 1 as discussed above. Bilalli further discloses wherein
the first change information includes a first before-preprocessing feature that is the feature of the first dataset before the first preprocessing is performed, a first after-preprocessing feature that is the feature of the first dataset after the first preprocessing is performed, and a difference between the first before-preprocessing feature and the first after-preprocessing feature, and (see Bilalli Section 4.1 (“Note that the ultimate goal is to predict the impact of transformations, and the impact per se is measured as the relative change of the performance of the algorithm before and after the transformation is applied. To this end, to the set of meta-features we consider, we attach also the base performance of the classification algorithm (i.e., the performance before the transformation is applied) and in addition we add features that capture the difference between the meta-features before and after the transformation is applied. We call these features delta meta-features. As a result, every meta-feature has its corresponding delta meta-feature. For instance, let us say that in a given dataset, before applying a transformation, the number of continuous attributes is 5. Assume we apply a transformation that is discretizing only one continuous attribute, then, the number of continuous attributes becomes 4 and thus the delta of this feature is −1 (i.e., the delta of the number of continuous attributes).”))
the second change information includes a second before-preprocessing feature that is the feature of the second dataset before each of the plurality of pieces of second preprocessing is performed, a second after-preprocessing feature that is the feature of the second dataset after each of the plurality of pieces of second preprocessing is performed, and a difference between the second before-preprocessing feature and the second after-preprocessing feature (see Bilalli Section 5.3 (“The recommending phase starts when a user wants to analyze a dataset. She selects an algorithm to be used for the analysis and the system automatically recommends transformations to be applied, such that the final result is improved. This phase is described in Algorithm 3. In Algorithm 3, first the meta-features and the performance of the classification algorithm are extracted from the original non-transformed dataset in lines 3 and 4, respectively. Next, different transformations are applied to the dataset and from each transformed version of the dataset the necessary features (i.e., meta-features, delta meta-features) are computed — see lines 5–9. These features are then fed to the predictor in line 10.”)).
Regarding claim 5, Bilalli, in view of Tran, discloses the invention of claim 1 as discussed above. Bilalli further discloses wherein
the first change information includes the feature of the first dataset before the first preprocessing is performed and the feature of the first dataset after the first preprocessing is performed, and (see Bilalli Section 4.1 (“Note that the ultimate goal is to predict the impact of transformations, and the impact per se is measured as the relative change of the performance of the algorithm before and after the transformation is applied. To this end, to the set of meta-features we consider, we attach also the base performance of the classification algorithm (i.e., the performance before the transformation is applied) and in addition we add features that capture the difference between the meta-features before and after the transformation is applied. We call these features delta meta-features. As a result, every meta-feature has its corresponding delta meta-feature. For instance, let us say that in a given dataset, before applying a transformation, the number of continuous attributes is 5. Assume we apply a transformation that is discretizing only one continuous attribute, then, the number of continuous attributes becomes 4 and thus the delta of this feature is −1 (i.e., the delta of the number of continuous attributes).”))
the second change information includes the feature of the second dataset before each of the plurality of pieces of second preprocessing is performed and the feature of the second dataset after each of the plurality of pieces of second preprocessing is performed (see Bilalli Section 5.3 (“The recommending phase starts when a user wants to analyze a dataset. She selects an algorithm to be used for the analysis and the system automatically recommends transformations to be applied, such that the final result is improved. This phase is described in Algorithm 3. In Algorithm 3, first the meta-features and the performance of the classification algorithm are extracted from the original non-transformed dataset in lines 3 and 4, respectively. Next, different transformations are applied to the dataset and from each transformed version of the dataset the necessary features (i.e., meta-features, delta meta-features) are computed — see lines 5–9. These features are then fed to the predictor in line 10.”)).
Regarding claim 6, Bilalli, in view of Tran, discloses the invention of claim 1 as discussed above. Bilalli further discloses wherein
the feature of the first dataset is generated using at least one of data that includes a number of rows of the first dataset (see Bilalli Table 4 (No. 49 Number of Instances)) and a number of columns of the first dataset excluding an objective variable (see Bilalli Table 4 (No. 50 Number of Attributes)), a number of columns of numerical data included in the first dataset (see Bilalli Table 4 (No. 28 Number of Binary Attributes)), a number of columns of character strings included in the first dataset (see Bilalli Table 4 (No. 29 Number of Categorical Attributes)), a percentage of data missing values included in the first dataset (see Bilalli Table 4 (Nos. 52, 53 Number and Percentage of Missing Values)), a statistic of each column included in the first dataset (see Bilalli Table 4 (No. 29 Percentage of Categorical Attributes)), or a number of classes of the objective variable included in the first dataset (see Bilalli Table 4 (No. 27 Number of Categorical Attributes)).
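For clarity of the record, dataset features of the kind recited in claim 6 and mapped to Bilalli Table 4 (number of rows, number of columns excluding the objective variable, numeric versus string columns, percentage of missing values, and number of classes of the objective variable) can be computed as sketched below. This is an illustrative sketch only; the dict-of-rows representation and names are assumptions, and only a subset of the recited features is shown.

```python
# Illustrative computation of dataset features like those recited in
# claim 6 (hypothetical helper; not taken from Bilalli or the claims).

def dataset_features(rows, target_col):
    """rows: list of dicts mapping column name -> value (None = missing);
    target_col: name of the objective-variable column."""
    cols = [c for c in rows[0] if c != target_col]
    n_cells = len(rows) * len(cols)
    n_missing = sum(r[c] is None for r in rows for c in cols)
    numeric = [
        c for c in cols
        if all(isinstance(r[c], (int, float)) for r in rows if r[c] is not None)
    ]
    return {
        "n_rows": len(rows),
        "n_cols": len(cols),  # excludes the objective variable
        "n_numeric_cols": len(numeric),
        "n_string_cols": len(cols) - len(numeric),
        "pct_missing": 100.0 * n_missing / n_cells,
        "n_classes": len({r[target_col] for r in rows}),
    }
```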
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Bilalli and Tran as applied to claim 1 above, and further in view of Catterson (US 2021/0035017 A1; published Feb. 4, 2021).
Regarding claim 2, Bilalli, in view of Tran, discloses the invention of claim 1 as discussed above. Bilalli further discloses ranking the plurality of pieces of second preprocessing by prediction probability (see Bilalli Section 5.3 (“After the predicted impacts are obtained for all the transformations, they are ranked in descending order using the probabilities of being positive, which are provided by the model, in line 11.”)).
Bilalli does not expressly disclose outputting a predetermined number of pieces of recommended preprocessing (but see Catterson ¶ 58 (“The preprocessing steps predictor 404 uses the ML algorithm to identify the data type and generate various permutations of the preprocessing parameters. These permutations are then applied on a test data (a subset of the input data) to check for their respective prediction accuracy scores by the accuracy score calculator 405. The accuracy score may be classification accuracy, logarithmic loss, confusion matrix, area under curve, F1 score, mean absolute error, mean squared error, or any other performance evaluation metric.”); ¶ 74 (“The rank allocator 406 then arranges the various permutations in the decreasing order of their respective accuracy scores and assigns a rank in that order to each permutation or a predetermined number of permutations. The preprocessing steps selector 407 selects the top-ranked or a specified number of the permutations of preprocessing parameters. If more than one permutation is selected, the selected permutations may be displayed as options to the user. The user may then select a suitable option for a more customized preprocessing based on the research requirements. The algorithm generator 408 then uses the top-ranked or user selected permutation of preprocessing parameters to generate an optimized preprocessing algorithm. The predictive model 409 then performs data analysis using the optimized preprocessing algorithm.”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Bilalli to incorporate the teachings of Catterson to display a predetermined number of transformation recommendations ranked by probability, at least because doing so would limit the number of transformations the user has to decide between.
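For clarity of the record, the combined teaching relied upon for claim 2 (scoring candidate preprocessing options, ranking them in decreasing order of score, and keeping only a predetermined number, per Catterson ¶¶ 58, 74) can be sketched as follows. This is an illustrative sketch only; the names and the dictionary-based scoring function are hypothetical and are not taken from Catterson.

```python
# Illustrative top-k selection of preprocessing candidates by score
# (hypothetical sketch of the flow described in Catterson pars. 58, 74).

def rank_permutations(permutations, evaluate, k=1):
    """Score each preprocessing-parameter permutation with an accuracy
    metric (e.g., on held-out test data), rank the permutations in
    decreasing order of score, and return the top k."""
    scored = [(p, evaluate(p)) for p in permutations]
    scored.sort(key=lambda s: s[1], reverse=True)
    return scored[:k]
```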
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Bilalli, Besim, et al. "Automated data pre-processing via meta-learning." International Conference on Model and Data Engineering. Cham: Springer International Publishing, 2016;
Gemp, Ian, Georgios Theocharous, and Mohammad Ghavamzadeh. "Automated data cleansing through meta-learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 31. No. 2. 2017;
Gawhade, Rohan, et al. "Computerized data-preprocessing to improve data quality." 2022 Second International Conference on Power, Control and Computing Technologies (ICPC2T). IEEE, 2022.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAHID KHAN whose telephone number is (571)270-0419. The examiner can normally be reached M-F, 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Jung can be reached at (571)270-3779. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHAHID K KHAN/Primary Examiner, Art Unit 2146