Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification

The disclosure is objected to because of the following informalities:
In ¶23, "identify misclassification 125" should be "identify misclassification 120".
In ¶25, "initial dataset 115" should be "initial dataset 110".
In ¶39, "and or" should be "and/or".
In ¶44, "original level" should be "original data".
In ¶49, "which may long short term memory" should be "which may include long short term memory".
In ¶51, "the classifier describes above" should be "the classifier described above".
In ¶62, "financial information for particular type of bank customers" should be "financial information for a particular type of bank customers".
Appropriate correction is required.

Claim Objections

Claims 46 and 55 are objected to because of the following informalities: "the at least one training model is trained" should be "the at least one model is trained". Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 41-60 are rejected under 35 U.S.C. 103 as being unpatentable over Safe-Level-SMOTE: Safe-Level-Synthetic Minority Over-Sampling Technique for Handling the Class Imbalanced Problem to Bunkhumpornpat et al. (hereinafter Bunkhumpornpat) in view of Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments to Han et al. (hereinafter Han).

Per claim 41, Bunkhumpornpat discloses a method (Abstract, Section 3, and fig. 1... the Safe-Level-Synthetic Minority Oversampling TEchnique algorithm (Safe-Level-SMOTE) is an algorithm/method to remedy class imbalance problems in classifiers, being applicable to classifiers such as decision trees C4.5, Naïve Bayes, and support vector machines (SVMs): "The class imbalanced problem occurs in various disciplines when one of target classes has a tiny number of instances comparing to other classes. A typical classifier normally ignores or neglects to detect a minority class due to the small number of class instances. SMOTE is one of over-sampling techniques that remedies this situation.
... Our technique called Safe-Level-SMOTE carefully samples minority instances along the same line with different weight degree, called safe level. The safe level computes by using nearest neighbour minority instances. By synthesizing the minority instances more around larger safe level, we achieve a better accuracy performance than SMOTE and Borderline-SMOTE") performed by a computer hardware arrangement (Safe-Level-SMOTE and associated procedures are intrinsically executed by computer hardware), the method comprising: training at least one model on at least one dataset including a plurality of data types (Section 4 and fig. 2... Safe-Level-SMOTE is applied to classifiers such as C4.5, Naïve Bayes, and SVMs, where the classifiers use training datasets such as Satimage and Haberman; these training datasets are imbalanced, meaning they include a plurality of data types (instances) belonging to at least two classes: a minority class (positive instances) and a majority class (negative instances)... the classifiers are trained on these training datasets with Safe-Level-SMOTE applied to improve minority class consideration: "Three classifiers; decision trees C4.5 [18], Naïve Bayes [12], and support vector machines (SVMs) [6], are applied as classifiers in the experiments.
We use two quantitative datasets from UCI Repository of Machine Learning Databases [1]; Satimage and Haberman, illustrated in Table 3 ... For Satimage dataset, we select the class label 4 as the minority class and merge the remainder classes as the majority class because we only study the two-class imbalanced problem ... For Haberman dataset, the minority class is about one quarter of the whole dataset"); determining at least one misclassification of one of the plurality of data types (Abstract... Safe-Level-SMOTE addresses the class imbalanced problem where classifiers typically fail to detect the minority class; Section 1... the performance of the classifiers is evaluated by a confusion matrix, which defines False Negative (FN) and False Positive (FP), these explicitly representing misclassifications, e.g., cases where the actual class label differs from the predicted class label: "The performance of classifiers is customarily evaluated by a confusion matrix as illustrated in Table 1. The rows of the table are the actual class label of an instance, and the columns of the table are the predicted class label of an instance. Typically, the class label of a minority class set as positive, and that of a majority class set as negative. TP, FN, FP, and TN are True Positive, False Negative, False Positive, and True Negative, respectively. From Table 1, the six performance measures on classification; accuracy, precision, recall, F-value, TP rate, and FP rate, are defined by formulae in (1)-(6)"); assigning a classification score to each of the plurality of data types (Section 3 and fig.
1... the Safe-Level-SMOTE technique assigns a safe level (sl) to each positive instance (a data type/instance of the minority class), where the safe level is computed as the number of positive instances among the k nearest neighbors, such that if an instance's safe level is close to 0, it is considered "nearly noise," and if it is close to k, it is considered "safe"... this numerical assignment guiding the generation of synthetic data constitutes a score assigned to a data type based on its classification context: "Based on SMOTE, Safe-Level-SMOTE, Safe-Level-Synthetic Minority Oversampling TEchnique, assigns each positive instance its safe level before generating synthetic instances. Each synthetic instance is positioned closer to the largest safe level so all synthetic instances are generated only in safe regions. The safe level (sl) is defined as formula (7). If the safe level of an instance is close to 0, the instance is nearly noise. If it is close to k, the instance is considered safe"); receiving at least one synthetic dataset (Section 2... the Safe-Level-SMOTE algorithm, like other SMOTE algorithms, is an over-sampling technique designed to generate synthetic minority instances to be used, e.g., received, for training classifiers, the technique being a preprocessing step "before feeding it into any classifiers," where the resulting synthetic data is necessarily received for the subsequent training step: "Re-sampling is a preprocessing technique which adjusting the distribution of an imbalanced dataset until it is nearly balanced, before feeding it into any classifiers... the State of the Art over-sampling technique, namely SMOTE, Synthetic Minority Over-sampling TEchnique [4]. It over-samples a minority class by taking each positive instance and generating synthetic instances along a line segments joining their k nearest neighbours in the minority class. This causes the selection of a random instance along the line segment between two specific features.
The synthetic instances cause a classifier to create larger and less specific decision regions, rather than smaller and more specific regions"; Section 3 and fig. 1... Safe-Level-SMOTE takes as input a set of all original positive instances D and returns D', which is the set of all synthetic positive instances; this generated set D' is a synthetic dataset: "After each iteration of for loop in line 2 finishes, if the first case does not occurs, a synthetic instance s will be generated along the specific-ranged line between p and n, and then s will be added to D'. After the algorithm terminates, it returns a set of all synthetic instances D'"); training the at least one model on the synthetic dataset (Section 2... Safe-Level-SMOTE is a re-sampling technique that modifies the training dataset distribution "before feeding it into any classifiers," meaning the classifiers are trained on the modified dataset: "Re-sampling is a preprocessing technique which adjusting the distribution of an imbalanced dataset until it is nearly balanced, before feeding it into any classifiers"; Section 4... the experimental results show classifiers, e.g., models, being evaluated after applying the over-sampling techniques to the training dataset, showing that the model is subsequently trained on the dataset which includes the synthetic instances generated by the Safe-Level-SMOTE technique: "In our experiments, we use four performance measures; precision, recall, F-value, and AUC, for evaluating the performance of three over-sampling techniques; Safe-Level-SMOTE, SMOTE, and Borderline-SMOTE... The performance measures are evaluated through 10-fold cross-validation.
Three classifiers; decision trees C4.5 [18], Naïve Bayes [12], and support vector machines (SVMs) [6], are applied as classifiers in the experiments"); and determining if the at least one misclassification is generated ... on the at least one synthetic dataset based on the assigned classification score being below a particular threshold (Section 3 and fig. 1... the use of the assigned safe level (classification score) and the derived safe level ratio (formula 8) to make decisions about generating synthetic instances in safe regions and avoiding noise/misclassification regions involves checking the assigned classification score against certain thresholds... for example, if sl_ratio = ∞ AND sl_p = 0 (meaning the instance is noise and likely to cause misclassification), the algorithm stops generating synthetic instances for that data type, this use of the score (sl_p = 0) as a threshold preventing misclassification regions from being emphasized... in another example, if sl_ratio is < 1, the synthetic instance will be generated in the range [1 - safe level ratio, 1]: "After assigning the safe level to p and the safe level to n, the algorithm calculates the safe level ratio. There are five cases corresponding to the value of safe level ratio showed in the lines 12 to 28 of Fig. 1. The first case showed in the lines 12 to 14 of Fig. 1. The safe level ratio is equal to ∞ and the safe level of p is equal to 0. It means that both p and n are noises. If this case occurs, synthetic instance will not be generated because the algorithm does not want to emphasize the important of noise regions ... The fifth case showed in the lines 26 to 28 of Fig. 1. The safe level ratio is less than 1. It means that the safe level of p is less than that of n. If this case occurs, a synthetic instance is positioned closer to n because n is safer than p. The synthetic instance will be generated in the range [1 - safe level ratio, 1]").
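For clarity of the record, the safe-level computation and the case-based generation logic cited above can be sketched as follows. This is a minimal illustrative sketch of the procedure described in Bunkhumpornpat's Fig. 1, not code from the reference; the Euclidean distance metric, k = 5, and all function names are assumptions, and the intermediate cases (2 through 4) are paraphrased from the reference rather than quoted in this action.

```python
import numpy as np

def safe_level(idx, X, y, k=5):
    """Safe level of instance idx: the number of positive (minority)
    instances among its k nearest neighbours (excluding itself)."""
    d = np.linalg.norm(X - X[idx], axis=1)
    d[idx] = np.inf                         # exclude the instance itself
    neighbours = np.argsort(d)[:k]
    return int(np.sum(y[neighbours] == 1))

def safe_level_smote(X, y, k=5, seed=0):
    """Sketch of Safe-Level-SMOTE: returns the set D' of synthetic
    positive instances generated along p-n line segments."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for p in np.where(y == 1)[0]:
        # pick one of p's k nearest neighbours at random
        d = np.linalg.norm(X - X[p], axis=1)
        d[p] = np.inf
        n = rng.choice(np.argsort(d)[:k])
        sl_p, sl_n = safe_level(p, X, y, k), safe_level(n, X, y, k)
        ratio = sl_p / sl_n if sl_n != 0 else np.inf
        if ratio == np.inf and sl_p == 0:
            continue                        # case 1: both noise, generate nothing
        elif ratio == np.inf:               # case 2: n is noise, duplicate p
            gap = 0.0
        elif ratio == 1:                    # case 3: p and n equally safe
            gap = rng.uniform(0, 1)
        elif ratio > 1:                     # case 4: p safer, stay near p
            gap = rng.uniform(0, 1 / ratio)
        else:                               # case 5: n safer, range [1 - ratio, 1]
            gap = rng.uniform(1 - ratio, 1)
        synthetic.append(X[p] + gap * (X[n] - X[p]))
    return np.array(synthetic)
```

Because each synthetic instance is interpolated between an existing positive instance p and a neighbour n, every generated point lies within the convex span of the original data, consistent with the quoted "specific-ranged line" language.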
Bunkhumpornpat does not expressly disclose (Bunkhumpornpat: Sections 3-4 and figs. 1-2... Bunkhumpornpat uses the safe level (classification score) to control the data generation before training, not to check misclassification generation during that training step using that score/threshold; the final evaluation of misclassification performance relies on traditional metrics like recall or F-value derived from the confusion matrix: "The experimental results on the two datasets are illustrated in Fig. 2. The x-axis in these figures represents the over-sampling percent on a minority class. The y-axis in these figures represents the four performance measures; precision, recall, F-value, and AUC, in order from Fig. 2 (a) to Fig. 2 (d)"), but Han does teach: determining if the at least one misclassification is generated during the training of the at least one model on the at least one synthetic dataset based on the assigned classification score being below a particular threshold (Han: pgs. 5-8 and Tables 3-7... in Active Learning (AL), instances determined to be uncertain (potential misclassifications) are identified when their confidence score is equal to or lower than a predefined threshold (th_a), this determination happening during the iterative training/learning process: "A detailed description of the AL strategies used in this paper are shown in Tables 3 and 4. In both strategies we start with a small set of labeled instances S_l for training an initial classifier M. With this classifier, we estimate the confidence scores Cs for the instances that are candidates for labeling. In the pool-based scenario, the entire pool of unlabeled instances S_u is estimated, and only those instances with confidence scores equal to or lower than the predefined threshold th_a are selected for human annotation"; Han: pgs.
5-8 and Tables 3-7... explicit use of thresholds th_a and th_s, where if the confidence score is below a particular threshold, the system determines the data is unreliable (a potential misclassification) and rejects it from the auto-training set: "Select those instances with Cs that are equal to or lower than threshold th_a... Select those instances with Cs lower than th_s... In our approach, we first re-train the model with the human-labeled data set S_a new (AL phase), and then produce the machine-labeled data set S_s new (SSL phase). The purpose of this design aims at improving the quality of the data set S_s new by making use of a model previously trained with reliable (human) labels. This is very important for the SSL phase, since having the model trained first with reliable annotations from the AL phase will decrease the amount of noisy data (instances with potentially wrong labels assigned)..."). Bunkhumpornpat and Han are analogous art because they are both from the field of machine learning classification. Bunkhumpornpat addresses the problem of imbalanced data through synthetic generation and the noise associated with misclassified data. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to apply Han's confidence threshold to Bunkhumpornpat's synthetic data, such that the combined system generates a synthetic instance, assigns it a classification score, and determines whether it is a misclassification generated during training, e.g., a bad synthetic sample, by checking if the score is below the threshold; if it is, the sample is rejected/regenerated. See pgs. 5-8 and Tables 3-7. The suggestion/motivation for doing so would have been that Bunkhumpornpat teaches synthetic data is prone to noise, as does Han.
Confidence scores from Han are the standard metric for detecting noise in SVM-based systems, which Bunkhumpornpat employs (Bunkhumpornpat: Section 4... "Three classifiers; decision trees C4.5 [18], Naïve Bayes [12], and support vector machines (SVMs) [6], are applied as classifiers in the experiments"; Han: pg. 10... "As classifier we use Support Vector Machines (SVM) [43] with linear kernels and pairwise multi-class discrimination sequential minimal optimization..."). A PHOSITA would naturally employ Han's scoring to validate Bunkhumpornpat's synthetic outputs. This is a "predictable use of prior art elements according to their established functions" under KSR. This combination would produce a system that generates synthetic data but filters it using confidence thresholds as part of the retraining process, being the predictable sum of the parts.

Per claim 42, Bunkhumpornpat combined with Han discloses claim 41, Bunkhumpornpat further disclosing sending a request for the at least one synthetic dataset (Bunkhumpornpat: Section 4... in the experiments, the two external datasets from which the synthetic data is generated must be retrieved, which intrinsically requires sending a request in order to retrieve them: "We use two quantitative datasets from UCI Repository of Machine Learning Databases [1]; Satimage and Haberman, illustrated in Table 3. The first to last column of the table represents the dataset name, the number of instances, the number of attributes, the number of positive instances, the number of negative instances, and the percent of a minority class, respectively").

Per claim 43, Bunkhumpornpat combined with Han discloses claim 41, Bunkhumpornpat further disclosing the at least one synthetic dataset is based on the misclassification (Bunkhumpornpat: Section 3 and fig.
1... the synthetic dataset is generated using Safe-Level-SMOTE to solve the class imbalanced problem; Section 1... the class imbalance problem is one where the minority class is typically neglected, resulting in high misclassification rates, such that the generation of synthetic data is fundamentally based on addressing the cause of misclassification: "The minority class includes a few positive instances, and the majority class includes a lot of negative instances. In many real-world domains, analysts encounter many class imbalanced problems, such as the detection of unknown and known network intrusions [8], and the detection of oil spills in satellite radar images [13]. In these domains, standard classifiers need to accurately predict a minority class, which is important and rare, but the usual classifiers seldom predict this minority class").

Per claim 44, Bunkhumpornpat combined with Han discloses claim 41, Han further disclosing sending a request for additional data related to a particular one of the data types (Han: pg. 5... use of certainty-based Active Learning (AL), where the model selects instances (which belong to specific sound categories/data types, such as "People" or "Animals"); these instances are then submitted for human annotation, meaning the system sends a "query" or "request" to obtain the correct label, e.g., additional data, for those specific data types/instances: "In this paper we adopt an certainty-based AL approach. Moreover, we consider two target scenarios: pool-based scenario and stream-based scenario... In the pool-based scenario, the entire pool of unlabeled instances S_u is estimated, and only those instances with confidence scores equal to or lower than the predefined threshold th_a are selected for human annotation"). The rationale to combine Bunkhumpornpat with Han for this teaching is the same as previously stated for the independent claim.
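For clarity of the record, Han's pool-based selection of potential misclassifications, instances whose confidence score Cs falls at or below the threshold th_a, can be illustrated with a short sketch. This is not code from the reference; the function name and the threshold value th_a = 0.3 are illustrative assumptions (Han defines the threshold th_a, but no specific value is quoted in this action).

```python
import numpy as np

def select_for_annotation(confidences, th_a=0.3):
    """Pool-based AL selection in the manner of Han: instances whose
    confidence score Cs is equal to or lower than threshold th_a are
    flagged as potential misclassifications and queued for labeling.
    th_a = 0.3 is an illustrative value, not one from the reference."""
    confidences = np.asarray(confidences)
    return np.where(confidences <= th_a)[0]

# usage: confidence scores for five unlabeled instances
cs = [0.92, 0.15, 0.55, 0.28, 0.71]
query = select_for_annotation(cs)
# indices 1 and 3 fall at or below the threshold and would be
# sent out as a labeling request for those data types
```

The same thresholding, applied to newly generated synthetic instances in the proposed combination, would reject or regenerate low-confidence samples before retraining.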
Per claim 45, Bunkhumpornpat combined with Han discloses claim 41, Bunkhumpornpat further disclosing the at least one dataset includes one of (i) only real data, (ii) only synthetic data, or (iii) a combination of real data and synthetic data (Bunkhumpornpat: Section 4 and fig. 2... the experiments and methodology use datasets composed of only real data (ORG, representing the original dataset before oversampling) and datasets composed of a combination of real data and synthetic data (SAFE, SMOTE, BORD, after augmentation): "The experimental results on the two datasets are illustrated in Fig. 2. The x-axis in these figures represents the over-sampling percent on a minority class. The y-axis in these figures represents the four performance measures; precision, recall, F-value, and AUC, in order from Fig. 2 (a) to Fig. 2 (d). In these figures, ORG, SMOTE, BORD, and SAFE are the label of the original dataset, SMOTE, Borderline-SMOTE, and Safe-Level-SMOTE, respectively").

Per claim 46, Bunkhumpornpat combined with Han discloses claim 41, Bunkhumpornpat further disclosing the at least one training model is trained on at least one non-synthetic dataset and at least one further synthetic dataset (Bunkhumpornpat: Section 3 and fig. 1... the training involves the over-sampling technique (Safe-Level-SMOTE), which combines the original dataset (non-synthetic data D) with the generated synthetic positive instances (synthetic dataset D'): "Safe-Level-SMOTE algorithm is showed in Fig. 1. All variables in this algorithm are described as follows. p is an instance in the set of all original positive instances D. n is a selected nearest neighbours of p. s included in the set of all synthetic positive instances D' is a synthetic instance... After each iteration of for loop in line 2 finishes, if the first case does not occurs, a synthetic instance s will be generated along the specific-ranged line between p and n, and then s will be added to D'"; Section 4 and fig.
2... the experiments compare the original dataset (ORG label) versus those augmented by synthetic instances (SMOTE, BORD, SAFE); hence the training takes place on a combination of non-synthetic and synthetic data: "The experimental results on the two datasets are illustrated in Fig. 2. The x-axis in these figures represents the over-sampling percent on a minority class. The y-axis in these figures represents the four performance measures; precision, recall, F-value, and AUC, in order from Fig. 2 (a) to Fig. 2 (d). In these figures, ORG, SMOTE, BORD, and SAFE are the label of the original dataset, SMOTE, Borderline-SMOTE, and Safe-Level-SMOTE, respectively").

Per claim 47, Bunkhumpornpat combined with Han discloses claim 41, Bunkhumpornpat further disclosing the at least one dataset includes an identification of each of the plurality of data types in the at least one dataset (Bunkhumpornpat: Section 1... the dataset is defined/identified as being comprised of a "minority class" and a "majority class," each class construed as a data type: "A dataset is considered to be imbalanced if one of target classes has a tiny number of instances comparing to other classes. In this paper, we consider only two-class case [5], [17]. The title of a smaller class is a minority class, and that of a bigger class is a majority class. The minority class includes a few positive instances, and the majority class includes a lot of negative instances"; Section 1 and Table 1... the rows of the confusion matrix identify the Actual Positive and Actual Negative class labels of an instance, and since these class labels distinguish the data types (minority vs. majority), the dataset intrinsically includes an identification of each of the plurality of data types).

Per claim 48, Bunkhumpornpat combined with Han discloses claim 41, Bunkhumpornpat further disclosing verifying an accuracy of the at least one model using at least one verification model (Bunkhumpornpat: Section 4 and fig.
2... the performance measure techniques used to verify the accuracy of the classifiers applied with Safe-Level-SMOTE are construed as verification models that verify the accuracy of the classifiers: "In our experiments, we use four performance measures; precision, recall, F-value, and AUC, for evaluating the performance of three over-sampling techniques; Safe-Level-SMOTE, SMOTE, and Borderline-SMOTE... The performance measures are evaluated through 10-fold cross-validation. Three classifiers; decision trees C4.5 [18], Naïve Bayes [12], and support vector machines (SVMs) [6], are applied as classifiers in the experiments").

Per claim 49, Bunkhumpornpat combined with Han discloses claim 41, Bunkhumpornpat further disclosing the at least one model is a machine learning procedure (Bunkhumpornpat: Section 4... the reference employs several established machine learning algorithms, including decision trees C4.5, Naïve Bayes, and Support Vector Machines (SVMs), as the classifiers (models) in the experiments: "Three classifiers; decision trees C4.5 [18], Naïve Bayes [12], and support vector machines (SVMs) [6], are applied as classifiers in the experiments").

Claims 50-58 are substantially similar in scope and spirit to claims 41-49, respectively. Therefore, the rejections of claims 41-49 are applied accordingly. Safe-Level-SMOTE and associated procedures are intrinsically executed by computer hardware that includes instructions stored in memory.

Claims 59 and 60 are substantially similar in scope and spirit to claims 41 and 49, respectively. Therefore, the rejections of claims 41 and 49 are applied accordingly. Safe-Level-SMOTE and associated procedures are intrinsically executed by computer hardware in combination with the instructions/software, making up the system.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Patents and/or related publications are cited in the Notice of References Cited (Form PTO-892) attached to this action to further show the state of the art with respect to iterative training of machine learning models utilizing synthetic data generation based on classification errors.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALAN CHEN whose telephone number is (571) 272-4143. The examiner can normally be reached M-F 10-7.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ALAN CHEN/
Primary Examiner, Art Unit 2125