Prosecution Insights
Last updated: April 19, 2026
Application No. 17/655,523

MACHINE LEARNING MODEL TRAINING WITH EMPHASIS ON FEATURE IMPORTANCE

Non-Final OA: §101, §103, §112
Filed
Mar 18, 2022
Examiner
ABOU EL SEOUD, MOHAMED
Art Unit
2148
Tech Center
2100 — Computer Architecture & Software
Assignee
International Business Machines Corporation
OA Round
3 (Non-Final)
Grant Probability: 38% (At Risk)
Expected OA Rounds: 3-4
Time to Grant: 4y 2m
Grant Probability With Interview: 77%

Examiner Intelligence

Grants only 38% of cases.
Career Allow Rate: 38% (80 granted / 208 resolved; -16.5% vs TC avg)
Interview Lift: +38.7% (strong lift for resolved cases with interview)
Avg Prosecution: 4y 2m typical timeline; 46 currently pending
Total Applications: 254 across all art units

Statute-Specific Performance

§101: 16.1% (-23.9% vs TC avg)
§103: 48.2% (+8.2% vs TC avg)
§102: 15.1% (-24.9% vs TC avg)
§112: 14.7% (-25.3% vs TC avg)
Tech Center average estimates shown for comparison • Based on career data from 208 resolved cases

Office Action

§101 §103 §112
DETAILED ACTION

This office action is responsive to the request for continued examination filed 1/7/2026. The application contains claims 1-20, all examined and rejected.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/2/2025 has been entered.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 4, 11, and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Claims 4, 11, and 18 recite "the machine learning model"; it is unclear whether this refers to the first or the second machine learning model.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 8-11, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over "Permutation importance: a corrected feature importance measure," published 2010 [hereinafter D1], in view of "Learning to Reweight Examples for Robust Deep Learning," published 2018 [hereinafter D2].

With regard to Claim 1, D1 teaches a method of training a machine learning model, executable by a processor, comprising: identifying a feature associated with training data derived from a dataset (P. 1341, 2, ¶2, "The VI of a feature is computed as the average decrease in model accuracy on the OOB samples when the values of the respective feature are randomly permuted"; features are identified from the dataset by considering individual features as units whose importance is computed; VI is computed per feature by permuting the feature and measuring accuracy; P. 1341, 2, ¶1, "random subset of a fixed size is selected from the features"); generating a first machine learning model based on the training data (P. 1341, 2, ¶1, "T decision trees using the CART methodology (Breiman et al., 1984) are trained on T bootstrap samples of the data"), wherein training the machine learning model comprises: training the machine learning model with the training data to emphasize a feature (P. 1341, 2, ¶1, "the one yielding the maximum decrease in Gini index is chosen for the split"; P. 1346, Sec. 5, ¶1, "We also introduced an improved RF model that is computed based on the most significant features determined with the PIMP algorithm"; P. 1342, 2.4, "by applying for example the classical 0.05 significance threshold. We will call the improved model PIMP-RF. The idea of using the most predictive features for retraining RF model in order to reduce variance and improve accuracy …"; the RF model weights features via split criteria, and permutation importance later identifies those emphasized features); utilizing the first machine learning model to compute permutations to identify the training data so that the computing of the permutations results in identifying the feature being emphasized, wherein identifying the feature comprises: evaluating feature importance within permutations of the training data (P. 1341, 2, ¶2, "The VI of a feature is computed as the average decrease in model accuracy on the OOB samples when the values of the respective feature are randomly permuted"); selecting at least a portion of the training data associated with maximizing an importance value associated with the identified feature (P. 1345, 4.3, ¶1, "The RF trained on the top-ranking 1%, 5% and 10% of the features …"; P. 1346, 5, "improved RF model that is computed based on the most significant features determined with the PIMP algorithm"), wherein the importance value corresponds to a need associated with the machine learning model (P. 1340, Abstract, "improved RF model that uses the significant variables with respect to the PIMP measure and show that its prediction accuracy is superior to that of other existing models"; P. 1341, Col. 1, ¶3, "an improved RF model termed PIMP-RF whose computation is based on the significant features and which incurs clear improvement in prediction accuracy"; P. 1347, Col. 1, ¶1, "corrected RF model based on the PIMP scores of the features and we demonstrated that in most of the cases it is superior in accuracy to the cforest model"); wherein updating the machine learning model comprises utilizing the permutations to build a new second machine learning model with the training data which emphasizes the identified feature (Abstract, "improved RF model that uses the significant variables with respect to the PIMP measure and show that its prediction accuracy is superior to that of other existing models"; P. 1341, Col. 1, ¶3, "an improved RF model termed PIMP-RF whose computation is based on the significant features and which incurs clear improvement in prediction accuracy"; P. 1347, "major drawback of the PIMP method is the requirement of time-consuming permutations of the response vector and subsequent computation of feature importance. However, our simulations showed that already a small number of permutations (e.g. 10) provided improvements over a biased base method. For stability of the results any number from 50 to 100 permutations is recommended").

D1 does not explicitly teach assigning one or more weight values to the selected portion of the training data, wherein the weight values are used to emphasize the importance of the feature within the data; and updating the machine learning model based on the assigned weight values.

D2 teaches a method of training a machine learning model, executable by a processor, comprising: identifying a feature associated with training data derived from a dataset (P. 2, 3.1, "Let (x; y) be an input-target pair, and {(xi; yi); 1 ≤ i ≤ N} be the training set"; P. 3, ¶2, "Let ϕ(x; ϴ) be our neural network model"); generating a first machine learning model based on the training data (P. 3, Col. 1, ¶2, "In standard training, we aim to minimize the expected loss for the training set"), wherein training the machine learning model comprises: training the machine learning model with the training data to emphasize a feature (P. 3, Col. 1, ¶3, "we aim to learn a reweighting of the inputs, where we minimize a weighted loss … since minimizing the negative training loss can usually result in unstable behavior"); selecting at least a portion of the training data associated with maximizing an importance value associated with the identified feature (P. 2, Col. 1, ¶2, "the best example weighting should minimize the loss of a set of unbiased clean validation examples"; P. 3, Col. 1, ¶6, "reweight them according to their similarity to the descent direction of the validation loss surface"; Eq. 8), wherein the importance value corresponds to a need associated with the machine learning model (P. 3, ¶4, "optimal selection of w is based on its validation performance"); assigning one or more weight values to the selected portion of the training data, wherein the weight values are used to emphasize the importance of the feature within the data (P. 1, Col. 2, ¶3, "assigning a weight to each example and minimizing a weighted training loss"; P. 4, Algorithm 1, step 11; P. 3, Col. 1, ¶3, "minimize a weighted loss", eq. (1); P. 3, Col. 2, "rectify the output to get a non-negative weighting", eqs. (7)-(8); P. 3, "normalizing the weights of all examples in a training batch so that they sum up to one"); and updating the machine learning model based on the assigned weight values (P. 3, update equations [reproduced as images in the original action]; P. 3, "normalizing the weights of all examples in a training batch so that they sum up to one"; P. 4, Algorithm 1, Steps 12-14, training update form; the weights change which examples contribute through the weighted objective), wherein updating the machine learning model comprises building a new second machine learning model with the training data which emphasizes the identified feature (P. 4, Algorithm 1, Step 14, optimizer step).

D1 and D2 are analogous art to the claimed invention because they are from a similar field of endeavor of training machine learning models. Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify D1 with the teachings of D2 with a reasonable expectation of success. One of ordinary skill in the art would be motivated to modify D1 as described above to include the ability to reweight examples to improve validation performance using a meta-learning algorithm that learns to assign weights to training examples based on their gradient directions, minimizing the loss on a clean unbiased validation set and achieving impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available (D2, Abstract).

With regard to Claim 2, D1-D2 teach the method of claim 1, further comprising partitioning the dataset into the training data and testing data (D1, P. 1341, 2.1, "after each tree has been grown, the inputs that did not participate in the training bootstrap sample are used as test set"; "The VI of a feature is computed as the average decrease in model accuracy on the OOB samples when the values of the respective feature are randomly permuted"; D2, Algorithm 1, Steps 2-3; P. 2, Col. 1, Col. 2, ¶4, "in order to learn general forms of training set biases, it is necessary to have a small unbiased validation to guide training"; P. 6, "Clean validation set"; Col. 1-2, "Hyper-validation set For monitoring training progress and tuning baseline hyperparameters, we split out another 5,000 hyper-validation set from the 50,000 training images"; P. 2, Col. 2, ¶2, "This is reasonable since we are optimizing on the validation set, which is strictly a subset of the full training set, and therefore suffers from its own subsample bias"). The same motivation to combine for claim 1 equally applies to the current claim.

With regard to Claim 3, D1-D2 teach the method of claim 2, further comprising testing the updated machine learning model based on the testing data (D1, P. 1341, 2.1, "after each tree has been grown, the inputs that did not participate in the training bootstrap sample are used as test set, then averaging over all trees gives the test error estimate."; "The VI of a feature is computed as the average decrease in model accuracy on the OOB samples when the values of the respective feature are randomly permuted"; D2, P. 3, "We can then look for the optimal that minimizes the validation loss fv locally at step t:"; Col. 1, Online approximation, "each training iteration, we inspect the descent direction of some training examples locally on the training loss surface and reweight them according to their similarity to the descent direction of the validation loss surface"). The same motivation to combine for claim 1 equally applies to the current claim.

With regard to Claim 4, D1-D2 teach the method of claim 1, further comprising determining an accuracy value associated with the machine learning model (D1, P. 1341, 2.1, "The VI of a feature is computed as the average decrease in model accuracy on the OOB samples when the values of the respective feature are randomly permuted"; Col. 1-2, "Hyper-validation set For monitoring training progress and tuning baseline hyperparameters"). The same motivation to combine for claim 1 equally applies to the current claim.
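As context for the measure D1 is cited for, the permutation-importance procedure (permute one feature's values and record the drop in model accuracy) can be sketched in a few lines. This is an illustrative sketch of the general technique using a toy rule-based model on synthetic data, not code from D1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the label depends only on feature 0; feature 1 is pure noise.
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)

def accuracy(model, X, y):
    return float(np.mean(model(X) == y))

# Stand-in for a trained model: predict from the sign of feature 0.
model = lambda X: (X[:, 0] > 0).astype(int)

baseline = accuracy(model, X, y)
importances = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the feature-target link
    importances.append(baseline - accuracy(model, X_perm, y))
# Permuting feature 0 causes a large accuracy drop; the noise feature causes none.
```

On this toy setup the signal feature shows an importance near 0.5 and the noise feature shows zero, which is the ranking behavior the cited passages describe.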
With regard to Claim 8, Claim 8 is similar in scope to claim 1; therefore it is rejected under similar rationale. D1-D2 further teach one or more computer-readable non-transitory storage media configured to store computer program code; and one or more computer processors configured to access said computer program code and operate as instructed by said computer program code (D1, P. 1342, 2.4, "(i) training a classical RF model on the training data; (ii) computing the PIMP scores of the covariates; and (iii) training a new model with the classical RF but now using only the significant variables"; P. 1343, 4, "4.1 Simulations"; D2, P. 3, Col. 1, "For most training of deep neural networks"; P. 4, Col. 1, "implemented using popular deep learning frameworks such as TensorFlow"; Algorithm 1; the algorithm must be stored in a memory and executed by a processor; "Training time Our automatic reweighting method will introduce a constant factor of overhead").

With regard to Claim 15, Claim 15 is similar in scope to claim 1; therefore it is rejected under similar rationale. D1-D2 further teach a non-transitory computer readable medium having stored thereon a computer program for training a machine learning model, the computer program configured to cause one or more computer processors (D1, P. 1342, 2.4, "(i) training a classical RF model on the training data; (ii) computing the PIMP scores of the covariates; and (iii) training a new model with the classical RF but now using only the significant variables"; P. 1343, 4, "4.1 Simulations"; D2, P. 3, Col. 1, "For most training of deep neural networks"; P. 4, Col. 1, "implemented using popular deep learning frameworks such as TensorFlow"; Algorithm 1; the algorithm must be stored in a memory and executed by a processor; "Training time Our automatic reweighting method will introduce a constant factor of overhead").

With regard to Claim 9, Claim 9 is similar in scope to claim 2; therefore it is rejected under similar rationale.
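The example-reweighting step the action attributes to D2 above (rectify raw weights to be non-negative, then normalize them over the batch so they sum to one before forming the weighted loss) can be sketched as follows. The raw scores here are random stand-ins for the gradient-similarity values D2 actually computes; this is a sketch of the weighting mechanics, not D2's full meta-learning loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-example losses for one training batch (toy values).
losses = rng.uniform(0.1, 2.0, size=8)

# Raw per-example scores; in D2 these come from similarity of each
# example's gradient to the validation descent direction. Random here.
raw = rng.normal(size=8)

# Rectify to a non-negative weighting, then normalize over the batch
# so the weights sum to one (guarding against the all-zero case).
w = np.maximum(raw, 0.0)
w = w / w.sum() if w.sum() > 0 else np.full_like(w, 1.0 / w.size)

weighted_loss = float(np.sum(w * losses))  # the objective actually minimized
```

The normalization keeps the effective learning rate stable across batches, which is the reason D2 is quoted for making the weights "sum up to one."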
With regard to Claim 16, Claim 16 is similar in scope to claim 2; therefore it is rejected under similar rationale.

With regard to Claim 10, Claim 10 is similar in scope to claim 3; therefore it is rejected under similar rationale.

With regard to Claim 17, Claim 17 is similar in scope to claim 3; therefore it is rejected under similar rationale.

With regard to Claim 11, Claim 11 is similar in scope to claim 4; therefore it is rejected under similar rationale.

With regard to Claim 18, Claim 18 is similar in scope to claim 4; therefore it is rejected under similar rationale.

Claims 5-6, 12-13, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over "Permutation importance: a corrected feature importance measure," published 2010 [hereinafter D1], in view of "Learning to Reweight Examples for Robust Deep Learning," published 2018 [hereinafter D2], further in view of "Wrappers for feature subset selection," published 1997 [hereinafter D3].

With regard to Claim 5, D1-D2 teach the method of claim 1. The same motivation to combine for claim 1 equally applies to the current claim. D1-D2 do not explicitly teach that a portion of the training data is selected based on the accuracy value remaining above a threshold value. D3 teaches that a portion of the training data is selected (D3, P. 44, "Aha and Bankert [2] used the wrapper for identifying feature subsets") based on the accuracy value (P. 37, "accuracy is a natural performance metric, but one can trivially use a cost function instead of accuracy as the evaluation function for the wrapper") remaining above a threshold value (P. 21, 3.3, "An improved node is defined as a node with an accuracy estimation at least E higher than the best one found so far"; P. 37, "cross-validation estimated its accuracy to be lower than the node with 97.22% test-set accuracy"). D1-D2 and D3 are analogous art to the claimed invention because they are from a similar field of endeavor of selecting the best features for training machine learning models.
Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify D1-D2 with the teachings of D3 with a reasonable expectation of success. One of ordinary skill in the art would be motivated to modify D1-D2 as described above to maximize classification accuracy on an unseen test set by guiding the feature subset selection: instead of trying to maximize accuracy directly, identify which features are relevant and use only those features during learning, which increases accuracy and saves resources (D3, P. 2, ¶3).

With regard to Claim 6, D1-D2-D3 teach the method of claim 5, further comprising stopping the machine learning model from updating based on the accuracy value falling below the threshold value (D3, P. 21, "if we have not found an improved node in the last k expansions, we terminate the search. An improved node is defined as a node with an accuracy estimation at least E higher than the best one found so far"; the wrapper method iteratively updates the model by retraining on modified training portions and ceases updating when no candidate portion yields accuracy exceeding the current threshold). The same motivation to combine for claim 5 equally applies to the current claim.

With regard to Claim 12, Claim 12 is similar in scope to claim 5; therefore it is rejected under similar rationale.

With regard to Claim 19, Claim 19 is similar in scope to claim 5; therefore it is rejected under similar rationale.

With regard to Claim 13, Claim 13 is similar in scope to claim 6; therefore it is rejected under similar rationale.

With regard to Claim 20, Claim 20 is similar in scope to claim 6; therefore it is rejected under similar rationale.

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over "Permutation importance: a corrected feature importance measure," published 2010 [hereinafter D1], in view of "Learning to Reweight Examples for Robust Deep Learning," published 2018 [hereinafter D2], further in view of "Four Principles of Explainable Artificial Intelligence," published 2021 [hereinafter D4].

With regard to Claim 7, D1-D2 teach the method of claim 1. D1-D2 do not explicitly teach that the identified feature comprises one or more from among fidelity, completeness, stability, certainty, compactness, comprehensibility, actionability, interactivity, translucence, coherence, novelty, and personalization associated with the machine learning model. D4 teaches that the identified feature comprises one or more from among fidelity (P. 11, ¶7, "the paper introduces faithfulness of an explanation as ... broadly beneficial for society provided that explanations given are faithful, in the sense that they accurately convey a true understanding without hiding important details"), completeness, stability, certainty, compactness, comprehensibility, actionability, interactivity, translucence, coherence, novelty, and personalization associated with the machine learning model. D1-D2 and D4 are analogous art to the claimed invention because they are from a similar field of endeavor of improving machine learning model training via evaluation and selection mechanisms. Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify D1-D2 with the teachings of D4 with a reasonable expectation of success.
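The stopping behavior cited from D3 for claims 5 and 6 above (keep growing the feature subset while some candidate improves estimated accuracy by at least a margin E; stop otherwise) can be sketched as a greedy forward search. This is a generic sketch of wrapper-style selection under an assumed evaluation function, not D3's exact search procedure:

```python
import numpy as np

def forward_select(X, y, eval_acc, eps=0.01):
    """Greedy wrapper: add features while some candidate improves
    estimated accuracy by at least eps; stop when none does."""
    selected, best_acc = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # Score each candidate feature added to the current subset.
        scores = {j: eval_acc(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] < best_acc + eps:
            break  # no candidate clears the improvement threshold
        selected.append(j_best)
        remaining.remove(j_best)
        best_acc = scores[j_best]
    return selected, best_acc

# Toy demo: only feature 0 carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

def best_column_accuracy(Xs, y):
    # Toy evaluator: accuracy of the best single-column sign rule.
    return max(float(np.mean((Xs[:, k] > 0).astype(int) == y))
               for k in range(Xs.shape[1]))

selected, acc = forward_select(X, y, best_column_accuracy)
```

On this demo the search keeps the signal feature and terminates as soon as no remaining candidate adds at least eps of accuracy, which mirrors the "improved node" / termination rule quoted from D3.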
One of ordinary skill in the art would be motivated to modify D1-D2 as described above to evaluate the identified features according to known explainability quality metrics such as fidelity; characteristics that support system trustworthiness include accuracy, privacy, reliability, robustness, safety, security (resilience), mitigation of harmful bias, transparency, fairness, and accountability (D4, ii, Executive Summary, ¶1).

With regard to Claim 14, Claim 14 is similar in scope to claim 7; therefore it is rejected under similar rationale.

Response to Amendment

Applicant's arguments (see Remarks, P. 13-15, filed 12/2/2025) with respect to how the current invention represents several improvements to the technology have been fully considered and are persuasive. The rejection under 35 USC 101 of claims 1-20 has been withdrawn.

Applicant's arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion

The prior art made of record and not relied upon is considered pertinent to the applicant's disclosure. US Patent Application Publication No. 2021/0390458 A1, filed by Blumstein et al., teaches determining feature importance using permutation importance analysis (see at least ¶96, "Determining the 'feature importance' of various features may involve permutation importance analysis").

The examiner has pointed out particular references contained in the prior art of record in the body of this action for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well.
It is respectfully requested from the applicant, in preparing the response, to consider fully the entire references as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner. Any citation to specific pages, columns, figures, or lines in the prior art references, and any interpretation of the references, should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMED ABOU EL SEOUD, whose telephone number is (303) 297-4285. The examiner can normally be reached Monday-Thursday, 9:00am-6:00pm MT.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Michelle Bechtold, can be reached at (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MOHAMED ABOU EL SEOUD/
Primary Examiner, Art Unit 2148

Prosecution Timeline

Mar 18, 2022
Application Filed
May 23, 2025
Non-Final Rejection — §101, §103, §112
Jul 20, 2025
Interview Requested
Aug 12, 2025
Examiner Interview Summary
Aug 12, 2025
Applicant Interview (Telephonic)
Aug 20, 2025
Response Filed
Oct 27, 2025
Final Rejection — §101, §103, §112
Oct 29, 2025
Interview Requested
Dec 02, 2025
Response after Non-Final Action
Jan 07, 2026
Request for Continued Examination
Jan 24, 2026
Response after Non-Final Action
Feb 19, 2026
Non-Final Rejection — §101, §103, §112
Apr 16, 2026
Interview Requested

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602602
SYSTEMS AND METHODS FOR VALIDATING FORECASTING MACHINE LEARNING MODELS
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12578719
PREDICTION OF REMAINING USEFUL LIFE OF AN ASSET USING CONFORMAL MATHEMATICAL FILTERING
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12561565
MODEL DEPLOYMENT AND OPTIMIZATION BASED ON MODEL SIMILARITY MEASUREMENTS
Granted Feb 24, 2026 (2y 5m to grant)
Patent 12461702
METHODS AND SYSTEMS FOR PROPAGATING USER INPUTS TO DIFFERENT DISPLAYS
Granted Nov 04, 2025 (2y 5m to grant)
Patent 12405722
USER INTERFACE DEVICE FOR INDUSTRIAL VEHICLE
Granted Sep 02, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 38%
With Interview: 77% (+38.7%)
Median Time to Grant: 4y 2m
PTA Risk: High
Based on 208 resolved cases by this examiner. Grant probability derived from career allow rate.
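The "With Interview" figure follows from adding the interview lift to the base grant probability; a minimal check of the arithmetic, under the assumption that the tool combines the two numbers additively in percentage points and rounds:

```python
# Base grant probability (career allow rate) and interview lift,
# both in percentage points, as reported above.
base_grant_pct = 38.0
interview_lift_pct = 38.7

# Assumption: the dashboard combines these additively and rounds.
with_interview_pct = round(base_grant_pct + interview_lift_pct)
# 38.0 + 38.7 = 76.7, which rounds to 77
```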
