Prosecution Insights
Last updated: April 19, 2026
Application No. 18/309,755

METHOD FOR OUTLIER ROBUST SUBGROUP INFERENCE VIA CLUSTERING IN THE GRADIENT SPACE

Non-Final OA — §101, §103
Filed
Apr 28, 2023
Examiner
HOANG, MICHAEL H
Art Unit
2122
Tech Center
2100 — Computer Architecture & Software
Assignee
Massachusetts Institute of Technology
OA Round
1 (Non-Final)
Grant Probability: 52% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 4y 1m
With Interview: 77%

Examiner Intelligence

Career Allow Rate: 52% (70 granted / 136 resolved; -3.5% vs TC avg)
Interview Lift: strong, +25.9% among resolved cases with interview
Typical Timeline: 4y 1m avg prosecution; 26 currently pending
Career History: 162 total applications across all art units

Statute-Specific Performance

§101: 30.3% allow rate (-9.7% vs TC avg)
§103: 45.3% allow rate (+5.3% vs TC avg)
§102: 9.1% allow rate (-30.9% vs TC avg)
§112: 12.3% allow rate (-27.7% vs TC avg)
Deltas are vs. Tech Center average estimates • Based on career data from 136 resolved cases

Office Action

§101, §103
DETAILED ACTION

This action is in response to the claims filed 04/28/2023 for Application No. 18/309,755. Claims 1-20 are currently pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 04/28/2023 and 01/21/2026 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1:

Step 1 Analysis: Claim 1 is directed to a process, which falls within one of the four statutory categories.

Step 2A Prong 1 Analysis: Claim 1 recites, in part, the limitation of: “for each data point in the classification dataset, using gradient space partitioning (GraSP) to identify a gradient representation of each data point by extracting an associated gradient of a logistic regression classification loss with respect to weights of a logistic regression.” This limitation, as drafted, is a process that, under broadest reasonable interpretation, covers the recitation of mathematical calculations, which falls within the “Mathematical concepts” grouping of abstract ideas.
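For technical context, the gradient-extraction limitation quoted above reduces to a short calculation: the per-sample gradient of the logistic-regression log loss with respect to the weights. A minimal sketch of that computation (the function name, toy data, and shapes are illustrative only, not taken from the application or the cited art):

```python
import numpy as np

def per_sample_gradients(X, y, w):
    """Per-sample gradient of the logistic-regression log loss w.r.t. the
    weights w: grad_i = (sigmoid(w . x_i) - y_i) * x_i."""
    p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities, shape (n,)
    return (p - y)[:, None] * X        # shape (n, d): one gradient row per point

# toy data: two points in 2-D with binary labels
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([0.0, 1.0])
w = np.zeros(2)                         # sigmoid(0) = 0.5 for every point
G = per_sample_gradients(X, y, w)       # rows: (0.5 - y_i) * x_i
```

Each row of G is the "gradient representation" of one data point; the claim then clusters these rows.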
The limitation of: “clustering the gradient representations to provide estimated subgroup labels” can be considered an evaluation performed in the human mind. This limitation, as drafted, is a process that, under broadest reasonable interpretation, covers performance of the limitation in the mind, which falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional element “a machine learning model.” This element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Please see MPEP 2106.05(f). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.

The claim further recites: “receiving a classification dataset wherein subgroups are unlabeled” and “outputting cluster assignments as the estimated subgroup labels.” These limitations are mere data gathering and outputting steps and thus are insignificant extra-solution activities. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim as a whole is directed to an abstract idea.

Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of utilizing a machine learning model to perform the steps of the claimed process amounts to no more than mere instructions to apply the exception using a generic computer component.
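In implementation terms, the clustering limitation characterized above is ordinary unsupervised clustering over the gradient rows. A minimal sketch using a tiny Lloyd-style k-means with farthest-point initialization (all names and the toy data are illustrative, not the applicant's algorithm):

```python
import numpy as np

def kmeans_labels(G, k=2, iters=20):
    """Tiny Lloyd-style k-means over gradient representations G (n, d).
    Farthest-point initialization keeps this toy example deterministic."""
    centers = [G[0]]
    for _ in range(k - 1):
        # distance of each point to its nearest chosen center so far
        dist = np.min([np.linalg.norm(G - c, axis=1) for c in centers], axis=0)
        centers.append(G[dist.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d = np.linalg.norm(G[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)              # nearest-center assignment
        for j in range(k):
            if np.any(labels == j):
                centers[j] = G[labels == j].mean(axis=0)
    return labels

# two well-separated blobs of "gradients" -> two subgroup labels
G = np.vstack([np.zeros((5, 2)), 10.0 + np.zeros((5, 2))])
labels = kmeans_labels(G, k=2)
```

The returned cluster assignments play the role of the "estimated subgroup labels" recited in the claim.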
Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Furthermore, the limitations of “receiving a classification dataset wherein subgroups are unlabeled” and “outputting cluster assignments as the estimated subgroup labels” are mere data gathering and outputting steps that are well-understood, routine, and conventional, as evidenced by MPEP §2106.05(d)(II)(i), “receiving or transmitting data over a network.” These limitations therefore remain insignificant extra-solution activity even upon reconsideration and do not amount to significantly more. Even when considered in combination, these additional elements amount to mere instructions to apply the exception using generic computer components and insignificant extra-solution activity, which cannot provide an inventive concept. The claim is not patent eligible.

Regarding claim 2: The rejection of claim 1 is incorporated, and further, the claim recites: “using an outlier-robust clustering algorithm to perform the clustering of the gradient representations.” This limitation amounts to mere instructions to apply the judicial exception using a generic computer component. Please see MPEP 2106.05(f). The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Regarding claim 3: The rejection of claim 1 is incorporated, and further, the claim recites: “wherein classes are labeled in the classification dataset.” This limitation amounts to generally linking the judicial exception to a field of use. Please see MPEP 2106.05(h). The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception.
The claim is not patent eligible.

Regarding claim 4: The rejection of claim 1 is incorporated, and further, the claim recites: “learning group annotations and identifying outliers of the classification dataset.” This limitation amounts to additional mental steps in addition to the judicial exception recited in the rejection of claim 1. The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Regarding claim 5: The rejection of claim 1 is incorporated, and further, the claim recites: “training a robust classifier using the estimated subgroup labels.” This limitation amounts to mere instructions to apply the judicial exception using a generic computer component. Please see MPEP 2106.05(f). The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Regarding claim 6: The rejection of claim 5 is incorporated, and further, the claim recites: “applying distributionally robust optimization (DRO) to train the robust classifier.” This limitation amounts to additional mathematical concepts in addition to the judicial exception identified in the rejection of claim 1. The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Regarding claim 7: The rejection of claim 1 is incorporated, and further, the claim recites: “in response to receiving the classification dataset, applying a non-robust neural network classifier, wherein a last layer representation of the non-robust neural network classifier is extracted as dimension-reduced features.”
This limitation amounts to mere instructions to apply the judicial exception using a generic computer component. Please see MPEP 2106.05(f). The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Regarding claim 8: The rejection of claim 5 is incorporated, and further, the claim recites: “wherein the gradient space partitioning is performed on the last-layer representation.” This limitation amounts to additional mathematical concepts in addition to the judicial exception identified in the rejection of claim 1. The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Regarding claim 9: It is substantially similar to a combination of claims 1, 2, 4, and 5, and is rejected in the same manner, with the same art and reasoning applying.

Regarding claims 10-13: They are substantially similar to claims 3 and 6-8, respectively, and are rejected in the same manner, with the same art and reasoning applying.

Regarding claim 14: It is substantially similar to claim 1 and is rejected in the same manner, with the same art and reasoning applying. Claim 14 additionally requires analysis for “A non-transitory computer readable storage medium tangibly embodying a computer readable program code having computer readable instructions that, when executed, causes a computer device to carry out a method of …”; however, this is an additional element that amounts to mere instructions to apply the judicial exception using a generic computer component. Please see MPEP 2106.05(f).
Regarding claims 15-20: They are substantially similar to claims 3, 4, and 6-8, respectively, and are rejected in the same manner, with the same art and reasoning applying.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3, 5-8, 14, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sohoni et al.
("No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems" (2022), hereinafter "Sohoni") and further in view of Mu et al. ("GRADIENTS AS FEATURES FOR DEEP REPRESENTATION LEARNING", hereinafter "Mu").

Regarding claim 1: Sohoni teaches a computer-implemented method for identifying relevant subgroups in a training dataset associated with a machine learning model, comprising: receiving a classification dataset wherein subgroups are unlabeled (“We propose George, a method to both measure and mitigate hidden stratification even when subclass labels are unknown.” [Abstract]); clustering the [gradient] representations to provide estimated subgroup labels (“Our approach relies on estimating unknown subclass labels by clustering a feature representation of the data.” [pg. 3, top para; note: Sohoni teaches clustering a representation.]); and outputting cluster assignments as the estimated subgroup labels (“To obtain a surrogate for this feature space, we leverage the empirical observation that feature representations of deep neural networks trained on a superclass task can carry information about unlabeled subclasses [41]. Next, to improve performance on these estimated subclasses, we minimize the maximum per-cluster average loss, by using the clusters as groups in the GDRO objective [48].” [pg. 5, §4, ¶1]).
However, Sohoni fails to explicitly teach: for each data point in the classification dataset, using gradient space partitioning (GraSP) to identify a gradient representation of each data point by extracting an associated gradient of a logistic regression classification loss with respect to weights of a logistic regression. Mu teaches this limitation (“These features are gradients of the model parameters with respect to a task-specific loss given an input sample” [Abstract] … “With trivial modifications, our method can easily extend beyond ConvNets and classification, e.g., for a recurrent network as the backbone and/or for a regression task.” [pg. 3, §3, ¶1; see also Eq. (1); Mu teaches that features are gradients]).

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Sohoni’s teachings by using a gradient representation of each data point as taught by Mu. One would have been motivated to make this modification because Mu notes that, with trivial modifications, the method could extend beyond ConvNets and classification and work for regression tasks. [pg. 3, §3, ¶1, Mu]

Regarding claim 3: Sohoni/Mu teaches the computer-implemented method of claim 1. Sohoni teaches wherein classes are labeled in the classification dataset (“In real datasets, individual datapoints are typically described by multiple different attributes, yet often only a subset of these are captured by the class labels. For example, a dataset might consist of images labeled ‘cat’ or ‘dog.’” [pg. 4, §3.1, ¶1]).

Regarding claim 5: Sohoni/Mu teaches the computer-implemented method of claim 1. Sohoni teaches further comprising training a robust classifier using the estimated subgroup labels.
(“Formally, we train a deep neural network L◦fθ to predict the superclass labels, where fθ : X → Rd is a parametrized ‘featurizer’ and L : Rd → ∆B … outputs classification logits. We then cluster the features output by fθ for the data of each superclass into k clusters, where k is chosen automatically” [pg. 5-6, §4.1, ¶1])

Regarding claim 6: Sohoni/Mu teaches the computer-implemented method of claim 5. Sohoni teaches further comprising applying distributionally robust optimization (DRO) to train the robust classifier (“We then exploit these estimated subclasses by training a new model to optimize worst-case performance over all estimated subclasses using group distributionally robust optimization (GDRO)” [pg. 2, ¶2]).

Regarding claim 7: Sohoni/Mu teaches the computer-implemented method of claim 1. Sohoni teaches further comprising, in response to receiving the classification dataset, applying a non-robust neural network classifier, wherein a last layer representation of the non-robust neural network classifier is extracted as dimension-reduced features (“The inputs are the datapoints and superclass labels. First, a model is trained with ERM on the superclass classification task. The activations of the penultimate layer are then dimensionality reduced, and clustering is applied to the resulting features to obtain estimated subclasses. Finally, a new model is trained using these clusters as groups for GDRO” [pg. 5, Figure 4 caption]).

Regarding claim 8: Sohoni/Mu teaches the computer-implemented method of claim 7. Sohoni teaches wherein the gradient space partitioning is performed on the last-layer representation (“The inputs are the datapoints and superclass labels. First, a model is trained with ERM on the superclass classification task. The activations of the penultimate layer are then dimensionality reduced, and clustering is applied to the resulting features to obtain estimated subclasses.
Finally, a new model is trained using these clusters as groups for GDRO” [pg. 5, Figure 4 caption; note: although Sohoni teaches performing dimensionality reduction on the last layer, the reference does not teach gradient space partitioning; however, as noted in claim 1, Mu teaches this feature, and thus when combined with Sohoni it would teach the limitation as recited.]). The same motivation to combine the teachings of Sohoni/Mu applies as in claim 1.

Claim 14 recites features similar to claim 1 and is rejected for at least the same reasons therein. Claim 14 additionally requires “A non-transitory computer readable storage medium tangibly embodying a computer readable program code having computer readable instructions that, when executed, causes a computer device to carry out a method of …” (“However, we train on 4 GPUs instead of 1” [pg. 22, §CelebA, ¶1]).

Regarding claims 17-20: They are substantially similar to claims 5, 6, 7, and 8, respectively, and are rejected in the same manner, with the same art and reasoning applying.

Claims 2, 4, 9-13, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Sohoni in view of Mu and further in view of Zhai et al. ("DORO: Distributional and Outlier Robust Optimization", hereinafter "Zhai").
Regarding claim 2: Sohoni/Mu teaches the computer-implemented method of claim 1 but fails to explicitly teach further comprising using an outlier-robust clustering algorithm to perform the clustering of the gradient representations. Zhai teaches this limitation (“To resolve this issue, we propose the framework of DORO, for Distributional and Outlier Robust Optimization. At the core of this approach is a refined risk function which prevents DRO from overfitting to potential outliers.” [Abstract]). It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Sohoni’s/Mu’s teachings by implementing the outlier-robust algorithm of Zhai. One would have been motivated to make this modification because it prevents DRO from overfitting to potential outliers. [Abstract, Zhai]

Regarding claim 4: Sohoni/Mu teaches the computer-implemented method of claim 1 but fails to explicitly teach further comprising learning group annotations and identifying outliers of the classification dataset. Zhai teaches learning group annotations (“For example, in an algorithmic fairness task, domains are demographic groups defined by a number of protected features such as race and sex.” [pg. 2, §2.1, ¶1]) and identifying outliers of the classification dataset (“After some examination, we pinpoint one direct cause of this phenomenon: the vulnerability of DRO to outliers that widely exist in modern datasets.” [pg. 4, §3, ¶1]). The same motivation to combine the teachings of Sohoni/Mu/Zhai applies as in claim 2.
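For context on the claim 2 and claim 4 limitations: DORO, as cited from Zhai, is a trimming-based training objective rather than a clustering routine, but the same trimming idea can be grafted onto cluster assignment. A sketch of one such outlier-robust assignment step (illustrative only; this is neither Zhai's code nor the applicant's method):

```python
import numpy as np

def trimmed_assignments(G, centers, trim_frac=0.2):
    """Assign each point to its nearest center, then flag the trim_frac of
    points farthest from any center as outliers (label -1), mirroring the
    trimming idea behind outlier-robust objectives like DORO."""
    d = np.linalg.norm(G[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)            # nearest-center assignment
    dist = d.min(axis=1)                 # distance to that center
    cutoff = np.quantile(dist, 1.0 - trim_frac)
    labels[dist > cutoff] = -1           # outliers receive no subgroup label
    return labels

centers = np.array([[0.0, 0.0], [10.0, 10.0]])
G = np.array([[0.1, 0.0], [0.0, 0.2], [10.0, 9.9], [9.8, 10.0], [50.0, 50.0]])
labels = trimmed_assignments(G, centers)   # the far point is trimmed
```

Points labeled -1 are treated as outliers rather than forced into a subgroup, which is the behavior the claim 4 limitation describes.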
Regarding claim 9: Sohoni teaches a computer-implemented method for identifying relevant subgroups, in a presence of outliers, for training a classifier to be robust to the identified subgroups, comprising: receiving a classification dataset wherein subgroups are unlabeled (“We propose George, a method to both measure and mitigate hidden stratification even when subclass labels are unknown.” [Abstract]); clustering the [gradient] representations to estimate subgroup labels (“Our approach relies on estimating unknown subclass labels by clustering a feature representation of the data.” [pg. 3, top para; note: Mu teaches taking gradient representations of the input, thus when combined with Sohoni it would teach the recited limitation.]); outputting cluster assignments as the estimated subgroup labels (“To obtain a surrogate for this feature space, we leverage the empirical observation that feature representations of deep neural networks trained on a superclass task can carry information about unlabeled subclasses [41]. Next, to improve performance on these estimated subclasses, we minimize the maximum per-cluster average loss, by using the clusters as groups in the GDRO objective [48].” [pg. 5, §4, ¶1]); and training a robust classifier using the estimated subgroup labels (“Formally, we train a deep neural network L◦fθ to predict the superclass labels, where fθ : X → Rd is a parametrized ‘featurizer’ and L : Rd → ∆B … outputs classification logits. We then cluster the features output by fθ for the data of each superclass into k clusters, where k is chosen automatically” [pg. 5-6, §4.1, ¶1]).
However, Sohoni fails to explicitly teach: for each data point in the classification dataset, using gradient space partitioning (GraSP) to identify a gradient representation of each data point by extracting an associated gradient of a logistic regression classification loss with respect to weights of a logistic regression. Mu teaches this limitation (“These features are gradients of the model parameters with respect to a task-specific loss given an input sample” [Abstract] … “With trivial modifications, our method can easily extend beyond ConvNets and classification, e.g., for a recurrent network as the backbone and/or for a regression task.” [pg. 3, §3, ¶1; see also Eq. (1)]). It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Sohoni’s teachings by using a gradient representation of each data point as taught by Mu. One would have been motivated to make this modification because Mu notes that, with trivial modifications, the method could extend beyond ConvNets and classification and work for regression tasks. [pg. 3, §3, ¶1, Mu]

However, Sohoni/Mu fails to explicitly teach: wherein the GraSP further learns group annotations and identifies outliers; and wherein clustering further comprises using an outlier-robust clustering algorithm to cluster the gradient representations. Zhai teaches wherein the GraSP further learns group annotations (“For example, in an algorithmic fairness task, domains are demographic groups defined by a number of protected features such as race and sex.” [pg. 2, §2.1, ¶1]) and identifies outliers (“After some examination, we pinpoint one direct cause of this phenomenon: the vulnerability of DRO to outliers that widely exist in modern datasets.
” [pg. 4, §3, ¶1]); wherein clustering further comprises using an outlier-robust clustering algorithm to cluster the gradient representations (“To resolve this issue, we propose the framework of DORO, for Distributional and Outlier Robust Optimization. At the core of this approach is a refined risk function which prevents DRO from overfitting to potential outliers.” [Abstract]). It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Sohoni’s/Mu’s teachings by implementing the outlier-robust algorithm of Zhai. One would have been motivated to make this modification because it prevents DRO from overfitting to potential outliers. [Abstract, Zhai]

Regarding claims 10-13: They are substantially similar to claims 3 and 6-8, respectively, and are rejected in the same manner, with the same art and reasoning applying.

Regarding claim 15: It is substantially similar to claim 2 and is rejected in the same manner, with the same art and reasoning applying.

Regarding claim 16: It is substantially similar to claim 4 and is rejected in the same manner, with the same art and reasoning applying.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG, whose telephone number is (571) 272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at
(571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHAEL H HOANG/
PRIMARY EXAMINER, Art Unit 2122
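For technical context on the §103 combination as a whole: the GDRO objective the examiner repeatedly cites from Sohoni minimizes the maximum per-cluster average loss. A minimal illustrative sketch of that objective (the losses and group labels below are made up for the example; this is not code from any cited reference):

```python
import numpy as np

def worst_group_loss(losses, group_labels):
    """Group-DRO style objective: the maximum over groups of the
    per-group average loss."""
    groups = np.unique(group_labels)
    return max(losses[group_labels == g].mean() for g in groups)

losses = np.array([0.1, 0.2, 0.9, 1.1])   # per-example losses (illustrative)
groups = np.array([0, 0, 1, 1])           # estimated subgroup labels
objective = worst_group_loss(losses, groups)   # group 1 averages 1.0
```

Training against this objective, rather than the plain average loss, is what makes the resulting classifier "robust" to the worst-performing estimated subgroup.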

Prosecution Timeline

Apr 28, 2023
Application Filed
Mar 20, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12518156
Training a Neural Network using Graph-Based Temporal Classification
2y 5m to grant • Granted Jan 06, 2026
Patent 12468934
SYSTEMS AND METHODS FOR GENERATING DYNAMIC CONVERSATIONAL RESPONSES USING DEEP CONDITIONAL LEARNING
2y 5m to grant • Granted Nov 11, 2025
Patent 12456115
METHODS, ARCHITECTURES AND SYSTEMS FOR PROGRAM DEFINED SYSTEMS
2y 5m to grant • Granted Oct 28, 2025
Patent 12437211
System and Method for Predicting Fine-Grained Adversarial Multi-Agent Motion
2y 5m to grant • Granted Oct 07, 2025
Patent 12430543
Structured Sparsity Guided Training In An Artificial Neural Network
2y 5m to grant • Granted Sep 30, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 52%
With Interview: 77% (+25.9%)
Median Time to Grant: 4y 1m
PTA Risk: Low
Based on 136 resolved cases by this examiner. Grant probability derived from career allow rate.
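The projection figures are simple arithmetic on the examiner's counts; a quick sanity check (the dashboard's 52% display appears to round 70/136 ≈ 51.5% up):

```python
# Counts from the examiner profile above
allow_rate = 70 / 136                 # ~0.515, displayed as 52%
with_interview = allow_rate + 0.259   # +25.9-point lift, ~0.774 -> 77%
```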
