Prosecution Insights
Last updated: April 19, 2026
Application No. 18/166,696

DIRECTIONAL DRIVERS OF DEEP LEARNING MODELS BASED ON MODEL GRADIENTS

Non-Final OA (§101, §103)
Filed: Feb 09, 2023
Examiner: WU, NICHOLAS S
Art Unit: 2148
Tech Center: 2100 (Computer Architecture & Software)
Assignee: The Bank of New York Mellon
OA Round: 1 (Non-Final)

Grant Probability: 47% (Moderate)
OA Rounds: 1-2
To Grant: 3y 9m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 47% (grants 47% of resolved cases; 18 granted / 38 resolved; -7.6% vs TC avg)
Interview Lift: +43.1% on resolved cases with interview (strong lift)
Typical Timeline: 3y 9m avg prosecution; 44 currently pending
Career History: 82 total applications across all art units

Statute-Specific Performance

§101: 26.7% (-13.3% vs TC avg)
§103: 52.6% (+12.6% vs TC avg)
§102: 3.1% (-36.9% vs TC avg)
§112: 17.4% (-22.6% vs TC avg)
Comparisons are against the Tech Center average estimate; based on career data from 38 resolved cases.

Office Action

Rejections: §101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, in Step 1 of the 101 analysis set forth in MPEP 2106, the claim recites "A system, comprising: a processor." A system comprising a processor is interpreted as an apparatus, and an apparatus is one of the four statutory categories of invention.

In Step 2A, Prong 1 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following limitations recite a process that, under broadest reasonable interpretation, covers a mental process or mathematical concept but for the recitation of generic computer components:

for each feature from among the plurality of features: obtain a gradient that represents a rate of change of the model function based on the feature; (i.e., the broadest reasonable interpretation includes a mathematical gradient calculation of a feature; a mathematical calculation is considered a mathematical concept (MPEP 2106));

and for each group of features from among the one or more groups of features: determine a directional driver based on the aggregated gradients, the directional driver indicating an impact of the group of features on the model output (i.e., the broadest reasonable interpretation includes a step of evaluation and judgement that could be performed mentally or with pen and paper, such as determining that a gradient negatively impacts a prediction, which is a mental process of evaluation/judgement (MPEP 2106)).

These claim limitations, under their broadest reasonable interpretation, cover activities classified under Mental processes: concepts performed in the human mind (including observation, evaluation, judgement, or opinion) (see MPEP 2106.04(a)(2), subsection (III)), or Mathematical concepts: mathematical relationships, mathematical formulas or equations, or mathematical calculations (see MPEP 2106.04(a)(2), subsection (I)). Accordingly, the claim recites an abstract idea.

In Step 2A, Prong 2 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application:

A system, comprising: a processor programmed to: (i.e., the generic computer components recited in this limitation merely add the words "apply it", or an equivalent, or mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea (MPEP 2106.05(f)));

access a plurality of features and a group definition that specifies at least one or more groups of features; (i.e., the broadest reasonable interpretation of accessing features and definitions is mere data gathering, which is an insignificant extra-solution activity (MPEP 2106.05(g)));

provide the plurality of features as input to a deep learning model trained to generate a model output based on a model function and the plurality of features, wherein the deep learning model, when executed, generates the model output; (i.e., the generic computer components recited in this limitation merely add the words "apply it", or an equivalent, or mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea (MPEP 2106.05(f)));

aggregate, based on the one or more groups of features, the gradients obtained from the deep learning model; (i.e., the broadest reasonable interpretation of collecting calculated gradients is mere data gathering, which is an insignificant extra-solution activity (MPEP 2106.05(g))).

Since the claim does not contain any other additional elements that amount to integration into a practical application, the claim is directed to an abstract idea.

In Step 2B of the 101 analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Limitations (IV) and (VI), under the broadest reasonable interpretation, recite steps of mere data gathering, which has been recognized by the courts as a well-understood, routine, and conventional function. Specifically, the courts have recognized computer functions directed to mere data gathering as well-understood, routine, and conventional when they are claimed in a merely generic manner or as insignificant extra-solution activity, considering the evidence in view of Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018); see the USPTO Berkheimer Memorandum (April 2018).

The examiner uses Berkheimer Option 2, a citation to one or more of the court decisions discussed in MPEP 2106.05(d)(II) as noting the well-understood, routine, and conventional nature of the additional elements: receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Automotive, LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network). See MPEP 2106.05(d)(II).

Further, limitation (III), under the broadest reasonable interpretation, merely recites steps that apply generic computer components to perform the judicial exception, which merely adds the words "apply it", or an equivalent, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). Similarly, limitation (V), under the broadest reasonable interpretation, merely recites steps that apply a generic machine learning model to output a result, which represents merely adding the words "apply it", or an equivalent, and is not indicative of an inventive concept (MPEP 2106.05(f)).

Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

Regarding claim 2, it is dependent upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception.
For example, claim 2 recites wherein the gradients are stored in a data object in three dimensions of batch size, input time steps corresponding to a time period, and the plurality of features. Under the broadest reasonable interpretation, the limitations recite storing data, a step of mere data gathering, which has been recognized by the courts as a well-understood, routine, and conventional function. Specifically, the courts have recognized computer functions directed to mere data gathering as well-understood, routine, and conventional when they are claimed in a merely generic manner or as insignificant extra-solution activity (MPEP 2106.05(g)). Therefore, claim 2 does not resolve the deficiencies of claim 1.

Regarding claim 3, it is dependent upon claim 2 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. For example, claim 3 recites collapse the data object storing the gradients from three dimensions into a two dimensional array shaped by the batch size and the plurality of features. Under the broadest reasonable interpretation, the limitations recite reducing a matrix's dimensions, which is interpreted as a mathematical calculation. A mathematical calculation is interpreted as a mathematical concept. Therefore, claim 3 does not resolve the deficiencies of claim 2.

Regarding claim 4, it is dependent upon claim 3 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. For example, claim 4 recites wherein to collapse the data object, the processor is further programmed to: average the gradients across the time dimension. Under the broadest reasonable interpretation, the limitations recite averaging values together, which is interpreted as a mathematical calculation. A mathematical calculation is interpreted as a mathematical concept. Therefore, claim 4 does not resolve the deficiencies of claim 3.

Regarding claim 5, it is dependent upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. For example, claim 5 recites wherein the group definition specifies a hierarchical grouping of features comprising: a first level having one or more variable groups each comprising a plurality of variables; a second level having each variable from among the plurality of variables, each variable comprising a group of features; and a third level comprising the features. Under the broadest reasonable interpretation, the limitations recite collecting hierarchical level assignments, steps of mere data gathering, which has been recognized by the courts as a well-understood, routine, and conventional function. Specifically, the courts have recognized computer functions directed to mere data gathering as well-understood, routine, and conventional when they are claimed in a merely generic manner or as insignificant extra-solution activity (MPEP 2106.05(g)). Therefore, claim 5 does not resolve the deficiencies of claim 1.

Regarding claim 6, it is dependent upon claim 5 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception.
For example, claim 6 recites wherein to aggregate, based on the one or more groups of features, the gradients obtained from the deep learning model, the processor is further programmed to: for each variable from among the plurality of variables, aggregate the gradients of the group of features pertaining to the variable; and for each variable group, aggregate the aggregate gradients of the plurality of variables pertaining to the variable group, wherein each directional driver indicates an impact of each variable group on the model output. Under the broadest reasonable interpretation, the limitations recite collecting calculated gradients from different hierarchical levels, steps of mere data gathering, which has been recognized by the courts as a well-understood, routine, and conventional function. Specifically, the courts have recognized computer functions directed to mere data gathering as well-understood, routine, and conventional when they are claimed in a merely generic manner or as insignificant extra-solution activity (MPEP 2106.05(g)). Therefore, claim 6 does not resolve the deficiencies of claim 5.

Regarding claim 7, it is dependent upon claim 6 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. For example, claim 7 recites generate an output report based on the directional drivers for the variable groups, the output report visually showing an impact of each variable group on the model output. Under the broadest reasonable interpretation, the limitations recite outputting a report, a step of mere data outputting, which has been recognized by the courts as a well-understood, routine, and conventional function. Specifically, the courts have recognized computer functions directed to mere data outputting as well-understood, routine, and conventional when they are claimed in a merely generic manner or as insignificant extra-solution activity (MPEP 2106.05(g)). Therefore, claim 7 does not resolve the deficiencies of claim 6.

Regarding claim 8, it is dependent upon claim 7 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. For example, claim 8 recites store the directional drivers along with historical directional drivers over time, wherein the output report includes the directional drivers and historical directional drivers. Under the broadest reasonable interpretation, the limitations recite collecting historical data, a step of mere data gathering, which has been recognized by the courts as a well-understood, routine, and conventional function. Specifically, the courts have recognized computer functions directed to mere data gathering as well-understood, routine, and conventional when they are claimed in a merely generic manner or as insignificant extra-solution activity (MPEP 2106.05(g)). Therefore, claim 8 does not resolve the deficiencies of claim 7.

Regarding claim 9, it is dependent upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. For example, claim 9 recites wherein each directional driver comprises a positive value or a negative value, and wherein the processor is further programmed to: determine, for each directional driver, whether the impact is positive or negative based on the positive value or the negative value. Under the broadest reasonable interpretation, the limitations recite determining that a gradient negatively affects a model's prediction based on the gradient having a negative value, which is a step of evaluation and judgement that can be performed mentally or with pen and paper. Steps of evaluation and judgement are mental processes. Therefore, claim 9 does not resolve the deficiencies of claim 1.

Regarding claim 10, it is dependent upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception. For example, claim 10 recites determine, for each directional driver, a magnitude of the impact based on a value of the directional driver. Under the broadest reasonable interpretation, the limitations recite determining that a gradient has a larger effect on a model's prediction based on the gradient's value, which is a step of evaluation and judgement that can be performed mentally or with pen and paper. Steps of evaluation and judgement are mental processes. Therefore, claim 10 does not resolve the deficiencies of claim 1.

Regarding claim 11, in Step 1 of the 101 analysis set forth in MPEP 2106, the claim recites "A method." A method is one of the four statutory categories of invention. For the Step 2A/2B analyses, since claim 11 is similar to claim 1, it is rejected under the same rationales as claim 1. The additional limitation below fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception:

..., by a processor, ... (i.e., the generic computer components recited in this limitation merely add the words "apply it", or an equivalent, or mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea (MPEP 2106.05(f))).
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

Regarding claims 12-19, they are similar to claims 2-10 and are rejected under the same rationales.

Regarding claim 20, in Step 1 of the 101 analysis set forth in MPEP 2106, the claim recites "A non-transitory storage medium," which is interpreted as an article of manufacture. An article of manufacture is one of the four statutory categories of invention. For the Step 2A/2B analyses, since claim 20 is similar to claim 1, it is rejected under the same rationales as claim 1. The additional limitation below fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or introducing significantly more than the judicial exception:

A non-transitory storage medium storing instructions that, when executed by a processor, programs the processor to: ... (i.e., the generic computer components recited in this limitation merely add the words "apply it", or an equivalent, or mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea (MPEP 2106.05(f))).

Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-7, 9-11, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Miller et al., US Pre-Grant Publication 2025/0045439 A1 ("Miller"), in view of Sikdar et al., non-patent literature "Integrated Directional Gradients: Feature Interaction Attribution for Neural NLP Models" ("Sikdar").

Regarding claim 1, Miller discloses:

A system, comprising: a processor programmed to: (Miller, ¶94, "The computing device 400 can include a processor 402 that is communicatively coupled to a memory 404 [A system, comprising: a processor programmed to:].").

access a plurality of features and a group definition that specifies at least one or more groups of features; (Miller, ¶48, "One or more families [and a group definition that specifies at least one or more groups of features;] of time-series transforms can be applied to the time-series data [access a plurality of features] for a predictor variable 124 to generate transformed time-series data instances 334.").
provide the plurality of features as input to a deep learning model trained to generate a model output based on a model function and the plurality of features, wherein the deep learning model, when executed, generates the model output; (Miller, ¶48, "Each of the transformed time-series data instances 334 can be fed into one input node of the input layer 340 [provide the plurality of features as input to a deep learning model trained to generate a model output based on a model function and the plurality of features,]. Input nodes taking data instances for one family of transformations can be connected to one hidden node in the first hidden layer of the risk prediction model 120 … Hidden nodes in the first hidden layer 350A are connected to the nodes in the second hidden layer 350B, which are further connected to the output layer 360 [wherein the deep learning model, when executed, generates the model output;].").

for each feature from among the plurality of features: obtain a gradient that represents a rate of change of the model function based on the feature; (Miller, ¶49, "The explanatory data can indicate relationships between the time-series data instances [for each feature from among the plurality of features:] of the predictor variable and the output risk indicator or between the transformed time-series data instances and the output risk indicator … The explanatory data can be calculated using a points-below-max algorithm or an integrated gradients algorithm [obtain a gradient that represents a rate of change of the model function based on the feature;].").

aggregate, based on the one or more groups of features, the gradients obtained from the deep learning model; (Miller, ¶85, "Alternatively, or additionally, an integrated gradients algorithm may be used to generate the explanatory data. The integrated gradients algorithm can involve a reference point consisting of an alternative set of input variable values (X′, Y′), which produce an alternative score F(X′, Y′). The reference point can be chosen so that the score is above an acceptance threshold. Integrated gradients expresses the score difference F(X′, Y′)−F(X, Y) as a sum of contributions from each of the input variables in X′ and Y′ by evaluating an integral of the derivative of F over a path from (X, Y) to (X′, Y′) [aggregate, based on the one or more groups of features, the gradients obtained from the deep learning model;].").

and for each group of features from among the one or more groups of features: determine a directional driver based on the aggregated gradients, the directional driver indicating an impact of the group of features on the model output. (Miller, ¶85, "Alternatively, or additionally, an integrated gradients algorithm may be used to generate the explanatory data [and for each group of features from among the one or more groups of features: determine a directional driver based on the aggregated gradients,]. The integrated gradients algorithm can involve a reference point consisting of an alternative set of input variable values (X′, Y′), which produce an alternative score F(X′, Y′).", and Miller, ¶49, "The explanatory data can indicate relationships between the time-series data instances of the predictor variable and the output risk indicator or between the transformed time-series data instances and the output risk indicator [the directional driver indicating an impact of the group of features on the model output.].").

While Miller teaches a time-based machine learning system that determines which features impact model performance, Miller does not explicitly teach: directional driver.

Sikdar teaches a directional driver (Sikdar, pg. 868, col. 2, "Thus we propose to use absolute value of IDG, which is the path integral of the directional gradient over the straight line path from the baseline b to the input x as the dividend of the feature group. Further, the sign of IDG may be used to signify the nature of contribution (positive or negative) to model output [directional driver].").

Miller and Sikdar are in the same field of endeavor (i.e., model explainability). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Miller and Sikdar to teach the above limitation(s). The motivation for doing so is that knowing the positive or negative impact of a feature improves the performance of the model (cf. Sikdar, pg. 868, col. 2, "the sign of IDG may be used to signify the nature of contribution (positive or negative) to model output.").

Regarding claim 5, Miller in view of Sikdar teaches the system of claim 1. Miller further teaches wherein the group definition specifies a hierarchical grouping of features comprising: a first level having one or more variable groups each comprising a plurality of variables; a second level having each variable from among the plurality of variables, each variable comprising a group of features; and a third level comprising the features. (Miller, ¶48 and Figure 3, "One or more families of time-series transforms [a second level having each variable from among the plurality of variables, each variable comprising a group of features;] can be applied to the time-series data for a predictor variable 124 [wherein the group definition specifies a hierarchical grouping of features comprising: a first level having one or more variable groups each comprising a plurality of variables;] to generate transformed time-series data instances 334 [and a third level comprising the features.].").

Regarding claim 6, Miller in view of Sikdar teaches the system of claim 5.
Sikdar also teaches the directional driver as seen in claim 1. Miller further teaches : wherein to aggregate, based on the one or more groups of features, the gradients obtained from the deep learning model, the processor is further programmed to: for each variable from among the plurality of variables, aggregate the gradients of the group of features pertaining to the variable; and for each variable group, aggregate the aggregate gradients of the plurality of variables pertaining to the variable group, (Miller, ⁋86, “ Integrated gradients may be applied to a model with correlated input variables, including a model with multiple compound time-series transformations constructed as linear combinations of individual transformations [ wherein to aggregate, based on the one or more groups of features, the gradients obtained from the deep learning model, the processor is further programmed to: for each variable from among the plurality of variables, aggregate the gradients of the group of features pertaining to the variable; ] . Treating each of the compound transformations as an input variable in its own right, the integrated gradients algorithm can be applied to express the score difference as a sum of contributions from each of the input variables, including the compound time-series transformations [ and for each variable group, aggregate the aggregate gradients of the plurality of variables pertaining to the variable group, ] . ”). wherein each directional driver indicates an impact of each variable group on the model output. ( Miller, ⁋85, “ Alternatively, or additionally, an integrated gradients algorithm may be used to generate the explanatory data [ wherein each directional driver ] . The integrated gradients algorithm can involve a reference point consisting of an alternative set of input variable values (X′, Y′), which produce an alternative score F(X′, Y′). 
”, and Miller, ⁋49, “ The explanatory data can indicate relationships between the time-series data instances of the predictor variable and the output risk indicator or between the transformed time-series data instances and the output risk indicator [ indicates an impact of each variable group on the model output. ] . ”). Regarding claim 7 , Miller in view of Sikdar teaches the system of claim 6 . Sikdar also teaches the directional driver as seen in claim 1. Miller further teaches : wherein the processor is further programmed to: generate an output report based on the directional drivers for the variable groups, (Miller, ⁋49, “ The explanatory data may indicate an impact a predictor variable has or a group of predictor variables have on the value of the risk indicator, such as credit score (e.g., the relative impact of the predictor variable(s) on a risk indicator) [ generate an output report based on the directional drivers for the variable groups, ] . ”). the output report visually showing an impact of each variable group on the model output. (Miller, ⁋49, “ The explanatory data may indicate an impact a predictor variable has or a group of predictor variables have on the value of the risk indicator, such as credit score (e.g., the relative impact of the predictor variable(s) on a risk indicator) [ the output report … showing an impact of each variable group on the model output. ] . ”, and Miller, ⁋51, “ As discussed above with regard to FIG. 1 , the risk assessment computing system 130 can communicate with client computing systems 104, which may send risk assessment queries to the risk assessment server 118 to request risk assessment. ”, and Miller, ⁋100, “ Another example of an output device is the presentation device 412 depicted in FIG. 4 . 
A presentation device 412 can include any device or group of devices suitable for providing visual [ visually ] , auditory, or other suitable sensory output … In some aspects, the presentation device 412 can include a remote client-computing device ”). Regarding claim 9 , Miller in view of Sikdar teaches the system of claim 1 . Sikdar further teaches wherein each directional driver comprises a positive value or a negative value, and wherein the processor is further programmed to: determine, for each directional driver, whether the impact is positive or negative based on the positive value or the negative value. ( Sikdar , pg. 868 col. 2, “ Thus we propose to use absolute value of IDG, which is the path integral of the directional gradient over the straight line path from the baseline b to the input x as the dividend of the feature group. Further, the sign of IDG may be used to signify the nature of contribution (positive or negative) [ wherein each directional driver comprises a positive value or a negative value, and wherein the processor is further programmed to: ] to model output [ determine, for each directional driver, whether the impact is positive or negative based on the positive value or the negative value. ] . ”). It would have been obvious to one of ordinary skill in the art before the effective filling date of the present application to combine the teachings of Sikdar with the teachings of Miller for the same reasons disclosed in claim 1 . Regarding claim 10 , Miller in view of Sikdar teaches the system of claim 1 . Sikdar further teaches wherein the processor is further programmed to: determine, for each directional driver, a magnitude of the impact based on a value of the directional driver. ( Sikdar , pg. 868 col. 
2, “ The dividend of a group of features is distinct from its value and is the measure of the importance of the interaction of the features in the group … Thus we propose to use absolute value of IDG, which is the path integral of the directional gradient over the straight line path from the baseline b to the input x as the dividend of the feature group [ wherein the processor is further programmed to: determine, for each directional driver, a magnitude of the impact based on a value of the directional driver. ] . ”). It would have been obvious to one of ordinary skill in the art before the effective filling date of the present application to combine the teachings of Sikdar with the teachings of Miller for the same reasons disclosed in claim 1 . Regarding claim 11 , the claim is similar to claim 1 and rejected under the same rationales. Miller further teaches the additional limitations of by a processor (Miller, ⁋94, “The computing device 400 can include a processor 402 that is communicatively coupled to a memory 404 [ by a processor ].”). Regarding claims 15-19 , the claims are similar to claims 5-7 and 9-10 and are rejected under the same rationales. Regarding claim 20 , the claim is similar to claim 1 and is rejected under the same rationales. Miller further teaches the additional limitations of A non-transitory storage medium storing instructions that, when executed by a processor, programs the processor to: (Miller, 5, “ a non-transitory computer-readable storage medium having program code that is executable by a processor device to cause a computing device to perform operations [ A non-transitory storage medium storing instructions that, when executed by a processor, programs the processor to: ]”). Claim (s) 2 and 12 are rejected under 35 U.S.C. 
103 as being unpatentable over Miller, et al., US Pre-Grant Publication 2025/0045439A1 (“Miller”) in view of Sikdar, et al., Non-Patent Literature “Integrated Directional Gradients: Feature Interaction Attribution for Neural NLP Models” (“Sikdar”) and further in view of Cerliani, Non-Patent Literature “Feature Importance with Time Series and Recurrent Neural Network” (“Cerliani”).

Regarding claim 2, Miller in view of Sikdar teaches the system of claim 1. While the combination teaches a time-based machine learning system that determines which features positively or negatively impact model performance, the combination does not explicitly teach: wherein the gradients are stored in a data object in three dimensions of batch size, input time steps corresponding to a time period, and the plurality of features. Cerliani teaches wherein the gradients are stored in a data object in three dimensions of batch size, input time steps corresponding to a time period, and the plurality of features. (Cerliani, pg. 4, “To train our sequential neural network we rearrange the data properly as 3D sequences of dimension: sample x time dimension x features (i.e. wherein the gradients are stored in a data object in three dimensions of batch size, input time steps corresponding to a time period, and the plurality of features.).”). Miller, in view of Sikdar, and Cerliani are all in the same field of endeavor (i.e. feature importance). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Miller, in view of Sikdar, and Cerliani to teach the above limitation(s). The motivation for doing so is that considering time steps, batch size, and features aids in explaining time-constrained models (cf. Cerliani, pg. 3, “In this post, I investigate the decision taken by a neural network trained to forecast the future. I used a recurrent structure to automatically learn information also from the time dimension. With some simple steps, we extract all we needed to understand the output of our model.”).

Regarding claim 12, the claim is similar to claim 2 and rejected under the same rationales.

Claims 3-4 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Miller, et al., US Pre-Grant Publication 2025/0045439A1 (“Miller”) in view of Sikdar, et al., Non-Patent Literature “Integrated Directional Gradients: Feature Interaction Attribution for Neural NLP Models” (“Sikdar”) and further in view of Cerliani, Non-Patent Literature “Feature Importance with Time Series and Recurrent Neural Network” (“Cerliani”) and Brownlee, “A Gentle Introduction to Mini-Batch Gradient Descent and How to Configure Batch Size” (“Brownlee”).

Regarding claim 3, Miller in view of Sikdar and Cerliani teaches the system of claim 2. Cerliani further teaches wherein the processor is further programmed to: collapse the data object storing the gradients from three dimensions into a two dimensional array shaped by the batch size and the plurality of features. (Cerliani, pg. 7-8, “Our idea is to check the contribution of each single input feature on the final prediction output. The contribution in our case is given by the value of the gradients obtained from the differentiation operation of the input sequences on the forecasts. With Tensorflow, the implementation of this method is only 3 steps: use the GradientTape object to capture the gradients on the input; get the gradients with tape.gradient: this operation produces gradients of the same shape of the single input sequence (time dimension x features) [ collapse the data object storing the gradients from three dimensions into a two dimensional array shaped by the batch size and the plurality of features. ]; obtain the impact of each sequence feature as average over the time dimension.”).
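The three-step Cerliani procedure quoted above (capture gradients, read them out via tape.gradient in the shape time dimension x features, then average over time) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the array of dummy random values stands in for gradients already captured from a trained model (e.g. with TensorFlow's tf.GradientTape), and all variable names are illustrative.

```python
import numpy as np

# Illustrative stand-in for gradients captured from a trained model
# (e.g. via TensorFlow's tf.GradientTape / tape.gradient). The 3-D data
# object is shaped (batch_size, time_steps, n_features), matching the
# "sample x time dimension x features" layout described by Cerliani.
rng = np.random.default_rng(seed=0)
grads_3d = rng.normal(size=(32, 10, 4))

# Collapse the 3-D object into a 2-D array shaped by the batch size and
# the plurality of features by averaging across the time dimension.
grads_2d = grads_3d.mean(axis=1)               # shape: (32, 4)

# One impact score per feature: average magnitude over the batch.
feature_impact = np.abs(grads_2d).mean(axis=0)  # shape: (4,)
```

The time-dimension average in the second step is the collapse operation at issue in claims 3-4; the resulting array is two dimensional, shaped by the batch size and the number of features.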
Miller, Sikdar, and Cerliani are all in the same field of endeavor (i.e. feature importance). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Miller, Sikdar, and Cerliani to teach the above limitation(s). The motivation for doing so is that collapsing gradients along the time dimension provides the impact of each feature on the model (cf. Cerliani, pg. 8, “obtain the impact of each sequence feature as average over the time dimension.”). While Cerliani teaches collapsing a gradient to a 2D matrix, the combination does not explicitly teach … shaped by the batch size …. Brownlee teaches … shaped by the batch size … (Brownlee, pg. 4, “Mini-batch gradient descent is a variation of the gradient descent algorithm that splits the training dataset into small batches that are used to calculate model error and update model coefficients [ … shaped by the batch size … ].”). Miller, in view of Sikdar and Cerliani, and Brownlee are all in the same field of endeavor (i.e. machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Miller, in view of Sikdar and Cerliani, and Brownlee to teach the above limitation(s). The motivation for doing so is that including batch size in model update gradients can affect the number of update steps seen, thus improving visibility into model performance (cf. Brownlee, pg. 4, “The model update frequency is higher than batch gradient descent which allows for a more robust convergence, avoiding local minima.”).

Regarding claim 4, Miller in view of Sikdar, Cerliani, and Brownlee teaches the system of claim 3. Cerliani further teaches wherein to collapse the data object, the processor is further programmed to: average the gradients across the time dimension. (Cerliani, pg. 7-8, “Our idea is to check the contribution of each single input feature on the final prediction output. The contribution in our case is given by the value of the gradients [ gradients ] obtained from the differentiation operation of the input sequences on the forecasts. With Tensorflow, the implementation of this method is only 3 steps: use the GradientTape object to capture the gradients on the input; get the gradients with tape.gradient: this operation produces gradients of the same shape of the single input sequence (time dimension x features) [ wherein to collapse the data object ]; obtain the impact of each sequence feature as average over the time dimension [ the processor is further programmed to: average the gradients across the time dimension. ].”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Cerliani with the teachings of Miller, Sikdar, and Brownlee for the same reasons disclosed in claim 3.

Regarding claims 13-14, the claims are similar to claims 3-4 and rejected under the same rationales.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Miller, et al., US Pre-Grant Publication 2025/0045439A1 (“Miller”) in view of Sikdar, et al., Non-Patent Literature “Integrated Directional Gradients: Feature Interaction Attribution for Neural NLP Models” (“Sikdar”) and further in view of Alam, et al., US Pre-Grant Publication 2024/0176889A1 (“Alam”).

Regarding claim 8, Miller in view of Sikdar teaches the system of claim 7. Sikdar also teaches the directional driver as seen in claim 1.
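For reference, the Sikdar quantity relied on throughout (the IDG: a path integral of the directional gradient along the straight line from a baseline b to the input x, whose sign indicates positive or negative impact and whose absolute value gives the magnitude) can be approximated numerically. This is a minimal sketch on a toy differentiable function, not Sikdar's implementation; the function, names, and step count are all illustrative.

```python
import numpy as np

def toy_grad(z):
    # Gradient of the toy model function f(z) = z[0]**2 - 3*z[1].
    return np.array([2.0 * z[0], -3.0])

def integrated_directional_gradient(grad_fn, b, x, steps=1000):
    """Midpoint-rule approximation of the path integral of the
    directional gradient along the straight line from b to x."""
    direction = x - b
    alphas = (np.arange(steps) + 0.5) / steps
    total = sum(grad_fn(b + a * direction) @ direction for a in alphas)
    return total / steps

b = np.zeros(2)                  # baseline
x = np.array([2.0, 1.0])         # input
idg = integrated_directional_gradient(toy_grad, b, x)  # ≈ f(x) - f(b) = 1.0

# Sign of IDG -> direction of the impact; absolute value -> its magnitude.
impact = "positive" if idg > 0 else "negative"
magnitude = abs(idg)
```

For a gradient field along a straight-line path, the integral equals f(x) - f(b), which is why the sign cleanly separates contributions that push the model output up from those that push it down.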
While the combination teaches a time-based machine learning system that determines which features positively or negatively impact model performance, the combination does not explicitly teach: wherein the processor is further programmed to: store the directional drivers along with historical directional drivers over time, wherein the output report includes the directional drivers and historical directional drivers. Alam teaches: store the directional drivers (Alam, ⁋19, “The aggregation can include a linear combination or a non-linear combination of the determined impacts for an attribute at the different evaluation time windows. The overall impacts of the attributes can then be ordered and be used to select a list of attributes that mostly impact the risk indicator change from Rs to Re. This list of attributes can be included in the assessment results and sent to a remote computing device as a response to the request for the risk assessment [ store the directional drivers ].”). along with historical directional drivers over time, (Alam, ⁋20, “As an illustrative example, the assessment results may be used to improve the risk indicator (e.g., reduce the risk indicator value) associated with the entity at future time [ along with historical directional drivers over time, ]. For example, if the impact of an attribute is negative (i.e., the attribute increases the value of risk indicator during the time period Ts to Te), the entity may perform certain actions to modify the value of the attribute so that the predicted risk indicator decreases in the future”). wherein the output report includes the directional drivers and historical directional drivers. (Alam, ⁋20, “The generated assessment results [ wherein the output report includes the directional drivers ] can be utilized in various applications to improve the operations of the corresponding systems. As an illustrative example, the assessment results may be used to improve the risk indicator (e.g., reduce the risk indicator value) associated with the entity at future time [ and historical directional drivers. ].”). Miller, in view of Sikdar, and Alam are all in the same field of endeavor (i.e. feature importance). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Miller, in view of Sikdar, and Alam to teach the above limitation(s). The motivation for doing so is that storing historical feature importance aids in making future determinations (cf. Alam, ⁋20, “the assessment results may be used to improve the risk indicator (e.g., reduce the risk indicator value) associated with the entity at future time. For example, if the impact of an attribute is negative (i.e., the attribute increases the value of risk indicator during the time period Ts to Te), the entity may perform certain actions to modify the value of the attribute so that the predicted risk indicator decreases in the future.”).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS S WU whose telephone number is (571)270-0939. The examiner can normally be reached Monday - Friday 8:00 am - 4:00 pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold, can be reached at 571-431-0762.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /N.S.W./ Examiner, Art Unit 2148 /MICHELLE T BECHTOLD/ Supervisory Patent Examiner, Art Unit 2148

Prosecution Timeline

Feb 09, 2023
Application Filed
Dec 09, 2025
Non-Final Rejection — §101, §103
Mar 31, 2026
Applicant Interview (Telephonic)
Mar 31, 2026
Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12488244
APPARATUS AND METHOD FOR DATA GENERATION FOR USER ENGAGEMENT
2y 5m to grant Granted Dec 02, 2025
Patent 12423576
METHOD AND APPARATUS FOR UPDATING PARAMETER OF MULTI-TASK MODEL, AND STORAGE MEDIUM
2y 5m to grant Granted Sep 23, 2025
Patent 12361280
METHOD AND DEVICE FOR TRAINING A MACHINE LEARNING ROUTINE FOR CONTROLLING A TECHNICAL SYSTEM
2y 5m to grant Granted Jul 15, 2025
Patent 12354017
ALIGNING KNOWLEDGE GRAPHS USING SUBGRAPH TYPING
2y 5m to grant Granted Jul 08, 2025
Patent 12333425
HYBRID GRAPH NEURAL NETWORK
2y 5m to grant Granted Jun 17, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
47%
Grant Probability
90%
With Interview (+43.1%)
3y 9m
Median Time to Grant
Low
PTA Risk
Based on 38 resolved cases by this examiner. Grant probability derived from career allow rate.
