Last updated: May 29, 2026

Application No. 18/102,388

REINFORCEMENT LEARNING SYSTEM FOR MAINTENANCE DECISION MAKING

Non-Final OA §103 §112

Filed

Jan 27, 2023

Examiner

SCHNEE, HAL W

Art Unit

2129

Tech Center

2100 — Computer Architecture & Software

Assignee

Hitachi, Ltd.

OA Round

2 (Non-Final)

Interview Optional

Examiner has a relatively high allowance rate (84%) and a +22.3% interview lift. A written response may suffice.

Based on 600 resolved cases, 2023–2026

Examiner Intelligence

SCHNEE, HAL W

Career Allowance Rate: 84% (above average)

507 granted / 600 resolved

+29.5% vs TC avg

Interview Lift: +22.3% (strong), comparing resolved cases with and without an interview

Typical timeline: 2y 9m average prosecution; 14 currently pending

Career history: 614 total applications across all art units

Statute-Specific Performance

§101: 5.7% (-34.3% vs TC avg)

§103: 58.9% (+18.9% vs TC avg)

§102: 4.1% (-35.9% vs TC avg)

§112: 27.9% (-12.1% vs TC avg)

TC avg = Tech Center average estimate. Based on career data from 600 resolved cases.

Office Action

§103 §112

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending in this application. Claims 1, 3, 7, 9, 11, 15, and 17 are amended and claims 19-20 are new by applicant’s amendment filed 21 January 2026.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2-3, 10-11, and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding Claims 2 and 10, they appear redundant to the amended limitations of claims 1 and 9, respectively. Claims 2 and 10 recite “wherein the generating the next action as the model outputs further comprises generating a confidence score of the decision maker model and explanation information as part of the model outputs.” However, the amendments to claims 1 and 9 include more detailed generation of a confidence score and explanation information.
Regarding Claims 3 and 11, they recite “the decision maker training engine” and “the database.” These terms lack antecedent basis, so they are indefinite.
Regarding Claim 18, it is repeated in the claim listing filed 21 January 2026.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-9, and 12-18 are rejected under 35 U.S.C. 103 as being unpatentable over Khorasgani, Hamed, et al. (“An offline deep reinforcement learning for maintenance decision-making,” arXiv preprint arXiv:2109.15050 (2021); hereinafter “Khorasgani”) in view of Moradian et al. (U.S. 2023/0185268; hereinafter “Moradian”), and further in view of Botari, Tiago, et al. (“MeLIME: Meaningful local explanation for machine learning models,” arXiv preprint arXiv:2009.05818 (2020); hereinafter “Botari”).
Regarding Claim 1, Khorasgani teaches an offline reinforcement learning method for predictive maintenance of equipment (Abstract), the method comprising:
receiving an expected future return value as input to a decision maker model, wherein the decision maker model is a machine learning model that predicts maintenance action associated with the equipment (section 2, equations 3 and 4—the expected reward R is an input to the decision maker model. The “Total expected reward” portion of section 2 explains that the total expected reward used as input is within the vicinity of the maximum episodic return, so the input to the model comprises an expected future return value);
feeding recent observations and recent actions from an environment as inputs to the decision maker model (section 2, equations 1, 2, and 4—observations O and actions A are fed into the model);
generating a next action as model outputs of the decision maker model, wherein the next action is the predicted maintenance action associated with the equipment (section 2, equation 4—next action âk is generated by the model); and
executing the next action in the environment (section 2 and fig. 1—“The idea here is to solve maintenance decision-making using on offline supervised RL approach,” indicating that the generated action is executed in the environment to maintain a machine. See also the applications in section 3.6).
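The claim 1 elements mapped above (an expected future return as input, a window of recent observations and actions, a generated next action, and execution of that action) can be illustrated with a minimal Python sketch. It is not Khorasgani's model or the applicant's: the linear per-action scorer, the four-action vocabulary `ACTIONS`, the toy `execute` step, and every name below are hypothetical assumptions chosen only to show how the claimed inputs are combined.

```python
from typing import Sequence

# Hypothetical maintenance actions (an assumption, not taken from the record).
ACTIONS = ("no_action", "inspect", "repair", "replace")


def next_action(
    target_return: float,
    recent_obs: Sequence[float],
    recent_actions: Sequence[int],
    weights: Sequence[Sequence[float]],
) -> int:
    """Return the index of the next action from a return-conditioned scorer.

    Inputs are concatenated as [target_return, *recent_obs, *recent_actions];
    one linear score per action stands in for a learned decision maker model.
    """
    features = [target_return, *recent_obs, *(float(a) for a in recent_actions)]
    scores = []
    for row in weights:
        if len(row) != len(features):
            raise ValueError("each weight row must match the feature length")
        scores.append(sum(w * x for w, x in zip(row, features)))
    # Greedy choice: the highest-scoring action wins (the first one on ties).
    return max(range(len(scores)), key=scores.__getitem__)


def execute(health: float, action: int) -> float:
    """Toy environment step: replace restores health to 1.0, repair adds 0.3,
    and any other action lets the equipment degrade by 0.1."""
    name = ACTIONS[action]
    if name == "replace":
        return 1.0
    if name == "repair":
        return min(1.0, health + 0.3)
    return max(0.0, health - 0.1)
```

For example, with two past steps and a weight row that scores `replace` by the sum of two degradation readings, high readings such as [0.9, 0.8] select `replace`, while low readings fall back to `no_action`, whose score is the target return itself.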
Khorasgani does not specifically teach generating a confidence score indicating reliability of the next action, and explanation information comprising feature-wise explanation identifying which input features are most responsible for the next action and instance-wise explanation showing which training samples most influenced the model output for generating the next action.
However, Moradian teaches generating a confidence score indicating reliability of output of a predictive model (¶ [0085] – [0086]—a confidence score is generated that indicates a level of confidence in the model outputs).
These claimed elements were known in Khorasgani and Moradian and could have been combined by known methods with no change in their respective functions. It therefore would have been obvious to a person of ordinary skill in the art at the time of filing of the applicant’s invention to combine the confidence score of Moradian with the generating the next action of Khorasgani to yield the predictable result of generating a next action as model outputs of the decision maker model, wherein the next action is the predicted maintenance action associated with the equipment, and a confidence score indicating reliability of the next action. One would be motivated to make this combination for the purpose of facilitating improvement of the model by re-training when confidence in the model is low (Moradian, ¶ [0086]).
Khorasgani/Moradian does not specifically teach generating explanation information comprising feature-wise explanation identifying which input features are most responsible for the next action and instance-wise explanation showing which training samples most influenced the model output for generating the next action.
However, Botari teaches explanation information comprising feature-wise explanation identifying which input features are most responsible for a model output and instance-wise explanation showing which training samples most influenced the model output (p. 2, fig. 1 and pp. 11-14, Iris dataset and MNIST dataset, including figs. 5 and 6—the explanations generated include both feature importance that identifies which input features are most responsible for a model output and an instance-wise explanation showing which training samples most influenced the classification output of the model).
All of the claimed elements were thus known in Khorasgani/Moradian and Botari and could have been combined by known methods with no change in their respective functions. It therefore would have been obvious to a person of ordinary skill in the art at the time of filing of the applicant’s invention to combine the feature-wise and instance-wise explanations of Botari with the generating the next action of Khorasgani to yield the predictable result of generating a next action as model outputs of the decision maker model, wherein the next action is the predicted maintenance action associated with the equipment, a confidence score indicating reliability of the next action, and explanation information comprising feature-wise explanation identifying which input features are most responsible for the next action and instance-wise explanation showing which training samples most influenced the model output for generating the next action. One would be motivated to make this combination for the purpose of producing more meaningful explanations compared to other techniques (Botari, Abstract).
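For the instance-wise half of this limitation, one simple way to name "which training samples most influenced the model output" is to return the training inputs closest to the current input. The sketch below assumes that Euclidean distance in input space is an acceptable proxy; it is an example-based stand-in, not MeLIME's procedure and not an influence-function method, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def nearest_training_samples(
    query: Sequence[float],
    training_inputs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k training samples nearest `query`.

    Smaller distance is treated (by assumption) as greater influence.
    """
    dists = [(i, math.dist(query, x)) for i, x in enumerate(training_inputs)]
    dists.sort(key=lambda pair: pair[1])  # closest first
    return dists[:k]
```

A nearest-neighbor proxy is cheap and model-agnostic, but it ignores the model's parameters entirely; methods that trace influence through training would generally give different rankings.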
Regarding Claim 9, Khorasgani teaches a non-transitory computer readable medium, storing instructions for predictive maintenance of equipment (abstract and section 3—it is understood that the instructions and dataset for a machine learning model are stored on a computer readable medium). Khorasgani, Moradian, and Botari teach the instructions, when executed by a processor, perform the operations of the present claim in the same manner as for claim 1.
Regarding Claims 2 and 10, Khorasgani/Moradian/Botari teaches wherein the generating the next action as the model outputs further comprises generating a confidence score of the decision maker model (Moradian, ¶ [0085] – [0086]—a confidence score is generated that indicates a level of confidence in the model outputs) and explanation information as part of the model outputs (Botari, p. 2, fig. 1 and pp. 11-14, Iris dataset and MNIST dataset, including figs. 5 and 6—the explanations generated include both feature importance that identifies which input features are most responsible for a model output and an instance-wise explanation showing which training samples most influenced the classification output of the model).
Regarding Claims 3 and 11, Khorasgani/Moradian/Botari teaches comparing the confidence score against a threshold; and if the confidence score is below the threshold, retraining the machine learning model with observations observed more recent in time than the recent observations and actions observed more recent in time than the recent actions as inputs (Moradian, ¶ [0086]—the model is re-trained when the confidence score is below a threshold),
wherein retraining the machine learning model comprises using the decision maker training engine to randomly draw sample batches from the database and update parameters using gradient descent (Moradian, ¶ [0082] and [0156]—training is performed using a batch of training data; a random selection of samples is well-known in the art. Training using backpropagation to update weights is a gradient descent method).
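The confidence-gated retraining recited in claims 3 and 11 can be sketched as follows. The sketch assumes a toy one-dimensional linear model, a Python list standing in for the database, and hypothetical names (`maybe_retrain`, `sgd_retrain`); it shows only the control flow (compare the score against a threshold, then run mini-batch gradient descent on randomly drawn samples of more recent data), not Moradian's implementation.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y) for a toy model y ≈ w*x + b


def sgd_retrain(
    w: float,
    b: float,
    data: List[Sample],
    batch_size: int = 4,
    lr: float = 0.05,
    steps: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Randomly draw a sample batch (without replacement) from the stored data.
        batch = rng.sample(data, min(batch_size, len(data)))
        n = len(batch)
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def maybe_retrain(
    confidence: float,
    threshold: float,
    w: float,
    b: float,
    recent_data: List[Sample],
) -> Tuple[float, float, bool]:
    """Retrain on more recent data only when confidence falls below the threshold."""
    if confidence >= threshold:
        return w, b, False
    w, b = sgd_retrain(w, b, recent_data)
    return w, b, True
```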
Regarding Claims 4 and 12, Khorasgani/Moradian/Botari teaches displaying the model outputs on a graphical user interface (GUI) (Khorasgani, section 3.6 and figs. 3 and 4—the remaining useful life (RUL) is the output of the model, displayed in a graph).
Regarding Claims 5 and 13, Khorasgani/Moradian/Botari teaches:
feeding the recent observations as input to a remaining useful life (RUL) estimator; generating estimated remaining useful life of the equipment as output from the RUL estimator; and feeding the generated estimated remaining useful life of the equipment as input to the decision maker model in generating the next action (Khorasgani, fig. 1; sections 2 and 3.4—observations are input to a RUL estimator, which generates estimated remaining useful life that is fed into the decision maker model as input).
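Claims 5 and 13 chain an RUL estimator into the decision maker. A minimal sketch, assuming a single health indicator that rises roughly linearly toward a hypothetical failure level, fits a least-squares line to the recent readings (one per time step), extrapolates to the crossing point, and appends the estimate to the decision maker's input vector. The linear extrapolation is a simplification chosen for brevity, not the estimator described in the cited references, and both function names are hypothetical.

```python
from typing import List, Sequence


def estimate_rul(health_index: Sequence[float], failure_level: float) -> float:
    """Time steps until a least-squares trend line reaches `failure_level`."""
    n = len(health_index)
    if n < 2:
        raise ValueError("need at least two readings")
    t_mean = (n - 1) / 2
    y_mean = sum(health_index) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(health_index))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    if slope <= 0:
        return float("inf")  # no degradation trend observed
    intercept = y_mean - slope * t_mean
    crossing = (failure_level - intercept) / slope
    return max(0.0, crossing - (n - 1))  # measured from the latest reading


def decision_inputs(
    target_return: float,
    recent_obs: Sequence[float],
    recent_actions: Sequence[int],
    rul: float,
) -> List[float]:
    """Append the RUL estimate to the decision maker's input vector."""
    return [target_return, *recent_obs, *(float(a) for a in recent_actions), rul]
```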
Regarding Claims 6 and 14, Khorasgani/Moradian/Botari teaches displaying the model outputs and the estimated remaining useful life of the equipment on a graphical user interface (GUI) (Khorasgani, sections 3.4 and 3.6, and figs. 3 and 4—the graph displays the estimated remaining useful life and the actual remaining useful life as determined by the model).
Regarding Claims 7 and 15, Khorasgani/Moradian/Botari teaches identifying a subset of the inputs that are most responsible to the generation of the decision maker model’s model outputs, wherein the subset of the inputs directly impacts the generation of the next action (Khorasgani, section 2, equation 4—the observations and actions of the window of time T are identified as relevant, and are used as inputs to the decision maker model so that it can generate the next action).
Regarding Claims 8 and 16, Khorasgani/Moradian/Botari teaches storing data from a plurality of sensors as the recent observations and the recent actions in a database; and retrieving the recent observations and the recent actions from the database (Khorasgani, sections 3 and 3.1—sensor outputs are stored in the dataset).
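Claims 8 and 16 store sensor data together with the associated actions and later read back the most recent window. One way to sketch this, assuming one scalar sensor reading per time step and a hypothetical table named `log`, uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3
from typing import Iterable, List, Tuple


def store_readings(conn: sqlite3.Connection, rows: Iterable[Tuple[int, float, int]]) -> None:
    """Append (time step, sensor reading, action) rows to the `log` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log (t INTEGER PRIMARY KEY, obs REAL, action INTEGER)"
    )
    conn.executemany("INSERT INTO log (t, obs, action) VALUES (?, ?, ?)", rows)
    conn.commit()


def recent_window(conn: sqlite3.Connection, T: int) -> Tuple[List[float], List[int]]:
    """Return the last T observations and actions, oldest first."""
    rows = conn.execute(
        "SELECT obs, action FROM log ORDER BY t DESC LIMIT ?", (T,)
    ).fetchall()
    rows.reverse()  # the query returns newest first
    return [obs for obs, _ in rows], [action for _, action in rows]
```

With several sensors, each reading would more naturally be its own column (or its own row keyed by sensor ID); the single `obs` column keeps the sketch short.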
Regarding Claim 17, Khorasgani teaches an offline reinforcement learning method for predictive maintenance of equipment (Abstract), the method comprising:
preparing time-series data of past observations and associated past actions (sections 2 and 3);
splitting the time-series data into episodes and computing rewards associated with the episodes (section 2—the windows with a length T are episodes);
storing the episodes and the rewards in a database (sections 3 and 3.5—training is performed offline, so the datasets, including the episodes and rewards, are clearly stored in a database);
initializing a decision maker model, wherein the decision maker model is a machine learning model that predicts maintenance action associated with the equipment (section 2);
training the decision maker model by randomly drawing a sample batch from the database, computing loss associated with the sample batch, and updating parameters of the decision maker model using gradient descent of the loss (sections 3 and 3.1—each trajectory in the training dataset can be considered a batch. Training with a regression normalization model updates parameters using gradient descent of a loss);
receiving expected future return value as input to the decision maker model (section 2, equations 3 and 4—the expected reward R is an input to the decision maker model. The “Total expected reward” portion of section 2 explains that the total expected reward used as input is within the vicinity of the maximum episodic return, so the input to the model comprises an expected future return value);
feeding recent observations and recent actions from environment as inputs to the decision maker model (section 2, equations 1, 2, and 4—observations O and actions A are fed into the model);
generating a next action as model outputs of the decision maker model, wherein the next action is the predicted maintenance action associated with the equipment (section 2, equation 4—next action âk is generated by the model); and
executing the next action in the environment (section 2 and fig. 1—“The idea here is to solve maintenance decision-making using on offline supervised RL approach,” indicating that the generated action is executed in the environment to maintain a machine. See also the applications in section 3.6).
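The first data-preparation steps of claim 17 (splitting time-series data into episodes and computing rewards) are sketched below under explicit assumptions: episodes are consecutive non-overlapping windows of fixed length, following the examiner's reading of Khorasgani's length-T windows, and the per-step reward is a hypothetical action cost plus a large penalty at failure steps. `ACTION_COST`, `FAILURE_PENALTY`, and the function name are placeholders, not figures or names from the record.

```python
from typing import Dict, List, Sequence

# Hypothetical per-step action costs and failure penalty (placeholders only).
ACTION_COST = {0: 0.0, 1: -1.0, 2: -5.0}  # 0 = none, 1 = inspect, 2 = replace
FAILURE_PENALTY = -100.0


def split_into_episodes(
    obs: Sequence[float],
    actions: Sequence[int],
    failed: Sequence[bool],
    length: int,
) -> List[Dict[str, list]]:
    """Cut aligned series into consecutive windows of `length` steps with rewards.

    A trailing partial window is dropped.
    """
    if not (len(obs) == len(actions) == len(failed)):
        raise ValueError("series must be aligned")
    episodes = []
    for start in range(0, len(obs) - length + 1, length):
        end = start + length
        rewards = [
            ACTION_COST[a] + (FAILURE_PENALTY if f else 0.0)
            for a, f in zip(actions[start:end], failed[start:end])
        ]
        episodes.append(
            {"obs": list(obs[start:end]), "actions": list(actions[start:end]), "rewards": rewards}
        )
    return episodes
```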
Khorasgani does not specifically teach generating a confidence score indicating reliability of the next action, and explanation information comprising feature-wise explanation identifying which input features are most responsible for the next action and instance-wise explanation showing which training samples most influenced the model output for generating the next action.
However, Moradian teaches generating a confidence score indicating reliability of output of a predictive model (¶ [0085] – [0086]—a confidence score is generated that indicates a level of confidence in the model outputs).
These claimed elements were known in Khorasgani and Moradian and could have been combined by known methods with no change in their respective functions. It therefore would have been obvious to a person of ordinary skill in the art at the time of filing of the applicant’s invention to combine the confidence score of Moradian with the generating the next action of Khorasgani to yield the predictable result of generating a next action as model outputs of the decision maker model, wherein the next action is the predicted maintenance action associated with the equipment, and a confidence score indicating reliability of the next action. One would be motivated to make this combination for the purpose of facilitating improvement of the model by re-training when confidence in the model is low (Moradian, ¶ [0086]).
Khorasgani/Moradian does not specifically teach generating explanation information comprising feature-wise explanation identifying which input features are most responsible for the next action and instance-wise explanation showing which training samples most influenced the model output for generating the next action.
However, Botari teaches explanation information comprising feature-wise explanation identifying which input features are most responsible for a model output and instance-wise explanation showing which training samples most influenced the model output (p. 2, fig. 1 and pp. 11-14, Iris dataset and MNIST dataset, including figs. 5 and 6—the explanations generated include both feature importance that identifies which input features are most responsible for a model output and an instance-wise explanation showing which training samples most influenced the classification output of the model).
All of the claimed elements were thus known in Khorasgani/Moradian and Botari and could have been combined by known methods with no change in their respective functions. It therefore would have been obvious to a person of ordinary skill in the art at the time of filing of the applicant’s invention to combine the feature-wise and instance-wise explanations of Botari with the generating the next action of Khorasgani to yield the predictable result of generating a next action as model outputs of the decision maker model, wherein the next action is the predicted maintenance action associated with the equipment, a confidence score indicating reliability of the next action, and explanation information comprising feature-wise explanation identifying which input features are most responsible for the next action and instance-wise explanation showing which training samples most influenced the model output for generating the next action. One would be motivated to make this combination for the purpose of producing more meaningful explanations compared to other techniques (Botari, Abstract).
Regarding Claim 18, Khorasgani/Moradian/Botari teaches the sample batch comprising sequences of past observations, past actions, and associated expected future return value (Khorasgani, section 2, equations 1, 2, and 4—observations O, actions A, and returns R are fed into the model).
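Claim 18's "associated expected future return value" is commonly realized in return-conditioned offline RL (for example, the Decision Transformer formulation) as the undiscounted return-to-go, R̂_t = r_t + r_{t+1} + ... + r_T. The sketch below computes it and pairs it with each step's observation and action; whether Khorasgani discounts the return is not addressed here, and the helper names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted return-to-go: a running sum taken from the end of the episode."""
    return list(accumulate(reversed(rewards)))[::-1]


def to_training_sequence(
    obs: Sequence[float], actions: Sequence[int], rewards: Sequence[float]
) -> List[Tuple[float, int, float]]:
    """Pair each step's observation and action with its return-to-go."""
    return list(zip(obs, actions, returns_to_go(rewards)))
```

For rewards [0, -1, 0, -100], the returns-to-go are [-101, -101, -100, -100]: every step still "expects" the failure penalty that lies ahead of it.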
Regarding Claim 19, Khorasgani/Moradian/Botari teaches wherein the feature-wise explanation is generated using at least one of Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), or Gradient-weighted Class Activation Mapping (Grad-CAM) (Botari, p. 1, Abstract and pp. 4-5, Methodology—the MeLIME method is a variation of LIME).
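The feature-wise explanation of claim 19 can be illustrated with a simplified LIME-style local surrogate: perturb the input with Gaussian noise, weight each perturbed sample by an exponential kernel on its distance from the original input, and fit a weighted linear model whose coefficients serve as per-feature attributions. This is a from-scratch approximation of the idea, not the `lime` package and not MeLIME; the noise scale, kernel width, sample count, and function names are arbitrary assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def lime_style_attributions(
    f: Callable[[List[float]], float],
    x: Sequence[float],
    num_samples: int = 500,
    noise_scale: float = 0.5,
    kernel_width: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], List[int]]:
    """Fit a locally weighted linear surrogate of `f` around `x`.

    Returns per-feature coefficients and feature indices ranked by |coefficient|.
    """
    rng = random.Random(seed)
    d = len(x)
    gram = [[0.0] * (d + 1) for _ in range(d + 1)]  # accumulates Phi^T W Phi
    rhs = [0.0] * (d + 1)  # accumulates Phi^T W y
    for _ in range(num_samples):
        z = [xi + rng.gauss(0.0, noise_scale) for xi in x]
        offsets = [zi - xi for zi, xi in zip(z, x)]
        weight = math.exp(-sum(o * o for o in offsets) / kernel_width ** 2)
        phi = [1.0] + offsets  # intercept term plus centered features
        y = f(z)
        for i in range(d + 1):
            rhs[i] += weight * phi[i] * y
            for j in range(d + 1):
                gram[i][j] += weight * phi[i] * phi[j]
    coef = _solve(gram, rhs)[1:]  # drop the intercept
    ranking = sorted(range(d), key=lambda i: -abs(coef[i]))
    return coef, ranking
```

For a model that is exactly linear, the surrogate recovers its coefficients; for a nonlinear decision maker, the coefficients describe only the neighborhood set by `noise_scale` and `kernel_width`.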
Regarding Claim 20, Khorasgani/Moradian/Botari teaches wherein the confidence score is generated using at least one of a Bayesian neural network, Deep Ensembles, Monte Carlo Dropout, or Quantile Neural Network (Moradian, ¶ [0085] – [0086] describes the confidence score, which is generated by machine learning model 190. ¶ [0084] states that machine learning model 190 may be multiple machine learning models 190, and they may be deep neural networks; machine learning models 190 may therefore comprise a deep ensemble).
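To make the Deep Ensembles option of claim 20 concrete: a common approach averages the predicted action probabilities of several independently trained members and reports the averaged probability of the chosen action as the confidence score, so disagreement among members lowers the score. The member outputs here are assumed inputs; training the members, and Moradian's actual scoring, are outside this sketch, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def ensemble_decision(member_probs: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Average member probability vectors; return (action, confidence).

    Confidence is the averaged probability of the chosen action, so members
    that disagree pull it down.
    """
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    m = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(p[a] for p in member_probs) / m for a in range(k)]
    action = max(range(k), key=mean.__getitem__)  # first action on ties
    return action, mean[action]
```

Three members that agree (0.7, 0.6, 0.8 on the same action) yield a confidence near 0.7; two members that point in opposite directions yield 0.5, which a threshold check like the one in claims 3 and 11 could flag for retraining.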

Response to Arguments
The amendments to the claims are accepted as overcoming the previous rejections under 35 U.S.C. 112(b). Note, however, the new rejections under 35 U.S.C. 112(b) above, which were necessitated by the amendments to the claims.
Applicant’s arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Although Khorasgani does not teach all of the amended limitations of claims 1, 9, and 17, new prior art reference Botari, in combination with Khorasgani and Moradian, teaches these limitations, as detailed above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Cai, Carrie J., Jonas Jongejan, and Jess Holbrook (“The effects of example-based explanations in a machine learning interface,” Proceedings of the 24th International Conference on Intelligent User Interfaces, 2019) teaches example-based explanations of a machine learning model that show which training examples led the model to output a particular classification.
Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
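The reply-period rule above can be expressed as a small date calculation. This sketch assumes the 10 February 2026 mailing date listed for the final rejection in the prosecution timeline, clamps to the last day of the month when the target day does not exist (an assumed convention), and ignores the weekend and federal-holiday roll-forward of 35 U.S.C. 21(b). It returns the date the shortened statutory period expires before any extension fees, never later than the six-month limit. Both function names are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def final_action_period_end(
    mailed: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> date:
    """Expiry of the shortened statutory period for a final action (no extensions)."""
    ssp_end = add_months(mailed, 3)
    statutory_limit = add_months(mailed, 6)
    replied_early = first_reply is not None and first_reply <= add_months(mailed, 2)
    late_advisory = advisory_mailed is not None and advisory_mailed > ssp_end
    if replied_early and late_advisory:
        # The period runs to the advisory action's mailing date, capped at six months.
        return min(advisory_mailed, statutory_limit)
    return ssp_end
```

Under these assumptions the period ends on 10 May 2026 absent a qualifying reply, which falls on a Sunday, exactly the case the omitted 21(b) roll-forward would adjust. A first reply on 20 March 2026 (within two months) followed by a hypothetical advisory action mailed on 1 June 2026 would move expiry to 1 June 2026.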
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAL W SCHNEE whose telephone number is (571) 270-1918. The examiner can normally be reached M-F 7:30 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley, can be reached at 303-297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HAL SCHNEE/           Primary Examiner, Art Unit 2129


Prosecution Timeline

Jan 27, 2023

Application Filed

Nov 28, 2025

Non-Final Rejection mailed — §103, §112

Jan 21, 2026

Response Filed

Feb 10, 2026

Final Rejection mailed — §103, §112

Mar 20, 2026

Response after Non-Final Action

May 04, 2026

Request for Continued Examination

May 05, 2026

Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

17/877,063

Patent 12632741

AGENT TRAINING METHOD, APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM

3y 9m to grant Granted May 19, 2026

17/954,006

Patent 12632699

Temporal-Aware and Local-Aggregation Graph Neural Networks

3y 7m to grant Granted May 19, 2026

17/825,868

Patent 12619864

EFFICIENT LOOK-UP TABLE BASED FUNCTIONS FOR ARTIFICIAL INTELLIGENCE (AI) ACCELERATOR

3y 11m to grant Granted May 05, 2026

17/887,183

Patent 12619859

METHOD AND APPARATUS FOR NEURAL NETWORK OPERATION

3y 8m to grant Granted May 05, 2026

17/558,327

Patent 12608593

COMPRESSING IMAGE-TO-IMAGE MODELS

4y 4m to grant Granted Apr 21, 2026

Based on the 5 most recent grants.


Prosecution Projections

2-3

Expected OA Rounds

84%

Grant Probability

99%

With Interview (+22.3%)

2y 9m (~0m remaining)

Median Time to Grant

Moderate

PTA Risk

Based on 600 resolved cases by this examiner. Grant probability derived from career allowance rate.
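The figures on this page can be cross-checked with simple arithmetic. 507 of 600 resolved cases is an 84.5% career allowance rate (shown as 84%), and the "+29.5% vs TC avg" line implies a Tech Center average near 55%. The page does not say how the 99% "With Interview" figure is produced; one reading consistent with the numbers, assumed here, is an additive lift capped at 99%, since 84.5 + 22.3 would exceed 100. The helper names and the cap are hypothetical.

```python
def allowance_rate_pct(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * granted / resolved


def implied_baseline_pct(rate_pct: float, delta_vs_avg_pct: float) -> float:
    """Back out the comparison average from a 'vs TC avg' difference."""
    return rate_pct - delta_vs_avg_pct


def with_interview_pct(base_pct: float, lift_pct: float, cap_pct: float = 99.0) -> float:
    """Assumed projection: additive interview lift, capped below certainty."""
    return min(base_pct + lift_pct, cap_pct)
```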