Last updated: April 19, 2026

Application No. 18/360,071

METHOD FOR TRAINING A CONTROL POLICY FOR CONTROLLING A TECHNICAL SYSTEM

Non-Final OA §101

Filed

Jul 27, 2023

Examiner

KASSIM, HAFIZ A

Art Unit

3623

Tech Center

3600 — Transportation & Electronic Commerce

Assignee

Robert Bosch GmbH

OA Round

1 (Non-Final)

This examiner grants 44% of cases after interview

— +53.7% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 338 resolved cases, 2023–2026

Examiner Intelligence

KASSIM, HAFIZ A View full profile →

Grants 44% of resolved cases

Career Allow Rate

148 granted / 338 resolved

-8.2% vs TC avg

Strong +54% interview lift

Without

With

+53.7%

Interview Lift

resolved cases with interview

Typical timeline

2y 11m

Avg Prosecution

29 currently pending

Career history

367

Total Applications

across all art units

Statute-Specific Performance

§101

40.9%

+0.9% vs TC avg

§103

32.6%

-7.4% vs TC avg

§102

7.8%

-32.2% vs TC avg

§112

14.0%

-26.0% vs TC avg

Black line = Tech Center average estimate • Based on career data from 338 resolved cases

Office Action

§101

DETAILED ACTION This is a non-final, first office action on the merits. Claims 1-7 are pending. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Priority Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No. DE10/20222078004 , filed on 10/05/2023 . Claim Rejections - 35 USC § 101 35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title. Claims 1-7 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Specifically, claims 1-7 are directed to an abstract idea without additional elements amounting to significantly more than the abstract idea. With respect to Step 2A Prong One of the framework, claims 1, 5, and 7 recite an abstract idea. Claims 1, 5, and 7 include “ training a value function which, for each state of the technical system, predicts a cumulative reward that may be obtained, starting from the state, by: reducing a loss which, for a plurality of states and, for each of the states, for at least one action that has been previously carried out in the state, involves a deviation between a prediction for the cumulative reward and an estimation of the cumulative reward that is ascertained from a subsequent state that has been achieved by the action, and a reward that is obtained by the action, ascertaining a behavior control policy that reflects a selection of the previously carried out actions in the respective states of the plurality of states, wherein in the loss, for each action, the deviation for the action is weighted more strongly the greater a likelihood is that the action is selected by the control policy, in relation to a likelihood that the action is selected by the behavior control policy; and training the control policy so that it prioritizes actions that result in states for which predicts a higher value, over actions that result in states for which predicts a lower value ”. The limitations above recite an abstract idea under Step 2A Prong One. More particularly, the elements above recite mental processes-concepts performed in the human mind (including an observation, evaluation, judgment, opinion) because the elements describe a process for training a control policy. As a result, claims 1, 5, and 7 recite an abstract idea under Step 2A Prong One. Claims 2-4 and 6 further describe the process for training a control policy. As a result, claims 2-4 and 6 recite an abstract idea under Step 2A Prong One for the same reasons as stated above with respect to claims 1, 5, and 7. With respect to Step 2A Prong Two of the framework, claims 1, 5, and 7 do not include additional elements that integrate the abstract idea into a practical application. Claims 1, 5, and 7 include additional elements that do not recite an abstract idea under Step 2A Prong One. The additional elements of claims 1, 5, and 7 include a technical system, a neural network, a control device, a non-transitory computer-readable medium, a computer program, and a processor. When considered in view of the claim as a whole, the additional elements do not integrate the abstract idea into a practical application because the additional computing elements are generic computing elements that are merely used as a tool to perform the recited abstract idea. As a result, claims 1, 5, and 7 do not include additional elements that integrate the abstract idea into a practical application under Step 2A Prong Two. Claims 2-4 do not include any additional elements beyond those recited with respect to claims 1, 5, and 7. As a result, claims 2-4 do not include additional elements that integrate the abstract idea into a practical application under Step 2A Prong Two for the same reasons as stated above with respect to claims 1, 5, and 7. Claim 6 includes additional elements that do not recite an abstract idea under Step 2A Prong One. The additional elements of claim 6 includes a control device and a technical system . When considered in view of the claims as a whole, the additional elements do not integrate the abstract idea into a practical application because the additional computing elements do no more than generally link the use of the recited abstract idea to a particular technological environment. As a result, claim 6 does not include additional elements that integrate the abstract idea into a practical application under Step 2A Prong Two. With respect to Step 2B of the framework, claims 1, 5, and 7 do not include additional elements amounting to significantly more than the abstract idea. As noted above, claims 1, 5, and 7 include additional elements that do not recite an abstract idea under Step 2A Prong One. The additional elements of claims 1, 5, and 7 include a technical system, a neural network, a control device, a non-transitory computer-readable medium, a computer program, and a processor. The additional elements do not amount to significantly more than the abstract idea because the additional computing elements are generic computing elements that are merely used as a tool to perform the recited abstract idea. Further, looking at the additional elements as an ordered combination adds nothing that is not already present when considering the additional elements individually. As a result, independent claims 1, 5, and 7 do not include additional elements that amount to significantly more than the abstract idea under Step 2B. Claims 2-4 do not include any additional elements beyond those recited with respect to claims 1, 5, and 7. As a result, claims 2-4 do not include additional elements that amount to significantly more than the abstract idea under Step 2B for the same reasons as stated above with respect to claims 1, 5, and 7. Claim 6 includes additional elements that do not recite an abstract idea under Step 2A Prong One. The additional elements of claim 6 includes a control device and a technical system . The additional elements do not amount to significantly more than the abstract idea because the additional computing elements do no more than generally link the use of the recited abstract idea to a particular technological environment. Further, looking at the additional elements as an ordered combination adds nothing that is not already present when considering the additional elements individually. As a result, claim 6 does not include additional elements that amount to significantly more than the abstract idea under Step 2B. Therefore, the claims are directed to an abstract idea without additional elements amounting to significantly more than the abstract idea. Accordingly, claims 1-7 are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter. Software per se Claim 5 is further rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. A system or apparatus defined merely by software, or terms synonymous with software or files, represents functional descriptive material (e.g. data structures or software) per se . Such material is considered non-statutory when claimed without appropriate corresponding structure. Here, in the broadest reasonable interpretation consistent with the specification, the Applicant’s claim 5 recites a control device configured to train a control policy for controlling a technical system encompass functions that can be executed entirely as software per se. As currently written, the claimed system lacks structure and is therefore non-statutory. Accordingly, claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Allowable Subject Matter Claims 1-7 appear to be allowable if rewritten to overcome the 35 USC § 101 rejection. The prior art references most closely resembling the Applicant’s claimed inventio n Bhoj et al. (US Pub No. 2019/0287004) (hereinafter Bhoj et al . ), in view of Jha et al. (US Pub No. 2021/0178600) (hereinafter Jha et al. ), and further in view of Cabi et al. (US Pub No. 2021/0078169) (hereinafter Cabi et al. ). Bhoj et al. discloses training a neural network to implement a value function which, for each state of the technical system, predicts a cumulative reward that may be obtained by controlling the technical system, starting from the state, by: ascertaining a behavior control policy that reflects a selection of the previously carried out actions in the respective states of the plurality of states (See paras [038], [0043], [053]-[059], [0076], [095], [0141], [0147], and [0300]). However the system in Bhoj does not explicitly disclose adapting the neural network for reducing a loss which, for a plurality of states and, for each of the states, for at least one action that has been previously carried out in the state, involves a deviation between a prediction for the cumulative reward by the neural network and an estimation of the cumulative reward that is ascertained from a subsequent state that has been achieved by the action, and a reward that is obtained by the action; wherein in the loss, for each action, the deviation for the action is weighted more strongly the greater a likelihood is that the action is selected by the control policy, in relation to a likelihood that the action is selected by the behavior control policy; training the control policy so that it prioritizes actions that result in states for which the neural network predicts a higher value, over actions that result in states for which the neural network predicts a lower value . Moreover, neither Bhoj et al., Jha et al., nor Cabi et al. disclose training the control policy so that it prioritizes actions that result in states for which the neural network predicts a higher value, over actions that result in states for which the neural network predicts a lower value . Analogous art Jha discloses adapting the neural network for reducing a loss which, for a plurality of states and, for each of the states, for at least one action that has been previously carried out in the state, involves a deviation between a prediction for the cumulative reward by the neural network and an estimation of the cumulative reward that is ascertained from a subsequent state that has been achieved by the action, and a reward that is obtained by the action (see Jha, paras [0012] and [0039]-[0043]). Analogous art Cabi discloses wherein in the loss, for each action, the deviation for the action is weighted more strongly the greater a likelihood is that the action is selected by the control policy, in relation to a likelihood that the action is selected by the behavior control policy (see Cabi, paras [0007] and [0092]-[0095]). Moreover, since the specific combination of claim elements training the control policy so that it prioritizes actions that result in states for which the neural network predicts a higher value, over actions that result in states for which the neural network predicts a lower value recited in claims 1, 5, and 7 cannot be found in the cited prior art and can only be found as recited in Applicant’s Specification, any combination of the cited references and/or additional references(s) to teach all the claim elements, including the aforementioned features not taught by the cited prior art, would be the result of impermissible hindsight reconstruction. Accordingly, a combination of Bhoj et al., Jha et al., Cabi et al., and/or any other additional reference(s) would be improper to teach the claimed invention. While the teachings of Bhoj et al., Jha et al., and Cabi et al. separately address different parts of the claimed invention, these teachings would not be combinable by one of ordinary skill in the art at the time of the invention with a reasonable expectation of success to provide a predictable combination that would render the claimed invention obvious. Thus, the novelty of the claimed invention is in the combination of limitations rather than any single limitation. Conclusion The prior arts made of record and not relied upon is considered pertinent to applicant's disclosure. Cadena et al, "Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age," IEEE Transactions on robotics , 2016, pp. 1309-1332 discloses estimation of the state of a robot equipped with on-board sensors and the construction of a model (the map ) of the environment that the sensors are perceiving. Hu et al. US Pub No. 2019/0266489 discloses Interaction-aware decision making may include training a first agent based on a first policy gradient, training a first critic based on a first loss function to learn goals in a single-agent environment using a Markov decision process. Mnih et al. US Pub No. 2015/0100530 discloses a method of reinforcement learning for a subject system having multiple states and actions to move from one state to the next. Jaderberg et al. US Pub No. 2024/0330701 discloses training the candidate agent neural network on the candidate task. Lu et al. US Pub No. 2021/0383218 discloses a control policy for an agent interacting with an environment. Wright et al. US Pub No. 2018/0012137 discloses controlling a system, which employs a data set representing a plurality of states and associated trajectories of an environment of the system. Plutowski et al. US Pub No. 6,473,851 discloses a plurality of overlapping policy-based controllers. System also applicable to policy-based process servers. Bohez et al. US Pat No. 10,786,900 discloses a control policy for a vehicles or other robot through the performance of a reinforcement learning simulation of the robot. Mueller et al. US Pub No. 2023/0359208 discloses a computer generates historical time and velocity data for vehicles based on data from sensor(s) observing the vehicles. Marris et al. US Pub No. 2024/0046112 discloses control policies for controlling agents in an environment. Porter et al. US Pat No. 10,766,136 discloses a machine learning system builds and uses computer models for identifying how to evaluate the level of success reflected in a recorded observation of a task. Khansari Zadeh et al. US Pub No. 2018/0222045 discloses a robot control policy that regulates both motion control and interaction with an environment and/or includes a learned potential function and/or dissipative field. Vogelsong et al. US Pat No. 11,707,838 discloses a machine learning system builds and uses control policies for controlling robotic performance of a task. Rao et al. US Pub No. 2020/0304545 discloses a policy data specifying a control policy for controlling a source agent interacting with a source environment to perform a particular task; obtaining a validation data set generated from interactions of a target agent in a target environment. Kalakrishnan et al. US Pub No. 2022/0105624 discloses a meta-learning model, for use in causing a robot to perform a task, using imitation learning as well as reinforcement learning. Chakrabarty et al. US Pub No. 2021/0003973 discloses a control system for controlling a machine with partially modeled dynamics to perform a task estimates a Lipschitz constant bounding the unmodeled dynamics of the machine. Dull et al. US Pub No. 2019/0130263 discloses a method of controlling a complex system and a gas turbine being controlled. Giering et al. US Pub No. 2018/0129974 discloses machine learning using deep reinforcement learning is applied to determine an action based on the observations. Any inquiry concerning this communication or earlier communications from the examiner should be directed to FILLIN "Examiner name" \* MERGEFORMAT HAFIZ A KASSIM whose telephone number is FILLIN "Phone number" \* MERGEFORMAT (571)272-8534 . The examiner can normally be reached FILLIN "Work Schedule?" \* MERGEFORMAT 9:00 - 5:00 PM . Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, FILLIN "SPE Name?" \* MERGEFORMAT Rutao Wu can be reached at FILLIN "SPE Phone?" \* MERGEFORMAT 571-272-6045 . The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /HAFIZ A KASSIM/ Primary Examiner, Art Unit 3623 2/25/2026

Read full office action

Prosecution Timeline

Jul 27, 2023

Application Filed

Feb 25, 2026

Non-Final Rejection — §101 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/341,035

Patent 12602638

RISK MANAGEMENT SYSTEM AND RISK MANAGEMENT METHOD

2y 5m to grant Granted Apr 14, 2026

17/442,870

Patent 12586008

MANAGING HOTEL GUEST HOUSEKEEPING WITHIN AN AUTOMATED GUEST SATISFACTION AND SERVICES SCHEDULING SYSTEM

2y 5m to grant Granted Mar 24, 2026

18/790,850

Patent 12561706

SYSTEMS AND METHODS FOR MANAGING VEHICLE OPERATOR PROFILES BASED ON RELATIVE TELEMATICS INFERENCES VIA A TELEMATICS MARKETPLACE

2y 5m to grant Granted Feb 24, 2026

18/211,936

Patent 12548038

Realtime Busyness for Places

2y 5m to grant Granted Feb 10, 2026

17/588,032

Patent 12541724

SYSTEMS AND METHODS FOR TIME-SERIES FORECASTING

2y 5m to grant Granted Feb 03, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

1-2

Expected OA Rounds

44%

Grant Probability

98%

With Interview (+53.7%)

2y 11m

Median Time to Grant

Low

PTA Risk

Based on 338 resolved cases by this examiner. Grant probability derived from career allow rate.