Prosecution Insights
Last updated: April 19, 2026
Application No. 18/238,337

CONTROL DEVICE, CONTROL SYSTEM, CONTROL METHOD, AND COMPUTER READABLE MEDIUM STORING CONTROL PROGRAM

Non-Final OA — §102, §103
Filed: Aug 25, 2023
Examiner: WENG, PEI YONG
Art Unit: 2141
Tech Center: 2100 — Computer Architecture & Software
Assignee: Mitsubishi Electric Corporation
OA Round: 1 (Non-Final)
Grant Probability: 79% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 3m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 79% (above average; 506 granted / 637 resolved; +24.4% vs TC avg)
Strong Interview Lift: +23.1% on resolved cases with interview
Typical Timeline: 3y 3m average prosecution; 18 applications currently pending
Career History: 655 total applications across all art units

Statute-Specific Performance

§101: 12.4% (-27.6% vs TC avg)
§103: 49.3% (+9.3% vs TC avg)
§102: 19.2% (-20.8% vs TC avg)
§112: 8.8% (-31.2% vs TC avg)
Deltas are vs. the estimated Tech Center average • Based on career data from 637 resolved cases
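Each statute-specific rate and its stated delta jointly pin down the Tech Center baseline. As a quick consistency check (a minimal sketch, using only the figures from the table above), all four rows imply the same 40.0% estimate:

```python
# Examiner's statute-specific rates and their stated deltas vs. the
# Tech Center average, copied from the table above.
rates = {
    "101": (12.4, -27.6),
    "103": (49.3, +9.3),
    "102": (19.2, -20.8),
    "112": (8.8, -31.2),
}

# Recover the implied Tech Center average for each statute:
# TC average = examiner rate - delta.
tc_averages = {s: round(rate - delta, 1) for s, (rate, delta) in rates.items()}
print(tc_averages)  # every row implies the same 40.0% baseline
```

That all four statutes recover an identical baseline suggests the dashboard uses a single Tech Center average estimate rather than per-statute baselines.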

Office Action

Rejections under 35 U.S.C. §102 and §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

This action is responsive to the following communication: Non-Provisional Application filed Aug. 25, 2023. Claims 1-9 are pending in the case. Claims 1 and 7-9 are independent claims.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-6 and 8-9 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by "Human-Like Autonomous Vehicle Speed Control by Deep Reinforcement Learning with Double Q-Learning," Zhang et al. (hereinafter Zhang), 2018.

With respect to independent claim 1, Zhang teaches a control device comprising: state data acquisition circuitry to acquire state data indicating a state of a control target (see e.g., Pages 2-3); state category identification circuitry to identify a state category to which a state indicated by the state data belongs among a plurality of state categories indicating classifications of states of the control target on the basis of the state data (see e.g., Fig. 1 and Page 2 – "As shown in Fig. 1, the general agent-environment interaction modeling (of both traditional RL and the emerging DRL) consists of an agent, an environment, a finite state space S, a set of available actions A, and a reward function: S×A → R. The decision maker is called the agent, and should be trained as the interaction system runs. The agent needs to interact with the outside, which is called the environment. The interaction between the agent and the environment is a continual process. At each decision epoch k, the agent will make" – it is implicit that the state categories are different and that they would have to be identified and classified); reward generation circuitry to calculate a reward value of a control detail for the control target on the basis of the state category and the state data (see e.g., Pages 2 and 5 – "Reward network: Reward is necessary in almost all reinforcement learning algorithms offering the goal of the reinforcement learning agent. The reward estimates how good the agent performs an action in a given state (or what are the good or bad things for the agent). In this paper, we design a reward network to map each state to a scalar,"); and control learning circuitry to learn the control detail on the basis of the state data and the reward value, wherein the reward generation circuitry includes reward calculation formula selection circuitry to select a reward calculation formula different for each of the plurality of state categories on the basis of the inputted state category (see e.g., Page 5 – the reward function is based on a reward network, where a corresponding reward value is generated in response to a control target; the reward network selects a different reward calculation formula (e.g., it may select +2 for (x, a) ∈ {C−}) based on one of several state categories (e.g., for (x, a) ∈ {C−})), and reward value calculation circuitry to calculate the reward value using the reward calculation formula selected by the reward calculation formula selection circuitry (see e.g., Page 5).

With respect to dependent claim 2, Zhang teaches training data generation circuitry to generate training data in which the state data and the control detail are associated with each other (see e.g., Page 3 – Zhang teaches that the model is trained using correlation (i.e. association) between state and action (i.e. control) data: "DNN to derive the correlation between each state-action pair (s, a) of the system under control"; "DNN ... trained").

With respect to dependent claim 3, Zhang teaches the control target is a vehicle, and the state data acquisition circuitry acquires vehicle state data including a position and a speed of the vehicle as the state data (see Fig. 1 and Page 2).

With respect to dependent claim 4, Zhang teaches the control target is a vehicle, and the state data acquisition circuitry acquires vehicle state data including a position and a speed of the vehicle as the state data (see e.g., Page 2, col 1 – "learning the state-action pairs in a supervised fashion").

With respect to dependent claim 5, Zhang teaches the control target is a character of a computer game, and the state data acquisition circuitry acquires character state data including a position of the character as the state data (see e.g., Page 2, col 1).

With respect to dependent claim 6, Zhang teaches the control target is a character of a computer game, and the state data acquisition circuitry acquires character state data including a position of the character as the state data (see e.g., Page 2, col 1).

Claim 8 is rejected for similar reasons as discussed above with respect to claim 1. Claim 9 is rejected for similar reasons as discussed above with respect to claim 1.

Claim Rejections - 35 USC § 103

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of "Reinforcement learning is supervised learning on optimized data," Eysenbach et al. (hereinafter Eysenbach), 2020.

With respect to independent claim 7, Zhang teaches a control system comprising: state data acquisition circuitry to acquire state data indicating a state of a control target (see e.g., Pages 2-3); state category identification circuitry to identify a state category to which a state indicated by the state data belongs among a plurality of state categories indicating classifications of states of the control target on the basis of the state data (see e.g., Fig. 1 and Page 2 – "As shown in Fig. 1, the general agent-environment interaction modeling (of both traditional RL and the emerging DRL) consists of an agent, an environment, a finite state space S, a set of available actions A, and a reward function: S×A → R. The decision maker is called the agent, and should be trained as the interaction system runs. The agent needs to interact with the outside, which is called the environment. The interaction between the agent and the environment is a continual process. At each decision epoch k, the agent will make" – it is implicit that the state categories are different and that they would have to be identified and classified); reward generation circuitry to calculate a reward value of a control detail for the control target on the basis of the state category and the state data; control learning circuitry to learn the control detail on the basis of the state data and the reward value (see e.g., Pages 2 and 5); training data generation circuitry to generate training data in which the state data and the control detail are associated with each other (see e.g., Page 3 – Zhang teaches that the model is trained using correlation (i.e. association) between state and action (i.e. control) data: "DNN to derive the correlation between each state-action pair (s, a) of the system under control"; "DNN ... trained"); wherein the reward generation circuitry includes reward calculation formula selection circuitry to select a reward calculation formula different for each of the plurality of state categories on the basis of the inputted state category (see e.g., Page 5 – the reward function is based on a reward network, where a corresponding reward value is generated in response to a control target; the reward network selects a different reward calculation formula (e.g., it may select +2 for (x, a) ∈ {C−}) based on one of several state categories (e.g., for (x, a) ∈ {C−})), and reward value calculation circuitry to calculate the reward value using the reward calculation formula selected by the reward calculation formula selection circuitry (see e.g., Page 5).

Zhang does not expressly show supervised learning circuitry to generate a supervised learned model for inferring the control detail from the state data on the basis of the training data generated by the training data generation circuitry; and action inference circuitry to infer the control detail using the supervised learned model. However, Zhang expressly indicates that direct supervised learning of state-action pairs would be obvious to consider (see e.g., Page 2 – "In reference to both Double Q-learning and DQN, we refer to the resulting learning algorithm as Double DQN. In this paper, we use double DQN to build the vehicle speed model. By approximating this function rather than directly learning the state-action pairs in a supervised fashion, one can handle new scenarios better."). Further, Eysenbach teaches a similar feature (see e.g., Pages 2-4). Both Zhang and Eysenbach are directed to control and reward systems. Accordingly, it would have been obvious to the skilled artisan before the effective filing date of the claimed invention, having Zhang and Eysenbach in front of them, to modify the system of Zhang to include the above feature. The motivation to combine Zhang and Eysenbach comes from Eysenbach, which discloses the motivation to implement direct supervised learning to optimize performance of the system (see e.g., Pages 2-4).

It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. "The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain." In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). Further, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments. Merck & Co. v. Biocraft Laboratories, 874 F.2d 804, 10 USPQ2d 1843 (Fed. Cir.), cert. denied, 493 U.S. 975 (1989). See also Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); Celeritas Technologies Ltd. v. Rockwell International Corp., 150 F.3d 1354, 1361, 47 USPQ2d 1516, 1522-23 (Fed. Cir. 1998).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PEIYONG WENG, whose telephone number is (571) 270-1660. The examiner can normally be reached Mon.-Fri. 8 am to 5 pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Matthew Ell, can be reached at (571) 270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/PEI YONG WENG/
Primary Examiner, Art Unit 2141
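The claim 1 mechanism at issue, identifying a state category and then selecting a category-specific reward formula, can be sketched roughly as follows. This is a minimal illustration only: the `State` fields, category names, and formulas are hypothetical, taken neither from the claims nor from Zhang.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical state data for a vehicle control target
# (illustrative field names, not from the application).
@dataclass
class State:
    position: float
    speed: float

# One distinct reward calculation formula per state category, mirroring the
# claimed "reward calculation formula selection circuitry" (formulas invented
# for illustration).
REWARD_FORMULAS: Dict[str, Callable[[State], float]] = {
    "cruising": lambda s: 1.0 - abs(s.speed - 25.0) / 25.0,
    "braking":  lambda s: -0.5 * s.speed,
    "stopped":  lambda s: 2.0 if s.speed == 0.0 else -1.0,
}

def identify_category(state: State) -> str:
    """State category identification: classify the raw state data."""
    if state.speed == 0.0:
        return "stopped"
    if state.speed < 5.0:
        return "braking"
    return "cruising"

def reward(state: State) -> float:
    """Reward generation: select the category-specific formula, then apply it."""
    category = identify_category(state)
    formula = REWARD_FORMULAS[category]  # selection step: formula differs per category
    return formula(state)
```

In terms of the claim language, the lookup `REWARD_FORMULAS[category]` is the selection step performed by the "reward calculation formula selection circuitry," and the call `formula(state)` is the "reward value calculation circuitry"; the anticipation question is whether Zhang's single reward network discloses that explicit per-category selection.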

Prosecution Timeline

Aug 25, 2023
Application Filed
Mar 01, 2026
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by the same examiner in similar technology

Patent 12602594
DIRECTED TRAJECTORIES THROUGH COMMUNICATION DECISION TREE USING ITERATIVE ARTIFICIAL INTELLIGENCE
Granted Apr 14, 2026 • 2y 5m to grant

Patent 12579468
TRAINING DATA SCREENING DEVICE, ROBOT SYSTEM, AND TRAINING DATA SCREENING METHOD
Granted Mar 17, 2026 • 2y 5m to grant

Patent 12572845
INTELLIGENT MACHINE-LEARNING MODEL CATALOG
Granted Mar 10, 2026 • 2y 5m to grant

Patent 12561608
APPARATUS AND METHODS FOR PREDICTING SLIPPING EVENTS FOR MICROMOBILITY VEHICLES
Granted Feb 24, 2026 • 2y 5m to grant

Patent 12555665
HOME EXERCISE PLAN PREDICTION
Granted Feb 17, 2026 • 2y 5m to grant
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
79%
Grant Probability
99%
With Interview (+23.1%)
3y 3m
Median Time to Grant
Low
PTA Risk
Based on 637 resolved cases by this examiner. Grant probability derived from career allow rate.
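The headline grant probability follows directly from the examiner's career record quoted above (506 granted of 637 resolved); a quick check:

```python
# Career record from the examiner intelligence data above.
granted, resolved = 506, 637

allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.1%}")  # 79.4%, displayed as 79%
```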
