Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

This action is responsive to the following communication: Non-Provisional Application filed Aug. 25, 2023. Claims 1-9 are pending in the case. Claims 1 and 7-9 are independent claims.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-6 and 8-9 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by “Human-Like Autonomous Vehicle Speed Control by Deep Reinforcement Learning with Double Q-Learning,” Zhang et al. (hereinafter Zhang), 2018.

With respect to independent claim 1, Zhang teaches a control device comprising: state data acquisition circuitry to acquire state data indicating a state of a control target (see, e.g., Pages 2-3); state category identification circuitry to identify a state category to which a state indicated by the state data belongs among a plurality of state categories indicating classifications of states of the control target on the basis of the state data (see, e.g., Fig. 1 and Page 2 – “As shown in Fig. 1, the general agent-environment interaction modeling (of both traditional RL and the emerging DRL) consists of an agent, an environment, a finite state space S, a set of available actions A, and a reward function: S×A → R. The decision maker is called the agent, and should be trained as the interaction system runs. The agent needs to interact with the outside, which is called the environment.
The interaction between the agent and the environment is a continual process. At each decision epoch k, the agent will make” It is implicit that the state categories are different and that they would have to be identified and classified); reward generation circuitry to calculate a reward value of a control detail for the control target on the basis of the state category and the state data (see, e.g., Pages 2 and 5 – “Reward network: Reward is necessary in almost all reinforcement learning algorithms offering the goal of the reinforcement learning agent. The reward estimates how good the agent performs an action in a given state (or what are the good or bad things for the agent). In this paper, we design a reward network to map each state to a scalar,”); and control learning circuitry to learn the control detail on the basis of the state data and the reward value, wherein the reward generation circuitry includes reward calculation formula selection circuitry to select a reward calculation formula different for each of the plurality of state categories on the basis of the inputted state category (see, e.g., Page 5 – the reward function is based on a reward network, where a corresponding reward value is generated in response to the state of the control target; the reward network selects a different reward calculation formula (e.g., it may select +2 for (x, a) ∈ {C−}) based on one of several state categories (e.g., for (x, a) ∈ {C−})), and reward value calculation circuitry to calculate the reward value using the reward calculation formula selected by the reward calculation formula selection circuitry (see, e.g., Page 5).

With respect to dependent claim 2, Zhang teaches training data generation circuitry to generate training data in which the state data and the control detail are associated with each other (see, e.g., Page 3 – Zhang teaches that the model is trained using correlation (i.e.
control) association between state and action (i.e. control) data – "DNN to derive the correlation between each state-action pair (s, a) of the system under control"; "DNN ... trained").

With respect to dependent claim 3, Zhang teaches the control target is a vehicle, and the state data acquisition circuitry acquires vehicle state data including a position and a speed of the vehicle as the state data (see Fig. 1 and Page 2).

With respect to dependent claim 4, Zhang teaches the control target is a vehicle, and the state data acquisition circuitry acquires vehicle state data including a position and a speed of the vehicle as the state data (see, e.g., Page 2, col. 1 – "learning the state-action pairs in a supervised fashion").

With respect to dependent claim 5, Zhang teaches the control target is a character of a computer game, and the state data acquisition circuitry acquires character state data including a position of the character as the state data (see, e.g., Page 2, col. 1).

With respect to dependent claim 6, Zhang teaches the control target is a character of a computer game, and the state data acquisition circuitry acquires character state data including a position of the character as the state data (see, e.g., Page 2, col. 1).

Claim 8 is rejected for similar reasons as discussed above with respect to claim 1. Claim 9 is rejected for similar reasons as discussed above with respect to claim 1.

Claim Rejections - 35 USC § 103

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of “Reinforcement learning is supervised learning on optimized data,” Eysenbach et al. (hereinafter Eysenbach), 2020.
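The rejections above map the claimed reward calculation formula selection circuitry onto Zhang's reward network, which assigns a different reward rule depending on the state category (e.g., a constant +2 for states in one category). As a minimal illustrative sketch only — the categories, thresholds, and formulas below are hypothetical and are not taken from Zhang or the claims — the selection logic at issue can be pictured as:

```python
# Illustrative sketch of per-category reward formula selection
# (hypothetical categories and formulas; not drawn from Zhang or the claims).

def identify_state_category(state):
    """Classify a vehicle state (position, speed) into a coarse category."""
    position, speed = state
    if speed > 30.0:
        return "overspeed"
    if speed < 5.0:
        return "crawling"
    return "nominal"

# One reward calculation formula per state category.
REWARD_FORMULAS = {
    "overspeed": lambda pos, spd: -2.0 * (spd - 30.0),  # penalize excess speed
    "crawling":  lambda pos, spd: -1.0,                 # flat penalty
    "nominal":   lambda pos, spd: +2.0,                 # constant bonus
}

def reward(state):
    """Select the formula for the identified category, then evaluate it."""
    category = identify_state_category(state)
    formula = REWARD_FORMULAS[category]
    return formula(*state)

print(reward((100.0, 40.0)))  # overspeed: -2.0 * (40.0 - 30.0) = -20.0
print(reward((100.0, 20.0)))  # nominal: +2.0
```

The point of the sketch is only the two-step structure the claims recite: first identify the state category, then select and evaluate the formula bound to that category.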
With respect to independent claim 7, Zhang teaches a control system comprising: state data acquisition circuitry to acquire state data indicating a state of a control target (see, e.g., Pages 2-3); state category identification circuitry to identify a state category to which a state indicated by the state data belongs among a plurality of state categories indicating classifications of states of the control target on the basis of the state data (see, e.g., Fig. 1 and Page 2 – “As shown in Fig. 1, the general agent-environment interaction modeling (of both traditional RL and the emerging DRL) consists of an agent, an environment, a finite state space S, a set of available actions A, and a reward function: S×A → R. The decision maker is called the agent, and should be trained as the interaction system runs. The agent needs to interact with the outside, which is called the environment. The interaction between the agent and the environment is a continual process. At each decision epoch k, the agent will make” It is implicit that the state categories are different and that they would have to be identified and classified); reward generation circuitry to calculate a reward value of a control detail for the control target on the basis of the state category and the state data; control learning circuitry to learn the control detail on the basis of the state data and the reward value (see, e.g., Pages 2 and 5); training data generation circuitry to generate training data in which the state data and the control detail are associated with each other (see, e.g., Page 3 – Zhang teaches that the model is trained using correlation (i.e. association) between state and action (i.e. control) data – "DNN to derive the correlation between each state-action pair (s, a) of the system under control"; "DNN ...
trained"); wherein the reward generation circuitry includes reward calculation formula selection circuitry to select a reward calculation formula different for each of the plurality of state categories on the basis of the inputted state category (see, e.g., Page 5 – the reward function is based on a reward network, where a corresponding reward value is generated in response to the state of the control target; the reward network selects a different reward calculation formula (e.g., it may select +2 for (x, a) ∈ {C−}) based on one of several state categories (e.g., for (x, a) ∈ {C−})), and reward value calculation circuitry to calculate the reward value using the reward calculation formula selected by the reward calculation formula selection circuitry (see, e.g., Page 5).

Zhang does not expressly show supervised learning circuitry to generate a supervised learned model for inferring the control detail from the state data on the basis of the training data generated by the training data generation circuitry, and action inference circuitry to infer the control detail using the supervised learned model. However, Zhang expressly indicates that direct supervised learning of state-action pairs would be obvious to consider (see, e.g., Page 2 – “In reference to both Double Q-learning and DQN, we refer to the resulting learning algorithm as Double DQN. In this paper, we use double DQN to build the vehicle speed model. By approximating this function rather than directly learning the state-action pairs in a supervised fashion, one can handle new scenarios better.”). Further, Eysenbach teaches a similar feature (see, e.g., Pages 2-4). Both Zhang and Eysenbach are directed to control and reward systems. Accordingly, it would have been obvious to the skilled artisan before the effective filing date of the claimed invention, having Zhang and Eysenbach in front of them, to modify the system of Zhang to include the above feature.
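The claim 7 discussion concerns two pieces Zhang does not expressly show: fitting a supervised model to the generated (state, control detail) training pairs, and then using that model to infer a control detail for a new state. As a hedged sketch only — the data, class names, and the 1-nearest-neighbor stand-in for the learned model are hypothetical, not taken from Zhang or Eysenbach — that pipeline looks like:

```python
# Illustrative sketch: supervised learning over (state, action) pairs
# (hypothetical data; a 1-nearest-neighbor lookup stands in for the model).

def generate_training_data(episodes):
    """Associate each observed state with the control detail taken there."""
    return [(state, action) for episode in episodes for state, action in episode]

class NearestNeighborPolicy:
    """Infer the control detail for a new state from the closest stored state."""
    def fit(self, pairs):
        self.pairs = pairs
        return self

    def infer(self, state):
        return min(self.pairs, key=lambda p: abs(p[0] - state))[1]

episodes = [
    [(0.0, "accelerate"), (10.0, "hold")],
    [(25.0, "brake")],
]
policy = NearestNeighborPolicy().fit(generate_training_data(episodes))
print(policy.infer(9.0))   # closest stored state is 10.0 -> "hold"
print(policy.infer(30.0))  # closest stored state is 25.0 -> "brake"
```

The sketch shows only the claimed structure: training data generation circuitry (the pair builder), supervised learning circuitry (`fit`), and action inference circuitry (`infer`).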
The motivation to combine Zhang and Eysenbach comes from Eysenbach. Eysenbach discloses the motivation to implement direct supervised learning to optimize performance of the system (see, e.g., Pages 2-4).

It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). Further, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments. Merck & Co. v. Biocraft Laboratories, 874 F.2d 804, 10 USPQ2d 1843 (Fed. Cir.), cert. denied, 493 U.S. 975 (1989). See also Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); Celeritas Technologies Ltd. v. Rockwell International Corp., 150 F.3d 1354, 1361, 47 USPQ2d 1516, 1522-23 (Fed. Cir. 1998).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PEIYONG WENG, whose telephone number is (571) 270-1660. The examiner can normally be reached Mon.-Fri., 8 am to 5 pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Matthew Ell, can be reached at (571) 270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.
Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/PEI YONG WENG/
Primary Examiner, Art Unit 2141