Prosecution Insights
Last updated: April 19, 2026
Application No. 18/209,943

SAFE AGILE HAZARD AVOIDANCE SYSTEM FOR AUTONOMOUS VEHICLES

Status: Non-Final OA (§103)
Filed: Jun 14, 2023
Examiner: HERRERA, MICHAEL J
Art Unit: 3668
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Rutgers, The State University of New Jersey
OA Round: 2 (Non-Final)

Prediction summary:
Grant Probability: 59% (Moderate)
OA Rounds: 2-3
To Grant: 3y 5m
With Interview: 92%

Examiner Intelligence

Career Allow Rate: 59% (grants 59% of resolved cases; 42 granted / 71 resolved; +7.2% vs TC avg)
Interview Lift: strong, +33.2% (allowance rate in resolved cases with interview vs without)
Typical timeline: 3y 5m avg prosecution; 28 currently pending
Career history: 99 total applications, across all art units

Statute-Specific Performance

§101: 21.6% (-18.4% vs TC avg)
§103: 54.6% (+14.6% vs TC avg)
§102: 10.4% (-29.6% vs TC avg)
§112: 13.2% (-26.8% vs TC avg)

Tech Center averages are estimates; figures are based on career data from 71 resolved cases.
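The headline examiner statistics above can be sanity-checked with a few lines of arithmetic; this sketch assumes only the figures shown in the report (42 granted of 71 resolved, +7.2 points vs the Tech Center average), and the implied-average calculation is an inference, not a figure from the source.

```python
# Sanity check of the dashboard's headline examiner statistics.
granted = 42    # career allowances (from the report above)
resolved = 71   # resolved cases (from the report above)

allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.1%}")  # ~59.2%, displayed as 59%

# The report places the examiner 7.2 points above the Tech Center average,
# which implies a TC-average allow rate of roughly:
tc_avg = allow_rate - 0.072
print(f"Implied TC average: {tc_avg:.1%}")
```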

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

Claims 1-20 filed on 06/14/2023 have been examined. This Office Action is in response to the Applicant’s amendments and remarks filed on 08/21/2025. Claims 1 and 14 have been amended. Claims 1-20 are currently pending and addressed below.

Response to Remarks/Arguments

Applicant’s accompanying amendments and arguments, on page 7 of the Applicant Arguments/Remarks (hereinafter referred to as the “Remarks”), filed 08/21/2025, with respect to the rejection of independent claims 1 and 14, and their corresponding dependent claims, under 35 U.S.C. 112(b), stating “… claims 1 and 14 were rejected because of the multiple recitations of an autonomous vehicle assertedly caused a lack of clarity with respect to the vehicle associated with the dynamic model… As amended, the second an autonomous vehicle now refers to autonomous vehicles… a simulation that utilizes a dynamic model for the autonomous vehicle… the Applicant respectfully submits the rejection is overcome…” have been considered and are persuasive. Therefore, the Examiner has withdrawn the rejection of claims 1-20 under 35 U.S.C. 112(b).

Applicant’s accompanying amendments and arguments, on pages 7-10 of the Remarks, filed 08/21/2025, with respect to the rejection of independent claims 1 and 14, and their corresponding dependent claims, under 35 U.S.C. 103, stating “… Claim 1 recites, in part, inputting... a state of the autonomous vehicle into a constrained Markov decision processing (CMDP) model configured to output an action sequence to control the autonomous vehicle to perform the stunt maneuver. Claim 14 recites substantially similar features. Applicant respectfully submits that none of the art of record discloses or suggests inputting a state of an autonomous vehicle into a constrained Markov decision processing model, as claimed… Applicant respectfully submits that claims 1, 14, and each claim dependent therefrom is not rendered obvious by Hu, Wray, Hassani, Mueller, Li, Quirynen, Egbert, Schleede, Park, Harper, Yasui, Ekmark, Raichelgauz, Takekawa, or any combination thereof…” have been considered and are persuasive. However, upon further consideration, a new ground of rejection is made in view of Wachi et al. US 20230143937 A1 (“Wachi”) and Lu et al. US 20230113168 A1 (“Lu”).

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Hu et al. US 11093829 B2 (“Hu”) in view of Wachi et al. US 20230143937 A1 (“Wachi”), Lu et al. US 20230113168 A1 (“Lu”), and Hassani et al.
US 20220050524 A1 (“Hassani”).

For claim 1, Hu discloses a computer-implemented method for safe stunt maneuvering of an autonomous vehicle (See at least Col. 7 lines 29-41 – “… enables the controller to autonomously drive the vehicle around based on the CM3 policy network 140, and to make autonomous driving decisions… which should be made based on the training or the simulation…” and Col. 20 lines 65-67 to Col. 21 lines 1-11 of Hu – “… traffic simulator 1112 may utilize a reward function (R) which may be a function that evaluates a taken (e.g., simulated) action… if the simulated autonomous vehicle … becomes involved in a collision, the reward function may penalize the simulated action …”), comprising:

detecting, by one or more processors, a stimulus to initiate a stunt maneuver (See at least Col. 24 lines 9-10 of Hu – “a sensor that detects one or more other vehicles”);

inputting, by the one or more processors, a state of an autonomous vehicle into a Markov decision processing (MDP) model (See at least Col. 30 lines 56-67 to Col. 31 lines 1-7 of Hu – “… the simulator 108 may define a multi-agent Markov game with N number of agents … The Markov game may be defined by a set of states S describing possible configurations of all agents...”) configured to output an action sequence to control the autonomous vehicle to perform the stunt maneuver (See at least Col. 21 lines 20-26 of Hu – “… A trajectory may be a sequence of states and/or actions which include those states. A policy (π) or autonomous vehicle policy may be a strategy by which the action generator 1116 uses or employs to determine the next action for the autonomous vehicle based on the current state…”),

wherein the MDP model is trained by: obtaining a set of actions that, when executed by autonomous vehicles, implement the stunt maneuver (See at least Col. 19 lines 18-44 of Hu – “… the traffic simulator 1112 may be the simulator 108 of the system 100 for CM3 reinforcement of FIG. 1 … The deep Q-learning system may learn a mapping between states and Q-values associated with each potential action…” and Col. 24 lines 13-28 – “… action generator 1116 may explore a remaining set of actions from the set of possible actions and … determine the autonomous vehicle policy for one or more additional time intervals, such as until the autonomous vehicle reaches a terminal state… may store one or more of the explored set of actions associated with the one or more additional time intervals as one or more corresponding trajectories. As previously discussed, a trajectory may be a sequence of states and/or actions which include those states…”),

obtaining a dynamic model for the autonomous vehicle (See at least Col. 9 lines 61-65 of Hu – “… the simulator 108 may train the N number of agents to achieve one or more cooperative tasks or to achieve different goals … This may be a dynamic environment…”),

performing, using the dynamic model, a plurality of simulations of the stunt maneuver using the set of actions, wherein the MDP rewards simulations that result in successful performance of the stunt maneuver (See at least Col. 20 lines 65-67 to Col. 21 lines 1-11 of Hu – “… traffic simulator 1112 may utilize a reward function (R) which may be a function that evaluates a taken (e.g., simulated) action… reward function may be utilized to measure success or failure… if the simulated autonomous vehicle … becomes involved in a collision, the reward function may penalize the simulated action … the reward function may award rewards based on the fastest time or fastest route to the goal…”); and

applying, by the one or more processors, the action sequence to autonomous vehicle control systems to cause the autonomous vehicle to perform the stunt maneuver (See at least Col. 24 lines 29-38 of Hu – “… The action generator 1116 may explore the remaining set of actions from the set of possible actions … determine the autonomous vehicle policy based on the reward function… learn the autonomous vehicle policy… communicated to the vehicle 170, and implemented via the vehicle ECU 176 to facilitate autonomous driving…”).

Hu fails to specifically disclose inputting, by the one or more processors, a state of the autonomous vehicle into a constrained Markov decision processing (CMDP) model configured to output an action sequence to control the autonomous vehicle to perform the stunt maneuver. However, Wachi, in the same field of endeavor, teaches inputting, by the one or more processors, a state of the autonomous vehicle into a constrained Markov decision processing (CMDP) model configured to output an action sequence to control the autonomous vehicle to perform the stunt maneuver (See at least [0011] – “FIG. 4 is a block/flow diagram of a method for performing automated actions using a constrained Markov decision process (CMDP) model…” and [0036]-[0038] of Wachi – “… Referring now to FIG. 4, a method of using a trained machine learning system is shown… determines the current state s of the agent and the environment. Following the vehicular example above, the state s may include information about the vehicle 100… Block 404 then determines an action proposal… process may be repeated any number of times, until an action proposal a.sub.n passes the safety threshold test of block 406. When this occurs, block 410 performs the action a.sub.n within the environment…”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle using a Markov decision model with states representing agents/autonomous vehicles, while Wachi teaches a system that inputs a state of an autonomous vehicle into a constrained Markov decision process (CMDP) model to output an action to be performed. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of inputting, by the one or more processors, a state of the autonomous vehicle into a constrained Markov decision processing (CMDP) model configured to output an action sequence to control the autonomous vehicle to perform the stunt maneuver as taught by Wachi, with a reasonable expectation of success, in order to determine an action that passes a safety threshold as specified in at least [0038] of Wachi.

Furthermore, Hu also fails to specifically disclose wherein the CMDP model is trained by: obtaining a set of actions that, when executed by autonomous vehicles, implement the stunt maneuver, obtaining a dynamic model for the autonomous vehicle, performing, using the dynamic model, a plurality of simulations of the stunt maneuver using the set of actions, wherein the CMDP rewards simulations that result in successful performance of the stunt maneuver; and applying, by the one or more processors, the action sequence to autonomous vehicle control systems to cause the autonomous vehicle to perform the stunt maneuver.
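The Wachi passage cited above describes a propose-then-check loop: determine the current state, draw an action proposal from the trained system, test it against a safety threshold, repeat until a proposal passes, and only then execute. A minimal sketch of that loop follows; the toy policy, the linear safety-cost function, and the threshold value are all illustrative assumptions, not Wachi's implementation.

```python
import random

# Toy stand-in for a learned CMDP policy: maps a state to a proposed action.
# In a real system this would be a trained policy network; here it is random.
def propose_action(state):
    return {"steer": random.uniform(-1, 1), "throttle": random.uniform(0, 1)}

# Estimated safety cost of taking an action in a state. A CMDP bounds the
# expected cost separately instead of folding it into the reward.
def safety_cost(state, action):
    return abs(action["steer"]) * state["speed"] / 10.0

def select_safe_action(state, cost_threshold=1.0, max_tries=100):
    """Repeat action proposals until one passes the safety-threshold test."""
    for _ in range(max_tries):
        action = propose_action(state)
        if safety_cost(state, action) <= cost_threshold:
            return action  # first proposal satisfying the constraint
    raise RuntimeError("no safe action found within max_tries")

state = {"speed": 8.0}
action = select_safe_action(state)
assert safety_cost(state, action) <= 1.0
```

The design point the sketch illustrates is the one the rejection turns on: the constraint check is a gate on action execution, distinct from the MDP reward signal in Hu.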
However, Lu, in the same field of endeavor, teaches wherein the CMDP model is trained by:

obtaining a set of actions that, when executed by autonomous vehicles, implement the stunt maneuver (See at least [0007] – “… a reinforcement learning system includes a plurality of agents, each agent having an individual reward function and one or more safety constraints that involve joint actions of the agents, wherein each agent maximizes a team-average long-term return in performing the joint actions, subject to the safety constraints, and participates in operating a physical system… a distributed constrained Markov decision process (D-CMDP) model implemented over the peer-to-peer communication network and configured to perform policy optimization using a decentralized policy gradient (PG) method, wherein the participation of each agent in operating the physical system is based on the D-CMDP model…” and [0109] of Lu – “… each agent 104 corresponds to a vehicle in an autonomous vehicle system. Each vehicle is attempting to reach a destination subject to constraints…”),

obtaining a dynamic model for the autonomous vehicle (See at least [0123] of Lu – “… parameters of the individual reward function for each agent 104 are updated…”),

performing, using the dynamic model, a plurality of simulations of the stunt maneuver using the set of actions, wherein the CMDP rewards simulations that result in successful performance of the stunt maneuver (See at least [0007] – “… a reinforcement learning system includes a plurality of agents, each agent having an individual reward function and one or more safety constraints … wherein each agent maximizes a team-average long-term return in performing the joint actions… based on the D-CMDP model…”); and

applying, by the one or more processors, the action sequence to autonomous vehicle control systems to cause the autonomous vehicle to perform the stunt maneuver (See at least [0034] – “… an action (A) is … performed by the agent 104 in accordance with a state transition function…” and [0112] of Lu – “… each agent 104 maximizes a team-average long-term return in performing the joint actions… a distributed constrained Markov decision process (D-CMDP) model 154 implemented … and configured to perform policy optimization … wherein the participation of each agent 104 in operating the physical system 50 is based on the D-CMDP model…”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle using a Markov decision model with states representing agents/autonomous vehicles, while Lu teaches a multi-agent reinforcement learning system that is trained using a constrained Markov decision process model and a plurality of agents to maximize a team-average long-term return in performing the joint actions. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of the CMDP model being trained by obtaining a set of actions that, when executed by autonomous vehicles, implement the stunt maneuver as taught by Lu, with a reasonable expectation of success, in order to maximize a team-average long-term return in performing the joint actions as specified in at least [0007] of Lu.

Lastly, Hu also fails to specifically disclose obtaining a set of fuzzy instructions that indicate a set of actions. However, Hassani, in the same field of endeavor, teaches obtaining a set of fuzzy instructions that indicate a set of actions (See at least [0041] of Hassani – “The VCU 165 may execute vehicle control functions as fuzzy state instruction sets…”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle to determine routes to reach a destination or goal using a Markov decision model with states representing agents/autonomous vehicles, while Hassani teaches a vehicle control system that performs vehicle functions based on fuzzy instruction sets. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of obtaining a set of fuzzy instructions that indicate a set of actions as taught by Hassani, with a reasonable expectation of success, in order to execute vehicle control functions as fuzzy state instruction sets as specified in at least [0041] of Hassani.

For claim 14, Hu discloses a non-transitory computer-readable storage medium configured to store processor-executable instructions for safe stunt maneuvering of an autonomous vehicle that, when executed by one or more processors (See at least Col. 7 lines 29-41 – “… when the CM3 policy network 140 is stored on the storage device of the vehicle, this enables the controller to autonomously drive the vehicle around based on the CM3 policy network 140… according to the CM3 reinforcement learning which occurred within the simulator 108 …” and Fig. 1 of Hu – processors/controller and memory/storage device in system 100 and vehicle 170), cause the one or more processors to:

detect a stimulus to initiate a stunt maneuver (See at least Col. 24 lines 9-10 of Hu – “a sensor that detects one or more other vehicles”);

input a state of an autonomous vehicle into a Markov decision processing (MDP) model (See at least Col. 30 lines 56-67 to Col. 31 lines 1-7 of Hu – “… the simulator 108 may define a multi-agent Markov game with N number of agents … The Markov game may be defined by a set of states S describing possible configurations of all agents...”) configured to output an action sequence to control the autonomous vehicle to perform the stunt maneuver (See at least Col. 21 lines 20-26 of Hu – “… A trajectory may be a sequence of states and/or actions which include those states. A policy (π) or autonomous vehicle policy may be a strategy by which the action generator 1116 uses or employs to determine the next action for the autonomous vehicle based on the current state…”),

wherein the MDP model is trained by: obtaining a set of actions that, when executed by autonomous vehicles, implement the stunt maneuver (See at least Col. 19 lines 18-44 of Hu – “… the traffic simulator 1112 may be the simulator 108 of the system 100 for CM3 reinforcement of FIG. 1 … The deep Q-learning system may learn a mapping between states and Q-values associated with each potential action…” and Col. 24 lines 13-28 – “… action generator 1116 may explore a remaining set of actions from the set of possible actions and … determine the autonomous vehicle policy for one or more additional time intervals, such as until the autonomous vehicle reaches a terminal state… may store one or more of the explored set of actions associated with the one or more additional time intervals as one or more corresponding trajectories. As previously discussed, a trajectory may be a sequence of states and/or actions which include those states…”),

obtaining a dynamic model for the autonomous vehicle (See at least Col. 9 lines 61-65 of Hu – “… the simulator 108 may train the N number of agents to achieve one or more cooperative tasks or to achieve different goals … This may be a dynamic environment…”),

performing, using the dynamic model, a plurality of simulations of the stunt maneuver using the set of actions, wherein the MDP rewards simulations that result in successful performance of the stunt maneuver (See at least Col. 20 lines 65-67 to Col. 21 lines 1-11 of Hu – “… traffic simulator 1112 may utilize a reward function (R) which may be a function that evaluates a taken (e.g., simulated) action… reward function may be utilized to measure success or failure… if the simulated autonomous vehicle … becomes involved in a collision, the reward function may penalize the simulated action … the reward function may award rewards based on the fastest time or fastest route to the goal…”); and

apply the action sequence to autonomous vehicle control systems to cause the autonomous vehicle to perform the stunt maneuver (See at least Col. 24 lines 29-38 of Hu – “… The action generator 1116 may explore the remaining set of actions from the set of possible actions … determine the autonomous vehicle policy based on the reward function… learn the autonomous vehicle policy… communicated to the vehicle 170, and implemented via the vehicle ECU 176 to facilitate autonomous driving…”).

Hu fails to specifically disclose input a state of the autonomous vehicle into a constrained Markov decision processing (CMDP) model configured to output an action sequence to control the autonomous vehicle to perform the stunt maneuver. However, Wachi, in the same field of endeavor, teaches input a state of the autonomous vehicle into a constrained Markov decision processing (CMDP) model configured to output an action sequence to control the autonomous vehicle to perform the stunt maneuver (See at least [0011] – “FIG. 4 is a block/flow diagram of a method for performing automated actions using a constrained Markov decision process (CMDP) model…” and [0036]-[0038] of Wachi – “… Referring now to FIG. 4, a method of using a trained machine learning system is shown… determines the current state s of the agent and the environment. Following the vehicular example above, the state s may include information about the vehicle 100… Block 404 then determines an action proposal… process may be repeated any number of times, until an action proposal a.sub.n passes the safety threshold test of block 406. When this occurs, block 410 performs the action a.sub.n within the environment…”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle using a Markov decision model with states representing agents/autonomous vehicles, while Wachi teaches a system that inputs a state of an autonomous vehicle into a constrained Markov decision process (CMDP) model to output an action to be performed. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of inputting a state of the autonomous vehicle into a constrained Markov decision processing (CMDP) model configured to output an action sequence to control the autonomous vehicle to perform the stunt maneuver as taught by Wachi, with a reasonable expectation of success, in order to determine an action that passes a safety threshold as specified in at least [0038] of Wachi.
Furthermore, Hu also fails to specifically disclose wherein the CMDP model is trained by: obtaining a set of actions that, when executed by autonomous vehicles, implement the stunt maneuver, obtaining a dynamic model for the autonomous vehicle, performing, using the dynamic model, a plurality of simulations of the stunt maneuver using the set of actions, wherein the CMDP rewards simulations that result in successful performance of the stunt maneuver; and apply the action sequence to autonomous vehicle control systems to cause the autonomous vehicle to perform the stunt maneuver.

However, Lu, in the same field of endeavor, teaches wherein the CMDP model is trained by:

obtaining a set of actions that, when executed by autonomous vehicles, implement the stunt maneuver (See at least [0007] – “… a reinforcement learning system includes a plurality of agents, each agent having an individual reward function and one or more safety constraints that involve joint actions of the agents, wherein each agent maximizes a team-average long-term return in performing the joint actions, subject to the safety constraints, and participates in operating a physical system… a distributed constrained Markov decision process (D-CMDP) model implemented over the peer-to-peer communication network and configured to perform policy optimization using a decentralized policy gradient (PG) method, wherein the participation of each agent in operating the physical system is based on the D-CMDP model…” and [0109] of Lu – “… each agent 104 corresponds to a vehicle in an autonomous vehicle system. Each vehicle is attempting to reach a destination subject to constraints…”),

obtaining a dynamic model for the autonomous vehicle (See at least [0123] of Lu – “… parameters of the individual reward function for each agent 104 are updated…”),

performing, using the dynamic model, a plurality of simulations of the stunt maneuver using the set of actions, wherein the CMDP rewards simulations that result in successful performance of the stunt maneuver (See at least [0007] – “… a reinforcement learning system includes a plurality of agents, each agent having an individual reward function and one or more safety constraints … wherein each agent maximizes a team-average long-term return in performing the joint actions… based on the D-CMDP model…”); and

apply the action sequence to autonomous vehicle control systems to cause the autonomous vehicle to perform the stunt maneuver (See at least [0034] – “… an action (A) is … performed by the agent 104 in accordance with a state transition function…” and [0112] of Lu – “… each agent 104 maximizes a team-average long-term return in performing the joint actions… a distributed constrained Markov decision process (D-CMDP) model 154 implemented … and configured to perform policy optimization … wherein the participation of each agent 104 in operating the physical system 50 is based on the D-CMDP model…”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle using a Markov decision model with states representing agents/autonomous vehicles, while Lu teaches a multi-agent reinforcement learning system that is trained using a constrained Markov decision process model and a plurality of agents to maximize a team-average long-term return in performing the joint actions. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of the CMDP model being trained by obtaining a set of actions that, when executed by autonomous vehicles, implement the stunt maneuver as taught by Lu, with a reasonable expectation of success, in order to maximize a team-average long-term return in performing the joint actions as specified in at least [0007] of Lu.

Lastly, Hu also fails to specifically disclose obtaining a set of fuzzy instructions that indicate a set of actions. However, Hassani, in the same field of endeavor, teaches obtaining a set of fuzzy instructions that indicate a set of actions (See at least [0041] of Hassani – “The VCU 165 may execute vehicle control functions as fuzzy state instruction sets…”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle to determine routes to reach a destination or goal using a Markov decision model with states representing agents/autonomous vehicles, while Hassani teaches a vehicle control system that performs vehicle functions based on fuzzy instruction sets. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of obtaining a set of fuzzy instructions that indicate a set of actions as taught by Hassani, with a reasonable expectation of success, in order to execute vehicle control functions as fuzzy state instruction sets as specified in at least [0041] of Hassani.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Hu in view of Wachi, Lu, and Hassani, as applied to claim 1 above, and further in view of Mueller US 20200255060 A1 (“Mueller”).

For claim 2, Hu fails to specifically disclose wherein the stunt maneuver is a J-turn. However, Mueller, in the same field of endeavor, teaches wherein the stunt maneuver is a J-turn (See at least [0018] of Mueller – “… the actuator unit is provided to carry out at least one driving maneuver… a so-called “J-turn” maneuver … advantageously fully automatically …”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle to determine routes to reach a destination or goal using a Markov decision model with states representing agents/autonomous vehicles, while Mueller teaches testing of vehicle components when a J-turn is automatically performed on a vehicle. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of the stunt maneuver being a J-turn as taught by Mueller, with a reasonable expectation of success, in order to test the vehicle software of a driving dynamics control system as specified in at least [0038] of Mueller.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Hu in view of Wachi, Lu, and Hassani, as applied to claim 1 above, and further in view of Li et al. US 20210387650 A1 (“Li”).

For claim 3, Hu fails to specifically disclose wherein the set of fuzzy instructions are derived from a set of expert instructions. However, Li, in the same field of endeavor, teaches wherein the set of fuzzy instructions are derived from a set of expert instructions (See at least [0031] of Li – “The rules used by the rule-based algorithm may include one or more fuzzy rules as input or set by human experts”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle to determine routes to reach a destination or goal using a Markov decision model with states representing agents/autonomous vehicles, while Li teaches a vehicle control system that uses fuzzy rules set by human experts. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of the set of fuzzy instructions being derived from a set of expert instructions as taught by Li, with a reasonable expectation of success, in order to trigger operations for a vehicle based on a rule-based scenario classification algorithm as specified in at least [0031] of Li.

Claims 4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Hu in view of Wachi, Lu, and Hassani, as applied to claim 1 above, and further in view of Quirynen et al. US 20230022510 A1 (“Quirynen”).

For claim 4, Hu fails to specifically disclose wherein the dynamic model includes an uncertainty model that represents dynamic forces between tires of the autonomous vehicle and a surface traversed by the autonomous vehicle. However, Quirynen, in the same field of endeavor, teaches wherein the dynamic model includes an uncertainty model that represents dynamic forces between tires of the autonomous vehicle and a surface traversed by the autonomous vehicle (See at least Claim 26 of Quirynen – “The predictive controller … wherein a state of the vehicle includes … uncertainty in one or multiple parameter values indicative of friction between tires of the vehicle and a road surface…”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle to determine routes to reach a destination or goal using a Markov decision model with states representing agents/autonomous vehicles, while Quirynen teaches a predictive controller for a vehicle that takes into consideration uncertainties in vehicle dynamics. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of the dynamic model including an uncertainty model that represents dynamic forces between tires of the autonomous vehicle and a surface traversed by the autonomous vehicle as taught by Quirynen, with a reasonable expectation of success, in order to control the operation of a system with uncertainty as specified in at least [0068] of Quirynen.

For claim 15, Hu fails to specifically disclose wherein the dynamic model includes an uncertainty model that represents dynamic forces between tires of the autonomous vehicle and a surface traversed by the autonomous vehicle.
However, Quirynen, in the same field of endeavor teaches wherein the dynamic model includes an uncertainty model that represents dynamic forces between tires of the autonomous vehicle and a surface traversed by the autonomous vehicle (See at least Claim 26 of Quirynen – “The predictive controller … wherein a state of the vehicle includes … uncertainty in one or multiple parameter values indicative of friction between tires of the vehicle and a road surface…”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle to determine routes to reach a destination or goal using a Markov decision model with states representing agents/autonomous vehicles, while Quirynen teaches a predictive controller for a vehicle that takes into consideration uncertainties in vehicle dynamics.

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of the dynamic model including an uncertainty model that represents dynamic forces between tires of the autonomous vehicle and a surface traversed by the autonomous vehicle as taught by Quirynen, with a reasonable expectation of success, in order to control the operation of a system with uncertainty as specified in at least [0068] of Quirynen.

Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Hu in view of Wachi, Lu, Hassani, and Quirynen, as applied to claim 4 above, and further in view of Egbert et al. US 12162500 B1 (“Egbert”).

For claim 5, Hu fails to specifically disclose wherein performing the plurality of simulations comprises: performing, using an upper bound of uncertainty in the dynamic model, the plurality of simulations.
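The idea behind this limitation, running the rollouts at the bound of the model's uncertainty rather than at the nominal parameters so that a plan judged feasible remains feasible for any parameter value the model admits, can be sketched as follows. The closed-form stopping-distance model and all numbers are hypothetical illustrations, not the claimed simulation.

```python
# Hypothetical sketch of simulating at the bound of model uncertainty:
# every rollout uses the worst friction value the uncertainty interval
# admits (its pessimistic bound), so a maneuver judged feasible here
# stays feasible for any friction the model allows.
# (Illustrative only; not from the application, Hu, or Egbert.)

G = 9.81  # m/s^2

def stopping_distance(v0: float, mu: float) -> float:
    """Friction-limited stopping distance: v0^2 / (2 * mu * g)."""
    return v0 * v0 / (2.0 * mu * G)

def feasible_at_bound(v0: float, gap_m: float, mu_lo: float) -> bool:
    """Simulate the stop at the pessimistic bound of the friction interval."""
    return stopping_distance(v0, mu_lo) < gap_m
```

At 20 m/s with friction bounded below by 0.5, the worst-case stop needs roughly 41 m, so a 50 m gap passes the check and a 30 m gap does not.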
However, Egbert, in the same field of endeavor teaches wherein performing the plurality of simulations comprises: performing, using an upper bound of uncertainty in the dynamic model, the plurality of simulations (See at least Col. 2 lines 32-38 of Egbert – “The model can, for example, be configured to determine differences between a lateral distance and/or heading from a trajectory based on the historical data…the historical data can represent a maximum lateral error a… as the vehicle navigates in the environment (e.g., a real-world environment and/or a simulated environment…”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle to determine routes to reach a destination or goal using a Markov decision model with states representing agents/autonomous vehicles, while Egbert teaches a vehicle control system with a model that takes into consideration maximum uncertainties in simulations.

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of performing, using an upper bound of uncertainty in the dynamic model, the plurality of simulations as taught by Egbert, with a reasonable expectation of success, in order to avoid limiting the progress of a vehicle in an environment due to uncertainties in the simulations as specified in at least Col. 3 lines 1-35 of Egbert.

For claim 16, Hu fails to specifically disclose wherein performing the plurality of simulations comprises: performing, using an upper bound of uncertainty in the dynamic model, the plurality of simulations.
However, Egbert, in the same field of endeavor teaches wherein performing the plurality of simulations comprises: performing, using an upper bound of uncertainty in the dynamic model, the plurality of simulations (See at least Col. 2 lines 32-38 of Egbert – “The model can, for example, be configured to determine differences between a lateral distance and/or heading from a trajectory based on the historical data…the historical data can represent a maximum lateral error a… as the vehicle navigates in the environment (e.g., a real-world environment and/or a simulated environment…”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle to determine routes to reach a destination or goal using a Markov decision model with states representing agents/autonomous vehicles, while Egbert teaches a vehicle control system with a model that takes into consideration maximum uncertainties in simulations.

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of performing, using an upper bound of uncertainty in the dynamic model, the plurality of simulations as taught by Egbert, with a reasonable expectation of success, in order to avoid limiting the progress of a vehicle in an environment due to uncertainties in the simulations as specified in at least Col. 3 lines 1-35 of Egbert.

Claims 6, 9, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hu in view of Wachi, Lu, and Hassani, as applied to claim 1 above, and further in view of Schleede et al. US 12139133 B1 (“Schleede”).

For claim 6, Hu discloses wherein the fuzzy instructions constrain a predicted trajectory of the autonomous vehicle reflected by the output action sequence of the CMDP (See at least Col. 20 lines 65-67 to Col. 21 lines 1-11 of Hu – “… traffic simulator 1112 may utilize a reward function (R) which may be a function that evaluates a taken (e.g., simulated) action … the reward function may award rewards based on the … fastest route to the goal…”); and inputting, by the one or more processors, an indication of the safe zone to the CMDP (See at least Col. 20 lines 65-67 to Col. 21 lines 1-11 of Hu – “… The rewards provided by the reward function enables reinforcement learning to occur based on a given goal (e.g., reach an exit ramp…”).

Hu fails to specifically disclose further comprising: analyzing, by the one or more processors, data representative of an environment along a direction of travel for the autonomous vehicle to identify a safe zone.

However, Schleede, in the same field of endeavor teaches further comprising: analyzing, by the one or more processors, data representative of an environment along a direction of travel for the autonomous vehicle to identify a safe zone (See at least Col. 3 lines 18-25 of Schleede – “the model may be trained with input data (e.g., weak label data) that is received over time and/or represents data collected over a time period in which a vehicle navigates in an environment…. output data identifying areas or regions in the environment that represent a good region (e.g., a safe region to navigate) and/or a bad region (e.g., a less safe region to navigate…”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle to determine routes to reach a destination or goal using a Markov decision model with states representing agents/autonomous vehicles, while Schleede teaches a machine learning system for an autonomous vehicle that takes inputs of the environments travelled by the autonomous vehicle to identify safe regions to navigate to.
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of analyzing data representative of an environment along a direction of travel for the autonomous vehicle to identify a safe zone as taught by Schleede, with a reasonable expectation of success, in order for a model to determine where good vehicle behavior is likely and navigate accordingly as specified in at least Col. 3 lines 18-36 of Schleede.

For claim 9, Hu discloses wherein the CMDP is configured to assign a discount to simulations where the autonomous vehicle does not remain within the safe zone while performing the stunt maneuver (See at least Col. 20 lines 65-67 to Col. 21 lines 1-11 of Hu – “… if the simulated autonomous vehicle misses a goal (e.g., desired destination) or becomes involved in a collision, the reward function may penalize the simulated action…”).

For claim 20, Hu discloses wherein the fuzzy instructions constrain a predicted trajectory of the autonomous vehicle reflected by the output action sequence of the CMDP (See at least Col. 20 lines 65-67 to Col. 21 lines 1-11 of Hu – “… traffic simulator 1112 may utilize a reward function (R) which may be a function that evaluates a taken (e.g., simulated) action … the reward function may award rewards based on the … fastest route to the goal…”); and input an indication of the safe zone to the CMDP (See at least Col. 20 lines 65-67 to Col. 21 lines 1-11 of Hu – “… The rewards provided by the reward function enables reinforcement learning to occur based on a given goal (e.g., reach an exit ramp…”).

Hu fails to specifically disclose wherein the instructions, when executed, cause the one or more processors to: analyze data representative of an environment along a direction of travel for the autonomous vehicle to identify a safe zone.
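As a purely illustrative picture of this kind of safe-zone analysis, consider scanning an occupancy map along the direction of travel for a contiguous free span and appending its bounds to the state fed to the planner. The 1-D grid, the function names, and the state layout are invented for the example and are not from the application or Schleede.

```python
# Hypothetical sketch: identify a contiguous obstacle-free span ahead of
# the vehicle from a 1-D occupancy scan, then append its bounds to the
# state vector handed to the planner.

def find_safe_zone(occupancy: list) -> tuple:
    """Return (start, end) indices of the first free run along the travel direction."""
    start = None
    for i, occupied in enumerate(occupancy):
        if not occupied and start is None:
            start = i                      # free run begins
        elif occupied and start is not None:
            return (start, i - 1)          # free run ends at the obstacle
    return (start, len(occupancy) - 1) if start is not None else (None, None)

def augment_state(vehicle_state: list, zone: tuple) -> list:
    """Concatenate the safe-zone bounds onto the planner's state input."""
    return list(vehicle_state) + list(zone)
```

In practice the environment representation would be richer (2-D grids, learned region labels as in the Schleede quotation), but the shape of the computation is the same: environment data in, zone indication out, zone indication into the decision model.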
However, Schleede, in the same field of endeavor teaches wherein the instructions, when executed, cause the one or more processors to: analyze data representative of an environment along a direction of travel for the autonomous vehicle to identify a safe zone (See at least Col. 3 lines 18-25 of Schleede – “the model may be trained with input data (e.g., weak label data) that is received over time and/or represents data collected over a time period in which a vehicle navigates in an environment…. output data identifying areas or regions in the environment that represent a good region (e.g., a safe region to navigate) and/or a bad region (e.g., a less safe region to navigate…”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle to determine routes to reach a destination or goal using a Markov decision model with states representing agents/autonomous vehicles, while Schleede teaches a machine learning system for an autonomous vehicle that takes inputs of the environments travelled by the autonomous vehicle to identify safe regions to navigate to.

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of analyzing data representative of an environment along a direction of travel for the autonomous vehicle to identify a safe zone as taught by Schleede, with a reasonable expectation of success, in order for a model to determine where good vehicle behavior is likely and navigate accordingly as specified in at least Col. 3 lines 18-36 of Schleede.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Hu in view of Wachi, Lu, Hassani, and Schleede, as applied to claim 6 above, and further in view of Park et al. US 20230406354 A1 (“Park”).
For claim 7, Hu fails to specifically disclose wherein the safe zone is indicative of a width of a road traversed by the autonomous vehicle.

However, Park, in the same field of endeavor teaches wherein the safe zone is indicative of a width of a road traversed by the autonomous vehicle (See at least [0014] of Park – “… the searching for the safety zones may include determining a shoulder as a safety zone based on a width of the shoulder …a maximum allowable width of invasion into a driving lane…”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle to determine routes to reach a destination or goal using a Markov decision model with states representing agents/autonomous vehicles, while Park teaches a safety zone identification system for a vehicle.

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of the safe zone being indicative of a width of a road traversed by the autonomous vehicle as taught by Park, with a reasonable expectation of success, in order to identify a safety zone for a vehicle based on the width of the vehicle as specified in at least [0014] of Park.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Hu in view of Wachi, Lu, Hassani, and Schleede, as applied to claim 6 above, and further in view of Harper et al. US 20200156538 A1 (“Harper”).

For claim 8, Hu fails to specifically disclose wherein the safe zone is indicative of hazard.

However, Harper, in the same field of endeavor teaches wherein the safe zone is indicative of hazard (See at least [0004] of Harper – “FIG. 2 is an illustration of an autonomous vehicle in an environment in which a dynamic sound emission system of the autonomous vehicle may identify a hazard within a safety zone and activate a warning sound and/or control the autonomous vehicle to avoid the hazard”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle to determine routes to reach a destination or goal using a Markov decision model with states representing agents/autonomous vehicles, while Harper teaches identification of a hazard within a safety zone for an autonomous vehicle.

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of the safe zone being indicative of hazard as taught by Harper, with a reasonable expectation of success, in order to activate a warning sound and/or control the autonomous vehicle to avoid the hazard as specified in at least [0004] of Harper.

Claims 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Hu in view of Wachi, Lu, and Hassani, as applied to claim 1 above, and further in view of Yasui et al. US 20210302982 A1 (“Yasui”).

For claim 10, Hu fails to specifically disclose wherein the CMDP is configured to reward outputs based upon at least one of an amount of kinetic energy lost while performing the stunt maneuver and an amount of error in autonomous vehicle orientation while performing the stunt maneuver.
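The reward structure this limitation describes, penalizing kinetic energy lost and heading error during the maneuver, can be sketched as a simple weighted cost. The weights, the state fields, and the function itself are hypothetical illustrations, not the claimed reward or Yasui's reward function 340.

```python
import math

# Hypothetical sketch of a reward that penalizes (a) kinetic energy lost
# during a maneuver and (b) error between achieved and target heading.
# Weights and units are invented for illustration.

def reward(mass_kg: float, v_before: float, v_after: float,
           heading_rad: float, target_heading_rad: float,
           w_energy: float = 1e-4, w_heading: float = 1.0) -> float:
    """Higher is better: zero when no energy is lost and the heading is exact."""
    ke_lost = 0.5 * mass_kg * max(0.0, v_before ** 2 - v_after ** 2)
    # Wrap the heading difference into (-pi, pi] before taking its magnitude.
    heading_err = abs(math.atan2(math.sin(heading_rad - target_heading_rad),
                                 math.cos(heading_rad - target_heading_rad)))
    return -(w_energy * ke_lost + w_heading * heading_err)
```

A maneuver that conserves speed and hits the target orientation scores 0; shedding speed or missing the heading pushes the reward negative, which is the same direction of penalty as the Yasui passage quoted below (negative reward for excessive acceleration/deceleration or turning).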
However, Yasui, in the same field of endeavor teaches wherein the CMDP is configured to reward outputs based upon at least one of an amount of kinetic energy lost while performing the stunt maneuver and an amount of error in autonomous vehicle orientation while performing the stunt maneuver (See at least [0066] of Yasui – “… the reward function 340 is set to output …a reward having a negative value when an acceleration/deceleration or degree of turning (angular velocity) obtained from the target trajectory is higher…”).

Thus, Hu discloses a decision making system for autonomous vehicle operation that simulates and awards/penalizes actions to be performed by an autonomous vehicle to determine routes to reach a destination or goal using a Markov decision model with states representing agents/autonomous vehicles, while Yasui teaches a system that assigns rewards to operations of a vehicle based on a degree of change in speed of the vehicle.

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the computer-implemented method and non-transitory computer-readable storage medium as disclosed in Hu to include the feature of the CMDP being configured to reward outputs based upon at least one of an amount of kinetic energy lost while performing the stunt maneuver and an amount of error in autonomous vehicle orientation while performing the stunt maneuver as taught by Yasui, with a reasonable expectation of success, in order to negatively reward a vehicle operation in which the deceleration is excessive, as specified in at least [0066] of Yasui.

For claim 19, Hu fails to specifically disclose wherein the CMDP is configured to reward outputs based upon at least one of an amount of kinetic energy lost while performing the stunt maneuver and an amount of error in autonomous vehicle orientation while performing the stunt maneuver.
However, Yasui, in the same field of endeavor teaches wherein the CMDP is configured to reward outputs based upon at least one of an amount of kinetic energy lost while performing the stunt maneuver and an amount of error in autonomous vehicle orientation while performing the stunt maneuver (See at least [0066] of Yasui – “… the reward function 340 is set to output …a reward having a negative value when an acceleration/deceleration or degree of turning (angular velocity) obtained from the target trajectory i
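The decision-making pattern the rejection repeatedly attributes to Hu (simulate candidate action sequences, reward progress toward the goal, penalize leaving the safe zone or colliding, then act on the best-scoring sequence) can be condensed into a toy sketch. The 1-D world, the penalty values, and the exhaustive search over a short horizon are all invented for illustration; Hu's actual system is a reinforcement-learning traffic simulator, not this brute-force loop.

```python
# Toy sketch of the simulate-and-score loop described for Hu: each
# candidate action sequence is rolled out, rewarded for ending near the
# goal, and penalized (discounted) whenever it exits the safe zone.
from itertools import product

GOAL = 3
SAFE_ZONE = range(0, 5)                    # positions on a 1-D road

def rollout_score(start: int, actions: tuple) -> float:
    pos, score = start, 0.0
    for a in actions:                      # a is -1 (back), 0 (hold), +1 (forward)
        pos += a
        if pos not in SAFE_ZONE:
            score -= 10.0                  # penalty for leaving the safe zone
    score -= abs(GOAL - pos)               # shaping: distance to the goal
    return score

def best_sequence(start: int, horizon: int = 3) -> tuple:
    """Exhaustively score every action sequence and keep the best."""
    return max(product((-1, 0, 1), repeat=horizon),
               key=lambda seq: rollout_score(start, seq))
```

Starting at position 0, the only three-step sequence that reaches the goal without ever leaving the safe zone is three forward moves, and it scores 0 (no penalties, no residual distance).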

Prosecution Timeline

Jun 14, 2023
Application Filed
Apr 19, 2025
Non-Final Rejection — §103
Aug 21, 2025
Response Filed
Dec 05, 2025
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12583563: SHIP DOCKING ASSISTING APPARATUS AND SHIP DOCKING ASSISTING METHOD
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12585281: METHOD AND SYSTEM FOR AUTONOMOUS DRIVING OF A VEHICLE
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12579899: METHOD FOR SELECTING AT LEAST ONE SATELLITE NAVIGATION SERVICE PROVIDER AND ASSOCIATED SELECTION SYSTEM
Granted Mar 17, 2026 (2y 5m to grant)

Patent 12566073: OPERATION MANAGEMENT APPARATUS
Granted Mar 03, 2026 (2y 5m to grant)

Patent 12540821: METHOD FOR ASSISTING WITH THE NAVIGATION OF A VEHICLE
Granted Feb 03, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

2-3
Expected OA Rounds
59%
Grant Probability
92%
With Interview (+33.2%)
3y 5m
Median Time to Grant
Moderate
PTA Risk
Based on 71 resolved cases by this examiner. Grant probability derived from career allow rate.
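The headline figures above reduce to simple arithmetic on the stated career data, which can be reproduced directly. One assumption is made explicit here: the "with interview" figure is treated as the base allow rate plus the additive interview lift, which matches the displayed numbers but is an inference about how the tool combines them.

```python
# Reproducing the dashboard arithmetic from the stated career data.
# Assumption (not stated by the tool): the "with interview" probability
# is the career allow rate plus the additive interview lift.

granted, resolved = 42, 71
allow_rate = granted / resolved            # career allow rate
interview_lift = 0.332                     # stated lift for interviewed cases
with_interview = min(1.0, allow_rate + interview_lift)

print(f"{allow_rate:.0%}")                 # 59%
print(f"{with_interview:.0%}")             # 92%
```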
