Prosecution Insights
Last updated: April 19, 2026
Application No. 17/996,143

TACTICAL DECISION-MAKING THROUGH REINFORCEMENT LEARNING WITH UNCERTAINTY ESTIMATION

Final Rejection (§103)
Filed: Oct 13, 2022
Examiner: LINHARDT, LAURA E
Art Unit: 3663
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Volvo Autonomous Solutions AB
OA Round: 4 (Final)
Grant Probability: 70% (Favorable)
OA Rounds: 5-6
To Grant: 3y 1m
With Interview: 92%

Examiner Intelligence

Career Allow Rate: 70%, above average (155 granted / 223 resolved; +17.5% vs TC avg)
Interview Lift: +22.7%, strong (resolved cases with interview)
Typical Timeline: 3y 1m avg prosecution (51 currently pending)
Career History: 274 total applications, across all art units

Statute-Specific Performance

§101: 5.4% (-34.6% vs TC avg)
§103: 72.8% (+32.8% vs TC avg)
§102: 5.4% (-34.6% vs TC avg)
§112: 14.4% (-25.6% vs TC avg)

TC averages are estimates • Based on career data from 223 resolved cases

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

Claims 1-14 and 16-18 are pending in this application. Claim 16 is amended. Claim 15 is cancelled. Claim 18 is newly added. Claims 1-14 and 16-18 are presented for examination.

Response to Amendments

Claim Objections: Applicant's amendments, filed 15 December 2025, with respect to the objection to claim 16 have been considered, and the claim objection has been resolved.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-3, 6-9, 12-14, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US Publication 2019/0250568 A1) in view of Halder (Foreign Reference CA3096413A1).

Regarding claim 1, Li teaches a method of controlling actuators in an autonomous vehicle using a reinforcement learning, RL, agent, the method comprising: a plurality of training sessions, in which the RL agent interacts with a simulated or real-world environment including the autonomous vehicle (Li: Para. 21, 23, 28; the learning agent is initially trained to control the (real or virtual) object and/or system within the (real, augmented, or virtual) environment; in each training iteration of the training of the learning agent, the learning agent generates a learning (or exploratory) signal and the supervisor agent generates a supervisor signal; a reinforcement learning (RL) framework is employed for training; tasks, such as controlling a drone or other autonomous vehicle), …; decision-making, in which the RL agent outputs at least one tentative driving decision relating to control of actuators in the autonomous vehicle (Li: Para. 23, 27; reinforcement learning; the agent "learns" to select actions, based on sensed or provided current locations within the state space, that tend to increase the expected value of the cumulative reward across the performance of the task).

Li does not explicitly teach: such that in each training session the environment is identical but has a different initial value and yields a state-action value function Qk(s, a) dependent on state and action …; wherein the decision-making is based on a common state-action value function Q(s, a) obtained by combining the state-action value functions Qk(s, a) from the training sessions; estimating an uncertainty by computing a variability measure for values of the plurality of state-action value functions evaluated for a state-action pair corresponding to the tentative driving decisions; and vehicle control, wherein the at least one tentative driving decision is executed only if the estimated uncertainty is less than a predefined threshold.

However, Halder, in the same field of endeavor, teaches such that in each training session the environment is identical but has a different initial value and yields a state-action value function Qk(s, a) dependent on state and action (Halder: Para. 80, 168-169; the CNN may be trained using labeled training data comprising sample images of a vehicle's environment; for a given state and a given action, the environment predicts what the next state will be and the next reward; the RL agent can be trained in a simulation environment) …; wherein the decision-making is based on a common state-action value function Q(s, a) obtained by combining the state-action value functions Qk(s, a) from the training sessions (Halder: Para. 123; iteratively training the AI model using a portion of the training dataset and then validating the trained model using another portion of the training dataset); estimating an uncertainty by computing a variability measure for values of the plurality of state-action value functions evaluated for a state-action pair corresponding to the tentative driving decisions (Halder: Para. 135; the autonomous vehicle management system generates a score … indicative of how similar the inferring data points received in 606 are to the training data obtained in 602 and used to train the model); and vehicle control, wherein the at least one tentative driving decision is executed only if the estimated uncertainty is less than a predefined threshold (Halder: Para. 9; a score indicative of a low degree of similarity may be generated where the inferring data point is different from or dissimilar to the training data set; in instances where the score for a certain inferring data point is low, which indicates a high measure of dissimilarity, the prediction made by the AI model based upon that inferring data point may be overridden or not used by the autonomous vehicle management system).
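In algorithmic terms, the independent claims recite an ensemble of K state-action value functions, one per training session, combined into a common Q(s, a) and gated by a variability measure. A minimal sketch of that pipeline, assuming a discrete action set, the ensemble mean as the combination, and the ensemble variance as the variability measure (all names and interfaces are illustrative, not taken from the application):

```python
import numpy as np

def select_action(q_ensemble, state, threshold, fallback_action):
    """Uncertainty-gated action selection over K per-session value functions.

    q_ensemble: K callables, each mapping a state to a vector of action
    values Qk(s, .) -- an assumed interface for this sketch only.
    """
    values = np.stack([qk(state) for qk in q_ensemble])  # shape (K, num_actions)

    # Common state-action value function Q(s, a): here the ensemble mean,
    # one possible central tendency of the Qk(s, a) (cf. claim 11).
    q_common = values.mean(axis=0)

    # Evaluate tentative driving decisions sequentially, best first (cf. claim 3).
    for action in np.argsort(q_common)[::-1]:
        # Variability measure across the K estimates for this state-action pair.
        uncertainty = values[:, action].var()
        # Execute only if the estimated uncertainty is below the threshold.
        if uncertainty < threshold:
            return int(action)

    # No tentative decision passed the gate: use a fallback (cf. claim 4).
    return fallback_action
```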
It would have been obvious to one having ordinary skill in the art to modify the training of learning and pioneer agents with state-action pairs in Li (Li: Para. 23, 33) with the trained convolutional neural network (Halder: Para. 80), with a reasonable expectation of success, because pre-training a reinforcement agent in a simulation environment improves action decisions during the runtime phase (Halder: Para. 168-169).

Regarding claim 2, Li does not explicitly teach wherein each of said at least one tentative driving decision is executed only if the estimated uncertainty is less than a predefined threshold C. However, Halder, in the same field of endeavor, teaches wherein each of said at least one tentative driving decision is executed only if the estimated uncertainty is less than a predefined threshold C (Halder: Para. 9; a score indicative of a low degree of similarity may be generated where the inferring data point is different from or dissimilar to the training data set; in instances where the score for a certain inferring data point is low, which indicates a high measure of dissimilarity, the prediction made by the AI model based upon that inferring data point may be overridden or not used by the autonomous vehicle management system). It would have been obvious to one having ordinary skill in the art to modify the training of learning and pioneer agents with state-action pairs in Li (Li: Para. 23, 33) with the trained convolutional neural network (Halder: Para. 80), with a reasonable expectation of success, because pre-training a reinforcement agent in a simulation environment improves action decisions during the runtime phase (Halder: Para. 168-169).

Regarding claim 3, Li teaches the method of claim 2, wherein: the decision-making includes the RL agent outputting multiple tentative driving decisions (Li: Para. 23, 26; reinforcement learning; the agent selects an available action for the object to execute); and the vehicle control includes sequential evaluation of the tentative driving decisions with respect to their estimated uncertainties (Li: Para. 26; the agent's selected actions within the environment tend to maximize, or at least increase, the expected value of the cumulative reward associated with the object's path through state space and executed actions (i.e., the agent's sequence of state-action pairs)).

Regarding claim 6, Li teaches the method of claim 1, wherein the RL agent includes at least one neural network (Li: Para. 38; the learning agent (and/or learning policy), as well as the pioneer agent (and/or pioneer policy), is implemented via deep neural networks).

Regarding claim 7, Li teaches the method of claim 6, wherein the RL agent is obtained by a policy gradient algorithm (Li: Para. 38; an actor-critic framework may be adopted in the training of each of the learning and pioneering networks).

Regarding claim 8, Li teaches the method of claim 6, wherein the RL agent is a Q-learning agent (Li: Para. 38; the neural networks are deep Q (referring to the Q-function) networks (DQN)).

Regarding claim 9, Li teaches the method of claim 6, wherein the training sessions use an equal number of neural networks (Li: Para. 54; S may include the same number of dimensions that characterize the state space).

Regarding claim 12, Li does not explicitly teach wherein the variability measure is one or more of: a variance, a range, a deviation, a variation coefficient, an entropy. However, Halder, in the same field of endeavor, teaches wherein the variability measure is one or more of: a variance, a range, a deviation, a variation coefficient, an entropy (Halder: Para. 140; a low degree of similarity represented by, for example, a low score, may indicate a high variance or difference between the inferring data point and the training data). It would have been obvious to one having ordinary skill in the art to modify the training of learning and pioneer agents with state-action pairs in Li (Li: Para. 23, 33) with the trained convolutional neural network (Halder: Para. 80), with a reasonable expectation of success, because pre-training a reinforcement agent in a simulation environment improves action decisions during the runtime phase (Halder: Para. 168-169).

Regarding claim 13, Li teaches an arrangement for controlling actuators in an autonomous vehicle, comprising: processing circuitry and memory implementing a reinforcement learning, RL, agent configured to: interact with a simulated or real-world environment including the autonomous vehicle in a plurality of training sessions (Li: Para. 21, 23, 28, 85; memory, one or more processors; the learning agent is initially trained to control the (real or virtual) object and/or system within the (real, augmented, or virtual) environment; in each training iteration of the training of the learning agent, the learning agent generates a learning (or exploratory) signal and the supervisor agent generates a supervisor signal; a reinforcement learning (RL) framework is employed for training; tasks, such as controlling a drone or other autonomous vehicle), …, and output at least one tentative driving decision relating to control of actuators in the autonomous vehicle (Li: Para. 23, 27; reinforcement learning; the agent "learns" to select actions, based on sensed or provided current locations within the state space, that tend to increase the expected value of the cumulative reward across the performance of the task).

Li does not explicitly teach: such that in each training session the environment is identical but has a different initial value, and each training session yields a state-action value function Qk(s, a) dependent on state and action …, wherein the tentative decision is based on a common state-action value function Q(s, a) obtained by combining the state-action value functions Qk(s, a) from the training sessions, the processing circuitry and memory further implementing an uncertainty estimator configured to estimate an uncertainty by computing a variability measure for values of the plurality of state-action value functions evaluated for a state-action pair corresponding to each of the tentative driving decisions by the RL agent, the arrangement further comprising a vehicle control interface configured to control the autonomous vehicle by executing the tentative driving decision only if the estimated uncertainty is less than a predefined threshold.

However, Halder, in the same field of endeavor, teaches such that in each training session the environment is identical but has a different initial value, and each training session yields a state-action value function Qk(s, a) dependent on state and action (Halder: Para. 80, 168-169; the CNN may be trained using labeled training data comprising sample images of a vehicle's environment; for a given state and a given action, the environment predicts what the next state will be and the next reward; the RL agent can be trained in a simulation environment) …, wherein the tentative decision is based on a common state-action value function Q(s, a) obtained by combining the state-action value functions Qk(s, a) from the training sessions (Halder: Para. 123; iteratively training the AI model using a portion of the training dataset and then validating the trained model using another portion of the training dataset), the processing circuitry and memory further implementing an uncertainty estimator configured to estimate an uncertainty by computing a variability measure for values of the plurality of state-action value functions evaluated for a state-action pair corresponding to each of the tentative driving decisions by the RL agent (Halder: Para. 135; the autonomous vehicle management system generates a score … indicative of how similar the inferring data points received in 606 are to the training data obtained in 602 and used to train the model), the arrangement further comprising a vehicle control interface configured to control the autonomous vehicle by executing the tentative driving decision only if the estimated uncertainty is less than a predefined threshold (Halder: Para. 9; a score indicative of a low degree of similarity may be generated where the inferring data point is different from or dissimilar to the training data set; in instances where the score for a certain inferring data point is low, which indicates a high measure of dissimilarity, the prediction made by the AI model based upon that inferring data point may be overridden or not used by the autonomous vehicle management system). It would have been obvious to one having ordinary skill in the art to modify the training of learning and pioneer agents with state-action pairs in Li (Li: Para. 23, 33) with the trained convolutional neural network (Halder: Para. 80), with a reasonable expectation of success, because pre-training a reinforcement agent in a simulation environment improves action decisions during the runtime phase (Halder: Para. 168-169).

Regarding claim 14, Li teaches a computer program comprising instructions to cause the arrangement of claim 13 to perform the method (Li: Para. 89; memory includes instructions; the instructions, when executed by processor(s), are configured to cause the computing device to perform any of the operations described herein).

Regarding claim 15, Li teaches a data carrier carrying the computer program of claim 14 (Li: Para. 89; memory includes instructions).

Regarding claim 16, Li teaches the method of claim 7, wherein the policy gradient algorithm is an actor-critic algorithm (Li: Para. 38; an actor-critic framework may be adopted in the training of each of the learning and pioneering networks).

Regarding claim 17, Li teaches the method of claim 8, wherein the Q-learning agent is a deep Q network (Li: Para. 38; the neural networks are deep Q (referring to the Q-function) networks (DQN)).

Regarding claim 18, Li does not explicitly teach wherein the computed variability measure indicates the variability among the values of the plurality of state-action value functions evaluated for the state-action pair corresponding to the tentative driving decision. However, Halder, in the same field of endeavor, teaches wherein the computed variability measure indicates the variability among the values of the plurality of state-action value functions evaluated for the state-action pair corresponding to the tentative driving decision (Halder: Para. 9, 123, 168-169; high measure of dissimilarity; not used by the autonomous vehicle management system; runtime phase where it makes decisions regarding actions to be performed and continues its learning process; for a given state and a given action, the environment predicts what the next state will be and the next reward).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US Publication 2019/0250568 A1) in view of Halder (Foreign Reference CA3096413A1) and in further view of Tremblay et al. (US Patent 12,109,701 B2).

Regarding claim 4, Li and Halder do not explicitly teach wherein a fallback decision is executed if the sequential evaluation does not return a tentative decision to be executed. However, Tremblay, in the same field of endeavor, teaches wherein a fallback decision is executed if the sequential evaluation does not return a tentative decision to be executed (Tremblay: Col. 12 Lines 13-17, Col. 42 Lines 46-53; the infotainment system may perform some self-driving functions in the event that the primary controller(s) fail; the infotainment SoC may put the vehicle into a chauffeur to safe stop mode). It would have been obvious to one having ordinary skill in the art to modify the training of learning and pioneer agents with state-action pairs in Li (Li: Para. 23, 33) with the trained convolutional neural network (Halder: Para. 80) and the safe stop mode taught in Tremblay (Tremblay: Col. 42 Lines 46-53), with a reasonable expectation of success, because the infotainment system may perform some self-driving functions in the event that the primary controller's model-based algorithm fails, as taught by Tremblay (Tremblay: Col. 12 Lines 13-17, Col. 42 Lines 46-53).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US Publication 2019/0250568 A1) in view of Halder (Foreign Reference CA3096413A1) and in further view of Nakhaei Sarvedani et al. (US Patent 11,657,251 B2).

Regarding claim 5, Li and Halder do not explicitly teach wherein the decision-making includes tactical decision-making. However, Nakhaei Sarvedani, in the same field of endeavor, teaches wherein the decision-making includes tactical decision-making (Nakhaei Sarvedani: Col. 5 Lines 22-38; multi-agent reinforcement learning application; autonomous driving control that may be applied to allow the ego agent to allow the target agent to pass the ego agent). It would have been obvious to one having ordinary skill in the art to modify the training of learning and pioneer agents with state-action pairs in Li (Li: Para. 23, 33) with the trained convolutional neural network (Halder: Para. 80) and the pass decision taught in Nakhaei Sarvedani, with a reasonable expectation of success, because in a scenario in which the lane is occupied, the ego agent may need to determine that the target agent may not be able to change lanes and thereby may determine the controls for the target agent to pass the ego agent, as taught by Nakhaei Sarvedani (Nakhaei Sarvedani: Col. 5 Lines 22-38).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US Publication 2019/0250568 A1) in view of Halder (Foreign Reference CA3096413A1) and in further view of Schulter et al. (US Publication 2020/0094824 A1).

Regarding claim 10, Li and Halder do not explicitly teach wherein the initial value corresponds to a randomized prior function, RPF. However, Schulter, in the same field of endeavor, teaches wherein the initial value corresponds to a randomized prior function, RPF (Schulter: Para. 79; random initialization).
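For context on the claim 10 term: "randomized prior function" is commonly used in the sense of Osband et al. (NeurIPS 2018), where each ensemble member adds a fixed, randomly initialized and never-trained prior network to its trainable network, rather than merely initializing weights at random. A minimal sketch under that reading, with linear function approximators standing in for neural networks (all names illustrative):

```python
import numpy as np

def make_rpf_member(feature_dim, num_actions, beta=3.0, rng=None):
    """One ensemble member with a randomized prior function (RPF)."""
    rng = rng if rng is not None else np.random.default_rng()
    prior_w = rng.normal(size=(feature_dim, num_actions))  # fixed random prior, never trained
    train_w = np.zeros((feature_dim, num_actions))         # trainable part, updated by learning

    def q(state_features):
        # Q_k(s, .) = f_theta_k(s, .) + beta * p_k(s, .)
        return state_features @ train_w + beta * (state_features @ prior_w)

    return q, train_w  # the learning algorithm updates train_w in place
```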
It would have been obvious to one having ordinary skill in the art to modify the training of learning and pioneer agents with state-action pairs in Li (Li: Para. 23, 33) with the trained convolutional neural network (Halder: Para. 80) and the sampling process taught in Schulter (Para. 41), with a reasonable expectation of success, because it finds parameters that generate pairs such that the function learned with this data achieves high accuracy on test pairs that have not been seen before or are actually coming from a camera, as taught by Schulter (Schulter: Para. 41).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US Publication 2019/0250568 A1) in view of Halder (Foreign Reference CA3096413A1) and in further view of Levihn et al. (US Patent 11,243,532 B1).

Regarding claim 11, Li and Halder do not explicitly teach wherein the common state-action value function Q(s, a), on which the decision-making is based, corresponds to a central tendency of said plurality of state-action value functions Qk(s, a), where k = 1, 2, …, K. However, Levihn, in the same field of endeavor, teaches the common state-action value function Q(s, a), on which the decision-making is based, corresponds to a central tendency of said plurality of state-action value functions Qk(s, a), where k = 1, 2, …, K (Levihn: Col. 20 Lines 41-52; encodings of the combination of the action and the state may be provided as input to a respective instance of a machine learning model (e.g., a deep neural network-based reinforcement learning model) trained to generate estimated value metrics (Q(s, a)) for the combination). It would have been obvious to one having ordinary skill in the art to modify the training of learning and pioneer agents with state-action pairs in Li (Li: Para. 23, 33) with the trained convolutional neural network (Halder: Para. 80) and the combination of action and state provided as an input to a machine learning model (Levihn: Col. 20 Lines 41-52), with a reasonable expectation of success, because a deep neural network-based reinforcement learning model can be trained to generate estimated value metrics (Q(s, a)) for the action-state combination, as taught by Levihn (Levihn: Col. 20 Lines 41-52).

Response to Arguments

Applicant's arguments, filed 15 December 2025, with respect to the rejection of claims 1-17 under 35 U.S.C. 103 have been considered, but are not persuasive.

The applicant's attorney argues that Halder does not supplement Li to render the claimed invention obvious. In response, Halder is being used to teach "such that in each training session the environment is identical but has a different initial value and each training session yields a state-action value function Qk(s, a) dependent on state and action." Halder teaches a convolutional neural network used to simulate the environment of the vehicle. A possible action is put into this environment model, and the environment changes states in response to the action. The environment model is a given state and the group of actions are the given actions. The environment model predicts the next state and next reward for each given action. The autonomous vehicle management system makes decisions regarding the given actions based on the response of the environment model to each action (Halder: Para. 80, 168-169).

Halder is also being used to teach "wherein the decision-making is based on a common state-action value function Q(s, a) obtained by combining the state-action value function Qk(s, a) from the training sessions." Halder teaches sending each action into the state function and determining the reward for each action (Halder: Para. 168-169). The system iteratively trains the AI model using a portion of the training dataset and then validates the trained model using another portion of the training dataset. The system is trained, and then a set of actions is sent through the state to create a confidence score, where low-scoring actions are discarded (Halder: Para. 9). The remaining actions are then sent back through an iteratively trained and validated model. This determines the action for the autonomous vehicle. The chosen action will have a good confidence score based on the predicted next state and next reward from the iterations of training and validating the model for the runtime phase (Halder: Para. 123).

Halder is further being used to teach "estimating an uncertainty by computing a variability measure for values of the plurality of state-action value functions evaluated for a state-action pair corresponding to each of the tentative driving decisions." Halder teaches that the autonomous vehicle management system generates a score indicative of how similar (or different) the inferring data points are to the training data obtained. The score is referred to as the confidence score (Halder: Para. 135). Halder's confidence score is an estimation of the uncertainty: if the confidence score is high, then the uncertainty is low. If the comparison of data points to the data set used originally to train the system results in small deviations, then the variability measure is small and the uncertainty is low. Each action sent through the state creates an evaluated state-action prediction with a confidence score. The confidence score reflects the variation of the state-action pair from the known training set.

Halder is also being used to teach "vehicle control, wherein the at least one tentative driving decision is executed only if the estimated uncertainty is less than a predefined threshold." Halder teaches an autonomous vehicle management system that is trained on a training data set and then used to determine autonomous vehicle management. The possible driving decisions are put through the trained model. If the confidence score is low, indicating a high measure of dissimilarity, that particular action is dropped (Halder: Para. 9). The remaining actions are then sent back through an iteratively trained and validated model. This determines the action for the autonomous vehicle. The chosen action will have a good confidence score based on the predicted next state and next reward from the iterations of training and validating the model for the runtime phase (Halder: Para. 123).

The applicant next argues that "a variability measure is not a statistical distribution." In response, applicant's specification includes: "A 'variability measure' includes any suitable measure for quantifying statistic dispersion, such as a variance, a range of variation, a deviation, a variation coefficient, an entropy etc." (Specification: Para. 12). The applicant's specification gives a broad definition of variability measure.

The applicant next argues that it would not be obvious for the person skilled in the art to modify Li with Halder.
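Since the dispute turns in part on what counts as a "variability measure," it may help to see the specification's listed measures computed over K ensemble estimates of Qk(s, a) for a single state-action pair. A minimal sketch (names illustrative; entropy is omitted because it would first require binning the values into a distribution):

```python
import numpy as np

def variability_measures(q_values):
    """Dispersion statistics over K estimates of Qk(s, a) for one (s, a),
    following the list quoted from the specification (Para. 12)."""
    q = np.asarray(q_values, dtype=float)
    mean = q.mean()
    return {
        "variance": float(q.var()),
        "range": float(q.max() - q.min()),
        "deviation": float(q.std()),
        # The variation coefficient is undefined when the mean is zero.
        "variation_coefficient": float(q.std() / abs(mean)) if mean != 0 else float("nan"),
    }

# Example: wide disagreement across ensemble members yields large values.
print(variability_measures([0.9, 1.4, 0.2, 1.1]))
```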
In response to applicant's argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988); In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992); and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007). In this case, Li teaches reinforcement learning where actions are put through a state model representing an autonomous vehicle in order to make control decisions (Li: Para. 21, 23, 28). Halder teaches a reinforcement model where given actions are put into a state to calculate a confidence score used to make decisions (Halder: Para. 123, 135). It would have been obvious to one having ordinary skill in the art to modify the training of learning and pioneer agents with state-action pairs in Li (Li: Para. 23, 33) with the trained convolutional neural network (Halder: Para. 80), with a reasonable expectation of success, because pre-training a reinforcement agent in a simulation environment improves action decisions during the runtime phase (Halder: Para. 168-169).

The applicant next argues that new dependent claim 18 recites additional elements not described in the cited references. In response, Halder teaches sending each action into the state function and determining the reward for each action (Halder: Para. 168-169). The system iteratively trains the AI model using a portion of the training dataset and then validates the trained model using another portion of the training dataset. The system is trained, and then a set of actions is sent through the state to create a confidence score, where low-scoring actions are discarded (Halder: Para. 9). The remaining actions are then sent back through an iteratively trained and validated model. This determines the action for the autonomous vehicle. The chosen action will have a good confidence score based on the predicted next state and next reward from the iterations of training and validating the model for the runtime phase (Halder: Para. 123).

The examiner apologizes that the USPTO system does not show highlights in any applicant submissions. The applicant's arguments have failed to point out the distinguishing characteristics of the amended claim language over the prior art. For the above reasons, Li's state-action pairs with Halder's confidence scores used in an iterative model read on applicant's tactical decision-making through reinforcement learning with uncertainty estimation. The rejection is maintained.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LAURA E LINHARDT, whose telephone number is (571) 272-8325. The examiner can normally be reached on M-TR, M-F: 8am-4pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Angela Ortiz, can be reached at (571) 272-1206. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/L.E.L./
Examiner, Art Unit 3663

/ANGELA Y ORTIZ/
Supervisory Patent Examiner, Art Unit 3663

Prosecution Timeline

Oct 13, 2022: Application Filed
Nov 21, 2024: Non-Final Rejection — §103
Feb 24, 2025: Response Filed
Mar 14, 2025: Final Rejection — §103
Jul 25, 2025: Response after Non-Final Action
Aug 26, 2025: Request for Continued Examination
Sep 04, 2025: Response after Non-Final Action
Sep 10, 2025: Non-Final Rejection — §103
Dec 15, 2025: Response Filed
Mar 21, 2026: Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586463: DETERMINATION DEVICE, DETERMINATION METHOD, AND PROGRAM
Granted Mar 24, 2026 • 2y 5m to grant

Patent 12578197: Tandem Riding Detection on Personal Mobility Vehicles
Granted Mar 17, 2026 • 2y 5m to grant

Patent 12540822: WATER AREA OBJECT DETECTION SYSTEM AND MARINE VESSEL
Granted Feb 03, 2026 • 2y 5m to grant

Patent 12517275: SUBMARINE EXPLORATION SYSTEM COMPRISING A FLEET OF DRONES
Granted Jan 06, 2026 • 2y 5m to grant

Patent 12459564: ELECTRONIC STEERING APPARATUS OF VEHICLE AND CONTROL METHOD THEREOF
Granted Nov 04, 2025 • 2y 5m to grant
Based on this examiner's 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 70%
With Interview: 92% (+22.7%)
Median Time to Grant: 3y 1m
PTA Risk: High

Based on 223 resolved cases by this examiner. Grant probability derived from career allow rate.
