Last updated: May 29, 2026
Application No. 17/747,243
CALIBRATING PARAMETERS WITHIN A VIRTUAL ENVIRONMENT USING REINFORCEMENT LEARNING

Non-Final OA §101 §103
Filed
May 18, 2022
Examiner
KAPOOR, DEVAN
Art Unit
2126
Tech Center
2100 — Computer Architecture & Software
Assignee
GM Global Technology Operations LLC
OA Round
2 (Non-Final)
Interview Optional

Interview already conducted in this application's prosecution history. This examiner has a 10% grant rate with a +16.7% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 10 resolved cases, 2023–2026
Examiner Intelligence

KAPOOR, DEVAN
Grants only 10% of cases
Career Allowance Rate
1 granted / 10 resolved
-45.0% vs TC avg
Interview lift: +16.7% (resolved cases with an interview versus without)
Typical timeline
4y 4m
Avg Prosecution
20 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§103
100.0%
+60.0% vs TC avg
Based on career data from 10 resolved cases
Office Action

§101 §103
DETAILED ACTION
This action is responsive to the application filed on 08/28/2025. Claims 1-20 are pending and have been examined.
This action is Final.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged.
Response to Arguments
Argument 1: The applicant argues that the amendments to independent claims 1, 9, and 13 overcome the prior 101 rejection by incorporating specific technological improvements discussed during the examiner interview. In particular, claim 9 now recites that the reinforcement learning (RL) agent is trained using states that include process states, inputs, previous outputs, and correct system responses, with a reward function defined by both global metrics (fuel consumption, air pollution, battery range) and local metrics (overshoot, oscillation, response reversal, steady-state response, and prediction response). The applicant points to paragraphs [0073] and [0078]-[0080] of the specification as support, asserting that no new matter was added. These amendments are said to specify a concrete technological application of RL in vehicle control that integrates practical improvements, thus satisfying Step 2A (practical application) and Step 2B (significantly more than an abstract idea). The applicant contends that the examiner agreed during the interview that the amendments to claims 1 and 9 would overcome the 101 rejection.
Examiner Response to Argument 1: The examiner has considered the applicant’s arguments above; however, the amendments to independent claims 1, 9, and 13 do not overcome the section 101 rejection. The added recitations concerning reinforcement learning states and reward metrics remain mathematical operations and iterative optimization carried out by a generic processor and memory. Defining an action space, updating calibration parameters, and maximizing a reward function with global and local performance metrics constitute mathematical relationships and calculations, which are abstract ideas under Step 2A Prong 1. The claims do not integrate the exception into a practical application under Step 2A Prong 2 because the computer elements, including the processor, memory, and reinforcement learning agent, merely execute these abstract calculations within a simulated environment without improving computer functionality or another technology. Under Step 2B, iteratively updating parameters to optimize a reward is a well-understood, routine, and conventional activity in the field of machine learning, as described in paragraphs 0073 through 0076 of the specification. Although the applicant asserts that these amendments specify a concrete technological improvement, the improvements are to the underlying mathematical model rather than to any technological process. Regarding the interview, interviews serve to discuss concerns about past rejections and potential avenues for overcoming them. The examiner made no promise that making the discussed changes would directly result in eligibility or allowance, as further examination is often needed after the interview and once official remarks and amendments are received. Accordingly, the section 101 rejection of claims 1 through 20 is maintained.
Argument 2: In response to the anticipation rejection of claim 9 over Palanisamy (US 2020/0033869 A1), the applicant amended claim 9 to include additional limitations that distinguish over the cited art. The amended claim recites generating a reported problem responsive to monitored vehicle-state parameters tied to measurable RL reward components, comparing these to performance thresholds, processing and labeling the data at a server, determining whether a reported problem frequency falls below a threshold when compared to historical data, and retraining at least one RL agent within a simulated driving scenario based on the collected parameters. The applicant cites paragraphs [0078] and [0079] for support and asserts that no new matter was added. According to the applicant, these specific server-side labeling and retraining functions, as well as the comparison to predetermined frequency thresholds, were discussed and agreed during the interview to distinguish over Palanisamy, thereby overcoming the 102 rejection.
Examiner Response to Argument 2: The examiner has considered the applicant’s arguments above; however, the amendments to claim 9 adding server-based processing, labeling, frequency threshold determination, and retraining do not overcome the rejection. While the amendments overcome the original 102 rejection, the claims, including independent claims 1, 9, and 13, now fall under section 103 over Palanisamy in view of Douillard. The change in the statutory basis from anticipation to obviousness was necessitated by the applicant’s amendments that added additional functional steps involving a server and retraining operations. Under the broadest reasonable interpretation, Palanisamy teaches receiving collected vehicle state parameters, evaluating reward components against performance thresholds, and retraining reinforcement learning agents based on prioritized driving experiences using a centralized policy server, as taught in paragraphs 0011, 0017, and 0122. Douillard further teaches processing and labeling aggregated fleet data at a server, determining whether identified problems fall below a predetermined frequency threshold, and iteratively adjusting calibration parameters to minimize residual error between predicted and measured data, as taught in paragraphs 0095 through 0098 and 0140. The combination of these references teaches or suggests all amended limitations of claim 9, including centralized labeling, threshold comparison, and retraining of reinforcement learning agents based on collected operational data. Palanisamy provides the foundational experience memory and retraining framework, while Douillard supplies the explicit server-based aggregation and iterative calibration improvement.
It would have been obvious to a person of ordinary skill in the art to modify Palanisamy’s policy server framework with Douillard’s fleet level labeling and calibration methods to efficiently centralize data analysis and enhance retraining accuracy, as suggested by Douillard paragraph 0095.
Argument 3: To address the 103 obviousness rejection of claims 1, 8, 13, and 20 over Dosovitskiy (CARLA) in view of Douillard, the applicant amended independent claims 1 and 13 to include limitations specifying that the RL agent has an action space comprising a tuning calibration parameter that defines a value iteratively updated during simulated vehicle operations to maximize a reward function including both global and local performance metrics. Support is cited in paragraphs [0073] and [0075]-[0076] of the specification. The applicant argues that these amendments add technical detail distinguishing over both Dosovitskiy and Douillard because neither teaches an RL agent with a calibration-parameter action space tied to global/local reward optimization in a simulation loop. Therefore, the applicant asserts that the combination fails to render the claims obvious and requests withdrawal of the 103 rejection for all claims dependent from 1 and 13.
Examiner Response to Argument 3: The examiner has considered the applicant’s arguments above; however, the amendments to claims 1 and 13 reciting a reinforcement learning agent having an action space with a tuning calibration parameter iteratively updated to optimize a reward function including both global and local performance metrics do not overcome the section 103 rejection over Dosovitskiy in view of Douillard. Dosovitskiy teaches generating a simulated driving environment for reinforcement learning agents to train in varied driving conditions, as taught on page 2. Douillard teaches iterative calibration by adjusting sensor parameters to minimize residual error between predicted and measured data, as taught in paragraphs 0140 and 0162. Under the broadest reasonable interpretation, Douillard’s iterative adjustment of sensor parameters to minimize error corresponds to the claimed iterative update of tuning calibration parameters to maximize a reward, as both involve parameter optimization through iterative feedback. Incorporating global and local performance metrics merely specifies the types of objective criteria used during optimization and represents an obvious design choice in multi-objective reinforcement learning systems. A person of ordinary skill would have combined Douillard’s iterative calibration with Dosovitskiy’s simulated environment to enhance training fidelity and improve system accuracy, as motivated by Douillard paragraph 0140. Therefore, the section 103 rejection of claims 1 and 13 and their dependent claims is maintained.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition
of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the
conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, 
Step 1: – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a machine. The claim satisfies Step 1.
Step 2A Prong 1: – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
“generate a simulated environment, the simulated environment representing a plurality of driving situations;” -- The limitation is directed to generating an environment to represent driving situations. The limitation is directed to a process that can be performed in the human mind using evaluation, observation, and judgement (with aid of pen and paper), and thus the limitation is directed to a mental process.
“and generate… at least one calibration parameter based on simulated vehicle operations within a simulated environment…” – The limitation is directed to generating calibration parameters based on simulated vehicle operations. The limitation, in view of the spec [0021], is directed to the use of a mathematical concept/calculation, and thus the limitation is directed to math.
Step 2A Prong 2 and Step 2B – Does the claim recite additional elements that integrate the judicial exception into a practical application and/or provide significantly more than the judicial exception?
No, the claim does not recite additional elements that integrate the judicial exception into a practical application. The additional elements:
“A system comprising a computer including a processor and a memory, the memory including instructions such that the processor is programmed to:” – The limitation recites a system comprising a processor and a memory with instructions that program the processor to perform the rest of the claim. The limitation does not integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(f)).
“via a reinforcement learning agent” – The limitation is directed to the use of the reinforcement learning agent. Under the broadest reasonable interpretation, the RL agent as recited in the claim amounts to mere instructions to apply the exception (calibration parameter calculation) on a computer, which does not integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(f)).
“the reinforcement learning agent having an action space comprising a tuning calibration parameter that defines a value of the calibration parameter” -- The limitation is directed to an RL agent having an action space that comprises a tuning calibration parameter defining a value of the calibration parameter. The limitation amounts to no more than refining a property of the reinforcement learning agent and limiting the exception to a field of use/environment; it does not integrate the exception into a practical application, nor does it provide significantly more than the judicial exception (see MPEP 2106.05(h)).
“the calibration parameter being iteratively updated during execution of the simulated vehicle operations to maximize a reward function including both global and local performance metrics.” -- The limitation recites iteratively updating a parameter during execution of the operations to maximize a reward function that includes both global and local performance metrics. The limitation is directed to an insignificant extra-solution activity that does not integrate the exception into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, the act of iteratively updating a parameter during execution of operations is a well-understood, routine, and conventional activity (WURC) that does not provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
Thus, claim 1 is non-patent eligible. Claim 13 is analogous to claim 1, and therefore claim 13 will face the same rejection as set forth above. 
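For illustration only, the iterative update recited in claim 1 (a calibration value chosen from an action space and refined to maximize a reward built from global and local metrics) can be sketched as below. This is not the applicant's method: a greedy bandit with optimistic initial values is swapped in for the claimed reinforcement learning agent, `simulate` is a hypothetical stand-in for a simulated vehicle operation, and the metric names are borrowed from the examples listed in claim 9.

```python
from typing import Dict, List, Tuple

Metrics = Dict[str, float]


def reward(global_metrics: Metrics, local_metrics: Metrics) -> float:
    """Negative total cost, so lower metric values yield a higher reward."""
    return -(sum(global_metrics.values()) + sum(local_metrics.values()))


def simulate(gain: float) -> Tuple[Metrics, Metrics]:
    """Hypothetical stand-in for one simulated vehicle operation."""
    global_metrics = {"fuel_consumption": 1.0 + 0.25 * gain}  # assumed global metric
    local_metrics = {"overshoot": (gain - 2.0) ** 2}          # assumed local metric
    return global_metrics, local_metrics


def tune(candidates: List[float], episodes: int) -> float:
    """Greedy bandit with optimistic initial values over a discrete action space."""
    value = {c: 0.0 for c in candidates}  # optimistic, since every reward here is negative
    count = {c: 0 for c in candidates}
    for _ in range(episodes):
        action = max(candidates, key=lambda c: value[c])  # exploit the best current estimate
        r = reward(*simulate(action))
        count[action] += 1
        value[action] += (r - value[action]) / count[action]  # running mean of observed rewards
    return max(candidates, key=lambda c: value[c])


best = tune([0.5, 1.0, 1.5, 2.0, 2.5], episodes=20)
print(best)
```

Because every reward in this toy setup is negative, the zero initial estimates force each candidate value to be tried once before the loop settles on the best one; the number of episodes must therefore be at least the number of candidates.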

Regarding claim 2, (analogous to claim 14)
Step 1: – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a machine. The claim satisfies Step 1.
There are no elements to be evaluated under Step 2A Prong 1. 
Step 2A Prong 2 and Step 2B – Does the claim recite additional elements that integrate the judicial exception into a practical application and/or provide significantly more than the judicial exception?
No, the claim does not recite additional elements that integrate the judicial exception into a practical application. The additional elements:
“The system of claim 1, wherein the processor is further programmed to generate reinforcement learning agent for each zone within an operation state space, wherein each zone corresponds to a set of calibration parameters.” – The limitation recites that the processor discussed in claim 1 will further be programmed to generate an RL agent per zone within a state space that corresponds to calibration parameter sets. The limitation amounts to no more than further limiting to a field of use/environment, and thus it does not integrate to a practical application, nor does it provide significantly more than the judicial exception (see MPEP 2106.05(h)). 
Thus, claim 2 is non-patent eligible. Claim 14 is analogous to claim 2, and therefore claim 14 will face the same rejection as set forth above. 

Regarding claim 3, (analogous to claim 15)
Step 1: – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a machine. The claim satisfies Step 1.
Step 2A Prong 1: – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
“The system of claim 2, wherein the processor is further programmed to divide the operation state space into at least two adjacent operation state space zones when the reinforcement learning agent has not converged.” – The limitation is directed to a processor that is programmed to divide the state space into adjacent zones when the RL agent has not converged. The limitation is directed to a mathematical concept, as partitioning numerical domains is a classic mathematical operation in the field, and thus the limitation is directed to math.
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B. 
Thus, claim 3 is non-patent eligible. Claim 15 is analogous to claim 3, and therefore claim 15 will face the same rejection as set forth above. 
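For illustration only, the division recited in claim 3 can be sketched as a partition of a numerical range. The one-dimensional state space, the midpoint split, and the names `Zone` and `split_if_not_converged` are assumptions made for this sketch; the record does not state how the zones are chosen.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Zone:
    low: float   # lower bound of the zone
    high: float  # upper bound of the zone


def split_if_not_converged(zone: Zone, converged: bool) -> List[Zone]:
    """Keep a converged zone; otherwise split it into two adjacent halves."""
    if converged:
        return [zone]
    mid = (zone.low + zone.high) / 2.0
    return [Zone(zone.low, mid), Zone(mid, zone.high)]


# Hypothetical example: the agent for the range 0 to 100 has not converged.
zones = split_if_not_converged(Zone(0.0, 100.0), converged=False)
print(zones)
```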

Regarding claim 4, (analogous to claim 16)
Step 1: – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a machine. The claim satisfies Step 1.
Step 2A Prong 1: – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
“The system of claim 3, wherein each reinforcement learning agent trains for at least one of a predetermined computation budget or a predetermined time budget.” – The limitation recites training the reinforcement learning agent for a predetermined value (computation budget/time budget). The limitation is directed to a mathematical concept/calculation, and thus the limitation is directed to math. Furthermore, training for a predetermined value is a process that can be performed in the human mind using pen and paper, and thus the limitation can also be a mental process. 
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B. 
Thus, claim 4 is non-patent eligible. Claim 16 is analogous to claim 4, and therefore claim 16 will face the same rejection as set forth above. 

Regarding claim 5, (analogous to claim 17)
Step 1: – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a machine. The claim satisfies Step 1.
Step 2A Prong 1: – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
“The system of claim 3, the processor is further programmed to generate a supervisor reinforcement learning agent that is configured to manage transitions between at least two adjacent operation state space zones.” – The limitation is directed to a processor that generates an RL agent configured to manage transitions between state space zones. The limitation is directed to a mental process or a method of organizing human activity, and thus the limitation is directed to a mental process.
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B. 
Thus, claim 5 is non-patent eligible. Claim 17 is analogous to claim 5, and therefore claim 17 will face the same rejection as set forth above. 

Regarding claim 6, (analogous to claim 18)
Step 1: – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a machine. The claim satisfies Step 1.
Step 2A Prong 1: – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
“The system of claim 5, wherein the supervisor reinforcement learning agent generates a transition set of calibration parameters based on the adjacent zones.” – The limitation is directed to generating a transition set of parameters based on the zones. The limitation is directed to a process that can be completed in the human mind using evaluation, observation, and judgement (with aid of pen and paper), and thus the limitation is directed to a mental process. 
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B. 
Thus, claim 6 is non-patent eligible. Claim 18 is analogous to claim 6, and therefore claim 18 will face the same rejection as set forth above. 

Regarding claim 7, (analogous to claim 19)
Step 1: – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a machine. The claim satisfies Step 1.
Step 2A Prong 1: – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
“The system of claim 6, wherein the supervisor reinforcement learning agent generates the transition calibration parameter according to $w = a_1 w_1 + a_2 w_2 + \dots + a_N w_N$, where $a_i$ represents an i-th coefficient generated by the supervisor reinforcement learning agent, $w_i$ represents an output of the i-th reinforcement learning agent, and $N$ represents a number of adjacent zones.” – The limitation is directed to generating calibration parameters according to a mathematical formula. The limitation is explicitly directed to the use of mathematical calculation/concept, and thus the limitation is directed to math.
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B. 
Thus, claim 7 is non-patent eligible. Claim 19 is analogous to claim 7, and therefore claim 19 will face the same rejection as set forth above. 
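For illustration only, the weighted sum quoted from claim 7 can be evaluated as below. The function name `transition_parameter` and the example coefficients and outputs are hypothetical; only the formula itself comes from the claim.

```python
from typing import Sequence


def transition_parameter(coefficients: Sequence[float],
                         agent_outputs: Sequence[float]) -> float:
    """Return w = a_1*w_1 + ... + a_N*w_N for N adjacent zones."""
    if len(coefficients) != len(agent_outputs):
        raise ValueError("need one coefficient per adjacent-zone agent output")
    return sum(a * w for a, w in zip(coefficients, agent_outputs))


# Hypothetical example: two adjacent zones blended 25% / 75%.
w = transition_parameter([0.25, 0.75], [2.0, 4.0])
print(w)
```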

Regarding claim 8, (analogous to claim 20)
Step 1: – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a machine. The claim satisfies Step 1. 
There are no elements to be evaluated under Step 2A Prong 1. 
Step 2A Prong 2 and Step 2B – Does the claim recite additional elements that integrate the judicial exception into a practical application and/or provide significantly more than the judicial exception?
No, the claim does not recite additional elements that integrate the judicial exception into a practical application. The additional elements:
“The system of claim 1, wherein the processor is further programmed to generate the simulated environment based on a desired simulated driving situation.” -- The limitation recites generating a simulated environment based on a driving situation. The limitation is directed to a process that can be performed in the human mind using evaluation, observation, and judgement (with pen and paper), and thus the limitation is directed to a mental process.
Thus, claim 8 is non-patent eligible. Claim 20 is analogous to claim 8, and therefore claim 20 will face the same rejection as set forth above. 

Regarding claim 9,
Step 1: – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a machine. The claim satisfies Step 1.
Step 2A Prong 1: – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
“generate a reported problem responsive to monitoring vehicle state parameters related to measurable reinforcement learning agent reward components and comparing the measurable reward components to predetermined performance thresholds;” – The limitation is directed to evaluating and comparing measured parameters against threshold values to detect a deviation. This comparison and evaluation constitute a mathematical concept and/or a mental process, as it involves mathematical relationships (comparison operations) that can be performed in the human mind using judgment or observation.
“process and label, … the collected vehicle state parameters based on the reported problem;” – The limitation is directed to processing and labeling data, which amounts to data organization and classification based on attributes. This constitutes a mental process and data manipulation, which is an abstract idea.
“determine, …whether the reported problem corresponding to the collected vehicle state parameters is below a predetermined frequency threshold, when compared to past collected vehicle data in the server;” – The limitation is directed to comparing frequencies and determining whether a value falls below a threshold, and is a process that can be performed in the human mind using evaluation, observation, and judgement, and thus the limitation is directed to a mental process.

Step 2A Prong 2 and Step 2B: – Does the claim recite additional elements that integrate the judicial exception into a practical application and/or provide significantly more than the judicial exception?
No, the claim does not recite additional elements that integrate the judicial exception into a practical application.
The additional elements:
“A system comprising a computer including a processor and a memory, the memory including instructions such that the processor is programmed to:” – The limitation recites a generic computing environment using a processor and memory. These are conventional computing components that perform well-understood, routine, and conventional functions. As such, the limitation does not integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(f)).
“at a server” and “by the server” – The limitations merely specify that the operations occur on a server or in a networked computing environment. Implementing an abstract idea on a server or distributed computer system is a generic computer implementation and does not integrate to a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(f)). 
“receive collected vehicle state parameters from a vehicle;” – The limitation is directed to collecting and receiving data. Receiving or gathering data for subsequent analysis is a basic data-gathering step, which cannot be integrated to a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, the act of receiving/sending over a network is a well-understood, routine, and conventional activity (WURC) that does not provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
“retrain at least one reinforcement learning agent having an action space comprising tuning calibration parameters within a constructed simulated driving scenario based on the collected vehicle state parameters, wherein the reinforcement learning agent is trained using reinforcement learning states that include process states, inputs, previous outputs, and correct system responses, and wherein a reward function for the reinforcement learning agent comprises global metrics including fuel consumption, air pollution, and battery range, and local metrics including overshoot, oscillation, response reversal, steady state response, and prediction response;” – The limitation is directed to retraining an algorithm by updating parameters to maximize a reward function using input/output data and performance metrics and applying an iterative mathematical process using a computer, which cannot be integrated to a practical application (see MPEP 2106.05(g)). Furthermore, under step 2B, retraining a model by updating parameters is a conventional technique in the field of artificial intelligence and does not amount to a meaningful integration of the abstract idea into a practical application (see MPEP 2106.05(d)(II)). Berkheimer evidence that RL agent retraining is WURC is below: 
([Haney, abstract, page 8] “A keystone architecture in the machine learning paradigm, reinforcement learning technologies power trading algorithms, driverless cars, and space satellites... Reinforcement learning software optimizes agent performance according to a reward. The process involves building models and developing systems for decision making embedded in software programs.”, wherein the examiner interprets “reinforcement learning software optimizes agent performance according to a reward” and “developing systems for decision making embedded in software programs” to be the same as “training and retraining reinforcement learning agents based on reward functions and decision outputs” because they are both directed to iterative optimization of agent behavior based on reward feedback embedded in software.) This demonstrates that reinforcement learning agent training and optimization were well-understood, routine, and conventional activities in the machine-learning field prior to the effective filing date.
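For illustration only, the two comparison steps analyzed for claim 9 (comparing measurable reward components to performance thresholds, then checking a reported problem against a frequency threshold over past data) can be sketched as below. The function names, the fraction-based frequency measure, the treatment of an empty history, and all example values are assumptions made for this sketch and are not taken from the application.

```python
from typing import Dict, List


def exceeded_components(measured: Dict[str, float],
                        thresholds: Dict[str, float]) -> List[str]:
    """Return the reward components whose measured value exceeds its threshold."""
    return sorted(name for name, value in measured.items()
                  if name in thresholds and value > thresholds[name])


def below_frequency_threshold(problem: str, past_problems: List[str],
                              max_fraction: float) -> bool:
    """True when the problem's share of past reports is below max_fraction."""
    if not past_problems:
        return True  # assumption: with no history, the problem counts as rare
    return past_problems.count(problem) / len(past_problems) < max_fraction


# Hypothetical vehicle data and thresholds.
reported = exceeded_components({"overshoot": 0.12, "oscillation": 0.02},
                               {"overshoot": 0.10, "oscillation": 0.05})
history = ["overshoot", "oscillation", "oscillation", "oscillation"]
rare = below_frequency_threshold("overshoot", history, max_fraction=0.5)
print(reported, rare)
```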
Regarding claim 10,
Step 1: – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a machine. The claim satisfies Step 1.
Step 2A Prong 1: – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
“The system as recited in claim 9, wherein the processor is further programmed to determine whether the reported problem affects a number of vehicles that exceeds a predetermined vehicle amount.” – The limitation recites determining whether the number of vehicles affected by the reported problem exceeds a predetermined amount/threshold. The limitation is directed to a process that can be performed in the human mind using evaluation, observation, and judgement, and thus the limitation is directed to a mental process.
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B. 
Thus, claim 10 is non-patent eligible.

Regarding claim 11,
Step 1: – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a machine. The claim satisfies Step 1.
Step 2A Prong 1: – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
“The system as recited in claim 10, wherein the processor is further programmed to generate an alert when the reported problem affects a number of vehicles that exceeds the predetermined vehicle amount.” – The limitation is directed to generating an alert when the number of vehicles affected by the reported problem exceeds a predetermined amount/threshold. The limitation is directed to a process that can be performed in the human mind using evaluation, observation, and judgement, and thus the limitation is directed to a mental process.
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B. 
Thus, claim 11 is non-patent eligible.

Regarding claim 12,
Step 1: – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a machine. The claim satisfies Step 1.
There are no elements to be evaluated under Step 2A Prong 1. 
Step 2A Prong 2 and Step 2B – Does the claim recite additional elements that integrate the judicial exception into a practical application and/or provide significantly more than the judicial exception?
No, the claim does not recite additional elements that integrate the judicial exception into a practical application. The additional elements:
“The system as recited in claim 11, wherein the alert comprises at least one of an audio alert, a haptic alert, or a visual alert.” -- The limitation recites that the alert introduced in claim 11 comprises at least one of several alert types as a form of notification. The limitation amounts to no more than limiting the exception to a field of use/environment, and does not integrate the exception into a practical application, nor provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Thus, claim 12 is non-patent eligible.
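For illustration only, the vehicle-count check and alert recited in claims 10 through 12 can be sketched as below. The report format (pairs of vehicle identifier and problem name), the function name `fleet_alert`, and the message text are hypothetical; only the three alert types come from claim 12.

```python
from typing import Iterable, Optional, Tuple

ALERT_TYPES = ("audio", "haptic", "visual")  # alert types listed in claim 12


def fleet_alert(reports: Iterable[Tuple[str, str]], problem: str,
                max_vehicles: int, alert_type: str = "visual") -> Optional[str]:
    """Return an alert message when more than max_vehicles distinct vehicles report the problem."""
    if alert_type not in ALERT_TYPES:
        raise ValueError(f"unknown alert type: {alert_type}")
    # Count each vehicle once, even if it reported the problem several times.
    affected = {vehicle_id for vehicle_id, reported in reports if reported == problem}
    if len(affected) > max_vehicles:
        return f"{alert_type} alert: {problem} affects {len(affected)} vehicles"
    return None


# Hypothetical reports as (vehicle identifier, problem name) pairs.
reports = [("v1", "overshoot"), ("v2", "overshoot"), ("v2", "overshoot"), ("v3", "oscillation")]
message = fleet_alert(reports, "overshoot", max_vehicles=1)
print(message)
```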


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness. 

Claims 1, 8, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over NPL reference “CARLA: An Open Urban Driving Simulator” by Dosovitskiy et al. (referred to herein as Dosovitskiy) in view of US 20170124781 A1 by Douillard et al. (referred to herein as Douillard).

Regarding claim 1, Dosovitskiy teaches A system comprising a computer including a processor and a memory, the memory including instructions such that the processor is programmed to generate a simulated environment, the simulated environment representing a plurality of driving situations; ([Dosovitskiy, page 2] “We introduce CARLA (Car Learning to Act) – an open simulator for urban driving…We use CARLA to stage controlled goal-directed navigation scenarios of increasing difficulty. We manipulate the complexity of the route that must be traversed, the presence of traffic, and the environmental conditions.”, wherein the examiner interprets “stage controlled goal-directed navigation scenarios … manipulating route complexity, traffic, and environmental conditions” to be the same as generate a simulated environment representing a plurality of driving situations because they are both directed to constructing virtual environments that model varied driving conditions for training and testing autonomous vehicle control behavior.)
Dosovitskiy does not teach generate, via a reinforcement learning agent, at least one calibration parameter based on simulated vehicle operations within a simulated environment, the reinforcement learning agent having an action space comprising a tuning calibration parameter that defines a value of the calibration parameter, the calibration parameter being iteratively updated during execution of the simulated vehicle operations to maximize a reward function including both global and local performance metrics;
Douillard teaches generate, via a reinforcement learning agent, at least one calibration parameter based on simulated vehicle operations within a simulated environment, the reinforcement learning agent having an action space comprising a tuning calibration parameter that defines a value of the calibration parameter, the calibration parameter being iteratively updated during execution of the simulated vehicle operations to maximize a reward function including both global and local performance metrics; ([Douillard, paragraph 0162] “An intrinsic sensor calibration module 3704 may be used to determine intrinsic calibration parameters for a sensor.” and [Douillard, paragraph 0140] “The calibration process may iteratively adjust sensor parameters to minimize residual error between predicted and measured data.”, wherein the examiner interprets “iteratively adjust sensor parameters to minimize residual error between predicted and measured data” to be the same as iteratively updating a tuning calibration parameter to maximize a reward function because they are both directed to adjusting system parameters over multiple iterations to optimize a performance criterion that measures overall system accuracy. The examiner further interprets “intrinsic sensor calibration module … determine intrinsic calibration parameters for a sensor” to be the same as generate at least one calibration parameter based on simulated vehicle operations because they are both directed to calculating calibration values derived from simulated or measured operational data for improving the performance of the system.)
Dosovitskiy, Douillard, and the instant application are analogous art because they are all directed to reinforcement-learning-based vehicle simulation systems that use synthetic or real sensor data to improve autonomous vehicle control and calibration within a simulated driving environment.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the simulation-based reinforcement learning training framework disclosed by Dosovitskiy to include the “intrinsic sensor calibration module 3704 may be used to determine intrinsic calibration parameters for a sensor” disclosed by Douillard. One would be motivated to do so to effectively enhance the accuracy and reliability of reinforcement-learning model performance within the simulated driving framework by introducing iterative calibration feedback loops that align simulated sensor data with real-world sensor responses, as suggested by Douillard (Douillard, [paragraph 0140] “The calibration process may iteratively adjust sensor parameters to minimize residual error between predicted and measured data.”). Claim 13 is analogous to claim 1, and thus would face the same rejection as above.

Regarding claim 8, Dosovitskiy and Douillard teach The system of claim 1 (see rejection of claim 1).
	Dosovitskiy further teaches The system of claim 1, wherein the processor is further programmed to generate the simulated environment based on a desired simulated driving situation. ([Dosovitskiy, page 2] “We use CARLA to stage controlled goal-directed navigation scenarios of increasing difficulty. We manipulate the complexity of the route that must be traversed, the presence of traffic, and the environmental conditions.”, wherein the examiner interprets staging controlled goal-directed navigation scenarios and manipulating route complexity, traffic, and environmental conditions to be the same as generating the simulated environment based on a desired simulated driving situation because they are both directing the simulator to tailor environmental parameters to match a particular driving scenario desired for testing or training.) Claim 20 is analogous to claim 8, and thus would face the same rejection as above.

Claims 2-6 and 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over Dosovitskiy in view of Douillard, further in view of NPL reference “Explainability in reinforcement learning: perspective and position” by Krajna et al. (referred to herein as Krajna).

Regarding claim 2, Dosovitskiy and Douillard teach The system of claim 1 (see rejection of claim 1).
	Douillard further teaches wherein each zone corresponds to a set of calibration parameters. ([Douillard, [0163]] “Similar to the intrinsic sensor calibration module 3704, the extrinsic sensor calibration module 3706 may rely on a data transform module 3708 and/or a generative model module 3710 to perform the computations necessary to converge on optimal extrinsic calibration parameters using other sensor data, log file data, and/or map tile data.”, wherein the examiner interprets “map tile data” that subdivides the driving environment into discrete tiles to be the same as zones within an operation state space and the calibration parameter determined for each map tile to be the same as the set of calibration parameters corresponding to each zone because they are both associating localized calibration values with each spatial segment of the operational environment.)
	Dosovitskiy, Douillard, and the instant application are analogous art, because they are all directed to zones and their corresponding calibration parameters. 
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system of claim 1 disclosed by Dosovitskiy and Douillard to include the calibration module that converges on calibration parameters, as disclosed by Douillard. One would be motivated to do so to efficiently use a calibration module to generate a set of calibration parameters for extrinsic calibration modules to apply to the zones, as suggested by Douillard ([Douillard, [0163]] “the extrinsic sensor calibration module 3706… to converge on optimal extrinsic calibration parameters using other sensor data”).
	Dosovitskiy and Douillard do not teach wherein the processor is further programmed to generate reinforcement learning agent for each zone within an operation state space.
	Krajna teaches wherein the processor is further programmed to generate reinforcement learning agent for each zone within an operation state space ([Krajna, page 9], “They extend the classic U-tree, classic RL algorithm which represents a Q-function using a tree structure, by adding a linear model to each leaf node,” wherein the examiner interprets instantiating a distinct learned model in every leaf of the tree, to be the same as generating a reinforcement-learning agent for each zone within the operation state space because they are both creating a separate policy model that controls decision-making in its respective partition of the overall state space.)
	Dosovitskiy, Douillard, Krajna, and the instant application are analogous art, because they are all directed to generating an RL agent for the state spaces and zones. 
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system of claim 1 disclosed by Dosovitskiy and Douillard to include the “U-tree, classic RL algorithm” disclosed by Krajna. One would be motivated to do so to efficiently use the algorithm to generate an RL agent as a model for each zone within a state space controlling the decision-making as suggested by Krajna ([Krajna, page 9] “They extend the classic U-tree, classic RL algorithm which represents a Q-function using a tree structure, by adding a linear model to each leaf node”). Claim 14 is analogous to claim 2, and thus would face the same rejection as above.

	Regarding claim 3, Dosovitskiy, Douillard, and Krajna teach The system of claim 2 (see rejection of claim 2).
	Krajna further teaches wherein the processor is further programmed to divide the operation state space into at least two adjacent operation state space zones when the reinforcement learning agent has not converged ([Krajna, page 9] “This algorithm uses a lookahead approach that predicts which split will increase reward the most and only in this case will the tree size be increased”, wherein the examiner interprets the act of splitting that tree to be the same as dividing the operation state space into adjacent zones because they are both directed to partitioning the agent’s current state-space region into finer neighboring sections whenever learning has not yet reached convergence (i.e., a reward threshold has not been achieved).)
	Dosovitskiy, Douillard, Krajna, and the instant application are analogous art, because they are all directed to adaptive reinforcement-learning systems that subdivide the operating state space into smaller adjacent regions whenever an agent’s learning has not yet converged.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system of claim 2 disclosed by Dosovitskiy, Douillard, and Krajna to include the split/division disclosed by Krajna. One would be motivated to do so to efficiently accelerate the agent’s reward-driven convergence, as suggested by Krajna ([Krajna, page 9] “This algorithm uses a lookahead approach that predicts which split will increase reward the most”). Claim 15 is analogous to claim 3, and thus would face the same rejection as above.
	
Regarding claim 4, Dosovitskiy, Douillard, and Krajna teach The system of claim 3 (see rejection of claim 3).
	Dosovitskiy further teaches wherein each reinforcement learning agent trains for at least one of a predetermined computation budget or a predetermined time budget. ([Dosovitskiy, page 5] “The episode is terminated when the vehicle reaches the goal, when the vehicle collides with an obstacle, or when a time budget is exhausted.” and [Dosovitskiy, page 6] “We limit training to 10 million simulation steps because of computational costs imposed by the realistic simulation.”, wherein the examiner interprets “terminated when …a time budget is exhausted” to be the same as each reinforcement learning agent training for a predetermined time budget because they are both applying a fixed temporal limit that halts training once the allotted time has been consumed, and interprets limiting training to 10 million simulation steps due to computational costs to be the same as each reinforcement learning agent training for a predetermined computation budget because they are both imposing a predefined cap on computational resources available for training.)
	Dosovitskiy, Douillard, Krajna, and the instant application are analogous art because they are all directed to autonomous-vehicle reinforcement-learning systems that constrain each agent’s training by a predetermined computation or time budget.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the fleet-level autonomous-driving control framework disclosed by Douillard to include the “limit training to 10 million simulation steps because of computational costs” disclosed by Dosovitskiy. One would be motivated to do so to efficiently conserve computational resources during agent training, as suggested by Dosovitskiy ([Dosovitskiy, page 6] “computational costs imposed by the realistic simulation”). Claim 16 is analogous to claim 4, and thus would face the same rejection as above.
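For illustration only, the notion of training under "at least one of a predetermined computation budget or a predetermined time budget" can be sketched as a loop that stops when either cap is reached. This is a minimal sketch and is not drawn from the application or from Dosovitskiy; the function `train_with_budget`, the callback `step_fn`, and the parameters `max_steps` and `max_seconds` are hypothetical names introduced here.

```python
import time
from typing import Callable, Optional


def train_with_budget(
    step_fn: Callable[[], None],
    max_steps: Optional[int] = None,
    max_seconds: Optional[float] = None,
) -> int:
    """Run step_fn until a step (computation) cap or a wall-clock cap is hit.

    All names are hypothetical; step_fn stands in for one training update.
    Returns the number of steps actually executed.
    """
    if max_steps is None and max_seconds is None:
        raise ValueError("at least one budget must be given")
    start = time.monotonic()
    steps = 0
    while True:
        # Computation budget: a fixed cap on the number of training steps.
        if max_steps is not None and steps >= max_steps:
            break
        # Time budget: a fixed cap on elapsed wall-clock time.
        if max_seconds is not None and time.monotonic() - start >= max_seconds:
            break
        step_fn()
        steps += 1
    return steps
```

Under these assumptions, a call such as `train_with_budget(update, max_steps=10_000_000)` would correspond to a step cap of the kind quoted from Dosovitskiy, while `max_seconds` would correspond to a time budget.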

	Regarding claim 5, Dosovitskiy, Douillard, and Krajna teach The system of claim 3 (see rejection of claim 3).
	Krajna further teaches the processor is further programmed to generate a supervisor reinforcement learning agent that is configured to manage transitions between at least two adjacent operation state space zones. ([Krajna, page 13] “There are two agents, a high-level agent that will divide the full task into smaller actions (sub-goals) for a low-level agent, which follows the tasks one by one”, wherein the examiner interprets a high-level agent that will divide the full task into smaller actions to be the same as a supervisor reinforcement learning agent because they are both top-level controllers that orchestrate subordinate policies), and ([Krajna, page 13] “the stochastic temporal grammar (STG) was used to summarize temporal transitions between various tasks which is learned via self supervision”, wherein the examiner interprets “there are two agents, a high-level … for a low-level agent” to be the same as “supervisor reinforcement learning agent” because one oversees the other in both cases. The examiner further interprets summarizing temporal transitions between various tasks to be the same as managing transitions between at least two adjacent operation state space zones because they are both concerned with governing how the system moves from one operational region to the next during execution).
	Dosovitskiy, Douillard, Krajna, and the instant application are analogous art, because they are all directed to autonomous-navigation systems that employ a high-level controller to govern lower-level reinforcement-learning agents as the vehicle moves between discrete operating regions.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the autonomous-driving simulation framework disclosed by Dosovitskiy to include the two agents disclosed by Krajna. One would be motivated to do so to effectively coordinate complex driving behaviors through hierarchical task decomposition, as suggested by Krajna ([Krajna, page 13] “two agents, a high level agent that will divide the full task into smaller actions”). Claim 17 is analogous to claim 5, and thus would face the same rejection as above.
	
Regarding claim 6, Dosovitskiy, Douillard, and Krajna teach The system of claim 5 (see rejection of claim 5).
	Krajna teaches wherein the supervisor reinforcement learning agent generates a transition set of calibration parameters based on the adjacent zones. ([Krajna, page 13] “There are two agents, a high-level agent that will divide the full task into smaller actions (sub-goals) for a low-level agent, which follows the tasks one by one [61]. They tried to achieve explainability by using a heatmap on which the subgoals with higher Q values are marked. The agent attributes higher values to sub-goals close to the end-goal rather than those closer to the starting position. This can only show that the agent learned a good representation of its environment, but it cannot give an exact explanation as to why the robot took some action at a given point in time”, wherein the examiner interprets a high-level agent that will divide the full task into smaller actions (sub-goals) to be the same as the supervisor reinforcement learning agent because they are both higher-level reinforcement-learning controllers that oversee subordinate agents and decide what tasks they should execute).
Dosovitskiy, Douillard, Krajna, and the instant application are analogous art, because they are all directed to generating parameters based on adjacent zones.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system of claim 5 disclosed by Dosovitskiy, Douillard, and Krajna to include the division of tasks into sub-goals, treated as analogous to a parameter set, as disclosed by Krajna. One would be motivated to do so to effectively generate intermediate calibration parameters that steer low-level agents through adjacent zones, as suggested by Krajna ([Krajna, page 13] “divide the full task into smaller actions (sub-goals) for a low-level agent...They tried to achieve explainability by using a heatmap on which the subgoals with higher Q values are marked.”). Claim 18 is analogous to claim 6, and thus would face the same rejection as above.
 
Claims 7 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Dosovitskiy in view of Douillard in view of Krajna, further in view of US 10065654 B2 by Nishi et al. (referred to herein as Nishi).

	Regarding claim 7, Dosovitskiy, Douillard, and Krajna teach The system of claim 6 (see rejection of claim 6).
	Dosovitskiy, Douillard, and Krajna do not teach wherein the supervisor reinforcement learning agent generates the transition calibration parameter according to w = a_1 w_1 + a_2 w_2 + . . . + a_N w_N, where a_i represents an i-th coefficient generated by the supervisor reinforcement learning agent, w_i represents an output of the i-th reinforcement learning agent, and N represents a number of adjacent zones.
	Nishi teaches wherein the supervisor reinforcement learning agent generates the transition calibration parameter according to w = a_1 w_1 + a_2 w_2 + . . . + a_N w_N, where a_i represents an i-th coefficient generated by the supervisor reinforcement learning agent, w_i represents an output of the i-th reinforcement learning agent, and N represents a number of adjacent zones. ([Nishi, page 10, col 7, lines 56-64] “a linear combination of weighted radial basis functions (RBF’s) may be used: [Eq 60] where wᵢ are the weights, φᵢ are j-th RBFs, and N is the number of RBFs”, wherein the examiner interprets the expression of Z as a weighted sum of N function outputs, each scaled by a learned weight wᵢ, to be the same as computing the transition calibration parameter w as a weighted sum of N reinforcement-agent outputs w_i scaled by coefficients a_i, because they are both forming a single parameter by linearly combining N component values with learned coefficients.)
	Dosovitskiy, Douillard, Krajna, Nishi, and the instant application are analogous art, because they are all directed to reinforcement-learning control systems that compute a transition-calibration parameter as a weighted combination of outputs produced by multiple adjacent agents or zones.
	It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system of claim 6 disclosed by Dosovitskiy, Douillard, and Krajna to include Eq. 60 as disclosed by Nishi. One would be motivated to do so to effectively produce a single calibration parameter that blends contributions from several neighboring agents for smoother and more accurate transitions, as suggested by Nishi ([Nishi, col. 7, lines 56-64] “a linear combination of weighted radial basis functions (RBF’s) may be used”). Claim 19 is analogous to claim 7, and thus would face the same rejection as above.
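For illustration only, the weighted-sum relationship recited in claim 7 (w equal to the sum of a_i times w_i over N adjacent zones) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from Nishi; the function name `blend_transition_parameter` and the example values are hypothetical.

```python
from typing import Sequence


def blend_transition_parameter(
    coefficients: Sequence[float],
    agent_outputs: Sequence[float],
) -> float:
    """Compute w = a_1*w_1 + a_2*w_2 + ... + a_N*w_N.

    coefficients  -- a_i, assumed here to come from a supervisor agent
    agent_outputs -- w_i, assumed here to be one output per adjacent zone agent
    """
    if len(coefficients) != len(agent_outputs):
        raise ValueError("one coefficient is required per agent output")
    return sum(a * w for a, w in zip(coefficients, agent_outputs))


# Hypothetical example with N = 2 adjacent zones.
w = blend_transition_parameter([0.25, 0.75], [2.0, 4.0])
print(w)  # 0.25*2.0 + 0.75*4.0 = 3.5
```

The sketch only shows the arithmetic form of the linear combination; how the coefficients a_i are learned is outside its scope.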

Claims 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Palanisamy (US 20200033869 A1) in view of Douillard.

Regarding claim 9, Palanisamy teaches:
A system comprising a computer including a processor and a memory, the memory including instructions such that the processor is programmed to ([Palanisamy, [0057]] “The controller 34 includes at least one processor 44 and a computer readable storage device or media 46. The processor 44 can be any custom made or commercially available graphics processor, a central processing unit (CPU), a processing unit (GPU), an auxiliary processor among several processors associated with the controller 34, a semiconductor based microprocessor (in the form of a microchip or chip set), a macroprocessor, any combination thereof, or generally any device for executing instructions. The computer readable storage device or media 46 may include volatile and nonvolatile storage in read-only memory (ROM), random-access memory (RAM)”, wherein the examiner interprets “controller 34” including “at least one processor 44” and a “computer readable storage device or media 46,” with the processor being “any device for executing instructions,” to be the same as a computer including a processor and a memory, the memory including instructions such that the processor is programmed to, because they are both directed to a computing unit in which a processor executes instructions stored in memory to carry out programmed operations.)
receive collected vehicle state parameters from a vehicle; ([Palanisamy, [0017]] “In one embodiment, each of the driving environment processors is configured to process sensor information from on-board sensors that describes a specific driving environment to generate a state of the specific driving environment, and wherein each of the one or more driver agents is further configured to: process the state, in accordance with a policy, to generate a corresponding action”, wherein the examiner interprets “process sensor information from on-board sensors that describes a specific driving environment to generate a state of the specific driving environment” to be the same as “receive collected vehicle state parameters from a vehicle,” because they are both directed to obtaining sensor-derived data from the vehicle and producing the vehicle’s state information for use by the processor.)
generate a reported problem responsive to monitoring vehicle state parameters related to measurable reinforcement learning agent reward components and; ([Palanisamy, [0092], “ranked based on novelty/priority of each driving experience as determined by a prioritization algorithm 134 of the driving policy generation module 130). For example, the driving policy generation module 130 can update the relative priority/novelty/impact/effectiveness 126 of the driving experiences 124 in the experience memory 120, and then rank the driving experiences in a priority order. In one embodiment, when a driver agent 116 acquires a driving experience, it adds its own estimate of the priority as the Instance information (I) as described. The driving policy learner module(s) 131, which have access to much more information through the pooled experience memory 120, can update a value of priority/novelty/impact/effectiveness…low priority and commonly occurring driving experiences can be discarded to reduce volume of the driving experiences stored.”, wherein the examiner interprets “ranked based on novelty/priority of each driving experience as determined by a prioritization algorithm 134,” “update the relative priority/novelty/impact/effectiveness,” “rank the driving experiences in a priority order,” “adds its own estimate of the priority,” “pooled experience memory 120,” and “low priority and commonly occurring driving experiences can be discarded to reduce volume of the driving experiences stored” to be the same as generate a reported problem responsive to monitoring vehicle state parameters related to measurable reinforcement learning agent reward components and, because they are both directed to monitoring vehicle-derived experience data against agent performance criteria and surfacing higher-priority or anomalous cases as a reported problem while filtering routine cases using criteria analogous to measurable reward components.)
comparing the measurable reward components to predetermined performance thresholds ([Palanisamy, [0007]] “a reward comprising: a signal that signifies how desirable an action performed by the driver agent is at a given time under particular environment conditions, wherein the reward is automatically computed”, wherein the examiner interprets “a reward … how desirable an action performed by the driver agent is” to be the same as a “measurable reward component” and “how desirable” to be the same as “performance threshold” because they are both directed to quantitatively evaluating an agent’s action under defined conditions using automatically computed criteria, i.e., a predetermined threshold that distinguishes desirable from undesirable driver agent performance.)
retrain at least one reinforcement learning agent having an action space comprising tuning calibration parameters within a constructed simulated driving scenario based on the collected vehicle state parameters ([Palanisamy, [0122]] “The gradient descent optimizer 140 … compute updated parameters … The updated parameters can be used to retrain and optimize neural network parameters of the DRL algorithm 132,” and [Palanisamy, [0080]] “from the simulation engine in case of simulated driving environments,” wherein the examiner interprets “The updated parameters can be used to retrain and optimize neural network parameters of the [Deep Reinforcement Learning] DRL algorithm 132” to be the same as retrain at least one reinforcement learning agent and interprets “from the simulation engine in case of simulated driving environments” to be the same as within a constructed simulated driving scenario because they are both directed to performing agent training and evaluation in a simulator that replicates driving conditions.)
wherein the reinforcement learning agent is trained using reinforcement learning states that include process states, inputs, previous outputs, and correct system responses; ([Palanisamy, [0080]] “In one embodiment, each driving experience can be represented in a large, multi-dimensional tensor that includes information from a particular driving environment at a particular time. Each experience includes: state (S), observation (O), action (A), reward (R), next state (S′), next observation (O′), goal (G), and instance information (I). As used herein, the term “state (S),” when used with reference to a driving experience, can refer to the state of the environment that can be perceived/observed by the driving environment processor and driver agents through sensors on-board the vehicle or through some other means like Vehicle to Infrastructure (V2I) or Vehicle to Vehicle (V2V) communication”, wherein the examiner interprets “state (S)...next observation (O′)” to be the same as reinforcement learning states that include process states, inputs, previous outputs, and correct system responses, because they are both directed to the standard reinforcement learning training elements in which the process state aligns with the state, the inputs align with the “observation”, the previous outputs align with the “action” taken at the prior step, and the correct system responses align with the “reward (R)” used to evaluate that action.)
Palanisamy does not teach process and label, at a server, the collected vehicle state parameters based on the reported problem; determine, by the server, whether the reported problem corresponding to the collected vehicle state parameters is below a predetermined frequency threshold, when compared to past collected vehicle data in the server; … and wherein a reward function for the reinforcement learning agent comprises global metrics including fuel consumption, air pollution, and battery range, and local metrics including overshoot, oscillation, response reversal, steady state response, and prediction response.
Douillard teaches process and label, at a server, the collected vehicle state parameters based on the reported problem; ([Douillard, [0095]] “FIG 15 is an example of a flow diagram to control an autonomous vehicle … The message data may indicate event attributes associated with a non-normative state of operation in the context of a planned path for an autonomous vehicle. For example, an event may be characterized as a particular intersection that becomes problematic due to, for example, a large number of pedestrians, hurriedly crossing the street against a traffic light.”, and [Douillard, [0096]] “FIG. 16 is a diagram of an example of an autonomous vehicle fleet manager implementing a fleet optimization manager, according to some examples. Diagram 1600 depicts an autonomous vehicle fleet manager that is configured to manage a fleet of autonomous vehicles 1630 transiting within a road network 1650. Autonomous vehicle fleet manager 1603 is coupled to a teleoperator 1608 via a teleoperator computing device 1604, and is also coupled to a fleet management data repository 1646. Autonomous vehicle fleet manager 1603 is configured to receive policy data 1602 and environmental data 1606, as well as other data. Further to diagram 1600, fleet optimization manager 1620 is shown to include a transit request processor 1631”, wherein the examiner interprets “message data … indicating event attributes associated with a non-normative state of operation” to be the same as “the reported problem” because they are both directed to an identified operational issue for the autonomous vehicle. The examiner further interprets “autonomous vehicle fleet manager … coupled to a fleet management data repository” to be the same as “at a server” because they are both directed to a centralized computing system that stores and processes vehicle data. Finally, the examiner interprets “receive policy data … and environmental data” to be the same as “the collected vehicle state parameters” because they are both directed to vehicle and environment state information used for processing, such that associating the “event attributes” with the received data constitutes “process and label … based on the reported problem” because they are both directed to attaching problem-specific attributes to the collected data during server-side processing.)
determine, by the server, whether the reported problem corresponding to the collected vehicle state parameters is below a predetermined frequency threshold, when compared to past collected vehicle data in the server; ([Douillard, [0166]] “log file data retrieved from the log file store 3716 … probability score of the likelihood … may be generated from past sensor measurements and current sensor measurements” and [Douillard, [0167]] “Various predetermined thresholds may be used … by the heuristics engine module 3712 … to ensure that the AV system 3602 is operating within acceptable safe operating parameters,” wherein the examiner interprets “log file data retrieved from the log file store … past sensor measurements and current sensor measurements” to be the same as “compared to past collected vehicle data in the server,” because they are both directed to evaluating current indications against historical data stored in a server-resident repository. The examiner further interprets “probability score of the likelihood” to be the same as “frequency” of occurrence, as probability can only be determined by measuring frequent occurrences, and “Various predetermined thresholds may be used” to be the same as “predetermined frequency threshold,” because they are both directed to server-side decision rules that gate conditions using preset thresholds over measured event statistics (i.e., frequency or likelihood of occurrence).)
wherein a reward function for the reinforcement learning agent comprises global metrics including fuel consumption, air pollution, and battery range, and local metrics including overshoot, oscillation, response reversal, steady state response, and prediction response. ([Douillard, [0170]] “the current trajectory and/or route … as well as other operational parameters, such as current battery level,” and “[Douillard, Fig. 19, block 1914], “MONITOR A FLEET OF AUTONOMOUS VEHICLES … WITH DATA REPRESENTING FLEET QUALITY OF SERVICE METRICS” wherein the examiner interprets explicit consideration of battery level and “FLEET QUALITY OF SERVICE METRICS” as other operational metrics to be the same as including battery range within a reward function’s global metrics because they are both directed to optimizing fleet/vehicle performance and route.)
Palanisamy, Douillard, and the instant application are analogous art because they are all directed to autonomous-vehicle data management and reinforcement-learning systems that monitor, evaluate, and update vehicle-control models based on fleet-level operational data.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the autonomous driving experience-memory framework disclosed by Palanisamy to include the thresholding technique disclosed by Douillard. One would be motivated to do so to efficiently gate server-side decisions with repository-backed criteria that maintain safe and reliable operation, as suggested by Douillard (Douillard, [0167] “to ensure that the AV system … is operating within acceptable safe operating parameters.”).
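For readers mapping the claim 9 reward-function limitation to an implementation, the following is a minimal illustrative sketch only. It is not taken from the application, Palanisamy, or Douillard. The class names, the assumption that every metric is normalized to [0, 1], the treatment of the local metrics as penalty magnitudes, and the weights `w_global` and `w_local` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GlobalMetrics:
    # Hypothetical normalized values in [0, 1].
    fuel_consumption: float
    air_pollution: float
    battery_range: float


@dataclass
class LocalMetrics:
    # Hypothetical normalized penalty magnitudes in [0, 1];
    # 0 means the ideal response for that metric.
    overshoot: float
    oscillation: float
    response_reversal: float
    steady_state_response: float
    prediction_response: float


def reward(g: GlobalMetrics, l: LocalMetrics,
           w_global: float = 1.0, w_local: float = 1.0) -> float:
    """Combine global and local metrics into one scalar reward (assumed form)."""
    # Reward longer battery range; penalize fuel use and emissions.
    global_term = g.battery_range - g.fuel_consumption - g.air_pollution
    # Sum the local response penalties.
    local_penalty = (l.overshoot + l.oscillation + l.response_reversal
                     + l.steady_state_response + l.prediction_response)
    return w_global * global_term - w_local * local_penalty
```

Under these assumptions, a calibration that increases overshoot while leaving every other metric unchanged receives a strictly lower reward.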

Regarding claim 10, Palanisamy and Douillard teach the system as recited in claim 9 (see rejection of claim 9).
Palanisamy does not teach wherein the processor is further programmed to determine whether the reported problem affects a number of vehicles that exceeds a predetermined vehicle amount.
Douillard teaches wherein the processor is further programmed to determine whether the reported problem affects a number of vehicles that exceeds a predetermined vehicle amount. ([Douillard, [0096]], “data for each vehicle may describe maintenance issues, scheduled service calls, daily usage, battery charge and discharge rates, and any other data, which may be updated in real-time, may be used for purposes of optimizing a fleet of autonomous vehicles to minimize downtime.” and [Douillard, [0098]], “attributes … are calculated to determine a subset of autonomous vehicles that are available to service the request”, wherein the examiner interprets the combination of “maintenance issues … updated in real-time” together with “determine a subset of autonomous vehicles” to be the same as determining whether the reported problem affects a number of vehicles that exceeds a predetermined vehicle amount because they are both directed to analyzing fleet-wide fault data and comparing how many vehicles exhibit the issue against selection criteria (i.e., a threshold) in order to decide subsequent fleet actions.)
Palanisamy, Douillard, and the instant application are analogous art, because they are all directed to fleet-level autonomous-vehicle management that monitors vehicle condition data and triggers fleet responses when issues arise across multiple vehicles.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system as recited in claim 9 disclosed by Palanisamy and Douillard to include the “data for each vehicle may describe maintenance issues … determine a subset of autonomous vehicles that are available to service the request” disclosed by Douillard. One would be motivated to do so to efficiently minimize downtime across the fleet, as suggested by Douillard ([Douillard, [0096], [0098]] “optimizing a fleet of autonomous vehicles to minimize downtime … determine a subset of autonomous vehicles that are available to service the request”).

Regarding claim 11, Palanisamy and Douillard teach the system as recited in claim 10 (see rejection of claim 10).
Douillard further teaches wherein the processor is further programmed to generate an alert when the reported problem affects a number of vehicles that exceeds the predetermined vehicle amount. ([Douillard, [0095]], “aggregated data associated with a group of autonomous vehicles” and “representations of the set of recommended courses of action may be presented visually on a display of a teleoperator computing device”, wherein the examiner interprets “aggregated data associated with a group of autonomous vehicles” to be the same as the reported problem affects a number of vehicles that exceeds the predetermined vehicle amount because they are both directed to identifying an event that involves more vehicles than a set fleet-size threshold, and interprets “presented visually on a display of a teleoperator computing device” to be the same as generate an alert because they are both directed to notifying an operator of that fleet-wide condition.)
Palanisamy, Douillard, and the instant application are analogous art because they are all directed to fleet-level autonomous-vehicle monitoring systems that collect operational data and surface alerts when conditions require human awareness.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system as recited in claim 10 disclosed by Palanisamy and Douillard to include the “courses of action may be presented visually on a display” disclosed by Douillard. One would be motivated to do so to effectively alert a human operator of a problem using, for example, visuals, as suggested by Douillard (Douillard, [0095] “representations of the set of recommended courses of action may be presented visually on a display of a teleoperator computing device”).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Palanisamy in view of Douillard, further in view of US20210334644A1, by Yu et al. (referred to herein as Yu).
Regarding claim 12, Palanisamy and Douillard teach the system as recited in claim 11 (see rejection of claim 11).
Palanisamy and Douillard do not teach wherein the alert comprises at least one of an audio alert, a haptic alert, or a visual alert.
Yu teaches wherein the alert comprises at least one of an audio alert, a haptic alert, or a visual alert. ([Yu, [0204]], “In at least one embodiment, an FCW system may provide a warning, such as in form of a sound, visual warning, vibration and/or a quick brake pulse.”, wherein the examiner interprets providing a warning in the form of a sound, a visual indication, or a vibration to be the same as the alert comprising, respectively, an audio alert, a visual alert, or a haptic alert.)
Palanisamy, Douillard, Yu, and the instant application are analogous art, because they are all directed to autonomous-vehicle safety systems that issue multimodal alerts (audio, visual, and haptic) in response to detected driving conditions.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system as recited in claim 11 disclosed by Palanisamy and Douillard to include the “provide a warning, such as in form of a sound, visual warning, vibration and/or a quick brake pulse” disclosed by Yu. One would be motivated to do so to effectively alert vehicle occupants through multiple sensory channels (sound, vibration, quick brake pulse), as suggested by Yu ([Yu, [0204]] “a warning, such as in form of a sound, visual warning, vibration and/or a quick brake pulse.”).
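The limitations of claims 10 through 12 (counting affected vehicles, comparing the count against a predetermined vehicle amount, and generating an audio, haptic, or visual alert) can be pictured with the following minimal sketch. It is illustrative only and is not drawn from the application or any cited reference. The report format, the function names `count_affected` and `maybe_alert`, and the returned dictionary are hypothetical assumptions.

```python
from typing import Iterable, Optional, Tuple

# Alert modalities recited in claim 12.
ALERT_KINDS = ("audio", "haptic", "visual")


def count_affected(reports: Iterable[Tuple[str, str]], problem_id: str) -> int:
    """Count distinct vehicles reporting the given problem.

    Each report is a hypothetical (vehicle_id, problem_id) pair.
    """
    return len({vid for vid, pid in reports if pid == problem_id})


def maybe_alert(reports: Iterable[Tuple[str, str]], problem_id: str,
                vehicle_threshold: int, kind: str = "visual") -> Optional[dict]:
    """Return an alert record only if the affected count exceeds the threshold."""
    if kind not in ALERT_KINDS:
        raise ValueError(f"unsupported alert kind: {kind}")
    affected = count_affected(reports, problem_id)
    if affected > vehicle_threshold:
        return {"problem": problem_id, "affected_vehicles": affected, "kind": kind}
    return None
```

In this sketch, repeated reports from the same vehicle are counted once, so the comparison is against the number of vehicles rather than the number of reports.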

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DEVAN KAPOOR, whose telephone number is (703) 756-1434. The examiner can normally be reached Monday - Friday, 9:00 AM - 5:00 PM EST (times may vary).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi, can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DEVAN KAPOOR/Examiner, Art Unit 2126
/DAVID YI/Supervisory Patent Examiner, Art Unit 2126
Prosecution Timeline
Jul 25, 2025: Non-Final Rejection mailed (§101, §103)
Aug 06, 2025: Interview Requested
Aug 20, 2025: Examiner Interview Summary
Aug 20, 2025: Applicant Interview (Telephonic)
Aug 28, 2025: Response Filed
Nov 14, 2025: Final Rejection mailed (§101, §103)
Dec 19, 2025: Interview Requested
Dec 23, 2025: Response after Non-Final Action
Prosecution Projections
Expected OA Rounds: 2-3
Grant Probability: 10%
With Interview (+16.7%): 27%
Median Time to Grant: 4y 4m (~4m remaining)
PTA Risk: Moderate
Based on 10 resolved cases by this examiner. Grant probability derived from career allowance rate.