DETAILED ACTION
This action is in response to the communications filed on 02/02/2026, in which claims 1, 4-5, 10 and 14 are amended and claims 9 and 16-19 are canceled; accordingly, claims 1-8, 10-15 and 20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/02/2026 has been entered.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 4-6, 10-11, 13-14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Regaieg ("Multi-Objective Mixed Integer Linear Programming Model for VM Placement to Minimize Resource Wastage in a Heterogeneous Cloud Provider Data Center" 2018) in view of Zhang ("Soft actor-critic –based multi-objective optimized energy conversion and management strategy for integrated energy systems with renewable energy" 20210610), and further in view of Kurtz ("An Integer Programming Approach to Deep Neural Networks with Binary Activation Functions" 20200807).
In regard to claims 1 and 10, Regaieg teaches: A computing device associated with a distributed computing platform configured to perform a computational task, the distributed computing platform comprising a plurality of computing nodes, the computing device comprising: (Regaieg, p. 401 "Virtual Machine Placement (VMP) is one of the pressing issues encountered in cloud computing data centers. [a distributed computing platform] VMP is the process of selecting the most suitable physical machine (PM) [a plurality of computing nodes] to host the virtual machines (VMs)."; p. 404 "We generated 50 test-scenarios, that is, 50 different VM request instances [a computational task] each of which consists of N VM requests generated randomly from a predefined set of VM types referred to as Small (S), Medium (M), Large (L) and XLarge (XL) and whose characteristics are detailed in Table III(b)."; Cloud data centers configured to host virtual machine request instances, i.e. [the platform configured to perform a computational task]. Cloud data centers comprise physical machines, i.e. [the distributed computing platform comprising a plurality of computing nodes])
[media_image1.png: greyscale figure]
… wherein the current state indicates memory requirement data of each computing node of the plurality of computing nodes, and (Regaieg, p. 402 "i and j as subscript usually denotes a virtual machine request and a physical machine index respectively… N corresponds to the number of virtual machines arriving at the Data Center to be hosted. The VM request numbered i, denoted vi,∀1≤i≤N, is defined by the tri-tuple (ci, ri, si) where ci, ri and si are the CPU, memory and storage requirements of VM vi. [memory requirement data]") the memory requirement data of each computing node of the plurality of computing nodes specifies an amount of memory required by a corresponding computing node of the plurality of computing nodes for executing a corresponding subtask of the computational task; (Regaieg, p. 404 "We generated 50 test-scenarios, that is, 50 different VM request instances [the computational task] each of which consists of N VM requests [a corresponding subtask] generated randomly from a predefined set of VM types referred to as Small (S), Medium (M), Large (L) and XLarge (XL) and whose characteristics are detailed in Table III(b)."; see Table III(b) S 2GB, M 4GB, L 8GB... [an amount of memory], i.e. memory used in a PM for executing those VM requests)
… a mixed integer mathematical problem of a plurality of operational states associated with the distributed computing platform and (Regaieg, p. 403 "A. The Multi-Objective Mixed Integer Linear Programming Model, The Multi-Objective Mixed ILP model [a mixed integer mathematical problem] (Model 1) relies on three separate steps to compute the optimal VM-PM mapping, as shown in Figure 4."; p. 402 "The VM request numbered i, denoted vi,∀1≤i≤N, is defined by the tri-tuple (ci, ri, si)... The PM numbered j, denoted Pj,∀1≤j≤M is characterized by the tri-tuple (Cj, Rj, Sj)..."; see 'Number of PMs, PMs' characteristics, Number of VMs...' in Fig. 4, or 'Given N, M, Cj , Rj , Sj , ci, ri and si with constraints Eq. (2)(3)(4)' in Table I are [operational states associated with the platform], i.e. states are any observations in the environment, e.g. the number of physical/virtual machines and their CPU, memory and storage requirements in the cloud computing data centers) a plurality of actions associated with the distributed computing platform (Regaieg, p. 401 "The way to place these VMs into the Physical Machines (PMs) is known as the VM Placement [actions] (VMP) problem [1]."; p. 406 "In this paper, we proposed a MOMILP model to address the VM placement problem in CSP data-centers with heterogeneous PM configuration.")
[media_image2.png: greyscale figure]
… wherein the selected action includes allocation of at least one computational workload from a first computing node of the plurality of computing nodes to a second computing node of the plurality of computing nodes; (Regaieg, p. 401 "Figure 2 shows an example of a VMP process with 3 VMs and 3 PMs in a heterogeneous data center with an end-goal of maximizing the number of accepted VMs. The VMs requests are given in Figure 1. As it can be seen, after deploying the VMP process, VM1 is hosted in PM1,VM2 is hosted in PM2... and VM3 is hosted in PM3... Figure 3 shows the expected optimized VMP."; see Figures 2 and 3, allocation of the workload is moved from PM1 to PM3, i.e. [allocate workload from a first computing node to a second computing node])
Regaieg teaches the distributed computing platform, but does not teach
receiving, by a mixed integer program (MIP) actor, a current state of the… platform;
receiving, by the MIP actor, a predicted performance for the… platform, from a critic approximator module;
selecting, by the MIP actor, an action of the plurality of actions based on a solution of the mixed integer mathematical problem;
applying the selected action to the… platform;
determining a long-term reward for the distributed computing platform corresponding to the applying of the selected action to the… platform;
Regaieg does not teach, but Zhang teaches:
a processor; a storage device coupled to the processor; and a Programmable Actor Reinforcement Learning (PARL) engine stored in the storage device, wherein an execution of the PARL engine by the processor configures the processor to perform operations comprising: (Zhang, p. 9, 4. Case study "In Fig. 6, the implementation details of the case study in this paper are provided… Step 3. According to the model set above, three scenarios are used for simulation through the SAC algorithm"; p. 10, 4.3. Setup of the benchmark and the proposed algorithms "The hyper-parameters setting for training the SAC algorithm mentioned in Section 3 are listed in Table 1. For updating the weights and biases of the actor and critic networks, the learning rates are... and... , respectively."; see Fig. 6 simulation using 'python' and soft actor-critic algorithm [PARL engine] and the training process inherently teach all the computer components)
[media_image3.png: greyscale figure]
receiving, by a mixed integer program (MIP) actor, a current state of the… platform, (Zhang, p. 5 "State S: in this paper, the current state [a current state] information st contains fuel cost C..., load demand P_d..., wind power generation P_wind... and the capacity of the battery SoC...; [a mixed integer mathematical problem]"; p. 7 "The actor part [by a MIP actor]… the sole objective of the Actor is seeking for the direction of policy improvement. Note that the state space is continuous... [a mixed integer mathematical problem]... Qθ(s, a)(47)..."; p. 3 "This paper adopts an off-policy algorithm, soft actor-critic (SAC) algorithm based on maximum entropy theory [33], to solve multi-objective optimization problems in the IPHNGE system [the platform]"; MIP actor is used for solving a mixed integer mathematical problem, where variables can be continuous or integers, with constraints for those variables)
… receiving, by the MIP actor, a predicted performance for the… platform, from a critic approximator module, (Zhang, p. 6 "The critic part... The output value Qθ(s, a) of a DNN [a predicted performance from a critic approximator module] parameterized by θ is used to estimate soft Q-value, i.e., Q(s, a) ˜ Qθ(s, a)."; p. 8 "Temporal Difference (T.D.) error and stochastic gradient decent methods are used to update the parameters of the DNNs... by T.D. error will be back-propagated to update the parameters of the corresponding DNNs, while Qθ(s, a) and Vθ(s) are transferred to renew the parameters of the Actor"; see Fig. 5, Qθ(s, a) is provided (from a critic) to TD error, to update DNNs (which include the actor), i.e. [the actor receiving a predicted performance])
wherein the critic approximator module comprises a deep neural network (DNN) implemented as a critic of the PARL engine; (Zhang, p. 6 "The critic part [the critic approximator module, a critic]... The output value Qθ(s, a) of a DNN [DNN] parameterized by θ is used to estimate soft Q-value, i.e., Q(s, a) ˜ Qθ(s, a)."; p. 9, 4. Case study "In Fig. 6, the implementation details of the case study in this paper are provided… Step 3. According to the model set above, three scenarios are used for simulation through the SAC algorithm [implementation of a critic]"; see Fig. 6 'python' [PARL engine])
solving, by the MIP actor, a mixed integer mathematical problem of a plurality of operational states..., based on the received current state and the predicted performance; (Zhang, p. 6 "Generally, it is a reasonable trick that the cost function described in Section 2 could be regarded as the reward...for the constraints... Eq (24)(25)(26) [a mixed integer mathematical problem (multi-objective problems)]"; p. 7 "The actor part [by the MIP actor]… the sole objective of the Actor is seeking for the direction of policy improvement. Note that the state space is continuous [a mixed integer mathematical problem]"; p. 3 "This paper adopts an off-policy algorithm, soft actor-critic (SAC) [by the MIP actor] algorithm based on maximum entropy theory [33], to solve multi-objective optimization problems [solving a mixed integer mathematical problem] in the IPHNGE system by encouraging strategy exploration..."; p. 7 "L(θ) = E[...(Qθ(si, ai) - Q^(si, ai))2] (36) [based on state si and the predicted performance Qθ]")
wherein the MIP actor is implemented as an actor of the PARL engine and uses mixed integer programming techniques for solving the mixed integer mathematical problem, and (Zhang, p. 9, 4. Case study "In Fig. 6, the implementation details of the case study in this paper are provided… Step 3. According to the model set above, three scenarios are used for simulation through the SAC algorithm [implementation of an actor]"; p. 7 "The actor part [a MIP actor]… the sole objective of the Actor is seeking for the direction of policy improvement. Note that the state space is continuous... [a mixed integer mathematical problem]... Qθ(s, a)(47)..."; see Fig. 6 'python' [PARL engine]; MIP actor is used for solving a mixed integer mathematical problem, where variables can be continuous or integers, with constraints for those variables)
… selecting, by the MIP actor, an action of the plurality of actions based on a solution of the mixed integer mathematical problem; applying the selected action to the… platform… (Zhang, p. 7 "The actor part [by the MIP actor]… Thus, this study employs a state conditioned stochastic network π(·|s) to sample action [selecting an action of actions, applying the selected action]..."; p. 8 "Given the current environment state st [based on a solution of the mixed integer mathematical problem], the Actor selects and executes an action... based on policy π(·|s)."; in RL a loop (a continuous cycle) St, At, St+1, At+1, St+2... is learned, therefore each state is a solution of previous state space, and the state space in Zhang is a mixed integer problem)
determining a long-term reward for the distributed computing platform corresponding to the applying of the selected action to the… platform; (Zhang, p. 7 "Q^(si, ai) = r + γV^(s′i) (37) [determining a long-term reward Q^ corresponding to the applying of the selected action ai] where V^(s′) will be discussed later, which means the target state-value.")
computing an error between the long-term reward and the predicted performance; and (Zhang, p. 7 "L(θ) = E[...(Qθ(si, ai) - Q^(si, ai))2] (36) [computing an error between the long-term reward Q^ to the predicted performance Qθ]")
iteratively updating parameters of the critic approximator module based on the error between the long-term reward and the predicted performance. (Zhang, p. 6 "In the framework of the actor-critic networks, the two essential roles (the Critic and Actor) are responsible for two tasks (named policy evaluation and policy improvement, respectively). The learning process advances in the direction of achieving maximum return in the long run by the iteration [iteratively] of these two processes."; p. 7 "The critic part... The parameter θ of the DNN is updated [updating parameters of the critic approximator module] based on b tuple information (s,a,r,s'), which is randomly batch-sampled from the replay buffer. The performance evaluation of the DNN parameterized by θ depends on the mean square error (MSE) of the difference [an error] between Q target value Q^(s, a) and Q estimated value Qθ(s, a) [between the determined long-term reward Q^ and the predicted performance Qθ]... (36)")
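By way of illustration only (this sketch is not drawn from Zhang; the feature map, parameter values, and variable names are hypothetical), the critic update mapped above, i.e. the target Q^(si, ai) = r + γV^(s′i) of equation (37) and the mean-square error of equation (36), can be rendered for a single sample as:

```python
import numpy as np

# Hypothetical linear critic Q_theta(s, a) = theta . phi(s, a).
rng = np.random.default_rng(0)
theta = rng.normal(size=4)       # critic parameters
gamma, lr = 0.99, 0.1            # discount factor and learning rate

def phi(s, a):
    """Toy feature map for a state-action pair (assumption, not Zhang's)."""
    return np.array([s, a, s * a, 1.0])

def q(theta, s, a):
    return theta @ phi(s, a)

# One critic update: target Q^ = r + gamma * V(s') (Eq. (37)),
# loss L(theta) = (Q_theta(s, a) - Q^)^2 (Eq. (36), single sample).
s, a, r, v_next = 0.5, 1.0, 2.0, 3.0
target = r + gamma * v_next                 # long-term reward estimate
error = q(theta, s, a) - target             # T.D. error
theta -= lr * 2 * error * phi(s, a)         # gradient step on the MSE
```

After the gradient step, the critic's estimate Qθ(s, a) moves toward the target Q^, which is the iterative parameter update the limitation recites.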
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Regaieg to incorporate the teachings of Zhang by including a deep reinforcement learning strategy. Doing so would optimize decision-making actions through empirical learning without prediction information and prior knowledge. (Zhang, p. 1 "A deep reinforcement learning –based energy scheduling strategy is proposed to optimize multiple targets, including minimizing operational costs… The optimized decision-making action can be identified by the soft actor-critic algorithm through empirical learning without prediction information and prior knowledge.")
Regaieg and Zhang do not teach, but Kurtz teaches: the solving of the mixed integer mathematical problem by the MIP actor comprises reformulating the DNN as a plurality of linear constraints and a plurality of binary decision variables; (Kurtz, p. 1, Abstract "We show that the BDNN can be reformulated as a mixed-integer linear program which can be solved to global optimality by classical integer programming solvers"; p. 2, 3. Discrete Neural Networks "In the following lemma we show how to reformulate Problem (3) as an integer program. Lemma 1... Problem (3) is equivalent to the mixed-integer non-linear program... (5)... (6)... (7)... (8) [a plurality of linear constraints] ... (9) u ∈ {0, 1}... [a plurality of binary decision variables] (10)... Next we show that the constraints (5) – (8) correctly model the equation... The main idea is that the u_i,k-variables model the output of the activation functions of data point i in layer k, i.e. they have value 0 if the activation value is 0 or value 1 otherwise."; in light of [0067] "the DNN based value estimation is represented using binary variables and the corresponding set of constraints")
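For illustration only (the exact bounds and tie-handling below are assumptions, not Kurtz's constraints (5)-(10) verbatim), the core idea of tying a binary variable u to the sign of a pre-activation z = w·x + b with linear constraints can be sketched and checked by enumeration:

```python
# Assumed valid bounds on the pre-activation z (big-M style constants).
L_BOUND, U_BOUND = -10.0, 10.0

def feasible(z, u):
    """Two linear constraints that force u = 1 iff z >= 0
    (z = 0 is the usual degenerate tie case)."""
    return z <= U_BOUND * u and z >= L_BOUND * (1 - u)

# For each pre-activation value, exactly one binary assignment
# satisfies both constraints, matching the binary activation.
for z in (-3.0, -0.5, 0.5, 7.0):
    admissible = [u for u in (0, 1) if feasible(z, u)]
    assert admissible == [1 if z >= 0 else 0]
```

This is the mechanism by which the DNN-based value estimate becomes a plurality of linear constraints over binary decision variables solvable by an integer programming solver.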
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Regaieg and Zhang to incorporate the teachings of Kurtz by including an integer programming (IP) formulation to reformulate the neural network. Doing so would achieve better performance on a certain dataset. (Kurtz, p. 1, Abstract "We show that the BDNN can be reformulated as a mixed-integer linear program… We implemented our methods on random and real datasets and show that the heuristic version of the BDNN outperforms classical deep neural networks on the Breast Cancer Wisconsin dataset…")
Claim 10 recites substantially the same limitations as claim 1; therefore, the rejection applied to claim 1 also applies to claim 10. In addition, Zhang teaches: A non-transitory computer readable storage medium tangibly embodying a computer readable program code having computer readable instructions that, when executed, causes a computing device to execute operations, the operations comprising: (Zhang, p. 9, 4. Case study "In Fig. 6, the implementation details of the case study in this paper are provided… Step 3. According to the model set above, three scenarios are used for simulation through the SAC algorithm"; p. 10, 4.3. Setup of the benchmark and the proposed algorithms "The hyper-parameters setting for training the SAC algorithm mentioned in Section 3 are listed in Table 1. For updating the weights and biases of the actor and critic networks, the learning rates are... and... , respectively."; see Fig. 6 simulation using 'python' and soft actor-critic algorithm and the training process inherently teach all the computer components)
The rationale for combining the teachings of Regaieg and Zhang is the same as set forth in the rejection of claim 1.
In regard to claims 2 and 11, Regaieg does not teach, but Zhang teaches: wherein the mixed integer mathematical problem is a sequential decision problem. (Zhang, p. 3 "A finite and discrete Markov decision process (MDP) [a sequential decision problem] is applied to formulate energy dispatch problem as a proper task for the SAC algorithm to solve.")
The rationale for combining the teachings of Regaieg and Zhang is the same as set forth in the rejection of claim 1.
In regard to claims 4 and 13, Regaieg does not teach, but Zhang teaches: wherein the critic approximator module is configured to approximate a total reward starting at a given state. (Zhang, p. 6 "The critic part... [the critic approximator module] The method of policy evaluation is introduced in this section, and a deep neural network (DNN) is employed to fulfil the task of approximating the value function because of its good convergence and stability [48]."; p. 5 "Supposed that the simulation starts at time slot t in one episode, the cumulative reward is given as: R(St, t) = Σ γr (22)… (23) [a total reward starting at a given state st]")
The rationale for combining the teachings of Regaieg and Zhang is the same as set forth in the rejection of claim 1.
In regard to claims 5 and 14, Regaieg does not teach, but Zhang teaches: wherein the DNN is configured to approximate a value function of a next state. (Zhang, p. 6 "The critic part... The method of policy evaluation is introduced in this section, and a deep neural network (DNN) is employed to fulfil the task of approximating the value function because of its good convergence and stability [48]."; p. 8 "the Critic network is responsible for estimating the value of state [a value function of a next state] and state-action"; p. 6 "with the direction of a policy pi, the value function for state s at time slot t can be described in Equation (23). v_pi(st) = E[R(st, t|st=s)] (23)"; time slot can be t, t+1, t+2 ..., i.e. s_t+1 represents a next state, V(s_t+1) [a value function of a next state])
The rationale for combining the teachings of Regaieg and Zhang is the same as set forth in the rejection of claim 1.
In regard to claim 6, Regaieg teaches: wherein the environment includes the distributed computing platform. (Regaieg, p. 401 "Virtual Machine Placement (VMP) is one of the pressing issues encountered in cloud computing data centers. [the distributed computing platform] VMP is the process of selecting the most suitable physical machine (PM) to host the virtual machines (VMs).")
Regaieg does not teach, but Zhang teaches: wherein the execution of the PARL engine further configures the processor to perform the operations comprising determining transition dynamics of an environment based on a content sampling of the environment by the MIP actor, and (Zhang, p. 7 "Furthermore, this study introduces a reparameterization trick f(tau_t; st) [47], where tau is the action noise signal sampled from a standard normal distribution N(0, 1)"; p. 8 "a reparameterization trick, including noise sampled from standard normal distribution N(0, 1)"; see Fig. 5: N(0,1) [a content sampling of the environment by the MIP actor]; in reinforcement learning, the transition dynamics describe a sequence of (state, action, reward, next state, next action…), i.e. how states or actions move from one to another, and Zhang teaches using sampling to determine a next action)
The rationale for combining the teachings of Regaieg and Zhang is the same as set forth in the rejection of claim 1.
In regard to claim 20, Regaieg teaches: the environment includes the distributed computing platform. (Regaieg, p. 401 "Virtual Machine Placement (VMP) is one of the pressing issues encountered in cloud computing data centers. [the distributed computing platform] VMP is the process of selecting the most suitable physical machine (PM) to host the virtual machines (VMs).")
Regaieg does not teach, but Zhang teaches: wherein the critic approximator module is a rectified linear unit configured to learn a value function over a state-space of an environment and (Zhang, p. 8 "the Critic network is responsible for estimating the value of state and state-action [a value function over a state-space of the environment]... The Actor is composed of a parameterized DNN, i.e., dense layers and a Rectified Linear Unit (ReLU). The Critic includes a soft Q network, state-value network, and target state-value network."; see Fig. 5: ReLU [a rectified linear unit (RELUs)] in Critic, both actor and critic networks are DNNs, and both have dense and ReLU layers)
The rationale for combining the teachings of Regaieg and Zhang is the same as set forth in the rejection of claim 1.
Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Regaieg, Zhang and Kurtz as applied to claims 1 and 10 above, and further in view of Zhou ("Stochastic Virtual Machine Placement for Cloud Data Centers Under Resource Requirement Variations" 20191203).
In regard to claims 3 and 12, Regaieg, Zhang and Kurtz do not teach, but Zhou teaches: wherein the distributed computing platform is a part of a stochastic environment. (Zhou, p. 174412 "In this work, we study the VM placement problem for minimizing the total energy consumption in a data center under the uncertainty [the distributed computing platform is a part of a stochastic environment] of resource requirements demanded by the VMs. Instead of using deterministic values to represent the resource requirements, as in most existing placers, we propose a stochastic placement approach in which the resource requirement variations are modeled as random variables.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Regaieg, Zhang and Kurtz to incorporate the teachings of Zhou by including random variables for the uncertainty of resource requirements. Doing so would achieve more energy-efficient placement solutions compared with the deterministic VM placement algorithm. (Zhou, p. 174412 "By taking into account the uncertainty of resource requirements, the stochastic method can achieve more energy-efficient placement solutions compared with the deterministic VM placement algorithm.")
Claims 7-8 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Regaieg, Zhang and Kurtz as applied to claims 1 and 10 above, and further in view of Haskell ("A Universal Empirical Dynamic Programming Algorithm for Continuous State MDPs" 20190423).
In regard to claim 7, Regaieg teaches: … an environment that includes the distributed computing platform. (Regaieg, p. 401 "Virtual Machine Placement (VMP) is one of the pressing issues encountered in cloud computing data centers. [the distributed computing platform] VMP is the process of selecting the most suitable physical machine (PM) to host the virtual machines (VMs).")
Regaieg does not teach, but Zhang teaches: … the MIP actor (Zhang, p. 5 "State S: in this paper, the current state information st contains fuel cost C..., load demand P_d..., wind power generation P_wind... and the capacity of the battery SoC...; [a mixed integer mathematical problem]"; p. 7 "The actor part [by a MIP actor]… the sole objective of the Actor is seeking for the direction of policy improvement.")
The rationale for combining the teachings of Regaieg and Zhang is the same as set forth in the rejection of claim 1.
Regaieg, Zhang and Kurtz do not teach, but Haskell teaches: wherein the execution of the PARL engine further configures the processor to perform the operations comprising invoking an empirical returns module to calculate an empirical return, based on completion of a number of iterations between the... actor and an environment… (Haskell, p. 3 "We introduce an empirical value learning algorithm with function approximation using random parametrized basis functions (EVL+RPBF)."; p. 4 "Step 1 of such an algorithm (Algorithm 1) involves sampling states s_{1:N} over which to do value iteration [completion of a number of iterations between the action (which to do) and an environment (states)] and sampling parameters θ_{1:J} to pick basis functions φ(·; θ) which are used to do function fitting. Step 2 involves doing an empirical value iteration [invoking an empirical returns module/empirical return] over states s_{1:N} by sampling next states (X_m^{s_n,a}), m = 1..M, according to the transition kernel Q, and using the current iterate of the value function vk. Note that fresh (i.i.d.) samples of the next state are regenerated in each iteration.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Regaieg, Zhang and Kurtz to incorporate the teachings of Haskell by including universally applicable approximate dynamic programming algorithms for continuous state space MDPs with finite action spaces. Doing so would improve computational tractability and reduce the ‘curse of dimensionality.’ (Haskell, p. 10 "In this paper, we have introduced universally applicable approximate dynamic programming algorithms for continuous state space MDPs with finite action spaces. The algorithms introduced are based on using randomization to improve computational tractability and reduce the ‘curse of dimensionality’ via the synthesis of the ‘random function approximation’ and ‘empirical’ approaches.")
In regard to claims 8 and 15, Regaieg, Zhang and Kurtz do not teach, but Haskell teaches: wherein the execution of the PARL engine further configures the processor to perform the operations comprising reducing a computational complexity associated with the computing device by using a Sample Average Approximation (SAA) and discretization of an uncertainty distribution. (Haskell, p. 1 "This paper is inspired by the ‘random function’ approach that uses randomization to (nearly) solve otherwise intractable [uncertainty] problems (see e.g., [25], [26]) and the ‘empirical’ approach that reduces computational complexity of working with expectations..."; p. 2 "For the second non-parametric approach, we pick a RKHS for approximation. Both function spaces are dense in the space of continuous functions. In each iteration, we sample a few states from the state space. Empirical value learning (EVL) is then performed on these states. Each step of EVL involves approximating the Bellman operator with an empirical (random) Bellman operator by plugging a sample average approximation [SAA] from simulation for the expectation. This is akin to doing stochastic approximations with step size one."; p. 1 "In this paper, we propose approximate DP algorithms for continuous state space MDPs with finite action space [discretization of an uncertainty distribution] that are universal (approximating function space can provide arbitrarily good approximation for any problem), computationally tractable, simple to implement..."; p. 2 "let M(S) be the space of all probability distributions over S [an uncertainty distribution]"; p. 3 "When the state space S is very large, or even uncountable, exact dynamic programming methods are not practical, or even feasible... The idea is to sample a finite set of states from S, approximate the Bellman update at these states, and then extend to the rest of S through function fitting similar to [9].")
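As a hedged illustration of the SAA step quoted above (all transition, reward, and value functions here are toy assumptions, not Haskell's model), the empirical Bellman operator replaces the expectation over next states with a Monte Carlo sample average over M draws:

```python
import random

random.seed(42)
gamma = 0.9

def sample_next_state(s, a):
    """Toy stochastic transition: drift toward the action, plus noise."""
    return 0.5 * s + a + random.gauss(0.0, 0.1)

def reward(s, a):
    return -(s - 1.0) ** 2

def v(s):
    """Placeholder current value-function iterate."""
    return -abs(s)

def empirical_bellman(s, actions, m=1000):
    """max_a [ r(s,a) + gamma * (1/M) * sum_m v(X_m) ], X_m ~ Q(.|s,a)."""
    best = float("-inf")
    for a in actions:
        avg = sum(v(sample_next_state(s, a)) for _ in range(m)) / m
        best = max(best, reward(s, a) + gamma * avg)
    return best
```

The sample average replaces an integral over the uncertainty distribution with a finite (discretized) set of draws, which is how the approach reduces the computational complexity of working with expectations.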
The rationale for combining the teachings of Regaieg, Zhang, Kurtz and Haskell is the same as set forth in the rejection of claim 7.
Response to Arguments
Applicant's arguments with respect to the rejection of the claims under 35 U.S.C. 103 have been fully considered but are moot because they do not apply to the combination of references being used in the current rejection:
Applicant argues: (see p. 12) However, Zhang does not teach or suggest an MIP actor that is implemented as an actor of the PARL engine and uses mixed integer programming techniques for solving a mixed integer mathematical problem, and a critic approximator module that comprises the DNN implemented as a critic of the PARL engine. Further, Zhang does not teach or suggest that the solving of the mixed integer mathematical problem by the MIP actor of the PARL engine that uses the mixed integer programming techniques comprises reformulating the DNN as a plurality of linear constraints and a plurality of binary decision variables.
Examiner answers: the arguments do not apply to the new citations from Zhang and the newly applied prior art Kurtz being used in the current rejection. Zhang teaches in Fig. 6 a simulation using 'python' and the soft actor-critic algorithm [PARL engine], and the training process of the actor-critic algorithm teaches the implementation of the actor and critic of the PARL engine.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SU-TING CHUANG whose telephone number is (408)918-7519. The examiner can normally be reached Monday - Thursday 8-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached at (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SU-TING CHUANG/Examiner, Art Unit 2146