Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This action is responsive to the application filed on 12/26/2025.
Claims 1-4, 6-14, and 16-20 are pending.
Claims 1, 6, 8, 10-11, 16, 18, and 20 have been amended.
Claims 5 and 15 have been canceled.
Response to Arguments
Applicant’s arguments with respect to the rejection of claims 10 and 20 under 35 U.S.C. 112(b) have been fully considered and are persuasive. Therefore, that rejection, as set forth in the previous Office action, has been withdrawn.
Applicant’s arguments with respect to the rejection of claims 1-20 under 35 U.S.C. 101 have been fully considered and are persuasive. Therefore, that rejection, as set forth in the previous Office action, has been withdrawn.
Applicant’s arguments with respect to the rejection of claims 1 and 11 under 35 U.S.C. 103 have been fully considered but are not persuasive. Applicant argues that no reference teaches the amended limitations because “Mondal…assumes - and requires - that workloads must execute in order for rewards to be obtained,” “Tong does not disclose precomputing execution-time distributions offline,” and the invention “explicitly avoids runtime execution during training,” thus yielding a “performance improvement.” The examiner respectfully disagrees.
Due to the breadth of the claim language, Mondal has been found to teach the claimed limitations. Mondal, in sections “Background” through “Rewards Design for RL” and Figs. 1-2, teaches time-varying workload (TVW) vectors (tensors) implemented as learned policies (probability distributions) that include a state of the system and the corresponding action at a timestep most likely to move the state forward. For each machine (node) scheduled to run a TVW, “Length of the history, i.e. h helps the agent to learn the temporal characteristics of each TVW” so that the execution times of certain workloads have minimum overlap (execution times for workloads executed at the respective nodes), after which a network is trained on the data. Further, while Mondal does teach “online” execution for portions of learning, the section “End-to-End Evaluation” teaches that the “offline training time ranged from approximately 28-108 hrs depending on the load, state space dimensions and the number of iterations”; thus, Mondal is able to perform the training operations as mapped “offline,” as argued.
Further still, as to the argument that “neither reference teaches nor suggests replacing reward determination with sampling from offline-generated, tensor-based probability distributions,” this feature is not explicitly claimed, and the applicant is arguing a narrower scope than what is recited in the claim language.
See the 35 U.S.C. 103 section below for the full mapping of the claim limitations necessitated by applicant’s amendments.
Applicant’s arguments with respect to the rejection of claims 1 and 11 under 35 U.S.C. 103 have been fully considered but are not persuasive. Applicant argues that no reference teaches the amended limitations because “[n]either Mondal nor Tong discloses or suggests such data structures [of warm up tables and tensors as claimed], nor do they describe indexing execution-time distributions in this manner.” The examiner respectfully disagrees.
Due to the breadth of the claim language, Mondal has been found to teach the claimed limitations. Mondal, in sections “Background” through “Rewards Design for RL” and Figs. 1-2, teaches time-varying workload (TVW) vectors (tensors) implemented as learned policies (probability distributions) that include a state of the system and the corresponding action at a timestep most likely to move the state forward. For each machine (node) scheduled to run a TVW, “Length of the history, i.e. h helps the agent to learn the temporal characteristics of each TVW” so that the execution times of certain workloads have minimum overlap (execution times for workloads executed at the respective nodes), after which a network is trained on the data. Sections “Background” through “Rewards Design for RL,” “Equivalence-class Analysis,” and Figs. 1-2 further teach learning TVW policies (warm up tables/probability distributions) based on computed TVW vectors (first/second tensors) including their workload execution times, “avg., std.” of times (mean/standard deviation), so as to have minimum overlap with other TVWs for different “job types” (different combinations).
Dirac has been cited in combination for teaching the tensor structure, since Col. 16, line 57-Col. 17, line 16 and Col. 22, lines 27-33 teach extracting statistical numeric values for each data attribute, including the “standard deviation” of executing different job types on certain resources; and Col. 83, lines 22-42, Col. 85, lines 1-20, and Col. 142, lines 6-10 teach numeric values stored as vectors for parameters and comprised in a “probability distribution.”
See the 35 U.S.C. 103 section below for the full mapping of the claim limitations necessitated by applicant’s amendments.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-4, 6-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mondal et al. (“Scheduling of Time-Varying Workloads Using Reinforcement Learning”, 2021), hereinafter Mondal, in view of Tong et al. (“Proactive scheduling in distributed computing—A reinforcement learning approach”, 2014), hereinafter Tong, and further in view of Dirac et al. (US Patent 10,963,810), hereinafter Dirac.
Regarding claims 1 and 11, Mondal teaches a method and a non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising (the section “Design of TVW-RL” teaches machines for executing embodiments of the disclosure, including one or more CPUs and one or more memories):
generating, prior to training, warm up tables for respective nodes in a computing environment, each warm up table including a plurality of tensors that define probability distributions of execution times for workloads executed at the respective nodes (sections “Background” through “Rewards Design for RL” and Figs. 1-2 teach time-varying workload (TVW) vectors (tensors) implemented as learned policies (probability distributions) that include a state of the system and the corresponding action at a timestep most likely to move the state forward. For each machine (node) scheduled to run a TVW, “Length of the history, i.e. h helps the agent to learn the temporal characteristics of each TVW” so that the execution times of certain workloads have minimum overlap (execution times for workloads executed at the respective nodes), after which a network is trained on the data), wherein the probability distributions are defined by at least a first tensor storing mean execution times and a second tensor storing standard deviations, and wherein the tensors store execution times for different combinations of one or more workloads executed at the respective nodes (Mondal, sections “Background” through “Rewards Design for RL,” “Equivalence-class Analysis,” and Figs. 1-2 teach learning TVW policies (warm up tables/probability distributions) based on computed TVW vectors (first/second tensors) including their workload execution times, “avg., std.” of times (mean/standard deviation), so as to have minimum overlap with other TVWs for different “job types” (different combinations)); and
training a reinforcement learning model using the warm up tables, by during training, selecting an action corresponding to an allocation of a workload to a node, sampling an execution time from the probability distribution defined by the tensors of the warm up table associated with the node, and determining a reward based on the sampled execution time such that the reinforcement learning model is trained without waiting for the workload to complete execution (sections “Background” through “Rewards Design for RL” and Figs. 1-2 teach a time-varying workload reinforcement learning (TVW-RL) model implemented as an “agent” tasked with learning (training a RL model…during training) policies (warm up tables) of state, action, and rewards. “We use negative rewards” based on computed TVW vectors, including their workload execution times, to have minimum overlap with other TVWs when scheduling jobs on machines (allocation of a workload to a node)).
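For clarity of the record, the following is a minimal, examiner-supplied Python sketch of the mapped concept. It is not code taken from Mondal, Tong, or Dirac; all names and values are hypothetical. It merely illustrates how per-node tensors of mean and standard-deviation execution times can define probability distributions that are sampled during training in place of actual workload execution.

    import numpy as np

    # Hypothetical sketch (not from the cited references): each node's
    # "warm up table" holds two tensors indexed by workload combination,
    # one of mean execution times and one of standard deviations,
    # generated offline prior to training (e.g., from historical traces).
    rng = np.random.default_rng(0)
    NUM_NODES = 4
    NUM_COMBOS = 8  # combinations of one or more workload types

    warm_up_tables = {
        node: {
            "mean": rng.uniform(1.0, 10.0, NUM_COMBOS),  # first tensor
            "std": rng.uniform(0.1, 1.0, NUM_COMBOS),    # second tensor
        }
        for node in range(NUM_NODES)
    }

    def sample_execution_time(node, combo):
        # Sample from the node's distribution instead of executing.
        table = warm_up_tables[node]
        return rng.normal(table["mean"][combo], table["std"][combo])

    def reward(node, combo):
        # Negative reward proportional to the sampled execution time,
        # so training proceeds without waiting for workloads to finish.
        return -sample_execution_time(node, combo)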
Mondal at least implies probability distributions for the time required to execute a workload (see the mappings above); however, Tong explicitly teaches probability distributions for the time required to execute a workload (sections 4.2-5.1 teach a probability distribution of state transitions based on the “execution time of individual tasks” for the reward).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Tong’s teaching of determining a probability distribution of task execution times on processing units for a reward in a reinforcement learning structure into Mondal’s teaching of time-varying workload scheduling through reinforcement learning in order to increase the efficiency of task scheduling and achieve “much better load balance” (Tong, sections 5.2 and 7).
Further, Mondal at least implies wherein the probability distributions are defined by at least a first tensor storing mean execution times and a second tensor storing standard deviations (see the mappings above); however, Dirac explicitly teaches wherein the probability distributions are defined by at least a first tensor storing mean execution times and a second tensor storing standard deviations (Col. 16, line 57-Col. 17, line 16 and Col. 22, lines 27-33 teach extracting statistical numeric values for each data attribute, including the “standard deviation” of executing different job types on certain resources; and Col. 83, lines 22-42, Col. 85, lines 1-20, and Col. 142, lines 6-10 teach numeric values stored as vectors for parameters and comprised in a “probability distribution”).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Mondal’s teaching of time-varying workload scheduling through reinforcement learning, as modified by Tong’s teaching of determining a probability distribution of task execution times on processing units for a reward in a reinforcement learning structure, to include the mean and standard-deviation tensor structure as taught by Dirac in order to achieve better job implementation flexibility based on knowledge of resource parameter data (Dirac, Col. 16, line 57-Col. 17, line 43).
Regarding claims 2 and 12, the combination of Mondal, Tong, and Dirac teaches all the claim limitations of claims 1 and 11 above, and further teaches further comprising generating execution times for workloads prior to training the reinforcement learning model (Mondal, sections “Background” through “Rewards Design for RL” and Figs. 1-2 teach time-varying workloads (TVWs) implemented as policies that include a state of the system and the corresponding action at a timestep most likely to move the state forward. For each machine scheduled to run a TVW, “Length of the history, i.e. h helps the agent to learn the temporal characteristics of each TVW” so that the execution times of certain workloads have minimum overlap (time required to execute the workload). The RL model is then “learned” on the policies (prior to training the reinforcement learning model).).
Regarding claims 3 and 13, the combination of Mondal, Tong, and Dirac teaches all the claim limitations of claims 2 and 12 above, and further teaches further comprising generating execution times for different types of workloads (Mondal, sections “Background” through “Rewards Design for RL” and Figs. 1-2 teach workload execution times for different “job types”).
Regarding claims 4 and 14, the combination of Mondal, Tong, and Dirac teaches all the claim limitations of claims 1 and 11 above, and further teaches further comprising generating execution times for one or more workloads of one or more workload types (Mondal, sections “Background” through “Rewards Design for RL” and Figs. 1-2 teach workload execution times for different “job types”).
Regarding claims 6 and 16, the combination of Mondal, Tong, and Dirac teaches all the claim limitations of claims 1 and 11 above, and further teaches wherein the first tensor stores execution times for different combinations of one or more workloads of one or more types (Mondal, sections “Background” through “Rewards Design for RL” and Figs. 1-2 teach learning TVW policies (warm up tables) based on computed TVW vectors (first/second tensors) including their workload execution times so as to have minimum overlap with other TVWs for different “job types”).
Regarding claims 7 and 17, the combination of Mondal, Tong, and Dirac teaches all the claim limitations of claims 6 and 16 above, and further teaches wherein the second tensor stores standard deviations for the one or more workloads of one or more types (Dirac, Col. 16, line 57-Col. 17, line 16 teaches extracting statistical values for each data attribute, including the “standard deviation” of executing different job types on certain resources).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Mondal’s teaching of time-varying workload scheduling through reinforcement learning, as modified by Tong’s teaching of determining a probability distribution of task execution times on processing units for a reward in a reinforcement learning structure, to include the second tensor storing standard deviations as taught by Dirac in order to achieve better job implementation flexibility based on knowledge of resource parameter data (Dirac, Col. 16, line 57-Col. 17, line 43).
Regarding claims 8 and 18, the combination of Mondal, Tong, and Dirac teaches all the claim limitations of claims 1 and 11 above, and further teaches further comprising, during training, selecting an action and executing the action in a state (Mondal, sections “Background” through “Rewards Design for RL” and Figs. 1-2 teach a time-varying workload reinforcement learning (TVW-RL) model implemented as an “agent” tasked with learning (training a RL model) policies (warm up tables) that include a state of the system and the corresponding action at a timestep most likely to move the state forward).
Regarding claims 9 and 19, the combination of Mondal, Tong, and Dirac teaches all the claim limitations of claims 8 and 18 above, and further teaches further comprising generating the rewards prior to termination of the workloads (Mondal, sections “Background” through “Rewards Design for RL” and Figs. 1-2 teach a time-varying workload reinforcement learning (TVW-RL) model implemented as an “agent” tasked with learning (training a RL model) policies (warm up tables). “We use negative rewards” when computing TVW vectors, including their workload execution times, to have minimum overlap with other TVWs. Further, “we add a penalty proportional to the sum of unused resources in the used machines. A used machine is one with at least one workload running.”).
Regarding claims 10 and 20, the combination of Mondal, Tong, and Dirac teaches all the claim limitations of claims 9 and 19 above, and further teaches further comprising observing the new state, computing a loss, and updating the states (Mondal, sections “State Space Representation for RL” through “Rewards Design for RL” teach Algorithm 1 analyzing an input state, calculating a penalty from error, and updating the policy network).
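For clarity of the record, the following examiner-supplied Python sketch illustrates the observe/loss/update loop mapped above. It is not code from the cited references; a simple policy-gradient update is assumed for concreteness, and all names and values are hypothetical.

    import numpy as np

    # Hypothetical sketch (not from the cited references): one training
    # step that selects an action, obtains a sampled reward before any
    # workload terminates, computes a policy-gradient loss, updates the
    # policy, and observes the new state.
    rng = np.random.default_rng(0)
    NUM_STATES, NUM_ACTIONS = 16, 4
    logits = np.zeros((NUM_STATES, NUM_ACTIONS))
    LEARNING_RATE = 0.1

    def sampled_reward(action):
        # Stand-in for sampling from warm up table distributions; slower
        # (more negative) for higher-indexed nodes in this toy example.
        return -rng.normal(5.0 + action, 1.0)

    def train_step(state):
        # Softmax policy over actions (node allocations) in this state.
        e = np.exp(logits[state] - logits[state].max())
        probs = e / e.sum()
        action = rng.choice(NUM_ACTIONS, p=probs)  # select/execute action
        r = sampled_reward(action)                 # reward, no execution wait
        grad = -probs
        grad[action] += 1.0                        # d log pi(a|s) / d logits
        logits[state] += LEARNING_RATE * r * grad  # ascend reward objective
        return (state + 1) % NUM_STATES            # observe the new state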
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241. The examiner can normally be reached on Mon - Fri 8:00-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123