Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-20 are presented for examination.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 2, 4, 12-14, and 16-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Amarnath (US 2023/0012710 A1).
As per claim 1, Amarnath teaches A system, comprising:
a processor; (Amarnath Fig 4 Block 420 (Processor))
a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations, the operations comprising:
inputting state data that describes a current state of a task-execution system to a trained scheduler model; (Amarnath Fig 5 block 510 and [0080] In operation, the dynamic scheduler 520 may receive and schedule one or more real-time DAGs such as, for example, DAG's 510, which may be input DAGs of the scheduler 530. A DAG processor 540 may track dependencies in the DAG, determine task deadline, slack and priorities of the entire DAG such as, for example, each of the DAGs 510. In some embodiments, slack is the amount of time left to a deadline during execution of the DAG's tasks. Priority of a DAG can be determined based on the application domain. For example, the DAGs that are critical for an application to complete and meet the deadline might have higher priority over record keeping tasks.)
obtaining task-related output from the trained scheduler model, the task-related output being based on the state data and learned scheduling policy data; (Amarnath [0069] In one aspect, the learning agent based application scheduling service 410 may, using the machine learning component 440, track each task dependencies in each of the DAGs such as, for example, DAG's 404, 406, and 408. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, determine task priorities, task execution timelines, task wait times, and task deadlines for each of the one or more tasks of the plurality of DAGs. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, rank the one or more tasks in the plurality of DAGs [prioritizing a task over another] based on the plurality of constraints and conditions, the task prioritization policies, the task execution estimates, and the configurations of the heterogenous system [0093] Using a reinforcement learning (“RL”) agent 562 in a task scheduler 560 of FIG. 1, the RL agent 562 may receive tasks schedule on a PE (e.g., a busy or unavailable PE), as in block 702. The RL agent may use a decision tree based RL quality (“Q”) learning operation to execute each decision in the task scheduler 560. It should be noted that the RL agent 562 may be implemented as a specialized accelerator devices that enables the RL agent 562 to execute a machine learning decision in a single scheduler tick (e.g., a time scheduler takes to schedule a task on a PE).).
The examiner notes that the use of machine learning is consistent with what is disclosed in the specification (Fig. 2, Block 228 (Reinforcement Learning) and Block 230 (Policy Update)).
based on the task-related output, obtaining scheduling data usable to schedule resources of the task-execution system to execute tasks. (Amarnath [0069] In one aspect, the learning agent based application scheduling service 410 may, using the machine learning component 440, track each task dependencies in each of the DAGs such as, for example, DAG's 404, 406, and 408. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, determine task priorities, task execution timelines, task wait times, and task deadlines for each of the one or more tasks of the plurality of DAGs. [task parameter data]. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, rank the one or more tasks in the plurality of DAGs based on the plurality of constraints and conditions, the task prioritization policies, the task execution estimates, and the configurations of the heterogenous system and [0091] The task scheduler 560 may determine if the PE is busy (e.g., utilized or unavailable for receiving a task assignment), as in block 638. If no at block 638, the task scheduler 560 may assign a task of the DAG to the PE, as in block 644. At block 646, the task may be scheduled and executed to completion and added to a completed task queue (e.g., a complete task queue (“Q”)), as in block 646.).
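The mapped flow of claim 1 — state data in, task-related output from the trained model, scheduling data out — can be illustrated with the following sketch. This is hypothetical code for illustration only; all names and the ranking policy are the examiner's assumptions, not taken from the application or from Amarnath, though the slack definition follows Amarnath [0080].

```python
# Hypothetical sketch of the claimed loop: state data -> trained scheduler
# model -> task-related output -> scheduling data. Names are illustrative.

def compute_slack(deadline, now, remaining_exec_time):
    # Slack, per Amarnath [0080], is the amount of time left to the
    # deadline during execution of a DAG's tasks.
    return deadline - now - remaining_exec_time

def schedule(state, policy):
    # "policy" stands in for the trained scheduler model / learned
    # scheduling policy data; here it is a plain ranking function.
    ranked = sorted(state["tasks"], key=policy)      # task-related output
    free_pes = [pe for pe in state["pes"] if not pe["busy"]]
    # Scheduling data: assign highest-ranked tasks to free PEs.
    return {t["id"]: pe["id"] for t, pe in zip(ranked, free_pes)}

state = {
    "tasks": [{"id": 1, "priority": 2, "deadline": 50},
              {"id": 2, "priority": 1, "deadline": 20}],
    "pes": [{"id": "gpu0", "busy": False}, {"id": "acc0", "busy": True}],
}
# Rank by (tightest deadline, highest priority) as a stand-in policy.
assignment = schedule(state, policy=lambda t: (t["deadline"], -t["priority"]))
```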
As per claim 2, Amarnath teaches wherein the state data comprises at least one of: respective priority levels of a group of respective tasks, respective remaining execution times of the respective tasks of the group of respective tasks, or resource-related data of resources currently being used by the task-execution system. (Amarnath [0096] At block 714, a current state of the RL agent 562 may be accessed or obtained (using one or more inputs 730 such as, for example, characteristics of the mission, environmental conditions, task and systems at a given instant, etc.). In one aspect, the inputs may include, but not limited to, a number of tasks and their priorities, the status of PE's (e.g., available or unavailable for receiving a task), a number of active DAGs, environment congestion, a mission type, and slack available of parent DAG. [0097] The RL agent 562 may traverse the DT, as in block 716. The RL agent 562 may perform a determination operation to determine a wait action for scheduling the task to a busy PE, as in block 718. The action of the RL agent 562 may be classified as an action to wait for the fastest PE (in relation to processing speeds of the other PE's), as in block 722 (e.g., an original, but faster PE) or an action to not wait and immediately schedule a task to an available, but potentially slower PE, as in block 720, (e.g., assign to a slower, but available GPU as compared to waiting for a faster accelerator). From block 718, operations may move to block 720 or 724 based on determining the wait action for scheduling the task to a busy PE. [0104] If the rank is same, priority 2 tasks may be ordered before priority 1 tasks. The schedule and execution of the tasks can be seen in the table 830 task in the readyQ is represented as (DAG_ID.Task_ID). For example, at time stamp 11, the DAG 0 has task 1 scheduled with an accelerator. Task 5 is scheduled with the GPU. Task 2 is now in a waiting period for an available PE. 
However, accessing and utilizing the RL agent such as, for example, the RL agent 562 of FIG. 6, the RL agent 562 may use machine learning and execute a correct decision for scheduling the task 2 rather than waiting such as, for example, at time stamp “11” the RL agent 562 may schedule for execution DAG 0's task 2 on a free PE (e.g., GPU) since there are four GPU's and there are available GPU's at time stamp 11. Thus, rather than waiting, as depicted in table 830 of the DAG 0's task 2 waiting until time stamp 94 for being assigned to an accelerator, by assigning the DAG 0's task 2 to a free PE (e.g., GPU), the response time of DAG 0, can be reduced to 94 cycles instead of 104 cycles and can meet the deadline).
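The cited wait-or-assign choice of Amarnath [0097] can be illustrated as follows. This is a hypothetical simplification: the reference uses a decision-tree-based Q-learning agent, reduced here to a direct finish-time comparison, and all parameter values are illustrative assumptions.

```python
# Hypothetical simplification of the wait action of Amarnath [0097]:
# either wait for the fastest (busy) PE or immediately schedule on an
# available but slower PE, whichever finishes the task sooner.

def choose_pe(task_cycles, fast_pe_free_in, fast_speedup, slow_pe_available):
    wait_finish = fast_pe_free_in + task_cycles / fast_speedup
    run_now_finish = task_cycles  # slower PE, but starts immediately
    if slow_pe_available and run_now_finish < wait_finish:
        return "schedule_now_on_slow_pe"
    return "wait_for_fast_pe"

# Analogous to DAG 0's task 2 at time stamp 11 ([0104]): a GPU is free,
# and running now beats waiting for the busy accelerator.
action = choose_pe(task_cycles=40, fast_pe_free_in=83, fast_speedup=4.0,
                   slow_pe_available=True)
```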
As per claim 4, Amarnath teaches wherein the operations further comprise obtaining, by a resource allocation module, the scheduling data, and allocating, by the resource allocation module, the resources to the tasks based on the scheduling data. (Amarnath [0104] If the rank is same, priority 2 tasks may be ordered before priority 1 tasks. The schedule and execution of the tasks can be seen in the table 830 task in the readyQ is represented as (DAG_ID.Task_ID). For example, at time stamp 11, the DAG 0 has task 1 scheduled with an accelerator. Task 5 is scheduled with the GPU. Task 2 is now in a waiting period for an available PE. However, accessing and utilizing the RL agent such as, for example, the RL agent 562 of FIG. 6, the RL agent 562 may use machine learning and execute a correct decision for scheduling the task 2 rather than waiting such as, for example, at time stamp “11” the RL agent 562 may schedule for execution DAG 0's task 2 on a free PE (e.g., GPU) since there are four GPU's and there are available GPU's at time stamp 11. Thus, rather than waiting, as depicted in table 830 of the DAG 0's task 2 waiting until time stamp 94 for being assigned to an accelerator, by assigning the DAG 0's task 2 to a free PE (e.g., GPU), the response time of DAG 0, can be reduced to 94 cycles instead of 104 cycles and can meet the deadline).
As per claim 12, Amarnath teaches wherein the operations further comprise updating the learned scheduling policy data based on measured performance data of the task-execution system. (Amarnath [0098] Thus, the RL agent 562 may, during runtime, use the DT as the basis for Q-learning of the RL agent 562 to perform exploration and exploitation of various task policies. The RL agent 562 may wait, based on the determined wait time, for the PE (e.g., the original, busy PE), as in block 722. The RL agent 562 may determine if a task is completed on the processing element (“PE”), as in block 724. If no at block 724, the system will wait for the task to be completed. If yes at block 724, the RL agent 562 may obtain a reward for the RL agent 562, as in block 726. The purpose, of the RL agent 562 may obtain a reward is to reinforce and update the RL model with new experiences that the system encounters for the mission that is being executed. The RL agent 562 may update a RL model, as in block 728).
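The reward-driven update cited above ([0098]) corresponds to a standard Q-learning step. The following is a generic tabular sketch, not the reference's decision-tree implementation; the state/action labels and hyperparameters are illustrative assumptions.

```python
# Generic tabular Q-learning update of the kind the cited RL agent
# performs ([0098]): reinforce the model with the reward observed after
# a scheduling decision completes. Parameters are illustrative only.

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    best_next = max(q.get((next_state, a), 0.0) for a in ("wait", "assign"))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q

q_table = {}
# Task completed on time after an "assign" decision -> positive reward.
q_update(q_table, state="pe_busy", action="assign", reward=1.0,
         next_state="pe_free")
```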
As per claim 13, Amarnath teaches A method, comprising:
obtaining, by a system comprising a processor, respective task parameter data representative of respective task parameters for respective tasks to be executed, the respective task parameter data comprising respective task type data representative of respective task types of the respective tasks, respective task priority data representative of respective task priorities of the respective tasks, and respective task deadline data representative of respective task deadlines associated with the respective tasks; (Amarnath [0069] In one aspect, the learning agent based application scheduling service 410 may, using the machine learning component 440, track each task dependencies in each of the DAGs such as, for example, DAG's 404, 406, and 408. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, determine task priorities, task execution timelines, task wait times, and task deadlines for each of the one or more tasks of the plurality of DAGs. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, rank the one or more tasks in the plurality of DAGs based on the plurality of constraints and conditions, the task prioritization policies, the task execution estimates, and the configurations of the heterogenous system [0094] The RL agent 562 may determine a mission type of each task [type of task] such as, for example, if the task mission is first to be schedule or last, as in block 704. That is, block 704 represents the action a task scheduler has to do based on whether the task it has to schedule is the first in the mission or the last.).
generating, by the system, the respective tasks associated with respective task identifiers; (Amarnath [0085] at block 606, the DAG processor 540 may generate one or more tasks of the accepted DAGs to prepare for scheduling and execution, as in block 608. The DAG processor 540 may estimate execution time and available slack, as in block 610, such as, for example, estimating the execution time of task on one or more PEs on a heterogeneous SoC. The DAG processor 540 may determine various environmental conditions such as, for example, congestion (e.g., traffic congestion on a road), as in block 612).
prioritizing, by the system, the respective tasks into respective prioritized tasks based on the respective task parameter data; (Amarnath [0069] In one aspect, the learning agent based application scheduling service 410 may, using the machine learning component 440, track each task dependencies in each of the DAGs such as, for example, DAG's 404, 406, and 408. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, determine task priorities, task execution timelines, task wait times, and task deadlines for each of the one or more tasks of the plurality of DAGs. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, rank the one or more tasks in the plurality of DAGs [prioritizing a task over another] based on the plurality of constraints and conditions, the task prioritization policies, the task execution estimates, and the configurations of the heterogenous system).
scheduling, by the system based on learned scheduling policy data representative of a learned scheduling policy, the respective prioritized tasks, the scheduling comprising allocating respective resources to the respective prioritized tasks in association with respective execution times to obtain respective scheduled tasks; (Amarnath [0106] Tasks of directed acyclic graphs (DAGs) may be dynamically scheduled based on a plurality of constraints and conditions, task prioritization policies, task execution estimates, and configurations of a heterogenous system, as in block 904. In an additional embodiment, a machine learning component may be initialized (e.g., initialized, active, and/or installed) to dynamically schedule the tasks of the DAGs. The functionality 900 may end, as in block 906).
executing, by the system, the respective scheduled tasks, the executing comprising dispatching the respective scheduled tasks to the respective allocated resources for execution at the respective execution times. (Amarnath Fig 5 Block 570 and [0091] The task scheduler 560 may determine if the PE is busy (e.g., utilized or unavailable for receiving a task assignment), as in block 638. If no at block 638, the task scheduler 560 may assign a task of the DAG to the PE, as in block 644. At block 646, the task may be scheduled and executed to completion and added to a completed task queue (e.g., a complete task queue (“Q”)), as in block 646.).
As per claim 14, Amarnath teaches comprising monitoring, by the system, the execution of the respective scheduled tasks, and, based on a result of the monitoring, outputting, by the system, updated learned scheduling policy data representative of an updated learned scheduling policy. (Amarnath [0098] Thus, the RL agent 562 may, during runtime, use the DT as the basis for Q-learning of the RL agent 562 to perform exploration and exploitation of various task policies. The RL agent 562 may wait, based on the determined wait time, for the PE (e.g., the original, busy PE), as in block 722. The RL agent 562 may determine if a task is completed on the processing element (“PE”), as in block 724. If no at block 724, the system will wait for the task to be completed. If yes at block 724, the RL agent 562 may obtain a reward for the RL agent 562, as in block 726. The purpose, of the RL agent 562 may obtain a reward is to reinforce and update the RL model with new experiences that the system encounters for the mission that is being executed. The RL agent 562 may update a RL model, as in block 728).
As per claim 16, Amarnath teaches wherein the scheduling of the respective prioritized tasks further comprises mapping the respective task parameter data to respective actions based on the learned scheduling policy data. (Amarnath [0094] The RL agent 562 may determine a mission type of each task such as, for example, if the task mission is first to be schedule or last, as in block 704. That is, block 704 represents the action a task scheduler has to do based on whether the task it has to schedule is the first in the mission or the last. If the task is the first task, then offline RL models, based on the type of mission being executed by the system, have to be loaded into the RL agent 562. If the task is the last task, then new learnings from the mission executed will be uploaded back to the cloud. For example, the RL agent 562 may be trained (e.g., offline training) for each mission type using supervised learning. The offline data may be collected using autonomous vehicle simulators and synthetic workload traces. Information about each DAGs may be passed as traces that allows for a lookahead approach to determine an effect of a future arriving DAGs on a current DAG. The offline trained decision tree may then be used for the initialization of the RL agent 562, which may be online. [0097] The RL agent 562 may traverse the DT, as in block 716. The RL agent 562 may perform a determination operation to determine a wait action for scheduling the task to a busy PE, as in block 718. The action of the RL agent 562 may be classified as an action to wait for the fastest PE (in relation to processing speeds of the other PE's), as in block 722 (e.g., an original, but faster PE) or an action to not wait and immediately schedule a task to an available, but potentially slower PE, as in block 720, (e.g., assign to a slower, but available GPU as compared to waiting for a faster accelerator). 
From block 718, operations may move to block 720 or 724 based on determining the wait action for scheduling the task to a busy PE).
As per claim 17, Amarnath teaches A non-transitory machine-readable medium, comprising executable instructions that, when executed by a processor, facilitate performance of operations, the operations comprising:
obtaining a stream of data corresponding to a current state of task-execution system; (Amarnath Fig 5 block 510 and [0080] In operation, the dynamic scheduler 520 may receive and schedule one or more real-time DAGs such as, for example, DAG's 510, which may be input DAGs of the scheduler 530. A DAG processor 540 may track dependencies in the DAG, determine task deadline, slack and priorities of the entire DAG such as, for example, each of the DAGs 510. In some embodiments, slack is the amount of time left to a deadline during execution of the DAG's tasks. Priority of a DAG can be determined based on the application domain. For example, the DAGs that are critical for an application to complete and meet the deadline might have higher priority over record keeping tasks.)
generating scheduling policy data, via a recurrent learning module, based on the stream of data; (Amarnath [0069] In one aspect, the learning agent based application scheduling service 410 may, using the machine learning component 440, track each task dependencies in each of the DAGs such as, for example, DAG's 404, 406, and 408. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, determine task priorities, task execution timelines, task wait times, and task deadlines for each of the one or more tasks of the plurality of DAGs. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, rank the one or more tasks in the plurality of DAGs [prioritizing a task over another] based on the plurality of constraints and conditions, the task prioritization policies, the task execution estimates, and the configurations of the heterogenous system [0093] Using a reinforcement learning (“RL”) agent 562 in a task scheduler 560 of FIG. 1, the RL agent 562 may receive tasks schedule on a PE (e.g., a busy or unavailable PE), as in block 702. The RL agent may use a decision tree based RL quality (“Q”) learning operation to execute each decision in the task scheduler 560. It should be noted that the RL agent 562 may be implemented as a specialized accelerator devices that enables the RL agent 562 to execute a machine learning decision in a single scheduler tick (e.g., a time scheduler takes to schedule a task on a PE).).
The examiner notes that the use of reinforcement learning is consistent with what is disclosed in the specification (Fig. 2, Block 228 (Reinforcement Learning) and Block 230 (Policy Update)).
executing, in the task-execution system, respective tasks of a group of tasks, the executing comprising executing the respective tasks based on the scheduling policy data and respective task parameter data of the group of tasks. (Amarnath [0069] In one aspect, the learning agent based application scheduling service 410 may, using the machine learning component 440, track each task dependencies in each of the DAGs such as, for example, DAG's 404, 406, and 408. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, determine task priorities, task execution timelines, task wait times, and task deadlines for each of the one or more tasks of the plurality of DAGs. [task parameter data]. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, rank the one or more tasks in the plurality of DAGs based on the plurality of constraints and conditions, the task prioritization policies, the task execution estimates, and the configurations of the heterogenous system and [0091] The task scheduler 560 may determine if the PE is busy (e.g., utilized or unavailable for receiving a task assignment), as in block 638. If no at block 638, the task scheduler 560 may assign a task of the DAG to the PE, as in block 644. At block 646, the task may be scheduled and executed to completion and added to a completed task queue (e.g., a complete task queue (“Q”)), as in block 646.).
As per claim 18, Amarnath teaches The non-transitory machine-readable medium of claim 17, wherein the obtaining of the stream of data comprises obtaining resource data representative of available resources of the task-execution system, and obtaining respective task data representative of the respective tasks to perform, and wherein the executing of the respective tasks based on the scheduling policy data comprises allocating respective resources of the available resources to perform the respective tasks of the group of tasks at respective execution times. (Amarnath [0069] In one aspect, the learning agent based application scheduling service 410 may, using the machine learning component 440, track each task dependencies in each of the DAGs such as, for example, DAG's 404, 406, and 408. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, determine task priorities, task execution timelines, task wait times, and task deadlines for each of the one or more tasks of the plurality of DAGs. [task parameter data]. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, rank the one or more tasks in the plurality of DAGs based on the plurality of constraints and conditions, the task prioritization policies, the task execution estimates, and the configurations of the heterogenous system and [0082] one non-transitory machine readable storage medium [0091] The task scheduler 560 may determine if the PE is busy (e.g., utilized or unavailable for receiving a task assignment), as in block 638. If no at block 638, the task scheduler 560 may assign a task of the DAG to the PE, as in block 644. 
At block 646, the task may be scheduled and executed to completion and added to a completed task queue (e.g., a complete task queue (“Q”)), as in block 646. [0096] At block 714, a current state of the RL agent 562 may be accessed or obtained (using one or more inputs 730 such as, for example, characteristics of the mission, environmental conditions, task and systems at a given instant, etc.). In one aspect, the inputs may include, but not limited to, a number of tasks and their priorities, the status of PE's (e.g., available or unavailable for receiving a task), a number of active DAGs, environment congestion, a mission type, and slack available of parent DAG. [0097] The RL agent 562 may traverse the DT, as in block 716. The RL agent 562 may perform a determination operation to determine a wait action for scheduling the task to a busy PE, as in block 718. The action of the RL agent 562 may be classified as an action to wait for the fastest PE (in relation to processing speeds of the other PE's), as in block 722 (e.g., an original, but faster PE) or an action to not wait and immediately schedule a task to an available, but potentially slower PE, as in block 720, (e.g., assign to a slower, but available GPU as compared to waiting for a faster accelerator).
As per claim 19, Amarnath teaches The non-transitory machine-readable medium of claim 17, wherein the operations further comprise monitoring the performance of the task-execution system with respect to executing the respective tasks. (Amarnath [0098] Thus, the RL agent 562 may, during runtime, use the DT as the basis for Q-learning of the RL agent 562 to perform exploration and exploitation of various task policies. The RL agent 562 may wait, based on the determined wait time, for the PE (e.g., the original, busy PE), as in block 722. The RL agent 562 may determine if a task is completed on the processing element (“PE”), as in block 724. If no at block 724, the system will wait for the task to be completed. If yes at block 724, the RL agent 562 may obtain a reward for the RL agent 562, as in block 726. The purpose, of the RL agent 562 may obtain a reward is to reinforce and update the RL model with new experiences that the system encounters for the mission that is being executed. The RL agent 562 may update a RL model, as in block 728).
As per claim 20, Amarnath teaches The non-transitory machine-readable medium of claim 19, wherein the operations further comprise updating the scheduling policy data based on the monitoring of the performance of the task-execution system. (Amarnath [0098] Thus, the RL agent 562 may, during runtime, use the DT as the basis for Q-learning of the RL agent 562 to perform exploration and exploitation of various task policies. The RL agent 562 may wait, based on the determined wait time, for the PE (e.g., the original, busy PE), as in block 722. The RL agent 562 may determine if a task is completed on the processing element (“PE”), as in block 724. If no at block 724, the system will wait for the task to be completed. If yes at block 724, the RL agent 562 may obtain a reward for the RL agent 562, as in block 726. The purpose, of the RL agent 562 may obtain a reward is to reinforce and update the RL model with new experiences that the system encounters for the mission that is being executed. The RL agent 562 may update a RL model, as in block 728).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Amarnath (US 2023/0012710 A1) in view of Guo (US 2019/0116128 A1).
As per claim 3, Amarnath teaches wherein the trained scheduler model comprises a global artificial intelligence scheduler module configured to output scheduling data that allocates task-execution system resources to the tasks based on resource availability data, task priority data, and per-task resource needs (Amarnath [0069] In one aspect, the learning agent based application scheduling service 410 may [global scheduler], using the machine learning component 440, track each task dependencies in each of the DAGs such as, for example, DAG's 404, 406, and 408. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, determine task priorities, task execution timelines, task wait times, and task deadlines for each of the one or more tasks of the plurality of DAGs. The learning agent based application scheduling service 410 may, using the machine learning component 440, the DAG component 450, the estimation component 460, and/or the scheduler component 470, rank the one or more tasks in the plurality of DAGs based on the plurality of constraints and conditions, the task prioritization policies, the task execution estimates, and the configurations of the heterogenous system [0094] The RL agent 562 may determine a mission type of each task [type of task] such as, for example, if the task mission is first to be schedule or last, as in block 704. That is, block 704 represents the action a task scheduler has to do based on whether the task it has to schedule is the first in the mission or the last.).
Amarnath does not teach a local artificial intelligence scheduler module coupled to obtain the scheduling data from the global artificial intelligence scheduler module and assign processors to the tasks based on the per-task resource needs.
However, Guo teaches a local artificial intelligence scheduler module coupled to obtain the scheduling data from the global artificial intelligence scheduler module and assign processors to the tasks based on the per-task resource needs (Guo [0043] In one example, the global edge scheduler 212 schedules tasks according to deadlines and priorities of the tasks. For example, a task with a higher priority can be scheduled with an early execution time and/or a larger overprovisioned resource capacity. In one example, a task requires multiple pieces of edge computing resources. Accordingly, the task is fit in a time slot when all associated edge computing resources are available. In one example, sub-tasks of a heavy task are distributed to multiple edge computing centers such that the sub-tasks can be processed in parallel. In one example, secondary tasks are matched with idle edge computing resources on which no primary tasks are performed. For example, secondary tasks and primary tasks are arranged separately on different servers, different containers, different virtual machines, or different networks. In one example, secondary tasks are scheduled in a way that secondary tasks and primary tasks share a same edge computing resource, such as a same server, a same virtual machine, or a same network. [0050] The local edge scheduler 262 can be configured to receive PES requests 271 for performing primary tasks initiated by UEs (such as a UE 230), and match the primary tasks to related edge computing resources 266. The local edge scheduler 262 can be further configured to receive SES requests for performing secondary tasks from the global edge scheduler 212. However, the edge scheduler 262 can accept or reject a SES request depending on whether edge computing resources associated with the SES are available or not.)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Guo with the system of Amarnath to assign processors to the tasks based on the per-task resource needs. One of ordinary skill in the art would have been motivated to incorporate the teachings of Guo into the system of Amarnath for the purpose of dynamically allocating edge computing resources. (Guo paragraph [0002])
As to claim 15, it is rejected for the same reasons as claim 3.
Claims 5, 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Amarnath (US 2023/0012710 A1) in view of Alagha (Alagha, A.; Singh, S.; Mizouni, R.; Bentahar, J.; Otrok, H., "Target localization using multi-agent deep reinforcement learning with proximal policy optimization," Future Generation Computer Systems, 2022, vol. 136, pages 342-357).
As per claim 5, Amarnath does not teach wherein the operations further comprise a recurrent learning module that learns the learned scheduling policy data, and wherein the recurrent learning module comprises a proximal policy optimization model and a deep-Q network.
However, Alagha teaches wherein the operations further comprise a recurrent learning module that learns the learned scheduling policy data, and wherein the recurrent learning module comprises a proximal policy optimization model and a deep-Q network. (Alagha Abstract [The use of Reinforcement Learning (RL) helps in providing an efficient Artificial Intelligence (AI) paradigm to obtain intelligent agents, which can learn in different complex environments. In this work, an actor–critic structure is used with Convolutional Neural Networks (CNNs), which are optimized using Proximal Policy Optimization (PPO). Agents’ observations are modeled as 2D heatmaps capturing locations and sensor readings of all agents. Cooperation among agents is induced using a team-based reward, which incentivizes agents to cooperate in localizing the target and managing their resources] and page 343 [The proposed models utilize Proximal Policy Optimization (PPO) to optimize actor and critic networks that are based on Convolutional Neural Networks (CNNs).] and page 351 [Finally, the performance of the resultant agents is compared against existing localization benchmarks, such as Bayesian-based survey methods [18], uniform survey methods [33], and single-agent RL-based target localization using Double Deep Q-Networks (DDQN) [2].] and page 354 [DDQN: a localization approach based on Double Deep Q learning for a system with a single agent [2], which is trained for 10 million steps.])
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Alagha with the system of Amarnath to use a proximal policy optimization model and a deep-Q network. One of ordinary skill in the art would have been motivated to incorporate the teachings of Alagha into the system of Amarnath for the purpose of allowing cooperation among agents. (Alagha page 343)
As per claim 7, Alagha teaches wherein the proximal policy optimization model outputs the scheduling policy data. (Alagha [page 345] In this work, we use Proximal Policy Optimization (PPO) [41]; a state-of-the-art policy gradient method for RL that uses the actor–critic structure. In this structure, the policy (actor) is used to select actions, while the estimated value function (critic) is used to criticize the actions made by the actor. PPO uses the current experiences, combined with the critique, and tries to take the biggest improvement step to update the current policy, without moving far from it)
As per claim 8, Alagha teaches wherein the proximal policy optimization model comprises a policy subnetwork and a value subnetwork, wherein the policy network outputs a probability distribution over candidate actions based on the state data, and wherein the value network estimates an expected value of the current state for use in evaluating a quality metric of current policy data. (Alagha [page 345] In this work, we use Proximal Policy Optimization (PPO) [41]; a state-of-the-art policy gradient method for RL that uses the actor–critic structure. In this structure, the policy (actor) is used to select actions, while the estimated value function (critic) is used to criticize the actions made by the actor. PPO uses the current experiences, combined with the critique, and tries to take the biggest improvement step to update the current policy, without moving far from it … probability ratio of taking certain actions between the old and the current policy. Rt(.) is greater than 1 if at is more likely to be taken in st after the latest policy update, and less than 1 if the opposite is true. Ât corresponds to an estimator of the advantage function; a function that measures how good a certain action is, given a certain state.)
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Amarnath (US 2023/0012710 A1) in view of Alagha (Alagha, A.; Singh, S.; Mizouni, R.; Bentahar, J.; Otrok, H., "Target localization using multi-agent deep reinforcement learning with proximal policy optimization," Future Generation Computer Systems, 2022, vol. 136, pages 342-357) and further in view of Rao (US 2024/0118667 A1).
As per claim 6, Amarnath and Alagha do not teach wherein the deep-Q network outputs action values comprising Q-values representative of candidate actions based on the current state data.
However, Rao teaches wherein the deep-Q network outputs action values comprising Q-values representative of candidate actions based on the current state data. (Rao [0067] As noted above, in some implementations, a vision-based robot task model utilized by the vision-based robot task engine(s) 158 can be a deep Q-learning network (e.g., stored in the vision-based robot task model(s) database 190) that learns a Q-function, Q(s, a), for a vision-based task, where s is an input image, and where a is a candidate action performable based on the input image)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Rao with the system of Amarnath and Alagha to use a deep Q-network that learns a Q-function. One of ordinary skill in the art would have been motivated to incorporate the teachings of Rao into the system of Amarnath and Alagha for the purpose of using stochastic optimization to select actions (during inference) and target Q-values. (Rao paragraph [0041])
Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Amarnath (US 2023/0012710 A1) in view of Gombolay (US 2022/0226994 A1).
As per claim 9, Amarnath does not teach wherein the task-execution system comprises a real-time control system.
However, Gombolay teaches wherein the task-execution system comprises a real-time control system. (Gombolay [0007] In an aspect, a method is disclosed to generate a schedule for a plurality of heterogenous robots (e.g., robotic equipment, manufacturing equipment, transport equipment, people with assigned tasks in manufacturing, assembling, distributing environment) performing a set of tasks using a scheduler executing instructions, wherein the plurality of heterogenous robots includes a first robot of a first type, and a second robot of a second type, wherein the first type and the second type are different (e.g., in being configured for different tasks or can perform the same tasks at different proficiencies). [0035] In the example shown in FIG. 1, the heterogenous robots 102 can include equipment (shown as 102a) such as, but not limited to, robotic equipment, manufacturing equipment, transport equipment, as well as people with assignable tasks (shown as “workers” 102b) and other equipment and workers described herein. Schedule 110 is used as parameters in control systems (shown as 112) to control the operation of equipment 102a. Schedule 110 may also be used to generate floor schedules or plans (shown as 114) to direct the operation of the equipment 102a as well as workers 102b).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Gombolay with the system of Amarnath to use a real-time control system. One of ordinary skill in the art would have been motivated to incorporate the teachings of Gombolay into the system of Amarnath for the purpose of achieving optimality in scheduling using a scalable framework. (Gombolay paragraph [0005])
As per claim 10, Gombolay teaches wherein the real-time control system comprises at least one of: a test control system, a measurement control system, a manufacturing control system, a power generation control system, a transportation control system, an industrial automation control system or a process control system. (Gombolay [0035] In the example shown in FIG. 1, the heterogenous robots 102 can include equipment (shown as 102a) such as, but not limited to, robotic equipment, manufacturing equipment, transport equipment, as well as people with assignable tasks (shown as “workers” 102b) and other equipment and workers described herein. Schedule 110 is used as parameters in control systems (shown as 112) to control the operation of equipment 102a. Schedule 110 may also be used to generate floor schedules or plans (shown as 114) to direct the operation of the equipment 102a as well as workers 102b).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Amarnath (US 2023/0012710 A1) in view of Qu (US 2024/0314197 A1).
As per claim 11, Amarnath does not teach wherein the task-execution system comprises at least one of: a network management system, a network optimization system, or a function of an edge computing system.
However, Qu teaches wherein the task-execution system comprises at least one of: a network management system, a network optimization system, or a function of an edge computing system. (Qu [0242] Specifically, this embodiment provides an IDEC system. The IDEC system mainly includes three major modules: an edge resource management module (or IoT device resource management module, which is the first functional component above), a computing task decomposition module (or called the machine learning computing task decomposition module (i.e., the above-mentioned second functional component) and the intelligent computing task allocation (ICTA) module (i.e., the above-mentioned third functional component). As shown in FIG. 4, the IDEC system connects to the widely distributed edge infrastructure of the Internet of Things (that is, edge devices, which can also be called Internet of Things devices) in the southbound direction, and generates a resource graph that supports dynamic construction and update through the edge resource management module, enabling dynamic perception, unified management, efficient scheduling, and sharing and collaboration of various heterogeneous IoT device resources. In the northbound direction of the IDEC system, the deep learning tasks from intelligent applications and services in actual scenarios are processed through the computing task decomposition module to produce computation graphs, realizing fine-grained operator-level computing task decomposition, providing conditions for parallel computing and distributed processing, and at the same time being conducive to graph-level optimization of deep learning task execution performance. The middle layer (i.e., core module) of the IDEC system is the ICTA module. Based on the generated resource graph and computation graph, ICTA realizes the cross-device distribution of the underlying deep learning operators.
The ICTA module uses deep learning algorithms such as the graph convolutional network (GCN) and deep neural network (DNN) to realize the task allocation strategy corresponding to the best system performance by learning the inherent statistical laws of complex and changeable task scheduling problems between different operating systems on heterogeneous Internet of Things devices. Intelligent decision-making maximizes the use of scattered heterogeneous resources on the edge side of the Internet of Things, thereby improving the overall system performance; at the same time, the ICTA module introduces a continuous learning mechanism to enable the IDEC system to have intelligent self-adaption, realizing “the more you use it, the better it becomes”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Qu with the system of Amarnath to use an edge computing system. One of ordinary skill in the art would have been motivated to incorporate the teachings of Qu into the system of Amarnath for the purpose of making full use of resource-constrained and highly heterogeneous IoT devices to perform computing tasks. (Qu paragraph [0004])
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20240370300 A1 – discloses resource prioritization using machine learning techniques are provided herein. An example computer-implemented method includes obtaining data pertaining to multiple resources associated with at least one enterprise; prioritizing one or more of the multiple resources in connection with one or more tasks associated with the at least one enterprise by processing, using one or more machine learning techniques, at least a portion of the data pertaining to the multiple resources and data pertaining to the one or more tasks; and performing one or more automated actions based at least in part on the prioritizing of the one or more resources.
US 12056579 B1 – discloses intelligent priority evaluators configured to perform a method that prioritizes tasks submitted by various users, even if the tasks are similarly classified. The scheduling system can collect, calculate, and use various criteria to determine a reward score in order to prioritize one task over another, such as for dynamic scheduling purposes. This can be performed in addition to or as a replacement for receiving user designations of priority.
US 20230385779 A1 – discloses identifying a task from a plurality of tasks that need to be scheduled and determining other tasks associated with participants associated with the task. The method may also include, by the computing device, determining one or more periods of time when the participants associated with the task are unavailable and determining one or more candidate time slots for the task based on time slots for which the other tasks are scheduled and the one or more periods of time when the participants associated with the task are unavailable. The method may further include, by the computing device, scheduling the task to be performed during one of the determined one or more candidate time slots.
US 20230251900 A1 – discloses scheduling a Real Time (RT) task, includes: receiving a task; obtaining a yield time of the RT task based on one of an execution deadline of the RT task, an execution deadline of next RT task subsequent to the RT task, and a maximum execution time associated with an execution of the next RT task subsequent to the RT task; creating a bandwidth reservation task having a deadline; inserting the RT task along with the bandwidth reservation task into a RT wait queue based on the deadline of each of the RT task and the bandwidth reservation task in accordance with an Early Deadline First (EDF) criteria; and scheduling an unscheduled task based on an available-time of the RT wait queue in accordance with the EDF based scheduling.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MEHRAN KAMRAN whose telephone number is (571)272-3401. The examiner can normally be reached 9-5.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, April Blair can be reached on (571)270-1014. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MEHRAN KAMRAN/ Primary Examiner, Art Unit 2196