DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Response to Amendment
The amendment filed on 2/13/2026 has been entered. Claims 1-20 remain pending in this application.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-7, 9-13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Satish in view of Bonebakker et al (US Patent No. 8,688,430 B1 hereinafter Bonebakker in further view of Sasikumar et al. (US Patent No. 8,407,322 B1 hereinafter Sasikumar).
As per claim 1, Satish teaches a method comprising: obtaining an execution schedule for execution of a plurality of runnables corresponding to a computing application by a heterogenous computing platform including a plurality of compute engines in which two or more compute engines of the plurality of compute engines are of different types (Pg. 149, Section 1 – “Introduction”, “In this work, we consider the problem of statically scheduling a graph of concurrent tasks with associated execution times on a target heterogeneous multiprocessor. The objective of the scheduling algorithm is to minimize the schedule length (makespan) of the system.” Pg. 150, Section 3.1 – “Static Scheduling Problem”, “We consider a static scheduling problem whose objective is to schedule a task dependence graph onto a multiprocessor to minimize the schedule length (makespan). The task graph is a directed acyclic graph (DAG) G = (V, E), where V = {v1, v2, ...vn} represents the set of computation tasks, and the directed edges E ⊆ V × V represent dependence constraints between the tasks. The architecture graph H = (P, C) is a graph showing the processors and their interconnections, where P = {p1, p2, ...pm} is a set of processors and C ⊆ P × P shows the connectivity between the processors. Each task is fully executed without preemption on a single processor.”), the execution schedule being deterministic such that the execution schedule is maintained across multiple different execution iterations of the computing application (Pg. 150, Section 4.1 – “Timing Model”, “Simulation based approaches execute the code with different inputs and get the distribution of real execution times. Such an approach is simpler but assumes access to the final platform. In this work, we use a simulation-based approach to characterize the run times of individual tasks. There are two main types of variations we wish to capture: (1) variations in the runtime of a single task across many runs due to cache effects or bus arbitration (2) variations in the runtime due to different inputs exciting different execution traces within the task. To capture the second effect, we simulate the runtimes of each task in the task graph with different inputs. We repeat the execution of each input a certain number of times to ensure that variations in cache access times are captured in our traces.”), and testing the execution schedule using dummy code that mimics one or more operations of the computing application that are associated with the execution schedule (Pg. 150, Section 4.1 – “Timing Model”, “Simulation based approaches execute the code with different inputs and get the distribution of real execution times. Such an approach is simpler but assumes access to the final platform. In this work, we use a simulation-based approach to characterize the run times of individual tasks. There are two main types of variations we wish to capture: (1) variations in the runtime of a single task across many runs due to cache effects or bus arbitration (2) variations in the runtime due to different inputs exciting different execution traces within the task. To capture the second effect, we simulate the runtimes of each task in the task graph with different inputs. We repeat the execution of each input a certain number of times to ensure that variations in cache access times are captured in our traces.”).
Although Satish teaches testing/simulating an execution schedule using dummy code that mimics the timing of one or more operations of a computing application, Satish fails to explicitly show that the dummy code is different from code of the one or more runnables.
Accordingly, Bonebakker teaches the known simulation technique or the dummy code being different from actual code of the computing application and mimicking timing of execution of at least a portion of the computing application without executing the actual code corresponding to at least the portion of the computing application itself (Col. 6 & 7, lines 63-67 & 1-21, “Then, during subsequent simulations, instead of repeatedly executing the program code corresponding to each repeating computational phase, the computer system executes the portion of the program code corresponding to each repeated computational phase a limited number of times (step 510) (i.e., at least one time). Hence, the load on the computer system is accurately simulated in much less time than is required to run the full program code. In embodiments of the present invention, some or all of the program code is replaced with the portions of the program code that correspond to the repeating computational phases…Hence, the load on the computer system while running the program code can be accurately simulated using 4 short-duration computations instead of the entire program code. In this case, using the repeating computations from each computational phase instead of executing the entire program code, the 17-hour runtime can be reduced to (6 min.+12 min.+9 min.+10 min.)=37 mins.”).
Satish and Bonebakker are considered to be analogous to the claimed invention because they are in the same field of task scheduling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the simulated testing operation of Satish with the modified testing/simulation technique of Bonebakker to arrive at the claimed invention. The motivation to modify Satish with the teachings of Bonebakker is that using dummy code instead of actual program when testing/simulating one or more runnables of a computing application is that various portions of the actual program code can be tested/simulated without executing the entire program code which can have a long overall runtime.
.
Although Satish and Bonebakker teach that particular runnables are executed by particular compute engines, Satish and Bonebakker fail to teach the execution schedule dictating such a relationship.
However, Sasikumar teaches the execution schedule including a set of commands dictating which individual compute engine of the plurality of compute engines executes which runnable of the plurality of runnables and dictating timing and order of execution of the plurality of runnables by the plurality of compute engines (Col. 7, lines 9-59, “The metadata 206 for each software code block can include information identifying requirements and constraints for a computer executing the code block. For example, such requirements and constraints may include where the code block is to be executed (e.g., in the cloud, on a client, etc.), when the code block is to be executed (e.g., synchronous, asynchronous--as quickly as possible, asynchronous--anytime, scheduled, etc.), which device or type of device is preferred or required for execution of the code block, and the type of runtime which is to be used (e.g., AVM, JVM, and the like). Additionally, such requirements and constraints may describe preferred characteristics for a computer executing the code block, such as preferred memory, storage, and security specifications… In some cases, a developer of the source code 304 may define metadata 310 associated with the source code 304. The metadata 310 can include various hints and constraints identified by the developer, for example, to facilitate handoffs between different computers when executing code blocks based on the source code 310. In some implementations, the developer may specify a hint for a particular part (e.g., a function or a part of a function) of the source code 304 indicating that the part should be executed by a particular computer at runtime. For example, for a client-server computing environment, the developer may indicate that a particular data processing function of an application should run on a server including a particular runtime environment.”).
Satish, Bonebakker, and Sasikumar are considered to be analogous to the claimed invention because they are in the same field of task scheduling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the task graph of Satish and Bonebakker with the scheduling commands/hints of Sasikumar to arrive at the claimed invention. The motivation to modify Satish and Bonebakker with the teachings of Sasikumar is that including such scheduling commands/hints within a task graph can facilitate more efficient scheduling and execution of an execution schedule across multiple compute engines.
As per claim 2, Satish, Bonebakker, and Sasikumar teach the method of claim 1. Satish teaches wherein the dummy code mimics one or more of: one or more respective runtimes of one or more runnables of the plurality of runnables; accessing of memory by one or more runnables of the plurality of runnables; or fence signaling by one or more runnables of the plurality of runnables, the fence signaling including triggering of one or more timing fences corresponding to the set of commands and used to dictate the timing and order of execution of one or more runnables of the plurality of runnables (Pg. 151, Section 4.1 – “Timing Model”, “There are two main types of variations we wish to capture: (1) variations in the runtime of a single task across many runs due to cache effects or bus arbitration (2) variations in the runtime due to different inputs exciting different execution traces within the task. To capture the second effect, we simulate the runtimes of each task in the task graph with different inputs. We repeat the execution of each input a certain number of times to ensure that variations in cache access times are captured in our traces. The execution times of each task in each of these runs are then discretized by binning and stored in the form of a probability distribution table. Such a table has entries of the form (runtime range, probability), and stores the probability that the task has an execution time in a particular range.”; ” Pg. 157, Section 7 – “Results”, “In this section, we show the results of applying the two statistical optimization approaches to the scheduling problem with execution time variations. We compare the makespans obtained by the two methods for statistical models at different percentiles with the result of deterministic simulated annealing using worst-case execution times.”).
As per claim 3, Satish, Bonebakker and Sasikumar teach the method of claim 1. Satish teaches further comprising determining whether cross engine interference occurs between two or more of the compute engines based at least on the testing of the execution schedule (Pg. 150, Section 3.1 – “Static Scheduling Problem”, “We consider a static scheduling problem whose objective is to schedule a task dependence graph onto a multiprocessor to minimize the schedule length (makespan). The task graph is a directed acyclic graph (DAG) G = (V, E), where V = {v1, v2, ...vn} represents the set of computation tasks, and the directed edges E ⊆ V × V represent dependence constraints between the tasks. The architecture graph H = (P, C) is a graph showing the processors and their interconnections, where P = {p1, p2, ...pm} is a set of processors and C ⊆ P × P shows the connectivity between the processors.” Pg. 150, Section 4.1 – “Timing Model”, “If the task execution times are dependent, then we need to additionally store the joint probability distributions. Dependencies in task execution times arise mainly due to global sources of variation having to do with varying memory access times. The joint probability distribution is also stored in the form of a table, but the entries are of the form (range1, range2, ... , rangen, probability) that stores the joint probability that the task execution times of task 1 lies in range 1 and the execution time of task 2 lies in range 2 and so on. It may so happen that only certain sets of tasks have dependent task executions, while others do not. In this case, the joint probability distribution tables only need contain the dependent variables. In general, the joint probabilities help capture the correlations between the execution times of different tasks.).
As per claim 4, Satish, Bonebakker, and Sasikumar teach the method of claim 1. Satish teaches further comprising performing different testing iterations with different versions of dummy code corresponding to a plurality of different runtime timing scenarios (Pg. 151, Section 4.1 – “Timing Model”, “Simulation based approaches execute the code with different inputs and get the distribution of real execution times…There are two main types of variations we wish to capture: (1) variations in the runtime of a single task across many runs due to cache effects or bus arbitration (2) variations in the runtime due to different inputs exciting different execution traces within the task. To capture the second effect, we simulate the runtimes of each task in the task graph with different inputs. We repeat the execution of each input a certain number of times to ensure that variations in cache access times are captured in our traces.”).
As per claim 5, Satish, Bonebakker, and Sasikumar teach the method of claim 1. Satish teaches wherein the testing of the execution schedule includes testing error handling with respect to one or more instances in which the execution schedule is not followed by one or more runnables of the plurality of runnables (Pg. 150, Section 4.1 – “Timing Model”, “In this work, we use a simulation-based approach to characterize the run times of individual tasks. There are two main types of variations we wish to capture: (1) variations in the runtime of a single task across many runs due to cache effects or bus arbitration (2) variations in the runtime due to different inputs exciting different execution traces within the task.”).
As per claim 6, Satish, Bonebakker and Sasikumar teach the method of claim 1. Bonebakker teaches wherein the dummy code differs from the actual code by including routines that only cause a certain amount of time to pass (Col. 6 & 7, lines 63-67 & 1-21, “Then, during subsequent simulations, instead of repeatedly executing the program code corresponding to each repeating computational phase, the computer system executes the portion of the program code corresponding to each repeated computational phase a limited number of times (step 510) (i.e., at least one time). Hence, the load on the computer system is accurately simulated in much less time than is required to run the full program code. In embodiments of the present invention, some or all of the program code is replaced with the portions of the program code that correspond to the repeating computational phases…Hence, the load on the computer system while running the program code can be accurately simulated using 4 short-duration computations instead of the entire program code. In this case, using the repeating computations from each computational phase instead of executing the entire program code, the 17-hour runtime can be reduced to (6 min.+12 min.+9 min.+10 min.)=37 mins.”).
As per claim 7, Satish, Bonebakker, and Sasikumar teach the method of claim 6. Satish teaches wherein the plurality of different runtime error scenarios includes one or more of: a crash of the computing application; one or more runnable timing overruns; one or more frame timing overruns corresponding to one or more timing frames included in the execution schedule; or one or more runnable execution order violations (Pg. 151, Section 4.1 – “Timing Model”, “There are two main types of variations we wish to capture: (1) variations in the runtime of a single task across many runs due to cache effects or bus arbitration (2) variations in the runtime due to different inputs exciting different execution traces within the task. To capture the second effect, we simulate the runtimes of each task in the task graph with different inputs. We repeat the execution of each input a certain number of times to ensure that variations in cache access times are captured in our traces. The execution times of each task in each of these runs are then discretized by binning and stored in the form of a probability distribution table. Such a table has entries of the form (runtime range, probability), and stores the probability that the task has an execution time in a particular range.” Pg. 152, Section 4.2 – “Example: H.264 decoding”, “Depending on the input, a particular task may be responsible for decoding I or P macroblocks. P-macroblocks depend on previously decoded frames as also residual information. I and P macroblocks have different execution time characteristics (Fig 4). There is variability in the execution time of both I- and P- macroblocks due to varying amount of residual information…Cache misses and/or bus arbitration leads to uneven execution times even when a single P-macroblock is executed repeatedly with the same input. Since each P-macroblock has to access the frame buffer, this is a global source of variation in task execution times, and causes task execution times to be correlated.”).
As per claim 9, it is a system claim comprising similar limitations to claim 1, so it is rejected for similar reasons. Satish also teaches the execution schedule being generated prior to the execution of the computing application beginning and unchanging across multiple execution iterations of the computing application (Pg. 150, Section 3.1 – “Static Scheduling Problem”, “We consider a static scheduling problem whose objective is to schedule a task dependence graph onto a multiprocessor to minimize the schedule length (makespan). The task graph is a directed acyclic graph (DAG) G = (V, E), where V = {v1, v2, ...vn} represents the set of computation tasks, and the directed edges E ⊆ V × V represent dependence constraints between the tasks…Each task is fully executed without preemption on a single processor.” Pg. 150, Section 4.1 – “Timing Model”, “Simulation based approaches execute the code with different inputs and get the distribution of real execution times. Such an approach is simpler but assumes access to the final platform. In this work, we use a simulation-based approach to characterize the run times of individual tasks. There are two main types of variations we wish to capture: (1) variations in the runtime of a single task across many runs due to cache effects or bus arbitration (2) variations in the runtime due to different inputs exciting different execution traces within the task. To capture the second effect, we simulate the runtimes of each task in the task graph with different inputs. We repeat the execution of each input a certain number of times to ensure that variations in cache access times are captured in our traces.”).
As per claim 10, it is a system claim comprising similar limitations to claim 2, so it is rejected for similar reasons.
As per claim 11, it is a system claim comprising similar limitations to claim 3, so it is rejected for similar reasons.
As per claim 12, Satish, Bonebakker, and Sasikumar teach the system of claim 9. Bonebakker teaches wherein the dummy code is different from the code of the computing application in that the dummy code includes less code than the code of the computing application (Col. 6 & 7, lines 63-67 & 1-21, “Then, during subsequent simulations, instead of repeatedly executing the program code corresponding to each repeating computational phase, the computer system executes the portion of the program code corresponding to each repeated computational phase a limited number of times (step 510) (i.e., at least one time). Hence, the load on the computer system is accurately simulated in much less time than is required to run the full program code. In embodiments of the present invention, some or all of the program code is replaced with the portions of the program code that correspond to the repeating computational phases…Hence, the load on the computer system while running the program code can be accurately simulated using 4 short-duration computations instead of the entire program code. In this case, using the repeating computations from each computational phase instead of executing the entire program code, the 17-hour runtime can be reduced to (6 min.+12 min.+9 min.+10 min.)=37 mins.”).
As per claim 13, Satish. Bonebakker, and Sasikumar teach the system of claim 12. Satish teaches wherein the plurality of different runtime timing scenarios includes one or more of: a crash of the computing application; one or more runnable overruns; one or more runnable underruns; one or more frame overruns corresponding to one or more timing frames included in the execution schedule; or one or more runnable execution order violations (Pg. 151, Section 4.1 – “Timing Model”, “There are two main types of variations we wish to capture: (1) variations in the runtime of a single task across many runs due to cache effects or bus arbitration (2) variations in the runtime due to different inputs exciting different execution traces within the task. To capture the second effect, we simulate the runtimes of each task in the task graph with different inputs. We repeat the execution of each input a certain number of times to ensure that variations in cache access times are captured in our traces. The execution times of each task in each of these runs are then discretized by binning and stored in the form of a probability distribution table. Such a table has entries of the form (runtime range, probability), and stores the probability that the task has an execution time in a particular range.” Pg. 157, Section 7 – “Results”, “In this section, we show the results of applying the two statistical optimization approaches to the scheduling problem with execution time variations. We compare the makespans obtained by the two methods for statistical models at different percentiles with the result of deterministic simulated annealing using worst-case execution times.”).
As per claim 15, Satish, Bonebakker, and Sasikumar teach the system of claim 9. Sasikumar teaches wherein the system is comprised in at least one of:
a control system for an autonomous or semi-autonomous machine;
a perception system for an autonomous or semi-autonomous machine;
a system for performing simulation operations;
a system for performing digital twin operations;
a system for performing light transport simulation;
a system for performing collaborative content creation for 3D assets;
a system for performing deep learning operations;
a system for presenting at least one of augmented reality content, virtual reality content, or mixed reality content;
a system for hosting one or more real-time streaming applications;
a system implemented using an edge device;
a system implemented using a robot;
a system for performing conversational AI operations;
a system for generating synthetic data;
a system incorporating one or more virtual machines (VMs);
a system implemented at least partially in a data center; or
a system implemented at least partially using cloud computing resources (Col. 7, lines 9-27, “For example, such requirements and constraints may include where the code block is to be executed (e.g., in the cloud, on a client, etc.), when the code block is to be executed (e.g., synchronous, asynchronous--as quickly as possible, asynchronous--anytime, scheduled, etc.), which device or type of device is preferred or required for execution of the code block, and the type of runtime which is to be used (e.g., AVM, JVM, and the like).”)
Claim(s) 8 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Satish, Bonebakker, and Sasikumar as applied to claims 1 and 9 above, and further in view of Iwata (US Pub. No. 2018/0011734 A1).
As per claim 8, Satish, Bonebakker, and Sasikumar teach the method of claim 1. Satish teaches testing of the execution schedule (Pg. 151, Section 4.1 – “Timing Model”, “There are two main types of variations we wish to capture: (1) variations in the runtime of a single task across many runs due to cache effects or bus arbitration (2) variations in the runtime due to different inputs exciting different execution traces within the task. To capture the second effect, we simulate the runtimes of each task in the task graph with different inputs.”).
Satish, Bonebakker, and Sasikumar fail to teach testing a system task manager that is configured to manage execution of the plurality of runnables.
However, Iwata teaches wherein the testing of the execution schedule includes testing health monitoring of a system task manager that is to manage execution of the plurality of runnables based at least on the execution schedule (¶ [0047]-[0048], “In the job manager P1 illustrated in FIG. 2, for example, a job submission thread that performs control of a submitted job, a job deletion thread that performs deletion of the submitted job, and a job execution request thread that makes a request for execution of the job to the resource management P3 operate… In the information processing apparatus 1 illustrated in FIG. 2, for example, in the case where the test of the job manager P1 and the job scheduler P2 is performed, the business operator operates a job simulator instead of the resource management P3 and the execution management P4.”).
Satish, Bonebakker, Sasikumar, and Iwata are all considered to be analogous to the claimed invention because they are all in the same field of task scheduling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the Satish, Bonebakker, and Sasikumar with the job manager of Iwata to arrive at the claimed invention. The motivation to modify Satish, Bonebakker, and Sasikumar with the teachings of Iwata is that including a job manager/system task manager ensures proper scheduling and processing of execution schedule.
As per claim 14, it is a system claim comprising similar limitations to claim 8, so it is rejected for similar reasons.
Claim(s) 16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Satish et al. (NPL Document - "Scheduling Task Dependence Graphs with Variable Task Execution Times onto Heterogenous Multiprocessors" hereinafter Satish) in view of Bonebakker et al. (US Patent No. 8,688,430 B1 hereinafter Bonebakker).
As per claim 16, Satish teaches one or more processors comprising: processing circuitry to cause performance of operations comprising: testing an execution schedule (Pg. 150, Section 3.1 – “Static Scheduling Problem”, “We consider a static scheduling problem whose objective is to schedule a task dependence graph onto a multiprocessor to minimize the schedule length (makespan). The task graph is a directed acyclic graph (DAG) G = (V, E), where V = {v1, v2, ...vn} represents the set of computation tasks, and the directed edges E ⊆ V × V represent dependence constraints between the tasks. The architecture graph H = (P, C) is a graph showing the processors and their interconnections, where P = {p1, p2, ...pm} is a set of processors and C ⊆ P × P shows the connectivity between the processors. Each task is fully executed without preemption on a single processor.”) using dummy code that mimics one or more operations corresponding to one or more runnables of a plurality of runnables of a computing application (Pg. 151, Section 4.1 – “Timing Model”, “In this work, we use a simulation-based approach to characterize the run times of individual tasks. There are two main types of variations we wish to capture: (1) variations in the runtime of a single task across many runs due to cache effects or bus arbitration (2) variations in the runtime due to different inputs exciting different execution traces within the task. To capture the second effect, we simulate the runtimes of each task in the task graph with different inputs.”), the dummy code is executed during the testing instead of the one or more runnables corresponding thereto being executed and the dummy code mimics one or more of execution timing or execution order of the one or more runnables (Pg. 151, Section 4.1 – “Timing Model”, “To capture the second effect, we simulate the runtimes of each task in the task graph with different inputs. We repeat the execution of each input a certain number of times to ensure that variations in cache access times are captured in our traces. The execution times of each task in each of these runs are then discretized by binning and stored in the form of a probability distribution table. Such a table has entries of the form (runtime range, probability), and stores the probability that the task has an execution time in a particular range.”).
Although Satish teaches testing/simulating an execution schedule using dummy code that mimics the timing of one or more operations of a computing application, Satish fails to explicitly show that the dummy code is different from code of the one or more runnables.
Accordingly, Bonebakker teaches wherein: the dummy code is different from code of the one or more runnables (Col. 6 & 7, lines 63-67 & 1-21, “Then, during subsequent simulations, instead of repeatedly executing the program code corresponding to each repeating computational phase, the computer system executes the portion of the program code corresponding to each repeated computational phase a limited number of times (step 510) (i.e., at least one time). Hence, the load on the computer system is accurately simulated in much less time than is required to run the full program code. In embodiments of the present invention, some or all of the program code is replaced with the portions of the program code that correspond to the repeating computational phases…Hence, the load on the computer system while running the program code can be accurately simulated using 4 short-duration computations instead of the entire program code. In this case, using the repeating computations from each computational phase instead of executing the entire program code, the 17-hour runtime can be reduced to (6 min.+12 min.+9 min.+10 min.)=37 mins.”).
Satish and Bonebakker are considered to be analogous to the claimed invention because they are in the same field of task scheduling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the testing operation of Satish with the modified testing/simulation code of Bonebakker to arrive at the claimed invention. The motivation to modify Satish with the teachings of Bonebakker is that using dummy code instead of actual program when testing/simulating one or more runnables of a computing application is that various portions of the actual program code can be tested/simulated without executing the entire program code which can have a long overall runtime.
As per claim 18, Satish and Bonebakker teach the one or more processors of claim 16. Satish teaches wherein the testing of the execution schedule includes performing different testing iterations with different versions of dummy code corresponding to a plurality of different runtime timing scenarios (Pg. 151, Section 4.1 – “Timing Model”, “Simulation based approaches execute the code with different inputs and get the distribution of real execution times…There are two main types of variations we wish to capture: (1) variations in the runtime of a single task across many runs due to cache effects or bus arbitration (2) variations in the runtime due to different inputs exciting different execution traces within the task. To capture the second effect, we simulate the runtimes of each task in the task graph with different inputs. We repeat the execution of each input a certain number of times to ensure that variations in cache access times are captured in our traces.”).
As per claim 19, Satish and Bonebakker teach the one or more processors of claim 18. Satish teaches wherein the plurality of different runtime timing scenarios includes one or more of: a crash of the computing application; one or more runnable overruns; one or more runnable underruns; one or more frame overruns corresponding to one or more timing frames included in the execution schedule; or one or more runnable execution order violations (Pg. 151, Section 4.1 – “Timing Model”, “There are two main types of variations we wish to capture: (1) variations in the runtime of a single task across many runs due to cache effects or bus arbitration (2) variations in the runtime due to different inputs exciting different execution traces within the task. To capture the second effect, we simulate the runtimes of each task in the task graph with different inputs. We repeat the execution of each input a certain number of times to ensure that variations in cache access times are captured in our traces. The execution times of each task in each of these runs are then discretized by binning and stored in the form of a probability distribution table. Such a table has entries of the form (runtime range, probability), and stores the probability that the task has an execution time in a particular range.” Pg. 157, Section 7 – “Results”, “In this section, we show the results of applying the two statistical optimization approaches to the scheduling problem with execution time variations. We compare the makespans obtained by the two methods for statistical models at different percentiles with the result of deterministic simulated annealing using worst-case execution times.”).
As per claim 20, Satish and Bonebakker teach the one or more processors of claim 16. Satish also teaches wherein the dummy code mimics one or more of: one or more respective runtimes of one or more runnables of the plurality of runnables; accessing of memory by one or more runnables of the plurality of runnables; or fence signaling by one or more runnables of the plurality of runnables, the fence signaling including triggering of one or more timing fences corresponding to the execution schedule (Pg. 151, Section 4.1 – “Timing Model”, “In this work, we use a simulation-based approach to characterize the run times of individual tasks. There are two main types of variations we wish to capture: (1) variations in the runtime of a single task across many runs due to cache effects or bus arbitration (2) variations in the runtime due to different inputs exciting different execution traces within the task. To capture the second effect, we simulate the runtimes of each task in the task graph with different inputs.”; ” Pg. 157, Section 7 – “Results”, “In this section, we show the results of applying the two statistical optimization approaches to the scheduling problem with execution time variations. We compare the makespans obtained by the two methods for statistical models at different percentiles with the result of deterministic simulated annealing using worst-case execution times.”).
Claim(s) 17 is rejected under 35 U.S.C. 103 as being unpatentable over Satish and Bonebakker as applied to claim 16 above, in view of Sasikumar and further in view of Kiel et al. (US Pub. No. 20170039124 A1 hereinafter Kiel).
As per claim 17, Satish and Bonebakker teach the one or more processors of claim 16. Satish teaches wherein: the execution schedule is for execution using a plurality of compute engines of a runtime system (Pg. 149, Section 1 – “Introduction”, “In this work, we consider the problem of statically scheduling a graph of concurrent tasks with associated execution times on a target heterogeneous multiprocessor. The objective of the scheduling algorithm is to minimize the schedule length (makespan) of the system.” Pg. 150, Section 3.1 – “Static Scheduling Problem”, “We consider a static scheduling problem whose objective is to schedule a task dependence graph onto a multiprocessor to minimize the schedule length (makespan). The task graph is a directed acyclic graph (DAG) G = (V, E), where V = {v1, v2, ...vn} represents the set of computation tasks, and the directed edges E ⊆ V × V represent dependence constraints between the tasks. The architecture graph H = (P, C) is a graph showing the processors and their interconnections, where P = {p1, p2, ...pm} is a set of processors and C ⊆ P × P shows the connectivity between the processors. Each task is fully executed without preemption on a single processor.”).
Although Satish and Bonebakker teach that particular runnables are executed by particular compute engines, Satish and Bonebakker fail to teach the execution schedule dictating such a relationship.
However, Sasikumar teaches the execution schedule includes a set of commands dictating which individual compute engine of the plurality of compute engines executes which runnable of the plurality of runnables and dictating timing and order of execution of the plurality of runnables by the plurality of compute engines (Col. 7, lines 9-59, “The metadata 206 for each software code block can include information identifying requirements and constraints for a computer executing the code block. For example, such requirements and constraints may include where the code block is to be executed (e.g., in the cloud, on a client, etc.), when the code block is to be executed (e.g., synchronous, asynchronous--as quickly as possible, asynchronous--anytime, scheduled, etc.), which device or type of device is preferred or required for execution of the code block, and the type of runtime which is to be used (e.g., AVM, JVM, and the like). Additionally, such requirements and constraints may describe preferred characteristics for a computer executing the code block, such as preferred memory, storage, and security specifications… In some cases, a developer of the source code 304 may define metadata 310 associated with the source code 304. The metadata 310 can include various hints and constraints identified by the developer, for example, to facilitate handoffs between different computers when executing code blocks based on the source code 310. In some implementations, the developer may specify a hint for a particular part (e.g., a function or a part of a function) of the source code 304 indicating that the part should be executed by a particular computer at runtime. For example, for a client-server computing environment, the developer may indicate that a particular data processing function of an application should run on a server including a particular runtime environment.”).
Satish, Bonebakker, and Sasikumar are considered to be analogous to the claimed invention because they are in the same field of task scheduling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the task graph of Satish and Bonebakker with the scheduling commands/hints of Sasikumar to arrive at the claimed invention. The motivation to modify Satish and Bonebakker with the teachings of Sasikumar is that including such scheduling commands/hints within a task graph can facilitate more efficient scheduling and execution of an execution schedule across multiple compute engines.
Although Satish, Bonebakker, and Sasikumar teach executing runnables asynchronously and synchronously, they fail to explicitly teach how this is achieved such as by the insertion of timing fences.
However, Kiel teaches a plurality of timing fences correspond to the set of commands and are used to dictate the timing and order of execution of the plurality of runnables (¶ [0005], “A conventional mechanism for ordering or synchronizing operations with data dependencies across two or more processors (homogenous, heterogeneous, physical, logical, virtual, etc.) is to use synchronization objects or primitives. Such objects allow one processor to communicate with one or more other processors when a workload (set of tasks or operations) has completed. A fence object is an example of such a synchronization primitive. A processor can wait on a fence object, effectively blocking the processor from continuing any work, until the fence is signaled by another processor.” ¶ [0025], “In a non-intercepted system, two or more processors may use a fence or other synchronization primitive to order work as depicted in the process 200 of FIG. 2. In this diagram, processor 0 performs some work (201) before signaling a fence (207) to indicate that the work has been completed. After signaling the fence, processor 0 continues to perform more work (211). Processor 1 has a workload (203) that can be assumed to be independent of any work being done by processor 0 based on the usage of the fence.”).
Satish, Bonebakker, Sasikumar, and Kiel are all considered to be analogous to the claimed invention because they are all in the same field of task scheduling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Satish, Bonebakker, and Sasikumar with the timing fences of Kiel to arrive at the claimed invention. The motivation to modify Satish, Bonebakker, and Sasikumar with the teachings of Kiel is that timing fences help ensure proper timing and order of execution of the schedule.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Applicant has amended the claims with new limitations that change the scope of the claimed invention. Therefore, the amended claims necessitate new rejections, addressed above. The amended claims are not allowable over prior art previously cited along with an additional reference, necessitated by amendment, for reason indicated above.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN ROBERT DAKITA EWALD whose telephone number is (703)756-1845. The examiner can normally be reached Monday-Friday: 9:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock can be reached at (571)272-3759. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.D.E./Examiner, Art Unit 2199
/LEWIS A BULLOCK JR/Supervisory Patent Examiner, Art Unit 2199