DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 8/8/2025 has been entered.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 5-9, 13-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Herr et al. (US Pub. No. 2019/0317740 A1, hereinafter Herr) in view of Sasikumar et al. (US Patent No. 8,407,322 B1, hereinafter Sasikumar), further in view of Kiel et al. (US Pub. No. 2017/0039124 A1, hereinafter Kiel), further in view of Beylin et al. (US Patent No. 8,381,203 B1, hereinafter Beylin), and further in view of Fotland et al. (US Patent No. 7,120,783 B2, hereinafter Fotland).
As per claim 1, Herr teaches a system comprising: a plurality of compute engines to operate together to execute a computing application that includes a plurality of runnables (¶ [0043]-[0044], “In the illustrated example of FIG. 1, one or more of the FPGA 106, the VPU 108, and/or the GPU 110 are processing elements that may be utilized by a program executing on the heterogeneous system 100 for computing tasks, such as hardware acceleration…the heterogeneous system 100 may include any number and/or type of processing elements including application-specific instruction set processors (ASIPs), physic processing units (PPUs), digital signal processors (DSPs), image processors, coprocessors, floating-point units, network processors, multi-core processors, and front-end processors.”) based at least on an execution schedule and a set of commands associated with the execution schedule (¶ [0072], “In the illustrated example of FIG. 3, the runtime scheduler 314 determines how to execute a workload (e.g., one or more algorithms) during runtime of the heterogeneous system 304. In some examples, the runtime scheduler 314 generates and/or transmits execution graphs to a processing element to execute and/or otherwise implement.”), wherein the execution schedule is generated using a compiling system to include the set of commands (¶ [0093], “In the example illustrated of FIG. 4, the variant generator 302 includes the compilation auto-scheduler 408 to generate a schedule associated with the algorithm for the selected processing element based on the cost model (e.g., the weight file) generated by the cost model learner 404. In some examples, the compilation auto-scheduler 408 implements second means for generating a schedule based on configurations for processing elements of the heterogeneous system 304.” ¶ [0096], “In the illustrated example of FIG. 
4, the variant generator 302 includes the variant compiler 410 to compile the schedule generated by the compilation auto-scheduler 408.”), the set of commands include one or more individual commands dictating an order of execution of one or more individual runnables of the plurality of runnables by individual compute engines of the plurality of compute engines (¶ [0094], “In some examples, a schedule can correspond to a method or order of operations associated with computing and/or otherwise processing an algorithm, where the schedule can include at least one of choices or decisions about memory locality, redundant computation, or parallelism.”), and such that the execution schedule coordinates execution of the plurality of runnables across different types of compute engines of the plurality of compute engines (¶ [0101], “Accordingly, the fat binary 309 of FIGS. 3 and/or 5 includes multiple versions (e.g., multiple binary versions, multiple schedule versions, etc.) of the same algorithm implemented on a spectrum of processing elements available on target heterogeneous systems (e.g., the heterogeneous system 304 of FIG. 3).”).
Herr fails to teach execution constraints for the plurality of runnables and the compiling system creating the execution schedule based on said execution constraints.
However, Sasikumar teaches the source code indicates execution constraints of the plurality of runnables and the execution constraints indicating timing of execution of the one or more individual runnables, order of execution of the one or more individual runnables, and a compute engine type to use for execution of the one or more individual runnables (i.e., code blocks) (Col. 7, lines 9-59, “The metadata 206 for each software code block can include information identifying requirements and constraints for a computer executing the code block. For example…when the code block is to be executed (e.g., synchronous, asynchronous--as quickly as possible, asynchronous--anytime, scheduled, etc.), which device or type of device is preferred or required for execution of the code block, and the type of runtime which is to be used (e.g., AVM, JVM, and the like). Additionally, such requirements and constraints may describe preferred characteristics for a computer executing the code block, such as preferred memory, storage, and security specifications…FIG. 3 is a process flow diagram illustrating an example process 300 for compiling source code and building an index. In general, the process 300 can be performed by a compiler 302 to transform source code 304 to prepare the code 304 for execution by multiple different computers…In some cases, a developer of the source code 304 may define metadata 310 associated with the source code 304. The metadata 310 can include various hints and constraints identified by the developer, for example, to facilitate handoffs between different computers when executing code blocks based on the source code 310. In some implementations, the developer may specify a hint for a particular part (e.g., a function or a part of a function) of the source code 304 indicating that the part should be executed by a particular computer at runtime. 
For example, for a client-server computing environment, the developer may indicate that a particular data processing function of an application should run on a server including a particular runtime environment.”).
Herr and Sasikumar are considered to be analogous to the claimed invention because they are in the same field of task scheduling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the algorithms/source code of Herr to include the execution constraints of Sasikumar to arrive at the claimed invention. The motivation to modify Herr with the teachings of Sasikumar is that including execution constraints, such as the type of compute engine, in the source code can help the compiling system optimally schedule and execute the program across a heterogeneous system.
Herr and Sasikumar fail to explicitly teach commands corresponding to timing fences.
However, Kiel teaches the set of commands including one or more individual commands corresponding to one or more timing fences (¶ [0005], “A fence object is an example of such a synchronization primitive. A processor can wait on a fence object, effectively blocking the processor from continuing any work, until the fence is signaled by another processor.” ¶ [0028], “When the application issues a command to monitor or wait on a fence, the interception layer applies it to the waiting fence.”).
Herr, Sasikumar, and Kiel are all considered to be analogous to the claimed invention because they are all in the same field of task scheduling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Herr and Sasikumar with the fence commands of Kiel to arrive at the claimed invention. The motivation to modify Herr and Sasikumar with the teachings of Kiel is that fence commands can help synchronize tasks across processors and ensure correct execution order.
Herr, Sasikumar, and Kiel fail to explicitly teach the compiling system inserting timing fences based on a compute graph.
However, Beylin teaches the compiling system inserts the one or more synchronization points in the execution schedule based at least on one or more compute graphs corresponding to the computing application (Col. 3, lines 32-42, “A compiler is configured to determine a minimum set of points in the control flow graph where multithreaded execution synchronization points are inserted to synchronize any divergent threads for SIMD processing.” Col. 8, lines 9-26, “FIG. 3C is a flow diagram of method steps for inserting synchronization points in the control flow graph of FIG. 3A to produce the flow control graph of FIG. 3B, in accordance with one or more aspects of the present invention. In step 360 the compiler identifies a loop in a control flow graph, such as the control flow graph shown in FIG. 3A. In step 362 the compiler determines if the loop has multiple exits, and, if so, in step 364 the compiler creates a pre-tail block before the loop tail block to intercept all edges to the loop tail block.” See also Col. 9, lines 4-12.).
Herr, Sasikumar, Kiel, and Beylin are all considered to be analogous to the claimed invention because they are all in the same field of task scheduling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the compiling system of Herr, Sasikumar, and Kiel with the well-known technique of a compiler inserting synchronization points into a program as taught in Beylin to arrive at the claimed invention. This modification would have been reasonable under MPEP § 2143 because all references teach task scheduling in a heterogeneous system.
Herr, Sasikumar, Kiel, and Beylin fail to explicitly teach a deterministic schedule that is fixed across multiple execution iterations.
However, Fotland teaches a scheduler that executes a deterministic execution schedule that is fixed, e.g., executed in a same order, across multiple different execution iterations of the computing application (Col. 7, lines 27-37, “With reference to FIG. 7a, when the scheduler, e.g., the thread controller that is illustrated in FIG. 3, utilizes strict scheduling the schedule is fixed and does not change over short periods of time. For example if the schedule is programmed to be "ABAC" as illustrated in FIG. 7a then the runtime sequence of threads will "ABACABACABAC . . . " as illustrated in FIG. 7a. Threads that are strictly scheduled are called hard-real-time (HRT) threads because the number of instructions executed per second is exact and so an HRT thread is capable of deterministic performance that can satisfy hard timing requirements.”).
Herr, Sasikumar, Kiel, Beylin, and Fotland are all considered to be analogous to the claimed invention because they are all in the same field of task scheduling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Herr, Sasikumar, Kiel, and Beylin with the ability to support execution of a fixed deterministic schedule as shown in Fotland to arrive at the claimed invention. The motivation to modify Herr, Sasikumar, Kiel, and Beylin with the teachings of Fotland is that supporting a fixed deterministic schedule makes it possible to predict and satisfy timing requirements for various threads/workloads.
As per claim 5, Herr, Sasikumar, Kiel, Beylin, and Fotland teach the system of claim 1. Kiel also teaches further comprising a shared memory that is accessible by two or more compute engines of the plurality of compute engines (¶ [0005], “A conventional mechanism for ordering or synchronizing operations with data dependencies across two or more processors (homogenous, heterogeneous, physical, logical, virtual, etc.) is to use synchronization objects or primitives. Such objects allow one processor to communicate with one or more other processors when a workload (set of tasks or operations) has completed.”) and that stores one or more synchronization primitives corresponding to the one or more timing fences (¶ [0005], “A fence object is an example of such a synchronization primitive.” ¶ [0028], “When the application issues a command to monitor or wait on a fence, the interception layer applies it to the waiting fence.”). Herr also teaches a shared memory (¶ [0042], “In the example illustrated in FIG. 1, the storage 104 is memory including the executable 105. Additionally or alternatively, the executable 105 may be stored in the CPU 102, the FPGA 106, the VPU 108, and/or the GPU 110. In FIG. 1, the storage 104 is a shared storage between at least one of the CPU 102, the FPGA 106, the VPU 108, or the GPU 110.”).
As per claim 6, Herr, Sasikumar, Kiel, Beylin, and Fotland teach the system of claim 1. Herr teaches wherein the plurality of compute engines comprises two or more different types of compute engines (¶ [0043]-[0044], “In the illustrated example of FIG. 1, one or more of the FPGA 106, the VPU 108, and/or the GPU 110 are processing elements that may be utilized by a program executing on the heterogeneous system 100 for computing tasks, such as hardware acceleration.”).
As per claim 7, Herr, Sasikumar, Kiel, Beylin, and Fotland teach the system of claim 1. Herr teaches wherein one or more individual compute engines of the plurality of compute engines determine execution timing of one or more runnables on the one or more individual compute engines based at least on one or more individual sub-sets of commands of the set of commands that respectively correspond to the one or more individual compute engines (¶ [0101], “In the illustrated example of FIG. 5, the first variant binary 502 is a CPU variant binary and corresponds to a compilation of an algorithm according to a first schedule based on a first target configuration of the CPU 316 of FIG. 3. In FIG. 5, the second variant binary 504 is a GPU variant binary and corresponds to a compilation of the algorithm according to a second schedule based on a second target configuration of the GPU 322 of FIG. 3.”).
As per claim 8, Herr, Sasikumar, Kiel, Beylin, and Fotland teach the system of claim 1. Kiel also teaches wherein the system is comprised in at least one of:
a control system for an autonomous or semi-autonomous machine;
a perception system for an autonomous or semi-autonomous machine;
a system for performing simulation operations;
a system for performing digital twin operations;
a system for performing light transport simulation;
a system for performing collaborative content creation for 3D assets (¶ [0027], “In one or more embodiments, capturing graphics commands may be performed by using function bundles. Each function bundle may represent the tokenization or unitization of a function or method call to the 3D graphics API. Such tokenization includes an ID (e.g., a value) that indicates which function or method the command corresponds to, and the parameters used by the function. During capture mode, a function bundle is recorded each time a function or method is called by the application.”);
a system for performing deep learning operations;
a system for presenting at least one of augmented reality content, virtual reality content, or mixed reality content;
a system for hosting one or more real-time streaming applications;
a system implemented using an edge device;
a system implemented using a robot;
a system for performing conversational AI operations;
a system for generating synthetic data;
a system incorporating one or more virtual machines (VMs);
a system implemented at least partially in a data center; or
a system implemented at least partially using cloud computing resources.
As per claim 9, it is a method claim comprising similar limitations to claim 1, so it is rejected for similar reasons. Herr also teaches such that the plurality of compute engines operate together to execute a same execution instance of the computing application according to the execution schedule (¶ [0031], “The executable file includes a number of different executable sections, where each executable section is executable by a specific processing element, and the executable file is referred to as a fat binary. For example, if a developer is developing code to be used on a heterogeneous processing platform including a CPU, an FPGA, a GPU, and a VPU, an associated fat binary will include executable sections for the CPU, the FPGA, the GPU, and the VPU, respectively. In such examples, a runtime scheduler can utilize the fat binary to execute the algorithm on at least one of the CPU, the FPGA, the GPU, or the VPU depending on the physical characteristics of the heterogeneous system…”). Sasikumar teaches the execution schedule is generated using a compiling system based at least on execution constraints of the plurality of runnables as indicated in source code corresponding to the computing application (Col. 7, lines 9-59, “The metadata 206 for each software code block can include information identifying requirements and constraints for a computer executing the code block. For example…when the code block is to be executed (e.g., synchronous, asynchronous--as quickly as possible, asynchronous--anytime, scheduled, etc.), which device or type of device is preferred or required for execution of the code block, and the type of runtime which is to be used (e.g., AVM, JVM, and the like). Additionally, such requirements and constraints may describe preferred characteristics for a computer executing the code block, such as preferred memory, storage, and security specifications.”). Beylin teaches compute graphs (Col. 
3, lines 32-42, “A compiler is configured to determine a minimum set of points in the control flow graph where multithreaded execution synchronization points are inserted to synchronize any divergent threads for SIMD processing.” Col. 8, lines 9-26, “FIG. 3C is a flow diagram of method steps for inserting synchronization points in the control flow graph of FIG. 3A to produce the flow control graph of FIG. 3B, in accordance with one or more aspects of the present invention. In step 360 the compiler identifies a loop in a control flow graph, such as the control flow graph shown in FIG. 3A. In step 362 the compiler determines if the loop has multiple exits, and, if so, in step 364 the compiler creates a pre-tail block before the loop tail block to intercept all edges to the loop tail block.” See also Col. 9, lines 4-12.).
As per claim 13, it is a method claim comprising similar limitations to claim 5, so it is rejected for similar reasons.
As per claim 14, it is a method claim comprising similar limitations to claim 6, so it is rejected for similar reasons.
As per claim 15, it is a method claim comprising similar limitations to claim 7, so it is rejected for similar reasons.
As per claim 16, it is a system claim comprising similar limitations to claim 1, so it is rejected for similar reasons. Herr also teaches wherein the execution schedule coordinates joint execution of the computing application by a plurality of compute engines (¶ [0031], “The executable file includes a number of different executable sections, where each executable section is executable by a specific processing element, and the executable file is referred to as a fat binary. For example, if a developer is developing code to be used on a heterogeneous processing platform including a CPU, an FPGA, a GPU, and a VPU, an associated fat binary will include executable sections for the CPU, the FPGA, the GPU, and the VPU, respectively. In such examples, a runtime scheduler can utilize the fat binary to execute the algorithm on at least one of the CPU, the FPGA, the GPU, or the VPU depending on the physical characteristics of the heterogeneous system…”). Beylin teaches execution schedule generated based on one or more compute graphs corresponding to the computing application (Col. 3, lines 32-42, “A compiler is configured to determine a minimum set of points in the control flow graph where multithreaded execution synchronization points are inserted to synchronize any divergent threads for SIMD processing.” Col. 8, lines 9-26, “FIG. 3C is a flow diagram of method steps for inserting synchronization points in the control flow graph of FIG. 3A to produce the flow control graph of FIG. 3B, in accordance with one or more aspects of the present invention. In step 360 the compiler identifies a loop in a control flow graph, such as the control flow graph shown in FIG. 3A. In step 362 the compiler determines if the loop has multiple exits, and, if so, in step 364 the compiler creates a pre-tail block before the loop tail block to intercept all edges to the loop tail block.” See also Col. 9, lines 4-12.).
As per claim 19, it is a system claim comprising similar limitations to claim 5, so it is rejected for similar reasons.
As per claim 20, it is a system claim comprising similar limitations to claim 7, so it is rejected for similar reasons.
Claim(s) 2-3, 10-11, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Herr, Sasikumar, Kiel, Beylin, and Fotland as applied to claims 1, 9, and 16 above, and further in view of Lu et al. (NPL Document - "Robustness of modular multi-layered software in the automotive domain: a wrapping based approach" hereinafter Lu).
As per claim 2, Herr, Sasikumar, Kiel, Beylin, and Fotland teach the system of claim 1.
Herr, Sasikumar, Kiel, Beylin, and Fotland fail to teach monitoring the plurality of runnables based on the execution schedule.
However, Lu teaches further comprising a monitoring system to monitor execution of the plurality of runnables based at least on the execution schedule (Section 2 – “Problem statement and objectives”, “To achieve this, our robustness approach consists in developing a defense software as a set of wrappers checking multilevel properties at runtime.” Section 5.1 – “Fault assumptions – application level”, “Another control failure type is a ‘wrong sequence of execution’.”).
Herr, Sasikumar, Kiel, Beylin, Fotland, and Lu are all considered to be analogous to the claimed invention because they are all in the same field of task scheduling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Herr, Sasikumar, Kiel, Beylin, and Fotland with the runtime monitoring capabilities of Lu to arrive at the claimed invention. The motivation to modify Herr, Sasikumar, Kiel, Beylin, and Fotland with the teachings of Lu is that runtime monitoring can ensure correct execution and alert the administrator/user to any failures corresponding to execution of the application.
As per claim 3, Herr, Sasikumar, Kiel, Beylin, Fotland, and Lu teach the system of claim 2. Lu teaches wherein the monitoring system is to monitor one or more of:
satisfaction of one or more timing constraints associated with execution of the plurality of runnables, the execution schedule indicating the one or more timing constraints (Section 5.1 – “Fault assumptions – application level”, “The first one, in table 1, targets “triggering events” of execution. These events may be ruled by timing constraints, data reception, emission, or other mixed pre- or post-conditions.”);
satisfaction of an execution order associated with two or more of the plurality of runnables, the execution schedule indicating the execution order (Section 5.1 – “Fault assumptions – application level”, “Another control failure type is a ‘wrong sequence of execution’.”); or
health of a system task manager that is configured to manage execution of the plurality of runnables based on the execution schedule.
As per claim 10, it is a method claim comprising similar limitations to claim 2, so it is rejected for similar reasons.
As per claim 11, it is a method claim comprising similar limitations to claim 3, so it is rejected for similar reasons.
As per claim 17, it is a processor claim comprising similar limitations to claim 2, so it is rejected for similar reasons.
Claim(s) 4, 12, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Herr, Sasikumar, Kiel, Beylin, and Fotland as applied to claims 1, 9, and 16 above, and further in view of Piazza et al. (US Pub. No. 2011/0161964 A1 hereinafter Piazza).
As per claim 4, Herr, Sasikumar, Kiel, Beylin, and Fotland teach the system of claim 1. Herr teaches wherein the compiling system generates the execution schedule (¶ [0093], “In the example illustrated of FIG. 4, the variant generator 302 includes the compilation auto-scheduler 408 to generate a schedule associated with the algorithm for the selected processing element based on the cost model (e.g., the weight file) generated by the cost model learner 404. In some examples, the compilation auto-scheduler 408 implements second means for generating a schedule based on configurations for processing elements of the heterogeneous system 304.” ¶ [0096], “In the illustrated example of FIG. 4, the variant generator 302 includes the variant compiler 410 to compile the schedule generated by the compilation auto-scheduler 408.”).
Herr, Sasikumar, Kiel, Beylin, and Fotland fail to teach the execution schedule being generated based on a bubble scheduling process or a branch and bound scheduling process.
However, Piazza teaches wherein the compiling system generates the execution schedule based at least on one or more of a bubble scheduling process or a branch and bound scheduling process (¶ [0030], “Other examples of methods suitable for determining an optimal task schedule may include any of a number of deterministic methods (e.g., interval optimization and branch and bound methods)…”).
Herr, Sasikumar, Kiel, Beylin, Fotland, and Piazza are all considered to be analogous to the claimed invention because they are all in the same field of task scheduling. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Herr, Sasikumar, Kiel, Beylin, and Fotland with the scheduling method of Piazza to arrive at the claimed invention. The motivation to modify Herr, Sasikumar, Kiel, Beylin, and Fotland with the teachings of Piazza is that employing an optimal scheduling method such as branch and bound ensures efficient scheduling of tasks for optimal use of computation resources.
As per claim 12, it is a method claim comprising similar limitations to claim 4, so it is rejected for similar reasons.
As per claim 18, it is a processor claim comprising similar limitations to claim 4, so it is rejected for similar reasons.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Applicant has amended the claims with new limitations that change the scope of the claimed invention. The amended claims therefore necessitate the new grounds of rejection set forth above, which rely on the previously cited prior art together with an additional reference necessitated by amendment, for the reasons indicated above.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN ROBERT DAKITA EWALD whose telephone number is (703)756-1845. The examiner can normally be reached Monday-Friday: 9:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock, can be reached at (571)272-3759. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.D.E./Examiner, Art Unit 2199
/LEWIS A BULLOCK JR/Supervisory Patent Examiner, Art Unit 2199