DETAILED ACTION
Claims 1-20 are pending in this application.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over CN Pub. No. 202010573441 A1 to An et al. in view of WO 01/69823 A1 to Magill et al. and further in view of U.S. Pub. No. 2022/0291952 A1 to Milojicic et al. and further in view of U.S. Patent No. 5,428,781 to Duault et al.
As to claim 1, An teaches a processor comprising:
a global scheduler comprising circuitry configured to distribute work items for execution (Global Task Scheduling Unit 101);
at least one local scheduler comprising circuitry configured to distribute work items to one or more processors for execution (Local Task Scheduling Unit 102); and
at least a first mailbox accessible by the global scheduler (Global Memory).
An is silent with reference to wherein the at least one local scheduler writes a first set of messages to the first mailbox to initialize a point-to-point communication with the global scheduler and
at least a first mailbox, comprising a dedicated memory location in a global cache, accessible by the global scheduler and the at least one local scheduler that is at least readable by the global scheduler and at least writable by the at least one local scheduler independent of a main-memory subsystem.
Magill teaches wherein the at least one local scheduler (Local Scheduler 308) writes a first set of messages (data packets) to the first mailbox (Data Packet Queues 302, 304 and 306) to initialize a point-to-point communication with the global scheduler (“…Figure 3 shows a simplified representation of an input port 300 and how data packets of the input port can be drawn off of three different data packet queues 302, 304 and 306 (the output port being the only attribute of each ensemble in this example) according to a local scheduler 308 and input to the switch fabric 210 (in Figure 2). As shown in Figure 3, input data packets come into the port 300 in possibly any order. As shown, a "C" packet arrived in the port 300 ahead of an "A" packet. The designation of "A" "B" or "C" identifies the output port to which the particular packet is supposed to be routed by the switch (which corresponds to which ensemble each packet belongs to in this example). Conceptually, the "A" packets that are in the port are grouped together to form a queue of "A" packets. Similarly, the "B" and "C" packets are also grouped together. The number of internal packets that are read from any of the queues in a frame is determined by the global scheduler 310, but when that determination is made, the local scheduler reads A, B and C packets from the respective queues and sends them into the switch fabric 210. As shown, the output data packet sequence is "ACAB_C"(note that for one slot no internal packet is sent). As set forth above, if too many internal packets for any given output (A, B or C) are sent in to the switch fabric too rapidly, a buffer in the switch fabric could be overrun, causing data to be lost. By properly ordering the sequence according to which queued internal data packets are sent to the switch fabric, buffer overruns can be reduced. The internal packet ordering is better understood by reference to Figure 4…” ).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of An with the teaching of Magill because the teaching of Magill would improve the system of An by providing a technique for exchanging messages/packets between a local and a global scheduler asynchronously.
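For illustration only (no code appears in the cited references; all class, method and message names below are invented), the local-to-global mailbox handshake discussed above might be sketched as:

```python
from collections import deque

class Mailbox:
    """A simple message buffer shared by two schedulers (illustrative only)."""
    def __init__(self):
        self._messages = deque()

    def write(self, message):
        self._messages.append(message)

    def read(self):
        return self._messages.popleft() if self._messages else None

class LocalScheduler:
    def __init__(self, ident, mailbox):
        self.ident = ident
        self.mailbox = mailbox

    def initialize_channel(self):
        # The first set of messages establishes point-to-point communication.
        self.mailbox.write({"type": "INIT", "from": self.ident})

class GlobalScheduler:
    def __init__(self, mailbox):
        self.mailbox = mailbox
        self.channels = set()

    def poll(self):
        msg = self.mailbox.read()
        if msg and msg["type"] == "INIT":
            self.channels.add(msg["from"])  # channel now established
        return msg
```

The exchange is asynchronous in the sense that the local scheduler's write and the global scheduler's read occur independently through the shared mailbox.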
Milojicic teaches at least a first mailbox comprising a dedicated memory location in a global cache (cached) accessible by the global scheduler (Global Scheduler 120/220/Regional Scheduler 130/230) and the at least one local scheduler (Local Scheduler 140/240) (“…In some examples, the FaaS request may traverse a plurality of schedulers, including global scheduler 220, regional scheduler 230, and local scheduler 240, as illustrated in block 4 of FIG. 3. FIG. 3 may also include an illustration of the first request, requested optimal resource, and reserve resource. In some examples, information associated with each resource infrastructure pool 250 may be cached at any of these four levels (e.g., global dispatcher 210, global scheduler 220, regional scheduler 230, local scheduler 240)…As illustrated in FIG. 7, functions can be cached on the resource node where they are targeted to execute, or they can be cached in the memory of another resource node or in global memory. The system may manage the caches by rearranging and moving the least beneficial functions stored in the cache to different caches accessible by the system…The decision of where to cache a function can be made through local scheduler 140 or regional scheduler 130. In some examples, the function can be cached close to where it has been executed. For example, if the function cannot be cached close to where it is executed, then it may be redeployed from the source (e.g., one place where it is deployed first or would even have to be redeployed from the customer after long time has expired, etc.)…” paragraphs 0110/0111).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of An and Magill with the teaching of Milojicic because the teaching of Milojicic would improve the system of An and Magill by providing a temporary storage place for data, allowing faster access in the future.
Duault teaches a dedicated memory location (Shared Memory 1/ Queue-1/Queue-2/Queue-3), accessible by the global scheduler (remote scheduler) and the at least one local scheduler (local scheduler) that is at least readable by the global scheduler and at least writable by the at least one local scheduler independent of a main-memory subsystem (“…In a loosely coupled multiprocessor environment wherein a plurality of processors (2) are attached to a shared intelligent memory (1), a distributed scheduling mechanism for scheduling of source processors (4) with respective server processes (5) to be executed by the processors (2) upon their attachment to a data message queue (3) contained in the shared intelligent memory (1), the processes (4, 5) using data messages enqueued into, respectively dequeued from said memory (1). According to this scheduling mechanism, an independent scheduler (6) is dedicated to each of the processes of a process group, and all the schedulers monitor the status of the data message queue, and upon receipt of an empty-to-non-empty E-NE signal, the least busy scheduler dequeues shared data from the queue, so that it can be processed by its associated process, without however, loosing fault-tolerance in case of a particular processor failing…When the local scheduler (i.e the scheduler associated to the working server process) dequeues the only message of the queue, the scheduler updates its state to `Empty` at time t+T4i+T5i…c. When a remote scheduler (i.e a scheduler associated to a process different from the one performing a scheduling operation) dequeues the only message if the queue is `Non.sub.-- Empty`, the scheduler updates its state to `Non.sub.-- Empty` at time t+T1i+T2i. If a remote scheduler has performed a dequeue operation, the local scheduler is not aware that the queue is empty, but it performs a dequeue at time t+T3i and updates its state when it receives the dequeue status at time t+T3i+T4i+T5i…” Abstract, Col. 9 Ln. 66-67, Col. 10 Ln. 1-12).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of An, Magill and Milojicic with the teaching of Duault because the teaching of Duault would improve the system of An, Magill and Milojicic by providing a scheduling mechanism in which an independent scheduler is dedicated to each process of a process group.
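Duault's shared-queue scheduling, in which every scheduler monitors the queue and the least busy one dequeues upon an empty-to-non-empty (E-NE) signal, can be sketched as follows; the classes and the busyness metric are hypothetical, not drawn from the reference:

```python
from collections import deque

class SharedQueue:
    """Shared intelligent memory holding a data message queue (sketch)."""
    def __init__(self):
        self._q = deque()
        self.watchers = []   # all schedulers monitoring this queue

    def enqueue(self, item):
        was_empty = not self._q
        self._q.append(item)
        if was_empty:        # empty-to-non-empty (E-NE) transition
            self._notify()

    def _notify(self):
        # The least busy watching scheduler wins the dequeue.
        winner = min(self.watchers, key=lambda s: s.busy)
        winner.process(self._q.popleft())

class Scheduler:
    def __init__(self, busy):
        self.busy = busy          # invented busyness metric
        self.processed = []

    def process(self, item):
        self.processed.append(item)
```

Because each process group member has its own scheduler, the loss of one processor leaves the remaining schedulers able to drain the queue.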
As to claim 2, An teaches the processor as recited in claim 1, further comprising:
a second mailbox (Shared Memory) accessible by the at least one local scheduler (“…the local dispatcher can select the sub-task with the highest priority level in the task buffer area; and transmitting the sub-task with the highest processing priority to the execution core of the persistent operation, so that the execution core executes the task…”); wherein the global scheduler writes a second set of messages to the second mailbox (shared memory) in response to initialization of the point-to-point communication (“…after the global scheduler searches the target sub-task, determining the processor with the least current task in each flow processor, and sending the target sub-task to the task buffer of the shared memory of the flow processor with the least task amount…”).
Duault teaches the second mailbox (Queue-1/Queue-2/Queue-3) being dedicated to the at least one local scheduler, readable by the at least one local scheduler and writable by the global scheduler (Figure 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of An, Magill and Milojicic with the teaching of Duault because the teaching of Duault would improve the system of An, Magill and Milojicic by providing a scheduling mechanism in which an independent scheduler is dedicated to each process of a process group.
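The two-mailbox arrangement of claim 2 amounts to a duplex channel: the local scheduler writes requests to the first mailbox, and the global scheduler writes responses to the second in response to initialization. A minimal sketch (all names invented):

```python
from collections import deque

class DuplexChannel:
    """Paired mailboxes: local-to-global requests, global-to-local replies."""
    def __init__(self):
        self.first_mailbox = deque()    # written by the local scheduler
        self.second_mailbox = deque()   # written by the global scheduler

def local_init(chan):
    # Local scheduler initiates point-to-point communication.
    chan.first_mailbox.append("INIT")

def global_respond(chan):
    # Global scheduler answers the initialization via the second mailbox.
    if chan.first_mailbox and chan.first_mailbox[0] == "INIT":
        chan.first_mailbox.popleft()
        chan.second_mailbox.append("ACK")
```

Dedicating the second mailbox to one local scheduler keeps replies from being consumed by any other scheduler.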
As to claim 19, An teaches a computing system comprising:
a central processing circuit (central processor CPU);
a memory controller (Storage System); and
a graphics processing circuit (GPU) comprising:
a global scheduler comprising circuitry configured to distribute work items for execution (Global Task Scheduling Unit 101);
at least one local scheduler comprising circuitry configured to distribute work items to one or more processors for execution (Local Task Scheduling Unit 102); and
at least a first mailbox accessible by the global scheduler (Global Memory).
An is silent with reference to the remaining limitations of claim 19, which are substantially the same as the limitations of claim 1 addressed above. These limitations are taught by Magill, Milojicic and Duault, and the claim is rejected, for the same reasons set forth in the rejection of claim 1 above.
As to claim 20, see the rejection of claim 2 above.
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over CN Pub. No. 202010573441 A1 to An et al. in view of WO 01/69823 A1 to Magill et al. and further in view of U.S. Pub. No. 2022/0291952 A1 to Milojicic et al. and further in view of U.S. Patent No. 5,428,781 to Duault et al. as applied to claim 2 above, and further in view of CN Pub. No. 111427680 A to Vembu et al.
As to claim 3, An as modified by Magill, Milojicic and Duault teaches the processor as recited in claim 2, however the combination is silent with reference to wherein the first mailbox comprises a command queue configured to store the first set of messages received from the at least one local scheduler.
Vembu teaches wherein the first mailbox comprises a command queue (batch command buffer) configured to store the first set of messages received from the at least one local scheduler (“…FIG. 16B shows system graphical interface according to embodiment 1602. system graphical interface 1602 comprises interrupt unit 1612, an equipment interface 1614, a doorbell 1603, a system/device address translator 1616 and submit 1618 the batch buffer. message signal interrupt unit 1612 be configured as remote or main interrupt unit, and may send a value in the interrupt register unit 1612 in the storing of the generated interrupt (MSI). the device interface 1614 can include hardware to enable graphics system as a whole or as a separate fragment is presented as the interface bus (such as, but not limited to PCIe bus). the doorbell 1603 is a plurality of doorbell interface can submit work load 1604 through it in one, wherein the working load 1604 may be the work load 1604A-1604D of FIG. 16A in any one of them. the doorbell 1603 may be a doorbell structure or register, which may be used to the engine block fragment notice can be used for processing the work request is associated. In one embodiment, in a batch command buffer (e.g., batch buffer) provided in the form of a work request. can be submitted 1618 for processing batch buffer via a batch buffer. In one embodiment, the batch buffer 1618 can use a system/device address converter 1616 from the system address into a device local address for engine block segment. then batch buffer commands can be submitted to the engine block fragment is associated…FIG. 16C shows the engine block 1605 may receive workload from the application or driving program via a system graphical interface. engine block segment 1605 comprising a plurality of engine can process the command received from the host system. one or more block engine may be executing unit 1629A-1629N to execute basic instructions of various operations and perform these commands. engine block slice 1605 further comprises scheduler 1621, the local scheduler of the scheduler 1621 is for the engine block 1605, the scheduling instructions by the command and/or dispatch of the fragmentation processing for executing on the execution unit 1629A-1629N…”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of An, Magill, Milojicic and Duault with the teaching of Vembu because the teaching of Vembu would improve the system of An, Magill, Milojicic and Duault by providing a technique for processing tasks in a batch fashion and thus conserving computing resources.
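Vembu's doorbell/batch-buffer submission path, as read onto the claimed command queue, might be modeled as follows (illustrative only; the names are not Vembu's):

```python
from collections import deque

class CommandQueue:
    """First mailbox realized as a command queue of scheduler messages."""
    def __init__(self):
        self._commands = deque()
        self.doorbell = False   # signals the consumer that work arrived

    def submit_batch(self, commands):
        # Work requests arrive as a batch and ring the doorbell.
        self._commands.extend(commands)
        self.doorbell = True

    def drain(self):
        # Consumer processes the whole batch in arrival order.
        executed = list(self._commands)
        self._commands.clear()
        self.doorbell = False
        return executed
```

Batching amortizes the per-message notification cost, which is the resource-conservation rationale stated above.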
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over CN Pub. No. 202010573441 A1 to An et al. in view of WO 01/69823 A1 to Magill et al. and further in view of U.S. Pub. No. 2022/0291952 A1 to Milojicic et al. and further in view of CN Pub. No. 111427680 A to Vembu et al. and further in view of U.S. Patent No. 5,428,781 to Duault et al. as applied to claim 3 above, and further in view of U.S. Pub. No. 2013/0155080 A1 to Nordlund et al.
As to claim 4, An as modified by Magill, Milojicic, Vembu and Duault teaches the processor as recited in claim 3, however the combination is silent with reference to wherein the command queue is configured to store a predetermined number of messages in a first-in-first-out mode.
Nordlund teaches wherein the command queue is configured to store a predetermined number of messages in a first-in-first-out mode (first in first out (FIFO) registers) (“… According to aspects of this disclosure, command processor 56 may initially parse the received command stream and identify each task that is to be performed by GPU 48. In addition to parsing the tasks from the command stream, command processor 56 may maintain a command queue for organizing each of the tasks to be executed by the components of GPU 48. For example, command processor 56 may schedule tasks to be executed by the components of GPU 48 (such as shader processor 52 and/or fixed function units 54) using the command queue. In some examples, the command queues may be fixed function hardware units (e.g., first in first out (FIFO) registers, or the like). In other examples, the command queues may be general memory or register units…” paragraph 0064).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of An, Magill, Milojicic, Vembu and Duault with the teaching of Nordlund because the teaching of Nordlund would improve the system of An, Magill, Milojicic, Vembu and Duault by providing a technique for processing tasks or messages in a preferred order.
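The bounded first-in-first-out behavior recited in claim 4 can be sketched as follows (hypothetical names; not code from any cited reference):

```python
from collections import deque

class BoundedFifo:
    """Command queue holding a predetermined number of messages, FIFO order."""
    def __init__(self, capacity):
        self._q = deque()
        self.capacity = capacity   # the predetermined number of messages

    def push(self, msg):
        if len(self._q) >= self.capacity:
            return False           # queue full: caller must wait or retry
        self._q.append(msg)
        return True

    def pop(self):
        # Oldest message leaves first (first-in-first-out).
        return self._q.popleft() if self._q else None
```

A fixed capacity makes the queue realizable as the fixed-function FIFO registers Nordlund describes.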
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over CN Pub. No. 202010573441 A1 to An et al. in view of WO 01/69823 A1 to Magill et al. and further in view of U.S. Pub. No. 2022/0291952 A1 to Milojicic et al. and further in view of U.S. Patent No. 5,428,781 to Duault et al. as applied to claim 2 above, and further in view of U.S. Pub. No. 2014/0240327 A1 to Lustig et al.
As to claim 5, An as modified by Magill, Milojicic and Duault teaches the processor as recited in claim 2, however it is silent with reference to wherein a message of the first set of messages comprises an indication that the second mailbox is empty.
Lustig teaches wherein a message of the first set of messages comprises an indication that the second mailbox is empty (full/empty bit) (“…According to one embodiment, the data-based fine synchronization is accomplished using a full/empty bit associated with each unit of memory in the GPU. When one or more write operations are performed on a unit of memory in the GPU, the full/empty bit associated with that unit of memory is set. When one or more read operations are performed on a unit of memory in the GPU, the full/empty bit associated with that unit of memory is cleared. Accordingly, data-based fine synchronization may be performed between the CPU and GPU at any desired resolution, thereby allowing the heterogeneous computing system to realize performance enhancements and reducing the overhead associated with offloading a process from the CPU to the GPU…According to one exemplary embodiment, the CPU 38 and the GPU 40 are in a consumer-producer relationship. For example, if the GPU 40 wishes to read data provided by the CPU 38, the GPU 40 will issue a read request with a trigger condition specifying that it will not read the requested memory until the F/E bit associated with the requested memory is marked full. Until the CPU 38 sends the data, the F/E bit associated with the requested memory is set to empty, and the GPU 40 will block the request. When the CPU 38 writes the data to the requested memory location, the F/E bit associated with the requested memory is filled, and the GPU 40 executes the read request safely. For coalesced requests, the responses are returned when all the relevant F/E bits indicate readiness…” paragraphs 0018/0034).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of An, Magill, Milojicic and Duault with the teaching of Lustig because the teaching of Lustig would improve the system of An, Magill, Milojicic and Duault by providing a boolean or bit data structure to control access to a shared memory.
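Lustig's full/empty-bit synchronization, as applied to signaling that a mailbox is empty, might be sketched as follows (illustrative names only):

```python
class FEWord:
    """A memory word guarded by a full/empty (F/E) bit (sketch of Lustig's scheme)."""
    def __init__(self):
        self.full = False    # F/E bit: empty until written
        self._value = None

    def write(self, value):
        self._value = value
        self.full = True     # a write operation sets the bit

    def read(self):
        if not self.full:
            return None      # reader observes 'empty' and must wait
        self.full = False    # a read operation clears the bit
        return self._value
```

The cleared bit after a read is exactly the "second mailbox is empty" indication the claim recites.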
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over CN Pub. No. 202010573441 A1 to An et al. in view of WO 01/69823 A1 to Magill et al. and further in view of U.S. Pub. No. 2022/0291952 A1 to Milojicic et al. and further in view of U.S. Patent No. 5,428,781 to Duault et al. as applied to claim 1 above, and further in view of U.S. Pub. No. 2020/0020156 A1 to Howson et al.
As to claim 6, An as modified by Magill, Milojicic and Duault teaches the processor as recited in claim 1, however it is silent with reference to wherein the local scheduler activates a blocking send to the global processor when a predetermined number of messages in the first mailbox is reached.
Howson teaches wherein the local scheduler activates a blocking send to the global processor when a predetermined number of messages in the first mailbox is reached (a buffer threshold is not met) (“…The scheduler 521 is configured to control the reading from and writing to the buffer 522 to ensure that the buffer does not overflow whilst also attempting to minimise the amount of time that the buffer is empty. This allows the tessellation module 500 to maximise the amount of time that the first and second tessellation stages 510 and 530 are operating to optimise throughput. In particular, the scheduler 521 monitors the number of entries currently in the buffer. If the buffer is not full (e.g. a buffer threshold is not met), the scheduler 521 sends a signal to the geometry source 300 to emit another patch of data for processing by the first tessellation stage 510. Moreover, the scheduler 521 is configured to control the tessellation instance distributor 523 by sending a control signal to send data for a tessellation instance to a tessellation pipeline in the second tessellation stage 530. The scheduler 521 controls the tessellation instance distributor 523 based on the availability of tessellation pipelines received as status information from the second tessellation stage 530…” paragraph 0060).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of An, Magill, Milojicic and Duault with the teaching of Howson because the teaching of Howson would improve the system of An, Magill, Milojicic and Duault by providing a threshold-based mechanism for controlling how messages or tasks are written to and read from a buffer, so that the buffer neither overflows nor sits empty.
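The threshold-triggered blocking send recited in claim 6 can be sketched as follows; the classes are invented for illustration and appear in no cited reference:

```python
from collections import deque

class ThresholdMailbox:
    """Mailbox that refuses sends once a predetermined message count is reached."""
    def __init__(self, threshold):
        self.threshold = threshold
        self._q = deque()

    def try_send(self, msg):
        # Returns True if accepted; False means the sender must block.
        if len(self._q) >= self.threshold:
            return False
        self._q.append(msg)
        return True

    def receive(self):
        return self._q.popleft() if self._q else None

class LocalScheduler:
    def __init__(self, mailbox):
        self.mailbox = mailbox
        self.blocked = False

    def send(self, msg):
        # The send blocks when the mailbox threshold has been reached.
        self.blocked = not self.mailbox.try_send(msg)
```

Draining the mailbox below the threshold lets a subsequent send succeed, mirroring Howson's buffer-threshold control.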
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over CN Pub. No. 202010573441 A1 to An et al. in view of WO 01/69823 A1 to Magill et al. and further in view of U.S. Pub. No. 2022/0291952 A1 to Milojicic et al. and further in view of U.S. Patent No. 5,428,781 to Duault et al. as applied to claim 1 above, and further in view of U.S. Pub. No. 2012/0020368 A1 to Sundararaman et al.
As to claim 7, An as modified by Magill, Milojicic and Duault teaches the processor as recited in claim 1, however it is silent with reference to wherein the local scheduler pauses execution of an instruction until an acknowledgment of successful transmission and storage of a message to the first mailbox is received.
Sundararaman teaches wherein the local scheduler pauses execution of an instruction until an acknowledgment of successful transmission and storage of a message to the first mailbox is received (Step 1014) (“…At step 1008, if the child scheduler has been scheduled, but a response has not yet been sent to root scheduler 402, then the child scheduler is pending, and SDWRR algorithm 1000 proceeds to step 1014. At step 1014, root scheduler 402 waits for an acknowledgment signal from the child scheduler before processing the corresponding child scheduler.…” paragraph 0085).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of An, Magill, Milojicic and Duault with the teaching of Sundararaman because the teaching of Sundararaman would improve the system of An, Magill, Milojicic and Duault by providing an acknowledgment signal to indicate the success or failure of a processing operation.
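The pause-until-acknowledgment behavior recited in claim 7, of the kind evidenced by Sundararaman's step 1014, might be sketched as follows (hypothetical names):

```python
class BlockingSender:
    """Pauses at a send until the mailbox acknowledges storage (illustrative)."""
    def __init__(self):
        self.pending = None
        self.log = []

    def send(self, msg):
        # The instruction stream stalls here until on_ack() is called.
        self.pending = msg
        self.log.append("waiting")

    def on_ack(self):
        # Acknowledgment of successful transmission and storage: resume.
        self.pending = None
        self.log.append("resumed")
```

The `pending` field stands in for the stalled instruction; real hardware would gate issue on the acknowledgment signal instead.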
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over CN Pub. No. 202010573441 A1 to An et al. in view of WO 01/69823 A1 to Magill et al. and further in view of U.S. Pub. No. 2022/0291952 A1 to Milojicic et al. and further in view of U.S. Patent No. 5,428,781 to Duault et al. as applied to claim 1 above, and further in view of U.S. Pub. No. 2017/0185435 A1 to Dewan et al.
As to claim 8, An as modified by Magill, Milojicic and Duault teaches the processor as recited in claim 1, however it is silent with reference to wherein the global scheduler checks for new messages in the first mailbox until a predetermined timeout period is reached.
Dewan teaches wherein the global scheduler checks for new messages in the first mailbox (work queue/a shared memory) until a predetermined timeout period is reached (“…In various embodiments, when an interrupt is received from a device having work associated with a secure virtual machine, a work queue is created in a shared memory. The creation of the work queue may require an exit to a virtual-machine manager, but subsequent interrupts from the device are submitted to the work queue and may not require exiting to the virtual-machine manager, thus eliminating or reducing processing overhead involved with exiting to the virtual-machine manager on each subsequent interrupt. A task-priority register may be updated to filter interrupts from the device such that interrupts from the device to the secure virtual machine are passed while interrupts from the device or other devices to other virtual machines are blocked. In some embodiments, the task-priority register remains at the updated priority until the work queue is empty; in other embodiments, a timer is configured to start when the first work request is received and to expire after a number of cycles have elapsed. The task-priority register may, upon expiration of the timer, restore the previous priority. While the work queue is not empty and while the timer is not expired, however, the secure virtual machine polls the work queue for additional pending work…Once the secure virtual machine is launched, it polls (610) the work queue 414 in the shared memory buffer for work to be executed by the secure virtual machine until the work queue 414 is empty or until the timer 408 expires…” paragraphs 0009/0027/0050).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of An, Magill, Milojicic and Duault with the teaching of Dewan because the teaching of Dewan would improve the system of An, Magill, Milojicic and Duault by providing a technique for controlling access to a shared memory and thus optimally managing computing resources.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over CN Pub. No. 202010573441 A1 to An et al. in view of WO 0169823 A1 to Magill et al. and further in view of U.S. Pub. No. 2022/0291952 A1 to Milojicic et al. and further in view of U.S. Pat. No. 5,428,781 issued to Duault et al. as applied to claim 1 above, and further in view of U.S. Pub. No. 2006/0179436 A1 to Yasue.
As to claim 9, An as modified by Magill, Milojicic and Duault teaches the processor as recited in claim 1; however, it is silent with reference to wherein the point-to-point communication is initialized independent of a main memory subsystem associated with the processor.
Yasue teaches wherein the point-to-point communication is initialized independent of a main memory subsystem associated with the processor (“…Each processor includes a local memory within which to execute the processing tasks without resort to the main memory. In response to the application programming interface code(s), a change from the current processing task to the subsequent processing task is invoked within a given processor while maintaining the output data unit from the current processing task within the local memory of the given processor…” paragraph 0014).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of An, Magill, Milojicic and Duault with the teaching of Yasue because the teaching of Yasue would improve the system of An, Magill, Milojicic and Duault by providing a technique for processing tasks without the use of a main memory, thereby unburdening the main memory.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over CN Pub. No. 112114951 A to Chen et al. in view of U.S. Pub. No. 2022/0291952 A1 to Milojicic et al. and further in view of U.S. Pat. No. 5,428,781 issued to Duault et al.
As to claim 10, Chen teaches a method comprising:
receiving, at a mailbox associated with a global scheduler (global scheduler), a first message (task request) from a local scheduler (local scheduler) coupled to one or more processors, wherein the global scheduler comprises circuitry configured to distribute work items for execution (“…a global scheduler, the global scheduler is located at the main node of the distributed cluster, and storing and identifying the sub-node with enough resource of the task request type in the distributed cluster, receiving the scheduling task request forwarded by the local scheduler from each sub-node; making the scheduling decision according to the load of each sub-node and the constraint of the task…”); and the local scheduler comprises circuitry configured to send the work items to the one or more processors for execution (Local Scheduler 1) (“…the global scheduler after receiving the task request of the local scheduler, the task is allocated to the optimal node capable of executing the task, in FIG. 1, the global scheduler 1 after receiving the request of the task 2 of the local scheduler 1, the task 2 is allocated to the local scheduler 2; the local scheduler 2 according to the requirement of the task 2, the global control storage unit obtains all parameters of the execution task 2, locally generating a task 3 to execute the task request…”);
retrieving, responsive to the first message (task request) by the global scheduler, one or more work items from a global storage (global control storage unit) (“…the global scheduler after receiving the task request of the local scheduler, the task is allocated to the optimal node capable of executing the task, in FIG. 1, the global scheduler 1 after receiving the request of the task 2 of the local scheduler 1, the task 2 is allocated to the local scheduler 2; the local scheduler 2 according to the requirement of the task 2, the global control storage unit obtains all parameters of the execution task 2, locally generating a task 3 to execute the task request…”); and
exporting, by the global scheduler, the one or more work items for execution by the one or more processors coupled to the local scheduler (“…Therefore, in the present invention, the task is firstly scheduled in the local scheduler, only the local cannot meet the need of the task to the global scheduler, the global scheduler for scheduling, the local scheduler firstly tries to perform task scheduling locally; Therefore, the scheduling method is the scheduling method from bottom to top…”).
Chen is silent with reference to receiving, at a first mailbox comprising a dedicated memory location in a global cache, accessible by a global scheduler and a local scheduler.
Milojicic teaches receiving, at a first mailbox comprising a dedicated memory location in a global cache (cached) accessible by a global scheduler (Global Scheduler 120/220) and a local scheduler (Local Scheduler 140/240) (“…In some examples, the FaaS request may traverse a plurality of schedulers, including global scheduler 220, regional scheduler 230, and local scheduler 240, as illustrated in block 4 of FIG. 3. FIG. 3 may also include an illustration of the first request, requested optimal resource, and reserve resource. In some examples, information associated with each resource infrastructure pool 250 may be cached at any of these four levels (e.g., global dispatcher 210, global scheduler 220, regional scheduler 230, local scheduler 240)…As illustrated in FIG. 7, functions can be cached on the resource node where they are targeted to execute, or they can be cached in the memory of another resource node or in global memory. The system may manage the caches by rearranging and moving the least beneficial functions stored in the cache to different caches accessible by the system…The decision of where to cache a function can be made through local scheduler 140 or regional scheduler 130. In some examples, the function can be cached close to where it has been executed. For example, if the function cannot be cached close to where it is executed, then it may be redeployed from the source (e.g., one place where it is deployed first or would even have to be redeployed from the customer after long time has expired, etc.)…” paragraphs 0110/0111).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Chen with the teaching of Milojicic because the teaching of Milojicic would improve the system of Chen by providing a temporary storage place for data, allowing for faster access in the future.
Duault teaches a dedicated memory location (Shared Memory 1/ Queue-1/Queue-2/Queue-3), accessible by the global scheduler (remote scheduler) and the at least one local scheduler (local scheduler) that is at least readable by the global scheduler and at least writable by the at least one local scheduler independent of a main-memory subsystem (“…In a loosely coupled multiprocessor environment wherein a plurality of processors (2) are attached to a shared intelligent memory (1), a distributed scheduling mechanism for scheduling of source processors (4) with respective server processes (5) to be executed by the processors (2) upon their attachment to a data message queue (3) contained in the shared intelligent memory (1), the processes (4, 5) using data messages enqueued into, respectively dequeued from said memory (1). According to this scheduling mechanism, an independent scheduler (6) is dedicated to each of the processes of a process group, and all the schedulers monitor the status of the data message queue, and upon receipt of an empty-to-non-empty E-NE signal, the least busy scheduler dequeues shared data from the queue, so that it can be processed by its associated process, without however, loosing fault-tolerance in case of a particular processor failing…When the local scheduler (i.e the scheduler associated to the working server process) dequeues the only message of the queue, the scheduler updates its state to `Empty` at time t+T4i+T5i…c. When a remote scheduler (i.e a scheduler associated to a process different from the one performing a scheduling operation) dequeues the only message if the queue is `Non.sub.-- Empty`, the scheduler updates its state to `Non.sub.-- Empty` at time t+T1i+T2i. If a remote scheduler has performed a dequeue operation, the local scheduler is not aware that the queue is empty, but it performs a dequeue at time t+T3i and updates its state when it receives the dequeue status at time t+T3i+T4i+T5i…” Abstract, Col. 9 Ln. 66-67, Col. 10 Ln. 1-12).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Chen and Milojicic with the teaching of Duault because the teaching of Duault would improve the system of Chen and Milojicic by providing a scheduling mechanism in which an independent scheduler is dedicated to each of the processes of a process group.
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over CN Pub. No. 112114951 A to Chen et al. in view of U.S. Pub. No. 2022/0291952 A1 to Milojicic et al. and further in view of U.S. Pat. No. 5,428,781 issued to Duault et al. as applied to claim 10 above, and further in view of CN Pub. No. 202010573441 A1 to An et al.
As to claim 11, Chen as modified by Milojicic and Duault teaches the method as recited in claim 10; however, it is silent with reference to writing, by the global scheduler, a second message to a second mailbox associated with the local scheduler in response to receiving the first message, thereby initiating a point-to-point communication with the local scheduler.
An teaches writing, by the global scheduler, a second message to a second mailbox (Shared Memory) associated with the local scheduler in response to receiving the first message, thereby initiating a point-to-point communication with the local scheduler (“…after the global scheduler searches the target sub-task, determining the processor with the least current task in each flow processor, and sending the target sub-task to the task buffer of the shared memory of the flow processor with the least task amount…”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Chen, Milojicic and Duault with the teaching of An because the teaching of An would improve the system of Chen, Milojicic and Duault by providing a shared memory that allows multiple processes to access the same memory space, making it easy to share data.
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over CN Pub. No. 112114951 A to Chen et al. in view of U.S. Pub. No. 2022/0291952 A1 to Milojicic et al. and further in view of U.S. Pat. No. 5,428,781 issued to Duault et al. as applied to claim 10 above, and further in view of U.S. Pub. No. 2006/0179436 A1 to Yasue.
As to claim 12, Chen as modified by Milojicic and Duault teaches the method as recited in claim 11; however, it is silent with reference to wherein the point-to-point communication is initialized independent of a main memory subsystem associated with the processor.
Yasue teaches wherein the point-to-point communication is initialized independent of a main memory subsystem associated with the processor (“…Each processor includes a local memory within which to execute the processing tasks without resort to the main memory. In response to the application programming interface code(s), a change from the current processing task to the subsequent processing task is invoked within a given processor while maintaining the output data unit from the current processing task within the local memory of the given processor…” paragraph 0014).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Chen, Milojicic and Duault with the teaching of Yasue because the teaching of Yasue would improve the system of Chen, Milojicic and Duault by providing a technique for processing tasks without the use of a main memory, thereby unburdening the main memory.
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over CN Pub. No. 112114951 A to Chen et al. in view of U.S. Pub. No. 2022/0291952 A1 to Milojicic et al. and further in view of U.S. Pat. No. 5,428,781 issued to Duault et al. as applied to claim 10 above, and further in view of CN Pub. No. 111427680 A to Vembu et al.
As to claim 13, Chen as modified by Milojicic and Duault teaches the method as recited in claim 10; however, it is silent with reference to wherein the mailbox associated with the global scheduler comprises a command queue configured to store the first message.
Vembu teaches wherein the mailbox associated with the global scheduler comprises a command queue (batch command buffer) configured to store the first message (“…FIG. 16B shows system graphical interface according to embodiment 1602. system graphical interface 1602 comprises interrupt unit 1612, an equipment interface 1614, a doorbell 1603, a system/device address translator 1616 and submit 1618 the batch buffer. message signal interrupt unit 1612 be configured as remote or main interrupt unit, and may send a value in the interrupt register unit 1612 in the storing of the generated interrupt (MSI). the device interface 1614 can include hardware to enable graphics system as a whole or as a separate fragment is presented as the interface bus (such as, but not limited to PCIe bus). the doorbell 1603 is a plurality of doorbell interface can submit work load 1604 through it in one, wherein the working load 1604 may be the work load 1604A-1604D of FIG. 16A in any one of them. the doorbell 1603 may be a doorbell structure or register, which may be used to the engine block fragment notice can be used for processing the work request is associated. In one embodiment, in a batch command buffer (e.g., batch buffer) provided in the form of a work request. can be submitted 1618 for processing batch buffer via a batch buffer. In one embodiment, the batch buffer 1618 can use a system/device address converter 1616 from the system address into a device local address for engine block segment. then batch buffer commands can be submitted to the engine block fragment is associated…FIG. 16C shows the engine block 1605 may receive workload from the application or driving program via a system graphical interface. engine block segment 1605 comprising a plurality of engine can process the command received from the host system. one or more block engine may be executing unit 1629A-1629N to execute basic instructions of various operations and perform these commands. engine block slice 1605 further comprises scheduler 1621, the local scheduler of the scheduler 1621 is for the engine block 1605, the scheduling instructions by the command and/or dispatch of the fragmentation processing for executing on the execution unit 1629A-1629N…”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Chen, Milojicic and Duault with the teaching of Vembu because the teaching of Vembu would improve the system of Chen, Milojicic and Duault by providing a technique for processing tasks in a batch fashion and thus conserving computing resources.
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over CN Pub. No. 112114951 A to Chen et al. in view of U.S. Pub. No. 2022/0291952 A1 to Milojicic et al. and further in view of U.S. Pat. No. 5,428,781 issued to Duault et al. as applied to claim 10 above, and further in view of CN Pub. No. 111427680 A to Vembu et al. as applied to claim 13 above, and further in view of U.S. Pub. No. 2013/0155080 A1 to Nordlund et al.
As to claim 14, Chen as modified by Milojicic and Duault teaches the method as recited in claim 13; however, it is silent with reference to wherein the command queue is configured to store a predetermined number of messages in a first-in-first-out mode.
Nordlund teaches wherein the command queue is configured to store a predetermined number of messages in a first-in-first-out mode (first in first out (FIFO) registers) (“… According to aspects of this disclosure, command processor 56 may initially parse the received command stream and identify each task that is to be performed by GPU 48. In addition to parsing the tasks from the command stream, command processor 56 may maintain a command queue for organizing each of the tasks to be executed by the components of GPU 48. For example, command processor 56 may schedule tasks to be executed by the components of GPU 48 (such as shader processor 52 and/or fixed function units 54) using the command queue. In some examples, the command queues may be fixed function hardware units (e.g., first in first out (FIFO) registers, or the like). In other examples, the command queues may be general memory or register units…” paragraph 0064).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention