Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Response to Arguments
Applicant’s arguments, filed 12/18/2025, have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-3, 5-6, 9-11, 13-16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Cui (U.S. Patent Application Pub. No. 2020/0042362) in view of Yao et al. ("TS-BatPro: Improving Energy Efficiency in Data Centers by Leveraging Temporal–Spatial Batching", IEEE Transactions on Green Communications and Networking, Vol. 3, pp. 236-249, March 2019), hereinafter Yao.
Regarding Claim 1, Cui teaches: A batching system for improving execution of machine-learning tasks (Cui, Para. 0007 – “a computing system comprising a deep learning computing platform, which implements a self-adaptive batch dataset partitioning control method”), comprising:
one or more processors (Cui, Para. 0033 – “processors”); and
a memory communicably coupled to the one or more processors (Cui, Para. 0033-0035 – “system memory” which the processors interface and communicate with) and storing:
a control module including instructions that, when executed by the one or more processors (Cui, Para. 0018 and 0033-0035 – a computing system comprising a “deep learning computing platform” hosted on a “computing node” including “program instructions and data” processed and executed by the processors), cause the one or more processors to:
receive, in a queue, tasks for execution, the tasks being requests to execute a machine-learning model (Cui, Para. 0018, 0033 and 0072-0073 – where the computing system having a “computing node” which hosts a “deep learning computing platform” receives “service requests” which are “stored in the request queue”; where the service requests are “for executing HPC jobs on the server cluster 660 (e.g., distributed DL training, or other HPC jobs)”);
evaluate a current state of the queue according to a batching model to determine when to execute a batch of the tasks that have been received in the queue by generating a cost of executing the batch that is a current batch at a current time (Cui, Para. 0049 and 0075-0076 – where a training model receives “timing information” for a given “mini-batch iteration” which indicates an amount of “time taken by the accelerators” to “complete the processing of the respective sub-batch datasets”, where the timing information is used to determine “an optimal job partition ratio for partitioning a mini-batch dataset into sub-batch datasets for processing by the accelerator resources”, such that the cost is the usage of the available accelerator resources; where a “computing resource scheduling and provisioning module” implements “protocol for selecting, allocating, scheduling and provisioning one or more GPU server nodes and associated accelerator resources… [for executing] workloads associated with client service requests, depending on various factors including, but not limited to, the available GPU devices and processing resources of the GPU server nodes, the nature of the GPU processing tasks associated with the service request, user-specified conditions and resource demands for executing a given job, conditions based on a service level agreement (SLA) with the given client, predefined policies of the service provider for handing specific types of jobs, etc.” for example by “queue-based GPU virtualization and management systems”),
responsive to determining that the cost satisfies a batch threshold, control a batching processor to execute the batch using the machine-learning model (Cui, Para. 0049, 0057, and 0074 – where “an optimal job partition ratio for partitioning a mini-batch dataset into sub-batch datasets for processing by the accelerator resources” is determined by satisfying a pre-defined completion time standard deviation threshold value, and the computing module provisions resources to “execute pending jobs in the request queue”),
Cui does not teach evaluate a current state of the queue according to a batching model to determine when to execute a batch of the tasks that have been received in the queue by generating a cost of executing the batch that is a current batch at a current time based on a combination of a latency and an energy consumption, including determining whether to delay execution of the batch and increase a latency of execution by waiting for additional requests that increase the batch size to reduce an energy consumption cost of the batch, nor does Cui teach wherein the batch threshold defines a limit for the cost that optimally balances the latency with energy consumption, the batch threshold defining an amount of time by which to increase the latency and to wait for further tasks.
However, Yao teaches evaluate a current state of the queue according to a batching model (Yao, Pages 238-241 – a “two stage queuing model” which determines an “arrival rate” of “job arrivals”, or current state of the queue) to determine when to execute a batch of the tasks that have been received in the queue by generating a cost of executing the batch that is a current batch at a current time based on a combination of a latency and an energy consumption (Yao, Pages 238-242 – determining for a “number of jobs in a batch”, a “runtime” and a “load”, wherein the number of jobs in a batch is determined based on “energy optimization” while meeting “tail latency targets”), including determining whether to delay execution of the batch and increase a latency of execution by waiting for additional requests that increase the batch size to reduce an energy consumption cost of the batch (Yao, Pages 238-242 – wherein the model determines a “number of jobs in a batch”, or a maximum “batching parameter K”, based on a current state of the queue, including variables such as “arrival rate”, “batching delay and queuing delay” for a job, etc., wherein the parameter K is “derived by repetitively incrementing K” until a “target tail latency” is no longer satisfied, where, to prevent “suboptimal energy savings”, the model implements a “load predictor” which determines a load for the batching parameter K, and based on the load, a “new K value is computed and is used in the next epoch”; where the batching parameter K is routinely determined), and wherein the batch threshold defines a limit for the cost that optimally balances the latency with energy consumption, the batch threshold defining an amount of time by which to increase the latency and to wait for further tasks (Yao, Pages 238-242 – wherein the model temporally determines a “number of jobs in a batch”, or a maximum “batching parameter K”, based on a current state of the queue, wherein the parameter K is “derived by repetitively incrementing K” until a “target tail latency” is no longer satisfied, where, to prevent “suboptimal energy savings”, the model implements a “load predictor” which determines a load for the batching parameter K, and based on the load, a “new K value is computed and is used in the next epoch”, in order to perform “energy optimization” while meeting a quality of service latency target).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Cui to include evaluate a current state of the queue according to a batching model to determine when to execute a batch of the tasks that have been received in the queue by generating a cost of executing the batch that is a current batch at a current time based on a combination of a latency and an energy consumption, including determining whether to delay execution of the batch and increase a latency of execution by waiting for additional requests that increase the batch size to reduce an energy consumption cost of the batch, and wherein the batch threshold defines a limit for the cost that optimally balances the latency with energy consumption, the batch threshold defining an amount of time by which to increase the latency and to wait for further tasks, as taught by Yao, in order to optimize data center energy consumption while remaining within latency-critical quality of service constraints (Yao, Abstract).
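For exposition only, the cost-based batching decision mapped above may be pictured with a minimal Python sketch. Every function name, weight, and numeric value below is an assumption introduced for illustration; none is taken from Cui, Yao, or the claims. The sketch shows how a combined latency/energy cost can fall as a batch grows while tasks wait, so that delaying execution trades added latency for energy savings until a threshold is satisfied:

```python
def batch_cost(batch_size, wait_time, alpha=1.0, beta=1.0,
               fixed_energy=10.0, per_task_energy=0.5, service_time=0.2):
    """Weighted cost of executing `batch_size` queued tasks now.

    latency: time the tasks have waited plus the batch service time.
    energy:  a fixed dispatch cost amortized over the batch plus a
             per-task cost, so larger batches cost less energy per task.
    """
    latency = wait_time + service_time
    energy_per_task = fixed_energy / max(batch_size, 1) + per_task_energy
    return alpha * latency + beta * energy_per_task

def should_execute(batch_size, wait_time, threshold=3.0):
    # Execute once the combined cost satisfies the batch threshold;
    # otherwise keep delaying and accumulating tasks.
    return batch_cost(batch_size, wait_time) <= threshold
```

Under these assumed values, a single just-arrived task carries a high amortized energy cost and is held back, while a batch of eight tasks that has waited one time unit satisfies the threshold and is dispatched.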
In regards to Claim 2, Cui in view of Yao teaches the batching system of Claim 1, and Cui further teaches wherein the control module includes instructions to evaluate the current state including instructions to dynamically adapt a batch size for the batch to optimize execution of the batch using the machine-learning model (Cui, Para. 0032 – “the self-adaptive batch dataset partitioning control module 53 implements an iterative batch size tuning process which is configured to determine an optimal job partition ratio for partitioning mini-batch datasets into sub-batch datasets for processing by a set of hybrid accelerator resources during a data-parallel DL model training process”), wherein the control module includes instructions to evaluate the current state using the batching model including instructions to determine a batch size for the batch to control when the batch executes according to parameters (Cui, Para. 0053 – where a training set for a deep learning model uses adjusted batch sizes for each dataset, where batch size is based on “various factors such as, e.g., the overall size of the training dataset, the desired speed of convergence of the learning process, the number (N) of accelerator resources provisioned”; and where the model is compiled with parameter settings which are initialized before training and include “a standard deviation (SD) threshold value (L.sub.0), a job partition ratio adjustment value (K.sub.0), and a maximum iteration value (T.sub.0)”), but Cui does not teach parameters that define a tradeoff between latency and energy consumption.
However, Yao teaches parameters that define a tradeoff between latency and energy consumption (Yao, Pages 238-242 – determining “energy-latency tradeoffs” in order to satisfy quality of service constraints, i.e., target tail latencies, and energy savings).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the batching system including the above limitations of Cui in view of Yao to further include parameters that define a tradeoff between latency and energy consumption, as taught by Yao, in order to determine an operating point that balances speed and the cost of energy to improve the efficiency of a machine learning model.
In regards to Claim 3, Cui in view of Yao teaches the batching system of Claim 2, and Cui in view of Yao further teaches wherein the control module includes instructions to evaluate the current state to determine whether to delay execution of the batch (Cui, Para. 0075-0076 – “computing resource scheduling and provisioning module 642 can implement any suitable method or protocol for selecting, allocating, scheduling and provisioning one or more GPU server nodes and associated accelerator resources (e.g., GPU devices) for executing HPC workloads”, for example “in one embodiment, the utilization of the GPU device is shared temporally, wherein a given GPU device can be allocated to two or more client systems, and wherein the tasks of the two or more client systems are executed on the same allocated GPU device at different times”, such that one task is delayed) and increase a latency of execution for the batch by increasing the batch size (Cui, Para. 0049-0053 – where a training set for a deep learning model uses adjusted batch sizes for each dataset, where batch size is based on “various factors such as, e.g., the overall size of the training dataset, the desired speed of convergence of the learning process, the number (N) of accelerator resources provisioned”; and where the model is compiled with parameter settings which are initialized before training and include “a standard deviation (SD) threshold value (L.sub.0), a job partition ratio adjustment value (K.sub.0), and a maximum iteration value (T.sub.0)”; where by increasing the batch size, the amount of processing time, or latency, increases).
In regards to Claim 5, Cui in view of Yao teaches the batching system of Claim 1, and Cui further teaches wherein the current state indicates whether a batch is currently executing (Cui, Para. 0073-0076 – where “a service request”, in a request queue, “may specify (i) a desired number (N) of accelerator devices (e.g., GPU devices) to provision for the requested job” and accelerator resources are provisioned for jobs based on “available GPU devices and processing resources of the GPU server nodes”, such that those which are unavailable are currently processing another batch of requested jobs), but Cui does not teach wherein the current state indicates at least an arrival rate of the tasks into the queue.
However, Yao teaches wherein the current state indicates at least an arrival rate of the tasks into the queue (Yao, Pages 239-240 – “arrival rate” of job arrivals to the system, which are queued).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the batching system including the above limitations of Cui in view of Yao to further include wherein the current state indicates at least an arrival rate of the tasks into the queue, as taught by Yao, in order to account for the rate of incoming tasks when determining a batch size, such that the batch does not violate a constraint.
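The procedure attributed to Yao above, repetitively incrementing the batching parameter K until a target tail latency would no longer be met, can be sketched as follows. This is purely illustrative: the latency predictor, parameter names, and values are assumptions for exposition, not Yao's actual model or code:

```python
def predicted_tail_latency(k, arrival_rate, service_time=0.01):
    # Toy predictor: a job may wait for up to k-1 later arrivals
    # (batching delay) plus the whole batch's service time.
    batching_delay = (k - 1) / arrival_rate
    return batching_delay + k * service_time

def derive_k(arrival_rate, latency_target, k_max=64):
    """Grow K while the predicted tail latency still meets the target."""
    k = 1
    while k < k_max and predicted_tail_latency(k + 1, arrival_rate) <= latency_target:
        k += 1  # larger batches save energy, so grow K while QoS holds
    return k
```

The arrival rate enters the predictor directly, which is why a batching decision that ignored it could violate the latency constraint, consistent with the rationale stated above.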
In regards to Claim 6, Cui in view of Yao teaches the batching system of Claim 1, and Cui further teaches wherein the control module includes instructions to evaluate the current state using the batching model including instructions to apply dynamic programming to recast a cost objective as a recursive function that is a sum of current costs and an expected cost for subsequent transitions (Cui, Para. 0048-0051 and 0075-0076 – where a training model receives “timing information” for a given “mini-batch iteration” which indicates an amount of “time taken by the accelerators” to “complete the processing of the respective sub-batch datasets”, where the timing information is used to determine “an optimal job partition ratio for partitioning a mini-batch dataset into sub-batch datasets for processing by the accelerator resources”, such that the cost is the usage of the available accelerator resources, where time taken is the cost; where the training model runs an iterative load balancing process on sub-batch datasets such that it is repeated, or recursive), but Cui does not teach costs including at least a latency cost, and an energy cost.
However, Yao teaches costs including at least a latency cost, and an energy cost (Yao, Abstract – “energy consumption” and latency critical “quality of service (QoS) constraints” on “tail latencies”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the batching system including the above limitations of Cui in view of Yao to further include costs including at least a latency cost, and an energy cost, as taught by Yao, in order to improve the efficiency of a machine learning model while minimizing the resource footprint.
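The recursive cost formulation recited in Claim 6, a cost objective recast as a sum of current costs and an expected cost for subsequent transitions, follows the shape of a finite-horizon Bellman recursion. The sketch below is an illustration under assumed parameters (none of the names or values come from Cui or Yao); it weighs executing the queued batch now against waiting one step for a possible arrival:

```python
from functools import lru_cache

# Hypothetical parameters, for illustration only.
LATENCY_COST = 1.0     # cost per step that each queued task keeps waiting
ENERGY_FIXED = 5.0     # fixed energy cost of dispatching any batch
ENERGY_PER_TASK = 0.5  # marginal energy cost per task in the batch
ARRIVAL_PROB = 0.6     # probability a new task arrives in the next step
HORIZON = 20           # planning horizon in steps

@lru_cache(maxsize=None)
def expected_cost(queue_len, step):
    """Value of a state = current cost + expected cost of subsequent
    transitions, evaluated recursively via dynamic programming."""
    if step == HORIZON:
        # End of horizon: flush whatever is still queued.
        return ENERGY_FIXED + ENERGY_PER_TASK * queue_len if queue_len else 0.0
    # Option 1: execute the current batch now (energy cost), then continue.
    if queue_len:
        run_now = (ENERGY_FIXED + ENERGY_PER_TASK * queue_len
                   + expected_cost(0, step + 1))
    else:
        run_now = float("inf")  # nothing to execute
    # Option 2: wait one step (latency cost), taking the expectation over
    # whether an additional task arrives and grows the batch.
    wait = (LATENCY_COST * queue_len
            + ARRIVAL_PROB * expected_cost(queue_len + 1, step + 1)
            + (1 - ARRIVAL_PROB) * expected_cost(queue_len, step + 1))
    return min(run_now, wait)
```

Memoization (`lru_cache`) makes the repeated, or recursive, evaluation tractable, mirroring how a dynamic-programming recast avoids re-solving overlapping subproblems.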
Regarding Claim 9, Cui teaches: A non-transitory computer-readable medium storing instructions (Cui, Para. 0018 and 0033-0035, and Claim 11 – a computing system comprising a “deep learning computing platform” hosted on a “computing node” including “program instructions and data” processed and executed by the processors; where the program instructions are stored on “a processor-readable storage medium”) for improving execution of machine-learning tasks and that, when executed by one or more processors (Cui, Para. 0033 – “processors”), cause the one or more processors to:
receive, in a queue, tasks for execution, the tasks being requests to execute a machine- learning model (Cui, Para. 0018, 0033 and 0072-0073 – where the computing system having a “computing node” which hosts a “deep learning computing platform” receives “service requests” which are “stored in the request queue”; where the service requests are “for executing HPC jobs on the server cluster 660 (e.g., distributed DL training, or other HPC jobs)”);
evaluate a current state of the queue according to a batching model to determine when to execute a batch of the tasks that have been received in the queue by generating a cost of executing the batch that is a current batch at a current time (Cui, Para. 0049 and 0075-0076 – where a training model receives “timing information” for a given “mini-batch iteration” which indicates an amount of “time taken by the accelerators” to “complete the processing of the respective sub-batch datasets”, where the timing information is used to determine “an optimal job partition ratio for partitioning a mini-batch dataset into sub-batch datasets for processing by the accelerator resources”, such that the cost is the usage of the available accelerator resources; where a “computing resource scheduling and provisioning module” implements “protocol for selecting, allocating, scheduling and provisioning one or more GPU server nodes and associated accelerator resources… [for executing] workloads associated with client service requests, depending on various factors including, but not limited to, the available GPU devices and processing resources of the GPU server nodes, the nature of the GPU processing tasks associated with the service request, user-specified conditions and resource demands for executing a given job, conditions based on a service level agreement (SLA) with the given client, predefined policies of the service provider for handing specific types of jobs, etc.” for example by “queue-based GPU virtualization and management systems”),
responsive to determining that the cost satisfies a batch threshold, control a batching processor to execute the batch using the machine-learning model (Cui, Para. 0049, 0057, and 0074 – where “an optimal job partition ratio for partitioning a mini-batch dataset into sub-batch datasets for processing by the accelerator resources” is determined by satisfying a pre-defined completion time standard deviation threshold value, and the computing module provisions resources to “execute pending jobs in the request queue”),
Cui does not teach evaluate a current state of the queue according to a batching model to determine when to execute a batch of the tasks that have been received in the queue by generating a cost of executing the batch that is a current batch at a current time based on a combination of a latency and an energy consumption, including determining whether to delay execution of the batch and increase a latency of execution by waiting for additional requests that increase the batch size to reduce an energy consumption cost of the batch, nor does Cui teach wherein the batch threshold defines a limit for the cost that optimally balances the latency with energy consumption, the batch threshold defining an amount of time by which to increase the latency and to wait for further tasks.
However, Yao teaches evaluate a current state of the queue according to a batching model (Yao, Pages 238-241 – a “two stage queuing model” which determines an “arrival rate” of “job arrivals”, or current state of the queue) to determine when to execute a batch of the tasks that have been received in the queue by generating a cost of executing the batch that is a current batch at a current time based on a combination of a latency and an energy consumption (Yao, Pages 238-242 – determining for a “number of jobs in a batch”, a “runtime” and a “load”, wherein the number of jobs in a batch is determined based on “energy optimization” while meeting “tail latency targets”), including determining whether to delay execution of the batch and increase a latency of execution by waiting for additional requests that increase the batch size to reduce an energy consumption cost of the batch (Yao, Pages 238-242 – wherein the model determines a “number of jobs in a batch”, or a maximum “batching parameter K”, based on a current state of the queue, including variables such as “arrival rate”, “batching delay and queuing delay” for a job, etc., wherein the parameter K is “derived by repetitively incrementing K” until a “target tail latency” is no longer satisfied, where, to prevent “suboptimal energy savings”, the model implements a “load predictor” which determines a load for the batching parameter K, and based on the load, a “new K value is computed and is used in the next epoch”; where the batching parameter K is routinely determined), and wherein the batch threshold defines a limit for the cost that optimally balances the latency with energy consumption, the batch threshold defining an amount of time by which to increase the latency and to wait for further tasks (Yao, Pages 238-242 – wherein the model temporally determines a “number of jobs in a batch”, or a maximum “batching parameter K”, based on a current state of the queue, wherein the parameter K is “derived by repetitively incrementing K” until a “target tail latency” is no longer satisfied, where, to prevent “suboptimal energy savings”, the model implements a “load predictor” which determines a load for the batching parameter K, and based on the load, a “new K value is computed and is used in the next epoch”, in order to perform “energy optimization” while meeting a quality of service latency target).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the non-transitory computer-readable medium of Cui to include evaluate a current state of the queue according to a batching model to determine when to execute a batch of the tasks that have been received in the queue by generating a cost of executing the batch that is a current batch at a current time based on a combination of a latency and an energy consumption, including determining whether to delay execution of the batch and increase a latency of execution by waiting for additional requests that increase the batch size to reduce an energy consumption cost of the batch, and wherein the batch threshold defines a limit for the cost that optimally balances the latency with energy consumption, the batch threshold defining an amount of time by which to increase the latency and to wait for further tasks, as taught by Yao, in order to optimize data center energy consumption while remaining within latency-critical quality of service constraints (Yao, Abstract).
In regards to Claim 10, Cui in view of Yao teaches the non-transitory computer-readable medium of Claim 9, and Cui further teaches wherein the instructions to evaluate the current state including instructions to dynamically adapt a batch size for the batch to optimize execution of the batch using the machine-learning model (Cui, Para. 0032 – “the self-adaptive batch dataset partitioning control module 53 implements an iterative batch size tuning process which is configured to determine an optimal job partition ratio for partitioning mini-batch datasets into sub-batch datasets for processing by a set of hybrid accelerator resources during a data-parallel DL model training process”), and wherein the instructions to evaluate the current state using the batching model including instructions to determine a batch size for the batch to control when the batch executes according to parameters (Cui, Para. 0053 – where a training set for a deep learning model uses adjusted batch sizes for each dataset, where batch size is based on “various factors such as, e.g., the overall size of the training dataset, the desired speed of convergence of the learning process, the number (N) of accelerator resources provisioned”; and where the model is compiled with parameter settings which are initialized before training and include “a standard deviation (SD) threshold value (L.sub.0), a job partition ratio adjustment value (K.sub.0), and a maximum iteration value (T.sub.0)”), but Cui does not teach parameters that define a tradeoff between latency and energy consumption.
However, Yao teaches parameters that define a tradeoff between latency and energy consumption (Yao, Pages 238-242 – determining “energy-latency tradeoffs” in order to satisfy quality of service constraints, i.e., target tail latencies, and energy savings).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the non-transitory computer-readable medium including the above limitations of Cui in view of Yao to further include parameters that define a tradeoff between latency and energy consumption, as taught by Yao, in order to determine an operating point that balances speed and the cost of energy to improve the efficiency of a machine learning model.
In regards to Claim 11, Cui in view of Yao teaches the non-transitory computer-readable medium of Claim 10, and Cui in view of Yao further teaches wherein the control module includes instructions to evaluate the current state to determine whether to delay execution of the batch (Cui, Para. 0075-0076 – “computing resource scheduling and provisioning module 642 can implement any suitable method or protocol for selecting, allocating, scheduling and provisioning one or more GPU server nodes and associated accelerator resources (e.g., GPU devices) for executing HPC workloads”, for example “in one embodiment, the utilization of the GPU device is shared temporally, wherein a given GPU device can be allocated to two or more client systems, and wherein the tasks of the two or more client systems are executed on the same allocated GPU device at different times”, such that one task is delayed) and increase a latency of execution for the batch by increasing the batch size (Cui, Para. 0049-0053 – where a training set for a deep learning model uses adjusted batch sizes for each dataset, where batch size is based on “various factors such as, e.g., the overall size of the training dataset, the desired speed of convergence of the learning process, the number (N) of accelerator resources provisioned”; and where the model is compiled with parameter settings which are initialized before training and include “a standard deviation (SD) threshold value (L.sub.0), a job partition ratio adjustment value (K.sub.0), and a maximum iteration value (T.sub.0)”; where by increasing the batch size, the amount of processing time, or latency, increases).
In regards to Claim 13, Cui in view of Yao teaches the non-transitory computer-readable medium of Claim 9, and Cui further teaches wherein the current state indicates whether a batch is currently executing (Cui, Para. 0073-0076 – where “a service request”, in a request queue, “may specify (i) a desired number (N) of accelerator devices (e.g., GPU devices) to provision for the requested job” and accelerator resources are provisioned for jobs based on “available GPU devices and processing resources of the GPU server nodes”, such that those which are unavailable are currently processing another batch of requested jobs), but Cui does not teach wherein the current state indicates at least an arrival rate of the tasks into the queue.
However, Yao teaches wherein the current state indicates at least an arrival rate of the tasks into the queue (Yao, Pages 239-240 – “arrival rate” of job arrivals to the system, which are queued).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the non-transitory computer readable medium including the above limitations of Cui in view of Yao to further include wherein the current state indicates at least an arrival rate of the tasks into the queue, as taught by Yao, in order to account for the rate of incoming tasks when determining a batch size, such that the batch does not violate a constraint.
Regarding Claim 14, Cui teaches: A method (Cui, Para. 0052 – “a self-adaptive batch dataset partitioning control method”), comprising:
receiving, in a queue, tasks for execution, the tasks being requests to execute a machine- learning model (Cui, Para. 0018, 0033 and 0072-0073 – where the computing system having a “computing node” which hosts a “deep learning computing platform” receives “service requests” which are “stored in the request queue”; where the service requests are “for executing HPC jobs on the server cluster 660 (e.g., distributed DL training, or other HPC jobs)”);
evaluating a current state of the queue according to a batching model to determine when to execute a batch of the tasks that have been received in the queue by generating a cost of executing the batch that is a current batch at a current time (Cui, Para. 0049 and 0075-0076 – where a training model receives “timing information” for a given “mini-batch iteration” which indicates an amount of “time taken by the accelerators” to “complete the processing of the respective sub-batch datasets”, where the timing information is used to determine “an optimal job partition ratio for partitioning a mini-batch dataset into sub-batch datasets for processing by the accelerator resources”, such that the cost is the usage of the available accelerator resources; where a “computing resource scheduling and provisioning module” implements “protocol for selecting, allocating, scheduling and provisioning one or more GPU server nodes and associated accelerator resources… [for executing] workloads associated with client service requests, depending on various factors including, but not limited to, the available GPU devices and processing resources of the GPU server nodes, the nature of the GPU processing tasks associated with the service request, user-specified conditions and resource demands for executing a given job, conditions based on a service level agreement (SLA) with the given client, predefined policies of the service provider for handing specific types of jobs, etc.” for example by “queue-based GPU virtualization and management systems”),
responsive to determining that the cost satisfies a batch threshold, controlling a batching processor to execute the batch using the machine-learning model (Cui, Para. 0049, 0057, and 0074 – where “an optimal job partition ratio for partitioning a mini-batch dataset into sub-batch datasets for processing by the accelerator resources” is determined by satisfying a pre-defined completion time standard deviation threshold value, and the computing module provisions resources to “execute pending jobs in the request queue”),
Cui does not teach evaluating a current state of the queue according to a batching model to determine when to execute a batch of the tasks that have been received in the queue by generating a cost of executing the batch that is a current batch at a current time based on a combination of a latency and an energy consumption, including determining whether to delay execution of the batch and increase a latency of execution by waiting for additional requests that increase the batch size to reduce an energy consumption cost of the batch, nor does Cui teach wherein the batch threshold defines a limit for the cost that optimally balances the latency with energy consumption, the batch threshold defining an amount of time by which to increase the latency and to wait for further tasks.
However, Yao teaches evaluating a current state of the queue according to a batching model (Yao, Pages 238-241 – a “two stage queuing model” which determines an “arrival rate” of “job arrivals”, or current state of the queue) to determine when to execute a batch of the tasks that have been received in the queue by generating a cost of executing the batch that is a current batch at a current time based on a combination of a latency and an energy consumption (Yao, Pages 238-242 – determining for a “number of jobs in a batch”, a “runtime” and a “load”, wherein the number of jobs in a batch is determined based on “energy optimization” while meeting “tail latency targets”), including determining whether to delay execution of the batch and increase a latency of execution by waiting for additional requests that increase the batch size to reduce an energy consumption cost of the batch (Yao, Pages 238-242 – wherein the model determines a “number of jobs in a batch”, or a maximum “batching parameter K”, based on a current state of the queue, including variables such as “arrival rate”, “batching delay and queuing delay” for a job, etc., wherein the parameter K is “derived by repetitively incrementing K” until a “target tail latency” is no longer satisfied, where, to prevent “suboptimal energy savings”, the model implements a “load predictor” which determines a load for the batching parameter K, and based on the load, a “new K value is computed and is used in the next epoch”; where the batching parameter K is routinely determined), and wherein the batch threshold defines a limit for the cost that optimally balances the latency with energy consumption, the batch threshold defining an amount of time by which to increase the latency and to wait for further tasks (Yao, Pages 238-242 – wherein the model temporally determines a “number of jobs in a batch”, or a maximum “batching parameter K”, based on a current state of the queue, wherein the parameter K is “derived by repetitively incrementing K” until a “target tail latency” is no longer satisfied, where, to prevent “suboptimal energy savings”, the model implements a “load predictor” which determines a load for the batching parameter K, and based on the load, a “new K value is computed and is used in the next epoch”, in order to perform “energy optimization” while meeting a quality of service latency target).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Cui to include evaluating a current state of the queue according to a batching model to determine when to execute a batch of the tasks that have been received in the queue by generating a cost of executing the batch that is a current batch at a current time based on a combination of a latency and an energy consumption, including determining whether to delay execution of the batch and increase a latency of execution by waiting for additional requests that increase the batch size to reduce an energy consumption cost of the batch, and wherein the batch threshold defines a limit for the cost that optimally balances the latency with energy consumption, the batch threshold defining an amount of time by which to increase the latency and to wait for further tasks, as taught by Yao, in order to optimize data center energy consumption while remaining within latency critical quality of service constraints (Yao, Abstract).
Regarding Claim 15, Cui in view of Yao teaches the method of Claim 14, and Cui further teaches wherein evaluating the current state includes dynamically adapting a batch size for the batch to optimize execution of the batch using the machine-learning model (Cui, Para. 0032 – “the self-adaptive batch dataset partitioning control module 53 implements an iterative batch size tuning process which is configured to determine an optimal job partition ratio for partitioning mini-batch datasets into sub-batch datasets for processing by a set of hybrid accelerator resources during a data-parallel DL model training process”), and wherein evaluating the current state using the batching model includes determining a batch size for the batch to control when the batch executes according to parameters (Cui, Para. 0053 – where a training set for a deep learning model uses adjusted batch sizes for each dataset, where batch size is based on “various factors such as, e.g., the overall size of the training dataset, the desired speed of convergence of the learning process, the number (N) of accelerator resources provisioned”; and where the model is compiled with parameter settings which are initialized before training and include “a standard deviation (SD) threshold value (L.sub.0), a job partition ratio adjustment value (K.sub.0), and a maximum iteration value (T.sub.0)”), but Cui does not teach parameters that define a tradeoff between latency and energy consumption.
However, Yao teaches parameters that define a tradeoff between latency and energy consumption (Yao, Page 238-242 – determining “energy-latency tradeoffs” in order to satisfy quality of service constraints, i.e. target tail latencies, and energy savings).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method including the above limitations of Cui in view of Yao to further include parameters that define a tradeoff between latency and energy consumption, as taught by Yao, in order to determine an operating point that balances speed and the cost of energy to improve the efficiency of a machine learning model.
Regarding Claim 16, Cui in view of Yao teaches the method of Claim 15, and Cui in view of Yao further teaches wherein the control module includes instructions to evaluate the current state to determine whether to delay execution of the batch (Cui, Para. 0075-0076 – “computing resource scheduling and provisioning module 642 can implement any suitable method or protocol for selecting, allocating, scheduling and provisioning one or more GPU server nodes and associated accelerator resources (e.g., GPU devices) for executing HPC workloads”, for example “in one embodiment, the utilization of the GPU device is shared temporally, wherein a given GPU device can be allocated to two or more client systems, and wherein the tasks of the two or more client systems are executed on the same allocated GPU device at different times”, such that one task is delayed) and increase a latency of execution for the batch by increasing the batch size (Cui, Para. 0049-0053 – where a training set for a deep learning model uses adjusted batch sizes for each dataset, where batch size is based on “various factors such as, e.g., the overall size of the training dataset, the desired speed of convergence of the learning process, the number (N) of accelerator resources provisioned”; and where the model is compiled with parameter settings which are initialized before training and include “a standard deviation (SD) threshold value (L.sub.0), a job partition ratio adjustment value (K.sub.0), and a maximum iteration value (T.sub.0)”; where by increasing the batch size, the amount of processing time, or latency, increases).
Regarding Claim 19, Cui teaches the method of Claim 14, and Cui further teaches wherein the control module includes instructions to evaluate the current state using the batching model including instructions to apply dynamic programming to recast a cost objective as a recursive function that is a sum of current costs and an expected cost for subsequent transitions (Cui, Para. 0048-0051 and 0075-0076 – where a training model receives “timing information” for a given “mini-batch iteration” which indicates an amount of “time taken by the accelerators” to “complete the processing of the respective sub-batch datasets”, where the timing information is used to determine “an optimal job partition ratio for partitioning a mini-batch dataset into sub-batch datasets for processing by the accelerator resources”, such that the time taken by the available accelerator resources is the cost; where the training model runs an iterative load balancing process on the sub-batch datasets such that the cost determination is repeated, or recursive), but Cui does not teach costs including at least a latency cost, and an energy cost.
However, Yao teaches costs including at least a latency cost, and an energy cost (Yao, Abstract – “energy consumption” and latency critical “quality of service (QoS) constraints” on “tail latencies”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the batching system including the above limitations of Cui in view of Yao to further include costs including at least a latency cost, and an energy cost, as taught by Yao, in order to improve the efficiency of a machine learning model while minimizing the resource footprint.
Claim(s) 4, 12, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Cui in view of Yao, and further in view of Cooper (U.S. Patent Application Pub. No. 2023/0041290).
Regarding Claim 4, Cui in view of Yao teaches the batching system of Claim 1, and Cui further teaches wherein the machine-learning model is a deep neural network (DNN) (Cui, Para. 0005 – “deep learning model”).
Cui does not teach wherein the batching model is a probabilistic model that is based on a Markov Chain Model, and parameters define at least a regularization parameter.
However, Cooper teaches wherein the batching model is a probabilistic model that is based on a Markov Chain Model (Cooper, Para. 0029 – “a neural network” which may include “a Markov chain neural network”), and parameters define at least a regularization parameter (Cooper, Para. 0094 – “hyperparameters” which may include “one or more regularization terms” which may be varied, added, or removed).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further modified the batching system including the above limitations of Cui in view of Yao to include wherein the batching model is a probabilistic model that is based on a Markov Chain Model, and parameters define at least a regularization parameter, as taught by Cooper, in order to utilize a probabilistic model that is flexible and capable of handling uncertainty while maintaining simplicity, and to improve prediction accuracy of the model.
Regarding Claim 12, Cui in view of Yao teaches the non-transitory computer-readable medium of Claim 9, and Cui further teaches wherein the machine-learning model is a deep neural network (DNN) (Cui, Para. 0005 – “deep learning model”).
Cui does not teach wherein the batching model is a probabilistic model that is based on a Markov Chain Model, and parameters define at least a regularization parameter.
However, Cooper teaches wherein the batching model is a probabilistic model that is based on a Markov Chain Model (Cooper, Para. 0029 – “a neural network” which may include “a Markov chain neural network”), and parameters define at least a regularization parameter (Cooper, Para. 0094 – “hyperparameters” which may include “one or more regularization terms” which may be varied, added, or removed).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further modified the batching system including the above limitations of Cui in view of Yao to include wherein the batching model is a probabilistic model that is based on a Markov Chain Model, and parameters define at least a regularization parameter, as taught by Cooper, in order to utilize a probabilistic model that is flexible and capable of handling uncertainty while maintaining simplicity, and to improve prediction accuracy of the model.
Regarding Claim 17, Cui in view of Yao teaches the method of Claim 14, and Cui further teaches wherein the machine-learning model is a deep neural network (DNN) (Cui, Para. 0005 – “deep learning model”).
Cui does not teach wherein the batching model is a probabilistic model that is based on a Markov Chain Model, and parameters define at least a regularization parameter.
However, Cooper teaches wherein the batching model is a probabilistic model that is based on a Markov Chain Model (Cooper, Para. 0029 – “a neural network” which may include “a Markov chain neural network”), and parameters define at least a regularization parameter (Cooper, Para. 0094 – “hyperparameters” which may include “one or more regularization terms” which may be varied, added, or removed).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further modified the batching system including the above limitations of Cui in view of Yao to include wherein the batching model is a probabilistic model that is based on a Markov Chain Model, and parameters define at least a regularization parameter, as taught by Cooper, in order to utilize a probabilistic model that is flexible and capable of handling uncertainty while maintaining simplicity, and to improve prediction accuracy of the model.
Claim(s) 7, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Cui in view of Yao, and further in view of Padmanabha Iyer, et al., hereinafter Iyer (U.S. Patent Application Pub. No. 2023/0342278).
Regarding Claim 7, Cui in view of Yao teaches the batching system of Claim 1, and Cui further teaches wherein receiving the tasks includes receiving the tasks from the respective remote devices that are offloading the tasks for execution (Cui, Fig. 6 and Para. 0073 – where “client systems” send “service requests” to the “service controller” to execute the service requests; where the client systems communicate with the computing service platform over a communications network, such that they are remote), but Cui does not teach wherein the control module includes instructions to communicate results of the batch after execution to respective remote devices.
However, Iyer teaches wherein the control module includes instructions to communicate results of the batch after execution to respective remote devices (Iyer, Para. 0046-0049 – where users send requests and a system receives the requests and provides inferences, or results, to the user in response to the request; where the user device is remote from the device containing the system, which executes batches of requests).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further modified the batching system including the above limitations of Cui in view of Yao to include wherein the control module includes instructions to communicate results of the batch after execution to respective remote devices, as taught by Iyer, in order to inform the user of the results of a requested task once the task has been executed.
Regarding Claim 18, Cui in view of Yao teaches the method of Claim 14, and Cui further teaches wherein the current state indicates whether a batch is currently executing (Cui, Para. 0073-0076 – where “a service request”, in a request queue, “may specify (i) a desired number (N) of accelerator devices (e.g., GPU devices) to provision for the requested job” and accelerator resources are provisioned for jobs based on “available GPU devices and processing resources of the GPU server nodes”, such that those which are unavailable are currently processing another batch of requested jobs), but Cui does not teach wherein the current state indicates at least an arrival rate of the tasks into the queue.
However, Iyer teaches wherein the current state indicates at least an arrival rate of the tasks into the queue (Iyer, Para. 0063 – “request rate of R requests per second” used in determining “an optimal number of splits for the machine learning model 16 using a dynamic programming based optimization”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further modified the batching system including the above limitations of Cui in view of Yao to include wherein the current state indicates at least an arrival rate of the tasks into the queue, as taught by Iyer, in order to account for the rate of incoming requested tasks when determining a batch size, such that the batch does not violate a cost threshold.
Regarding Claim 20, Cui in view of Yao teaches the method of Claim 14, and Cui further teaches wherein receiving the tasks includes receiving the tasks from the respective remote devices that are offloading the tasks for execution (Cui, Fig. 6 and Para. 0073 – where “client systems” send “service requests” to the “service controller” to execute the service requests; where the client systems communicate with the computing service platform over a communications network, such that they are remote), but Cui does not teach communicating results of the batch after execution to respective remote devices.
However, Iyer teaches communicating results of the batch after execution to respective remote devices (Iyer, Para. 0046-0049 – where users send requests and a system receives the requests and provides inferences, or results, to the user in response to the request; where the user device is remote from the device containing the system, which executes batches of requests).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further modified the batching system including the above limitations of Cui in view of Yao to include communicating results of the batch after execution to respective remote devices, as taught by Iyer, in order to inform the user of the results of a requested task once the task has been executed.
Claim(s) 8 is rejected under 35 U.S.C. 103 as being unpatentable over Cui in view of Yao, and further in view of Roe (U.S. Patent Application Pub. No. 2021/0365617).
Regarding Claim 8, Cui in view of Yao teaches the batching system of Claim 1, but Cui does not teach wherein the tasks are generated by a vehicle for performing functions in relation to autonomous driving.
However, Roe teaches wherein the tasks are generated by a vehicle for performing functions in relation to autonomous driving (Roe, Para. 0048 and 0063 – a method for “optimizing control inputs, for example in robots or autonomous vehicles”, where the method is for training neural networks, including adjusting batches and batch sizes).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further modified the batching system including the above limitations of Cui in view of Yao to include wherein the tasks are generated by a vehicle for performing functions in relation to autonomous driving, as taught by Roe, in order to use a method of efficient training for a model to improve accuracy in functions performed during autonomous driving.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Andrews, et al., hereinafter Andrews (U.S. Patent Application Pub. No. 2013/0243009) teaches a network device of a communication network configured to implement coordinated scheduling and processor rate control, including determining an optimal queue-energy tradeoff.
Gupta Hyde, et al. (U.S. Patent Application Pub. No. 2023/0195531) teaches a task modeling system configured to receive input data representing a plurality of processing tasks to be completed by a processing client within a predefined time duration and to achieve a power usage within a predefined threshold for a plurality of processing cores during the predefined time duration.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HELEN LI whose telephone number is (703)756-4719. The examiner can normally be reached Monday through Friday, from 9am to 5pm eastern.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hunter Lonsberry can be reached at (571) 272-7298. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/H.L./Examiner, Art Unit 3665
/HUNTER B LONSBERRY/Supervisory Patent Examiner, Art Unit 3665