DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-22 are pending for examination.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-11 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim language in the following claims is not clearly understood:
As per claim 1, lines 10-12, it is unclear whether “respective hardware architectures” refers to the “accelerator hardware architectures” of line 6 (i.e., a consistent term, with “the” or “said”, should be used if they are the same);
As per claim 4, line 3, it is unclear whether “accelerators” refers to the “accelerators” recited in claim 1 (i.e., a consistent term, with “the” or “said”, should be used if they are the same);
As per claims 2-11, they depend from rejected claim 1, do not resolve the deficiencies thereof, and are therefore rejected for at least the same reasons.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception, an abstract idea, that has not been integrated into a practical application, and the claims do not recite significantly more than the judicial exception. The Examiner has evaluated the claims under the framework provided in the 2019 Revised Patent Subject Matter Eligibility Guidance published in the Federal Register on January 7, 2019, and provides that analysis below.
Step 1: Claims 1-13 are systems, which fall within the statutory category of machines. Therefore, “Are the claims to a process, machine, manufacture or composition of matter?” Yes.
In order to evaluate the Step 2A inquiry, “Is the claim directed to a law of nature, a natural phenomenon or an abstract idea?”, we must determine whether the claim recites a law of nature, a natural phenomenon, or an abstract idea, and further whether the claim recites additional elements that integrate the judicial exception into a practical application.
Step 2A Prong 1:
Claim 1: The limitation of “graph the DL workloads to operators”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “graph” in the context of this claim encompasses a person thinking about the relationships of the workloads to operators and making observations, evaluations, judgements and opinions, using pen and paper. The result of such a thought process is the generation of a graph.
Similarly, the limitation of “evaluate combinations of accelerators of the accelerator core types for collectively accomplishing the operators” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “evaluate” in the context of this claim encompasses a person thinking about the observations, evaluations, judgements and opinions regarding a list of accelerators and their types for executing the operators, as thought of mentally above, to evaluate which accelerator types are capable of accomplishing the operator.
Similarly, the limitation of “generate a ranking of accelerator hardware architectures” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “generate a ranking” in the context of this claim encompasses a person observing evaluations on different combinations of accelerators for performing the workload, and making observations, evaluations, judgements and opinions regarding which accelerator hardware architectures are a best fit based on the evaluation. The result of such a thought process is the generation of a ranking.
Additional Elements are evaluated below.
Claim 2: Similarly, the limitation of “extract the graph from a training script of the DL workload” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “extract the graph” in the context of this claim encompasses a person thinking about the relationships of the workloads to operators and making observations, evaluations, judgements and opinions on a training script, using pen and paper.
Claim 3: Similarly, the limitation of “map the operators to accelerator core types”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “map” in the context of this claim encompasses a person thinking about the relationships of the workloads to operators and making observations, evaluations, judgements and opinions, using pen and paper.
Claim 4: Similarly, the limitation of “partition DL models of the DL workload across accelerators”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “partition” in the context of this claim encompasses a person thinking about the relationships of the workloads to operators and making observations, evaluations, judgements and opinions, using pen and paper.
Claim 5: Similarly, the limitation of “perform operator graph optimizations that consider a memory hierarchy of the accelerator core type” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “perform operator graph optimizations” in the context of this claim encompasses a person thinking about the relationships of the workloads to operators and making observations, evaluations, judgements and opinions based on the information regarding a memory hierarchy of the accelerators, using pen and paper.
Claim 6: Similarly, the limitation of “annotate latency estimations associated with execution of individual graph operators and to determine power usage of potential architectures defined by the architectural template” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “annotate latency estimations… determine power usage…” in the context of this claim encompasses a person thinking about the latency estimations associated with execution, and making observations, evaluations, judgements and opinions regarding the latency and power usage of the architectural template, and annotate the graph using pen and paper.
Claim 7: Similarly, the limitation of “identify critical operators and determine latencies of the operators” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “identify critical operators” in the context of this claim encompasses a person thinking about the latencies of operators and making observations, evaluations, judgements and opinions regarding the critical operators based on the evaluation.
Claim 8: Similarly, the limitation of “perform a local search to determine a design of a single accelerator core type” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “perform local search to determine a design…” in the context of this claim encompasses a person observing the list of architectural templates and making observations, evaluations, judgements and opinions regarding the design of an accelerator, using pen and paper.
Claim 9: Similarly, the limitation of “track relative performance of hardware architectures identified by the global architecture search module” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “track relative performance…” in the context of this claim encompasses a person monitoring the performance of the hardware architectures and making observations, evaluations, judgements and opinions regarding the performance, using pen and paper.
Claim 10: Similarly, the limitation of “annotate each operator in the graph with a corresponding accelerator core type that the operator executed on, latency to execute on the corresponding accelerator core type, and energy expended during the execution on the corresponding accelerator core type”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “annotate… with the type of core the operator executed on, latency… energy expended…” in the context of this claim encompasses a person thinking about the type of core, latency and energy expended associated with the execution, and making observations, evaluations, judgements and opinions to annotate the graph using pen and paper.
Additional Elements are evaluated below.
Claim 11: The limitation of “identify splits in the designs associated with accelerator latency bottlenecks” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “identify splits in the designs…” in the context of this claim encompasses a person thinking about the splits in the designs and making observations, evaluations, judgements and opinions regarding the splits based on the evaluation of accelerator latency bottlenecks.
Claim 12: The limitation of “generate a graph of operators for the DL model”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “generate a graph” in the context of this claim encompasses a person thinking about the relationships of the workloads to operators and making observations, evaluations, judgements and opinions, using pen and paper. The result of such a thought process is the generation of a graph.
Similarly, the limitation of “identify a first portion of the graph of operators to perform with a first accelerator core of the first accelerator core type and a second portion of the graph of operators to perform with a second accelerator core of the second accelerator core type” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “identify” in the context of this claim encompasses a person thinking about the observations, evaluations, judgements and opinions regarding a list of accelerators and their types for executing the operators, as thought of mentally above, to evaluate which accelerator types are capable of accomplishing the operator.
Similarly, the limitation of “generate a recommended hardware architecture for a hardware chip that includes the first accelerator core and the second accelerator core” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. For example, “generate a recommended hardware architecture” in the context of this claim encompasses a person observing evaluations on different combinations of accelerators for performing the workload, and making observations, evaluations, judgements and opinions regarding which accelerator hardware architectures are a best fit based on the evaluation. The result of such a thought process is the generation of a recommended hardware architecture.
Additional Elements are evaluated below.
Claim 13: Additional Elements are evaluated below.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind, then it falls within the “Mental Processes” grouping of abstract ideas. Therefore, “Do the claims recite an abstract idea, law of nature or natural phenomenon?” Yes, the claims recite an abstract idea.
Step 2A Prong 2:
Claim 1: The judicial exception is not integrated into a practical application. In particular, the claim only recites two additional elements - obtain multiple user deep learning (DL) workloads, and a local architecture search module configured to receive an architectural template that relates to accelerator core types and evaluate individual accelerator core types at accomplishing sub-sets of the operators. “Obtain multiple user DL workloads…” and “receive an architectural template…” are merely insignificant pre-solution data gathering activity which does not meaningfully limit the judicial exception; see MPEP § 2106.05(g).
Claim 10: The judicial exception is not integrated into a practical application. In particular, the claim only recites one additional element - “receive a set of architectural designs” - which is merely insignificant pre-solution data gathering activity which does not meaningfully limit the judicial exception; see MPEP § 2106.05(g).
Claim 12: The judicial exception is not integrated into a practical application. In particular, the claim only recites two additional elements - obtain a deep learning (DL) model for accomplishing a workload, and receive an architectural template that relates to multiple accelerator core types. “Obtain a deep learning (DL) model…” and “receive an architectural template…” are merely insignificant pre-solution data gathering activity which does not meaningfully limit the judicial exception; see MPEP § 2106.05(g).
Claim 13: The judicial exception is not integrated into a practical application. In particular, the claim only recites two additional elements - the workload comprises a training script for the DL model, and the architectural template defines areas of each core type and available chip area - which further define the DL workload and the architectural template. “Obtain a deep learning (DL) model… for a workload” and “receive an architectural template…” are merely insignificant pre-solution data gathering activity which does not meaningfully limit the judicial exception; see MPEP § 2106.05(g).
Claims 2-9, 11: They do not recite any additional elements.
Therefore, “Do the claims recite additional elements that integrate the judicial exception into a practical application?” No, these additional elements do not integrate the abstract idea into a practical application and they do not impose any meaningful limits on practicing the abstract idea. In addition, the claims do not reflect the improvement described in the applicant’s specification. The claims are directed to an abstract idea.
After evaluating the inquiries set forth in Step 2A, Prongs 1 and 2, it has been concluded that the claims not only recite a judicial exception but are directed to the judicial exception.
Step 2B:
Claim 1: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “obtain multiple user DL workloads…” and “receive an architectural template…” amount to no more than insignificant pre-solution data gathering (as evidenced by court decisions discussed in MPEP § 2106.05(g) and § 2106.05(d)(II), and in accordance with the Office guidance in the memorandum addressing the decision in Berkheimer v. HP, Inc.) and a field of use/technological environment, without imposing meaningful limits on practicing the abstract idea, and thus cannot provide an inventive concept.
Claim 10: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of “receive a set of architectural designs” amounts to no more than insignificant pre-solution data gathering (as evidenced by court decisions discussed in MPEP § 2106.05(g) and § 2106.05(d)(II), and in accordance with the Office guidance in the memorandum addressing the decision in Berkheimer v. HP, Inc.) and a field of use/technological environment, without imposing meaningful limits on practicing the abstract idea, and thus cannot provide an inventive concept.
Claim 12: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “obtain a deep learning (DL) model…” and “receive an architectural template…” amount to no more than insignificant pre-solution data gathering (as evidenced by court decisions discussed in MPEP § 2106.05(g) and § 2106.05(d)(II), and in accordance with the Office guidance in the memorandum addressing the decision in Berkheimer v. HP, Inc.) and a field of use/technological environment, without imposing meaningful limits on practicing the abstract idea, and thus cannot provide an inventive concept.
Claim 13: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “obtain a deep learning (DL) model… for a workload” and “receive an architectural template…” amount to no more than insignificant pre-solution data gathering (as evidenced by court decisions discussed in MPEP § 2106.05(g) and § 2106.05(d)(II), and in accordance with the Office guidance in the memorandum addressing the decision in Berkheimer v. HP, Inc.) and a field of use/technological environment, without imposing meaningful limits on practicing the abstract idea, and thus cannot provide an inventive concept.
Claims 2-9, 11: They do not recite any additional elements.
Therefore, “Do the claims recite additional elements that amount to significantly more than the judicial exception?” No, these additional elements do not amount to significantly more than the judicial exception.
Having concluded the analysis within the provided framework, claims 1-13 are not directed to patent-eligible subject matter under 35 U.S.C. § 101.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-9, 12-13, and 17-22 are rejected under 35 U.S.C. 103 as being unpatentable over Yazdanbakhsh et al., US Pub. 2023/0376664 (hereafter Yazdanbakhsh), in view of Zhang et al., US Patent No. 11,887,353 (hereafter Zhang), and further in view of Zhu et al., US Pub. 2023/0168919 (hereafter Zhu).
Reference Zhang was cited in the previous Office action.
As per claim 1, Yazdanbakhsh teaches the invention substantially as claimed including a system, comprising: a module configured to obtain multiple user machine learning workloads (para[0004, 0009, 0030, 0038-0040], determine hardware architectures for target computing device to perform machine learning tasks);
a local architecture search module configured to receive an architectural template that relates to accelerator core types and evaluate individual accelerator core types at accomplishing sub-sets of the operators (para[0003-0007, 0027, 0069, 0073-0074, 0088-0091], generate a plurality of candidate hardware accelerator architectures including multiple hardware accelerators, according to the values of hardware parameters, hardware design policy and based on the requirements of the task, and evaluate candidate hardware architecture against the pre-evaluation criteria, and also evaluate the performance measures);
and, a global architecture search module configured to evaluate combinations of accelerators of the accelerator core types for collectively accomplishing the operators and generate a ranking of accelerator hardware architectures that employs an individual evaluated combination of accelerators for performing the workload (para[0030, 0038, 0076-0078], after evaluation of the candidate hardware architectures is completed, the search system can select the candidate hardware accelerator architecture that has the best performance measures and best satisfies the hardware design constraints).
Yazdanbakhsh does not explicitly teach deep learning (DL) workloads; a graph generator module configured to graph the DL workloads to operators; wherein respective hardware architectures specify respective combinations of respective accelerator core types on a chip.
However, Zhang teaches deep learning (DL) workloads; a graph generator module configured to graph the DL workloads to operators (col 2, line 5-30, generating a graph of the deep learning model (workloads) where the graph represent the operators of the deep learning model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Zhang’s teaching to Yazdanbakhsh’s invention in order to provide a method for deep learning image classification in order to reasonably assign a plurality of operators in the deep learning model to a plurality of computing devices with a goal of minimizing a reasoning completion time of the deep learning model, thereby effectively improving the efficiency of executing the deep learning model for image classification by a plurality of computing devices (col 5, line 46-58).
Yazdanbakhsh and Zhang do not explicitly teach respective hardware architectures specify respective combinations of respective accelerator core types on a chip.
However, Zhu teaches respective hardware architectures specify respective combinations of respective accelerator core types on a chip (para[0040, 0046, 0054-0055, 0129], FIG. 2, workload specifies requirements of hardware accelerators to perform operations, and selecting hardware accelerator machine (architectures) from a plurality of hardware accelerator machines including different types and sizes hardware accelerators).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Zhu’s teaching to Yazdanbakhsh and Zhang’s invention in order to provide a method to preflight checks for hardware accelerators in a system which execute workloads, where diagnosing failures is performed prior to the workload being executed in order to prevent future errors and inaccurate outputs and to reduce application downtime (para[0010-0011]).
As per claim 2, Yazdanbakhsh, Zhang and Zhu teach the system of claim 1, Zhang teaches wherein the graph generator module comprises an operator graph generator sub-module that is configured to extract the graph from a training script of the DL workload (col 6, line 45-67, FIG. 2a and 2b, directed acyclic graph is modeled based on the deep learning model, which extracts operators from the DL model).
As per claim 3, Yazdanbakhsh teaches map the operators to accelerator core types (para[0005, 0077], accelerators are specialized hardware for performing different types of operations, matrix multiplication, more efficiently over general purpose computing devices, and select the hardware accelerator best matches the constraint data).
In addition, Zhang teaches wherein the operator graph generator sub-module is configured to map the operators to computing devices (col 2, line 5-30, assigning the operators of the DL model to the corresponding (selected) computing devices with a goal of minimizing reasoning completion time based on the parameters of the computing devices).
As per claim 4, Yazdanbakhsh teaches accelerators (para[0005, 0054, 0077], hardware accelerator architectures having different accelerators).
In addition, Zhang teaches wherein the graph generator module comprises a model splitter sub-module that is configured to partition DL models of the DL workload across computing devices (col 1, line 37-48, col 14, line 8-26, the deep learning model is divided into multiple sub-models each including at least one operator, where the operators are assigned to computing devices).
As per claim 5, Yazdanbakhsh teaches accelerators (para[0005, 0054, 0077], hardware accelerator architectures having different accelerators).
In addition, Zhang teaches wherein the graph generator module comprises a model splitter sub-module that is configured to perform operator graph optimizations that consider a memory hierarchy of the computing devices (col 2, line 5-30, operators of the graph of the deep learning model (workloads) are assigned to computing devices, where the memory overhead for each of the computing devices to run the tasks are taken into consideration when selecting a computing device to be assigned).
As per claim 6, Yazdanbakhsh teaches to determine power usage of potential architectures defined by the architectural template (para[0029, 0036, 0057], determine architecture for hardware accelerators based on the design constraints including power consumption).
In addition, Zhang teaches wherein the local architecture search module comprises an architecture estimator sub-module that is configured to annotate latency estimations associated with execution of individual graph operators and to determine power usage of potential architectures defined by the architectural template (col 7, line 52-63, col 11, line 54-67, col 12, line 1-8, processing time, memory overhead, and transmission latency for each of the tasks are determined).
As per claim 7, Zhang teaches wherein the local architecture search module comprises a critical path analyzer sub-module that is configured to identify critical operators and determine latencies of the operators (col 7, line 52-63, col 11, line 54-67, col 12, line 1-8, processing time, memory overhead, transmission latency for each of the tasks are determined, and assigning the operators of the DL model to the computing devices for execution with a goal of minimizing reasoning completion time of the DL model, thus minimizing latency).
As per claim 8, Zhu teaches wherein the local architecture search module comprises an architecture search sub-module that is configured to perform a local search to determine a design of a single accelerator (para[0040, 0054-0055, 0090], search among the first subset of one or more hardware accelerator machines, where each hardware accelerator machines include one (single) or more accelerators).
As per claim 9, Zhu teaches wherein the local architecture search module comprises a convergence checker sub-module that is configured to track relative performance of hardware architectures identified by the global architecture search module (para[0040, 0054-0055, 0090], search among the first subset of one or more hardware accelerator machines, and perform actions represented by the task to determine the performance of the hardware accelerator machines, and if the action fails, then re-assign the workload to second subset of hardware accelerator machines).
As per claim 12, Yazdanbakhsh teaches the invention substantially as claimed including a system, comprising: storage configured to store computer-readable instructions; and a processor configured to execute the computer-readable instructions to: obtain a machine learning model for accomplishing a workload (para[0004, 0009, 0030, 0038-0040], determine hardware architectures for target computing device to perform machine learning tasks);
receive an architectural template that relates to multiple accelerator core types including at least a first accelerator core type and a second accelerator core type and generate a recommended hardware architecture for an accelerator (para[0003-0007, 0027, 0069, 0073-0074, 0088-0091], generate a plurality of candidate hardware accelerator architectures including multiple hardware accelerators, according to the values of hardware parameters, hardware design policy and based on the requirements of the task, and evaluate the candidate hardware architecture against the pre-evaluation criteria, and also evaluate the performance measures);
Yazdanbakhsh does not explicitly teach deep learning (DL) workloads; identify a first portion of the graph of operators to perform with an accelerator core of a first accelerator core type and a second portion of the graph of operators to perform with a second accelerator core of a second accelerator core type; hardware architecture for an accelerator that includes the first accelerator core and the second accelerator core.
However, Zhang teaches generate a graph of operators for the DL model (col 2, line 5-30, generating a graph of the deep learning model (workloads) where the graph represent the operators of the deep learning model);
identify a first portion of the graph of operators to perform with a first accelerator core of the first accelerator core type and a second portion of the graph of operators to perform with a second accelerator core of the second accelerator core type (col 2, line 5-30, assigning the operators of the DL model to the corresponding (selected) computing devices with a goal of minimizing reasoning completion time based on the parameters of the computing devices).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Zhang’s teaching to Yazdanbakhsh’s invention in order to provide a method for deep learning image classification in order to reasonably assign a plurality of operators in the deep learning model to a plurality of computing devices with a goal of minimizing a reasoning completion time of the deep learning model, thereby effectively improving the efficiency of executing the deep learning model for image classification by a plurality of computing devices (col 5, line 46-58).
Yazdanbakhsh and Zhang do not explicitly teach hardware architecture for a hardware chip that includes the first accelerator core and the second accelerator core.
However, Zhu teaches respective hardware architectures specify respective combinations of respective accelerator core types on a chip (para[0040, 0046, 0054-0055, 0129], FIG. 2, workload specifies requirements of hardware accelerators to perform operations, and selecting hardware accelerator machine (architectures) from a plurality of hardware accelerator machines including different types and sizes hardware accelerators).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Zhu’s teaching into Yazdanbakhsh and Zhang’s invention in order to provide a method for preflight checks for hardware accelerators in a system which executes workloads, where diagnosing failures is performed prior to the workload being executed in order to prevent future errors and inaccurate outputs and to reduce application downtime (para[0010-0011]).
As per claim 13, Yazdanbakhsh teaches wherein the workload comprises a training script for the learning model (para[0030, 0038-0040, 0060, 0117], determine hardware architectures for a target computing device to perform machine learning tasks);
wherein the architectural template defines areas of each core type and available chip area (para[0006, 0038, 0054], a space of candidate hardware accelerator architectures is defined by values of hardware parameters including different types of accelerators).
In addition, Zhang teaches DL model (col 2, line 5-30, generating a graph of the deep learning model (workloads) where the graph represents the operators of the deep learning model).
As per claim 17, Yazdanbakhsh teaches wherein the generating recommended accelerator hardware architecture comprises generating recommended hardware architectures and their corresponding accelerator schedules in a pipelined distributed training-based execution (para[0070-0075], FIG. 1, a candidate hardware architecture is recommended which satisfies the pre-evaluation criteria).
As per claim 18, Yazdanbakhsh teaches the invention substantially as claimed including a device-implemented method, comprising: generating a hardware architecture for a chip that includes the first accelerator core and the second accelerator core arranged together on the chip to collectively accomplish the learning model (para[0073-0074, 0077, 0084, 0088-0091, 0101, 0124], generate a plurality of candidate hardware accelerator architectures including multiple hardware accelerators, according to the values of hardware parameters, hardware design policy and based on the requirements of the task, supporting a different neural network model).
Yazdanbakhsh does not explicitly teach obtaining a deep learning training script associated with a deep learning model; extracting an operator graph from the deep learning training script, splitting the operator graph into first and second portions; tuning a first accelerator core for the first portion of the heterogeneous pipeline and a second accelerator core for the second portion of the heterogeneous pipeline.
However, Zhang teaches extracting an operator graph from the deep learning training script (col 6, line 45-67, FIG. 2a and 2b, directed acyclic graph is modeled based on the deep learning model, which extracts operators from the DL model);
splitting the operator graph into first and second portions (col 1, line 37-48, col 14, line 8-26, the deep learning model is divided into multiple sub-models each including at least one operator, where the operators are assigned to computing devices).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Zhang’s teaching into Yazdanbakhsh’s invention in order to provide a method for deep learning image classification that reasonably assigns a plurality of operators in the deep learning model to a plurality of computing devices with a goal of minimizing a reasoning completion time of the deep learning model, thereby effectively improving the efficiency of executing the deep learning model for image classification by a plurality of computing devices (col 5, line 46-58).
Yazdanbakhsh and Zhang do not explicitly teach tuning a first accelerator core for the first portion of the heterogeneous pipeline and a second accelerator core for the second portion of the heterogeneous pipeline.
However, Zhu teaches tuning a first accelerator core for the first portion of the heterogeneous pipeline and a second accelerator core for the second portion of the heterogeneous pipeline (para[0121, 0126, 0133], for each of the hardware accelerators, a program code package including a task action is installed, and the first and second hardware accelerators are selected to perform a portion of the user job).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Zhu’s teaching into Yazdanbakhsh and Zhang’s invention in order to provide a method for preflight checks for hardware accelerators in a system which executes workloads, where diagnosing failures is performed prior to the workload being executed in order to prevent future errors and inaccurate outputs and to reduce application downtime (para[0010-0011]).
As per claim 19, Zhu teaches wherein the tuning comprises tuning for a single accelerator core type, tuning for two accelerator core types, or tuning for more than two accelerator core types (para[0121, 0126, 0133], for each of the one or more hardware accelerators, a program code package including a task action is installed, and the first and second hardware accelerators are selected to perform a portion of the user job).
As per claim 20, Zhang teaches the generating comprises generating a scheduling recommendation that co-optimizes scheduling of the operator graph with the hardware architecture across an entire training pipeline defined by the operator graph (col 2, line 5-30, col 6, line 36-67, assigning the operators of the DL model to the corresponding (selected) computing devices with a goal of minimizing reasoning completion time based on the parameters of the computing devices).
As per claim 21, Zhu teaches the hardware architecture specifies a number of tensor processing units on the chip and a number of vector processing units on the chip (para[0004, 0078], hardware architecture includes a number of different accelerators including tensor processing units and vector processing units on chip).
As per claim 22, Zhang teaches the memory sizes (col 3, line 7-9, col 4, line 40-45, col 8, line 29-32, memory sizes of the computing devices).
In addition, Zhu teaches the hardware architecture specifies the tensor processing units and the vector processing units (para[0004, 0078], hardware architecture includes a number of different accelerators including tensor processing units and vector processing units on chip).
Claim(s) 14-16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Yazdanbakhsh in view of Zhang and Zhu as applied to claim 12 above, and further in view of Ma et al. US Pub 2023/0107440 (hereafter Ma).
Reference Ma was cited in the previous office action.
As per claim 14, Yazdanbakhsh, Zhang and Zhu teach the system of claim 12, but they do not explicitly teach wherein the identifying comprises performing compiler optimizations on the graph.
However, Ma teaches the identifying comprises performing compiler optimizations on the graph (para[0036, 0093, 0095], the graph compilation optimization optimizes the graph of the deep learning models).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Ma’s teaching into Yazdanbakhsh, Zhang and Zhu’s invention in order for a deep learning framework to achieve the optimal running efficiency on hardware, by providing a framework having custom graph optimization strategies which may be adapted by hardware devices according to their own characteristics, so that the framework can take full advantage of the performance of the hardware (para[0036, 0045]).
As per claim 15, Yazdanbakhsh, Zhang, Zhu and Ma teach the system of claim 14, and Yazdanbakhsh teaches wherein the identifying further comprises generating parallelization schemes for training the DL model (para[0053, 0129], machine learning task includes combination of multiple tasks, and the tasks are processed in parallel).
In addition, Zhang teaches DL model (col 2, line 5-30, generating a graph of the deep learning model (workloads) where the graph represent the operators of the deep learning model).
As per claim 16, Yazdanbakhsh teaches wherein the generating recommended accelerator hardware architecture further comprises generating scheduling recommendations for the recommended accelerator hardware architecture (para[0069, 0073-0074, 0088-0091, 0096], generate a plurality of candidate hardware accelerator architectures including multiple hardware accelerators, according to the values of hardware parameters, hardware design policy and based on the requirements of the task, and evaluate candidate hardware architecture against the pre-evaluation criteria and also evaluate the performance measures to generate a final hardware accelerator architecture).
Allowable Subject Matter
Claims 10-11 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, and 35 U.S.C. 101 set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAMMY EUNHYE LEE whose telephone number is (571)270-7773. The examiner can normally be reached Mon, Tues, Thur 9AM-4PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai An can be reached at (571)272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TAMMY E LEE/Primary Examiner, Art Unit 2195