DETAILED ACTION
Applicant’s request for reconsideration of the finality of the rejection of the last Office action is persuasive and, therefore, the finality of that action is withdrawn. This action is responsive to Applicant’s reply filed 06 October 2025. This action is made non-final.
See attached Interview Summary.
Status of Claims
Claims 1, 7, 9, 11 and 18 are currently amended.
Claims 8 and 17 are canceled.
Claims 21-22 are added.
Claims 1-7, 9-16 and 18-22 are currently pending and under examination, of which claims 1, 11 and 18 are independent.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant’s arguments regarding the art rejections of claims 1-7, 9-10 and 18-21 are moot in view of the new grounds of rejection necessitated by Applicant’s amendment.
In regards to the rejection of claims 1-7, 9-10 and 18-21 under 35 U.S.C. 101 as being directed to an abstract idea without significantly more, Applicant argues the claims are not directed to an abstract idea but rather to an improvement to the technical problem of generating machine learning models that consider the availability of dedicated circuitry on inference hardware (see Applicant’s response, pages 9-11). On page 10, Applicant argues the step of generating second machine learning models by replacing individual inference operations with inference operations that are supported by dedicated circuitry improves latency, power consumption, and memory utilization. This argument is not persuasive because improved latency, power consumption, and memory utilization are not reflected in the claims.
On page 11, Applicant argues that dedicated circuitry and hardware emulation provide a technical solution to the problem of generating machine learning models that consider the availability of dedicated circuitry on inference hardware. This argument is not persuasive because an improvement to this technical problem is not reflected in the claims. Applicant further argues on page 11 that amended claim 1 is rooted in a technological environment of “dedicated circuitry” and “hardware emulation”. This argument is not persuasive because “dedicated circuitry” and “hardware emulation” are recited so generically that the limitations amount to merely indicating a field of use or technological environment in which to apply a judicial exception.
On page 12, Applicant argues the generating second machine learning models and executing the second machine learning models steps are not well-understood, routine, or conventional (WURC) activity. This argument is not persuasive because the Examiner did not point to these steps as WURC in the prior Office action. The Examiner only pointed to the “obtaining a first machine learning model having one or more first inference operations” step of original claim 1 as amounting to mere data gathering, thus adding insignificant extra-solution activity to the judicial exception and not integrating the judicial exception into a practical application. Thus, the rejections of claims 1-7, 9-10 and 18-21 as being directed to an abstract idea without significantly more are maintained.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-7, 9-10 and 18-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent Claim 1
Step 2A Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, independent claim 1, under the broadest reasonable interpretation, recites the following limitations that are abstract ideas:
identifying a plurality of second inference operations that are supported by an inference hardware architecture; (mental process)
based at least on the executing, determining one or more metrics that characterize performance of the second machine learning models; (mental process)
and selecting a final machine learning model from the second machine learning models based at least on the one or more metrics (mental process)
The “identifying” step involves determining which inference operations are supported by a hardware architecture, which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of identifying inference operations at a high degree of generality; thus, the step is not required to have any specific level of complexity that would preclude it from being a mental process. Therefore, the “identifying” step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III).
The “determining” step involves identifying metrics that are significant in predicting model performance, which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of determining one or more metrics at a high degree of generality; thus, the step is not required to have any specific level of complexity that would preclude it from being a mental process. Therefore, the “determining” step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III).
The “selecting” step involves identifying a machine learning model based on metrics, which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of selecting a final machine learning model at a high degree of generality; thus, the step is not required to have any specific level of complexity that would preclude it from being a mental process. Therefore, the “selecting” step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III).
Therefore, the independent claim recites a judicial exception.
Step 2A Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the judicial exception recited above is not integrated into a practical application. The claims recite the following additional elements, but these additional elements are not sufficient to integrate the judicial exception into a practical application:
obtaining a first machine learning model having one or more first inference operations; (MPEP § 2106.05(g) necessary data gathering and insignificant extra-solution activity to the judicial exception)
generating second machine learning models by replacing individual first inference operations of the first machine learning model with individual second inference operations that are supported by dedicated circuitry of the inference hardware architecture; (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
executing the second machine learning models using hardware emulation of the individual second inference operations; (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
The “obtaining” step amounts to mere data gathering and is recited at a high level of generality, thus adding insignificant extra-solution activity to the judicial exception – see MPEP § 2106.05(g). Under MPEP § 2106.05(d), such additional elements have been found by the courts to not integrate a judicial exception into a practical application.
The “generating” step is recited at a high level of generality such that the limitation amounts to no more than mere instructions to “apply” the judicial exception on a computer. It can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, see MPEP § 2106.05(f).
The “executing” step is recited at a high level of generality such that the limitation amounts to no more than mere instructions to “apply” the judicial exception on a computer. It can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, see MPEP § 2106.05(f).
Therefore, the above limitations do not integrate the judicial exception into a practical application.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The claims do not include additional elements that are sufficient for the claims to amount to significantly more than the judicial exception.
In regards to the “obtaining” step, this step adds insignificant extra-solution activity that is also a well-understood, routine, and conventional (WURC) activity. Per MPEP § 2106.05(d)(II), “the courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network, e.g., using the Internet to gather data.” The “obtaining” step does not integrate the judicial exception into a practical application and does not amount to significantly more.
In regards to the “generating” and “executing” steps, the limitations are recited so generically that they amount to no more than mere instructions to “apply” the judicial exception on a computer using generic computer components. Mere instructions to apply a judicial exception cannot provide an inventive concept. See MPEP § 2106.05(f).
Therefore, independent claim 1 is not patent eligible.
Dependent Claims 2-7, 9-10 and 21
The remaining rejected dependent claims do not recite additional elements, whether considered individually or in combination, that are sufficient to integrate the judicial exception into a practical application or to amount to significantly more than the judicial exception.
Dependent claim 2 recites the further limitation of “wherein the one or more metrics relate to losses or accuracy of the second machine learning models.” The limitation is recited at a high level of generality such that it amounts to no more than mere instructions to “apply” the judicial exception on a computer, and can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers. Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea does not integrate the exception into a practical application, see MPEP § 2106.05(f). The limitation does not integrate the judicial exception into a practical application and does not amount to significantly more.
Dependent claim 3 recites the further limitation of “wherein the one or more metrics relate to latencies, power consumption, or memory utilization of the second machine learning models.” The limitation is recited at a high level of generality such that it amounts to no more than mere instructions to “apply” the judicial exception on a computer, and can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers. Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea does not integrate the exception into a practical application, see MPEP § 2106.05(f). The limitation does not integrate the judicial exception into a practical application and does not amount to significantly more.
Dependent claim 4 recites the further limitation of “simulating execution of the second machine learning models on a central processing unit to determine the one or more metrics.” The step is recited at a high level of generality such that it amounts to no more than mere instructions to “apply” the judicial exception on a computer, and can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers. Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea does not integrate the exception into a practical application, see MPEP § 2106.05(f). The step does not integrate the judicial exception into a practical application and does not amount to significantly more.
Dependent claim 5 recites the further limitations:
determining a frontier of the second machine learning models with respect to multiple metrics; (mental process)
and selecting the final machine learning model from the frontier (mental process)
The “determining” step involves using metrics to determine a frontier, which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of determining a frontier at a high degree of generality; thus, the step is not required to have any specific level of complexity that would preclude it from being a mental process. Therefore, the “determining” step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III).
The “selecting” step involves identifying a machine learning model from a frontier, which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of selecting a final machine learning model at a high degree of generality; thus, the step is not required to have any specific level of complexity that would preclude it from being a mental process. Therefore, the “selecting” step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III). This claim does not recite any non-abstract additional elements.
Dependent claim 6
Step 2A Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, dependent claim 6, under the broadest reasonable interpretation, recites the following limitations that are abstract ideas:
performing two or more iterations of selecting a subset of the second machine learning models for further modification (mental process)
The “performing” step involves identifying subsets of machine learning models, which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of selecting a subset of second machine learning models at a high degree of generality; thus, the step is not required to have any specific level of complexity that would preclude it from being a mental process. Therefore, the “performing” step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III).
Therefore, dependent claim 6 recites a judicial exception.
Step 2A Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the judicial exception recited above is not integrated into a practical application. The claims recite the following additional elements, but these additional elements are not sufficient to integrate the judicial exception into a practical application:
and generating further second machine learning models from the selected subset (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
The “generating” step is recited at a high level of generality such that the limitation amounts to no more than mere instructions to “apply” the judicial exception on a computer. It can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, see MPEP § 2106.05(f).
Therefore, the above limitations do not integrate the judicial exception into a practical application.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The claims do not include additional elements that are sufficient for the claims to amount to significantly more than the judicial exception.
In regards to the “generating” step, the step is recited so generically that it amounts to no more than mere instructions to “apply” the judicial exception on a computer using generic computer components. Mere instructions to apply a judicial exception cannot provide an inventive concept. See MPEP § 2106.05(f).
Therefore, dependent claim 6 is not patent eligible.
Dependent claim 7 recites the further limitations:
wherein generating an individual second machine learning model comprises removing, from the first machine learning model, an individual first inference operation that is not supported by corresponding dedicated circuitry of the inference hardware architecture (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
and replacing the individual first inference operation with a particular second inference operation that is supported by corresponding dedicated circuitry of the inference hardware architecture (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
The “removing” and “replacing” steps are recited at a high level of generality such that the limitations amount to no more than mere instructions to “apply” the judicial exception on a computer, and can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers. Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea does not integrate the exception into a practical application, see MPEP § 2106.05(f). The steps do not integrate the judicial exception into a practical application and do not amount to significantly more.
Dependent claim 9
Step 2A Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, dependent claim 9, under the broadest reasonable interpretation, recites the following limitations that are abstract ideas:
and using the respective per-operation metrics to select individual second machine learning models as parent models for further modification or to select the final machine learning model (mental process)
The “select” step involves identifying optimal machine learning models using metrics, which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of selecting machine learning models at a high degree of generality; thus, the step is not required to have any specific level of complexity that would preclude it from being a mental process. Therefore, the “select” step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III).
Therefore, dependent claim 9 recites a judicial exception.
Step 2A Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the judicial exception recited above is not integrated into a practical application. The claims recite the following additional elements, but these additional elements are not sufficient to integrate the judicial exception into a practical application:
obtaining respective per-operation metrics via the hardware emulation; (MPEP § 2106.05(g) necessary data gathering and insignificant extra-solution activity to the judicial exception)
The “obtaining” step amounts to mere data gathering and is recited at a high level of generality, thus adding insignificant extra-solution activity to the judicial exception – see MPEP § 2106.05(g). Under MPEP § 2106.05(d), such additional elements have been found by the courts to not integrate a judicial exception into a practical application.
Therefore, the above limitations do not integrate the judicial exception into a practical application.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The claims do not include additional elements that are sufficient for the claims to amount to significantly more than the judicial exception.
In regards to the “obtaining” step, this step adds insignificant extra-solution activity that is also a well-understood, routine, and conventional (WURC) activity. Per MPEP § 2106.05(d)(II), “the courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network, e.g., using the Internet to gather data.” The “obtaining” step does not integrate the judicial exception into a practical application and does not amount to significantly more.
Therefore, dependent claim 9 is not patent eligible.
Dependent claim 10 recites the further limitation of “outputting multiple final machine learning models selected according to different metrics.” The step is recited at a high level of generality such that it amounts to no more than mere instructions to “apply” the judicial exception on a computer, and can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers. Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea does not integrate the exception into a practical application, see MPEP § 2106.05(f). The step does not integrate the judicial exception into a practical application and does not amount to significantly more.
Dependent claim 21 recites the further limitations:
a systolic array configured to perform multiply and accumulate operations on input data using parallel nodes, (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea)
wherein the systolic array includes at least: first dedicated circuitry configured to perform a first convolution operation with a first input tensor size, a first output tensor size, and a first kernel size, (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea)
and second dedicated circuitry configured to perform a second convolution operation with a second input tensor size, a second output tensor size, and a second kernel size, (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea)
wherein the first machine learning model includes a third convolution operation having a third input tensor size, a third output tensor size, and a third kernel size (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
and the systolic array does not have dedicated circuitry for the third convolution operation, (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
and the final machine learning model is generated by replacing the third convolution operation with at least one of the first convolution operation or the second convolution operation (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
The “systolic array”, “first dedicated circuitry”, “second dedicated circuitry”, and “third convolution operation” limitations are recited at a high level of generality such that they amount to no more than mere instructions to “apply” the judicial exception on a computer, and can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, see MPEP § 2106.05(f).
The “and the systolic array does not have dedicated circuitry …” and “replacing” limitations are likewise recited at a high level of generality such that they amount to no more than mere instructions to “apply” the judicial exception on a computer, and can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers. Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea does not integrate the exception into a practical application, see MPEP § 2106.05(f). These limitations do not integrate the judicial exception into a practical application and do not amount to significantly more.
Independent Claim 18
Step 2A Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, independent claim 18, under the broadest reasonable interpretation, recites the following limitations that are abstract ideas:
determine a device context for the computing device; (mental process)
based at least on the device context, select a particular machine learning model from a plurality of machine learning models available to the computing device; (mental process)
The “determine” step involves identifying device information for a computing device, which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of determining a device context at a high degree of generality; thus, the step is not required to have any specific level of complexity that would preclude it from being a mental process. Therefore, the “determine” step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III).
The “select” step involves identifying a machine learning model based on a device context, which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the step of selecting a particular machine learning model at a high degree of generality; thus, the step is not required to have any specific level of complexity that would preclude it from being a mental process. Therefore, the “select” step is considered to be a mental process, see MPEP § 2106.04(a)(2)(III).
Therefore, the independent claim recites a judicial exception.
Step 2A Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the judicial exception recited above is not integrated into a practical application. The claims recite the following additional elements, but these additional elements are not sufficient to integrate the judicial exception into a practical application:
a hardware processing unit configured to execute a plurality of supported inference operations; (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea)
respective supported inference operations having corresponding dedicated circuitry on the hardware processing unit; (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
and a storage resource storing computer-readable instructions which, when executed by the hardware processing unit, cause the hardware processing unit to (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea)
the plurality of machine learning models having different supported inference operations corresponding to different dedicated circuitry on the hardware processing unit (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
and execute the particular machine learning model to perform a particular task (MPEP § 2106.05(f) mere instructions to implement an abstract idea on a computer, or generally links exception to a technological environment)
The “respective supported inference …”, “plurality of machine learning models …”, and “execute” steps are recited at a high level of generality such that the limitations amount to no more than mere instructions to “apply” the judicial exception on a computer. They can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, see MPEP § 2106.05(f).
The remaining additional elements are recited at a high level of generality such that they amount to no more than mere instructions to “apply” the judicial exception using generic components. Adding the words “apply it” (or an equivalent) with the judicial exception, mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea does not integrate the exception into a practical application, see MPEP § 2106.05(f).
Therefore, the above limitations do not integrate the judicial exception into a practical application.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The claims do not include additional elements that are sufficient for the claims to amount to significantly more than the judicial exception.
In regards to the “respective supported inference …”, “plurality of machine learning models …”, and “execute” steps, as well as the remaining additional elements, the limitations are recited so generically that they amount to no more than mere instructions to “apply” the judicial exception on a computer using generic computer components. Mere instructions to apply a judicial exception cannot provide an inventive concept. See MPEP § 2106.05(f).
Therefore, independent claim 18 is not patent eligible.
Dependent Claims 19-20
The remaining dependent claims being rejected do not recite additional elements, whether considered individually or in combination, that are sufficient to integrate the judicial exception into a practical application or amount to significantly more than a judicial exception.
Dependent claim 19 recites the further limitation of “the device context relating to availability of power or memory on the computing device.” The step is recited at a high level of generality such that the limitation amounts to no more than mere instructions to “apply” the judicial exception on a computer. It can also be viewed as nothing more than an attempt to generally link the use of the judicial exception to the technological environment of computers, see MPEP § 2106.05(f). Merely adding the words “apply it” (or an equivalent) to the judicial exception, providing mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea does not integrate the exception into a practical application, see MPEP § 2106.05(f). The step does not integrate the judicial exception into a practical application and does not amount to significantly more.
Dependent claim 20 recites the further limitations:
in a first instance when availability of memory for the computing device is constrained, select a first machine learning model as the particular machine learning model to execute to perform the particular task, the first machine learning model having been generated based at least on a first metric relating to memory utilization; (mental process)
and in a second instance when availability of power to the computing device is constrained, select a second machine learning model as the particular machine learning model to execute to perform the particular task, the second machine learning model having been generated based at least on a second metric relating to power consumption (mental process)
The “select” steps involve identifying machine learning models based on metrics, which amounts to no more than observations, evaluations, and judgments that can be performed in the human mind or with the use of a physical aid (e.g., pen and paper). The claim recites the steps of selecting machine learning models at a high degree of generality; thus, the steps are not required to have any specific level of complexity that would preclude them from being mental processes. Therefore, the “select” steps are considered to be mental processes, see MPEP § 2106.04(a)(2)(III). This claim does not recite any non-abstract additional elements.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-7, 10 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kobayashi (US 20180365557 A1) in view of Das et al. (US 20210350203 A1), hereinafter Das.
With respect to claim 1, Kobayashi teaches:
A method performed on a computing device, the method comprising (Kobayashi discloses “there is provided an information processing method including: generating, by a processor, another neural network with a different network structure from an evaluated neural network; acquiring an evaluation result of the generated neural network; updating a Pareto optimal solution relating to an evaluated neural network on a basis of the evaluation result of the generated neural network; and generating another neural network with a different network structure from a neural network relating to the Pareto optimal solution” [0006].):
obtaining a first machine learning model having one or more first inference operations (Kobayashi discloses a seed network (‘a first machine learning model’), “FIG. 1 is a diagram for explaining generation of a neural network by mutation. Referring to FIG. 1, it can be seen that a seed network SN includes 10 layers including “Input” and “Output”. Further, as illustrated in an example in FIG. 1, the neural network according to the present disclosure may include a middle layer, an activating function, or the like, as well as the input and output layers. For example, in the example in FIG. 1. “Conv1” and “Conv2” indicate Convolution layers, and “Pool1” and “Pool2” indicate Max-Pooling” [0056-0057]. See Figure 1.);
identifying a plurality of second inference operations … (Kobayashi discloses “the neural network MN1 is another neural network generated by causing the seed network SN to mutate. Referring to the neural network MN1, it can be seen that part of a layer configuration changes from that in a network structure of the seed network SN. Specifically, in the neural network MN1, an activating function “relu1” relating to the seed network SN changes to another activating function “Tanh1”. In this manner, with the information processing method according to the present disclosure, by changing layer types of layers constituting a network structure, it is possible to generate another neural network with a different network structure” [0058]. See Figure 1.);
generating second machine learning models by replacing individual first inference operations of the first machine learning model with individual second inference operations … (Kobayashi discloses “the generating unit 310 randomly determines a generation method of another neural network to be applied to the original neural network (S1101). In this event, the original neural network may be the seed network designated by the user or may be a network randomly selected by the generating unit 310 from neural networks relating to Pareto optimal solutions updated by the evaluating unit 320. The generating unit 310 then generates another neural network with a different network structure from the original neural network on the basis of the generation method selected in step S1101. Referring to an example illustrated in FIG. 5, the generating unit 310 according to the present embodiment may generate the above-described another neural network by causing the original neural network to mutate (S1102)” [0085-0086].
Kobayashi discloses “mutation according to the present embodiment may include insertion of a layer, deletion of a layer, change of a layer type, change of a parameter, a graph branch and deletion of a graph branch. Referring to FIG. 6, first, the generating unit 310 randomly determines a method of mutation to be applied to the original neural network (S1201). Subsequently, the generating unit 310 changes a network structure of the original neural network on the basis of the method selected” [0093-0094]. See [0095-0098] discussing the process of performing mutations.);
executing the second machine learning models using hardware emulation of the individual second inference operations (Kobayashi discloses “the evaluating unit 320 has a function of acquiring an evaluation result of the generated neural network. The evaluating unit 320 may acquire the above-described evaluation result by, for example, causing a computing resource on cloud to execute the generated neural network. Further, the evaluating unit 320 may acquire the evaluation result by causing an emulator or various kinds of devices connected via the network 20 to execute the neural network. Further, the evaluation result acquired by the evaluating unit 320 may include a calculation amount relating to the generated neural network and at least one of a training error and a validation error (hereinafter, the training error and the validation error may be collectively expressed as an error)” [0079-0080].);
based at least on the executing, determining one or more metrics that characterize performance of the second machine learning models (Kobayashi discloses “the evaluating unit 320 has a function of acquiring an evaluation result of the generated neural network. The evaluating unit 320 may acquire the above-described evaluation result by, for example, causing a computing resource on cloud to execute the generated neural network. Further, the evaluating unit 320 may acquire the evaluation result by causing an emulator or various kinds of devices connected via the network 20 to execute the neural network. Further, the evaluation result acquired by the evaluating unit 320 may include a calculation amount relating to the generated neural network and at least one of a training error and a validation error (hereinafter, the training error and the validation error may be collectively expressed as an error)” [0079-0080].);
and selecting a final machine learning model from the second machine learning models based at least on the one or more metrics (Kobayashi discloses Figure 9 (reproduced below) depicting Pareto optimal solutions (labeled P4 – P6). Pareto optimal solutions depict the tradeoff between error and calculation amount (performance) metrics.
[media_image1.png: Kobayashi, Figure 9 (greyscale)]
Kobayashi further discloses “trade-off information relating to an error and a calculation amount is presented to the user, the trade-off information according to the present embodiment is not limited to such an example. In the trade-off information according to the present embodiment, for example, memory usage, an amount of heat generation, power consumption, or the like, relating to hardware may be used as well as the calculation amount. Further, in the trade-off information, total cost of hardware calculated from the calculation amount, total service cost, or the like, including server cost, or the like, may be used” [0116].
Kobayashi discloses “it is possible to present a candidate selected from the neural networks relating to the Pareto optimal solutions to the user. Here, the above-described candidate may include a network relating to maximum performance, a network relating to an intermediate solution and a network relating to a minimum calculation amount. Further, with the information processing method according to the present embodiment, it is possible to allow the user to download a file relating to execution of a network. By this means, the user can easily select a network which satisfies conditions and acquire a file relating to execution of the network” [0125].).
However, Kobayashi does not teach identifying second inference operations that are supported by an inference hardware architecture and replacing individual inference operations with individual second inference operations that are supported by dedicated circuitry of the inference hardware architecture, which is taught by Das:
obtaining a first machine learning model having one or more first inference operations (Das discloses “the NAS controller (110) is configured to select the optimal neural blocks from the plurality of neural blocks based on the quality of each neural block. Further, the NAS controller (110) is configured to generate a standard DNN model using the optimal neural blocks. Further, the NAS controller (110) is configured to optimize the standard DNN model by modifying unsupported operations used for the execution of the task with supported operations to generate the optimized DNN model” [0066].);
identifying a plurality of second inference operations that are supported by an inference hardware architecture (Das discloses “the NAS controller (110) is configured to search for standard operations at a knowledgebase to replace the unsupported operations. Further, the NAS controller (110) is configured to replace the unsupported operations with the standard operations, and retraining the neural block with the standard operations, when the standard operations are available” [0071].
Das discloses “FIG. 15B illustrates a flow diagram that includes steps performed in the method of customizing the operations, according to an embodiment as disclosed herein. After constructing the final instance deployable model (1506), the electronic device (100) searches to the knowledgebase of supported/unsupported operations and uses the decision metamodel (505A) to predict the optimal replacement (1102) for the unsupported operations (1101) in the final instance deployable model (1506). Further, the controller (905) of the NAS platform (900) evaluates suitability of the best candidate operations based on the performance, the hardware compatibility, the efficiency and the task precision” [0225].);
generating [a] second machine learning [model] by replacing individual first inference operations of the first machine learning model with individual second inference operations that are supported by dedicated circuitry of the inference hardware architecture (Das discloses “After constructing the final instance deployable model (1506), the electronic device (100) searches to the knowledgebase of supported/unsupported operations and uses the decision metamodel (505A) to predict the optimal replacement (1102) for the unsupported operations (1101) in the final instance deployable model (1506) … As shown in the FIG. 15B, the Maxout, a Leaky ReLU, the ELU in the original operations (1101) are the unsupported operations for the NPU (904B) in the final instance deployable model (1506). The controller (905) of the NAS platform (900) replaces the Maxout, the Leaky ReLU, the ELU in the original operations (1101) with the PAU, a tanh, the PAU operations in the alternative operations (1102) respectively that are supporting by the NPU (904B) for the deployment. Further, the controller (905) generates the final instance deployable model (1507) with the supporting operations suitable for the NPU (904B)” [0225].
A final instance deployable model (‘first machine learning model’) is constructed. The unsupported operations of the final instance deployable model (the Maxout, Leaky ReLU, and ELU original operations) are replaced by the PAU, tanh, and PAU operations that are supported by an NPU (Neural Processing Unit). The result of replacing unsupported operations is a generated final instance deployable model (‘second machine learning model’) containing supported operations suitable for the NPU (‘inference hardware architecture’). See [0135] describing how a Neural Processing Unit is an AI-dedicated processor. That an NPU designed for dedicated AI processing supports various operations implies those operations are supported by “dedicated circuitry”.
Das further discloses “Operations inside the DNN models are depended on hardware components that are suitable for the execution. Some operations in the DNN models may not be supported by other computing units due to not having enough memory bandwidth at the electronic device or a number precision to perform a complex tensor operation. This will cause significant commercial loss due to lower performance in use cases for certain electronic devices or may cause up to 30% of model drop ratio. The proposed method can be used to optimize the DNN model by changing/approximating unsupported operations with supported operations” [0061].).
Das teaches that searching for operations that are supported by an NPU and generating an optimized model by replacing unsupported operations is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the method of Kobayashi with the technique disclosed by Das to replace unsupported operations in a neural network. By replacing unsupported operations, a neural network can be made compatible with the target hardware, leading to improved performance and lower power consumption.
With respect to claim 2, the combination of Kobayashi in view of Das teaches:
the method of claim 1, wherein the one or more metrics relate to losses or accuracy of the second machine learning models (Kobayashi discloses “the evaluation result acquired by the evaluating unit 320 may include a calculation amount relating to the generated neural network and at least one of a training error and a validation error (hereinafter, the training error and the validation error may be collectively expressed as an error). The evaluating unit 320 can acquire the above-described calculation amount on the basis of a network structure of the generated neural network. Further, the evaluating unit 320 has a function of updating a Pareto optimal solution relating to the evaluated neural network on the basis of the evaluation result of the generated neural network. That is, the evaluating unit 320 acquires the evaluation result of the neural network generated by the generating unit 310 and repeatedly executes updating of the Pareto optimal solution on the basis of the evaluation result” [0080-0081]. See Figure 9 (reproduced above) and explanation discussing Pareto optimal solutions.).
With respect to claim 3, the combination of Kobayashi in view of Das teaches:
the method of claim 1, wherein the one or more metrics relate to latencies, power consumption, or memory utilization of the second machine learning models (Kobayashi discloses “the evaluation result acquired by the evaluating unit 320 may include a calculation amount relating to the generated neural network and at least one of a training error and a validation error (hereinafter, the training error and the validation error may be collectively expressed as an error). The evaluating unit 320 can acquire the above-described calculation amount on the basis of a network structure of the generated neural network. Further, the evaluating unit 320 has a function of updating a Pareto optimal solution relating to the evaluated neural network on the basis of the evaluation result of the generated neural network. That is, the evaluating unit 320 acquires the evaluation result of the neural network generated by the generating unit 310 and repeatedly executes updating of the Pareto optimal solution on the basis of the evaluation result” [0080-0081].
Kobayashi further discloses “trade-off information relating to an error and a calculation amount is presented to the user, the trade-off information according to the present embodiment is not limited to such an example. In the trade-off information according to the present embodiment, for example, memory usage, an amount of heat generation, power consumption, or the like, relating to hardware may be used as well as the calculation amount. Further, in the trade-off information, total cost of hardware calculated from the calculation amount, total service cost, or the like, including server cost, or the like, may be used” [0116]. See Figure 9 (reproduced above) and explanation discussing how Pareto optimal solutions depict trade-offs.).
With respect to claim 4, the combination of Kobayashi in view of Das teaches:
the method of claim 1, further comprising: simulating execution of the second machine learning models on a central processing unit to determine the one or more metrics (Kobayashi discloses “the evaluating unit 320 has a function of acquiring an evaluation result of the generated neural network. The evaluating unit 320 may acquire the above-described evaluation result by, for example, causing a computing resource on cloud to execute the generated neural network. Further, the evaluating unit 320 may acquire the evaluation result by causing an emulator or various kinds of devices connected via the network 20 to execute the neural network. Further, the evaluation result acquired by the evaluating unit 320 may include a calculation amount relating to the generated neural network and at least one of a training error and a validation error (hereinafter, the training error and the validation error may be collectively expressed as an error)” [0079-0080].).
With respect to claim 5, the combination of Kobayashi in view of Das teaches:
the method of claim 1, further comprising: determining a frontier of the second machine learning models with respect to multiple metrics (Kobayashi discloses Figure 9 (reproduced above) depicting Pareto optimal solutions on a boundary (‘frontier’). As explained above, Pareto optimal solutions represent the trade-off between error and calculation amount (‘multiple metrics’) of the generated neural networks.
Kobayashi further discloses “the boundary PL of the Pareto optimal solution illustrated in FIG. 8B is updated on the basis of the evaluation result of the neural network generated by the generating unit 310. In the example illustrated in FIG. 8B, validation errors P1 to P3 of neural networks relating to new Pareto optimal solutions are displayed on the boundary PL of the Pareto optimal solution” [0112].);
and selecting the final machine learning model from the frontier (Kobayashi discloses “it is possible to present a candidate selected from the neural networks relating to the Pareto optimal solutions to the user. Here, the above-described candidate may include a network relating to maximum performance, a network relating to an intermediate solution and a network relating to a minimum calculation amount. Further, with the information processing method according to the present embodiment, it is possible to allow the user to download a file relating to execution of a network. By this means, the user can easily select a network which satisfies conditions and acquire a file relating to execution of the network” [0125].).
With respect to claim 6, the combination of Kobayashi in view of Das teaches:
the method of claim 1, further comprising: performing two or more iterations of selecting a subset of the second machine learning models for further modification and generating further second machine learning models from the selected subset (Kobayashi discloses “the evaluating unit 320 can acquire an evaluation result of the generated neural network and update the Pareto optimal solution on the basis of the evaluation result. Further, the generating unit 310 may generate another network on the basis of the neural network randomly selected from the neural networks P1 to P3 relating to the Pareto optimal solutions updated by the evaluating unit 320. That is, with the information processing method according to the present embodiment, another neural network is generated from a neural network relating to a Pareto optimal solution, and updating of the Pareto optimal solution based on the evaluation of the other neural network is repeatedly executed” [0113].).
With respect to claim 7, the combination of Kobayashi in view of Das teaches:
the method of claim 1, wherein generating an individual second machine learning model comprises removing, from the first machine learning model, an individual first inference operation … and replacing the individual first inference operation with a particular second inference operation … (Kobayashi discloses “mutation according to the present embodiment may include insertion of a layer, deletion of a layer, change of a layer type, change of a parameter, a graph branch and deletion of a graph branch. Referring to FIG. 6, first, the generating unit 310 randomly determines a method of mutation to be applied to the original neural network (S1201). Subsequently, the generating unit 310 changes a network structure of the original neural network on the basis of the method selected” [0093-0094].).
However, Kobayashi does not teach replacing inference operations that are not supported by an inference hardware architecture, which is taught by Das:
wherein generating an individual second machine learning model comprises removing, from the first machine learning model, an individual first inference operation that is not supported by corresponding dedicated circuitry of the inference hardware architecture (Das discloses “After constructing the final instance deployable model (1506), the electronic device (100) searches to the knowledgebase of supported/unsupported operations and uses the decision metamodel (505A) to predict the optimal replacement (1102) for the unsupported operations (1101) in the final instance deployable model (1506) … As shown in the FIG. 15B, the Maxout, a Leaky ReLU, the ELU in the original operations (1101) are the unsupported operations for the NPU (904B) in the final instance deployable model (1506). The controller (905) of the NAS platform (900) replaces the Maxout, the Leaky ReLU, the ELU in the original operations (1101) with the PAU, a tanh, the PAU operations in the alternative operations (1102) respectively that are supporting by the NPU (904B) for the deployment. Further, the controller (905) generates the final instance deployable model (1507) with the supporting operations suitable for the NPU (904B)” [0225].
A final instance deployable model (‘first machine learning model’) is constructed. The unsupported operations of the final instance deployable model (the Maxout, Leaky ReLU, and ELU original operations) are replaced (‘removed’) by the PAU, tanh, and PAU operations that are supported by an NPU (Neural Processing Unit).)
and replacing the individual first inference operation with a particular second inference operation that is supported by corresponding dedicated circuitry of the inference hardware architecture (Das discloses “The controller (905) of the NAS platform (900) replaces the Maxout, the Leaky ReLU, the ELU in the original operations (1101) with the PAU, a tanh, the PAU operations in the alternative operations (1102) respectively that are supporting by the NPU (904B) for the deployment. Further, the controller (905) generates the final instance deployable model (1507) with the supporting operations suitable for the NPU (904B)” [0225].
The result of replacing unsupported operations is a generated final instance deployable model (‘second machine learning model’) containing supported operations suitable for the NPU (‘inference hardware architecture’). See [0135] describing how a Neural Processing Unit is an AI-dedicated processor. That an NPU designed for dedicated AI processing supports various operations implies those operations are supported by “dedicated circuitry”.).
Das teaches that searching for operations that are supported by an NPU and generating an optimized model by replacing unsupported operations is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the method of Kobayashi with the technique disclosed by Das to replace unsupported operations in a neural network. By replacing unsupported operations, a neural network can be made compatible with the target hardware, leading to improved performance and lower power consumption.
With respect to claim 10, the combination of Kobayashi in view of Das teaches:
the method of claim 1, further comprising: outputting multiple final machine learning models selected according to different metrics (Kobayashi discloses “FIG. 9 is a diagram illustrating a configuration example of a form to be presented to the user when search of a network structure is finished. Referring to FIG. 9, a form F1 in which a search result is displayed includes a region V1 for displaying a Pareto optimal solution and a region V2 for displaying outline of the evaluation result. Here, referring to region V1, in an example illustrated in FIG. 9, it can be seen that neural networks P4 to P6 relating to three Pareto optimal solutions are highlighted in addition to a state of the Pareto optimal solutions illustrated in FIG. 8C. Here, the neural networks P4 to P6 may be respectively a network relating to maximum performance, a network relating to an intermediate solution and a network relating to a minimum calculation amount. In this event, the neural network P4 may be a network with the least error among the found networks” [0117-0119]. See Figure 9 (reproduced above).).
With respect to claim 18, Kobayashi teaches:
a computing device comprising (Kobayashi discloses “the information processing apparatus 10 according to the present disclosure includes a display unit 110, an input unit 120, a form control unit 130 and a server communication unit 140 … the display unit 110 may have a function as an input unit which accepts information input from the user. The function as the input unit can be implemented by, for example, a touch panel. The input unit 120 has a function of accepting information input from the user and handing over the input information to each component of the information processing apparatus 10” [0069-0071].):
a hardware processing unit configured to execute a plurality of … inference operations … (Kobayashi discloses a CPU, “each of the information processing apparatus 10 and the information processing server 30 includes, for example, a CPU 871, a ROM 872, a RAM 873, a host bus 874, a bridge 875, an external bus 876, an interface 877, an input apparatus 878, an output apparatus 879, a storage 880, a drive 881, a connection port 882, and a communication apparatus 883” [0202].
Kobayashi further discloses “generation of a neural network and updating of a Pareto optimal solution may be realized by the information processing apparatus 10. In this case, the form control unit 130 of the information processing apparatus 10 may generate another network on the basis of the seed network and transmit information relating to the other network to the information processing server 30” [0217]. See Figure 1 illustrating the generation of neural networks by mutating convolution layers (‘inference operations’).);
and a storage resource storing computer-readable instructions which, when executed by the hardware processing unit, cause the hardware processing unit to (Kobayashi discloses “the ROM 872 is a device that stores programs read by the CPU 871, data used for operations, and the like. For example, a program read by the CPU 871, various kinds of parameters that appropriately change when the program is executed, and the like are temporarily or permanently stored in the RAM 873” [0205].):
determine a device context for the computing device (Kobayashi discloses “the user may be able to designate a target to be optimized by selecting the optimization target (Optimize for). For example, the user may be able to designate optimization of both the learning accuracy and the calculation amount or may designate optimization of one of the learning accuracy and the calculation amount. By designating the optimization target, the user can obtain a search result which matches application. Further, the user may be able to designate a range of a network to be found by inputting a search range (Search Range). For example, the user may be able to designate a maximum value and a minimum value relating to a validation error and the number of times of multiply add of a network to be found. The user can prevent search of a network for which a calculation amount is too large or a network for which learning accuracy is too low by designating the above-described search range.” [0195-0196].
Kobayashi further discloses “search according to the present disclosure may be controlled on the basis of, for example, the number of times of trial of search designated by the user, and limit information of memory usage, or the like, relating to hardware in which a neural network is implemented. Setting of search according to the present disclosure can be changed as appropriate in accordance with specifications and operation relating to a neural network” [0201].
See Figure 20 illustrating a screen a user can use to configure neural network search settings.);
based at least on the device context, select a particular machine learning model from a plurality of machine learning models available to the computing device, the plurality of machine learning models having different … inference operations … (Kobayashi discloses “in search of a network structure according to the present disclosure, various settings by the user may be accepted. FIG. 20 is an example of a setting screen relating to search of the present disclosure. Here, the example illustrated in FIG. 20 may be an example of a screen displayed at the display unit 110 of the information processing apparatus 10. Referring to FIG. 20, the setting screen relating to search of the present disclosure may include, for example, setting items relating to a search method, an optimization target, a search range, early stopping and time limit” [0192-0193].
Kobayashi discloses Figure 9 (reproduced above) depicting a plurality of Pareto optimal solutions (labeled P4 – P6). Pareto optimal solutions depict the tradeoff between error and calculation amount (performance) metrics. Each Pareto optimal solution is derived from evaluating a generated neural network.
Kobayashi discloses “it is possible to present a candidate selected from the neural networks relating to the Pareto optimal solutions to the user. Here, the above-described candidate may include a network relating to maximum performance, a network relating to an intermediate solution and a network relating to a minimum calculation amount. Further, with the information processing method according to the present embodiment, it is possible to allow the user to download a file relating to execution of a network. By this means, the user can easily select a network which satisfies conditions and acquire a file relating to execution of the network” [0125].);
and execute the particular machine learning model to perform a particular task (Kobayashi discloses “the user may be able to download a file relating to execution of the corresponding neural network by operating the evaluation outline R1 to R3. Here, the file to be downloaded may include a configuration file of a parameter, an XML file which defines a network, a source code which executes ForwardProp (prediction and identification) by loading the above-described two files, or the like … it is possible to present a candidate selected from the neural networks relating to the Pareto optimal solutions to the user. Here, the above-described candidate may include a network relating to maximum performance, a network relating to an intermediate solution and a network relating to a minimum calculation amount. Further, with the information processing method according to the present embodiment, it is possible to allow the user to download a file relating to execution of a network. By this means, the user can easily select a network which satisfies conditions and acquire a file relating to execution of the network” [0124-0125].).
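For illustration only (not part of either reference's disclosure), the Pareto-optimal selection Kobayashi describes can be sketched in a few lines: each candidate network is scored on an error metric and a calculation amount, both minimized, and the Pareto front retains only candidates not dominated by any other. All names and values below are hypothetical.

```python
# Illustrative sketch (hypothetical values): a candidate network is Pareto
# optimal if no other candidate is at least as good on both metrics and
# strictly better on one, cf. Kobayashi's P4-P6 solutions in Figure 9.

def pareto_front(candidates):
    """Return the candidates not dominated by any other candidate."""
    front = []
    for name, err, cost in candidates:
        dominated = any(
            (e2 <= err and c2 <= cost) and (e2 < err or c2 < cost)
            for _, e2, c2 in candidates
        )
        if not dominated:
            front.append((name, err, cost))
    return front

candidates = [
    ("P4", 0.10, 900),  # maximum performance (lowest error)
    ("P5", 0.15, 500),  # intermediate solution
    ("P6", 0.25, 200),  # minimum calculation amount
    ("X1", 0.30, 600),  # dominated by P5 on both metrics
]
print(pareto_front(candidates))  # → [('P4', 0.1, 900), ('P5', 0.15, 500), ('P6', 0.25, 200)]
```

A user presented with this front then picks the candidate matching the desired trade-off, as in Kobayashi [0125].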
However, Kobayashi does not teach a hardware processing unit configured to execute a plurality of supported inference operations and a machine learning model having different supported inference operations corresponding to different dedicated circuitry, which is taught by Das:
a computing device comprising (Das discloses an electronic device (‘computing device’), “identifying, by an electronic device (100), a task needs to be executed in the electronic device (100). The method includes estimating, by the electronic device (100), a performance threshold at the time of execution of the identified task. The method includes identifying, by the electronic device (100), an operation capability of the electronic device (100). The method includes configuring, by the electronic device (100), a pre-trained Artificial Intelligence (AI) model to select one or more neural blocks from a plurality neural blocks to optimize a performance of the task in the electronic device (100)” [0019].):
a hardware processing unit configured to execute a plurality of supported inference operations (Das discloses “the electronic device (100) searches to the knowledgebase of supported/unsupported operations and uses the decision metamodel (505A) to predict the optimal replacement (1102) for the unsupported operations (1101) in the final instance deployable model (1506) … As shown in the FIG. 15B, the Maxout, a Leaky ReLU, the ELU in the original operations (1101) are the unsupported operations for the NPU (904B) in the final instance deployable model (1506). The controller (905) (905) of the NAS platform (900) replaces the Maxout, the Leaky ReLU, the ELU in the original operations (1101) with the PAU, a tanh, the PAU operations in the alternative operations (1102) respectively that are supporting by the NPU (904B) for the deployment. Further, the controller (905) generates the final instance deployable model (1507) with the supporting operations suitable for the NPU (904B)” [0225].
Das further discloses “the electronic device (100) represents (1201) the abstract DNN model (504) based on the task that is to be executed using the available hardware configuration (i.e. GPU (904A), NPU (904B), DSP(904C)) of the electronic device (100). The electronic device (100) deploys (1202) the abstract DNN model (504) for executing the task” [0161].),
respective supported inference operations having corresponding dedicated circuitry on the hardware processing unit (Das discloses “The controller (905) (905) of the NAS platform (900) replaces the Maxout, the Leaky ReLU, the ELU in the original operations (1101) with the PAU, a tanh, the PAU operations in the alternative operations (1102) respectively that are supporting by the NPU (904B) for the deployment” [0225].
The unsupported operations of a final instance deployable model (the Maxout, Leaky ReLU, and ELU original operations) are replaced by the PAU, tanh, and PAU operations that are supported by an NPU (Neural Processing Unit). See [0135] describing how a Neural Processing Unit (‘hardware processing unit’) is an AI-dedicated processor. An NPU designed for dedicated AI processing that supports various operations implies that each operation is supported by “dedicated circuitry”.);
determine a device context for the computing device (Das discloses hardware parameters (‘device context’), “the method includes identifying the task to be executed in the electronic device (100). In an embodiment, the method allows the task executor (111) to identify the task to be executed in the electronic device (100). At step 402, the method includes estimating performance parameter to be achieved while executing the task. In an embodiment, the method allows the performance parameter estimator (112) to estimate performance parameter to be achieved while executing the task. At step 403, the method includes determining the hardware parameters of the electronic device (100) used to execute the task based on the performance parameter and the task” [0122].
Das further discloses “Examples for the hardware parameters are, but not limited to a processor speed, number of cores in the processor (130), a data transmission speed wireless modules, a storage capacity of the memory (120), a write/read speed at the memory (120)” [0065].);
based at least on the device context, select a particular machine learning model … available to the computing device (Das discloses “the method includes determining the optimal neural blocks from a plurality of neural blocks based on the performance parameter and the hardware parameters of the electronic device (100) … At step 405, the method includes generating the optimized DNN model for executing the at least task based on the optimal neural blocks” [0123].),
the machine learning [model] having different supported inference operations corresponding to different dedicated circuitry on the hardware processing unit (Das discloses “The controller (905) (905) of the NAS platform (900) replaces the Maxout, the Leaky ReLU, the ELU in the original operations (1101) with the PAU, a tanh, the PAU operations in the alternative operations (1102) respectively that are supporting by the NPU (904B) for the deployment” [0225].
The unsupported operations of a final instance deployable model are replaced by the PAU, tanh, and PAU operations (‘different supported inference operations’) that are supported by a Neural Processing Unit. See [0135] describing how a Neural Processing Unit (‘hardware processing unit’) is an AI-dedicated processor. A Neural Processing Unit designed for dedicated AI processing that supports different operations implies that each different operation is supported by “dedicated circuitry.” An NPU performing varying operations implies the use of “different dedicated circuitry”, since operations have varying computational requirements that would require specialized hardware.
Das further discloses hardware components consist of circuits implemented by dedicated hardware (‘dedicated circuitry’), “As is traditional in the field, embodiments may be described and illustrated in terms of blocks which carry out a described function or functions. These blocks, which may be referred to herein as managers, units, modules, hardware components or the like, are physically implemented by analog and/or digital circuits … The circuits constituting a block may be implemented by dedicated hardware, or by a processor (e.g., one or more programmed microprocessors and associated circuitry), or by a combination of dedicated hardware to perform some functions of the block and a processor to perform other functions of the block” [0051].
Das discloses model operations depend on hardware components (‘dedicated circuitry’), “Operations inside the DNN models are depended on hardware components that are suitable for the execution. Some operations in the DNN models may not be supported by other computing units due to not having enough memory bandwidth at the electronic device or a number precision to perform a complex tensor operation. This will cause significant commercial loss due to lower performance in use cases for certain electronic devices or may cause up to 30% of model drop ratio. The proposed method can be used to optimize the DNN model by changing/approximating unsupported operations with supported operations” [0061].);
and execute the particular machine learning model to perform a particular task (Das discloses “The controller (905) (905) of the NAS platform (900) replaces the Maxout, the Leaky ReLU, the ELU in the original operations (1101) with the PAU, a tanh, the PAU operations in the alternative operations (1102) respectively that are supporting by the NPU (904B) for the deployment. Further, the controller (905) generates the final instance deployable model (1507) with the supporting operations suitable for the NPU (904B)” [0225].
Das discloses “At step 405, the method includes generating the optimized DNN model for executing the at least task based on the optimal neural blocks … the method includes executing the task using the optimized DNN model. In an embodiment, the method allows the task executor (111) to execute the task using the optimized DNN model” [0123].).
Das teaches that generating an optimized model with operations supported by a neural processing unit is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the method of Kobayashi with the technique disclosed by Das to generate a neural network with only supported operations. By removing operations unsupported by the target hardware from a neural network, a network can be generated that includes only supported operations, leading to improved performance and lower power consumption.
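For illustration only, the substitution technique Das describes in [0225] can be sketched as a lookup against a knowledge base of supported and unsupported operations. The Maxout→PAU, Leaky ReLU→tanh, and ELU→PAU pairings follow Das's Figure 15B; the function and variable names are hypothetical.

```python
# Illustrative sketch (hypothetical names): replacing operations a target
# accelerator lacks support for with supported alternatives, per Das [0225].

# Knowledge base mapping NPU-unsupported operations to supported replacements.
NPU_REPLACEMENTS = {
    "Maxout": "PAU",
    "LeakyReLU": "tanh",
    "ELU": "PAU",
}

def make_deployable(model_ops, supported, replacements):
    """Swap each unsupported operation for its supported alternative."""
    return [op if op in supported else replacements[op] for op in model_ops]

supported_ops = {"Conv", "PAU", "tanh", "ReLU"}
original = ["Conv", "Maxout", "Conv", "LeakyReLU", "ELU"]
print(make_deployable(original, supported_ops, NPU_REPLACEMENTS))
# → ['Conv', 'PAU', 'Conv', 'tanh', 'PAU']
```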
With respect to claim 19, the combination of Kobayashi in view of Das teaches:
the computing device of claim 18, the device context relating to availability of power or memory on the computing device (Kobayashi discloses “search according to the present disclosure may be controlled on the basis of, for example, the number of times of trial of search designated by the user, and limit information of memory usage, or the like, relating to hardware in which a neural network is implemented. Setting of search according to the present disclosure can be changed as appropriate in accordance with specifications and operation relating to a neural network” [0201]).
With respect to claim 20, the combination of Kobayashi in view of Das teaches:
the computing device of claim 19, wherein the computer-readable instructions, when executed by the hardware processing unit, cause the hardware processing unit to (See [0205] of the Kobayashi disclosure discussing a CPU executing a program.):
in a first instance when availability of memory for the computing device is constrained, select a first machine learning model as the particular machine learning model to execute to perform the particular task, the first machine learning model having been generated based at least on a first metric relating to memory utilization (The Examiner interprets “in a first instance when availability of memory for the computing device is constrained” according to its broadest reasonable interpretation as encompassing limiting memory usage as disclosed by Kobayashi. Kobayashi discloses “note that search of a network according to the present disclosure may be controlled through setting other than the above-described examples. Search according to the present disclosure may be controlled on the basis of, for example, the number of times of trial of search designated by the user, and limit information of memory usage, or the like, relating to hardware in which a neural network is implemented” [0201].
Kobayashi discloses “note that, while, in the above description, a case has been described as an example where trade-off information relating to an error and a calculation amount is presented to the user, the trade-off information according to the present embodiment is not limited to such an example. In the trade-off information according to the present embodiment, for example, memory usage, an amount of heat generation, power consumption, or the like, relating to hardware may be used as well as the calculation amount. Further, in the trade-off information, total cost of hardware calculated from the calculation amount, total service cost, or the like, including server cost, or the like, may be used. Still further, switching of the above-described items may be realized by user selection. The evaluating unit 320 can calculate the above-described values on the basis of information relating to hardware and service, which is stored in advance” [0116].
Kobayashi discloses Figure 9 (reproduced above) depicting a plurality of Pareto optimal solutions (labeled P4 – P6). Each Pareto optimal solution depicts the tradeoff between two evaluated metrics of a generated neural network.
See [0125] discussing how a user can select a neural network from Pareto optimal solutions based on whether a neural network satisfies given conditions.);
and in a second instance when availability of power to the computing device is constrained, select a second machine learning model as the particular machine learning model to execute to perform the particular task, the second machine learning model having been generated based at least on a second metric relating to power consumption (Kobayashi discloses power consumption can be a metric in a trade-off relationship, “note that, while, in the above description, a case has been described as an example where trade-off information relating to an error and a calculation amount is presented to the user, the trade-off information according to the present embodiment is not limited to such an example. In the trade-off information according to the present embodiment, for example, memory usage, an amount of heat generation, power consumption, or the like, relating to hardware may be used as well as the calculation amount. Further, in the trade-off information, total cost of hardware calculated from the calculation amount, total service cost, or the like, including server cost, or the like, may be used. Still further, switching of the above-described items may be realized by user selection. The evaluating unit 320 can calculate the above-described values on the basis of information relating to hardware and service, which is stored in advance” [0116].
The Examiner interprets “in a second instance when availability of power to the computing device is constrained” according to its broadest reasonable interpretation as suppressing power consumption as disclosed by Kobayashi.
Kobayashi discloses that a metric in a trade-off relationship is suppressed, which as previously discussed would include power consumption, “however, because a calculation amount largely affects memory usage and execution time of hardware in which a neural network is mounted, a neural network with high learning accuracy is not always the best neural network. In other words, in a neural network, a calculation amount and learning accuracy have, so-called, trade-off relationship. Therefore, a method for searching for a network structure with higher learning accuracy while suppressing a calculation amount has been desired” [0051].
Kobayashi discloses Figure 9 (reproduced above) depicting a plurality of Pareto optimal solutions (labeled P4 – P6). Each Pareto optimal solution depicts the tradeoff between two evaluated metrics of a generated neural network.
See [0125] discussing how a user can select a neural network from Pareto optimal solutions based on whether a neural network satisfies given conditions.).
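For illustration only, the claim-20 selection logic, choosing among pre-generated models according to which resource the device context shows as constrained, can be sketched as follows. The structure and names are hypothetical and appear in neither reference.

```python
# Illustrative sketch (hypothetical): selecting among pre-generated models
# based on the device context, as recited in claim 20.

def select_model(device_context, models):
    """Pick the model optimized for the currently constrained resource."""
    if device_context.get("memory_constrained"):
        return models["low_memory"]   # generated using a memory-utilization metric
    if device_context.get("power_constrained"):
        return models["low_power"]    # generated using a power-consumption metric
    return models["max_accuracy"]     # no constraint: favor accuracy

models = {
    "low_memory": "model_A",
    "low_power": "model_B",
    "max_accuracy": "model_C",
}
print(select_model({"memory_constrained": True}, models))  # → model_A
print(select_model({"power_constrained": True}, models))   # → model_B
```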
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Kobayashi in view of Das, in further view of Justus et al. (“Predicting the Computational Cost of Deep Learning Models”), hereinafter Justus.
With respect to claim 9, the combination of Kobayashi in view of Das teaches:
the method of claim 1, further comprising: obtaining respective [operation] metrics via the hardware emulation (Kobayashi discloses “the evaluating unit 320 has a function of acquiring an evaluation result of the generated neural network. The evaluating unit 320 may acquire the above-described evaluation result by, for example, causing a computing resource on cloud to execute the generated neural network. Further, the evaluating unit 320 may acquire the evaluation result by causing an emulator or various kinds of devices connected via the network 20 to execute the neural network. Further, the evaluation result acquired by the evaluating unit 320 may include a calculation amount relating to the generated neural network and at least one of a training error and a validation error (hereinafter, the training error and the validation error may be collectively expressed as an error)” [0079-0080].);
and using the respective [operation] metrics to select individual second machine learning models as parent models for further modification or to select the final machine learning model (Kobayashi discloses “it is possible to acquire an evaluation result of a generated neural network and update a Pareto optimal solution relating to the evaluated neural network on the basis of the acquisition result. That is, with the information processing method according to the present disclosure, in the case where the evaluation result of the other generated neural network exceeds the evaluation result of the evaluated neural network, it is possible to update the above-described another neural network as a Pareto optimal solution … it is possible to generate another neural network with a different network structure from a neural network relating to the Pareto optimal solution” [0061].
Kobayashi discloses “it is possible to present a candidate selected from the neural networks relating to the Pareto optimal solutions to the user. Here, the above-described candidate may include a network relating to maximum performance, a network relating to an intermediate solution and a network relating to a minimum calculation amount. Further, with the information processing method according to the present embodiment, it is possible to allow the user to download a file relating to execution of a network. By this means, the user can easily select a network which satisfies conditions and acquire a file relating to execution of the network” [0125]. See Figure 9 (reproduced above).).
Kobayashi discloses obtaining operation metrics via hardware emulation; however, the combination of Kobayashi in view of Das does not teach obtaining per-operation metrics, which is taught by Justus:
obtaining respective per-operation metrics (Justus discloses “our approach here is to break up each deep learning network into single components, considering individual layers as the atomic operations we are going to use for performance prediction (see Figure 1). We construct a random selection of these atomic operations embedded within the simplest network we can produce. Each of these atomic operations is then executed either using forward or forwards and backwards passes and the execution times are recorded from multiple executions. The feature set for the atomic operation along with the execution timings are then used to train a fully connected feed forward network. This network can then be used for predicting the execution time for a new operation based just on the feature set. Once a prediction has been made for an individual operation these can be combined together and across layers in order to provide a prediction for the overall performance of the deep learning network. By working with these atomic operations we help to reduce the computational time to train our approach whilst also maximising the range of layer types that we can predict” (P. 3876, Sec. V.A, Paragraphs 1-2).);
Justus teaches that breaking down neural network layers into atomic operations and executing those operations to obtain runtime metrics is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the method of Kobayashi in view of Das with the technique disclosed by Justus to train an optimized neural network. By recording metrics of each operation, the most efficient operations can be chosen to train a neural network, thereby ensuring that the neural network uses computational resources efficiently and effectively.
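For illustration only, the per-operation measurement approach Justus describes, timing each atomic operation in isolation and summing per-layer predictions into a whole-network estimate, can be sketched as follows. The stand-in operations and names are hypothetical; Justus measures real layer executions on actual hardware.

```python
# Illustrative sketch (hypothetical stand-ins): timing each atomic operation
# in isolation to obtain per-operation metrics, in the spirit of Justus.
import time

def time_operation(op, arg, repeats=5):
    """Run one operation several times and return its mean wall-clock time."""
    start = time.perf_counter()
    for _ in range(repeats):
        op(arg)
    return (time.perf_counter() - start) / repeats

def conv_like(n):   # stand-in for a convolution layer's arithmetic
    return sum(i * i for i in range(n))

def relu_like(n):   # stand-in for an activation layer
    return max(0, n)

per_op_metrics = {
    "conv": time_operation(conv_like, 10_000),
    "relu": time_operation(relu_like, 10_000),
}
# A whole-network estimate is then the sum of its layers' per-operation times.
estimated_total = per_op_metrics["conv"] + per_op_metrics["relu"]
```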
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Kobayashi in view of Das, in further view of Maiyuran et al. (US 20210073318 A1), hereinafter Maiyuran.
With respect to claim 21, the combination of Kobayashi in view of Das teaches:
the method of claim 1, the inference hardware architecture … configured to perform multiply and accumulate operations … (Kobayashi discloses “in FIG. 8A to FIG. 8C, a case will be described as an example where trade-off information relating to a calculation amount of an objective function and an error rate is presented. Therefore, FIG. 8A to FIG. 8C indicate an error rate on a vertical axis and indicate a calculation amount of an objective function on a horizontal axis. Further, in FIG. 8A to FIG. 8C, the number of times of multiply add is employed as an example relating to the calculation amount of the objective function” [0109]. See [0116] describing how a calculation amount is calculated by evaluating unit 320.
Kobayashi discloses “the information processing server 30 includes, for example, a CPU 871” [0202]. Kobayashi further discloses “the information processing server 30 according to the present disclosure includes a generating unit 310, an evaluating unit 320” [0077].),
first … circuitry configured to perform a first convolution operation with a first input tensor size, a first output tensor size, and a first kernel size (Kobayashi discloses convolution layer Conv1 (‘first convolution operation’), “Referring to FIG. 10A, it can be seen that, in the neural network MN3 after search, the number of parameters relating to “Conv1” and “Pool2” changes compared to that in the seed network SN. Specifically, in the neural network MN3 after search, a kernel shape relating to “Conv1” is changed from 5 (vertical)×5 (horizontal) of the seed network SN to 4 (vertical)×8 (horizontal). Further, in the neural network MN3 after search, a pool shape relating to “Pool2” is changed from 2 (vertical)×2 (horizontal) of the seed network SN to 2 (vertical)×4 (horizontal)” [0128-0129].
Kobayashi discloses Figure 10A (reproduced below) depicting a mutated neural network generated after replacing convolution layers of an original seed network. Convolution layer Conv1 with a kernel shape (‘first kernel size’) of 4x8 replaces the original convolution layer Conv1 of the seed network. Input and output tensors are implied by convolution layer Conv1 having output maps, which further implies input and output tensor sizes.
[Kobayashi, Figure 10A reproduced: media_image2.png, 688 × 675 px, greyscale]
Kobayashi discloses “the evaluating unit 320 has a function of acquiring an evaluation result of the generated neural network. The evaluating unit 320 may acquire the above-described evaluation result by, for example, causing a computing resource on cloud to execute the generated neural network” [0079]. A computing resource that can execute a generated neural network implies a processor, and therefore “circuitry”, that executes convolution operations. See [0058] describing how a generated neural network is a mutated seed network.),
and second … circuitry configured to perform a second convolution operation with a second input tensor size, a second output tensor size, and a second kernel size (Kobayashi discloses Figure 10A (reproduced above) depicting a mutated neural network generated after replacing convolution layers of an original seed network. Convolution layer Conv2 (‘second convolution operation’) with a kernel shape (‘second kernel size’) of 3x3 replaces the original convolution layer Conv2 of the seed network. Input and output tensors are implied by convolution layer Conv2 having output maps, which further implies input and output tensor sizes. See Figure 1 depicting seed network SN with original Conv2 having a kernel shape of 5x5.
Kobayashi discloses “the evaluating unit 320 has a function of acquiring an evaluation result of the generated neural network. The evaluating unit 320 may acquire the above-described evaluation result by, for example, causing a computing resource on cloud to execute the generated neural network” [0079]. A computing resource that can execute a generated neural network implies a processor, and therefore “circuitry”, that executes convolution operations. See [0058] describing how a generated neural network is a mutated seed network.),
wherein the first machine learning model includes a third convolution operation having a third input tensor size, a third output tensor size, and a third kernel size (Kobayashi discloses convolution layer Conv1 (‘third convolution operation’) of a seed network (‘first machine learning model’), “a seed network SN includes 10 layers including “Input” and “Output”. Further, as illustrated in an example in FIG. 1, the neural network according to the present disclosure may include a middle layer, an activating function, or the like, as well as the input and output layers. For example, in the example in FIG. 1, “Conv1” and “Conv2” indicate Convolution layers, and “Pool1” and “Pool2” indicate Max-Pooling. Therefore, in “Conv1” and “Conv2”, parameters such as kernel shapes and the number of output maps are displayed, and in “Pool1” and “Pool2”, parameters indicating pool shapes are displayed” [0056-0057].
Input and output tensors are implied by convolution layer Conv1 having output maps, which further implies input and output tensor sizes. See Figure 1 depicting Conv1 with a kernel shape (‘third kernel size’) and output maps.)
and … does not have dedicated circuitry for the third convolution operation (Kobayashi discloses “the information processing server 30 according to the present disclosure includes a generating unit 310, an evaluating unit 320 and an apparatus communication unit 330” [0077]. Kobayashi further discloses “each of the information processing apparatus 10 and the information processing server 30 includes, for example, a CPU 871 … The CPU 871 functions as, for example, an operation processing device or a control device and controls operations of all or some of the components on the basis of various kinds of programs recorded in the ROM 872, the RAM 873, the storage 880, or a removable recording medium 901” [0202-0204].
Kobayashi discloses “in FIG. 8A to FIG. 8C, a case will be described as an example where trade-off information relating to a calculation amount of an objective function and an error rate is presented. Therefore, FIG. 8A to FIG. 8C indicate an error rate on a vertical axis and indicate a calculation amount of an objective function on a horizontal axis. Further, in FIG. 8A to FIG. 8C, the number of times of multiply add is employed as an example relating to the calculation amount of the objective function” [0109]. See [0116] describing how a calculation amount is calculated by evaluating unit 320.
See Figure 8B depicting a training error (ST) and a validation error (SV) calculated for a seed network as well as a calculation amount (number of multiply add operations).
To calculate a calculation amount, a seed network and its convolution operations must be executed. Therefore, an evaluating unit that calculates the number of multiply add operations uses a CPU to execute a seed network. A CPU that executes various kinds of programs implies a general-purpose processor that is not designed for a single function, which further implies non-dedicated circuitry. Therefore, a convolution layer Conv1 (‘third convolution operation’) of a seed network is executed by non-dedicated circuitry.),
and the final machine learning model is generated by replacing the third convolution operation with at least one of the first convolution operation or the second convolution operation (Kobayashi discloses “Referring to FIG. 10A, it can be seen that, in the neural network MN3 after search, the number of parameters relating to “Conv1” and “Pool2” changes compared to that in the seed network SN. Specifically, in the neural network MN3 after search, a kernel shape relating to “Conv1” is changed from 5 (vertical)×5 (horizontal) of the seed network SN to 4 (vertical)×8 (horizontal). Further, in the neural network MN3 after search, a pool shape relating to “Pool2” is changed from 2 (vertical)×2 (horizontal) of the seed network SN to 2 (vertical)×4 (horizontal)” [0128-0129].
Kobayashi discloses Figure 10A (reproduced above) depicting a mutated neural network (‘final machine learning model’) generated after changing (‘replacing’) convolution layers of an original seed network. Convolution layer Conv1 (‘first convolution operation’) with a kernel shape of 4x8 replaces the original convolution layer Conv1 (‘third convolution operation’) of the seed network.).
However, Kobayashi does not teach using dedicated circuitry to perform operations which is taught by Das:
first dedicated circuitry configured to perform a first … operation … (Das discloses a PAU operation (‘first operation’) is supported by a NPU, “the controller (905) inserts the universal approximators such as the PAUs or other power series approximations to the unsupported operations in the abstract DNN (504) and generates the optimal DNN (513) … As shown in the FIG. 15B, the Maxout, a Leaky ReLU, the ELU in the original operations (1101) are the unsupported operations for the NPU (904B) in the final instance deployable model (1506). The controller (905) (905) of the NAS platform (900) replaces the Maxout, the Leaky ReLU, the ELU in the original operations (1101) with the PAU, a tanh, the PAU operations in the alternative operations (1102) respectively that are supporting by the NPU (904B) for the deployment. Further, the controller (905) generates the final instance deployable model (1507) with the supporting operations suitable for the NPU (904B)” [0225]. See Figure 15B depicting a PAU operation replacing a Maxout operation.
See [0135] describing how a Neural Processing Unit is an AI-dedicated processor. A Neural Processing Unit designed for dedicated AI processing that supports different operations implies that each different operation is supported by “dedicated circuitry.” Therefore, a PAU operation that is supported by a NPU is performed by dedicated circuitry.),
and second dedicated circuitry configured to perform a second … operation … (Das discloses a tanh operation (‘second operation’) is supported by a NPU, “the controller (905) (905) of the NAS platform (900) replaces the Maxout, the Leaky ReLU, the ELU in the original operations (1101) with the PAU, a tanh, the PAU operations in the alternative operations (1102) respectively that are supporting by the NPU (904B) for the deployment.” [0225]. See Figure 15B depicting a tanh operation replacing a Leaky ReLU operation.
See [0135] describing how a Neural Processing Unit is an AI-dedicated processor. A Neural Processing Unit designed for dedicated AI processing that supports different operations implies that each different operation is supported by “dedicated circuitry.” Therefore, a tanh operation that is supported by a NPU is performed by dedicated circuitry.),
wherein the first machine learning model includes a third … operation … (Das discloses a final instance deployable model (‘first machine learning model’) contains a Maxout original operation (‘third operation’) that is unsupported, “After constructing the final instance deployable model (1506), the electronic device (100) searches to the knowledgebase of supported/unsupported operations and uses the decision metamodel (505A) to predict the optimal replacement (1102) for the unsupported operations (1101) in the final instance deployable model (1506) … As shown in the FIG. 15B, the Maxout, a Leaky ReLU, the ELU in the original operations (1101) are the unsupported operations for the NPU (904B) in the final instance deployable model (1506)” [0225].),
and the final machine learning model is generated by replacing the third … operation with at least one of the first … operation or the second … operation (Das discloses “As shown in the FIG. 15B, the Maxout, a Leaky ReLU, the ELU in the original operations (1101) are the unsupported operations for the NPU (904B) in the final instance deployable model (1506). The controller (905) (905) of the NAS platform (900) replaces the Maxout, the Leaky ReLU, the ELU in the original operations (1101) with the PAU, a tanh, the PAU operations in the alternative operations (1102) respectively that are supporting by the NPU (904B) for the deployment. Further, the controller (905) generates the final instance deployable model (1507) with the supporting operations suitable for the NPU (904B)” [0225]. See Figure 15B depicting a PAU operation (‘first operation’) replacing a Maxout operation (‘third operation’).).
Das teaches generating an optimized model by replacing unsupported operations with operations supported by a neural processing unit is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the method of Kobayashi with the technique disclosed by Das to replace unsupported operations in a neural network. By replacing unsupported operations in a neural network, a neural network can be made compatible with the target hardware, leading to improved performance and lower power consumption.
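The operation-replacement technique Das describes can be sketched as follows. This is a hypothetical illustration only: the operation names and the replacement mapping mirror Das's Figure 15B (Maxout to PAU, Leaky ReLU to tanh, ELU to PAU), but the code itself is not from the reference.

```python
# Hypothetical sketch of Das-style operation replacement: operations
# unsupported on the target NPU are swapped for supported alternatives.

# Mapping of unsupported operations to NPU-supported replacements,
# mirroring Das's Figure 15B (illustrative, not from the reference).
REPLACEMENTS = {
    "Maxout": "PAU",
    "LeakyReLU": "tanh",
    "ELU": "PAU",
}

def replace_unsupported(ops, replacements=REPLACEMENTS):
    """Return a new operation list with each unsupported op replaced;
    supported ops pass through unchanged."""
    return [replacements.get(op, op) for op in ops]

model = ["Conv", "Maxout", "Conv", "LeakyReLU", "ELU"]
optimized = replace_unsupported(model)
# optimized == ["Conv", "PAU", "Conv", "tanh", "PAU"]
```

The dictionary lookup with a pass-through default keeps supported operations intact while substituting only the operations the target hardware cannot execute.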
Furthermore, the combination of Kobayashi in view of Das does not teach a systolic array configured to perform multiply and accumulate operations on input data using parallel nodes, which is taught by Maiyuran:
the inference hardware architecture comprising a systolic array configured to perform multiply and accumulate operations on input data using parallel nodes (Maiyuran discloses “apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware” [Abstract].
Maiyuran further discloses “the compute unit 610 can also include a systolic array 612, and a math unit 613. The systolic array 612 includes a W wide and D deep network of data processing units that can be used to perform vector or other data-parallel operations in a systolic manner. In one embodiment the systolic array 612 can be configured to perform matrix operations, such as matrix dot product operations” [0124].),
Maiyuran teaches using a systolic array to perform multiply-add operations and data-parallel operations is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the method of Kobayashi with the systolic array disclosed by Maiyuran to perform multiply-add operations in a parallel manner. By performing multiply-add operations using a systolic array, processing speed can be increased due to a distributed, parallel workload, thereby increasing processing efficiency.
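The multiply-accumulate (MAC) arithmetic that Maiyuran's systolic array performs can be sketched as follows. Note this is only an illustration of the arithmetic: a real systolic array streams operands through a grid of parallel processing elements, whereas this sequential sketch models just the accumulation each element computes.

```python
# Minimal sketch of the multiply-accumulate work underlying a systolic
# matrix multiplication: each output element is a running sum of
# products. Parallelism of the hardware grid is not modeled here.

def matmul_mac(a, b):
    """Multiply matrices a (n x k) and b (k x m) via explicit MAC steps."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-accumulate step
            out[i][j] = acc
    return out

result = matmul_mac([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# result == [[19, 22], [43, 50]]
```

In a systolic array, the inner accumulation is distributed across a W-wide, D-deep grid of data processing units so that many MAC steps execute concurrently, which is the source of the processing-speed gain noted above.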
Claims 11-16 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Das in view of Gupta et al. (“Accelerator-Aware Neural Network Design Using AutoML”), hereinafter Gupta, and in further view of Hu et al. (US 20220343165 A1), hereinafter Hu.
With respect to claim 11, Das teaches:
A system comprising (Das discloses “embodiments herein provide a NAS method of generating an optimized DNN model for executing a task in an electronic device. The method includes identifying, by the electronic device, the task to be executed in the electronic device” [0011].):
a hardware processing unit (Das discloses “the embodiments herein provide the electronic device for generating the optimized DNN model to execute the task. The electronic device includes a NAS controller, a memory, a processor, where the NAS controller is coupled to the memory and the processor” [0018].);
and a storage resource storing computer-readable instructions which, when executed by the hardware processing unit, cause the hardware processing unit to (Das discloses “the embodiments herein provide the electronic device for generating the optimized DNN model to execute the task. The electronic device includes a NAS controller, a memory, a processor, where the NAS controller is coupled to the memory and the processor” [0018]. Stored executable computer-readable instructions are implied by the use of a memory and processor.):
perform a search of a machine learning model search space having a plurality of inference operations that are supported by an inference hardware architecture (Das discloses “After constructing the final instance deployable model (1506), the electronic device (100) searches to the knowledgebase of supported/unsupported operations and uses the decision metamodel (505A) to predict the optimal replacement (1102) for the unsupported operations (1101) in the final instance deployable model (1506)” [0225].
Das further discloses “the optimal DNN model generator (114) searches for the standard operations at the knowledgebase to replace the unsupported operations. Further, the optimal DNN model generator (114) replaces the unsupported operations with the standard operations, and retraining the neural block with the standard operations, when the standard operations are available” [0119].),
and output a final machine learning model selected from the machine learning model search space (Das discloses “After constructing the final instance deployable model (1506), the electronic device (100) searches to the knowledgebase of supported/unsupported operations and uses the decision metamodel (505A) to predict the optimal replacement (1102) for the unsupported operations (1101) in the final instance deployable model (1506). … The controller (905) (905) of the NAS platform (900) replaces the Maxout, the Leaky ReLU, the ELU in the original operations (1101) with the PAU, a tanh, the PAU operations in the alternative operations (1102) respectively that are supporting by the NPU (904B) for the deployment. Further, the controller (905) generates the final instance deployable model (1507) with the supporting operations suitable for the NPU (904B)” [0225].
Das discloses unsupported operations are replaced to deploy (‘output’) a final instance deployable model, “As shown in the FIG. 11, the Maxout and the ELU are the unsupported operations for the NPU (904B) in a final instance deployable model (i.e. the optimized DNN (513)). The controller (905) of the NAS platform (900) replaces the Maxout and the ELU with the PAUs that is supporting by the NPU (904B) for the deployment” [0160].).
However, Das does not teach emulating inference architecture hardware, which is taught by Gupta:
perform a search of a machine learning model search space having a plurality of inference operations that are supported by an inference hardware architecture (Gupta discloses “We extend these NAS frameworks to search for computer vision models customized for the different instantiations of Google’s Edge TPU neural network hardware accelerator architecture: Edge TPU in the USB/PCI-e attached Coral devices and in the Pixel 4 smartphone. We pay special attention to the design of the search space used for sampling the candidate neural network architectures. In particular, we augment the search space with building blocks known to achieve high overall utilization on the Edge TPU architecture. In addition, we prohibit the use of operations incompatible with the production software stack, thereby yielding models that are readily deployed on the target devices” (P. 1, Sec. 1).),
the search involving emulation of the inference architecture hardware (Gupta discloses “to address the challenges of real-device measurements, we used a cycle-accurate Edge TPU performance simulator to estimate the latencies of the candidate models. Our simulator faithfully models most of the key subsystems to evaluate full models under a few minutes while providing a very close proxy for the real device” (P. 2, Sec. 2.1).);
and output a final machine learning model selected from the machine learning model search space (Gupta discloses “we present a class of computer vision models designed using hardware-aware neural architecture search and customized to run on the Edge TPU, Google’s neural network hardware accelerator for low-power, edge devices. For the Edge TPU in Coral devices, these models enable real-time image classification performance while achieving accuracy typically seen only with larger, compute-heavy models running in data centers. On Pixel 4’s Edge TPU, these models improve the accuracy-latency tradeoff over existing SoTA mobile models” (P. 1, Abstract).).
Gupta teaches generating and simulating candidate neural network models is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the neural architecture search method of Das with the performance simulator of Gupta to simulate candidate model performance. By simulating candidate model performance, machine learning model performance can be evaluated without running the actual model on the target hardware, thereby mitigating real-world inference costs and accelerating the model design process.
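The simulator-driven search Gupta describes can be sketched as follows. This is a hypothetical illustration: the candidate models, the MAC-based latency stand-in, and the objective form are all assumptions standing in for Gupta's cycle-accurate Edge TPU simulator and actual scoring function.

```python
# Hypothetical sketch of hardware-aware candidate scoring: latency is
# estimated by a simulator stand-in rather than measured on-device, and
# an objective trades accuracy against the latency estimate.

def simulated_latency_us(candidate):
    # Stand-in for a cycle-accurate performance simulator; assumes
    # latency scales linearly with MAC count (illustrative only).
    return candidate["macs"] * 0.001

def objective(candidate, target_us=500.0, w=0.07):
    # Reward accuracy; softly penalize exceeding the latency target.
    latency = simulated_latency_us(candidate)
    return candidate["accuracy"] * (target_us / max(latency, target_us)) ** w

candidates = [
    {"name": "small", "accuracy": 0.74, "macs": 300_000},
    {"name": "large", "accuracy": 0.78, "macs": 900_000},
]
best = max(candidates, key=objective)
```

Because the latency comes from the simulator stand-in rather than a physical device, every candidate can be scored quickly, which is the cost-mitigation benefit noted in the rationale above.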
Furthermore, the combination of Das in view of Gupta does not teach the search considering placement of individual inference operations on different types of processing units, which is taught by Hu:
wherein the search considers placement of individual inference operations on a first type of processing unit that does not support the inference hardware architecture and a second type of processing unit that does support the inference hardware architecture (Hu discloses “a first portion of the neural network may be dispatched to the CPU 26a via a CPU process 30a, a second portion of the neural network might be dispatched to the GPU 26b via a GPU process 30b, a third portion of the neural network may be dispatched to the AI accelerator 26c via an AI process 30c, a fourth portion of the neural network might be dispatched to the FPGA 26d via an FPGA process 30d, and so forth. In an embodiment, the first portion of the neural network includes one or more operations that are either unsupported or less efficient when executed by the GPU 26b, the AI accelerator 26c and/or the FPGA 26d, the second portion of the neural network includes one or more operations that are either unsupported or less efficient when executed by the CPU 26a, the AI accelerator 26c and/or the FPGA 26d, the third portion of the neural network includes one or more operations that are either unsupported or less efficient when executed by the CPU 26a, the GPU 26b and/or the FPGA 26d, and so forth. Accordingly, partitioning the computation graph across the devices 26 based on device capability may enable the architecture 20 to achieve more efficient execution of the neural network, which in turn enhances performance and stability” [0016].),
and the final machine learning model indicates that certain inference operations are performed on the first type of processing unit and other inference operations are performed on the second type of processing unit (Hu discloses “the architecture 20 includes a plurality of heterogeneous devices 26 (26a-26d) such as, for example, a central processing unit (CPU, e.g., host processor) 26a, a graphics processing unit (GPU, e.g., graphics processor with highly parallel processing capabilities) 26b, an artificial intelligence (AI) accelerator 26c and a field programmable gate array (FPGA) 26d. As will be discussed in greater detail, an application programming interface (API, e.g., Web Neural Network/WebNN API) implementation 28 and a plurality of processes 30 (30a-30d) may be used to dispatch portions (e.g., subgraphs) of the neural network to the devices 26 based on the capabilities of the devices 26 in relation to the operations performed by the neural network. Such an approach enhances the performance, stability and/or security of the architecture 20” [0015].
Portions of a neural network (‘final machine learning model’) are dispatched to several processing units based on target hardware compatibility. Each processing unit then performs its supported operations.).
Hu teaches partitioning a neural network across several processing units based on operation compatibility is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the neural architecture search method of Das and the performance simulator of Gupta with the partitioning technique disclosed by Hu to partition a neural network based on operation compatibility. By partitioning a neural network based on operation compatibility, a neural network can be more efficiently executed, thereby increasing model performance and stability.
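The device-capability partitioning Hu describes can be sketched as follows. This is a hypothetical illustration: the device names, preference order, and per-device support sets are assumptions, not taken from the reference.

```python
# Hypothetical sketch of Hu-style partitioning: each operation is
# dispatched to the most preferred device that supports it, with the
# CPU as the fallback that supports everything.

DEVICE_SUPPORT = {
    "npu": {"conv", "matmul", "relu"},
    "gpu": {"conv", "matmul", "relu", "softmax"},
    "cpu": {"conv", "matmul", "relu", "softmax", "topk"},
}

def partition(ops, devices=("npu", "gpu", "cpu")):
    """Assign each op to the first device (in preference order)
    whose support set contains it."""
    placement = {}
    for op in ops:
        for dev in devices:
            if op in DEVICE_SUPPORT[dev]:
                placement[op] = dev
                break
    return placement

plan = partition(["conv", "softmax", "topk"])
# plan == {"conv": "npu", "softmax": "gpu", "topk": "cpu"}
```

The preference order encodes the efficiency ranking: accelerator first, then GPU, then CPU, so unsupported or less efficient operations fall back to more general devices, matching the partitioning rationale above.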
With respect to claim 12, the combination of Das in view of Gupta and in further view of Hu teaches:
the system of claim 11, wherein the inference operations include convolution operations, vector operations, or matrix operations having specified input and output data sizes (Gupta discloses “our search space includes several potentially useful blocks with varying kernel and tensor sizes … we introduce a fused inverted bottleneck convolution block that fuses the initial expansion convolution with the depthwise convolution into a single full convolution (Figure 3). Originally this block expands the depth of the input tensor and performs a “cheaper” depthwise convolution with a larger depth dimension. Although, the fused alternative performs a more “expensive” full convolution at a larger depth dimension, it can utilize the hardware resources better and provide more trainable parameters which can be a good latency-accuracy trade-off … In Figure 4, on the top, 5x5 kernel size choice leads to 2.78x increase in the number of MACs and parameters compared to 3x3 kernel size which leads to 2.71x increase in the runtime (1122us vs. 414us)” (P. 3, Sec. 2.2, Paragraphs 2-3). See Figures 3 and 4 on P. 3 depicting convolution blocks with fixed input and output tensor sizes.).
Gupta teaches a search space comprised of convolution blocks (‘operations’) with varying tensor sizes is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the neural architecture search method of Das with the search space disclosed by Gupta to create a search space comprised of convolution blocks with varying tensor sizes. By creating a search space comprised of convolution blocks with varying tensor sizes, machine learning engineers can design a model based on hardware constraints, thereby allowing engineers to balance the trade-off between hardware resource utilization and runtime.
With respect to claim 13, the combination of Das in view of Gupta and in further view of Hu teaches:
the system of claim 11, wherein the search is performed starting from a seed model that has been selected based on performance with respect to a particular task (Das discloses “the NAS controller (110) is configured to determine a quality of each neural block in the plurality of neural blocks based on a probability distribution in executing the task using the data inputs, the performance parameter and the hardware parameters. Further, the NAS controller (110) is configured to select the optimal neural blocks from the plurality of neural blocks based on the quality of each neural block. Further, the NAS controller (110) is configured to generate a standard DNN model using the optimal neural blocks. Further, the NAS controller (110) is configured to optimize the standard DNN model by modifying unsupported operations used for the execution of the task with supported operations to generate the optimized DNN model” [0066].
Das discloses “The NAS controller (110) is configured to estimate a performance parameter to be achieved while executing the task. Examples for the performance parameter are, but not limited to a latency, a frame rate, a resolution, a bit rate, and the like” [0064].
A standard DNN model (‘seed model’) is generated (‘selected’) using optimal neural blocks. Optimal neural blocks are chosen based on neural block quality, which is determined based on a performance parameter (latency, frame rate achieved while executing a task), therefore a standard DNN model is generated based on task performance. The standard DNN is then optimized by replacing unsupported operations to generate an optimized DNN model.
Das discloses “the optimal DNN model generator (114) searches for the standard operations at the knowledgebase to replace the unsupported operations. Further, the optimal DNN model generator (114) replaces the unsupported operations with the standard operations, and retraining the neural block with the standard operations, when the standard operations are available” [0119].).
With respect to claim 14, the combination of Das in view of Gupta and in further view of Hu teaches:
the system of claim 13, wherein the seed model includes a particular inference operation that is not supported by the inference hardware architecture (Das discloses “the NAS controller (110) is configured to generate a standard DNN model using the optimal neural blocks. Further, the NAS controller (110) is configured to optimize the standard DNN model by modifying unsupported operations used for the execution of the task with supported operations to generate the optimized DNN model” [0066].).
With respect to claim 15, the combination of Das in view of Gupta and in further view of Hu teaches:
the system of claim 14, wherein the final machine learning model does not include the particular inference operation (Das discloses “the electronic device (100) generates the optimal DNN model (513) by performing the operation optimization (1205C) i.e. replacing the unsupported operations (1203A, 1203B) for the NPU (904B) with the supported operations (1205A, 1205B). Further, the electronic device (100) uses the optimal DNN model (513) for the deployment with low latency and high accuracy” [0162].).
With respect to claim 16, the combination of Das in view of Gupta and in further view of Hu teaches:
the system of claim 11, wherein the search involves training multiple machine learning models having different inference operations supported by the inference hardware architecture (Gupta discloses “a typical neural architecture search framework consists of the following modules: a controller that samples from search space of all possible architectures, a trainer that trains the models on some dataset to arrive at an accuracy metric, an objective function that scores the candidate model to help the controller navigate the search space” (P. 2, Sec. 2, First Paragraph). See Figure 1 on P. 2 illustrating the process of creating and training multiple candidate models.
Gupta discloses “We pay special attention to the design of the search space used for sampling the candidate neural network architectures. In particular, we augment the search space with building blocks known to achieve high overall utilization on the Edge TPU architecture. In addition, we prohibit the use of operations incompatible with the production software stack, thereby yielding models that are readily deployed on the target devices” (P. 1, Sec. 1).).
Gupta teaches training and generating multiple candidate models using building blocks from a search space of compatible operations is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the neural architecture search method of Das with the candidate model generation technique disclosed by Gupta to train multiple candidate models. By training multiple candidate models, the likelihood of generating the best model to perform a particular task can be increased, thereby ensuring that the final selected model provides the best possible performance.
With respect to claim 22, the combination of Das in view of Gupta and in further view of Hu teaches:
the system of claim 11, the first type of processing unit being a central processing unit and the second type of processing unit being a neural processing unit having dedicated circuitry configured to perform at least some of the plurality of inference operations (Hu discloses “a first portion of the neural network may be dispatched to the CPU 26a via a CPU process 30a, a second portion of the neural network might be dispatched to the GPU 26b via a GPU process 30b, a third portion of the neural network may be dispatched to the AI accelerator 26c via an AI process 30c, a fourth portion of the neural network might be dispatched to the FPGA 26d via an FPGA process 30d, and so forth. In an embodiment, the first portion of the neural network includes one or more operations that are either unsupported or less efficient when executed by the GPU 26b, the AI accelerator 26c and/or the FPGA 26d, … the third portion of the neural network includes one or more operations that are either unsupported or less efficient when executed by the CPU 26a, the GPU 26b and/or the FPGA 26d, and so forth. Accordingly, partitioning the computation graph across the devices 26 based on device capability may enable the architecture 20 to achieve more efficient execution of the neural network, which in turn enhances performance and stability” [0016].
An AI accelerator (‘neural processing unit’) that performs various supported neural network operations implies “dedicated circuitry” configured to perform inference operations.).
Hu teaches partitioning a neural network across a CPU and an AI accelerator is a known method in the art. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the neural architecture search method of Das with the partitioning technique disclosed by Hu to partition a neural network based on operation compatibility. By partitioning a neural network based on operation compatibility, a neural network can be more efficiently executed, thereby increasing model performance and stability.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PEDRO J MORALES whose telephone number is (571)272-6106. The examiner can normally be reached 8:30 AM - 6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA M HUANG can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PEDRO J MORALES/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124