Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Regarding the rejection of claims 1-5 and 13-25 under 35 U.S.C. 101 as being directed to a judicial exception, Applicant’s arguments regarding the currently amended claims are persuasive and the rejection is withdrawn. The Examiner notes that claims 6-12 were also rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter and have not been amended to overcome that rejection, which is maintained below.
Regarding the rejection of claims 1-25 under 35 U.S.C. 103, Applicant’s arguments are directed towards amended claims that have not been previously examined. New grounds of rejection for these claims are given below.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 6-12 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. These claims recite “At least one computer readable storage medium comprising a set of instructions, which when executed by a computing system, cause the computing system to” perform the steps that follow. Because the specification provides no explicit definition of a “computer readable storage medium,” the term, under its broadest reasonable interpretation, includes transitory, propagating signals, which do not fall under a statutory category of invention.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3, 6, 8, 13, 15, 21, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Achille et al., “TASK2VEC: Task Embedding for Meta-Learning,” 2019, arXiv:1902.03545v1 (hereafter Achille) in view of Jha et al., US Pre-Grant Publication No. 2019/0227847 (hereafter Jha) and Argerich, Quora Post, “How do you choose between logistic regression, artificial neural networks and SVMs? Do you look at the number of features, size of dataset etc?,” 2016, https://www.quora.com/How-do-you-choose-between-logistic-regression-artificial-neural-networks-and-SVMs-Do-you-look-at-the-number-of-features-size-of-dataset-etc (hereafter Argerich).
Regarding claim 1 and analogous claims 6, 13, and 21:
Achille teaches:
“maintain a database that comprises configuration information of a plurality of neural network workloads and hardware performance metric for performing the plurality of neural network workloads”: Achille, Fig. 1 caption, “Figure 1: Task embedding across a large library of tasks (best seen magnified) (Left) T-SNE visualization of the embedding of tasks extracted from the iNaturalist, CUB-200, iMaterialist datasets [maintain a database that comprises configuration information of a plurality of neural network workloads]. Colors indicate ground-truth grouping of tasks based on taxonomic or semantic types”; Achille, section 2.2, paragraph 3, “As we can see from the expressions above, if the fit model is very confident in its predictions, E[(y - p)^2] goes to zero. Hence, the norm of the task embedding ||F||_* scales with the difficulty of the task for a given feature extractor Φ. Figure 2 (Right) shows that even for more complex models trained on real data, the FIM norm correlates with test performance [hardware performance metric for performing the plurality of neural network workloads, interpreted as including metrics related to performance of the workload on a processing unit].”
“generate a hardware efficiency estimate for the task by deploying a first variant or a second variant of a neural network based cost model, the hardware efficiency estimate indicating an estimated efficiency of a versatile processing unit performing the task”: Achille, section 1, paragraph 2, “Computation of the embedding leverages a duality between network parameters (weights) and outputs (activations) in a deep neural network (DNN): Just as the activations of a DNN trained on a complex visual recognition task are a rich representation of the input images, we show that the gradients of the weights relative to a task-specific loss are a rich representation of the task itself. Specifically, given a task defined by a dataset D = {(x_i, y_i)}_{i=1}^N of labeled samples, we feed the data through a pre-trained reference convolutional neural network which we call ‘probe network’ [neural network based], and compute the diagonal Fisher Information Matrix (FIM) of the network filter parameters to capture the structure of the task (Sect. 2). Since the architecture and weights of the probe network are fixed, the FIM provides a fixed-dimensional representation of the task. We show this embedding encodes the ‘difficulty’ of the task, characteristics of the input domain, and which features of the probe network are useful to solve it (Sect. 2.1)”; Achille, section 4, paragraph 2, “We consider for concreteness the problem of learning a joint embedding for model selection. In order to embed models in the task space so that those near a task are likely to perform well on that task [generate a hardware efficiency estimate for the task by deploying a first variant or a second variant of a neural network based cost model, the hardware efficiency estimate indicating an estimated efficiency of a versatile processing unit performing the task, interpreted as including an estimation of factors such as task difficulty that would be indicative of performance factors on a processing unit], we formulate the following meta-learning problem: Given k models, their MODEL2VEC embedding are the vectors m_i = F_i + b_i where F_i is the task embedding of the task used to train model m_i (if available, else we set it to zero), and b_i is a learned ‘model bias’ that perturbs the task embedding to account for particularities of the model.”
“wherein the neural network based cost model is a trainable model”: Achille, section 1, paragraph 2, “Computation of the embedding leverages a duality between network parameters (weights) and outputs (activations) in a deep neural network (DNN): Just as the activations of a DNN trained on a complex visual recognition task are a rich representation of the input images, we show that the gradients of the weights relative to a task-specific loss are a rich representation of the task itself. Specifically, given a task defined by a dataset D = {(x_i, y_i)}_{i=1}^N of labeled samples, we feed the data through a pre-trained reference convolutional neural network which we call ‘probe network’ [wherein the neural network based cost model is a trainable model], and compute the diagonal Fisher Information Matrix (FIM) of the network filter parameters to capture the structure of the task (Sect. 2).”
“wherein the second variant of the neural network based cost model generates the hardware efficiency estimate by generating an embedding vector of the task, fetching information from the database based on the embedding vector, and generating the hardware efficiency estimate based on the fetched information”: Achille, section 3.1, paragraph 2, “To make the distance computation robust, we propose to use the cosine distance between normalized embeddings
d_sym(F_a, F_b) = d_cos( F_a / (F_a + F_b), F_b / (F_a + F_b) )
where d_cos is the cosine distance, and F_a and F_b are the two task embeddings (i.e., the diagonal of the Fisher Information computed on the same probe network), and the division is element-wise. This is a symmetric distance which we expect to capture semantic similarity between two tasks [fetching information from the database based on the embedding vector, and generating the hardware efficiency estimate based on the fetched information]”; Achille, section 1, paragraph 2, “Computation of the embedding leverages a duality between network parameters (weights) and outputs (activations) in a deep neural network (DNN): Just as the activations of a DNN trained on a complex visual recognition task are a rich representation of the input images, we show that the gradients of the weights relative to a task-specific loss are a rich representation of the task itself. Specifically, given a task defined by a dataset D = {(x_i, y_i)}_{i=1}^N of labeled samples, we feed the data through a pre-trained reference convolutional neural network which we call ‘probe network’, and compute the diagonal Fisher Information Matrix (FIM) of the network filter parameters to capture the structure of the task (Sect. 2). Since the architecture and weights of the probe network are fixed, the FIM provides a fixed-dimensional representation of the task. We show this embedding encodes the ‘difficulty’ of the task, characteristics of the input domain, and which features of the probe network are useful to solve it [generating an embedding vector of the task] (Sect. 2.1).”
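For illustration only, the distance-based lookup that the cited passage of Achille is mapped to can be sketched as follows. The sketch assumes that each embedding is normalized by dividing it element-wise by the sum of the two embeddings before the cosine distance is taken; the function names, the `eps` guard, and the toy database are hypothetical and are not drawn from Achille or from the claims.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def symmetric_task_distance(f_a, f_b, eps=1e-12):
    """Cosine distance between embeddings normalized element-wise by F_a + F_b.

    eps is a hypothetical guard against division by zero.
    """
    n_a = [a / (a + b + eps) for a, b in zip(f_a, f_b)]
    n_b = [b / (a + b + eps) for a, b in zip(f_a, f_b)]
    return cosine_distance(n_a, n_b)


def nearest_task(query, database):
    """Return the key of the stored embedding closest to the query embedding."""
    return min(database, key=lambda name: symmetric_task_distance(query, database[name]))


# Toy stored embeddings (hypothetical values, not data from any reference).
stored = {"birds": [1.0, 0.1, 0.1], "fabrics": [0.1, 0.1, 1.0]}
closest = nearest_task([0.9, 0.2, 0.1], stored)
```

In this toy case the query embedding is closest to the "birds" entry, which is the entry whose stored information would then be fetched.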
Achille does not explicitly teach:
“A computing system comprising: a network controller; a processor coupled to the network controller; and a memory coupled to the processor, the memory including a set of instructions, which when executed by the processor, cause the processor to”
“determine whether a complexity of a task for executing a neural network exceeds a threshold, and in response to determining that the complexity of the task exceeds the threshold”
“wherein the first variant of the neural network based cost model generates the hardware efficiency estimate by predicting utilization of the versatile processing unit for performing the task”
Jha teaches:
“A computing system comprising: a network controller; a processor coupled to the network controller; and a memory coupled to the processor, the memory including a set of instructions, which when executed by the processor, cause the processor to”: Jha, paragraph 0051, “FIG. 7 is a block diagram of an example computing device 700 including non-transitory computer-readable storage medium storing instructions to dynamically generate UI components based on hierarchical component factories. Computing device 700 (e.g., client 104 of FIG. 1) includes a processor 702 [a processor] and a machine-readable storage medium 704 [a memory coupled to the processor] communicatively coupled through a system bus. Processor 702 may be any type of central processing unit (CPU), microprocessor, or processing logic that interprets and executes machine-readable instructions stored in machine-readable storage medium 704 [the memory including a set of instructions, which when executed by the processor, cause the processor to]. Machine-readable storage medium 704 may be a RAM or another type of dynamic storage device that may store information and machine-readable instructions that may be executed by processor 702”; Jha, paragraph 0056, “Some or all of the system components and/or data structures may also be stored as contents ( e.g., as executable or other machine-readable software instructions or structured data) on a non-transitory computer-readable medium (e.g., as a hard disk; a computer memory; a computer network or cellular wireless network or other data transmission medium [a network controller]; or a portable media article to be read by an appropriate drive or via an appropriate connection, such as a DVD or flash memory device) so as to enable or configure the computer-readable medium and/or one or more host computing systems or devices to execute or otherwise use or provide the contents to perform at least some of the described techniques.”
“wherein the first variant of the neural network based cost model generates the hardware efficiency estimate by predicting utilization of the versatile processing unit for performing the task”: Jha, paragraph 0053, “Instructions 708 may be executed by processor 702 to train a set of forecasting models based on the resource utilization data associated with a portion of the period. Instructions 710 may be executed by processor 702 to predict the resource utilization of each of the plurality of containers for a remaining portion of the period using the set of trained forecasting models [predicting utilization of the versatile processing unit for performing the task]. Instructions 712 may be executed by processor 702 to compare the predicted resource utilization with the collected resource utilization data for the remaining portion of the period.”
Jha and Achille are analogous arts as they are both related to workload estimation. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the utilization prediction of Jha with the teachings of Achille to arrive at the present invention, in order to improve allocation of computing resources, as stated in Jha, paragraph 0018, “Embodiments described herein may provide an enhanced computer-based and network-based method, technique, and system for determining recommended resource claims (i.e., optimal resource request values) for the containers based on historical utilization of the resources (e.g., CPU, memory, and the like) by the containers using a machine learning. The recommended resource claims may be used to dynamically readjust the request of containers, thus improving resource utilization which in turn reduces cost by saving significant amount of resources for an enterprise. Examples described herein may also provide elasticity of the resources to the container. i.e., when a container starts consuming more resources, examples described herein may dynamically recommend additional resources for that container.”
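As a non-authoritative illustration of the sequence quoted from Jha, paragraph 0053 (train a set of forecasting models on one portion of a period, predict the remaining portion, and compare the prediction with the collected data), a minimal sketch follows. The two forecasters (a constant mean and a least-squares trend line), the error measure, and all names are assumptions chosen for brevity; Jha does not specify them.

```python
def mean_forecast(history, horizon):
    """Forecast every future point as the mean of the training portion."""
    avg = sum(history) / len(history)
    return [avg] * horizon


def trend_forecast(history, horizon):
    """Fit a least-squares line to the training portion and extrapolate it."""
    n = len(history)
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    denom = sum((x - x_mean) ** 2 for x in range(n))
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    slope = num / denom if denom else 0.0
    intercept = y_mean - slope * x_mean
    return [intercept + slope * (n + k) for k in range(horizon)]


def mean_abs_error(predicted, actual):
    """Average absolute difference between predicted and collected values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def pick_forecaster(utilization, split):
    """Train on utilization[:split], predict the remainder, compare with collected data."""
    train, remainder = utilization[:split], utilization[split:]
    models = {"mean": mean_forecast, "trend": trend_forecast}
    errors = {
        name: mean_abs_error(fn(train, len(remainder)), remainder)
        for name, fn in models.items()
    }
    return min(errors, key=errors.get), errors


# Hypothetical CPU utilization samples (percent) over one period.
best, errors = pick_forecaster([10, 20, 30, 40, 50, 60], split=4)
```

For the steadily rising samples above, the trend model reproduces the remaining portion exactly and is selected over the constant-mean model.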
Argerich teaches “determine whether a complexity of a task for executing a neural network exceeds a threshold, and in response to determining that the complexity of the task exceeds the threshold”: Argerich, “SVMs are really good when you have a high dimensionality dataset and you don't have a lot of data. With or without a kernel they are not very likely to overfit and produce good results. Logistic Regression is a very good all-purpose algorithm, if you need probabilities or you have a lot of data LR is usually good. Same if you have only a few features. NNs are very flexible you usually need a lot of data and they are particularly useful for data such as sound, images, video and other multimedia data. So in general: • reduced number of features = > LR [determine whether a complexity of a task for executing a neural network exceeds a threshold] • a lot of features but not a lot of data = > SVM • a lot of features and a lot of data=> NN [and in response to determining that the complexity of the task exceeds the threshold] Of course that is an over-simplification and we'll find plenty of counter-examples but I think it is sound as a rough guideline.”
Argerich and Achille are analogous arts as they are both related to machine learning model analysis. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the matching of model choice to complexity from Argerich with the teachings of Achille to arrive at the present invention, in order to choose an appropriate model, as stated in Argerich, “SVMs are really good when you have a high dimensionality dataset and you don't have a lot of data. With or without a kernel they are not very likely to overfit and produce good results. Logistic Regression is a very good all-purpose algorithm, if you need probabilities or you have a lot of data LR is usually good. Same if you have only a few features. NNs are very flexible you usually need a lot of data and they are particularly useful for data such as sound, images, video and other multimedia data. So in general: • reduced number of features = > LR • a lot of features but not a lot of data = > SVM • a lot of features and a lot of data=> NN Of course that is an over-simplification and we'll find plenty of counter-examples but I think it is sound as a rough guideline.”
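The rule of thumb quoted from Argerich can be read as a pair of threshold comparisons, which is how it is mapped to the claimed complexity threshold. The sketch below is illustrative only: Argerich gives no numeric cut-offs, so `feature_threshold` and `sample_threshold` are hypothetical parameters, and the function name is an assumption.

```python
def suggest_model_family(num_features, num_samples,
                         feature_threshold=50, sample_threshold=10_000):
    """Map dataset size to a model family following the quoted rough guideline.

    The threshold values are hypothetical placeholders; the quoted post
    states the guideline only qualitatively.
    """
    if num_features <= feature_threshold:
        # "reduced number of features => LR"
        return "logistic_regression"
    if num_samples <= sample_threshold:
        # "a lot of features but not a lot of data => SVM"
        return "svm"
    # "a lot of features and a lot of data => NN"
    return "neural_network"


choice = suggest_model_family(num_features=2048, num_samples=1_000_000)
```

Under these placeholder thresholds, a task with many features and many samples exceeds both cut-offs and is routed to a neural network, while the other two cases fall back to the simpler model families.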
Regarding claim 3 and analogous claims 8, 15, and 23:
Achille as modified by Jha and Argerich teaches “The computing system of claim 1.”
Achille further teaches “wherein the second variant of the neural network based cost model is deployed for optimizing an accuracy of the neural network”: Achille, section C.3, paragraph 2, “We learn bj by minimizing a k-way classification loss which, given a task t, aims to select the model that performs best on the task among a collection of models [deployed for optimizing an accuracy of the neural network].”
Claims 2, 7, 14, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Achille as modified by Jha and Argerich in view of Justus et al., “Predicting the Computational Cost of Deep Learning Models,” 2018, arXiv:1811.11880v1 (hereafter Justus).
Achille as modified by Jha and Argerich teaches “The computing system of claim 1.”
Achille as modified by Jha and Argerich does not explicitly teach “wherein the instructions, when executed, further cause the processor to train the neural network based cost model based on one or more of hardware profile data or register-transfer level data.”
Justus teaches “wherein the instructions, when executed, further cause the processor to train the neural network based cost model based on one or more of hardware profile data or register-transfer level data”: Justus, section IV, paragraph 1, “We define here the features which could influence the prediction of execution times when performing training. We categorise these features into layer features, layer specific features, implementation features and hardware features [train the neural network based cost model based on one or more of hardware profile data].”
Justus and Achille are analogous arts as they are both related to workload estimation. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the hardware profiling of Justus with the teachings of Achille to arrive at the present invention, in order to improve the estimations, as stated in Justus, Abstract, “This has advantages over linear approaches as it can model more complex scenarios. But, also, it has the ability to predict execution times for scenarios unseen in the training data. Therefore, our approach can be used not only to infer the execution time for a batch, or entire epoch, but it can also support making a well-informed choice for the appropriate hardware and model.”
Claims 4, 9, 16, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Achille as modified by Jha and Argerich in view of Hu et al., US Pre-Grant Publication No. 2008/0201591 (hereafter Hu).
Achille as modified by Jha and Argerich teaches “The computing system of claim 1.”
Achille as modified by Jha and Argerich does not explicitly teach “wherein the instructions, when executed, further cause the processor to deploy the first variant or the second variant of the neural network based cost model to determine whether to reduce power consumed by the versatile processing unit for performing the task.”
Hu teaches “wherein the instructions, when executed, further cause the processor to deploy the first variant or the second variant of the neural network based cost model to determine whether to reduce power consumed by the versatile processing unit for performing the task”: Hu, paragraph 0026, “The operating system (OS) includes a performance monitoring counter (PMC) driver 206. The JVM 134 may monitor performance counters 210 in the CPU 101 in order to predict the processor utilization of a next time interval based on the utilization of the current time interval, and may scale the CPU 101 supply voltage up or down based on the predicted utilization [determine whether to reduce power consumed by the versatile processing unit for performing the task].”
Hu and Achille are analogous arts as they are both related to utilization estimation. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the processor power scaling of Hu with the teachings of Achille to arrive at the present invention, in order to increase energy efficiency, as stated in Hu, paragraph 0003, “Thus, when high computation speed of the processor is not required, the clock frequency/supply voltage of the processor may be reduced in order to reduce the energy consumption of the system. Typically, in order to reduce power dissipation, a processor may support multiple power states and provide a software interface for handling a request to change to a lower or higher power state.”
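By way of illustration, the behavior quoted from Hu, paragraph 0026 (predict next-interval utilization from the current interval, then scale the supply voltage up or down) can be sketched as below. The last-value predictor, the `low` and `high` thresholds, and the discrete voltage levels are all assumptions made for the sketch; Hu does not state them in the quoted passage.

```python
def predict_next_utilization(current_utilization):
    """Assumed last-value predictor: the next interval resembles the current one."""
    return current_utilization


def next_voltage_level(level, current_utilization,
                       low=0.3, high=0.8, min_level=0, max_level=3):
    """Step a discrete voltage level up or down based on predicted utilization.

    The thresholds and the number of levels are hypothetical placeholders.
    """
    predicted = predict_next_utilization(current_utilization)
    if predicted > high:
        return min(level + 1, max_level)  # scale up for a busy interval
    if predicted < low:
        return max(level - 1, min_level)  # scale down to reduce power
    return level  # keep the current level otherwise


# Lightly loaded interval at level 2: the sketch steps down to level 1.
new_level = next_voltage_level(level=2, current_utilization=0.1)
```

The decision to reduce power is thus a direct function of the predicted utilization, which is the relationship relied upon in the mapping above.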
Claims 5, 10-11, 17-18, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Achille as modified by Jha and Argerich in view of Kim et al., US Pre-Grant Publication No. 2020/0210836 (hereafter Kim).
Regarding claim 5 and analogous claims 11, 18, and 25:
Achille as modified by Jha and Argerich teaches “The computing system of claim 1.”
Achille further teaches “wherein the instructions, when executed, further cause the processor to generate one or more of a compiler decision, a driver decision or a network architecture search decision based on the hardware efficiency estimate”: Achille, section 1, paragraph 3, “To address this, we learn a joint task and model embedding, called MODEL2VEC, in such a way that models whose embeddings are close to a task exhibit good performance on the task. We use this to select an expert from a given collection, improving performance relative to fine-tuning a generic model trained on ImageNet and obtaining close to ground-truth optimal selection [a network architecture search decision based on the hardware efficiency estimate].”
Achille as modified by Jha and Argerich does not explicitly teach “wherein the first variant of the neural network based cost model is deployed for optimizing latency of the versatile processing unit performing the task.”
Kim teaches “wherein the first variant of the neural network based cost model is deployed for optimizing latency of the versatile processing unit performing the task”: Kim, paragraph 0032, “For example, the above-mentioned processing time may include estimated values in consideration of the computation time, latency [wherein the first variant of the neural network based cost model is deployed for optimizing latency of the versatile processing unit performing the task] and the like of the software, which can be detected in software, as well as the driving time of the hardware, which can be detected in hardware. Further, the estimated performance is not limited to the processing time, power consumption, computation amount, memory bandwidth usage and memory usage according to performing operations of the neural network, but may include estimated values for any indicator that is considered necessary to estimate the performance in terms of hardware or software.”
Kim and Achille are analogous arts as they are both related to estimations of neural networks. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the estimation considerations of Kim with the teachings of Achille to arrive at the present invention, in order to minimize latency or hardware utilization in the neural network, as stated in Kim, paragraph 0032, “For example, the above-mentioned processing time may include estimated values in consideration of the computation time, latency and the like of the software, which can be detected in software, as well as the driving time of the hardware, which can be detected in hardware. Further, the estimated performance is not limited to the processing time, power consumption, computation amount, memory bandwidth usage and memory usage according to performing operations of the neural network, but may include estimated values for any indicator that is considered necessary to estimate the performance in terms of hardware or software.”
Regarding claims 10 and 17:
Claims 10 and 17 recite only limitations that also appear in claims 11 and 18, respectively. Claims 10 and 17 are therefore taught by the combination of references applied above to claims 5, 11, 18, and 25.
Claims 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Achille as modified by Jha and Argerich in view of Luo et al., US Pre-Grant Publication No. 2022/0172074 (hereafter Luo).
Achille as modified by Jha and Argerich teaches “The at least one computer readable storage medium of claim 6.”
Argerich further teaches “in response to determining that the complexity of the task exceeds the threshold”: Argerich, “SVMs are really good when you have a high dimensionality dataset and you don't have a lot of data. With or without a kernel they are not very likely to overfit and produce good results. Logistic Regression is a very good all-purpose algorithm, if you need probabilities or you have a lot of data LR is usually good. Same if you have only a few features. NNs are very flexible you usually need a lot of data and they are particularly useful for data such as sound, images, video and other multimedia data. So in general: • reduced number of features = > LR • a lot of features but not a lot of data = > SVM • a lot of features and a lot of data=> NN [in response to determining that the complexity of the task exceeds the threshold] Of course that is an over-simplification and we'll find plenty of counter-examples but I think it is sound as a rough guideline.”
Argerich and Achille are combinable for the rationale given under claim 1 (covering claim 6).
Jha further teaches “generate the hardware efficiency estimate for the task based on a cost function”: Jha, paragraph 0053, “Instructions 708 may be executed by processor 702 to train a set of forecasting models based on the resource utilization data associated with a portion of the period. Instructions 710 may be executed by processor 702 to predict the resource utilization of each of the plurality of containers for a remaining portion of the period using the set of trained forecasting models [generate the hardware efficiency estimate for the task based on a cost function]. Instructions 712 may be executed by processor 702 to compare the predicted resource utilization with the collected resource utilization data for the remaining portion of the period.”
Jha and Achille are combinable for the rationale given under claim 1 (covering claim 6).
Achille as modified by Jha and Argerich does not explicitly teach “the cost function to generate the hardware efficiency estimate independently of second order effects and nonlinearities associated with the task.”
Luo teaches “the cost function to generate the hardware efficiency estimate independently of second order effects and nonlinearities associated with the task”: Luo, paragraph 0025, “In step S120 the suggested inference neural network graph SN and its parameter dimension are received by the execution performance estimator 120, and the estimated performance EP of the neural network accelerator hardware is calculated according to a hardware calculation abstract information of the suggested inference neural network graph simulates the estimated performance EP of the neural network accelerator hardware using a neural network accelerator hardware simulation statistics extraction algorithm. For example, the convolution operation may contain parameters such as 4 dimensions (the height, the width, the depth and the batch number) of the characteristic image, 4 dimensions (the number of filters, the height, the width, and the depth) of the filters or the operation stride. The normalization operation may contain parameters such as linear slope, standard error and mean. The activation function operation may contain digital resolutions required for positive/negative slopes or non-linear functions such as sigmoid function and tanh function. The pooling operation may contain parameters such as input size, pooling kennel size, and computing stride. The hardware calculation abstract information are such as the types, numbers and dimensions of the above parameters, and the cycle count information of the neural network accelerator hardware, that is, the estimated performance EP of the neural network accelerator hardware, can be calculated using the neural network accelerator hardware simulation statistics extraction algorithm according to the types, numbers and dimensions of the above parameters [cost function to generate the hardware efficiency estimate independently of second order effects and nonlinearities associated with the task].”
Luo and Achille are analogous arts as they are both related to machine learning performance and efficiency. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the efficiency calculation method of Luo with the teachings of Achille to arrive at the present invention, in order to provide a simpler method of estimating hardware performance of a neural network, as stated in Luo, paragraph 0004, “During the development/search process of the neural network software, the research personnel hope that the execution speed and accuracy of the neural network accelerator hardware can be obtained immediately after some of the content of the neural network are finetuned, lest the research personnel might spend a large amount of time and cost in training only to find that the execution speed of the hardware is not satisfactory and needs to be adjusted.”
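For illustration, a first-order cycle-count estimate computed only from the types and dimensions of layer parameters, in the spirit of the passage quoted from Luo, paragraph 0025, might look like the sketch below. This is a simplified multiply-accumulate (MAC) counting model swapped in for clarity and is not Luo's simulation statistics extraction algorithm; the function names and the `macs_per_cycle` throughput figure are hypothetical.

```python
def conv_macs(out_h, out_w, out_channels, k_h, k_w, in_channels, batch=1):
    """Multiply-accumulate count of one convolution, from its dimensions alone."""
    return batch * out_h * out_w * out_channels * k_h * k_w * in_channels


def estimate_cycles(layers, macs_per_cycle):
    """Sum per-layer cycle counts, assuming a fixed MAC throughput per cycle.

    Stalls, memory effects, and activation-function costs are deliberately
    ignored, so the estimate is linear in the layer dimensions.
    """
    total = 0
    for layer in layers:
        macs = conv_macs(**layer)
        total += -(-macs // macs_per_cycle)  # ceiling division
    return total


# Two identical hypothetical 3x3 convolution layers on a 256-MAC/cycle unit.
layer = dict(out_h=4, out_w=4, out_channels=8, k_h=3, k_w=3, in_channels=3)
cycles = estimate_cycles([layer, layer], macs_per_cycle=256)
```

Because the estimate depends only on parameter dimensions and a fixed throughput, it omits second-order effects and nonlinearities by construction.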
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Achille as modified by Jha and Argerich in view of Chen et al., “Electronic Design Automation,” 2009 (hereafter Chen).
Achille as modified by Jha and Argerich teaches “The semiconductor apparatus of claim 13.”
Achille as modified by Jha and Argerich does not explicitly teach “wherein the logic coupled to the one or more substrates includes transistor channel regions that are positioned within the one or more substrates.”
Chen teaches “wherein the logic coupled to the one or more substrates includes transistor channel regions that are positioned within the one or more substrates”: Chen, section 2.2, “In this section, we first discuss the basic constructs and characteristics of a metal oxide semiconductor (MOS) transistor (a.k.a., MOS device). Most transistors in digital circuits are switching devices that operate to perform desired Boolean functions [the logic]. MOS transistors can also be configured as load devices that are used for circuit performance enhancements. […] A MOS transistor is a 4-terminal device on a silicon substrate [Martin 2000]. Circuit schematic diagrams often show transistors in 3-terminal symbols, with the assumption that the fourth terminal (known as the substrate terminal) is either grounded or connected to power supply on the basis of the device type. Figure 2.2a shows the dimensions of a MOS transistor, where L is the n-channel length, W is the n-channel width, and tox is the thickness of the thin oxide layer under the gate. Figure 2.2b shows a cross-section view of a typical n-channel transistor [coupled to the one or more substrates includes transistor channel regions that are positioned within the one or more substrates]. The three terminals of the devices are Gate, Source, and Drain. A fourth terminal connecting the Substrate is sometimes provided with devices as well. Common symbols used for n-channel and p-channel transistors are shown in Figure 2.3.”
Chen and Achille are analogous arts as they are both related to logic devices. It would have been obvious to a person having ordinary skill in the art prior to the effective filing date of the claimed invention to have combined the transistor channels of Chen with the teachings of Achille to arrive at the present invention, in order to implement the device in an integrated circuit, as stated in Chen, Chapter 2 Introduction, “The first integrated circuit (IC), called a phase shift oscillator composed of one transistor, one capacitor, and three resistors, was created by Jack Kilby of Texas Instruments on September 12, 1958. Today, a typical IC chip can easily contain several hundred millions of transistors and miles of interconnect wires. This very large-scale integration (VLSI) ability has been enabled by the modern use of the many electronic design automation (EDA) technologies and applications discussed in this book.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
García-Martín et al., “Estimation of energy consumption in machine learning,” 2019, https://doi.org/10.1016/j.jpdc.2019.07.007, surveys methods of estimating the energy consumption of processors and computing systems when performing machine-learning tasks.
Bastani et al., WIPO application 2020/104038, discloses providing a complexity score for a neural network performing a task, for use in network selection.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VINCENT SPRAUL whose telephone number is (703) 756-1511. The examiner can normally be reached M-F 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MICHAEL HUNTLEY, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/VAS/Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129