Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
2. This Office action is in response to the original filing of 03/31/2025. Claims 27-28 are canceled; claims 1-26 are pending and have been considered below.
Claim Rejections - 35 USC § 112
3. The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 19-25 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 19 and 25 recite the limitation "set of controller policy parameters" in lines 23 and 26 and lines 24 and 28, respectively. There is insufficient antecedent basis for this limitation in the claims.
Claim Rejections - 35 USC § 102
4. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-3, 5-9, and 11-26 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Weiwei et al., "You Only Search Once: A Fast Automation Framework for Single-Stage DNN/Accelerator Co-design" (May 14, 2020) (hereinafter "Weiwei").
Claim 1. Weiwei discloses a method comprising:
generating, using a controller policy (RL Controller, fig. 2), a batch of one or more output sequences, each output sequence in the batch defining (i) a respective architecture of a child neural network that is configured to perform a particular neural network task (NN Architecture, fig. 2; Section III.B, A High-level Overview of the Automated Framework: "The first step takes the target machine learning task (e.g., classification, detection), the basic accelerator architecture (e.g., systolic array architecture) and the user constraints (the accuracy, power and latency threshold) as inputs, and then train a HyperNet used to derive different network architectures in the search stage. After that, performance samples are taken from the accelerator simulator and used to build a performance predictor. In this work, we use energy and latency as the performance metrics for demonstration") and (ii) a respective architecture of a hardware accelerator on which a trained instance of the child neural network is to be implemented (Arch configuration, fig. 2);
for each output sequence in the batch:
training a respective instance of the child neural network having the architecture defined by the output sequence to perform the particular neural network task (Fig.2, "Hypernet training";…Section III, D. HyperNet Based Accuracy Evaluator: "the full training of the candidate DNN architectures in order to evaluate their accuracy on test dataset");
evaluating a network performance of the trained instance of the child neural network on the particular neural network task to determine a network performance metric (accuracy) for the trained instance of the child neural network on the particular neural network task (Section III, D. HyperNet Based Accuracy Evaluator); and
evaluating an accelerator performance of a respective instance of the hardware accelerator having the architecture defined by the output sequence to determine an accelerator performance metric for the instance of the hardware accelerator on supporting a performance of the trained instance of the child neural network having the architecture defined by the output sequence on the particular neural network task ("Fast evaluator construction. The first step takes the target machine learning task (e.g., classification, detection), the basic accelerator architecture (e.g., systolic array architecture) and the user constraints (the accuracy, power and latency threshold) as inputs, and then train a HyperNet used to derive different network architectures in the search stage. After that, performance samples are taken from the accelerator simulator and used to build a performance predictor. In this work, we use energy and latency as the performance metrics for demonstration… Effective design search. An LSTM based RL searcher keeps generating the solutions iteratively, which includes the NN architecture and hardware configuration, then it receives the QoR and performance results from the evaluator to obtain the multi-objective reward, and finally update the controller towards the most rewarding design search direction and determining the final solution. After the search process reaches a certain number of iterations, we accurately evaluate the top-N promising candidates with the hardware simulation and fully-training, and select the best one as the final solution as output.") (Section III.B, A High-level Overview of the Automated Framework; fig. 2, "latency", "energy"); and
using (i) the network performance metrics for the trained instances of the child neural network and (ii) the accelerator performance metrics for the instances of the hardware accelerators to adjust the controller policy (Fig.2, "multi-objective reward", section III, B. A High-level Overview of the Automated Framework" Step 2: Effective design search “to obtain the multi-objective reward, and finally update the controller towards the most rewarding design search direction,"…Section III C. Reinforcement Learning Based Search Strategy).
Claim 2. Weiwei discloses the method of claim 1, wherein: the controller policy is implemented using a controller neural network having a plurality of controller network parameters; and adjusting the controller policy comprises adjusting current values of the plurality of controller network parameters (Fig.2, Section III. C. Reinforcement Learning Based Search Strategy).
Claim 3. Weiwei discloses the method of claim 2, wherein using (i) the network performance metrics for the trained instances of the child neural network and (ii) the accelerator performance metrics for the instances of the hardware accelerators to adjust the controller policy comprises: training, using a reinforcement learning technique, the controller neural network to generate output sequences that result in child neural networks having increased network performance metrics and hardware accelerators having increased accelerator performance metrics (Fig.2, Section III. C. Reinforcement Learning Based Search Strategy).
Claim 5. Weiwei discloses the method of claim 1, wherein each output sequence comprises a value for a respective hyperparameter of the child neural network at each of a first plurality of time steps (Fig.2, Section III. C. Reinforcement Learning Based Search Strategy).
Claim 6. Weiwei discloses the method of claim 1, wherein each output sequence comprises a value for a respective hardware parameter of the hardware accelerator at each of a second plurality of time steps (Fig.2, Section III. C. Reinforcement Learning Based Search Strategy).
Claim 7. Weiwei discloses the method of claim 2, wherein the controller neural network is a recurrent neural network that comprises: one or more recurrent neural network layers that are configured to, for a given output sequence and at each time step: receive as input the value of the hyperparameter or hardware parameter at the preceding time step in the given output sequence, and to process the input to update a current hidden state of the recurrent neural network; and a respective output layer for each time step, wherein each output layer is configured to, for the given output sequence: receive an output layer input comprising the updated hidden state at the time step and to generate an output for the time step that defines a score distribution over possible values of the hyperparameter or hardware parameter at the time step (Fig.2, Section III. C. Reinforcement Learning Based Search Strategy).
Claim 8. Weiwei discloses the method of claim 2, wherein generating, using the controller policy, a batch of one or more output sequences comprises, for each output sequence in the batch and for each of the plurality of time steps: providing as input to the controller neural network the value of the hyperparameter or hardware parameter at the preceding time step in the output sequence to generate an output for the time step that defines a score distribution over possible values of the hyperparameter or hardware parameter at the time step; and sampling from the possible values in accordance with the score distribution to determine the value of the hyperparameter or hardware parameter at the time step in the output sequence (Fig.2, Section III. C. Reinforcement Learning Based Search Strategy).
Claim 9. Weiwei discloses the method of claim 1, wherein: the particular neural network task is an object classification and/or detection task, an object pose estimation task, or a semantic segmentation task; the child neural network is a convolutional neural network that includes one or more depthwise separable convolution layers; and the hyperparameters include hyperparameters for each depthwise separable convolution layer in the child neural network (Section III. C. Reinforcement Learning Based Search Strategy, "(e.g., classification, detection)"; Section III D. HyperNet Based Accuracy Evaluator, "DWconv3x3, DWconv5x5").
Claim 11. Weiwei discloses the method of claim 1, wherein the respective hardware characteristics of the hardware accelerator comprises one or more of: a bandwidth of the hardware accelerator, a number of processing elements included in the hardware accelerator, a layout of the processing elements on the hardware accelerator, a number of single-instruction multiple-data (SIMD) style multiply-accumulate (MAC) in each processing element, a number of compute lanes in each processing element, a size of a shared memory in each processing element, or a size of a register file in each processing element (register buffer size) (Section IV A, Table 1).
Claim 12. Weiwei discloses the method of claim 1, wherein the accelerator performance metric for the instance of the hardware accelerator on supporting a performance of the trained instance of the child neural network comprises one or more of: an estimated area of the hardware accelerator, an estimated power consumption of the hardware accelerator, or an estimated latency of the neural network on performing the particular neural network task when being deployed on the hardware accelerator (latency) (fig. 2).
Claim 13. Weiwei discloses the method of claim 12, wherein evaluating an accelerator performance of a respective instance of the hardware accelerator having the architecture defined by the output sequence to determine an accelerator performance metric for the instance of the hardware accelerator on supporting a performance of the trained instance of the child neural network having the architecture defined by the output sequence on the particular neural network task comprises: determining, based on using a cycle-accurate performance simulator and from (i) the respective architecture of the child neural network and (ii) the respective architecture of the hardware accelerator defined by the batch of output sequences, the estimated latency of the neural network on performing the particular neural network task when being deployed on the hardware accelerator (fig. 2).
Claim 14. Weiwei discloses the method of claim 12, wherein evaluating an accelerator performance of a respective instance of the hardware accelerator having the architecture defined by the output sequence to determine an accelerator performance metric for the instance of the hardware accelerator on supporting a performance of the trained instance of the child neural network having the architecture defined by the output sequence on the particular neural network task comprises: determining, based on using an analytical area estimator and from the respective architecture of the hardware accelerator defined by the batch of output sequences, the estimated area of the hardware accelerator (evaluating the accelerator performance comprises determining the estimated area of the hardware accelerator; this feature corresponds to an alternative constraint provided by the user (Section III.A, "SINGLE-STAGE DNN/ACCELERATOR CO-DESIGN FLOW", "user-provided performance constraints"); size of PE Array, Table 1).
Claim 15. Weiwei discloses the method of claim 12, wherein using (i) the network performance metrics for the trained instances of the child neural network and (ii) the accelerator performance metrics for the instances of the hardware accelerators to adjust the current values of the controller network parameters of the controller neural network comprises: assigning different weights to the one or more accelerator performance metrics; and adjusting, according to the different weights, the current values of the controller network parameters of the controller neural network (Section III. C. Reinforcement Learning Based Search Strategy, Equation (2)).
Claim 16. Weiwei discloses the method of claim 2, wherein using (i) the network performance metrics for the trained instances of the child neural network and (ii) the accelerator performance metrics for the instances of the hardware accelerators to adjust the controller policy further comprises: fixing the network performance metric for the trained instance of the child neural network on the particular neural network task and using only the determined accelerator performance metrics for the instances of the hardware accelerators to adjust the current values of the controller network parameters of the controller neural network (Section I, "stacked to construct a DNN architecture with the highest accuracy for the target dataset… Then it will customize the hardware parameters").
Claim 17. Weiwei discloses the method of claim 1, further comprising: generating, in accordance with the adjusted values of the controller network parameters, a final output sequence that defines a final architecture of the child neural network (Fig.2, Section III. A High-level Overview of the Automated Framework, Step 3).
Claim 18. Weiwei discloses the method of claim 17, further comprising performing the particular neural network task for received network inputs by processing the received network inputs using a child neural network having the final architecture (Fig.2, Section III. A High-level Overview of the Automated Framework, Step 3).
Claim 19. Weiwei discloses a method comprising:
receiving data specifying one or more target hardware constraints of a hardware accelerator on which a neural network for performing a particular machine learning task is to be deployed (abstract; Fig. 2, "threshold"; Section III.B, "target machine learning task"; Related Works, "toolchain NEUTRAMS that transforms an existing NN to satisfy the hardware constraints of a neuromorphic chip");
receiving training data and validation data for the particular machine learning task (Fig. 2, "training", Section III C, "accuracy on the validation set"); and
selecting, from a space of candidate network architectures and using the training data and the validation data, a network architecture for the neural network for performing the particular machine learning task, and selecting, from a space of candidate hardware architectures, a hardware architecture for the hardware accelerator on which the neural network performing the particular machine learning task is to be deployed (Section III.A, "accelerator architecture configurations search space"; Fig. 2, "Arch configuration"), wherein each candidate network architecture in the space is defined by a corresponding set of decision values that includes a respective decision value for each of a first plurality of categorical decisions ("Step 3: determining the final solution. After the search process reaches a certain number of iterations, we accurately evaluate the top-N promising candidates with the hardware simulation and fully-training, and select the best one as the final solution as output.") (Section III.B, A High-level Overview of the Automated Framework, Step 3; Section III.C), wherein each candidate hardware architecture in the space is defined by a corresponding set of decision values that includes a respective decision value for each of a second plurality of categorical decisions (id.) [wherein finding the optimal pair (network, hardware) comprises exploring these two search spaces simultaneously, through a jointly optimized process, to maximize accuracy while minimizing hardware-specific metrics like latency, energy consumption, or memory footprint], and wherein the selecting comprises:
jointly updating (Fig. 2, "multi-objective reward"; Section III.B, "to obtain the multi-objective reward, and finally update the controller towards the most rewarding design search direction") (i) a set of controller parameters that define, for each of the first and second plurality of categorical decisions, a respective probability distribution over decision values for the categorical decision (Section III.C, "DNN architecture (hyper-parameters) and the accelerator configurations", "Each parameter ... can be treated as an action", "The LSTM samples actions"; Section III.D, "HyperNet Training Strategy… Gaussian Process regressor") and (ii) a shared set of parameters (Section III.C; the controller parameters can also legitimately be considered as shared parameters, as the claim does not further specify which entities share these parameters), wherein: updating the set of controller policy parameters comprises updating the set of controller parameters through reinforcement learning to maximize a reward function that measures (Fig. 2, "RL controller", "multi-objective reward") (i) an estimated quality of a candidate hardware architecture (Fig. 2, "latency", "energy") and (ii) an estimated quality of a candidate network architecture (Fig. 2, "accuracy") defined by sets of decision values sampled from probability distributions generated using the controller policy parameters (Section III.C; Section III.D, "HyperNet Training Strategy… Gaussian Process regressor"), and
updating the shared set of model parameters comprises updating the shared set of model parameters to optimize an objective function that measures a performance on the particular machine learning task of the candidate network architectures defined by the sets of decision values sampled from the probability distributions generated using the controller policy (Section III.C, Equations 2,3);
after the joint updating, selecting as the network architecture for the neural network, a candidate network architecture that is defined by respective particular decision values for each of the first plurality of categorical decisions ("Step 3: determining the final solution. After the search process reaches a certain number of iterations, we accurately evaluate the top-N promising candidates with the hardware simulation and fully-training, and select the best one as the final solution as output.") (Section III.B, A High-level Overview of the Automated Framework, Step 3; Section III.C); and selecting as the hardware architecture for the hardware accelerator, a candidate hardware architecture that is defined by respective particular decision values for each of the second plurality of categorical decisions ("Fast evaluator construction. The first step takes the target machine learning task (e.g., classification, detection), the basic accelerator architecture (e.g., systolic array architecture) and the user constraints (the accuracy, power and latency threshold) as inputs, and then train a HyperNet used to derive different network architectures in the search stage. After that, performance samples are taken from the accelerator simulator and used to build a performance predictor. In this work, we use energy and latency as the performance metrics for demonstration… Step 2: Effective design search. An LSTM based RL searcher keeps generating the solutions iteratively, which includes the NN architecture and hardware configuration, then it receives the QoR and performance results from the evaluator to obtain the multi-objective reward, and finally update the controller towards the most rewarding design search direction… Step 3: determining the final solution. After the search process reaches a certain number of iterations, we accurately evaluate the top-N promising candidates with the hardware simulation and fully-training, and select the best one as the final solution as output.") (Section III.B, A High-level Overview of the Automated Framework; Section III.C) [wherein, after jointly optimizing (updating) the neural network weights and the hardware configuration (e.g., via gradient-based methods or reinforcement learning) to satisfy performance constraints like latency and accuracy, the final, best-performing candidate is selected].
Claim 20. Weiwei discloses the method of claim 19, further comprising receiving data specifying a target latency for performing the particular machine learning task by the neural network when being deployed on the hardware accelerator (Section III.A, "user provided performance constraints threshold"; Section III.C, "latency threshold").
Claim 21. Weiwei discloses the method of claim 19, wherein the reward function includes a quality term that measures (i) the estimated quality of the candidate hardware architecture and (ii) the estimated quality of the candidate network architecture, and a latency term that is based on a ratio between an estimated latency of the candidate architecture and the target latency (Section III. C. Reinforcement Learning Based Search Strategy, Equation (2)).
Claim 22. Weiwei discloses the method of claim 19, wherein the joint updating comprises repeatedly performing operations comprising: determining, using the validation data, an estimated quality on the particular machine learning task of a neural network having a candidate architecture that has a subset of the shared set of model parameters that is defined by the selected decision values for the first plurality of categorical decisions, wherein the quality is estimated in accordance with current values of the subset of the shared set of model parameters that is defined by the selected decision values for the first plurality of categorical decisions (Section III. C. Reinforcement Learning Based Search Strategy, "accuracy on the validation set").
Claim 23. Weiwei discloses the method of claim 19, wherein the joint updating comprises repeatedly performing operations comprising: determining, using the validation data and a latency simulator, an estimated latency when performing the particular machine learning task of the neural network having the candidate network architecture that has the subset of the shared set of model parameters that is defined by the selected decision values for the first plurality of categorical decisions, wherein the neural network is deployed on the hardware accelerator having the hardware architecture that has the subset of the shared set of model parameters that is defined by the selected decision values for the second plurality of categorical decisions (Section III. C. Reinforcement Learning Based Search Strategy, "accuracy on the validation set, latency"; Fig. 2, "Latency predictor").
Claim 24. Weiwei discloses the method of claim 19, wherein the joint updating comprises repeatedly performing operations comprising: determining, using an area simulator, an estimated quality of the candidate hardware architecture that has the subset of the shared set of model parameters that is defined by the selected decision values for the second plurality of categorical decisions (evaluating the accelerator performance comprises determining the estimated area of the hardware accelerator; this feature corresponds to an alternative constraint provided by the user (Section III.A, "SINGLE-STAGE DNN/ACCELERATOR CO-DESIGN FLOW", "user-provided performance constraints"); size of PE Array, Table 1).
Claim 25. Weiwei discloses the method of claim 23, wherein the latency simulator and the area simulator are each a respective neural network trained on labelled training data generated using an accelerator simulator (Section IV.A, "Experiment setup").
Claim 26 represents the machine learning task-specific hardware accelerator of claim 19 and is rejected under the same rationale.
Claim Rejections - 35 USC § 103
6. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Weiwei et al., "You Only Search Once: A Fast Automation Framework for Single-Stage DNN/Accelerator Co-design" (March 9-13, 2020) (hereinafter "Weiwei"), in view of Zoph et al., "Learning Transferable Architectures for Scalable Image Recognition" (2018) (hereinafter "Zoph").
Claim 4. Weiwei discloses the method of claim 3, but fails to explicitly disclose wherein: the reinforcement learning technique is a proximal policy optimization (PPO) technique.
However, Zoph discloses that the reinforcement learning technique is a proximal policy optimization (PPO) technique (p. 8707, Appendix A1). Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Weiwei with the features of Zoph. One would have been motivated to do so in order to optimize convolutional architectures on a dataset of interest.
7. Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Weiwei et al., "You Only Search Once: A Fast Automation Framework for Single-Stage DNN/Accelerator Co-design" (March 9-13, 2020) (hereinafter "Weiwei"), in view of Sandler et al., "MobileNetV2: Inverted Residuals and Linear Bottlenecks" (2018) (hereinafter "Sandler").
Claim 10. Weiwei discloses the method of claim 1, but fails to explicitly disclose wherein: the child neural network includes one or more inverted residual layers and one or more linear bottleneck layers; and the hyperparameters include hyperparameters for each inverted residual layer and linear bottleneck layer in the child neural network.
However, Sandler discloses that the child neural network includes one or more inverted residual layers and one or more linear bottleneck layers (p. 4511, section 3.2); and that the hyperparameters include hyperparameters for each inverted residual layer and linear bottleneck layer in the child neural network (p. 4512, section 3.3). Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Weiwei with the features of Sandler. One would have been motivated to do so in order to reduce the need for main memory access in many embedded hardware designs that provide small amounts of very fast software-controlled cache memory.
Conclusion
8. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure (See PTO-892).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Phenuel S. Salomon whose telephone number is (571) 270-1699. The examiner can normally be reached on Mon-Fri 7:00 A.M. to 4:00 P.M. (Alternate Friday Off) EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached on (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-3800.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PHENUEL S SALOMON/Primary Examiner, Art Unit 2146