Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/30/2025 has been entered.
Remarks
This Office Action is responsive to Applicant's Amendment filed on December 30, 2025, in which claims 1, 4-6, 11, 14-16, and 19 are currently amended. Claims 1-20 are currently pending.
Response to Arguments
Applicant’s arguments with respect to the rejection of claims 1-20 under 35 U.S.C. 101 based on the amendment have been considered but are not persuasive.
With respect to Applicant’s arguments on p. 11 of the Remarks submitted 12/30/2025 that recitation of “executing one or more processors” changes the scope of the claims to that which “merely involve a judicial exception”, Examiner respectfully disagrees. The claim is wholly directed towards a mental process of determining optimal hyper-parameters, which is routinely performed entirely in the mind. The mere recitation of executing the judicial exception on a generic computer system does not integrate the judicial exception into a practical application (see MPEP 2106.05(f) and MPEP 2106.07(a)(II): "employing well-known computer functions to execute an abstract idea, even when limiting the use of the idea to one particular environment, does not integrate the exception into a practical application"). Similarly, since updating parameters for at least some of the iterations can be, and routinely is, done without a computer, specifying that the parameters are of a trained computing device stored in a memory device is not seen as integrating the judicial exception into a practical application but rather as mere instructions to apply the judicial exception using generic computer components. Examiner notes that the instant specification explicitly reinforces this interpretation ([¶0023] “selected options for design parameters may be defined solely by a human design for a particular purpose”; [¶0024] “Through experimentation, human experts have devised several useful neural network structures such as, for example, attention and residual connection”).
With respect to Applicant’s arguments on p. 12 of the Remarks submitted 12/30/2025 that the claims present a technical improvement, Examiner respectfully disagrees. The cited paragraph from the instant specification largely recites a judicial exception which may or may not be performed on particular hardware without any suggestion of how the hardware itself may be improved by the judicial exception.
For at least these reasons and those described below, Examiner asserts that it is reasonable and appropriate to maintain the rejection of claims 1-20 under 35 U.S.C. 101.
Applicant’s arguments with respect to rejection of claims 1-20 under 35 U.S.C. 103 based on amendment have been considered.
With respect to Applicant’s arguments on p. 17 of the Remarks submitted 12/30/2025 that He does not disclose or suggest “updating parameters of the trained computing device stored in a memory”, Examiner respectfully disagrees. He repeatedly describes the disclosed system as being performed on a computer system having processors and using instruction libraries necessarily stored in computer memory and necessarily executed on a processor. Examiner notes that the instant specification itself does not explicitly mention updating parameters stored in memory, nor does the instant specification ever use the term “trained computing device”, however, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that said parameters, if updated on a computer system using a processor, must also necessarily be stored in some form of memory. So just as it would be reasonable to infer the claim limitation from the instant specification, Examiner asserts that it would be equally, if not more, reasonable to infer the same limitation from He. Examiner further notes that the claim is not rejected in view of He alone, but rather the combination of Huang and He, where Huang explicitly and repeatedly describes the relationship between the parameters and computer memory ([¶0083] "the policy of the controller 30 can be updated based on the reward 32. In particular, the search system can update one or more values of one or more parameters of the controller model based on the value function described above" [¶0088] "The memory 114 can store data 116 and instructions 118 which are executed by the processor 112 to cause the user computing device 102 to perform operations." [¶0089] " the user computing device 102 can store or include one or more neural networks 120" [¶0090] "the one or more neural networks 120 can be received from the server computing system 130 over network 180, stored in the user computing device memory 114, and then used or otherwise implemented by the one or more processors 112" [¶0098] "The architecture search computing system 150 can include a model trainer 160 that trains and/or evaluates the machine-learned networks 120 and/or 140 stored at the user computing device 102 and/or the server computing system 130 using various training or learning techniques"). For at least these reasons and those further described below, Examiner asserts that the rejection of the claims in view of the combination of Huang and He is reasonable and should be maintained.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Regarding Claim 1: Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis: Claim 1, under its broadest reasonable interpretation, recites a series of mental processes. For example, but for the generic computer component language, the following limitations, in the context of this claim, encompass mental processes:
a current sampling iteration of the multiple iterations to select one or more sampled hyper-parameter search space instances that are subsets of one or more sampled hyper-parameter search space instances selected in a previous sampling iteration of the multiple sampling iterations, the subsets of the one or more sampled hyper-parameter search space instances selected in the previous sampling iteration according to a gradient computed based, at least in part on a latency estimator generated on at least some of the multiple sampling iterations (observation, evaluation, and judgement)
the subsets of the one or more sampled hyper-parameter search space instances selected in the previous sampling iteration to define candidate neural network (NN) architectures (observation, evaluation, and judgement)
updating parameters […] for at least some of the iterations based, at least in part, on empirically determined latencies of at least some sampled hyper-parameter search space instances selected on the at least some of the sampling iterations (observation, evaluation, and judgement)
Therefore, claim 1 recites an abstract idea which is a judicial exception.
Step 2A Prong Two Analysis: Claim 1 recites additional elements “executing one or more processors in multiple sampling iterations of a hyper-parameter search space”, “by a trained computing device”, and “the trained computing device stored in a memory device”. However, these additional features are computer components recited at a high level of generality, such that they amount to no more than mere instructions to apply the judicial exception using a generic computer component. An additional element that merely recites the words “apply it” (or an equivalent) with the judicial exception, or merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, does not integrate the judicial exception into a practical application (See MPEP 2106.05(f)). Therefore, claim 1 is directed to a judicial exception.
Step 2B Analysis: Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the lack of integration of the abstract idea into a practical application, the additional elements recited in claim 1 amount to no more than mere instructions to apply the judicial exception using a generic computer component.
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under 35 U.S.C. § 101. This rejection applies equally to independent claims 11 and 16, which recite an apparatus and an article, respectively, as well as to dependent claims 2-10, 12-15, and 17-20. Independent claim 11 and dependent claims 12-15 recite the additional element “one or more processors to”, which amounts to mere instructions to apply the judicial exception using generic computer components and does not integrate the judicial exception into a practical application.
Similarly, independent claim 16 and dependent claims 17-20 recite the additional element “a non-transitory storage medium comprising computer-readable instructions stored thereon, the instructions to be executable by one or more processors of a computing device to”, which amounts to mere instructions to apply the judicial exception using generic computer components and does not integrate the judicial exception into a practical application.
The additional limitations of the dependent claims are addressed briefly below:
Dependent claims 2 and 12 recite additional observation, evaluation, and judgement “wherein the empirically determined latencies are determined based, at least in part, on an observed and/or measured latencies of execution and/or simulation of candidate NN architectures, the candidate NN architectures being defined, at least in part, by sampled hyper-parameter search space instances.”
Dependent claims 3 and 13 recite additional instructions to apply the judicial exception using generic computer components “execution of the candidate NN architectures comprises execution of the candidate NN architectures on one or more neural processing units (NPUs)”
Dependent claims 4 and 14 recite additional observation, evaluation, and judgement “updating the parameters of the trained computing device further comprises: applying the latency estimator to at least some of the candidate NN architectures to compute predictions and/or estimates of latencies; and applying a loss function to the empirically determined latencies and the predictions and/or estimates of latencies”.
Dependent claims 5 and 15 recite additional observation, evaluation, and judgement “at least some of the parameters of the trained computing device comprise weights associated with nodes of a neural network stored in the memory device, and further comprising: updating at least some of the weights associated with the nodes of the neural network based, at least in part, on a gradient applied to the second loss function”.
Dependent claims 6 and 17 recite additional observation, evaluation, and judgement “updating the parameters of the trained computing device further comprises: identifying a subsequent hyper-parameter search space to define candidate NN architectures based, at least in part, on application of the latency estimator to obtain predictions and/or estimates of at least some NN architectures in a current hyper-parameter search space; identifying one or more dummy hyper-parameter search spaces based, at least in part, on the subsequent search space; and updating parameters of the latency estimator for application to at least some of the candidate NN architectures in the subsequent search space based, at least in part, on empirically determined latencies of at least some NN architectures in the one or more hyper-parameter dummy search spaces”
Dependent claims 7 and 18 recite additional observation, evaluation, and judgement “wherein the gradient is computed based, at least in part, on a latency loss function and a functional loss”
Dependent claims 8 and 19 recite additional observation, evaluation, and judgement “applying a first gradient for updating a super set of weights to be selectable for application of nodes of subsequently identified candidate NN architectures; and applying a second gradient for updating a set of NN network topology features to be selectable for the subsequently identified candidate NN architectures”
Dependent claims 9 and 20 recite additional observation, evaluation, and judgement “wherein the set of NN network topology features comprises selectable channel sizes for at least one layer in the subsequently identified candidate NN architectures”
Dependent claim 10 recites additional observation, evaluation, and judgement “mapping the selectable channel sizes to a probability mass function; and selecting at least one of the subsequently identified candidate NN architectures based, at least in part, on the probability mass function”
Therefore, when considering the additional elements separately and in combination, they do not amount to significantly more than the judicial exception. Accordingly, claims 1-20 are rejected under 35 U.S.C. § 101.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 4, 5, 7, 11, 12, 14, 15, 16, and 18 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Huang (US 20250156715 A1), with a priority date of 2/16/2022, and He (“AMC: AutoML for Model Compression and Acceleration on Mobile Devices”, 2018).
[Image: FIG. 1B of Huang (media_image1.png, 478 × 704, greyscale)]
Regarding claim 1, Huang teaches A method comprising: executing one or more processors in multiple sampling iterations of a hyper-parameter search space ([¶0040] "The architecture of a neural network is the structure of how nodes are connected; examples of architectural choices are hidden layer sizes and activation types" [¶0097] "The architecture search computing system 150 includes one or more processors 152 and a memory 154" [¶0057] "for iter = 1 to max_iter do […] x_i <- the i-th layer size sampled from {s_ij}_{j∈[C_i]}" [¶0058] "β is a hyperparameter that needs careful tuning. The idea behind these reward functions is to encourage models with high quality with respect the resource target." Layer size and activation type are considered hyper-parameters in view of the instant specification ([¶0036] "hyper-parameter search space may be defined, at least in part, based on available features of a neural network architecture such as, for example, weight quantization, activation quantization, channel width, operator types or random pruning, just to provide a few examples of parameters over which a hyperparameter search space may be defined" [¶0057] "hyper-parameters such as among channel options for a jth layer of a candidate neural network"))
a current sampling iteration of the multiple iterations to select one or more sampled hyper-parameter search space instances ([¶0072] "the reinforcement learning process shown in FIG. 1A includes a controller 30 that operates to generate (e.g., select values for) a new architecture 18." [¶0057] "for iter = 1 to max_iter do […] x_i <- the i-th layer size sampled from {s_ij}_{j∈[C_i]} with distribution {p_ij}_{j∈[C_i]}" Each iteration involves the controller generating a new architecture by sampling candidate selections (x and y) from the learned distribution {p_ij}, the sampled x and y being the search space instances)
that are subsets of one or more sampled hyper-parameter search space instances selected in a previous sampling iteration of the multiple sampling iterations, ([¶0002] "the present disclosure relates to neural architecture search techniques that have improved computational efficiency via performance of a constraint-based screening and improved gradient update approach" [¶0058] "Example Rejection-Based Reward with MC Sampling. Only a subset of the architectures in the search space S will satisfy a set of given resource constraint(s), V denotes this set of feasible architectures. To find a feasible architecture, a resource target T0 is often used in an RL reward. Given an architecture y, a latency-aware reward combines its quality Q(y) and resource consumption T(y) into a single reward. [...] In these approaches β is a hyperparameter that needs careful tuning. The idea behind these reward functions is to encourage models with high quality with respect the resource target." [¶0055] "the layer-wise sampling probabilities gradually converge to more deterministic distributions, under which one or several architectures are finally selected" [¶0061] "The search system samples only from the set of feasible architectures V, whose distribution is {P(y|y∈V)}_{y∈V}" Iterative subsets are explicitly limited to the feasible subset {P(y|y∈V)}_{y∈V} of the instance P(y) of a previous sampling iteration (every iteration samples candidates and then gates that selection based on its intersection with V). Huang also reinforces the iterative subset selection by explicitly stating that ([¶0055] "the layer-wise sampling probabilities gradually converge to more deterministic distributions") where the converged candidate set is a subset of the initial candidate set. The final selected architecture is also a subset of the hyper-parameter search space architectures of the previous iteration, the final selected architecture being a subset selected in the last iteration.)
the subsets of the one or more sampled hyper-parameter search space instances selected in the previous sampling iteration according to a gradient computed ([¶0054] "the search system can then update the logits for the RL controller by sampling a child network y that is independent of the network x from the same layerwise distributions, computing the quality reward Q(y) as 1−loss(y) on the validation set, and then updating the controller (e.g., the logits) with the gradient of J(y) = stop_grad(Q(y)−Q̄)·log P(y): the product of the advantage of the current network's reward over past rewards (usually an exponential moving average) and the log-probability of the current sample." Huang's controller update is explicitly gradient-based, as illustrated in the sketch below)
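For purposes of illustration only, the REINFORCE-style logit update quoted from Huang ¶0054 may be sketched as follows. The code is Examiner's hypothetical illustration, not code reproduced from Huang, and all identifiers are placeholders:

    import numpy as np

    # Hypothetical sketch of the update in Huang [¶0054]: the controller's
    # layer-wise logits are moved along the gradient of
    # J(y) = stop_grad(Q(y) - Qbar) * log P(y).

    def softmax(logits):
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def update_layer_logits(logits, sampled_idx, quality, baseline, lr=0.1):
        probs = softmax(logits)
        advantage = quality - baseline       # stop_grad(Q(y) - Qbar)
        grad_log_p = -probs                  # d(log softmax)/d(logits) ...
        grad_log_p[sampled_idx] += 1.0       # ... evaluated at the sample
        return logits + lr * advantage * grad_log_p  # ascend J

    # Example: three candidate layer sizes; option 1 was sampled and its
    # quality reward exceeded the moving-average baseline.
    logits = update_layer_logits(np.zeros(3), sampled_idx=1,
                                 quality=0.9, baseline=0.8)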
based, at least in part on a latency estimator [generated on at least some of the multiple sampling iterations] by a trained computing device ([¶0050] "In updates to the RL controller, the weights of the child network can be used to estimate the quality reward that is used to update the controller (e.g., the logits)" [¶0054] "updating the controller (e.g., the logits) with the gradient of J(y)" [¶0074] "the constraints evaluated at 20 can include constraints on the number of parameters, storage space required by the model, model runtime, training latency, serving latency, interoperability to certain hardware accelerators, parallelizability, etc." [¶0075] "some or all of the constraint(s) can be evaluated prior to any training 22 and/or evaluation 24 of the new architecture 18 [...] some or all of the constraints can be evaluated at the beginning or during training 22 and/or evaluation 24 [...] the runtime of the network can be estimated (e.g., without performing any forward passes through the network) and the runtime constraint can be evaluated based on such estimation" The computing system, computing system data 116, computing system operating system, or the neural architecture search system in FIGS. 1A-B of Huang are all interpreted as a trained computing device; the computing system in Huang repeatedly updates learned controller parameters during the iterative gradient-based search such that the computer is explicitly trained. Huang places runtime/latency handling inside the iterative NAS loop, where each iteration generates a candidate architecture and then performs "constraint evaluation" in which runtime latency is explicitly estimated for evaluation.)
the subsets of the one or more sampled hyper-parameter search space instances selected in the previous sampling iteration to define candidate neural network (NN) architectures ([¶0002] "the present disclosure relates to neural architecture search techniques that have improved computational efficiency via performance of a constraint-based screening and improved gradient update approach" [¶0058] "Example Rejection-Based Reward with MC Sampling. Only a subset of the architectures in the search space S will satisfy a set of given resource constraint(s), V denotes this set of feasible architectures. To find a feasible architecture, a resource target T0 is often used in an RL reward. Given an architecture y, a latency-aware reward combines its quality Q(y) and resource consumption T(y) into a single reward. [...] In these approaches β is a hyperparameter that needs careful tuning. The idea behind these reward functions is to encourage models with high quality with respect the resource target." [¶0055] "the layer-wise sampling probabilities gradually converge to more deterministic distributions, under which one or several architectures are finally selected" [¶0061] "The search system samples only from the set of feasible architectures V, whose distribution is {P(y|y∈V)}_{y∈V}" Iterative subsets are explicitly limited to the feasible subset {P(y|y∈V)}_{y∈V} of the instance P(y) of a previous sampling iteration (every iteration samples candidates and then gates that selection based on its intersection with V). Huang also reinforces the iterative subset selection by explicitly stating that ([¶0055] "the layer-wise sampling probabilities gradually converge to more deterministic distributions") where the converged candidate set is a subset of the initial candidate set. The final selected architecture is also a subset of the hyper-parameter search space architectures of the previous iteration, the final selected architecture being a subset selected in the last iteration.)
updating parameters of the trained computing device stored in a memory device for at least some of the iterations ([¶0083] "the policy of the controller 30 can be updated based on the reward 32. In particular, the search system can update one or more values of one or more parameters of the controller model based on the value function described above" [¶0088] "The memory 114 can store data 116 and instructions 118 which are executed by the processor 112 to cause the user computing device 102 to perform operations." [¶0089] "the user computing device 102 can store or include one or more neural networks 120" [¶0090] "the one or more neural networks 120 can be received from the server computing system 130 over network 180, stored in the user computing device memory 114, and then used or otherwise implemented by the one or more processors 112" [¶0098] "The architecture search computing system 150 can include a model trainer 160 that trains and/or evaluates the machine-learned networks 120 and/or 140 stored at the user computing device 102 and/or the server computing system 130 using various training or learning techniques" Huang explicitly updates controller 30, which is explicitly stored on the computer in memory, and Huang further explicitly anticipates training (updating parameters of) the models stored on the computer in memory.)
based, at least in part, on empirically determined latencies of at least some sampled hyper-parameter search space instances selected on the at least some of the sampling iterations ([¶0100] "The architecture search computing system 150 can also optionally be communicatively coupled with various other devices (not specifically shown) that measure performance parameters of the generated networks (e.g., mobile phone replicas which replicate mobile phone performance of the networks to evaluate hardware-specific runtimes)." [¶0074] "the constraints evaluated at 20 can include constraints on the number of parameters, storage space required by the model, model runtime, training latency, serving latency, interoperability to certain hardware accelerators, parallelizability, etc." [¶0075] "a constraint on the runtime of the new architecture 18 may require some number of initial forward passes using the network to evaluate the runtime of the network" Huang explicitly empirically evaluates the network constraints and explicitly anticipates latency as one of the constraints on which the architecture search focuses.)
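For illustration only, the rejection-based sampling Huang describes in ¶0058-¶0061 (drawing candidates from the layer-wise distributions and retaining only those in the feasible set V) may be sketched as follows. All identifiers are Examiner's hypothetical placeholders, not code from Huang:

    import numpy as np

    # Hypothetical sketch of Monte Carlo rejection sampling against a
    # latency constraint: candidates are drawn from the layer-wise
    # distributions {p_ij} and gated on membership in the feasible set V.

    def sample_architecture(layer_probs, rng):
        # One option index per layer, drawn from the current distributions.
        return [rng.choice(len(p), p=p) for p in layer_probs]

    def sample_feasible(layer_probs, latency_fn, latency_target, rng,
                        max_tries=1000):
        for _ in range(max_tries):
            candidate = sample_architecture(layer_probs, rng)
            if latency_fn(candidate) <= latency_target:  # T(y) <= T0 gate
                return candidate
        raise RuntimeError("no feasible architecture sampled")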
While Huang is already explicit that latency estimation can happen during the iterative search/training loop, Huang does not explicitly disclose the subsets of the one or more sampled hyper-parameter search space instances selected in the previous sampling iteration according to a gradient computed based, at least in part on a latency estimator generated on at least some of the multiple sampling iterations by a trained computing device.
He, in the same field of endeavor, teaches the subsets of the one or more sampled hyper-parameter search space instances selected in the previous sampling iteration according to a gradient computed based, at least in part on a latency estimator generated on at least some of the multiple sampling iterations by a trained computing device ([p. 6] "our algorithm is not limited to constraining model size and it can be replaced by other resources, such as FLOPs or the actual inference time on mobile device" [p. 12] "we use TensorFlow Lite framework for timing evaluation […] measure how much we can improve its inference speed" [p. 3] "we define a reward that is a function of both accuracy and hardware resource. With this reward function, we are able to explore the limit of compression without harming the accuracy of models" [p. 2] "Our reinforcement learning agent (DDPG) receives the embedding s_t from a layer t, and outputs a sparsity ratio a_t. After the layer is compressed with a_t, it moves to the next layer L_{t+1}. The accuracy of the pruned model with all layers compressed is evaluated. Finally, as a function of accuracy and FLOP, reward R is returned to the reinforcement learning agent" AMC teaches using on-device timing evaluation as the resource (latency) signal in the reinforcement learning (RL) loop; specifically, the latency value is generated during iterations and fed into the reward/objective that drives policy updates).
Huang and He are both directed towards automated machine learning utilizing reinforcement-learning-based neural architecture search for hardware-aware optimization. Therefore, Huang and He are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Huang with the teachings of He. Huang already has latency constraints evaluated during the iterative loop; AMC teaches the predictable optimization of generating that latency signal by on-device timing and using it as the resource term in the same RL reward/update loop. This can be accomplished by simple substitution of the loss function in Huang with the loss function in He. He provides as additional motivation for combination ([p. 13] “with AMC, we significantly raise the pareto curve”). This motivation for combination also applies to the remaining claims which depend on this combination.
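For illustration only, the substitution discussed above (measured on-device latency supplying the resource term of the RL reward) may be sketched as follows. The reward form and all identifiers are Examiner's hypothetical placeholders; they are not asserted to be the exact formulations of Huang or He:

    import time

    # Hypothetical sketch: the latency signal is generated empirically by
    # timing inference, then combined with accuracy into a single reward of
    # the latency-aware kind both references contemplate.

    def measure_latency(model, run_inference, trials=50):
        start = time.perf_counter()
        for _ in range(trials):
            run_inference(model)          # e.g., a timed TFLite invocation
        return (time.perf_counter() - start) / trials

    def latency_aware_reward(accuracy, latency, target, beta=1.0):
        # Penalize candidates whose measured latency exceeds the target T0.
        return accuracy - beta * max(0.0, latency - target)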
Regarding claim 2, the combination of Huang and He teaches The method of claim 1, wherein the empirically determined latencies are determined based, at least in part, on an observed and/or measured latencies of execution and/or simulation of candidate NN architectures (He [p. 9 §4.2] "The latency is measured with 224 × 224 input throughout the experiments")
the candidate NN architectures being defined, at least in part, by sampled hyper-parameter search space instances (Huang [¶0002] "the present disclosure relates to neural architecture search techniques that have improved computational efficiency via performance of a constraint-based screening and improved gradient update approach" [¶0058] "Example Rejection-Based Reward with MC Sampling. Only a subset of the architectures in the search space S will satisfy a set of given resource constraint(s), V denotes this set of feasible architectures. To find a feasible architecture, a resource target T0 is often used in an RL reward. Given an architecture y, a latency-aware reward combines its quality Q(y) and resource consumption T(y) into a single reward. [...] In these approaches β is a hyperparameter that needs careful tuning. The idea behind these reward functions is to encourage models with high quality with respect the resource target.").
Regarding claim 4, the combination of Huang and He teaches The method of claim 1, wherein updating the parameters of the trained computing device further comprises: applying the latency estimator to at least some of the candidate NN architectures to compute predictions and/or estimates of latencies; and (He [p. 3 §1] "we propose resource-constrained compression to achieve the best accuracy given the maximum amount of hardware resources (e.g., FLOPs, latency, and model size)" [p. 4 §2] "AMC engine optimizes for both accuracy and latency" [p. 4 §3] "We aim to automatically find the redundancy for each layer, characterized by sparsity. We train an reinforcement learning agent to predict the action and give the sparsity, then perform the pruning. We quickly evaluate the accuracy after pruning but before fine-tuning as an effective delegate of final accuracy. Then we update the agent by encouraging smaller, faster and more accurate models." [p. 5 §3.2] "The State Space For each layer t, we have 11 features that characterize the state s_t" [p. 13] "By substituting FLOPs with latency, we can change from FLOPs-constrained search to latency-constrained search and directly optimize the inference time." [p. 9 §4.2] "The latency is measured with 224 × 224 input throughout the experiments")
applying a loss function to the empirically determined latencies and the predictions and/or estimates of latencies. (He: Eqn. 5 is interpreted as the second (latency) loss; Eqn. 3 is interpreted as the first loss, which depends on Eqn. 5.)
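For illustration only, the claim 4 limitation as mapped above (a loss applied between empirically determined latencies and the estimator's predictions) may be sketched as follows. The linear estimator and all identifiers are Examiner's hypothetical placeholders, not He's Eqn. 3 or Eqn. 5:

    import numpy as np

    # Hypothetical sketch: a latency estimator predicts latencies for
    # candidate architectures (encoded as feature vectors); a mean-squared
    # loss against measured latencies drives a gradient update of the
    # estimator's parameters.

    def latency_loss(weights, features, measured):
        predicted = features @ weights               # estimator predictions
        return np.mean((predicted - measured) ** 2)  # loss vs. empirical

    def update_estimator(weights, features, measured, lr=1e-3):
        predicted = features @ weights
        grad = 2.0 * features.T @ (predicted - measured) / len(measured)
        return weights - lr * grad                   # descend the loss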
Regarding claim 5, the combination of Huang and He teaches The method of claim 4, wherein at least some of the parameters of the trained computing device comprise weights associated with nodes of a neural network stored in the memory device, and further comprising: (He [p. 4 §3.1] "Coarse-grained structured pruning [31] aims to prune entire regular regions of weight tensors (e.g., channel, row, column, block, etc.). The pruned weights are regular and can be accelerated directly with off-the-shelf hardware and libraries. Here we study structured pruning that shrink the input channel of each convolutional and fully connected layer." Pruning weights and weight tensors is interpreted as synonymous with updating parameters with the latency estimator (AMC).)
updating at least some of the weights associated with the nodes of the neural network based, at least in part, on a gradient applied to the loss function. (He [p. 6 §3.2] "During the update, the baseline reward b is subtracted to reduce the variance of gradient estimation, which is an exponential moving average of the previous rewards").
Regarding claim 7, the combination of Huang and He teaches The method of claim 1, wherein the gradient is computed based, at least in part, on a latency loss function and a functional loss. (He [p. 3 §1] "we define a reward that is a function of both accuracy and hardware resource" Eqn. 3 (for architecture y) is interpreted as the functional loss, which depends on Eqn. 5 for the immediate reward r_i, which is interpreted as the latency loss.)
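For illustration only, claim 7's combined gradient (over a latency loss and a functional loss) may be sketched as follows; the weighting and all identifiers are Examiner's hypothetical placeholders rather than He's exact formulation:

    # Hypothetical sketch: the parameter update follows a single gradient
    # computed over a weighted combination of a functional (task) loss and
    # a latency loss, so both terms shape the update direction.

    def combined_gradient(grad_functional, grad_latency, lam=0.1):
        return grad_functional + lam * grad_latency

    def sgd_step(params, grad_functional, grad_latency, lr=1e-2, lam=0.1):
        return params - lr * combined_gradient(grad_functional,
                                               grad_latency, lam)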
Regarding claim 11, claim 11 is directed towards an apparatus for performing the method of claim 1. Therefore, the rejection applied to claim 1 also applies to claim 11. Claim 11 also recites the additional element “one or more processors to:” (Huang [¶0040] "The architecture of a neural network is the structure of how nodes are connected; examples of architectural choices are hidden layer sizes and activation types" [¶0097] "The architecture search computing system 150 includes one or more processors 152 and a memory 154").
Similarly, regarding claims 12 and 14-15, claims 12 and 14-15 are directed towards an apparatus for performing the methods of claims 2 and 4-5, respectively. Therefore, the rejections applied to claims 2 and 4-5 apply to claims 12 and 14-15.
Regarding claim 16, claim 16 is directed towards an article for performing the method of claim 1. Therefore, the rejection applied to claim 1 also applies to claim 16. Claim 16 also recites the additional element “a non-transitory storage medium comprising computer-readable instructions stored thereon, the instructions to be executable by one or more processors of a computing device to:” (Huang [¶0040] "The architecture of a neural network is the structure of how nodes are connected; examples of architectural choices are hidden layer sizes and activation types" [¶0097] "The architecture search computing system 150 includes one or more processors 152 and a memory 154").
Similarly, regarding claim 18, claim 18 is directed towards an article for performing the method of claim 7. Therefore, the rejection applied to claim 7 also applies to claim 18.
Claims 3 and 13 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Huang and He, in further view of Choi (US 20220076121 A1).
Regarding claim 3, the combination of Huang and He teaches The method of claim 2.
However, the combination of Huang and He does not explicitly teach wherein execution of candidate NN architectures comprises execution of the candidate NN architectures on one or more neural processing units (NPUs).
Choi, in the same field of endeavor, teaches execution of candidate NN architectures comprises execution of the candidate NN architectures on one or more neural processing units (NPUs). ([¶0101] "The hardware in which the neural network is executed may correspond to hardware including a learnable processor such as a central processing unit (CPU), a graphics processing unit (GPU), a neural processing unit/neural processor (NPU), and the like. By setting the hardware restriction condition based on the hardware in which the neural network is executed, the neural network architecture may be optimized based on the hardware. For example, when the neural architecture search method is used to optimize a neural network architecture in the NPU, an operation supported in the NPU and a memory capacity of the NPU may be set as the hardware restriction condition").
The combination of Huang and He, as well as Choi, are directed towards neural architecture search. Therefore, the combination of Huang and He, as well as Choi, are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Huang and He with the teachings of Choi by performing the neural architecture search to optimize for a neural processing unit. Choi provides as additional motivation for combination ([¶0101] “By setting the hardware restriction condition based on the hardware in which the neural network is executed, the neural network architecture may be optimized based on the hardware. For example, when the neural architecture search method is used to optimize a neural network architecture in the NPU, an operation supported in the NPU and a memory capacity of the NPU may be set as the hardware restriction condition”). This motivation for combination also applies to the remaining claims which depend on this combination.
Regarding claim 13, claim 13 is directed towards an apparatus for performing the method of claim 3. Therefore, the rejection applied to claim 3 also applies to claim 13.
Claims 6, 8, 9, 10, 17, 19, and 20 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Huang and He, in further view of Zhang (US 20220130137 A1).
Regarding claim 6, the combination of Huang and He teaches The method of claim 1.
However, the combination of Huang and He does not explicitly teach wherein updating the parameters of the trained computing device further comprises: identifying a subsequent hyper-parameter search space to define candidate NN architectures based, at least in part, on application of the latency estimator to obtain predictions and/or estimates of at least some NN architectures in a current hyper-parameter search space;
identifying one or more dummy hyper-parameter search spaces based, at least in part, on the subsequent hyper-parameter search space; and
updating parameters of the trained computing device for application of the latency estimator to at least some of the candidate NN architectures in the subsequent hyper-parameter search space based, at least in part, on empirically determined latencies of at least some NN architectures in the one or more dummy hyper-parameter search spaces.
Zhang, in the same field of endeavor, teaches updating the parameters of the trained computing device further comprises: identifying a subsequent hyper-parameter search space to define candidate NN architectures based, at least in part, on application of the latency estimator to obtain predictions and/or estimates of at least some NN architectures in a current hyper-parameter search space; ([Abstract] "A method and an apparatus for searching a neural network architecture comprising a backbone network and a feature network. The method comprises: a. forming a first search space for the backbone network and a second search space for the feature network; b. using a first controller to sample a backbone network model in the first search space, and using a second controller to sample a feature network model in the second search space; c. combining the first controller and the second controller by adding collected entropy and probability of the sampled backbone network model and feature network model to obtain a combined controller; d. using the combined controller to obtain a combined model; e. evaluating the combined model, and updating a combined model parameter according to an evaluation result" [¶0056] "it is possible to handle multi-task problems and balance accuracy and latency during the search due to the use of multiple losses (such as RLOSS, FLOSS, FLOP)" Second controller interpreted as synonymous with subsequent controller.)
identifying one or more dummy hyper-parameter search spaces based, at least in part, on the subsequent hyper-parameter search space; and ([¶0019] "FIG. 6 schematically shows combination of features and a second search space." [¶0052] "The right part of FIG. 6 schematically shows construction of the second search space" The schematic of the search space is interpreted as a dummy search space.)
updating parameters of the trained computing device for application of the latency estimator to at least some of the candidate NN architectures in the subsequent hyper-parameter search space based, at least in part, on empirically determined latencies of at least some NN architectures in the one or more dummy hyper-parameter search spaces. ([¶0056] "the backbone network and the feature network can be updated at the same time, so as to ensure an overall good output of the detection network; it is possible to handle multi-task problems and balance accuracy and latency during the search due to the use of multiple losses (such as RLOSS, FLOSS, FLOP); since lightweight convolution operation is used in the search space, the found model is small and thus is especially suitable for mobile environments and resource-limited environments.").
The combination of Huang and He, as well as Zhang, are directed towards neural architecture search. Therefore, the combination of Huang and He, as well as Zhang, are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Huang and He with the teachings of Zhang by having multiple search spaces for fine-grained control and continuous model optimization. Zhang provides as additional motivation for combination ([¶0032] "By iteratively performing steps S240 to S270, the joint controller may be continuously updated according to the validation accuracy of the joint model, so that the updated joint controller may generate a better joint model, and thereby continuously improving the validation accuracy of the obtained joint model"). This motivation for combination also applies to the remaining claims which depend on this combination.
Regarding claim 8, the combination of Huang and He teaches The method of claim 1, and further comprising: applying a first gradient for updating a super set of weights to be selectable for application of nodes of subsequently identified candidate NN architectures; and (He [p. 6 §3.2] "During the update, the baseline reward b is subtracted to reduce the variance of gradient estimation, which is an exponential moving average of the previous rewards").
However, the combination of Huang and He does not explicitly teach applying a second gradient for updating a set of NN network topology features to be selectable for the subsequently identified candidate NN architectures.
Zhang, in the same field of endeavor, teaches applying a second gradient for updating a set of NN network topology features to be selectable for the subsequently identified candidate NN architectures. ([¶0031] "the calculated gradient is scaled according to the validation accuracy of the joint model, so as to update the joint controller." The scaled gradient is interpreted as the second gradient.).
The combination of Huang and He, as well as Zhang, are directed towards neural architecture search. Therefore, the combination of Huang and He, as well as Zhang, are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Huang and He with the teachings of Zhang by having multiple search spaces for fine-grained control and continuous model optimization. Zhang provides as additional motivation for combination ([¶0032] "By iteratively performing steps S240 to S270, the joint controller may be continuously updated according to the validation accuracy of the joint model, so that the updated joint controller may generate a better joint model, and thereby continuously improving the validation accuracy of the obtained joint model"). This motivation for combination also applies to the remaining claims which depend on this combination.
Regarding claim 9, the combination of Huang, He, and Zhang teaches The method of claim 8, wherein the set of NN network topology features comprises selectable channel sizes for at least one layer in the subsequently identified candidate NN architectures. (He [p. 4] "The shape of a weight tensor is n × c × k × k, where n, c are output and input channels, and k is the kernel size." See Eqn. 1 and FIG. 1, where the layer percentage (probability mass function) corresponds to selectable channel sizes.).
Regarding claim 10, the combination of Huang, He, and Zhang teaches The method of claim 9, and further comprising: mapping the selectable channel sizes to a probability mass function; and selecting at least one of the subsequently identified candidate NN architectures based, at least in part, on the probability mass function (He [p. 4] "The shape of a weight tensor is n × c × k × k, where n, c are output and input channels, and k is the kernel size." See Eqn. 1 and FIG. 1, where the layer percentage (probability mass function) corresponds to selectable channel sizes. Alternatively, the state space used to select the channel pruning action can be interpreted as a probability mass function.).
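For illustration only, the claim 10 limitation as mapped above (channel sizes mapped to a probability mass function from which a selection is sampled) may be sketched as follows. The softmax mapping and all identifiers are Examiner's hypothetical placeholders:

    import numpy as np

    # Hypothetical sketch: selectable channel sizes for a layer are mapped
    # to a probability mass function via softmax over preference scores,
    # and a candidate channel size is selected by sampling from that PMF.

    channel_options = np.array([16, 32, 64, 128])  # selectable channel sizes
    logits = np.array([0.2, 0.5, 1.0, 0.1])        # learned preference scores

    pmf = np.exp(logits - logits.max())
    pmf = pmf / pmf.sum()                          # probability mass function

    rng = np.random.default_rng(0)
    selected = rng.choice(channel_options, p=pmf)  # sample a channel size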
Regarding claims 17 and 19-20, claims 17 and 19-20 are directed towards an article for performing the methods of claims 6 and 8-9, respectively. Therefore, the rejections applied to claims 6 and 8-9 also apply to claims 17 and 19-20.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Fu (“Enhancing Model Parallelism in Neural Architecture Search for Multidevice System”, 2020) is directed towards multiple sampling iterations of a latency-aware hyper-parameter search space.
Xu (“Latency-Aware Differentiable Neural Architecture Search”, 2020) is directed towards sampling sub-architectures each iteration of a gradient based latency estimator in a neural architecture search.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SIDNEY VINCENT BOSTWICK/Examiner, Art Unit 2124