DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments regarding the 112(b) rejection have been fully considered. The rejection is withdrawn in view of claim amendments.
Applicant’s arguments regarding the 103 rejections have been fully considered but are moot in view of the new grounds of rejection set forth below.
Applicant’s arguments regarding the motivation to combine have been considered. In response to applicant’s argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007). In this case, the references are combined not to modify a structure but to combine techniques from both references to achieve optimal results, and those optimal results are evident from their respective disclosures. Both references pertain to searching for neural network architectures and are therefore compatible with each other. While Tan pertains to optimizing for mobile hardware, general optimization is a desirable aspect of any computing system utilizing machine learning.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Singh in view of Tan, further in view of Stamoulis, Dimitrios, et al., "HyperPower: Power- and memory-constrained hyper-parameter optimization for neural networks" (hereinafter "Stam").
Regarding claim 1, Singh teaches “A method of determining a target neural network architecture, the method comprising: determining, a structure of a first neural network architecture based on a loss function […];” Singh, abstract “In one aspect, neural architecture search method including selecting a neural architecture for training as part of an automated machine learning process; collecting statistical parameters on individual nodes of the neural architecture during the training; determining, based on the statistical parameters, active nodes of the neural architecture to form a candidate neural architecture; and validating the candidate neural architecture to produce a trained neural architecture to be used in implemented an application or a service” and Col 8, Lines 1-11; “At S314, NAS system 200 (via neural controller 202 and/or analysis engine 206) performs a validation process using validation set described above to determine whether the candidate neural architecture's performance is acceptable (e.g., within a margin of error of a defined output result defined by neural network description 102). At S315, NAS system 200 (via neural controller 202) determines if the validation indicates that the performance is acceptable. If acceptable, at S316, NAS system 200 returns the candidate neural architecture as a child neural architecture of output 210.”
Singh fails to teach the remaining limitations.
However, Tan teaches:
“generating a target neural network architecture used in a processor, based on a result of the determining of the first neural network architecture;” Tan, Paragraph [0031], [0035], [0039]; “More particularly, in some implementations, a search system can define an initial network structure that includes a plurality of blocks. A plurality of sub-search spaces can be respectively associated with the plurality of blocks. The sub-search space for each block can have one or more searchable parameters associated therewith… [based on a result of the determining of the first neural network architecture] Thus, at each of a plurality of iterations, the search system can modify at least one of the searchable parameters in the sub-search space associated with at least one of the plurality of blocks to generate one or more new network structures for an artificial neural network… [generating a target neural network architecture] Thus, the present disclosure: proposes a novel factorized hierarchical search space to maximize the on device resource efficiency of mobile models, [used in a processor] by striking the right balance between flexibility and search space size; introduces a multi-objective neural architecture search approach based on reinforcement learning, which is capable of finding high accuracy CNN models with low real-world inference latency; and show significant and consistent improvements over state-of-the-art mobile CNN models.”
“generating an output by providing an input to a trained neural network having the target neural network architecture on the processor” Tan, Paragraph [0036], Fig. 3; “Additionally or alternatively, the search system can use the measured performance characteristics to determine a reward to provide to the controller in a reinforcement learning scheme and/or other measurements of loss, reward, regret, and/or the like (e.g., for use in gradient-based optimization schemes). As an example, the measured performance characteristics can include an accuracy (or an estimated accuracy) of the network structure as trained for and evaluated on a particular training dataset and/or prediction task.”
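(Examiner Notes: Purely for clarity of the record, the following is a minimal illustrative sketch, not taken from Tan, of the kind of multi-objective reward described in the passage quoted above, in which measured on-device latency is traded against accuracy. The functional form, the target latency, and the exponent are hypothetical.)
    import math

    def multi_objective_reward(accuracy, measured_latency_ms,
                               target_latency_ms=80.0, tradeoff_exponent=-0.07):
        # Hypothetical reward: scale accuracy by how far the measured
        # on-device latency deviates from a target latency. The negative
        # exponent penalizes candidates that run slower than the target.
        latency_ratio = measured_latency_ms / target_latency_ms
        return accuracy * math.pow(latency_ratio, tradeoff_exponent)

    # Two candidate structures proposed by a search controller: the slightly
    # less accurate but faster candidate receives the higher reward.
    print(multi_objective_reward(accuracy=0.75, measured_latency_ms=70.0))
    print(multi_objective_reward(accuracy=0.76, measured_latency_ms=120.0))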
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the methods of Singh to incorporate the teachings of Tan to perform a neural architecture search based on specific processor hardware because specifying a processor to use allows the NAS to develop a network that is tailored to the specific processor which improves performance when deployed on mobile hardware.
The references, however, do not explicitly teach the power consumption aspect of the claim.
Stam however teaches “determining a structure of a first neural network architecture based on a loss function comprising a processor computation cost until a first search end condition is satisfied, where the processor computation cost includes a power-consuming hyperparameter based on power consumed to access memory when the processor trains the first neural architecture” Stam, abstract; “we propose HyperPower, a framework that enables efficient Bayesian optimization and random search in the context of power- and memory-constrained hyper-parameter optimization for NNs running on a given hardware platform. HyperPower is the first work (i) to show that power consumption can be used as a low-cost, a priori known constraint, and (ii) to propose predictive models for the power and memory of NNs executing on GPUs” and pg. 2, Figure 2.
[Stam, Figure 2 is reproduced here.]
“wherein the trained neural network has a same architecture as the target neural network architecture with balanced prediction performance and processor efficiency based on the loss function” Stam, pg. 3, ¶ above §3.3; “We are first to exploit this insight to train predictive models for the power and memory of NN architectures. More importantly, we use the predictive models to formulate a power- and memory-constrained acquisition function.” This shows that the architectures are selected to balance prediction performance against processor efficiency.
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Singh and Tan with that of Stam since “HyperPower significantly speeds up the hyper-parameter optimization, achieving up to 57.20× more function evaluations compared to constraint-unaware methods for a given time interval, effectively yielding significant accuracy improvements by up to 67.6%” (Stam, abstract). This shows that by combining these techniques with each other, one would have a more efficient learning model that is able to run effectively on given hardware.
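(Examiner Notes: Purely for clarity of the record, the following is a minimal illustrative sketch, not taken from Stam, of a power- and memory-constrained selection step of the kind the quoted abstract describes, in which candidate architectures whose predicted power or memory exceeds the hardware budget are discarded before any training time is spent on them. The budgets, candidate values, and names are hypothetical.)
    def constrained_score(val_error, predicted_power_w, predicted_memory_mb,
                          power_budget_w=5.0, memory_budget_mb=512.0):
        # Hypothetical constraint-aware score: infeasible candidates are
        # rejected outright; among feasible candidates, lower error is better.
        if predicted_power_w > power_budget_w or predicted_memory_mb > memory_budget_mb:
            return float("-inf")
        return -val_error

    candidates = [
        {"name": "net_a", "val_error": 0.08, "power_w": 4.2, "memory_mb": 300.0},
        {"name": "net_b", "val_error": 0.06, "power_w": 7.9, "memory_mb": 280.0},
    ]
    best = max(candidates,
               key=lambda c: constrained_score(c["val_error"], c["power_w"], c["memory_mb"]))
    print(best["name"])  # net_a: net_b is more accurate but exceeds the power budget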
Claim 13, which recites the additional limitation of, “A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1” Singh, Col. 3, Lines 40-43; “In one aspect, one or more non-transitory computer readable media include computer-readable instructions, which when executed by one or more processors implementing a neural architecture search system” is rejected under the same grounds as claim 1, mutatis mutandis.
Claim 14, which recites the additional limitation of, “A target neural network architecture determination apparatus, the apparatus comprising” Singh, Col. 10, Lines 35-40; “Exemplary system includes a cache and a processing unit (CPU or processor) and a system connection that couples various system components including the system memory, such as read only memory (ROM) and random access memory (RAM), to the processor.” is rejected under the same grounds as claim 1, mutatis mutandis.
Regarding claim 2, Singh in view of Tan further in view of Stam teaches the method of claim 1.
Stam further teaches “wherein determining of the structure of the first neural network architecture comprises determining from among a plurality of candidate architectures by comparing respective loss functions to the candidate architectures” Stam, Figure 2; “The objective function (NN test error) is evaluated, i.e., the candidate NN design xn+1 is trained and tested. Then, the probabilistic model M is refined via Bayesian posterior updating based on the new observation. After Nmax iterations, HyperPower returns the design x∗ with optimal accuracy that satisfies the hardware constraints.”
Regarding claim 3, Singh in view of Tan further in view of Stam teaches the method of claim 1.
Tan further teaches “wherein the processor computation cost further comprises a time-consuming hyperparameter, the time-consuming hyperparameter is determined based on a time to access a memory when the processor trains the first neural network architecture.” Tan, Paragraph [0037]; “Unlike in previous work, where mobile latency is considered via another, often inaccurate proxy (e.g., FLOPS), in some implementations, real-world inference latency can be directly measured by executing the model on a particular platform… [the time-consuming hyperparameter is determined based on a time to access a memory when the processor trains the first neural network architecture] In further implementations, various other performance characteristics can be included in a multi-objective function that guides the search process, including, as examples, power consumption… [wherein the processor computation cost further comprises a time-consuming hyperparameter]”
Claim 15 is rejected under the same grounds as claim 3 as being substantially similar, mutatis mutandis.
Regarding claim 4, Singh in view of Tan further in view of Stam teaches the method of claim 1.
Tan further teaches, “wherein the first neural network architecture comprises at least one structure, the at least one structure is formed by stacking at least one network block, the at least one network block comprises at least one mix operation, and the at least one mix operation is connected to at least one primitive operation.” Tan, Fig. 3;
[Tan, Figure 3 is reproduced here.]
(Examiner Notes: Tan discloses a convolutional network block structure (at least one structure formed by stacking at least one network block); the network block contains the concatenation mix operation and several primitive operations (convolutions, etc.).)
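(Examiner Notes: Purely for clarity of the record, the following is a minimal illustrative sketch, not taken from Tan, of the arrangement mapped above: a structure stacked from network blocks in which several primitive operations feed a concatenation that serves as the mix operation. The channel counts, kernel sizes, and names are hypothetical.)
    import torch
    import torch.nn as nn

    class MixBlock(nn.Module):
        # Hypothetical network block: primitive operations applied to the same
        # input, with a concatenation acting as the mix operation joining them.
        def __init__(self, channels):
            super().__init__()
            self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.conv5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
            self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

        def forward(self, x):
            return torch.cat([self.conv3(x), self.conv5(x), self.pool(x)], dim=1)

    # A structure formed by stacking blocks (a 1x1 convolution restores the channel count).
    net = nn.Sequential(MixBlock(8), nn.Conv2d(24, 8, kernel_size=1), MixBlock(8))
    out = net(torch.randn(1, 8, 32, 32))
    print(out.shape)  # torch.Size([1, 24, 32, 32])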
Claim 16 is rejected under the same grounds as claim 4 as being substantially similar, mutatis mutandis.
Claims 5-6 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Singh, Tan, and Stam, further in view of Liu.
Regarding claim 5, Singh, Tan, and Stam teach the method of claim 4.
Singh further teaches: “wherein the at least one mix operation is determined by: determining, in response to a second search end condition not being satisfied, a second neural network architecture;” Singh, Col. 3, Lines 9-10; “during each iteration, a different neural network architecture is selected.”
“and generating, based on a result of the searching for the second neural network, a mix operation of the second neural network architecture.” Singh, Col. 3, Lines 18-21; “the candidate neural architecture is not validated when, output of the candidate neural architecture, when tested with a validation dataset, is not within a margin of error of a defined output.”
(Examiner Notes: Singh discloses an architecture being selected and searching the architecture in response to a stop condition not being met.)
Singh and Tan, however, do not explicitly teach: “and generating, based on a result of the searching for the second neural network, a mix operation of the second neural network architecture.”
However, Liu teaches:
“and generating, based on a result of the searching for the second neural network, a mix operation of the second neural network architecture.” Liu, Section 2.2;
[The relevant passage of Liu, Section 2.2, is reproduced here.]
(Examiner notes: Liu discloses a method for determining a mix operation based on the result of a neural architecture search.)
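(Examiner Notes: Purely for clarity of the record, the following is a minimal illustrative sketch, not taken from Liu, of the continuous relaxation relied on from Liu, Section 2.2: a mixed operation formed as a softmax-weighted combination of candidate primitive operations, from which the highest-weighted primitive can later be selected. The particular operations and tensor sizes are hypothetical.)
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixedOp(nn.Module):
        # Hypothetical mixed operation: every candidate primitive operation is
        # applied to the input, and the outputs are combined using softmax
        # weights over learnable architecture parameters (alpha).
        def __init__(self, channels):
            super().__init__()
            self.ops = nn.ModuleList([
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),  # convolution
                nn.MaxPool2d(kernel_size=3, stride=1, padding=1),         # max pooling
                nn.Identity(),                                            # skip connection
            ])
            self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

        def forward(self, x):
            weights = F.softmax(self.alpha, dim=0)
            return sum(w * op(x) for w, op in zip(weights, self.ops))

    mixed = MixedOp(channels=4)
    y = mixed(torch.randn(1, 4, 16, 16))
    print(y.shape)  # torch.Size([1, 4, 16, 16])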
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the methods of Singh, Tan, and Stam to incorporate the teachings of Liu to continuously calculate the best mix operations because selecting optimal mix operations produces networks with better performance in terms of floating-point operations per second.
Claim 17 is rejected under the same grounds as claim 5 as being substantially similar, mutatis mutandis.
Regarding claim 6, Singh, Tan, Stam and Liu teach the method of claim 5.
Liu further teaches, “wherein the second neural network architecture comprises at least one network block, the at least one network block comprises at least one candidate combination operation based on a network block configuration rule of the at least one network block,” Liu, Fig. 1;
[Liu, Figure 1 is reproduced here.]
(Examiner Notes: Liu discloses a second neural architecture made up of at least one network block with each network block having at least one candidate combination operation selected by the configuration rule of continuous relaxation and evaluation of the highest weighted operations between nodes.)
“and the at least one candidate combination operation comprises at least one of a plurality of primitive operations.” Liu, Section 2.2;
[The relevant passage of Liu, Section 2.2, is reproduced here.]
(Examiner Notes: Liu additionally discloses that the candidate operation is at least one of a selection of primitive operations (convolution, max pooling, zero).)
Claim 18 is rejected under the same grounds as claim 6 as being substantially similar, mutatis mutandis.
Claims 7-12 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Singh, Tan, Stam, and Liu, further in view of Yang.
Regarding claim 7, Singh, Tan, Stam, and Liu teach the method of claim 6.
Singh, Tan, Stam and Liu fail to teach, “further comprising: determining the network block configuration rule based on artificial settings.”
However, Yang teaches:
“further comprising: determining the network block configuration rule based on artificial settings.” Yang, Section II(B); “Targeting on a 2 × 2 NoC with fixed hardware configuration by X-Y routing in Figure 3(b), HW-aware NAS can identify a simpler architecture in Figure 3(a) with only 0.32% accuracy loss. As a result, the latency is reduced from 6.2ms to 5.1ms (by 17.07%) as shown in Figure 3(c).” (Examiner Notes: Yang discloses determining a block configuration rule (latency reduction) based on a real-time throughput constraint, which is an artificial setting.)
It would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the methods of Singh, Tan, Stam, and Liu to incorporate the teachings of Yang to use latency reduction percentage as a constraint because adequate configuration rules lead to minimal loss in accuracy while producing a significant gain in processing throughput.
Regarding claim 8, Singh, Tan, Stam, Liu and Yang teach the method of claim 6.
Yang further teaches: “further comprising: determining the network block configuration rule based on a network block structure of the processor.” Yang, Section II(B); “Targeting on a 2 × 2 NoC with fixed hardware configuration by X-Y routing in Figure 3(b), HW-aware NAS can identify a simpler architecture in Figure 3(a) with only 0.32% accuracy loss. As a result, the latency is reduced from 6.2ms to 5.1ms (by 17.07%) as shown in Figure 3(c).” (Examiner Notes: Yang discloses determining a block configuration rule (2 x 2 NoC hardware configuration) which is a rule based on the network block structure of a processor.)
Claim 19 is rejected under the same grounds as claim 8 as being substantially similar, mutatis mutandis.
Regarding claim 9, Singh, Tan, Stam, Liu and Yang teach the method of claim 8.
Yang further teaches, “The method of claim 8, wherein the determining of the network block configuration rule based on the network block structure of the processor comprises: obtaining one or more candidate neural networks by transforming an initial neural network based on at least one transformation scheme in a test platform of the processor; obtaining a running state of each of the one or more candidate neural networks in the test platform; and determining the network block configuration rule based on the running state of each of the one or more candidate neural networks.” Yang, Fig. 3;
[Yang, Figure 3 is reproduced here.]
Claim 20 is rejected under the same grounds as claim 9 as being substantially similar, mutatis mutandis.
Regarding claim 10, Singh, Tan, Stam, Liu, and Yang teach the method of claim 9.
Yang further teaches, “wherein the obtaining of the running state comprises obtaining a time consumed by each of the one or more candidate neural networks to process a reference data set in the test platform.” Yang, Fig. 3;
[Yang, Figure 3 is reproduced here.]
(Examiner Notes: Yang discloses a running state obtained based on throughput in terms of bits processed per second which results in a time to process an input in milliseconds.)
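(Examiner Notes: Purely for clarity of the record, the following is a minimal illustrative sketch, not taken from Yang, of obtaining a running state of candidate networks by timing how long each takes to process a reference data set on a test platform. The candidate networks, batch shapes, and counts are hypothetical.)
    import time
    import torch
    import torch.nn as nn

    def measure_running_state(candidate, reference_batches):
        # Hypothetical running-state measurement: wall-clock time for the
        # candidate to process the whole reference data set on this platform.
        candidate.eval()
        start = time.perf_counter()
        with torch.no_grad():
            for batch in reference_batches:
                candidate(batch)
        return time.perf_counter() - start

    reference_batches = [torch.randn(8, 3, 32, 32) for _ in range(10)]
    candidates = {
        "narrow": nn.Conv2d(3, 8, kernel_size=3, padding=1),
        "wide": nn.Conv2d(3, 64, kernel_size=3, padding=1),
    }
    timings = {name: measure_running_state(net, reference_batches)
               for name, net in candidates.items()}
    print(timings)  # the wider candidate will typically take longer per pass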
Regarding claim 11, Singh, Tan, Stam, Liu, and Yang teach the method of claim 9.
Tan further teaches, “wherein the obtaining of the one or more candidate neural networks comprises any one or any combination of: horizontally expanding the initial neural network; vertically expanding the initial neural network; performing parallel splitting on a single operation of the initial neural network; changing a size of a feature map of the initial neural network; and changing a number of channels of the initial neural network.” Tan, Fig. 3, Paragraphs [0049-0067];
[Tan, Figure 3 is reproduced here.]
(Examiner Notes: Tan discloses horizontally expanding a model via iterative searching.)
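(Examiner Notes: Purely for clarity of the record, the following is a minimal illustrative sketch, not taken from Tan, of deriving candidate neural networks from an initial network through transformation schemes of the kind recited, here vertical expansion (adding a layer) and changing the number of channels. The initial network and the specific transformations are hypothetical.)
    import torch.nn as nn

    def initial_network():
        # Hypothetical initial neural network.
        return nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU())

    def expand_vertically(net):
        # Vertical expansion: append an additional layer to make the network deeper.
        out_channels = net[0].out_channels
        return nn.Sequential(*net,
                             nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
                             nn.ReLU())

    def change_channel_count(net, new_channels):
        # Channel change: rebuild the first layer with a different number of output channels.
        first = net[0]
        return nn.Sequential(nn.Conv2d(first.in_channels, new_channels,
                                       kernel_size=3, padding=1),
                             *list(net)[1:])

    candidates = [expand_vertically(initial_network()),
                  change_channel_count(initial_network(), new_channels=32)]
    print([len(c) for c in candidates])  # [4, 2]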
Regarding claim 12, Singh, Tan, Stam, Liu, and Yang teach the method of claim 8.
Yang further teaches, “wherein the network block configuration rule is determined based on any one or any combination of a priority relationship” Yang, Section 4(B); “Compared with single PE based NAS, the best accuracy can be improved from 88.39% to 90.68% (2 × 2 NoC) and 93.59% (3 × 3 NoC).”
“between vertical expansion and horizontal expansion, a number of operations obtained by parallel splitting of a single operation, a number of channels, and a size of a feature map.” Yang, Section 4(B); “In addition, for solutions with the maximum throughput, the accuracy and throughput for single PE platform are 88.39%, 0.50Gbps, which are improved to 90.68%, 0.72Gbps for 2 × 2 NoC and 91.58%, 2.40Gbps for 3 × 3 NoC.”
(Examiner Notes: Yang discloses a priority relationship (accuracy and throughput) between expanding a network horizontally and vertically, with both accuracy and throughput increasing as the network grows.)
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN W FIGUEROA whose telephone number is (571)272-4623. The examiner can normally be reached Monday-Friday, 10AM-6PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA HUANG can be reached at (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
KEVIN W FIGUEROA
Primary Examiner
Art Unit 2124
/Kevin W Figueroa/Primary Examiner, Art Unit 2124