Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. CN202010745395.3, filed on 07/29/2020.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/07/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 9, 12, and 15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Choukroun et al., “Low-bit Quantization of Neural Networks for Efficient Inference.”
Regarding claim 1, Choukroun teaches A data processing method, comprising: marking each layer of a network model as a key layer or a non-key layer according to acquired structural information of the network model; (Choukroun, pages 3013-3014, section 3.5, teaches the marking of key layers of a neural network to be used in the quantization process. The layers that are not chosen as key layers can be interpreted as non-key layers.) respectively determining a quantization bit width range of the key layer and a quantization bit width range of the non-key layer according to hardware resource information that needs to be deployed; (Choukroun, pages 3011-3016, section 3, teaches determining the bit precision (i.e., quantization bit width) using quantization based on the system on which the model is to be deployed.) determining, in the quantization bit width range, optimal quantization bit widths of each layer of the network model; and training the network model based on the optimal quantization bit widths of each layer of the network model, so as to obtain an optimal network model, and performing data processing using the optimal network model. (Choukroun, pages 3011-3016, section 3, teaches determining the bit precision (i.e., quantization bit width) of each layer and training a neural network based on those determined optimal bit precisions in order to obtain an optimal neural network to run on the desired limited hardware.)
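For context only, the workflow mapped to claim 1 above (mark layers as key or non-key from structural information, bound each group's bit-width range by the hardware budget, then select a width within that range) can be sketched as follows. The layer names, ranges, and selection rule are illustrative assumptions and are not taken from Choukroun or from the claims as filed.

```python
# Illustrative sketch of per-layer bit-width selection under a hardware budget.
# Layer names, ranges, and the selection rule are hypothetical.

def mark_layers(layers, key_names):
    """Mark each layer 'key' or 'non-key' from structural information."""
    return {name: ("key" if name in key_names else "non-key") for name in layers}

def bit_width_range(role, max_bits):
    """Key layers get a wider (higher-precision) range; non-key layers are
    quantized more aggressively to save hardware resources."""
    return range(6, max_bits + 1) if role == "key" else range(2, 5)

def choose_widths(layers, key_names, max_bits=8):
    """Pick a width inside each layer's allowed range (here: the largest,
    as a stand-in for the accuracy-driven search recited in the claim)."""
    roles = mark_layers(layers, key_names)
    return {name: max(bit_width_range(role, max_bits))
            for name, role in roles.items()}

widths = choose_widths(["conv1", "conv2", "fc"], key_names={"conv1"})
# conv1 is key -> 8 bits; conv2 and fc are non-key -> 4 bits
```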
Regarding claim 5, Choukroun teaches The method according to claim 1, wherein the network model comprises at least one of an image classification model, an image detection model, an image recognition model, and a natural language processing model. (Choukroun, page 3016, section 4, teaches the training and use of a neural network model that comprises an image recognition model.)
Regarding claim 9, Choukroun teaches A data processing device, comprising: at least one processor; and a memory, configured to store a computer program which can be run on the at least one processor, and the computer program, when being executed by the at least one processor, causes the at least one processor to: mark each layer of a network model as a key layer or a non-key layer according to acquired structural information of the network model; (Choukroun, pages 3013-3014, section 3.5, teaches the marking of key layers of a neural network to be used in the quantization process. The layers that are not chosen as key layers can be interpreted as non-key layers.) respectively determine a quantization bit width range of the key layer and a quantization bit width range of the non-key layer according to hardware resource information that needs to be deployed; (Choukroun, pages 3011-3016, section 3, teaches determining the bit precision (i.e., quantization bit width) using quantization based on the system on which the model is to be deployed.) determine, in the quantization bit width range, optimal quantization bit widths of each layer of the network model; and train the network model based on the optimal quantization bit widths of each layer of the network model, so as to obtain an optimal network model, and perform data processing using the optimal network model. (Choukroun, pages 3011-3016, section 3, teaches determining the bit precision (i.e., quantization bit width) of each layer and training a neural network based on those determined optimal bit precisions in order to obtain an optimal neural network to run on the desired limited hardware.)
Regarding claim 12, Choukroun teaches The method according to claim 1, wherein the network model is selected based on a service that needs to be performed. (Choukroun, pages 3011-3016, sections 3-4, teaches the use of different neural network architectures (i.e., network models) that correspond to different tasks being performed (i.e., selected based on a service that needs to be performed).)
Regarding claim 15, Choukroun teaches The method according to claim 1, wherein the optimal quantization bit widths of each layer of the network model are determined by a global search method or by an exhaustive method. (Choukroun, pages 3011-3016, section 3, teaches the use of quantization to find the optimal bit precisions (i.e., optimal bit widths) through a process of running each layer in order to find the best bit precision for every layer (i.e., an exhaustive method).)
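The "exhaustive method" reading applied to claim 15 (evaluate every candidate bit width for a layer and keep the best) can be illustrated with a minimal sweep. The toy quantization-error proxy and the per-bit hardware penalty below are assumptions for illustration, not taken from the reference.

```python
# Illustrative per-layer exhaustive search over candidate bit widths.
# The error proxy and the 0.01-per-bit hardware penalty are hypothetical.

def quant_error(value, bits):
    """Error from rounding a weight onto a uniform grid with step 1/2**bits."""
    step = 1.0 / (2 ** bits)
    return abs(value - round(value / step) * step)

def best_width(value, candidates=(2, 4, 8), penalty=0.01):
    """Try every candidate width and keep the cheapest: exhaustive search."""
    return min(candidates, key=lambda b: quant_error(value, b) + penalty * b)

# Hypothetical representative weight per layer:
best = {layer: best_width(w) for layer, w in {"conv1": 0.3, "fc": 0.07}.items()}
```

With this cost, 2 bits loses too much precision and 8 bits pays too much hardware penalty, so the sweep settles on 4 bits for both example layers.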
Regarding claim 23, Choukroun teaches The device according to claim 9, wherein the network model comprises at least one of an image classification model, an image detection model, an image recognition model, and a natural language processing model. (Choukroun, page 3016, section 4, teaches the training and use of a neural network model that comprises an image recognition model.)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Choukroun et al., “Low-bit Quantization of Neural Networks for Efficient Inference,” in view of Frumkin, Pub. No. US 20190278600 A1.
Regarding claim 13, Choukroun teaches The method according to claim 1, wherein the hardware resource information … bearable by a deployed platform. (Choukroun, page 8, sections 4-5, teaches the ability of the neural network to specify a desired precision (i.e., hardware resource information) based on the constrained hardware (i.e., deployed platform).)
Choukroun does not teach …includes a maximum model size or maximum computing resources… However, Frumkin teaches this limitation in analogous art. (Frumkin, Paragraphs 0064-0065, teaches the ability of a user or system to set a maximum model size for a machine learning algorithm.)
It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Frumkin’s teaching of setting a maximum model size with Choukroun’s teaching of a method of quantizing neural networks to run on limited hardware. The motivation to do so would be to allow for the system to specify the maximum size of the model based on the limited size and computing capacity of the system that will be running the model.
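As context for the maximum-model-size limitation discussed above, a size check against a platform budget can be sketched in a few lines. The parameter counts, bit widths, and 4 MB budget below are hypothetical values chosen only to illustrate the arithmetic.

```python
# Illustrative check of a quantized model against a maximum size budget.
# Parameter counts, bit widths, and the byte limit are hypothetical.

def model_size_bytes(param_counts, bit_widths):
    """Total size of the quantized weights, in bytes."""
    total_bits = sum(param_counts[name] * bit_widths[name] for name in param_counts)
    return total_bits // 8

params = {"conv1": 1_000_000, "fc": 4_000_000}   # weights per layer
widths = {"conv1": 8, "fc": 4}                    # chosen bit widths
size = model_size_bytes(params, widths)           # (8M + 16M) bits = 3,000,000 bytes
fits = size <= 4 * 1024 * 1024                    # hypothetical 4 MB platform budget
```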
Allowable Subject Matter
Claims 2-4, 10-11, 14, and 16-22 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS B LANE whose telephone number is (571)272-1872. The examiner can normally be reached M-Th: 7am-5pm; F: Out of Office.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MARIELA REYES can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/THOMAS BERNARD LANE/Examiner, Art Unit 2142
/HAIMEI JIANG/Primary Examiner, Art Unit 2142