Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 18-27 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because the claimed invention is directed to an abstract idea of a mental concept and mathematical relationship without significantly more. The claims recite the mental concept and mathematical relationship of obtaining a codebook based on a tensor; quantizing parameters; encoding the quantized parameters; obtaining a codebook size from a search; quantizing a probability density function (PDF); and selecting a PDF bounding factor. This judicial exception is not integrated into a practical application because the step of transmitting information is insignificant extra-solution activity. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements of processors are generic computer parts.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 21 and 26 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Applicant recites the acronym “PDF” without first defining it. Examiner will interpret the acronym to mean “probability density function”. Spec. 89.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 18, 22, 23 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions by Wu et al. and DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING by Han et al.
Claims 19 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions by Wu et al., DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING by Han et al., and US 20190327479 A1 to Chen et al.
Claims 20 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions by Wu et al., DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING by Han et al., and US 20120170830 A1 to Blanton et al.
Claims 21 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions by Wu et al., DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING by Han et al., and Weighted-Entropy-based Quantization for Deep Neural Networks by Park et al.
Wu teaches claims 18 and 23. A method comprising:
obtaining a codebook including a codebook size for quantizing parameters of a tensor associated with at least one layer of a Deep Neural Network, (Wu sec. 2.1 “a convolutional layer R^(s×s×c×m) … we reshape it as a matrix W…. we treat all columns of W as N samples, and apply k-means to assign them with K clusters…. we need only to store the cluster indexes and codebooks after k-means.” The clusters are the codebook; K is the codebook size.) the codebook size obtained according to a (Wu sec. 2.1 “cluster rate for each layer as K/N here… each convolutional layer chooses its K value such that all layers have the same cluster rate, except for the first layer whose cluster rate is often set higher.” The value is the cluster rate because the cluster rate is the number of cluster centers K divided by the number of weights N, so the cluster rate is determined based on quantized tensors (cluster centers) and tensors (weights). Wu, above, teaches that K (the codebook size) for each layer is based on the cluster rate for the layer.)
quantizing the parameters of the tensor using the obtained codebook to represent the parameters with at least a determined size. (Wu sec. 1.1 “compression is performed via weight-sharing, by only recording cluster centers and weight assignment indexes…” Wu footnote 1 p. 7 “we only quantize weight and activation to 8 bit and to 16 bit in fully-connected layers, respectively…” The compression is the quantizing. The obtained codebook is the cluster centers. The cluster centers represent the parameters (weights) with at least a determined size of 8 bits or 16 bits.)
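For illustration only (this sketch is not part of the record and is not code from any cited reference), Wu's weight-sharing scheme as mapped above can be expressed in Python; the layer dimensions, cluster rate, and random weights below are assumed for the example:

```python
import numpy as np

def kmeans(samples, K, iters=20, seed=0):
    """Plain k-means; returns (codebook, indexes)."""
    rng = np.random.default_rng(seed)
    codebook = samples[rng.choice(len(samples), K, replace=False)]
    for _ in range(iters):
        # assign each sample (column of W) to its nearest codeword
        d = np.linalg.norm(samples[:, None, :] - codebook[None, :, :], axis=2)
        idx = d.argmin(axis=1)
        # recompute each codeword as the mean of its cluster
        for k in range(K):
            if np.any(idx == k):
                codebook[k] = samples[idx == k].mean(axis=0)
    return codebook, idx

# A conv layer R^(s×s×c×m) reshaped to a matrix W; columns are the N samples.
s, c, m = 3, 4, 8
W = np.random.default_rng(1).normal(size=(s * s * c, m))
samples = W.T                       # N = m samples
cluster_rate = 0.5                  # K/N, assumed for illustration
K = max(1, int(cluster_rate * len(samples)))
codebook, indexes = kmeans(samples, K)
# After k-means, only the cluster indexes and the codebook need be stored.
```

The point of the mapping above is visible in the last line: the stored quantity is the index-plus-codebook pair, not the original weights.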
Wu doesn’t teach a distortion value like the distortion value described in paragraph 90 of the instant specification.
However, Han teaches determining a K based on distortion value. (Han sec. 3.1 “We partition n original weights W = {w1, w2, ..., wn} into k clusters C = {c1, c2, ..., ck}, n >> k, so as to minimize the within-cluster sum of squares (WCSS)…” The number of clusters K is chosen to minimize the distortion value sum of squares.)
The claims, Han and Wu all compress neural networks. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have a dynamic K because Wu explicitly advocates for it1 and Han says that a dynamic K can be used to minimize cluster spread.2
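For illustration only (not code from Han; the toy weights and assignments below are assumed), the within-cluster sum of squares that serves as the distortion value in the mapping above can be computed as:

```python
import numpy as np

def wcss(weights, codebook, indexes):
    """Within-cluster sum of squares: sum over clusters of (w - c_k)^2."""
    return float(np.sum((weights - codebook[indexes]) ** 2))

# Toy 1-D weights partitioned into k clusters, n >> k as in Han sec. 3.1.
weights = np.array([0.1, 0.12, 0.09, 0.5, 0.52, 0.48])
codebook = np.array([0.1, 0.5])           # cluster centers c_1, c_2
indexes = np.array([0, 0, 0, 1, 1, 1])    # assignment of each weight
distortion = wcss(weights, codebook, indexes)
# A larger codebook (larger K) can only lower this distortion,
# which is what allows K to be chosen against a distortion target.
```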
Wu teaches claims 19 and 24. The method of claim 18, further comprising:
encoding the quantized parameters
(Wu sec. 2.1 “a convolutional layer R^(s×s×c×m) … we reshape it as a matrix W…. we treat all columns of W as N samples, and apply k-means to assign them with K clusters…. we need only to store the cluster indexes and codebooks after k-means.” Assigning weights to clusters is encoding the parameters/weights. Wu sec. 1 teaches that the point of the compression is to “bring CNN into resource-constrained mobile devices…” Bringing a quantized NN into another device requires encoding the quantized NN for transfer to the mobile device.)
Wu doesn’t teach transmitting the compressed neural network.
However, Chen teaches encoding the quantized parameters in a bitstream for transmission; and
transmitting the encoded quantized parameters in the bitstream to a decoder. (Chen para 11 “FIG. 1 is a block diagram illustrating an example process for compressing, transmitting, and decompressing neural network data…”)
The claims, Chen and Wu are all directed to compressing neural networks. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to send a compressed neural network to decrease “demands on storage performance and memory access bandwidth.” Chen para 5.
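For illustration only (Chen's actual bitstream format is not specified in the record; the fixed-length packing scheme and index values below are assumed), cluster indexes can be serialized into a byte stream for transmission as follows:

```python
import numpy as np

def encode_bitstream(indexes, K):
    """Pack cluster indexes into bytes using ceil(log2(K)) bits each
    (a minimal fixed-length scheme, assumed for illustration)."""
    bits_per_index = max(1, int(np.ceil(np.log2(K))))
    bits = []
    for i in indexes:
        # emit each index least-significant bit first
        bits.extend((int(i) >> b) & 1 for b in range(bits_per_index))
    # pad to a whole number of bytes
    bits.extend([0] * (-len(bits) % 8))
    return bytes(
        sum(bit << b for b, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    ), bits_per_index

indexes = np.array([0, 3, 1, 2, 3, 0])
stream, width = encode_bitstream(indexes, K=4)  # 2 bits per index, 12 bits, 2 bytes
```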
Han teaches claims 20 and 25. The method of claim 18, further comprising obtaining the codebook size from a (Han sec. 3.1 “We partition n original weights W = {w1, w2, ..., wn} into k clusters C = {c1, c2, ..., ck}, n >> k, so as to minimize the within-cluster sum of squares (WCSS)…” The number of clusters K is chosen to minimize the distortion value sum of squares. Trying different numbers for K is searching over a range of codebook sizes.)
Han doesn’t teach the binary search for K.
However, Blanton teaches a binary search for K. (Blanton para 45 “as long as each data point (i.e., snippet image) is reasonably close to its cluster center, K is not increased to avoid fragmentation of the snippet images. To improve efficiency, a binary search may be used to search for K.” K is the codebook size.)
The claims, Wu, Han and Blanton all use k-means clustering. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to do a binary search for K to “improve efficiency…” Blanton para 45.
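For illustration only (Blanton does not give an implementation; the monotone toy distortion function below is assumed), a binary search for K of the kind mapped above can be sketched as finding the smallest K whose distortion meets a threshold, assuming distortion is non-increasing in K:

```python
def smallest_k(distortion, k_max, threshold):
    """Binary-search the smallest K whose distortion meets the threshold,
    assuming distortion(K) is non-increasing in K (more clusters, less spread)."""
    lo, hi = 1, k_max
    while lo < hi:
        mid = (lo + hi) // 2
        if distortion(mid) <= threshold:
            hi = mid          # mid is good enough; try a smaller K
        else:
            lo = mid + 1      # need more clusters
    return lo

# Toy monotone distortion: halves with each extra cluster (assumed for illustration).
d = lambda k: 1.0 / (2 ** k)
k = smallest_k(d, k_max=32, threshold=0.01)   # smallest k with 2^-k <= 0.01 is 7
```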
Wu teaches claims 21 and 26. The method of claim 18, further comprising:
quantizing based on a (Wu sec. 2.1 “a convolutional layer R^(s×s×c×m) … we reshape it as a matrix W…. we treat all columns of W as N samples, and apply k-means to assign them with K clusters…. we need only to store the cluster indexes and codebooks after k-means.” Assigning weights to clusters is quantizing.)
selecting from a (Wu sec. 2.1 “cluster rate for each layer as K/N here… each convolutional layer chooses its K value such that all layers have the same cluster rate…” choosing K values is selecting.)
Wu doesn’t teach quantizing based on a selected PDF factor based on entropy.
However, Park teaches quantizing based on a pdf-based initialization bounded according to a first pdf factor; and (Park sec. 4.1 “Based on this importance value of each weight, we derive a metric for evaluating the quality of a clustering result (i.e., quantization result) based on weighted entropy…” The quantizing of the weights is based on a weighted entropy S which is a function of relative frequency. Relative frequency is a PDF-based initialization bounded according to a first pdf factor.)
selecting from a candidate bounding pdf factor, the candidate bounding pdf factor based on an entropy obtained from candidate quantized parameters. (Park sec. 4.1 p. 5459 right column teaches the selection of a relative frequency (pdf factor) based on entropy of the weights “Starting from the initial cluster boundaries, we iteratively perform incremental search on the new cluster boundaries... for each cluster boundary candidate c’i, we recalculate the weighted entropy... and update the boundary to c'i only if the new overall weighted entropy S' is higher than the current one.”)
The claims, Wu and Park all quantize a set of weights. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to quantize using a pdf factor selected based on entropy of the quantized parameters in order to “greatly reduc[e] the design-time effort to quantize the network.” Park abs.
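For illustration only (this follows the shape of Park's weighted-entropy metric but is not Park's code; the toy weights, the importance values, and the candidate boundaries below are all assumed), the accept-only-if-entropy-increases boundary search mapped above can be sketched as:

```python
import numpy as np

def weighted_entropy(importance, indexes, K):
    """Weighted entropy of a clustering: S = -sum_k P_k * log(P_k),
    where P_k is the importance-weighted relative frequency of cluster k."""
    P = np.array([importance[indexes == k].sum() for k in range(K)])
    P = P / P.sum()
    P = P[P > 0]
    return float(-(P * np.log(P)).sum())

# Toy example: 1-D weights split by one cluster boundary; importance ~ weight^2.
weights = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 1.0])
importance = weights ** 2

def entropy_for_boundary(b):
    indexes = (weights > b).astype(int)   # two clusters split at boundary b
    return weighted_entropy(importance, indexes, K=2)

# Incremental search: a candidate boundary replaces the current one only
# if the overall weighted entropy increases, as in Park sec. 4.1.
best = 0.5
for cand in (0.25, 0.85):
    if entropy_for_boundary(cand) > entropy_for_boundary(best):
        best = cand
```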
Wu teaches claims 22 and 27. The method of claim 18, further comprising encoding information representative of a codebook type corresponding to the codebook. (Wu sec. 1 teaches that the point of the compression is to “bring CNN into resource-constrained mobile devices…” Bringing a quantized NN into another device requires encoding the quantized NN for transfer to the mobile device. Wu sec. 2.1 “When K << N, we need only to store the cluster indexes and codebooks after k-means.” The codebook type is k-means; the codebook is the K clusters representing the weights of the quantized network.)
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Austin Hicks whose telephone number is (571)270-3377. The examiner can normally be reached Monday - Thursday 8-4 PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela Reyes can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AUSTIN HICKS/Primary Examiner, Art Unit 2142
1 Wu sec. 5 “Our future work will exploit more adaptive cluster rates for different layers instead of the current uniform scheme.” Wu’s K is selected based on the cluster rate, so dynamic cluster rates imply a dynamic K.
2 Han sec. 3.1 “We partition n original weights W = {w1, w2, ..., wn} into k clusters C = {c1, c2, ..., ck}, n >> k, so as to minimize the within-cluster sum of squares (WCSS)…”