Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea, namely a mathematical concept, without significantly more. The claims recite calculating the weights of a neural network using forward propagation, determining a fit using series expansion and binarization, calculating the gradient of the error with respect to the weights, calculating subfunctions of the series expansion, fitting the error with a neural network, and using a Fourier, wavelet, or discrete Fourier series as the expansion. This judicial exception is not integrated into a practical application because the recitation of a specific data type merely links the abstract idea to the field of computers. The claims do not include additional elements sufficient to amount to significantly more than the judicial exception because components such as the memory and processor are generic computer parts.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 2, 4, 5, 7-9, 11, 12, 14-16, 18 and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks by Gong et al. ("Gong").
Gong teaches claims 1, 8 and 15. A neural network training method, comprising: (Gong abs. “DSQ can automatically evolve during training to gradually approximate the standard quantization.”)
performing, in a forward propagation process and using a binarization function, (Gong sec. 3.1 “For 1-bit binary quantization, the binary neural network (BNN) limits its activations and weights to either -1 or +1 usually using the binary function… sgn(x)…” The binarization happens in the forward pass; see Algorithm 1 in section 3.5, reproduced below.)
[Image: media_image1.png (grayscale): Gong Algorithm 1, sec. 3.5]
binarization processing on a target weight to obtain a weight of a first neural network layer in a neural network, (See Gong Algorithm 1 above, where wq is the binarized layer weight obtained as a function of binarization of the soft-quantized weight wsq; these are the layer weights.) or on an activation value of a second neural network layer in the neural network to obtain an input of the first neural network layer; (The inputs are quantized/binarized as well, because Gong sec. 4.1 states “When building up a quantized model, we simply insert DSQ function to all places that will be quantized, e.g., the inputs and weights of a convolution layer.”)
determining a fitting function based on series expansion of the binarization function; and (Gong sec. 1 “DSQ employs a series of hyperbolic tangent functions to gradually approach the staircase function for low-bit quantization (e.g., sign for 1-bit case), and meanwhile keeps the smoothness for easy gradient calculation.” The binarization function is approximated, i.e., fitted, by a series of hyperbolic tangent functions.)
calculating, in a backward propagation process, a first gradient of a loss function with respect to the target weight using a second gradient of the fitting function as a third gradient of the binarization function. (Gong sec. 1 “DSQ employs a series of hyperbolic tangent functions to gradually approach the staircase function for low-bit quantization (e.g., sign for 1-bit case), and meanwhile keeps the smoothness for easy gradient calculation.” Gong abs. “Owing to its differentiable property, DSQ can help pursue the accurate gradients in backward propagation…” This shows that Gong uses a series of smooth, differentiable hyperbolic tangent functions to approximate the gradient used in backpropagation. This teaches taking the gradient of the fitting function in place of the gradient of the binarization function, because the binarization function is not differentiable: its gradient is zero almost everywhere.)
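The mechanism described above can be sketched as follows. This is an illustrative simplification, not Gong's actual DSQ implementation: the single tanh surrogate and the temperature k are assumptions made for clarity.

```python
import numpy as np

def binarize_forward(w):
    """Forward pass: hard binarization with the sign function."""
    return np.where(w >= 0, 1.0, -1.0)

def surrogate_grad(w, k=5.0):
    """Backward pass: derivative of the smooth fit tanh(k*w) to sign(w),
    d/dw tanh(k*w) = k * (1 - tanh(k*w)**2)."""
    t = np.tanh(k * w)
    return k * (1.0 - t * t)

# Toy loss L = sum(b * x) with b = sign(w); dL/db = x.
w = np.array([-0.8, 0.1, 0.5])
x = np.array([1.0, 2.0, 3.0])
b = binarize_forward(w)                  # hard binarization used in the forward pass
grad_loss_wrt_w = x * surrogate_grad(w)  # chain rule via the fitted (smooth) gradient
```

The key point is the mismatch between passes: the hard sign function is used forward, while the smooth fit supplies the gradient backward.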
Gong teaches claims 2, 9 and 16. The neural network training method of claim 1, further comprising determining a plurality of subfunctions based on the series expansion, wherein the fitting function comprises the plurality of subfunctions and an error function. (Gong sec. 1 “DSQ employs a series of hyperbolic tangent functions to gradually approach the staircase function for low-bit quantization (e.g., sign for 1-bit case), and meanwhile keeps the smoothness for easy gradient calculation.” The subfunctions are the hyperbolic tangent functions; the error function is included because the calculated gradient is the gradient of the error, i.e., the loss.)
Gong teaches claims 4, 11 and 18. The neural network training method of claim 2, further comprising fitting the error function using at least one neural network layer, wherein calculating the first gradient comprises:
calculating, in the backward propagation process, fourth gradients of the plurality of subfunctions with respect to the target weight;
calculating a fifth gradient of the at least one neural network layer with respect to the target weight; and
calculating the first gradient based on the fourth gradients and the fifth gradient. (Gong Algorithm 1 and Equation 6, reproduced below. The gradient with respect to alpha is the gradient of the subfunctions. The gradients are taken with respect to the weights because Gong sec. 4.1 states “When building up a quantized model, we simply insert DSQ function to all places that will be quantized, e.g., the inputs and weights of a convolution layer.”)
[Image: media_image2.png (grayscale): Gong Algorithm 1]
[Image: media_image3.png (grayscale): Gong Equation 6]
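As a hedged illustration of the gradient composition recited in claims 4, 11 and 18, the following sketch sums the gradients of assumed tanh subfunctions (the fourth gradients) with the gradient of a hypothetical linear error layer (the fifth gradient). Neither the subfunction scales nor the error layer come from Gong's Equation 6; they are stand-ins for clarity.

```python
import numpy as np

def subfunction_grads(w, ks=(1.0, 3.0, 9.0)):
    """Fourth gradients: gradient of each tanh subfunction w.r.t. the weight."""
    return [k * (1.0 - np.tanh(k * w) ** 2) for k in ks]

def error_layer_grad(w, a=0.1):
    """Fifth gradient: gradient of a hypothetical linear layer a*w fitted to the error."""
    return a * np.ones_like(w)

def first_gradient(w, upstream):
    """First gradient: upstream loss gradient times the sum of the
    subfunction gradients plus the error-layer gradient."""
    return upstream * (sum(subfunction_grads(w)) + error_layer_grad(w))

w = np.array([0.0, 0.5])
g = first_gradient(w, upstream=np.ones_like(w))
```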
Gong teaches claims 5, 12 and 19. The neural network training method of claim 1, further comprising determining a plurality of subfunctions based on the series expansion, wherein the fitting function comprises the plurality of subfunctions. (Gong sec. 1 “DSQ employs a series of hyperbolic tangent functions to gradually approach the staircase function for low-bit quantization (e.g., sign for 1-bit case), and meanwhile keeps the smoothness for easy gradient calculation.” The hyperbolic tangent functions are the subfunctions of the series expansion and, collectively, they constitute the fitting function.)
Gong teaches claims 7 and 14. The neural network training method of claim 1, wherein a data type of the target weight is a 32-bit floating point type, a 64-bit floating point type, a 32-bit integer type, or an 8-bit integer type. (Gong sec. 2.2 “frameworks usually support 8-bit integer arithmetic…” Gong sec. 4.5 states “while existing open-source high performance inference frameworks (e.g., NCNN-8-bit [31]) usually only support 8-bit operations. In practice, the lower bitwidth doesn’t mean a faster inference speed, mainly due to the overflow and transferring among the registers…” This teaches that the 8-bit integer type was the standard before Gong's paper; accordingly, the 8-bit integer type was known at the time of filing.)
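For context on the 8-bit integer type discussed, a minimal sketch of symmetric float32-to-int8 weight quantization follows. The max-magnitude scale choice is an assumption for illustration, not Gong's or NCNN's actual scheme.

```python
import numpy as np

def quantize_int8(w_fp32):
    """Symmetric per-tensor quantization: map float32 weights to int8 codes
    plus a single float scale (the largest magnitude maps to 127)."""
    scale = np.max(np.abs(w_fp32)) / 127.0
    q = np.clip(np.round(w_fp32 / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.array([-0.5, 0.0, 0.25, 0.5], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = q.astype(np.float32) * s  # dequantized approximation of w
```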
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3, 10 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks by Gong et al. and Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm by Liu et al. ("Liu").
Claims 6, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks by Gong et al. and https://www.physics.smu.edu/scalise/P4321sp20/fs.pdf (Olver).
Regarding claims 3, 10 and 17, Gong teaches the neural network training method of claim 2, further comprising fitting the error function by using a (Gong sec. 4.1 “All convolution and fully-connected layers except the first and the last one are quantized with DSQ.”)
Gong does not teach a two-layer fully connected neural network with a residual connection.
However, Liu teaches a two-layer fully connected neural network with a residual. (Liu p. 14 “keep the weights and activations in the first convolution and the last fully-connected layers to be real-valued.” Liu p. 2 “we propose to keep these real activations via adding a simple yet effective shortcut, dubbed Bi-Real net. As shown in Fig. 1(b), the shortcut connects the real activations to an addition operator with the real-valued activations of the next block.” The shortcut connection is the residual connection.)
Liu, Gong and the claims are all directed to training fully connected neural networks. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to have a residual shortcut to “keep these real activations” (Liu p. 14) because “the representational capability of the Bi-Real net is significantly enhanced and the additional cost on computation is negligible.” Liu abs.
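The shortcut Liu describes can be sketched as a toy in one dimension. The elementwise "block" below stands in for a convolution block and is an illustrative assumption, not Bi-Real Net's actual architecture.

```python
import numpy as np

def binarized_block(x, w):
    """Toy block with binarized activations and weights (elementwise for brevity)."""
    xb = np.where(x >= 0, 1.0, -1.0)
    wb = np.where(w >= 0, 1.0, -1.0)
    return xb * wb

def bi_real_block(x, w):
    """Shortcut in the style Liu describes: add the real-valued activation x
    back onto the binarized block's output, preserving real activations."""
    return binarized_block(x, w) + x

y = bi_real_block(np.array([0.3]), np.array([-0.2]))  # binarized part gives -1, shortcut adds 0.3
```

The shortcut costs only an addition, which is why Liu characterizes the extra computation as negligible.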
Regarding claims 6, 13 and 20, Gong teaches the neural network training method of claim 1, wherein the series expansion is a (Gong sec. 1 “DSQ employs a series of hyperbolic tangent functions to gradually approach the staircase function for low-bit quantization (e.g., sign for 1-bit case), and meanwhile keeps the smoothness for easy gradient calculation.”)
Gong does not teach a Fourier series expansion of the binarization function.
However, Olver teaches a Fourier series expansion of the binarization function. (Olver p. 645 “Thus, the Fourier series converges, as expected, to f(x) at all points of continuity; at discontinuities, the Fourier series can’t decide whether to converge to the right or left hand limit, and so ends up ‘splitting the difference’ by converging to their average; see Figure 12.4.” Fig. 12.4, reproduced below, shows a Fourier series of a binary/step function, and Equation 12.41 shows a Fourier series expansion of a step function.)
[Image: media_image4.png (grayscale): Olver Fig. 12.4]
[Image: media_image5.png (grayscale): Olver Equation 12.41]
Olver, Gong and the claims all turn a discontinuous step function into a smooth function. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to use a Fourier series for the conversion because “If f(x) is any piecewise continuous function, then its Fourier coefficients are well defined — the integrals (12.28) exist and are finite.” (Olver p. 644.)
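Olver's observation can be checked numerically: the 2π-periodic square wave, a binarization-like step function, is approximated by its standard odd-harmonic Fourier series. The truncation order below is an assumption made for illustration.

```python
import numpy as np

def square_wave_fourier(x, n_terms=50):
    """Truncated Fourier series of the 2*pi-periodic step function sign(sin x):
    f(x) ~ (4/pi) * sum over odd k of sin(k*x)/k."""
    s = np.zeros_like(x, dtype=float)
    for k in range(1, 2 * n_terms, 2):  # odd harmonics 1, 3, 5, ...
        s += np.sin(k * x) / k
    return (4.0 / np.pi) * s

# At a point of continuity the partial sums approach the step value sign(sin(pi/2)) = 1;
# at the discontinuity x = 0 every term vanishes, matching the average of the two limits.
approx = square_wave_fourier(np.array([np.pi / 2]))
```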
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Austin Hicks whose telephone number is (571)270-3377. The examiner can normally be reached Monday - Thursday 8-4 PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela Reyes can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AUSTIN HICKS/Primary Examiner, Art Unit 2142