DETAILED ACTION
1. This communication is in response to Application No. 18/228,153, filed on July 31, 2023, in which Claims 1-12 are presented for examination.
Notice of Pre-AIA or AIA Status
2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
3. The information disclosure statement submitted on 11/14/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 112
4. The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
5. Claim 6 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claim 6 recites the limitation "[…] according to the method (30) of claim 1 […]" without any prior recitation of such a method (30) in independent Claim 1. Further, although Applicant's drawing Figure 3 discloses a method 30 for transforming a pre-trained neural network, this method is not identical to the language of independent Claim 1 as currently drafted. There is insufficient antecedent basis for this limitation in the claim.
Claim Rejections - 35 USC § 101
6. 35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
7. Claims 1-12 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1:
Step 1: Claim 1 is a method-type claim. Therefore, Claims 1-6 and 11-12 are directed to a process, which is one of the statutory categories (process, machine, manufacture, or composition of matter).
2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components, then it falls within the “Mathematical Concepts” grouping of abstract ideas.
transforming a pre-trained neural network (mental process – transforming a pre-trained neural network may be performed manually by a user observing/analyzing a received pre-trained neural network and accordingly using judgement/evaluation to transform the pre-trained neural network. For example, a user may transform the pre-trained neural network by adjusting the number of nodes, number of layers, weight values, loss functions, etc. with the aid of pen and paper)
generating, by the transformation device, a ternary representation of each weight vector, by transforming each weight vector into a ternary decomposition, comprising a ternary matrix, and a power-of-two vector, wherein elements of the power-of-two vector are different powers of two (mathematical process – generating a ternary representation of each weight vector by transforming each weight vector into a ternary decomposition comprising a ternary matrix and a power-of-two vector, wherein elements of the power-of-two vector are different powers of two may be performed by mathematical process/calculation. See Applicant’s specification Pgs. 9-10 which detail the mathematical process of transforming trained neural network weights into a ternary representation – this process of transforming is also illustrated by instant Figure 5)
whereby an output of each neuron, obtainable by a multiplication between an input vector of each neuron and the respective weight vector, can be determined by additions, subtractions and bit shift operations (mathematical process – determining an output of each neuron, obtainable by a multiplication between an input vector of each neuron and the respective weight vector, by additions, subtractions, and bit shift operations may be performed by mathematical process/calculation. Particularly, the output of each neuron may be obtained by mathematical processes such as vector multiplication and/or additions, subtractions, and bit shift operations)
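The mathematical character of the limitations above can be made concrete with a short illustrative sketch. This is an assumption-laden illustration only, not code from the application or the cited art: the function names (`ternary_decompose`, `neuron_output`) are hypothetical, integer weights are assumed, and the exponents are assumed contiguous down to 0.

```python
# Illustrative sketch only (hypothetical names, not from the application):
# decompose integer weights over a power-of-two basis into ternary rows,
# then evaluate a neuron output using only additions, subtractions, and
# bit shifts, as characterized in the limitations above.

def ternary_decompose(w, exponents):
    """Greedy signed expansion of integer w over the basis {2**e : e in exponents}.

    `exponents` must be sorted in descending order; the expansion is exact
    for any integer with |w| < 2**(exponents[0] + 1) when the exponents are
    contiguous down to 0. Returns one ternary row with entries in {-1, 0, 1}.
    """
    row, r = [], w
    for e in exponents:
        v = 1 << e
        if r >= v:
            row.append(1)
            r -= v
        elif r <= -v:
            row.append(-1)
            r += v
        else:
            row.append(0)
    return row

def neuron_output(x, T, exponents):
    """Compute the dot product x . w, where w[i] = sum_j T[i][j] * 2**exponents[j],
    using only additions, subtractions, and bit shifts (no multiplication)."""
    total = 0
    for j, e in enumerate(exponents):
        partial = 0
        for xi, row in zip(x, T):
            if row[j] == 1:
                partial += xi          # addition
            elif row[j] == -1:
                partial -= xi          # subtraction
        total += partial << e          # multiply by 2**e via left shift
    return total
```

For example, the weight 5 over exponents [2, 1, 0] yields the ternary row [1, 0, 1], i.e. 5 = 2^2 + 2^0, and the inner product with an input vector is then accumulated from shifted partial sums.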
2A Prong 2: This judicial exception is not integrated into a practical application.
Additional elements:
[…] by a transformation device […] (recited at a high-level of generality (i.e., as a generic “transformation device” configured to perform the specific operations of the claim language without significantly more) such that it amounts to no more than mere instructions to apply the exception using generic computer components)
receiving, by a transformation device, the pre-trained neural network, wherein the pre-trained neural network comprises a number of neurons, and wherein each neuron is associated with a respective weight vector (Adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g))
outputting, by the transformation device, a transformed neural network, wherein the weight vectors of each neuron is represented by the ternary representation (Adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g))
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Additional elements:
[…] by a transformation device […] (mere instructions to apply the exception using generic computer components cannot provide an inventive concept)
receiving, by a transformation device, the pre-trained neural network, wherein the pre-trained neural network comprises a number of neurons, and wherein each neuron is associated with a respective weight vector (MPEP 2106.05(d)(II) indicates that merely “Receiving or transmitting data over a network” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed limitation is well-understood, routine, conventional activity is supported under Berkheimer)
outputting, by the transformation device, a transformed neural network, wherein the weight vectors of each neuron is represented by the ternary representation (MPEP 2106.05(d)(II) indicates that merely “Receiving or transmitting data over a network” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed limitation is well-understood, routine, conventional activity is supported under Berkheimer)
For the reasons above, Claim 1 is rejected as being directed to an abstract idea without significantly more. This rejection applies equally to dependent claims 2-6 and 11-12. The additional limitations of the dependent claims are addressed below.
Regarding Claim 2:
Step 2A Prong 1:
See the rejection of Claim 1 above, which Claim 2 depends on.
Step 2A Prong 2 & Step 2B:
wherein each element of the ternary matrix has a value of 1, 0 or -1 (Field of Use – limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application; in this case, specifying wherein each element of the ternary matrix has a value of 1, 0, or -1 does not integrate the exception into a practical application nor amount to significantly more – See MPEP 2106.05(h))
Accordingly, under Step 2A Prong 2 and Step 2B, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of Claim 1.
Regarding Claim 3:
Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 3 depends on.
pruning the neural network by removing a set of columns of the ternary matrix and the corresponding elements of the power-of-two vector of at least one neuron (mental process – pruning the neural network by removing a set of columns of the ternary matrix and the corresponding elements of the power-of-two vector of at least one neuron may be performed manually by a user observing/analyzing the ternary matrix and elements of the power-of-two vector of at least one neuron and using judgement/evaluation to remove a set of columns of the ternary matrix and the corresponding elements of the power-of-two vector with the aid of pen and paper, hence pruning the neural network)
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of Claim 1.
Regarding Claim 4:
Step 2A Prong 1: See the rejection of Claim 3 above, which Claim 4 depends on.
determining a contribution score of each element in the power-of-two vector (mental process – determining a contribution score of each element in the power-of-two vector may be performed manually by a user observing/analyzing each element in the power-of-two vector and accordingly using judgement/evaluation to determine and assign a contribution score of each element, based on said analysis)
selecting the elements of the power-of-two vector which has a contribution score below a contribution threshold, and the corresponding columns of the ternary matrix (mental process – selecting the elements with a contribution score below a threshold may be performed manually by a user observing/analyzing all elements, their corresponding contribution scores, and the threshold value and accordingly using judgement/evaluation to select elements with a contribution score below the threshold and similarly selecting corresponding columns of the ternary matrix)
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of Claim 1.
Regarding Claim 5:
Step 2A Prong 1: See the rejection of Claim 3 above, which Claim 5 depends on.
selecting the elements of the power-of-two vector which has an exponent value below an exponent threshold, and the corresponding columns of the ternary matrix (mental process – selecting elements of the power-of-two vector with an exponent value below a threshold may be performed manually by a user observing/analyzing the elements of the power-of-two vector and the exponent threshold and accordingly using judgement/evaluation to select elements of the power-of-two vector with an exponent value below the exponent threshold and similarly selecting the corresponding columns of the ternary matrix)
Step 2A Prong 2 & Step 2B:
Accordingly, under Step 2A Prong 2 and Step 2B, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of Claim 1.
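The selection-and-removal operations recited in Claims 3-5 amount to the following kind of column filtering. This is an illustrative sketch with a hypothetical function name and score convention (the application itself defines no such code); passing the exponents themselves as the scores reproduces the exponent-threshold variant of Claim 5.

```python
# Illustrative sketch only (hypothetical names): prune columns of the
# ternary matrix T together with the corresponding power-of-two elements
# whenever a per-column score falls below a threshold (Claims 3-5).

def prune_columns(T, exponents, scores, threshold):
    """Keep only the columns whose score is at or above the threshold.

    T         : list of rows, each with entries in {-1, 0, 1}
    exponents : per-column exponent of the power-of-two element
    scores    : per-column contribution score; passing `exponents` as
                `scores` reproduces the exponent-threshold selection
    """
    keep = [j for j, s in enumerate(scores) if s >= threshold]
    pruned_T = [[row[j] for j in keep] for row in T]
    pruned_exponents = [exponents[j] for j in keep]
    return pruned_T, pruned_exponents
```

For example, with exponents [2, 1, 0] used as their own scores and a threshold of 1, the column for 2^0 (and its ternary entries) is removed from every row.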
Regarding Claim 6:
Step 2A Prong 1: See the rejection of Claim 1 above, which Claim 6 depends on.
wherein an output of a neuron of the number of neurons is obtained by a multiplication between an input vector and a weight vector represented by the ternary representation, and wherein said multiplication is determined by additions, subtractions and bit shift operations (mathematical process – determining an output of each neuron, obtainable by a multiplication between an input vector of each neuron and the respective weight vector, by additions, subtractions, and bit shift operations may be performed by mathematical process/calculation. Particularly, the output of each neuron may be obtained by mathematical processes such as vector multiplication and/or additions, subtractions, and bit shift operations)
Step 2A Prong 2 & Step 2B:
running a neural network having been transformed according to the method (30) of claim 1 (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea – see MPEP 2106.05(f) – Examiner's note: high-level recitation of generically applying a machine learning model without significantly more. This cannot provide an inventive concept)
receiving, by an inference device, input data (MPEP 2106.05(d)(II) indicates that merely “Receiving or transmitting data over a network” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed limitation is well-understood, routine, conventional activity is supported under Berkheimer)
[…] inference device […] (mere instructions to apply the exception using generic computer components cannot provide an inventive concept)
inputting, by the inference device, the input data into the neural network (MPEP 2106.05(d)(II) indicates that merely “Receiving or transmitting data over a network” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed limitation is well-understood, routine, conventional activity is supported under Berkheimer)
processing, by the inference device, the input data by the neural network to determine output data of the neural network, wherein said processing comprises propagating the input data through a number of neurons of the neural network (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea – see MPEP 2106.05(f) – Examiner's note: high-level recitation of generically applying a machine learning model to determine output data without significantly more. This cannot provide an inventive concept)
Accordingly, under Step 2A Prong 2 and Step 2B, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of Claim 1.
Independent Claim 7 recites substantially the same limitations as Claim 1, in the form of a device, including generic computer components. The claim is also directed to performing mental processes/mathematical calculations without significantly more, therefore it is rejected under the same rationale.
For the reasons above, Claim 7 is rejected as being directed to an abstract idea without significantly more. This rejection applies equally to dependent claims 8-10. The additional limitations of the dependent claims are addressed below.
Claim 8 recites substantially the same limitations as Claim 3, in the form of a device, including generic computer components. The claim is also directed to performing mental processes/mathematical calculations without significantly more, therefore it is rejected under the same rationale.
Claim 9 recites substantially the same limitations as Claim 4, in the form of a device, including generic computer components. The claim is also directed to performing mental processes/mathematical calculations without significantly more, therefore it is rejected under the same rationale.
Claim 10 recites substantially the same limitations as Claim 5, in the form of a device, including generic computer components. The claim is also directed to performing mental processes/mathematical calculations without significantly more, therefore it is rejected under the same rationale.
Claim 11 recites substantially the same limitations as Claim 6, in the form of a device, including generic computer components. The claim is also directed to performing mental processes/mathematical calculations without significantly more, therefore it is rejected under the same rationale.
Regarding Claim 12:
Step 2A Prong 1:
See the rejection of Claim 1 above, which Claim 12 depends on.
Step 2A Prong 2 & Step 2B:
a non-transitory computer-readable storage medium comprising program code portions which, when executed on a device having processing capabilities, performs the method according to claim 1 (mere instructions to apply the exception using generic computer components cannot provide an inventive concept)
Accordingly, under Step 2A Prong 2 and Step 2B, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of Claim 1.
Claim Rejections - 35 USC § 103
8. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
9. Claims 1-3, 5-8, and 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (hereinafter Zhou) (“Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights”), in view of Keller et al. (hereinafter Keller) (US PG-PUB 20220129755).
Regarding Claim 1, Zhou teaches a computer implemented method for transforming a pre-trained neural network (Zhou, Pg. 1, Abstract, “This paper presents incremental network quantization (INQ), a novel method, targeting to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version whose weights are constrained to be either powers of two or zero.”, therefore, methods for transforming a pre-trained neural network are disclosed), the method comprising:
receiving, by a transformation device (Zhou does not explicitly disclose a transformation device – See introduction of Keller reference below for teaching of a transformation device), the pre-trained neural network, wherein the pre-trained neural network comprises a number of neurons, and wherein each neuron is associated with a respective weight vector (Zhou, Pg. 3, Section 2.1 Weight Quantization with Variable-Length Encoding, “Suppose a pre-trained full-precision (i.e., 32-bit floating-point) CNN model can be represented by {Wl : 1 ≤ l ≤ L}, where Wl denotes the weight set of the lth layer, and L denotes the number of learnable layers in the model. […] Given a pre-trained full-precision CNN model, the main goal of our INQ is to convert all 32-bit floating-point weights to be either powers of two or zero without loss of model accuracy”, therefore, a pre-trained neural network is received, wherein the pre-trained neural network (CNN model) comprises a number of neurons and each neuron is associated with a corresponding weight set/vector. The specific pre-trained models (AlexNet, VGG-16, GoogleNet, ResNet-18, and ResNet-50) are detailed in Section 3 Experimental Results);
generating, by the transformation device (Zhou does not explicitly disclose a transformation device – See introduction of Keller reference below for teaching of a transformation device), a ternary representation of each weight vector, by transforming each weight vector into a ternary decomposition, comprising a ternary matrix (Zhou, Pg. 8, Section 3.3 The Trade-off between expected bit-width and model accuracy, “The third set of experiments is performed to explore the limit of the expected bit-width under which our INQ can still achieve lossless network quantization. Similar to the second set of experiments, we also use ResNet-18 as a test case, and the parameter settings for the batch size, the weight decay and the momentum are completely the same. Finally, lower-precision models with 4-bit, 3-bit and even 2-bit ternary weights are generated for comparisons.”, therefore, a ternary representation of each weight vector may be generated by transforming each weight vector into a ternary decomposition comprising a 2-bit ternary matrix. The decomposition process is also shown by Algorithm 1 on Pg. 6), and a power-of-two vector, wherein elements of the power-of-two vector are different powers of two (Zhou, Pgs. 3-4, “Given a pre-trained full-precision CNN model, the main goal of our INQ is to convert all 32-bit floating-point weights to be either powers of two or zero without loss of model accuracy. Besides, we also attempt to explore the limit of the expected bit-width under the premise of guaranteeing lossless network quantization. Here, we start with our basic network quantization method on how to convert Wl to be a low-precision version Ŵl, and each of its entries is chosen from Pl = {±2^n1, ··· , ±2^n2, 0} (Equation 1), where n1 and n2 are two integer numbers, and they satisfy n2 ≤ n1”, therefore, each weight vector may have a corresponding power-of-two vector, wherein elements of the power-of-two vector are different powers of two (depicted by Equation 1 on Pg. 4)); and
outputting, by the transformation device (Zhou does not explicitly disclose a transformation device – See introduction of Keller reference below for teaching of a transformation device), a transformed neural network, wherein the weight vectors of each neuron is represented by the ternary representation (Zhou, Pg. 6, Algorithm 1, “Output: {Ŵl : 1 ≤ l ≤ L}: the final low-precision model with the weights constrained to be either powers of two or zero”, therefore, the transformed neural network/final low-precision model is outputted, wherein the weight vector of each neuron is represented by the ternary representation);
whereby an output of each neuron, obtainable by a multiplication between an input vector of each neuron and the respective weight vector, can be determined by additions, subtractions (Zhou does not explicitly disclose that the output of each neuron can be determined by additions and subtractions – See introduction of Keller reference below for teaching of whereby an output of each neuron, obtainable by a multiplication between an input vector of each neuron and the respective weight vector, can be determined by additions and subtractions) and bit shift operations (Zhou, Pg. 10, Section 3.4 Low-Bit Deep Compression, “The direct advantage of our INQ is that the original floating-point multiplication operations can be replaced by cheaper binary bit shift operations on dedicated hardware like FPGA.”, therefore, the output of each neuron, obtainable by floating point multiplication operations, can be instead determined by bit shift operations)
Zhou does not explicitly disclose a transformation device
However, Keller teaches a transformation device (Keller, Par. [0085], “FIG. 4B is a conceptual diagram of a processing system 400 implemented using the PPU 200 of FIG. 2, in accordance with an embodiment. The exemplary system 465 may be configured to implement the method 100 shown in FIG. 1.”, therefore, a computing system/device for implementing the method of Keller Figure 1 (which comprises creating and modifying/transforming an artificial neural network to include at least one ternary matrix) is disclosed)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computer implemented method for transforming a pre-trained neural network, as disclosed by Zhou to include a transformation device for implementing the operations of claim 1, as disclosed by Keller. One of ordinary skill in the art would have been motivated to make this modification to enable the use of a processing device for transforming a pre-trained neural network, which may improve performance by reducing model size and computational costs (Keller, Par. [0120], “Specifically, the employment of such networks on edge devices, which are bandwidth, energy, and area constrained, is often too costly in practical applications. Sparsity and quantization techniques are viable solutions, reducing the model size and computational cost while being suited for existing hardware.”).
While Zhou teaches that the output of each neuron, obtainable by floating point multiplication operations, can be instead determined by bit shift operations (See Zhou Pg. 10 Section 3.4), Zhou does not explicitly disclose: whereby an output of each neuron, obtainable by a multiplication between an input vector of each neuron and the respective weight vector, can be determined by additions and subtractions
However, Keller teaches whereby an output of each neuron, obtainable by a multiplication between an input vector of each neuron and the respective weight vector, can be determined by additions, subtractions (Keller, Par. [0249], “Sampling proportional to the weights of a neural unit provides a derivation of ternary weights and yields a simple algorithm to create artificial neural networks that only use addition and subtraction instead of weight multiplication.”, therefore, an output of each neuron, obtainable by weight multiplication (See Par. [0179] and Figure 6 for support), can be instead determined by addition and subtraction)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computer implemented method for transforming a pre-trained neural network including a transformation device, as disclosed by Zhou in view of Keller to include whereby an output of each neuron, obtainable by a multiplication between an input vector of each neuron and the respective weight vector, can be determined by additions, subtractions, as disclosed by Keller. One of ordinary skill in the art would have been motivated to make this modification to reduce complex multiplication operations by replacing these operations with simpler addition and subtraction operations, hence reducing computational cost while also maintaining precision (Keller, Par. [0249], “We showed that dropout partitions realize dropout at only a fraction of the pseudo-random numbers used before. Sampling proportional to the weights of a neural unit provides a derivation of ternary weights and yields a simple algorithm to create artificial neural networks that only use addition and subtraction instead of weight multiplication. As a consequence, it is straightforward to quantize neural networks to integer weights without retraining. […] Constructing sparse artificial neural networks with linear complexity bears the potential of artificial neural networks without max-pooling, i.e. without resolution reduction, and therefore more precision”).
Regarding Claim 2, Zhou in view of Keller teaches the method according to claim 1, wherein each element of the ternary matrix has a value of 1, 0 or -1 (Keller, Par. [0024], “In one embodiment, each ternary matrix may include a matrix that includes only the values −1, 0, and 1.”, thus, each element of the ternary matrix has a value of 1, 0, or -1).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of claim 1, as disclosed by Zhou in view of Keller to include wherein each element of the ternary matrix has a value of 1, 0 or -1, as disclosed by Keller. One of ordinary skill in the art would have been motivated to make this modification to enable the substitution of the weight matrices with ternary matrices having values of 1, 0, or -1, which may reduce complexity and resource usage while improving performance of the network (Keller, Par. [0020], “By substituting ternary matrices for one or more weight matrices of fully connected layers within the ANN, a complexity and resource usage of the ANN may be reduced, while improving the performance of the ANN.”).
Regarding Claim 3, Zhou in view of Keller teaches the method according to claim 1, further comprising pruning the neural network by removing a set of columns of the ternary matrix and the corresponding elements of the power-of-two vector of at least one neuron (Zhou, Pg. 7, “Regarding weight partition, there are several candidate strategies as we tried in our previous work for efficient network pruning (Guo et al., 2016). In Guo et al. (2016), we found random partition and pruning-inspired partition are the two best choices compared with the others. Thus in this paper, we directly compare these two strategies for weight partition. In random strategy, the weights in each layer of any pre-trained full-precision deep CNN model are randomly split into two disjoint groups. In pruning-inspired strategy, the weights are divided into two disjoint groups by comparing their absolute values with layer-wise thresholds which are automatically determined by a given splitting ratio. Here we directly use pruning-inspired strategy and the experimental results in Section 3.2 will show why.”, thus, the neural network may be pruned by removing a set of columns of the ternary matrix and respective elements of the power-of-two vector – this is better depicted by Figure 2 on Pg. 5 & the pruning process is detailed further in section 3.2).
Regarding Claim 5, Zhou in view of Keller teaches the method according to claim 3, wherein the set of columns and elements to be removed is selected by:
selecting the elements of the power-of-two vector which has an exponent value below an exponent threshold, and the corresponding columns of the ternary matrix (Zhou, Pgs. 3-4, Section 2.1 Weight Quantization with Variable-Length Encoding, “Here, we start with our basic network quantization method on how to convert Wl to be a low-precision version Ŵl, and each of its entries is chosen from Pl = {±2^n1, ··· , ±2^n2, 0} (Equation 1), where n1 and n2 are two integer numbers, and they satisfy n2 ≤ n1. Mathematically, n1 and n2 help to bound Pl in the sense that its non-zero elements are constrained to be in the range of either [−2^n1, −2^n2] or [2^n2, 2^n1]. That is, network weights with absolute values smaller than 2^n2 will be pruned away (i.e., set to zero) in the final low-precision model.”, therefore, elements of the power-of-two vector which have an exponent value below a threshold (weights with absolute values smaller than 2^n2) will be pruned away and correspondingly removed from the ternary matrix).
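The quoted constraint, entries drawn from {±2^n1, ··· , ±2^n2, 0} with sub-threshold weights zeroed, can be illustrated with a short sketch. The nearest-exponent rounding below is an assumption for illustration and does not reproduce Zhou's exact quantization rule; the function name is hypothetical.

```python
# Simplified sketch of the quoted Zhou constraint: map each weight into
# {±2**n1, ..., ±2**n2, 0}, pruning weights of magnitude below 2**n2 to
# zero (n2 <= n1). Nearest-exponent rounding is an assumption here and
# differs from Zhou's actual rounding rule.
import math

def quantize_pow2(w, n1, n2):
    """Map w to a value in {±2**n1, ..., ±2**n2, 0}."""
    if abs(w) < 2 ** n2:
        return 0.0                       # pruned: below the 2**n2 floor
    e = round(math.log2(abs(w)))         # nearest exponent (assumed rounding)
    e = max(n2, min(n1, e))              # clamp into [n2, n1]
    return math.copysign(2.0 ** e, w)
```

For example, with (n1, n2) = (3, 0) a weight of 5 maps to 4.0, while with (n1, n2) = (0, -2) a weight of 0.1 falls below 2^-2 and is pruned to 0.0.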
Regarding Claim 6, Zhou in view of Keller teaches a method for running a neural network having been transformed according to the method (30) of claim 1 (Zhou, Pg. 7, Section 3. Experimental Results, “To analyze the performance of our INQ, we perform extensive experiments on the ImageNet large scale classification task, which is known as the most challenging image classification benchmark so far.”, thus, a method for running a neural network having been transformed according to the method of claim 1, in order to analyze its performance, is disclosed), the method comprising:
receiving, by an inference device, input data (Zhou, Pg. 7, Section 3. Experimental Results, “To analyze the performance of our INQ, we perform extensive experiments on the ImageNet large scale classification task, which is known as the most challenging image classification benchmark so far. ImageNet dataset has about 1.2 million training images and 50 thousand validation images. Each image is annotated as one of 1000 object classes.”, therefore, input data is received);
inputting, by the inference device, the input data into the neural network (Zhou, Pg. 7, “We apply our INQ to AlexNet, VGG-16, GoogleNet, ResNet-18 and ResNet-50, covering almost all known deep CNN architectures. Using the center crops of validation images, we report the results with two standard measures: top-1 error rate and top-5 error rate.”, therefore, input data is inputted into the neural network to test its performance); and
processing, by the inference device, the input data by the neural network to determine output data of the neural network (Zhou, Pg. 7, “Using the center crops of validation images, we report the results with two standard measures: top-1 error rate and top-5 error rate. For fair comparison, all pre-trained full-precision (i.e., 32-bit floating point) CNN models except ResNet-18 are taken from the Caffe model zoo2. Note that He et al. (2016) do not release their pre-trained ResNet-18 model to the public, so we use a publicly available re-implementation by Facebook3. Since our method is implemented with Caffe, we make use of an open source tool4 to convert the pre-trained ResNet-18 model from Torch to Caffe.”, thus, the input data is processed by the neural network to determine output data – results of which are shown by Table 1 on Pg. 7 and Tables 2 and 4 on Pg. 9);
wherein said processing comprises propagating the input data through a number of neurons of the neural network (Zhou, Pg. 7, “We apply our INQ to AlexNet, VGG-16, GoogleNet, ResNet-18 and ResNet-50, covering almost all known deep CNN architectures.”, therefore, the incremental network quantization is applied to a multitude of networks, all of which are deep CNN architectures which comprise a plurality of neurons for which the input data is propagated through),
wherein an output of a neuron of the number of neurons is obtained by a multiplication between an input vector and a weight vector represented by the ternary representation, and wherein said multiplication is determined by additions, subtractions (Keller, Par. [0249], “Sampling proportional to the weights of a neural unit provides a derivation of ternary weights and yields a simple algorithm to create artificial neural networks that only use addition and subtraction instead of weight multiplication.”, therefore, an output of each neuron, obtainable by weight multiplication (See Par. [0179] and Figure 6 for support), can be instead determined by addition and subtraction) and bit shift operations (Zhou, Pg. 10, Section 3.4 Low-Bit Deep Compression, “The direct advantage of our INQ is that the original floating-point multiplication operations can be replaced by cheaper binary bit shift operations on dedicated hardware like FPGA.”, therefore, the output of each neuron, obtainable by floating point multiplication operations, can be instead determined by bit shift operations).
The reasons for obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
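For illustration only (not the claimed implementation or code from Zhou or Keller), the multiply-free evaluation described in the rejection above — ternary weights handled by additions and subtractions, power-of-two scales handled by bit shifts — can be sketched as follows, assuming integer inputs and non-negative exponents:

```python
def neuron_output(x, T, e):
    """Compute x · (T @ 2**e) using only add/subtract and bit shifts.

    x : list of integer inputs
    T : ternary matrix (rows indexed like x), entries in {-1, 0, +1}
    e : non-negative integer exponents, one per column of T
    """
    total = 0
    for j, exp in enumerate(e):
        acc = 0
        for i, xi in enumerate(x):
            t = T[i][j]
            if t == 1:
                acc += xi      # addition replaces multiplication by +1
            elif t == -1:
                acc -= xi      # subtraction replaces multiplication by -1
        total += acc << exp    # bit shift replaces multiplication by 2**exp
    return total
```

Each weight w_i is implicitly sum_j T[i][j] * 2**e[j], so the result equals the ordinary dot product x · w without any multiplication operations.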
Regarding Claim 7, Zhou in view of Keller teaches a transformation device (Keller, Par. [0085], “FIG. 4B is a conceptual diagram of a processing system 400 implemented using the PPU 200 of FIG. 2, in accordance with an embodiment. The exemplary system 465 may be configured to implement the method 100 shown in FIG. 1.”, therefore, a computing system/device for implementing the method of Keller Figure 1 (which comprises creating and modifying/transforming an artificial neural network to include at least one ternary matrix) is disclosed) for transforming a pre-trained neural network (Zhou, Pg. 1, Abstract, “This paper presents incremental network quantization (INQ), a novel method, targeting to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version whose weights are constrained to be either powers of two or zero.”, therefore, methods for transforming a pre-trained neural network are disclosed), the device comprising circuitry (Keller, Par. [0021], “FIG. 1 illustrates a flowchart of a method 100 for incorporating a ternary matrix into a neural network, in accordance with an embodiment. Although method 100 is described in the context of a processing unit, the method 100 may also be performed by a program, custom circuitry, or by a combination of custom circuitry and a program.”, thus, the transformation device as disclosed above may comprise circuitry) configured to execute: […]
The rest of the claim language in Claim 7 recites substantially the same limitations as Claim 1, in the form of a device; therefore, it is rejected under the same rationale.
The reasons for obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
Claim 8 recites substantially the same limitations as Claim 3 in the form of a device; therefore, it is rejected under the same rationale.
Claim 10 recites substantially the same limitations as Claim 5 in the form of a device; therefore, it is rejected under the same rationale.
Claim 11 recites substantially the same limitations as Claim 6 in the form of a device; therefore, it is rejected under the same rationale.
Regarding Claim 12, Zhou in view of Keller teaches a non-transitory computer-readable storage medium comprising program code portions which, when executed on a device having processing capabilities (Keller, Claim 16, “A non-transitory computer-readable media storing computer instructions which when executed by one or more processors of a device cause the one or more processors to cause the device to: […]”, thus, a non-transitory computer-readable storage medium comprising program code to be executed on a device with processing capabilities (processing system 400 taught by the rejection of Claim 1 above) is disclosed), performs the method according to claim 1 (See rejection of Claim 1 above, which recites substantially the same limitations and is rejected under the same rationale).
The reasons for obviousness have been noted in the rejection of Claim 1 above and are applicable herein.
10. Claims 4 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (hereinafter Zhou) (“Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights”), in view of Keller et al. (hereinafter Keller) (US PG-PUB 20220129755), further in view of Molchanov et al. (hereinafter Molchanov) (“Importance Estimation for Neural Network Pruning”).
Regarding Claim 4, Zhou in view of Keller teaches the method according to claim 3.
While Zhou in view of Keller teaches pruning based on different partitioning strategies for the power-of-two vector and ternary matrix (See Zhou Pgs. 8-9), Zhou in view of Keller does not explicitly disclose wherein the set of columns and elements to be removed is selected by:
determining a contribution score of each element in the power-of-two vector; and,
selecting the elements of the power-of-two vector which has a contribution score below a contribution threshold, and the corresponding columns of the ternary matrix.
However, Molchanov teaches wherein the set of columns and elements to be removed is selected by:
determining a contribution score of each element in the power-of-two vector (Molchanov, Pg. 2, “We propose a new method for estimating, with a little computational overhead over training, the contribution of a neuron (filter) to the final loss. To do so, we use averaged gradients and weight values that are readily available during training. We compare two variants of our method using the first and second-order Taylor expansions, respectively, against a greedy search (“oracle”), and show that both variants achieve state-of-the-art results, with our first order criteria being significantly faster to compute with slightly worse accuracy. We also find that using a squared loss as a measure for contribution leads to better correlations with the oracle and better accuracy when compared to signed difference”, therefore, a contribution score of each element of the weight vector of a neuron is determined. Further, it should be noted that although Molchanov does not explicitly disclose a power-of-two vector in particular, Molchanov teaches determining a contribution score of each element in a weight vector of a neuron, regardless of whether the vector is a power-of-two vector – Zhou in view of Keller is relied upon for teaching the specific power-of-two vector per the rejection of Claims 1 & 3 above. Applicant’s specification Pg. 16 lines 31-37 and Pg. 17 lines 1-5 also support determining a contribution score based on a first order Taylor expansion of the loss, as similarly shown above by Molchanov); and,
selecting the elements of the power-of-two vector which has a contribution score below a contribution threshold, and the corresponding columns of the ternary matrix (Molchanov, Pg. 4, “Averaging importance scores over pruning iterations. We average importance scores between pruning iterations using an exponential moving average filter (momentum) with coefficient 0.9. Pruning strategy. We found that the method performs better when we define the number of neurons to be removed, prune them in batches and fine-tune the network after that. An alternative approach is to continuously prune as long as the training or validation loss is below the threshold. The latter approach leads the optimization into local minima and final results are slightly worse.”, therefore, elements of the vector which have a contribution score below a threshold may be removed).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of pruning according to claim 3, as disclosed by Zhou in view of Keller, to include wherein the set of columns and elements to be removed is selected by: determining a contribution score of each element in the power-of-two vector; and, selecting the elements of the power-of-two vector which has a contribution score below a contribution threshold, and the corresponding columns of the ternary matrix, as disclosed by Molchanov. One of ordinary skill in the art would have been motivated to make this modification to enable pruning based on a contribution score, which may provide a reliable estimate of true importance, hence improving network accuracy (Molchanov, Pg. 1, Abstract, “For modern networks trained on ImageNet, we measured experimentally a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over state-of-the-art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPS reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet.”).
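Molchanov's first-order criterion, as relied upon above, can be sketched as follows (a simplified illustration, not the paper's exact pipeline; the function names and threshold value are assumptions): each element is scored by the squared product of its value and its accumulated gradient, and low-scoring elements are flagged for removal along with the corresponding ternary-matrix columns.

```python
import numpy as np

def taylor_importance(weights, grads):
    """First-order Taylor contribution score per element: (g * w)**2,
    approximating the change in loss from zeroing that element using
    gradients already available during training (cf. Molchanov)."""
    w = np.asarray(weights, dtype=float)
    g = np.asarray(grads, dtype=float)
    return (g * w) ** 2

def select_prunable(scores, threshold):
    """Indices of elements whose contribution score falls below the
    threshold; the matching ternary-matrix columns would be removed
    together with these power-of-two vector elements."""
    return np.flatnonzero(np.asarray(scores) < threshold)

scores = taylor_importance([0.5, -0.1, 2.0], [0.2, 1.0, 0.0])
prunable = select_prunable(scores, threshold=0.005)
```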
Claim 9 recites substantially the same limitations as Claim 4 in the form of a device; therefore, it is rejected under the same rationale.
Conclusion
11. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Devika S Maharaj whose telephone number is (571)272-0829. The examiner can normally be reached Monday - Thursday 8:30am - 5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached at (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DEVIKA S MAHARAJ/Examiner, Art Unit 2123