DETAILED ACTION
This action is responsive to the application filed on 01/21/2023. Claims 1-29 are pending and have been examined.
This action is non-final.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) ELEMENT IN CLAIM FOR A COMBINATION.-An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Regarding 35 U.S.C. 112(f) invocations:
The following claim limitations invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
Claim 11: “means for retrieving…means for determining…means for processing…”
Claim 12: “means for repeating…means for retrieving…means for determining…means for processing…”
Claim 13: “means for processing…means for performing…”
Claim 14: “means for generating…means for factorizing…means for quantizing…”
Claim 15: “means for factorizing…means for factorizing…”
Claim 16: “means for optimizing…”
The following appear to be the closest portions of the specification corresponding to the 35 U.S.C. 112(f) invocations:
[0083] “The various operations of methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware and/or software component(s) and/or module(s), including, but not limited to, a circuit, an application specific integrated circuit (ASIC), or processor. Generally, where there are operations illustrated in the figures, those operations may have corresponding counterpart means-plus-function components with similar numbering.”
[0096] “Further, it should be appreciated that modules and/or other appropriate means for performing the methods and techniques described can be downloaded and/or otherwise obtained by a user terminal and/or base station as applicable. For example, such a device can be coupled to a server to facilitate the transfer of means for performing the methods described. Alternatively, various methods described can be provided via storage means (e.g., RAM, ROM, a physical storage medium such as a compact disc (CD) or floppy disk, etc.), such that a user terminal and/or base station can obtain the various methods upon coupling or providing the storage means to the device. Moreover, any other suitable technique for providing the methods and techniques to a device can be utilized.”
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.-The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 11-20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Regarding 35 U.S.C. 112(f) invocations:
The following claim limitations invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
Claim 11: “means for retrieving…means for determining…means for processing…”
Claim 12: “means for repeating…means for retrieving…means for determining…means for processing…”
Claim 13: “means for processing…means for performing…”
Claim 14: “means for generating…means for factorizing…means for quantizing…”
Claim 15: “means for factorizing…means for factorizing…”
Claim 16: “means for optimizing…”
However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. The claims recite “means for” limitations (retrieving, determining, processing, performing, generating, factorizing, quantizing, and optimizing) and are devoid of specific, corresponding, and clear structure in the claim language as well as in the specification.
Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-29 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1,
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a process.
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
“A processor-implemented method, comprising: retrieving, for a layer of an artificial neural network (ANN), a codebook matrix representing a codebook and a latent matrix representing linear coefficient;” -- Retrieving matrices that represent a codebook and linear coefficients for a layer of an ANN is directed to math.
“determining, for the layer, a weight tensor based on a product of the codebook matrix and the latent matrix;” -- The limitation is directed to determining a weight tensor based on a product of matrices, which is a mathematical operation/relationship, and thus is directed to math.
“and processing, at the layer, an input based on the weight tensor.” -- The limitation recites mathematical operations. The specification explains that this involves the “product of the codebook matrix and the latent matrix” and that “the input may be processed by performing a convolution based on weights associated with the weight tensor” (specification at [0066]). These steps are mathematical in nature, involving matrix operations and convolution.
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B.
Thus, claim 1 is not patent eligible. Claims 11 and 21 are analogous to claim 1, apart from one limitation in claim 21, “An apparatus for an artificial neural network (ANN), comprising: a processor; a memory coupled with the processor; and instructions stored in the memory and operable, when executed by the processor, to cause the apparatus to,” which is an additional element that merely applies the abstract ideas on a generic computer (see MPEP 2106.05(f)). The same rejection set forth above therefore applies to claims 11 and 21. Regarding claim 29, it shares similar limitations with claims 1, 11, and 21, differing only in its use of “program code” to perform the recited steps. Using program code to perform the same steps does not amount to significantly more than the judicial exception, so the rejection of claim 1 applies to claim 29 as well.
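For context on the mathematical characterization above, the following is a minimal NumPy sketch of the recited matrix product and layer processing (all names, shapes, and values are hypothetical and are not taken from the claims or the specification):

    import numpy as np

    rng = np.random.default_rng(0)
    codebook = rng.standard_normal((64, 8))   # codebook matrix
    latent = rng.standard_normal((8, 576))    # latent matrix of linear coefficients

    # Determining the weight tensor as a product of the two matrices:
    weights = codebook @ latent               # (64, 576)

    # Processing an input at the layer based on the weight tensor,
    # here as a fully connected layer:
    x = rng.standard_normal(576)
    y = weights @ x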
Regarding claim 2,
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a process.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The processor-implemented method of claim 1, further comprising repeating, for each remaining layer of a set of layers, the retrieving, the determining, and the processing based on a codebook matrix and a latent matrix associated with a weight tensor of each remaining layer of the set of layers.” -- The limitation recites repeating the steps of retrieving, determining, and processing the matrices as recited in claim 1, with the further limitation that the matrices processed are associated with each remaining layer. It does not integrate the judicial exception into a practical application, nor does it provide significantly more than the judicial exception (see MPEP 2106.05(h)).
Thus, claim 2 is not patent eligible. Claims 12 and 22 are analogous to claim 2, and thus would face the same rejection as set forth above.
Regarding claim 3,
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a process.
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
“The processor-implemented method of claim 1, in which processing the input comprises performing a convolution based on weights associated with the weight tensor.” -- The limitation recites that processing the input comprises performing a convolution based on the weights associated with the weight tensor. The limitation is directed to performing a mathematical concept/calculation (convolution), and thus is directed to math.
Thus, claim 3 is not patent eligible. Claims 13 and 23 are analogous to claim 3, and thus would face the same rejection as set forth above.
Regarding claim 4,
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a process.
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
“The processor-implemented method of claim 1, further comprising: generating, during training of the ANN, the weight tensor based on a transformation of an original weight tensor from a four-dimensional weight tensor to a two-dimensional weight tensor; factorizing, during the training, the weight tensor to the codebook matrix and the latent matrix; quantizing, during the training, the codebook matrix and the latent matrix.” -- The limitation is directed to generating a weight tensor during training of the ANN based on a transformation of the dimensions of the weight tensor, and to factorizing and quantizing the matrices during training, all of which is directed to the use of mathematical concepts/calculations; thus, the limitation is directed to math.
Thus, claim 4 is not patent eligible. Claim 24 is analogous to claim 4, and thus would face the same rejection as set forth above.
Regarding claim 5,
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a process.
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
“The processor-implemented method of claim 4, further comprising factorizing the weight tensor to the codebook matrix and the latent matrix based on a floating point implementation for principal component analysis (PCA).” -- The limitation is directed to mathematical operations (factorization using PCA and floating-point computation), which are mathematical concepts; thus, the limitation is directed to math.
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B.
Thus, claim 5 is not patent eligible. Claims 15 and 25 are analogous to claim 5, and thus would face the same rejection as set forth above.
Regarding claim 6,
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a process.
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites the abstract ideas of:
“The processor-implemented method of claim 4, further comprising optimizing, during the training, the codebook matrix and the latent matrix on training data based on quantizing the dense matrix and the sparse matrix.” -- The limitation recites optimizing codebook and latent matrices on data based on quantizing the dense and sparse matrices. Quantization is a mathematical operation and concept, and thus the limitation is directed to math.
There are no elements to be evaluated under Step 2A Prong 2 and Step 2B.
Thus, claim 6 is not patent eligible. Claims 16 and 26 are analogous to claim 6, and thus would face the same rejection as set forth above.
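For context on the quantization recited above, a minimal NumPy sketch of quantizing a matrix to its nearest codebook values (illustrative only; the shapes and values are hypothetical and this is not code from the claims or the prior art):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 6))               # matrix to be quantized
    d = np.array([-0.5, 0.0, 0.5])                # codebook values

    # Assign each entry to its nearest codebook value:
    A = np.abs(W[..., None] - d).argmin(axis=-1)  # index of nearest value
    Q = d[A]                                      # quantized matrix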
Regarding claim 7,
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a process.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The processor-implemented method of claim 4, in which a compression rate of a factorization associated with the codebook matrix is different from a compression rate of a factorization associated with the latent matrix.” -- The limitation recites that the compression rate of the factorization associated with the codebook matrix differs from the compression rate of the factorization associated with the latent matrix. The limitation amounts to no more than reciting further limitations, without integrating the judicial exception into a practical application or providing significantly more than the judicial exception (see MPEP 2106.05(h)).
Thus, claim 7 is not patent eligible. Claims 17 and 27 are analogous to claim 7, and thus would face the same rejection as set forth above.
Regarding claim 8,
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a process.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The processor-implemented method of claim 1, in which the codebook matrix and the latent matrix are stored in a memory associated with a device that implements the ANN.” -- The limitation recites that the codebook matrix and the latent matrix are stored in a memory associated with a device that executes/implements the neural network. The limitation is directed to insignificant extra-solution activity that does not integrate the judicial exception into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, storing gathered/generated data in memory is a well-understood, routine, and conventional (WURC) activity that cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)).
Thus, claim 8 is not patent eligible. Claims 18 and 28 are analogous to claim 8, and thus would face the same rejection as set forth above.
Regarding claim 9,
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a process.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The processor-implemented method of claim 1, in which the layer is a convolutional layer or a fully connected layer.” -- The limitation recites that the layer is a convolutional layer or a fully connected layer. The limitation merely further limits the claim without integrating the judicial exception into a practical application or providing significantly more than the judicial exception (see MPEP 2106.05(h)).
Thus, claim 9 is not patent eligible. Claim 19 is analogous to claim 9, and thus would face the same rejection as set forth above.
Regarding claim 10,
Step 1 – Is the claim to a process, machine, manufacture, or composition of matter?
Yes, the claim is to a process.
There are no elements to be evaluated under Step 2A Prong 1.
Step 2A Prong 2 and Step 2B:
“The processor-implemented method of claim 1, in which the latent matrix is a sparse quantized matrix and the codebook matrix is a dense quantized matrix.” -- The limitation recites that the latent matrix is a sparse quantized matrix and the codebook matrix is a dense quantized matrix. The limitation merely further limits the claim without integrating the judicial exception into a practical application or providing significantly more than the judicial exception (see MPEP 2106.05(h)).
Thus, claim 10 is not patent eligible. Claim 20 is analogous to claim 10, and thus would face the same rejection as set forth above.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless —
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-3, 8, 10-13, 18, 20-23, and 28-29 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by the NPL reference “Iteratively training look-up tables for network quantization” by Cardinaux et al. (referred to herein as Cardinaux).
Regarding claim 1, Cardinaux teaches:
A processor-implemented method, comprising: retrieving, for a layer of an artificial neural network (ANN), a codebook matrix representing a codebook and a latent matrix representing linear coefficient; ([Cardinaux, page 1] “In this paper, we propose a training method for reducing the size and the number of operations of a deep neural network (DNN) that we call look-up table quantization (LUT-Q)...Using LUT-Q, instead of storing W, the dictionary d and the assignment matrix A are stored”, wherein the examiner interprets “we propose a training method” to be the same as “A processor-implemented method, comprising” because both are directed to a computer-executed method operating on a neural network. The examiner further interprets “the dictionary d and the assignment matrix A are stored” to be the same as “retrieving … a codebook matrix … and a latent matrix” because both are directed to having, for each layer, a stored codebook (dictionary d) and a stored companion representation (assignment matrix A) that would be retrieved for computation.)
determining, for the layer, a weight tensor based on a product of the codebook matrix and the latent matrix; and ([Cardinaux, page 1-3] “As depicted in Fig. 1, LUT-Q trains a network that represents the weights W ∈ R^(O×I) of one layer by a dictionary d ∈ R^K and assignments A ∈ [1,...,K]^(O×I) such that Qoi = dAoi, i.e., elements of Q are restricted to the K dictionary values in d. To learn the assignment matrix A and dictionary d, we iteratively update them after each minibatch. Our LUT-Q algorithm, run for each mini-batch … Step 1: Compute tied weights … Q(l) = d(l)[A(l)]”, wherein the examiner interprets “Qoi = dAoi” and “Q(l) = d(l)[A(l)]” to be the same as “determining, for the layer, a weight tensor based on a product of the codebook matrix and the latent matrix” because both are directed to forming the layer’s weights as the product of a codebook matrix and a companion matrix. The examiner further interprets “assignment matrix A” to be the same as a “latent matrix” because they are both directed to a per-layer matrix whose entries specify, for each weight location, the latent parameters that, together with the codebook/dictionary d, produce the effective weight values used by the layer, i.e., “mixing instructions” for the codebook to get the weight tensor.)
processing, at the layer, an input based on the weight tensor. ([Cardinaux, page 2-3] “Weights (tied) Q ∈ R^(O×I) used in forward/backward pass … Step 1: Compute tied weights … Q(l) = d(l)[A(l)] … Step 2: Compute current cost and gradients … Forward ( X, Q(1), … , Q(L) ).”, wherein the examiner interprets Q, the tied, quantized weights that the network actually uses for computation/processing, which are formed from the dictionary d(l) and the assignments A(l), to be the same as the “weight tensor” because they are used for further processing in the neural network. The examiner further interprets “compute current cost and gradients … Forward (X, Q..).” to be the same as “processing” at a layer of a feed-forward network using Q, the “weight tensor”.) Claims 11 and 21 are analogous to claim 1 so the same rejection can be applied as set forth above.
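For context on the cited LUT-Q construction, a minimal NumPy sketch of building tied weights from a dictionary and an assignment matrix and running a forward step (illustrative only; the shapes and values are hypothetical, and this is not code from Cardinaux):

    import numpy as np

    rng = np.random.default_rng(0)
    K, O, I = 16, 4, 5                    # dictionary size and layer dimensions
    d = rng.standard_normal(K)            # dictionary (codebook) of K values
    A = rng.integers(0, K, size=(O, I))   # assignment matrix: one index per weight

    # Tied weights as in the cited Step 1: Q[o, i] = d[A[o, i]]
    Q = d[A]

    # Forward pass at the layer using the tied weights:
    x = rng.standard_normal(I)
    y = Q @ x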
Regarding claim 2, Cardinaux teaches:
The processor-implemented method of claim 1, further comprising repeating, for each remaining layer of a set of layers, ([Cardinaux, page 3] “For each layer, we jointly update both dictionary and weight assignments during training. … for l = 1 to L do Q(l) = d(l)[A(l)] end for … Forward(X, Q(1), ..., Q(L))”, wherein the examiner interprets “For each layer” and “for l = 1 to L” to be the same as repeating, for each remaining layer of a set of layers because they are both directed to iterating the operations across all layers in the network.)
the retrieving, the determining, and the processing based on a codebook matrix and a latent matrix associated with a weight tensor of each remaining layer of the set of layers. ([Cardinaux, page 1] “In this paper, we propose a training method for reducing the size and the number of operations of a deep neural network (DNN) that we call look-up table quantization (LUT-Q)...Using LUT-Q, instead of storing W, the dictionary d and the assignment matrix A are stored” and [Cardinaux, page 3] “For each layer, we jointly update both dictionary and weight assignments during training. … for l = 1 to L do Q(l) = d(l)[A(l)] end for … Forward(X, Q(1), ..., Q(L))”, wherein the examiner interprets “the dictionary d and the assignment matrix A are stored” to be the same as the “codebook matrix and a latent matrix”, d and A, respectively. Those matrices are later retrieved because they are both “stored”, obtained, and used for each weight tensor per layer in a set of layers. The examiner further interprets “Q(l) = d(l)[A(l)]” to be the same as the determining because both are directed to forming, at the layer, the effective weights from the codebook matrix and the latent matrix. Finally, the examiner further interprets “compute current cost and gradients … Forward (X, Q..).” to be the same as “processing” at a layer of a feed-forward network using Q, the “weight tensor”.) Claims 12 and 22 are analogous to claim 2 so the same rejection can be applied as set forth above.
Regarding claim 3, Cardinaux teaches The processor-implemented method of claim 1, in which processing the input ([Cardinaux, page 3] “Forward(X,Q(1), . . . ,Q(L)) … Backward (X,T,Q(1),...,Q(L)) … used in forward/backward pass” wherein the examiner interprets “Forward(X,Q(1), . . . ,Q(L))” and “used in forward/backward pass” to be the same as processing the input because they are both directed to executing the forward computation of the network on inputs.)
comprises performing a convolution based on weights associated with the weight tensor. ([Cardinaux, page 1-2] “As depicted in Fig. 1, LUT-Q trains a network that represents the weights W ∈ R^(O×I) of one layer by a dictionary d and assignments A…The memory used for the parameters is dominated by the weights in affine/convolution layers…replacing the convolution layers by factorized convolutions, and finally applying LUT-Q in order to quantize the weights of the network to 8-bit.”, wherein the examiner interprets “affine/convolution layers” and “replacing the convolution layers by factorized convolutions” to be the same as comprises performing a convolution because they are both directed to layers that perform convolution operations; and interprets “represents the weights W ∈ R^(O×I) of one layer by a dictionary d and assignments A” to be the same as based on weights associated with the weight tensor because they are both directed to using the layer’s weights (the weight parameters of the tensor for that layer) during the computation.) Claims 13 and 23 are analogous to claim 3 so the same rejection can be applied as set forth above.
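For context on the convolution mapping above, a minimal NumPy sketch of processing an input by convolving it with a kernel drawn from a layer’s weight tensor (illustrative only; the sizes and values are hypothetical and this is not code from Cardinaux):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal((5, 5))   # single-channel input feature map
    k = rng.standard_normal((3, 3))   # 3x3 kernel taken from the weight tensor

    # "Valid" convolution (cross-correlation form, as used in neural networks):
    out = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            out[i, j] = np.sum(x[i:i+3, j:j+3] * k)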
Regarding claim 8, Cardinaux teaches The processor-implemented method of claim 1, in which the codebook matrix and the latent matrix are stored in a memory associated with a device that implements the ANN. ([Cardinaux, page 1] “Operating deep neural networks on devices with limited resources requires the reduction of their memory footprints and computational requirements…The memory used for the parameters is dominated by the weights in affine/convolution layers. Using LUT-Q, instead of storing W, the dictionary d and the assignment matrix A are stored.” wherein the examiner interprets “the dictionary d and the assignment matrix A are stored” to be the same as “the codebook matrix and the latent matrix are stored” because both are directed to keeping, in storage, a per-layer codebook (dictionary) and a per-layer latent representation (assignments). The examiner further interprets “The memory used for the parameters … Using LUT-Q, instead of storing W, the dictionary d and the assignment matrix A are stored.” to be the same as “stored in a memory” because both are directed to storing the codebook and assignment data in memory for the network. Finally, the examiner further interprets “Operating deep neural networks on devices with limited resources” to be the same as “a device that implements the ANN” because both are directed to a computing device on which a neural network (NN) is implemented and executed.) Claims 18 and 28 are analogous to claim 8 so the same rejection can be applied as set forth above.
Regarding claim 10, Cardinaux teaches The processor-implemented method of claim 1, in which the latent matrix is a sparse quantized matrix ([Cardinaux, page 1] “LUT-Q trains a network that represents the weights W ∈ R^(O×I) of one layer by a dictionary d ∈ R^K and assignments A ∈ [1,...,K]^(O×I) such that Qoi = dAoi, i.e., elements of Q are restricted to the K dictionary values in d…we can constrain the assignment matrix and the dictionary in order to generate a network with pruned weight matrices…to {−1,0,1} resulting in a Ternary Weight Network”, wherein the examiner interprets “assignments A ∈ [1, . . . ,K] … elements of Q are restricted to the K dictionary values” to be the same as a quantized matrix because they are both directed to discrete-valued encodings (quantization) for the latent representation. The examiner further interprets “constrain the assignment matrix and the dictionary … pruned weight matrices” together with the inclusion of “0” in “{−1, 0, 1}” to be the same as sparse because they are both directed to enforcing zeros in the effective weights via the assignments and dictionary, yielding a sparse latent-driven representation.)
and the codebook matrix is a dense quantized matrix. ([Cardinaux, page 1] “we can … restrict the dictionary to powers-of-two … by a dictionary d … such that Qoi = dAoi, i.e., elements of Q are restricted to the K dictionary values in d.” wherein the examiner interprets “dictionary d … K dictionary values” and “restrict the dictionary to powers-of-two” to be the same as a “dense quantized matrix” because they are both directed to a stored table of quantized real values (the codebook) without any requirement of sparsity in the codebook entries themselves (dense) while explicitly defining the quantized value set (quantized).) Claim 20 is analogous to claim 10 so the same rejection can be applied as set forth above.
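For context on the sparse/dense characterization above, a minimal NumPy sketch of a ternary codebook whose assignments leave most effective weights at zero (illustrative only; the shapes, probabilities, and values are hypothetical and this is not code from Cardinaux):

    import numpy as np

    rng = np.random.default_rng(0)
    d = np.array([-1.0, 0.0, 1.0])                        # dense quantized codebook
    A = rng.choice(3, size=(4, 6), p=[0.15, 0.70, 0.15])  # assignments, mostly index 1
    Q = d[A]                                              # effective weights, mostly zero
    sparsity = float(np.mean(Q == 0.0))                   # fraction of zeros (~0.7)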
Regarding claim 29, Cardinaux teaches:
A non-transitory computer-readable medium, ([Cardinaux, page 1] “The memory used for the parameters is dominated by the weights in affine/convolution layers. Using LUT-Q, instead of storing W, the dictionary d and the assignment matrix A are stored.”, wherein the examiner interprets “The memory used for the parameters … the dictionary d and the assignment matrix A are stored” to be the same as “non-transitory computer-readable medium” because they are both describing persistent storage in computer memory (i.e., a computer-readable medium that stores data rather than a transitory signal)).
having program code recorded thereon for an artificial neural network (ANN), the program code executed by a processor and comprising: ([Cardinaux, page 1-3] “In this paper, we propose a training method for reducing the size and the number of operations of a deep neural network (DNN) … Our LUT-Q algorithm, run for each mini-batch, is summarized in Table 1… Table 1: LUT-Q training algorithm … // Step 1: Compute tied weights … // Step 2: Compute current cost and gradients … // Step 3: Update full precision weights … // Step 4: Update weight tying by M k-means iterations”, wherein the examiner interprets “Our LUT-Q algorithm, run for each mini-batch” to be the same as “having program code recorded thereon” because both refer to concrete algorithmic steps that constitute program code recorded on that storage. The examiner further interprets “deep neural network (DNN)” to be the same as “an artificial neural network (ANN)” because both are standard terminology for artificial neural networks, with DNN being a common subclass/phrasing of ANN used in the paper.)
program code to retrieve, for a layer of the ANN, a codebook matrix representing a codebook and a latent matrix representing linear coefficients; ([Cardinaux, page 1] “In this paper, we propose a training method for reducing the size and the number of operations of a deep neural network (DNN) that we call look-up table quantization (LUT-Q)...Using LUT-Q, instead of storing W, the dictionary d and the assignment matrix A are stored”, wherein the examiner interprets “the dictionary d and the assignment matrix A are stored” to be the same as “program code to retrieve … a codebook matrix … and a latent matrix” because both are directed to having, for each layer, a stored codebook (dictionary d) and a stored companion representation (assignment matrix A) that would be retrieved for computation.)
program code to determine, for the layer, a weight tensor based on a product of the codebook matrix and the latent matrix; and ([Cardinaux, page 1-3] “As depicted in Fig. 1, LUT-Q trains a network that represents the weights W ∈ R^(O×I) of one layer by a dictionary d ∈ R^K and assignments A ∈ [1,...,K]^(O×I) such that Qoi = dAoi, i.e., elements of Q are restricted to the K dictionary values in d. To learn the assignment matrix A and dictionary d, we iteratively update them after each minibatch. Our LUT-Q algorithm, run for each mini-batch … Step 1: Compute tied weights … Q(l) = d(l)[A(l)]”, wherein the examiner interprets “Qoi = dAoi” and “Q(l) = d(l)[A(l)]” to be the same as “determining, for the layer, a weight tensor based on a product of the codebook matrix and the latent matrix” because both are directed to forming the layer’s weights as the product of a codebook matrix and a companion matrix. The examiner further interprets “assignment matrix A” to be the same as a “latent matrix” because they are both directed to a per-layer matrix whose entries specify, for each weight location, the latent parameters that, together with the codebook/dictionary d, produce the effective weight values used by the layer, i.e., “mixing instructions” for the codebook to get the weight tensor.)
program code to process, at the layer, an input based on the weight tensor. ([Cardinaux, page 2-3] “Weights (tied) Q ∈ R^(O×I) used in forward/backward pass … Step 1: Compute tied weights … Q(l) = d(l)[A(l)] … Step 2: Compute current cost and gradients … Forward ( X, Q(1), … , Q(L) ).”, wherein the examiner interprets Q, the tied, quantized weights that the network actually uses for computation/processing, which are formed from the dictionary d(l) and the assignments A(l), to be the same as the “weight tensor” because they are used for further processing in the neural network. The examiner further interprets “compute current cost and gradients … Forward (X, Q..).” to be the same as “processing” at a layer of a feed-forward network using Q, the “weight tensor”.)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 4-7, 9, 14-17, 19, and 24-27 are rejected under 35 U.S.C. 103 as being unpatentable over Cardinaux in view of the NPL reference “Sparse low rank factorization for deep neural network compression” by Swaminathan et al. (referred to herein as Swaminathan).
Regarding claim 4, Cardinaux teaches The processor-implemented method of claim 1, (see rejection of claim 1).
Cardinaux further teaches factorizing, during the training, the weight tensor to the codebook matrix and the latent matrix; ([Cardinaux, page 1-3] “look-up table quantization, LUT-Q, which learns a dictionary and assigns each weight to one of the dictionary’s values … As depicted in Fig. 1, LUT-Q trains a network that represents the weights W ∈ R^(O×I) of one layer by a dictionary d ∈ R^K and assignments A ∈ [1,...,K]^(O×I) such that Qoi = dAoi, i.e., elements of Q are restricted to the K dictionary values in d. To learn the assignment matrix A and dictionary d, we iteratively update them after each minibatch. Our LUT-Q algorithm, run for each mini-batch … Step 1: Compute tied weights … Q(l) = d(l)[A(l)]”, wherein the examiner interprets decomposing the layer’s weight tensor into two trained pieces, a “codebook matrix”, d, (a small set of basis patterns) and a “latent matrix”, A, (the per-weight coefficients/codes that tell how to mix those patterns), to be the same as “factorizing”. The examiner further interprets “we iteratively update them after each minibatch” to be the same as “during the training.”)
quantizing, during the training, the codebook matrix and the latent matrix. ([Cardinaux, page 1] “In this paper we introduce a training method, called look-up table quantization, LUT-Q, which learns a dictionary and assigns each weight to one of the dictionary’s values…To learn the assignment matrix A and dictionary d, we iteratively update them after each minibatch”, wherein the examiner interprets “look-up table quantization … learns a dictionary and assigns each weight to one of the dictionary’s values” to be the same as “quantizing … the codebook matrix and the latent matrix” because they are both directed to quantization via learned codebooks (dictionary values) and per-weight assignments, and “iteratively update … after each minibatch” is the same as “during the training.”).
Cardinaux does not teach further comprising: generating, during training of the ANN, the weight tensor based on a transformation of an original weight tensor from a four-dimensional weight tensor to a two-dimensional weight tensor.
Swaminathan teaches further comprising: generating, during training of the ANN, the weight tensor based on a transformation of an original weight tensor from a four-dimensional weight tensor to a two-dimensional weight tensor; ([Swaminathan, page 187] “A 4D convolution tensor can be factorized into decomposed matrices of two-components … decomposed convolution weights into two … resulted [in] decent speedup in text character recognition model with small drop in accuracy.” wherein the examiner interprets “A 4D convolution tensor can be factorized into decomposed matrices” to be the same as “a transformation of an original weight tensor from a four-dimensional weight tensor to a two-dimensional weight tensor” because they are both directed to representing a 4D convolutional kernel by 2D matrices as part of training-time processing.)
Cardinaux, Swaminathan, and the instant application are analogous art because they are all directed to transforming convolutional weight tensors during training and factorizing them into compact matrices for efficient deployment.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method claim 1 disclosed by Cardinaux to include the tensor dimensionality reduction technique disclosed by Swaminathan. One would be motivated to do so to efficiently improve speed and compression while maintaining accuracy, as suggested by Swaminathan ([Swaminathan, page 189] “A 4D convolution tensor can be factorized into decomposed matrices of two-components … decomposed convolution weights into two matrices resulted decent speedup in text character recognition model with small drop in accuracy … achieving better compression rates with minimal or no loss in the accuracy.”) Claims 14 and 24 are analogous to claim 4 so the same rejection can be applied as set forth above.
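For context on the cited 4D-to-2D transformation, a minimal NumPy sketch of reshaping a four-dimensional convolution kernel into a two-dimensional matrix prior to factorization (illustrative only; the dimensions are hypothetical and this is not code from Swaminathan):

    import numpy as np

    rng = np.random.default_rng(0)
    w4d = rng.standard_normal((64, 32, 3, 3))  # (out_channels, in_channels, kH, kW)
    w2d = w4d.reshape(64, 32 * 3 * 3)          # (out_channels, in_channels*kH*kW)

    # The 2D matrix can then be factorized (e.g., into codebook and latent
    # factors) and the factors reshaped back for the convolution.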
Regarding claim 5, Cardinaux and Swaminathan teach The processor-implemented method of claim 4, (see rejection of claim 4).
Cardinaux further teaches further comprising factorizing the weight tensor to the codebook matrix and the latent matrix ([Cardinaux, page 1-3] “look-up table quantization, LUT-Q, which learns a dictionary and assigns each weight to one of the dictionary’s values … As depicted in Fig. 1, LUT-Q trains a network that represents the weights W ∈ R^(O×I) of one layer by a dictionary d ∈ R^K and assignments A ∈ [1,...,K]^(O×I) such that Qoi = dAoi, i.e., elements of Q are restricted to the K dictionary values in d. To learn the assignment matrix A and dictionary d, we iteratively update them after each minibatch. Our LUT-Q algorithm, run for each mini-batch … Step 1: Compute tied weights … Q(l) = d(l)[A(l)]”, wherein the examiner interprets decomposing the layer’s weight tensor into two trained pieces, a “codebook matrix”, d, (a small set of basis patterns) and a “latent matrix”, A, (the per-weight coefficients/codes that tell how to mix those patterns), to be the same as “factorizing”. The examiner further interprets “we iteratively update them after each minibatch” to be the same as “during the training.”)
based on a floating point implementation ([Cardinaux, page 1-2] “Using LUT-Q, instead of storing W, the dictionary d and the assignment matrix A are stored. Hence, for an affine/convolution layer with N parameters, we reduce the memory usage in bits from NBfloat to just KBfloat + N ⌈log2K⌉, where Bfloat is the number of bits used to store one weight … we first use the full precision 32-bit …”, wherein the examiner interprets “NBfloat … where Bfloat is the number of bits used to store one weight” and “full precision” to be the same as a floating point implementation because they are both directed to representing and operating on network weights in floating-point format.)
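As a worked illustration of the quoted memory figure (hypothetical numbers, not values from Cardinaux): for a layer with N = 1,000,000 weights stored at B_float = 32 bits, the baseline cost is N·B_float = 32,000,000 bits, while LUT-Q with a K = 256 entry dictionary stores K·B_float + N·⌈log2 K⌉ = 8,192 + 8,000,000 = 8,008,192 bits, roughly a 4x reduction.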
Cardinaux does not teach for principal component analysis (PCA).
Swaminathan teaches for principal component analysis (PCA). ([Swaminathan, page 187] “Tucker Decomposition (TD) for decomposing … used canonical polyadic decomposition (CPD) … Other than SVD, TD, and CPD methods, another class of matrix decomposition approach called non-negative matrix factorization (NMF) is also considered a useful strategy for latent factor analysis (LFA). The NMF guarantees to have decomposed matrices with positive entries in latent factors that are easily interpretable. It has many applications … Singular value decomposition (SVD) is the most commonly used approach for low-rank approximation of 2D matrix weights in a stable way.”, wherein the examiner interprets “negative matrix factorization (NMF)...Singular value decomposition (SVD)...TD and CPD methods” to be the same as principal component analysis (PCA) because they are both directed to computing principal directions via orthonormal bases; PCA of a (centered) matrix is obtained through SVD, so both describe the same decomposition operation used for low-rank factorization in neural-network compression.)
Cardinaux, Swaminathan, and the instant application are analogous art because they are all directed to factorizing a weight tensor into a codebook matrix and a latent matrix during training to reduce memory while enabling efficient processing.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the processor-implemented method of claim 4 disclosed by Cardinaux and Swaminathan to include the PCA-like decomposition technique disclosed by Swaminathan. One would be motivated to do so to efficiently improve the stability of the low-rank approximation for layer weight matrices and decomposition to 2D, as suggested by Swaminathan ([Swaminathan, page 187] “Singular value decomposition (SVD) is the most commonly used approach for low-rank approximation of 2D matrix weights in a stable way.”) Claims 15 and 25 are analogous to claim 5 so the same rejection can be applied as set forth above.
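For context on the PCA/SVD mapping above, a minimal NumPy sketch of a floating-point, SVD-based low-rank factorization of a 2D weight matrix (illustrative only; the rank, shapes, and values are hypothetical and this is not code from Swaminathan):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 288)).astype(np.float32)  # floating-point weights

    # SVD underlies PCA; keep the top-r components for a low-rank approximation:
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    r = 16
    W_r = (U[:, :r] * s[:r]) @ Vt[:r, :]   # rank-r approximation of W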
Regarding claim 6, Cardinaux and Swaminathan teach The processor-implemented method of claim 4, (see rejection of claim 4).
Cardinaux further teaches further comprising optimizing, during the training, ([Cardinaux, page 1] “To learn the assignment matrix A and dictionary d, we iteratively update them after each minibatch. Our LUT-Q algorithm, run for each mini-batch, is summarized in Table 1.” wherein the examiner interprets “iteratively update them after each minibatch” to be the same as “optimizing, during the training” because they are both directed to making parameter-improving updates while training proceeds.)
the codebook matrix and the latent matrix ([Cardinaux, page 1-3] “As depicted in Fig. 1, LUT-Q trains a network that represents the weights W ∈ R^(O×I) of one layer by a dictionary d ∈ R^K and assignments A ∈ [1,...,K]^(O×I)”, wherein the examiner interprets “dictionary d … and assignments A” to be the same as the codebook matrix and the latent matrix because they are both directed to a learned codebook at each layer together with per-weight assignment variables that serve as the latent representation.)
on training data based on quantizing ([Cardinaux, page 1] “In this paper we introduce a training method, called look-up table quantization, LUT-Q, which learns a dictionary and assigns each weight to one of the dictionary’s values…To learn the assignment matrix A and dictionary d, we iteratively update them after each minibatch. Our LUT-Q algorithm, run for each mini-batch, is summarized in Table 1.” wherein the examiner interprets “each minibatch” and “run for each mini-batch” to be the same as on training data because they are both directed to updates performed while processing batches drawn from the training dataset. The examiner further interprets “look-up table quantization, LUT-Q” to be the same as based on quantizing because they are both directed to a quantization-driven training approach.)
the dense matrix and the sparse matrix. ([Cardinaux, page 1-2] “we can constrain the assignment matrix and the dictionary in order to generate a network with pruned weight matrices … Weights (full precision) W ∈ R^(O×I), updated by optimizer … LUT-Q can also be used to prune and quantize networks simultaneously.”, wherein the examiner interprets “Weights (full precision) W, updated by optimizer” to be the same as the dense matrix because they are both directed to the full-precision weight matrix of a layer (all weights stored at normal precision, with no enforced zeros, i.e., the uncompressed baseline) that is optimized during training. The examiner further interprets “pruned weight matrices” and “prune and quantize networks simultaneously” to be the same as the sparse matrix because pruning forces entries of the weight matrix to zero, yielding a sparse matrix that is quantized during training.) Claims 16 and 26 are analogous to claim 6 so the same rejection can be applied as set forth above.