Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/02/2025 has been entered.
Response to Remarks
Claim Rejections – 35 U.S.C. 103
Applicant’s prior art arguments have been fully considered but they are not persuasive.
Applicant argues (pgs. 7-9) that Gong does not appear to reconstruct the weight matrix to the original. Applicant argues that Gong, at best, relates to compressing a convolutional neural network and performing image recognition using the compressed neural network. Applicant adds that Gong does not decompress the compressed network back to the original (i.e., reconstruction back to the original).
Examiner respectfully disagrees. The amended limitation in claim 1 reads: “generating a weight matrix shape indication;”. Gong teaches different methods for compressing the parameters in the layers of the neural network, one of which is matrix factorization. On page 3 of Gong, Equation 1 reads W = USV^T. This is a matrix factorization of the parameter W into two orthogonal matrices and one diagonal matrix, known as singular value decomposition. On page 3 of Gong, Equation 2 gives the reconstruction of W, denoted as W_hat: W_hat = U_hat S_hat V_hat^T. While W_hat is not exactly identical to W, it has the same matrix shape, as the matrix product U_hat S_hat V_hat^T has dimensionality m by n, which is the same as the dimensionality of W, the original weight matrix. Therefore, Gong does indeed teach that the weight matrix shape indication is generated, as it is generated during the reconstruction of the weight matrix, which has the same dimensionality/shape as the original.
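For illustration only, the dimensionality argument above can be checked numerically. The following sketch uses arbitrary example sizes (m=6, n=4, k=2) and plain-Python helper functions (matmul, shape) that are not part of any cited reference; it merely confirms that the product of the truncated factors U_hat (m×k), S_hat (k×k), and V_hat^T (k×n) has the original m×n shape:

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(A), len(B), len(B[0])
    assert len(A[0]) == inner
    return [[sum(A[i][t] * B[t][j] for t in range(inner))
             for j in range(cols)] for i in range(rows)]

def shape(M):
    """Return (rows, cols) of a matrix given as a list of rows."""
    return (len(M), len(M[0]))

m, n, k = 6, 4, 2  # hypothetical sizes; k < m and k < n

# Stand-ins for the truncated SVD factors of Gong's Equation 2:
U_hat = [[random.random() for _ in range(k)] for _ in range(m)]   # m x k
S_hat = [[random.random() if i == j else 0.0 for j in range(k)]
         for i in range(k)]                                       # k x k diagonal
Vt_hat = [[random.random() for _ in range(n)] for _ in range(k)]  # k x n

# W_hat = U_hat S_hat V_hat^T has shape (m x k)(k x k)(k x n) = m x n,
# i.e., the same shape as the original weight matrix W.
W_hat = matmul(matmul(U_hat, S_hat), Vt_hat)
assert shape(W_hat) == (m, n)
```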
The foregoing applies to all independent claims and their dependent claims.
Claim Rejections – 35 U.S.C. 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 15-33 are rejected under 35 U.S.C. 103 as being unpatentable over Gong et al. (“Compressing Deep Convolutional Networks using Vector Quantization”) hereinafter known as Gong in view of Grangetto et al. (US 20200014955 A1) hereinafter known as Grangetto in view of Jin et al. (“Combined Inter-Intra Prediction for High Definition Video Coding”) hereinafter known as Jin.
Regarding independent claim 15, Gong teaches:
A method of encoding, by a processor, the method comprising: obtaining a neural network (NN) model, wherein the NN model comprises an NN layer, and wherein the NN layer is associated with a weight matrix; (Gong [Page 3, Paragraph 3]: “Given the parameter W ∈ R^{m×n} in one dense connected layer” Gong teaches obtaining a NN model where a parameter (weight) is associated with a NN layer.)
…
based on the identified dimensionality of the weight matrix, reshaping the weight matrix to reduce the dimensionality of the weight matrix; (Gong [Page 3, Paragraph 5]: “are two submatrices that correspond to the leading k singular vectors in U and V” Gong teaches two submatrices, when multiplied together, will reconstruct the weight matrix. Since the outside dimension of the product of the submatrices is k, Gong teaches that the matrix is reduced to dimensionality k, where k is less than m.)
generating a weight matrix shape indication; (Gong [Page 3, Equation 1]: Gong teaches a matrix factorization of the parameter W into two orthogonal matrices and one diagonal matrix, known as singular value decomposition. Gong [Page 3, Equation 2]: Gong teaches the reconstruction of W, denoted as W_hat: W_hat = U_hat S_hat V_hat^T. While W_hat is not exactly identical to W, it has the same matrix shape, as the matrix product U_hat S_hat V_hat^T has dimensionality m by n, which is the same as the dimensionality of W, the original weight matrix.)
and coding the NN layer based on the reshaped weight matrix; (Gong [Page 3, Equation 3]: Gong teaches binarizing the weights to quantize them. The present invention describes in the specs (paragraph [0011]) that coding the NN may include quantization.)
Gong does not explicitly teach:
identifying a dimensionality of the weight matrix;
However, Grangetto teaches:
identifying a dimensionality of the weight matrix; (Grangetto [¶ 0023]: “The graph describing the image pixels can be represented as a N×N matrix (i.e., a matrix having N.sup.2 elements) that it is referred as weights matrix W, as discussed later on.” Grangetto teaches explicitly identifying the dimensionality of the weight matrix by declaring that it is a square matrix of dimension N.)
Gong and Grangetto are in the same field of endeavor as the present invention, as the references are directed to methods of encoding and decoding matrices using quantization. It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art, to combine the encoding/decoding of neural network weight matrices as taught in Gong with techniques for identifying and reducing the dimensionality of weight matrices, and subsequently transmitting the reduced dimensionality, as taught in Grangetto. One of ordinary skill would have been motivated to modify the teachings of Gong with those of Grangetto because the combination would allow the weight matrices of neural networks to be encoded via quantization while remaining retrievable for later decoding. This has the potential benefit of compressing neural network models, reducing storage and bandwidth requirements, and thus increasing the efficiency of the operation of the models.
Gong and Grangetto do not explicitly teach:
wherein the coding comprises inter prediction or intra prediction.
However, Jin teaches:
wherein the coding comprises inter prediction or intra prediction. (Jin [Page 3, Column 2, Paragraph 4]: “Combining Inter prediction and Intra prediction will increase the encoding and decoding complexity” Jin teaches that in video coding, inter prediction is combined with intra prediction to encode video data. This increases the complexity of the encoding.)
Jin is in the same field as the present invention, since it is directed to coding/encoding video data using inter prediction and intra prediction. It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art, to combine the quantization-based encoding of neural network weight matrices taught by Gong as modified by Grangetto with the inter prediction and intra prediction coding of video data taught by Jin. One of ordinary skill would have been motivated to make this combination because it would allow predictions on the data to be made both between frames and within a frame. This has the potential benefit of preserving the accuracy of the data while compressing it, which may speed up processing time.
Regarding dependent claim 16, Gong and Grangetto teach:
The method of claim 15,
Gong teaches:
wherein reshaping the weight matrix comprises flattening or rearranging the dimensionality of the weight matrix. (Gong [Page 3, Paragraph 5]: “are two submatrices that correspond to the leading k singular vectors in U and V” Gong teaches two submatrices, when multiplied together, will reconstruct the weight matrix. Since the outside dimension of the product of the submatrices is k, Gong teaches that the matrix is reduced to dimensionality k, where k is less than m.)
The reasons to combine are substantially similar to those of claim 15.
Regarding dependent claim 17, Gong and Grangetto teach:
The method of claim 15,
wherein the dimensionality of the weight matrix comprises a two-dimension, a three-dimension, or a higher dimension, and the weight matrix is reshaped to a one-dimension weight vector. (Gong [Page 3, Paragraph 5]: “are two submatrices that correspond to the leading k singular vectors in U and V” Gong teaches two submatrices, when multiplied together, will reconstruct the weight matrix. Since the outside dimension of the product of the submatrices is k, Gong teaches that the matrix is reduced to dimensionality k, where k is less than m. Gong allows for the weight matrix to be originally of higher dimension and for the reduced dimensionality of the weight matrix to be k=1.)
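As an illustrative sketch of the claimed reshaping (the helper functions and the 2×3 example matrix are hypothetical, not drawn from Gong), a higher-dimensional weight matrix can be flattened to a one-dimension weight vector and then restored from a recorded shape indication:

```python
def flatten(tensor):
    """Flatten a nested list (a 2-D or 3-D weight tensor) into a 1-D vector."""
    if isinstance(tensor, list):
        return [x for item in tensor for x in flatten(item)]
    return [tensor]

def reshape_2d(vec, rows, cols):
    """Restore a flat vector to a rows x cols matrix using a shape indication."""
    assert len(vec) == rows * cols
    return [vec[r * cols:(r + 1) * cols] for r in range(rows)]

W = [[1, 2, 3], [4, 5, 6]]       # 2 x 3 weight matrix
v = flatten(W)                   # one-dimension weight vector [1, 2, 3, 4, 5, 6]
assert reshape_2d(v, 2, 3) == W  # shape indication (2, 3) restores the original
```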
The reasons to combine are substantially similar to those of claim 15.
Regarding dependent claim 18, Gong and Grangetto teach:
The method of claim 15,
Grangetto teaches:
wherein the method comprises at least one of: transmitting the identified dimensionality and the reduced dimensionality of the weight matrix in a bitstream; (Grangetto [¶ 0094]: “Successively, the CPU 1110 activates the entropy coding unit 1160, which fetches from the memory the selected mode information and the set of the selected quantized coefficients, executes the phases of the method for arranging said selected quantized coefficients in a sequence according to the present invention (see FIG. 3 step 330), then this unit entropy encodes said selected mode information and the sequence of selected quantized coefficients, obtaining a bitstream which is stored into the memory 1140.” Grangetto teaches getting a bitstream of the reduced dimensionality of the weight matrix and then transmitting/storing it in memory.)
Gong teaches:
or performing prediction based on the reshaped weight matrix. (Gong [Page 7, Paragraph 3]: “Compressing all three layers together usually led to larger error, especially when the compression rate was high. Finally, some sample predictions results are shown in Figure 5.” Gong teaches performing predictions based on the reshaped weight matrix and its layers. Gong [Page 8, Figure 5] Gong displays these predictions in a figure.)
The reasons to combine are substantially similar to those of claim 15.
Regarding dependent claim 19, Gong and Grangetto teach:
The method of claim 15,
wherein coding the NN layer comprises performing a quantization on the NN layer, and wherein the quantization comprises vector quantization. (Gong [Page 3, Equation 3]: Gong teaches binarizing the weights to quantize them. The present invention describes in the specs (paragraph [0011]) that coding the NN may include quantization. Gong’s teaching of binarizing the weights is a form of vector quantization.)
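For illustration, weight binarization can be viewed as a 1-bit quantization applied to each weight; the sign-based scheme below is an assumption made for this sketch, not a quotation of Gong's Equation 3:

```python
def binarize(row):
    """Quantize each weight to +1.0 or -1.0 by its sign (1-bit quantization)."""
    return [1.0 if w >= 0 else -1.0 for w in row]

# Hypothetical example weights (not taken from any cited reference).
W = [[0.3, -1.2, 0.0], [-0.7, 2.5, -0.1]]
W_q = [binarize(row) for row in W]
assert W_q == [[1.0, -1.0, 1.0], [-1.0, 1.0, -1.0]]
```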
The reasons to combine are substantially similar to those of claim 15.
Claim 20 is substantially similar to claim 15, but has the additional elements:
Regarding independent claim 20, Gong and Grangetto teach:
An apparatus for encoding comprising: a processor configured to: (Grangetto [¶ 0079]: “processing unit 1110, like a Central Processing Unit (CPU), configured for executing a set of instruction for carrying out a method for encoding digital images or video streams according to an embodiment of the invention.” Grangetto teaches a processor to encode the weight matrices.)
The reasons to combine are substantially similar to those of claim 15.
Claims 21-22 are rejected on the same grounds under 35 U.S.C. 103 as claims 16-17, as they are substantially similar, respectively. Mutatis mutandis.
Claim 23 is rejected on the same grounds under 35 U.S.C. 103 as claim 18, as they are substantially similar. Mutatis mutandis.
Claim 24 is rejected on the same grounds under 35 U.S.C. 103 as claim 19, as they are substantially similar. Mutatis mutandis.
Claim 25 is rejected on the same grounds under 35 U.S.C. 103 as claim 18, as they are substantially similar. Mutatis mutandis.
Regarding independent claim 26, Gong teaches:
A method of decoding, by a processor, the method comprising: …, wherein the compressed NN model comprises a quantized NN layer, and wherein the quantized NN layer is associated with a weight matrix having a first dimensionality; (Gong [Page 8, Paragraph 1]: “Given the compressed CNN above, we were able to process the images using compressed CNN on the cellphone side, and to perform retrieval on the database side by uploading only the processed feature.” Gong teaches receiving compressed CNNs with NN layers with weight matrices with a first dimensionality. They are receiving the NN compressed using the techniques described in the first part of the paper.)
obtaining a weight matrix shape indication, wherein the weight matrix shape indication indicates a weight matrix shape having a second dimensionality; (Gong [Page 8, Paragraph 3]: “However, because our goal was to best reconstruct the original weight matrix, this improvement from this binary kmeans case indeed showed that it is not a very accurate approximation.” Gong teaches reconstruction of the weight matrix to the original – the information regarding the original must have been obtained to initiate this process.)
based on the weight matrix shape indication, reshaping the weight matrix to the second dimensionality; (Gong [Page 8, Paragraph 3]: “However, because our goal was to best reconstruct the original weight matrix, this improvement from this binary kmeans case indeed showed that it is not a very accurate approximation.” Gong teaches reconstruction of the weight matrix to the original.)
and decoding the NN layer based on the reshaped weight matrix; (Gong [Page 8, Paragraph 3]: “This section presents an application of the compressed CNN to image retrieval, in order to verify the generalization ability of the compressed networks.” Gong teaches that the reshaping the weight matrix back to its original is part of decoding the NN layer process.)
Gong does not explicitly teach:
… obtaining a compressed neural network (NN) model …
However, Grangetto teaches:
… obtaining a compressed neural network (NN) model … (Grangetto [¶ 0103]: “a graph decoding unit 1250, which is configured for executing the phases of the method for decompressing digital images or video streams according to an embodiment of the invention; in particular, this unit is configured for de-quantizing the coefficients of each decoded block.” Grangetto teaches a decoding unit that obtains a compressed weight matrix that is to be decoded later.)
Gong and Grangetto do not explicitly teach:
wherein the decoding comprises inter prediction or intra prediction.
However, Jin teaches:
wherein the decoding comprises inter prediction or intra prediction. (Jin [Page 3, Column 2, Paragraph 4]: “Combining Inter prediction and Intra prediction will increase the encoding and decoding complexity” Jin teaches that in video coding, inter prediction is combined with intra prediction to decode video data. This increases the complexity of the decoding.)
The reasons to combine are substantially similar to those of claim 15.
Regarding dependent claim 27, Gong and Grangetto teach:
The method of claim 26,
Gong teaches:
wherein reshaping the weight matrix comprises restoring the weight matrix having the first dimensionality to the weight matrix having the second dimensionality. (Gong [Page 8, Paragraph 3]: “However, because our goal was to best reconstruct the original weight matrix, this improvement from this binary kmeans case indeed showed that it is not a very accurate approximation.” Gong teaches reconstruction of the weight matrix to the original.)
The reasons to combine are substantially similar to those of claim 15.
Regarding dependent claim 28, Gong and Grangetto teach:
The method of claim 26,
Gong teaches:
wherein the weight matrix shape having the second dimensionality comprises the weight matrix having an original dimensionality prior to the quantization, and wherein the weight matrix shape indication comprises a number of columns and a number of rows associated with the original dimensionality. (Gong [Page 8, Paragraph 3]: “However, because our goal was to best reconstruct the original weight matrix, this improvement from this binary kmeans case indeed showed that it is not a very accurate approximation.” Gong teaches reconstruction of the weight matrix to the original, which means that the number of columns and rows of the original dimensionality are restored.)
The reasons to combine are substantially similar to those of claim 15.
Regarding dependent claim 29, Gong and Grangetto teach:
The method of claim 26,
Gong teaches:
wherein the second dimensionality of the weight matrix comprises a two-dimension, a three-dimension, or a higher dimension, and the weight matrix is reshaped by increasing the first dimensionality of the weight matrix to the second dimensionality of the weight matrix. (Gong [Page 8, Paragraph 3]: “However, because our goal was to best reconstruct the original weight matrix, this improvement from this binary kmeans case indeed showed that it is not a very accurate approximation.” Gong teaches reconstruction of the weight matrix to the original. Gong allows for the original dimension of the weight matrix to be of higher dimensions and the compressed dimension to be k=1. Therefore, when the original weight matrix is reconstructed, then the dimensionality is increased back to the higher dimension.)
The reasons to combine are substantially similar to those of claim 15.
Claim 30 is substantially similar to claim 26, but has the additional elements:
Regarding independent claim 30, Gong and Grangetto teach:
An apparatus for decoding comprising: a processor configured to: (Grangetto [¶ 0079]: “processing unit 1110, like a Central Processing Unit (CPU), configured for executing a set of instruction for carrying out a method for encoding digital images or video streams according to an embodiment of the invention.” Grangetto teaches a processor to encode the weight matrices.)
The reasons to combine are substantially similar to those of claim 15.
Claims 31-33 are rejected on the same grounds under 35 U.S.C. 103 as claims 27-29, as they are substantially similar, respectively. Mutatis mutandis.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KYU HYUNG HAN whose telephone number is (703) 756-5529. The examiner can normally be reached M-F, 9:00-5:00.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Kyu Hyung Han/
Examiner
Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123