Detailed Action
This action is in response to the RCE filed on 01/01/2026 and the amended claims filed on 11/20/2025 for application 17/552,501, in which:
Claims 1 and 11 are independent claims.
Claims 1-2, 6, 11-12, and 16 are amended.
Claims 1-19 are currently pending.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/01/2026 has been entered.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy of priority Application No. CN202011564315.0, filed on 12/25/2020, has been received.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Regarding the 35 USC § 112(b) Rejections:
Applicant asserts (Pages 13-14) that the term "compute-intensive" is a term of art; however, the claims have been amended for clarity, and withdrawal of the 112(b) rejection is respectfully requested.
Applicant's amendments to Claims 6 and 16 overcome the previous rejections. The 35 USC § 112(b) rejections have been withdrawn.
Response to Arguments
Applicant's arguments filed 11/20/2025 have been fully considered but they are not persuasive.
Regarding the Claim Interpretation:
Applicant disagrees (Pages 10-11) with the 112(f) interpretation because the claims specifically name the structures that perform the respectively claimed operations, namely (1) a data acquirer, (2) a quantization parameter calculator, and (3) a quantization implementor, which Applicant contends are not "generic placeholders." Applicant further supports this assertion by noting that the specification states "a neural network model quantization apparatus … may include ..." and by noting "examples" of the listed structures (1)-(3). Because no "means" or "step" is recited in claims 11-19, the claims do not modify any alleged "means" or "step," and the structures are tied to a hardware component; the 112(f) interpretations are therefore requested to be withdrawn.
Examiner respectfully disagrees. As noted within MPEP 2181, instead of using "means" in such cases, a substitute term may act as a generic placeholder for the term "means" that would not be recognized by one of ordinary skill in the art as sufficiently definite structure for performing the claimed function. "The standard is whether the words of the claim are understood by persons of ordinary skill in the art to have a sufficiently definite meaning as the name for structure." Williamson, 792 F.3d at 1349, 115 USPQ2d at 1111; see also Greenberg v. Ethicon Endo-Surgery, Inc., 91 F.3d 1580, 1583, 39 USPQ2d 1783, 1786 (Fed. Cir. 1996). The alleged structures (1)-(3) are considered generic placeholders because the recited terms are simply substitutes for the term "means." The generic placeholders in the limitations ("data acquirer," "quantization parameter calculator," and "quantization implementor") do not recite any structure in the Claims, contrary to Applicant's assertion, and there is no indication of where sufficient physical structure for performing the claimed functions is described in the specification. The functional term in the limitations is "configured to," which MPEP 2181(B) identifies as a "linking word" that associates the generic placeholder with the function. Applicant has made no amendments to the Claims to overcome the interpretation under 35 U.S.C. 112(f). Thus, the Claims are still being interpreted under 35 U.S.C. § 112(f).
Regarding the 35 USC § 101 Rejections:
Applicant's arguments regarding the 35 U.S.C. 101 rejections of the previous office action have been fully considered, but are unpersuasive.
Applicant disagrees (Pages 14-16) with the rejections of Claims 1-19 as being directed to an abstract idea and traverses the rejections in view of the amendments. Applicant further supports this assertion by citing Diamond v. Diehr, the 2019 Revised Patent Subject Matter Eligibility Guidance, and Alice, to argue, respectively, that the claims do not recite a mathematical formula/abstract idea, that the claims are allowable under § 101, and that the combination of claim elements goes beyond/transforms the claims into patent-eligible subject matter. Applicant contends that the office action has failed to meet its initial burden of "presenting a prima facie case of unpatentability," as required by the MPEP, and concludes that claims 1-19 are not directed to an abstract idea and are allowable under 35 U.S.C. § 101 because the claims integrate their respective features into a practical application.
Examiner respectfully disagrees. The amended claims remain directed to an abstract idea (Step 2A Prong 1) and do not integrate the abstract idea into a practical application (Step 2A Prong 2). The pending Claims are directed to a judicial exception because they recite limitations which fall within the “mathematical concepts” and “mental processes” groups of abstract ideas, and the Claims do not amount to significantly more than the judicial exception because they do not include additional elements that contribute to an “inventive concept.” Because the additional elements fall under MPEP 2106.05, the judicial exception is not integrated into a practical application; specific details are discussed below within the Examiner’s responses and the 35 USC § 101 rejections. The claims are directed towards the improvement of an abstract idea, and improvements to an abstract idea are still considered an abstract idea. Although the Claims are interpreted in light of the specification, limitations from the specification are not read into the Claims. The rejections follow the analysis laid out within the MPEP, which was applied in both the previous and current examinations (see MPEP 2106). Therefore, for the reasons given above and in the updated rejections below, the rejections of all Claims (including Claim 1, the similar independent claim, and all dependent Claims) are maintained and updated as necessitated by the Claim amendments.
Applicant asserts (Pages 16-20) that the above-noted claimed features are not, and/or could not practically be, performed in the human mind and/or do not correspond to mental activities. Applicant further supports this assertion under Step 2A Prong 1 by noting that the claim specifically recites (1) a processor-implemented neural network model quantization method, and (2) claimed operations performed by a neural network model quantization apparatus. Applicant further argues that the claimed invention improves computer technologies by improving the operation speed of a model and by adding quantization indication operators to solve existing issues, and that the operations are rooted in computer technology and are too complex to be performed in the human mind. Applicant also contends that the office has failed to provide any rationale evidencing that the alleged abstract ideas are similar to those the courts have identified (e.g., Hannun), and that one of ordinary skill in the art would not understand how to mentally perform all the limitations of Claim 1.
Examiner respectfully disagrees. The amended claims remain directed to an abstract idea (Step 2A Prong 1) and do not integrate the abstract idea into a practical application (Step 2A Prong 2). The amended independent claim recites calculating … a quantization parameter corresponding to an operator of the received neural network model to be quantized based on bisection approximation (a mathematical relationship between variables and/or numbers using mathematical formulas/equations); quantizing the operator of the received neural network model to be quantized based on the calculated quantization parameter (a mathematical relationship between variables and/or numbers using mathematical formulas/equations); obtaining a neural network model having the quantized operator (a human being can apply evaluation to obtain a neural network model having a quantized operator); and wherein the bisection approximation comprises individually calculating the quantization parameter by combining data distribution subintervals and a minimum mean squared error (MSE) (a mathematical relationship between variables and/or numbers using mathematical formulas/equations). These limitations are all evaluations or judgments that can be performed in the human mind, or by a human using pen and paper, or are mathematical relationships. The office action establishes a proper and well-supported prima facie case, as the claims are shown to be patent-ineligible via the Patent Subject Matter Eligibility steps within MPEP 2106.
Applicant asserts (Pages 20-24) that, under Step 2A Prong 2, Claims 1-19 are not "directed to" the judicial exception because the claimed features are integrated into a practical application that imposes a "meaningful limit." Even if the Claims recite an alleged judicial exception, which Applicant does not concede, Applicant contends that claims 1-19 are not "directed to" the judicial exception because the judicial exception is "integrated into a practical application of the judicial exception." Applicant further supports this assertion by citing the 2019 Revised Patent Subject Matter Eligibility Guidance, Diehr, Alice, the Specification, etc., as examples of and support for claims that are not considered abstract ideas once their additional features are examined, and argues that the amended claimed invention improves the field of neural network model quantization. Applicant respectfully submits that, under an analysis of the claims under Part 2B of the Mayo framework in light of the 2019 Revised Patent Subject Matter Eligibility Guidance, the claims are integrated into a practical application. Thus, Applicant concludes that claims 1-19 are directed to patent-eligible subject matter and that the § 101 rejections should be withdrawn.
Examiner respectfully disagrees. The claims do not integrate the judicial exception into a practical application nor amount to significantly more. The claim is not patent eligible. Although the Claims are interpreted in light of the specification, limitations from the specification are not read into the Claims.
MPEP 2106.05(a) recites:
After the examiner has consulted the specification and determined that the disclosed invention improves technology, the claim must be evaluated to ensure the claim itself reflects the disclosed improvement in technology … the claim must include the components or steps of the invention that provide the improvement described in the specification
…
It is important to note, the judicial exception alone cannot provide the improvement. The improvement can be provided by one or more additional elements. See the discussion of Diamond v. Diehr, 450 U.S. 175, 187 and 191-92, 209 USPQ 1, 10 (1981)) in subsection II, below.
Applicant fails to show how any alleged technical improvement would be provided by anything more than the judicial exception on its own. Additionally, Applicant fails to show how the claim includes components or steps that would provide the alleged improvement described in the specification or by the cited case law. Under MPEP 2106.05(f)(1), "the claim recites only the idea of a solution or outcome, i.e. the claim fails to recite details of how a solution to a problem is accomplished". Moreover, the examiner maintains that the Claims do not impose any meaningful limits on the judicial exceptions. As noted in the rejection, the Claims do not include additional elements sufficient to amount to an integration of the identified abstract idea into a practical application; thus, the claims are directed to an abstract idea. Applicant’s arguments regarding the other independent and dependent claims rely upon the same assertions as those with respect to Claim 1, and are thus likewise unpersuasive. Therefore, for the reasons given above and in the updated rejections below, the rejections of all Claims (including Claim 1, the similar independent claim, and all dependent Claims) are maintained and updated as necessitated by the Claim amendments.
Regarding the 35 USC § 102 Rejections:
Applicant's arguments regarding the 35 U.S.C. 102 rejections of the previous office action have been fully considered, but are unpersuasive.
Applicant asserts (Pages 25-26) that there is no teaching or suggestion in Filini of the feature "calculating … a quantization parameter corresponding to an operator of the received neural network model to be quantized based on bisection approximation," as recited in independent claim 1, and specifically of the quantizable operator of a neural network (as Filini mentions the claimed operator only as a convolution, fully-connected layer, etc.). Applicant respectfully submits that the DNN weights of Filini are quite different from the operators of a neural network.
Examiner respectfully disagrees. The examiner interprets an “operator” of a neural network to be a function that applies an operation/mapping. Thus, applying DNN weights is a weight function (a weight operation), which the examiner interprets as an operator. Although the Claims are interpreted in light of the specification, limitations from the specification are not read into the Claims. Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.
Applicant asserts (Pages 26-27) that there is no teaching or suggestion in Filini of the feature "the bisection approximation comprises individually calculating the quantization parameter by combining data distribution subintervals and a minimum mean squared error (MSE)," as claimed, and thus that Filini fails to teach each and every limitation of independent claim 1. With regard to claims 2-6, 9, and 10, Applicant respectfully submits that Filini also fails to teach or suggest all of the features of those claims, and that claims 2-6, 9, and 10 are allowable in view of their individual recitations and also in view of their respective dependencies on an allowable base claim. Applicant respectfully submits that Filini also does not teach or suggest the features of independent claim 11, and that Filini fails to disclose all the respective features of claims 12-16 and 19, which are also allowable at least by virtue of their respective dependencies on a patentable base claim.
Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references. Filini does teach/suggest all limitations within the Independent Claim under the broadest reasonable interpretation. Applicant’s arguments regarding the other independent and dependent claims rely upon the same assertions as with respect to Claim 1, and are thus likewise unpersuasive.
Regarding the 35 USC § 103 Rejections:
Applicant's arguments regarding the 35 U.S.C. 103 rejections of the previous office action have been fully considered, but are unpersuasive.
Applicant asserts (Page 27) that Elmer does not remedy the above-noted deficiencies of Filini, and that, accordingly, claims 7, 8, 17, and 18 are allowable in view of their individual recitations and also in view of their respective dependencies on patentable base claims.
Examiner respectfully disagrees. Elmer does not need to cure the alleged deficiencies of Filini because the disputed limitations are taught by Filini under the broadest reasonable interpretation, as noted in the remarks above and in the office action below. Applicant’s arguments regarding the other independent and dependent claims rely upon the same assertions as those with respect to Claim 1, and are thus likewise unpersuasive.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.
Such claim limitations are:
a data acquirer configured to receive a neural network model;
a quantization parameter calculator configured to calculate a quantization parameter … ; and
a quantization implementor configured to quantize, in Claim 11. Because Claim 11 is interpreted under 35 U.S.C. 112(f), dependent Claims 12-19 are also interpreted under 35 U.S.C. 112(f).
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1:
Claim 1 recites a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 1 further recites the method comprising:
calculating … a quantization parameter corresponding to an operator of the received neural network model to be quantized based on bisection approximation (a mathematical relationship between variables and/or numbers using mathematical formulas/equations);
quantizing the operator of the received neural network model to be quantized based on the calculated quantization parameter (a mathematical relationship between variables and/or numbers using mathematical formulas/equations);
obtaining a neural network model having the quantized operator (a human being can apply evaluation to obtain a neural network model having a quantized operator); and
wherein the bisection approximation comprises individually calculating the quantization parameter by combining data distribution subintervals and a minimum mean squared error (MSE) (a mathematical relationship between variables and/or numbers using mathematical formulas/equations).
Claim 1 thus recites an abstract idea (that falls into the “mathematical concepts” and “mental processes” group of abstract ideas).
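For illustration only, the kind of calculation characterized above as a mathematical relationship can be sketched as a bisection-style search for a clipping threshold minimizing quantization MSE. This sketch assumes a uniform quantizer and a roughly unimodal error curve; the names `quantize`, `mse`, and `bisect_threshold` are hypothetical and do not represent the claimed method or any cited reference.

```python
import numpy as np

def quantize(x, threshold, bits=8):
    """Clip to [-threshold, threshold], then uniformly quantize and dequantize."""
    levels = 2 ** (bits - 1) - 1          # e.g., 127 for 8-bit signed
    scale = threshold / levels
    clipped = np.clip(x, -threshold, threshold)
    return np.round(clipped / scale) * scale

def mse(x, threshold, bits=8):
    """Mean squared error between the data and its quantized reconstruction."""
    return float(np.mean((x - quantize(x, threshold, bits)) ** 2))

def bisect_threshold(x, lo, hi, bits=8, iters=30):
    """Bisectionally approximate the midpoint of [lo, hi], keeping the half
    whose representative point yields the smaller quantization MSE."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        # Compare MSE left and right of the midpoint to pick the better half.
        if mse(x, (lo + mid) / 2.0, bits) < mse(x, (mid + hi) / 2.0, bits):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0
```

On Gaussian-distributed data, such a search settles on a threshold well below the data maximum, trading a small clipping error for a finer quantization step.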
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the additional elements recited consist of:
(a) A processor-implemented neural network model quantization method, the method comprising (which restricts the abstract idea to a Particular Technological Environment, per MPEP 2106.05(h));
(b) … by a neural network model quantization apparatus … (performing a mental process and an abstract idea on a computer is no more than an instruction to “apply it” on a computer, per MPEP 2106.05(f)); and
(c) receiving … a neural network model (which is insignificant extra-solution activity of data gathering, per MPEP 2106.05(g)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements recited, alone or in combination, do not provide significantly more than the abstract idea itself. Additional element (a) only restricts the abstract idea to a Particular Technological Environment (MPEP 2106.05(h)), which cannot provide significantly more. Additional element (b) merely applies the abstract idea on a computer (MPEP 2106.05(f)), which cannot provide significantly more. Additional element (c) falls within MPEP 2106.05(d) as a well-understood, routine, and conventional activity of receiving or transmitting data over a network (MPEP 2106.05(d)(II): buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014)). Thus, the claim is subject-matter ineligible.
Regarding Claim 2:
Dependent Claim 2 recites the method of Claim 1. Claim 1 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 2 further recites the method comprising calculating a quantization parameter corresponding to the minimum mean squared error (MSE) of the input data of the operator to be quantized before and after quantization based on the input data of the operator to be quantized, by implementing bisection approximation (a mathematical relationship between variables and/or numbers using mathematical formulas/equations). Claim 2 thus recites an abstract idea (that falls into the “mathematical concepts” group of abstract ideas).
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the sole additional element recited consists of receiving input data of the operator to be quantized by verifying the neural network model with a verification dataset (which is insignificant extra-solution activity of data gathering, by MPEP 2106.05(g)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the sole additional element recited does not provide significantly more than the abstract idea itself. The additional element falls within MPEP 2106.05(d) as a well-understood, routine, and conventional activity of receiving or transmitting data over a network (MPEP 2106.05(d)(II): buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014)). Thus, the claim is subject-matter ineligible.
Regarding Claim 3:
Dependent Claim 3 recites the method of Claim 2. Claim 2 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 3 further recites the method comprising:
wherein the calculating of the quantization parameter corresponding to the minimum MSE comprises (a mathematical relationship between variables and/or numbers using mathematical formulas/equations):
performing dimensionality reduction on the input data of the operator to be quantized (a human being can mentally apply evaluation to perform dimensionality reduction on input data); and
searching for the quantization parameter corresponding to the minimum MSE by bisectionally approximating an intermediate point between a start point and an end point of each of the data distribution intervals, by implementing bisection approximation (a human being can mentally apply evaluation to search for a parameter corresponding to a minimum MSE by implementing a bisection approximation).
Claim 3 thus recites an abstract idea (that falls into the “mathematical concepts” and “mental processes” group of abstract ideas).
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the sole additional element recited consists of dividing the input data of the operator to be quantized after the performing of the dimensionality reduction into a plurality of data distribution intervals based on a statistical characteristic of the input data of the operator to be quantized after the dimensionality reduction, and obtaining an interval upper value array which is an array of upper values in each of the plurality of data distribution intervals (which is insignificant extra-solution activity of data gathering, by MPEP 2106.05(g)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the sole additional element recited does not provide significantly more than the abstract idea itself. The additional element falls within MPEP 2106.05(d) as a well-understood, routine, and conventional activity of receiving or transmitting data over a network (MPEP 2106.05(d)(II): buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014)). Thus, the claim is subject-matter ineligible.
Regarding Claim 4:
Dependent Claim 4 recites the method of Claim 3. Claim 3 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 4 does not recite any additional abstract ideas and only inherits the abstract ideas from Claim 3. Claim 4 thus recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the sole additional element recited consists of wherein the quantization parameter comprises at least one of a clipping parameter, a quantization factor parameter, and a clipping factor parameter of each of the plurality of data distribution intervals (which is restricting the abstract idea to a Particular Technological Environment, by MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the sole additional element recited does not provide significantly more than the abstract idea itself. The additional element only restricts the abstract idea to a Particular Technological Environment (MPEP 2106.05(h)), which cannot provide significantly more. Thus, the claim is subject-matter ineligible.
Regarding Claim 5:
Dependent Claim 5 recites the method of Claim 3. Claim 3 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 5 further recites the method comprising:
wherein the searching for the quantization parameter comprises (a human being can mentally apply evaluation to search for the quantization parameter):
calculating an MSE of an approximate point of each of the plurality of data distribution intervals by bisectionally approximating the intermediate point between the start point and the end point of each of the plurality of data distribution intervals (a mathematical relationship between variables and/or numbers using mathematical formulas/equations);
initializing the minimum MSE to be an initial MSE of each of the plurality of data distribution intervals when obtaining the interval upper value array each time for each of the plurality of data distribution intervals (a human being can mentally apply evaluation to initialize the minimum MSE for each of the plurality of intervals when obtaining the interval upper value array); and
updating the minimum MSE by implementing the MSE of the approximate point when the MSE of the approximate point is less than the minimum MSE (a human being can mentally apply evaluation to update the minimum MSE by implementing the MSE of the approximate point when the MSE of the approximate point is less than a minimum MSE).
Claim 5 thus recites an abstract idea (that falls into the “mathematical concepts” and “mental processes” group of abstract ideas).
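For illustration only, the initialize-and-update bookkeeping recited above can be sketched as a traversal that keeps the smallest MSE seen so far. The quantization details are elided into a caller-supplied `interval_mse`; the names `search_min_mse` and `interval_mse` are hypothetical and do not represent the claimed method.

```python
def search_min_mse(upper_values, interval_mse):
    """Traverse the interval upper-value array, initializing the minimum MSE
    and updating it whenever a candidate's MSE is less than the current
    minimum. `interval_mse(u)` returns the MSE for candidate upper value u."""
    best_u = None
    best_mse = float("inf")               # initialize the minimum MSE
    for u in upper_values:                # one candidate per interval
        m = interval_mse(u)               # MSE of the approximate point
        if m < best_mse:                  # update when less than the minimum
            best_u, best_mse = u, m
    return best_u, best_mse
```

For example, `search_min_mse([1.0, 2.0, 3.0], lambda u: (u - 2.0) ** 2)` returns the candidate `2.0` with MSE `0.0`.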
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the additional elements recited consist of:
(a) outputting the quantization parameter corresponding to the minimum MSE when traversing the data distribution intervals (which is insignificant extra-solution activity of data gathering, per MPEP 2106.05(g)); and
(b) wherein the initial MSE corresponds to a quantization parameter corresponding to an intermediate point between a start point and an end point of each of the data distribution intervals, and wherein the MSE of the approximate point corresponds to a quantization parameter corresponding to an approximate point of each of the data distribution intervals (which restricts the abstract idea to a Particular Technological Environment, per MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements recited, alone or in combination, do not provide significantly more than the abstract idea itself. Additional element (a) falls within MPEP 2106.05(d) as a well-understood, routine, and conventional activity (MPEP 2106.05(d)(II): OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93). Additional element (b) only restricts the abstract idea to a Particular Technological Environment (MPEP 2106.05(h)), which cannot provide significantly more. Thus, the claim is subject-matter ineligible.
Regarding Claim 6:
Dependent Claim 6 recites the method of Claim 1. Claim 1 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 6 does not recite any additional abstract ideas and only inherits the abstract ideas from Claim 1. Claim 6 thus recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the sole additional element recited consists of wherein the operator of the received neural network model to be quantized is a quantizable operator comprised in the neural network model, wherein the quantizable operator is an operator of which a ratio of parameters comprised in an operator of the neural network model to all parameters of the neural network model exceeds a threshold value, or an operator which belongs to a computationally intensive operator (which is restricting the abstract idea to a Particular Technological Environment, by MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the new sole additional element recited, alone or in combination, does not provide significantly more than the abstract idea itself. The additional element is only restricting the abstract idea to a Particular Technological Environment (MPEP 2106.05(h)) which cannot provide significantly more. Thus, the claim is subject-matter ineligible.
Regarding Claim 7:
Dependent Claim 7 recites the method of Claim 1. Claim 1 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 7 does not recite any additional abstract ideas and only inherits the abstract ideas from Claim 1. Claim 7 thus recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the sole additional element recited consists of inserting a quantization indicating operator in front of a quantizable operator of the neural network model and indicating the quantizable operator, before the calculating of the quantization parameter corresponding to the operator of the neural network model to be quantized (which is restricting the abstract idea to a Particular Technological Environment, by MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the new sole additional element recited, alone or in combination, does not provide significantly more than the abstract idea itself. The additional element is only restricting the abstract idea to a Particular Technological Environment (MPEP 2106.05(h)) which cannot provide significantly more. Thus, the claim is subject-matter ineligible.
Regarding Claim 8:
Dependent Claim 8 recites the method of Claim 7. Claim 7 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 8 further recites the method comprising verifying whether weight data is present in input data of the quantizable operator (a human being can mentally evaluate whether weight data is present in input data). Claim 8 thus recites an abstract idea (that falls into the “mental processes” group of abstract ideas).
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the additional elements recited consist of:
wherein the indicating of the quantizable operator comprises: (which is restricting the abstract idea to a Particular Technological Environment, by MPEP 2106.05(h))
wherein when the weight data is not present in the input data of the quantizable operator, inserting the quantization indicating operator in front of the quantizable operator (which is restricting the abstract idea to a Particular Technological Environment, by MPEP 2106.05(h))
wherein when the weight data is present in the input data of the quantizable operator, inserting the quantization indicating operator in front of the quantizable operator, and inserting the quantization indicating operator in front of the weight data to indicate whether the weight data needs to be quantized (which is restricting the abstract idea to a Particular Technological Environment, by MPEP 2106.05(h))
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements recited, alone or in combination, do not provide significantly more than the abstract idea itself. Additional elements a-c are only restricting the abstract idea to a Particular Technological Environment (MPEP 2106.05(h)), which cannot provide significantly more. Thus, the claim is subject-matter ineligible.
Regarding Claim 9:
Dependent Claim 9 recites the method of Claim 1. Claim 1 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 9 does not recite any additional abstract ideas and only inherits the abstract ideas from Claim 1. Claim 9 thus recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the sole additional element recited consists of wherein the neural network model is a deep learning neural network model trained to perform at least one of image recognition, natural language processing, and recommendation system processing (which is restricting the abstract idea to a Particular Technological Environment, by MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the new sole additional element recited, alone or in combination, does not provide significantly more than the abstract idea itself. The additional element is only restricting the abstract idea to a Particular Technological Environment (MPEP 2106.05(h)) which cannot provide significantly more. Thus, the claim is subject-matter ineligible.
Regarding Claim 10:
Dependent Claim 10 recites the method of Claim 1. Claim 1 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 10 does not recite any additional abstract ideas and only inherits the abstract ideas from Claim 1. Claim 10 thus recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the sole additional element recited consists of A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform (to perform a mental process and the performance of an abstract idea on a computer is no more than instructions to “apply it” on a computer, by MPEP 2106.05(f)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the new sole additional element recited, alone or in combination, does not provide significantly more than the abstract idea itself. The additional element is merely applying the abstract idea on a computer (MPEP 2106.05(f)) which cannot provide significantly more. Thus, the claim is subject-matter ineligible.
Regarding Claims 11-19:
Claims 11-19 incorporate substantively all the limitations of Claims 1-9 in an apparatus (thus a machine) and further recite three new additional elements: a data acquirer configured to, a quantization parameter calculator configured to, and a quantization implementer configured to (these claim limitations appear to perform a mental process, and the performance of an abstract idea on a computer is no more than instructions to “apply it” on a computer, by MPEP 2106.05(f)). These additional elements do not appear to integrate the abstract idea into a practical application, and they are not sufficient to amount to significantly more than the judicial exception because, alone or in combination, they do not provide significantly more than the abstract idea itself; thus, Claims 11-19 are rejected for the reasons set forth in the rejections of Claims 1-9, respectively.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-6, 9-16, and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Filini et al., “Rate-Accuracy Optimization of Deep Convolutional Neural Network Models”, hereinafter “Filini”.
Regarding Claim 1:
A processor-implemented neural network model quantization method, the method comprising:
(Filini, Fig. 3; Page 91, Column 2, Paragraph 4, “Another case which deserves much attention due to its real application is … when the learning and compression procedures are taking place among different devices”; Page 93, Column 1, Paragraph 7, “An overview of the coding framework architecture is illustrated in Fig. 3. As it can be noticed from the diagram, the method takes as input a pre-trained DNN model”. The DNN (which is considered a deep convolutional neural network within Filini) model is compressed on a computing device, in which a processor and CRM are inherent, via a quantization method which is shown with the main modules within the solution architecture of the compression (quantization) solution in Fig. 3).
receiving, by a neural network model quantization apparatus, a neural network model;
(Filini, Fig. 3; Page 93, Column 1, Paragraph 7, “An overview of the coding framework architecture is illustrated in Fig. 3. … the method takes as input a pre-trained DNN model”. The original model within Fig. 3 is considered the receiving of a neural network model. As the method receives the original model as the starting point for applying the quantization compression techniques).
calculating, by the neural network model quantization apparatus, a quantization parameter corresponding to an operator of the received neural network model to be quantized based on bisection approximation; and
(Filini, Fig. 3; Page 94, Column 1, Paragraph 5, “Quantization parameters: … In the case of the non-uniform quantizer, the encoder needs to transmit some side information to perform an accurate reconstruction at the decoder, namely the reconstruction levels, i.e. the centers of each quantization bin for each layer of the DNN model”; Page 94, Column 2, Paragraph 4, “1) Lloyd-Max quantization: … a Lloyd-Max algorithm [26] can be used to cluster the DNN weights and compute the centers of the quantization bins (centroid values) according to the DNN weights distribution”. The operator is interpreted as the DNN Weights within Fig. 3. The quantization parameter is calculated corresponding to the DNN weights (operator) of the received neural network model. The quantization parameter is based on finding an approximate midpoint or half-way point, i.e. centroid, in an interval (bin width); thus, the quantization is interpreted by the examiner as being based on bisection approximation).
quantizing the operator of the received neural network model to be quantized based on the calculated quantization parameter, and obtaining a neural network model having the quantized operator,
(Filini, Fig. 3; Page 94, Column 2, Paragraph 4, “1) Lloyd-Max quantization: … After the clustering has been performed, each weight is quantized by replacing its value by the index (bin) of the region it belongs. Then, during the reconstruction step all weights belonging to the same bin are assigned the value of its center”; Page 94, Column 1, Paragraph 2, “Reconstruction: The model weights are finally reconstructed, i.e. from the quantization index of each weight, a reconstructed value is obtained with a minimum mean square (MSE) error”. The operator is then quantized based on the calculated quantization parameter, encoded/decoded (via the convolutional layers), and then reconstructed, as shown in Fig. 3. The reconstruction values are sent to the decompressed model, which has the original DNN structure; thus, obtaining a neural network model (DNN) having the quantized operator (weights)).
wherein the bisection approximation comprises individually calculating the quantization parameter by combining data distribution subintervals and a minimum mean squared error (MSE).
(Filini, Page 94, Column 1, Paragraph 2, “Reconstruction: The model weights are finally reconstructed, i.e. from the quantization index of each weight, a reconstructed value is obtained with a minimum mean square (MSE) error”; Page 94, Column 2, Paragraph 4, “1) Lloyd-Max quantization: … weights are not uniformly distributed along their range, a suitable quantization scheme should intuitively use more bins where the distribution of weights lead to higher probability … a Lloyd-Max algorithm [26] can be used to cluster the DNN weights and compute the centers of the quantization bins (centroid values) according to the DNN weights distribution”. Page 94, Column 2, Paragraph 5, “The weights are then partitioned around these initial centroids in such a way that each partition consists in a set of weights that are well represented by the centroid in the mean square error (MSE) sense. In each iteration of the Lloyd-Max algorithm a new set of centroids is computed, which give an MSE less than or equal to the previous iteration. The centers have to be initialized … despite the guaranteed convergence of the algorithm to some local minimal in terms of MSE”. The quantization is based on finding an approximate midpoint, i.e. centroid, in an interval for the data distribution of weights (bin width); thus, the MSE calculation that the quantization utilizes is based on the bisectionally approximated midpoint, which is performed for each of the plurality of data distribution intervals coming from the DNN weight distribution).
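For illustration of the cited mapping only, the Lloyd-Max procedure that Filini describes (cluster weights into bins, set each bin center to minimize per-bin MSE, reconstruct weights from their bin indices) can be sketched in Python; all names, the initialization choice, and the data below are assumptions of the sketch, not Filini's implementation:

```python
import numpy as np

def lloyd_max(weights, k, iters=50):
    """1-D Lloyd-Max quantizer: iteratively refine k centroids (bin centers)
    so each partition of weights is represented by its centroid in the MSE sense."""
    # Initialize centers across the weight range (initialization choice is assumed).
    centroids = np.linspace(weights.min(), weights.max(), k)
    for _ in range(iters):
        # Partition step: assign each weight to its nearest centroid (quantization index/bin).
        idx = np.abs(weights[:, None] - centroids[None, :]).argmin(axis=1)
        # Update step: each centroid becomes the mean of its bin, which minimizes
        # the per-bin MSE, so the overall MSE is non-increasing across iterations.
        for j in range(k):
            members = weights[idx == j]
            if members.size:
                centroids[j] = members.mean()
    return centroids, idx

rng = np.random.default_rng(0)
w = rng.normal(size=1000)          # stand-in for a layer's DNN weights
centers, idx = lloyd_max(w, k=8)
recon = centers[idx]               # reconstruction from each weight's bin index
mse = float(np.mean((w - recon) ** 2))
```

The reconstruction line mirrors Filini's reconstruction step, in which every weight in a bin is replaced by that bin's center value.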
Regarding Claim 2:
Filini teaches the method of Claim 1 and further teaches:
wherein the calculating of the quantization parameter corresponding to the operator to be quantized comprises:
receiving input data of the operator to be quantized by verifying the neural network model with a verification dataset; and
(Filini, Fig. 3; Table 1; Page 93, Column 1, Paragraph 7, “An overview of the coding framework architecture is illustrated in Fig. 3. … the method takes as input a pre-trained DNN model”; Page 96, Column 1, Paragraph 5, “The rate-accuracy performance of the proposed framework has been assessed using the widely popular ImageNet Large- Scale Visual Recognition Challenge (ILSVRC) 2012 dataset [28] … The evaluation is performed with the first 1000 validation images of this dataset … For each image, the dataset mean value is subtracted and the image is spatially rescaled (downsampled) to make it compatible with the input layer of the DNN model. Two DNN models are used, the AlexNet [4] and Oxford’s VGG-net … More details about the models are reported in Table I”. The DNN weights, within the original model shown in Fig. 3, are interpreted as the input data of the operator to be quantized, which is verified by the validation dataset (interpreted as a verification dataset) of images on the model that is having its weights quantized to obtain an updated neural network based on the models shown in Table 1).
calculating a quantization parameter corresponding to the minimum mean squared error (MSE) of the input data of the operator to be quantized before and after quantization based on the input data of the operator to be quantized, by implementing bisection approximation.
(Filini, Fig. 3; Page 94, Column 2, Paragraph 5, “The first step in Lloyd-Max algorithm is to select an initial set of K points (where K is the number of clusters), or centroids, from the weights that need to be quantized. The weights are then partitioned around these initial centroids in such a way that each partition consists in a set of weights that are well represented by the centroid in the mean square error (MSE) sense. In each iteration of the Lloyd-Max algorithm a new set of centroids is computed, which give an MSE less than or equal to the previous iteration. The centers have to be initialized to some values before the iterative clustering procedure starts and different choices for the initialization can lead to different rate accuracy results, despite the guaranteed convergence of the algorithm to some local minimal in terms of MSE”; Page 94, Column 1, Paragraph 2, “Reconstruction: The model weights are finally reconstructed, i.e. from the quantization index of each weight, a reconstructed value is obtained with a minimum mean square (MSE) error”. The decision boundaries (bin widths where the centroid is at the midpoint) are considered to be a quantization parameter, as the bins are integral to the quantization. The quantization parameter corresponds to a minimum MSE, as the intervals/bins are partitioned (divided) in an MSE sense. The centers of the bins are initialized and are iterated on before and after each iterative step, where the Lloyd-Max algorithm calculates a new set of centroids each time and compares it to the previous iteration. This is based on implementing the bisection approximation method of finding the midpoint within the intervals/bins for the final convergence to a local minimum in MSE).
Regarding Claim 3:
Filini teaches the method of Claim 2 and further teaches:
wherein the calculating of the quantization parameter corresponding to the minimum MSE comprises:
performing dimensionality reduction on the input data of the operator to be quantized;
(Filini, Fig. 3; The CNN encoder takes the weight matrix as input and maps it to a lower-dimensionality space while the Lloyd-Max quantization reduces the precision of the weights. The CNN thus performs dimensionality reduction on the input data of the operator).
dividing the input data of the operator to be quantized after the performing of the dimensionality reduction into a plurality of data distribution intervals based on a statistical characteristic of the input data of the operator to be quantized after the dimensionality reduction, and obtaining an interval upper value array which is an array of upper values in each of the plurality of data distribution intervals; and
(Filini, Fig. 3; Page 94, Column 2, Paragraph 4, “… a Lloyd-Max algorithm [26] can be used to cluster the DNN weights and compute the centers of the quantization bins (centroid values) according to the DNN weights distribution. … After the clustering has been performed, each weight is quantized”. The input data of the operators (DNN weights) to be quantized is first clustered to compute the bins (which are interpreted as intervals by the examiner, as an interval is any region between two endpoints) based on the DNN weight distribution (which is interpreted by the examiner as a statistical characteristic of the input data); thus, the Lloyd-Max algorithm first divides the input data of the operator into intervals (bins) via clusters and then quantizes each weight (where the Lloyd-Max quantization occurs after the dimensionality reduction via the CNN encoder, as shown in Fig. 3, where the input data goes into the encoder and then starts the quantization process -> arithmetic encoding). The interval upper value array (which is interpreted by the examiner as the boundaries of the intervals) is obtained iteratively via the Lloyd-Max quantization process, where each bin edge (the boundary of the bin/interval) is computed based on the data distribution intervals).
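For illustration of the cited mapping only, dividing data into distribution intervals based on a statistical characteristic and collecting the upper value of each interval into an array can be sketched in Python; the binning function, names, and data below are assumptions of the sketch, not Filini's implementation:

```python
import numpy as np

# Stand-in for the operator's input data after dimensionality reduction.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# Divide the data into distribution intervals derived from the empirical
# distribution (the statistical characteristic assumed here), obtaining the
# count per interval and the interval edges.
counts, edges = np.histogram(data, bins=8)

# The interval upper value array: the upper endpoint of each interval.
interval_upper_values = edges[1:]
```

Each interval's start point is the preceding edge and its end point is the corresponding entry of `interval_upper_values`, matching the claimed start-point/end-point structure of the data distribution intervals.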
searching for the quantization parameter corresponding to the minimum MSE by bisectionally approximating an intermediate point between a start point and an end point of each of the data distribution intervals, by implementing bisection approximation.
(Filini, Page 94, Column 2, Paragraph 5, “The first step in Lloyd-Max algorithm is to select an initial set of K points (where K is the number of clusters), or centroids, from the weights that need to be quantized. The weights are then partitioned around these initial centroids in such a way that each partition consists in a set of weights that are well represented by the centroid in the mean square error (MSE) sense. In each iteration of the Lloyd-Max algorithm a new set of centroids is computed, which give an MSE less than or equal to the previous iteration … different choices for the initialization can lead to different rate accuracy results, despite the guaranteed convergence of the algorithm to some local minimal in terms of MSE”; Page 94, Column 1, Paragraph 2, “Reconstruction: The model weights are finally reconstructed, i.e. from the quantization index of each weight, a reconstructed value is obtained with a minimum mean square (MSE) error”. The quantization parameter corresponding to the minimum MSE is searched by bisectionally approximating a midpoint (interpreted as an intermediate point) between the bin edges (start point of interval -> end point of interval) by utilizing the Lloyd-Max algorithm and reconstruction (where the quantization index of each DNN weight is reviewed in terms of minimum MSE) within the compression solution architecture).
Regarding Claim 4:
Filini teaches the method of Claim 3 and further teaches:
wherein the quantization parameter comprises at least one of a … quantization factor parameter … of each of the plurality of data distribution intervals.
(Filini, Page 93, Column 2, Paragraph 2, “Weight quantization: Quantization is performed layer by layer. All the weights are quantized using the same quantizer, regardless of the number of levels. Two types of scalar quantizers are studied: 1) non-uniform quantizer with Lloyd-Max bin computation and 2) uniform quantizer with dead-zone”. Lloyd-Max quantization uses the centroid factors and interval boundary (bin edge) factors to dictate the quantization parameter; thus, a quantization factor parameter is used in each of the plurality of data distribution intervals, as the data distribution is non-uniform).
Regarding Claim 5:
Filini teaches the method of Claim 3 and further teaches:
wherein the searching for the quantization parameter comprises:
initializing the minimum MSE to be an initial MSE of each of the plurality of data distribution intervals when obtaining the interval upper value array each time for each of the plurality of data distribution intervals;
(Filini, Page 94, Column 2, Paragraph 4, “Thus, a Lloyd-Max algorithm [26] can be used to cluster the DNN weights and compute the centers of the quantization bins (centroid values) according to the DNN weights distribution … In each iteration of the Lloyd-Max algorithm a new set of centroids is computed, which give an MSE less than or equal to the previous iteration. The centers have to be initialized to some values before the iterative clustering procedure starts and different choices for the initialization can lead to different rate accuracy results, despite the guaranteed convergence of the algorithm to some local minimal in terms of MSE”. The centroids are computed for the centers of the bins, where the bin width is the size of the interval for the data distribution interval of the DNN weight distribution. These centroids are first initialized with some initial minimum MSE that converges with each iteration when obtaining the interval upper value array (the boundaries of the intervals), which is obtained iteratively via the Lloyd-Max quantization process, where each bin edge (the boundary of the bin/interval) is computed based on the data distribution intervals for each of the plurality of data distribution intervals).
calculating an MSE of an approximate point of each of the plurality of data distribution intervals by bisectionally approximating the intermediate point between the start point and the end point of each of the plurality of data distribution intervals;
(Filini, Page 94, Column 2, Paragraph 4, “1) Lloyd-Max quantization: … a Lloyd-Max algorithm [26] can be used to cluster the DNN weights and compute the centers of the quantization bins (centroid values) according to the DNN weights distribution”. Page 94, Column 2, Paragraph 5, “The weights are then partitioned around these initial centroids in such a way that each partition consists in a set of weights that are well represented by the centroid in the mean square error (MSE) sense. In each iteration of the Lloyd-Max algorithm a new set of centroids is computed, which give an MSE less than or equal to the previous iteration. The centers have to be initialized to some values before the iterative clustering procedure starts and different choices for the initialization can lead to different rate accuracy results, despite the guaranteed convergence of the algorithm to some local minimal in terms of MSE”; Page 94, Column 1, Paragraph 2, “Reconstruction: The model weights are finally reconstructed, i.e. from the quantization index of each weight, a reconstructed value is obtained with a minimum mean square (MSE) error”. The quantization is based on finding an approximate midpoint, half-way point, or intermediate point, i.e. centroid, in an interval (bin width, which has a start point and an end point for the interval); thus, the MSE calculation that the quantization utilizes is based on the bisectionally approximated intermediate point, which is performed for each of the plurality of data distribution intervals coming from the DNN weight distribution).
updating the minimum MSE by implementing the MSE of the approximate point when the MSE of the approximate point is less than the minimum MSE; and
(Filini, Page 94, Column 2, Paragraph 4, “1) Lloyd-Max quantization: … each weight is quantized by replacing its value by the index (bin) of the region it belongs. Then, during the reconstruction step all weights belonging to the same bin are assigned the value of its center … In each iteration of the Lloyd-Max algorithm a new set of centroids is computed, which give an MSE less than or equal to the previous iteration … guaranteed convergence of the algorithm to some local minimal in terms of MSE”; Page 94, Column 1, Paragraph 2, “Reconstruction: The model weights are finally reconstructed, i.e. from the quantization index of each weight, a reconstructed value is obtained with a minimum mean square (MSE) error”. The minimum MSE is updated at each iteration of the Lloyd-Max algorithm as new sets of centroids are computed, when the MSE is less than or equal to that of the previous iteration).
outputting the quantization parameter corresponding to the minimum MSE when traversing the data distribution intervals, wherein the initial MSE corresponds to a quantization parameter corresponding to an intermediate point between a start point and an end point of each of the data distribution intervals, and wherein the MSE of the approximate point corresponds to a quantization parameter corresponding to an approximate point of each of the data distribution intervals.
(Filini, Fig. 3; Page 94, Column 1, Paragraph 5, “Quantization parameters: Some information about the quantization indices (codebook) is necessary for the decoder to perform weight reconstruction. In the case of the non-uniform quantizer, the encoder needs to transmit some side information to perform an accurate reconstruction at the decoder, namely the reconstruction levels, i.e. the centers of each quantization bin for each layer of the DNN model”; Page 94, Column 2, Paragraph 4, “1) Lloyd-Max quantization: … a Lloyd-Max algorithm [26] can be used to cluster the DNN weights and compute the centers of the quantization bins (centroid values) according to the DNN weights distribution”; Page 93, Column 2, Paragraph 2, “Weight quantization: Quantization is performed layer by layer … The output of this step are the quantization indices (bins) that represent each weight”. The quantization parameter is calculated corresponding to the DNN weights (operator) of the received neural network model and corresponds to the minimum MSE when traversing the bins within the probability density function of the weight distribution. The quantization parameter is output via a quantization index for each of the bins, which corresponds to the MSE and is based on bisectional approximation).
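For illustration of the claimed search procedure only (initialize a minimum MSE at an interval's intermediate point, bisectionally approximate, update the minimum when a smaller MSE is found, and output the corresponding parameter), the following Python sketch assumes an 8-bit symmetric quantizer and a unimodal MSE on the search interval; all names and choices are assumptions of the sketch, not the claimed or cited implementation:

```python
import numpy as np

def quant_mse(data, scale):
    """MSE between data and its quantize/dequantize round trip for a given
    scale parameter (8-bit symmetric quantization, an assumed choice)."""
    q = np.clip(np.round(data / scale), -128, 127)
    return float(np.mean((data - q * scale) ** 2))

def bisection_search(data, start, end, steps=30):
    """Bisectionally approximate the scale in [start, end] with minimum MSE."""
    best_scale = (start + end) / 2.0
    min_mse = quant_mse(data, best_scale)       # initialize min MSE at the intermediate point
    for _ in range(steps):
        mid = (start + end) / 2.0
        left = (start + mid) / 2.0              # approximate points obtained by bisection
        right = (mid + end) / 2.0
        for s in (left, right):
            mse = quant_mse(data, s)
            if mse < min_mse:                   # update min MSE when the approximate
                min_mse, best_scale = mse, s    # point's MSE is smaller
        # shrink toward the half whose approximate point has the lower MSE
        if quant_mse(data, left) < quant_mse(data, right):
            end = mid
        else:
            start = mid
    return best_scale, min_mse                  # output the parameter for the min MSE

data = np.random.default_rng(0).normal(size=500)
scale, mse = bisection_search(data, 1e-3, 1.0)
```

Run per data distribution interval, this loop traverses the intervals and outputs the quantization parameter corresponding to the minimum MSE, as recited.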
Regarding Claim 6:
Filini teaches the method of Claim 1 and further teaches:
wherein the operator of the received neural network model to be quantized is a quantizable operator comprised in the neural network model,
wherein the quantizable operator is … or an operator which belongs to a computationally intensive operator.
(Filini, Fig. 3; Page 92, Abstract, “An efficient compression framework for the parameters of a neural network, more precisely the weights that interconnect the different neurons, which consume a significant amount of resources (memory, storage and bandwidth) is proposed”. The CNN encoder takes the weight matrix as input and maps it to a lower-dimensionality space while the Lloyd-Max quantization reduces the precision of the weights. Thus, the weights (operator) being compressed (quantized) are an operator which belongs to a computationally intensive operator).
Regarding Claim 9:
Filini teaches the method of Claim 1 and further teaches:
wherein the neural network model is a deep learning neural network model trained to perform at least one of image recognition …
(Filini, Page 96, Column 1, Paragraph 5, “… the proposed framework has been assessed using the widely popular ImageNet Large- Scale Visual Recognition Challenge (ILSVRC) 2012 dataset [28]. The images were collected from the web and labeled by humans using a crowd sourcing tool. The evaluation is performed with the first 1000 validation images of this dataset, for which ground-truth data is available … Two DNN models are used, the AlexNet [4] and Oxford’s VGG-net … More details about the models are reported in Table I. The Berkely Vision Caffe deep learning framework [29] was used for the implementation of the DNN model, for which the pre-trained models described above are available”. The deep learning neural network model is trained to perform accurate classification of images; thus, performing an image recognition task).
Regarding Claim 10:
Filini teaches:
A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors … and further teaches to perform the neural network model quantization method of claim 1:
(Filini, Fig. 3; Page 91, Column 2, Paragraph 4, “Another case which deserves much attention due to its real application is … when the learning and compression procedures are taking place among different devices”; Page 93, Column 1, Paragraph 7, “An overview of the coding framework architecture is illustrated in Fig. 3. As it can be noticed from the diagram, the method takes as input a pre-trained DNN model”. The DNN (which is considered a deep convolutional neural network within Filini) model is compressed on a computing device, in which a processor and CRM are inherent, via a quantization method which is shown with the main modules within the solution architecture of the compression (quantization) solution in Fig. 3).
Regarding Claims 11-16 and 19:
Claims 11-16 and 19 incorporate substantively all the limitations of Claims 1-6 and 9 in an apparatus and further recite three new additional elements: a data acquirer configured to (Filini, Fig. 1 & 3; Page 91, Column 2, Paragraph 4, “… This scenario is illustrated in Fig. 1 and allows a service provider … to compress models …”. Fig. 1 shows the ability to compress models with a device (interpreted as the apparatus) by sending models which acquire the data comprising the DNN structure and DNN weights (shown in Fig. 3); thus, the DNN model compression device is interpreted as the data acquirer configured to receive a neural network, as it receives the input data), a quantization parameter calculator configured to (Filini, Page 94, Column 2, Paragraph 4, “1) Lloyd-Max quantization: … a Lloyd-Max algorithm [26] can be used to cluster the DNN weights and compute the centers of the quantization bins (centroid values) according to the DNN weights distribution”. The Lloyd-Max algorithm is interpreted as the quantization parameter calculator, as it calculates the parameter for quantization of each operator based on bisectional approximation), and a quantization implementer configured to (Filini, Fig. 3. The quantization implementer is the total DNN model compression solution architecture (Fig. 3), as the weights are quantized within the Quantization module within the Encoder and then finally implemented with the Reconstruction module within the Decoder to obtain the final Decompressed model); thus, Claims 11-16 and 19 are rejected for the reasons set forth in the rejections of Claims 1-6 and 9, respectively.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 7-8 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Filini et al., “Rate-Accuracy Optimization of Deep Convolutional Neural Network Models”, in view of Elmer et al., US 11,816,446 B2.
Regarding Claim 7:
Filini teaches the method of Claim 1 and further teaches:
further comprising: … quantization … operator… quantizable operator of the neural network model … before the calculating of the quantization parameter corresponding to the operator of the neural network model to be quantized.
(Filini, Fig. 3; Page 93, Paragraph 7, “…the coding framework architecture is illustrated in Fig. 3. As it can be noticed from the diagram, the method takes as input a pre-trained DNN model. The proposed scheme only compresses the weights, … only the parameters (weights) are coded and transmitted”; Page 94, Column 2, “The first step in Lloyd-Max algorithm is to select an initial set of K points (where K is the number of clusters), or centroids, from the weights that need to be quantized.”. The quantization occurs on the operators (DNN weights) from the original model. The original model only sends the DNN weights to the model compression solution before the calculating of the quantization parameter corresponding to the DNN weight (operator) to be quantized via the Lloyd-Max algorithm which clusters the initial set of points/centroids from the weights that need to be quantized).
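For clarity of the record, the examiner's reading of the cited Lloyd-Max clustering step (Filini, Page 94, Column 2) can be sketched as follows. This sketch is the examiner's illustration only, not Filini's disclosure: the function name, iteration count, and initialization are assumptions; Filini discloses only selecting an initial set of K centroids from the weights and clustering the weights to compute the quantization bin centers.

```python
import numpy as np

def lloyd_max_quantize(weights, k, iters=20):
    """Illustrative sketch: cluster weights into k centroids (quantization bin
    centers) in the manner of a 1-D Lloyd-Max / k-means iteration."""
    w = np.asarray(weights, dtype=float)
    # Select an initial set of K centroids spanning the weight range
    # (initialization scheme is an assumption, not taken from Filini)
    centroids = np.linspace(w.min(), w.max(), k)
    for _ in range(iters):
        # Assign each weight to its nearest centroid (quantization bin)
        idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        # Recompute each centroid as the mean of its assigned weights,
        # i.e., according to the DNN weights distribution
        for j in range(k):
            if np.any(idx == j):
                centroids[j] = w[idx == j].mean()
    # Each weight is replaced by the centroid value of its bin
    return centroids, centroids[idx]

weights = [0.05, 0.10, 0.12, 0.90, 0.95, -0.80, -0.85]
centroids, quantized = lloyd_max_quantize(weights, k=3)
```

Under this reading, the computed centroids correspond to the quantization parameters, and the replacement of each weight by its centroid corresponds to the quantization of the operator's weights.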
While Filini teaches quantizing before the calculation of the quantization parameter and handles only the weights from the DNN model that need to be quantized, Filini does not explicitly disclose how the DNN model indicates the need to quantize a quantizable operator.
However, Elmer teaches indicators for indicating:
… inserting a … indicating operator in front of a … operator … and indicating the … operator …
(Elmer, Fig. 7: 702, “RECEIVE, AT A PROCESSING ELEMENT (PE) FOR NEURAL NETWORK COMPUTATIONS, A ZERO WEIGHT INDICATOR … INDICATING WHETHER A WEIGHT VALUE IS ZERO”; Column 19, Lines 44-48, “Each zero weight detector … may be configured to detect whether a weight value … is zero and generate a corresponding zero weight indicator …”; Column 7, Lines 34-39, “The PE 200 may be configured to receive an input data element 222, a weight 224, a zero data element indicator 226, a zero weight indicator 228 … to perform the convolution computations …”. The zero weight indicator indicates whether the processing element (input data) has a weight for calculations (which includes quantization within Elmer). Thus, the examiner interprets the receiving of the indicators with the input data as inserting an indicating operator in front of an operator (PE), as the indicator labels the processing element prior to processing/quantization).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the calculations and quantization corresponding to an operator from a neural network taught by Filini, with the indicators taught by Elmer, to mark/flag quantizable operators in order to increase data type compatibility and monitor processing/power consumption (see Elmer, Column 1, Lines 19-, “As more applications use artificial neural networks, the applications also use a wide range of input data types and input data ranges. Increasing data type compatibility and data range compatibility can often result in increases to the complexity, size, and cost of processing elements in the systolic array. These increases can also affect the system processing speed and the system power consumption. Power consumption and size of the systolic array can become important considerations when a systolic array is required to support multiple data types.”).
Regarding Claim 8:
Filini and Elmer teach the method of Claim 7 where Filini teaches the quantization and Elmer teaches the indicators. Elmer further teaches:
wherein the indicating of the … operator comprises:
verifying whether weight data is present in input data of the … operator;
(Elmer, Column 19, Line 44-48 “Each zero weight detector … may be configured to detect whether a weight value … is zero” The zero weight detector verifies if the processing element (input data) has a weight present).
wherein when the weight data is not present in the input data of the … operator, inserting the … indicating operator in front of the … operator; and
(Elmer, Column 19, Line 44-48 “… generate a corresponding zero weight indicator.”; Column 19, Lines 23-27, “… zero weight detector 308a may include a comparator to compare the weight value with a zero to assert (e.g., set to "1") or de-assert (e.g., set to "0") the zero weight indicator 228”. The zero weight indicator indicates whether a weight value (data) is not present (assert) or is present (de-assert) in the processing element (input data)).
wherein when the weight data is present in the input data of the … operator, inserting the … indicating operator in front of the … operator, and inserting the … indicating operator in front of the weight data to indicate whether the weight data needs to be quantized.
(Elmer, Column 19, Line 44-48 “… generate a corresponding zero weight indicator.”; Column 19, Lines 23-27, “… zero weight detector 308a may include a comparator to compare the weight value with a zero to assert (e.g., set to "1") or de-assert (e.g., set to "0") the zero weight indicator 228”. The zero weight indicator indicates whether a weight value (data) is not present (assert) or is present (de-assert) in the processing element (input data)).
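For clarity of the record, the examiner's reading of Elmer's zero weight detector and indicator (Column 19) can be illustrated with the following minimal sketch. The Python framing and function names are the examiner's illustration, not Elmer's disclosure; Elmer discloses a comparator that asserts ("1") or de-asserts ("0") the zero weight indicator, which the processing element receives alongside the input data element and weight.

```python
def zero_weight_indicator(weight):
    """Assert ("1") when the weight value is zero, de-assert ("0") otherwise,
    per the examiner's reading of Elmer's comparator-based detector."""
    return 1 if weight == 0 else 0

def pe_compute(input_elem, weight, zero_data_ind, zero_weight_ind):
    """Illustrative PE step: the indicators label the operands before the
    computation, so a flagged-zero operand short-circuits the multiply."""
    if zero_data_ind or zero_weight_ind:
        return 0.0
    return input_elem * weight

# The indicator is generated from the weight and received with the input data,
# i.e., it labels the operand in front of the processing step
w = 0.0
ind = zero_weight_indicator(w)          # asserted: weight value is zero
result = pe_compute(2.0, w, 0, ind)     # multiply is skipped
```

This illustrates the interpretation that the indicator, received with the input data, labels the weight in front of the operator (PE) prior to processing/quantization.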
The motivation to combine Filini and Elmer set forth in the rejection of Claim 7 is maintained.
Regarding Claims 17-18:
Claims 17-18 incorporate substantively all the limitations of Claims 7-8 in an apparatus and further recite three additional elements: a data acquirer configured to (Filini, Fig. 1 & 3; Page 91, Column 2, Paragraph 4, “… This scenario is illustrated in Fig. 1 and allows a service provider … to compress models …”. Fig. 1 shows the ability to compress models with a device (interpreted as the apparatus) by sending models which acquire the data comprising the DNN structure and DNN weights (shown in Fig. 3); thus, the DNN model compression device is interpreted as the data acquirer configured to receive a neural network as it receives the input data); a quantization parameter calculator configured to (Filini, Page 94, Column 2, Paragraph 4, “1) Lloyd-Max quantization: … a Lloyd-Max algorithm [26] can be used to cluster the DNN weights and compute the centers of the quantization bins (centroid values) according to the DNN weights distribution”. The Lloyd-Max algorithm is interpreted as the quantization parameter calculator as it calculates the parameter for quantization of each operator based on bisectional approximation); and a quantization implementer configured to (Filini, Fig. 3. The quantization implementer is the total DNN model compression solution architecture (Fig. 3), as the weights are quantized within the Quantization module of the Encoder and then implemented with the Reconstruction module of the Decoder to obtain the final decompressed model); thus, Claims 17-18 are rejected for the reasons set forth in the rejections of Claims 7-8, respectively.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IBRAHIM RAHMAN whose telephone number is (703)756-1646. The examiner can normally be reached M-F 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/I.R./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122