DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 9/19/2025 has been entered.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 4, 6-9, 11, 13-14, and 16-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
When considering subject matter eligibility under 35 U.S.C. 101, it must be
determined whether the claim is directed to one of the four statutory categories of
invention, i.e., process, machine, manufacture, or composition of matter (Step 1). If the
claim does fall within one of the statutory categories, the second step in the analysis is
to determine whether the claim is directed to a judicial exception (Step 2A). The Step 2A
analysis is broken into two prongs. In the first prong (Step 2A, Prong 1), it is determined
whether or not the claims recite a judicial exception (e.g., mathematical concepts,
mental processes, certain methods of organizing human activity). If it is determined in
Step 2A, Prong 1 that the claims recite a judicial exception, the analysis proceeds to the
second prong (Step 2A, Prong 2), where it is determined whether or not the claims
integrate the judicial exception into a practical application. If it is determined at step 2A,
Prong 2 that the claims do not integrate the judicial exception into a practical
application, the analysis proceeds to determining whether the claim is a patent-eligible
application of the exception (Step 2B). If an abstract idea is present in the claim, any
element or combination of elements in the claim must be sufficient to ensure that the
claim integrates the judicial exception into a practical application, or else amounts to
significantly more than the abstract idea itself. Applicant is advised to consult the 2019
PEG for more details of the analysis.
Step 1
According to the first part of the analysis, in the instant case, claims 1-10 are directed to a method, claims 11-17 to a medium, and claims 18-20 to a method of quantizing a machine learning model. Thus, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A, Prong 1
Following the determination of whether or not the claims fall within one of the four
categories (Step 1), it must be determined if the claims recite a judicial exception (e.g.
mathematical concepts, mental processes, certain methods of organizing human
activity) (Step 2A, Prong 1). In this case, the claims are determined to recite a judicial
exception as explained below.
Regarding claims 1 and 11, these claims recite:
generating a first set of quantized feature values based on a first set of feature values and a first set of quantization levels; determining that a first output generated by the machine learning model based on the first set of quantized feature values being input into the machine learning model matches a second output generated by the machine learning model based on the first set of feature values; and storing a mapping of the first set of quantized feature values to the first output in a lookup table representing the machine learning model.
Regarding claim 18, the claim recites:
matching a first set of feature values for the machine learning model to a first set of quantized feature values included in a lookup table representing the machine learning model, wherein the lookup table comprises a plurality of mappings between a plurality of sets of quantized feature values to a plurality of outputs generated by the machine learning model, wherein the plurality of mappings is generated based on determining that the plurality of outputs generated by the machine learning model based on the plurality of sets of quantized feature values matches a plurality of second outputs generated by the machine learning model based on a plurality of set of feature values corresponding to the plurality of sets of quantized feature values; retrieving a first output that is mapped to the first set of quantized feature values within the lookup table; and generating a predicted output of the machine learning model for the first set of feature values based on the first output.
The claims recite a mental process. As set forth in MPEP 2106.04(a)(2)(III)(C), “Claims can recite a mental process even if they are claimed as being performed on a computer.” These functions are disclosed as being performed by a human user simply using a computer as a tool, as disclosed at Fig. 1, specification [0021]-[0028], etc. Thus, the claims recite abstract ideas.
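For purposes of illustration only, the recited quantize-verify-store steps can be expressed as generic computer operations along the following lines (a minimal sketch; all function and variable names are hypothetical and are not drawn from the specification or the claims):

```python
# Hypothetical sketch of the recited steps (illustrative only; names are
# not taken from the specification or the claims).

def quantize(feature_values, levels):
    """Map each feature value to the nearest quantization level."""
    return tuple(min(levels, key=lambda q: abs(q - v)) for v in feature_values)

def build_lookup_table(model, feature_sets, levels):
    """Store a quantized-input -> output mapping when outputs match."""
    table = {}
    for values in feature_sets:
        quantized = quantize(values, levels)
        # Keep the mapping only if the model's output on the quantized
        # inputs matches its output on the original inputs.
        if model(quantized) == model(values):
            table[quantized] = model(quantized)
    return table
```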
Step 2A, Prong 2
Following the determination that the claims recite a judicial exception, it must be
determined if the claims recite additional elements that integrate the exception into a
practical application of the exception (Step 2A, Prong 2). In this case, after considering
all claim elements individually and as an ordered combination, it is determined that the
claims do not include additional elements that integrate the exception into a practical
application of the exception as explained below.
In Prong Two, a claim is evaluated as a whole to determine whether the recited judicial exception is integrated into a practical application of that exception. A claim is not “directed to” a judicial exception, and thus is patent eligible, if the claim as a whole integrates the recited judicial exception into a practical application of that exception. A claim that integrates a judicial exception into a practical application will apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, such that the claim is more than a drafting effort designed to monopolize the judicial exception. MPEP 2106.04(d). Claims 1, 11, and 18 recite an abstract idea, and the claims as a whole do not integrate the recited judicial exception into a practical application of the exception.
Regarding claims 1, 11, and 18, these claims recite using one or more neural networks as a tool to perform an abstract idea, which is not indicative of integration into a practical application. MPEP 2106.05(f). The additional elements amount to merely the words “apply it” (or an equivalent), i.e., mere instructions to implement the abstract idea on a computer, which likewise is not indicative of integration into a practical application. MPEP 2106.05(f).
Step 2B
Based on the determination in Step 2A of the analysis that the claims are
directed to a judicial exception, it must be determined if the claims contain any element
or combination of elements sufficient to ensure that the claim amounts to significantly
more than the judicial exception (Step 2B). In this case, after considering all claim
elements individually and as an ordered combination, it is determined that the claims do
not include additional elements that are sufficient to amount to significantly more than
the judicial exception for the same reasons given above in the Step 2A, Prong 2
analysis. Furthermore, each additional element identified above as being insignificant
extra-solution activity is also well-known, routine, and conventional as described below.
Claims 1, 11 and 18:
The claims do not include additional elements, alone or in combination, that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than generic computing components and a field of use/technological environment, which do not amount to significantly more than the abstract idea. The underlying concept merely receives information, analyzes it, and stores the results of the analysis; this concept is not meaningfully different from concepts found by the courts to be abstract (see Electric Power Group, collecting information, analyzing it, and displaying certain results of the collection and analysis; see CyberSource, obtaining and comparing intangible data; see Digitech, organizing information through mathematical correlations; see Grams, diagnosing an abnormal condition by performing clinical tests and thinking about the results; see Cyberfone, using categories to organize, store, and transmit information; see SmartGene, comparing new and stored information and using rules to identify options). The claims recite a mental process even though they are claimed as being performed on a computer; the functions are disclosed as being performed by a human user simply using a computer as a tool (Fig. 1, specification [0021]-[0028], etc.). The additional elements, when considered both individually and as a combination, do not amount to significantly more than the abstract idea.
For example, claims 1 and 11 recite generating a first set of quantized feature values based on a first set of feature values and a first set of quantization levels; determining that a first output generated by the machine learning model based on the first set of quantized feature values being input into the machine learning model matches a second output generated by the machine learning model based on the first set of feature values; and storing a mapping of the first set of quantized feature values to the first output in a lookup table representing the machine learning model. Claim 18 recites matching a first set of feature values for the machine learning model to a first set of quantized feature values included in a lookup table representing the machine learning model, wherein the lookup table comprises a plurality of mappings between a plurality of sets of quantized feature values to a plurality of outputs generated by the machine learning model, wherein the plurality of mappings is generated based on determining that the plurality of outputs generated by the machine learning model based on the plurality of sets of quantized feature values matches a plurality of second outputs generated by the machine learning model based on a plurality of set of feature values corresponding to the plurality of sets of quantized feature values; retrieving a first output that is mapped to the first set of quantized feature values within the lookup table; and generating a predicted output of the machine learning model for the first set of feature values based on the first output. These elements are recited at a high level of generality and are well-understood, routine, and conventional activities in the computer art. Generic computers performing generic computer functions, without an inventive concept, do not amount to significantly more than the abstract idea. Looking at the elements as a combination does not add anything more than the elements analyzed individually.
Therefore, these claims do not amount to significantly more than the abstract idea itself.
Step 2A, Prong 2 and Step 2B - Dependent Claims
Regarding claims 4 and 14
Claims 4 and 14 merely recite additional elements that generate a prediction based on criteria and input of the machine learning model, which perform generic functions; looking at the elements as a combination does not add anything more than the elements analyzed individually. Therefore, these claims also do not amount to significantly more than the abstract idea itself and are not patent eligible.
Regarding claim 6
Claim 6 merely recites additional elements that store a data model, which perform generic functions; looking at the elements as a combination does not add anything more than the elements analyzed individually. Therefore, this claim also does not amount to significantly more than the abstract idea itself and is not patent eligible.
Regarding claim 7
Claim 7 merely recites additional elements that define a quantization resolution, which perform generic functions; looking at the elements as a combination does not add anything more than the elements analyzed individually. Therefore, this claim also does not amount to significantly more than the abstract idea itself and is not patent eligible.
Regarding claims 8-9
Claims 8-9 merely recite additional elements that define a training dataset used in training the machine learning model, which perform generic functions; looking at the elements as a combination does not add anything more than the elements analyzed individually. Therefore, these claims also do not amount to significantly more than the abstract idea itself and are not patent eligible.
Regarding claim 13
Claim 13 merely recites additional elements that generate features and determine an output of the machine learning model based on conditions, which perform generic functions; looking at the elements as a combination does not add anything more than the elements analyzed individually. Therefore, this claim also does not amount to significantly more than the abstract idea itself and is not patent eligible.
Regarding claim 16
Claim 16 merely recites additional elements that define data mappings, which perform generic functions; looking at the elements as a combination does not add anything more than the elements analyzed individually. Therefore, this claim also does not amount to significantly more than the abstract idea itself and is not patent eligible.
Regarding claim 17
Claim 17 merely recites additional elements that store data mappings, which perform generic functions; looking at the elements as a combination does not add anything more than the elements analyzed individually. Therefore, this claim also does not amount to significantly more than the abstract idea itself and is not patent eligible.
Regarding claim 19
Claim 19 merely recites additional elements that define features, which perform generic functions; looking at the elements as a combination does not add anything more than the elements analyzed individually. Therefore, this claim also does not amount to significantly more than the abstract idea itself and is not patent eligible.
Regarding claim 20
Claim 20 merely recites additional elements that define resolutions, which perform generic functions; looking at the elements as a combination does not add anything more than the elements analyzed individually. Therefore, this claim also does not amount to significantly more than the abstract idea itself and is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4, 6-9, and 11-20 are rejected under 35 U.S.C. 103 as being unpatentable over Raha et al. (Raha), “qLUT: Input-Aware Quantized Table Lookup for Energy-Efficient Approximate Accelerators,” ACM Transactions on Embedded Computing Systems, Vol. 16, No. 5s, Article 130, pages 1-23, September 2017, XP058677071, ISSN: 1539-9087, https://doi.org/10.1145/3126531, in view of Smith et al. (Smith), US 2022/0122305.
In regard to claim 1, Raha discloses a computer-implemented method for quantizing a machine learning model, the method comprising: (abstract, “optimizing the performance and energy consumption of error-resilient applications in domains such as machine learning, graphics, data analytics, etc. Numerous techniques for approximate computing have been proposed at different layers of the system stack, from circuits to architecture to software. In this work, we propose a new technique, called quantized table lookup, for approximating the meta-functions used in the core computational kernels of error-resilient applications” … “the proposed technique instead approximates the input data to the meta-functions by reducing/quantizing them to a much smaller set of values that we call quantized inputs. The small number of quantized inputs enables us to completely replace the energy-intensive arithmetic units in the meta-function with small and energy efficient lookup tables (called quantized lookup tables or qLUT) that contain precomputed output values corresponding to the quantized inputs”)
generating a first set of quantized feature values based on a first set of feature values and a first set of quantization levels; (section 3.4, “The high level operation is explained in Algo. 1. It first shows that the total number of vbins is determined from the specified quality degradation bound (extract_bins) using representative sample training sets of input and reference vectors (train_inp_vec and train_ref_vec)”; see line 2 of Algorithm 1: {Ni, Nr} = extract_bins(Q, train_inp_vec, train_ref_vec). Various vbins are determined with the specified quality degradation; various inputs based on the specified quality degradation are input into the ML model, generating corresponding outputs from the ML model.)
storing a mapping of the first set of quantized feature values to the first output in a lookup table representing the machine learning model. (section 1, “Each vbin represents a range of consecutive input values and is assigned a representative bin value which is used to construct the quantized lookup table (qLUT). The original input vectors are then converted to quantized input vectors using these vbins, and each element of this quantized vector is now used to index into the qLUT for obtaining the expected output result.” sections 3-4, “Once the vbins are determined and the respective bin values are assigned, they are used to construct the qLUT (series of memory-mapped registers) using the original complex function (fc, e.g., multiplication for dot product and subtraction+multiplication for distance computation). Subsequently, each original input vector is converted to the quantized values (using conv_vec described in Algo. 3) before being fed into the hardware accelerator for generating the meta-function output.” The mapping of the inputs to outputs is stored in the lookup table.)
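For illustration, the cited qLUT mechanism can be sketched in generic code. This is a simplified, hypothetical reading of the vbin/qLUT scheme described in Raha's Algos. 1-3, not the paper's actual implementation; the bin edges, bin values, and function names below are assumptions chosen for the example:

```python
# Simplified, hypothetical sketch of the cited qLUT scheme: input values are
# reduced to a small set of representative bin values (vbins), and a lookup
# table of precomputed outputs is indexed by those bin values.
import bisect

def make_vbins(edges, bin_values):
    """Each bin covers a range of input values; one representative value per bin."""
    def to_bin(x):
        return bin_values[bisect.bisect_right(edges, x)]
    return to_bin

def build_qlut(fc, bin_values):
    """Precompute the complex function fc (e.g., multiplication) over bin pairs."""
    return {(a, b): fc(a, b) for a in bin_values for b in bin_values}

# Usage: quantize each operand to its bin value, then index the qLUT
# instead of computing fc directly.
to_bin = make_vbins(edges=[0.5], bin_values=[0.25, 0.75])
qlut = build_qlut(lambda a, b: a * b, [0.25, 0.75])
approx = qlut[(to_bin(0.4), to_bin(0.9))]
```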
However, Raha fails to explicitly disclose “determining that a first output generated by the machine learning model based on the first set of quantized feature values being input into the machine learning model matches a second output generated by the machine learning model based on the first set of feature values.”
Smith discloses determining that a first output generated by the machine learning model based on the first set of quantized feature values being input into the machine learning model matches a second output generated by the machine learning model based on the first set of feature values; (Figs. 6-7, [0043]-[0049], [0123], [0144]-[0158]: the generated first output with a lower-resolution image matches the generated second output with a high-resolution output image from the generator NN based on the input feature values; the outputs are generated from the NN through different layers, and the inputs are based on the latent space representation of the input image. Note: “based on” is very broad; please further define feature values and quantized feature values, the differences between them, what the features are, etc., to help move prosecution forward, and please call to discuss if necessary.)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Smith's generative NN projection into Raha's invention, as they are related to the same field of endeavor of model training and learning based on precision. The motivation to combine these references exists at least because Smith's generative NN with different precision levels would help provide training at additional precisions in Raha's system. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention that providing more precisions in training the ML model would help to improve prediction accuracy and training efficiency.
In regard to claim 2, Raha and Smith disclose the computer-implemented method of claim 1; the rejection is incorporated herein.
Raha further discloses:
generating a second set of quantized feature values based on the first set of feature values and a second set of quantization levels; and determining that a third output generated by the machine learning model based on the second set of quantized feature values does not match the second output, (section 1, “Each vbin represents a range of consecutive input values and is assigned a representative bin value which is used to construct the quantized lookup table (qLUT). The original input vectors are then converted to quantized input vectors using these vbins, and each element of this quantized vector is now used to index into the qLUT for obtaining the expected output result. Since this combination of quantization and binning results in a many-to-one mapping of values, it results in some information loss due to the reduction of fine-grained input values to coarse-grained quantized values.” section 3.4, “The high level operation is explained in Algo. 1. It first shows that the total number of vbins is determined from the specified quality degradation bound (extract_bins) using representative sample training sets of input and reference vectors (train_inp_vec and train_ref_vec). Note that this is obtained from a one-time characterization step performed before the start of computation (using software). This step can be invoked again during runtime for recalibrating the estimates as mentioned later in Section 3.4.1. Once the number of vbins is determined, we first construct the different vbins as shown in Algo. 2 (create_bins). This function takes into account the probability distribution of the input vectors to determine the appropriate value ranges to be allocated to each bin. Once the vbins are determined and the respective bin values are assigned, they are used to construct the qLUT (series of memory-mapped registers) using the original complex function (fc, e.g., multiplication for dot product and subtraction+multiplication for distance computation).” Outputs are generated based on the input with quality degradation.) and
wherein the first set of quantization levels is associated with a higher quantization resolution than the second set of quantization levels. (section 3.4, “The high level operation is explained in Algo. 1. It first shows that the total number of vbins is determined from the specified quality degradation bound (extract_bins) using representative sample training sets of input and reference vectors (train_inp_vec and train_ref_vec). Note that this is obtained from a one-time characterization step performed before the start of computation (using software).” The vbins are determined with a specified quality degradation bound, which can be higher.)
In regard to claim 4, Raha and Smith disclose the computer-implemented method of claim 1; the rejection is incorporated herein.
Raha further discloses: determining that an input set of feature values does not match one or more sets of quantized feature values stored in the lookup table; and applying the machine learning model to the input set of feature values to generate a prediction associated with the input set of feature values. (section 3.4, “(i) Calibration phase and (ii) Evaluation phase. In the calibration phase, the probability distributions of test input vectors and reference vectors are created and the vbins are reconstructed to be able to adapt to varying input characteristics. For example, depending on the input characteristics, it may be required to not only change the vbins definitions but also the number of vbins itself to satisfy the specified quality bound. Hence, this phase is used to reconfigure the accelerator design (qLUT and vbins). Note that this reconfiguration is possible due to the register-based implementation of qLUT that can be loaded by the software running on the processor using memory mapped I/O. The calibration phase can be invoked periodically (e.g., after every 100 test vectors) or can be invoked on demand when there is a necessity to change the quality mode. Note that the cdf for reference vectors needs to be created just once before the first evaluation phase since it is usually not expected to change throughout program execution. Evaluation phase is when the actual operation occurs. During this phase, the accelerator uses the LUT to determine the estimates for the complex arithmetic functions that are accumulated and finally sent back to the processor for generating the overall result. Note that the calibration phase incurs negligible overhead compared to the runtime of the evaluation phase. Note that due to the presence of the memory mapped registers, it is possible to reconfigure the qLUT during runtime (calibration) based on the target output quality degradation bound.” The qLUT is reconstructed based on the estimation and mapping.)
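For illustration only, the fallback behavior recited in claim 4 (applying the model directly when the input does not match any stored quantized set) can be sketched as follows; the function and variable names are hypothetical and are not drawn from the claims or the cited references:

```python
# Hypothetical sketch of the claim 4 fallback: if the input's quantized form
# is not found in the lookup table, fall back to applying the model directly.

def predict(model, table, levels, feature_values):
    key = tuple(min(levels, key=lambda q: abs(q - v)) for v in feature_values)
    if key in table:
        return table[key]           # fast path: precomputed output
    return model(feature_values)    # fallback: run the machine learning model
```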
In regard to claim 6, Raha and Smith disclose the computer-implemented method of claim 1; the rejection is incorporated herein.
Raha further discloses storing, in the mapping, an uncertainty value associated with the first output. (section 3.3, “In the proposed technique, input-aware approximation is achieved by considering the input statistics (probability distribution of input vector elements) while creating the value bins. Instead of uniformly distributing the entire range of values in the vbins, the vbins are now formed based on the probability in which they appear in the input vectors (both test and reference vectors). Separate probability distributions are created for reference vectors and input test vectors using a sample representative set. For some applications, such as KNN, KMEANS, IMG-SEG, and GLVQ, the characteristics of the input test vector is very similar to those of the reference vectors and hence, a single probability distribution would suffice.” The input vector includes statistics information associated with its output.)
In regard to claim 7, Raha and Smith disclose the computer-implemented method of claim 1; the rejection is incorporated herein.
Raha discloses wherein the first set of quantization levels is associated with an increase in the second set of quantization levels by an increment or a multiple. (sections 3.3-3.4, “Similar to prior works [6, 8], the exact approximation configuration (vbins and qLUT size) required for achieving a particular quality degradation bound can be obtained by running the applications multiple times, each time with a different number of vbins, using a randomly selected sample training input vector set.” “Figures 5(b) and (c) demonstrate how the vbins are constructed for different quality requirements. These examples show a naive way of constructing vbins for two different quality degradation bounds, 5% and 1%, respectively. In this approach, the cdf is split into equal vbins based on the probability distribution of the values.” “Note that due to the presence of the memory mapped registers, it is possible to reconfigure the qLUT during runtime (calibration) based on the target output quality degradation bound. To make this feasible, the accelerator is initially designed with the maximum number of registers that will be able to support at least 1% (or the highest quality supported) quality degradation for a selected set of training inputs. The number of registers that can support 1% quality degradation is a superset of the number of registers that is required to support higher quality degradations such as 2.5% and 5%. Hence, it is possible to support different quality specifications by reconfiguring the vbins definitions during the calibration phase. When supporting lower quality specifications (such as 2.5% and 5%), the unused registers are power gated to save energy consumption.” Different quality degradation bounds (2.5% vs. 5%, etc.) are configured.)
In regard to claim 8, Raha and Smith disclose the computer-implemented method of claim 1; the rejection is incorporated herein.
Raha discloses wherein the first set of feature values is included in a training dataset used to train the machine learning model. (section 3.4, “The high level operation is explained in Algo. 1. It first shows that the total number of vbins is determined from the specified quality degradation bound (extract_bins) using representative sample training sets of input and reference vectors (train_inp_vec and train_ref_vec). Note that this is obtained from a one-time characterization step performed before the start of computation (using software).” “Note that due to the presence of the memory mapped registers, it is possible to reconfigure the qLUT during runtime (calibration) based on the target output quality degradation bound. To make this feasible, the accelerator is initially designed with the maximum number of registers that will be able to support at least 1% (or the highest quality supported) quality degradation for a selected set of training inputs. The number of registers that can support 1% quality degradation is a superset of the number of registers that is required to support higher quality degradations such as 2.5% and 5%. Hence, it is possible to support different quality specifications by reconfiguring the vbins definitions during the calibration phase. When supporting lower quality specifications (such as 2.5% and 5%), the unused registers are power gated to save energy consumption.” The training dataset is selected for training the model based on the quality degradation bound.)
In regard to claim 9, Raha and Smith disclose the computer-implemented method of claim 8; the rejection is incorporated herein.
Raha discloses wherein the first set of quantized feature values is associated with at least one of a row in the training dataset or a feature inputted into the machine learning model. (Fig. 4, construction of vbins; the vbins are associated with the reference vectors.)
In regard to claims 11 and 12: claims 11 and 12 are medium claims corresponding to method claims 1 and 5 above and, therefore, are rejected for the same reasons set forth in the rejections of claims 1 and 5.
In regard to claim 13, Raha and Smith disclose The one or more non-transitory computer readable media of claim 12; the rejection is incorporated herein.
Raha discloses wherein the instructions further cause the one or more processors to perform the steps of:
generating a second set of quantized feature values based on the first set of feature values and a second set of quantization levels; determining that a third output generated by the machine learning model based on the second set of quantized feature values does not match the second output, (section 1, “Each vbin represents a range of consecutive input values and is assigned a representative bin value which is used to construct the quantized lookup table (qLUT). The original input vectors are then converted to quantized input vectors using these vbins, and each element of this quantized vector is now used to index into the qLUT for obtaining the expected output result. Since this combination of quantization and binning results in a many-to-one mapping of values, it results in some information loss due to the reduction of fine-grained input values to coarse-grained quantized values.” Section 3.4, “The high level operation is explained in Algo. 1. It first shows that the total number of vbins is determined from the specified quality degradation bound (extract_bins) using representative sample training sets of input and reference vectors (train_inp_vec and train_ref_vec). Note that this is obtained from a one-time characterization step performed before the start of computation (using software). This step can be invoked again during runtime for recalibrating the estimates as mentioned later in Section 3.4.1. Once the number of vbins is determined, we first construct the different vbins as shown in Algo. 2 (create_bins). This function takes into account the probability distribution of the input vectors to determine the appropriate value ranges to be allocated to each bin.
Once the vbins are determined and the respective bin values are assigned, they are used to construct the qLUT (series of memory-mapped registers) using the original complex function (fc, e.g., multiplication for dot product and subtraction+multiplication for distance computation)” outputs are generated based on the input with quality degradation) wherein the first set of quantization levels is associated with a higher quantization resolution than the second set of quantization levels. (section 3.3-3.4,
“Similar to prior works [6, 8], the exact approximation configuration (vbins and qLUT size) required for achieving a particular quality degradation bound can be obtained by running the applications multiple times, each time with a different number of vbins, using a randomly selected sample training input vector set.” “Figures 5(b) and (c) demonstrate how the vbins are constructed for different quality requirements. These examples show a naive way of constructing vbins for two different quality degradation bounds, 5% and 1%, respectively. In this approach, the cdf is split into equal vbins based on the probability distribution of the values.” “Note that due to the presence of the memory mapped registers, it is possible to reconfigure the qLUT during runtime (calibration) based on the target output quality degradation bound. To make this feasible, the accelerator is initially designed with the maximum number of registers that will be able to support at least 1% (or the highest quality supported) quality degradation for a selected set of training inputs. The number of registers that can support 1% quality degradation is a superset of the number of registers that is required to support higher quality degradations such as 2.5% and 5%. Hence, it is possible to support different quality specifications by reconfiguring the vbins definitions during the calibration phase. When supporting lower quality specifications (such as 2.5% and 5%), the unused registers are power gated to save energy consumption.” “The high level operation is explained in Algo. 1. It first shows that the total number of vbins is determined from the specified quality degradation bound (extract_bins) using representative sample training sets of input and reference vectors (train_inp_vec and train_ref_vec). Note that this is obtained from a one-time characterization step performed before the start of computation (using software).
This step can be invoked again during runtime for recalibrating the estimates as mentioned later in Section 3.4.1. Once the number of vbins is determined, we first construct the different vbins as shown in Algo. 2 (create_bins). This function takes into account the probability distribution of the input vectors to determine the appropriate value ranges to be allocated to each bin. Once the vbins are determined and the respective bin values are assigned, they are used to construct the qLUT (series of memory-mapped registers) using the original complex function (fc, e.g., multiplication for dot product and subtraction+multiplication for distance computation). Subsequently, each original input vector is converted to the quantized values (using conv_vec described in Algo. 3) before being fed into the hardware accelerator for generating the meta-function output” with different quality degradation bounds (2.5% vs. 5%, etc.) as configured, the vbins are determined)
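For illustration only, the vbin-and-qLUT scheme quoted above (Algo. 1-3 of Raha) can be sketched as follows. This is a hypothetical paraphrase, not Raha's actual implementation: the function names `create_bins` and `quantize` loosely follow the cited `create_bins`/`conv_vec`, while `build_qlut` and every other detail (equal-probability splitting via quantiles, midpoint representatives, a pairwise multiplication as the complex function fc) are assumptions made for the sketch.

```python
import numpy as np

def create_bins(train_inp_vec, n_vbins):
    """Split the input distribution into equal-probability vbins (cf. Algo. 2).
    Returns bin edges plus one representative value per vbin."""
    edges = np.quantile(train_inp_vec, np.linspace(0.0, 1.0, n_vbins + 1))
    reps = 0.5 * (edges[:-1] + edges[1:])  # midpoint as the representative bin value
    return edges, reps

def build_qlut(reps, fc):
    """Precompute fc over all pairs of representative values (the qLUT)."""
    return {(i, j): fc(a, b) for i, a in enumerate(reps) for j, b in enumerate(reps)}

def quantize(x, edges):
    """Map each raw input to its vbin index (cf. conv_vec in Algo. 3)."""
    return np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(edges) - 2)

# One-time characterization: 16 vbins for approximating elementwise multiplication.
rng = np.random.default_rng(0)
train = rng.normal(size=10_000)                 # stand-in for train_inp_vec
edges, reps = create_bins(train, 16)
qlut = build_qlut(reps, lambda a, b: a * b)     # fc = multiplication (dot product case)

# Evaluation: index the qLUT with quantized inputs instead of multiplying.
x, y = rng.normal(size=100), rng.normal(size=100)
approx = np.array([qlut[(i, j)] for i, j in zip(quantize(x, edges), quantize(y, edges))])
```

More vbins shrink each bin's value range, so the many-to-one mapping loses less information; fewer vbins shrink the table (and, in Raha's hardware, the number of active registers) at the cost of accuracy, which is the quality-control knob the cited passages describe.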
In regard to claim 14, Raha and Smith disclose The one or more non-transitory computer readable media of claim 11; the rejection is incorporated herein.
Raha discloses wherein the instructions further cause the one or more processors to perform the steps of: matching a set of input feature values to one or more quantized feature values included in another mapping within the lookup table; and generating a prediction associated with the second set of feature values based on a corresponding output included in the another mapping. (Fig. 4, section 3.4. “(i) Calibration phase and (ii) Evaluation phase. In the calibration phase, the probability distributions of test input vectors and reference vectors are created and the vbins are reconstructed to be able to adapt to varying input characteristics. For example, depending on the input characteristics, it may be required to not only change the vbins definitions but also the number of vbins itself to satisfy the specified quality bound. Hence, this phase is used to reconfigure the accelerator design (qLUT and vbins). Note that this reconfiguration is possible due to the register-based implementation of qLUT that can be loaded by the software running on the processor using memory mapped I/O. The calibration phase can be invoked periodically (e.g., after every 100 test vectors) or can be invoked on demand when there is a necessity to change the quality mode. Note that the cdf for reference vectors needs to be created just once before the first evaluation phase since it is usually not expected to change throughout program execution. Evaluation phase is when the actual operation occurs. During this phase, the accelerator uses the LUT to determine the estimates for the complex arithmetic functions that are accumulated and finally sent back to the processor for generating the overall result. Note that the calibration phase incurs negligible overhead compared to the runtime of the evaluation phase.
Note that due to the presence of the memory mapped registers, it is possible to reconfigure the qLUT during runtime (calibration) based on the target output quality degradation bound.” The estimate is generated based on the mapping of the vbins in the qLUT)
In regard to claim 15, Raha and Smith disclose The one or more non-transitory computer readable media of claim 14; the rejection is incorporated herein.
Raha discloses wherein matching the set of input feature values to the one or more quantized feature values comprises determining that the set of input feature values differs from the one or more quantized feature values by less than a threshold, wherein the threshold is based on the first set of quantization levels. (Fig. 4, section 3.3. “Quality configurability enables the approximation mechanism to support multiple quality modes by modulating the degree of approximation that is required by the user for adapting to variable energy constraints. In our proposed approximation technique, different quality modes can be realized by modulating the total number of vbins and the size of the qLUT. Specifically, higher the number of vbins (or larger the qLUT size), more accurate is the output result. Therefore, the number of vbins acts as the quality control knob in this case. Similar to prior works [6, 8], the exact approximation configuration (vbins and qLUT size) required for achieving a particular quality degradation bound can be obtained by running the applications multiple times, each time with a different number of vbins, using a randomly selected sample training input vector set. This characterization step is a critical part of any dynamic approximation scheme. As a result, quality-configurable execution is achieved in our proposed approximation technique by reconfiguring the qLUT and the definitions of vbins” vbins are generated to achieve a particular quality degradation bound)
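For illustration of the threshold-style matching this claim recites, a minimal hypothetical sketch follows. Neither the `matches` function nor the half-bin-width threshold appears in the cited art; they are assumptions showing how a threshold could be derived from a quantization level (finer bins imply a tighter threshold).

```python
def matches(inputs, quantized, bin_width):
    """Treat a set of raw inputs as matching a quantized entry when every
    value lies within half a bin width of the corresponding representative
    value; the threshold thus follows from the quantization level."""
    threshold = bin_width / 2.0
    return all(abs(a - b) < threshold for a, b in zip(inputs, quantized))

# With bin width 0.5, inputs within 0.25 of the representatives match.
close_match = matches([1.1, 2.2], [1.0, 2.25], bin_width=0.5)  # True
far_match = matches([1.4, 2.2], [1.0, 2.25], bin_width=0.5)    # False
```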
In regard to claim 16: claim 16 is a medium claim corresponding to method claim 8 above and, therefore, is rejected for the same reasons set forth in the rejection of claim 8.
In regard to claim 17, Raha and Smith disclose The one or more non-transitory computer readable media of claim 11; the rejection is incorporated herein.
Raha discloses wherein the instructions further cause the one or more processors to perform the step of storing another mapping of a second set of quantized feature values to an indicator representing a quantized version of the machine learning model. (section 6.2, section 3.4.1. “By using this technique, we achieved 61% − 74% meta-function-level energy savings on an average which are 12% − 15% more than the reconfigurable register-based LUT version as shown in Figure 14” “For example, depending on the input characteristics, it may be required to not only change the vbins definitions but also the number of vbins itself to satisfy the specified quality bound. Hence, this phase is used to reconfigure the accelerator design (qLUT and vbins). Note that this reconfiguration is possible due to the register-based implementation of qLUT that can be loaded by the software running on the processor using memory mapped I/O.” qLUT with versions based on different configurations)
In regard to claim 18, Raha discloses A computer-implemented method for performing inference associated with a machine learning model, (abstract, table 1,
“Approximate computing has emerged as a popular design paradigm for optimizing the performance and energy consumption of error-resilient applications in domains such as machine learning…” “In this work, we propose a new technique, called quantized table lookup, for approximating the meta-functions used in the core computational kernels of error-resilient applications.” “the proposed technique instead approximates the input data to the meta-functions by reducing/quantizing them to a much smaller set of values that we call quantized inputs. The small number of quantized inputs enables us to completely replace the energy-intensive arithmetic units in the meta-function with small and energy efficient lookup tables (called quantized lookup tables or qLUT) that contain precomputed output values corresponding to the quantized inputs.”) the method comprising:
matching a first set of feature values for the machine learning model to a first set of quantized feature values included in a lookup table representing the machine learning model, wherein the lookup table comprises a plurality of mappings between a plurality of sets of quantized feature values to a plurality of outputs generated by the machine learning model; (Fig. 4, section 3.4.1 “Evaluation phase is when the actual operation occurs. During this phase, the accelerator uses the LUT to determine the estimates for the complex arithmetic functions that are accumulated and finally sent back to the processor for generating the overall result.” Output is generated based on the mapping from the qLUT by the quantized inputs into the algorithm)
retrieving a first output that is mapped to the first set of quantized feature values within the lookup table; and generating prediction output of the machine learning model for the first set of feature values based on the first output. (Fig. 4, section 3.4.1 “Evaluation phase is when the actual operation occurs. During this phase, the accelerator uses the LUT to determine the estimates for the complex arithmetic functions that are accumulated and finally sent back to the processor for generating the overall result.” The estimate is generated based on the mapping from the qLUT by the model, based on the output of the vbins for the algorithm)
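The match-then-retrieve flow of these claim limitations can be sketched as follows. This toy is hypothetical: the function `lut_predict`, the rounding quantizer, and the example table contents are all assumptions, not taken from Raha or from the claims.

```python
def lut_predict(features, lut, quantize_fn):
    """Quantize the raw feature values into a lookup key, then return the
    precomputed output stored under that key, instead of running the model
    itself (cf. the cited evaluation phase, where the accelerator indexes
    the qLUT rather than executing the arithmetic)."""
    key = tuple(quantize_fn(f) for f in features)
    return lut[key]

# Toy table: keys are feature values rounded to one decimal place,
# values stand in for precomputed model outputs for those quantized inputs.
lut = {(0.1, 0.2): "class_a", (0.3, 0.4): "class_b"}
pred = lut_predict([0.12, 0.21], lut, lambda f: round(f, 1))  # "class_a"
```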
But Raha fails to explicitly disclose “the plurality of outputs generated by the machine learning model, wherein the plurality of mappings is generated based on determining that the plurality of outputs generated by the machine learning model based on the plurality of sets of quantized feature values matches a plurality of second outputs generated by the machine learning model based on a plurality of set of feature values corresponding to the plurality of sets of quantized feature values;”
Smith discloses the plurality of outputs generated by the machine learning model, wherein the plurality of mappings is generated based on determining that the plurality of outputs generated by the machine learning model based on the plurality of sets of quantized feature values matches a plurality of second outputs generated by the machine learning model based on a plurality of set of feature values corresponding to the plurality of sets of quantized feature values; (Fig. 6-7, [0043]-[0049], [0076]-[0084],
[0123], [0144]-[0158]: various-resolution images are generated; the mappings are generated based on features from a GNN to low-resolution images resembling the high-resolution image outputs, with various resolutions from multiple images based on the input feature values; the outputs are generated from the NN through different layers, and the inputs are based on the latent space representation of the input images. Note: “based on” and “corresponding to” are very broad; please further define the feature values and quantized feature values, the differences between them, what the features are, etc., to help move prosecution forward, and please call to discuss if necessary.)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Smith’s generative NN projection into Raha’s invention, as they are related to the same field of endeavor of model training and learning based on precision. The motivation to combine these references, as proposed above, is at least that Smith’s generative NN with different precision levels would provide training at additional precision levels in Raha’s system. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention that training the ML model at more precision levels would help to improve prediction accuracy and training efficiency.
In regard to claim 19, Raha and Smith disclose The computer-implemented method of claim 18; the rejection is incorporated herein.
Raha discloses wherein the plurality of sets of quantized features included in the lookup table comprise a first set of quantized feature values at a first quantization resolution and a second set of quantized features at a second quantization resolution, wherein the second quantization resolution is higher than the first quantization resolution. (section 3.4 “The high level operation is explained in Algo. 1. It first shows that the total number of vbins is determined from the specified quality degradation bound (extract_bins) using representative sample training sets of input and reference vectors (train_inp_vec and train_ref_vec)” vbins with different specified quality degradation bounds)
In regard to claim 20, Raha and Smith disclose The computer-implemented method of claim 19; the rejection is incorporated herein.
Raha discloses wherein the second quantization resolution is a multiple of the first quantization resolution. (section 3.3, 3.4, “Similar to prior works [6, 8], the exact approximation configuration (vbins and qLUT size) required for achieving a particular quality degradation bound can be obtained by running the applications multiple times, each time with a different number of vbins, using a randomly selected sample training input vector set.” “Figures 5(b) and (c) demonstrate how the vbins are constructed for different quality requirements. These examples show a naive way of constructing vbins for two different quality degradation bounds, 5% and 1%, respectively. In this approach, the cdf is split into equal vbins based on the probability distribution of the values.” “Note that due to the presence of the memory mapped registers, it is possible to reconfigure the qLUT during runtime (calibration) based on the target output quality degradation bound. To make this feasible, the accelerator is initially designed with the maximum number of registers that will be able to support at least 1% (or the highest quality supported) quality degradation for a selected set of tra