DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 2/6/2026 has been entered.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101
because the claimed invention is directed to an abstract idea without significantly
more.
When considering subject matter eligibility under 35 U.S.C. 101, it must be
determined whether the claim is directed to one of the four statutory categories of
invention, i.e., process, machine, manufacture, or composition of matter (Step 1). If the
claim does fall within one of the statutory categories, the second step in the analysis is
to determine whether the claim is directed to a judicial exception (Step 2A). The Step 2A
analysis is broken into two prongs. In the first prong (Step 2A, Prong 1), it is determined
whether or not the claims recite a judicial exception (e.g., mathematical concepts,
mental processes, certain methods of organizing human activity). If it is determined in
Step 2A, Prong 1 that the claims recite a judicial exception, the analysis proceeds to the
second prong (Step 2A, Prong 2), where it is determined whether or not the claims
integrate the judicial exception into a practical application. If it is determined at Step 2A,
Prong 2 that the claims do not integrate the judicial exception into a practical
application, the analysis proceeds to determining whether the claim is a patent-eligible
application of the exception (Step 2B). If an abstract idea is present in the claim, any
element or combination of elements in the claim must be sufficient to ensure that the
claim integrates the judicial exception into a practical application, or else amounts to
significantly more than the abstract idea itself. Applicant is advised to consult the 2019
PEG for more details of the analysis.
Step 1
According to the first part of the analysis, in the instant case, claims 1-10, 11-17, and 18-20 are directed, respectively, to a method of quantizing an ML model, a medium for quantizing an ML model, and a method of performing inference associated with an ML model. Thus, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A, Prong 1
Following the determination of whether or not the claims fall within one of the four
categories (Step 1), it must be determined if the claims recite a judicial exception (e.g.
mathematical concepts, mental processes, certain methods of organizing human
activity) (Step 2A, Prong 1). In this case, the claims are determined to recite a judicial
exception as explained below.
Regarding claims 1 and 11, these claims recite:
executing a default quantized version of the machine learning model defined by a quantization engine, wherein the quantization engine defines the default quantized version of the machine learning model to be a version, among a plurality of quantized versions of the machine learning model, that has a lowest quantization resolution or lowest resource overhead that falls within a threshold from a most accurate version of the machine learning model based on a plurality of performance metrics for the plurality of quantized versions of the machine learning model; outputting, by the default quantized version, a first output based on a first input comprising a first set of feature values; and where the first output does not match a second output associated with the first set of feature values, storing, by the quantization engine, a first mapping of one or more first feature values included in the first set of feature values to a first quantized version of the machine learning model in a lookup table representing the machine learning model by mapping feature values to outputs of the machine learning model, wherein the first quantized version is associated with a higher quantization resolution than the default quantized version.
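For illustration only, the steps recited above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names `QuantizedVersion`, `make_predict`, `choose_default` and `record_mismatch`, the toy single-feature classifier, the accuracy figures, and the 0.10 threshold are all hypothetical. The sketch also checks that the higher-resolution version reproduces the second output before storing the mapping, which corresponds to the dependent claims rather than to claim 1 itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Tuple[float, ...]


@dataclass(frozen=True)
class QuantizedVersion:
    name: str                           # identifier of the quantized version
    bits: int                           # quantization resolution (assumed: bit width)
    accuracy: float                     # assumed performance metric
    predict: Callable[[Features], int]  # runs the version on a set of feature values


def make_predict(bits: int) -> Callable[[Features], int]:
    """Toy stand-in for a quantized model: snap the feature to a grid, then threshold."""
    levels = 2 ** bits - 1

    def predict(features: Features) -> int:
        quantized = round(features[0] * levels) / levels
        return int(quantized > 0.6)

    return predict


def choose_default(versions: List[QuantizedVersion], threshold: float) -> QuantizedVersion:
    """Lowest-resolution version whose accuracy is within `threshold` of the best."""
    best = max(v.accuracy for v in versions)
    eligible = [v for v in versions if best - v.accuracy <= threshold]
    return min(eligible, key=lambda v: v.bits)


def record_mismatch(versions: List[QuantizedVersion], default: QuantizedVersion,
                    features: Features, second_output: int,
                    table: Dict[Features, str]) -> None:
    """If the default's output differs from the second output, map the feature
    values to the lowest higher-resolution version that reproduces it."""
    if default.predict(features) == second_output:
        return  # outputs match; nothing is stored
    higher = sorted((v for v in versions if v.bits > default.bits), key=lambda v: v.bits)
    for version in higher:
        if version.predict(features) == second_output:
            table[features] = version.name
            return


# Hypothetical versions and metrics, for demonstration only.
versions = [
    QuantizedVersion("int2", 2, 0.90, make_predict(2)),
    QuantizedVersion("int4", 4, 0.97, make_predict(4)),
    QuantizedVersion("int8", 8, 0.99, make_predict(8)),
]
default = choose_default(versions, threshold=0.10)

table: Dict[Features, str] = {}
record_mismatch(versions, default, (0.55,), 0, table)  # default output differs
record_mismatch(versions, default, (0.9,), 1, table)   # default output matches
```

In this sketch the lookup table stores a version identifier per set of feature values; whether the table holds identifiers, outputs, or both is a design choice the claim language leaves open.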
Regarding claim 18, this claim recites:
matching, by an inference engine, a first set of feature values for the machine learning model to a second set of feature values included in a lookup table representing the machine learning model by mapping feature values to outputs of the machine learning model, wherein the lookup table comprises a plurality of mappings between a plurality of sets of feature values to a plurality of identifiers for a plurality of quantized versions of the machine learning model; retrieving a first identifier that is mapped to the second set of feature values within the lookup table; and executing, by the inference engine, a first quantized version of the machine learning model that corresponds to the first identifier to the first set of feature values to generate a prediction associated with the first set of feature values, wherein the first quantized version of the machine learning model is a version of the machine learning model among the plurality of quantized versions that has a lowest quantization resolution or lowest resource overhead that falls within a threshold from a most accurate version of the machine learning model
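For illustration only, the matching, retrieving and executing steps recited above can be sketched as follows. This is a hedged sketch, not the applicant's implementation: `make_model`, `MODELS`, `LOOKUP` and `infer` are hypothetical names, the toy models are placeholders, and exact-key matching is an assumption (the recited matching could equally operate on ranges or quantized feature values). Falling back to a default version when no entry matches is likewise an assumption, borrowed from the behavior described for claim 6.

```python
from typing import Callable, Dict, Tuple

Features = Tuple[float, ...]
Model = Callable[[Features], int]


def make_model(bits: int) -> Model:
    """Toy stand-in for a quantized model: snap the feature to a grid, then threshold."""
    levels = 2 ** bits - 1

    def model(features: Features) -> int:
        quantized = round(features[0] * levels) / levels
        return int(quantized > 0.6)

    return model


# Hypothetical registry of quantized versions, keyed by identifier.
MODELS: Dict[str, Model] = {"int2": make_model(2), "int4": make_model(4)}

# Hypothetical lookup table: sets of feature values -> version identifier.
LOOKUP: Dict[Features, str] = {(0.55,): "int4"}


def infer(features: Features, lookup: Dict[Features, str],
          models: Dict[str, Model], default_id: str) -> Tuple[str, int]:
    """Match the features against the table, retrieve the mapped identifier
    (or the default when no entry matches), and execute that version."""
    version_id = lookup.get(features, default_id)
    return version_id, models[version_id](features)


matched = infer((0.55,), LOOKUP, MODELS, default_id="int2")   # entry found in table
fallback = infer((0.9,), LOOKUP, MODELS, default_id="int2")   # no entry; default used
```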
The claims recite a mental process. As set forth in MPEP 2106.04(a)(2)(III)(C), “Claims can recite a mental process even if they are claimed as being performed on a computer”. These steps are recited at a high level and are disclosed as a human user performing these functions, simply using a computer as a tool (see specification, [0021]-[0028], Fig. 1). Thus, the claims recite abstract ideas.
Step 2A, Prong 2
Following the determination that the claims recite a judicial exception, it must be
determined if the claims recite additional elements that integrate the exception into a
practical application of the exception (Step 2A, Prong 2). In this case, after considering
all claim elements individually and as an ordered combination, it is determined that the
claims do not include additional elements that integrate the exception into a practical
application of the exception as explained below.
In Prong Two, a claim is evaluated as a whole to determine whether the recited judicial exception is integrated into a practical application of that exception. A claim is not “directed to” a judicial exception, and thus is patent eligible, if the claim as a whole integrates the recited judicial exception into a practical application of that exception. A claim that integrates a judicial exception into a practical application will apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, such that the claim is more than a drafting effort designed to monopolize the judicial exception. MPEP 2106.04(d).
Regarding claims 1, 11 and 18, the additional limitations of these claims recite using one or more neural networks as a tool to perform an abstract idea, which is not indicative of integration into a practical application. See MPEP 2106.05(f).
MPEP § 2106.05(a): Improvements to the Functioning of a Computer or to Any Other Technology or Technical Field.
Does the specification include a technical explanation of an asserted improvement? (Yes)
Does the claim reflect the particular way of achieving that improvement? (No)
These claims recite an abstract idea and, further, the claims as a whole do not integrate the recited judicial exception into a practical application of the exception. MPEP 2106.04(d).
MPEP § 2106.05(f): Mere Instructions to Apply an Exception.
Do the additional element(s) amount to merely the words “apply it” (or an equivalent), or are they mere instructions to implement an abstract idea or other exception on a computer? (Yes)
Step 2B
Based on the determination in Step 2A of the analysis that the claims are
directed to a judicial exception, it must be determined if the claims contain any element
or combination of elements sufficient to ensure that the claim amounts to significantly
more than the judicial exception (Step 2B). In this case, after considering all claim
elements individually and as an ordered combination, it is determined that the claims do
not include additional elements that are sufficient to amount to significantly more than
the judicial exception for the same reasons given above in the Step 2A, Prong 2
analysis. Furthermore, each additional element identified above as being insignificant
extra-solution activity is also well-understood, routine, and conventional, as described below.
Claims 1, 11 and 18:
The claims do not include additional elements, alone or in combination, that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than generic computing components and a field of use/technological environment, which do not amount to significantly more than the abstract idea. The underlying concept merely receives information, analyzes it, and stores the results of the analysis; this concept is not meaningfully different from concepts found by the courts to be abstract (see Electric Power Group, collecting information, analyzing it, and displaying certain results of the collection and analysis; see Cybersource, obtaining and comparing intangible data; see Digitech, organizing information through mathematical correlations; see Grams, diagnosing an abnormal condition by performing clinical tests and thinking about the results; see Cyberfone, using categories to organize, store and transmit information; see Smartgene, comparing new and stored information and using rules to identify options).
MPEP § 2106.05(g): Insignificant Extra-Solution Activity.
Do the additional element(s) add more than activities that are incidental to the primary process or product, or that are merely a nominal or tangential addition to the claim? (No)
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements, when considered both individually and in combination, do not amount to significantly more than the abstract idea. For example, claims 1 and 11 recite “executing…”, “outputting…”, etc., and claim 18 recites “matching…”, “retrieving…”, “executing…”, etc. These elements are recited at a high level of generality and are well-understood, routine, and conventional activities in the computer art. Generic computers performing generic computer functions, without an inventive concept, do not amount to significantly more than the abstract idea. Looking at the elements as a combination does not add anything more than the elements analyzed individually. Therefore, these claims do not amount to significantly more than the abstract idea itself.
Step 2A, Prong 2 and Step 2B: Dependent Claims
Regarding claims 2 and 12
Claims 2 and 12 merely recite additional elements directed to determining an output of the ML model, which are generic functions that, when the elements are viewed in combination, do not add anything more than the elements analyzed individually. Therefore, these claims also do not amount to significantly more than the abstract idea itself. These claims are not patent eligible.
Regarding claims 3, 5 and 13
Claims 3, 5 and 13 merely recite additional elements directed to selecting a quantized version of the ML model, which are generic functions that, when the elements are viewed in combination, do not add anything more than the elements analyzed individually. Therefore, these claims also do not amount to significantly more than the abstract idea itself. These claims are not patent eligible.
Regarding claims 4 and 14
Claims 4 and 14 merely recite additional elements directed to determining an output of the ML model and selecting a version of the ML model based on a condition, which are generic functions that, when the elements are viewed in combination, do not add anything more than the elements analyzed individually. Therefore, these claims also do not amount to significantly more than the abstract idea itself. These claims are not patent eligible.
Regarding claim 6
Claim 6 merely recites additional elements directed to determining an output of the ML model based on the version and the feature values, which are generic functions that, when the elements are viewed in combination, do not add anything more than the elements analyzed individually. Therefore, this claim also does not amount to significantly more than the abstract idea itself. This claim is not patent eligible.
Regarding claim 7
Claim 7 merely recites additional elements directed to selecting the version of the ML model, which are generic functions that, when the elements are viewed in combination, do not add anything more than the elements analyzed individually. Therefore, this claim also does not amount to significantly more than the abstract idea itself. This claim is not patent eligible.
Regarding claims 8 and 16
Claims 8 and 16 merely recite additional elements directed to selecting the version of the ML model based on conditions, which are generic functions that, when the elements are viewed in combination, do not add anything more than the elements analyzed individually. Therefore, these claims also do not amount to significantly more than the abstract idea itself. These claims are not patent eligible.
Regarding claims 9, 17 and 19
Claims 9, 17 and 19 merely recite additional elements directed to defining feature values, which are generic functions that, when the elements are viewed in combination, do not add anything more than the elements analyzed individually. Therefore, these claims also do not amount to significantly more than the abstract idea itself. These claims are not patent eligible.
Regarding claim 10
Claim 10 merely recites additional elements directed to defining an output, which are generic functions that, when the elements are viewed in combination, do not add anything more than the elements analyzed individually. Therefore, this claim also does not amount to significantly more than the abstract idea itself. This claim is not patent eligible.
Regarding claim 15
Claim 15 merely recites additional elements directed to determining an output of the ML model based on the version and the feature values, which are generic functions that, when the elements are viewed in combination, do not add anything more than the elements analyzed individually. Therefore, this claim also does not amount to significantly more than the abstract idea itself. This claim is not patent eligible.
Regarding claim 20
Claim 20 merely recites additional elements directed to defining a lookup table, which are generic functions that, when the elements are viewed in combination, do not add anything more than the elements analyzed individually. Therefore, this claim also does not amount to significantly more than the abstract idea itself. This claim is not patent eligible.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1, 11 and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 1, 11 and 18 recite “a second output”, but no corresponding input is associated with it, and different interpretations are possible: the second output may be generated from the first input or from another input. The claims are therefore indefinite for failing to particularly point out and distinctly claim the subject matter. For the purpose of examination, the claims are interpreted under the broadest reasonable interpretation (BRI). The examiner suggests that applicant further define the claim limitations to help advance prosecution.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Donnelly US 2022/0044109 in view of Mathur US 2021/0281662 and Burger et al. (Burger) US 2019/0340492.
In regard to claim 1, Donnelly discloses a computer-implemented method for quantizing a machine learning model, the method comprising: ([0007]-[0018] quantizing ML models)
executing a default quantized version of the machine learning model defined by a quantization engine, (Fig. 1, [0007]-[0018] [0027]-[0030] [0038]-[0061] execute a quantized NN with a first precision defined by the quantization engine) wherein the quantization engine defines the default quantized version of the machine learning model to be a version, among a plurality of quantized versions of the machine learning model, (Fig. 1, [0007]-[0018] [0027]-[0030] [0038]-[0061] the quantization engine defines various precisions of the quantized NN and identify the first precision for the quantized NN to execute) based on a plurality of performance metrics for the plurality of quantized versions of the machine learning model; (Fig. 1, [0007]-[0018] [0027]-[0030] [0038]-[0059] execute a quantized NN with a first precision defined by the quantization engine, generate quantized layer output using the current values of the quantization range of the NN layer based on the weights, range, error, accuracy, score, parameters etc. to update the quantization range of the quantized NN layer to generate a second, higher precision, etc.)
outputting, by the default quantized version, a first output based on a first input comprising a first set of feature values; ([0033]-[0064] generate output based on the input with variables which characterize the input) and
where the first output does not match a second output associated with the first set of feature values; ([0033]-[0065] the outputs associated with the variables that characterize the input are different. Note: please further define the second output; it is not clear how it is generated (it may have a different input or use the first input) or what the relationship between the first output and the second output is. Please call to discuss if necessary.)
storing, by the quantization engine, a first mapping of one or more first feature values included in the first set of feature values to a first quantized version of the machine learning model by mapping feature values to outputs of the machine learning model, wherein the first quantized version is associated with a higher quantization resolution than the default quantized version, ([0027]-[0038] [0044][0045] [0053]-[0059] data store stores current parameters for quantization ranges, accuracy, error, score, values corresponding to the character or properties and classifications associated with the particular precision and associate character or properties and classifications values from the generated outputs to classify the objects in the category and the second version has higher precision than the first version)
However, Donnelly fails to explicitly disclose “storing the first mapping of the one or more first feature values to the first quantized version of the machine learning model in a lookup table representing the machine learning model,”
Mathur discloses storing the first mapping of the one or more first feature values to the first quantized version of the machine learning model in a lookup table representing the machine learning model, ([0018]-[0026] [0030]-[0040] claim 13, store the parameters associated with each ML model in a data structure, such as a table, etc., with an entry or row associated with each ML model of the ML models)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Mathur’s method of storing and retrieving ML configuration parameters into Donnelly’s invention, as they are related to the same field of endeavor of versions of ML models. The motivation to combine these references, as proposed above, is at least that Mathur’s storing of ML configuration parameters in tables would provide an additional information storage method for Donnelly’s system. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention that storing ML configuration parameters in a data structure would help to provide more useful information related to the ML version to use.
However, Donnelly and Mathur fail to explicitly disclose “defines the default quantized version of the machine learning model to be the version, that has a lowest quantization resolution or lowest resource overhead that falls within a threshold from a most accurate version of the machine learning model;”
Burger discloses defining the default quantized version of the machine learning model to be the version that has a lowest quantization resolution or lowest resource overhead that falls within a threshold from a most accurate version of the machine learning model; ([0053] [0118]-[0131] define the quantized version of the NN by comparing with a predetermined threshold of the accuracy to (the lowest accuracy) or reduced resource usage as sufficient accuracy)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Burger’s quantized NN into the invention of Mathur and Donnelly, as they are related to the same field of endeavor of versions of ML models. The motivation to combine these references, as proposed above, is at least that Burger’s quantizing of an NN with various versions would provide additional NN quantization options for the system of Mathur and Donnelly. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention that providing more NN quantization with various versions would help to improve the tradeoff between accuracy of training and resource usage.
In regard to claim 2, Donnelly, Mathur and Burger disclose the computer-implemented method of claim 1; the rejection of claim 1 is incorporated herein.
Donnelly further discloses determining that a third output generated by the first quantized version based on the first set of feature values matches the second output prior to storing the first mapping in the lookup table. (Fig. 1, [0009]-[0018] [0027]-[0030] [0032]-[0036] [0038]-[0063] generate a third quantized layer output based on taking a value from the same set of possible values defined by the min./max. value, accuracy, score, etc., for example corresponding to the second precision, and only maintain quantized versions of the weight tensors)
In regard to claim 3, Donnelly, Mathur and Burger disclose the computer-implemented method of claim 2; the rejection of claim 2 is incorporated herein.
Donnelly further discloses wherein the first quantized version is selected to have a lowest quantization resolution among a subset of the plurality of quantized versions of the machine learning model that generates the third output based on the first set of feature values. (Fig. 1, [0009]-[0018] [0027]-[0030] [0032]-[0036] [0038]-[0063] the first version has the lowest precision among the versions of the ML model based on the same set of possible values defined by the min./max. value, accuracy, score, etc., for example)
In regard to claim 4, Donnelly, Mathur and Burger disclose the computer-implemented method of claim 1; the rejection of claim 1 is incorporated herein.
Donnelly further discloses:
determining that a third output generated by the default quantized version based on a second set of feature values matches a fourth output associated with the second set of feature values; (Fig. 1, [0009]-[0018] [0027]-[0030] [0032]-[0036] [0038]-[0063] generate another quantized layer output based on taking a value from the same set of possible values defined by the min./max. value, accuracy, score, etc., for example corresponding to the fourth precision)
selecting a second quantized version of the machine learning model that generates the third output based on the second set of feature values, wherein the second quantized version is associated with a lower quantization resolution than the default quantized version; (Fig. 1, [0015]-[0018] [0027]-[0030] [0032]-[0036] [0038]-[0059] update the quantization range of the quantized NN layer to generate the third output based on the set of possible values defined by the min./max. value, accuracy, score, etc., having a second, lower precision) and
storing a second mapping of one or more second feature values included in the second set of feature values to the second quantized version in the lookup table. ([0032]-[0036] [0044][0045] [0053]-[0059] [0094]-[0095] data store stores current parameters for quantization ranges, values, etc. corresponding to the second version)
In regard to claim 5, Donnelly, Mathur and Burger disclose the computer-implemented method of claim 4; the rejection of claim 4 is incorporated herein.
Donnelly further discloses wherein the second quantized version is selected to have a lowest quantization resolution among a subset of the plurality of quantized versions of the machine learning model that generates the third output based on the second set of feature values. (Fig. 1, [0015]-[0018] [0027]-[0030] [0032]-[0036] [0038]-[0059] update the quantization range of the quantized NN layer to generate the third output based on the set of possible values defined by the min./max. value, score, accuracy, etc., having a lowest precision from various precisions)
In regard to claim 6, Donnelly, Mathur and Burger disclose the computer-implemented method of claim 1; the rejection of claim 1 is incorporated herein.
Donnelly further discloses:
determining that a second set of feature values for the machine learning model is not stored in the lookup table; ([0041]-[0059] only maintain quantized versions of the trained weight tensor) and
applying the default quantized version to the second set of feature values to produce a third output. ([0015]-[0018] [0027]-[0030] [0032]-[0036][0038]-[0059] [0070]-[0079] generate quantized layer output using the values of the quantization range of the NN layer based on the weights, range, accuracy, parameters etc. of the first precision)
In regard to claim 7, Donnelly, Mathur and Burger disclose the computer-implemented method of claim 1; the rejection of claim 1 is incorporated herein.
Donnelly further discloses wherein selecting the default quantized version comprises determining that the default quantized version has a highest performance metric within the plurality of performance metrics. ([0009]-[0018] [0027]-[0030] [0055]-[0076] determine the quantized version having the min./max. values, error, cost, range, etc.)
In regard to claim 8, Donnelly, Mathur and Burger disclose the computer-implemented method of claim 1; the rejection of claim 1 is incorporated herein.
Donnelly further discloses wherein selecting the default quantized version comprises determining that the default quantized version has a performance metric that is within a threshold of a highest performance metric included in the plurality of performance metrics. ([0009]-[0018] [0027]-[0030] [0055]-[0076] determine the quantized version having ranges within the min. and max. values, error, cost, range, etc.)
In regard to claim 9, Donnelly, Mathur and Burger disclose the computer-implemented method of claim 1; the rejection of claim 1 is incorporated herein.
Donnelly further discloses wherein the one or more first feature values included in the first mapping comprise at least one of a range of feature values, a quantized feature value, or multiple feature values for a single feature. ([0009]-[0018] [0027]-[0037] [0055]-[0076] determine the quantized version having ranges within the min. and max. values, error, cost, accuracy, range, etc.)
In regard to claim 10, Donnelly, Mathur and Burger disclose the computer-implemented method of claim 1; the rejection of claim 1 is incorporated herein.
Donnelly further discloses wherein the second output comprises at least one of a label associated with the first set of feature values or an output generated by the machine learning model based on the first set of feature values. (Fig. 1, [0015]-[0018] [0027]-[0037] [0038]-[0059] generate quantized layer output having the first precision based on the weights, range, error, score, accuracy, parameters, etc. by the ML model)
In regard to claim 11, Donnelly discloses one or more non-transitory computer readable media storing instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of: ([0111]-[0117] medium, processor with instructions and storage)
executing a default quantized version of a machine learning model defined by a quantization engine, (Fig. 1, [0007]-[0018] [0027]-[0030] [0038]-[0061] execute a quantized NN with a first precision defined by the quantization engine) wherein the quantization engine defines the default quantized version of the machine learning model to be a version from a plurality of quantized versions of the machine learning model (Fig. 1, [0007]-[0018] [0027]-[0030] [0038]-[0061] the quantization engine defines various precisions of the quantized NN and identifies the first precision for the quantized NN to execute) based on a plurality of performance metrics for the plurality of quantized versions; (Fig. 1, [0007]-[0018] [0027]-[0030] [0038]-[0059] execute a quantized NN with a first precision defined by the quantization engine, generate quantized layer output using the current values of the quantization range of the NN layer based on the weights, range, error, accuracy, score, parameters etc. to update the quantization range of the quantized NN layer to generate a second, higher precision, etc.)
outputting, by the default quantized version, a first output based on a first input comprising a first set of feature values; ([0033]-[0064] generate output based on the input with variables which characterize the input) and
where the first output matches a second output associated with the first set of feature values; ([0033]-[0065] the outputs include scores for categories corresponding to classes of objects associated with the variables that characterize the input. An output may belong to a category if it depicts an object included in the object class corresponding to the category. Note: the second output should be further defined, because it is not clear how it is generated; it may have a different input or use the first input. Please call to discuss if necessary.)
storing, by the quantization engine, a first mapping of one or more first feature values included in the first set of feature values to a first quantized version of the machine learning model by mapping feature values to outputs of the machine learning model, wherein the first quantized version is associated with a lower quantization resolution than the default quantized version, ([0027]-[0038] [0044][0045] [0053]-[0059] the data store stores current parameters for quantization ranges, accuracy, error, score, and values corresponding to the characteristics or properties and classifications associated with the particular precision, and associates those values with the generated outputs to classify the objects in the category; the second version has lower precision than the first version)
However, Donnelly fails to explicitly disclose “storing the first mapping of the one or more first feature values to the first quantized version of the machine learning model in a lookup table representing the machine learning model,”
Mathur discloses storing the first mapping of the one or more first feature values to the first quantized version of the machine learning model in a lookup table representing the machine learning model. ([0040] [0018]-[0026] claim 13, storing the parameters associated with each ML model in a data structure, such as a table, with an entry or row associated with each of the ML models)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Mathur’s method of storing and retrieving ML configuration parameters into Donnelly’s invention, as both relate to the same field of endeavor of versions of ML models. The motivation to combine these references, as proposed above, is at least that Mathur’s storing of ML configuration parameters in tables would provide an additional information storage method for Donnelly’s system. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention that storing ML configuration parameters in a data structure would help to provide more useful information about which ML version to use.
However, Donnelly and Mathur fail to explicitly disclose “defines the default quantized version of the machine learning model to be the version, that has a lowest quantization resolution or lowest resource overhead that falls within a threshold from a most accurate version of the machine learning model;”
Burger discloses defines the default quantized version of the machine learning model to be the version, that has a lowest quantization resolution or lowest resource overhead that falls within a threshold from a most accurate version of the machine learning model; ([0053] [0118]-[0131] defining the quantized version of the NN by comparing its accuracy against a predetermined threshold (the lowest accuracy deemed sufficient) or by reduced resource usage at sufficient accuracy)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Burger’s quantized NN into Mathur and Donnelly’s invention, as they relate to the same field of endeavor of versions of ML models. The motivation to combine these references, as proposed above, is at least that Burger’s quantizing of an NN into various versions would provide additional NN quantization options for Mathur and Donnelly’s system. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention that providing NN quantization with various versions would help to improve the tradeoff between training accuracy and resource usage.
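For illustration only, the "lowest quantization resolution ... within a threshold from a most accurate version" limitation discussed above can be read as the selection rule sketched below. This is a minimal sketch under stated assumptions, not a characterization of the applicant's implementation or of the Donnelly, Mathur, or Burger disclosures: the names QuantizedVersion and select_default_version, the bit widths, and the accuracy figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantizedVersion:
    """Hypothetical record for one quantized version of a model."""
    identifier: str
    bits: int          # quantization resolution (assumed: fewer bits = lower resolution)
    accuracy: float    # performance metric measured for this version (made-up values)


def select_default_version(versions, threshold):
    """Return the lowest-resolution version whose accuracy is within
    `threshold` of the most accurate version."""
    best_accuracy = max(v.accuracy for v in versions)
    eligible = [v for v in versions if best_accuracy - v.accuracy <= threshold]
    # The most accurate version is always eligible, so `eligible` is never empty.
    return min(eligible, key=lambda v: v.bits)


versions = [
    QuantizedVersion("int4", 4, 0.80),
    QuantizedVersion("int8", 8, 0.94),
    QuantizedVersion("fp16", 16, 0.95),
]
default = select_default_version(versions, threshold=0.02)
print(default.identifier)
```

With these hypothetical figures, the 8-bit version is selected: it is within 0.02 of the most accurate version, while the 4-bit version is not.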
In regard to claims 12-13, claims 12-13 are media claims corresponding to the method claims 2-3 above and, therefore, are rejected for the same reasons set forth in the rejections of claims 2-3.
In regard to claim 14, Donnelly, Mathur, and Burger disclose the one or more non-transitory computer readable media of claim 11; the rejection is incorporated herein.
Donnelly further discloses wherein the instructions further cause the one or more processors to perform the steps of:
determining that a third output generated by the default quantized version based on a second set of feature values does not match a fourth output associated with the second set of feature values; (Fig. 1, [0009]-[0018] [0027]-[0036] [0038]-[0063] generating another quantized layer output that takes a value from the set of possible values defined by the min./max. values, accuracy, error, score, etc., for example corresponding to the different precisions)
selecting a second quantized version of the machine learning model that generates the fourth output based on the second set of feature values, wherein the second quantized version is associated with a higher quantization resolution than the default quantized version; (Fig. 1, [0015]-[0018] [0027]-[0036] [0038]-[0059] updating the quantization range of the quantized NN layer to generate the third output, having a second, higher precision, based on the set of possible values defined by the min./max. values, accuracy, error, score, etc.) and
storing a second mapping of one or more second feature values included in the second set of feature values to the second quantized version in the lookup table. ([0027]-[0038] [0044][0045] [0053]-[0059] the data store stores current parameters for quantization ranges, accuracy, error, score, etc. corresponding to the second version)
In regard to claim 15, Donnelly, Mathur, and Burger disclose the one or more non-transitory computer readable media of claim 14; the rejection is incorporated herein.
Donnelly further discloses wherein the instructions further cause the one or more processors to perform the steps of:
determining that a third output generated by the default quantized version based on a second set of feature values matches a fourth output associated with the second set of feature values; (Fig. 1, [0009]-[0018] [0027]-[0036] [0038]-[0063] generating another quantized layer output that takes a value from the same set of possible values defined by the min./max. values, accuracy, error, score, etc., for example corresponding to the further precision) and
storing a second mapping of one or more second feature values included in the second set of feature values to the default quantized version in the lookup table. ([0044][0045] [0053]-[0059] the data store stores current parameters for quantization ranges, accuracy, error, score, etc. corresponding to the current version)
In regard to claims 16-17, claims 16-17 are media claims corresponding to the method claims 8-9 above and, therefore, are rejected for the same reasons set forth in the rejections of claims 8-9.
In regard to claim 18, Donnelly discloses a computer-implemented method for performing inference associated with a machine learning model, ([0007]-[0018][0032][0047] inference of ML model) the method comprising:
matching a first set of feature values for the machine learning model to a second set of feature values by mapping feature values to outputs of the machine learning model, ([0027]-[0038] [0044][0045] [0053]-[0059] values corresponding to the characteristics or properties and classifications are associated with the generated outputs to classify the objects in the category)
executing, by the inference engine, a first quantized version of the machine learning model on the first set of feature values to generate a prediction associated with the first set of feature values, ([0006] [0033]-[0036] [0044]-[0050] deploying the first version of the ML model, corresponding to the current values of the quantization range of the NN layer based on the weights, range, error, parameters, etc., and generating a prediction output) wherein the quantization engine defines the default quantized version of the machine learning model to be a version of the machine learning model among a plurality of quantized versions of the machine learning model, (Fig. 1, [0007]-[0018] [0027]-[0030] [0038]-[0061] the quantization engine defines various precisions of the quantized NN and identifies the first precision for the quantized NN to execute)
However, Donnelly fails to explicitly disclose “matching, by an inference engine, the first set of feature values for the machine learning model to the second set of feature values included in a lookup table representing the machine learning model, wherein the lookup table comprises a plurality of mappings between a plurality of sets of feature values to a plurality of identifiers for a plurality of quantized versions of the machine learning model; retrieving a first identifier that is mapped to the second set of feature values within the lookup table; and the first quantized version of the machine learning model that corresponds to the first identifier.”
Mathur discloses matching, by an inference engine, a first set of feature values for the machine learning model to a second set of feature values included in a lookup table representing the machine learning model, wherein the lookup table comprises a plurality of mappings between a plurality of sets of feature values to a plurality of identifiers for a plurality of quantized versions of the machine learning model; ([0009]-[0013] [0018]-[0026] [0040] claim 13, the metadata and values related to each ML model, such as the version with a serialization number, the identifier or name of the ML model, configuration parameters, etc., are stored in an entry or row of the table, linked list, etc., and all are correlated with the ML model identifier for the ML model associated with the inference request; this implicitly discloses the matching, which is well known to one of ordinary skill in the art.)
retrieving a first identifier that is mapped to the second set of feature values within the lookup table; ([0009]-[0013] [0018]-[0026] [0040] claim 13, identifying the model and retrieving the model-specific configuration parameters; the values related to each ML model, such as the version with a serialization number, the identifier or name of the ML model, configuration parameters, etc., are stored in an entry or row of the table, linked list, etc., are correlated with the ML model identifier, and can be retrieved; this implicitly discloses the mapping, which is well known to one of ordinary skill in the art.) and
the first quantized version of the machine learning model that corresponds to the first identifier ([0009]-[0013] [0018]-[0026] [0040] claim 13, identifying the model and retrieving the model-specific configuration parameters; the values related to each ML model, such as the version with a serialization number and the identifier or name of the ML model, are stored in an entry or row of the table, linked list, etc.)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Mathur’s method of storing and retrieving ML configuration parameters into Donnelly’s invention, as both relate to the same field of endeavor of versions of ML models. The motivation to combine these references, as proposed above, is at least that Mathur’s storing of ML configuration parameters in tables would provide an additional information storage method for Donnelly’s system. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention that storing ML configuration parameters in a data structure would help to provide more useful information about which ML version to use.
However, Donnelly and Mathur fail to explicitly disclose “defines the default quantized version of the machine learning model to be the version, that has a lowest quantization resolution or lowest resource overhead that falls within a threshold from a most accurate version of the machine learning model;”
Burger discloses defines the default quantized version of the machine learning model to be the version, that has a lowest quantization resolution or lowest resource overhead that falls within a threshold from a most accurate version of the machine learning model; ([0053] [0118]-[0131] defining the quantized version of the NN by comparing its accuracy against a predetermined threshold (the lowest accuracy deemed sufficient) or by reduced resource usage at sufficient accuracy)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Burger’s quantized NN into Mathur and Donnelly’s invention, as they relate to the same field of endeavor of versions of ML models. The motivation to combine these references, as proposed above, is at least that Burger’s quantizing of an NN into various versions would provide additional NN quantization options for Mathur and Donnelly’s system. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention that providing NN quantization with various versions would help to improve the tradeoff between training accuracy and resource usage.
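For illustration only, the matching, retrieving, and executing steps of claim 18 as mapped above can be sketched as a dictionary lookup followed by dispatch to the identified version. This is a minimal sketch under stated assumptions and does not represent the applicant's implementation or any cited reference: the names lookup_table, DEFAULT_ID, make_version, and infer are hypothetical, exact-match lookup on a tuple of feature values is assumed, and the "models" are stand-in functions that merely round a sum to different step sizes.

```python
def make_version(steps_per_unit):
    """Build a stand-in 'quantized model': it sums the features and rounds
    the result to a grid with `steps_per_unit` steps per unit (hypothetical)."""
    def predict(features):
        return round(sum(features) * steps_per_unit) / steps_per_unit
    return predict


# Hypothetical quantized versions, keyed by identifier.
versions = {
    "int4": make_version(1),   # coarse grid: lower resolution
    "int8": make_version(4),   # finer grid: higher resolution
}

# Hypothetical lookup table: sets of feature values -> version identifier.
lookup_table = {
    (1.0, 2.0): "int4",
    (3.5, 0.25): "int8",
}
DEFAULT_ID = "int8"  # identifier of the default version (assumed fallback)


def infer(features):
    """Match the features against the table, retrieve the mapped identifier
    (or the default), and execute the corresponding version."""
    version_id = lookup_table.get(tuple(features), DEFAULT_ID)
    prediction = versions[version_id](features)
    return version_id, prediction


print(infer([1.0, 2.0]))
print(infer([3.5, 0.25]))
print(infer([9.0, 9.0]))  # no table entry, so the default version is used
```

Keeping the default identifier alongside the table is one possible reading of the claim 20 limitation; a range-based or quantized-key match, as recited in claim 19, would replace the exact tuple lookup.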
In regard to claim 19, Donnelly, Mathur, and Burger disclose the computer-implemented method of claim 18; the rejection is incorporated herein.
Donnelly further discloses wherein the plurality of sets of feature values included in the lookup table comprise at least one of a range of feature values, a quantized feature value, or multiple feature values for a single feature. ([0009]-[0018] [0027]-[0030][0045] [0052]-[0076] values in the data store having ranges within the min. and max. values, error, cost, range, etc.)
In regard to claim 20, Donnelly, Mathur, and Burger disclose the computer-implemented method of claim 18; the rejection is incorporated herein.
However, Donnelly and Burger fail to explicitly disclose “wherein the lookup table further comprises a second identifier for a default quantized version of the machine learning model.”
Mathur discloses wherein the lookup table further comprises a second identifier for a default quantized version of the machine learning model. ([0040] [0018]-[0026] claim 13, storing the parameters associated with each ML model in a data structure, including the version with a serialization number and the identifier or name of the ML model.)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Mathur’s method of storing and retrieving ML configuration parameters into Burger and Donnelly’s invention, as they relate to the same field of endeavor of versions of ML models. The motivation to combine these references, as proposed above, is at least that Mathur’s storing of ML configuration parameters in tables would provide an additional information storage method for Burger and Donnelly’s system. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention that storing ML configuration parameters in a data structure would help to provide more useful information about which ML version to use.
Response to Arguments
Applicant’s arguments with respect to claims 1-20 filed on 2/6/2026 have been considered but are moot because the arguments do not apply to the current rejection.
Please see above for the detailed 35 USC § 101 rejection.
Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure.
U.S. Patent Documents
PATENT: US 20210081806 A1
DATE: 2021-03-18
INVENTOR(S): Chai et al.
TITLE: USING A RUNTIME ENGINE TO FACILITATE DYNAMIC ADAPTATION OF DEEP NEURAL NETWORKS FOR EFFICIENT PROCESSING
Chai et al. disclose: “The disclosed embodiments relate to a system that facilitates dynamic runtime execution of a deep neural network (DNN). During operation, the system receives a model, a set of weights and runtime metadata for the DNN. The system also obtains code to perform inference-processing operations for the DNN. Next, the system compiles code to implement a runtime engine that facilitates throttling operations during execution of the inference-processing operations, wherein the runtime engine conserves computing resources by selecting portions of the inference-processing operations to execute based on the runtime metadata…” (see abstract).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XUYANG XIA whose telephone number is (571)270-3045. The examiner can normally be reached Monday-Friday 8am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch, can be reached at 571-272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
XUYANG XIA
Primary Examiner
Art Unit 2143
/XUYANG XIA/Primary Examiner, Art Unit 2143