Notice of Pre-AIA or AIA Status
This action is in response to the amendments filed on 02/05/2026. Claims 1-27 are pending in the application and have been examined.
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/05/2026 has been entered.
The status of the claims is as follows.
Claims 1, 8, 15 and 22 are amended. Claims 1-27 are currently pending.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 8-13, 15-17, 20, and 22-27 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US20190050710A1, hereinafter “Wang”) in view of Li et al. (CN114065902A, hereinafter “Li”).
Regarding Claim 1,
Wang discloses a processor, comprising: one or more circuits; (Wang [0033]; “The various components shown in FIG. 2 may be implemented in hardware, software, firmware, including one or more signal processing and/or application specific integrated circuits, or a combination of thereof.”)
iteratively adjust precision of one or more parameters associated with one or more portions of one or more neural networks based, at least in part, on one or more performance metrics of the one or more portions …; (Wang [0064]; “As discussed above, in some embodiments, the predefined measure of information loss is the Jensen-Shannon Divergence that measures the difference between two statistical distributions. In this case, the statistical distributions are the collection of full layer responses for all layers (or in respective layers) in the full-precision trained model (e.g., model 106′ or 106″), and the quantized candidate model with a particular combination of bit-widths for its layers” wherein the information loss defined by the Jensen-Shannon divergence reads on performance metrics of portions of the neural network. Notably, this information loss
Wang [0045]; “The training process typically includes two steps, forward propagation and backward propagation, that are repeated multiple times until a predefined convergence condition is met. In the forward propagation, the set of weights for different layers are applied to the input data and intermediate results from the previous layers. In the backward propagation, the margin of error of the output is measured, and the weights are adjusted accordingly to decrease the error. The activation function can be linear, rectified linear unit, sigmoid, hyperbolic tangent, or other types.” wherein the sets of weights of different layers read on weight parameters associated with portions of the neural network; wherein the weights are adjusted based on the margin of error of the output during backpropagation, reading on adjusting precision of weight parameters based on performance metrics.
Wang [0005]; “preferred values of the respective reduced bit-widths are determined through multiple iterations of forward propagation through the first neural network model using a validation data set while each of two or more layers of the first neural network model is expressed with different degrees of quantization corresponding to different reduced bit-widths until a predefined information loss threshold is met by respective response statistics of the two or more layers; and generating a reduced neural network model that includes the plurality of layers, wherein each layer of two or more the plurality of layers includes a respective set of quantized parameters, and each quantized parameter is expressed with the preferred values of the respective reduced bit-widths for the layer as determined through the multiple iterations” which reads on adjusting the precision of weight parameters iteratively
Wang [Figure 5]; [figure image not reproduced]
wherein the information loss interpreted as a performance metric of portions of the neural network is used during multiple iterations of forward propagation comprising different degrees of quantization to achieve preferred reduced bit-widths until an information loss threshold is met, thus reading on iteratively adjusting precision (forward propagation comprising in part quantization for reduced bit-widths) of one or more parameters associated with one or more portions of the neural network (network layer parameters upon which the quantization and bit-width reduction are performed) based, at least in part, on one or more performance metrics (based on the information loss interpreted as a performance metric)) during training iterations of the one or more neural networks (forward propagation to determine preferred values of the respective reduced bit-widths across multiple iterations inherently reading on training iterations of the neural network)
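For illustration only, the per-layer bit-width search described in Wang [0005] and [0064] can be sketched as below. This is a minimal sketch under stated assumptions, not Wang's implementation: each "layer" is modeled as a single linear unit evaluated independently on the same validation inputs, quantization is assumed to be symmetric and uniform, and the helper names (`quantize`, `histogram`, `js_divergence`, `choose_bit_width`) are hypothetical. The bit-width is lowered one step per iteration for as long as the Jensen-Shannon divergence between the full-precision and quantized response histograms stays within the information loss threshold.

```python
import math
import random

def quantize(values, bits):
    """Assumed symmetric uniform quantization to signed `bits`-bit levels (returned dequantized)."""
    qmax = 2 ** (bits - 1) - 1  # bits >= 2, otherwise qmax would be 0
    scale = (max(abs(v) for v in values) / qmax) or 1.0
    return [round(v / scale) * scale for v in values]

def histogram(values, lo, hi, bins=32):
    """Normalized histogram over [lo, hi]; out-of-range values are placed in the edge bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = max(0, min(int((v - lo) / width), bins - 1))
        counts[idx] += 1
    return [c / len(values) for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the result lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def responses(weights, inputs):
    """Scalar response of one linear unit for each validation input (simplifying assumption)."""
    return [sum(w * x for w, x in zip(weights, sample)) for sample in inputs]

def choose_bit_width(weights, inputs, threshold, max_bits=8, min_bits=2):
    """Reduce the bit-width while the information loss stays within `threshold`."""
    ref = responses(weights, inputs)
    bound = max(abs(r) for r in ref) or 1.0
    p = histogram(ref, -bound, bound)
    chosen = max_bits  # kept at the cap if even max_bits exceeds the threshold
    for bits in range(max_bits, min_bits - 1, -1):
        q = histogram(responses(quantize(weights, bits), inputs), -bound, bound)
        if js_divergence(p, q) > threshold:
            break
        chosen = bits
    return chosen

rng = random.Random(0)
inputs = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(200)]
layers = {"layer1": [rng.gauss(0, 0.5) for _ in range(16)],
          "layer2": [rng.gauss(0, 0.05) for _ in range(16)]}
print({name: choose_bit_width(w, inputs, threshold=0.01) for name, w in layers.items()})
```

Each layer ends up with its own bit-width, which mirrors the "differently adjusted precisions" for different portions relied on in the mapping above.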
and perform the respective training iterations to update … one or more parameters of different portions of the one or more neural networks having differently adjusted precisions during the respective training iterations (Wang [Figure 5]; [figure image not reproduced]
wherein generating the reduced neural network model including quantized parameters expressed with the preferred values of the respective reduced bit-widths for the layer as determined through the multiple training iterations reads on updating parameters of different portions of the one or more neural networks having differently adjusted precisions during training iterations)
Wang does not explicitly disclose, but Li discloses, iteratively adjusting precision … prior to respective training iterations of the one or more neural networks (Li [Page 4 Lines 14-27]; “In view of at least one problem of the prior art, the present invention provides a layer training method of a neural network, the layer including one or more weight parameters stored with a first accuracy, the training method comprising:
s101: providing an input to the layer;
s102: selecting a second precision that is lower in precision than the first precision;
s103: quantizing the one or more weight parameters at the second precision;
s104: computing an output of the layer based on the input with the one or more weight parameters at a second precision;
s105: updating, by a back-propagation algorithm, the one or more weight parameters with a first precision.
According to an aspect of the invention, the layer training method further comprises: and adjusting the second precision, and repeating the steps S101, S102, S103 and S104.
… According to an aspect of the invention, the layer training method further comprises: computing an output of the layer based on the input with the one or more weight parameters of the first precision.
According to an aspect of the invention, the layer training method further comprises: all layers of the neural network are trained through the steps S101, S102, S103, S104 and S105” wherein the Examiner interprets the analysis and selection of a second, updated precision (S102) prior to layer output computation and weight parameter update (S104, S105) as adjusting a selected precision prior to a training iteration, and the repetition of steps S101-S105 thus reads on the repeated selection of precisions prior to training across iterations)
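The sequence of Li steps S101-S105 quoted above can be sketched as follows for a single linear layer. This is a hedged illustration, not Li's implementation: the master weights are kept as Python floats (standing in for the first precision), `bit_schedule` is a hypothetical list of second precisions selected before each pass, and the gradient is passed through the quantizer with a straight-through estimator, which is an assumption the quoted passage does not state.

```python
import random

def quantize(weights, bits):
    """Assumed symmetric uniform quantization of float weights to `bits`-bit levels (dequantized)."""
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(w) for w in weights) / qmax) or 1.0
    return [round(w / scale) * scale for w in weights]

def train_layer(weights, data, bit_schedule, lr=0.05):
    """Hypothetical sketch of Li S101-S105 for one linear layer with squared-error loss."""
    weights = list(weights)                            # master copy at the first precision
    for bits in bit_schedule:                          # S102: select a lower second precision
        for x, target in data:                         # S101: provide an input to the layer
            wq = quantize(weights, bits)               # S103: quantize weights at second precision
            y = sum(w * xi for w, xi in zip(wq, x))    # S104: layer output with quantized weights
            err = y - target                           # derivative of 0.5 * err**2 w.r.t. y
            # S105: back-propagated update applied to the first-precision weights;
            # the straight-through estimator (assumption) treats d(wq)/d(w) as 1.
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights

def mse(weights, data, bits=8):
    """Mean squared error of the layer evaluated with `bits`-bit quantized weights."""
    wq = quantize(weights, bits)
    return sum((sum(w * xi for w, xi in zip(wq, x)) - t) ** 2 for x, t in data) / len(data)

rng = random.Random(0)
true_w = [0.5, -1.0, 0.25, 2.0]
data = []
for _ in range(200):
    x = [rng.gauss(0, 1) for _ in true_w]
    data.append((x, sum(w * xi for w, xi in zip(true_w, x))))

start = [0.0] * len(true_w)
trained = train_layer(start, data, bit_schedule=[8, 8, 6, 6, 4])
print(mse(start, data), mse(trained, data))
```

The point illustrated is ordering: the precision for each pass is fixed before that pass's forward computation and weight update, which is the feature relied on to modify Wang.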
perform the respective training iterations to update via backpropagation one or more parameters of different portions of the one or more neural networks having differently adjusted precisions (Li [Page 4 Lines 20-21]; “s105: updating, by a back-propagation algorithm, the one or more weight parameters with a first precision.” wherein the iterations perform their update of the one or more parameters via backpropagation)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Wang’s precision adjustments for computing model parameter updates so that they are performed prior to the model parameter update training iterations, in a similar fashion to Li’s precision adjustments. One would have been motivated to do so because “Different levels of accuracy required in the application must be specified during the model preparation phase, and the model is trained for each level of progress.” (Li [Page 4 Lines 9-10])
Regarding Claim 2,
Wang/Li teaches the processor of Claim 1 (and thus the rejection of Claim 1 is incorporated). Wang/Li further discloses wherein the one or more performance metrics indicate sensitivity of the one or more portions of the one or more neural networks for quantization; (Wang [0064]; “As discussed above, in some embodiments, the predefined measure of information loss is the Jensen-Shannon Divergence that measures the difference between two statistical distributions. In this case, the statistical distributions are the collection of full layer responses for all layers (or in respective layers) in the full-precision trained model (e.g., model 106′ or 106″), and the quantized candidate model with a particular combination of bit-widths for its layers” wherein the Jensen-Shannon divergence between a full-precision trained model and a quantized candidate model with reduced per-layer bit-widths reads on a performance metric indicating the sensitivity of the neural network to quantization.
Wang [0062]; “The adaptive bit-width of the model refers to the characteristic that the respective bit-width for storing the set of parameters (e.g., weights and bias) for each layer of the model is specifically selected for that set of parameters (e.g., in accordance with the distribution and range of the parameters). Specifically, the validation data set is used as input in a forward pass through the pruned slender full-precision network (e.g., model 106″), and the statistical distribution of the response values in each layer is collected. Then different configurations of bit-width and layer combinations are prepared as candidates for evaluation. For each candidate model, the validation data set is used as input in a forward pass through the candidate model, and the statistical distribution of the response values in each layer is collected. Then, the candidate is evaluated based on the amount of information loss that has resulted from the quantization applied to the candidate model. In some embodiments, Jensen-Shannon divergence between the two statistical distributions for each layer (or for the model as a whole) is used to identify the optimal bit-widths with the least information loss for that layer.” wherein the bit-width selected for each layer's set of parameters is based on an evaluation of the effect of quantization on the model's layer responses during a forward pass, thus reading on the performance metrics indicating a sensitivity of the network to quantization during training.)
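For reference, the Jensen-Shannon divergence relied on in Wang [0062] and [0064] can be written out as below. The notation is illustrative and not taken from Wang: $P_\ell$ is the normalized histogram of layer $\ell$'s responses in the full-precision model, $Q_\ell^{(b)}$ is the corresponding histogram when layer $\ell$ is quantized to $b$ bits, and $\tau$ stands for the predefined information loss threshold of Wang [0005]. The last line is one possible reading of that threshold test, in which the smallest bit-width whose divergence does not exceed $\tau$ is kept.

```latex
\mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2}\, D_{\mathrm{KL}}(P \,\|\, M) + \tfrac{1}{2}\, D_{\mathrm{KL}}(Q \,\|\, M),
\qquad M = \tfrac{1}{2}(P + Q),
\qquad D_{\mathrm{KL}}(P \,\|\, M) = \sum_i P_i \log_2 \frac{P_i}{M_i}

0 \le \mathrm{JSD}(P_\ell \,\|\, Q_\ell^{(b)}) \le 1,
\qquad \mathrm{JSD}(P_\ell \,\|\, Q_\ell^{(b)}) = 0 \iff P_\ell = Q_\ell^{(b)}

b_\ell^{*} = \min \{\, b : \mathrm{JSD}(P_\ell \,\|\, Q_\ell^{(b)}) \le \tau \,\}
```

A larger divergence at a given $b$ indicates a layer that is more sensitive to quantization, which is the sense in which the metric is read on the claimed sensitivity.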
Regarding Claim 3,
Wang/Li teaches the processor of Claim 1 (and thus the rejection of Claim 1 is incorporated). Wang/Li further discloses wherein the one or more portions comprise one or more layers of the one or more neural networks; (Wang [0004]; “This disclosure describes a technique for producing a high-accuracy, lightweight machine learning model with adaptive bit-widths for the parameters of different layers of the model. Specifically, the conventional training phase is modified to promote parameters of each layer of the model toward integer values within an 8-bit range.” wherein the portions of the neural networks being trained comprise layers of the neural networks)
Regarding Claim 4,
Wang/Li teaches the processor of Claim 1 (and thus the rejection of Claim 1 is incorporated). Wang/Li further discloses wherein the one or more circuits are to adjust the precision of the weight parameters by at least changing a representation of weight parameters for a first layer of the one or more neural networks at a first time and changing a representation of weight parameters for a second layer of the one or more neural networks at a second time; (Wang [0019]; “In some embodiments, during training, integer (INT) weight regularization and 8-bit quantization techniques are applied to push the values of the full-precision parameters of the deep learning model 106 toward their corresponding integer values, and reduce the value ranges of the parameters such that they fall within the dynamic range of a predefined reduced maximum bit-width (e.g., 8 bits).” wherein the backpropagation-based training process changes the weights of multiple layers in a neural network by iterating backward from the last layer to the first layer, updating the weight parameters along the way. As such, pushing full-precision parameters toward integer values through iterative INT weight regularization and 8-bit quantization during back-propagation reads on changing the weight parameter representation for first and second layers of the multiple layers; because back-propagation proceeds as a backward pass through the neural network layers, the layers are processed sequentially in reverse order at different times, which are interpreted as the first and second times at which the weight parameter representations are updated)
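To illustrate what pushing full-precision parameters toward integer values within an 8-bit range (Wang [0019]) could look like, a minimal sketch follows. The quoted passage does not give the form of the INT weight regularizer, so the squared-distance-to-nearest-integer penalty, the step size, and the helper names (`int_regularization`, `regularize_step`, `clip_to_bits`) are all assumptions; in Wang such a term would be combined with the ordinary training loss, which is omitted here.

```python
def int_regularization(weights, lam):
    """Hypothetical penalty lam * sum((w - round(w))**2) that pulls weights toward integers."""
    return lam * sum((w - round(w)) ** 2 for w in weights)

def int_regularization_grad(weights, lam):
    # round(w) is treated as a constant, so d/dw of (w - round(w))**2 is 2 * (w - round(w)).
    return [2 * lam * (w - round(w)) for w in weights]

def clip_to_bits(weights, bits=8):
    """Clip values into the signed integer range of `bits` bits, e.g. [-128, 127] for 8 bits."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [min(max(w, lo), hi) for w in weights]

def regularize_step(weights, lam=0.5, lr=0.5, bits=8):
    """One gradient step on the regularizer alone, followed by range clipping."""
    grad = int_regularization_grad(weights, lam)
    return clip_to_bits([w - lr * g for w, g in zip(weights, grad)], bits)

w = [0.3, 2.7, -1.4, 300.2]
for _ in range(10):
    w = regularize_step(w)
print(w)  # values close to 0, 3, -1 and exactly 127 after clipping
```

With `lam=0.5` and `lr=0.5`, each step halves every weight's distance to its nearest integer, and the out-of-range value is clipped into the 8-bit range on the first step.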
Regarding Claim 5,
Wang/Li teaches the processor of Claim 1 (and thus the rejection of Claim 1 is incorporated). Wang/Li further discloses wherein the precision of the weight parameters is indicated at least through one or more numbers of bits; (Wang [0018]; “As shown in FIG. 1, the model generation system 102 generates a full-precision deep learning model 106 (e.g., a CNN with two or more hidden layers, and with its parameters (e.g., weights and biases) expressed in a single-precision floating point format that occupies 32 bits (e.g., FP32)) through training using a corpus of training data 108 (e.g., input data with corresponding output data). As used herein, “full-precision” refers to floating-point precision, and may include half-precision (16-bit), single precision (32-bit), double-precision (64-bit), quadruple-precision (128-bit), Octuple-precision (256-bit), and other extended precision formats (40-bit or 80-bit)”)
Regarding Claim 6,
Wang/Li teaches the processor of Claim 1 (and thus the rejection of Claim 1 is incorporated). Wang/Li further discloses wherein the one or more circuits are to iteratively adjust the precision of the weight parameters through one or more binary integer linear programming (BILP) processes; (Wang [0045]; “The training process typically includes two steps, forward propagation and backward propagation, that are repeated multiple times until a predefined convergence condition is met. In the forward propagation, the set of weights for different layers are applied to the input data and intermediate results from the previous layers. In the backward propagation, the margin of error of the output is measured, and the weights are adjusted accordingly to decrease the error. The activation function can be linear, rectified linear unit, sigmoid, hyperbolic tangent, or other types.” wherein the adjustment of the weights through an activation function of rectified linear units is interpreted as adjustment of weight parameters through BILP processes)
Claim 8 recites a system comprising the same processor and instructions as Claim 1. Thus, Claim 8 is rejected for the reasons set forth in the rejection of Claim 1.
Regarding Claim 9,
Wang/Li teaches the system of Claim 8 (and thus the rejection of Claim 8 is incorporated). Wang/Li further discloses to iteratively adjust the precision of the weight parameters for each portion of the one or more portions at different times using a set of bit-width values calculated based, at least in part, on a threshold value; (Wang [0005]; “preferred values of the respective reduced bit-widths are determined through multiple iterations of forward propagation through the first neural network model using a validation data set while each of two or more layers of the first neural network model is expressed with different degrees of quantization corresponding to different reduced bit-widths until a predefined information loss threshold is met by respective response statistics of the two or more layers; and generating a reduced neural network model that includes the plurality of layers, wherein each layer of two or more the plurality of layers includes a respective set of quantized parameters, and each quantized parameter is expressed with the preferred values of the respective reduced bit-widths for the layer as determined through the multiple iterations” which reads on adjusting the precision of weight parameters iteratively through the use of a threshold)
Regarding Claim 10,
Wang/Li teaches the system of Claim 8 (and thus the rejection of Claim 8 is incorporated). Wang/Li further discloses to iteratively adjust the precision of the weight parameters for each portion of the one or more portions at different times based, at least in part, on training progress; (Wang [0045]; “The training process typically includes two steps, forward propagation and backward propagation, that are repeated multiple times until a predefined convergence condition is met. In the forward propagation, the set of weights for different layers are applied to the input data and intermediate results from the previous layers. In the backward propagation, the margin of error of the output is measured, and the weights are adjusted accordingly to decrease the error. The activation function can be linear, rectified linear unit, sigmoid, hyperbolic tangent, or other types.” wherein the weights are adjusted to reduce the error in each round of forward and backward propagation, which is repeated until a predefined convergence condition is met, thus reading on adjusting the precision of weight parameters based on training progress)
Regarding Claim 11,
Wang/Li teaches the system of Claim 8 (and thus the rejection of Claim 8 is incorporated). Wang/Li further discloses to adjust the precision of weight parameters, during training, for a first portion of the one or more portions at a first time and adjust the precision of weight parameters, during training, for a second portion of the one or more portions at a time different from the first time; (Wang [0045]; “The training process typically includes two steps, forward propagation and backward propagation, that are repeated multiple times until a predefined convergence condition is met. In the forward propagation, the set of weights for different layers are applied to the input data and intermediate results from the previous layers. In the backward propagation, the margin of error of the output is measured, and the weights are adjusted accordingly to decrease the error. The activation function can be linear, rectified linear unit, sigmoid, hyperbolic tangent, or other types.” wherein the sets of weights of different layers read on weight parameters associated with portions of the neural network
Wang [0019]; “In some embodiments, during training, integer (INT) weight regularization and 8-bit quantization techniques are applied to push the values of the full-precision parameters of the deep learning model 106 toward their corresponding integer values, and reduce the value ranges of the parameters such that they fall within the dynamic range of a predefined reduced maximum bit-width (e.g., 8 bits).” wherein the backpropagation-based training process involves changing weights of multiple layers in a neural network by iterating backward from the last layer to the first layer in the neural network, all the while updating its weight parameters throughout the iteration. As such, the back-propagation process of pushing full-precision parameters through iterative INT weight regularization and 8-bit quantization reads on changing the precision of weights for first and second layers of the multiple layers iterated throughout backward propagation; wherein back-propagation occurs through a backward pass of the neural network layers, thus reading on layers being sequentially parsed in reverse at different times and interpreted as a first and second time for the precision of weights to be updated)
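The point that a backward pass reaches different layers at different times can be illustrated with a toy example. This is a minimal sketch under simplifying assumptions (a chain of scalar linear layers, a squared-error loss, and a hypothetical helper `backward_update_order`), not a description of Wang's training procedure. The step index in the log stands in for the "first time" and "second time" discussed above.

```python
def backward_update_order(weights, x, target, lr=0.1):
    """Scalar chain y = w[n-1] * ... * w[1] * w[0] * x trained on 0.5 * (y - target)**2.

    Returns the updated weights and a log of (step, layer_index) pairs showing
    that the backward pass updates the last layer first and the first layer last.
    """
    acts = [x]                       # acts[i] is the input to layer i
    for w in weights:
        acts.append(w * acts[-1])
    grad = acts[-1] - target         # dL/dy
    updated = list(weights)
    log = []
    for step, i in enumerate(reversed(range(len(weights)))):
        grad_w = grad * acts[i]      # dL/dw_i
        grad = grad * weights[i]     # dL/d(input of layer i), using the pre-update weight
        updated[i] = weights[i] - lr * grad_w
        log.append((step, i))
    return updated, log

new_w, order = backward_update_order([1.0, 1.0, 1.0], x=1.0, target=0.0)
print(order)  # [(0, 2), (1, 1), (2, 0)]
```

The log shows the last layer updated at step 0 and the first layer at step 2, i.e. the layers' weights change at different times within a single backward pass.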
Regarding Claim 12,
Wang/Li teaches the system of Claim 8 (and thus the rejection of Claim 8 is incorporated). Wang/Li further discloses wherein the one or more performance metrics indicate sensitivity of the one or more portions of the one or more neural networks using the precision of weight parameters; (Wang [0064]; “As discussed above, in some embodiments, the predefined measure of information loss is the Jensen-Shannon Divergence that measures the difference between two statistical distributions. In this case, the statistical distributions are the collection of full layer responses for all layers (or in respective layers) in the full-precision trained model (e.g., model 106′ or 106″), and the quantized candidate model with a particular combination of bit-widths for its layers” wherein the Jensen-Shannon divergence between a full-precision trained model and a quantized candidate model with reduced per-layer bit-widths reads on a performance metric indicating the sensitivity of the neural network to quantization)
Regarding Claim 13,
Wang/Li teaches the system of Claim 8 (and thus the rejection of Claim 8 is incorporated). Wang/Li further discloses wherein the one or more portions correspond to one or more layers of the one or more neural networks; (Wang [0004]; “This disclosure describes a technique for producing a high-accuracy, lightweight machine learning model with adaptive bit-widths for the parameters of different layers of the model. Specifically, the conventional training phase is modified to promote parameters of each layer of the model toward integer values within an 8-bit range” wherein the portions of the neural networks being trained comprise layers of the neural networks)
Claim 15 recites a non-transitory machine-readable medium storing the same instructions as those executed on the processor of Claim 1. Thus, Claim 15 is rejected for reasons set forth in the rejection of Claim 1.
Regarding Claim 16,
Wang/Li teaches the non-transitory machine-readable medium of Claim 15 (and thus the rejection of Claim 15 is incorporated). Wang/Li further discloses wherein the set of instructions, which if performed by the one or more processors, cause the one or more processors to iteratively adjust the precision of the weight parameters by at least: adjusting, at a first time, the precision of the weight parameters based, at least in part, on the one or more performance metrics and a first threshold value; adjusting, at a second time, the precision of the weight parameters based, at least in part, on the one or more performance metrics and a second threshold value; (Wang [0045]; “The training process typically includes two steps, forward propagation and backward propagation, that are repeated multiple times until a predefined convergence condition is met. In the forward propagation, the set of weights for different layers are applied to the input data and intermediate results from the previous layers. In the backward propagation, the margin of error of the output is measured, and the weights are adjusted accordingly to decrease the error. The activation function can be linear, rectified linear unit, sigmoid, hyperbolic tangent, or other types.” wherein the sets of weights of different layers read on weight parameters associated with portions of the neural network; wherein the weights adjusted across the forward and back propagation-based training process conducted iteratively across layers of the neural network read on adjusting precision of the weights at a first/second iteration and time.
Wang [0005]; “preferred values of the respective reduced bit-widths are determined through multiple iterations of forward propagation through the first neural network model using a validation data set while each of two or more layers of the first neural network model is expressed with different degrees of quantization corresponding to different reduced bit-widths until a predefined information loss threshold is met by respective response statistics of the two or more layers; and generating a reduced neural network model that includes the plurality of layers, wherein each layer of two or more the plurality of layers includes a respective set of quantized parameters, and each quantized parameter is expressed with the preferred values of the respective reduced bit-widths for the layer as determined through the multiple iterations” which reads on adjusting the precision of weight parameters iteratively through the use of a first threshold associated with the first layer and a second threshold associated with the second layer.)
Regarding Claim 17,
Wang/Li teaches the non-transitory machine-readable medium of Claim 15 (and thus the rejection of Claim 15 is incorporated). Wang/Li further discloses wherein the set of instructions, which if performed by the one or more processors, cause the one or more processors to compute a set of bit-widths through at least one or more linear programming processes to adjust the precision of the weight parameters; (Wang [0005]; “preferred values of the respective reduced bit-widths are determined through multiple iterations of forward propagation through the first neural network model using a validation data set while each of two or more layers of the first neural network model is expressed with different degrees of quantization corresponding to different reduced bit-widths until a predefined information loss threshold is met by respective response statistics of the two or more layers; and generating a reduced neural network model that includes the plurality of layers, wherein each layer of two or more the plurality of layers includes a respective set of quantized parameters, and each quantized parameter is expressed with the preferred values of the respective reduced bit-widths for the layer as determined through the multiple iterations”
Wang [0045]; “The training process typically includes two steps, forward propagation and backward propagation, that are repeated multiple times until a predefined convergence condition is met. In the forward propagation, the set of weights for different layers are applied to the input data and intermediate results from the previous layers. In the backward propagation, the margin of error of the output is measured, and the weights are adjusted accordingly to decrease the error. The activation function can be linear, rectified linear unit, sigmoid, hyperbolic tangent, or other types.” wherein the adjustment of the weights through an activation function of rectified linear units is interpreted as one or more linear programming processes to adjust the precision of weight parameters.)
Regarding Claim 20,
Wang/Li teaches the non-transitory machine-readable medium of Claim 15 (and thus the rejection of Claim 15 is incorporated). Wang/Li further discloses wherein the one or more performance metrics indicate sensitivity of the one or more portions of the one or more neural networks based, at least in part, on a precision of weight parameters; (Wang [0064]; “As discussed above, in some embodiments, the predefined measure of information loss is the Jensen-Shannon Divergence that measures the difference between two statistical distributions. In this case, the statistical distributions are the collection of full layer responses for all layers (or in respective layers) in the full-precision trained model (e.g., model 106′ or 106″), and the quantized candidate model with a particular combination of bit-widths for its layers” wherein the Jensen-Shannon divergence between a full-precision trained model and a quantized candidate model with reduced per-layer bit-widths reads on a performance metric indicating the sensitivity of the neural network to quantization.
Wang [0062]; “The adaptive bit-width of the model refers to the characteristic that the respective bit-width for storing the set of parameters (e.g., weights and bias) for each layer of the model is specifically selected for that set of parameters (e.g., in accordance with the distribution and range of the parameters). Specifically, the validation data set is used as input in a forward pass through the pruned slender full-precision network (e.g., model 106″), and the statistical distribution of the response values in each layer is collected. Then different configurations of bit-width and layer combinations are prepared as candidates for evaluation. For each candidate model, the validation data set is used as input in a forward pass through the candidate model, and the statistical distribution of the response values in each layer is collected. Then, the candidate is evaluated based on the amount of information loss that has resulted from the quantization applied to the candidate model. In some embodiments, Jensen-Shannon divergence between the two statistical distributions for each layer (or for the model as a whole) is used to identify the optimal bit-widths with the least information loss for that layer.” wherein the bit-width selected for each layer's set of parameters is based on an evaluation of the effect of quantization on the model's layer responses during a forward pass, thus reading on the performance metrics indicating a sensitivity based on weight parameters)
Claim 22 recites the method executed by the processor of Claim 1, and is thus rejected for reasons set forth in the rejection of Claim 1.
Regarding Claim 23,
Wang/Li teaches the method of Claim 22 (and thus the rejection of Claim 22 is incorporated). Wang/Li further discloses adjusting a first set of weight parameters associated with a first portion of the one or more portions to a first precision and adjusting a second set of weight parameters associated with a second portion of the one or more portions to a second precision; (Wang [0005]; “preferred values of the respective reduced bit-widths are determined through multiple iterations of forward propagation through the first neural network model using a validation data set while each of two or more layers of the first neural network model is expressed with different degrees of quantization corresponding to different reduced bit-widths until a predefined information loss threshold is met by respective response statistics of the two or more layers; and generating a reduced neural network model that includes the plurality of layers, wherein each layer of two or more the plurality of layers includes a respective set of quantized parameters, and each quantized parameter is expressed with the preferred values of the respective reduced bit-widths for the layer as determined through the multiple iterations” which reads on adjusting to a first and second precision of a first and second set of weight parameters in a first and second iteration, respectively)
Regarding Claim 24,
Wang/Li teaches the method of Claim 22 (and thus the rejection of Claim 22 is incorporated). Wang/Li further discloses wherein the one or more portions comprises one or more layers of the one or more neural networks; (Wang [0004]; “This disclosure describes a technique for producing a high-accuracy, lightweight machine learning model with adaptive bit-widths for the parameters of different layers of the model. Specifically, the conventional training phase is modified to promote parameters of each layer of the model toward integer values within an 8-bit range” wherein the portions of the neural networks being trained comprise layers of the neural networks)
Regarding Claim 25,
Wang/Li teaches the method of Claim 22 (and thus the rejection of Claim 22 is incorporated). Wang/Li further discloses wherein the one or more performance metrics indicate a sensitivity for each layer of the one or more neural networks associated with a set of weight parameters; (Wang [0064]; “As discussed above, in some embodiments, the predefined measure of information loss is the Jensen-Shannon Divergence that measures the difference between two statistical distributions. In this case, the statistical distributions are the collection of full layer responses for all layers (or in respective layers) in the full-precision trained model (e.g., model 106′ or 106″), and the quantized candidate model with a particular combination of bit-widths for its layers” wherein the Jensen-Shannon divergence between the response distributions of a full-precision trained model and a reduced bit-width quantized candidate model reads on a performance metric indicating the sensitivity of the neural network to quantization.
Wang [0062]; “The adaptive bit-width of the model refers to the characteristic that the respective bit-width for storing the set of parameters (e.g., weights and bias) for each layer of the model is specifically selected for that set of parameters (e.g., in accordance with the distribution and range of the parameters). Specifically, the validation data set is used as input in a forward pass through the pruned slender full-precision network (e.g., model 106″), and the statistical distribution of the response values in each layer is collected. Then different configurations of bit-width and layer combinations are prepared as candidates for evaluation. For each candidate model, the validation data set is used as input in a forward pass through the candidate model, and the statistical distribution of the response values in each layer is collected. Then, the candidate is evaluated based on the amount of information loss that has resulted from the quantization applied to the candidate model. In some embodiments, Jensen-Shannon divergence between the two statistical distributions for each layer (or for the model as a whole) is used to identify the optimal bit-widths with the least information loss for that layer.” wherein each candidate bit-width configuration is evaluated by the information loss that quantization of each layer’s parameter set causes during the forward pass, thus reading on the performance metrics indicating a sensitivity for each layer associated with a set of weight parameters)
Regarding Claim 26,
Wang/Li teaches the method of Claim 22 (and thus the rejection of Claim 22 is incorporated). Wang/Li further discloses calculating a set of bit-width values based at least in part on a threshold value; and iteratively adjusting the precision of the weight parameters using at least the set of bit-width values; (Wang [0005]; “preferred values of the respective reduced bit-widths are determined through multiple iterations of forward propagation through the first neural network model using a validation data set while each of two or more layers of the first neural network model is expressed with different degrees of quantization corresponding to different reduced bit-widths until a predefined information loss threshold is met by respective response statistics of the two or more layers; and generating a reduced neural network model that includes the plurality of layers, wherein each layer of two or more the plurality of layers includes a respective set of quantized parameters, and each quantized parameter is expressed with the preferred values of the respective reduced bit-widths for the layer as determined through the multiple iterations” which reads on calculating preferred bit-width values based on a predefined information loss threshold and iteratively adjusting the precision of the weight parameters using those bit-width values)
Regarding Claim 27,
Wang/Li teaches the method of Claim 22 (and thus the rejection of Claim 22 is incorporated). Wang/Li further discloses iteratively adjusting the precision of the weight parameters based, at least in part, on progress of one or more training processes of the one or more neural networks; (Wang [0045]; “The training process typically includes two steps, forward propagation and backward propagation, that are repeated multiple times until a predefined convergence condition is met. In the forward propagation, the set of weights for different layers are applied to the input data and intermediate results from the previous layers. In the backward propagation, the margin of error of the output is measured, and the weights are adjusted accordingly to decrease the error. The activation function can be linear, rectified linear unit, sigmoid, hyperbolic tangent, or other types.” wherein the weights are adjusted during backward propagation to reduce the error, and forward and backward propagation are repeated until a predefined convergence condition is met, thus reading on adjusting the precision of weight parameters based on training progress)
Claims 7, 14, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US20190050710A1, hereinafter “Wang”) in view of Li et al. (CN114065902A, hereinafter “Li”) in view of Santos et al. (“Impact of Reduced Precision in the Reliability of Deep Neural Networks for Object Detection” [2019], hereinafter “Santos”).
Regarding Claim 7,
Wang/Li teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Wang/Li fails to explicitly disclose but Santos discloses wherein the one or more circuits are further to perform one or more autonomous vehicle tasks using the one or more neural networks. (Santos [Section I Paragraph 1]; “DNNs executed on GPUs are being deployed in self-driven vehicles to detect paths, objects, and for image segmentation [1]. Major hardware and software vendors are pursuing faster hardware/software solutions. One of the latest improvements proposed by major GPU vendors is mixed-precision architecture, that aims at expediting the execution of float point operations specifically for machine learning algorithms”)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang/Li’s method of iteratively adjusting the precision of neural network weight parameters to use the adjusted-precision neural networks for autonomous vehicle tasks. The motivation to do so lies in how autonomous driving “can benefit from mixed-precisions architectures as they are very efficient regarding computing efficiency and power consumption” (Santos [Section I Paragraph 1]), thus allowing self-driving cars to have lower power requirements.
Regarding Claim 14,
Wang/Li teaches the method of Claim 8 (and thus the rejection of Claim 8 is incorporated). Wang/Li fails to explicitly disclose but Santos discloses wherein the one or more processors are to use the one or more neural networks to perform an object detection task. (Santos [Section I Paragraph 3]; “In this paper, we aim at deeply investigating, through both accelerated neutron beam experiments and software fault-injection, the impact of mixed-precision on the reliability of DNNs. We run YOLOv3 DNN implemented in double, single, and half-precision data types on NVIDIA mixed-precision GPUs. We measure the error rate for all the configurations and evaluate faults propagation probabilities. Commonly object detection frameworks outputs are composed of the objects shapes, positions, and its classification. In this work, we investigate how mixed precision data affects the criticality of faults on the final object detection.”)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang/Li’s method of iteratively adjusting the precision of neural network weight parameters to use the adjusted-precision neural networks for object detection. The motivation to do so lies in how object detection “can benefit from mixed-precisions architectures as they are very efficient regarding computing efficiency and power consumption” (Santos [Section I Paragraph 1]), thus allowing object detection to have lower power requirements.
Regarding Claim 21,
Wang/Li teaches the method of Claim 15 (and thus the rejection of Claim 15 is incorporated). Wang/Li fails to explicitly disclose but Santos discloses wherein the set of instructions, which if performed by the one or more processors, cause the one or more processors to perform one or more image processing tasks based, at least in part, on the one or more neural networks. (Santos [Section I Paragraph 1]; “DNNs executed on GPUs are being deployed in self-driven vehicles to detect paths, objects, and for image segmentation” wherein image segmentation reads on image processing tasks based on neural network)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang/Li’s method of iteratively adjusting the precision of neural network weight parameters to use the adjusted-precision neural networks for performing image processing tasks. The motivation to do so lies in how image processing “can benefit from mixed-precisions architectures as they are very efficient regarding computing efficiency and power consumption” (Santos [Section I Paragraph 1]), thus allowing image processing to have lower power requirements.
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US20190050710A1, hereinafter “Wang”) in view of Li et al. (CN114065902A, hereinafter “Li”) in view of Hu et al. (CN112686384A, hereinafter “Hu”).
Regarding Claim 18,
Wang/Li teaches the method of Claim 15 (and thus the rejection of Claim 15 is incorporated). Wang/Li fails to explicitly disclose but Hu discloses wherein the set of instructions, which if performed by the one or more processors, cause the one or more processors to adjust the precision of the weight parameters based, at least in part, on one or more numbers of bit operations (BOP) corresponding to the one or more portions. (Hu [Pg. 3 Paragraph 7]; “The method of the invention decomposes the weight parameter of the neural network into multi-bit binary weight, replaces the original 32-bit floating point weight parameter with M (M < 32) bit binary number, and reduces the occupied storage space to the original one … The storage and memory usage of the neural network can be greatly reduced. In the process of calculating the activation parameters, floating point number operation can be replaced by bit operation” wherein the 32-bit floating-point weight parameters are replaced with M-bit binary weights and floating-point operations are replaced with bit operations, reading on adjusting the precision of the weight parameters of a portion based on bit operations)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang/Li’s method of iteratively adjusting the precision of neural network weight parameters to incorporate Hu’s adjustment of weight parameters based on bit operations. The motivation to do so is to replace floating-point operations with bit operations “so that the consumption of calculation resources is greatly reduced, and the method is more favorable for deployment on portable equipment and embedded equipment” (Hu [Pg. 3 Paragraph 7]).
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US20190050710A1, hereinafter “Wang”) in view of Li et al. (CN114065902A, hereinafter “Li”) in view of Yao et al. (“PyHessian: Neural Networks Through the Lens of the Hessian” [2020], hereinafter “Yao”).
Regarding Claim 19,
Wang/Li teaches the method of Claim 15 (and thus the rejection of Claim 15 is incorporated). Wang/Li fails to explicitly disclose but Yao discloses wherein the set of instructions, which if performed by the one or more processors, cause the one or more processors to calculate the one or more performance metrics based, at least in part, on one or more trace estimation processes. (Yao [Page 4 Section 3A]; “For a NN with m parameters, the gradient of the loss w.r.t. model parameters is a vector
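$g_\theta = \frac{\partial L}{\partial \theta} \in \mathbb{R}^m$, and the second derivative of the loss is a matrix $H = \frac{\partial^2 L}{\partial \theta^2} \in \mathbb{R}^{m \times m}$,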
commonly called the Hessian. A typical NN model involves millions of parameters, and thus even forming the Hessian is computationally infeasible. However, it is possible to compute properties of the Hessian spectrum without explicitly forming the Hessian matrix. Instead, all we need is an oracle to compute the application of the Hessian to a random vector v. This can be achieved by observing the following:
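$\frac{\partial (g_\theta^T v)}{\partial \theta} = \frac{\partial g_\theta^T}{\partial \theta} v + g_\theta^T \frac{\partial v}{\partial \theta} = \frac{\partial g_\theta^T}{\partial \theta} v = Hv$ (2)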
… Having this oracle, we can easily compute the top k Hessian eigenvalues using power iteration [53]; see Algorithm 2. However, for a typical NN with millions of parameters, the top eigenvalues may not be representative of how the loss landscape behaves. Therefore, we also compute the trace and ESD of the Hessian, as described below. B. Hutchinson Method for Hessian Trace Computation The trace of the Hessian can be computed using RandNLA, and in particular with Hutchinson’s method [4, 5] for the fast computation of the trace, using only Hessian matvec computations (as given in Eq. 2).”)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Wang/Li’s method of iteratively adjusting the precision of neural network weight parameters to incorporate Yao’s calculation of performance metrics using trace estimation of the Hessian. The motivation to do so lies in how, by using the Hessian, “Our extensive analysis shows new finer-scale insights” (Yao [Page 9 Section V]), allowing Hessian-based analysis to make new observations not present in Wang/Li’s performance metrics.
Response to Arguments
The Examiner acknowledges the Applicant’s amendments in which Claims 1, 8, 15 and 22 are amended.
Applicant’s arguments filed February 5th, 2026, traversing the rejection of claims 1-27 under 35 U.S.C. § 101 have been fully considered and are fully persuasive.
Applicant’s arguments regarding the 35 U.S.C. § 102(a)(1) and 35 U.S.C. § 103 rejections of claims 1-27 in the previous Office action have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
“DATA PARALLELISM IN DISTRIBUTED TRAINING OF ARTIFICIAL INTELLIGENCE MODELS” (US 20210019152 A1), which discloses adjusted-precision models trained through backpropagation.
“NEURAL NETWORK TRAINING TECHNIQUE” (US 20210334644 A1), which discloses altered parameter precisions for different scenarios.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN J KIM whose telephone number is (571)272-0523. The examiner can normally be reached 9-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matt Ell, can be reached on (571) 270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JONATHAN J KIM/Examiner, Art Unit 2141 /MATTHEW ELL/Supervisory Patent Examiner, Art Unit 2141