Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Remarks
This Office Action is responsive to Applicant's Amendment filed on February 19, 2026, in which claims 1, 4-6, 8-9, 12-14, and 16 are currently amended. Claims 1, 2, 4-6, 8-10, 12-14, and 16 are currently pending.
Response to Arguments
The rejections of claims 13-20 under 35 U.S.C. § 112(b) are hereby withdrawn in view of Applicant's amendments and remarks directed to those rejections.
Applicant's arguments with respect to the rejection of claims 1, 2, 4-6, 8-10, 12-14, and 16 under 35 U.S.C. 103, presented in view of the amendments, have been fully considered but are not persuasive.
With respect to Applicant's arguments on p. 9 of the Remarks submitted 2/19/2026 that "Examiner asserts that the 'mutual loss' in Song corresponds to the 'importance value' of the present invention", Examiner respectfully clarifies that it is specifically the i-th element of the element-wise output difference Δx^(l+1)_(a,q) that is interpreted as the importance value, which is then bounded as part of the mutual loss term via the p-norm. Song explicitly compares the layer output produced from clean input data with the layer output produced from perturbed input data, both generated by the same underlying neural network model ([p. 4] "For a single fully-connected layer, suppose W is the weight matrix, W + ΔW is the weight after quantization, x is the original input, and x + Δx is the adversarial input. The difference in the output of this layer (δ) can be represented as follow"). These two inputs are then processed by the same neural network layer with weights W^l. For the clean input the layer produces x^(l+1) = a^l(W^l x^l), and for the perturbed input it produces x^(l+1)_(a,q) = a^l[(W^l + ΔW^l)(x^l + Δx^l)] (Eqn. 3). Song then explicitly computes the difference between these two layer outputs (Eqn. 4) and further expands this expression element by element ([p. 6] "The i-th element of Δx^(l+1) is [...]"). Song then analyzes the magnitude of these differences by introducing p-norm bounds on the error terms, including the interaction term (mutual loss) ΔW^l Δx^l ([p. 7] "For the mutual loss, we will use the induced p-norm to bound the error"). Examiner further notes that neither the instant claims nor the instant specification explicitly limits "importance value", which is not a term of the art with a single well-accepted meaning.
For at least these reasons Examiner believes the interpretation of Song to disclose "determining a respective importance value for each element of the at least one original feature map by computing a sample variance or p-norm of differences between each element of the at least one original feature map and a corresponding element of the at least one perturbed feature map" is reasonable and should be maintained.
With respect to Applicant's arguments on p. 10 of the Remarks submitted 2/19/2026 that Examiner is "mapping the original weights W to the "original feature map" and the quantized weights Wq1 to the "quantized feature map"", Examiner respectfully disagrees. Examiner is mapping the output activation a^l(W^l x^l) to the original feature map, and the output activation a^l[(W^l + ΔW^l)(x^l + Δx^l)] to the quantized feature map, which is consistent with how feature maps are defined in the art.
With respect to Applicant's follow-up argument on p. 10 of the Remarks submitted 2/19/2026 that "Song's Lipschitz constant [...] does not involve calculating a distance between feature maps for input data using an element-wise importance value derived from input sensitivity", Examiner respectfully disagrees. As noted above, Song explicitly computes a difference between the quantized output feature map and the original output feature map to determine Δx^(l+1)_(a,q) (see Eqn. 4 on p. 6 of Song), where the difference vector represents the distance between the feature maps. Song then evaluates the magnitude of this difference as bounded by the p-norm. The magnitude is explicitly computed using the Lipschitz constant ([p. 7] "We use the Lipschitz constant (the matrix norm) as a metric for quantization optimization"), which is interpreted as the evaluation value.
With respect to Applicant's arguments on p. 11 of the Remarks submitted 2/19/2026 that Examiner has used contradictory interpretations of "feature map", Examiner respectfully disagrees. As to Applicant's assertion that "The Examiner interpreted the term "feature map" as the output (activations) of a layer (e.g., x^(l+1) in Song)", as noted above, Examiner has interpreted the output activations (which are synonymous with the layer outputs), namely a^l(W^l x^l) and a^l[(W^l + ΔW^l)(x^l + Δx^l)], as the feature maps, which is consistent with what is known in the art. Each of these output activations objectively corresponds to a respective weight matrix, as can be seen in the respective equations provided by Applicant on p. 11 of the Remarks submitted 2/19/2026. For this reason Examiner asserts that the interpretation is consistent: the output activations/layer outputs are consistently interpreted as the feature maps.
Then, with respect to Applicant's follow-up argument that "If the "feature map" is consistently interpreted as the layer output (as done for Feature 1), the Song's Lipschitz constant (derived solely from weights W) cannot correspond to the claimed evaluation value calculated based on feature maps", Examiner respectfully disagrees. As already discussed above, Song first establishes the feature-map difference Δx^(l+1)_(a,q) = x^(l+1)_(a,q) − x^(l+1) = a^l[(W^l + ΔW^l)(x^l + Δx^l)] − a^l(W^l x^l) (Eqn. 4). Song then explicitly states ([p. 6] "To get rid of the activation function, let us consider the overall error before activation"), obtaining (W^l + ΔW^l)(x^l + Δx^l) − W^l x^l by stripping the activation function to examine the effect the weights have on the output activation. This shows the direct dependency of the output activation on the weights. Song explicitly norms the vector of feature-map differences into a single magnitude measure (Eqn. 8), where Δx^l is the bounded input difference and ΔW^l is the bounded weight perturbation, such that the norm of the weight perturbation provides a bound on the magnitude of the feature-map difference. Examiner asserts that Applicant appears to be looking only at the final metric ||ΔW|| in isolation, instead of the derivation chain that connects that metric to the feature-map differences. By focusing only on L = ||ΔW|| and concluding that the metric depends only on weights, Applicant misses the earlier derivation showing that ΔW determines the magnitude of Δx, and Δx is the feature-map difference.
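For clarity of the record, the derivation chain discussed above may be summarized as follows. This is a restatement sketch in Song's notation, not additional disclosure:

```latex
% Summary sketch of Song's derivation chain (Eqns. 4, 5, and the p-norm
% bound on the mutual loss); notation follows Song.
\begin{align}
\Delta x^{l+1}_{a,q}
  &= a^l\!\big[(W^l+\Delta W^l)(x^l+\Delta x^l)\big] - a^l\!\big(W^l x^l\big)
  && \text{(feature-map difference, Eqn.~4)}\\
(W^l+\Delta W^l)(x^l+\Delta x^l) - W^l x^l
  &= W^l \Delta x^l + \Delta W^l x^l + \Delta W^l \Delta x^l
  && \text{(pre-activation error, Eqn.~5)}\\
\|\Delta W^l \Delta x^l\|_p
  &\le \|\Delta W^l\|_p\,\|\Delta x^l\|_p
  && \text{(mutual loss bounded by the induced $p$-norm)}
\end{align}
```

The final bound connects the weight-perturbation norm ||ΔW^l||_p to the magnitude of the feature-map difference, which is the link Applicant's argument does not address.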
For at least these reasons and those further detailed below Examiner asserts that the interpretation in view of the combination of Song and Bar is reasonable and should be maintained.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 4, 5, 8, 9, 10, 12, 13, and 16 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Song (“A Layer-wise Adversarial-aware Quantization Optimization for Improving Robustness”, 2021) and Bar (“A Spectral Perspective of DNN Robustness to Label Noise”, 2022).
Regarding claim 1, Song teaches A method for evaluating a quantized artificial neural network, performed by a computing device having one or more processors and a memory for storing one or more programs executed by the one or more processors, the method comprising: ([p. 3 §2.2] "Neural network quantization [6] helps save computation and memory costs and therefore improves power efficiency. Quantization has also been shown to improve accuracy in some cases, and in almost all cases retains the accuracy of the original network. Moreover, quantization enables the deployment of neural networks on limited-precision hardware. Many hardware systems have physical constraints and do not allow full-precision models to directly map to them. For example, GPUs support half precision floating point arithmetic (FP16); ReRAM (a.k.a memristor) [33] only allows limited precision because of process variations [12, 21]")
generating at least one original feature map for input data using a first artificial neural network model; ([p. 4 §3.1] "suppose W is the weight matrix, W + ΔW is the weight after quantization, x is the original input, and x + Δx is the adversarial input. The difference in the output of this layer (δ) can be represented as follow: [See Eqn. 1]" [p. 5 §3.2] "Cisse et al. [4] has already given detailed explanations on how to represent convolution operations as basic matrix multiplications and we have verified the correctness of their derivations" Song explicitly contemplates W representing a convolution kernel, such that W is interpreted as a convolutional filter producing at least one original feature map for input data (explicitly a^l(W^l x^l) = x^(l+1) in Eqn. 4).)
generating at least one piece of modified data by applying a predetermined perturbation to said input data; ([p. 2 §2.1] "An adversarial example Xe is generated by injecting adversarial perturbation ε (a.k.a. adversarial strength) to a clean sample X: Xe = X + ε. Usually, adversarial perturbations are so tiny that they are even imperceptible to human eyes. However, carefully designed adversarial perturbations can cause a neural network to misclassify adversarial examples with high confidence levels" See also Δx^l in Eqn. 3)
generating at least one perturbed feature map for each piece of modified data using the first artificial neural network model; ([p. 4 §3.1] "suppose W is the weight matrix, W + ΔW is the weight after quantization, x is the original input, and x + Δx is the adversarial input. The difference in the output of this layer (δ) can be represented as follow: [See Eqn. 1]" [p. 5 §3.2] "Cisse et al. [4] has already given detailed explanations on how to represent convolution operations as basic matrix multiplications and we have verified the correctness of their derivations" [p. 6 §4.1] "the output of layer l is" See Eqn. 3, where Δx^l is explicitly the layer input perturbation.)
determining a respective importance value for each element of the at least one original feature map by computing a sample variance or p-norm of differences between each element of the at least one original feature map and a corresponding element of the at least one perturbed feature map ([p. 6 §4.1] "where a^l(·) is the element-wise activation function of layer l [...] The loss introduced by adversarial and quantization loss in the output of layer l is: Δx^(l+1)_(a,q) = x^(l+1)_(a,q) − x^(l+1) = a^l[(W^l + ΔW^l)(x^l + Δx^l)] − a^l(W^l x^l). (4) The i-th element of Δx^(l+1)_(a,q) is a^l[Σ_{j=1..n} (W^l_ij + ΔW^l_ij)(x^l_j + Δx^l_j)] − a^l(Σ_{j=1..n} W^l_ij x^l_j). To get rid of the activation function, let us consider the overall error before activation: (W^l + ΔW^l)(x^l + Δx^l) − W^l x^l = W^l Δx^l + ΔW^l x^l + ΔW^l Δx^l. (5) Its i-th element can be written as Σ_{j=1..n} (W^l_ij Δx^l_j + ΔW^l_ij x^l_j + ΔW^l_ij Δx^l_j)." Song at each layer measures the element-wise difference between the output feature map produced by quantized weights plus perturbed input and the output feature map produced by full-precision weights plus clean input, and bounds it via the p-norm to guide layer-wise quantization. The bounded interaction term ||ΔW^l Δx^l||_p, where "The i-th element of Δx^(l+1)_(a,q) is a^l[Σ_{j=1..n} (W^l_ij + ΔW^l_ij)(x^l_j + Δx^l_j)] − a^l(Σ_{j=1..n} W^l_ij x^l_j)", is interpreted as the importance value for each element of the original feature map a^l(W^l x^l) and each perturbed feature map a^l[(W^l + ΔW^l)(x^l + Δx^l)].)
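As an illustrative sketch only (not part of Song's disclosure, the claims, or the record), the error decomposition (Eqn. 5) and the induced-norm bound on the mutual loss term relied upon above can be checked numerically for p = 2; all variable names and values below are hypothetical:

```python
import numpy as np

# Hypothetical numeric sketch: verifies Song's pre-activation error
# decomposition (Eqn. 5) and the induced 2-norm bound on the mutual
# loss term. All dimensions and magnitudes are arbitrary.
rng = np.random.default_rng(0)
m, n = 8, 6
W = rng.normal(size=(m, n))           # full-precision weights W^l
dW = 0.05 * rng.normal(size=(m, n))   # quantization error ΔW^l
x = rng.normal(size=n)                # clean input x^l
dx = 0.01 * rng.normal(size=n)        # adversarial perturbation Δx^l

# Eqn. 5: (W + ΔW)(x + Δx) − Wx = WΔx + ΔWx + ΔWΔx
err = (W + dW) @ (x + dx) - W @ x
decomposed = W @ dx + dW @ x + dW @ dx

# Mutual loss term ΔWΔx bounded by the induced 2-norm (spectral norm):
mutual = np.linalg.norm(dW @ dx, 2)
bound = np.linalg.norm(dW, 2) * np.linalg.norm(dx, 2)

print(np.allclose(err, decomposed), mutual <= bound)
```

The decomposition holds exactly, and the mutual loss never exceeds the product of the induced matrix norm and the perturbation norm, consistent with the bound Song uses.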
generating at least one quantized feature map for the input data using a second artificial neural network model that is a quantized artificial neural network model for the first artificial neural network model; ([p. 4 §3.1] "W + ΔW is the weight after quantization [...] δ = (W + ΔW)·(x + Δx) − Wx = WΔx + ΔWx + ΔWΔx" The weight filter (W + ΔW) is interpreted as the filter of a second artificial neural network generating at least one quantized feature map for the first artificial neural network model having weight filter W.)
and calculating an evaluation value for the second artificial neural network model based on the at least one original feature map, the at least one quantized feature map, and the respective importance value. ([p. 7 §4.2] "We use the Lipschitz constant (the matrix norm) as a metric for quantization optimization" The Lipschitz constant is interpreted as synonymous with the evaluation value for the second (quantized) artificial neural network.)
wherein the calculating of the evaluation value comprises a distance between each of the at least one original feature map and its corresponding quantized feature map from the at least one quantized feature map, the distance being calculated based on the respective importance value ([p. 7 §4.2] "If we change the quantization bitwidth or switch the quantization method in layer l, different quantization settings will result in different ΔW^l. Let us assume the two quantization settings are q1 and q2, respectively. Then the quantization error difference between q1 and q2 in the output of layer l is: (ΔW^l_q1 − ΔW^l_q2)x^l. We may further define L1 = ||ΔW^l_q1||_p, L2 = ||ΔW^l_q2||_p, ΔL = ||ΔW^l_q1 − ΔW^l_q2||_p. According to the triangle inequality: ΔL ≥ |L1 − L2|." Song explicitly defines the Lipschitz constant as ||ΔW||_p, where the distance ΔW = Wq − W, with Wq corresponding to the quantized feature map of the second artificial neural network and W corresponding to the feature map of the original neural network, is calculated based on the element-wise importance.)
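For completeness of the record, the triangle-inequality step in the passage cited above follows from the reverse triangle inequality applied to the induced p-norm; restated for clarity (a restatement sketch, not additional disclosure):

```latex
% The cited inequality is the reverse triangle inequality applied to the
% induced p-norm of the two quantization-error matrices.
\begin{align*}
L_1 &= \|\Delta W^l_{q_1}\|_p, \qquad L_2 = \|\Delta W^l_{q_2}\|_p,\\
\Delta L &= \|\Delta W^l_{q_1} - \Delta W^l_{q_2}\|_p
  \;\ge\; \big|\,\|\Delta W^l_{q_1}\|_p - \|\Delta W^l_{q_2}\|_p\,\big|
  = |L_1 - L_2|.
\end{align*}
```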
However, Song does not explicitly teach outputting the evaluation value.
Bar, in the same field of endeavor, teaches outputting the evaluation value ([p. 4 §3.1] "Note that the assumption on non-expansive activation functions holds for the currently used functions (e.g., ReLU, sigmoid and tanh). The above proposition provides an upper bound for the regularization term in equation 2, which may suggest to replace the regularization on the network derivative with a regularization on the network weights, which is feasible during training. From equation 4, this can be done through a penalty on the weights’ spectral or Frobenius norm" See Table 4 which shows output evaluation values (Lipschitz constraint)).
Song and Bar are both directed towards bounded regularization for neural network input perturbation analysis. Therefore, Song and Bar are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Song with the teachings of Bar by outputting the Lipschitz constraint. Bar provides additional motivation for the combination ([p. 8] "Table 1: Bounding the network weights increases its smoothness."). This motivation for combination also applies to the remaining claims that depend on this combination.
Regarding claim 2, the combination of Song and Bar teaches The method of claim 1, wherein the at least one original feature map includes a feature map generated in at least one of a plurality of layers included in the first artificial neural network model, and the at least one quantized feature map includes a quantized feature map generated in layers corresponding to respective layers of the first artificial neural network model, which have generated the at least one original feature map, among a plurality of layers included in the second artificial neural network model. (Song [p. 6 §4.1] "Let us assume that for a neural network which has a total of L layers, the input vector and the input perturbation introduced by the adversarial noise of layer l are x^l and Δx^l, respectively (both are column vectors with n elements). The full-precision weight matrix and the quantized weight matrix of layer l are W^l and W^l + ΔW^l, respectively (both are m×n matrices). Here ΔW^l is the error introduced by quantization in layer l. Then the output of layer l is" See also Eqns. 3-7, which show that each feature map is layer (l) specific.)
Regarding claim 4, the combination of Song and Bar teaches The method of claim 1, wherein the determining of the respective importance value comprises determining the respective importance value based on a difference between each element of the at least one original feature map and the corresponding element of the at least one perturbed feature map (Song [p. 6 §4.1] "The loss introduced by adversarial and quantization loss in the output of layer l is: Δx^(l+1)_(a,q) = x^(l+1)_(a,q) − x^(l+1) = a^l[(W^l + ΔW^l)(x^l + Δx^l)] − a^l(W^l x^l). (4) The i-th element of Δx^(l+1)_(a,q) is a^l[Σ_{j=1..n} (W^l_ij + ΔW^l_ij)(x^l_j + Δx^l_j)] − a^l(Σ_{j=1..n} W^l_ij x^l_j). To get rid of the activation function, let us consider the overall error before activation: (W^l + ΔW^l)(x^l + Δx^l) − W^l x^l = W^l Δx^l + ΔW^l x^l + ΔW^l Δx^l. (5) Its i-th element can be written as Σ_{j=1..n} (W^l_ij Δx^l_j + ΔW^l_ij x^l_j + ΔW^l_ij Δx^l_j)." Song at each layer measures the element-wise difference between the output feature map produced by quantized weights plus perturbed input and the output feature map produced by full-precision weights plus clean input, and bounds it via the p-norm to guide layer-wise quantization. The bounded interaction term ||ΔW^l Δx^l||_p is interpreted as providing the importance value for each element of the original feature map and each perturbed feature map.)
Regarding claim 5, the combination of Song and Bar teaches The method of claim 1, wherein the determining of the respective importance value comprises determining the respective importance value using a metric learning loss function. (Song [p. 7] "For the mutual loss, we will use the induced p-norm to bound the error" The mutual loss is interpreted as a metric learning loss function.)
Regarding claim 8, the combination of Song and Bar teaches The method of claim 1, wherein the calculating of the evaluation value comprises calculating the evaluation value based on a distance between a result of applying the respective importance value of each element of a feature map generated by an i-th layer of the first artificial neural network model among the at least one original feature map to the feature map generated by the i-th layer of the first artificial neural network model and a result of applying the respective importance value of each element of the feature map generated by the i-th layer of the first artificial neural network model to a feature map generated by an i-th layer of the second artificial neural network model among the at least one quantized feature map. (Song [p. 7 §4.2] "If we change the quantization bitwidth or switch the quantization method in layer l, different quantization settings will result in different ΔW^l. Let us assume the two quantization settings are q1 and q2, respectively. Then the quantization error difference between q1 and q2 in the output of layer l is: (ΔW^l_q1 − ΔW^l_q2)x^l. We may further define L1 = ||ΔW^l_q1||_p, L2 = ||ΔW^l_q2||_p, ΔL = ||ΔW^l_q1 − ΔW^l_q2||_p. According to the triangle inequality: ΔL ≥ |L1 − L2|." Song explicitly defines the Lipschitz constant as ||ΔW||_p, where the distance ΔW = Wq − W corresponds to a^l[(W^l + ΔW^l)(x^l + Δx^l)] − a^l(W^l x^l), with Wq corresponding to the quantized feature map a^l[(W^l + ΔW^l)(x^l + Δx^l)] of the second artificial neural network and W corresponding to the feature map of the original neural network, and is calculated based on the element-wise importance Δx^(l+1)_(a,q).)
Regarding claims 9, 10, 12, 13, and 16, claims 9, 10, 12, 13, and 16 are directed towards an apparatus for performing the methods of claims 1, 2, 4, 5, and 8, respectively. Therefore, the rejections applied to claims 1, 2, 4, 5, and 8 also apply to claims 9, 10, 12, 13, and 16.
Claims 6 and 14 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Song and Bar, in further view of Wang (“Smoothed Geometry for Robust Attribution”, 2020).
Regarding claim 6, the combination of Song and Bar teaches The method of claim 1.
However, the combination of Song and Bar does not explicitly teach wherein the determining of the respective importance value comprises determining the respective importance value using a gradient of each of the at least one original feature map.
Wang, in the same field of endeavor, teaches the determining of the respective importance value comprises determining the respective importance value using a gradient of each of the at least one original feature map. ([p. 2] "Definition 1 (Saliency Map (SM) [38]). Given a model f(x), the Saliency Map for an input x is defined as g(x) = ∇x f(x)" [p. 4 §3.2] "Viewing Prop. 2 from an adversarial view, it is possible that an adversary happens to find a certain neighbor whose local geometry is totally different from the input so that the chosen noise level is not large enough to produce semantically similar attribution maps. Similar idea can also be applied to Integrated Gradient")
The combination of Song and Bar, as well as Wang, is directed towards bounded regularization for neural network input perturbation analysis. Therefore, the combination of Song and Bar, as well as Wang, constitutes analogous art in the same field of endeavor. Song already bounds the layer feature map under input perturbation and weight quantization; it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, in view of Wang, to also bound the gradient, as explicitly stated in Wang ([p. 4 §3.2] "Viewing Prop. 2 from an adversarial view, it is possible that an adversary happens to find a certain neighbor whose local geometry is totally different from the input so that the chosen noise level is not large enough to produce semantically similar attribution maps. Similar idea can also be applied to Integrated Gradient"). Wang provides additional motivation for the combination ([Abstract] “Our experiments on a range of image models demonstrate that both of these mitigations consistently improve attribution robustness, and confirm the role that smooth geometry plays in these attacks on real, large-scale models”).
Regarding claim 14, claim 14 is directed towards an apparatus for performing the method of claim 6. Therefore, the rejections applied to claim 6 also apply to claim 14.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Karabutov (US20230106778A1) is directed towards a method for evaluating importance of quantization errors in convolutional neural network feature maps.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SIDNEY VINCENT BOSTWICK/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124