Last updated: July 17, 2026
Application No. 17/656,742
Multiplier-Less Sparse Deep Neural Network

Non-Final OA §103 §112
Filed: Mar 28, 2022
Priority: Sep 10, 2021 (provisional 63/242,636)
Examiner: KNIGHT, PAUL M
Art Unit: 2148
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Mitsubishi Electric Corporation
OA Round: 3 (Non-Final)
This examiner grants 62% of cases after interview

— +17.0% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 278 resolved cases, 2023–2026
Examiner Intelligence

KNIGHT, PAUL M
Grants 62% of resolved cases
Career Allowance Rate
173 granted / 278 resolved
+7.2% vs TC avg
Strong +17% interview lift
Typical timeline: 3y 2m avg prosecution; 26 currently pending
Career history: 303 total applications across all art units
Statute-Specific Performance

§101: 1.1% (-38.9% vs TC avg)
§103: 78.5% (+38.5% vs TC avg)
§102: 3.4% (-36.6% vs TC avg)
§112: 10.4% (-29.6% vs TC avg)
Office Action

§103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Style
In this action unitalicized bold is used for claim language, while italicized bold is used for emphasis. 

Information Disclosure Statement
No Information Disclosure Statement appears to have been filed with this application. The Specification of this application alludes to various prior art publications that teach aspects of the invention(s) claimed in this application. See e.g. Spec. ¶¶2-5 (“In prior arts, some memory-efficient DNN methods and systems were proposed . . . One type of such approaches . . . Another type of approaches . . . Both of them are known as a network distillation technique . . . Recently, another simple approach for weights pruning was proposed . . . It was shown that the LTH pruning can provide better performance . . .”). Further, the named inventor of this application appears to have published an academic paper including the subject matter of this application 6 days after the provisional application was filed. See Akino (Zero-Multiplier Sparse DNN Equalization for Fiber-Optic QAM Systems with Probabilistic Amplitude Shaping). Several claims are obvious in view of a combination of documents cited in the academic publication. It is submitted that documents alluded to in the background section and documents listed in the academic publication by the inventor may be material to patentability.


Applicant Reply
“The claims may be amended by canceling particular claims, by presenting new claims, or by rewriting particular claims as indicated in 37 CFR 1.121(c). The requirements of 37 CFR 1.111(b) must be complied with by pointing out the specific distinctions believed to render the claims patentable over the references in presenting arguments in support of new claims and amendments. . . . The prompt development of a clear issue requires that the replies of the applicant meet the objections to and rejections of the claims. Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. . . . An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714.” MPEP § 714.02.  Generic statements or listing of numerous paragraphs do not “specifically point out the support for” claim amendments.  “With respect to newly added or amended claims, applicant should show support in the original disclosure for the new or amended claims. See, e.g., Hyatt v. Dudas, 492 F.3d 1365, 1370, n.4, 83 USPQ2d 1373, 1376, n.4 (Fed. Cir. 2007) (citing MPEP § 2163.04 which provides that a ‘simple statement such as ‘applicant has not pointed out where the new (or amended) claim is supported, nor does there appear to be a written description of the claim limitation ‘___’ in the application as filed’ may be sufficient where the claim is a new or amended claim, the support for the limitation is not apparent, and applicant has not pointed out where the limitation is supported.’)” MPEP § 2163(II)(A).


Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-5, 8-11, and 13-14 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement.  The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA  the inventor(s), at the time the application was filed, had possession of the claimed invention.   
In General: Separately listed claim elements are construed as distinct components, all claim terms must be given weight, and there is presumed to be a difference in meaning and scope when different words or phrases are used in separate claims. Since different terms or phrases are presumed to differ in scope and each term or phrase in the claims must find clear support in the description, a description of a single element in the Specification may fail to support multiple claim terms. “[C]laims must ‘conform to the invention as set forth in the remainder of the specification and the terms and phrases used in the claims must find clear support or antecedent basis in the description so that the meaning of the terms in the claims may be ascertainable by reference to the description.’ 37 C.F.R. § 1.75(d)(1).” Phillips v. AWH Corp., 415 F.3d 1303, 1316 (Fed. Cir. 2005) (as cited in MPEP § 2111). Further, a lack of detail in the Specification describing how a claimed result is achieved can support a finding that the Applicant was not in possession of the claimed invention at the time of filing, notwithstanding verbatim support. “It is not enough that one skilled in the art could write a program to achieve the claimed function because the specification must explain how the inventor intends to achieve the claimed function to satisfy the written description requirement. See, e.g., Vasudevan Software, Inc. v. MicroStrategy, Inc., 782 F.3d 671, 681-683, 114 USPQ2d 1349, 1356, 1357 (Fed. Cir. 2015) (reversing and remanding the district court’s grant of summary judgment of invalidity for lack of adequate written description where there were genuine issues of material fact regarding "whether the specification show[ed] possession by the inventor of how accessing disparate databases is achieved"). 
If the specification does not provide a disclosure of the computer and algorithm in sufficient detail to demonstrate to one of ordinary skill in the art that the inventor possessed the invention a rejection under 35 U.S.C. 112(a) or pre-AIA  35 U.S.C. 112, first paragraph, for lack of written description must be made.”  MPEP § 2161.01(I). “An original claim may lack written description support when (1) the claim defines the invention in functional language specifying a desired result but the disclosure fails to sufficiently identify how the function is performed or the result is achieved[.] See Ariad Pharms., Inc. v. Eli Lilly & Co., 598 F.3d 1336, 1349-50 (Fed. Cir. 2010) (en banc). The written description requirement is not necessarily met when the claim language appears in ipsis verbis in the specification. ‘Even if a claim is supported by the specification, the language of the specification, to the extent possible, must describe the claimed invention so that one skilled in the art can recognize what is claimed. The appearance of mere indistinct words in a specification or a claim, even an original claim, does not necessarily satisfy that requirement.’”  MPEP § 2163.03.
All independent claims substantially recite “wherein the set of trainable weights of the set of trainable parameters includes weights quantized based on additive powers-of-two that includes a summation of two raised to the power of a first integer and two raised to the power of a second integer, such that, during execution of the artificial neural network by the at least one processor, affine transformations of the artificial neural network are performed using bit-shifting and accumulation operations corresponding to the weights quantized based on additive powers-of-two, without multiplication operations[.]” An affine transformation generally follows the pattern y=Wx+b. This is consistent with figure 6 of this application. See e.g. items 610 and 620 in figure 6. As best understood, figure 6 follows the usual convention where W is the weight matrix, x is the input vector, and y is an output. The problem is that “b” is the term that distinguishes a linear transform from an affine transform, but the Specification is silent as to any use of the term “b.” Further, figure 7c indicates that only the weight and input vectors are calculated using APOT. See Fig. 7c. Since the claims recite “affine transforms are performed using bit-shifting and accumulation operations,” but the Specification only shows a linear transform being performed using bit-shifting and accumulation operations, the claimed subject matter is not supported. In other words, the “b” in the affine transformation may represent a bias value, but nothing was found in the Specification describing the use of APOT for the bias value.
All dependent claims are rejected as containing the limitations of the claims from which they depend.  
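For illustration only, the linear/affine distinction discussed above can be written out term by term. This is a sketch under the assumption that each quantized weight has the two-term form recited in the claims; the sign s_ij and the exponents p_ij, q_ij are hypothetical symbols introduced here and do not appear in the Specification.

```latex
\begin{aligned}
y_i &= \sum_j w_{ij}\,x_j + b_i,
  \qquad w_{ij} = s_{ij}\left(2^{p_{ij}} + 2^{q_{ij}}\right),
  \quad s_{ij} \in \{-1, +1\},\ p_{ij} > q_{ij} \\
w_{ij}\,x_j &= s_{ij}\left(2^{p_{ij}} x_j + 2^{q_{ij}} x_j\right)
  \qquad \text{(two shifts of } x_j \text{ and one addition)} \\
y_i &= \underbrace{\sum_j s_{ij}\left(2^{p_{ij}} x_j + 2^{q_{ij}} x_j\right)}_{\text{linear part } Wx}
  \;+\; \underbrace{b_i}_{\text{bias term}}
\end{aligned}
```

Under these assumptions the linear part Wx reduces to shifts and accumulation, and the bias enters through one further addition. The sketch does not resolve whether b_i itself must be quantized.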



The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-5, 8-11, and 13-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.
In General: Separately listed claim elements are construed as distinct components, all claim terms must be given weight, there is presumed to be a difference in meaning and scope when different words or phrases are used in separate claims, and repeated and consistent descriptions in the specification indicate the proper scope of a claimed term. “[C]laims must ‘conform to the invention as set forth in the remainder of the specification and the terms and phrases used in the claims must find clear support or antecedent basis in the description so that the meaning of the terms in the claims may be ascertainable by reference to the description.’ 37 C.F.R. § 1.75(d)(1).”  Phillips v. AWH Corp., 415 F.3d 1303, 1316 (Fed. Cir. 2005) (as cited in MPEP § 2111).  Therefore, use of two different terms in the claims that both rely on the description of a single structure in the Specification may render at least one term indefinite because there is no way to determine which term should be construed in view of the description of the single structure. 
All independent claims substantially recite “wherein the set of trainable weights of the set of trainable parameters includes weights quantized based on additive powers-of-two that includes a summation of two raised to the power of a first integer and two raised to the power of a second integer, such that, during execution of the artificial neural network by the at least one processor, affine transformations of the artificial neural network are performed using bit-shifting and accumulation operations corresponding to the weights quantized based on additive powers-of-two, without multiplication operations[.]” An affine transformation generally follows the pattern y=Wx+b. This is consistent with figure 6 of this application. See e.g. items 610 and 620 in figure 6. As best understood, figure 6 follows the usual convention where W is the weight matrix, x is the input vector, and y is an output. The problem is that “b” is the term that distinguishes a linear transform from an affine transform, and it is not clear whether the use of “affine transform” requires the “b” term (i.e. bias) to be implemented “using bit-shifting and accumulation operations” or if only weights must be quantized.
All dependent claims are rejected as containing the limitations of the claims from which they depend.  



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 8-11, and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou (Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask; March 2020), Nielsen (Neural networks and deep learning; 2015), Li (Additive Powers-Of-Two Quantization: An Efficient Non-Uniform Discretization for Neural Networks; 2020), and Saifullah (Neural Networks From Linear Algebraic Perspective; July 2021).
1. A computer-implemented method for training a set of artificial neural networks, performed by at least one computing processor, wherein the method uses the at least one processor coupled with a memory storing instructions implementing the method, wherein the instructions, when executed by the at least processor, (“Our experiments required more computation than regular training procedures, as networks were trained up to 24 times with iterative pruning. We used single GPUs for each experiment (NVIDIA GeForce GTX 1080 Ti) and parallelized by running multiple experiments on multiple GPUs.” Zhou P. 12. (Spec. Sheet for GTX 1080 Ti is included in the file wrapper as evidence showing how POSA would understand this term of art.) Further, one of ordinary skill in the art would understand the experiments in Zhou to be carried out on an ordinary computer using a memory and a processor running instructions. Note that Zhou teaches training a neural network with hundreds of thousands of weights using tens of thousands of images carrying out tens of thousands of iterations per batch of training data. See Zhou P. 12, Table S1. “The sole issue is whether claims 7 and 8 are anticipated[.] . . . We agree with appellant that Figure 1 of Thacker, by itself, does not disclose every limitation in the appealed claims. However, in considering the disclosure of a reference, it is proper to take into account not only specific teachings of the reference but also the inferences which one skilled in the art would reasonably be expected to draw therefrom. In re Shepard, 319 F.2d 194, 50 CCPA 1439 (1963).” In re Preda, 401 F.2d 825, 826 (C.C.P.A. 1968). One of ordinary skill in the art, an engineer working in the area of neural networks, would draw the inference that a computer using a memory and a processor executing instructions was used to carry out training of the networks described in Zhou.) 
carry out steps of the method, comprising: (a) initializing a set of trainable parameters of an artificial neural network, wherein the set of trainable parameters comprise a set of trainable weights and a set of trainable biases; (“Different mask criteria can be thought of as segmenting the 2D (wi = initial weight value, wf = final weight value) space into regions corresponding to mask values of 1 vs 0.” Zhou P. 2. “We attempt to answer these questions by exploiting the essential steps in the lottery ticket algorithm, described below: 0. Initialize a mask m to all ones. Randomly initialize the parameters w of a network f(x; w ⊙ m)” Zhou P. 2.
Zhou does not expressly teach initialization of biases.
Nielsen teaches initialization of biases.  See Nielsen PP. 36-37 showing source code that initializes biases.  
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Nielsen because this reference teaches the basic implementation used to train a neural network including the specific technical steps that result in the ability of a network to learn to generalize a solution to a problem based on training data. See also Nielsen P. 10.) (b) training the set of trainable parameters using a set of training data, (“1. Train the parameters w of the network f(x; w ⊙ m) to completion. Denote the initial weights before training wi and the final weights after training wf.” Zhou P. 2.) wherein the set of trainable weights of the set of trainable parameters includes weights quantized based on additive powers-of-two that includes a summation of two raised to the power of a first integer and two raised to the power of a second integer, (The previously cited art does not teach APOT. Li teaches “To tackle the rigid resolution problem, we propose Additive Powers-of-Two (APoT) quantization. . . . In APoT quantization, each level is the sum of n PoT terms as shown below [in equation 1] . . . where γ is a scaling coefficient to make sure the maximum level in Q^a is α. k is called the base bit-width, which is the bit-width for each additive term, and n is the number of additive terms. When the bit-width b and the base bit-width k is set, n can be calculated by n = b/k. There are 2^(kn) = 2^b levels in total. The number of additive terms in APoT quantization can increase with bit-width b, which provides a higher resolution for the non-uniform levels.” Li P. 4. See also Equation 5: 

    [Image: media_image1.png (greyscale), Li Equation 5]

It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Li because using APOT is a resource-efficient way to carry out calculations in neural networks.) such that, during execution of the artificial neural network by the at least one processor, (“We propose Additive Powers-of-Two (APoT) quantization, an efficient nonuniform quantization scheme for the bell-shaped and long-tailed distribution of weights and activations in neural networks.” Li Abstract.) affine transformations of the artificial neural network are performed (The previously cited art does not use the term “affine transformation” or clearly show this mathematical representation of the operations within a neural network. Saifullah teaches “A sample from the dataset gets feed into the network which is a column vector represented by x in the figure, then we apply a linear transformation on x which is simply a dot product between the vector and the weight matrix then we add a bias vector, thus, it now becomes an affine transformation of the input which ultimately gives us a vector that we pass through a non-linear function f(.) and what we get is a vector called the activations. . . . After that, we repeat the same process, that is, passing through an affine transformation followed by another non-linear function sigmoid and we get our class prediction (blue node at the end) from the network. It could be shown as the following: σ(W_o a + b) = ŷ[.]” Saifullah pp. 1-3.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Saifullah because the use of affine transformations is conducive to using linear algebra in describing neural networks.) using bit-shifting and accumulation operations corresponding to the weights quantized based on additive powers-of-two, without multiplication operations, (“Furthermore, multiplication between a Powers-of-two number 2^x and the other operand r can be implemented by bit-wise shift instead of bulky digital multipliers, i.e., [equation 4] where >> denotes the right shift operation and is computationally cheap, which only takes 1 clock cycle in modern CPU architectures.” Li p. 3. “Figure 3 shows the hardware accelerator, the weights buffer takes k-bit as a PoT term and shift-adds the activations.” Li p. 4.) and wherein a value of the first integer in greater than a value of the second integer; (“For this example, we have p0 ∈ {0, 2^0, 2^-2, 2^-4}, p1 ∈ {0, 2^-1, 2^-3, 2^-5}, . . . (p0 + p1) for all (2^b = 16) combinations of p0 and p1.” Li p. 4. See also Li Equation 5 above.) (c) generating a pruning mask based on the trained set of trainable parameters; (“Mask Criterion. Use the mask criterion M(wi, wf) to produce a masking score for each currently unmasked weight. Rank the weights in each layer by their scores, set the mask value for the top p% to 1, the bottom (100 - p)% to 0, breaking ties randomly. Here p may vary by layer, and we follow the ratios chosen in [5], summarized in Table S1. In [5] the mask selected weights with large final value corresponding to M(wi, wf) = |wf|.” Zhou P. 2.) (d) rewinding the set of trainable parameters; (“3. Mask-1 Action. Take some action with the weights with mask value 1. In [5] these weights were reset to their initial values and marked for training in the next round.” Zhou P. 2.) (e) pruning a selected set of trainable parameters based on the pruning mask; (“4. Mask-0 Action. Take some action with the weights with mask value 0. In [5] these weights were pruned: set to 0 and frozen during any subsequent training.” Zhou P. 2.) and (f) repeating the above steps from (b) to (e) for a specified number of times to generate a set of sparse neural networks having an incremental sparsity. (“Repeat from 1 if performing iterative pruning.” Zhou P. 2. Note that weights below some threshold are pruned at each iteration, creating a set of sparse neural networks having “incremental sparsity.” Zhou teaches: “the pruned, skeletal LT networks train well when you rewind to its original initialization, but degrades in performance when you randomly reinitialize the network.” Zhou P. 4.)
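The iterative procedure mapped to steps (a) through (f) above can be sketched in a few lines. This is a minimal illustration, not code from Zhou or from the application: the function names, the toy quadratic objective inside `train`, and the `keep_fraction` parameter are all hypothetical, and the "large final" criterion |wf| is the only mask criterion shown.

```python
import numpy as np

def train(weights, mask, rng, steps=50, lr=0.1):
    """Hypothetical stand-in for step (b): gradient steps on a toy quadratic loss."""
    target = rng.standard_normal(weights.shape)
    w = weights.copy()
    for _ in range(steps):
        grad = (w - target) * mask      # pruned weights get no gradient (frozen)
        w -= lr * grad
    return w * mask

def lottery_ticket_rounds(shape, keep_fraction=0.8, rounds=3, seed=0):
    """Return a list of (rewound_weights, mask) pairs with increasing sparsity."""
    rng = np.random.default_rng(seed)
    w_init = rng.standard_normal(shape)             # (a) initialize parameters
    mask = np.ones(shape)                           # mask starts at all ones
    sparse_nets = []
    for _ in range(rounds):                         # (f) repeat (b) through (e)
        w_final = train(w_init * mask, mask, rng)   # (b) train
        scores = np.abs(w_final)                    # (c) mask criterion |w_f|
        alive = scores[mask == 1]
        k = int(np.ceil(keep_fraction * alive.size))
        threshold = np.sort(alive)[-k]              # k-th largest surviving score
        mask = mask * (scores >= threshold)         # keep only the top fraction
        w_rewound = w_init * mask                   # (d) rewind, (e) prune to zero
        sparse_nets.append((w_rewound.copy(), mask.copy()))
    return sparse_nets
```

Calling `lottery_ticket_rounds((10, 10))` yields three networks whose masks are nested, so each round keeps a subset of the weights kept in the previous round.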

2. The method of claim 1, wherein the training further comprises the steps of: feeding the set of training data into a plurality of input nodes of the artificial neural network; (Zhou teaches “In this section and throughout the remainder of the paper, we follow the experimental framework from [5] and perform iterative pruning experiments on a 3-layer fully-connected network (FC) trained on MNIST [12] and on three convolutional neural networks (CNNs), Conv2, Conv4, and Conv6 (small CNNs with 2/4/6 convolutional layers, same as used in [5]) trained on CIFAR-10 [11].” Zhou P. 3. Zhou does not expressly state that the training data is fed into “a plurality of input nodes” or generally explain the individual steps of error backpropagation used to train neural networks.
Nielsen teaches: “What is a neural network? To get started, I'll explain a type of artificial neuron called a perceptron. . . . So how do perceptrons work? A perceptron takes several binary inputs, x1, x2, . . ., and produces a single binary output[.] . . . In the example shown the perceptron has three inputs, x1, x2, x3. In general it could have more or fewer inputs. Rosenblatt proposed a simple rule to compute the output. He introduced weights, w1, w2, . . ., real numbers expressing the importance of the respective inputs to the output.” Nielsen PP. 3-4. “And it should seem plausible that a complex network of perceptrons could make quite subtle decisions[.] . . . In this network, the first column of perceptrons - what we'll call the first layer of perceptrons - is making three very simple decisions, by weighing the input evidence. What about the perceptrons in the second layer? Each of those perceptrons is making a decision by weighing up the results from the first layer of decision-making.” Nielsen PP. 5-6. See also Nielsen P. 16 showing the “input layer.” “The backpropagation equations provide us with a way of computing the gradient of the cost function. Let's explicitly write this out in the form of an algorithm: 1. Input: Set the corresponding activation for the input layer.” Nielsen P. 70.
With respect to the backpropagation algorithm, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Nielsen because the steps of backpropagation train the network to actually become a usable model based on the data.) 
propagating the set of training data across the artificial neural network according to the set of pruning masks and the trainable parameters; and generating a set of output values from a plurality of output nodes of the artificial network; (Zhou teaches that “pruning” sets the weights to zero. “Take some action with the weights with mask value 0. In [5] these weights were pruned: set to 0 and frozen during any subsequent training.” Zhou P. 2. “Typical network pruning procedures [9, 8, 15] perform two actions on pruned weights: set them to zero, and freeze them in subsequent training (equivalent to removing those connections from the network).” Zhou P. 5. Note that the forward pass will be executed “according to the pruning masks” that set the pruned weights to zero when using the backpropagation algorithm taught in Nielsen. Nielsen P. 70 explains the forward pass during backpropagation. Note that “w,” “a,” and “b” refer to weights, activations, and biases at a given layer, respectively. The “z” is the output of a given layer. 

    [Image: media_image2.png (greyscale), Nielsen P. 70, forward pass]
)
calculating a set of loss values for the set of training data based on the set of output values; (Nielsen P. 70 explains the output error: 

    [Image: media_image3.png (greyscale), Nielsen P. 70, output error]
)

updating the set of trainable parameters based on the set of loss values through backpropagation; and (Nielsen P. 70 explains the backward pass of backpropagation used to update parameters in the network:)

    [Image: media_image4.png (greyscale), Nielsen P. 70, backward pass]

repeating the above steps for a specified number of iteration times.  (Nielsen teaches: “Of course, to implement stochastic gradient descent in practice you also need an outer loop generating mini-batches of training examples, and an outer loop stepping through multiple epochs of training. I've omitted those for simplicity.”  Nielsen P. 71.)
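The training steps recited in claim 2 (feed data, propagate according to the pruning mask, compute a loss, update by gradient, repeat) can be illustrated with a deliberately small model. The sketch below assumes a single linear layer and a mean-square-error loss, so backpropagation reduces to one gradient expression; the function name and hyperparameters are hypothetical and come from neither Zhou nor Nielsen.

```python
import numpy as np

def train_masked_linear(X, y, mask, epochs=200, lr=0.1):
    """Full-batch gradient descent on z = X @ (w * mask) + b with an MSE loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    losses = []
    for _ in range(epochs):                          # repeat for a set number of iterations
        z = X @ (w * mask) + b                       # forward pass through unpruned weights
        err = z - y
        losses.append(float(np.mean(err ** 2)))      # loss value for this iteration
        grad_w = (2.0 / n) * (X.T @ err) * mask      # masked gradient keeps pruned weights frozen
        grad_b = (2.0 / n) * float(np.sum(err))
        w -= lr * grad_w                             # parameter update
        b -= lr * grad_b
    return w * mask, b, losses
```

A multi-layer network would add the per-layer backward recursion described at Nielsen P. 70; the masking of both the forward pass and the gradient would stay the same.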

3. The method of claim 2, wherein the updating the set of trainable parameters is based on at least one of stochastic gradient descent, resilient backpropagation, root-mean-square propagation, Broyden-Fletcher-Goldfarb-Shanno algorithm, adaptive momentum optimization, adaptive subgradient, and adaptive delta.  (Zhou teaches “We train the networks with mask m for each layer (and all regular kernels and biases frozen) with SGD, 0.9 momentum.”  Zhou P. 13.  Note that Nielsen P. 71 also teaches using SGD.)

4. The method of claim 1, wherein the set of loss values is based on at least one of mean-square error, mean absolute error, cross entropy, connectionist temporal classification loss, negative log-likelihood, Kullback-Leibler divergence, margin loss, ranking loss, embedding loss, hinge loss, and Huber loss. (Zhou teaches “A different method to try would be to add an L1 loss to influence layers to go toward certain values, which may alleviate the cold start problems of some networks not learning anything due to mask values starting too low (effectively having the entire network start at zero).” Zhou P. 13. Note that L1 loss refers to mean absolute error. From the context of the reference it is unclear whether this is a different embodiment, so a motivation to modify the original embodiment is given. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the original embodiment of Zhou to include L1 loss because this may alleviate the cold start problem described in the cited portion of the reference.)

5. The method of claim 2, wherein the updating the set of trainable parameters further comprises the step of rounding the trainable weights to quantize values based on power-of-two, the additive powers-of-two, power-of-three, or additive powers-of-three. (See rejection of claim 1. See also Li Fig. 2.)
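As an illustration of rounding weights to additive powers-of-two levels, the sketch below builds the 16 levels from the Li example cited in the rejection of claim 1 (p0 from {0, 2^0, 2^-2, 2^-4}, p1 from {0, 2^-1, 2^-3, 2^-5}) and rounds each weight magnitude to the nearest level. The function names, the nearest-level rule, and the clipping to [-alpha, alpha] are assumptions made for this sketch, not a reproduction of Li's training procedure.

```python
import itertools
import numpy as np

def apot_levels(alpha=1.0):
    """Return the 16 non-negative levels p0 + p1, scaled so the largest equals alpha."""
    p0 = [0.0, 2.0 ** 0, 2.0 ** -2, 2.0 ** -4]
    p1 = [0.0, 2.0 ** -1, 2.0 ** -3, 2.0 ** -5]
    sums = sorted({a + b for a, b in itertools.product(p0, p1)})
    return alpha * np.array(sums) / max(sums)       # scaling plays the role of gamma

def quantize_apot(weights, alpha=1.0):
    """Round each weight to the nearest signed level (assumed rounding rule)."""
    levels = apot_levels(alpha)
    w = np.clip(np.asarray(weights, dtype=float), -alpha, alpha)
    idx = np.argmin(np.abs(np.abs(w)[..., None] - levels), axis=-1)
    return np.sign(w) * levels[idx]
```

Because the two term sets occupy disjoint binary digit positions, every combination gives a distinct level, which is why 4 x 4 choices produce 2^b = 16 levels for b = 4.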

8. A computer-implemented method for testing an artificial neural network, performed by at least one computing processor, wherein the method uses the at least one processor coupled with a memory storing instructions implementing the method, wherein the instructions, when executed by the at least processor, carry out steps of the method, comprising: (See rejection of claim 1 showing the specific model of graphics card used to train models in Zhou. Further, one of ordinary skill in the art would understand the implementation of a neural network using source code as a reference to operations to be carried out on a standard computer storing instructions and data in memory, and executing instructions on a processor. See e.g. Nielsen PP. 40-43 teaching implementation of backpropagation using code written in Python. “The sole issue is whether claims 7 and 8 are anticipated[.] . . . We agree with appellant that Figure 1 of Thacker, by itself, does not disclose every limitation in the appealed claims. However, in considering the disclosure of a reference, it is proper to take into account not only specific teachings of the reference but also the inferences which one skilled in the art would reasonably be expected to draw therefrom. In re Shepard, 319 F.2d 194, 50 CCPA 1439 (1963).” In re Preda, 401 F.2d 825, 826 (C.C.P.A. 1968). One of ordinary skill in the art, an engineer in the area of neural networks, would draw the inference that a computer using a memory and a processor executing instructions was used to carry out training of the networks described in Zhou.) feeding a set of testing data into a plurality of input nodes of the artificial neural network, (See rejections of claims 1 and 2 including Nielsen’s description of training.) 
wherein the artificial neural network includes trained weights quantized based on additive powers-of-two that includes a summation of two raised to the power of a first integer and two raised to the power of a second integer, wherein a value of the first integer in greater than a value of second integer; (See rejection of claim 1.) propagating the set of testing data across the artificial neural network according to a set of pruning masks; wherein propagating comprises performing affine transformations using bit-shifting and accumulation operations corresponding to the trained weights quantized based on additive powers-of-two, without multiplication operations; and generating a set of output values from a plurality of output nodes of the artificial network. (See rejections of claims 1 and 2. Note Nielsen’s description of training. Note that Zhou teaches testing of the neural networks. “Test accuracy at early stopping iteration of different mask criteria for four networks at various pruning rates.” Further, Nielsen teaches how source code is modified to include testing during backpropagation. See e.g. Nielsen PP. 38-41 (“If ‘test data’ is provided then the network will be evaluated against the test data after each epoch, and partial progress printed out. This is useful for tracking progress, but slows things down substantially.”) It would have been obvious to combine this feature of Nielsen because it is useful for tracking how training affects performance, which can potentially allow the user to see where performance can be improved.)

9. The method of claim 8, wherein the propagating further comprises the steps of: transforming values of neuron nodes according to a trained set of trainable parameters of the artificial neural network; modifying the transformed values of neuron nodes according to a set of activation functions; and repeating the above steps across a plural of a neuron layer of the artificial neural network. (See rejection of claims 1 and 2.  Note that the “a” in the equations of Nielsen cited in the rejection of claim 2 refers to the activation function in a “neuron node.”)

10. The method of claim 9, wherein the transforming values of neuron nodes is based on at least one of sign flipping, bit shifting, accumulation, and biassing, according to a quantized set of trainable parameters with power-of-two, the additive powers-of-two, power-of-three, or additive powers-of-three.  (See rejection of claim 5.)

For rejection of claim 11, see rejection of claim 1.

13. The system of claim 11, wherein the artificial neural network uses a set of pruning masks to reduce arithmetic operations. (“As pointed out in [5], it appears that the specific combination of pruning mask and weights underlying the mask form a more efficient subnetwork found within the larger network, or, as named by the original study, a lucky winning “Lottery Ticket,” or LT.” Zhou P. 1. “Some compression approaches enable more efficient computation by pruning parameters[.]” Zhou P. 1.  “Typical network pruning procedures [9, 8, 15] perform two actions on pruned weights: set them to zero, and freeze them in subsequent training (equivalent to removing those connections from the network).” Zhou P. 5. Note also that “to reduce arithmetic operations” does not require steps to be performed or limit to a particular structure.  Intended use language is explained in MPEP §§ 2103 and 2111.02.  “Claim scope is not limited by claim language that suggests or makes optional but does not require steps to be performed, or by claim language that does not limit a claim to a particular structure.”  MPEP § 2111.04.)
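
For illustration only, the two pruning actions quoted from Zhou (setting pruned weights to zero and freezing them in subsequent training) can be sketched as follows. The function names `apply_mask` and `masked_sgd_step`, the plain gradient-descent update, and the learning rate are hypothetical and are not drawn from Zhou or the application.

```python
from typing import List


def apply_mask(weights: List[float], mask: List[int]) -> List[float]:
    """Set every pruned weight (mask == 0) to zero; keep the others unchanged."""
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]


def masked_sgd_step(weights: List[float], grads: List[float],
                    mask: List[int], lr: float) -> List[float]:
    """One gradient step that freezes pruned weights at zero (assumed plain SGD)."""
    return [w - lr * g if keep else 0.0
            for w, g, keep in zip(weights, grads, mask)]


if __name__ == "__main__":
    w = apply_mask([0.5, -1.0, 2.0], [1, 0, 1])
    print(w)
    print(masked_sgd_step(w, [1.0, 1.0, -2.0], [1, 0, 1], lr=0.25))
```

Freezing a weight at zero has the same effect as removing the connection, which is how the mask reduces the arithmetic performed.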

14. The system of claim 11, wherein the artificial neural network uses a set of quantized parameters to reduce arithmetic multiplication operations using power-of-two, power-of-three, or additive powers-of-three. (See rejection of claim 5.)

Response to Arguments
Applicant's arguments filed 05/20/2026 have been fully considered but they are not persuasive.
Rejections under § 112a
All rejections in the previous action are withdrawn in response to claim amendments. 
Rejections under § 112b
All rejections in the previous action are withdrawn in response to claim amendments. 
Rejections under § 101:
The rejection under this section is withdrawn. After further consideration, the pruning method recited in all independent claims is directed to an improvement in the field of machine learning. 
Rejections under § 103:
The amendments overcame the art cited in the previous action. New art was cited rendering Applicant’s arguments moot. 




Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL M KNIGHT whose telephone number is (571) 272-8646. The examiner can normally be reached Monday - Friday 9-5 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold can be reached on (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

PAUL M. KNIGHT, Primary Examiner, Art Unit 2148



/PAUL M KNIGHT/
Primary Examiner, Art Unit 2148
Prosecution Timeline

Mar 28, 2022: Application Filed
Mar 13, 2025: Non-Final Rejection mailed (§103, §112)
Jun 11, 2025: Response Filed
Jan 26, 2026: Final Rejection mailed (§103, §112)
Apr 02, 2026: Response after Non-Final Action
May 20, 2026: Request for Continued Examination
May 22, 2026: Response after Non-Final Action
May 29, 2026: Non-Final Rejection mailed (§103, §112) (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/958,020 (Patent 12682223): NON-UNIFORM QUANTIZATION FOR FLEXIBLE POWER-OF-TWO COMPUTATIONS IN NEURAL NETWORKS; 3y 9m to grant; granted Jul 14, 2026
17/749,740 (Patent 12670434): MULTIPLE INSTANCE LEARNING MODELS FOR CYBERSECURITY USING JAVASCRIPT OBJECT NOTATION (JSON) TRAINING DATA; 4y 1m to grant; granted Jun 30, 2026
17/680,932 (Patent 12657524): JOINTLY PREDICTING MULTIPLE INDIVIDUAL-LEVEL FEATURES FROM AGGREGATE DATA; 4y 3m to grant; granted Jun 16, 2026
17/468,498 (Patent 12530592): NON-LINEAR LATENT FILTER TECHNIQUES FOR IMAGE EDITING; 4y 4m to grant; granted Jan 20, 2026
18/017,589 (Patent 12530612): METHODS FOR ALLOCATING LOGICAL QUBITS OF A QUANTUM ALGORITHM IN A QUANTUM PROCESSOR; 2y 12m to grant; granted Jan 20, 2026
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 62%
With Interview (+17.0%): 79%
Median Time to Grant: 3y 2m (~0m remaining)
PTA Risk: High
Based on 278 resolved cases by this examiner. Grant probability derived from career allowance rate.