DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Style
In this action unitalicized bold is used for claim language, while italicized bold is used for emphasis.
Information Disclosure Statement
No Information Disclosure Statement appears to have been filed with this application. The Specification of this application alludes to various prior art publications that teach aspects of the invention(s) claimed in this application. See e.g. Spec. ¶¶2-5 (“In prior arts, some memory-efficient DNN methods and systems were proposed . . . One type of such approaches . . . Another type of approaches . . . Both of them are known as a network distillation technique . . . Recently, another simple approach for weights pruning was proposed . . . It was shown that the LTH pruning can provide better performance . . .”). Further, the named inventor of this application appears to have published an academic paper including the subject matter of this application 6 days after the provisional application was filed. See Akino (Zero-Multiplier Sparse DNN Equalization for Fiber-Optic QAM Systems with Probabilistic Amplitude Shaping). Several claims are obvious in view of a combination of documents cited in the academic publication. It is submitted that documents alluded to in the background section and documents listed in the academic publication by the inventor may be material to patentability.
Applicant Reply
“The claims may be amended by canceling particular claims, by presenting new claims, or by rewriting particular claims as indicated in 37 CFR 1.121(c). The requirements of 37 CFR 1.111(b) must be complied with by pointing out the specific distinctions believed to render the claims patentable over the references in presenting arguments in support of new claims and amendments. . . . The prompt development of a clear issue requires that the replies of the applicant meet the objections to and rejections of the claims. Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. . . . An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714.” MPEP § 714.02. Generic statements or listing of numerous paragraphs do not “specifically point out the support for” claim amendments. “With respect to newly added or amended claims, applicant should show support in the original disclosure for the new or amended claims. See, e.g., Hyatt v. Dudas, 492 F.3d 1365, 1370, n.4, 83 USPQ2d 1373, 1376, n.4 (Fed. Cir. 2007) (citing MPEP § 2163.04 which provides that a ‘simple statement such as ‘applicant has not pointed out where the new (or amended) claim is supported, nor does there appear to be a written description of the claim limitation ‘___’ in the application as filed’ may be sufficient where the claim is a new or amended claim, the support for the limitation is not apparent, and applicant has not pointed out where the limitation is supported.’)” MPEP § 2163(II)(A).
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-6 and 8-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) and the claims as a whole, considering all claim elements both individually and in combination, do not amount to significantly more.
Step 1: Is the claim to a process, machine, manufacture, or composition of matter?
All claims are found to be directed to one of the four statutory categories, unless otherwise indicated in this action.
Step 2A Prongs One and Two (Alice Step 1): According to Office guidance, claims that read on math do not recite an abstract idea at step 2A1, when the claims fail to refer to the math by name.1 The MPEP also equates “recit[ing] a judicial exception” with “state[ing]” or “describ[ing]” an abstract idea in the claims.2 Consistent with this guidance an abstract idea may be first recited in a dependent claim, even though the independent claims may read on the abstract idea. Claim limitations which recite any of the abstract idea groupings set forth in the manual are found to be directed, as a whole, to an abstract idea unless otherwise indicated.3 The claims do not recite additional elements that integrate the abstract ideas into a practical application.4 To confer patent eligibility to an otherwise abstract idea, claims may recite a specific means or method of solving a specific problem in a technological field.5
1. A computer-implemented method for training a set of artificial neural networks, performed by at least one computing processor, wherein the method uses the at least one processor coupled with a memory storing instructions implementing the method, wherein the instructions, when executed by the at least processor, carry out steps of the method, comprising: (This merely recites the application of a generic computer used in its ordinary capacity to implement the abstract idea below. See MPEP § 2106.05(f).) (a) initializing a set of trainable parameters of an artificial neural network, wherein the set of trainable parameters comprise a set of trainable weights and a set of trainable biases; (This reads on setting numbers, which is a mathematical operation.) (b) training the set of trainable parameters using a set of training data, wherein the set of trainable weights of the set of trainable parameters includes weights quantized based on additive powers-of-two that includes a summation two raised to the power a first integer and two raised to the power a second integer, and wherein a value of the first integer in greater than a value of second integer; (Training “parameters” based on additive powers is a mathematical operation.) (c) generating a pruning mask based on the trained set of trainable parameters; (Generating a pruning mask involves generating numbers, which is a mathematical operation.) (d) rewinding the set of trainable parameters; (This reads on resetting weights to an earlier value, which is a mathematical operation.) (e) pruning a selected set of trainable parameters based on the pruning mask; (This reads on using the numbers in the pruning mask to remove numbers from matrix operations, which is math.) and (f) repeating the above steps from (b) to (e) for a specified number of times to generate a set of sparse neural networks having an incremental sparsity. (Repeating math reads on more math. 
The language “to generate a set of sparse neural networks having an incremental sparsity” is, first, an intended use. Second, this language is merely the outcome of the claimed math that deletes connections, thereby removing numbers from matrix operations.)
2. The method of claim 1, wherein the training further comprises the steps of: feeding the set of training data into a plurality of input nodes of the artificial neural network; (The claimed “feeding” training data into a neural network reads on multiplying training data values by the first weights in the network, which is math.) propagating the set of training data across the artificial neural network according to the set of pruning masks and the trainable parameters; (The claimed “propagating . . . according to . . . masks” merely reads on carrying out matrix operations.) and generating a set of output values from a plurality of output nodes of the artificial network; (The claimed “generating” output values reads on multiplication that takes place at the last layer of the network.) calculating a set of loss values for the set of training data based on the set of output values; (Calculating loss values reads on determining a difference between the correct and output value, also called subtraction.) updating the set of trainable parameters based on the set of loss values through backpropagation; (Updating the parameters based on loss values through backpropagation reads on using the chain rule to differentiate functions corresponding to each path backwards through the network. This is math.) and repeating the above steps for a specified number of iteration times. (Repeating math is math.)
3. The method of claim 2, wherein the updating the set of trainable parameters is based on at least one of stochastic gradient descent, resilient backpropagation, root-mean-square propagation, Broyden-Fletcher-Goldfarb-Shanno algorithm, adaptive momentum optimization, adaptive subgradient, adaptive delta. (Stochastic gradient descent is math.)
4. The method of claim 1, wherein the set of loss values is based on at least one of mean-square error, mean absolute error, cross entropy, connectionist temporal classification loss, negative log-likelihood, Kullback-Leibler divergence, margin loss, ranking loss, embedding loss, hinge loss, Huber loss. (Calculating loss values based on mean-square error is math.)
5. The method of claim 2, wherein the updating the set of trainable parameters further comprises the step of rounding the trainable weights to quantize values based on power-of-two, the additive powers-of-two, power-of-three, or additive powers-of-three. (This reads on mathematical operations.)
6. The method of claim 1, wherein the incremental sparsity is controlled by an auxiliary neural network trained through a deep reinforcement learning framework. (Controlling the incremental sparsity by an auxiliary neural network (product) created by the process of reinforcement learning reads on changing the mask values controlling sparsity in each network. Changing mask values is math.)
8. A computer-implemented method for testing an artificial neural network, performed by at least one computing processor, wherein the method uses the at least one processor coupled with a memory storing instructions implementing the method, wherein the instructions, when executed by the at least processor, carry out steps of the method, comprising: (This merely recites a generic computer used in its ordinary capacity to implement an abstract idea. See MPEP § 2106.05(f).) feeding a set of testing data into a plurality of input nodes of the artificial neural network (See rejection of claim 2.) wherein the artificial neural network includes trained weights quantized based on additive powers-of-two that includes a summation two raised to the power a first integer and two raised to the power a second integer, wherein a value of the first integer in greater than a value of second integer; (See rejection of claim 1.) propagating the set of testing data across the artificial neural network according to a set of pruning masks; (See rejection of claim 2. Note that “according to a set of pruning masks” reads on using math to eliminate connections in a network corresponding to matrix operations.) and generating a set of output values from a plurality of output nodes of the artificial network. (See rejection of claim 2.)
9. The method of claim 8, wherein the propagating further comprises the steps of: transforming values of neuron nodes according to a trained set of trainable parameters of the artificial neural network; modifying the transformed values of neuron nodes according to a set of activation functions; and repeating the above steps across a plural of a neuron layer of the artificial neural network. (The claimed “transforming values . . . modifying transformed values . . . according to functions . . . and repeating” the steps across a neural network reads on math.)
10. The method of claim 9, wherein the transforming values of neuron nodes is based on at least one of sign flipping, bit shifting, accumulation, and biassing, according to a quantized set of trainable parameters with power-of-two, the additive powers-of-two, power-of-three, or additive powers-of-three. (See rejection of claim 5. Note that bit shifting is a form of division.)
11. A system deployed for an artificial neural network comprises: at least one interface link; at least one computing processor; at least one memory bank configured to store instructions implementing a training method, wherein the instructions, when executed by the at least computing processor, carry out at steps of the training method, comprising; (a) initializing a set of trainable parameters of an artificial neural network, wherein the set of trainable parameters comprise a set of trainable weights and a set of trainable biases; (b) training the set of trainable parameters using a set of training data; wherein the set of trainable weights of the set of trainable parameters includes weights quantized based on additive powers-of-two that includes a summation two raised to the power a first integer and two raised to the power a second integer, and wherein a value of the first integer in greater than a value of second integer; (c) generating a pruning mask based on the trained set of trainable parameters; (d) rewinding the set of trainable parameters; (e) pruning a selected set of trainable parameters based on the pruning mask; and (f) repeating the above steps from (b) to (e) for a specified number of times to generate a set of sparse neural networks having an incremental sparsity. (See rejection of claim 1.)
12. The system of claim 11, wherein a student model is designed in parallel to the artificial neural network for a knowledge distillation to find a pareto-optimal trade-off between performance and complexity of the artificial neural network. (Designing a student model “for a knowledge distillation” reads on a mental process with an intended use.)
13. The system of claim 11, wherein the artificial neural network uses a set of pruning masks to reduce arithmetic operations. (Pruning, the elimination of connections that are required in matrix operations, is math. Further, the pruning mask reads on something that multiplies or sets a connection to 0, which is math.)
14. The system of claim 11, wherein the artificial neural network uses a set of quantized parameters to reduce arithmetic multiplication operations using power-of-two, power-of-three, or additive powers-of-three. (See rejection of claim 5.)
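For illustration only, the arithmetic referenced in the annotations to claims 1, 5, 10, and 14 can be sketched in Python as follows. The sketch assumes one possible reading of the claim language, namely a weight magnitude of the form 2^a + 2^b with integer a greater than integer b. The function names (shift, multiply_by_apot, quantize_apot) and the candidate exponent range are hypothetical and do not appear in the Specification or in the cited art.

```python
from itertools import combinations


def shift(x: int, k: int) -> int:
    """Multiply integer x by 2**k using only a bit shift.

    A left shift multiplies by a power of two; a right shift (negative k)
    divides by a power of two, rounding toward negative infinity.
    """
    return x << k if k >= 0 else x >> -k


def multiply_by_apot(x: int, a: int, b: int, sign: int = 1) -> int:
    """Compute sign * x * (2**a + 2**b) with two shifts, one add, one sign flip."""
    if a <= b:
        raise ValueError("assumed reading requires first integer a > second integer b")
    acc = shift(x, a) + shift(x, b)  # accumulate two shifted copies of x
    return -acc if sign < 0 else acc


def quantize_apot(w: float, exponents=range(-3, 1)):
    """Round w to the nearest value of the form +/-(2**a + 2**b) with a > b.

    The exponent range is an assumption chosen for this example.
    Returns (quantized_value, a, b).
    """
    sign = -1.0 if w < 0 else 1.0
    best = None
    # combinations over the sorted exponents yields pairs (b, a) with b < a
    for b, a in combinations(sorted(exponents), 2):
        value = 2.0 ** a + 2.0 ** b
        if best is None or abs(abs(w) - value) < abs(abs(w) - best[0]):
            best = (value, a, b)
    value, a, b = best
    return sign * value, a, b
```

Under these assumptions, multiply_by_apot(5, 3, 1) evaluates 5 × (8 + 2) = 50 without a multiplication instruction, and quantize_apot(0.7) selects 0.75 = 2^-1 + 2^-2.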
Step 2B (Alice Step 2): The rejected claims do not recite additional elements that amount to significantly more than the judicial exception.
All additional limitations that do not integrate the claimed judicial exception into a practical application also fail to amount to significantly more, for the reasons given at step 2A2. All limitations found to be extra-solution activity at step 2A2 are found to be WURC, including limitations that read on mere data gathering, data storage, and data input/output/transfer. This finding is based on cases which have recognized that generic input-output operations, repetitive processing operations, and storage operations are WURC.6 Other aspects of generic computing have also been found to be WURC.7 Further, the description itself may provide support for a finding that claim elements are WURC. The analysis under § 112(a) as to whether a claim element is “so well-known that it need not be described in detail in the patent specification” is the same as the analysis as to whether the claim element is widely prevalent or in common use.8 Similarly, generic descriptions in the Specification of claimed components and features have been found to support a conclusion that the claimed components were conventional.9 Improvements to the relevant technology may support a finding that the claims include a patent eligible inventive concept. But some mechanism that results in any asserted improvements must be recited in the claim, and the Specification must provide sufficient details such that one of ordinary skill in the art would recognize the claimed invention as providing the improvement.10
All dependent claims are rejected as containing the material of the claims from which they depend.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 6 and 12 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA the inventor(s), at the time the application was filed, had possession of the claimed invention.
In General: Separately listed claim elements are construed as distinct components, all claim terms must be given weight, and there is presumed to be a difference in meaning and scope when different words or phrases are used in separate claims. Since different terms or phrases are presumed to differ in scope and each term or phrase in the claims must find clear support in the description, a description of a single element in the Specification may fail to support multiple claim terms. “[C]laims must ‘conform to the invention as set forth in the remainder of the specification and the terms and phrases used in the claims must find clear support or antecedent basis in the description so that the meaning of the terms in the claims may be ascertainable by reference to the description.’ 37 C.F.R. § 1.75(d)(1).” Phillips v. AWH Corp., 415 F.3d 1303, 1316 (Fed. Cir. 2005) (as cited in MPEP § 2111). Further, a lack of detail in the Specification describing how a claimed result is achieved can support a finding that the Applicant was not in possession of the claimed invention at the time of filing, notwithstanding verbatim support. “It is not enough that one skilled in the art could write a program to achieve the claimed function because the specification must explain how the inventor intends to achieve the claimed function to satisfy the written description requirement. See, e.g., Vasudevan Software, Inc. v. MicroStrategy, Inc., 782 F.3d 671, 681-683, 114 USPQ2d 1349, 1356, 1357 (Fed. Cir. 2015) (reversing and remanding the district court’s grant of summary judgment of invalidity for lack of adequate written description where there were genuine issues of material fact regarding "whether the specification show[ed] possession by the inventor of how accessing disparate databases is achieved"). 
If the specification does not provide a disclosure of the computer and algorithm in sufficient detail to demonstrate to one of ordinary skill in the art that the inventor possessed the invention a rejection under 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph, for lack of written description must be made.” MPEP § 2161.01(I). “An original claim may lack written description support when (1) the claim defines the invention in functional language specifying a desired result but the disclosure fails to sufficiently identify how the function is performed or the result is achieved[.] See Ariad Pharms., Inc. v. Eli Lilly & Co., 598 F.3d 1336, 1349-50 (Fed. Cir. 2010) (en banc). The written description requirement is not necessarily met when the claim language appears in ipsis verbis in the specification. ‘Even if a claim is supported by the specification, the language of the specification, to the extent possible, must describe the claimed invention so that one skilled in the art can recognize what is claimed. The appearance of mere indistinct words in a specification or a claim, even an original claim, does not necessarily satisfy that requirement.’” MPEP § 2163.03.
Claim 6 recites “wherein the incremental sparsity is controlled by an auxiliary neural network trained by multiple training episodes through a deep reinforcement learning framework.” The Specification only mentions an “auxiliary” network once. See Spec. ¶47 (“For some embodiments, the control of the incremental sparcification is adjusted by another auxiliary neural network trained by multiple trainng [sic] episodes through a deep reinforcement learning framework.”). See also Spec. ¶¶1-56. This is the entire explanation of the claimed use of an auxiliary neural network trained with reinforcement learning to implement incremental sparsity. While neural networks trained with reinforcement learning are known, this description lacks sufficient detail to demonstrate to POSA that the Applicant was in possession of such an algorithm that could be applied to the problem of incrementally determining which connections in a neural network to omit. “If the specification does not provide a disclosure of the computer and algorithm in sufficient detail to demonstrate to one of ordinary skill in the art that the inventor possessed the invention a rejection under 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph, for lack of written description must be made.” MPEP § 2161.01(I). “An original claim may lack written description support when (1) the claim defines the invention in functional language specifying a desired result but the disclosure fails to sufficiently identify how the function is performed or the result is achieved[.] See Ariad Pharms., Inc. v. Eli Lilly & Co., 598 F.3d 1336, 1349-50 (Fed. Cir. 2010) (en banc).”
Claim 12 recites “The system of claim 11, wherein a student model is designed in parallel to the artificial neural network for a knowledge distillation to find a Pareto-optimal trade-off between performance and complexity of the artificial neural network.” This claims a result. The Specification fails to sufficiently identify how the result is achieved. Specifically, the claim recites design of an artificial neural network “to find” the optimization of a tradeoff between performance and complexity. The Specification includes similar language without any further explanation. See e.g. Spec. ¶¶ 8-9. Paragraph 48 and Figure 11 describe a tradeoff between “more hidden nodes and more hidden layers [that] improve performance” and “computational complexity,” but the Specification is silent as to implementation of any way of actually achieving the optimal tradeoff. At best, the Specification suggests “a moderate depth such as a 4-layer DNN can be best in the Pareto sense of the performance-complexity trade-off in low-complexity regimes,” indicating that one of ordinary skill in the art may practice the invention by simply selecting 4 layers. Spec. ¶48. It is submitted that merely naming the number of layers that works in one case, is insufficient to support a finding that the applicant was in possession of the entire scope of the claimed invention as of the effective filing date.
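For context only, the term “Pareto-optimal” in a performance-complexity trade-off is generally understood to mean that no other candidate is at least as good on both measures and strictly better on one. A minimal Python sketch of that definition follows. The candidate names and numbers are hypothetical placeholders, not values from the Specification, and the sketch does not represent any method disclosed by the Applicant.

```python
def pareto_front(candidates):
    """Return the names of non-dominated candidates.

    Each candidate is (name, complexity, error); lower is better for both.
    A candidate is dominated if another one is no worse on both measures
    and strictly better on at least one.
    """
    front = []
    for name, c, e in candidates:
        dominated = any(
            (c2 <= c and e2 <= e) and (c2 < c or e2 < e)
            for _, c2, e2 in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical models: (name, relative complexity, error rate)
EXAMPLE_MODELS = [
    ("A", 1, 0.30),
    ("B", 2, 0.20),
    ("C", 3, 0.25),  # dominated by B: more complex and less accurate
    ("D", 4, 0.10),
]

print(pareto_front(EXAMPLE_MODELS))
```

The sketch only identifies which already-evaluated candidates lie on the front; it says nothing about how candidate networks would be designed or searched.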
All dependent claims are rejected as containing the limitations of the claims from which they depend.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-6 and 8-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant regards as the invention.
At the outset it is noted that separately listed claim elements are construed as distinct components, that all claim terms must be given weight, there is presumed to be a difference in meaning and scope when different words or phrases are used in separate claims, and repeated and consistent descriptions in the specification indicate the proper scope of a claimed term. “[C]laims must ‘conform to the invention as set forth in the remainder of the specification and the terms and phrases used in the claims must find clear support or antecedent basis in the description so that the meaning of the terms in the claims may be ascertainable by reference to the description.’ 37 C.F.R. § 1.75(d)(1).” Phillips v. AWH Corp., 415 F.3d 1303, 1316 (Fed. Cir. 2005) (as cited in MPEP § 2111). Therefore, use of two different terms in the claims that both rely on the description of a single structure in the Specification may render at least one term indefinite because there is no way to determine which term should be construed in view of the description of the single structure.
All independent claims substantially recite “wherein the set of trainable weights of the set of trainable parameters includes weights quantized based on additive powers-of-two that includes a summation [] two raised to the power [] a first integer and two raised to the power [] a second integer, and wherein a value of the first integer in greater than a value of second integer[.]” Something appears to be missing in the location of the hard brackets. As a best guess, a preposition indicating a relationship between the words on each side of the hard brackets belongs in each of these locations. For instance, “a summation two” could refer to “a summation of two” or a “summation by two.” Similarly, “the power a first integer” and the “power a second integer” could be read as “a power raised to a first/second integer” or a “power times/divided by a first/second integer.” Since there are multiple possible ways of filling in the omitted term, the claim language is indefinite. Also, “a value of second integer” omits antecedent basis before “second integer” so it is not clear whether this term refers back to the earlier recited “a second integer.”
Claim 4 recites a list without any and/or at the end of the list. Without an and/or, it is not clear whether only one basis for the loss values is required, or whether all of the listed bases for calculating loss are required by the claim language.
All dependent claims are rejected as containing the limitations of the claims from which they depend.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 8-11, and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou (Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask; March 2020), Nielsen (Neural networks and deep learning; 2015), and Elhoushi (DeepShift: Towards Multiplication-Less Neural Networks, June 2021).
1. A computer-implemented method for training a set of artificial neural networks, performed by at least one computing processor, wherein the method uses the at least one processor coupled with a memory storing instructions implementing the method, wherein the instructions, when executed by the at least processor, (“Our experiments required more computation than regular training procedures, as networks were trained up to 24 times with iterative pruning. We used single GPUs for each experiment (NVIDIA GeForce GTX 1080 Ti) and parallelized by running multiple experiments on multiple GPUs.” Zhou P. 12. (Spec. Sheet for GTX 1080 Ti is included in the file wrapper as evidence showing how POSA would understand this term of art.) Further, one of ordinary skill in the art would understand the experiments in Zhou to be carried out on an ordinary computer using a memory and a processor running instructions. Note that Zhou teaches training a neural network with hundreds of thousands of weights using tens of thousands of images carrying out tens of thousands of iterations per batch of training data. See Zhou P. 12, Table S1. “The sole issue is whether claims 7 and 8 are anticipated[.] . . . We agree with appellant that Figure 1 of Thacker, by itself, does not disclose every limitation in the appealed claims. However, in considering the disclosure of a reference, it is proper to take into account not only specific teachings of the reference but also the inferences which one skilled in the art would reasonably be expected to draw therefrom. In re Shepard, 319 F.2d 194, 50 CCPA 1439 (1963).” In re Preda, 401 F.2d 825, 826 (C.C.P.A. 1968). One of ordinary skill in the art, an engineer working in the area of neural networks, would draw the inference that a computer using a memory and processor executing instructions were used to carry out training of the networks described in Zhou.) 
carry out steps of the method, comprising: (a) initializing a set of trainable parameters of an artificial neural network, wherein the set of trainable parameters comprise a set of trainable weights and a set of trainable biases; (“Different mask criteria can be thought of as segmenting the 2D (wi = initial weight value, wf = final weight value) space into regions corresponding to mask values of 1 vs 0.” Zhou P. 2. “We attempt to answer these questions by exploiting the essential steps in the lottery ticket algorithm, described below: 0. Initialize a mask m to all ones. Randomly initialize the parameters w of a network f(x; w ⊙ m)” Zhou P. 2.
Zhou does not expressly teach initialization of biases.
Nielsen teaches initialization of biases. See Nielsen PP. 36-37 showing source code that initializes biases.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Nielsen because this reference teaches the basic implementation used to train a neural network including the specific technical steps that result in the ability of a network to learn to generalize a solution to a problem based on training data. See also Nielsen P. 10.) (b) training the set of trainable parameters using a set of training data, (“1. Train the parameters w of the network f(x; w ⊙ m) to completion. Denote the initial weights before training wi and the final weights after training wf.” Zhou P. 2.) wherein the set of trainable weights of the set of trainable parameters includes weights quantized based on additive powers-of-two that includes a summation two raised to the power a first integer and two raised to the power a second integer, and wherein a value of the first integer in greater than a value of second integer; (The previously cited art does not expressly teach additive powers of two.
Elhoushi teaches: “This paper presents an approach to reduce computation and power requirements of CNNs by replacing regular multiplication-based convolution and linear operations, also known as a fully-connected layer or matrix multiplication, with bitwise-shift-based convolution and linear operations respectively. Applying bitwise shift operation on an element is mathematically equivalent to multiplying it by a power of 2.” Elhoushi P. 2.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Elhoushi because this method avoids multiplication operations, which require more computer resources than bit shifting.) (c) generating a pruning mask based on the trained set of trainable parameters; (“Mask Criterion. Use the mask criterion M(wi, wf) to produce a masking score for each currently unmasked weight. Rank the weights in each layer by their scores, set the mask value for the top p% to 1, the bottom (100 − p)% to 0, breaking ties randomly. Here p may vary by layer, and we follow the ratios chosen in [5], summarized in Table S1. In [5] the mask selected weights with large final value corresponding to M(wi, wf) = |wf|.” Zhou P. 2.) (d) rewinding the set of trainable parameters; (“3. Mask-1 Action. Take some action with the weights with mask value 1. In [5] these weights were reset to their initial values and marked for training in the next round.” Zhou P. 2.) (e) pruning a selected set of trainable parameters based on the pruning mask; (“4. Mask-0 Action. Take some action with the weights with mask value 0. In [5] these weights were pruned: set to 0 and frozen during any subsequent training.” Zhou P. 2.) and (f) repeating the above steps from (b) to (e) for a specified number of times to generate a set of sparse neural networks having an incremental sparsity. (“Repeat from 1 if performing iterative pruning.” Zhou P. 2. Note that weights below some threshold are pruned at each iteration, creating a set of sparse neural networks having “incremental sparsity.” Zhou teaches: “the pruned, skeletal LT networks train well when you rewind to its original initialization, but degrades in performance when you randomly reinitialize the network.” Zhou P. 4.)
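For illustration only, the steps quoted from Zhou P. 2 above (initialize, train, apply the mask criterion, rewind, prune, repeat) can be sketched as follows. This is a minimal sketch and is not code from Zhou or from the Specification. The toy `train` function, the names `lottery_ticket` and `keep_fraction`, and the single flat weight list are assumptions made for the example; ties are broken by sort order rather than randomly.

```python
import random

def train(w_init, mask, steps=50, lr=0.1):
    """Hypothetical stand-in for training: pulls each unmasked weight toward a fixed target."""
    targets = [((i % 5) - 2) * 0.5 for i in range(len(w_init))]
    w = [wi * m for wi, m in zip(w_init, mask)]          # pruned weights start at 0
    for _ in range(steps):
        # The update is multiplied by the mask, so pruned weights stay frozen at 0.
        w = [wi - lr * (wi - t) * m for wi, t, m in zip(w, targets, mask)]
    return w

def lottery_ticket(n_weights=20, keep_fraction=0.8, rounds=3, seed=0):
    rng = random.Random(seed)
    w_init = [rng.uniform(-1, 1) for _ in range(n_weights)]  # step 0: random initialization
    mask = [1] * n_weights                                   # step 0: mask of all ones
    networks = []
    for _ in range(rounds):                                  # repeat for iterative pruning
        w_final = train(w_init, mask)                        # step 1: train to completion
        # Step 2: score currently unmasked weights by |w_final| and keep the top fraction.
        alive = [i for i, m in enumerate(mask) if m == 1]
        alive.sort(key=lambda i: abs(w_final[i]), reverse=True)
        keep = set(alive[: int(len(alive) * keep_fraction)])
        mask = [1 if i in keep else 0 for i in range(n_weights)]
        # Step 3 (mask 1): rewind to the initial value. Step 4 (mask 0): set to 0.
        rewound = [wi * m for wi, m in zip(w_init, mask)]
        networks.append((mask, rewound))
    return w_init, networks

w_init, networks = lottery_ticket()
print([sum(m) for m, _ in networks])  # number of surviving weights per round
```

Each round yields a network with fewer surviving weights than the last, which is the sense in which the resulting set has an incremental sparsity.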
2. The method of claim 1, wherein the training further comprises the steps of: feeding the set of training data into a plurality of input nodes of the artificial neural network; (Zhou teaches “In this section and throughout the remainder of the paper, we follow the experimental framework from [5] and perform iterative pruning experiments on a 3-layer fully-connected network (FC) trained on MNIST [12] and on three convolutional neural networks (CNNs), Conv2, Conv4, and Conv6 (small CNNs with 2/4/6 convolutional layers, same as used in [5]) trained on CIFAR-10 [11].” Zhou P. 3. Zhou does not expressly state that the training data is fed into “a plurality of input nodes,” nor does it explain the individual steps of error backpropagation used to train neural networks.
Nielsen teaches: “What is a neural network? To get started, I'll explain a type of artificial neuron called a perceptron. . . . So how do perceptrons work? A perceptron takes several binary inputs, x1, x2, . . ., and produces a single binary output[.] . . . In the example shown the perceptron has three inputs, x1, x2, x3. In general it could have more or fewer inputs. Rosenblatt proposed a simple rule to compute the output. He introduced weights, w1, w2, . . ., real numbers expressing the importance of the respective inputs to the output.” Nielsen PP. 3-4. “And it should seem plausible that a complex network of perceptrons could make quite subtle decisions[.] . . . In this network, the first column of perceptrons - what we'll call the first layer of perceptrons - is making three very simple decisions, by weighing the input evidence. What about the perceptrons in the second layer? Each of those perceptrons is making a decision by weighing up the results from the first layer of decision-making.” Nielsen PP. 5-6. See also Nielsen P. 16 showing the “input layer.” “The backpropagation equations provide us with a way of computing the gradient of the cost function. Let's explicitly write this out in the form of an algorithm: 1. Input: Set the corresponding activation for the input layer.” Nielsen P. 70.
With respect to the backpropagation algorithm, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Nielsen because the steps of backpropagation train the network to become a usable model based on the data.)
propagating the set of training data across the artificial neural network according to the set of pruning masks and the trainable parameters; and generating a set of output values from a plurality of output nodes of the artificial network; (Zhou teaches that “pruning” sets the weights to zero. “Take some action with the weights with mask value 0. In [5] these weights were pruned: set to 0 and frozen during any subsequent training.” Zhou P. 2. “Typical network pruning procedures [9, 8, 15] perform two actions on pruned weights: set them to zero, and freeze them in subsequent training (equivalent to removing those connections from the network).” Zhou P. 5. Note that the forward pass will be executed “according to the pruning masks” that set the weights when using the backpropagation algorithm taught in Nielsen. Nielsen P. 70 explains the forward pass during backpropagation. Note that “w,” “a,” and “b” refer to the weights, activations, and biases at a given layer, respectively. The “z” is the output of a given layer.
[Image: media_image1.png])
calculating a set of loss values for the set of training data based on the set of output values; (Nielsen P. 70 explains the output error:
[Image: media_image2.png])
updating the set of trainable parameters based on the set of loss values through backpropagation; and (Nielsen P. 70 explains the backward pass of backpropagation used to update parameters in the network:)
[Image: media_image3.png]
repeating the above steps for a specified number of iteration times. (Nielsen teaches: “Of course, to implement stochastic gradient descent in practice you also need an outer loop generating mini-batches of training examples, and an outer loop stepping through multiple epochs of training. I've omitted those for simplicity.” Nielsen P. 71.)
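For illustration only, a forward pass executed “according to the pruning masks” can be sketched as below, using the w, a, b, and z notation noted above. This is a minimal sketch under stated assumptions and is not code from Nielsen or Zhou: the function name `masked_forward`, the nested-list layout of the weights, and the choice of a sigmoid activation are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def masked_forward(x, layers):
    """Feedforward pass where each weight is multiplied by its 0/1 mask entry.

    layers is a list of (w, m, b): w[j][k] is the weight from input k to neuron j,
    m[j][k] is the matching mask value, and b[j] is the bias of neuron j.
    """
    a = x                                            # activations of the input layer
    for w, m, b in layers:
        z = [
            sum(w[j][k] * m[j][k] * a[k] for k in range(len(a))) + b[j]
            for j in range(len(b))
        ]                                            # z = (w * m) a + b
        a = [sigmoid(zj) for zj in z]                # a = sigma(z)
    return a                                         # values at the output nodes

# One layer, two inputs, two neurons; the weight w[0][1] is pruned (mask value 0).
layers = [([[0.5, -1.0], [2.0, 0.0]], [[1, 0], [1, 1]], [0.0, 0.0])]
print(masked_forward([1.0, 1.0], layers))
```

Because a masked weight is multiplied by 0, changing its stored value has no effect on the output, which matches the statement quoted from Zhou P. 5 that pruning is equivalent to removing the connection.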
3. The method of claim 2, wherein the updating the set of trainable parameters is based on at least one of stochastic gradient descent, resilient backpropagation, root-mean-square propagation, Broyden-Fletcher-Goldfarb-Shanno algorithm, adaptive momentum optimization, adaptive subgradient, and adaptive delta. (Zhou teaches “We train the networks with mask m for each layer (and all regular kernels and biases frozen) with SGD, 0.9 momentum.” Zhou P. 13. Note that Nielsen P. 71 also teaches using SGD.)
4. The method of claim 1, wherein the set of loss values is based on at least one of mean-square error, mean absolute error, cross entropy, connectionist temporal classification loss, negative log-likelihood, Kullback-Leibler divergence, margin loss, ranking loss, embedding loss, hinge loss, Huber loss. (Zhou teaches “A different method to try would be to add an L1 loss to influence layers to go toward certain values, which may alleviate the cold start problems of some networks not learning anything due to mask values starting too low (effectively having the entire network start at zero).” Zhou P. 13. Note that L1 loss refers to mean absolute error. From the context of the reference it is unclear whether this is a different embodiment, so a motivation to modify the original embodiment is given. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Zhou to include L1 loss because this may alleviate the cold start problem described in the cited portion of the reference.)
5. The method of claim 2, wherein the updating the set of trainable parameters further comprises the step of rounding the trainable weights to quantize values based on power-of-two, the additive powers-of-two, power-of-three, or additive powers- of-three. (Elhoushi teaches: “This paper presents an approach to reduce computation and power requirements of CNNs by replacing regular multiplication-based convolution and linear operations, also known as a fully-connected layer or matrix multiplication, with bitwise-shift-based convolution and linear operations respectively. Applying bitwise shift operation on an element is mathematically equivalent to multiplying it by a power of 2.” Elhoushi P. 2.)
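For illustration only, the equivalence quoted from Elhoushi P. 2 (a bitwise shift is mathematically equivalent to multiplication by a power of 2) can be sketched as below, together with one possible way of rounding a weight to a signed power of two. This is a minimal sketch and is not code from Elhoushi: the function names and the choice to round in the log2 domain are assumptions made for the example.

```python
import math

def round_to_power_of_two(w):
    """Round w to sign * 2**p with integer p, choosing p nearest to log2(|w|)."""
    if w == 0:
        return 0, 0                      # a zero weight has no sign and no shift
    sign = 1 if w > 0 else -1
    p = round(math.log2(abs(w)))
    return sign, p

def shift_multiply(x, sign, p):
    """Multiply the integer x by sign * 2**p using a shift and a sign flip only.

    A negative p becomes a right shift, which floors the result; it is exact
    only when x is divisible by 2**(-p).
    """
    shifted = x << p if p >= 0 else x >> -p
    return sign * shifted

sign, p = round_to_power_of_two(-3.7)    # rounds to -(2**2)
print(sign, p, shift_multiply(40, sign, p))
```

In this sketch the weight -3.7 rounds to -4, so multiplying the input 40 by the quantized weight reduces to a left shift by two bits followed by a sign flip.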
8. A computer-implemented method for testing an artificial neural network, performed by at least one computing processor, wherein the method uses the at least one processor coupled with a memory storing instructions implementing the method, wherein the instructions, when executed by the at least processor, carry out steps of the method, comprising: (See rejection of claim 1 showing the specific model of graphics card used to train models in Zhou. Further, one of ordinary skill in the art would understand the implementation of a neural network using source code as a reference to operations to be carried out on a standard computer storing instructions and data in memory, and executing instructions on a processor. See e.g. Nielsen PP. 40-43 teaching implementation of backpropagation using code written in Python. “The sole issue is whether claims 7 and 8 are anticipated[.] . . . We agree with appellant that Figure 1 of Thacker, by itself, does not disclose every limitation in the appealed claims. However, in considering the disclosure of a reference, it is proper to take into account not only specific teachings of the reference but also the inferences which one skilled in the art would reasonably be expected to draw therefrom. In re Shepard, 319 F.2d 194, 50 CCPA 1439 (1963).” In re Preda, 401 F.2d 825, 826 (C.C.P.A. 1968). One of ordinary skill in the art, an engineer in the area of neural networks, would draw the inference that a computer using a memory and a processor executing instructions was used to carry out training of the networks described in Zhou.) feeding a set of testing data into a plurality of input nodes of the artificial neural network, (See rejections of claims 1 and 2 including Nielsen’s description of training.)
wherein the artificial neural network includes trained weights quantized based on additive powers-of-two that includes a summation two raised to the power a first integer and two raised to the power a second integer, wherein a value of the first integer in greater than a value of second integer; (See rejection of claim 1.) propagating the set of testing data across the artificial neural network according to a set of pruning masks; and generating a set of output values from a plurality of output nodes of the artificial network. (See rejections of claims 1 and 2 citing Nielsen’s description of training. Note that Zhou teaches testing of the neural networks. “Test accuracy at early stopping iteration of different mask criteria for four networks at various pruning rates.” Further, Nielsen teaches how source code is modified to include testing during backpropagation. See e.g. Nielsen PP. 38-41 (“If ‘test data’ is provided then the network will be evaluated against the test data after each epoch, and partial progress printed out. This is useful for tracking progress, but slows things down substantially.”) It would have been obvious to combine this feature of Nielsen because it is useful for tracking how training affects performance, which can potentially allow the user to see where performance can be improved.)
9. The method of claim 8, wherein the propagating further comprises the steps of: transforming values of neuron nodes according to a trained set of trainable parameters of the artificial neural network; modifying the transformed values of neuron nodes according to a set of activation functions; and repeating the above steps across a plural of a neuron layer of the artificial neural network. (See rejection of claims 1 and 2. Note that the “a” in the equations of Nielsen cited in the rejection of claim 2 refers to the activation function in a “neuron node.”)
10. The method of claim 9, wherein the transforming values of neuron nodes is based on at least one of sign flipping, bit shifting, accumulation, and biassing, according to a quantized set of trainable parameters with power-of-two, the additive powers-of-two, power-of-three, or additive powers-of-three. (See rejection of claim 5.)
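For illustration only, the transformation recited in claim 10 (sign flipping, bit shifting, accumulation, and biasing) can be sketched for a weight of the additive powers-of-two form recited in claim 1, that is, two raised to a first integer plus two raised to a second, smaller integer. This is a minimal sketch and is not code from the Specification or from any cited reference: the function name `apot_neuron`, the `(sign, a, b)` weight encoding, and the restriction to non-negative exponents and integer inputs are assumptions made for the example.

```python
def apot_neuron(inputs, weights, bias):
    """Compute bias + sum(x * w) where each w = sign * (2**a + 2**b), with a > b >= 0.

    No multiplication by a weight is performed: each product is formed from two
    shifts and one addition, followed by an optional sign flip.
    """
    acc = bias                                   # biasing
    for x, (sign, a, b) in zip(inputs, weights):
        if not a > b >= 0:
            raise ValueError("expected exponents with a > b >= 0")
        term = (x << a) + (x << b)               # bit shifting: x * (2**a + 2**b)
        acc += term if sign > 0 else -term       # sign flipping and accumulation
    return acc

# Weights +(2**2 + 2**0) = +5 and -(2**3 + 2**1) = -10, applied to inputs 3 and 5.
print(apot_neuron([3, 5], [(1, 2, 0), (-1, 3, 1)], bias=7))
```

The result equals the ordinary computation 7 + 3·5 − 5·10, so the shift-and-add form reproduces the multiplier-based form for these inputs.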
11. A system deployed for an artificial neural network comprises: at least one interface link; at least one computing processor; at least one memory bank configured to store instructions implementing a training methods, wherein the instructions, when executed by the at least computing processor, carry out at steps of the training method, comprising; (a) initializing a set of trainable parameters of the artificial neural network, wherein the set of trainable parameters comprise a set of trainable weights and a set of trainable biases; (b) training the set of trainable parameters using a set of training data, wherein the set of trainable weights of the set of trainable parameters includes weights quantized based on additive powers-of-two that includes a summation two raised to the power a first integer and two raised to the power a second integer, and wherein a value of the first integer in greater than a value of second integer; (c) generating a pruning mask based on the trained set of trainable parameters; (d) rewinding the set of trainable parameters; (e) pruning a selected set of trainable parameters based on the pruning mask; and (f) repeating the above steps from (b) to (e) for a specified number of times to generate a set of sparse neural networks having an incremental sparsity. (See rejection of claim 1.)
13. The system of claim 11, wherein the artificial neural network uses a set of pruning masks to reduce arithmetic operations. (“As pointed out in [5], it appears that the specific combination of pruning mask and weights underlying the mask form a more efficient subnetwork found within the larger network, or, as named by the original study, a lucky winning “Lottery Ticket,” or LT.” Zhou P. 1. “Some compression approaches enable more efficient computation by pruning parameters[.]” Zhou P. 1. “Typical network pruning procedures [9, 8, 15] perform two actions on pruned weights: set them to zero, and freeze them in subsequent training (equivalent to removing those connections from the network).” Zhou P. 5. Note also that “to reduce arithmetic operations” does not require steps to be performed or limit to a particular structure. Intended use language is explained in MPEP §§ 2103 and 2111.02. “Claim scope is not limited by claim language that suggests or makes optional but does not require steps to be performed, or by claim language that does not limit a claim to a particular structure.” MPEP § 2111.04.)
14. The system of claim 11, wherein the artificial neural network uses a set of quantized parameters to reduce arithmetic multiplication operations using power-of-two, , power-of-three, or additive powers-of-three. (See rejection of claim 5.)
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Zhou, Nielsen, Elhoushi, and Gupta (Learning to Prune Deep Neural Networks via Reinforcement Learning; 2020).
6. The method of claim 1, wherein the incremental sparsity is controlled by an auxiliary neural network trained by multiple training episodes through a deep reinforcement learning framework. (Zhou teaches “Typical network pruning procedures [9, 8, 15] perform two actions on pruned weights: set them to zero, and freeze them in subsequent training (equivalent to removing those connections from the network).” Zhou P. 5. This teaches effectively removing connections from the network, increasing sparsity. Zhou further teaches incrementally increasing sparsity, as shown in the rejection of claim 1.
The previously cited art does not teach using an auxiliary neural network trained with reinforcement learning to alter network architecture, in this case by controlling the incremental sparsity.
Gupta teaches: “This paper proposes PuRL - a deep reinforcement learning (RL) based algorithm for pruning neural networks. Unlike current RL based model compression approaches where feedback is given only at the end of each episode to the agent, PuRL provides rewards at every pruning step. . . . Lastly, we point out that PuRL is simple to use and can be easily adapted for various architectures.” Gupta Abstract. “Our primary focus is on sample efficiency and accuracy. Deep Q-Network (DQN) Mnih et al. (2013), a form of Q-learning, does a very fast exploration, however, it is not very stable. Through careful design of our reward structure, we make DQN stable and hence, utilise it for doing pruning” Gupta P. 3. See also Gupta P.4 Fig. 1 showing the DQN being separate from the neural network to be trained.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Gupta because this allows automation of an accurate way of pruning networks.)
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Zhou, Nielsen, Elhoushi, Stewart (Optimising Hardware Accelerated Neural Networks with Quantisation and a Knowledge Distillation Evolutionary Algorithm, Feb. 2021), and Guo (Online Knowledge Distillation via Collaborative Learning; 2020).
12. The system of claim 11, wherein a student model is designed in parallel to the artificial neural network for a knowledge distillation to find a Pareto-optimal trade-off between performance and complexity of the artificial neural network. (The previously cited art does not discuss using a Pareto-optimal trade-off to find a student model.
Stewart teaches “This paper makes the following contributions: . . .• A new framework called NEMOKD for hardware aware evolution of knowledge distilled student models (Section 3). . . . • An evaluation of NEMOKD showing its ability to minimise both latency and accuracy loss on Intel’s fixed Movidius Myriad X VPU architecture (Section 4.3).” Stewart P. 3. “Recent neuro-evolution techniques retain stochastic gradient descent and back propagation for training, before using evolutionary algorithms to search for optimal architectural configurations. Device-aware Progressive Search for Pareto-Optimal Neural Architectures [27] is a method of neural architecture search that has been shown to simultaneously optimise device-related objectives such as inference time and device-agnostic objectives such as accuracy. This search algorithm uses progressive search and mutation operators to explore the trade-offs between these objectives.” Stewart P. 5. “Multi-Objective Optimisation solves optimisation problems with at least two conflicting objectives. For a solution space A that contains all permissible neural networks configurations, the two objectives of NEMOKD are (1) minimise inference latency (latency) and (2) minimising accuracy loss (error):” Stewart P. 5. See also Stewart P. 7 showing mutation of the number of layers, channels, and neurons in student networks.
[Image: media_image4.png]
Stewart P. 17.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Stewart because this technique optimizes for a better tradeoff against accuracy when selecting smaller, faster networks.
Stewart does not expressly state that the student and teacher models taught in the reference are trained in parallel.
Guo teaches “An alternative method proposed by [15] (ONE) is to train a multi-branch network while establishing teacher on the fly, as shown in Fig. 1c. . . . The efficacy of self-distillation and online distillation leads us to the following question: Could we use a small network to improve the model with larger capacity in a one-stage distillation framework? In this work, we propose a novel online knowledge distillation method via collaborative learning. In KDCL, student networks with different capacities learn collaboratively to generate high-quality soft target supervision, which distills the additional knowledge to each student as illustrated in Fig.1d.” Guo P. 11018. Note here that “student networks with different capacities” are not structurally different from student and teacher networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Guo because this may improve both models.)
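For illustration only, the two-objective trade-off quoted from Stewart P. 5 (minimise inference latency and minimise accuracy loss) can be sketched as a selection of non-dominated candidates. This is a minimal sketch and is not the NEMOKD algorithm or code from Stewart: the function name `pareto_front`, the candidate names, and the latency and error figures are hypothetical values chosen for the example.

```python
def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates.

    candidates is a list of (name, latency, error); lower is better for both.
    One candidate dominates another when it is no worse on both objectives
    and strictly better on at least one.
    """
    front = []
    for name, lat, err in candidates:
        dominated = any(
            (l2 <= lat and e2 <= err) and (l2 < lat or e2 < err)
            for _, l2, e2 in candidates
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical student models: (name, latency in ms, error rate).
students = [("A", 10, 0.20), ("B", 20, 0.10), ("C", 25, 0.15), ("D", 40, 0.05)]
print(pareto_front(students))
```

Here candidate C is slower and less accurate than candidate B, so it is dominated; A, B, and D each represent a different trade-off between the two objectives.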
Response to Arguments
Applicant's arguments filed 06/11/2025 have been fully considered but they are not persuasive.
Rejections under § 101
Applicant submits that the claims merely involve math without actually reciting math. See Rem. 6-7. Nothing in the Remarks is offered in support of this assertion.
Applicant submits that even if an abstract idea is recited in the claims, the claims are directed to an “improvement to the field of artificial neural networks . . . and the systems thereof.” Rem. 8. The improvement, according to Applicant, is the replacement of binary representations of weights in a neural network with additive powers of two representations. As noted by Applicant, this way of representing weights of a neural network reduces computational and storage overhead when training and using ANNs. Rem. 8-9. But this describes an improvement to math and therefore an improvement to the abstract idea itself. An improvement to the abstract idea itself is not patent eligible.
Further, the Remarks indicate that Applicant views APoT as known, asserting that a specific implementation of APoT should result in an improvement. (“In order to reduce the computational complexity of DNN for real-time processing, the present invention provides a way to integrate APoT quantization into a DeepShift framework. In the original DeepShift, DNN weights are quantized into a signed PoT as w = ±2^u, where u is an integer to train. Note that the PoT weights can fully eliminate multiplier operations from DNN equalizers as it can be realized with bit shifting for fixed-point precision or addition operation for floating-point (FP) precision. We further improve the DeepShift by using APoT weights . . . it requires . . . no multiplication likewise PoT. . . . Note that the original APoT uses a deterministic non-trainable look-up table, whereas our invention extends it as an improved DeepShift with trainable APoT weights through the use of QAT.” Rem. 9 (apparently citing paragraph 44 of the Spec.)) But nothing in the Remarks explains what operations of QAT result in this training benefit, and the claims recite “weights quantized based on additive powers of two” without reciting any specific operations. Merely using a technique which was known as of the effective filing date does not provide an improvement to the state of the art. If there are operations described in the Specification which would be understood by one of ordinary skill in the art as resulting in an improvement to the state of the art, such operations may be amended into the claims. If operations resulting in an improvement are claimed, the claims may include an inventive concept. “In short, first the specification should be evaluated to determine if the disclosure provides sufficient details such that one of ordinary skill in the art would recognize the claimed invention as providing an improvement.
The specification need not explicitly set forth the improvement, but it must describe the invention such that the improvement would be apparent to one of ordinary skill in the art. Conversely, if the specification explicitly sets forth an improvement but in a conclusory manner (i.e., a bare assertion of an improvement without the detail necessary to be apparent to a person of ordinary skill in the art), the examiner should not determine the claim improves technology. Second, if the specification sets forth an improvement in technology, the claim must be evaluated to ensure that the claim itself reflects the disclosed improvement. That is, the claim includes the components or steps of the invention that provide the improvement described in the specification. . . . It should be noted that while this consideration is often referred to in an abbreviated manner as the ‘improvements consideration,’ the word ‘improvements’ in the context of this consideration is limited to improvements to the functioning of a computer or any other technology/technical field, whether in Step 2A Prong Two or in Step 2B.” MPEP 2106.04(d)(1).
Rejections under § 112a
No specific arguments are found in the remarks.
Rejections under § 112b
No specific arguments are found in the remarks.
Rejections under § 103
Applicant states that Zhou and Nielsen fail to teach APoT. However, APoT is known. See rejection above.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL M KNIGHT whose telephone number is (571) 272-8646. The examiner can normally be reached Monday - Friday 9-5 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold can be reached on (571. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
PAUL M. KNIGHT
/PAUL M KNIGHT/Examiner, Art Unit 2148
1 This distinction between claims which read on math and claims which recite an abstract idea is based on official USPTO Guidance. The 2019 Subject Matter Eligibility (SME) Examples instructs examiners that a claim reciting “training the neural network” where the background describes training as “using stochastic learning with backpropagation which is a type of machine learning algorithm that uses the gradient of a mathematical loss function to adjust the weights of the network” “does not recite any mathematical relationships, formulas, or calculations.” See 2019 SME Example 39, PP. 8-9 (emphasis added). In this example, the plain meaning of “training the neural network” read in light of the disclosure reads on backpropagation using the gradient of a mathematical loss function. See MPEP § 2111.01. In contrast, the 2024 SME Examples instructs examiners that a claim reciting “training, by the computer, the ANN . . . wherein the selected training algorithm includes a backpropagation algorithm and a gradient descent algorithm” does recite an abstract idea because “[t]he plain meaning of [backpropagation algorithm and gradient descent algorithm] are optimization algorithms, which compute neural network parameters using a series of mathematical calculations.” 2024 PEG Example 47, PP. 4-6. The Memorandum of August 4, 2025; Reminders on evaluating subject matter eligibility of claims under 35 U.S.C. 101, P. 3 also directs examiners that “training the neural network” recited in Example 39 merely “involve[s] . . . mathematical concepts” and contrasts claim 2 of example 47 as “referring to [specific] mathematical calculations by name[.]” (Emphasis added.)
2 “For instance, the claims in Diehr . . . clearly stated a mathematical equation . . . and the claims in Mayo . . . clearly stated laws of nature . . . such that the claims ‘set forth’ an identifiable judicial exception. Alternatively, the claims in Alice Corp. . . . described the concept of intermediated settlement without ever explicitly using the words ‘intermediated’ or ‘settlement.’” MPEP § 2106.04(II)(A).
3 “By grouping the abstract ideas, the examiners’ focus has been shifted from relying on individual cases to generally applying the wide body of case law spanning all technologies and claim types. . . . If the identified limitation(s) falls within at least one of the groupings of abstract ideas, it is reasonable to conclude that the claim recites an abstract idea in Step 2A Prong One.” MPEP § 2106.04(a). See also MPEP § 2106.04(a)(2).
4 Step 2A prongs one and two are evaluated individually, consistent with the framework in the MPEP. Evaluation of relationships between abstract ideas and additional elements in one location promotes clarity of the record.
5 “In short, first the specification should be evaluated to determine if the disclosure provides sufficient details such that one of ordinary skill in the art would recognize the claimed invention as providing an improvement. The specification need not explicitly set forth the improvement, but it must describe the invention such that the improvement would be apparent to one of ordinary skill in the art. Conversely, if the specification explicitly sets forth an improvement but in a conclusory manner (i.e., a bare assertion of an improvement without the detail necessary to be apparent to a person of ordinary skill in the art), the examiner should not determine the claim improves technology. Second, if the specification sets forth an improvement in technology, the claim must be evaluated to ensure that the claim itself reflects the disclosed improvement. That is, the claim includes the components or steps of the invention that provide the improvement described in the specification. . . . It should be noted that while this consideration is often referred to in an abbreviated manner as the ‘improvements consideration,’ the word ‘improvements’ in the context of this consideration is limited to improvements to the functioning of a computer or any other technology/technical field, whether in Step 2A Prong Two or in Step 2B.” MPEP 2106.04(d)(1). See also Koninklijke KPN N.V. v. Gemalto M2M GmbH, 942 F.3d 1143, 1150-1152 (Fed. Cir. 2019).
6 See MPEP § 2106.05(d)(II) listing operations including “receiving or transmitting data,” “storing and retrieving data in memory,” and “performing repetitive calculations” as WURC.
7 “But ‘[f]or the role of a computer in a computer-implemented invention to be deemed meaningful in the context of this analysis, it must involve more than performance of 'well-understood, routine, [and] conventional activities previously known to the industry.’ Content Extraction, 776 F.3d at 1347-48 (quoting Alice, 134 S. Ct at 2359). Here, the server simply receives data, ‘extract[s] classification information . . . from the received data,’ and ‘stor[es] the digital images . . . taking into consideration the classification information.’ See ‘295 patent, col. 10 ll. 1-17 (Claim 17). . . . These steps fall squarely within our precedent finding generic computer components insufficient to add an inventive concept to an otherwise abstract idea. Alice, 134 S. Ct. at 2360 (‘Nearly every computer will include a 'communications controller' and a 'data storage unit' capable of performing the basic calculation, storage, and transmission functions required by the method claims.’); Content Extraction, 776 F.3d at 1345, 1348 (‘storing information’ into memory, and using a computer to ‘translate the shapes on a physical page into typeface characters,’ insufficient confer patent eligibility); Mortg. Grader, 811 F.3d at 1324-25 (generic computer components such as an ‘interface,’ ‘network,’ and ‘database,’ fail to satisfy the inventive concept requirement); Intellectual Ventures I, 792 F.3d at 1368 (a ‘database’ and ‘a communication medium’ ‘are all generic computer elements’); BuySAFE v. Google, Inc., 765 F.3d 1350, 1355 (Fed. Cir. 2014) (‘That a computer receives and sends the information over a network—with no further specification—is not even arguably inventive.’).” TLI Commc'ns LLC v. AV Auto., LLC, 823 F.3d 607, 614 (Fed. Cir. 2016), Emphasis Added.
8 “The analysis as to whether an element (or combination of elements) is widely prevalent or in common use is the same as the analysis under 35 U.S.C. 112(a) as to whether an element is so well-known that it need not be described in detail in the patent specification. See Genetic Techs. Ltd. v. Merial LLC, 818 F.3d 1369, 1377, 118 USPQ2d 1541, 1546 (Fed. Cir. 2016) (supporting the position that amplification was well-understood, routine, conventional for purposes of subject matter eligibility by observing that the patentee expressly argued during prosecution of the application that amplification was a technique readily practiced by those skilled in the art to overcome the rejection of the claim under 35 U.S.C. 112, first paragraph)[.]” MPEP § 2106.05(d)(I).
9 “Similarly, claim elements or combinations of claim elements that are routine, conventional or well-understood cannot transform the claims. (Citing BSG Tech LLC v. BuySeasons, Inc., 899 F.3d 1281, 1290-1291 (Fed. Cir. 2018)). When the patent's specification ‘describes the components and features listed in the claims generically,’ it ‘support[s] the conclusion that these components and features are conventional.’ Weisner v. Google LLC, 51 F.4th 1073, 1083-84 (Fed. Cir. 2022); see also Beteiro, LLC v. DraftKings Inc., 104 F.4th 1350, 1357-58 (Fed. Cir. 2024).” Broadband iTV, Inc. v. Amazon.com, Inc., 113 F.4th 1359 (Fed. Cir. 2024).
10 “If it is asserted that the invention improves upon conventional functioning of a computer, or upon conventional technology or technological processes, a technical explanation as to how to implement the invention should be present in the specification. That is, the disclosure must provide sufficient details such that one of ordinary skill in the art would recognize the claimed invention as providing an improvement. The specification need not explicitly set forth the improvement, but it must describe the invention such that the improvement would be apparent to one of ordinary skill in the art. Conversely, if the specification explicitly sets forth an improvement but in a conclusory manner (i.e., a bare assertion of an improvement without the detail necessary to be apparent to a person of ordinary skill in the art), the examiner should not determine the claim improves technology.” MPEP § 2106.05(a).