DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 26-45 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite mathematical calculations for pruning a neural network: computing a loss, deriving importance scores for weights based on that loss, selecting weights with lower importance scores, and setting those weights to zero. This judicial exception is not integrated into a practical application because the claims are drafted at a purely functional level with no recitation of a specific technical improvement, particular machine, or transformation beyond generic computing components. The covariance-based improvement described in the specification, which is the actual technical advance over conventional pruning, is not recited in the claims. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the recited additional elements (a neural network, a computer processor, and non-transitory computer-readable media) are generic computing components that are well-understood, routine, and conventional, and do not meaningfully limit the abstract idea.
Claims 26, 33, 40:
Step 1: All three are statutory categories (process, manufacture, machine).
Step 2A, Prong 1: The claims recite mathematical concepts: computing a loss, deriving importance scores from the loss, selecting weights by score, and zeroing them. This is a mathematical calculation/relationship under MPEP § 2106.04(a)(2)(I).
Judicial exception identified.
Step 2A, Prong 2: No practical application. The claims are purely functional with no recitation of a specific technical improvement, particular machine, or transformation. The covariance-based improvement described in the spec is not claimed. "Neural network" at this abstraction level is itself a mathematical construct.
Step 2B: The generic processor and non-transitory media are well-understood, routine, and conventional (WURC). No inventive concept.
Conclusion: Claims 26, 33, and 40 are rejected under 35 U.S.C. § 101 as directed to the abstract idea of mathematical calculations (computing a loss, scoring weights by their effect on that loss, selecting low-scoring weights, and zeroing them) without integration into a practical application or significantly more. See MPEP §§ 2106, 2106.04, 2106.04(a)(2)(I), 2106.05, 2106.07; Alice Corp. v. CLS Bank Int'l, 573 U.S. 208 (2014); Mayo Collaborative Servs. v. Prometheus Labs., Inc., 566 U.S. 66 (2012); Recentive Analytics, Inc. v. Fox Corp., 101 F.4th 956 (Fed. Cir. 2024).
Claim | Meaningful Limitation Added | Eligible? | Rationale
27, 34, 41 | Select weight with smaller importance score | No | Mathematical comparison; adds math to math
28, 35, 42 | Input data is training data | No | Field-of-use; does not integrate exception into practical application
29, 36, 43 | Network already trained; maintain unselected weights; further train after zeroing | No | Data-state descriptor and additional mathematical steps on the model
30, 37, 44 | During retraining, hold zeros fixed; update only non-zero weights | No | Constrained optimization; limits abstract idea without integrating it into practical application
31, 38, 45 | Select and zero an additional unselected weight | No | Iterating the same abstract mathematical process
32, 39 | Layers are convolutional | No | Architectural field-of-use limitation; under Recentive Analytics, applying math to a particular model type is insufficient
All dependent claims 27-32, 34-39, and 41-45 remain ineligible. None adds a limitation that (1) reflects a specific technical improvement grounded in the claim language, (2) ties the abstract idea to a particular machine in a non-generic way, or (3) effects a transformation of a physical article. The dependent claims collectively add mathematical refinements (score comparison method, data type, training state, fixed-zero retraining, iteration, architecture type), all of which operate entirely within the abstract idea or at best constitute field-of-use or data-category narrowing.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 26-31, 33-38, and 40-45 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Le Cun et al., "Optimal Brain Damage," Advances in Neural Information Processing Systems, vol. 2, pp. 598-605, 1990 (hereinafter "OBD").
Claims 26, 33, and 40. (New)
OBD discloses one or more non-transitory computer-readable media storing instructions executable to perform operations (OBD describes a computational algorithm executed on a trained network: "training has converged" and the method proceeds by "computing the second derivatives … hkk" via backpropagation, then iterating "to step 2." The entire OBD procedure (forward pass, Hessian diagonal computation, saliency ranking, and weight deletion) is an algorithmic sequence that can only be performed by a processor operating on weights stored in memory. The paper reports empirical results on a digit recognition system, confirming actual computer implementation. The examiner takes official notice that implementing such an algorithm on a processor, with the instructions stored on a non-transitory computer-readable medium, was well-known and the only practical means of execution.), the operations comprising:
providing input data to a neural network, the neural network comprising one or more layers with weights, the input data processed in the one or more layers (OBD: "The method was validated using our handwritten digit recognition network trained with backpropagation ... The network state is computed using the standard formulae

xi = f(ai),    ai = Σj Wij xj

... where xi is the state of unit i, ai its total input (weighted sum) ... and Wij is the connection going from unit j to unit i" (Section 2.1). This teaches providing input data to a multi-layer neural network whose layers have weight parameters Wij.);
computing a loss of the neural network based on the input data and the weights (OBD: "We assume the objective function is the usual mean-squared error (MSE); generalization to other additive error measures is straightforward ... We approximate the objective function E by a Taylor series. A perturbation δU of the parameter vector will change the objective function by

δE = Σi gi δui + (1/2) Σi hii δui² + (1/2) Σi≠j hij δui δuj + O(‖δU‖³)

" (Section 2). This teaches computing a loss function E (the MSE objective) as a function of the input data and the network weights.);
determining importance scores for the weights based on the loss, an importance score of a weight indicating a measurement of a change in the loss by removing the weight (OBD: "it is more than reasonable to define the saliency of a parameter to be the change in the objective function caused by deleting that parameter ... 4. Compute the saliencies for each parameter: sk = hkk uk² / 2" (Section 2 and Section 2.2, The Recipe, Step 4). The saliency sk is derived from the diagonal second derivative hkk of the loss E and the weight value uk, and directly approximates ΔE, the change in the loss, caused by deleting (setting to zero) weight parameter k. This teaches determining an importance score (saliency) for each weight that indicates a measurement of the change in the loss by removing that weight.);
selecting one or more weights based on the importance scores of the weights (OBD: "Sort the parameters by saliency and delete some low-saliency parameters" (Section 2.2, The Recipe, Step 5). This teaches selecting one or more weights – specifically those with the lowest
importance scores (saliencies) – for removal.); and
changing the one or more selected weights to one or more zeros (OBD: "Deleting a parameter is defined as setting it to 0 and freezing it there" (Section 2.2). This teaches changing the selected low-saliency weights to zero.).
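The recipe mapped above, computing saliencies sk = hkk uk² / 2, sorting by saliency, and setting the lowest-scoring weights to zero, can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not code from OBD: the diagonal Hessian values hkk are assumed to be supplied by a separate backpropagation pass, and the toy numbers are invented.

```python
import numpy as np

def obd_prune(weights, hessian_diag, num_to_delete):
    """OBD Steps 4-5 as a sketch: compute saliency sk = hkk * uk^2 / 2,
    sort by saliency, and set the lowest-saliency weights to zero.
    hessian_diag (the hkk values) is assumed precomputed by backprop."""
    saliency = 0.5 * hessian_diag * weights ** 2       # Step 4: sk = hkk uk^2 / 2
    delete_idx = np.argsort(saliency)[:num_to_delete]  # Step 5: lowest saliencies
    pruned = weights.copy()
    pruned[delete_idx] = 0.0                           # "setting it to 0"
    mask = np.ones_like(weights, dtype=bool)
    mask[delete_idx] = False                           # False = frozen at zero
    return pruned, mask

# Toy example: four weights with hand-picked Hessian diagonal values.
u = np.array([0.5, -2.0, 0.1, 1.5])
h = np.array([1.0, 0.8, 2.0, 0.5])
pruned, mask = obd_prune(u, h, num_to_delete=2)
# Saliencies are [0.125, 1.6, 0.01, 0.5625], so indices 2 and 0 are zeroed.
```

Sorting ascending and taking the first num_to_delete indices directly implements the "delete some low-saliency parameters" step.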
Claims 27, 34, and 41. (New)
OBD further discloses wherein selecting the one or more weights based on the importance scores of the weights comprises: comparing an importance score of a first weight with an importance score of a second weight; and selecting the first weight over the second weight based on the importance score of the first weight being smaller than the importance score of the second weight (OBD: "Sort the parameters by saliency and delete some low-saliency parameters" (Section 2.2, Step 5); "It is clear that deleting parameters by order of saliency causes a significantly smaller increase of the objective function than deleting them according to their magnitude" (Section 2.3). Sorting by saliency rank and deleting those with the lowest (smallest) saliency scores directly teaches comparing the importance scores of individual weights and selecting a weight with a smaller importance score over a weight with a larger importance score for deletion.).
Claims 28, 35, and 42. (New)
OBD further discloses wherein the input data is training data used to train the neural network (OBD: "It was trained on a database of segmented handwritten zip code digits and printed digits containing approximately 9300 training examples" (Section 2.3). This teaches that the input data provided to the neural network is training data used in training the network.).
Claims 29, 36, and 43. (New)
OBD further discloses wherein the neural network has been trained, and the operations further comprise: maintaining one or more values of one or more unselected weights; and after changing the one or more selected weights to the one or more zeros and maintaining the one or more values of the one or more unselected weights, further training the neural network (OBD: "Train the network until a reasonable solution is obtained" (Section 2.2, Step 2) – the neural network has been trained prior to pruning. "Deleting a parameter is defined as setting it to 0 and freezing it there" (Section 2.2) – only the selected low-saliency weights are frozen at zero; the unselected weights retain their current values (are maintained). "Iterate to step 2" (Section 2.2, Step 6) – the network undergoes further training (retraining) after the selected weights are set to zero. This teaches maintaining the values of unselected weights while further training the network after changing selected weights to zeros.).
Claims 30, 37, and 44. (New)
OBD further discloses wherein further training the neural network comprises: maintaining the one or more zeros; and modifying the one or more values of the one or more unselected weights (OBD: "Deleting a parameter is defined as setting it to 0 and freezing it there" (Section 2.2). The phrase "freezing it there" expressly teaches that the zeroed weights are maintained (held at zero) throughout further training. The remaining unselected weights are then updated through continued backpropagation in Step 2, thereby modifying their values while the zeros are maintained. This teaches maintaining zeros of selected weights and modifying values of unselected weights during further training.).
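The "freezing it there" behavior mapped above is commonly implemented by multiplying the weight update by a binary mask, so pruned positions can never drift away from zero during retraining. The sketch below is an illustrative stand-in with made-up gradient values, not OBD's actual training loop.

```python
import numpy as np

def masked_sgd_step(weights, grads, mask, lr=0.01):
    """One retraining step: pruned weights stay frozen at zero
    ("freezing it there"); only unpruned weights are updated."""
    return (weights - lr * grads) * mask  # mask: 1.0 = trainable, 0.0 = pruned

w = np.array([0.0, -2.0, 0.0, 1.5])  # zeros left by a prior pruning pass
g = np.array([0.3, 0.1, -0.2, 0.4])  # made-up gradient values
m = np.array([0.0, 1.0, 0.0, 1.0])   # 0.0 marks the pruned positions
w = masked_sgd_step(w, g, m, lr=0.1)
# Pruned positions remain exactly zero; the others move by -lr * grad.
```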
Claims 31, 38, and 45. (New)
OBD further discloses wherein the operations further comprise: selecting an additional weight from the one or more unselected weights based on one or more importance scores of the one or more unselected weights; and changing the additional weight to a zero (OBD: "Iterate to step 2" (Section 2.2, Step 6). In each subsequent iteration, OBD recomputes the second-derivative saliency scores (Steps 3-4) for the remaining non-zero (previously unselected) weights, then "sort[s] the parameters by saliency and delete[s] some low-saliency parameters" (Step 5) – i.e., selects one or more additional weights from the unselected pool based on their updated importance scores and sets them to zero. This teaches iteratively selecting additional weights from the unselected weights based on importance scores and changing them to zero. See also OBD Section 2.3, Figure 2, showing the performance benefit of iterative pruning with retraining.).
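The iterate-to-step-2 behavior mapped above can be sketched as an outer loop around the saliency computation. Here train and compute_hessian_diag are hypothetical placeholders, since OBD leaves the retraining procedure and the deletion schedule to the practitioner; the sketch only illustrates that each iteration re-scores the surviving weights and deletes more of them.

```python
import numpy as np

def iterative_obd(weights, train, compute_hessian_diag, rounds=3, frac=0.25):
    """OBD Steps 2-6 as an outer loop: (re)train, recompute saliencies over
    the surviving weights, delete a further fraction, and repeat.
    train and compute_hessian_diag are hypothetical callables."""
    weights = weights.copy()
    mask = np.ones_like(weights, dtype=bool)
    for _ in range(rounds):
        weights = train(weights, mask)     # Step 2: train to a reasonable solution
        h = compute_hessian_diag(weights)  # Step 3: hkk via backpropagation
        sal = 0.5 * h * weights ** 2       # Step 4: saliencies
        sal[~mask] = np.inf                # never re-score already-deleted weights
        k = max(1, int(frac * mask.sum()))
        idx = np.argsort(sal)[:k]          # Step 5: delete lowest-saliency weights
        mask[idx] = False
        weights[idx] = 0.0                 # delete = set to 0 and freeze
    return weights, mask

# Toy run with placeholder "training" (just re-applies the mask) and a
# constant Hessian diagonal; both stand in for real procedures.
w0 = np.array([1.0, 2.0, 3.0, 4.0])
final_w, final_mask = iterative_obd(
    w0,
    train=lambda w, m: w * m,
    compute_hessian_diag=lambda w: np.ones_like(w),
    rounds=2,
)
```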
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 32 and 39 are rejected under 35 U.S.C. 103 as being unpatentable over OBD as applied to claims 26 and 33 above, in view of Han et al., "Learning both Weights and Connections for Efficient Neural Networks," arXiv:1506.02626, 2015 (hereinafter "Han").
Claims 32 and 39. (New)
Claims 32 and 39 recite wherein the one or more layers comprise one or more convolutional layers.
OBD does not explicitly recite that the one or more layers comprise one or more convolutional layers, as OBD describes its network as a shared-weight architecture. However, Han, in the same field of neural network pruning for computational efficiency, explicitly teaches that the weight selection and zeroing methodology is applicable to one or more convolutional layers (Han: "Both CONV and FC layers can be pruned, but with different sensitivity" (Section 5); "pruning reduces the number of weights by 12x and computation by 6x ... [for layers] conv1, conv2" (Table 3, LeNet-5 results); "We further examine the performance of pruning on the ImageNet ... dataset ... VGG-16 has far more convolutional layers ... We aggressively pruned both convolutional and fully-connected layers" (Section 4.3). This teaches that a weight-pruning-and-zeroing methodology like that of OBD applies to and is particularly beneficial for one or more convolutional layers in a CNN.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply OBD's importance-score-based weight pruning to convolutional layers, as taught by Han. The motivation for this combination would have been to reduce the storage and computational burden of convolutional layers, which, as Han demonstrates, account for the majority of parameters and arithmetic operations in deep CNNs used for computer vision tasks, thereby enabling deployment of accurate neural network models on resource-constrained mobile and embedded devices without loss of predictive accuracy.
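Mechanically, extending weight pruning to convolutional layers as Han teaches requires nothing new, since a CONV kernel is simply another weight tensor. The sketch below is illustrative only; it uses plain weight magnitude as the importance score (Han's magnitude criterion rather than OBD's saliency) and a made-up kernel shape.

```python
import numpy as np

def prune_tensor(w, frac):
    """Zero the fraction frac of entries with the lowest importance score.
    Plain magnitude |w| stands in for a saliency score here; ties at the
    threshold may zero a few extra entries."""
    flat = np.abs(w).ravel()
    k = int(frac * flat.size)
    if k == 0:
        return w.copy()
    thresh = np.sort(flat)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(w) <= thresh, 0.0, w)

# A CONV kernel is just a 4-D weight tensor: (out_ch, in_ch, kH, kW).
conv_kernel = np.random.default_rng(0).normal(size=(8, 3, 3, 3))
pruned = prune_tensor(conv_kernel, frac=0.5)  # zeros half of the 216 entries
```

The same call works unchanged on a 2-D fully-connected weight matrix, which is the sense in which Han prunes "both CONV and FC layers."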
Conclusion
The prior art made of record and not relied upon, but considered pertinent to applicant's disclosure, is listed on the PTO-892 form.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ross Varndell whose telephone number is (571)270-1922. The examiner can normally be reached M-F, 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, O’Neal Mistry can be reached at (313)446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Ross Varndell/Primary Examiner, Art Unit 2674