DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on February 17, 2026 has been entered.
Response to Amendment and Arguments
Claims 1, 2, 5-12, and 15-24 are pending and are being examined in this application.
In light of Applicant’s amendments to the claims, the rejection of claims 1, 2, 5-12, and 15-20 under 35 U.S.C. 102 is withdrawn. However, claims 1, 2, 5-12, and 15-20 are newly rejected under 35 U.S.C. 112(a), as set forth below.
Applicant’s arguments with respect to the 35 U.S.C. 102 rejection have been considered, but they are not directed to claim 21. Claim 21, as amended, differs in scope from claims 1 and 11 and does not recite maximizing or optimizing the amount of computation, memory, or time used in performing an operation of the neural network, as contended by Applicant. Claim 21 recites “determining that the pruning mask’s application to the neural network will optimize the performance of the inference,” not optimizing the amount of computation, memory, or time used in performing the inference.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1, 2, 5-12, and 15-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
The examiner has reviewed the application as originally filed and Remarks submitted by applicant, but cannot find support for the limitation “wherein the optimization operation comprises determining, based on the weight importance, that the pruning mask's application to the neural network will satisfy the constraint condition in performing the operation of the neural network and will: maximize the amount of memory to be used in performing of the operation of the neural network, maximize the amount of computation to be used in performing the neural network operation, or maximize the amount of time to be taken in performing the neural network operation” recited in claims 1 and 11.
Per the specification:
[0010] The determining of the pruning mask may include, in accordance with the constraint condition, expressing an optimization equation for maximizing the weight importance of the neural network as at least one of the pruning binary vector of the input channel and the spatial pruning binary vector of the output channel.
[0102] An apparatus configured to implement embodiments described with reference to FIG. 4 may optimize a pruning mask in a way of maximizing a resource of a weight in accordance with (or as constrained by) an amount of resources indicated by a user or otherwise. For example, the apparatus may predict an estimated inference time, an amount of memory necessary to use a neural network, and/or floating point operations (FLOPs) required for an operation based on weights remaining after pruning using a determined pruning mask (e.g., mask A described above).
[0105] In operation 420, the apparatus may receive a constraint condition related to an operation resource. The constraint condition related to the operation resource may be determined based on an amount of memory, FLOPs, and/or time to operate the neural network to perform an inference (i.e., an inference time). For example, if the neural network needs to complete an inference or prediction within a preset time, a corresponding inference time may be a constraint condition. For example, the constraint condition may be a time limit of using a hardware resource such as a processor, supercomputer, or the like.
[0106] In operation 420, the apparatus may determine, according to the constraint condition, a pruning mask (or a pruning vector or matrix) for maximizing or improving the weight importance of the neural network. The description of the pruning mask referring to FIG. 3B may also apply to FIG. 4, and thus, a duplicate description is omitted. Although description below refers to a pruning mask, the same description may apply to the use of a pruning vector, a pruning matrix, or the like.
[0107] In some embodiments, based on remaining weights after pruning, inference time, for example, may be predicted, as well as the amount of memory required to use the neural network, FLOPs required for calculation, etc. The apparatus may express any such prediction values of the constraint condition as a linear combination...
[0120] The apparatus may determine a pruning mask satisfying all conditions requested by a user (or otherwise selected) at once by maintaining a maximal number of remaining important weights without repeatedly adjusting a threshold value.
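Solely to illustrate the examiner's understanding of the linear-combination language in paragraph 107 (which is quoted above only in part), the following Python sketch expresses predicted FLOPs and memory for one layer as a sum of per-weight costs over the entries kept by binary input-channel and output-channel pruning vectors; the cost coefficients and function names are hypothetical and are not taken from the specification.

import numpy as np

def predict_layer_resources(in_mask, out_mask, flops_per_weight=2.0, bytes_per_weight=4.0):
    # kept[i, j] is 1 only when output channel i and input channel j both survive pruning
    kept = np.outer(out_mask, in_mask)
    # Predictions expressed as a linear combination of the kept-weight indicators
    predicted_flops = flops_per_weight * kept.sum()
    predicted_memory = bytes_per_weight * kept.sum()
    return predicted_flops, predicted_memory

# Example: prune input channel 2 and output channel 0 of a 4x3 layer
print(predict_layer_resources(in_mask=np.array([1, 1, 0]), out_mask=np.array([0, 1, 1, 1])))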
As provided above, paragraph 10 of the specification discloses that determining the pruning mask includes performing an optimization operation for maximizing the weight importance of the neural network while satisfying the constraint condition. Thus, the claimed “wherein the optimization operation comprises determining, based on the weight importance, that the pruning mask's application to the neural network will...” is clearly unsupported because the specification discloses that the weight importance is maximized by the optimization operation, not that the weight importance is used to determine that the pruning mask’s application will satisfy the constraint condition or will maximize the amount of memory/computation/time consumed.
Paragraph 102 of the specification discloses optimizing a pruning mask in a way that maximizes a resource of a weight in accordance with (or as constrained by) an amount of resources. This paragraph provides a brief introduction to the flow chart of FIG. 4; paragraphs 103-122 detail how the optimization is performed. Paragraph 106, which focuses on the optimization step, discloses that the apparatus may determine, according to the constraint condition, a pruning mask (or a pruning vector or matrix) for maximizing or improving the weight importance of the neural network. Thus, the claimed determining that the pruning mask’s application to the neural network will maximize the amount of memory/computation/time consumed by the neural network operation is clearly unsupported because the specification discloses that the optimization operation maximizes the weight importance, not the amount of memory/computation/time consumed by the neural network operation.
Per paragraph 105 of the specification, the constraint condition is a limit on the amount of memory, computation, or time that can be used by the neural network operation. Per paragraphs 102 and 107, an estimated amount of memory/computation/time to be consumed by the neural network operation can be predicted based on the weights remaining after pruning. Putting everything together, the pruning mask is determined by an optimization operation that maximizes weight importance. The optimization operation uses the weights remaining after pruning to predict the amount of memory/computation/time that will be consumed by the neural network operation, which ensures that application of the pruning mask will stay within the limits imposed by the constraint condition (i.e., satisfy the constraint condition). There is no maximizing of the amount of memory/computation/time consumed by the neural network operation, as claimed.
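For illustration of the distinction above only, the following Python sketch (with invented candidate masks, importance scores, cost model, and limits that do not come from the specification or the claims) shows predicted resource consumption being used solely as a feasibility check while retained weight importance is the quantity maximized:

import numpy as np

def select_pruning_mask(candidate_masks, importance, predict_cost, limits):
    # Keep only candidates whose predicted memory/FLOPs/time satisfy the constraint
    # condition, then return the one that maximizes retained weight importance.
    best_mask, best_importance = None, float("-inf")
    for mask in candidate_masks:
        memory, flops, seconds = predict_cost(mask)  # predicted from the weights remaining after pruning
        if memory > limits["memory"] or flops > limits["flops"] or seconds > limits["time"]:
            continue  # fails the constraint condition; resource use itself is never maximized
        retained = float(np.sum(importance * mask))  # total importance of the weights this mask keeps
        if retained > best_importance:
            best_mask, best_importance = mask, retained
    return best_mask

# Toy example: each kept weight costs 4 bytes, 2 FLOPs, and 1e-9 seconds
toy_cost = lambda m: (4 * m.sum(), 2 * m.sum(), 1e-9 * m.sum())
masks = [np.array([1, 1, 1, 1]), np.array([1, 0, 1, 0]), np.array([1, 1, 0, 0])]
scores = np.array([0.9, 0.1, 0.8, 0.3])
best = select_pruning_mask(masks, scores, toy_cost, {"memory": 10, "flops": 6, "time": 1.0})
# best is [1, 0, 1, 0]: it satisfies the limits and keeps the most important weights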
It is suggested that the claim limitation at issue be amended to:
“wherein the optimization operation comprises determining that the pruning mask's application to the neural network will maximize the weight importance of the neural network and will satisfy the constraint condition in performing the operation of the neural network by using weights remaining after application of the pruning mask to predict: the amount of memory to be used in performing of the operation of the neural network, the amount of computation to be used in performing the neural network operation, or the amount of time to be taken in performing the neural network operation.”
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 21-24 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Garg et al. (US Pub. 20250139516, supported by provisional application 63/309036 filed on February 11, 2022).
Referring to claim 21, Garg discloses A method performed by an apparatus comprising a processor [fig. 2, processor 202], the method comprising:
receiving a constraint condition [pars. 4 and 5; note resource constraints when using a machine learning model], the constraint condition indicating a constraint to be satisfied when a corresponding trained neural network performs an inference [pars. 4, 5, 66, 79, and 81; the resource constraints are applicable to a trained machine learning model (e.g., a neural network) used to perform a specific task (e.g., inference)], wherein the constraint condition specifies a maximum of an amount of memory to be used when the inference is performed, a maximum of an amount of computation to be used when the inference is performed, or a maximum of an amount of time that is to be used when the inference is performed [pars. 4-7, 48, 59, and 116; the resource constraints include computational bandwidth, power, and throughput (i.e., an amount of computation), storage capabilities (i.e., an amount of storage capacity), and/or power capacity of an end device for running the trained machine learning model to perform the task; note that these resource constraints are limits (i.e., maximum amounts) of resources allowed for performing the task]; and
by performing an optimization operation based on the constraint condition and the trained neural network, generating a pruning mask, wherein the optimization operation comprises determining that the pruning mask’s application to the neural network will optimize the performance of the operation of the neural network with respect to the weight importance while also satisfying the constraint condition [pars. 4-7, 48, 52, 59, 66, 79, 81, 85, 114-116, 119, and 126; a subset of neuronal units with confidence scores (i.e., weight importance) satisfying a configurable threshold defined based on the resource constraints are selected for removal (i.e., an optimization operation is performed to generate a pruning mask); note that the machine learning model is compressed (i.e., pruned) “for applied use in specific tasks” into a configuration “for optimality with respect to the end device context and the resource constraints”; the configuration of the compressed machine learning model minimizes accuracy loss due to compression by only removing neuronal units having low contribution and redistributing their parameters to the remaining neuronal units (i.e., maximizing contribution/weight of neuronal units remaining after compression)];
using the pruning mask to prune a weight of an input channel of a layer of the trained neural network and to prune a weight of an output channel of the layer of the trained neural network [pars. 80, 105-110, 116-118; selecting and removing the subset of neuronal units comprises generating binary vectors from inbound and outbound confidence scores]; and
performing the inference using the pruned neural network to infer an output based on an input inputted to the pruned neural network [pars. 11, 66, and 81; the compressed machine learning model is deployed for use in performing the task (e.g., inference)].
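As a purely illustrative aside on the mapping of the binary-vector limitation above, the following Python sketch derives binary pruning vectors from per-channel scores and applies them to a single layer's input-channel and output-channel weights; the thresholding rule and all names are assumptions made for illustration, not Garg's disclosed implementation.

import numpy as np

def prune_layer(weights, inbound_scores, outbound_scores, threshold=0.5):
    # weights has shape (out_channels, in_channels)
    in_mask = (inbound_scores >= threshold).astype(weights.dtype)    # input-channel binary vector
    out_mask = (outbound_scores >= threshold).astype(weights.dtype)  # output-channel binary vector
    # Zero out every weight belonging to a pruned input channel or a pruned output channel
    return weights * out_mask[:, None] * in_mask[None, :]

# Example: a 3x4 layer in which channels scoring below 0.5 are pruned
w = np.arange(12, dtype=float).reshape(3, 4)
pruned = prune_layer(w,
                     inbound_scores=np.array([0.9, 0.2, 0.7, 0.4]),
                     outbound_scores=np.array([0.1, 0.8, 0.6]))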
Referring to claim 22, Garg discloses The method of claim 21, wherein the pruning mask is determined based on an input pruning vector corresponding to an input channel, with respect to the pruning, of a layer of the neural network and an output pruning vector corresponding to an output channel of the layer of the neural network [pars. 80, 105-110, 116, and 117; note the generating of binary vectors from inbound and outbound confidence scores].
Referring to claim 23, Garg discloses The method of claim 22, wherein the weight feature is based on one or more weights of the trained neural network [pars. 82, 83, 85, 98-100, 105, 106, 108 and 114-116; note the confidence scores based on weight].
Referring to claim 24, Garg discloses The method of claim 21, wherein the weight feature corresponds to an effect of the weights on prediction accuracy of the trained neural network [pars. 79, 82, 83, 85, 94, 98-100, 105-110; note that a confidence score represents importance of a neuronal unit based on weight with respect to accuracy].
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GRACE PARK whose telephone number is (571)270-7727. The examiner can normally be reached M-F 8AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, TAMARA KYLE can be reached at (571)272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Grace Park/Primary Examiner, Art Unit 2144