Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
1. Applicant’s arguments, filed March 18th, 2026, with respect to the claim objections have been fully considered and are persuasive in light of the claim amendments. The claim objections have been withdrawn.
2. Applicant's arguments, filed March 18th, 2026, with respect to the 35 U.S.C. 103 rejections have been fully considered but are not persuasive.
As Applicant’s arguments are directed toward limitations of the claims which have been modified or added via amendment, they will be addressed in the rejections below.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
3. Claims 1, 3-13, and 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Gao et al. (US 2019/0279072, herein Gao) in view of Yu et al. (US 2021/0174177, herein Yu).
Regarding claim 1, Gao teaches a neural network processing unit (NPU) for processing a neural network (NN) model, the NPU comprising:
a processing element array configured to process the NN model (Figs 3, 8, [0041], processor unit, [0119], various neural network units, [0123], multiprocessor embodiments);
a memory of the NPU configured to store data of the NN model processed in the processing element array (Figs 3 & 8, [0041], [0119], multiple layers of memories internal and external to NPU); and
a processing control circuit configured to:
control communication, via an NPU interface or a system bus, with a main memory system external to the NPU (Fig 3, [0040-0041], external memory, network interface, bus),
control the processing element array and the memory of the NPU to use a value corresponding to output data of a first layer of the NN model as a value corresponding to input data of a second layer of the NN model ([0119], [0121], control unit 406 & [0051], [0074], [0097], output of each layer is used as input to next layer in calculations); and
control use of data stored in the memory of the NPU based on NN model structure data or NN data locality information (Abstract, [0038], [0054], [0113], optimizing model structure).
Gao fails to teach wherein the NPU is to reuse a memory address value corresponding to the output data of the first layer as a memory address value corresponding to the input of the second layer, or wherein the control circuit controls whether a request for memory access is made to the main memory system such that the request is not made when the data stored in the memory of the NPU is reused.
Yu teaches a neural network processing unit (NPU) configured to reuse a memory address value corresponding to output data of a first layer as a memory address value corresponding to input of a second layer, wherein whether a request for memory access is made to the main memory system is controlled such that the request is not made when the data stored in the memory of the NPU is reused ([0075-0076], reuse output feature map of first layer as input to second layer, [0087], [0095-0096], [0130], [0145], feature map to be reused accessed via address memory, [0072], [0076], exclude access to external memory when feature map is reused).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Gao and Yu to reuse shared attributes of data, such as a memory address, when using the data in multiple neural network layers. While both Gao and Yu disclose neural network operations wherein an output of a first layer is used as input to a second layer, Gao does not explicitly disclose wherein the memory address of the output is necessarily reused by the second layer. However, one of ordinary skill in the art would understand that the usage of the same set of data by two processing elements or neural network layers would necessarily entail accessing the same location in memory where the data is stored, and therefore reusing the address of such data, as taught by Yu, would be an obvious means to efficiently implement this sharing of the data. As both Gao and Yu disclose neural network processors for processing neural network models, the combination would merely entail a simple substitution of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
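For illustration only, the address reuse discussed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of Gao, Yu, or the claimed NPU: the names MainMemory, NpuMemory, and run_two_layers, the address 0x10, and the two toy layer functions are all hypothetical. The sketch shows the output of a first layer being written to an on-chip address, the same address value being used as the input address of the second layer, and no request being issued to the external main memory for that reused data.

```python
# Hypothetical sketch: none of these names come from Gao, Yu, or the claims.

class MainMemory:
    """External main memory that counts every access request it receives."""

    def __init__(self, data):
        self.data = dict(data)
        self.requests = 0

    def read(self, key):
        self.requests += 1
        return self.data[key]


class NpuMemory:
    """On-chip memory addressed by integer address values."""

    def __init__(self):
        self.cells = {}

    def write(self, address, values):
        self.cells[address] = list(values)

    def read(self, address):
        return self.cells[address]


def run_two_layers(main_memory, npu_memory):
    # The network input is fetched from main memory once (one request).
    x = main_memory.read("input")

    # Layer 1 (toy operation) writes its output to an assumed on-chip address.
    out_addr = 0x10
    npu_memory.write(out_addr, [2 * v for v in x])

    # Layer 2 reuses the same address value as its input address, so no
    # request is made to main memory for this read.
    in_addr = out_addr
    return [v + 1 for v in npu_memory.read(in_addr)]
```

In this sketch, running the two layers on a three-element input leaves the main memory request counter at one, since only the initial input fetch goes off-chip.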
Regarding claim 3, the combination of Gao and Yu teaches the NPU of claim 1, wherein the NN model is optimized based on at least one of structure data of the NPU and structure data of the memory of the NPU (Gao Abstract, [0038], [0054], [0113], optimizing model structure).
Regarding claim 4, the combination of Gao and Yu teaches the NPU of claim 1, wherein the NN model is optimized so as to satisfy a condition that a deterioration of inference accuracy of the NN model is maintained above a threshold value (Gao [0030], [0089], maintain operation accuracy according to threshold parameter).
Regarding claim 5, the combination of Gao and Yu teaches the NPU of claim 1, wherein the NN model is optimized so that a data size of the NN model becomes less than or equal to a threshold value while degradation of inference accuracy is minimized (Gao [0030], [0089], maintain operation accuracy according to threshold parameter).
Regarding claim 6, the combination of Gao and Yu teaches the NPU of claim 1, wherein the NN model is optimized by utilizing at least one of a quantization algorithm, a pruning algorithm, a retraining algorithm, a quantization aware retraining algorithm and a model compression algorithm (Gao [0038], quantization algorithm, [0023], model training).
Regarding claim 7, the combination of Gao and Yu teaches the NPU of claim 1, wherein the processing control circuit is further configured to control the processing element array and the memory of the NPU based on sequence information configured to schedule a processing sequence from an input layer to an output layer of the NN model (Gao [0071-0072], model structure sequences, Yu [0075-0076], reuse output feature map of first layer as input to second layer).
Regarding claim 8, the combination of Gao and Yu teaches the NPU of claim 1, wherein the processing control circuit is further configured to control the processing element array and the memory of the NPU by analyzing predefined operation order information of the NN model (Gao [0049], [0071-0073], defined sequence order for processing model layer).
Regarding claim 9, the combination of Gao and Yu teaches the NPU of claim 1, wherein the processing control circuit is further configured to schedule an operation order of the NN model based on the NN model structure data or the NN data locality information (Gao [0049], [0071-0073], defined sequence order for processing model layer & Abstract, [0038], optimizing model structure).
Regarding claim 10, the combination of Gao and Yu teaches the NPU of claim 1, wherein the processing control circuit is configured to access a memory address value where node data and weight data of layers of the NN model are stored based on a predefined operation order information of the NN model (Gao [0049], [0071-0073], defined sequence order for processing model layer & Yu [0075-0076], reuse output feature map of first layer as input to second layer).
Regarding claim 11, the combination of Gao and Yu teaches the NPU of claim 1, wherein the processing control circuit is further configured to schedule a processing order based on structure data from an input layer to an output layer of the neural network or the NN data locality information (Gao [0049], [0071-0073], defined sequence order for processing model layer & Yu [0075-0076], reuse output feature map of first layer as input to second layer).
Regarding claim 12, the combination of Gao and Yu teaches the NPU of claim 1, wherein the processing control circuit is further configured to recognize reusable variable values and reusable constant values based on predefined operation order information of the NN model and control to reuse the memory of the NPU using the reusable variable values and the reusable constant values (Gao [0049], [0071-0073], defined sequence order for model layer & Yu [0075-0076], reuse output feature map of first layer as input to second layer).
Regarding claim 13, Gao teaches a neural network processing unit (NPU) for processing an artificial neural network model (ANN model), the NPU comprising:
a plurality of processing elements (Figs 3, 8, [0041], processor unit, [0119], various neural network units, [0123], multiprocessor embodiments);
a data storage circuit of the NPU configured to store data of the ANN model processed in the plurality of processing elements (Figs 3 & 8, [0041], [0119], multiple layers of memories internal and external to NPU); and
an NPU control circuit configured to:
control communication, via an NPU interface or a system bus, with a main memory system external to the NPU (Fig 3, [0040-0041], external memory, network interface, bus),
control the data storage circuit to store a value corresponding to output data of a first layer of the ANN model as a value corresponding to input data of a second layer of the ANN model ([0119], [0121], control unit 406 & [0051], [0074], [0097], output of each layer is used as input to next layer in calculations), and
control use of data stored in the memory of the NPU based on ANN model structure data or ANN data locality information (Abstract, [0038], [0054], [0113], optimizing model structure).
Gao fails to teach wherein the NPU is to reuse a memory address value corresponding to the output data of the first layer as a memory address value corresponding to the input of the second layer, or wherein the control circuit controls whether a request for memory access is made to the main memory system such that the request is not made when the data stored in the memory of the NPU is reused.
Yu teaches a neural network processing unit (NPU) configured to reuse a memory address value corresponding to output data of a first layer as a memory address value corresponding to input of a second layer, wherein whether a request for memory access is made to the main memory system is controlled such that the request is not made when the data stored in the memory of the NPU is reused ([0075-0076], reuse output feature map of first layer as input to second layer, [0087], [0095-0096], [0130], [0145], feature map to be reused accessed via address memory, [0072], [0076], exclude access to external memory when feature map is reused).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Gao and Yu to reuse shared attributes of data, such as a memory address, when using the data in multiple neural network layers. While both Gao and Yu disclose neural network operations wherein an output of a first layer is used as input to a second layer, Gao does not explicitly disclose wherein the memory address of the output is necessarily reused by the second layer. However, one of ordinary skill in the art would understand that the usage of the same set of data by two processing elements or neural network layers would necessarily entail accessing the same location in memory where the data is stored, and therefore reusing the address of such data, as taught by Yu, would be an obvious means to efficiently implement this sharing of the data. As both Gao and Yu disclose neural network processors for processing neural network models, the combination would merely entail a simple substitution of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
Claims 15-18 refer to an alternate NPU embodiment of the NPU embodiment of claims 3-6. Therefore, the above rejections for claims 3-6 are applicable to claims 15-18, respectively.
Regarding claim 19, Gao teaches a neural network processing unit (NPU) comprising:
a processing element array (Figs 3, 8, [0041], processor unit, [0119], various neural network units, [0123], multiprocessor embodiments);
a memory of the NPU configured to store data of an artificial neural network (ANN) model processed in the processing element array, the ANN model optimized by utilizing at least one of a quantization algorithm, a pruning algorithm, a retraining algorithm, a quantization aware retraining algorithm and a model compression algorithm ([0038], quantization algorithm, [0023], model training, Figs 3 & 8, [0041], [0119], internal and external memory); and
a processing control circuit configured to:
control communication, via an NPU interface or a system bus, with a main memory system external to the NPU (Fig 3, [0040-0041], external memory, network interface, bus),
use a value in which an operation value of a first layer of a first scheduling is stored as a value corresponding to input data of a second layer of a second scheduling that immediately follows the first scheduling ([0119], [0121], control unit 406 & [0051], [0074], [0097], output of each layer is used as input to next layer in calculations),
control use of data stored in the memory of the NPU based on ANN model structure data or ANN data locality information (Abstract, [0038], [0054], [0113], optimizing model structure), and
wherein the artificial neural network model is optimized by utilizing at least one of a quantization algorithm, a pruning algorithm, a retraining algorithm, a quantization aware retraining algorithm and a model compression algorithm ([0038], quantization algorithm, [0023], model training).
Gao fails to teach wherein the NPU is to reuse a memory address value corresponding to the output data of the first layer as a memory address value corresponding to the input of the second layer, or wherein the control circuit controls whether a request for memory access is made to the main memory system such that the request is not made when the data stored in the memory of the NPU is reused.
Yu teaches a neural network processing unit (NPU) configured to reuse a memory address value corresponding to output data of a first layer as a memory address value corresponding to input of a second layer, wherein whether a request for memory access is made to the main memory system is controlled such that the request is not made when the data stored in the memory of the NPU is reused ([0075-0076], reuse output feature map of first layer as input to second layer, [0087], [0095-0096], [0130], [0145], feature map to be reused accessed via address memory, [0072], [0076], exclude access to external memory when feature map is reused).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Gao and Yu to reuse shared attributes of data, such as a memory address, when using the data in multiple neural network layers. While both Gao and Yu disclose neural network operations wherein an output of a first layer is used as input to a second layer, Gao does not explicitly disclose wherein the memory address of the output is necessarily reused by the second layer. However, one of ordinary skill in the art would understand that the usage of the same set of data by two processing elements or neural network layers would necessarily entail accessing the same location in memory where the data is stored, and therefore reusing the address of such data, as taught by Yu, would be an obvious means to efficiently implement this sharing of the data. As both Gao and Yu disclose neural network processors for processing neural network models, the combination would merely entail a simple substitution of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
4. Claims 2, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Gao and Yu as applied to claims 1, 13, and 19 above, and further in view of Chinya et al. (US 2020/0410327, herein Chinya, cited in the previous Office Action).
Regarding claim 2, the combination of Gao and Yu teaches the NPU of claim 1, wherein the processing control circuit is further configured to reuse, in consideration of a data size and operation steps of the NN model, a specific memory address in which weight data is stored, and to store an operation value of the NN model according to a scheduling order in a specific memory address of the memory of the NPU, such that the specific memory address in which the operation value is stored is input data of the operation in a next scheduling order (Gao Abstract, [0038], [0054], [0113], optimizing model structure according to scheduling and data parameters & Yu [0075-0076], reuse output feature map of first layer as input to second layer, [0087], [0095-0096], [0130], [0145], feature map to be reused accessed via address memory, [0010], [0020], data reusage at a specific memory address).
Gao and Yu fail to teach wherein the operation value is a MAC operation value.
Chinya teaches a neural processing unit configured to reuse a specific memory address in which weight data is stored ([0028-0029], [0034], weights of neural network model & reusage of NN data) and to store a MAC operation value of the NN model such that the MAC operation value is input data in a next scheduling order ([0028], MAC operations performed in neural network, [0029], [0032], [0034], reuse of neural network data in subsequent operations).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Gao and Yu with those of Chinya to utilize the neural network data reuse techniques in a processing operation that includes MAC (multiply-accumulate) operations. While neither Gao nor Yu explicitly discloses the types of operations performed by the layers of the neural network as being specifically MAC operations, one of ordinary skill in the art would understand that convolutional neural networks (Gao [0004], Yu [0011]) commonly include multiply-accumulate operations as part of their neural network calculations. As both Yu and Chinya disclose techniques for reusing parts of memory for storing neural network parameters between layers or scheduling steps in the model calculation in order to reduce external memory accesses and increase the efficiency of the processor, the combination would merely entail a simple substitution of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
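For illustration only, a multiply-accumulate operation of the kind referred to above, with its result stored at an address that serves as the input of the next scheduling step, can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Gao, Yu, or Chinya: the function names mac, layer, and run_schedule, the dictionary standing in for on-chip memory, and the addresses 0 and 1 are all hypothetical.

```python
# Hypothetical sketch: names and addresses are illustrative assumptions only.

def mac(inputs, weights):
    """Multiply-accumulate: sum of element-wise products."""
    acc = 0
    for x, w in zip(inputs, weights):
        acc += x * w
    return acc


def layer(inputs, weight_rows):
    """One MAC per output node, each using one row of weights."""
    return [mac(inputs, row) for row in weight_rows]


def run_schedule(x, weights_1, weights_2):
    memory = {}  # stands in for the on-chip memory, keyed by address

    # Scheduling step 1: store the MAC operation values at address 0.
    memory[0] = layer(x, weights_1)

    # Scheduling step 2: the values stored at address 0 are the input data.
    memory[1] = layer(memory[0], weights_2)
    return memory
```

Here the MAC values produced in the first scheduling step are read back from the address at which they were stored, rather than being copied or fetched from elsewhere, when the second step runs.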
Claim 14 refers to an alternate NPU embodiment of the NPU embodiment of claim 2. Therefore, the above rejection of claim 2 is applicable to claim 14.
Claim 20 refers to an alternate NPU embodiment of the NPU embodiment of claim 2. Therefore, the above rejection of claim 2 is applicable to claim 20.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J METZGER whose telephone number is (571)272-3105. The examiner can normally be reached Monday-Friday 8:30-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL J METZGER/ Primary Examiner, Art Unit 2183