DETAILED ACTION
Claims 1-20 are pending.
The Office acknowledges receipt of the following papers:
The information disclosure statements (IDS) filed on 12/4/2025, 5/14/2024, and 3/24/2023.
Priority
The effective filing date for the subject matter defined in the pending claims in this application is 9/21/2021.
Drawings
The drawings submitted on 11/17/2022 are accepted for examination proceedings.
Specification
The disclosure is objected to because of the following informalities:
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. The Applicant’s cooperation is requested in correcting any errors of which the Applicant may become aware.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claims 1 and 11 recite “a sparsity management unit configured to …”. Paragraphs 4, 5, 32, and 34 of the specification describe the recitation “sparsity management unit” but fail to disclose sufficient corresponding structure. Because this limitation has been found indefinite under 35 U.S.C. 112(b) for failure to disclose sufficient corresponding structure, the claims also lack written description support for that structure. Thus, the specification fails to convey to one skilled in the art that the inventor had possession of the claimed invention at the time of filing.
Claims 2-10 and 12-20 are rejected due to their dependency.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Claims 1 and 11 include the limitation “a sparsity management unit configured to …” that invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Specification paragraphs 4, 5, 32, and 34 fail to describe sufficient corresponding structure for the sparsity management unit. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., a sparsity management circuit instead of a sparsity management unit);
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Claims 2-10 and 12-20 are rejected due to their dependency.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-2 are rejected under 35 U.S.C. 102(a)(1) and 102(a)(2) as being anticipated by Gunnam et al. (U.S. 2021/0191733).
As per claim 1:
Gunnam disclosed a neural network inference accelerator, comprising:
a memory configured to store at least one activation tensor and at least one weight tensor (Gunnam: Figures 2 and 4 elements 215, 220, and 430, paragraphs 32 and 47)(Input features and weights are compressed and stored in either DRAM or SRAM.);
a first neural processing unit configured to receive the activation tensor and the weight tensor from the memory based on an activation sparsity density of the activation tensor and a weight sparsity density of the weight tensor corresponding to the activation tensor both being greater than a predetermined sparsity density (Gunnam: Figures 2 and 9 elements 245 and 900, paragraphs 34, 36, and 64)(The sparse tensor compute unit (i.e. first NPU) receives input feature maps and weights from the SRAM/DRAM based on sparsity exceeding a predetermined threshold (e.g. 50%).);
a second neural processing unit configured to receive the activation tensor and the weight tensor from the memory based on at least one of the activation sparsity density of the activation tensor and the weight sparsity density of the weight tensor corresponding to the activation tensor being less than or equal to the predetermined sparsity density (Gunnam: Figures 2 and 5 elements 240 and 500, paragraphs 34, 36, and 54)(The dense tensor compute cluster (i.e. second NPU) receives input feature maps and weights from the SRAM/DRAM based on a sparsity being less than a predetermined threshold (e.g. 50%).); and
a sparsity management unit configured to control transfer of the activation tensor and the weight tensor corresponding to the activation tensor from the memory to the first neural processing unit or to the second neural processing unit based on the activation sparsity density of the activation tensor and the weight sparsity density of the weight tensor with respect to the predetermined sparsity density (Gunnam: Figures 2 and 10 elements 235 and 1025-1030, paragraphs 34, 36-37, and 73-75)(The scheduling engine (i.e. sparsity management unit) assigns input and weight tensors to dense and sparse tensor compute clusters based on the sparsity analysis. Gunnam gives the example that input feature maps and weights that have a number of zero elements exceeding 50% are allocated to the sparse tensor compute cluster. Additionally, Gunnam gives the example that input feature maps and weights that have a number of zero elements less than 50% are allocated to the dense tensor compute cluster.).
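The threshold-based routing described in the mapping above can be sketched as follows. This is an illustrative sketch only, with hypothetical function names (it is not Gunnam's actual implementation), treating sparsity density as the fraction of zero elements and using the 50% example threshold cited above.

```python
def sparsity(tensor):
    """Fraction of zero elements in a flat list of values."""
    return sum(1 for v in tensor if v == 0) / len(tensor)

def route(activations, weights, threshold=0.5):
    """Return 'sparse' when both tensors exceed the zero-fraction
    threshold (mirroring the >50% example cited from Gunnam),
    otherwise 'dense'."""
    if sparsity(activations) > threshold and sparsity(weights) > threshold:
        return "sparse"
    return "dense"

print(route([0, 0, 0, 1], [0, 2, 0, 0]))  # both 75% zeros -> sparse
print(route([1, 2, 3, 0], [0, 2, 0, 0]))  # activations only 25% zeros -> dense
```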
As per claim 2:
Gunnam disclosed the neural network inference accelerator of claim 1, wherein the first neural processing unit is configured to compute a first result for the activation tensor and the weight tensor (Gunnam: Figures 2, 6-7, and 9 elements 245, 615, 700, and 900, paragraphs 34, 36, 56, 58, and 64)(The sparse tensor compute unit (i.e. first NPU) receives input feature maps and weights from the SRAM/DRAM based on sparsity exceeding a predetermined threshold (e.g. 50%). The systolic array performs matrix dot-product calculations.), and
wherein the second neural processing unit is configured to compute a second result for the activation tensor and the weight tensor (Gunnam: Figures 2 and 5-7 elements 240, 500, 615, and 700, paragraphs 34, 36, 54, 56, and 58)(The dense tensor compute cluster (i.e. second NPU) receives input feature maps and weights from the SRAM/DRAM based on a sparsity being less than a predetermined threshold (e.g. 50%). The systolic array performs matrix dot-product calculations.).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3-4 are rejected under 35 U.S.C. 103 as being unpatentable over Gunnam et al. (U.S. 2021/0191733), further in view of Official Notice.
As per claim 3:
Gunnam disclosed the neural network inference accelerator of claim 2, further comprising a compressor unit configured to receive and compress the first result computed by the first neural processing unit, and to receive and compress the second result computed by the second neural processing unit (Gunnam: Figures 2 and 4 elements 215, 220, and 430, paragraphs 32 and 47)(Gunnam disclosed compressing input features and weights into either DRAM or SRAM, but doesn’t discuss compressing execution results. Official notice is given that execution results can be compressed for storage into memory for the advantage of reducing storage requirements. Thus, it would have been obvious to one of ordinary skill in the art to implement compressing execution results from the tensor compute clusters.), and
wherein the memory is further configured to store the first result compressed by the compressor unit and store the second result compressed by the compressor unit (Gunnam: Figures 2 and 4 elements 215, 220, and 430, paragraphs 32 and 47)(In view of the above official notice, tensor compute results are compressed and stored in the DRAM or SRAM.).
As per claim 4:
Gunnam disclosed the neural network inference accelerator of claim 3, wherein the compressor unit is further configured to generate first metadata associated with the first result and to generate second metadata associated with the second result, and wherein the memory is further configured to store the first metadata and the second metadata (Gunnam: Figures 2 and 4 elements 215, 220, and 430, paragraphs 31-33, 42-43, 47, and 52-53)(Gunnam disclosed compressing input features and weights into either DRAM or SRAM, as well as information indicating the compression (i.e. metadata). In view of the above official notice, tensor compute results are compressed and stored in the DRAM or SRAM with corresponding compression information (i.e. first/second metadata).).
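For illustration only, the kind of zero-suppression compression with accompanying metadata discussed in this rejection could look like the following sketch (a hypothetical scheme, neither Gunnam's nor the Applicant's actual format): a bitmask records which positions held nonzero values and serves as the metadata needed to restore the result.

```python
def compress(tensor):
    """Drop zeros; the bitmask metadata records which positions were nonzero."""
    mask = [v != 0 for v in tensor]       # metadata: one flag per element
    values = [v for v in tensor if v != 0]  # compressed result: nonzeros only
    return values, mask

def decompress(values, mask):
    """Rebuild the full tensor from the compressed values and the bitmask."""
    it = iter(values)
    return [next(it) if m else 0 for m in mask]

result = [0, 5, 0, 0, 7, 0]
values, mask = compress(result)
assert decompress(values, mask) == result  # lossless round trip
```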
Claims 5-13 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Gunnam et al. (U.S. 2021/0191733), in view of Fishel et al. (U.S. 2019/0340488).
As per claim 5:
The additional limitations of claim 5 substantially recite the additional limitations of claim 11. Therefore, claim 5 is rejected for the same reasons as claim 11.
As per claim 6:
Gunnam and Fishel disclosed the neural network inference accelerator of claim 5, wherein the decompressor unit is further configured to decompress the activation tensor to the activation sparsity density using first metadata associated with the activation tensor based on the activation tensor being compressed, and to decompress the weight tensor to the weight sparsity density using second metadata associated with the weight tensor based on the weight tensor being compressed (Fishel: Figures 3-4 elements 314A-N and 432, paragraph 48, 81)(Gunnam: Figures 2, 4, and 6 elements 215, 220, 430, and 615, paragraphs 31-33, 42-43, 47, 52-53, and 56)(Gunnam disclosed that input features and weights are compressed and stored in either DRAM or SRAM, as well as information indicating compression information (i.e. metadata). Fishel disclosed decompressing compressed kernel data prior to processing. The combination implements decompression elements for the input features and weights such that these compressed inputs are decompressed before systolic array processing using the corresponding compression information for both input features and weights (i.e. first and second metadata).).
As per claim 7:
Gunnam and Fishel disclosed the neural network inference accelerator of claim 5, wherein the activation sparsity density is based on a structured-sparsity arrangement or a random-sparsity arrangement (Gunnam: Figures 2 and 10 elements 235 and 1025-1030, paragraphs 34, 36-37, and 73-75)(The scheduling engine (i.e. sparsity management unit) assigns input and weight tensors to dense and sparse tensor compute clusters based on the sparsity analysis. Gunnam gives examples of sparse input feature tensors having 80% zeros and greater than 50% zeros (i.e. structured-sparsity).).
As per claim 8:
Gunnam and Fishel disclosed the neural network inference accelerator of claim 7, wherein the activation sparsity density is based on a 1:4 structured-sparsity arrangement, or a 2:8 structured-sparsity arrangement (Gunnam: Figures 2 and 10 elements 235 and 1025-1030, paragraphs 34, 36-37, and 73-75)(The scheduling engine (i.e. sparsity management unit) assigns input and weight tensors to dense and sparse tensor compute clusters based on the sparsity analysis. Gunnam gives examples of sparse input feature tensors having 80% zeros and greater than 50% zeros (i.e. structured-sparsity). It would have been obvious to one of ordinary skill in the art that the sparsity threshold can be 75% zeros. In addition, per In re Rose (105 USPQ 237 (CCPA 1955)), changes in size or range do not confer patentability over the prior art.).
As per claim 9:
Gunnam and Fishel disclosed the neural network inference accelerator of claim 7, wherein the weight sparsity density is based on a structured-sparsity arrangement or a random-sparsity arrangement (Gunnam: Figures 2 and 10 elements 235 and 1025-1030, paragraphs 34, 36-37, and 73-75)(The scheduling engine (i.e. sparsity management unit) assigns input and weight tensors to dense and sparse tensor compute clusters based on the sparsity analysis. Gunnam gives examples of sparse weight tensors having 80% zeros and greater than 50% zeros (i.e. structured-sparsity).).
As per claim 10:
Gunnam and Fishel disclosed the neural network inference accelerator of claim 9, wherein the weight sparsity density is based on a 1:4 structured-sparsity arrangement, or a 2:8 structured-sparsity arrangement (Gunnam: Figures 2 and 10 elements 235 and 1025-1030, paragraphs 34, 36-37, and 73-75)(The scheduling engine (i.e. sparsity management unit) assigns input and weight tensors to dense and sparse tensor compute clusters based on the sparsity analysis. Gunnam gives examples of sparse weight tensors having 80% zeros and greater than 50% zeros (i.e. structured-sparsity). It would have been obvious to one of ordinary skill in the art that the sparsity threshold can be 75% zeros. In addition, per In re Rose (105 USPQ 237 (CCPA 1955)), changes in size or range do not confer patentability over the prior art.).
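For context, an N:M structured-sparsity arrangement (e.g. the claimed 1:4 or 2:8) conventionally constrains each group of M consecutive elements to at most N nonzero values. The following sketch illustrates that convention only (a hypothetical helper, not drawn from the cited references):

```python
def is_structured_sparse(tensor, n, m):
    """True if every consecutive group of m elements has at most n nonzeros."""
    for i in range(0, len(tensor), m):
        group = tensor[i:i + m]
        if sum(1 for v in group if v != 0) > n:
            return False
    return True

print(is_structured_sparse([0, 3, 0, 0, 7, 0, 0, 0], 1, 4))  # True: 1 nonzero per group of 4
print(is_structured_sparse([1, 3, 0, 0, 0, 0, 0, 0], 1, 4))  # False: first group has 2 nonzeros
```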
As per claim 11:
Claim 11 essentially recites the same limitations as claim 1. Claim 11 additionally recites the following limitations:
a decompressor unit configured to decompress an activation tensor to a first predetermined sparsity density based on the activation tensor being compressed, and to decompress a weight tensor to a second predetermined sparsity density based on the weight tensor being compressed (Fishel: Figures 3-4 elements 314A-N, 432, paragraph 48, 81)(Gunnam: Figures 2, 4, and 6 elements 215, 220, 430, and 615, paragraphs 32, 47, and 56)(Gunnam disclosed that input features and weights are compressed and stored in either DRAM or SRAM. Fishel disclosed decompressing compressed kernel data prior to processing. The combination implements decompression elements for the input features and weights such that these compressed inputs are decompressed before systolic array processing.).
Gunnam disclosed compressing input features and weights into memory, but does not explicitly state how the tensor compute units handle compressed inputs. Fishel disclosed neural engines that include decompression logic to decompress compressed weight inputs. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the decompression logic of Fishel in the system of Gunnam for the advantage of correctly handling incoming compressed input features and weights that are to be processed.
As per claim 12:
Claim 12 essentially recites the same limitations as claim 1. Claim 12 additionally recites the following limitations:
wherein the decompressor unit receives the activation tensor and weight tensor from the memory (Fishel: Figures 3-4 elements 314A-N, 432, paragraph 48, 81)(Gunnam: Figures 2, 4, and 6 elements 215, 220, 430, and 615, paragraphs 32, 47, and 56)(The combination implements decompression elements for the input features and weights such that these compressed inputs are decompressed before systolic array processing. The decompression elements receive compressed inputs from the SRAM/DRAM.).
As per claim 13:
The additional limitations of claim 13 substantially recite the additional limitations of claim 2. Therefore, claim 13 is rejected for the same reasons as claim 2.
As per claim 17:
The additional limitations of claim 17 substantially recite the additional limitations of claim 7. Therefore, claim 17 is rejected for the same reasons as claim 7.
As per claim 18:
The additional limitations of claim 18 substantially recite the additional limitations of claim 8. Therefore, claim 18 is rejected for the same reasons as claim 8.
As per claim 19:
The additional limitations of claim 19 substantially recite the additional limitations of claim 9. Therefore, claim 19 is rejected for the same reasons as claim 9.
As per claim 20:
The additional limitations of claim 20 substantially recite the additional limitations of claim 10. Therefore, claim 20 is rejected for the same reasons as claim 10.
Claims 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Gunnam et al. (U.S. 2021/0191733), in view of Fishel et al. (U.S. 2019/0340488), in view of Official Notice.
As per claim 14:
The additional limitations of claim 14 substantially recite the additional limitations of claim 3. Therefore, claim 14 is rejected for the same reasons as claim 3.
As per claim 15:
The additional limitations of claim 15 substantially recite the additional limitations of claim 4. Therefore, claim 15 is rejected for the same reasons as claim 4.
As per claim 16:
The additional limitations of claim 16 substantially recite the additional limitations of claim 6. Therefore, claim 16 is rejected for the same reasons as claim 6.
Conclusion
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Saeed et al. (U.S. 2023/0079975) taught an NPU receiving compressed input features and weights.
Paramasivam et al. (U.S. 2022/0222513) taught scheduling operations on NPUs.
Hazanchuk (U.S. 2022/0188611) taught NPU optimization.
Lamb et al. (U.S. 2019/0087713) taught compression of convolution network weights.
Grymel et al. (U.S. 11,940,907) taught generating sparsity maps.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988. The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on (571) 270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JACOB PETRANEK/Primary Examiner, Art Unit 2183