Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
2. This action is in response to the original filing on 06/02/2023. Claims 1-8 are pending and have been considered below.
Claim Rejections - 35 USC § 112
3. The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 5 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for pre-AIA, the applicant) regards as the invention. The claim recites “AMM”, an acronym that is not standard in the art and is not clearly defined in the claim set. Although the specification may describe an Activation Memory Matrix, the acronym “AMM” is introduced in the claims without first providing the full term, and it is therefore unclear whether “AMM” refers to the Activation Memory Matrix or to some other structure or module. For purposes of examination, and consistent with the specification’s description, the Examiner interprets “AMM” as “Activation Memory Matrix”.
Claim Rejections - 35 USC § 101
4. 35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-8 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1, the claims are directed to the statutory category of a process.
Claim 1:
Step 2A Prong 1, Claim 1 recites, in part:
implementing a non-zero Activation jump algorithm (Mathematical concepts, mathematical calculations).
for each vector multiplication (Mathematical concepts, mathematical calculations).
Step 2A Prong 2, this judicial exception is not integrated into a practical application.
The additional elements:
using multiple first in first out (FIFO) memories to store non-zero activations (mere instructions to apply the exception using a generic computer component).
Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, either alone or in combination.
The additional elements:
using multiple first in first out (FIFO) memories to store non-zero activations (mere instructions to apply the exception using a generic computer component).
Claims 2-8 provide further limitations to the abstract idea (Mathematical concepts) as rejected in claim 1; however, they do not disclose any additional elements that would amount to a practical application or significantly more than an abstract idea (data gathering/insignificant extra-solution activity and/or generic computer component).
Claim Rejections - 35 USC § 102
5. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
6. Claims 1 and 3 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hill et al. (U.S. Patent Application Pub. No. US 20200104692 A1).
Claim 1: Hill teaches a method of activation sparsity removal (i.e. Notably, the activation tensor 510 has many zero values. Processing of activations having a zero value may be wasteful. Aspects of the present disclosure exploit activation sparsity caused by zero value activations to reduce the total number of multiplications performed; para. [0057]), comprising at least one of:
implementing a non-zero Activation jump algorithm (i.e. A painting algorithm may pack the activations in the FIFO buffers 660. For example, the painting algorithm may be used to populate the FIFO buffer 660 from the first multilane segment 610 of the activation tensor 510. In this example, the painting function operates by traversing down and across the sparse activations 520 and populating the FIFO buffer 660 with the non-zero activation values; para. [0068]), the function traverses sparse activations and outputs only non-zero values into a compact stream (FIFO); or
using multiple first in first out (FIFO) memories (i.e. the non-zero activations from the first activation group 822 are packed into the FIFO buffers 860. Similarly, the non-zero activations from the second activation group 824 are packed into FIFO buffers 862. Subsequently, the non-zero activations are popped from the FIFO buffers 860 and the FIFO buffers 862 and stored in the TCM memory 870 and the TCM memory 872 for processing on the vector lanes of the MAC hardware 540 to compute dot products of the activations and the weights of a weight tensor 850; para. [0079], FIFO buffers 860 and 862 and popping them for dot-product processing) to store non-zero activations (i.e. activations from the first multilane segment 610 are packed (e.g., compacted) in first-in-first-out (FIFO) buffers 660 (or memory buffer) in a maximally dense form using intra-segment lane sloshing. That is, the FIFO buffers 660 include only non-zero activations; para. [0065]) for each vector multiplication (i.e. Once loaded into the compressed activation columns of the FIFO buffers 660, activations from the compressed activation columns of the FIFO buffers 660 may be popped from the FIFO buffers 660 and written to tightly coupled memory (TCM 670) with their metadata for processing on a corresponding one of the multiplier/vector lanes (e.g., 541, 542, 543, or 544) by the MAC hardware 540; para. [0057, 0070]), multiplier/vector lanes for dot-product MAC processing.
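For illustration only, the cited packing-and-compute flow may be sketched as follows. This is a hypothetical simplification by the Examiner, not Hill’s actual implementation; the function names and data layout are illustrative:

```python
# Illustrative sketch: pack only the non-zero activations of a sparse
# activation column into a FIFO stream (value, original index), then
# compute a dot product using only the packed entries, skipping zeros.
def pack_nonzero(activations):
    """Return (value, index) pairs for non-zero activations, in FIFO order."""
    return [(a, i) for i, a in enumerate(activations) if a != 0]

def sparse_dot(activations, weights):
    """Dot product that multiplies only the packed non-zero activations."""
    fifo = pack_nonzero(activations)
    return sum(a * weights[i] for a, i in fifo)

acts = [0, 3, 0, 0, 5, 0, 2, 0]   # sparse activation column
wts  = [1, 2, 3, 4, 5, 6, 7, 8]   # dense weight column
print(sparse_dot(acts, wts))      # prints 45; 3 multiplies instead of 8
```

As in the cited passages, the benefit is that the multiply-accumulate hardware performs one multiplication per non-zero activation rather than one per activation.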
Claim 3: Hill teaches the method of claim 1. Hill further teaches comprising supporting at least one of multiple different parallel modes including at least one of: a multiple points (pixels) parallel scheme, a lines parallel scheme, a multiple input channels parallel scheme, or a multiple output channels parallel scheme (i.e. When the number of input channels is less than the vector width of the machine, resulting in empty activation channels, artificial sparsity is introduced to spread the activations of the non-empty channel across the vector lanes of MAC hardware; para. [0065, 0078, 0088]).
Claim Rejections – 35 USC § 103
7. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
8. Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Hill in view of Ng et al. (U.S. Patent Application Pub. No. US 20190114529 A1).
Claim 2: Hill teaches the method of claim 1. Hill further teaches comprising generating vector multiplication tensors for machine learning models or algorithms (i.e. FIG. 5 is a block diagram illustrating vector lanes to compute dot products of activations and weights in a deep neural network (DNN) 500. This example illustrates a part of processing a layer of DNN 500, in which an X*I activation tensor and a W*I weight tensor are retrieved from memory. FIG. 5 shows a simplified example in which X=5, 1=8, and W=9 to avoid obscuring details of the present disclosure. Processing the layer of the DNN 500 may include calculating the dot product of every column of an activation tensor 510 with every column of a weight tensor 550 using multiple accumulate (MAC) hardware 540; para. [0057]).
Hill does not explicitly teach generating different combinations of vector multiplication tensors.
However, Ng teaches comprising generating different combinations (i.e. Each per-layer instruction specifies processing of a respective layer of the neural network. In addition, each per-layer instruction specifies a respective offset of a weight matrix from the base address of the combined weight matrices in a shared memory. The processing of each layer of the neural network will access a respective one of the weight matrices. The per-layer instructions also specify configuration parameters for different neural network operations in different layers; para. [0048]) of vector multiplication tensors for machine learning models or algorithms (i.e. The configuration registers are accessible to the KA interface 412 for storing addresses of memory buffers in the RAM 226 and configuration parameters for neural network operations, such as matrix dimensions for general matrix multiplication (GEMM) and the stride/window for convolution; para. [0045]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Hill to include the feature of Ng. One would have been motivated to make this modification because specifying per-layer configuration parameters and weight-matrix offsets allows different combinations of vector multiplication tensors to be processed on the same hardware, improving performance and efficiency.
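For illustration only, the cited per-layer instruction scheme may be sketched as follows. This is a hypothetical simplification by the Examiner; the field names, addresses, and dimensions are illustrative, not Ng’s actual data structures:

```python
# Illustrative sketch: each per-layer instruction selects a different
# combination of GEMM dimensions and a weight-matrix offset relative to
# the base address of the combined weight matrices in shared memory.
BASE_ADDR = 0x1000  # hypothetical base of the combined weight matrices

per_layer_instructions = [
    {"layer": 1, "weight_offset": 0,     "gemm_dims": (64, 128)},
    {"layer": 2, "weight_offset": 8192,  "gemm_dims": (128, 32)},
    {"layer": 3, "weight_offset": 12288, "gemm_dims": (32, 10)},
]

def weight_address(instr):
    """Resolve a layer's weight-matrix address from the shared base."""
    return BASE_ADDR + instr["weight_offset"]

for instr in per_layer_instructions:
    print(instr["layer"], hex(weight_address(instr)), instr["gemm_dims"])
```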
9. Claims 4 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Hill in view of Ng et al. (U.S. Patent Application Pub. No. US 20190114538 A1) (hereinafter Ng ‘538).
Claim 4: Hill teaches the method of claim 1. Hill further teaches comprising implementing an execution NPU, or a combination of execution NPUs, to implement the vector multiplication (i.e. The operating system, in turn, may exploit activation sparsity in computations performed on the CPU 422, the DSP 424, the GPU 426, the NPU 428, or some combination thereof; para. [0054, 0056]).
Hill does not explicitly teach a sequential execution, a concurrent execution, or a combination of a sequential execution and a concurrent execution.
However, Ng ‘538 teaches comprising implementing a sequential execution NPU, a concurrent execution NPU, or a combination of a sequential execution NPU and a concurrent execution NPU to implement the vector multiplication (i.e. the host waits until the neural network accelerator signals completion of performing the operations of one layer before instructing the neural network accelerator to commence performing the neural network operations of the next layer; para. [0021, 0024]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Hill to include the feature of Ng ‘538. One would have been motivated to make this modification because hardware can be reused across many layers by executing the layers one after another, avoiding duplication of compute blocks.
Claim 5: Hill and Ng ‘538 teach the method of claim 4. Hill does not explicitly teach wherein: implementing a sequential execution comprises storing back (feedback) an output of each neural network layer to a current AMM layer; and implementing a concurrent execution comprises allocating different hardware resources to different DNN layers to process the DNN layers in parallel (concurrently).
However, Ng ‘538 further teaches wherein: implementing a sequential execution NPU comprises storing back (feedback) an output of each neural network layer to a current AMM layer (i.e. In response to the neural network accelerator signaling completion of layer i, the host prepares a work request for layer i+1. The work request for layer i+1 instructs the neural network accelerator to use the results data matrix in the shared memory from layer i as an input data matrix for layer i+1. After the neural network accelerator has completed the neural network operations of the last layer of the neural network, the host copies the results data matrix from the shared memory to the host memory; para. [0047]); and implementing a concurrent execution NPU comprises allocating different hardware resources to different DNN layers to process the DNN layers in parallel (concurrently) (i.e. In systems having multiple neural network accelerators 238 controlled by the same KA interface 412, the KA interface can direct different ones of the neural network accelerators to perform the operations of different layers of the neural network. The KA interface controls and tracks the locations in the RAM 226 of the buffers that store the results data matrices generated by the neural network accelerator(s); para. [0048]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Hill to include the feature of Ng ‘538. One would have been motivated to make this modification because it enables sequential reuse of compute resources and supports a pipelined design.
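For illustration only, the cited sequential (feedback) execution may be sketched as follows. This is a hypothetical simplification by the Examiner, not Ng ‘538’s implementation: one compute unit is reused, and each layer’s output is stored back as the next layer’s input:

```python
# Illustrative sketch: sequential execution on a single shared compute
# unit. Each layer's result is fed back as the input to the next layer,
# mirroring Ng '538's layer-i -> layer-i+1 work-request scheme.
def run_sequential(layers, x):
    """Apply layers one after another, reusing the same compute path."""
    for layer in layers:
        x = layer(x)  # output of layer i becomes input of layer i+1
    return x

double = lambda v: [2 * e for e in v]
inc    = lambda v: [e + 1 for e in v]
print(run_sequential([double, inc], [1, 2, 3]))  # prints [3, 5, 7]
```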
10. Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Hill in view of Ng ‘538 and further in view of Ng et al. (U.S. Patent Application Pub. No. US 20190114533 A1) (hereinafter Ng ‘533).
Claim 6: Hill and Ng ‘538 teach the method of claim 4. Hill does not explicitly teach wherein: implementing a sequential execution comprises reusing hardware resources to calculate different layers of a same neural network; and implementing a concurrent execution comprises providing results of each DNN layer to another hardware logic that executes a next DNN layer.
However, Ng ‘538 further teaches wherein: implementing a sequential execution NPU comprises reusing hardware resources to calculate different layers of a same neural network (i.e. In response to the neural network accelerator signaling completion of layer i, the host prepares a work request for layer i+1. The work request for layer i+1 instructs the neural network accelerator to use the results data matrix in the shared memory from layer i as an input data matrix for layer i+1. After the neural network accelerator has completed the neural network operations of the last layer of the neural network, the host copies the results data matrix from the shared memory to the host memory; para. [0047]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Hill to include the feature of Ng ‘538. One would have been motivated to make this modification because it enables sequential reuse of compute resources and supports a pipelined design.
Further, Ng ‘533 teaches wherein: implementing a sequential execution NPU comprises reusing hardware resources to calculate different layers of a same neural network (i.e. The layers are defined in a sequential order such that Layer 1 is performed before Layer 2, Layer 2 is performed before Layer 3, and so forth … the neural network accelerator executes four convolutions sequentially (e.g., conv1->conv2->conv3->conv4) on FPGA; para. [0030, 0046]); and implementing a concurrent execution NPU comprises providing results of each DNN layer to another hardware logic that executes a next DNN layer (i.e. the neural network 100 can be parallelized such that each layer can operate concurrently. That is, during each clock cycle, the layers can receive new data and output processed data. For example, during each clock cycle, new image data 101 can be provided to Layer 1. For simplicity, assume that during each clock cycle a new image is provided to Layer 1 and each layer can output processed data for image data that was received in the previous clock cycle. If the layers are implemented in hardware to form a parallelized pipeline, after seven clock cycles, each of the layers operates concurrently to process image data (albeit on seven different images). Thus, implementing the layers in hardware to form a parallel pipeline can vastly increase the throughput of the neural network when compared to operating the layers one at a time; para. [0030, 0043]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Hill and Ng ‘538 to include the feature of Ng ‘533. One would have been motivated to make this modification because it enables a single engine to support many models without per-layer dedicated circuits and improves energy efficiency and throughput.
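For illustration only, the cited parallelized layer pipeline may be sketched as follows. This is a hypothetical simplification by the Examiner of Ng ‘533’s clock-cycle pipeline: each stage (layer) works concurrently on a different input, with one item entering and, once the pipeline is full, one item exiting per cycle:

```python
# Illustrative sketch: a layer pipeline. Each stage holds its result in
# a pipeline register; per "clock cycle" every stage consumes the
# previous stage's register, so stages process different items at once.
def run_pipeline(stages, inputs):
    """Simulate a hardware pipeline of layer stages over an input stream."""
    n = len(stages)
    regs = [None] * n                       # pipeline register per stage
    outputs = []
    for item in list(inputs) + [None] * n:  # trailing bubbles flush the pipe
        new_regs = [None] * n
        if item is not None:
            new_regs[0] = stages[0](item)   # stage 0 takes the new input
        for s in range(1, n):
            if regs[s - 1] is not None:
                new_regs[s] = stages[s](regs[s - 1])
        if regs[n - 1] is not None:
            outputs.append(regs[n - 1])     # finished item exits the pipe
        regs = new_regs
    return outputs

print(run_pipeline([lambda x: 2 * x, lambda x: x + 1], [1, 2, 3]))  # prints [3, 5, 7]
```

The pipelined result equals the sequential result, but after the fill latency each stage is busy every cycle, which is the throughput benefit the cited paragraphs describe.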
11. Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Hill in view of Mills (U.S. Patent Application Pub. No. US 20190340498 A1).
Claim 7: Hill teaches the method of claim 1. Hill further teaches comprising supporting different convolution operations (i.e. Upon receiving the image 226, a convolutional layer 232 may apply convolutional kernels (not shown) to the image 226 to generate a first set of feature maps 218. As an example, the convolutional kernel for the convolutional layer 232 may be a 5×5 kernel that generates 28×28 feature maps. In the present example, because four different convolutional kernels were applied to the image 226 at the convolutional layer 232, four different feature maps are generated in the first set of feature maps 218. The convolutional kernels may also be referred to as filters or convolutional filters; para. [0039]).
Hill does not explicitly teach different size convolution operations.
However, Mills teaches comprising supporting different size convolution operations (i.e. per-sub-channel kernels of different sizes can be generated at each neural engine 314. For example, 5×5 shaped kernel data 326 may be sub-sampled (e.g., at kernel extract circuit 432) into sub-kernels of sizes 3×3, 2×3, 3×2 and 2×2, and provided as corresponding kernel coefficients 422 to MAC 404 for sub-channel convolutions with corresponding sub-channels of portion 408 of input data; para. [0095]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Hill to include the feature of Mills. One would have been motivated to make this modification because it improves accelerator applicability and performance by avoiding software fallbacks and reducing memory overhead.
Claim 8: Hill and Mills teach the method of claim 7. Hill further teaches wherein supporting convolution operations comprises supporting two n*n convolution operations (i.e. Upon receiving the image 226, a convolutional layer 232 may apply convolutional kernels (not shown) to the image 226 to generate a first set of feature maps 218. As an example, the convolutional kernel for the convolutional layer 232 may be a 5×5 kernel that generates 28×28 feature maps. In the present example, because four different convolutional kernels were applied to the image 226 at the convolutional layer 232, four different feature maps are generated in the first set of feature maps 218. The convolutional kernels may also be referred to as filters or convolutional filters; para. [0039]).
Hill does not explicitly teach wherein supporting different size convolution operations comprises supporting two different n*n convolution operations, and wherein n in a first of the convolution operations has a first value that is different than a second value of n in a second of the convolution operations.
However, Mills further teaches wherein supporting different size convolution operations comprises supporting two different n*n convolution operations, and wherein n in a first of the convolution operations has a first value that is different than a second value of n in a second of the convolution operations (i.e. per-sub-channel kernels of different sizes can be generated at each neural engine 314. For example, 5×5 shaped kernel data 326 may be sub-sampled (e.g., at kernel extract circuit 432) into sub-kernels of sizes 3×3, 2×3, 3×2 and 2×2, and provided as corresponding kernel coefficients 422 to MAC 404 for sub-channel convolutions with corresponding sub-channels of portion 408 of input data; para. [0095, 0104]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Hill to include the feature of Mills. One would have been motivated to make this modification because it improves accelerator applicability and performance by avoiding software fallbacks and reducing memory overhead.
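For illustration only, supporting two n*n convolutions with different values of n may be sketched as follows. This is a hypothetical simplification by the Examiner (a naive valid-mode convolution, not Mills’ sub-kernel extraction hardware); the example kernels are illustrative:

```python
# Illustrative sketch: one routine handles n*n kernels of any size n,
# so a 3x3 and a 2x2 convolution (different n) run on the same path.
def conv2d_valid(image, kernel):
    """Naive valid-mode 2-D convolution (cross-correlation, no padding)."""
    n = len(kernel)                       # square n*n kernel
    rows = len(image) - n + 1
    cols = len(image[0]) - n + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(n) for j in range(n))
             for c in range(cols)] for r in range(rows)]

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k3 = [[1] * 3 for _ in range(3)]          # first n*n kernel, n = 3
k2 = [[1] * 2 for _ in range(2)]          # second n*n kernel, n = 2
print(conv2d_valid(img, k3))              # prints [[45]]
print(conv2d_valid(img, k2))              # prints [[12, 16], [24, 28]]
```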
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Langhammer et al. (Pub. No. US 20230325665 A1) discloses that the sparsity module 450 improves performance and reduces power consumption of the DNN accelerator 300 based on sparsity in input data (e.g., activations, weights, etc.) of deep learning operations. The sparsity module 450 may have a sparsity acceleration logic that can identify non-zero-valued activation-weight pairs and skip zero-valued activation-weight pairs. A non-zero-valued activation-weight pair includes a non-zero-valued activation and a non-zero-valued weight, while a zero-valued activation-weight pair includes a zero-valued activation or a zero-valued weight.
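For illustration only, the pair-skipping logic Langhammer describes may be sketched as follows. This is a hypothetical simplification by the Examiner, not Langhammer’s circuit: a pair is multiplied only when both the activation and the weight are non-zero:

```python
# Illustrative sketch: keep only activation-weight pairs where BOTH
# operands are non-zero; zero-valued pairs are skipped entirely.
def nonzero_pair_dot(acts, wts):
    """Return (dot product, number of multiplies actually performed)."""
    pairs = [(a, w) for a, w in zip(acts, wts) if a != 0 and w != 0]
    return sum(a * w for a, w in pairs), len(pairs)

total, mults = nonzero_pair_dot([0, 3, 5, 0, 2], [4, 0, 2, 7, 1])
print(total, mults)  # prints 12 2 -> only two multiplies out of five
```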
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. In re Heck, 699 F.2d 1331, 1332-33, 216 U.S.P.Q. 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 U.S.P.Q. 275, 277 (C.C.P.A. 1968)).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAN TRAN whose telephone number is (303)297-4266. The examiner can normally be reached Monday - Thursday, 8:00 am - 5:00 pm MT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matt Ell, can be reached at 571-270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/TAN H TRAN/Primary Examiner, Art Unit 2141