Prosecution Insights
Last updated: April 19, 2026
Application No. 18/304,713

DETECTING AND MITIGATING FAULT IN SPARSITY COMPUTATION IN DEEP NEURAL NETWORK

Non-Final OA (§101, §103)

Filed: Apr 21, 2023
Examiner: YI, HYUNGJUN B
Art Unit: 2146
Tech Center: 2100 — Computer Architecture & Software
Assignee: Intel Corporation
OA Round: 1 (Non-Final)

Grant Probability: 18% (At Risk)
OA Rounds: 1-2
To Grant: 4y 7m
With Interview: 49%

Examiner Intelligence

Career Allow Rate: 18% (3 granted / 17 resolved; -37.4% vs TC avg)
Interview Lift: +31.7% in resolved cases with interview
Avg Prosecution: 4y 7m typical timeline (39 currently pending)
Career History: 56 total applications across all art units

Statute-Specific Performance

§101: 26.3% (-13.7% vs TC avg)
§103: 53.9% (+13.9% vs TC avg)
§102: 12.9% (-27.1% vs TC avg)
§112: 4.7% (-35.3% vs TC avg)

Comparisons are to the Tech Center average estimate. Based on career data from 17 resolved cases.

Office Action

§101 §103
DETAILED ACTION

This action is responsive to the claims filed on 04/21/2023. Claims 1-20 are pending for examination.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 04/21/2023 and 07/02/2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Statutory Categories

Claims 1-10 are directed to a method. Claims 11-16 are directed to a computer-readable medium. Claims 17-20 are directed to an apparatus.

Independent Claims – Claims 1, 11, and 17

Step 2A Prong 1: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes.
Independent claims 1, 11, and 17 recite limitations that are abstract ideas in the form of mental processes. Claim 1 recites: generating a bitmap based on an activation sparsity vector and a weight sparsity vector, the activation sparsity vector indicating one or more positions of the one or more nonzero valued activations in the activation operand, the weight sparsity vector indicating one or more positions of the one or more nonzero valued weights in the weight operand; (this limitation merely amounts to generating activation and weight sparsity vectors at a high level of generality, which is being interpreted as a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper) identifying a nonzero valued activation in the compressed activation operand or a nonzero valued weight in the compressed weight operand based on the bitmap; (this limitation merely amounts to identifying a nonzero activation or weight from the generated sparsity vector, which is being interpreted as a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper) and determining whether there is a fault in identifying the nonzero valued activation or the nonzero valued weight based on a number of one or more nonzero elements in the bitmap.
(this limitation merely amounts to determining a value at a high level of generality, which is being interpreted as a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper) Claim 1 also recites the following additional elements for the purposes of Step 2A Prong Two analysis: A method for deep learning, comprising: storing a compressed activation operand and a compressed weight operand, the compressed activation operand comprising one or more nonzero valued activations in an activation operand of a deep learning operation, the compressed weight operand comprising one or more nonzero valued weights in a weight operand of the deep learning operation; (storing compressed values is merely data gathering and is considered insignificant extra-solution activity under MPEP 2106.05(g)) The additional limitations fail Step 2A Prong 2 of the 101 analysis because they do not transform the claim into a practical application. These limitations are too abstract or lack a technical improvement that would make the concept practically useful. Without clear utility or integration into a specific field, the claim does not relate to any particular application. It does not meet the requirements of Step 2A Prong 2, as it fails to make the concept meaningfully applicable in practice. Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is “directed” to an abstract idea.
This claim recites the following additional elements for the purposes of Step 2B analysis: A method for deep learning, comprising: storing a compressed activation operand and a compressed weight operand, the compressed activation operand comprising one or more nonzero valued activations in an activation operand of a deep learning operation, the compressed weight operand comprising one or more nonzero valued weights in a weight operand of the deep learning operation; (storing compressed values is merely data gathering and is considered insignificant extra-solution activity under MPEP 2106.05(g); furthermore, it should be noted that the courts have recognized receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information), as well-understood, routine, and conventional activity.) The claim also fails Step 2B of the analysis because the additional limitations do not amount to significantly more than the abstract idea itself. The additional limitations do not enhance the claim in a way that would move it beyond its abstract ideas, as they minimally elaborate on the core concept without adding any inventive or technical substance. Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible. Claims 11 and 17 recite limitations substantially similar to claim 1, as such a similar analysis applies.
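To make the claimed data flow concrete, the three operations recited in claim 1 can be sketched in a few lines. This is a minimal illustrative sketch using plain Python lists, not the applicant's implementation; all function names are hypothetical, and the bound-based fault check shown mirrors the comparisons of dependent claims 5-6 rather than any single recited step.

```python
# Hypothetical sketch of claim 1's recited operations; not the
# applicant's implementation.

def make_bitmap(act_sparsity, wgt_sparsity):
    """Generate a bitmap from the activation and weight sparsity
    vectors: a position is set only where both operands are nonzero."""
    return [a & w for a, w in zip(act_sparsity, wgt_sparsity)]

def identify_nonzero_activation(bitmap, act_sparsity, compressed_act):
    """Identify a nonzero valued activation based on the bitmap: the
    first set bit gives a dense position, and the index into the
    compressed (nonzero-only) operand is the count of nonzero
    activations preceding that position."""
    for pos, bit in enumerate(bitmap):
        if bit:
            return compressed_act[sum(act_sparsity[:pos])]
    return None

def has_fault(bitmap, num_nonzero_activations):
    """Fault check based on the number of nonzero elements in the
    bitmap: the AND of two sparsity vectors can never have more set
    bits than either operand has nonzero values (cf. claims 5-6)."""
    return sum(bitmap) > num_nonzero_activations
```

For example, with activations [3, 0, 5, 0] and weights [2, 4, 0, 1], the sparsity vectors are [1, 0, 1, 0] and [1, 1, 0, 1], the combined bitmap is [1, 0, 0, 0], and the identified activation is 3 (index 0 of the compressed operand [3, 5]).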
Claim 11 recites an additional limitation for consideration: One or more non-transitory computer-readable media storing instructions executable to perform operations for in-network computing (under Step 2A Prong 2 and Step 2B, this limitation is invoking computers or other machinery merely as a tool to perform an existing process, see MPEP 2106.05(f)) Claim 17 recites an additional limitation for consideration: An apparatus, comprising: a computer processor for executing computer program instructions; and a non-transitory computer-readable memory storing computer program instructions executable by the computer processor to perform operations comprising (under Step 2A Prong 2 and Step 2B, this limitation is invoking computers or other machinery merely as a tool to perform an existing process, see MPEP 2106.05(f))

Dependents of Claims 1, 11, and 17

The remaining claims dependent from independent claims 1, 11, and 17 do not recite additional elements, whether considered individually or in combination, that are sufficient to integrate the judicial exception into a practical application or amount to significantly more than the judicial exception. The analysis is shown below: The claims below recite additional limitations which fail Step 2A Prong 2 of the 101 analysis because they do not transform the claims into a practical application. These limitations are too abstract or lack a technical improvement that would make the concept practically useful. Without clear utility or integration into a specific field, the claims do not relate to any particular application. They do not meet the requirements of Step 2A Prong 2, as they fail to make the concept meaningfully applicable in practice. The claims also fail Step 2B of the analysis because the additional limitations do not amount to significantly more than the abstract idea itself.
The additional limitations do not enhance the claims in a way that would move them beyond their abstract ideas, as they minimally elaborate on the core concept without adding any inventive or technical substance. The claims are unpatentable.

Claim 2 recites the additional limitation of: The method of claim 1, wherein the bitmap is generated based on a previous bitmap, another nonzero valued activation in the compressed activation operand or another nonzero valued weight in the compressed weight operand was identified based on the previous bitmap, (generation of the bitmap based on a previous bitmap is still being considered a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) and determining whether there is a fault in identifying the nonzero valued activation or the nonzero valued weight comprises: determining a number of one or more nonzero elements in the previous bitmap; (this limitation merely amounts to determining a value at a high level of generality, being interpreted as a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) and determining whether the number of one or more nonzero elements in the bitmap is not equal to a sum of one plus the number of one or more nonzero elements in the previous bitmap. (this limitation merely amounts to determining a value at a high level of generality, being interpreted as a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
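Claim 2's count-based consistency check can be sketched in a few lines. This is illustrative only; the claim recites the comparison itself, not any particular code, and the function names are hypothetical.

```python
# Hypothetical sketch of the fault check recited in claim 2: compare
# the current bitmap's nonzero-element count against one plus the
# previous bitmap's count, and flag a fault on any mismatch.

def popcount(bitmap):
    """Number of nonzero elements in a bitmap."""
    return sum(1 for b in bitmap if b)

def step_fault(bitmap, previous_bitmap):
    """Per the claim language, a fault exists when the number of
    nonzero elements in the bitmap is not equal to one plus the number
    of nonzero elements in the previous bitmap."""
    return popcount(bitmap) != 1 + popcount(previous_bitmap)
```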
Claim 3 recites the additional limitation of: The method of claim 2, wherein identifying the nonzero valued activation in the compressed activation operand comprises: after determining that there is a fault in identifying the nonzero valued activation or the nonzero valued weight, generating a new bitmap based on the previous bitmap; (generation of the bitmap based on a previous bitmap is still being considered a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) and identifying the nonzero valued activation in the compressed activation operand or the nonzero valued weight in the compressed weight operand based on the new bitmap. (a mental process of identification which can reasonably be performed in the human mind or with the aid of pen and paper) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 4 recites the additional limitation of: The method of claim 3, wherein generating the new bitmap based on the previous bitmap comprises: replacing a nonzero element in the previous bitmap with zero. (replacing a value with 0 is being considered a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 5 recites the additional limitation of: The method of claim 1, wherein determining whether there is a fault in identifying the nonzero valued activation or the nonzero valued weight comprises: determining whether the number of one or more nonzero elements in the bitmap is greater than a number of the one or more nonzero valued activations in the activation operand.
(a comparison that is being considered a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 6 recites the additional limitation of: The method of claim 1, wherein determining whether there is a fault in identifying the nonzero valued activation or the nonzero valued weight comprises: determining whether the number of one or more nonzero elements in the bitmap is greater than a number of the one or more nonzero valued weights in the weight operand. (a comparison that is being considered a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 7 recites the additional limitation of: The method of claim 1, wherein identifying the nonzero valued activation in the compressed activation operand comprises: after determining that there is a fault in identifying the nonzero valued activation, generating a first bitmap and a second bitmap based on the activation sparsity vector and the weight sparsity vector; (generation of bitmaps based on the sparsity vectors is still being considered a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) determining a position of the nonzero valued activation in the compressed activation operand based on the bitmap; (determining a position of a value in a bitmap is being considered a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) determining a first position of the nonzero valued activation in the compressed activation operand based on the first bitmap; (determining a position of a value in a bitmap is being considered a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) determining a second position of the nonzero valued activation in the compressed activation operand based on the second bitmap; (determining a position of a value in a bitmap is being considered a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) and identifying the nonzero valued activation in the compressed activation operand based on the position, the first position, and the second position. (a mental process of identification which can reasonably be performed in the human mind or with the aid of pen and paper) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 8 recites the additional limitation of: The method of claim 1, wherein identifying the nonzero valued weight in the compressed weight operand comprises: after determining that there is a fault in identifying the nonzero valued weight, generating a first bitmap and a second bitmap based on the activation sparsity vector and the weight sparsity vector; (generation of bitmaps based on the sparsity vectors is still being considered a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) determining a position of the nonzero valued weight in the compressed weight operand based on the bitmap; (determining a position of a value in a bitmap is being considered a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) determining a first position of the nonzero valued weight in the compressed weight operand based on the first bitmap; (determining a position of a value in a bitmap is being considered a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) and identifying the nonzero valued weight in the compressed weight operand based on the position, the first position, and the second position. (a mental process of identification which can reasonably be performed in the human mind or with the aid of pen and paper) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
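Claims 7 and 8 recite only that the identification is "based on the position, the first position, and the second position." One plausible reading (an assumption, not stated in the claims) is a 2-of-3 majority vote over the redundantly computed positions, sketched as:

```python
# Hypothetical 2-of-3 vote over three independently computed positions,
# illustrating one way a redundant-bitmap scheme could mask a fault in
# any single position computation. Not necessarily the claimed scheme.

def majority_position(pos, first_pos, second_pos):
    """Return the position at least two of the three computations agree
    on; None if all three disagree (unrecoverable fault)."""
    candidates = [pos, first_pos, second_pos]
    for candidate in candidates:
        if candidates.count(candidate) >= 2:
            return candidate
    return None
```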
Claim 9 recites the additional limitation of: The method of claim 1, wherein identifying the nonzero valued activation in the compressed activation operand comprises: after determining that there is a fault in identifying the nonzero valued activation, identifying another nonzero valued activation in the compressed activation operand, wherein the another nonzero valued activation is subsequently next to a previously identified nonzero valued activation in the compressed activation operand. (a mental process of identification which can reasonably be performed in the human mind or with the aid of pen and paper) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 10 recites the additional limitation of: The method of claim 1, wherein identifying the nonzero valued activation in the compressed activation operand or the nonzero valued weight in the compressed weight operand comprises: determining a position of the nonzero valued activation in the compressed activation operand; (determining a position of a nonzero value is a mental process of identification which can reasonably be performed in the human mind or with the aid of pen and paper) and determining a position of the nonzero valued weight in the compressed weight operand, wherein the nonzero valued activation is multiplied with the nonzero valued weight in the deep learning operation. (a mental process of position identification which can reasonably be performed in the human mind or with the aid of pen and paper) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claims 12 and 18 are substantially similar to claim 2, as such a similar analysis applies.
Claims 13-14 are substantially similar to claims 5-6, as such a similar analysis applies. Claims 15 and 20 are substantially similar to claim 7, as such a similar analysis applies. Claim 16 is substantially similar to claim 9, as such a similar analysis applies. Claim 19 is substantially similar to claim 5, as such a similar analysis applies.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 10-11, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Raha et al. (US 12141683 B2), hereafter referred to as Raha, in view of Narad et al. (US 6157955 A), hereafter referred to as Narad.

Claim 1: Raha teaches: A method for deep learning, comprising: storing a compressed activation operand and a compressed weight operand, the compressed activation operand comprising one or more nonzero valued activations in an activation operand of a deep learning operation, the compressed weight operand comprising one or more nonzero valued weights in a weight operand of the deep learning operation; (Raha, col. 12, lines 33-43, “For a sparse ML accelerator, the data in the IF 681 and FL 682 RFs 608 is usually stored in compressed format in input channel (IC) dimension with the accompanying bitmaps stored within dedicated bitmap storage. The find-first logic uses a combination of these two bitmaps (AND) to skip RF 608 values that result in zero partial sums (or MAC 606 multiplications) such that only those IF 681 and FL 682 operands are read from the IF 681 and FL 682 RFs 608 that will result in non-zero partial sums towards the accumulated output… In these embodiments, a bitmap (BM) is encoded inline with the compressed data for decoding into dense data.” Raha, col.
33, lines 5-14, “As used herein, the terms "sparse vector", "sparse matrix", and "sparse array" refer to an input vector, matrix, or array including both non-zero elements and zero elements. As used herein, the terms "ZVC data vector" "ZVC matrix", and "ZVC array" refer to a vector, matrix, or array that includes all non-zero elements of a vector, matrix, or array in the same order as a sparse vector, matrix, or array, but excludes all zero elements.”, Under Broadest Reasonable Interpretation (BRI), the claimed “activation operand” and “weight operand” read on Raha’s IF (input activation/input feature map) and FL (filter data/weights), respectively. Raha teaches these operands are stored in compressed format and explicitly ties compression to accompanying bitmaps (“IF 681 and FL 682 … stored in compressed format … with the accompanying bitmaps”). Raha’s ZVC definition further supports that compressed storage can be nonzero-only (“includes all non-zero elements … excludes all zero elements”), matching the claim’s “compressed … comprising one or more nonzero valued [activations/weights].”) generating a bitmap based on an activation sparsity vector and a weight sparsity vector, the activation sparsity vector indicating one or more positions of the one or more nonzero valued activations in the activation operand, the weight sparsity vector indicating one or more positions of the one or more nonzero valued weights in the weight operand; (Raha, col. 12, lines 56-59, “Within each PE 230, the sparsity bitmap is automatically recreated as seen during the load phase using incoming IF and FL wren signals coming from the schedule aware sparse decoders. 
Subsequently, during the compute phase, the sparsity bitmaps of IF and FL are combined and given as an input to the find-first logic to skip 0 data and gain performance and energy improvements for the entire accelerator system 124.”, The claim’s “activation sparsity vector” and “weight sparsity vector” read on Raha’s sparsity bitmaps of IF and sparsity bitmaps of FL. Under BRI, a “bitmap” is a bit-vector and therefore a type of “vector” that indicates positions of interest (here, sparsity/nonzero structure). Raha expressly teaches these sparsity bitmaps exist (including that the “sparsity bitmap is automatically recreated”) and that IF/FL sparsity bitmaps are used as the sparsity metadata for controlling sparse compute. It is interpreted by the examiner that Raha provides the required sparsity representations that correspond to the claim’s activation/weight sparsity vectors indicating nonzero positions.) identifying a nonzero valued activation in the compressed activation operand or a nonzero valued weight in the compressed weight operand based on the bitmap; (Raha, col. 
12, lines 37-42, “The find-first logic uses a combination of these two bitmaps (AND) to skip RF 608 values that result in zero partial sums (or MAC 606 multiplications) such that only those IF 681 and FL 682 operands are read from the IF 681 and FL 682 RFs 608 that will result in non-zero partial sums towards the accumulated output.”, The claimed “bitmap” used for identification reads on Raha’s Combined Bitmap produced from the Activation Bitmap (IF sparsity bitmap) and Weight Bitmap (FL sparsity bitmap); furthermore, Raha explicitly uses the ANDed combination (“combination of these two bitmaps (AND)”) as the control signal to drive operand selection via “find-first logic.” Because Raha then states “only those IF … and FL … operands are read” (i.e., selected/identified for compute) such that they “will result in non-zero partial sums,” it teaches identifying the nonzero(-relevant) activation/weight operands based on the bitmap. This aligns with claim 1’s “identifying a nonzero valued activation … or … weight … based on the bitmap.”) Narad, in the same field, teaches the following limitations which Raha fails to teach: and determining whether there is a fault in identifying the nonzero valued activation or the nonzero valued weight based on a number of one or more nonzero elements in the bitmap. (Narad, col. 92, lines 19-50, “Returns true if all of the 1-bits in the mask are left contiguous, and returns false otherwise… The function bits returns the number of left-contiguous 1-bits in the mask (a form of “population count”)…. Returns the number of left-contiguous bits in the mask. Returns -1 if the 1-bits in the mask are not left-contiguous.”, Under BRI, the claimed “number of one or more nonzero elements in the bitmap” corresponds to the number of ‘1’ bits in a bitmap/mask (i.e., a population count).
Narad explicitly teaches determining that number via a function that “returns the number of … 1-bits … (a form of ‘population count’)” and further teaches using that result to signal an invalid/fault condition (e.g., returning “-1” or false when the mask does not satisfy expected properties). Thus, Narad teaches determining whether an identification/selection mask is faulty/invalid based on the count of set bits in the bitmap/mask, which corresponds to claim 1’s fault determination “based on” the bitmap’s nonzero-element count.) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Raha’s bitmap-driven sparse operand identification to further include a bitmap validity/fault check based on the number of set bits (popcount) in the bitmap, as taught by Narad, which explicitly describes that a bitmap/mask function “returns the number of … 1-bits … (a form of ‘population count’)” and produces an invalid indication (e.g., “Returns -1 …” / “returns false otherwise”) (Narad, col. 92, lines 19-50) when the mask is not as expected. The motivation would have been to improve correctness and robustness of Raha’s bitmap-controlled operand identification (e.g., detecting malformed/erroneous combined sparsity bitmaps that could cause incorrect IF/FL selection and thus incorrect partial sums), using a bitmap checking approach that relies on counting set bits and flagging invalid masks.

Claim 10: Raha and Narad teach the limitations of claim 1. Raha further teaches: The method of claim 1, wherein identifying the nonzero valued activation in the compressed activation operand or the nonzero valued weight in the compressed weight operand comprises: determining a position of the nonzero valued activation in the compressed activation operand; (Raha, col.
12, lines 33-43, “For a sparse ML accelerator, the data in the IF 681 and FL 682 RFs 608 is usually stored in compressed format in input channel (IC) dimension with the accompanying bitmaps stored within dedicated bitmap storage. The find-first logic uses a combination of these two bitmaps (AND) to skip RF 608 values that result in zero partial sums (or MAC 606 multiplications) such that only those IF 681 and FL 682 operands are read from the IF 681 and FL 682 RFs 608 that will result in non-zero partial sums towards the accumulated output… In these embodiments, a bitmap (BM) is encoded inline with the compressed data for decoding into dense data.”, Under BRI, the “compressed activation operand” maps to the compressed IF data in the IF RF, with an accompanying IF bitmap. Raha then teaches using the combined bitmaps as input to find-first logic, and implementing operand selection via read pointers based on combined sparsity. Those pointers/indexes are the claimed “position” used to locate/select the nonzero activation from the compressed IF operand.) and determining a position of the nonzero valued weight in the compressed weight operand, (Raha, col. 12, lines 33-43, “For a sparse ML accelerator, the data in the IF 681 and FL 682 RFs 608 is usually stored in compressed format in input channel (IC) dimension with the accompanying bitmaps stored within dedicated bitmap storage. The find-first logic uses a combination of these two bitmaps (AND) to skip RF 608 values that result in zero partial sums (or MAC 606 multiplications) such that only those IF 681 and FL 682 operands are read from the IF 681 and FL 682 RFs 608 that will result in non-zero partial sums towards the accumulated output… In these embodiments, a bitmap (BM) is encoded inline with the compressed data for decoding into dense data.”, Under BRI, the “compressed weight operand” maps to the compressed FL (weights/filters) in the FL RF, with an accompanying FL bitmap. 
Raha teaches that the ANDed bitmap drives find-first selection such that only those FL operands are read (selected by bitmap/pointers). That selection inherently determines the position/index of the nonzero weight within the compressed FL operand.) wherein the nonzero valued activation is multiplied with the nonzero valued weight in the deep learning operation. (Raha, col. 10, lines 8-12, “Here, an input activation/input feature map (IF) and weights/filters (FL) are fed into the MAC 606, and the MAC 606 generates an output activations/output feature map (OF).”, Raha explicitly links IF (activations) and FL (weights) to a MAC datapath, i.e., IF and FL are “fed into the MAC,” satisfying the claim’s requirement that activation is multiplied with weight (MAC multiplication component).) Claim 11 is substantially similar to claim 1, as such a similar analysis applies. Claim 11 has the following additional limitations for consideration which Raha further teaches: One or more non-transitory computer-readable media storing instructions executable to perform operations for in-network computing, (Raha, claim 12, “One or more non-transitory computer-readable media (NTCRM) storing instructions, ”) Claim 17 is substantially similar to claim 1, as such a similar analysis applies. Claim 17 has the following additional limitations for consideration which Raha further teaches: a computer processor for executing computer program instructions; and a non-transitory computer-readable memory storing computer program instructions executable by the computer processor to perform operations comprising: (Raha, claim 12, “One or more non-transitory computer-readable media (NTCRM) storing instructions, when executed by one or more processors, cause the one or more processors to:”) Claims 2-8, 12-15, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Raha in view of Narad and Hall et al., (US20120144089A1), hereafter referred to as Hall. 
Claim 2: Raha and Narad teach the limitations of claim 1. Hall, in the same field of bit mask analysis, teaches the following limitations, which Raha and Narad fail to teach: The method of claim 1, wherein the bitmap is generated based on a previous bitmap, another nonzero valued activation in the compressed activation operand or another nonzero valued weight in the compressed weight operand was identified based on the previous bitmap, (Hall, paragraph 50, “Mask register 410 may track the completion of the gather operation by monitoring the data stored in destination register 415.” Hall, paragraph 51, “In one embodiment, a processor may call or execute the gather step instruction, for example, in a ‘while’ loop or repeating ‘if’ statement, until mask register 410 may be completely cleared”, Hall’s mask register 410 is a bitmap-like state vector that is read each iteration to decide which elements remain to be processed (analogous to using a previous bitmap to identify the next candidate element). Updating the mask register across iterations provides the claimed “bitmap generated based on a previous bitmap” concept in an iterative selection workflow.) and determining whether there is a fault in identifying the nonzero valued activation or the nonzero valued weight comprises: determining a number of one or more nonzero elements in the previous bitmap; and determining whether the number of one or more nonzero elements in the bitmap is not equal to a sum of one plus the number of one or more nonzero elements in the previous bitmap. (Hall, paragraph 50, “For example, a “1” in mask register 410 may indicate that a corresponding data element was not written into destination register 415; otherwise a “0” may be used.
In such embodiments, the gather instruction may execute until the sum of the values of the state elements in mask register 410 is equal to a predetermined threshold, for example, the number of data elements to be gathered, which may vary for each gather instruction.”, Hall expressly uses a bit-sum (“sum of the values of the state elements”) as a progress/termination check, i.e., counting set bits in the state/bitmap.) It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Raha and Narad by further incorporating the teachings of Hall to provide the additional functionality regarding (i) generating a current bitmap based on a previous bitmap and (ii) verifying correctness of the update via a count-based check. Raha already teaches iterative bitmap-guided identification in sparse deep-learning compute, e.g., “the sparsity bitmaps of IF and FL are combined and given as an input to the find-first logic” and “the find-first logic uses a combination of these two bitmaps (AND) … such that only those IF and FL operands are read.” Narad teaches determining mask/bitmap validity using a count of set bits (“population count”) and indicates invalidity when the mask does not have expected structure (“Returns -1 … if the 1-bits … are not …”). Hall is analogous (mask/bitmap-driven iterative selection) and explicitly teaches maintaining and updating a “previous” mask state across iterations: “mask register 410 may track the completion …” and the operation proceeds via repeated steps until the mask state indicates completion (e.g., cleared/thresholded), including a count-based completion check where “the sum of the values of the state elements in mask register 410 is equal to a predetermined threshold” (Hall, paragraph 50). A motivation of which would have been to ensure correct, repeatable stepwise progress (i.e., robust iterative bitmap state updates) during repeated bitmap-guided selection. 
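The count-based check recited in claim 2 can be stated compactly: if each identification step is supposed to set exactly one additional bit, then a current bitmap whose popcount differs from the previous bitmap's popcount plus one indicates a fault. The sketch below is an illustrative restatement of that claim logic only (invented names; not code from the application, Raha, Narad, or Hall).

```python
# Hedged sketch of claim 2's count-based fault check: a correct bitmap
# update adds exactly one set bit, so popcount(current) should equal
# popcount(previous) + 1; any other count signals a fault.

def popcount(bitmap):
    """Number of nonzero elements (set bits) in a list-encoded bitmap."""
    return sum(1 for b in bitmap if b)

def update_is_faulty(prev_bitmap, bitmap):
    """True when the bitmap update did not add exactly one set bit."""
    return popcount(bitmap) != popcount(prev_bitmap) + 1
```

This mirrors Hall's bit-sum progress check ("the sum of the values of the state elements in mask register 410 is equal to a predetermined threshold"), repurposed as a per-step correctness test rather than a termination test.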
Claim 3: Raha, Narad, and Hall teach the limitations of claim 2. Hall further teaches: The method of claim 2, wherein identifying the nonzero valued activation in the compressed activation operand comprises: after determining that there is a fault in identifying the nonzero valued activation or the nonzero valued weight, generating a new bitmap based on the previous bitmap and identifying the nonzero valued activation in the compressed activation operand or the nonzero valued weight in the compressed weight operand based on the new bitmap. (Hall, paragraph 48, “According to embodiments of the invention, by storing data elements that have been gathered in destination register 415, the data previously collected by an interrupted or stopped gather operation may be preserved and the gather operation may restart in the middle. The interrupted gather operation (e.g., having gathered one or more data elements) may start from the middle, for example, gathering the remaining elements missing from destination register 415.”, Hall, paragraph 50, “Mask register 410 may track the completion of the gather operation by monitoring the data stored in destination register 415. In one embodiment, there is a one-to-one correspondence between data elements stored in destination register 415 and corresponding state elements stored in mask register 410. State elements or values may include flags, markers, tabs, indicators, signals, and or other numbers, bits and/or codes for indicating whether of not a corresponding data element (e.g., in a corresponding or pointed register location) is stored in destination register 415.
For example, a “1” in mask register 410 may indicate that a corresponding data element was not written into destination register 415; otherwise a “0” may be used.”, In Hall, the claimed “previous bitmap” corresponds to the contents of mask register 410 at the time the gather operation is stopped/interrupted (i.e., the bitmap state indicating which corresponding data elements are still not written/loaded). Hall explains that “Mask register 410 may track the completion of the gather operation” and that “a ‘1’ in mask register 410 may indicate that a corresponding data element was not written into destination register 415; otherwise a ‘0’ may be used.” The claimed “new bitmap based on the previous bitmap” corresponds to the updated mask register 410 after at least one element has been gathered/written and the corresponding state bit(s) are cleared. Hall explicitly teaches this update behavior: when an element is gathered and written, “the corresponding … state elements in mask register 410 may be set to ‘0’.” Hall further teaches that after interruption, the operation can “restart in the middle” and continue by “gathering the remaining elements missing from destination register 415,” which corresponds to identifying remaining elements using the updated (new) bitmap state.) Claim 4: Raha, Narad, and Hall teach the limitations of claim 3. Hall further teaches: The method of claim 3, wherein generating the new bitmap based on the previous bitmap comprises: replacing a nonzero element in the previous bitmap with zero. (Hall, paragraph 55, “For example, if the cache line read has one element to be gathered, then one element may be written into destination register 415 and the corresponding one bit state elements in mask register 410 may be set to “0”.”, Hall expressly teaches clearing the bit corresponding to a processed element (“set to ‘0’”), which teaches “replacing a nonzero element… with zero”.)
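As an illustrative aside (not from Hall or the application), the "set to '0'" mask update Hall describes corresponds, for an integer-encoded bitmap, to clearing a set bit, and the lowest set bit can be cleared with the classic bm & (bm - 1) idiom:

```python
# Illustrative only: claim 4's "replacing a nonzero element in the previous
# bitmap with zero", for an integer-encoded bitmap. bm & (bm - 1) clears
# exactly the lowest set bit, leaving all other bits unchanged.

def clear_lowest_set_bit(bm: int) -> int:
    """Return the bitmap with its lowest set bit replaced by zero."""
    return bm & (bm - 1)
```

Repeatedly applying this update walks the set bits in order, which matches the iterative "process one element, clear its bit, continue with the remaining bits" workflow attributed to Hall's mask register.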
Claim 5: Raha and Narad teach the limitations of claim 1. Hall, in the same field of bit mask analysis, teaches the following limitations, which Raha and Narad fail to teach: The method of claim 1, wherein determining whether there is a fault in identifying the nonzero valued activation or the nonzero valued weight comprises: determining whether the number of one or more nonzero elements in the bitmap is greater than a number of the one or more nonzero valued activations in the activation operand. (Hall, paragraph 50, “For example, a “1” in mask register 410 may indicate that a corresponding data element was not written into destination register 415; otherwise a “0” may be used. In such embodiments, the gather instruction may execute until the sum of the values of the state elements in mask register 410 is equal to a predetermined threshold, for example, the number of data elements to be gathered, which may vary for each gather instruction.”, Hall compares a bit-sum to an expected-count threshold (threshold = number of elements to gather). In Hall, the claimed “one or more nonzero valued activations in the activation operand” are interpreted as the predetermined set of data elements to be gathered/packed into destination register 415 (i.e., the operand elements that are intended to be loaded/written).
Hall states that a processor may execute gather operations “until a predetermined set of data has been completely gathered into destination vector register memory 415.” (Hall, paragraph 49) Hall further provides an explicit 1:1 correspondence between the bitmap and those operand elements: “Mask register 410 may track the completion…” and “there is a one-to-one correspondence between data elements stored in destination register 415 and corresponding state elements stored in mask register 410.” (Hall, paragraph 50) Thus, the count of “data elements to be gathered” is the count of the operand elements being targeted (mapped to the claim’s “nonzero activations”), which Hall uses as the comparison baseline: “the gather instruction may execute until the sum of the values of the state elements in mask register 410 is equal to a predetermined threshold, for example, the number of data elements to be gathered.” (Hall, paragraph 53)) The rationale for combining Raha and Narad with Hall is similar to that applied for claim 2 above. Claim 6: Raha and Narad teach the limitations of claim 1. Hall, in the same field of bit mask analysis, teaches the following limitations, which Raha and Narad fail to teach: The method of claim 1, wherein determining whether there is a fault in identifying the nonzero valued activation or the nonzero valued weight comprises: determining whether the number of one or more nonzero elements in the bitmap is greater than a number of the one or more nonzero valued weights in the weight operand. (Hall, paragraph 50, “For example, a “1” in mask register 410 may indicate that a corresponding data element was not written into destination register 415; otherwise a “0” may be used.
In such embodiments, the gather instruction may execute until the sum of the values of the state elements in mask register 410 is equal to a predetermined threshold, for example, the number of data elements to be gathered, which may vary for each gather instruction.”, In Hall, the claimed “one or more nonzero valued weights in the weight operand” are interpreted as the predetermined set of data elements to be gathered/packed (operand elements that are intended to be loaded/written), analogous to targeted nonzero weight entries under the claim’s broad interpretation. Hall teaches gathering “a predetermined set of data” into a destination register. Hall also ties each targeted operand element to a corresponding bitmap bit via a 1:1 relation: “there is a one-to-one correspondence between data elements stored in destination register 415 and corresponding state elements stored in mask register 410.” (Hall, paragraph 50) Accordingly, the “number of data elements to be gathered” (Hall’s threshold) corresponds to the count of targeted operand elements (mapped here to nonzero weights), and Hall compares the bitmap’s bit-sum to that baseline: “sum … is equal to a predetermined threshold, for example, the number of data elements to be gathered.” (Hall, paragraph 53).) The rationale for combining Raha and Narad with Hall is similar to that applied for claim 2 above. Claim 7: Raha and Narad teach the limitations of claim 1. Raha further teaches: generating a first bitmap and a second bitmap based on the activation sparsity vector and the weight sparsity vector; (Raha, col. 12, lines 33-43, “For a sparse ML accelerator, the data in the IF 681 and FL 682 RFs 608 is usually stored in compressed format in input channel (IC) dimension with the accompanying bitmaps stored within dedicated bitmap storage.
The find-first logic uses a combination of these two bitmaps (AND) to skip RF 608 values that result in zero partial sums (or MAC 606 multiplications) such that only those IF 681 and FL 682 operands are read from the IF 681 and FL 682 RFs 608 that will result in non-zero partial sums towards the accumulated output… In these embodiments, a bitmap (BM) is encoded inline with the compressed data for decoding into dense data.” Raha, col. 12, lines 56-59, “Within each PE 230, the sparsity bitmap is automatically recreated as seen during the load phase using incoming IF and FL wren signals coming from the schedule aware sparse decoders.”, Raha teaches two bitmaps for sparsity when IF (input activation / feature map) and FL (weights/filters) are stored compressed with “accompanying bitmaps.” Under the claim’s BRI: the “activation sparsity vector” maps to the IF-side sparsity bitmap (recreated using IF wren signals), and the “weight sparsity vector” maps to the FL-side sparsity bitmap (recreated using FL wren signals). Thus, Raha teaches generating a first bitmap (IF) and second bitmap (FL) from sparsity-indicating information associated with each operand.) determining a position of the nonzero valued activation in the compressed activation operand based on the bitmap; (Raha, col. 12, lines 56-59, “Within each PE 230, the sparsity bitmap is automatically recreated as seen during the load phase using incoming IF and FL wren signals coming from the schedule aware sparse decoders. Subsequently, during the compute phase, the sparsity bitmaps of IF and FL are combined and given as an input to the find-first logic to skip 0 data and gain performance and energy improvements for the entire accelerator system 124.”, Raha, col. 12, line 65, “As stated previously, the MACs can source the operands either from the M number of IF, FL, and OF RFs 608 based on the optimal schedule. 
Due to data dependent sparsity, the number of read ports in each IF subbank may need to be increased from 1 to 4 for the M×M mode of operation as each of the subbanks will be independently accessed by 4 rd pointers based on the combined sparsity of data in each IF subbank with 4 different FL subbanks.”, Raha’s “bitmap” maps to the combined (AND) sparsity bitmap (IF-bitmap AND FL-bitmap) that is explicitly fed to find-first logic. Under BRI, “determining a position … based on the bitmap” corresponds to using the combined bitmap (and find-first) to derive which element index/location to access, which Raha further characterizes operationally via “rd pointers based on the combined sparsity.” That pointer/index is the claimed “position” used to identify the activation to be processed.) determining a first position of the nonzero valued activation in the compressed activation operand based on the first bitmap; (Raha, col. 12, lines 33-43, “For a sparse ML accelerator, the data in the IF 681 and FL 682 RFs 608 is usually stored in compressed format in input channel (IC) dimension with the accompanying bitmaps stored within dedicated bitmap storage. The find-first logic uses a combination of these two bitmaps (AND) to skip RF 608 values that result in zero partial sums (or MAC 606 multiplications) such that only those IF 681 and FL 682 operands are read from the IF 681 and FL 682 RFs 608 that will result in non-zero partial sums towards the accumulated output… In these embodiments, a bitmap (BM) is encoded inline with the compressed data for decoding into dense data.”, in Raha, each compressed operand (IF and FL) is associated with its own bitmap (BM) used “for decoding into dense data.” Under BRI, a bitmap that supports decoding inherently provides positional information (i.e., which dense indices correspond to nonzero entries). 
Thus, the first bitmap (mapped above to the IF/activation bitmap) is used to obtain the first position of a nonzero activation in the compressed IF operand during decode/selection.) determining a second position of the nonzero valued activation in the compressed activation operand based on the second bitmap; (Raha, col. 12, lines 33-43, “For a sparse ML accelerator, the data in the IF 681 and FL 682 RFs 608 is usually stored in compressed format in input channel (IC) dimension with the accompanying bitmaps stored within dedicated bitmap storage. The find-first logic uses a combination of these two bitmaps (AND) to skip RF 608 values that result in zero partial sums (or MAC 606 multiplications) such that only those IF 681 and FL 682 operands are read from the IF 681 and FL 682 RFs 608 that will result in non-zero partial sums towards the accumulated output… In these embodiments, a bitmap (BM) is encoded inline with the compressed data for decoding into dense data.”, in Raha, each compressed operand (IF and FL) is associated with its own bitmap (BM) used “for decoding into dense data.” Under BRI, a bitmap that supports decoding inherently provides positional information (i.e., which dense indices correspond to nonzero entries). Thus, the second bitmap (mapped above to the FL/weight bitmap) is used to obtain the second position of a nonzero activation in the compressed FL operand during decode/selection.) and identifying the nonzero valued activation in the compressed activation operand based on the position, the first position, and the second position. (Raha, col. 12, lines 33-43, “For a sparse ML accelerator, the data in the IF 681 and FL 682 RFs 608 is usually stored in compressed format in input channel (IC) dimension with the accompanying bitmaps stored within dedicated bitmap storage. 
The find-first logic uses a combination of these two bitmaps (AND) to skip RF 608 values that result in zero partial sums (or MAC 606 multiplications) such that only those IF 681 and FL 682 operands are read from the IF 681 and FL 682 RFs 608 that will result in non-zero partial sums towards the accumulated output… In these embodiments, a bitmap (BM) is encoded inline with the compressed data for decoding into dense data.”, Raha teaches identification/selection of which IF activation operand is to be read by using (i) combined bitmap (AND) via find-first (claimed “position”) and (ii) operand-side bitmap decode (the claimed “first position” for IF, “second position” for FL). The result is that the selected IF operand actually read is determined using the combined selection logic plus bitmap positional information; Raha expresses this outcome as reading only those IF/FL operands indicated by the bitmap logic.) Hall, in the same field of bit mask analysis, teaches the following limitations, which Raha and Narad fail to teach: The method of claim 1, wherein identifying the nonzero valued activation in the compressed activation operand comprises: after determining that there is a fault in identifying the nonzero valued activation, (Hall, paragraph 50, “For example, a “1” in mask register 410 may indicate that a corresponding data element was not written into destination register 415; otherwise a “0” may be used.
In such embodiments, the gather instruction may execute until the sum of the values of the state elements in mask register 410 is equal to a predetermined threshold, for example, the number of data elements to be gathered, which may vary for each gather instruction.”, In Hall, the claimed “fault in identifying the nonzero valued activation” is interpreted as a condition where a targeted operand element that should be gathered/loaded has not been successfully written (i.e., the selection/identification did not result in the element being written), which Hall explicitly represents in the bitmap state: “a ‘1’ in mask register 410 may indicate that a corresponding data element was not written into destination register 415; otherwise a ‘0’ may be used.” Hall also explicitly recognizes a fault condition during iterative gather processing: “if the gather instruction is finished or a fault occurs …” (Hall, paragraph 88). Thus, Hall’s “fault occurs” condition corresponds to the claimed “after determining that there is a fault in identifying the nonzero valued activation.”) The rationale for combining Raha and Narad with Hall is similar to that applied for claim 2 above. Claim 8: Raha and Narad teach the limitations of claim 1. Raha further teaches: generating a first bitmap and a second bitmap based on the activation sparsity vector and the weight sparsity vector; (Raha, col. 12, lines 33-43, “For a sparse ML accelerator, the data in the IF 681 and FL 682 RFs 608 is usually stored in compressed format in input channel (IC) dimension with the accompanying bitmaps stored within dedicated bitmap storage.
The find-first logic uses a combination of these two bitmaps (AND) to skip RF 608 values that result in zero partial sums (or MAC 606 multiplications) such that only those IF 681 and FL 682 operands are read from the IF 681 and FL 682 RFs 608 that will result in non-zero partial sums towards the accumulated output… In these embodiments, a bitmap (BM) is encoded inline with the compressed data for decoding into dense data.” Raha, col. 12, lines 56-59, “Within each PE 230, the sparsity bitmap is automatically recreated as seen during the load phase using incoming IF and FL wren signals coming from the schedule aware sparse decoders.”, Raha teaches two bitmaps for sparsity when IF (input activation / feature map) and FL (weights/filters) are stored compressed with “accompanying bitmaps.” Under the claim’s BRI: the “activation sparsity vector” maps to the IF-side sparsity bitmap (recreated using IF wren signals), and the “weight sparsity vector” maps to the FL-side sparsity bitmap (recreated using FL wren signals). Thus, Raha teaches generating a first bitmap (IF) and second bitmap (FL) from sparsity-indicating information associated with each operand.) determining a position of the nonzero valued activation in the compressed weight operand based on the bitmap; (Raha, col. 12, lines 56-59, “Within each PE 230, the sparsity bitmap is automatically recreated as seen during the load phase using incoming IF and FL wren signals coming from the schedule aware sparse decoders. Subsequently, during the compute phase, the sparsity bitmaps of IF and FL are combined and given as an input to the find-first logic to skip 0 data and gain performance and energy improvements for the entire accelerator system 124.”, Raha, col. 12, line 65, “As stated previously, the MACs can source the operands either from the M number of IF, FL, and OF RFs 608 based on the optimal schedule. 
Due to data dependent sparsity, the number of read ports in each IF subbank may need to be increased from 1 to 4 for the M×M mode of operation as each of the subbanks will be independently accessed by 4 rd pointers based on the combined sparsity of data in each IF subbank with 4 different FL subbanks.”, Raha’s “bitmap” maps to the combined (AND) sparsity bitmap (IF-bitmap AND FL-bitmap) that is explicitly fed to find-first logic. Under BRI, “determining a position … based on the bitmap” corresponds to using the combined bitmap (and find-first) to derive which element index/location to access, which Raha further characterizes operationally via “rd pointers based on the combined sparsity.” That pointer/index is the claimed “position” used to identify the activation to be processed.) determining a first position of the nonzero valued weight in the compressed weight operand based on the first bitmap; (Raha, col. 12, lines 33-43, “For a sparse ML accelerator, the data in the IF 681 and FL 682 RFs 608 is usually stored in compressed format in input channel (IC) dimension with the accompanying bitmaps stored within dedicated bitmap storage. The find-first logic uses a combination of these two bitmaps (AND) to skip RF 608 values that result in zero partial sums (or MAC 606 multiplications) such that only those IF 681 and FL 682 operands are read from the IF 681 and FL 682 RFs 608 that will result in non-zero partial sums towards the accumulated output… In these embodiments, a bitmap (BM) is encoded inline with the compressed data for decoding into dense data.”, in Raha, each compressed operand (IF and FL) is associated with its own bitmap (BM) used “for decoding into dense data.” Under BRI, a bitmap that supports decoding inherently provides positional information (i.e., which dense indices correspond to nonzero entries). 
Thus, the first bitmap (mapped above to the IF/activation bitmap) is used to obtain the first position of a nonzero weight in the compressed IF/FL bitmap during decode/selection.) determining a second position of the nonzero valued weight in the compressed weight operand based on the second bitmap; (Raha, col. 12, lines 33-43, “For a sparse ML accelerator, the data in the IF 681 and FL 682 RFs 608 is usually stored in compressed format in input channel (IC) dimension with the accompanying bitmaps stored within dedicated bitmap storage. The find-first logic uses a combination of these two bitmaps (AND) to skip RF 608 values that result in zero partial sums (or MAC 606 multiplications) such that only those IF 681 and FL 682 operands are read from the IF 681 and FL 682 RFs 608 that will result in non-zero partial sums towards the accumulated output… In these embodiments, a bitmap (BM) is encoded inline with the compressed data for decoding into dense data.”, in Raha, each compressed operand (IF and FL) is associated with its own bitmap (BM) used “for decoding into dense data.” Under BRI, a bitmap that supports decoding inherently provides positional information (i.e., which dense indices correspond to nonzero entries). Thus, the second bitmap (mapped above to the FL/weight bitmap) is used to obtain the second position of a nonzero weight in the compressed FL/IF bitmap during decode/selection.) and identifying the nonzero valued weight in the compressed weight operand based on the position, the first position, and the second position. (Raha, col. 12, lines 33-43, “For a sparse ML accelerator, the data in the IF 681 and FL 682 RFs 608 is usually stored in compressed format in input channel (IC) dimension with the accompanying bitmaps stored within dedicated bitmap storage.
The find-first logic uses a combination of these two bitmaps (AND) to skip RF 608 values that result in zero partial sums (or MAC 606 multiplications) such that only those IF 681 and FL 682 operands are read from the IF 681 and FL 682 RFs 608 that will result in non-zero partial sums towards the accumulated output… In these embodiments, a bitmap (BM) is encoded inline with the compressed data for decoding into dense data.”, Raha teaches identification/selection of which FL weight operand is to be read by using (i) combined bitmap (AND) via find-first (claimed “position”) and (ii) operand-side bitmap decode (the claimed “first position” for IF, “second position” for FL). The result is that the selected FL (weight) operand actually read is determined using the combined selection logic plus bitmap positional information; Raha expresses this outcome as reading only those IF/FL operands indicated by the bitmap logic.) Hall, in the same field of bit mask analysis, teaches the following limitations, which Raha and Narad fail to teach: The method of claim 1, wherein identifying the nonzero valued weight in the compressed weight operand comprises: after determining that there is a fault in identifying the nonzero valued weight, (Hall, paragraph 50, “For example, a “1” in mask register 410 may indicate that a corresponding data element was not written into destination register 415; otherwise a “0” may be used.
In such embodiments, the gather instruction may execute until the sum of the values of the state elements in mask register 410 is equal to a predetermined threshold, for example, the number of data elements to be gathered, which may vary for each gather instruction.”, In Hall, the claimed “fault in identifying the nonzero valued weight” is interpreted as a condition where a targeted operand element that should be gathered/loaded has not been successfully written (i.e., the selection/identification did not result in the element being written), which Hall explicitly represents in the bitmap state: “a ‘1’ in mask register 410 may indicate that a corresponding data element was not written into destination register 415; otherwise a ‘0’ may be used.” Hall also explicitly recognizes a fault condition during iterative gather processing: “if the gather instruction is finished or a fault occurs …” (Hall, paragraph 88). Thus, Hall’s “fault occurs” condition corresponds to the claimed “after determining that there is a fault in identifying the nonzero valued weight.”) The rationale for combining Raha and Narad with Hall is similar to that applied for claim 2 above. Claims 12 and 18 are substantially similar to claim 2, as such a similar analysis applies. Claims 13-14 are substantially similar to claims 5-6, as such a similar analysis applies. Claims 15 and 20 are substantially similar to claim 7, as such a similar analysis applies. Claim 19 is substantially similar to claim 5, as such a similar analysis applies. Claims 9 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Raha in view of Narad and Anders et al., (US 20210397414 A1), hereafter referred to as Anders. Claim 9: Raha and Narad teach the limitations of claim 1.
Anders, in the same field of bit mask analysis, teaches the following limitations, which Raha and Narad fail to teach: The method of claim 1, wherein identifying the nonzero valued activation in the compressed activation operand comprises: after determining that there is a fault in identifying the nonzero valued activation, identifying another nonzero valued activation in the compressed activation operand, wherein the another nonzero valued activation is subsequently next to a previously identified nonzero valued activation in the compressed activation operand. (Anders, paragraph 73, “The non-zero values are compressed and kept adjacent to one another in an IF RF 214. ”, Anders explicitly states that nonzero activations are compressed and stored adjacent in IF RF. Thus, selecting another activation next to a previously selected one is directly supported by the adjacency teaching.) It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Raha and Narad by further incorporating the teachings of Anders to support recovery behavior of identifying “another” nonzero activation that is “subsequently next” to a previously identified nonzero activation in the compressed activation operand. Raha teaches bitmap-guided sparse operand identification for deep-learning MAC operations (e.g., IF/FL sparsity bitmaps combined into find-first logic to select nonzero work and read only the corresponding operands). Narad provides a known mechanism for detecting invalid/erroneous mask states using bit-count logic (“population count” and invalid indication). Anders explicitly teaches the storage property needed for adjacent selection in compressed activations: “the non-zero values are compressed and kept adjacent to one another in an IF RF,” (Anders, paragraph 73) and further ties the bitmap to zero/nonzero positions (“zero and non-zero positions … are represented by a bit in the bitmap”).
A motivation of which would have been to reduce fault-recovery overhead and improve robustness by enabling a simple “next-adjacent-nonzero” fallback in the compressed activation stream. Claim 16 is substantially similar to claim 9, as such a similar analysis applies. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Wang, Y., Zhang, C., Xie, Z., Guo, C., Liu, Y., & Leng, J. (2021, June). Dual-side sparse tensor core. In 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA) (pp. 1083-1095). IEEE. Parashar, A., Rhu, M., Mukkara, A., Puglielli, A., Venkatesan, R., Khailany, B., ... & Dally, W. J. (2017). SCNN: An accelerator for compressed-sparse convolutional neural networks. ACM SIGARCH computer architecture news, 45(2), 27-40. Mishra, A., Latorre, J. A., Pool, J., Stosic, D., Stosic, D., Venkatesh, G., ... & Micikevicius, P. (2021). Accelerating sparse deep neural networks. arXiv preprint arXiv:2104.08378. Any inquiry concerning this communication or earlier communications from the examiner should be directed to HYUNGJUN B YI whose telephone number is (703)756-4799. The examiner can normally be reached M-F 9-5. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached on (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. 
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /H.B.Y./Examiner, Art Unit 2146 /USMAAN SAEED/ Supervisory Patent Examiner, Art Unit 2146

Prosecution Timeline

Apr 21, 2023
Application Filed
May 18, 2023
Response after Non-Final Action
Mar 03, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12536429
INTELLIGENTLY MODIFYING DIGITAL CALENDARS UTILIZING A GRAPH NEURAL NETWORK AND REINFORCEMENT LEARNING
2y 5m to grant Granted Jan 27, 2026
Study what changed to get past this examiner. Based on the 1 most recent grant.

Prosecution Projections

1-2
Expected OA Rounds
18%
Grant Probability
49%
With Interview (+31.7%)
4y 7m
Median Time to Grant
Low
PTA Risk
Based on 17 resolved cases by this examiner. Grant probability derived from career allow rate.
