DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is made final.
Claims 1-20 are pending. Claims 1 and 11 are independent claims.
Response to Arguments
Applicant’s arguments dated 12/4/2025 regarding the 35 U.S.C. 103 rejections in the previous Office action have been fully considered but are not persuasive.
Applicant argues that because the previous Office action supposedly acknowledges that Martin does not teach “a weight buffer configured to store weight values in an arrangement selected from a group comprising a structured weight sparsity arrangement and a random weight arrangement”, Martin cannot teach “a weight multiplexer array… configured to output one or more weight values stored in the weight buffer as first operand values based on the selected weight sparsity arrangement”, since Martin does not teach a selected arrangement. The examiner would like to clarify that in the previous Office action, Martin was described as failing to explicitly teach “selected from a group comprising a structured weight sparsity arrangement and a random weight sparsity arrangement”, not “a weight buffer configured to store weight values in an arrangement…” The examiner maintains that under the broadest reasonable interpretation (BRI), the claim does not require both a structured and a random arrangement, and further that an actual process of selecting an arrangement is not recited in the claims (i.e., there is no step to select arrangement A under condition X, or arrangement B under condition Y). Therefore, since Martin discloses a weight buffer with a weight sparsity arrangement (Martin, ¶141, A sparsity map for the weights may be provided with weights 302 by a respective weight buffer 240 – the sparsity map representing the previously selected arrangement), Martin teaches “a weight multiplexer array… configured to output one or more weight values stored in the weight buffer as first operand values based on the selected weight sparsity arrangement”.
Applicant argues that claim 1 “requires actual weight values stored in a weight buffer”, and that Xiao does not teach or suggest this element. Examiner argues that Martin teaches weight values stored in a weight buffer, and that Xiao was not relied upon to teach this element. While bitmaps are not equivalent to the weights themselves, selecting a sparsity arrangement can be interpreted under BRI to encompass selecting a sparsity pattern for a sparsity bitmap that is applied to the weights.
Applicant argues that Xiao fails to teach “an arrangement [be] selected from a group comprising a structured weight sparsity arrangement and a random weight sparsity arrangement” (examiner note: the underlined [be] was not included in the claim). Examiner argues that under BRI, the claim as written does not include an explicit selecting step, and that this limitation is interpreted as presenting two possible options for the sparsity arrangement. Applicant further argues that Xiao uses structured and unstructured sparsity simultaneously for a pair of bitmaps representing one sparse matrix, and that Xiao is therefore not equivalent to “selecting either a structured sparsity or un-structured sparsity”, which applicant argues is required by claim 1. However, the claim does not read “selecting either a structured sparsity or un-structured sparsity”. Because Xiao uses two separate bitmaps (i.e., weight sparsity arrangements), one structured sparse and one unstructured sparse, Xiao teaches at least two options for sparsity arrangements: structured and unstructured.
Applicant argues that Xiao’s un-structured sparsity is not equivalent to random sparsity, because unstructured sparsity can be produced in a number of ways, “but the pattern is not random i[n] any of them”. Examiner argues that because each element of an unstructured sparsity arrangement (i.e., pattern) is independent of the other elements of the arrangement, the arrangement itself is indeed a random weight sparsity arrangement under BRI. For example, given a sparse weight vector with 10 elements, even if the sparsity status of the first 9 elements were known, the sparsity status of the last element cannot be predicted. The claim does not recite a limitation such as “prune random weights” and does not describe how the random sparsity arrangement is created.
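The independence point above can be illustrated with a short sketch. All values are hypothetical, and the 2:4 structured pattern is used only as a common example of structured sparsity; it is drawn from neither reference.

```python
import random

random.seed(0)

# Structured 2:4 sparsity: every group of 4 elements keeps exactly 2 non-zeros.
# Once 3 statuses of a group are known, the 4th can be forced by the pattern.
group = [1, 1, 0]                      # statuses of the first 3 elements (1 = kept)
fourth = 1 if sum(group) < 2 else 0    # forced: the group must total exactly 2 kept
print("structured: 4th element status forced to", fourth)

# Unstructured ("random") sparsity: each element's status is independent.
# Knowing the first 9 statuses says nothing about the 10th.
first_nine = [random.randint(0, 1) for _ in range(9)]
tenth = random.randint(0, 1)           # independent draw; unpredictable from first_nine
print("unstructured: first 9 =", first_nine, "-> 10th =", tenth)
```

In the structured case the arrangement constrains the remaining element, whereas in the unstructured case each element is an independent draw, which is the sense in which the arrangement is random.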
Applicant argues that the previous Office action relies on isolated terminology from Xiao and is improper under MPEP 2141.01(a), because it allegedly relies “selectively on the phrases ‘structured sparsity’ and ‘un-structured sparsity’ from Xiao while disregarding the underlying bitmap mechanisms tied to the ‘structured sparsity’ and ‘un-structured sparsity’ of Xiao”. Applicant adds that “Xiao’s ‘structured sparsity’ and ‘un-structured sparsity’ only refer to bitmaps, not to an arrangement of weights”. Examiner respectfully disagrees. In the previous Office action, the citations of Martin used in claim 1 (¶141, A sparsity map for the weights may be provided with weights 302 by a respective weight buffer 240) include both a sparsity map and weights in a weight buffer. Using a bitmap (i.e., zeroes representing sparsified parameters, ones representing non-sparse parameters) in combination with a weight vector/matrix/tensor to represent a “weight sparsity arrangement” is present in both references. Examiner does not believe that replacing the sparsity maps of Martin with the bitmap structures of Xiao would fundamentally alter the operation disclosed in Martin.
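The bitmap-plus-weights representation common to both references can be sketched as follows. The values and variable names are illustrative assumptions and appear in neither reference.

```python
# A sparsity bitmap marks which positions of the weight tensor are non-zero
# (1 = non-sparse parameter, 0 = sparsified parameter), so only the non-zero
# weight values need to be stored in the weight buffer.
bitmap = [1, 0, 0, 1, 0, 1, 0, 0]   # the "weight sparsity arrangement"
stored = [0.5, -1.2, 0.7]           # compacted non-zero weights in the buffer

# Expanding the buffer contents according to the bitmap reconstructs the
# full weight vector, zeros included.
it = iter(stored)
expanded = [next(it) if bit else 0.0 for bit in bitmap]
print(expanded)  # [0.5, 0.0, 0.0, -1.2, 0.0, 0.7, 0.0, 0.0]
```

The bitmap and the stored values together describe the arrangement of the weights, which is why the bitmap is not merely incidental to the weights it indexes.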
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “weight buffer configured to store…” and “activation buffer configured to store…” in claims 1-10.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. The weight buffer and activation buffer are interpreted to be collections of registers as described in ¶5 of the specification: ¶5, The weight buffer may include an array of weight registers… and The activation buffer may include an array of activation registers… The same interpretation applies to dependent claims 2-10.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 6-11 and 15-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Martin et al. (US 20190147327 A1), herein Martin, in view of Xiao et al. (US 20210240684 A1, INCLUDED IN IDS), herein Xiao.
Regarding claim 1, Martin teaches: A neural processing unit, comprising: a weight buffer configured to store weight values in an arrangement (Fig. 2, weight buffers 240a-n)…a weight multiplexer array (Fig. 3, neuron engine 245 contains a weight multiplexer 308 and an input multiplexer 307, also see Fig. 2, which contains a plurality of neuron engines 245, i.e., an array of multiplexers and multipliers) configured to output one or more weight values stored in the weight buffer as first operand values based on the selected weight sparsity arrangement (¶141, A sparsity map for the weights may be provided with weights 302 by a respective weight buffer 240); an activation buffer configured to store activation values (Fig. 2, input buffer 235 – the input can be the output (i.e., activations) of a previous layer as described in ¶120, Sparsity in input data may occur for the following reasons: Activation Function… Data sparsity is generally higher following a ReLU activation layer, as this function clamps all negative values to zero); an activation multiplexer array comprising inputs to the activation multiplexer array coupled to the activation buffer, the activation multiplexer array configured to output one or more activation values stored in the activation buffer as second operand values (Fig. 3, multiplexer 307), each respective second operand value and a corresponding first operand value forming an operand value pair; and a multiplier array configured to output a product value for each operand value pair (Fig. 3, multiplier 309 – and – ¶40, The multiplication logic may comprise a plurality of multipliers arranged to concurrently combine a plurality of weights with a plurality of corresponding data values).
Martin fails to explicitly teach: selected from a group comprising a structured weight sparsity arrangement and a random weight sparsity arrangement.
However, in the same field of endeavor, Xiao teaches: selected from a group comprising a structured weight sparsity arrangement and a random weight sparsity arrangement (¶105, In some embodiments, hierarchical representation of a sparse matrix can represent structured sparsity in a level (e.g., first level BM) and un-structured sparsity in another level (e.g., second level BM) – BM referring to bitmaps – Xiao teaches structured and random sparsity arrangements represented by bitmaps).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a structured weight sparsity arrangement or an unstructured sparsity arrangement as disclosed by Xiao in the unit disclosed by Martin to balance accuracy and efficiency (¶27, as there may be a tradeoff between a theoretically more accurate outcome using a less aggressive sparsing technique versus computational savings using a more aggressive sparsing technique).
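Xiao’s hierarchical representation (structured sparsity at one bitmap level, un-structured at another) can be sketched as follows. The matrix size, block layout, and bit values are illustrative assumptions, not taken from Xiao.

```python
# Two-level bitmap for a sparse 4x4 matrix split into 2x2 blocks, in the
# spirit of a first-level/second-level BM scheme (values are illustrative).
first_level = [1, 0,
               0, 1]                  # structured level: which 2x2 blocks are non-zero
second_level = {0: [1, 0, 0, 1],      # unstructured level: element pattern inside block 0
                3: [0, 1, 1, 0]}      # element pattern inside block 3

# Only blocks flagged at the first level carry a second-level bitmap, so the
# number of stored non-zero weights follows from both levels together.
nonzeros = sum(sum(bits) for blk, bits in second_level.items() if first_level[blk])
print("non-zero weights:", nonzeros)  # non-zero weights: 4
```

The first-level bitmap exhibits a block-granular (structured) pattern, while the element patterns within kept blocks are unconstrained (unstructured), so both arrangement options coexist in one representation.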
Regarding claim 6, Martin teaches: The neural processing unit of claim 1, wherein the weight values are stored in the weight buffer in the… weight sparsity arrangement (¶198, Packing the weights for sparsity with all of the zero weights at one end), the neural processing unit further comprising a control unit configured to control the activation multiplexer array to select and output one or more activation values stored in the activation buffer based on the… weight sparsity arrangement (¶140, The control block 304 may be configured to identify whether each input datum or its respective weight are zero. If either the input datum or its respective weight are zero, the datum-weight pair is skipped and not processed… This can be achieved through the use of multiplexers 307 and 308 which are configured to pass to the multiplication logic 309 (in this case a multiplier) only on those datum-weight pairs where both the datum and weight are non-zero – this method handles weights and activations under any sparsity arrangement).
Martin fails to explicitly teach: structured weight sparsity… structured…
However, in the same field of endeavor, Xiao teaches: structured weight sparsity (¶105, In some embodiments, hierarchical representation of a sparse matrix can represent structured sparsity in a level (e.g., first level BM)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a structured weight sparsity arrangement or an unstructured sparsity arrangement as disclosed by Xiao in the unit disclosed by Martin to balance accuracy and efficiency (¶27, as there may be a tradeoff between a theoretically more accurate outcome using a less aggressive sparsing technique versus computational savings using a more aggressive sparsing technique).
Regarding claim 7, Martin teaches: The neural processing unit of claim 1, wherein the weight values are stored in the weight buffer in the random weight sparsity arrangement (¶118, Zero pruning is a process that can be performed during mapping, where very small non-zero weights can be set to zero in order to increase the sparsity – zero pruning would result in a random sparsity arrangement), the neural processing unit further comprising a control unit configured to control the activation multiplexer array to select and output one or more activation values stored in the activation buffer based on the random weight sparsity arrangement of the weight values (¶140, The control block 304 may be configured to identify whether each input datum or its respective weight are zero. If either the input datum or its respective weight are zero, the datum-weight pair is skipped and not processed… This can be achieved through the use of multiplexers 307 and 308 which are configured to pass to the multiplication logic 309 (in this case a multiplier) only on those datum-weight pairs where both the datum and weight are non-zero – this method handles weights and activations under any sparsity arrangement).
Regarding claim 8, Martin teaches: The neural processing unit of claim 7, wherein the activation values are stored in the activation buffer in a random activation sparsity arrangement (¶126, When converting the data into a fixed point format at a particular bit depth, some small values may become zero. The lower the bit depth used, the more zeros are likely to be introduced into the data – quantization would result in a random sparsity arrangement), and wherein the control unit is further configured to control the activation multiplexer array to select and output the one or more activation values based on the random weight sparsity arrangement and on the random activation sparsity arrangement (¶140, The control block 304 may be configured to identify whether each input datum or its respective weight are zero. If either the input datum or its respective weight are zero, the datum-weight pair is skipped and not processed… This can be achieved through the use of multiplexers 307 and 308 which are configured to pass to the multiplication logic 309 (in this case a multiplier) only on those datum-weight pairs where both the datum and weight are non-zero).
Regarding claim 9, Martin teaches: The neural processing unit of claim 8, wherein the control unit is further configured to select and output the one or more activation values based on an ANDing of an activation zero-bit mask of activation values stored in the activation buffer and a weight zero-bit mask of weight values stored in the weight buffer (¶141, A sparsity map for the input data may be provided with input data 301 by the input buffer 235. A sparsity map for the weights may be provided with weights 302 by a respective weight buffer 240. By combining the pair of sparsity maps the control block may readily determine which of the datum-weight pairs includes a zero value).
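The mask-combining operation mapped above can be sketched with a minimal example. The mask and operand values are hypothetical and not drawn from Martin.

```python
# Zero-bit masks for values in the activation and weight buffers
# (1 = a non-zero value is present at that position).
activation_mask = [1, 0, 1, 1, 0, 1]
weight_mask     = [1, 1, 0, 1, 0, 0]

# ANDing the two masks selects only the datum-weight pairs where BOTH
# operands are non-zero; all other pairs can be skipped by the multiplexers.
pair_mask = [a & w for a, w in zip(activation_mask, weight_mask)]
print(pair_mask)            # [1, 0, 0, 1, 0, 0]

activations = [3.0, 0.0, 2.0, 1.5, 0.0, 4.0]
weights     = [0.5, 2.0, 0.0, -1.0, 0.0, 0.0]
products = [a * w for a, w, m in zip(activations, weights, pair_mask) if m]
print(products)             # [1.5, -1.5]
```

Only two of the six positions survive the AND, so only two multiplications are performed, which is the zero-skipping behavior the combined sparsity maps enable.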
Regarding claim 10, Martin teaches: The neural processing unit of claim 1, wherein the weight multiplexer array comprises four multiplexers, the activation multiplexer array comprises four second multiplexers and the multiplier array comprises four multipliers (Fig. 3, each neuron engine has a weight multiplexer 308, an input (i.e., activation) multiplexer 307, and a multiplier 309 – if there were 4 neuron engines, there would be 4 of each component – ¶138, Any number of neuron engines can theoretically be included in a hardware implementation 200, allowing the design to be scaled with a fine granularity).
Regarding claim 11, it recites similar limitations to claim 1 and is rejected on the same grounds – see above.
Regarding claim 15, it recites similar limitations to claim 6 and is rejected on the same grounds – see above.
Regarding claim 16, it recites similar limitations to claim 7 and is rejected on the same grounds – see above.
Regarding claim 17, it recites similar limitations to claim 8 and is rejected on the same grounds – see above.
Regarding claim 18, it recites similar limitations to claim 9 and is rejected on the same grounds – see above.
Regarding claim 19, Martin teaches: The neural processing unit of claim 17, wherein the weight multiplexer is part of an array of weight multiplexers, the activation multiplexer is part of an array of activation multiplexers, and the multiplier unit is part of an array of multipliers (Fig. 3, a neuron engine 245 contains a weight multiplexer 308 and an input multiplexer 307, also see Fig. 2, which contains a plurality of neuron engines 245, i.e., an array of multiplexers and multipliers).
Regarding claim 20, it recites similar limitations to claim 19 and is rejected on the same grounds – see above.
Claim(s) 2 and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Martin in view of Xiao as applied to claim 1 above, and further in view of Moshovos et al. (US 20210004668 A1), herein Moshovos.
Regarding claim 2, Martin in view of Xiao fails to teach: The neural processing unit of claim 1, wherein the weight multiplexer array is further configured to select the one or more weight values in a lookahead manner, and wherein the activation multiplexer array is further configured to select the one or more activation values in the lookahead manner.
However, in the same field of endeavor, Moshovos teaches: wherein the weight multiplexer array is further configured to select the one or more weight values in a lookahead manner (¶36, allowing schedules where only two intra-filter weight movements are permitted: a lookahead movement and a lookaside movement), and wherein the activation multiplexer array is further configured to select the one or more activation values in the lookahead manner (¶47, For each weight wi processed by tile 7000 there are h+1 activations, Ai,0 through Ai,h, that correspond to a lookahead window of h activations).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to select weights and activations in a lookahead manner as disclosed by Moshovos in the unit disclosed by Martin in view of Xiao to improve flexibility and efficiency (¶36, weight scheduling flexibility may be balanced with energy and area efficiency).
Regarding claim 12, it recites similar limitations to claim 2 and is rejected on the same grounds – see above.
Claim(s) 3, 4 and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Martin in view of Xiao and Moshovos as applied to claim 2 above, and further in view of Ovsiannikov et al. (US 20190392287 A1), herein Ovsiannikov.
Regarding claim 3, Martin in view of Xiao fails to teach: The neural processing unit of claim 2, wherein the weight multiplexer array is further configured to select the one or more weight values in a lookaside manner…
However, in the same field of endeavor, Moshovos teaches: wherein the weight multiplexer array is further configured to select the one or more weight values in a lookaside manner (¶36, only two intra-filter weight movements are permitted: a lookahead movement and a lookaside movement).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to select weights in a lookaside manner as disclosed by Moshovos in the unit disclosed by Martin in view of Xiao to improve flexibility and efficiency (¶36, weight scheduling flexibility may be balanced with energy and area efficiency).
Martin in view of Xiao and Moshovos fails to teach: and wherein the activation multiplexer array is further configured to select the one or more activation values in the lookaside manner.
However, in the same field of endeavor, Ovsiannikov teaches: and wherein the activation multiplexer array is further configured to select the one or more activation values in the lookaside manner (¶343, As mentioned earlier in the description of those drawings, the ability to retrieve (and multiplex in) data from one lane above and below may be referred to as a “look-aside of 1”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to select activations in a lookaside manner as disclosed by Ovsiannikov in the unit disclosed by Martin in view of Xiao and Moshovos to improve efficiency (¶258, The neural processor may be configured to efficiently calculate a convolution or a tensor product of an input feature map (IFM) (or a tensor of “activations”) with a multi-dimensional array (or tensor) of weights, to form an output feature map).
Regarding claim 4, Martin in view of Xiao fails to teach: The neural processing unit of claim 3, wherein the weight multiplexer array is configured to select the one or more weight values in a lookahead of at least 3 timeslots and in a lookaside of 1 channel.
However, in the same field of endeavor, Moshovos teaches: wherein the weight multiplexer array is configured to select the one or more weight values in a lookahead of at least 3 timeslots (¶36, A lookahead movement allows an effectual weight to advance in step to replace an ineffectual weight… h is a lookahead depth which is linked to the number of activation values that must be made available -- and – ¶80, Accordingly, setting a lookahead and lookaside pair to (2, 5) or (4, 3) may be a reasonable compromise configuration for many embodiments and situations – (4,3) meaning a lookahead depth of 4, which is at least 3) and in a lookaside of 1 channel (¶36, lookaside movement allows an effectual weight to replace an ineffectual weight in a different lane, for example effectual weight… may be advanced one time step and shifted d lanes to replace ineffectual weight – and – ¶42, Accelerator 6000 employs a lookaside structure in which d has been set to 1).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to select weights with a lookahead of at least 3 and a weight lookaside of 1 as disclosed by Moshovos in the unit disclosed by Martin in view of Xiao to improve flexibility and efficiency (¶36, weight scheduling flexibility may be balanced with energy and area efficiency).
Martin in view of Xiao and Moshovos fails to teach: and wherein the activation multiplexer array is configured to select the one or more activation values in a lookahead of at least 3 time slots and in a lookaside of at least 2 channels.
However, in the same field of endeavor, Ovsiannikov teaches: and wherein the activation multiplexer array is configured to select the one or more activation values in a lookahead of at least 3 time slots and in a lookaside of at least 2 channels (¶344, The look-aside and/or look-ahead may be greater than two… skipping zero activations – a look-ahead greater than two is at least 3, and a look-aside greater than two includes values such as 3, 4, 5, etc., which are at least 2).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to select activations with a lookahead of at least 3 and a lookaside of at least 2 as disclosed by Ovsiannikov in the unit disclosed by Martin in view of Xiao and Moshovos to improve efficiency (¶258, The neural processor may be configured to efficiently calculate a convolution or a tensor product of an input feature map (IFM) (or a tensor of “activations”) with a multi-dimensional array (or tensor) of weights, to form an output feature map).
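The lookahead/lookaside selection windows discussed for claim 4 can be sketched abstractly as follows. The function name, position encoding, and parameter values are illustrative assumptions, not taken from Moshovos or Ovsiannikov.

```python
def candidate_positions(lane, t, lookahead, lookaside, n_lanes):
    """Positions (lane, time) from which a multiplexer in `lane` at time
    step `t` may take an effectual (non-zero) value: up to `lookahead`
    future time steps in its own lane, or up to `lookaside` adjacent lanes
    one time step ahead."""
    positions = [(lane, t + k) for k in range(1, lookahead + 1)]  # same lane, ahead in time
    for d in range(1, lookaside + 1):                             # neighboring lanes
        for neighbor in (lane - d, lane + d):
            if 0 <= neighbor < n_lanes:
                positions.append((neighbor, t + 1))
    return positions

# A lookahead of 3 time slots and a lookaside of 1 lane, for lane 2 of 4 at t=0:
print(candidate_positions(lane=2, t=0, lookahead=3, lookaside=1, n_lanes=4))
# [(2, 1), (2, 2), (2, 3), (1, 1), (3, 1)]
```

A deeper lookahead widens the same-lane window in time, while a wider lookaside widens the cross-lane window, which is the flexibility/efficiency tradeoff cited from Moshovos ¶36.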
Regarding claim 13, it recites similar limitations to claim 3 and is rejected on the same grounds – see above.
Claim(s) 5 and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Martin in view of Xiao as applied to claim 1 above, and further in view of Moshovos and Ovsiannikov.
Regarding claim 5, Martin in view of Xiao fails to teach: The neural processing unit of claim 1, wherein the weight multiplexer array is further configured to select the one or more weight values in a lookaside manner.
However, in the same field of endeavor, Moshovos teaches: wherein the weight multiplexer array is further configured to select the one or more weight values in a lookaside manner (¶36, allowing schedules where only two intra-filter weight movements are permitted: a lookahead movement and a lookaside movement).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to select weights in a lookaside manner as disclosed by Moshovos in the unit disclosed by Martin in view of Xiao to improve flexibility and efficiency (¶36, weight scheduling flexibility may be balanced with energy and area efficiency).
Martin in view of Xiao and Moshovos fails to teach: and the activation multiplexer array is further configured to select the one or more activation values in the lookaside manner.
However, in the same field of endeavor, Ovsiannikov teaches: and the activation multiplexer array is further configured to select the one or more activation values in the lookaside manner (¶343, As mentioned earlier in the description of those drawings, the ability to retrieve (and multiplex in) data from one lane above and below may be referred to as a “look-aside of 1”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to select activations in a lookaside manner as disclosed by Ovsiannikov in the unit disclosed by Martin in view of Xiao and Moshovos to improve efficiency (¶258, The neural processor may be configured to efficiently calculate a convolution or a tensor product of an input feature map (IFM) (or a tensor of “activations”) with a multi-dimensional array (or tensor) of weights, to form an output feature map).
Regarding claim 14, it recites similar limitations to claim 5 and is rejected on the same grounds – see above.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HARRISON CHAN YOUNG KIM whose telephone number is (571)272-0713. The examiner can normally be reached Monday - Thursday 8:30 am - 4:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CESAR PAULA can be reached at (571) 272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HARRISON C KIM/ Examiner, Art Unit 2145
/CESAR B PAULA/ Supervisory Patent Examiner, Art Unit 2145