DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submissions filed on 20 January 2026 and 05 February 2026 have been entered.
Claim Objections
Claims 1, 22, and 24 are objected to because of the following informality: “a different portion of the data element”. If the limitation refers back to “the first data element”, the claims would be clearer if it were recited as “a different portion of the first data element”; if it is meant to refer back to “the data elements”, it should be recited as “a different portion of the data elements”. Claims 3, 5, 9-15, 17-20, and 27-32 inherit the same deficiency by reason of their dependency on claim 1. Claim 34 inherits the same deficiency by reason of its dependency on claim 22. Claim 35 inherits the same deficiency by reason of its dependency on claim 24.
Claim 35 is objected to because of the following informalities: “comprising a first mode and a second mode. the first data element”.
Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3, 9, 11-15, 17-19, 22, 24, 27-31, 34-35 are rejected under 35 U.S.C. 103 as being unpatentable over US 20210073171 A1 Master et al. (hereinafter “Master”) in view of US 10776078 B1 Clark et al. (hereinafter “Clark”, previously cited on PTO-892 Notice of References Cited filed 06/10/2025) in view of US 20210279055 A1 Saxena et al. (hereinafter “Saxena”) in view of US 20190171930 A1 Lee et al. (hereinafter “Lee”).
Regarding claim 1, Master teaches a processor comprising:
a substrate ([0008] integrated circuit which by nature has a substrate); and
logic ([0008]; Fig. 2, 200, computational cores) coupled to the substrate, the logic including an arithmetic block ([0121]; Fig. 2, 300, reconfigurable arithmetic engine (RAE) circuit), the arithmetic block comprising:
a plurality of multipliers ([0129], [0175-0176]; Fig. 5, 305 multiplier, many multipliers utilized for particular modes see Table 1, IEEE half, IEEE quarter, BFLOAT16, INT8, INT16, etc.), each multiplier configured to perform multiplication at a N-bit data precision ([0135], [0132], [0175], [0237]), wherein N is an integer number ([0175-0176], Table 1, INT8 where four 9x9 multipliers are implemented, as an example N is 9),
a register file (Fig. 46, 526, [0431]) configured to store data elements ([0431] holding for 4-bit configuration of each slice) of one or more tensor operations in one or more neural networks ([0007], [0119]), the data elements including a first data element having a first data precision and a second data element having a second data precision, the first data precision being higher than the N-bit data precision, the second data precision being lower than the N-bit data precision ([0175-0176], Table 1, INT8 where four 9x9 multipliers are implemented, as an example N is 9), and
an accumulator (Fig. 2, 315 [0138]; [0176] accumulator; Fig. 42, 315, [0374]; Fig. 5, 315, [0131]) configured to accumulate products ([0374]) computed by at least part of the plurality of multipliers to compute output data (Fig. 5, output of multiplier shifter-combiner network 310 beforehand output from multiplier 305, [0129]) of the one or more tensor operations,
wherein compared with the second data element, the first data element is to be processed by more multipliers of the plurality of multipliers ([0135], [0132], [0175], [0237]) for computing ([0187]) a single product, wherein each of the more multipliers is to receive a different portion of the data element ([0431] holding for 4-bit configuration of each slice).
While Master generally teaches a register file (Fig. 46, 526, [0431]), Master appears to be silent as to: one or more tensor operations; the data elements including a first data element having a first data precision and a second data element having a second data precision, the first data precision being higher than the N-bit data precision, the second data precision being lower than the N-bit data precision; wherein compared with the second data element, the first data element is to be processed by more multipliers for computing a single product, wherein each of the more multipliers is to receive a different portion of the data element.
Clark teaches a first data element having a first data precision (Fig. 1A, 110; Col. 3, lines 34-39, multiplier 110 uses ‘first data type’) and a second data element having a second data precision (Fig. 1A, 111; Col. 3, lines 34-39, multiplier 111 uses ‘second data type’); the first data precision being higher than the N-bit data precision, the second data precision being lower than the N-bit data precision (Fig. 1A, 111 and 110; Col. 3, lines 34-39, where the first data type used by multiplier 110 is higher in precision than the second data type used by multiplier 111; Col. 7, lines 32-48, FP16 as first type, Int8 as second type); wherein compared with the second data element, the first data element is to be processed by more multipliers of the plurality of multipliers for computing a single product (Fig. 1A, Out of 111 and Out of 110, Col. 5, lines 41-46), wherein each of the more multipliers is to receive a different portion of the data element (Fig. 1A, Op4 coupled with Op3 is split when Op3 is sent to Sign Extend 112 and multiplier 111 receives Op4 and Op2, Op2 coupled with Op2 is split when Op1 is sent to Sign Extend 112 and multiplier 110 receives Op1 and Op3; Col. 4, lines 7-33).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Master with Clark’s various data precisions and multiplier features because they are in the claimed invention’s same field of endeavor of multiply-accumulate operations ([abstract]). It would have been obvious to one of ordinary skill in the art to implement the various data precisions and multipliers because Clark’s splitting of data across multiple multipliers advantageously allows for parallel computation at various bit sizes (Col. 4, lines 26-33; Col. 5, lines 37-46). A person of ordinary skill in the art would look to Clark’s various data precisions and multiplier features in order to process large amounts of data, comprising different types and sizes, more efficiently, and thus it would have been obvious to make the modification.
Master in view of Clark is silent as to: one or more tensor operations; compared with the second data element, the first data element is to be processed by more multipliers.
However, in the same field of endeavor of multiply-accumulate devices [abstract], Saxena teaches one or more tensor operations ([0451-0452] 4x4 matrix, [0146]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Master in view of Clark with Saxena’s specific details of the tensor operations. Master generally teaches a register file (Fig. 46, 526, [0431]), but does not disclose configuring it for one or more tensor operations. Modifying Master’s register file with Saxena’s various precisions and tensor operations, by adjusting Master in view of Clark’s architecture to accommodate the tensor operations, would lead to additional architecture to process, more quickly, the various data types presented with tensor operations. Master already teaches that their system can be used for computations in a neural network ([0007], [0119]), so it would be obvious to try to employ tensor operations, as data is typically represented in such a format in neural networks ([0115], [0152]). This architecture would further allow Master in view of Clark’s techniques to more effectively handle varying precision operands, thus leading to improved speed of processing performance ([0226], [0342]).
Master in view of Clark in view of Saxena is silent as to: compared with the second data element, the first data element is to be processed by more multipliers.
Lee discloses compared with the second data element, the first data element is to be processed by more multipliers (Fig. 4A ‘403’ 100% utilization for 16 bits, Fig. 4B ‘423’ 25% utilization for 8 bits, [0103-0105]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Master in view of Clark in view of Saxena’s multiplier with Lee’s utilization features because they are in the same field of endeavor of multiplier architecture ([Abstract]). Modifying Master in view of Clark in view of Saxena’s multiplier architecture to utilize more of the multiplier when processing higher length operands would increase the efficiency of the operators, and utilizing less of the multiplier when processing lower length operands, while it would reduce the number of resources, would make the operator less effective ([0105]). This modification in architecture would allow Master in view of Clark in view of Saxena’s techniques to more effectively handle varying precision operands and avoid unnecessary utilizations of the multiplier when processing smaller sized operands, therefore leading to improved performance by reducing idle resources and prioritizing maximum bit width operands to increase efficiency and promote full utilization ([0103-0105], [0110]).
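For illustration only, the limitation at issue in claim 1 (a data element whose precision exceeds the N-bit multiplier precision is split so that each of several multipliers receives a different portion, and the partial results are combined into a single product) can be sketched as follows. This is a generic unsigned decomposition, not the circuit of the application or of Master, Clark, Saxena, or Lee; the width N = 8 and the names `mul_n`, `split`, and `wide_product` are assumptions introduced for this sketch.

```python
N = 8                   # assumed multiplier width in bits (illustrative only)
MASK = (1 << N) - 1


def mul_n(a, b):
    """Model of one N-bit unsigned multiplier (hypothetical helper)."""
    assert 0 <= a <= MASK and 0 <= b <= MASK
    return a * b


def split(x, parts):
    """Split x into `parts` N-bit portions, least significant portion first."""
    return [(x >> (N * i)) & MASK for i in range(parts)]


def wide_product(a, b, parts=2):
    """Compute one product of two (parts * N)-bit operands.

    Each N-bit multiplier receives a different pair of operand portions.
    Returns the product and the number of multipliers used.
    """
    total = 0
    used = 0
    for i, a_part in enumerate(split(a, parts)):
        for j, b_part in enumerate(split(b, parts)):
            # Weight each partial product by the position of its portions.
            total += mul_n(a_part, b_part) << (N * (i + j))
            used += 1
    return total, used
```

Under these assumptions, a 16-bit by 16-bit product occupies four 8-bit multipliers, while an operand that already fits in N bits occupies one (`parts=1`).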
Regarding claim 3, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Master further teaches the processor wherein:
another data element ([0431] holding for 4-bit configuration of each slice) stored in the register file (Fig. 46, 526, [0431]) is allocated to the more multipliers to compute ([0187]) the single product, wherein each of the more multipliers ([0129], [0175-0176]; Fig. 5, 305 multiplier, many multipliers utilized for particular modes see Table 1, IEEE half, IEEE quarter, BFLOAT16, INT8, INT16, etc.) is to receive a different portion of the another data element ([0431] holding for 4-bit configuration of each slice).
While Master generally teaches a register file (Fig. 46, 526, [0431]), Master does not explicitly disclose that the another data element is allocated to the more multipliers to compute the single product, wherein each of the more multipliers is to receive a different portion of the another data element.
Clark teaches a data element that is allocated (Fig. 1A, Op4 coupled with Op3 is split when Op3 is sent to Sign Extend 112, Op2 coupled with Op2 is split when Op1 is sent to Sign Extend 112; Col. 4, lines 7-33) to the more multipliers to compute the single product (Fig. 1A, Out of 111 and Out of 110, Col. 5, lines 41-46), wherein each of the more multipliers is to receive a different portion of the another data element (Fig. 1A, Op4 coupled with Op3 is split when Op3 is sent to Sign Extend 112 and multiplier 111 receives Op4 and Op2, Op2 coupled with Op2 is split when Op1 is sent to Sign Extend 112 and multiplier 110 receives Op1 and Op3; Col. 4, lines 7-33).
The motivation to combine provided with respect to claim 1 equally applies.
Master in view of Clark in view of Saxena is silent as to more multipliers.
Lee discloses more multipliers (Fig. 4A ‘403’ 100% utilization for 16 bits, Fig. 4B ‘423’ 25% utilization for 8 bits, [0103-0105]).
The motivation to combine provided with respect to claim 1 equally applies.
Regarding claim 9, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Master further teaches the processor wherein:
the data elements ([0431] holding for 4-bit configuration of each slice) are allocated from the register files ([0431] register file is 4-bit; [0135] 27x27 unsigned multiplier, 8x8, 16x16, 32x32) to the plurality of multipliers ([0135], [0132], [0175], [0237]) by:
allocating data elements ([0431] holding for 4-bit configuration of each slice) in a plurality of input channels (Fig. 4B, 105A, 105B, 105C; Fig. 6, 350) to the plurality of multipliers (Fig. 5, 305 [0129]).
While Master generally teaches a register file (Fig. 46, 526, [0431]), Master does not explicitly disclose the allocation.
Clark teaches the allocation (Fig. 1A, Op4 coupled with Op3 is split when Op3 is sent to Sign Extend 112, Op2 coupled with Op2 is split when Op1 is sent to Sign Extend 112; Col. 4, lines 7-33).
The motivation to combine provided with respect to claim 1 equally applies.
Regarding claim 11, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Master further teaches the processor wherein:
the data elements ([0431] holding for 4-bit configuration of each slice) comprise weights ([0186-0187], [0226]; [0432] Z input weight of 2, Y input weight of 1) of the one or more tensor operations.
While Master generally teaches weights as explained in the rejection of claim 11 above, Master does not disclose activations and the one or more tensor operations.
Clark teaches activations (Col. 5, lines 18-46).
The motivation to combine provided with respect to claim 1 equally applies.
Master in view of Clark is silent as to the one or more tensor operations.
However, in the same field of endeavor of multiply-accumulate devices [abstract], Saxena teaches one or more tensor operations ([0451-0452] 4x4 matrix, [0146]).
The motivation to combine provided with respect to claim 1 equally applies.
Regarding claim 12, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Master further teaches the processor wherein:
a multiplier of the plurality of multipliers ([0129], [0175-0176]; Fig. 5, 305 multiplier, many multipliers utilized for signed modes see Table 1) is a signed magnitude multiplier (Table 1, [0177-0178], [0202]).
Regarding claim 13, in addition to the teachings addressed in the claim 12 analysis, the rejection of claim 12 is incorporated and Master further teaches the processor wherein:
the logic ([0008]; Fig. 2, 200, computational cores) is to add a single mixed radix partial product ([0202] adding 1), and wherein a final partial product of a lower radix operates as a subset of possibilities of a higher radix ([0209] applying correction to the low order multiplier to avoid redundant correction at two different alignments; [0211] selective negation of the product, upper half gets the sign correction).
Regarding claim 14, in addition to the teachings addressed in the claim 12 analysis, the rejection of claim 12 is incorporated and Master further teaches the processor wherein:
the accumulator ([0176] accumulator; Fig. 2, 315 [0138]) is configured to:
sum ([0194]) some of the products ([0193] lower order multiplier 24x24) in a first radix; and
sum ([0194]) other products in a second radix that is different from the first radix.
Master generally teaches summing and, in a different embodiment, discloses summing in a first radix and in a second radix that is different from the first radix ([0497], [0519], [0521]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Master’s summation to use the first radix and the second, different radix of that embodiment. It would have been obvious to one of ordinary skill in the art to implement this mixed radix summation, as it would allow the summation techniques to handle processing complex values. By including complex values, the range of data values capable of being processed and computed is increased, thereby improving applicability to various fields of use ([0519]). Making this modification would have been obvious because one of ordinary skill in the art would recognize the benefits of this separate embodiment, as taught by Master.
Regarding claim 15, in addition to the teachings addressed in the claim 14 analysis, the rejection of claim 14 is incorporated and Master further teaches the processor wherein:
a multiplier of the plurality of multipliers ([0129], [0175-0176]; Fig. 5, 305 multiplier, many multipliers utilized for unsigned modes see Table 1) is configured to perform unsigned multiplication ([0175]; Table 1, “Mode”).
Regarding claim 17, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Master further teaches the processor wherein:
the plurality of multipliers ([0129], [0175-0176]; Fig. 5, 305 multiplier, many multipliers utilized for specific modes see Table 1) are to:
calculate a first group of additional products in the first precision ([0193-0194] 24 bits of X input and Y input); and
calculate a second group of additional products in the second precision ([0193-0194] 8 bit ‘L’ shaped partial products), wherein the first group of additional products and the second group of additional products have signed magnitude values ([0193] signed modes).
Master in view of Clark discloses the first precision and second precision as it corresponds to the first data element and second data element.
Clark discloses a first data element having a first data precision (Fig. 1A, 110; Col. 3, lines 34-39, multiplier 110 uses ‘first data type’) and a second data element having a second data precision (Fig. 1A, 111; Col. 3, lines 34-39, multiplier 111 uses ‘second data type’).
The motivation to combine provided with respect to claim 1 similarly applies.
Regarding claim 18, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Master teaches the processor wherein
the logic ([0008]; Fig. 2, 200, computational cores) is configured to:
identify a largest exponent ([0252] determines maximum radix-32 exponent) from the products ([0187]); and
denormalize ([0252] difference between exponent and the maximum exponent) at least some of the products based on the largest exponent ([0252] maximum exponent) to compute denormalized products ([0252] results from shifting by the difference between exponent and the maximum exponent),
wherein the denormalized products are accumulated by the accumulator ([0176] accumulator; Fig. 2, 315 [0138]; [0252] before the accumulator) to obtain a product sum ([0252]; Fig. 23, output of 456); and
the product sum ([0252]; Fig. 23, output of 456) is normalized ([0253] convert to floating point) to a floating point value ([0253], [0394]).
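For illustration only, the sequence recited in claim 18 (identify the largest exponent, denormalize the products against it, accumulate, then normalize the sum to a floating point value) can be sketched as follows. The sketch uses base-2 exponents and integer mantissas as simplifying assumptions and does not reproduce Master’s radix-32 exponent handling; the name `accumulate_products` is introduced for this sketch.

```python
import math


def accumulate_products(products):
    """Accumulate products given as (signed integer mantissa, exponent) pairs.

    Each pair represents the value mantissa * 2**exponent (an assumed format).
    """
    # Step 1: identify the largest exponent among the products.
    e_max = max(e for _, e in products)
    # Step 2: denormalize by shifting each mantissa right by its distance
    # from the largest exponent (low-order bits shifted out are dropped).
    aligned = [m >> (e_max - e) for m, e in products]
    # Step 3: accumulate the aligned mantissas as plain integers.
    total = sum(aligned)
    # Step 4: normalize the product sum to a floating point value.
    return math.ldexp(total, e_max)
```

Because alignment shifts bits out of the smaller products, the result matches the exact sum only when no nonzero bits are dropped, which is the usual accuracy trade-off of this kind of fixed-point accumulation.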
Regarding claim 19, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Master further teaches the processor wherein:
the plurality of multipliers (Fig. 5A, configurable multiplier 305 in RAE 300) are arranged in blocks that are cascaded in a sequence ([0335] four RAEs 300).
Claim 22 is directed to a system that similarly recites limitations that are practiced by the device of claim 1. All limitations recited in claim 22 are practiced by the device of claim 1. The claim 1 analysis equally applies to claim 22. Additionally, regarding claim 22, Master discloses a computing system comprising:
a network controller ([0117]; Fig. 1, first interconnection network 120); and
a processor ([0117]; Fig. 1, reconfigurable processor 100) coupled to the network controller ([0117]; Fig. 1, first interconnection network 120), wherein the processor includes logic ([0008]; Fig. 2, 200, computational cores) coupled to a substrate ([0008] integrated circuit, circuit board), wherein the logic includes an arithmetic block ([0121]; Fig. 2, 300, reconfigurable arithmetic engine (RAE) circuit).
Claim 24 is directed to a method that would be practiced by the device of claim 1. All steps recited in claim 24 are practiced by the device of claim 1. The claim 1 analysis equally applies to claim 24.
Regarding claim 27, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Master further teaches the processor wherein:
a single multiplier of the plurality of multipliers ([0129], [0175-0176]; Fig. 5, 305 multiplier, many multipliers utilized for signed modes see Table 1) is to receive multiple data elements ([0431] holding for 4-bit configuration of each slice) and to compute ([0187]) multiple products from the multiple data elements ([0431] holding for 4-bit configuration of each slice) in a same cycle, wherein the multiple data elements ([0431] holding for 4-bit configuration of each slice) include the second data element.
While Master generally teaches multipliers, Master does not explicitly disclose that a single multiplier is to receive multiple data elements and to compute multiple products from the multiple data elements in a same cycle, wherein the multiple data elements include the second data element.
Clark teaches a single multiplier is to receive multiple data elements (Fig. 1A, OpB and OpA, Op1-4; Col. 3, lines 61-66; Col. 4, lines 34-57) and to compute multiple products (Fig. 1A, Out, Op1* Op3 and OpA*OpB; Col. 5, lines 33-46) from the multiple data elements in a same cycle (Col. 5, lines 33-46; Col. 4, lines 1-33), wherein the multiple data elements include the second data element (Fig. 1A, 111; Col. 3, lines 34-39, multiplier 111 uses ‘second data type’).
The motivation to combine provided with respect to claim 1 equally applies.
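For illustration only, one well-known way in which a single wider multiplier can yield multiple products of lower-precision data elements in one multiplication is operand packing. The sketch below is a generic unsigned example and is not asserted to be the arrangement of Clark’s Fig. 1A; the 8-bit width and the name `packed_multiply` are assumptions introduced for this sketch.

```python
def packed_multiply(a, b, c, bits=8):
    """Return (a * c, b * c) using one wide multiplication.

    a, b, and c are unsigned values of `bits` bits each (assumed format).
    """
    limit = 1 << bits
    assert all(0 <= v < limit for v in (a, b, c))
    shift = 2 * bits                  # room for one full product of b * c
    packed = (a << shift) | b         # two data elements share one operand
    wide = packed * c                 # the single multiplication
    low_mask = (1 << shift) - 1
    # b * c fits in 2 * bits bits, so the two products never overlap.
    return wide >> shift, wide & low_mask
```

For example, `packed_multiply(200, 17, 255)` returns `(51000, 4335)`, the same values as computing `200 * 255` and `17 * 255` separately.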
Regarding claim 28, in addition to the teachings addressed in the claim 27 analysis, the rejection of claim 27 is incorporated and Master further teaches the processor wherein:
the multiple data elements ([0431] holding for 4-bit configuration of each slice) are in different input channels (Fig. 4B, 105A, 105B, 105C; Fig. 6, 350) of a tensor operation.
While Master generally teaches data elements ([0431] holding for 4-bit configuration of each slice), Master appears to be silent as to a tensor operation.
Master in view of Clark is likewise silent as to a tensor operation.
However, in the same field of endeavor of multiply-accumulate devices [abstract], Saxena teaches a tensor operation ([0451-0452] 4x4 matrix, [0146]).
The motivation to combine provided with respect to claim 1 equally applies.
Regarding claim 29, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Master further teaches the processor wherein:
the plurality of multipliers (Fig. 5A, configurable multiplier 305 in RAE 300) are arranged in blocks that are sequentially cascaded ([0335] four RAEs 300).
Regarding claim 30, in addition to the teachings addressed in the claim 27 analysis, the rejection of claim 27 is incorporated and Master further teaches the processor wherein:
each of the multiple data elements ([0431] holding for 4-bit configuration of each slice) have the second data precision.
While Master generally teaches data elements ([0431] holding for 4-bit configuration of each slice), they do not explicitly disclose each of the multiple data elements, and the second data precision as it relates to the second data element.
Clark teaches multiple data elements (Fig. 1A, OpB and OpA, Op1-4; Col. 3, lines 61-66; Col. 4, lines 34-57); second data element having a second data precision (Fig. 1A, 111; Col. 3, lines 34-39, multiplier 111 uses ‘second data type’).
The motivation to combine provided with respect to claim 1 equally applies.
Regarding claim 31, in addition to the teachings addressed in the claim 3 analysis, the rejection of claim 3 is incorporated and Master further teaches the processor wherein:
the another data element ([0431] holding for 4-bit configuration of each slice) has the first data precision.
Master does not explicitly disclose the data element has the first data precision as it relates to the first data element.
Clark teaches the first data element having a first data precision (Fig. 1A, 110; Col. 3, lines 34-39, multiplier 110 uses ‘first data type’).
The motivation to combine provided with respect to claim 1 equally applies.
Regarding claim 34, in addition to the teachings addressed in the claim 22 analysis, the rejection of claim 22 is incorporated and Master further teaches the computing system wherein:
a single multiplier of the plurality of multipliers ([0129], [0175-0176]; Fig. 5, 305 multiplier, many multipliers utilized for signed modes see Table 1) is to receive multiple data elements ([0431] holding for 4-bit configuration of each slice) from the register file (Fig. 46, 526, [0431]) and to compute ([0187]) multiple products from the multiple data elements in a same cycle, wherein the multiple data elements include the second data element.
While Master generally teaches a register file (Fig. 46, 526, [0431]) and multipliers, they do not explicitly disclose a single multiplier is to receive multiple data elements and to compute multiple products from the multiple data elements in a same cycle, wherein the multiple data elements include the second data element.
Clark teaches a single multiplier is to receive multiple data elements (Fig. 1A, OpB and OpA, Op1-4; Col. 3, lines 61-66; Col. 4, lines 34-57) and to compute multiple products (Fig. 1A, Out, Op1* Op3 and OpA*OpB; Col. 5, lines 33-46) from the multiple data elements in a same cycle (Col. 5, lines 33-46; Col. 4, lines 1-33) wherein the multiple data elements include the second data element (Fig. 1A, 111; Col. 3, lines 34-39, multiplier 111 uses ‘second data type’).
The motivation to combine provided with respect to claim 22 equally applies.
Regarding claim 35, in addition to the teachings addressed in the claim 24 analysis, the rejection of claim 24 is incorporated and Master further teaches the method wherein:
the plurality of multipliers ([0129], [0175-0176]; Fig. 5, 305 multiplier, many multipliers utilized for signed modes see Table 1) are configured to operate in different modes comprising a first mode ([0135] one of the modes selected 27x27 unsigned multiplier, 8x8, 16x16, 32x32, for example 16) and in a second mode ([0135] one of the other modes not selected as the “first mode” 27x27 unsigned multiplier, 8x8, 16x16, 32x32, for example 8). the first data element is processed in the first mode ([0135] one of the modes selected 27x27 unsigned multiplier, 8x8, 16x16, 32x32, for example 16) and the second data element is processed in a second mode ([0135] one of the other modes not selected as the “first mode” 27x27 unsigned multiplier, 8x8, 16x16, 32x32, for example 8).
While Master generally teaches multipliers, they do not explicitly disclose a first data element and a second data element.
Clark discloses a first data element (Fig. 1A, 110; Col. 3, lines 34-39, multiplier 110 uses ‘first data type’) and a second data element (Fig. 1A, 111; Col. 3, lines 34-39, multiplier 111 uses ‘second data type’).
The motivation to combine provided with respect to claim 1 equally applies.
Claims 5 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Master in view of Clark in view of Saxena in view of Lee, and further in view of US 20200073637 A1 Carlson (hereinafter “Carlson”).
Regarding claim 5, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Master further teaches the processor wherein:
the plurality of multipliers (Fig. 5A, configurable multiplier 305 in RAE 300) are arranged in an array ([0021], [0177], [0184] array) with a rank order, and the accumulator (Fig. 2, 315 [0138]; [0176] accumulator; Fig. 42, 315, [0374]; Fig. 5, 315, [0131]) is configured to accumulate ([0374]) the products by:
summing the products (Fig. 30, 128 bit carry-save format vectors to accum/adder; Fig. 42, ZIN carry-save signed mantissa (128 bit x2), [0399]) to compute summed products (Fig. 42, 509; [0395]);
shifting the summed products to obtain shifted products (Fig. 42, 511B, [0395]); and
adding ([0406]; Fig. 5, 340, [0172]) the shifted products (Fig. 42, 512, 514, [0405-0406]; Fig. 5, output of 315 accumulator to final add, round and saturate circuit 340).
While Master generally teaches summing partial products as explained in the rejection of claim 5 above, Master does not disclose summing in rank order. Further, Master in view of Clark in view of Saxena in view of Lee does not disclose summing in a rank order.
However, in the same field of endeavor of multiply-accumulate devices [abstract], Carlson teaches the technique of summing partial products in rank order ([0027-0028]; Fig. 1A, 121 output, 122 output, 123 output).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Master in view of Clark in view of Saxena in view of Lee with Carlson’s rank order technique as disclosed by Carlson. Master teaches summing partial products ([0189] removed partial products summed separately) generally, but does not disclose an order. Modifying Master in view of Clark in view of Saxena in view of Lee’s partial product techniques with Carlson’s dot product circuitry by adjusting Master in view of Clark in view of Saxena in view of Lee’s architecture to accommodate the three stages of partial product accumulators would lead to additional architecture to process the partial products in rank order. This architecture would further allow Master in view of Clark in view of Saxena in view of Lee’s techniques to more effectively compress the partial products of a like magnitude and perform those operations in a smaller precision than the final product [0023], leading to less hardware and area while improving speed and processing performance [0009].
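For illustration only, the three operations recited in claim 5 (summing products, shifting the summed products, and adding the shifted products) can be sketched with products grouped by rank, where products of like magnitude are summed at low precision before being weighted. The (rank, value) input format, the 8-bit portion width, and the name `accumulate_by_rank` are assumptions introduced for this sketch; it does not reproduce the circuitry of Master or Carlson.

```python
from collections import defaultdict


def accumulate_by_rank(products, n_bits=8):
    """Accumulate (rank, value) products in rank order.

    A product of rank r carries an implied weight of 2**(n_bits * r).
    """
    # Sum: products of the same rank are added before any shifting.
    rank_sums = defaultdict(int)
    for rank, value in products:
        rank_sums[rank] += value
    # Shift: each rank's summed products are weighted once, lowest rank first.
    shifted = [s << (n_bits * r) for r, s in sorted(rank_sums.items())]
    # Add: the shifted sums are combined into the final result.
    return sum(shifted)
```

Summing within a rank before shifting means each rank needs only one shift and the per-rank adders can be narrower than the final adder, which is the general benefit associated with rank-ordered accumulation.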
Regarding claim 10, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Master further teaches the processor wherein:
the plurality of multipliers (Fig. 5A, configurable multiplier 305 in RAE 300) are arranged in an array ([0021], [0177], [0184] array) with a rank order, and the accumulator (Fig. 2, 315 [0138]; [0176] accumulator; Fig. 42, 315, [0374]; Fig. 5, 315, [0131]) is configured to accumulate the products ([0374]) in the rank order.
While Master generally teaches summing multiplier outputs as explained in the rejection of claim 10 above, Master does not disclose summing in rank order. Further, Master in view of Clark in view of Saxena in view of Lee does not disclose summing in a rank order.
However, in the same field of endeavor of multiply-accumulate devices [abstract], Carlson teaches the technique of summing in rank order ([0027-0028]; Fig. 1A, 121 output, 122 output, 123 output).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Master in view of Clark in view of Saxena in view of Lee with Carlson’s rank order technique as disclosed by Carlson. Master teaches summing multiplier outputs ([0176] sum of products within a RAE circuit quad) generally, but does not disclose an order. Modifying Master in view of Clark in view of Saxena in view of Lee’s multiplier output techniques with Carlson’s dot product circuitry by adjusting Master in view of Clark in view of Saxena in view of Lee’s architecture to accommodate the three stages of partial product accumulators would lead to additional architecture to process the multiplier outputs in rank order. This architecture would further allow Master in view of Clark in view of Saxena in view of Lee’s techniques to more effectively compress the multiplier outputs of a like magnitude and perform those operations in a smaller precision than the final product [0023], leading to less hardware and area while improving speed and processing performance [0009].
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Master in view of Clark in view of Saxena in view of Lee, in further view of US 20180046916 A1 Dally et al. (hereinafter “Dally”).
Regarding claim 20, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated and Master further teaches the processor wherein:
the plurality of multipliers (Fig. 5A, configurable multiplier 305 in RAE 300) are configured to operate ([0135], [0132], [0175], [0237]); the data elements ([0431] holding for 4-bit configuration of each slice).
While Master generally teaches multipliers as explained in the rejection of claim 20 above, Master does not disclose the various data precisions, a sparsity bitmap format indicating sparsity, and a bitmap format.
Clark teaches various data precisions (Fig. 1A, OpB and OpA, Op1-4; Col. 3, lines 61-66; Col. 4, lines 34-57).
The motivation to combine provided with respect to claim 1 equally applies.
Master in view of Clark does not disclose a sparsity bitmap format indicating sparsity and a bitmap format.
Master in view of Clark in view of Saxena is silent as to a sparsity bitmap format indicating sparsity and a bitmap format.
Master in view of Clark in view of Saxena in view of Lee is silent as to a sparsity bitmap format indicating sparsity and a bitmap format.
However, in the same field of endeavor of multiply-accumulate operations [0050], Dally teaches a sparsity bitmap format indicating sparsity ([0076]) and a bitmap format ([0076] output activations and output positions into compressed-sparse form).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Master in view of Clark in view of Saxena in view of Lee with Dally’s arranging technique. Master teaches weights ([0186-0187], [0226]) and multiple precisions ([0175-0176]) generally, but does not disclose activations and arranging those weights and activations in a bitmap format. Modifying Master in view of Clark in view of Saxena in view of Lee’s weighting and multiple-precision techniques with Dally’s processing element circuitry, by adjusting Master in view of Clark in view of Saxena in view of Lee’s architecture to accommodate the SCNN Accelerator, would lead to additional architecture to arrange the weights and activations. This architecture would further allow Master in view of Clark in view of Saxena in view of Lee’s techniques to more effectively calculate nonzero values by ensuring that only nonzero elements of weights and input activations are provided, leading to fewer transitions on the bus, which reduces energy consumption and operations [0031]. Further, it would more efficiently allow reusing processed data as the input data for the next layer [0048], leading to reduced energy consumption [0107] and improved processing throughput [0030].
Claim 32 is rejected under 35 U.S.C. 103 as being unpatentable over Master in view of Clark in view of Saxena in view of Lee, in further view of US 20200050581 A1 Pinilla Pico et al. (hereinafter “Pinilla”).
Regarding claim 32, in addition to the teachings addressed in the claim 1 analysis, the rejection of claim 1 is incorporated.
Master in view of Clark in view of Saxena in view of Lee is silent as to disclosing wherein N is 5.
Pinilla teaches wherein N is 5 ([0021]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Master in view of Clark in view of Saxena in view of Lee with Pinilla’s N is 5 feature because they are in the claimed invention’s same field of endeavor of computer architecture devices ([Abstract]). As a practical matter, there are only a finite number of sizes for a register file, with Master teaching as low as 4 bits (Master, [0431]) and Saxena teaching as high as 64 bits (Saxena, [0451-0453]), so it would have been obvious to try 5 bits, as 5 is within the taught range of bits. Further, Pinilla teaches that the register file is configurable as to which registers are utilized during operation, as any number of data bits can be written to any number of registers, thereby making this modification beneficial, as this technique has viability in avoiding unnecessary writes during operation ([0021]).
Response to Arguments
35 USC 112(b). The rejections are withdrawn based on the amendment to the claims.
35 USC 103.
Applicant’s arguments, see Remarks, filed 20 January 2026, Pg. 11, Para. 1-3 with respect to the rejection(s) of claim(s) 1, 3, 5, 9-15, 17-20, 22, 24, 27-32, 34-35 under 35 USC 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Lee, as necessitated by the amendment.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARKUS A VILLANUEVA whose telephone number is (703)756-1603. The examiner can normally be reached M - F 8:30 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Trujillo can be reached at (571) 272-3677. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MARKUS ANTHONY VILLANUEVA/Examiner, Art Unit 2151
/James Trujillo/Supervisory Patent Examiner, Art Unit 2151