DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Action is FINAL and is in response to the amendment filed January 23rd, 2026. Claims 1-2, 4, 6-7, 9-15, 17-23 are pending, of which claims 1-2, 4, 6-7, 9-15, 17-23 are currently rejected. Claims 3, 5, 8, and 16 have been cancelled by Applicant.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 12/18/2025, 01/26/2026, and 03/27/2026 are in compliance with the provisions of 37 CFR 1.97. They have been placed in the application file, and the information referred to therein has been considered as to the merits.
Response to Arguments
The amendment filed January 23rd, 2026 has been entered. Claims 1-2, 4, 6-7, 9-15, 17-23 remain pending in the application. Applicant’s amendments have overcome some claim objections, all specification objections, and all 112(b) rejections previously set forth in the Non-Final Office Action mailed October 23rd, 2025.
Specification Objection
Applicant has amended the title and this new title is sufficient. Applicant has also amended the specification. Therefore, the previous objection to the Specification has been withdrawn.
Claim Interpretation
Claims continue to be interpreted under 112(f).
Claim Objections
Applicant has amended or cancelled claims accordingly and therefore the previous claim objections have been resolved. However, new claim objections have been made.
See Claim Objections.
Claim Rejections - 35 USC § 112
Applicant has amended or cancelled claims accordingly and therefore resolved lack of clarity or antecedent basis issues. Therefore, rejections under 35 USC 112(b) have been withdrawn.
Prior Art Rejections
Arguments have been fully considered and are partially persuasive.
Arguments regarding the 103 rejection of claim 1 and thereby dependents with respect to Wang (US 2020/0234099) and Willcock (US 2022/0171605) not teaching the sequencing of both weights and input operand at once are considered persuasive (Applicant Remarks Pg. 16). However, new grounds of rejection have been made as necessitated by amendments. See Claim Rejections - 35 USC § 102 and Claim Rejections - 35 USC § 103.
Arguments regarding the 102 rejection of claims 1 and 17 under Miscuglio (US 2023/0152667) have been fully considered and are not persuasive. Applicant alleges that Miscuglio’s “MUX and optical bus” do not have the ability to permute values in order to ensure reading of values once and to avoid memory collisions. However, Examiner respectfully disagrees. As discussed in ¶ 0035 of Miscuglio, the combined input (of both weights and activations) is provided to the MUX, which would then provide respective inputs to the corresponding processing element. The MUX does not provide inputs in parallel; rather, a certain order of the inputs must be provided to the processing array in order for the correct values to be operated on in each respective processing element. Along those lines, a person having ordinary skill in the art would know that the MUX would also have to manage inputs so that memory collisions do not occur. New reasons of rejection have been made as necessitated by amendments (for claim 1), and new grounds of rejection have been made as necessitated by amendments (for the rest of the claims).
See Claim Rejections - 35 USC § 102 and Claim Rejections - 35 USC § 103.
Claim Objections
Claims 20-23 are objected to because of the following grammatical informalities:
Claim 20, line 7: “set of memory devices” should be “a set of memory devices”.
Claims 21-23 are objected to based on their dependence on claim 20.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.
The following limitations are interpreted as invoking 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
Claim 1, “a first memory device for storing the input values”. The corresponding structure in the disclosure for performing the claimed storing of input values is shown in Figure 12 Element 1210-1 through 1210-64 and is disclosed in ¶ 0097.
Claim 20, “an array of dot product units…that generates dot product values”. The corresponding structure in the disclosure for performing the claimed generating of dot product values is shown in Figure 12 Element 1240 and is disclosed in ¶ 0097.
Claim 20, “a set of memory devices storing a plurality of weight values and a plurality of activation values”. The corresponding structure in the disclosure for performing the claimed storing of a plurality of weight values and a plurality of activation values is shown in Figure 12 Element 1210-1 through 1210-64 and is disclosed in ¶ 0097.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 11, and 17 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Miscuglio et al. (US 2023/0152667 A1) (hereinafter “Miscuglio”).
Regarding claim 1, Miscuglio teaches:
A hardware circuit comprising:
a combinatorial tree having an array of dot product units comprising plurality of rows and a plurality of columns (Fig. 5 shows a combinatorial tree i.e., photonic dot product engine array), configured to generate one or more dot product values based on input values received at a first column of the combinatorial tree (Fig. 5 shows data being provided to the first column of the combinatorial tree and dot product operations occur at each of the cells as discussed in ¶ 0007), the input values comprising weight values and operand values (¶ 0035);
a first memory device comprising a plurality of memory banks configured to store the input values (¶ 0034 photonic memories i.e., memory banks to bring in inputs to be operated on and provided to dot product operands of the photonic dot product engines or PDPEs);
a sequencer operatively coupled to a switch fabric, the sequencer and the switch fabric configured to (¶ 0035 MUX and optical bus coupled to each other):
read, from the plurality of memory banks of the first memory device, the weight values and the operand values (¶ 0035 MUX and optical bus i.e., switch fabric as sequencer of signals to organize input values and prepare for dot product operations corresponding to each cell of the PDPEs); and
permute the weight values and the operand values in the switch fabric to generate a sequence of individual input vectors to be provided to the first column of the combinatorial tree (¶ 0035 MUX and optical bus i.e., switch fabric as sequencer of signals to organize input values i.e., weights and operands and prepare for dot product operations corresponding to each cell of the PDPEs),
wherein each individual input vector of the sequence of individual input vectors comprises a subset of the weight values and a corresponding subset of operand values read from the plurality of memory banks, and the sequence of individual input vectors controls an order in which the operand values are applied to the weight values by the dot product units at the first column to generate the one or more dot product values (¶ 0035 input vectors of weights and operands and MUX provides sequence of vectors to PDPEs), and
wherein the weight values and the operand values are permuted in the switch fabric such that each weight value and operand value is read exactly once from the plurality of memory banks and no memory bank collisions occur during the read of the weight values and the operand values from the plurality of memory banks (¶ 0035 MUX selects each of the input vectors once in order to supply inputs to corresponding PDPE’s, MUX would avoid memory collisions occurring amidst the PDPEs); and
a clock configured to control a time for when an input vector of the sequence of individual input vectors is provided to the first column of the combinatorial tree from the switch fabric (¶ 0055 Table 2 shows the use of a clock for controlling inputs to the photonic tensor core; ¶ 0035 MUX and optical bus i.e., switch fabric as sequencer of signals to organize input values and prepare for dot product operations corresponding to each cell of the PDPEs; Fig. 5 shows input values being provided to first column of PDPEs); and
wherein each dot product unit of the plurality of dot product units comprises an accumulator configured to accumulate the one or more dot product values (¶ 0018 teaches post multiplication accumulation occurring, therefore accumulator circuitry i.e., hardware units are present within the dot product engines of the PDPEs), and wherein during one or more subsequent time periods defined by the clock, additional vectors of the sequence of individual input vectors are provided to accumulators of dot product units associated with the first column (¶ 0055 Table 2 shows the use of a clock for controlling inputs to the photonic tensor core; ¶ 0035 MUX and optical bus i.e., switch fabric as sequencer of signals to organize input values and prepare for dot product operations corresponding to each cell of the PDPEs; Fig. 5 shows input values being provided to first column of PDPEs), and wherein (¶ 0044 values from accumulators of PDPEs can be read out individually from the column).
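For illustration only, the "read exactly once" and "no memory bank collisions" limitations discussed above can be pictured with a skewed read schedule. The sketch below is not drawn from Miscuglio or from the application and forms no part of the grounds of rejection. The function names, the square layout (as many addresses per bank as there are banks), and the skew rule `(lane + cycle) % num_banks` are all assumptions chosen to make the two properties easy to check.

```python
def skewed_read_schedule(num_banks):
    """Return schedule[cycle][lane] = (bank, address).

    Hypothetical layout: each bank holds num_banks values at
    addresses 0..num_banks-1. In a given cycle, every lane reads
    the same address from a different bank.
    """
    return [
        [((lane + cycle) % num_banks, cycle) for lane in range(num_banks)]
        for cycle in range(num_banks)
    ]


def has_bank_collision(cycle_reads):
    """True if two lanes target the same bank in one cycle."""
    banks = [bank for bank, _address in cycle_reads]
    return len(set(banks)) != len(banks)


def reads_each_value_once(schedule, num_banks):
    """True if every (bank, address) pair is read exactly once overall."""
    seen = [read for cycle_reads in schedule for read in cycle_reads]
    expected = {(b, a) for b in range(num_banks) for a in range(num_banks)}
    return len(seen) == len(set(seen)) and set(seen) == expected


schedule = skewed_read_schedule(4)
collision_free = not any(has_bank_collision(c) for c in schedule)
read_once = reads_each_value_once(schedule, 4)
```

Under these assumptions, both `collision_free` and `read_once` evaluate to true, because each cycle visits every bank once and each cycle uses a distinct address.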
Regarding claim 11, Miscuglio further teaches:
The hardware circuit of claim 1, wherein a single dot product unit is operatively coupled to at least two adjacent dot product units such that an output value generated by the single dot product unit is shared as an input value to each of the at least two adjacent dot product units (Miscuglio: Fig. 5 shows PDPEs connected to each other, for example cell D0,0 is connected to D0,1 and D1,0).
Regarding claim 17, Miscuglio teaches:
The hardware circuit of claim 1, wherein the hardware circuit is part of a photonic integrated circuit (PIC) (Abstract matrix/vector multiplier is implemented in a photonic integrated circuit).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Miscuglio, further in view of Yinger et al. (US 2019/0012295 A1) (hereinafter “Yinger”).
While Miscuglio teaches the hardware circuit of claim 1, Miscuglio does not explicitly teach the hardware units comprising fused multiply-accumulate (FMA) units.
However, Yinger teaches processing elements i.e., hardware units that perform FMA operations, and thus comprise FMA circuitry i.e., unit (Yinger: ¶ 0025).
It would be obvious to combine the FMA units as taught by Yinger with the hardware circuit as taught by Miscuglio as all teachings are directed towards matrix multiplication. One with ordinary skill in the art would be motivated to combine all teachings for easy implementation of calculations for both sparse and dense operands as needed (Yinger: ¶ 0026).
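As a minimal sketch of the multiply-accumulate data flow referred to above, a dot product can be built from repeated accumulate-plus-product steps. This is illustrative only and is not taken from Yinger or Miscuglio. The helper names are hypothetical, and plain Python arithmetic does not model the single rounding that distinguishes a hardware FMA unit from a separate multiply followed by an add.

```python
def fma_step(acc, weight, operand):
    """One multiply-accumulate step: acc + weight * operand.

    A hardware FMA unit would round once for the whole expression;
    this Python stand-in only shows the data flow.
    """
    return acc + weight * operand


def dot_product(weights, operands):
    """Accumulate weight * operand pairs one step at a time."""
    if len(weights) != len(operands):
        raise ValueError("weights and operands must have equal length")
    acc = 0
    for weight, operand in zip(weights, operands):
        acc = fma_step(acc, weight, operand)
    return acc


result = dot_product([1, 2, 3], [4, 5, 6])  # 1*4 + 2*5 + 3*6
```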
Claims 4, 6-7, and 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Miscuglio, further in view of Wang et al. (US 2020/0234099 A1) (hereinafter “Wang”).
Regarding claim 4, while Miscuglio teaches the hardware circuit of claim 1, Miscuglio does not explicitly teach a control circuit for using data paths for swapping neighboring columns of the array.
However, Wang teaches:
further comprising a control circuit, the array of dot product units comprising data paths, the control circuit being configured to use the data paths to cause at least two weight values to swap between two neighboring columns of the array or to propagate a weight value from a first row of the array to a second row of the array (Wang: ¶ 0073 controller circuit that controls multiplexers; ¶ 0537 the multiplexers act as crossbars i.e., data paths which enable swapping of values between columns).
It would be obvious to combine the control circuitry for swapping neighboring columns as taught by Wang with the hardware circuit as taught by Miscuglio as all teachings are directed towards matrix multiplication. One with ordinary skill in the art would be motivated to combine all teachings in order to account for sparse operations in spreading more non-zero activations across lanes i.e., columns (Wang ¶ 0353).
Regarding claim 6, while Miscuglio teaches the hardware circuit of claim 1 and the sequencer for reorganizing input data and determining a sequence of input vectors provided to the first column of processing elements (Miscuglio: ¶ 0035 and Fig. 3a), Miscuglio does not explicitly teach the sequencer determining a sequence of input vectors that are provided to the first column based on a set of parameters.
However, Wang teaches:
wherein the sequencer determines the sequence of individual input vectors provided to the first column based on a set of parameters, the set of parameters comprising at least one of a stride value, a dilation value, or a kernel size value (Wang: ¶ 0284 input vectors provided to register files i.e., first memory device; ¶ 0285 input vectors provided based on parameters such as operation type or stride as discussed ¶ 0407).
It would be obvious to combine the determination of sequence of input vectors based on a set of parameters as taught by Wang with the hardware circuit as taught by Miscuglio as all teachings are directed towards matrix multiplication. One with ordinary skill in the art would be motivated to combine all teachings to enable mapping to available hardware (Wang: ¶ 0407).
Regarding claim 7, while Miscuglio teaches the generating of input vectors from operand and weight values, Miscuglio does not explicitly teach this operation occurring in each cycle.
However, Wang teaches:
wherein the sequencer is configured to generate each individual input vector of the sequence of individual input vectors by reading one or more weight values of the weight values and one or more operand values of the operand values read from the first memory device during one or more clock cycles (Wang: ¶ 0284 input vectors provided to register files i.e., first memory device; ¶ 0280 every clock cycle inputs or weights are accepted from memory, i.e., input vector generation must occur in order for inputs to be sent to corresponding processing elements).
The motivation to combine with respect to claim 6 applies equally to claim 7.
Regarding claim 9, Miscuglio in view of Wang further teaches the inputs being weights and activations i.e., weights and operand values and reading in via clock signals the input values from the first memory device through value loaders (Miscuglio: ¶ 0056 clock signal to drive circuit), as well as the data being read in from memory banks with the memory banks being equivalent to a number of rows of the combinatorial tree (Miscuglio: Fig. 1 shows four photonic memory banks coupled to photonic integrated circuit holding PDPEs; Fig. 5 shows the actual array of PDPEs having four rows of PDPEs which is equivalent with the number of photonic memory banks).
Regarding claim 10, Miscuglio in view of Wang further teaches:
The hardware circuit of claim 7, wherein each successive read of the first memory device, by the sequencer, results in reading of new weight values of the weight values and new operand values of the operand values from the first memory device not read by the sequencer during a previous clock cycle (Wang: ¶ 0284 input vectors provided to register files i.e., first memory device; ¶ 0280 every clock cycle inputs are accepted from memory, i.e., input vector generation must occur in order for inputs to be sent to corresponding processing elements, new inputs are accepted for every clock cycle and calculated in the same cycle).
The motivation to combine with respect to claim 6 applies equally to claim 10.
Claims 12-14 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Miscuglio in view of Willcock (US 2022/0171605 A1) (hereinafter “Willcock”).
Regarding claim 12, while Miscuglio teaches the hardware circuit of claim 1, Miscuglio does not explicitly teach:
wherein the accumulator comprises a plurality of accumulators configured to accumulate partial dot product values generated by a single dot product unit as values flow through the single dot product unit
However, Willcock teaches:
wherein the accumulator comprises a plurality of accumulators configured to accumulate partial dot product values generated by a single dot product unit as values flow through the single dot product unit (Willcock: Fig. 3 shows inside the cell, multiple accumulators accumulate in order to have a final accumulation of dot products as values flow through the cell).
It would be obvious before the effective filing date of the claimed invention to combine the plurality of accumulators as taught by Willcock with the hardware circuit as taught by Miscuglio as both teachings are directed towards dot product computations. One with ordinary skill in the art would be motivated to combine the teachings because this can help reduce the amount of data that needs to be sent to the array of processing elements (Willcock: ¶ 0060).
Regarding claim 13, Miscuglio in view of Willcock further teaches:
The hardware circuit of claim 1, wherein the combinatorial tree comprises registers associated with the dot product units configured to maintain clock alignment of values flowing through the dot product units (Willcock: ¶ 0041 values provided to register in cell i.e., associated with hardware units of cell to help synchronize values according to the clock cycles).
The motivation to combine with respect to claim 12 applies equally to claim 13.
Regarding claim 14, Miscuglio in view of Willcock further teaches:
The hardware circuit of claim 1, comprising logic that is configured to generate a set of final dot product values for the input values after all of the sequence of individual input vectors have been inputted to the combinatorial tree, the set of final dot product values being constructed from values stored in accumulators of the dot product units of the combinatorial tree (Willcock: ¶ 0046 final accumulated value from all accumulations and dot products across the cells).
The motivation to combine with respect to claim 12 applies equally to claim 14.
Claim 18 recites the method practiced by the hardware circuit of claim 1. Miscuglio in view of Willcock further teaches:
receiving, at the first column of the combinatorial tree, a plurality of second individual input vectors of the sequence of individual input vectors from the sequencer, each second individual input vector of the plurality of second individual input vectors comprising a second subset of the plurality of weight values and a second subset of the plurality of operand values (Willcock: cells as shown in Fig. 2 receiving input at first column can be used for convolution i.e., dot product as described in ¶ 0037; ¶ 0045 having accumulators; ¶ 0043 teaches inputs being weights and activations i.e., weights and operand values, and reading in via clock cycles input values from first memory device through value loaders; ¶ 0043 corresponding to each clock cycle, a weight and activation are processed and accumulated, hence first cycle would be one set of weights and activations, and next clock cycle would be another set of weights and activations, which are provided to first column as shown in Fig. 2, these values are provided from memory device through a bus i.e., switch fabric ¶ 0051);
processing the plurality of second individual input vectors by the math units at the first column of the combinatorial tree (Willcock: cells as shown in Fig. 2 receiving input at first column can be used for convolution i.e., dot product as described in ¶ 0037; ¶ 0045 having accumulators; ¶ 0043 teaches inputs being weights and activations i.e., weights and operand values, and reading in via clock cycles input values from first memory device through value loaders; ¶ 0043 corresponding to each clock cycle, a weight and activation are processed and accumulated, hence first cycle would be one set of weights and activations, and next clock cycle would be another set of weights and activations, which are provided to first column as shown in Fig. 2, these values are provided from memory device through a bus i.e., switch fabric ¶ 0051); and
after all of the sequence of individual input vectors has been received by the first column of the combinatorial tree and processed by associated math units, providing the one or more dot product values by retrieving an individual dot product value from each accumulator in each of the math units residing in every row of the first column of the combinatorial tree (Willcock: ¶ 0046 a final accumulated value is obtained at the bottom of the column based on accumulators of each of the math units i.e., cells from Fig. 2).
The motivation to combine with respect to claim 12 applies equally to claim 18.
Regarding claim 19, Miscuglio in view of Willcock further teaches:
The method of claim 18, wherein the at least one accumulator comprises a plurality of accumulators in each of the math units, the plurality of accumulators being configured to accumulate partial dot product values generated by individual math units as values flow through the individual math units (Willcock: Fig. 2 shows the plurality of cells i.e., math units; Fig. 3 shows each of the math units having accumulation via a plurality of accumulators, which accumulate the partial dot products as discussed in ¶ 0029).
The motivation to combine with respect to claim 12 applies equally to claim 19.
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Miscuglio in view of Willcock in view of Dhanoa et al. (US 8,307,021) (hereinafter “Dhanoa”).
Regarding claim 15, while Miscuglio in view of Willcock teaches the hardware circuit of claim 14 and computation of final dot product values (Willcock: ¶ 0046), Miscuglio in view of Willcock does not explicitly teach the storing of these final dot product values in the same storage as input values.
However, Dhanoa teaches storage of outputs from a column in the same memory in which the input is stored (Dhanoa: Col. 2 Lines 66-67 and Col. 3 Lines 1-10).
It would be obvious before the effective filing date of the claimed invention to combine the storage of outputs in the first memory as taught by Dhanoa with the hardware circuit as taught by Miscuglio in view of Willcock because all teachings are directed towards dot product computations via an array of processing elements. One with ordinary skill in the art would be motivated to combine the teachings because it would allow memory bandwidth to be freed by reducing the need for memory access from various memories (Dhanoa: Col. 21 Lines 52-54).
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Miscuglio in view of Dhanoa in view of Wang in view of Snelgrove (US 2022/0171829 A1) (hereinafter “Snelgrove”).
Regarding claim 20, Miscuglio teaches the limitations of claim 20 as discussed with respect to claim 1 (the array of dot product units for the combinatorial tree, set of memory devices as the respective memory banks, sequencer and its function of reading and permuting weight and activation i.e., operand values, a clock circuit, and the individual input vectors).
Miscuglio does not explicitly teach the dimensions of the arrangement of dot product units, the bit-width of inputs, or the writing of accumulated values back into the set of memory devices as claimed.
However, Dhanoa teaches storage of outputs from a column in the same memory in which the input is stored (Dhanoa: Col. 2 Lines 66-67 and Col. 3 Lines 1-10).
The motivation to combine with respect to claim 15 applies equally to claim 20.
Miscuglio in view of Dhanoa does not explicitly teach the dimensions of the arrangement of dot product units or the bit-width of inputs as claimed.
However, Wang teaches inputs to dot product operations being 8 bits (Wang: ¶ 0350), and inputs having 64 channels i.e., input vectors (Wang: ¶ 0399).
It would be obvious to combine the inputs and input vector dimensions as taught by Wang with the circuit of Miscuglio in view of Dhanoa as all teachings are directed towards matrix multiplication. One with ordinary skill in the art would be motivated to combine the teachings because doing so would improve throughput in fully connected calculations to keep multiplier utilization high (Wang: ¶ 0524).
Miscuglio in view of Dhanoa in view of Wang does not explicitly teach the dimensions of the arrangement of dot product units as claimed.
However, Snelgrove teaches that a 2D array's dimensions may be accommodated according to inputs, and that 64x32 dimensions may be needed for utilization based on input vectors (Snelgrove: ¶ 0135).
It would be obvious to combine the dimensions of the dot product unit matrix as taught by Snelgrove with the circuit as taught by Miscuglio in view of Dhanoa in view of Wang as all teachings are directed towards digital design. One with ordinary skill in the art would be motivated to combine the teachings in order to increase the utilization based on input dimensions (Snelgrove: ¶ 0135).
Miscuglio in view of Dhanoa in view of Wang in view of Snelgrove therefore teaches:
A hardware circuit comprising:
an array of dot product units organized as a combinatorial tree configured to generate dot product values based on weight values and activation values received as input by the array of dot product units, the array of dot product units comprises 64 rows by 32 columns of dot product units, each dot product unit being configured to receive as input two 8-bit values and having an accumulator;
set of memory devices comprising a plurality of memory banks, the set of memory devices configured to store a plurality of weight values and a plurality of activation values; and
a sequencer operatively coupled to a switch fabric, the sequencer and switch fabric operatively coupling the array of dot product units to the set of memory devices, and configured to:
read, from the plurality of memory banks, the plurality of weight values and the plurality of activation values;
permute the plurality of weight values and the plurality of activation values in the switch fabric to generate a sequence of individual input vectors as input to a first column of the array of dot product units,
wherein each individual input vector comprises a subset of the plurality of weight values and a corresponding subset of the plurality of activation values read from the plurality of memory banks, and the sequence of individual input vectors controls an order in which the plurality of activation values are applied to the plurality of weight values by the array of dot product units to generate the dot product values, and
wherein the plurality of weight values and the plurality of activation values are permuted in the switch fabric such that each weight value and activation value is read exactly once from the plurality of memory banks and no memory bank collisions occur during the read of the plurality of weight values and the plurality of activation values from the plurality of memory banks;
during a first clock sequence comprising one or more clock cycles, provide a first set of 64 individual input vectors of the sequence of individual input vectors to the first column of the array of dot product units; and
during one or more additional clock sequences, provide a plurality of additional sets of 64 individual input vectors to the first column of the array of dot product units, and wherein when each of the accumulators of the dot product units in the first column of the array of dot product units has 32 bits, writing each value from each accumulator back into the set of memory devices.
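For illustration only, the two mechanisms recited above (reads that avoid memory bank collisions, and 8-bit multiplies accumulated into a 32-bit accumulator) can be sketched in software as follows. This is a minimal sketch under stated assumptions and is not taken from the claims or from any cited reference: the interleaved address-to-bank mapping, the bank count of 64, the signed interpretation of the 8-bit values, the wrap-around overflow behavior, and the names `bank_of`, `has_collision`, and `dot_product_accumulate` are all hypothetical.

```python
NUM_BANKS = 64  # assumption: bank count chosen only for illustration


def bank_of(address: int) -> int:
    """Hypothetical interleaved mapping: consecutive addresses fall in consecutive banks."""
    return address % NUM_BANKS


def has_collision(addresses: list[int]) -> bool:
    """Return True if two addresses read in the same cycle map to the same bank."""
    banks = [bank_of(a) for a in addresses]
    return len(set(banks)) != len(banks)


def dot_product_accumulate(weights: list[int], activations: list[int], acc: int = 0) -> int:
    """Multiply pairs of signed 8-bit values and add the products into a 32-bit accumulator.

    Assumption: the accumulator wraps around on overflow (two's complement).
    """
    for w, a in zip(weights, activations):
        # Each operand must fit in a signed 8-bit value.
        assert -128 <= w <= 127 and -128 <= a <= 127
        acc += w * a
    # Wrap the running sum into the signed 32-bit range.
    return (acc + 2**31) % 2**32 - 2**31
```

Under these assumptions, a read of addresses 0 through 63 in one cycle touches each bank once, whereas reading addresses 0 and 64 together would collide in bank 0.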
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Miscuglio, in view of Dhanoa in view of Wang in view of Snelgrove, further in view of Yinger.
While Miscuglio in view of Dhanoa in view of Wang in view of Snelgrove teaches the hardware circuit of claim 20, Miscuglio in view of Dhanoa in view of Wang in view of Snelgrove does not explicitly teach the dot product units comprising a plurality of FMA units.
However, Yinger teaches processing elements that perform FMA operations and thus comprise FMA circuitry, i.e., FMA units (Yinger: ¶ 0025).
The motivation to combine with respect to claim 2 applies equally to claim 21.
Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Miscuglio in view of Dhanoa in view of Wang in view of Snelgrove, further in view of Komuravelli et al. (US 2021/0319076 A1) (hereinafter “Komuravelli”).
While Miscuglio in view of Dhanoa in view of Wang in view of Snelgrove teaches the hardware circuit of claim 20, Miscuglio in view of Dhanoa in view of Wang in view of Snelgrove does not explicitly teach the dot product units taking as input 32-byte values.
However, Komuravelli teaches 32-byte values used for a convolution engine, i.e., dot product units (Komuravelli: ¶ 0033).
It would be obvious to combine the 32-byte input values for dot products as taught by Komuravelli with the circuit as taught by Miscuglio in view of Dhanoa in view of Wang in view of Snelgrove as all teachings are directed towards digital design. One with ordinary skill in the art would be motivated to combine the teachings because this would allow for more efficient byte aligning and therefore more efficient operations (Komuravelli: ¶ 0033).
Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Miscuglio in view of Dhanoa in view of Wang in view of Snelgrove, further in view of Seiler (10970809) (hereinafter “Seiler”).
While Miscuglio in view of Dhanoa in view of Wang in view of Snelgrove teaches the hardware circuit of claim 20, Miscuglio in view of Dhanoa in view of Wang in view of Snelgrove does not explicitly teach there being 64 memory banks, each memory bank having a 32-byte size.
However, Wang teaches 64 memory banks as memory devices (Wang: ¶ 0527).
The motivation to combine with respect to claim 20 applies equally to claim 23.
Miscuglio in view of Dhanoa in view of Wang in view of Snelgrove does not explicitly teach the memory banks each having a size of 32 bytes.
However, Seiler teaches memory banks having a size of 32 bytes each (Seiler: Col. 16 Lines 39-49).
It would be obvious to combine the 32-byte size for memory banks as taught by Seiler with the circuit as taught by Miscuglio in view of Dhanoa in view of Wang in view of Snelgrove as all teachings are directed towards digital design. One with ordinary skill in the art would be motivated to combine the teachings because it would ease access to a greater capacity in memory (Seiler: Col. 16 Lines 39-49).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARIA DE JESUS RIVERA whose telephone number is (571)272-2793. The examiner can normally be reached Monday-Friday 7:30AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Trujillo, can be reached at (571) 272-3677. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/M.D.R./Examiner, Art Unit 2151