DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Action is FINAL and is in response to the amendment filed January 23rd, 2026. Claims 1-20 are pending, of which claims 1-20 are currently rejected.
Response to Arguments
The amendment filed January 23rd, 2026 has been entered. Claims 1-20 remain pending in the application. Applicant’s amendments to the claims have overcome some objections to the claims and all 112(b) rejections and 101 rejections previously set forth in the Non-Final Office Action mailed October 23rd, 2025.
Claim Objections
Applicant has amended the claims and resolved claim objections as previously set forth in the Office Action mailed October 23rd, 2025. However, new claim objections have been made.
See Claim Objections.
Claim Rejections – 35 USC § 101
Applicant has amended claim 20 and this has resolved the issues under 35 USC § 101. Therefore, the previous rejections under 35 USC § 101 have been withdrawn.
Claim Rejections – 35 USC § 112(b)
Applicant has amended the claims so that they no longer invoke 112(f) interpretation; therefore, the interpretation under 112(f) has been withdrawn and the corresponding 112(a) and 112(b) issues have been resolved. Applicant has also amended claims 9 and 10, resolving the previous 112(b) issue with regard to claims 9 and 10. Therefore, the rejections of claims 1-20 under 35 USC § 112(a) and 35 USC § 112(b) have been withdrawn.
Prior Art Rejections
Applicant’s arguments regarding 103 rejections have been fully considered and are not persuasive.
Applicant’s arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references. Additionally, Applicant’s arguments do not comply with 37 CFR 1.111(c) because they do not clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. Further, they do not show how the amendments avoid such references or objections.
See Claim Rejections – 35 USC § 103.
Claim Objections
Claims 1-12 and 15-16 are objected to:
Claim 1 line 6 "configured to assigning" should be "configured to assign"
Claim 1 lines 9-10 "configured to generating" should be "configured to generate"
Claim 1 lines 13-14 "configured to generating" should be "configured to generate"
Claim 1 lines 17-18 "configured to communicating" should be "configured to communicate"
Claims 2-12 are objected to based on their dependence upon claim 1
Claim 2: the phrase "submatrix Rc of the result matrix R to one or more desired memory units for the result matrix R" is repeated twice in the same claim.
Claim 15 line 2 "a compute unit" should be "a compute unit c of the C compute units" in order to avoid confusion.
Claim 16 line 2 “sub matrix Rc” should be “submatrix Rc”.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3 and 6-16 are rejected under 35 U.S.C. 103 as being unpatentable over P. Kennedy (“SambaNova SN10 RDU at Hot Chips 33”, August 2021) (hereinafter “Kennedy”), further in view of R. Prabhakar et al. ("Plasticine: A Reconfigurable Architecture for Parallel Patterns", 2017) (hereinafter “Prabhakar”), further in view of Koeplinger et al. (US 2020/0241844 A1) (hereinafter “Koeplinger”).
Regarding claim 1, Kennedy teaches:
A system for multiplying matrices A and B and producing a result matrix R in a coarse-grained computing grid, the system comprising:
a reconfigurable dataflow unit (RDU) comprising a computing grid, the computing grid comprising C compute units arranged in a 2D grid comprising m logical rows and n logical columns (Kennedy: Pg. 9 first figure shows 2D computing grid comprised of C compute units arranged in logical rows and logical columns);
one or more processors configured to assigning each compute unit c of C compute units to a unique submatrix Rc (Kennedy: Pg. 9 first figure shows compute units being assigned to submatrices based on the stage of computation; Pg. 6 first figure the configuration and pipeline controller as the one or more processors; Pg 9 PCUs as compute units, shown in further detail on Pg. 6 first figure “Cardinal SN10: PCU”);
the one or more processors further configured to generating memory unit configuration information that enables one or more source memory units to provide matrix data (Kennedy: Pg. 7 second figure “Cardinal SN10: AG and CU” address ALUs and coalescing units of memory unit configuration provide this configuration information i.e., one or more processors; Pg. 9 first figure PMUs as memory units, shown in further detail on Pg. 6 second figure “Cardinal SN10: PMU”) via a plurality of packets (Pg. 7 first figure switch allows for packet-switched communication, so data travels through packets);
the one or more processors further configured to generating compute unit configuration information that enables each compute unit c to produce the unique submatrix Rc (Kennedy: Pg. 6 first figure shows PCUs or compute units being divided into submatrices based on the stage being performed by the compute units; Pg. 6 first figure pipeline and configuration controller as one or more processors);
the one or more processors further configured to communicating the memory unit configuration information and the compute unit configuration information to the RDU and initiating data flow in the computing grid to produce the result matrix R within the desired memory units (Kennedy: Pg. 9 first figure shows grid of PCUs and PMUs connected via a switch; end of Pg. 6 discusses how the switch and router crossbar is what directs information from and to the RDU and to the various PCUs and PMUs within the RDU, including configuration information and data to be processed; shown in further detail on Pg. 7 first figure).
Kennedy does not explicitly teach:
of a result matrix R comprising M rows and N columns
provide relevant matrix A data and matrix B data to the C compute units via a plurality of packets
wherein providing matrix B data to the C compute units comprises narrowcasting packets to each column of compute units in the computing grid, wherein the narrow-casted packets comprise matrix B data corresponding to the column of compute units.
However, Prabhakar teaches:
of a result matrix R comprising M rows and N columns (Prabhakar: Pg. 3 Col. 1 Lines 22-23 output matrix i.e., result matrix of dimension M x P)
provide relevant matrix A data and matrix B data to the C compute units (Prabhakar: Pg. 10 Col. 2 Section 4.3 Lines 16-18 memory units are configured for operations; Abstract Lines 21-23, Pg. 3 Col. 2 Lines 2-3 and Fig. 1 shows multiplication of matrices A and B)
and send the unique submatrix Rc to one or more desired memory units (Prabhakar: Pg. 4 Col. 2 Section 3.1 Lines 20-31 reading and writing by PCU i.e., compute units to PMU i.e., memory unit for result matrix computation; Pg. 4 Col. 2 Section 3.1 Lines 6-11 configuration registers for PCUs can additionally be used for the one or more processors for configuration as discussed with regards to Kennedy).
It would have been obvious before the effective filing date of the claimed invention to combine the result matrix, the specific computation of matrix A and matrix B, and communication between the compute units and memory units as taught by Prabhakar with the structure as taught by Kennedy as both teachings are directed towards matrix computation through the use of a 2D grid of compute units and memory units. One with ordinary skill in the art would be motivated to combine both teachings because Prabhakar simply provides a more sophisticated mapping scheme for the same structure as Kennedy, hence allowing for further increase in compute utilization and improving performance (Prabhakar: Pg. 12 Col. 1 Lines 17-20).
While Kennedy in view of Prabhakar teaches the use of packets to provide data to the compute and memory units (Kennedy: Pg. 7 first figure switch allows for packet-switched communication, so data travels through packets), Kennedy in view of Prabhakar does not explicitly teach:
wherein providing matrix B data to the C compute units comprises narrowcasting packets to each column of compute units in the computing grid, wherein the narrow-casted packets comprise matrix B data corresponding to the column of compute units.
However, Koeplinger teaches:
wherein providing matrix B data to the C compute units comprises narrowcasting packets to each column of compute units in the computing grid, wherein the narrow-casted packets comprise matrix B data corresponding to the column of compute units (Koeplinger: ¶ 0035 vector and scalar buses are packet switched in order to determine to what column or row the packet of input data is sent to, these packets being narrow-casted; ¶ 0027 these vector buses contain column major i.e., column-based vectors; Fig. 2 shows the same structure of a 2D grid of compute units and memory units).
It would have been obvious before the effective filing date of the claimed invention to combine packet switching for column-based vectors as taught by Koeplinger with the system as taught by Kennedy in view of Prabhakar as all teachings are directed towards matrix multiplication on a 2D grid of compute units and memory units. One with ordinary skill in the art would be motivated to combine the teachings in order to improve operating efficiency and ease accessing of matrices by the units (Koeplinger: ¶ 0004).
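For illustration of the data flow recited in claim 1 as read above, the following is a minimal sketch, assuming a result matrix whose dimensions are evenly divisible by the grid dimensions. It is not taken from the application or from Kennedy, Prabhakar, or Koeplinger; every name in it (matmul, grid_matmul, a_for_row, b_for_column) is hypothetical. Each grid position owns one unique submatrix Rc, and each grid column receives only the columns of matrix B that correspond to it, which is one way to picture the narrowcasting limitation.

```python
# Illustrative sketch only: all names are hypothetical and are not drawn from
# the claims or the cited references.

def matmul(a, b):
    """Plain reference product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def grid_matmul(a, b, m, n):
    """Multiply a (M x K) by b (K x N) on an m x n logical grid.

    The compute unit at grid position (r, c) owns one unique block of the
    result. Assumes m divides M and n divides N.
    """
    M, N = len(a), len(b[0])
    bh, bw = M // m, N // n  # height and width of each submatrix Rc

    # "Narrowcast": grid column c receives only the columns of B it needs.
    b_for_column = [[row[c * bw:(c + 1) * bw] for row in b] for c in range(n)]
    # Grid row r receives only the rows of A it needs.
    a_for_row = [a[r * bh:(r + 1) * bh] for r in range(m)]

    result = [[0] * N for _ in range(M)]
    for r in range(m):
        for c in range(n):
            block = matmul(a_for_row[r], b_for_column[c])  # submatrix Rc
            # Store the block at its position in the result matrix.
            for i in range(bh):
                result[r * bh + i][c * bw:(c + 1) * bw] = block[i]
    return result
```

Under these assumptions, assembling the per-unit blocks yields the same matrix as a direct product of A and B, since each block depends only on the rows of A for its grid row and the columns of B for its grid column.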
Regarding claim 2, Kennedy in view of Prabhakar in view of Koeplinger further teaches:
The system of claim 1, wherein the one or more processors configures each compute unit to send submatrix Rc of the result matrix R to one or more desired memory units for the result matrix R (Prabhakar: Pg. 7 Col. 2 Lines 27-28 output i.e., result matrix sent to output buses for a given partitioning; Pg. 12 Col. 1 Lines 27-29 PCU i.e., compute unit produces output feature map i.e., result matrix and is sent to another PMU i.e., desired memory unit).
The motivation to combine with respect to claim 1 applies equally to claim 2.
Regarding claim 3, Kennedy in view of Prabhakar in view of Koeplinger teaches:
The system of claim 1, wherein each compute unit c of the C compute units produces the unique submatrix Rc by sequentially providing column-based vectors for matrix A to a vector bus and concurrently conducting a multiply accumulate operation for each data element of the column-based vectors (Koeplinger: ¶ 0027 column major or column based vectors are provided to a vector bus, for this vector bus to be later packet switched in order to provide the vector through packets to the compute units within the grid as discussed in ¶ 0035 for matrix-based computation i.e., multiply-accumulate operations).
The motivation to combine with respect to claim 1 applies equally to claim 3.
Regarding claim 6, while Kennedy in view of Prabhakar teaches compute units of a computing grid being connected to a grid connected memory unit (Prabhakar: Pg. 5 Fig. 5 PCUs i.e., compute units being grid connected to PMUs i.e., memory units), Kennedy in view of Prabhakar does not teach narrow-casted packets being provided.
However, Koeplinger teaches PMUs i.e., memory units containing a vector bus interconnect which is connected to the vector bus having the narrow-casted packets for column major or column-based vectors (Koeplinger: ¶ 0034 - ¶ 0035).
The motivation to combine with respect to claim 1 applies equally to claim 6.
Regarding claim 7, Kennedy in view of Prabhakar in view of Koeplinger further teaches:
The system of claim 1, wherein a compute unit of the computing grid comprises an array of arithmetic units comprising I lanes and J pipelined stages (Prabhakar: Pg. 5 Fig. 3 compute units having pipelined stages and lanes).
The motivation to combine with respect to claim 1 applies equally to claim 7.
Regarding claim 8, while Kennedy teaches compute units, Kennedy does not explicitly teach:
wherein the compute unit comprises a streaming port configurable to sequentially stream K vector packets comprising matrix A data through the I lanes of the array of arithmetic units where each vector packet of the K vector packets comprises I column-ordered data elements corresponding to I rows of matrix A data.
However, Prabhakar teaches:
wherein the compute unit comprises a streaming port configurable to sequentially stream K vector packets comprising matrix A data through the I lanes of the array of arithmetic units where each packet of the K packets comprises I column-ordered data elements corresponding to I rows of matrix A data (Prabhakar: Pg. 5 Fig. 1 shows compute unit PCU having a streaming port through the vector FIFO connection where K packets stream through the I lanes of the array of AU or FU, the packets being column ordered corresponding to these rows i.e., lanes, Fig. 4 shows 2 vector FIFOs one for row based and one for column based, which correspond with each other in turn).
The motivation to combine with respect to claim 1 applies equally to claim 8.
Kennedy in view of Prabhakar does not explicitly teach the packets being vector based.
However, Koeplinger teaches the packets being vector based (Koeplinger: ¶ 0034 - ¶ 0035).
The motivation to combine with respect to claim 1 applies equally to claim 8.
Regarding claim 9, Kennedy does not explicitly teach:
wherein a row connected memory unit is configurable to stream the I rows of matrix A data to the streaming port via the K vector packets.
However, Prabhakar teaches:
wherein a row connected memory unit is configurable to stream the I rows of matrix A data to the streaming port via the K vector packets (Prabhakar: Pg. 5 Fig. 4 shows vector FIFO connected to PCU and configured to stream I rows to vector bus via the vector packets).
The motivation to combine with respect to claim 1 applies equally to claim 9.
Kennedy in view of Prabhakar does not explicitly teach the packets being vector based.
However, Koeplinger teaches the packets being vector based (Koeplinger: ¶ 0034 - ¶ 0035).
The motivation to combine with respect to claim 1 applies equally to claim 9.
Regarding claim 10, while Kennedy teaches corresponding stages of the array of compute units and data being provided accordingly (Kennedy: Pg. 9 first figure), Kennedy does not explicitly teach:
wherein the compute unit comprises a staging port configurable to receive J vector packets corresponding to J columns of matrix B data and sequentially provide a data element from each of the J vector packets.
However, Prabhakar teaches the PCUs i.e., compute units interfacing with vector FIFOs i.e., the staging port for providing an element to each of the compute units (Prabhakar: Pg. 5 Fig. 3; Pg. 6 Col. 1 Lines 18-20).
The motivation to combine with respect to claim 1 applies equally to claim 10.
Kennedy in view of Prabhakar does not explicitly teach this staging port allowing for the receiving of J vector packets that correspond to the J columns of the matrix.
However, Koeplinger teaches in ¶ 0027 how column major or column based vectors are provided to a vector bus, for this vector bus to be later packet switched in order to provide the vector through packets to the compute units within the grid as discussed in ¶ 0035. These column vector packets can be provided through the vector FIFO as discussed with respect to Prabhakar.
The motivation to combine with respect to claim 1 applies equally to claim 10.
Kennedy in view of Prabhakar in view of Koeplinger therefore teaches:
The system of claim 8, wherein the compute unit comprises a staging port configurable to receive J vector packets corresponding to J columns of matrix B data and sequentially provide a data element from each of the J vector packets to a corresponding stage of the array of arithmetic units.
Regarding claim 11, Kennedy in view of Prabhakar in view of Koeplinger further teaches:
The system of claim 10, wherein the data element is concurrently provided to every arithmetic unit of the corresponding stage of the array of arithmetic units (Prabhakar: Pg. 3 Fig. 1 and 2 show the functions performed by RDU, multiply accumulate operations; Pg. 4 Col. 2 Section 3.1 Lines 5-18 providing data element concurrently to each of the arithmetic units i.e., functional units of the compute units).
The motivation to combine with respect to claim 1 applies equally to claim 11.
Regarding claim 12, Kennedy in view of Prabhakar in view of Koeplinger teaches:
The system of claim 10, wherein each arithmetic unit of the array of arithmetic units is configurable to repetitively conduct a multiply-accumulate operation using a data element from the streaming port and a data element from the staging port (Prabhakar: Pg. 5 Fig. 1 shows compute unit PCU having a streaming port and staging port through the vector FIFO connections where K packets stream through the I lanes of the array of AU or FU, the packets being column ordered corresponding to these rows i.e., lanes, Fig. 4 shows 2 vector FIFOs one for row based and one for column based, which correspond with each other in turn, each of the FIFO connections allowing for the streaming port and staging port connections accordingly).
The motivation to combine with respect to claim 1 applies equally to claim 12.
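As an illustrative sketch of one possible reading of the limitations of claims 8, 10, and 12 taken together, the following models an array of arithmetic units with I lanes and J stages. It is an assumption made for exposition and does not appear in the application or in any cited reference; the function name mac_array and its parameters are hypothetical. K vector packets of matrix A data are streamed through the lanes, each stage receives one data element at a time from its own matrix B packet, and every arithmetic unit repeats a multiply-accumulate operation.

```python
# Illustrative sketch only: names and data layout are hypothetical.

def mac_array(a_packets, b_packets):
    """Accumulate an I x J block of the result.

    a_packets: K vectors of I elements each (streaming port); packet k holds
               column k of the A block, one element per lane.
    b_packets: J vectors of K elements each (staging port); packet j holds
               column j of the B block.
    """
    I = len(a_packets[0])
    J = len(b_packets)
    acc = [[0] * J for _ in range(I)]  # one accumulator per arithmetic unit
    for k, a_vec in enumerate(a_packets):  # one A packet streamed per step
        for j in range(J):                 # stage j
            b_elem = b_packets[j][k]       # element shared by the whole stage
            for i in range(I):             # lane i
                acc[i][j] += a_vec[i] * b_elem
    return acc
```

For a 2 x 2 A block with rows [1, 2] and [3, 4] and a 2 x 2 B block with rows [5, 6] and [7, 8], the packets would be a_packets = [[1, 3], [2, 4]] and b_packets = [[5, 7], [6, 8]], and the accumulators would end at [[19, 22], [43, 50]], which is the ordinary product of the two blocks.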
Claims 13-14 recite the method practiced by the system of claims 1-2 respectively, and are therefore rejected for the same reasons therein.
Regarding claim 15, while Kennedy in view of Prabhakar teaches packets, Kennedy in view of Prabhakar does not teach the packets being vector-sized or being processed in parallel by a compute unit.
However, Koeplinger teaches:
wherein the plurality of packets are vector-sized packets each comprising a vector of data elements that can be processed in parallel by a compute unit (Koeplinger: ¶ 0027 read operations of packets operating in parallel to allow for parallel computations of the packets; ¶ 0035 vector and scalar buses are packet switched in order to determine to what column or row the packet of input data is sent to, these packets being narrow-casted; ¶ 0027 these vector buses contain column major i.e., column-based vectors; Fig. 2 shows the same structure of a 2D grid of compute units and memory units).
The motivation to combine with respect to claim 1 applies equally to claim 15.
Claim 16 recites the method practiced by the system of claim 3, and is therefore rejected for the same reasons therein.
Claims 4-5 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Kennedy in view of Prabhakar, in view of Koeplinger, further in view of Yinger et al. (US 2019/0012295 A1) (hereinafter “Yinger”).
Regarding claim 4, while Kennedy in view of Prabhakar in view of Koeplinger teaches the system of claim 1, Kennedy in view of Prabhakar in view of Koeplinger does not explicitly teach compute units for each row having a dedicated memory for that row.
However, Yinger teaches:
wherein the compute units for each row of the computing grid are connected to a memory unit dedicated to that row of the computing grid (Yinger: Fig. 1 Row feeder dedicated for each of the rows, claim 1 each of the row feeders have a corresponding buffer element i.e., the dedicated memory for each of the rows of the computing grid).
It would have been obvious before the effective filing date of the claimed invention to combine the dedicated memory as taught by Yinger with the system as taught by Kennedy in view of Prabhakar in view of Koeplinger as all teachings are directed towards matrix computations via a 2D grid of processing units. One with ordinary skill in the art would be motivated to combine the teachings because doing so would enable SRAM savings and a quadratic reduction in the external RAM bandwidth requirement (Yinger: ¶ 0015).
Regarding claim 5, Kennedy in view of Prabhakar in view of Koeplinger in view of Yinger further teaches:
The system of claim 4, wherein all rows of matrix A are stored in the memory unit dedicated to that row of the computing grid (Yinger: Claim 1 second matrix is provided to computing grid row via row feeder and row feeder buffer, all rows of second matrix are stored in dedicated memory).
The motivation to combine with respect to claim 4 applies equally to claim 5.
Claims 17-18 recite the method practiced by the system of claims 4-5 respectively and are therefore rejected for the same reasons therein.
Regarding claim 19, while Kennedy in view of Prabhakar teaches compute units of a computing grid being connected to a grid connected memory unit (Prabhakar: Pg. 5 Fig. 5 PCUs i.e., compute units being grid connected to PMUs i.e., memory units), Kennedy in view of Prabhakar does not teach narrow-casted packets being provided.
However, Koeplinger teaches PMUs i.e., memory units containing a vector bus interconnect which is connected to the vector bus having the narrow-casted packets for column major or column-based vectors (Koeplinger: ¶ 0034 - ¶ 0035).
The motivation to combine with respect to claim 1 applies equally to claim 19.
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Kennedy in view of Prabhakar, in view of Koeplinger, further in view of Nemlekar (US 2020/0183734 A1) (hereinafter “Nemlekar”).
Claim 20 recites the non-transitory computer readable storage medium having instructions for executing the method practiced by the system of claim 1, which is taught by Kennedy in view of Prabhakar in view of Koeplinger. Kennedy in view of Prabhakar in view of Koeplinger does not explicitly teach a computer readable medium having instructions encoded to execute the method.
However, Nemlekar teaches a non-transitory computer readable medium with instructions to be executed for a method of matrix computation on a computing grid (Nemlekar: ¶ 0032, Fig. 1 shows a computing grid at 110).
It would have been obvious before the effective filing date of the claimed invention to combine the non-transitory computer readable medium as taught by Nemlekar with the system as taught by Kennedy in view of Prabhakar in view of Koeplinger as all teachings are directed towards grid-based matrix computations. One with ordinary skill in the art would be motivated to combine the teachings because this would allow the instructions for executing the method to be easily manipulatable (Nemlekar: ¶ 0033).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARIA DE JESUS RIVERA whose telephone number is (571)272-2793. The examiner can normally be reached Monday-Friday 7:30AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Trujillo, can be reached at (571) 272-3677. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/M.D.R./Examiner, Art Unit 2151
/James Trujillo/Supervisory Patent Examiner, Art Unit 2151