Prosecution Insights
Last updated: May 29, 2026
Application No. 18/195,769

MULTI-CHIP SYSTOLIC ARRAYS

Non-Final OA §103
Filed: May 10, 2023
Examiner: LEE, CHUN KUAN
Art Unit: 2181
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Etched AI Inc.
OA Round: 5 (Non-Final)
Grant Probability: 68% (Favorable)
OA Rounds: 5-6
Est. Remaining: 3m
With Interview: 72%

Examiner Intelligence

Career Allowance Rate: 68%, above average (460 granted / 676 resolved; +13.0% vs TC avg)
Interview Lift: +3.7%, minimal (resolved cases with interview)
Avg Prosecution: 3y 4m (typical timeline); 24 currently pending
Total Applications: 703 (career history, across all art units)

Statute-Specific Performance

§101: 1.1% (-38.9% vs TC avg)
§103: 90.0% (+50.0% vs TC avg)
§102: 1.5% (-38.5% vs TC avg)
§112: 1.3% (-38.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 676 resolved cases.

Office Action

§103
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

CONTINUED EXAMINATION UNDER 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 4/24/2026 has been entered.

RESPONSE TO ARGUMENTS

Applicant’s arguments with respect to claims 1, 3-15, 17-26, and 28-43 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

I. REJECTIONS BASED ON PRIOR ART

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-11, 14, 26, 28-30, 32-36, 39 and 41-43 are rejected under 35 U.S.C. 103 as being unpatentable over MUSLEH et al. (US Pub.: 2021/0092069) in view of Woo et al. (US Pub.: 2022/0335283), Jang et al. (US Pub.: 2021/0125048), Parra Osorio et al. (US Pub.: 2024/0168723), Sun et al. (US Pub.: 2024/0028869), and Lyuh et al. (US Patent 11,507,429).

As per claim 1, MUSLEH teaches/suggests a package comprising: a plurality of integrated circuits (ICs), each comprising component (e.g. associated with a plurality of integrated circuits/dies with corresponding logic component(s): [0169]-[0182]; and [0286]); and wherein the chip-to-chip connections and the plurality of ICs are connected to operate accordingly ([0169]-[0183]) ([0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; and [0286]).

MUSLEH does not teach the package comprising: each comprising a local systolic array of data processing units (DPUs), where a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: obtain first data from a first one of the four other DPUs, obtain second data from a second one of the four other DPUs, perform an operation using the first data and the second data, direct the first data, without changes to the first data, to a third one of the four other DPUs, and direct the second data, without changes to the second data, to a fourth one of the four other DPUs; and chip-to-chip connections configured to connect the local systolic array in each of the plurality of ICs to at least one other local systolic array in another one of the plurality of ICs to form a larger, combined systolic array, being configured so that data passes between DPUs in different local systolic arrays that are directly coupled together in a same number of clock cycles as data passes between DPUs in the same local systolic array and the chip-to-chip connections include bidirectional horizontal chip-to-chip connections that connect the local systolic arrays to form a row of the combined systolic array and unidirectional vertical chip-to-chip connections that connect the local systolic arrays to form a column of the combined systolic array.

Woo teaches/suggests a system comprising: each comprising a local systolic array of units (e.g. associated with each die (1105) having corresponding systolic array units (1110)); and to connect the local systolic array in each module to at least one other local systolic array in another module to form a systolic array (e.g. associated with the plurality of dies (1105) being connected together), being configured so that data passes between units in different local systolic arrays that are directly coupled together operate accordingly as data passes between units in the same local systolic array (e.g. associated with data passing between systolic array units (1110) on different dies (1105) operating accordingly as data passing between systolic array units (1110) on the same die (1105)), and connect the local systolic arrays to form systolic array and connect the local systolic arrays to form systolic array (e.g. associated with connecting systolic array units (1110)) (Fig 11-12F; and [0058]-[0065]).

Jang teaches/suggests a system comprising: chip-to-chip connections configured to connect the plurality of ICs (e.g. associated with connection between plurality of neuromorphic chips 210a-210i: Fig. 3-4; [0041]; [0059]-[0061]) to another one of the plurality of ICs to form a larger, combined systolic array, and the chip-to-chip connections include horizontal chip-to-chip connections that connect to form a row of the combined systolic array and vertical chip-to-chip connections that connect to form a column of the combined systolic array (Fig. 3-5; [0038]-[0047]; and [0057]-[0070]).

Parra Osorio teaches/suggests a system comprising: data processing units (DPUs) (e.g. Fig. 27B; [0292]-[0294]; [0382]), where a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: operating with a first one of the four other DPUs, operating with a second one of the four other DPUs, operating with a third one of the four other DPUs, and operating with a fourth one of the four other DPUs (e.g. Fig. 27B; [0364]-[0366]; [0382]); and data passes between DPUs in a same number of clock cycles as data passes between DPUs (e.g. associated with data being passed, during a clock cycle, between DPUs in different parts of the array architecture of Fig. 27B: [0131]; [0342]-[0353]; [0361]-[0366]; [0382]) (Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; and [0382]).

Sun teaches/suggests a system comprising: obtain first data from a first one of four others, obtain second data from a second one of the four others, perform an operation using the first data and the second data, direct the first data, without changes to the first data, to a third one of the four others, and direct the second data, without changes to the second data, to a fourth one of the four others (Fig. 1; Fig. 3; [0021]-[0027]; and [0041]).

Lyuh teaches/suggests a system comprising: include bidirectional horizontal connections and unidirectional vertical connections (Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; and col. 6, ll. 6-63).
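For illustration only, the per-cycle DPU behavior recited in claim 1 (obtain two operands, operate on them, forward each operand unchanged) can be sketched in Python. This is a minimal sketch, not taken from the application or from any cited reference; the west/north/east/south port mapping, the default multiply operation, and the names `dpu_step` and `DpuOutputs` are all assumptions.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DpuOutputs:
    result: float  # value produced by the DPU's operation this cycle
    east: float    # first data, forwarded unchanged to the "third" neighbor
    south: float   # second data, forwarded unchanged to the "fourth" neighbor


def dpu_step(first: float, second: float,
             op: Callable[[float, float], float] = operator.mul) -> DpuOutputs:
    """One clock cycle of a hypothetical DPU coupled to four neighbors.

    Assumed port mapping (illustrative only): first data arrives from the west
    neighbor and second data arrives from the north neighbor.
    """
    result = op(first, second)
    # Both operands leave the DPU exactly as they arrived; only `result` is new.
    return DpuOutputs(result=result, east=first, south=second)


print(dpu_step(2.0, 3.0))  # DpuOutputs(result=6.0, east=2.0, south=3.0)
```

Passing the operands through untouched is what lets neighboring DPUs reuse the same values on the next cycle without another memory read.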
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Woo, Jang, Parra Osorio, and Sun’s architecture and Lyuh’s bidirectional connection into MUSLEH’s package for the benefit of improving efficiency and performance while implementing a robust 3D architecture (Woo, [0019]; and [0058]), increasing data processing efficiency (Jang, [0174]), improving performance by reducing bandwidth usage and power consumption (Parra Osorio, [0372]), implementing faster performing reconfigurable architecture that is energy efficient (Sun, [0015]), and minimizing accesses to memory to efficiently perform calculations (Lyuh, col. 13, ll. 20-30) to obtain the invention as specified in claim 1.

As per claim 3, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, Jang, Parra Osorio, and Lyuh further teach/suggest the package comprising: wherein the bidirectional horizontal chip-to-chip connections permit a rightmost IC within a row of the plurality of ICs in the combined systolic array to feed back data to a leftmost IC within the row (Parra Osorio, Fig. 18C; Fig. 27B; [0292]-[0294]; and Woo, Fig 11-12F; [0058]-[0065]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; Jang, Fig. 3-5; [0038]-[0047]; [0057]-[0070]; Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Lyuh, Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; col. 6, ll. 6-63).
As per claim 4, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, Jang, Parra Osorio, and Lyuh further teach/suggest the package comprising: wherein the unidirectional vertical chip-to-chip connections are configured such that data flows only from a topmost row of the plurality of ICs in the combined systolic array to a bottom most row of the plurality of ICs in the combined systolic array (Parra Osorio, Fig. 18C; Fig. 27B; and Woo, Fig 11-12F; [0058]-[0065]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; Jang, Fig. 3-5; [0038]-[0047]; [0057]-[0070]; Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Lyuh, Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; col. 6, ll. 6-63).

As per claim 5, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 3 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package further comprising: a plurality of memory chips, wherein at least one of the plurality of memory chips is connected to each one of the plurality of ICs in a topmost row (Parra Osorio, Fig. 18C; Fig. 27B: [0131]; [0342]-[0353]; [0361]-[0366]; [0382]; and Woo, Fig 11-12F; [0058]-[0065]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]).
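For illustration only, the link directions recited in claims 3 and 4 can be checked with a small reachability sketch. It assumes a rectangular grid of ICs with links only between adjacent ICs (no wraparound link), so the rightmost-to-leftmost feedback of claim 3 travels hop by hop over the bidirectional horizontal links; the names `build_links` and `reachable` and the (row, column) convention with row 0 at the top are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (row, column); row 0 is the topmost row of ICs


def build_links(rows: int, cols: int) -> Dict[Node, List[Node]]:
    """Directed chip-to-chip links for a hypothetical rows x cols grid of ICs."""
    adj: Dict[Node, List[Node]] = {(r, c): [] for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # bidirectional horizontal link: one edge each way
                adj[(r, c)].append((r, c + 1))
                adj[(r, c + 1)].append((r, c))
            if r + 1 < rows:  # unidirectional vertical link: top -> bottom only
                adj[(r, c)].append((r + 1, c))
    return adj


def reachable(adj: Dict[Node, List[Node]], src: Node, dst: Node) -> bool:
    """Breadth-first search: can data hop from `src` to `dst` over the links?"""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


links = build_links(rows=2, cols=3)
print(reachable(links, (0, 2), (0, 0)))  # True: rightmost IC feeds back to leftmost
print(reachable(links, (1, 0), (0, 0)))  # False: no upward path between rows
```

In this model every vertical edge points to a higher row index, so data can only move from the topmost row toward the bottom row, while any IC in a row can reach any other IC in the same row.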
As per claim 6, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, Parra Osorio and Sun further teach/suggest the package comprising: wherein the first DPU is further configured to: add a result of the operation to an internal accumulator (e.g. associated with adder (240) of Sun), and after performing two or more operations that are summed in the internal accumulator, directing a value of the internal accumulator to the first one of the four other DPUs (Sun, Fig. 3-5; [0041]-[0044]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Sun, Fig. 1; Fig. 3-5; [0021]-[0027]; [0041]-[0045]).

As per claim 7, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 5 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package comprising: wherein the plurality of memory chips are high-bandwidth memories (HBMs), wherein the HBMs are hardwired to respective columns in the local systolic arrays without any switching element (MUSLEH, [0098]; [0286]; Parra Osorio, Fig. 18C; [0063]; [0119]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]).
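For illustration only, the accumulate-then-drain behavior of claims 6 and 8 (sum several products in an internal accumulator, then pass the accumulated value through other DPUs to an edge of the array) can be sketched as a cycle-level, output-stationary matrix multiply. This is a minimal sketch under stated assumptions, not the applicant's or Sun's design: the "first one of the four other DPUs" is assumed to be the west neighbor, the operation is assumed to be a multiply, inputs are assumed to be skewed by one cycle per row/column, and `systolic_matmul` and `drain_west` are hypothetical names.

```python
from typing import List, Optional

Matrix = List[List[float]]


def systolic_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Cycle-level sketch of an output-stationary M x N DPU array computing A @ B.

    Assumptions (illustrative only): rows of A enter at the west edge, skewed by
    one cycle per row; columns of B enter at the north edge, skewed by one cycle
    per column; every hop between neighboring DPUs takes one cycle.
    """
    M, K, N = len(A), len(A[0]), len(B[0])
    acc = [[0.0] * N for _ in range(M)]  # one internal accumulator per DPU
    a_reg: List[List[Optional[float]]] = [[None] * N for _ in range(M)]
    b_reg: List[List[Optional[float]]] = [[None] * N for _ in range(M)]
    for t in range(M + N + K - 2):  # the last product lands at t = M + N + K - 3
        new_a: List[List[Optional[float]]] = [[None] * N for _ in range(M)]
        new_b: List[List[Optional[float]]] = [[None] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                # First data: skewed feed at the west edge, else the west neighbor's value.
                k = t - i
                new_a[i][j] = (A[i][k] if 0 <= k < K else None) if j == 0 else a_reg[i][j - 1]
                # Second data: skewed feed at the north edge, else the north neighbor's value.
                k = t - j
                new_b[i][j] = (B[k][j] if 0 <= k < K else None) if i == 0 else b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b  # operands move one hop east/south, unchanged
        for i in range(M):
            for j in range(N):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    acc[i][j] += a_reg[i][j] * b_reg[i][j]  # sum into the accumulator
    return acc


def drain_west(acc: Matrix) -> Matrix:
    """Shift every accumulator value one hop west per cycle until it exits the edge."""
    rows = [list(r) for r in acc]
    emitted: Matrix = [[] for _ in rows]
    for _ in range(len(rows[0])):  # N cycles empty a row N DPUs wide
        for i, row in enumerate(rows):
            emitted[i].append(row[0])   # west-edge DPU hands its value off the array
            rows[i] = row[1:] + [None]  # remaining values advance one DPU west
    return emitted


print(drain_west(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])))
# [[19.0, 22.0], [43.0, 50.0]]
```

The skew ensures that DPU (i, j) sees A[i][k] and B[k][j] in the same cycle for every k, so each accumulator ends up holding one entry of the product before the drain moves it, hop by hop, to the array edge.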
As per claim 8, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 6 above, where MUSLEH, Woo, Jang, Parra Osorio and Sun further teach/suggest the package comprising: the value of the internal accumulator is passed through other DPUs to a DPU on an outer edge of the combined array (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; Jang, Fig. 3-5; [0038]-[0047]; [0057]-[0070]; Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Sun, Fig. 1; Fig. 3-5; [0021]-[0027]; [0041]-[0045]).

As per claim 9, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package further comprising: an interposer, wherein the plurality of ICs are disposed in a grid pattern on the interposer, wherein the chip-to-chip connections extend through the interposer (MUSLEH, Fig. 11C; [0175]; Parra Osorio, Fig. 18C; Fig. 24C; Fig. 27B; [0345]-[0350]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]).

As per claim 10, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package comprising: wherein a fifth DPU in the combined systolic array is configured to perform operations for a first layer of an AI model at the same time that a sixth DPU in the combined systolic array is configured to perform operations for a second layer of the AI model (e.g. associated with parallel computing: [0208]; [0361]-[0366] of Parra Osorio) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0208]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Sun, Fig. 1; Fig. 3-5; [0021]-[0027]; [0041]-[0045]).

As per claim 11, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package comprising: wherein each of the plurality of ICs comprises auxiliary circuitry separate from the local systolic array, wherein the package further comprises: local memory chips coupled to the auxiliary circuitry in each of the plurality of ICs (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]).

As per claim 14, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, Jang, and Parra Osorio further teach/suggest the package further comprising: at least one memory chip connected to a topmost IC of the plurality of ICs, wherein the plurality of ICs form a single column (MUSLEH, [0098]; [0286]; Parra Osorio, Fig. 18C) (MUSLEH, Fig. 11B-11C; [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; Jang, Fig. 3-5; [0038]-[0047]; [0057]-[0070]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]).
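For illustration only, the concurrency recited in claim 10 (one DPU working on a first AI-model layer while another works on a second layer) can be sketched as a simple layer pipeline. The sketch assumes each layer is mapped to its own group of DPUs, each group takes one step per input, and a group's output feeds the next layer's group; `pipeline_schedule` and the step granularity are hypothetical and ignore the finer cycle-level skew inside each group.

```python
from typing import Dict, List, Tuple


def pipeline_schedule(num_layers: int, num_inputs: int) -> Dict[int, List[Tuple[int, int]]]:
    """Map each step to the (layer, input) pairs processed concurrently in that step.

    Hypothetical setup: each layer is mapped to its own group of DPUs, each group
    takes one step per input, and its output feeds the next layer's group.
    """
    schedule: Dict[int, List[Tuple[int, int]]] = {}
    for x in range(num_inputs):
        for layer in range(num_layers):
            step = x + layer  # input x reaches layer `layer` after `layer` hand-offs
            schedule.setdefault(step, []).append((layer, x))
    return schedule


print(pipeline_schedule(num_layers=2, num_inputs=3))
# {0: [(0, 0)], 1: [(1, 0), (0, 1)], 2: [(1, 1), (0, 2)], 3: [(1, 2)]}
```

From step 1 onward, the layer-0 group and the layer-1 group are both busy in the same step, each on a different input.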
As per claim 26, MUSLEH teaches/suggests a package, comprising: a plurality of integrated circuits (ICs), each comprising component (e.g. associated with a plurality of integrated circuits/dies with corresponding logic component(s): [0170]-[0181]; and [0286]), wherein the plurality of ICs are arranged accordingly, and wherein the plurality of ICs are connected accordingly (e.g. associated with interconnecting the plurality of integrated circuits/dies connected to form a larger design: [0170]-[0181]; and [0286]) ([0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; and [0286]).

MUSLEH does not teach the package, comprising: each comprising a local systolic array of data processing units (DPUs), wherein the plurality of ICs are arranged in a grid-like pattern and in one of the local systolic arrays a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: obtain first data from a first one of the four other DPUs, obtain second data from a second one of the four other DPUs, perform an operation using the first data and the second data, direct the first data, without changes to the first data, to a third one of the four other DPUs, and direct the second data, without changes to the second data, to a fourth one of the four other DPUs, wherein the local systolic arrays are connected to form a larger, combined systolic array, connected so that data passes between DPUs in different local systolic arrays that are directly coupled together in a same number of clock cycles as data passes between DPUs in the same local systolic array and the local systolic arrays are connected to form bidirectional horizontal chip-to-chip connections that connect the local systolic arrays to form a row of the combined systolic array and unidirectional vertical chip-to-chip connections that connect the local systolic arrays to form a column of the combined systolic array.
Woo teaches/suggests a system comprising: each comprising a local systolic array of units (e.g. associated with each die (1105) having corresponding systolic array units (1110)), and in one of the local systolic arrays operating accordingly, wherein the local systolic arrays are connected to form a systolic array (e.g. associated with the plurality of dies (1105) being connected together), connected so that data passes between units in different local systolic arrays that are directly coupled together operate accordingly as data passes between units in the same local systolic array (e.g. associated with data passing between systolic array units (1110) on different dies (1105) operating accordingly as data passing between systolic array units (1110) on the same die (1105)) and the local systolic arrays are connected to form connections that connect the local systolic arrays to form systolic array and connections that connect the local systolic arrays to form systolic array (e.g. associated with connecting systolic array units (1110)) (Fig 11-12F; and [0058]-[0065]).

Jang teaches/suggests a system comprising: wherein the plurality of ICs are arranged in a grid-like pattern (Fig. 3-4; [0041]; [0059]-[0061]); connected to form a larger, combined systolic array (e.g. associated with connection between plurality of neuromorphic chips 210a-210i: Fig. 3-4; [0041]; [0059]-[0061]), and horizontal chip-to-chip connections to form a row of the combined systolic array and vertical chip-to-chip connections to form a column of the combined systolic array (Fig. 3-5; [0038]-[0047]; and [0057]-[0070]).

Parra Osorio teaches/suggests a system comprising: data processing units (DPUs), in a grid-like pattern (e.g. Fig. 27B; [0292]-[0294]; [0382]), and a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: operating with a first one of the four other DPUs, operating with a second one of the four other DPUs, operating with a third one of the four other DPUs, and operating with a fourth one of the four other DPUs (e.g. Fig. 27B; [0364]-[0366]; [0382]); data passes between DPUs in a same number of clock cycles as data passes between DPUs (e.g. associated with data being passed, during a clock cycle, between DPUs in different parts of the array architecture of Fig. 27B: [0131]; [0342]-[0353]; [0361]-[0366]; [0382]) (Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; and [0382]).

Sun teaches/suggests a system comprising: obtain first data from a first one of four others, obtain second data from a second one of the four others, perform an operation using the first data and the second data, direct the first data, without changes to the first data, to a third one of the four others, and direct the second data, without changes to the second data, to a fourth one of the four others (Fig. 1; Fig. 3; [0021]-[0027]; and [0041]).

Lyuh teaches/suggests a system comprising: include bidirectional horizontal connections and unidirectional vertical connections (Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; and col. 6, ll. 6-63).
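For illustration only, the limitation of claims 1 and 26 that data crosses a chip-to-chip connection in the same number of clock cycles as it crosses an on-chip hop can be illustrated with simple hop-latency arithmetic. The sketch assumes a single row of DPUs split evenly across chips and a fixed cost per hop; the function name `arrival_cycle`, its parameters, and the example latencies are hypothetical.

```python
def arrival_cycle(col: int, local_cols: int, on_chip: int = 1, chip_to_chip: int = 1) -> int:
    """Cycle at which a value injected at global column 0 reaches global column `col`.

    Hypothetical model: the combined row is split into chips `local_cols` DPUs
    wide; a hop between neighbors on the same chip costs `on_chip` cycles and a
    hop across a chip boundary costs `chip_to_chip` cycles.
    """
    cycles = 0
    for c in range(col):  # hop from column c to column c + 1
        crosses_boundary = (c + 1) % local_cols == 0
        cycles += chip_to_chip if crosses_boundary else on_chip
    return cycles


# Two chips, each a 2-DPU-wide local array, forming a 4-DPU combined row.
print([arrival_cycle(c, local_cols=2) for c in range(4)])                  # [0, 1, 2, 3]
print([arrival_cycle(c, local_cols=2, chip_to_chip=3) for c in range(4)])  # [0, 1, 4, 5]
```

With matched latencies, arrival times are identical to those of a single monolithic 4-wide array, so a schedule that assumes one cycle per hop does not depend on where the chip boundaries fall. With a slower boundary hop (3 cycles is an arbitrary example), every column past the boundary receives data two cycles late, and the schedule would need extra delay compensation.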
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Woo, Jang, Parra Osorio, and Sun’s architecture and Lyuh’s bidirectional connection into MUSLEH’s package for the benefit of improving efficiency and performance while implementing a robust 3D architecture (Woo, [0019]; and [0058]), increasing data processing efficiency (Jang, [0174]), improving performance by reducing bandwidth usage and power consumption (Parra Osorio, [0372]), implementing faster performing reconfigurable architecture that is energy efficient (Sun, [0015]), and minimizing accesses to memory to efficiently perform calculations (Lyuh, col. 13, ll. 20-30) to obtain the invention as specified in claim 26.

As per claims 28-30 and 32-36, claims 28-30 and 32-36 are rejected in accordance with the same rationale and reasoning as the above rejection of claims 3-5 and 7-11.

As per claim 39, MUSLEH teaches/suggests a package, comprising: a plurality of integrated circuits (ICs) each comprising component (e.g. associated with a plurality of integrated circuits/dies with corresponding logic component(s): [0170]-[0182]; and [0286]); and a separate memory device being hardwired accordingly ([0098]; and [0286]) ([0098]; [0170]-[0181]; and [0286]) ([0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; and [0286]).
MUSLEH does not expressly teach the package, comprising: each comprising a local systolic array of data processing units (DPUs), the local systolic arrays connected so that data passes between DPUs in different local systolic arrays that are directly coupled together in a same number of clock cycles as data passes between DPUs in the same local systolic array and in one of the local systolic arrays a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: obtain first data from a first one of the four other DPUs, obtain second data from a second one of the four other DPUs, perform an operation using the first data and the second data, direct the first data, without changes to the first data, to a third one of the four other DPUs, and direct the second data, without changes to the second data, to a fourth one of the four other DPUs; comprising a plurality of channels, wherein each of the plurality of channels is coupled to respective one or more columns in the systolic array without any switching element; and chip-to-chip connections configured to connect the local systolic array in each of the plurality of ICs to at least one other local systolic array in another one of the plurality of ICs to form a larger, combined systolic array, wherein the chip-to-chip connections include bidirectional horizontal chip-to-chip connections that connect the local systolic arrays to form a row of the combined systolic array and unidirectional vertical chip-to-chip connections that connect the local systolic arrays to form a column of the combined systolic array.

Woo teaches/suggests a system comprising: each comprising a local systolic array of units (e.g. associated with each die (1105) having corresponding systolic array units (1110)), the local systolic arrays connected so that data passes between units in different local systolic arrays that are directly coupled together operate accordingly as data passes between units in the same local systolic array and in one of the local systolic arrays operating accordingly (e.g. associated with data passing between systolic array units (1110) on different dies (1105) operating accordingly as data passing between systolic array units (1110) on the same die (1105)); and to connect the local systolic array in each module to at least one other local systolic array in another module to form a systolic array (e.g. associated with the plurality of dies (1105) being connected together), include connections that connect the local systolic arrays to form systolic array and connections that connect the local systolic arrays to form systolic array (e.g. associated with connecting systolic array units (1110)) (Fig 11-12F; and [0058]-[0065]).

Jang teaches/suggests a system comprising: chip-to-chip connections configured to connect each of the plurality of ICs (e.g. associated with connection between plurality of neuromorphic chips 210a-210i: Fig. 3-4; [0041]; [0059]-[0061]) to another one of the plurality of ICs to form a larger, combined systolic array, wherein the chip-to-chip connections include horizontal chip-to-chip connections that connect to form a row of the combined systolic array and vertical chip-to-chip connections that connect to form a column of the combined systolic array (Fig. 3-5; [0038]-[0047]; and [0057]-[0070]).

Parra Osorio teaches/suggests a system comprising: data processing units (DPUs) (e.g. Fig. 27B; [0292]-[0294]; [0382]), data passes between DPUs in a same number of clock cycles as data passes between DPUs (e.g. associated with data being passed, during a clock cycle, between DPUs in different parts of the array architecture of Fig. 27B: [0131]; [0342]-[0353]; [0361]-[0366]; [0382]) and a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: operating with a first one of the four other DPUs, operating with a second one of the four other DPUs, operating with a third one of the four other DPUs, and operating with a fourth one of the four other DPUs (e.g. Fig. 27B; [0364]-[0366]; [0382]); and comprising a plurality of channels, wherein each of the plurality of channels is coupled to respective one or more columns in the systolic array without any switching element (e.g. associated with direct interconnection between memories (1841A) to (1841N) and each column of PE in Fig. 18C) (Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; and [0382]).

Sun teaches/suggests a system comprising: obtain first data from a first one of four others, obtain second data from a second one of the four others, perform an operation using the first data and the second data, direct the first data, without changes to the first data, to a third one of the four others, and direct the second data, without changes to the second data, to a fourth one of the four others (Fig. 1; Fig. 3; [0021]-[0027]; and [0041]).

Lyuh teaches/suggests a system comprising: include bidirectional horizontal connections and unidirectional vertical connections (Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; and col. 6, ll. 6-63).
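For illustration only, the claim 39 limitation that each memory channel is coupled to respective columns of the systolic array without any switching element amounts to a fixed, compile-time mapping from channels to columns. The sketch below assumes contiguous, equal-sized column groups per channel; the function name `channel_to_columns` and the example sizes (4 channels, 8 columns) are hypothetical.

```python
from typing import Dict, List


def channel_to_columns(num_channels: int, num_columns: int) -> Dict[int, List[int]]:
    """Fixed assignment of memory channels to systolic-array columns.

    Hypothetical scheme: columns are split into equal contiguous groups, so each
    channel always feeds the same columns and no run-time switch is involved.
    """
    if num_columns % num_channels:
        raise ValueError("this sketch assumes columns divide evenly among channels")
    per_channel = num_columns // num_channels
    return {ch: list(range(ch * per_channel, (ch + 1) * per_channel))
            for ch in range(num_channels)}


print(channel_to_columns(num_channels=4, num_columns=8))
# {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
```

Because the mapping never changes, it can be realized as direct wiring; by contrast, a crossbar or other switch would be needed only if a channel had to reach different columns at different times.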
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Woo, Jang, Parra Osorio, and Sun’s architecture and Lyuh’s bidirectional connection into MUSLEH’s package for the benefit of improving efficiency and performance while implementing a robust 3D architecture (Woo, [0019]; and [0058]), increasing data processing efficiency (Jang, [0174]), improving performance by reducing bandwidth usage and power consumption (Parra Osorio, [0372]), implementing faster performing reconfigurable architecture that is energy efficient (Sun, [0015]), and minimizing accesses to memory to efficiently perform calculations (Lyuh, col. 13, ll. 20-30) to obtain the invention as specified in claim 39.

As per claim 41, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 39 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package comprising: wherein the memory device is a high-bandwidth memory (HBM) (MUSLEH, [0098]; [0286]; and Parra Osorio, Fig. 18C; [0063]; [0119]) (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0350]; [0361]-[0366]; [0371]; [0382]).

As per claim 42, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 39 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package further comprising: a plurality of memory devices coupled to the IC, wherein each of the plurality of memory devices comprises a plurality of channels, wherein each of the plurality of channels is hardwired to respective one or more columns in the systolic array without any switching element (MUSLEH, [0098]; [0286]; and Parra Osorio, Fig. 18C; [0063]; [0119]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0346]; [0349]; [0361]-[0366]; [0371]; [0382]).

As per claim 43, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, Jang, and Parra Osorio further teach/suggest the package comprising: wherein a bandwidth of the bidirectional horizontal chip-to-chip connections for data flowing from left to right is higher than a bandwidth of the bidirectional horizontal chip-to-chip connections for data flowing from right to left (MUSLEH, [0098]; [0286]; and Parra Osorio, Fig. 18C; [0063]; [0119]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; Jang, Fig. 3-5; [0038]-[0047]; [0057]-[0070]; and Parra Osorio, Fig. 18C; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0346]; [0349]; [0361]-[0366]; [0371]; [0382]), wherein it would have been an obvious design choice to one of ordinary skill in the art to further implement the above claimed features, as bandwidth can either be higher, lower or equivalent.

Claims 31 and 40 are rejected under 35 U.S.C. 103 as being unpatentable over MUSLEH et al. (US Pub.: 2021/0092069) in view of Woo et al. (US Pub.: 2022/0335283), Jang et al. (US Pub.: 2021/0125048), Parra Osorio et al. (US Pub.: 2024/0168723), Sun et al. (US Pub.: 2024/0028869), and Lyuh et al. (US Patent 11,507,429) as applied to claims 30 and 39 above, and further in view of ZHANG et al. (US Pub.: 2024/0004830).
As per claim 31, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 30 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package comprising: wherein the plurality of memory chips are configured to operate for performing a matrix multiplication in the systolic array for an artificial intelligence (AI) model (Parra Osorio, [0361]-[0366]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]), but MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh do not expressly teach the package comprising: configured to store weight data. ZHANG teaches/suggests a system comprising: configured to store weight data (e.g. associated with weight buffers (221) and (222) in Fig. 3D) (Fig. 3C to Fig. 3D; [0045]-[0058]; and [0062]-[0063]). It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include ZHANG’s architecture into MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh’s package for the benefit of using space efficiently while avoiding congestion and facilitating data flow (ZHANG, [0055]) to obtain the invention as specified in claim 31.

As per claim 40, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 39 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package comprising: wherein the memory device is configured to operate for performing a matrix multiplication in the systolic array for an artificial intelligence (AI) model (Parra Osorio, [0361]-[0366]) (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0350]; [0361]-[0366]; [0371]; [0382]), but MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh do not expressly teach the package comprising: wherein the memory device is configured to store weight data. ZHANG teaches/suggests a system comprising: wherein the memory device is configured to store weight data (e.g. associated with weight buffers (221) and (222) in Fig. 3D) (Fig. 3C to Fig. 3D; [0045]-[0058]; and [0062]-[0063]). It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include ZHANG’s architecture into MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh’s package for the benefit of using space efficiently while avoiding congestion and facilitating data flow (ZHANG, [0055]) to obtain the invention as specified in claim 40.

Claims 12-13 and 37-38 are rejected under 35 U.S.C. 103 as being unpatentable over MUSLEH et al. (US Pub.: 2021/0092069) in view of Woo et al. (US Pub.: 2022/0335283), Jang et al. (US Pub.: 2021/0125048), Parra Osorio et al. (US Pub.: 2024/0168723), Sun et al. (US Pub.: 2024/0028869), and Lyuh et al. (US Patent 11,507,429) as applied to claims 11 and 36 above, and further in view of Wang et al. (US Pub.: 2022/0108688).

As per claim 12, MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 11 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package comprising: wherein the auxiliary circuitry is configured to perform operations that use data that is stored in the local memory chips, wherein the operations are part of an AI model (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]), but MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh do not teach the package comprising: self-attention operations that use data from previous tokens, wherein the self-attention operations operate accordingly. Wang teaches/suggests a system comprising: self-attention operations that use data from previous tokens, wherein the self-attention operations operate accordingly (Claim 1; and [0031]). It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Wang’s self-attention operations into MUSLEH, Woo, Jang, Parra Osorio, Sun and Lyuh’s package for the benefit of improving model generation (Wang, [0021]) to obtain the invention as specified in claim 12.

As per claim 13, MUSLEH, Woo, Jang, Parra Osorio, Sun, Lyuh, and Wang teach/suggest all the claimed features of claim 12 above, where MUSLEH, Woo, Parra Osorio, and Wang further teach/suggest the package comprising: wherein the local systolic arrays do not communicate with the local memory chips (e.g. associated with data security that secures data in memory from being accessed: MUSLEH, [0185]; [0207]; and Parra Osorio, [0215]; [0356]) (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Wang, Claim 1; [0031]).

As per claims 37-38, claims 37-38 are rejected in accordance with the same rationale and reasoning as the above rejection of claims 12-13.

Claims 15, 17-23 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over MUSLEH et al. (US Pub.: 2021/0092069) in view of Woo et al. (US Pub.: 2022/0335283), Jang et al. (US Pub.: 2021/0125048), Parra Osorio et al. (US Pub.: 2024/0168723), ZHANG et al. (US Pub.: 2024/0004830), Sun et al. (US Pub.: 2024/0028869), and Lyuh et al. (US Patent 11,507,429).

As per claim 15, MUSLEH teaches/suggests an AI accelerator, comprising: a plurality of integrated circuits (ICs), each comprising a component (e.g. associated with a plurality of integrated circuits/dies with corresponding logic component(s): [0170]-[0181]; and [0286]); wherein the chip-to-chip connections and the plurality of ICs are connected to operate accordingly; and a plurality of memory chips configured to store data, data stored in one of the plurality of memory chips and the plurality of memory chips coupled to the plurality of ICs ([0098]; [0170]-[0181]; and [0286]) ([0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; and [0286]). MUSLEH does not teach the AI accelerator comprising: each comprising a local systolic array of data processing units (DPUs), where a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: obtain first data from a first one of the four other DPUs, obtain second data from a second one of the four other DPUs, perform an operation using the first data and the second data, direct the first data, without changes to the first data, to a third one of the four other DPUs, and direct the second data, without changes to the second data, to a fourth one of the four other DPUs; chip-to-chip connections configured to connect the local systolic arrays to form a larger, combined systolic array, configured so that data passes between DPUs in different local systolic arrays that are directly coupled together in a same number of clock cycles as data passes between DPUs in the same local systolic array and the chip-to-chip connections include bidirectional horizontal chip-to-chip connections that connect the local systolic arrays to form a row of the combined systolic array and unidirectional vertical chip-to-chip connections that connect the local systolic arrays to form a column of the combined systolic array; and to store weights for performing matrix multiplications in the combined systolic array as part of an AI model, the first data including a weight stored and forming a top row of the combined systolic array.

Woo teaches/suggests a system comprising: each comprising a local systolic array of units (e.g. associated with each die (1105) having corresponding systolic array units (1110)); to connect the local systolic arrays to form a systolic array (e.g. associated with the plurality of dies (1105) being connected together), configured so that data passing between units in different local systolic arrays that are directly coupled together operates accordingly as data passing between units in the same local systolic array (e.g. associated with data passing between systolic array units (1110) on different dies (1105) operating accordingly as data passing between systolic array units (1110) on the same die (1105)) and connect the local systolic arrays to form a systolic array and connect the local systolic arrays to form a systolic array (e.g. associated with connecting systolic array units (1110)); and operating with the systolic array, operating with the systolic array (e.g. Fig. 11) (Fig. 11-12F; and [0058]-[0065]). Jang teaches/suggests a system comprising: chip-to-chip connections configured to connect to form a larger, combined systolic array (e.g. associated with connections between the plurality of neuromorphic chips 210a-210i: Fig. 3-4; [0041]; [0059]-[0061]), and the chip-to-chip connections include horizontal chip-to-chip connections that form a row of the combined systolic array and vertical chip-to-chip connections that form a column of the combined systolic array; and operating with the combined systolic array, and operating with the combined systolic array (Fig. 3-5; [0038]-[0047]; and [0057]-[0070]).
Parra Osorio teaches/suggests a system comprising: data processing units (DPUs) (e.g. Fig. 27B; [0292]-[0294]; [0382]), where a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: operating with a first one of the four other DPUs, operating with a second one of the four other DPUs, operating with a third one of the four other DPUs, and operating with a fourth one of the four other DPUs (e.g. Fig. 27B; [0364]-[0366]; [0382]); and data passes between DPUs in a same number of clock cycles as data passes between DPUs (e.g. associated with data being passed, during a clock cycle, between DPUs in different parts of the array architecture of Fig. 27B: [0131]; [0342]-[0353]; [0361]-[0366]; [0382]); and weights for performing matrix multiplications in the systolic array as part of an AI model ([0172]; [0195]; [0292]; and [0361]-[0366]) (Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; and [0382]). ZHANG teaches/suggests a system comprising: to store weights (e.g. associated with weight buffers (221) and (222) in Fig. 3D), forming a top row of the systolic array (e.g. associated with the weight buffers being positioned at the top row) (Fig. 3C to Fig. 3D; [0045]-[0058]; and [0062]-[0063]). Sun teaches/suggests a system comprising: obtain first data from a first one of four others, obtain second data from a second one of the four others, perform an operation using the first data and the second data, direct the first data, without changes to the first data, to a third one of the four others, and direct the second data, without changes to the second data, to a fourth one of the four others; and the first data including a weight (e.g. associated with weight (206) in Fig. 3) (Fig. 1; Fig. 3; [0021]-[0027]; and [0041]).
Lyuh teaches/suggests a system comprising: connections that include bidirectional horizontal connections and unidirectional vertical connections (Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; and col. 6, ll. 6-63). It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Woo, Jang, Parra Osorio, ZHANG and Sun’s architecture and Lyuh’s bidirectional connection into MUSLEH’s accelerator for the benefit of improving efficiency and performance while implementing a robust 3D architecture (Woo, [0019]; and [0058]), increasing data processing efficiency (Jang, [0174]), improving performance by reducing bandwidth usage and power consumption (Parra Osorio, [0372]), using space efficiently while avoiding congestion and facilitating data flow (ZHANG, [0055]), implementing a faster-performing reconfigurable architecture that is energy efficient (Sun, [0015]), and minimizing accesses to memory to efficiently perform calculations (Lyuh, col. 13, ll. 20-30) to obtain the invention as specified in claim 15.

As per claim 17, MUSLEH, Woo, Jang, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 15 above, where MUSLEH, Woo, Jang, Parra Osorio and Lyuh further teach/suggest the AI accelerator comprising: wherein the bidirectional horizontal chip-to-chip connections permit a rightmost IC within a row of the plurality of ICs in the combined systolic array to feed back data to a leftmost IC within the row (Parra Osorio, Fig. 18C; Fig. 27B; [0292]-[0294]; and Woo, Fig. 11-12F; [0058]-[0065]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; Jang, Fig. 3-5; [0038]-[0047]; [0057]-[0070]; Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Lyuh, Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; col. 6, ll. 6-63).

As per claim 18, MUSLEH, Woo, Jang, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 15 above, where MUSLEH, Woo, Jang, Parra Osorio and Lyuh further teach/suggest the AI accelerator comprising: wherein the unidirectional vertical chip-to-chip connections are configured such that data flows only from a topmost row of the plurality of ICs in the combined systolic array to a bottommost row of the plurality of ICs in the combined systolic array (Parra Osorio, Fig. 18C; Fig. 27B; and Woo, Fig. 11-12F; [0058]-[0065]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; Jang, Fig. 3-5; [0038]-[0047]; [0057]-[0070]; Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Lyuh, Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; col. 6, ll. 6-63).

As per claim 19, MUSLEH, Woo, Jang, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 15 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the AI accelerator comprising: wherein the plurality of memory chips are high-bandwidth memories (HBMs), wherein the HBMs are hardwired to respective columns in the local systolic arrays without any switching element (MUSLEH, [0098]; [0286]; Parra Osorio, Fig. 18C; [0063]; [0119]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0349]; [0361]-[0366]; [0371]; [0382]).
As per claim 20, MUSLEH, Woo, Jang, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 19 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the AI accelerator comprising: wherein multiple HBMs are hardwired to each of the plurality of ICs in the top row (MUSLEH, [0098]; [0286]; Parra Osorio, Fig. 18C; [0063]; [0119]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0346]; [0349]; [0361]-[0366]; [0371]; [0382]).

As per claim 21, MUSLEH, Woo, Jang, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 15 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the AI accelerator further comprising: an interposer, wherein the plurality of ICs are disposed in a grid pattern on the interposer, wherein the chip-to-chip connections extend through the interposer (MUSLEH, Fig. 11C; [0175]; Parra Osorio, Fig. 18C; Fig. 24C; Fig. 27B; [0345]-[0350]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]).

As per claim 22, MUSLEH, Woo, Jang, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 15 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the AI accelerator comprising: wherein the plurality of ICs are stacked on each other, wherein the chip-to-chip connections are formed using microbumps or pillars connecting the plurality of ICs (MUSLEH, Fig. 11B-11C; [0172]-[0177]; Parra Osorio, Fig. 24B-24C; [0342]-[0350]; and Woo, Fig. 11-12F; [0058]-[0065]) (MUSLEH, Fig. 11B-11C; [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]).

As per claim 23, MUSLEH, Woo, Jang, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 15 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the AI accelerator comprising: wherein each of the plurality of ICs comprises auxiliary circuitry separate from the local systolic array, wherein the AI accelerator further comprises: local memory chips coupled to the auxiliary circuitry in each of the plurality of ICs (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]).

As per claim 25, MUSLEH, Woo, Jang, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 15 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the AI accelerator comprising: wherein the local systolic arrays do not communicate with the local memory chips (e.g. associated with data security that secures data in memory from being accessed: MUSLEH, [0185]; [0207]; and Parra Osorio, [0215]; [0356]) (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]).

Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over MUSLEH et al. (US Pub.: 2021/0092069) in view of Woo et al. (US Pub.: 2022/0335283), Jang et al. (US Pub.: 2021/0125048), Parra Osorio et al. (US Pub.: 2024/0168723), ZHANG et al. (US Pub.: 2024/0004830), Sun et al. (US Pub.: 2024/0028869) and Lyuh et al. (US Patent 11,507,429) as applied to claim 23 above, and further in view of Wang et al. (US Pub.: 2022/0108688).

As per claim 24, MUSLEH, Woo, Jang, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 23 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the AI accelerator comprising: wherein the auxiliary circuitry is configured to perform operations using data that is stored in the local memory chips, wherein the operations are part of the AI model (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0350]; [0361]-[0366]; [0371]; [0382]), but MUSLEH, Woo, Jang, Parra Osorio, ZHANG, Sun and Lyuh do not teach the AI accelerator comprising: self-attention operations that use data from previous tokens, wherein the self-attention operations operate accordingly. Wang teaches/suggests a system comprising: self-attention operations that use data from previous tokens, wherein the self-attention operations operate accordingly (Claim 1; and [0031]). It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Wang’s self-attention operations into MUSLEH, Woo, Jang, Parra Osorio, ZHANG, Sun and Lyuh’s AI accelerator for the benefit of improving model generation (Wang, [0021]) to obtain the invention as specified in claim 24.

II. CLOSING COMMENTS

CONCLUSION

STATUS OF CLAIMS IN THE APPLICATION

The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. 707.07(i):

CLAIMS REJECTED IN THE APPLICATION

Per the instant office action, claims 1, 3-15, 17-26, and 28-43 have received a first action on the merits and are the subject of a first action non-final.

DIRECTION OF FUTURE CORRESPONDENCE

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHUN KUAN LEE, whose telephone number is (571) 272-0671. The examiner can normally be reached Monday-Friday.

IMPORTANT NOTE

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Idriss Alrobaye, can be reached at (571) 270-1023. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHUN KUAN LEE/
Primary Examiner, Art Unit 2181
May 07, 2026

Prosecution Timeline

Jun 17, 2025
Request for Continued Examination
Jun 20, 2025
Response after Non-Final Action
Jul 09, 2025
Non-Final Rejection mailed — §103
Sep 29, 2025
Response Filed
Oct 24, 2025
Final Rejection mailed — §103
Apr 24, 2026
Request for Continued Examination
Apr 28, 2026
Response after Non-Final Action
May 12, 2026
Non-Final Rejection mailed — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12639235
CROSS-DOMAIN VOLTAGE BUS RESOURCE SHARING FOR IMPROVED POWER DELIVERY NETWORK
2y 9m to grant; granted May 26, 2026
Patent 12639237
MEMORY DEVICE WITH INTERNAL PROCESSING INTERFACE
1y 9m to grant; granted May 26, 2026
Patent 12619435
Virtual Idle Loops
3y 7m to grant; granted May 05, 2026
Patent 12613814
DEVICES USING CHIPLET BASED STORAGE ARCHITECTURES
3y 8m to grant; granted Apr 28, 2026
Patent 12602270
KV-CACHE STREAMING FOR IMPROVED PERFORMANCE AND FAULT TOLERANCE IN GENERATIVE MODEL SERVING
2y 6m to grant; granted Apr 14, 2026
Based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 68%
With Interview: 72% (+3.7%)
Median Time to Grant: 3y 4m (~3m remaining)
PTA Risk: High
Based on 676 resolved cases by this examiner. Grant probability derived from career allowance rate.