Last updated: April 19, 2026
Application No. 18/195,769
MULTI-CHIP SYSTOLIC ARRAYS

Final Rejection §103
Filed: May 10, 2023
Examiner: LEE, CHUN KUAN
Art Unit: 2181
Tech Center: 2100, Computer Architecture & Software
Assignee: Etched AI Inc.
OA Round: 4 (Final)
Interview: Optional (+3.1% interview lift). This examiner has a relatively high allow rate; a written response may suffice. Based on 669 resolved cases, 2023–2026.
Examiner Intelligence

LEE, CHUN KUAN
Career allow rate: 68% (above average), 455 granted / 669 resolved, +13.0% vs TC avg
Interview lift: +3.1% (minimal), based on resolved cases with interview
Typical timeline: 3y 4m average prosecution; 32 currently pending
Career history: 701 total applications across all art units
Statute-Specific Performance (TC avg = Tech Center average estimate; based on career data from 669 resolved cases)

§101: 1.7% (-38.3% vs TC avg)
§103: 79.4% (+39.4% vs TC avg)
§102: 3.3% (-36.7% vs TC avg)
§112: 3.5% (-36.5% vs TC avg)
Office Action

§103
DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

RESPONSE TO ARGUMENTS
Applicant’s arguments with respect to claims 1, 3-15, 17-26, and 28-42 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

I. REJECTIONS BASED ON PRIOR ART
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-11, 14, 26, 28-30, 32-36, 39 and 41-42 are rejected under 35 U.S.C. 103 as being unpatentable over MUSLEH et al. (US Pub.: 2021/0092069) in view of Woo et al. (US Pub.: 2022/0335283), Parra Osorio et al. (US Pub.: 2024/0168723), Sun et al. (US Pub.: 2024/0028869), and Lyuh et al. (US Patent 11,507,429).

As per claim 1, MUSLEH teaches/suggests a package comprising: a plurality of integrated circuits (ICs), each comprising component (e.g. associated with a plurality of integrated circuits/dies with corresponding logic component(s): [0169]-[0182]; and [0286]); and chip-to-chip connections configured to connect the plurality of ICs to another one of the plurality of ICs to form a larger, combined design (e.g. associated with interconnecting the plurality of integrated circuits/dies to form a larger design: [0170]-[0182]; and [0286]) ([0098]; [0169]-[0182]; and [0286]), wherein the chip-to-chip connections and the plurality of ICs are configured to operate accordingly, and the chip-to-chip connections include chip-to-chip connections that connect to form the combined design and chip-to-chip connections that connect to form the combined design ([0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; and [0286]).
MUSLEH does not teach the package comprising: 
each comprising a local systolic array of data processing units (DPUs), where a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: 
obtain first data from a first one of the four other DPUs, 
obtain second data from a second one of the four other DPUs, 
perform an operation using the first data and the second data, 
direct the first data, without changes to the first data, to a third one of the four other DPUs, and 
direct the second data, without changes to the second data, to a fourth one of the four other DPUs; and
to connect the local systolic array in each module to at least one other local systolic array in another module to form a systolic array, being configured so that data passes between DPUs in different local systolic arrays that are directly coupled together in a same number of clock cycles as data passes between DPUs in the same local systolic array, and include bidirectional horizontal connections that connect the local systolic arrays to form a row of systolic array and unidirectional vertical connections that connect the local systolic arrays to form a column of systolic array.
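For readers less familiar with systolic arrays, the DPU behavior recited in the limitations above can be modeled in a few lines of Python. This is a hypothetical sketch, not the applicant's design or that of any cited reference: it assumes the claimed "operation" is a multiply-accumulate and that the first and second data arrive from the west and north neighbors, respectively. The class name `DPU` and method `step` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class DPU:
    """Hypothetical model of one data processing unit (DPU).

    Assumptions (not from the claim text): the "operation" is a
    multiply-accumulate, the first data comes from the west neighbor,
    and the second data comes from the north neighbor.
    """
    acc: float = 0.0  # internal accumulator

    def step(self, west_in, north_in):
        """One clock cycle: use both operands, then forward them unchanged."""
        if west_in is not None and north_in is not None:
            self.acc += west_in * north_in  # operation using first and second data
        east_out = west_in    # first data goes to a third neighbor, unmodified
        south_out = north_in  # second data goes to a fourth neighbor, unmodified
        return east_out, south_out
```

Calling `DPU().step(2.0, 3.0)` returns `(2.0, 3.0)` and leaves `acc` at `6.0`: both operands leave the DPU unchanged, and only the local accumulator is updated.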
Woo teaches/suggests a system comprising: each comprising a local systolic array of units (e.g. associated with each die (1105) having corresponding systolic array units (1110)); and to connect the local systolic array in each module to at least one other local systolic array in another module to form a systolic array (e.g. associated with the plurality of dies (1105) being connected together), being configured so that data passes between units in different local systolic arrays that are directly coupled together operate accordingly as data passes between units in the same local systolic array (e.g. associated with data passing between systolic array units (1110) on different dies (1105) operating accordingly as data passing between systolic array units (1110) on the same die (1105)), and connect the local systolic arrays to form systolic array and connect the local systolic arrays to form systolic array (e.g. associated with connecting systolic array units (1110)) (Fig 11-12F; and [0058]-[0065]).
Parra Osorio teaches/suggests a system comprising: data processing units (DPUs) (e.g. Fig. 27B; [0292]-[0294]; [0382]), where a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: operating with a first one of the four other DPUs, operating with a second one of the four other DPUs, operating with a third one of the four other DPUs, and operating with a fourth one of the four other DPUs (e.g. Fig. 27B; [0364]-[0366]; [0382]); and data passes between DPUs in a same number of clock cycles as data passes between DPUs (e.g. associated with data being passed between DPUs in different parts of the array architecture of Fig. 27B during a clock cycle: [0131]; [0342]-[0353]; [0361]-[0366]; [0382]) (Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; and [0382]).
Sun teaches/suggests a system comprising: obtain first data from a first one of four others, obtain second data from a second one of the four others, perform an operation using the first data and the second data, direct the first data, without changes to the first data, to a third one of the four others, and direct the second data, without changes to the second data, to a fourth one of the four others (Fig. 1; Fig. 3; [0021]-[0027]; and [0041]).
Lyuh teaches/suggests a system comprising: include bidirectional horizontal connections to form a row of elements and unidirectional vertical connections to form a column of elements (Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; and col. 6, ll. 6-63).
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Woo, Parra Osorio, and Sun’s architecture and Lyuh’s bidirectional connection into MUSLEH’s package for the benefit of improving efficiency and performance while implementing a robust 3D architecture (Woo, [0019]; and [0058]), improving performance by reducing bandwidth usage and power consumption (Parra Osorio, [0372]), implementing faster performing reconfigurable architecture that is energy efficient (Sun, [0015]), and minimizing accesses to memory to efficiently perform calculations (Lyuh, col. 13, ll. 20-30) to obtain the invention as specified in claim 1.

As per claim 3, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, Parra Osorio, and Lyuh further teach/suggest the package comprising: wherein the bidirectional horizontal chip-to-chip connections permit a rightmost IC within a row of the plurality of ICs in the combined systolic array to feed back data to a leftmost IC within the row (Parra Osorio, Fig. 18C; Fig. 27B; [0292]-[0294]; and Woo, Fig 11-12F; [0058]-[0065]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Lyuh, Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; col. 6, ll. 6-63). 

As per claim 4, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, Parra Osorio, and Lyuh further teach/suggest the package comprising: wherein the unidirectional vertical chip-to-chip connections are configured such that data flows only from a topmost row of the plurality of ICs in the combined systolic array to a bottommost row of the plurality of ICs in the combined systolic array (Parra Osorio, Fig. 18C; Fig. 27B; and Woo, Fig 11-12F; [0058]-[0065]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Lyuh, Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; col. 6, ll. 6-63). 
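The link directions recited in claims 3 and 4 can be checked with a small directed-graph model. In this hypothetical sketch (the names `build_links` and `reachable` are illustrative), each IC is a node in a rows × cols grid, horizontal neighbors are linked in both directions, and vertical neighbors are linked only from top to bottom.

```python
from collections import deque


def build_links(rows, cols):
    """Directed chip-to-chip links for a rows x cols grid of ICs.

    Hypothetical model: horizontal neighbor pairs get links in both
    directions (bidirectional); vertical pairs get only a top-to-bottom
    link (unidirectional).
    """
    links = {(r, c): set() for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                links[(r, c)].add((r, c + 1))  # left-to-right
                links[(r, c + 1)].add((r, c))  # right-to-left
            if r + 1 < rows:
                links[(r, c)].add((r + 1, c))  # top-to-bottom only
    return links


def reachable(links, src, dst):
    """Breadth-first search over the directed links."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

With `links = build_links(2, 4)`, `reachable(links, (0, 3), (0, 0))` is `True` (the rightmost IC in a row can feed data back to the leftmost one over the horizontal links), while `reachable(links, (1, 0), (0, 0))` is `False`, because no link carries data upward.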

As per claim 5, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 3 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package further comprising: a plurality of memory chips, wherein at least one of the plurality of memory chips is connected to each one of the plurality of ICs in a topmost row (Parra Osorio, Fig. 18C; Fig. 27B: [0131]; [0342]-[0353]; [0361]-[0366]; [0382]; and Woo, Fig 11-12F; [0058]-[0065]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]). 

As per claim 6, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, Parra Osorio and Sun further teach/suggest the package comprising: wherein the first DPU is further configured to: add a result of the operation to an internal accumulator (e.g. associated with adder (240) of Sun), and after performing two or more operations that are summed in the internal accumulator, directing a value of the internal accumulator to the first one of the four other DPUs (Sun, Fig. 3-5; [0041]-[0044]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Sun, Fig. 1; Fig. 3-5; [0021]-[0027]; [0041]-[0045]).

As per claim 7, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 5 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package comprising: wherein the plurality of memory chips are high-bandwidth memories (HBMs), wherein the HBMs are hardwired to respective columns in the local systolic arrays without any switching element (MUSLEH, [0098]; [0286]; Parra Osorio, Fig. 18C; [0063]; [0119]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]). 

As per claim 8, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 6 above, where MUSLEH, Woo, Parra Osorio and Sun further teach/suggest the package comprising: the value of the internal accumulator is passed through other DPUs to a DPU on an outer edge of the combined array (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Sun, Fig. 1; Fig. 3-5; [0021]-[0027]; [0041]-[0045]).
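Claims 6 and 8 add an internal accumulator whose value is later passed through other DPUs toward an edge of the array. The cycle-level NumPy sketch below illustrates one common way to do this, an output-stationary dataflow; that dataflow is an assumption chosen for illustration, not necessarily the applicant's or Sun's. Operands are injected with a one-cycle skew per row and column so matching values meet in DPU (i, j), each DPU accumulates its products locally, and `drain_west` then shifts the finished accumulator values one DPU per cycle toward the left edge. Both function names are hypothetical.

```python
import numpy as np


def systolic_matmul(a, b):
    """Cycle-level sketch of an output-stationary systolic array computing a @ b.

    Assumed dataflow: DPU (i, j) latches an A value from its west neighbor and
    a B value from its north neighbor, adds their product to an internal
    accumulator, and forwards both operands unchanged east and south.
    """
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    acc = np.zeros((n, m))
    west = np.full((n, m), np.nan)   # operand latched from the west
    north = np.full((n, m), np.nan)  # operand latched from the north
    cycles = k + n + m - 2
    for t in range(cycles):
        # Every DPU takes the operand its neighbor held on the previous cycle.
        new_west = np.full((n, m), np.nan)
        new_north = np.full((n, m), np.nan)
        new_west[:, 1:] = west[:, :-1]
        new_north[1:, :] = north[:-1, :]
        # Skewed edge injection: row i receives a[i, t - i], column j receives b[t - j, j].
        for i in range(n):
            if 0 <= t - i < k:
                new_west[i, 0] = a[i, t - i]
        for j in range(m):
            if 0 <= t - j < k:
                new_north[0, j] = b[t - j, j]
        west, north = new_west, new_north
        valid = ~np.isnan(west) & ~np.isnan(north)
        acc[valid] += west[valid] * north[valid]  # multiply-accumulate
    return acc, cycles


def drain_west(acc):
    """Shift accumulator values one DPU west per cycle; collect them at the left edge."""
    n, m = acc.shape
    regs = acc.astype(float)  # copy of the accumulator registers
    collected = []
    for _ in range(m):
        collected.append(regs[:, 0].copy())  # left-edge DPUs emit their values
        regs[:, :-1] = regs[:, 1:].copy()    # every other DPU passes its value west
        regs[:, -1] = np.nan
    return np.stack(collected, axis=1)
```

For a 2×3 by 3×4 product, `systolic_matmul` takes k + n + m - 2 = 7 cycles and its result matches `a @ b`; draining the accumulators to the left edge then takes one additional cycle per column.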

As per claim 9, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package further comprising: an interposer, wherein the plurality of ICs are disposed in a grid pattern on the interposer, wherein the chip-to-chip connections extend through the interposer (MUSLEH, Fig. 11C; [0175]; Parra Osorio, Fig. 18C; Fig. 24C; Fig. 27B; [0345]-[0350]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]). 

As per claim 10, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package comprising: wherein a fifth DPU in the combined systolic array is configured to perform operations for a first layer of an AI model at the same time that a sixth DPU in the combined systolic array is configured to perform operations for a second layer of the AI model (e.g. associated with parallel computing: [0208]; [0361]-[0366] of Parra Osorio) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0208]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Sun, Fig. 1; Fig. 3-5; [0021]-[0027]; [0041]-[0045]).

As per claim 11, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package comprising: wherein each of the plurality of ICs comprises auxiliary circuitry separate from the local systolic array, wherein the package further comprises: local memory chips coupled to the auxiliary circuitry in each of the plurality of ICs (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]). 

As per claim 14, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 1 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package further comprising: at least one memory chip connected to a topmost IC of the plurality of ICs, wherein the plurality of ICs form a single column (MUSLEH, [0098]; [0286]; Parra Osorio, Fig. 18C) (MUSLEH, Fig. 11B-11C; [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]).

As per claim 26, MUSLEH teaches/suggests a package, comprising: a plurality of integrated circuits (ICs), each comprising component (e.g. associated with a plurality of integrated circuits/dies with corresponding logic component(s): [0170]-[0181]; and [0286]), wherein the plurality of ICs are arranged accordingly, and being connected to form a larger, combined design, wherein the plurality of ICs are connected accordingly (e.g. associated with interconnecting the plurality of integrated circuits/dies connected to form a larger design: [0170]-[0181]; and [0286]) and connected to form chip-to-chip connections that connect to form the combined design and chip-to-chip connections that connect to form the combined design ([0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; and [0286]) ([0098]; [0170]-[0181]; and [0286]) ([0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; and [0286]).
MUSLEH does not teach the package, comprising:
each comprising a local systolic array of data processing units (DPUs), 
being arranged in a grid-like pattern and in one of the local systolic arrays a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: 
obtain first data from a first one of the four other DPUs, 
obtain second data from a second one of the four other DPUs, 
perform an operation using the first data and the second data,
direct the first data, without changes to the first data, to a third one of the four other DPUs, and 
direct the second data, without changes to the second data, to a fourth one of the four other DPUs,
wherein the local systolic arrays are connected to form a systolic array, 
connected so that data passes between DPUs in different local systolic arrays that are directly coupled together in a same number of clock cycles as data passes between DPUs in the same local systolic array and the local systolic arrays are connected to form bidirectional horizontal connections that connect the local systolic arrays to form a row of systolic array and unidirectional vertical connections that connect the local systolic arrays to form a column of systolic array.
Woo teaches/suggests a system comprising: each comprising a local systolic array of units (e.g. associated with each die (1105) having corresponding systolic array units (1110)), and in one of the local systolic arrays operating accordingly, wherein the local systolic arrays are connected to form a systolic array (e.g. associated with the plurality of dies (1105) being connected together), connected so that data passes between units in different local systolic arrays that are directly coupled together operate accordingly as data passes between units in the same local systolic array (e.g. associated with data passing between systolic array units (1110) on different dies (1105) operating accordingly as data passing between systolic array units (1110) on the same die (1105)) and the local systolic arrays are connected to form connections that connect the local systolic arrays to form systolic array and connections that connect the local systolic arrays to form systolic array (e.g. associated with connecting systolic array units (1110)) (Fig 11-12F; and [0058]-[0065]).
Parra Osorio teaches/suggests a system comprising: data processing units (DPUs), in a grid-like pattern (e.g. Fig. 27B; [0292]-[0294]; [0382]), and a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: operating with a first one of the four other DPUs, operating with a second one of the four other DPUs, operating with a third one of the four other DPUs, and operating with a fourth one of the four other DPUs (e.g. Fig. 27B; [0364]-[0366]; [0382]); data passes between DPUs in a same number of clock cycles as data passes between DPUs (e.g. associated with data being passed between DPUs in different parts of the array architecture of Fig. 27B during a clock cycle: [0131]; [0342]-[0353]; [0361]-[0366]; [0382]) (Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; and [0382]).
Sun teaches/suggests a system comprising: obtain first data from a first one of four others, obtain second data from a second one of the four others, perform an operation using the first data and the second data, direct the first data, without changes to the first data, to a third one of the four others, and direct the second data, without changes to the second data, to a fourth one of the four others (Fig. 1; Fig. 3; [0021]-[0027]; and [0041]).
Lyuh teaches/suggests a system comprising: include bidirectional horizontal connections to form a row of elements and unidirectional vertical connections to form a column of elements (Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; and col. 6, ll. 6-63).
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Woo, Parra Osorio, and Sun’s architecture and Lyuh’s bidirectional connection into MUSLEH’s package for the benefit of improving efficiency and performance while implementing a robust 3D architecture (Woo, [0019]; and [0058]), improving performance by reducing bandwidth usage and power consumption (Parra Osorio, [0372]), implementing faster performing reconfigurable architecture that is energy efficient (Sun, [0015]), and minimizing accesses to memory to efficiently perform calculations (Lyuh, col. 13, ll. 20-30) to obtain the invention as specified in claim 26.

As per claims 28-30 and 32-36, claims 28-30 and 32-36 are rejected in accordance with the same rationale and reasoning as the above rejection of claims 3-5 and 7-11.

As per claim 39, MUSLEH teaches/suggests a package, comprising: a plurality of integrated circuits (ICs) each comprising component (e.g. associated with a plurality of integrated circuits/dies with corresponding logic component(s): [0170]-[0182]; and [0286]); and a separate memory device being hardwired accordingly ([0098]; and [0286]) ([0098]; [0170]-[0181]; and [0286]); and chip-to-chip connections configured to connect an element in each of the plurality of ICs to at least one other element in another one of the plurality of ICs to form a larger, combined design (e.g. associated with interconnecting the plurality of integrated circuits/dies to form a larger design: [0170]-[0182]; and [0286]) ([0098]; [0169]-[0182]; and [0286]), wherein the chip-to-chip connections include chip-to-chip connections that connect to form the combined design and chip-to-chip connections that connect to form the combined design ([0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; and [0286]).
MUSLEH does not expressly teach the package, comprising:
each comprising a local systolic array of data processing units (DPUs), the local systolic arrays connected so that data passes between DPUs in different local systolic arrays that are directly coupled together in a same number of clock cycles as data passes between DPUs in the same local systolic array and in one of the local systolic arrays a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: 
obtain first data from a first one of the four other DPUs, 
obtain second data from a second one of the four other DPUs, 
perform an operation using the first data and the second data,
direct the first data, without changes to the first data, to a third one of the four other DPUs, and 
direct the second data, without changes to the second data, to a fourth one of the four other DPUs; 
comprising a plurality of channels, wherein each of the plurality of channels is coupled to respective one or more columns in the systolic array without any switching element; and
connect the local systolic array to other local systolic array to form systolic array, include bidirectional horizontal connections that connect the local systolic arrays to form a row of systolic array and unidirectional vertical connections that connect the local systolic arrays to form a column of systolic array.
Woo teaches/suggests a system comprising: each comprising a local systolic array of units (e.g. associated with each die (1105) having corresponding systolic array units (1110)), the local systolic arrays connected so that data passes between units in different local systolic arrays that are directly coupled together operate accordingly as data passes between units in the same local systolic array and in one of the local systolic arrays operating accordingly (e.g. associated with data passing between systolic array units (1110) on different dies (1105) operating accordingly as data passing between systolic array units (1110) on the same die (1105)); and connect the local systolic array to other local systolic array to form systolic array, include connections that connect the local systolic arrays to form systolic array and connections that connect the local systolic arrays to form systolic array (e.g. associated with connecting systolic array units (1110)) (Fig 11-12F; and [0058]-[0065]).
Parra Osorio teaches/suggests a system comprising: data processing units (DPUs) (e.g. Fig. 27B; [0292]-[0294]; [0382]), data passes between DPUs in a same number of clock cycles as data passes between DPUs (e.g. associated with data being passed between DPUs in different parts of the array architecture of Fig. 27B during a clock cycle: [0131]; [0342]-[0353]; [0361]-[0366]; [0382]) and a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: operating with a first one of the four other DPUs, operating with a second one of the four other DPUs, operating with a third one of the four other DPUs, and operating with a fourth one of the four other DPUs (e.g. Fig. 27B; [0364]-[0366]; [0382]); and comprising a plurality of channels, wherein each of the plurality of channels is coupled to respective one or more columns in the systolic array without any switching element (e.g. associated with direct interconnection between memories (1841A) to (1841N) and each column of PE in Fig. 18C) (Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; and [0382]).
Sun teaches/suggests a system comprising: obtain first data from a first one of four others, obtain second data from a second one of the four others, perform an operation using the first data and the second data, direct the first data, without changes to the first data, to a third one of the four others, and direct the second data, without changes to the second data, to a fourth one of the four others (Fig. 1; Fig. 3; [0021]-[0027]; and [0041]).
Lyuh teaches/suggests a system comprising: include bidirectional horizontal connections to form a row of elements and unidirectional vertical connections to form a column of elements (Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; and col. 6, ll. 6-63).
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include Woo, Parra Osorio, and Sun’s architecture and Lyuh’s bidirectional connection into MUSLEH’s package for the benefit of improving efficiency and performance while implementing a robust 3D architecture (Woo, [0019]; and [0058]), improving performance by reducing bandwidth usage and power consumption (Parra Osorio, [0372]), implementing faster performing reconfigurable architecture that is energy efficient (Sun, [0015]), and minimizing accesses to memory to efficiently perform calculations (Lyuh, col. 13, ll. 20-30) to obtain the invention as specified in claim 39.

As per claim 41, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 39 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package comprising: wherein the memory device is a high-bandwidth memory (HBM) (MUSLEH, [0098]; [0286]; and Parra Osorio, Fig. 18C; [0063]; [0119]) (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0350]; [0361]-[0366]; [0371]; [0382]).

As per claim 42, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 39 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package further comprising: a plurality of memory devices coupled to the IC, wherein each of the plurality of memory devices comprises a plurality of channels, wherein each of the plurality of channels is hardwired to respective one or more columns in the systolic array without any switching element (MUSLEH, [0098]; [0286]; and Parra Osorio, Fig. 18C; [0063]; [0119]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0346]; [0349]; [0361]-[0366]; [0371]; [0382]).
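Claims 7 and 42 recite memory channels hardwired to respective columns without any switching element. A fixed, design-time mapping captures the idea: each column has exactly one owning channel, so no runtime arbitration or switch is modeled. The even division of columns among channels is an assumption of this sketch, and `hardwired_channel_map` is a hypothetical name.

```python
def hardwired_channel_map(num_channels, num_columns):
    """Fixed channel -> columns assignment (hypothetical layout).

    Each column is tied to exactly one channel at design time, so the model
    contains no switching element and no runtime arbitration.
    """
    if num_columns % num_channels:
        raise ValueError("sketch assumes columns divide evenly among channels")
    per_channel = num_columns // num_channels
    return {
        ch: list(range(ch * per_channel, (ch + 1) * per_channel))
        for ch in range(num_channels)
    }
```

For example, `hardwired_channel_map(4, 8)` returns `{0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}`.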

Claims 31 and 40 are rejected under 35 U.S.C. 103 as being unpatentable over MUSLEH et al. (US Pub.: 2021/0092069) in view of Woo et al. (US Pub.: 2022/0335283), Parra Osorio et al. (US Pub.: 2024/0168723), Sun et al. (US Pub.: 2024/0028869), and Lyuh et al. (US Patent 11,507,429) as applied to claims 30 and 39 above, and further in view of ZHANG et al. (US Pub.: 2024/0004830).

As per claim 31, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 30 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package comprising: wherein the plurality of memory chips are configured to operate for performing a matrix multiplication in the systolic array for an artificial intelligence (AI) model (Parra Osorio, [0361]-[0366]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]), but MUSLEH, Woo, Parra Osorio, Sun and Lyuh do not expressly teach the package comprising: configured to store weight data.
ZHANG teaches/suggests a system comprising: configured to store weight data (e.g. associated with weight buffer (221) and (222) in Fig. 3D) (Fig. 3C to Fig. 3D; [0045]-[0058]; and [0062]-[0063]). 
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include ZHANG’s architecture into MUSLEH, Woo, Parra Osorio, Sun and Lyuh’s package for the benefit of using space efficiently while avoiding congestion and facilitating data flow (ZHANG, [0055]) to obtain the invention as specified in claim 31.

As per claim 40, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 39 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package comprising: wherein the memory device is configured to operate for performing a matrix multiplication in the systolic array for an artificial intelligence (AI) model (Parra Osorio, [0361]-[0366]) (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0350]; [0361]-[0366]; [0371]; [0382]), but MUSLEH, Woo, Parra Osorio, Sun and Lyuh do not expressly teach the package comprising: wherein the memory device is configured to store weight data. 
ZHANG teaches/suggests a system comprising: wherein the memory device is configured to store weight data (e.g. associated with weight buffer (221) and (222) in Fig. 3D) (Fig. 3C to Fig. 3D; [0045]-[0058]; and [0062]-[0063]). 
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to include ZHANG’s architecture into MUSLEH, Woo, Parra Osorio, Sun and Lyuh’s accelerator for the benefit of using space efficiently while avoiding congestion and facilitating data flow (ZHANG, [0055]) to obtain the invention as specified in claim 40.

Claims 12-13 and 37-38 are rejected under 35 U.S.C. 103 as being unpatentable over MUSLEH et al. (US Pub.: 2021/0092069) in view of Woo et al. (US Pub.: 2022/0335283), Parra Osorio et al. (US Pub.: 2024/0168723), Sun et al. (US Pub.: 2024/0028869), and Lyuh et al. (US Patent 11,507,429) as applied to claims 11 and 36 above, and further in view of Wang et al. (US Pub.: 2022/0108688).

As per claim 12, MUSLEH, Woo, Parra Osorio, Sun and Lyuh teach/suggest all the claimed features of claim 11 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the package comprising: wherein the auxiliary circuitry is configured to perform operations that use data that is stored in the local memory chips, wherein the operations are part of an AI model (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]), but MUSLEH, Woo, Parra Osorio, Sun and Lyuh do not teach the package comprising: self-attention operations that use data from previous tokens, wherein self-attention operations operate accordingly.
Wang teaches/suggests a system comprising: self-attention operations that use data from previous tokens, wherein self-attention operations operate accordingly (Claim 1; and [0031]).
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to incorporate Wang's self-attention operations into MUSLEH, Woo, Parra Osorio, Sun and Lyuh's package for the benefit of improving model generation (Wang, [0021]) to obtain the invention as specified in claim 12.

As per claim 13, MUSLEH, Woo, Parra Osorio, Sun, Lyuh, and Wang teach/suggest all the claimed features of claim 12 above, where MUSLEH, Woo, Parra Osorio, and Wang further teach/suggest the package comprising: wherein the local systolic arrays do not communicate with the local memory chips (e.g. associated with data security that secures data in memory from being accessed: MUSLEH, [0185]; [0207]; and Parra Osorio, [0215]; [0356]) (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Wang, Claim 1; [0031]).

As per claims 37-38, claims 37-38 are rejected in accordance with the same rationale and reasoning as the above rejection of claims 12-13.

Claims 15, 17-23 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over MUSLEH et al. (US Pub.: 2021/0092069) in view of Woo et al. (US Pub.: 2022/0335283), Parra Osorio et al. (US Pub.: 2024/0168723), ZHANG et al. (US Pub.: 2024/0004830), Sun et al. (US Pub.: 2024/0028869), and Lyuh et al. (US Patent 11,507,429).

As per claim 15, MUSLEH teaches/suggests an AI accelerator, comprising: a plurality of integrated circuits (ICs), each comprising a component (e.g. associated with a plurality of integrated circuits/dies with corresponding logic component(s): [0170]-[0181]; and [0286]); chip-to-chip connections configured to connect to form a larger, combined design, wherein the chip-to-chip connections and the plurality of ICs are configured to operate accordingly (e.g. associated with interconnecting the plurality of integrated circuits/dies to form a larger design: [0170]-[0181]; and [0286]) and the chip-to-chip connections include chip-to-chip connections that connect to form the combined design and chip-to-chip connections that connect to form the combined design ([0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; and [0286]); and a plurality of memory chips configured to store data in the combined design, data stored in one of the plurality of memory chips and the plurality of memory chips coupled to the plurality of ICs of the combined design ([0098]; [0170]-[0181]; and [0286]) ([0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; and [0286]).
MUSLEH does not teach the AI accelerator comprising:
each comprising a local systolic array of data processing units (DPUs), where a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: 
obtain first data from a first one of the four other DPUs, 
obtain second data from a second one of the four other DPUs, 
perform an operation using the first data and the second data, 
direct the first data, without changes to the first data, to a third one of the four other DPUs, and 
direct the second data, without changes to the second data, to a fourth one of the four other DPUs;
to connect the local systolic arrays to form a systolic array, configured so that data passes between DPUs in different local systolic arrays that are directly coupled together in a same number of clock cycles as data passes between DPUs in the same local systolic array and include bidirectional horizontal connections that connect the local systolic arrays to form a row of the systolic array and unidirectional vertical connections that connect the local systolic arrays to form a column of the systolic array; and
to store weights for performing matrix multiplications in the systolic array as part of an AI model, the first data including a weight stored and forming a top row of the systolic array.
Woo teaches/suggests a system comprising: each comprising a local systolic array of units (e.g. associated with each die (1105) having corresponding systolic array units (1110)); to connect the local systolic arrays to form a systolic array (e.g. associated with the plurality of dies (1105) being connected together), configured so that data passes between units in different local systolic arrays that are directly coupled together operate accordingly as data passes between units in the same local systolic array (e.g. associated with data passing between systolic array units (1110) on different dies (1105) operating accordingly as data passing between systolic array units (1110) on the same die (1105)) and connect the local systolic arrays to form systolic array and connect the local systolic arrays to form systolic array (e.g. associated with connecting systolic array units (1110)); and operating with the systolic array, operating with the systolic array (e.g. Fig. 11) (Fig. 11-12F; and [0058]-[0065]).
Parra Osorio teaches/suggests a system comprising: data processing units (DPUs) (e.g. Fig. 27B; [0292]-[0294]; [0382]), where a first DPU of the DPUs is coupled to four other DPUs and the first DPU is configured to: operating with a first one of the four other DPUs, operating with a second one of the four other DPUs, operating with a third one of the four other DPUs, and operating with a fourth one of the four other DPUs (e.g. Fig. 27B; [0364]-[0366]; [0382]); and data passes between DPUs in a same number of clock cycles as data passes between DPUs (e.g. associated with, during a clock cycle, data being passed between DPUs in different parts of the array architecture of Fig. 27B: [0131]; [0342]-[0353]; [0361]-[0366]; [0382]); and weights for performing matrix multiplications in the systolic array as part of an AI model ([0172]; [0195]; [0292]; and [0361]-[0366]) (Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; and [0382]).
ZHANG teaches/suggests a system comprising: to store weights (e.g. associated with weight buffers (221) and (222) in Fig. 3D), forming a top row of the systolic array (e.g. associated with the weight buffer being positioned at the top row) (Fig. 3C to Fig. 3D; [0045]-[0058]; and [0062]-[0063]).
Sun teaches/suggests a system comprising: obtain first data from a first one of four others, obtain second data from a second one of the four others, perform an operation using the first data and the second data, direct the first data, without changes to the first data, to a third one of the four others, and direct the second data, without changes to the second data, to a fourth one of the four others; and the first data including a weight (e.g. associated with weight (206) in Fig. 3) (Fig. 1; Fig. 3; [0021]-[0027]; and [0041]).
Lyuh teaches/suggests a system comprising: include bidirectional horizontal connections to form a row of elements and unidirectional vertical connections to form a column of elements (Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; and col. 6, ll. 6-63).
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to incorporate the architectures of Woo, Parra Osorio, ZHANG and Sun and the bidirectional connections of Lyuh into MUSLEH's accelerator for the benefit of improving efficiency and performance while implementing a robust 3D architecture (Woo, [0019]; and [0058]), improving performance by reducing bandwidth usage and power consumption (Parra Osorio, [0372]), using space efficiently while avoiding congestion and facilitating data flow (ZHANG, [0055]), implementing a faster-performing reconfigurable architecture that is energy efficient (Sun, [0015]), and minimizing accesses to memory to efficiently perform calculations (Lyuh, col. 13, ll. 20-30) to obtain the invention as specified in claim 15.
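The DPU dataflow recited in claim 15 (a weight entering at the top row and moving downward unchanged, a second operand moving rightward unchanged, and each DPU operating on both) can be illustrated with a small cycle-level simulation. This is a hypothetical sketch under stated assumptions, not taken from the application or from any cited reference: it assumes an output-stationary array, skewed edge inputs, and one DPU hop per clock cycle everywhere, so a hop across a chip boundary costs the same cycle as a hop inside a chip. The name `systolic_matmul` is illustrative only, and the bidirectional horizontal feedback of claim 17 is not modeled.

```python
from typing import List

Matrix = List[List[float]]


def systolic_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical cycle-level sketch of an output-stationary systolic array computing a @ b.

    DPU (i, j) takes a weight from the DPU above and an activation from the DPU on its
    left, multiplies and accumulates, then forwards both values unchanged on the next cycle.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    acc = [[0.0] * n for _ in range(m)]
    # Pipeline registers: the value each DPU forwarded at the end of the previous cycle.
    right = [[None] * n for _ in range(m)]  # activations moving rightward
    down = [[None] * n for _ in range(m)]   # weights moving downward only (unidirectional)
    for t in range(m + n + k - 2):  # enough cycles for the skewed inputs to drain
        next_right = [[None] * n for _ in range(m)]
        next_down = [[None] * n for _ in range(m)]
        for i in range(m):
            for j in range(n):
                # First data: a weight, fed into the top row with a skew of j cycles.
                if i == 0:
                    w = b[t - j][j] if 0 <= t - j < k else None
                else:
                    w = down[i - 1][j]
                # Second data: an activation, fed into the left column with a skew of i cycles.
                if j == 0:
                    x = a[i][t - i] if 0 <= t - i < k else None
                else:
                    x = right[i][j - 1]
                if w is not None and x is not None:
                    acc[i][j] += w * x  # the operation performed using both operands
                next_down[i][j] = w   # forward the weight unchanged to the DPU below
                next_right[i][j] = x  # forward the activation unchanged to the DPU on the right
        right, down = next_right, next_down
    return acc
```

For example, `systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19.0, 22.0], [43.0, 50.0]]`. The input skew ensures that the weight and activation meeting at DPU (i, j) on cycle t share the same inner index k = t - i - j.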

As per claim 17, MUSLEH, Woo, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 15 above, where MUSLEH, Woo, Parra Osorio and Lyuh further teach/suggest the AI accelerator comprising wherein the bidirectional horizontal chip-to-chip connections permit a rightmost IC within a row of the plurality of ICs in the combined systolic array to feed back data to a leftmost IC within the row (Parra Osorio, Fig. 18C; Fig. 27B; [0292]-[0294]; and Woo, Fig. 11-12F; [0058]-[0065]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Lyuh, Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; col. 6, ll. 6-63).

As per claim 18, MUSLEH, Woo, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 15 above, where MUSLEH, Woo, Parra Osorio and Lyuh further teach/suggest the AI accelerator comprising wherein the unidirectional vertical chip-to-chip connections are configured such that data flows only from a topmost row of the plurality of ICs in the combined systolic array to a bottommost row of the plurality of ICs in the combined systolic array (Parra Osorio, Fig. 18C; Fig. 27B; and Woo, Fig. 11-12F; [0058]-[0065]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]; and Lyuh, Fig. 3; col. 2, ll. 50-63; col. 4, ll. 35-60; col. 6, ll. 6-63).

As per claim 19, MUSLEH, Woo, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 15 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the AI accelerator comprising wherein the plurality of memory chips are high-bandwidth memories (HBMs), wherein the HBMs are hardwired to respective columns in the local systolic arrays without any switching element (MUSLEH, [0098]; [0286]; Parra Osorio, Fig. 18C; [0063]; [0119]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0349]; [0361]-[0366]; [0371]; [0382]).

As per claim 20, MUSLEH, Woo, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 19 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the AI accelerator comprising wherein multiple HBMs are hardwired to each of the plurality of ICs in the top row (MUSLEH, [0098]; [0286]; Parra Osorio, Fig. 18C; [0063]; [0119]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0346]; [0349]; [0361]-[0366]; [0371]; [0382]).

As per claim 21, MUSLEH, Woo, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 15 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the AI accelerator further comprising: an interposer, wherein the plurality of ICs are disposed in a grid pattern on the interposer, wherein the chip-to-chip connections extend through the interposer (MUSLEH, Fig. 11C; [0175]; Parra Osorio, Fig. 18C; Fig. 24C; Fig. 27B; [0345]-[0350]) (MUSLEH, [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]).

As per claim 22, MUSLEH, Woo, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 15 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the AI accelerator comprising: wherein the plurality of ICs are stacked on each other, wherein the chip-to-chip connections are formed using microbumps or pillars connecting the plurality of ICs (MUSLEH, Fig. 11B-11C; [0172]-[0177]; Parra Osorio, Fig. 24B-24C; [0342]-[0350]; and Woo, Fig. 11-12F; [0058]-[0065]) (MUSLEH, Fig. 11B-11C; [0035]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]).

As per claim 23, MUSLEH, Woo, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 15 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the AI accelerator comprising: wherein each of the plurality of ICs comprises auxiliary circuitry separate from the local systolic array, wherein the AI accelerator further comprises: local memory chips coupled to the auxiliary circuitry in each of the plurality of ICs (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]).

As per claim 25, MUSLEH, Woo, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 15 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the AI accelerator comprising: wherein the local systolic arrays do not communicate with the local memory chips (e.g. associated with data security that secures data in memory from being accessed: MUSLEH, [0185]; [0207]; and Parra Osorio, [0215]; [0356]) (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 24A-24D; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0131]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0353]; [0361]-[0366]; [0371]; [0382]).

Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over MUSLEH et al. (US Pub.: 2021/0092069) in view of Woo et al. (US Pub.: 2022/0335283), Parra Osorio et al. (US Pub.: 2024/0168723), ZHANG et al. (US Pub.: 2024/0004830), Sun et al. (US Pub.: 2024/0028869) and Lyuh et al. (US Patent 11,507,429) as applied to claim 23 above, and further in view of Wang et al. (US Pub.: 2022/0108688).

As per claim 24, MUSLEH, Woo, Parra Osorio, ZHANG, Sun and Lyuh teach/suggest all the claimed features of claim 23 above, where MUSLEH, Woo, and Parra Osorio further teach/suggest the AI accelerator comprising: wherein the auxiliary circuitry is configured to perform operations that use data that is stored in the local memory chips, wherein the operations are part of the AI model (MUSLEH, Fig. 1; Fig. 11B-11C; [0035]-[0036]; [0057]; [0069]; [0098]; [0128]; [0169]-[0183]; [0251]; [0286]; Woo, Fig. 11-12F; [0058]-[0065]; and Parra Osorio, Fig. 18C; Fig. 27B; Abstract; [0045]-[0046]; [0052]-[0054]; [0063]; [0119]; [0172]; [0195]; [0233]; [0292]-[0294]; [0298]; [0342]-[0350]; [0361]-[0366]; [0371]; [0382]), but MUSLEH, Woo, Parra Osorio, ZHANG, Sun and Lyuh do not teach the AI accelerator comprising: self-attention operations that use data from previous tokens, wherein self-attention operations operate accordingly.
Wang teaches/suggests a system comprising: self-attention operations that use data from previous tokens, wherein self-attention operations operate accordingly (Claim 1; and [0031]).
It would have been obvious for one of ordinary skill in this art, before the effective filing date of the claimed invention, to incorporate Wang's self-attention operations into MUSLEH, Woo, Parra Osorio, ZHANG, Sun and Lyuh's accelerator for the benefit of improving model generation (Wang, [0021]) to obtain the invention as specified in claim 24.

II. CLOSING COMMENTS
CONCLUSION
STATUS OF CLAIMS IN THE APPLICATION
The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. § 707.07(i):

CLAIMS REJECTED IN THE APPLICATION
Applicant's amendment necessitated the new ground(s) of reject
Prosecution Timeline

May 10, 2023: Application Filed
Nov 16, 2024: Non-Final Rejection (§103)
Feb 21, 2025: Response Filed
Mar 12, 2025: Final Rejection (§103)
Jun 17, 2025: Request for Continued Examination
Jun 20, 2025: Response after Non-Final Action
Jul 07, 2025: Non-Final Rejection (§103)
Sep 29, 2025: Response Filed
Oct 21, 2025: Final Rejection (§103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/476,422 (Patent 12602270): KV-CACHE STREAMING FOR IMPROVED PERFORMANCE AND FAULT TOLERANCE IN GENERATIVE MODEL SERVING. Granted Apr 14, 2026 (2y 5m to grant).
18/647,048 (Patent 12596659): METHODS, DEVICES AND SYSTEMS FOR HIGH SPEED TRANSACTIONS WITH NONVOLATILE MEMORY ON A DOUBLE DATA RATE MEMORY BUS. Granted Apr 07, 2026 (2y 5m to grant).
18/457,842 (Patent 12579080): OUTPUT METHOD AND DEVICE. Granted Mar 17, 2026 (2y 5m to grant).
18/685,110 (Patent 12579089): DATA PROCESSING METHOD, APPARATUS AND SYSTEM BASED ON PARA-VIRTUALIZATION DEVICE. Granted Mar 17, 2026 (2y 5m to grant).
18/230,744 (Patent 12554540): EVENT PROCESSING BY HARDWARE ACCELERATOR. Granted Feb 17, 2026 (2y 5m to grant).
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 68%
With Interview (+3.1%): 71%
Median Time to Grant: 3y 4m
PTA Risk: High
Based on 669 resolved cases by this examiner. Grant probability derived from career allow rate.