Last updated: April 17, 2026

Application No. 17/484,200

MATRIX OPERATION WITH MULTIPLE TILES PER MATRIX DIMENSION

Final Rejection §101 §103 §112

Filed: Sep 24, 2021
Examiner: RIVERA, MARIA DE JESUS
Art Unit: 2151
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Intel Corporation
OA Round: 2 (Final)

Interview Optional

+35.1% interview lift. This examiner has a relatively high allow rate; a written response may suffice.

Based on 15 resolved cases, 2023–2026

Examiner Intelligence

RIVERA, MARIA DE JESUS

Grants 67% — above average

Career Allow Rate

10 granted / 15 resolved

+11.7% vs TC avg

Strong +35% interview lift

+35.1%

Interview Lift

resolved cases with interview

Typical timeline

4y 4m

Avg Prosecution

31 currently pending

Career history

Total Applications

across all art units

Statute-Specific Performance

§101: 13.0% (-27.0% vs TC avg)
§103: 36.0% (-4.0% vs TC avg)
§102: 17.8% (-22.2% vs TC avg)
§112: 30.5% (-9.5% vs TC avg)

Tech Center average is an estimate. Based on career data from 15 resolved cases.

Office Action

§101 §103 §112

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Action is FINAL and is in response to the amendment filed May 29th, 2025. Claims 1-3, 5-10, 12-17, and 19-21 are pending, of which claims 1-3, 5-10, 12-17, and 19-21 are currently rejected. Claims 4, 11, and 18 have been canceled by Applicants.

Response to Arguments
The amendment filed May 29th, 2025 has been entered. Claims 1-3, 5-10, 12-17, and 19-21 remain pending in the application. Applicant’s amendments to the Claims have overcome each and every objection and 112(b) rejection previously set forth in the Non-Final Office Action mailed March 10th, 2025.

Claim Objections
Applicants have amended the claims. Therefore, the previous objections to the Claims have been withdrawn.

Claim Rejections – 35 USC 101
The previous rejections of claims 1-21 under 35 U.S.C. 101 have been withdrawn due to the amendment of the claims.

Claim Rejections – 35 USC 112
Applicants have amended independent claim 8 and canceled claim 11, resolving the lack of clarity and antecedent basis issues. Therefore, the previous rejection of claims 8-14 under 35 U.S.C. 112(b) has been withdrawn.

Prior Art Rejections
Applicant’s arguments regarding the previously cited art have been fully considered and are persuasive. With regard to claims 1-3, 5-10, 12-17, and 19-21, new grounds of rejection necessitated by the amendments have been made by Examiner. See Claim Rejections - 35 USC § 103.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-8, 12-15, and 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over Gradstein et al. (US 2020/0201932 A1) included in the IDS filed on 03/23/2023 (hereinafter “Gradstein”), further in view of He et al. (US 2021/0089304 A1) (hereinafter “He”).
Regarding claim 1, Gradstein teaches:
An apparatus, comprising:
A systolic array circuit (Gradstein: Fig. 6 shows a systolic array arrangement for matrix multiplication); and
circuitry coupled to the systolic array circuit, the circuitry to receive a single request comprising (Gradstein: ¶ 0064 single request used to define which tiles will be used for matrix operation):
one or more first source fields which are to provide, for each source tile of a first one or more source tiles, a different respective identifier of a location of the source tile (Gradstein: Fig. 29 Element 2902; ¶ 0229 discusses the execution of an instruction by the processor, and the instruction having various fields, including a field that identifies a first input two dimensional matrix i.e., first source tile; ¶ 0306 discusses the location of operands i.e., source tiles being specified);
one or more second source fields which are to provide, for each source tile of a second one or more source tiles, a different respective identifier of a location of the source tile (Gradstein: Fig. 29 Element 2902; ¶ 0229 discusses the execution of an instruction by the processor, and the instruction having various fields, including a field that identifies a second input two dimensional matrix i.e., second source tile; ¶ 0306 discusses the location of operands i.e., source tiles being specified);
one or more destination fields which are to provide, for each result tile of one or more result tiles, a different respective identifier of a location of the result tile (Gradstein: Fig. 29 Element 2902, ¶ 0229 discusses the execution of an instruction by the processor, and the instruction having various fields, including a third field that identifies resultant storage i.e., result tile; ¶ 0306 location of operands); and
a first field to provide an opcode which is to indicate that one or more full matrix operations are to be performed (Gradstein: ¶ 0297 field for the opcode determining the operation to be performed; ¶ 0183 full matrix operations carried out by matrix processor);
the circuitry further to execute the single request, comprising the circuitry to (Gradstein: ¶ 0064 single request used to define which tiles will be used for matrix operation; ¶ 0237 execution circuitry to execute the single instruction):
detect a condition wherein:
the one or more first source fields indicate a first plurality of source tiles (Gradstein: ¶ 0229 first plurality of registers i.e., first plurality of source tiles further explained in ¶ 0306, ¶ 0055-56 rows i.e., submatrices are put into registers and operations are applied with respect to the registers i.e., tiles; Fig. 1A; ¶ 0058 all elements stored in memory are stored in tiles of a tile matrix);
the one or more second source fields indicate a second plurality of source tiles (Gradstein: ¶ 0229 second plurality of registers i.e., second plurality of source tiles further explained in ¶ 0306; ¶ 0055-56 rows i.e., submatrices are put into registers and operations are applied with respect to the registers i.e., tiles; Fig. 1A; ¶ 0058 all elements stored in memory are stored in tiles of a tile matrix); and
the one or more first destination fields indicate a plurality of result tiles (Gradstein: Fig. 29 Element 2902; ¶ 0229 discusses the execution of an instruction by the processor, and the instruction having various fields, including a third field that identifies resultant storage i.e., result tiles; ¶ 0306 location of operands in storage; ¶ 0058 all elements stored in memory are stored in tiles of a tile matrix); and
based on the condition:
cause the systolic array circuit to perform a first full matrix operation to generate the first result (Gradstein: ¶ 0121 matrix operations including a first matrix operation being carried out across tiles);
cause the systolic array circuit to perform a second full matrix operation (Gradstein: ¶ 0121 matrix operations including a second matrix operation being carried out across tiles);
store the first result tile and the second result tile each to a respective location identified by the one or more destination fields (Gradstein: Fig. 29 Element 2902; ¶ 0229 discusses the execution of an instruction by the processor, and the instruction having various fields, including a field for identifying the location for result storage i.e., result tiles; ¶ 0306 location of operands; ¶ 0058 all elements stored in memory are stored in tiles of a tile matrix).
While Gradstein does teach a systolic array (Gradstein: Fig. 6), a single instruction (Gradstein: ¶ 0064, ¶ 0237) for full matrix operations of sub-matrices i.e., tiles (Gradstein: ¶ 0121 matrix operations being carried out across tiles; Fig. 29 Element 2900; ¶ 0229 discusses the execution of an instruction by the processor, and the instruction having various fields, including a third field that identifies resultant storage i.e., result tiles), Gradstein does not explicitly teach the specifics of a first, second, third, and fourth source tile and a first and second result tile for the various matrix operations.
However, He teaches:
a first plurality of source tiles which comprise a first source tile and a second source tile (He: ¶ 0020 describes the submatrices i.e., first source tile A1 and second source tile A2 for matrix multiplication operations; Fig. 3; ¶ 0036 matrix operations are carried out in an array-based configuration);
a second plurality of source tiles which comprise a third source tile and a fourth source tile (He: ¶ 0020 describes the submatrices i.e., third source tile B1 and fourth source tile B2 for matrix multiplication operations; Fig. 3; ¶ 0036 matrix operations are carried out in an array-based configuration);
a plurality of result tiles which comprise a first result tile and a second result tile (He: Fig. 3 output tiles 325 and 326 of corresponding first and second full matrix operations); and
perform a first full matrix operation on the first source tile and the third source tile to generate the first result tile (He: ¶ 0028 multiply accumulate operations carried out with each of the specified submatrices; Fig. 3 first operation with submatrices tiles A1 and B1 i.e., first and third source tile respectively; ¶ 0036 matrix operations are carried out in an array-based configuration);
perform a second full matrix operation on the second source tile and the fourth source tile to generate the second result tile (He: ¶ 0028 multiply accumulate operations carried out with each of the specified submatrices; Fig. 3 second operation with submatrices tiles A2 and B2 i.e., second and fourth source tile respectively; ¶ 0036 matrix operations are carried out in an array-based configuration); and
store the first result tile and the second result tile each to a respective location (He: Fig. 3 first result tile 325 and second result tile 326).
It would be obvious to combine the respective source tile specification for each of the matrix operations as taught by He with the systolic array and single instruction with respective fields as taught by Gradstein as both teachings are directed towards matrix multiplication operations. The improvement of He lies in reducing power consumption per unit area in high-performance processing units during matrix multiplication of first and second matrices, while increasing the reuse of data and therefore reducing bandwidth consumption in a processing unit (He: ¶ 0011).
Gradstein in view of He therefore teaches:
An apparatus, comprising:
A systolic array circuit; and
circuitry coupled to the systolic array circuit, the circuitry to receive a single request comprising:
one or more first source fields which are to provide, for each source tile of a first one or more source tiles, a different respective identifier of a location of the source tile;
one or more second source fields which are to provide, for each source tile of a second one or more source tiles, a different respective identifier of a location of the source tile;
one or more destination fields which are to provide, for each result tile of one or more result tiles, a different respective identifier of a location of the result tile; and
a first field to provide an opcode which is to indicate that one or more full matrix operations are to be performed;
the circuitry further to execute the single request, comprising the circuitry to:
detect a condition wherein:
the one or more first source fields indicate a first plurality of source tiles which comprise a first source tile and a second source tile;
the one or more second source fields indicate a second plurality of source tiles which comprise a third source tile and a fourth source tile; and
the one or more first destination fields indicate a plurality of result tiles which comprise a first result tile and a second result tile; and
based on the condition:
cause the systolic array circuit to perform a first full matrix operation on the first source tile and the third source tile to generate the first result tile;
cause the systolic array circuit to further perform a second full matrix operation on the second source tile and the fourth source tile to generate the second result tile; and
store the first result tile and the second result tile each to a respective location identified by the one or more destination fields.
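The combined limitations above can be pictured with a small software model. The following is only a minimal sketch under stated assumptions, not the claimed circuitry or either reference's implementation: the `TileRequest` class, the `"MATMUL"` opcode string, the `execute` function, and the location names (`A1`, `B1`, `C1`, and so on) are all hypothetical, and a plain Python matrix multiply stands in for the systolic array.

```python
from dataclasses import dataclass
from typing import Dict, List

Tile = List[List[int]]  # a tile modeled as a small row-major matrix


@dataclass
class TileRequest:
    """Hypothetical model of a 'single request'; every name here is illustrative."""
    opcode: str       # indicates that full matrix operations are to be performed
    src1: List[str]   # locations of the first plurality of source tiles
    src2: List[str]   # locations of the second plurality of source tiles
    dst: List[str]    # locations of the result tiles


def matmul(a: Tile, b: Tile) -> Tile:
    """Full matrix multiply of two tiles (stand-in for the systolic array)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def execute(req: TileRequest, storage: Dict[str, Tile]) -> bool:
    """Execute one request; return True if the multi-tile condition was detected."""
    if req.opcode != "MATMUL":
        raise ValueError(f"unsupported opcode: {req.opcode}")
    if not (len(req.src1) == len(req.src2) == len(req.dst)):
        raise ValueError("this sketch assumes equal tile counts per field group")
    # The 'condition': each field group identifies more than one tile.
    multi_tile = len(req.src1) > 1
    # Pair tiles by position: (first, third) -> first result, (second, fourth) -> second result.
    for a_loc, b_loc, d_loc in zip(req.src1, req.src2, req.dst):
        storage[d_loc] = matmul(storage[a_loc], storage[b_loc])
    return multi_tile


# Usage: one request names two tiles per source and two result locations.
storage = {
    "A1": [[1, 2], [3, 4]], "A2": [[1, 0], [0, 1]],
    "B1": [[5, 6], [7, 8]], "B2": [[2, 0], [0, 2]],
}
detected = execute(TileRequest("MATMUL", ["A1", "A2"], ["B1", "B2"], ["C1", "C2"]), storage)
```

In this model the per-tile identifiers in each field group play the role of the "different respective identifier of a location" recited for each source and result tile.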
Regarding claim 5, Gradstein teaches:
The apparatus of claim 1, wherein:
the first plurality of source tiles (Gradstein: ¶ 0229 first plurality of registers i.e., first plurality of source tiles further explained in ¶ 0306; ¶ 0055-56 rows i.e., submatrices are put into registers and operations are applied with respect to the registers i.e., tiles; Fig. 1A; ¶ 0058 all elements stored in memory are stored in tiles of a tile matrix)
the second plurality of source tiles (Gradstein: ¶ 0229 second plurality of registers i.e., second plurality of source tiles further explained in ¶ 0306; ¶ 0055-56 rows i.e., submatrices are put into registers and operations are applied with respect to the registers i.e., tiles; Fig. 1A; ¶ 0058 all elements stored in memory are stored in tiles of a tile matrix)
the plurality of result tiles (Gradstein: Fig. 29 Element 2902; ¶ 0229 discusses the execution of an instruction by the processor, and the instruction having various fields, including a field for identifying the location for result storage i.e., result tiles; ¶ 0306 location of operands; ¶ 0058 all elements stored in memory are stored in tiles of a tile matrix)
based on the condition, the circuitry is further to:
cause the systolic array circuit to perform a third full matrix operation (Gradstein: ¶ 0121 matrix operations including a third matrix operation being carried out across tiles); and
cause the systolic array circuit to perform a fourth full matrix operation (Gradstein: ¶ 0121 matrix operations including a fourth matrix operation being carried out across tiles); and
store the third result tile and the fourth result tile each to a respective location indicated by the one or more destination fields (Gradstein: Fig. 29 Element 2902; ¶ 0229 discusses the execution of an instruction by the processor, and the instruction having various fields, including a field for identifying the location for result storage i.e., result tiles; ¶ 0306 location of operands; ¶ 0058 all elements stored in memory are stored in tiles of a tile matrix).
While Gradstein does teach a systolic array (Gradstein: Fig. 6) for full matrix operations of sub-matrices i.e., tiles (Gradstein: ¶ 0121 matrix operations being carried out across tiles; Fig. 29 Element 2902; ¶ 0229 discusses the execution of an instruction by the processor, and the instruction having various fields, including a third field that identifies resultant storage i.e., result tiles), Gradstein does not explicitly teach the specifics of a fifth, sixth, seventh, and eighth source tile and a third and fourth result tile for the various matrix operations.
However, He teaches:
the first plurality of source tiles further comprises a fifth source tile and a sixth source tile (He: ¶ 0020 describes the submatrices i.e., fifth source tile A3 and sixth source tile A4 for matrix multiplication operations; Fig. 3; ¶ 0036 matrix operations are carried out in an array-based configuration);
the second plurality of source tiles further comprise a seventh source tile and an eighth source tile (He: ¶ 0020 describes the submatrices i.e., seventh source tile B3 and eighth source tile B4 for matrix multiplication operations; Fig. 3; ¶ 0036 matrix operations are carried out in an array-based configuration); 
the plurality of result tiles further comprises a third result tile and a fourth result tile (He: Fig. 3 output tiles 327 and 328 of corresponding third and fourth full matrix operations); and
perform a third full matrix operation on the fifth source tile and the seventh source tile to generate a third result tile (He: ¶ 0028 multiply accumulate operations carried out with each of the specified submatrices; Fig. 3 third operation with submatrices tiles A3 and B3 i.e., fifth and seventh source tile respectively; ¶ 0036 matrix operations are carried out in an array-based configuration); and
perform a fourth full matrix operation on the sixth source tile and the eighth source tile to generate the fourth result tile (He: ¶ 0028 multiply accumulate operations carried out with each of the specified submatrices; Fig. 3 fourth operation with submatrices tiles A4 and B4 i.e., sixth and eighth source tile respectively; ¶ 0036 matrix operations are carried out in an array-based configuration); and
store the third result tile and the fourth result tile each to a respective location (He: Fig. 3 third result tile 327 and fourth result tile 328).
The motivation to combine with respect to claim 1 applies equally to claim 5.
Regarding claim 6, Gradstein in view of He teaches:
The apparatus of claim 1, wherein the first full matrix operation includes a matrix multiplication operation (Gradstein: ¶ 0057 the different kinds of matrix operations carried out with respect to the tiles includes matrix multiplication).
Regarding claim 7, Gradstein in view of He teaches:
The apparatus of claim 1, wherein the first full matrix operation includes a matrix fused multiply-add operation (Gradstein: ¶ 0094 a fused multiply accumulate i.e., fused multiply add instruction is one of the operation types that can be applied with respect to the tiles).
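For orientation, a multiply-add over tiles computes an accumulator tile plus the product of two source tiles. The sketch below is a generic illustration only; the function name `tile_multiply_add` is hypothetical and is not taken from Gradstein or He. It uses integers, so the single-rounding behavior that distinguishes a fused operation in floating-point hardware does not appear here.

```python
from typing import List

Tile = List[List[int]]


def tile_multiply_add(acc: Tile, a: Tile, b: Tile) -> Tile:
    """Return acc + (a x b), computed element by element (illustrative only)."""
    return [[acc[i][j] + sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Usage: accumulate the product of two 2x2 tiles into an all-ones tile.
result = tile_multiply_add([[1, 1], [1, 1]], [[1, 2], [3, 4]], [[5, 6], [7, 8]])
```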
Claim 8 recites the apparatus of claim 1 and is therefore rejected for the same reasons therein. Gradstein in view of He additionally teaches:
decode circuitry to decode a single instruction, to generate a decoded instruction (Gradstein: ¶ 0236 decoder i.e., decoding circuitry to decode the single instruction; Fig. 29 Element 2904);
execution circuitry to execute the decoded instruction according to the opcode (Gradstein: ¶ 0237 execution circuitry to execute the decoded instruction from the decoder circuitry; Fig. 29 Element 2910);
retrieve the first source tile and the second source tile from respective locations indicated by the one or more first source fields (Gradstein: ¶ 0229 retrieving of data from first source fields; Fig. 29 Element 2906);
retrieve the third source tile and the fourth source tile from respective locations indicated by the one or more second source fields (Gradstein: ¶ 0229 retrieving of data from second source fields; Fig. 29 Element 2906).
The motivation to combine with respect to claim 1 applies equally to claim 8.
Claims 12-14 recite the method practiced by the apparatus of claims 5-7 respectively and are therefore rejected for the same reasons therein.
Claim 15 recites the method practiced by the apparatus of claim 8 which recites the apparatus of claim 1, and is therefore rejected for the same reasons therein. Gradstein in view of He additionally teaches:
fetching a single instruction (Gradstein: ¶ 0064 single request used to define which tiles will be used for matrix operation; Fig. 29 element 2902 fetching of single instruction)
scheduling execution of the decoded instruction (Gradstein: Fig. 29 Element 2908 execution of instruction is scheduled).
The motivation to combine with respect to claim 1 applies equally to claim 15.
Claims 19-21 recite the method practiced by the apparatus of claims 12-14 respectively, which recite the apparatus of claims 5-7 respectively, and are therefore rejected for the same reasons therein.

Claims 2-3, 9-10, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Gradstein, in view of He, further in view of Shalev et al. (10853448) (hereinafter “Shalev”).
Regarding claim 2, while Gradstein in view of He teaches the apparatus of claim 1 and the single request (Gradstein: ¶ 0064), Gradstein in view of He does not explicitly teach a matrix operation applying to two or more input tiles per row dimension of a tile matrix.
However, Shalev teaches:
wherein the single request indicates two or more input tiles per row dimension of a tile matrix (Shalev: Col. 10 Lines 57-67 “To address this mismatch of dimensions, matrix 160 is partitioned horizontally into two tiles (or sets of tiles). The upper tile comprises columns…consisting of the upper halves of the columns of matrix 160; while the lower tile comprises columns 168… consisting of lower halves of the columns. Similarly, matrix 162 is partitioned vertically into two tiles, one comprising rows… consisting of the left halves of the rows of matrix 162, and the other comprising rows 172… consisting of the right halves of the rows of matrix 162.”; Col. 11 Lines 7-12 “… is multiplied by the first row… in the left tile of the matrix 162, and the result values are accumulated…the same column 166 is multiplied by the first row… in the right tile of matrix 162…”; Col. 10 Lines 19-25 “Thus, data access logic 31 distributes the input data values so that processing elements 24 simultaneously multiply a given row in each of the tiles of matrix 132 by the same column 77 in matrix 130. By the same token, the processing elements could be made to simultaneously multiply a given column in each of the tiles of one matrix by the same row in the other matrix.”).
It would be obvious to combine multiple tiles of the same row dimension for matrix multiplication as taught by Shalev with the apparatus as taught by Gradstein in view of He as all teachings are directed towards matrix multiplications. The improvement of Shalev lies in making more efficient use of resources during execution (Shalev: Col. 10 Line 10).
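The partitioning quoted from Shalev (one matrix split horizontally into upper and lower tiles, the other split vertically into left and right tiles) can be illustrated with ordinary block-matrix arithmetic. This is a generic sketch under stated assumptions, not Shalev's data access logic or processing elements: the helper names `split_rows`, `split_cols`, and `tiled_matmul` are hypothetical, and each operand is simply cut at its midpoint.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Reference full matrix multiply."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def split_rows(m: Matrix) -> Tuple[Matrix, Matrix]:
    """Partition horizontally into an upper tile and a lower tile."""
    half = len(m) // 2
    return m[:half], m[half:]


def split_cols(m: Matrix) -> Tuple[Matrix, Matrix]:
    """Partition vertically into a left tile and a right tile."""
    half = len(m[0]) // 2
    return [row[:half] for row in m], [row[half:] for row in m]


def tiled_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply tile by tile and stitch the four result tiles back together."""
    a_upper, a_lower = split_rows(a)
    b_left, b_right = split_cols(b)
    top = [l + r for l, r in zip(matmul(a_upper, b_left), matmul(a_upper, b_right))]
    bottom = [l + r for l, r in zip(matmul(a_lower, b_left), matmul(a_lower, b_right))]
    return top + bottom


# Usage: a 4x2 matrix times a 2x4 matrix, computed as four tile products.
a = [[1, 2], [3, 4], [5, 6], [7, 8]]
b = [[1, 2, 3, 4], [5, 6, 7, 8]]
product = tiled_matmul(a, b)
```

Each of the four tile products depends on only one tile from each operand, which is why such products can be assigned to separate processing resources.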
Regarding claim 3, while Gradstein in view of He teaches the apparatus of claim 1 and the single request (Gradstein: ¶ 0064), Gradstein in view of He does not explicitly teach a matrix operation applying to two or more input tiles per column dimension of a tile matrix.
However, Shalev teaches:
wherein the single request indicates two or more input tiles per column dimension of a tile matrix (Shalev: Col. 10 Lines 57-67 “To address this mismatch of dimensions, matrix 160 is partitioned horizontally into two tiles (or sets of tiles). The upper tile comprises columns…consisting of the upper halves of the columns of matrix 160; while the lower tile comprises columns 168… consisting of lower halves of the columns. Similarly, matrix 162 is partitioned vertically into two tiles, one comprising rows… consisting of the left halves of the rows of matrix 162, and the other comprising rows 172… consisting of the right halves of the rows of matrix 162.”; Col. 11 Lines 7-12 “… is multiplied by the first row… in the left tile of the matrix 162, and the result values are accumulated…the same column 166 is multiplied by the first row… in the right tile of matrix 162…”; Col. 10 Lines 19-25 “Thus, data access logic 31 distributes the input data values so that processing elements 24 simultaneously multiply a given row in each of the tiles of matrix 132 by the same column 77 in matrix 130. By the same token, the processing elements could be made to simultaneously multiply a given column in each of the tiles of one matrix by the same row in the other matrix.”).
The motivation to combine with respect to claim 2 applies equally to claim 3.
Claims 9-10 recite the apparatus of claims 2-3 respectively and are therefore rejected for the same reasons therein.
Claims 16-17 recite the method practiced by the apparatus of claims 9-10 respectively which recite the apparatus of claims 2-3 respectively and are therefore rejected for the same reasons therein.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARIA DE JESUS RIVERA whose telephone number is (571)272-2793. The examiner can normally be reached Monday-Friday 7:30AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Caldwell, can be reached at (571) 272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/M.D.R./Examiner, Art Unit 2182

/EMILY E LAROCQUE/Primary Examiner, Art Unit 2182

Prosecution Timeline

Sep 24, 2021: Application Filed
Jan 21, 2022: Response after Non-Final Action
Mar 04, 2025: Non-Final Rejection (§101, §103, §112)
May 29, 2025: Response Filed
Aug 14, 2025: Final Rejection (§101, §103, §112)
Apr 09, 2026: Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

17/518,661 (Patent 12596553): TECHNIQUE FOR SPECULATIVELY GENERATING AN OUTPUT VALUE IN ANTICIPATION OF ITS USE BY DOWNSTREAM PROCESSING CIRCUITRY. 2y 5m to grant; granted Apr 07, 2026.

17/564,091 (Patent 12596528): MULTIPURPOSE MULTIPLY-ACCUMULATOR ARRAY. 2y 5m to grant; granted Apr 07, 2026.

17/494,944 (Patent 12580553): APPARATUS, METHOD, AND PROGRAM FOR POWER STABILIZATION THROUGH ARITHMETIC PROCESSING OF DUMMY DATA. 2y 5m to grant; granted Mar 17, 2026.

17/560,100 (Patent 12572619): MATRIX PROCESSING ENGINE WITH COUPLED DENSE AND SCALAR COMPUTE. 2y 5m to grant; granted Mar 10, 2026.

17/448,123 (Patent 12566952): MULTIPLIER BY MULTIPLEXED OFFSETS AND ADDITION, RELATED ELECTRONIC CALCULATOR FOR THE IMPLEMENTATION OF A NEURAL NETWORK AND LEARNING METHOD. 2y 5m to grant; granted Mar 03, 2026.

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 67%
With Interview (+35.1%): 99%
Median Time to Grant: 4y 4m
PTA Risk: Moderate

Based on 15 resolved cases by this examiner. Grant probability derived from career allow rate.