Prosecution Insights
Last updated: April 19, 2026
Application No. 17/893,985

HARDWARE ENHANCEMENTS FOR MATRIX LOAD/STORE INSTRUCTIONS

Final Rejection — §102, §103
Filed
Aug 23, 2022
Examiner
ALLI, KASIM A
Art Unit
2183
Tech Center
2100 — Computer Architecture & Software
Assignee
Intel Corporation
OA Round
2 (Final)
66%
Grant Probability
Favorable
3-4
OA Rounds
3y 1m
To Grant
99%
With Interview

Examiner Intelligence

Grants 66% — above average
66%
Career Allow Rate
120 granted / 183 resolved
+10.6% vs TC avg
Strong +38% interview lift
+38.3%
Interview Lift
(among resolved cases with interview)
Typical timeline
3y 1m
Avg Prosecution
22 currently pending
Career history
205
Total Applications
across all art units
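The headline allow rate is simple arithmetic over the career counts reported above; a quick check (a sketch of the presumed computation, not the site's actual code):

```python
# Career counts reported above (this dashboard's own figures).
granted, resolved = 120, 183
allow_rate = granted / resolved          # ~0.6557
print(f"{allow_rate:.1%}")               # displayed above rounded to 66%
```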

Statute-Specific Performance

§101
3.7%
-36.3% vs TC avg
§103
49.4%
+9.4% vs TC avg
§102
16.8%
-23.2% vs TC avg
§112
24.2%
-15.8% vs TC avg
Black line = Tech Center average estimate • Based on career data from 183 resolved cases

Office Action

§102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

This office action is in response to the amendment filed on 12/19/2025. Claims 1-20 are pending. Claims 1, 4-5, 8-10, 12, 14, 18, and 20 are amended.

Response to Arguments

Applicant's arguments filed 12/19/2025 have been fully considered but they are not persuasive. On page 8 of the Remarks, Applicant submits:

In other words, while Minkin describes using loops to copy data at per-element granularity, the instant claims specify a batch of "two-dimensional block access messages" along "at least" (e.g., no less than) a "third dimension" of a tensor having at least three dimensions. The two-dimensional block access messages remove the need to, for example, traverse the first and second dimension to build a list of tensor elements, as in Minkin.

However, this argument is not persuasive because the requests made by the two innermost loops, corresponding to the two innermost dimensions of the tensor in Minkin, are a batch of two-dimensional block access messages, as they access two-dimensional blocks: the first dimension corresponds to the innermost loop and the second dimension corresponds to the second innermost loop. These accesses are along a third dimension because the two innermost loops are nested in a third innermost loop corresponding to the third dimension of the tensor. In other words, the third innermost loop iterates the accesses of the two innermost loops along the third dimension.

With respect to the argument on page 9 of the Remarks that Minkin does not disclose a signed parameter in the request to mark initial vs. final planes, which appears to be drawn to claims 11-12, this argument is not persuasive.
The out-of-bounds parameter of Minkin is used to indicate one or more two-dimensional block data planes to set as out-of-bounds by indicating that the elements of those planes are out-of-bounds. In the example of Fig. 4B, for the bottom left tensor the out-of-bounds parameter would indicate that the elements of the left portion of the tensor (i.e., a two-dimensional block data plane of the tensor) are out-of-bounds, which would indicate a two-dimensional block data plane of the tensor to set as out-of-bounds as required by claim 11. Further, this left portion/two-dimensional block data plane of the tensor may be an initial or final plane of the tensor; thus, any positive or negative out-of-bounds value/predefined constant (Minkin [0074]) for this portion would disclose a positive or negative value to specify an initial or final two-dimensional block data plane as out-of-bounds.

With respect to the argument on page 9 of the Remarks that Minkin forcing the value of an element to a predefined constant does not mean bypassing request generation, which appears to be drawn to claim 13, this argument is not persuasive because Minkin forces the value of an out-of-bounds element to a predefined constant precisely because it does not load the element from memory/generate a message for the element. Specifically, this argument does not consider [0084] of Minkin, which describes checking the out-of-bounds conditions before generating the requests to memory, indicating that the memory access requests corresponding to out-of-bounds blocks are not generated, as the elements would be forced to the predefined constant and would not have to be loaded from memory.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1 and 14-18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Minkin (US 2023/0289304).

Regarding claim 1, Minkin teaches:

1. A graphics processor (Fig. 1, 100) comprising: a system interface (Fig. 1, 114); general-purpose graphics execution resources coupled with the system interface (Fig. 1, streaming multiprocessors SMs 102a-n), the general-purpose graphics execution resources including a matrix accelerator, the matrix accelerator configured to perform a matrix operation on a plurality of tensors stored in a memory ([0049]: a tensor core in a functional unit of one of the SMs is a matrix accelerator that performs matrix operations; see also [0061]-[0062] describing that the SMs may request the tensor memory access unit TMAU to load tensor data from global memory, indicating that the matrix operations may be performed on tensors stored in the global memory); and circuitry configured to facilitate access to the memory by the general-purpose graphics execution resources ([0061]-[0062]: TMAU 112 is circuitry that facilitates access to the global memory by the SMs), wherein the circuitry is configured to: receive a request to access a tensor of the plurality of tensors ([0061]-[0062]: the TMAU may receive a request for tensor data, which is a request to access one of the tensors stored in global memory), the tensor having at least three dimensions ([0088]: the tensor may have five dimensions); and generate, based on the request, a batch of two-dimensional block access messages along at least a third dimension of the tensor, the batch of two-dimensional block access messages to enable access to the tensor by the matrix accelerator ([0088]-[0092] discloses (with respect to Fig. 7B) that, in response to receiving a request from an SM, the TMAU may access elements in a five-dimensional tensor using five nested loops, each loop iterating through a respective dimension, with the innermost loop loading the current element and incrementing the global address for the next element; the requests corresponding to elements accessed in the two innermost loops (which access a two-dimensional block defined by dimensions 0 and 1) are a batch of two-dimensional block access messages which are generated in a third nested loop along a third dimension; see also [0061]-[0062] disclosing that the memory access requests generated by the TMAU are based on the request from the SM).

Regarding claim 14, Minkin teaches:

14. A method comprising: receiving a request to access a tensor in memory of a general-purpose graphics processor ([0061]-[0062]: the TMAU may receive a request for tensor data, which is a request to access one of the tensors stored in global memory of the GPU 100), the tensor having at least three dimensions ([0088]: the tensor may have five dimensions); generating, based on the request, a batch of two-dimensional block access messages along at least a third dimension of the tensor ([0088]-[0092] discloses (with respect to Fig. 7B) that, in response to receiving a request from an SM, the TMAU may access elements in a five-dimensional tensor using five nested loops, each loop iterating through a respective dimension, with the innermost loop loading the current element and incrementing the global address for the next element; the requests corresponding to elements accessed in the two innermost loops (which access a two-dimensional block defined by dimensions 0 and 1) are a batch of two-dimensional block access messages which are generated in a third nested loop along a third dimension; see also [0061]-[0062] disclosing that the memory access requests generated by the TMAU are based on the request from the SM); and enabling access to the memory for a matrix accelerator of the general-purpose graphics processor based on the batch of two-dimensional block access messages ([0049] discloses a tensor core of the GPU that performs matrix operations (i.e., the tensor core is a matrix accelerator), and the memory requests/messages generated by the TMAU, see [0061]-[0062], enable access to the tensors in memory for the tensor core).

Regarding claim 15, Minkin teaches:

15. The method as in claim 14, wherein enabling access to the memory for a matrix accelerator includes loading a plurality of two-dimensional block data planes of the tensor from the memory of the general-purpose graphics processor ([0084] describes the request generator in the TMAU traversing the tensor by iterating multidimensional coordinates to generate the requests for the block of tensor data, which may load two-dimensional block data planes of the tensor from the global memory when traversing more than two dimensions; see also [0088]-[0092] and Fig. 7B describing traversing dimensions of the tensor by incrementing the base global address to the next slice).

Regarding claim 16, Minkin teaches:

16. The method as in claim 14, wherein enabling access to the memory for a matrix accelerator includes prefetching a plurality of two-dimensional block data planes of the tensor from the memory of the general-purpose graphics processor to a cache of the general-purpose graphics processor (the requests may be prefetch requests, see [0081], which prefetch data from the global memory to cache, see [0144], which may include prefetching two-dimensional block data planes for tensors when traversing the dimensions of the tensor as shown in Fig. 7B).

Regarding claim 17, Minkin teaches:

17. The method as in claim 14, wherein enabling access to the memory for a matrix accelerator includes storing a plurality of two-dimensional block data planes of the tensor to the memory of the general-purpose graphics processor (the requests may be store requests, see [0081], which store the tensor data to the global memory; see also [0145]-[0146] describing that the destination memory is treated as a multidimensional tensor and that the store request may be executed in a tile mode (as shown in Fig. 7B), which indicates that the store requests may store two-dimensional block data planes when traversing the multidimensional tensor as shown in Fig. 7B).

Regarding claim 18, Minkin teaches:

18. A data processing system comprising: a memory device (Fig. 1, global memory 116); general-purpose graphics execution resources coupled with the memory device (Fig. 1, streaming multiprocessors SMs 102a-n), the general-purpose graphics execution resources including a matrix accelerator, the matrix accelerator configured to perform a matrix operation on a plurality of tensors stored in the memory device ([0049]: a tensor core in a functional unit of one of the SMs is a matrix accelerator that performs matrix operations; see also [0061]-[0062] describing that the SMs may request the tensor memory access unit TMAU to load tensor data from global memory, indicating that the matrix operations may be performed on tensors stored in the global memory), and circuitry configured to facilitate access to the memory device by the general-purpose graphics execution resources ([0061]-[0062]: TMAU 112 is circuitry that facilitates access to the global memory by the SMs), wherein the circuitry is configured to: receive a request to access a tensor of the plurality of tensors ([0061]-[0062]: the TMAU may receive a request for tensor data, which is a request to access one of the tensors stored in global memory), the tensor having at least three dimensions ([0088]: the tensor may have five dimensions); and generate, based on the request, a batch of two-dimensional block access messages along at least a third dimension of the tensor, the batch of two-dimensional block access messages to enable access to the tensor by the matrix accelerator ([0088]-[0092] discloses (with respect to Fig. 7B) that, in response to receiving a request from an SM, the TMAU may access elements in a five-dimensional tensor using five nested loops, each loop iterating through a respective dimension, with the innermost loop loading the current element and incrementing the global address for the next element; the requests corresponding to elements accessed in the two innermost loops (which access a two-dimensional block defined by dimensions 0 and 1) are a batch of two-dimensional block access messages which are generated in a third nested loop along a third dimension; see also [0061]-[0062] disclosing that the memory access requests generated by the TMAU are based on the request from the SM).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-13 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Minkin (US 2023/0289304) in view of Gottscho (US 2022/0327075).

Regarding claim 2, Minkin teaches:

2. The graphics processor as in claim 1, including a base address of the tensor (Fig. 7B, baseGlobalAddress), a batch size, and a surface stride ([0078]: the height or width of the block to be accessed is a batch size and the tensor stride is a surface stride).

Minkin does not teach: the request to access the tensor to include a base address of the tensor, a batch size, and a surface stride.
However, Gottscho teaches a request to a DMA thread (analogous to the TMAU) that includes a descriptor including tensor addresses, size, and stride, see [0031] and [0034]. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Minkin to include the base address, batch size, and surface stride in the requests as suggested by Gottscho. One of ordinary skill in the art would have been motivated to make this modification to allow faster access to the base address, batch size, and surface stride, since receiving this information with the request would be faster than calculating them or looking them up.

Regarding claim 3, Minkin in view of Gottscho teaches:

3. The graphics processor as in claim 2, the surface stride to specify a distance between two-dimensional block data planes of the tensor (Minkin [0078] discloses that the tensor stride is specified for each dimension and Fig. 7B shows the tensor stride being used to advance the global address to the next slice; the tensor stride along a third dimension (i.e., tensorStride[2]) is the distance between the two-dimensional planes defined by the first and second dimensions).

Regarding claim 4, Minkin in view of Gottscho teaches:

4. The graphics processor as in claim 3, wherein to generate the batch of two-dimensional block access messages includes to generate parameters for two-dimensional block access messages within the batch of two-dimensional block access messages (Minkin [0084] describes the request generator computing the global memory addresses, which are parameters for accessing the requested tensor data).

Regarding claim 5, Minkin in view of Gottscho teaches:

5. The graphics processor as in claim 4, wherein to generate parameters for the two-dimensional block access messages includes to calculate an address for each of the two-dimensional block access messages within the batch of two-dimensional block access messages based on the base address of the tensor and the surface stride (Minkin [0084] describes the request generator computing the global memory addresses and Fig. 7B shows that the global addresses are based on the base tensor address "baseGlobalAddress" and the surface stride "tensorStride").

Regarding claim 6, Minkin in view of Gottscho teaches:

6. The graphics processor as in claim 5, the surface stride of the request configured according to a selected access dimension of the tensor (Minkin [0078] describes that the tensor stride is specified for each dimension of the tensor and Fig. 7B shows the tensor stride being used to calculate the global addresses of the requests in loops that iterate through dimensions of the tensors, which indicates that the surface stride of the requests is configured according to the selected access dimension of the tensor that the loop is iterating through).

Regarding claim 7, Minkin in view of Gottscho teaches:

7. The graphics processor as in claim 6, the surface stride of the request configured to be specified in cache line units (Minkin [0135] discloses that each row of cells in Fig. 9C (i.e., the set of elements along the C dimension) is a single cache line, which indicates that the tensor stride along W or H is specified in/corresponds to cache lines (as each point along the W and H dimensions corresponds to a cache line)).

Regarding claim 8, Minkin in view of Gottscho teaches:

8. The graphics processor as in claim 6, the request to access the tensor configured to include a request to load the tensor from the memory (Minkin [0081]: the requests may be load requests, i.e., a request to load the tensor from the memory).

Regarding claim 9, Minkin in view of Gottscho teaches:

9. The graphics processor as in claim 6, the request to access the tensor configured to include a request to store the tensor to the memory (Minkin [0081]: the requests may be store requests, i.e., a request to store the tensor to memory).

Regarding claim 10, Minkin in view of Gottscho teaches:

10. The graphics processor as in claim 6, the request to access the tensor configured to include a request to pre-fetch the tensor from memory to a cache memory (Minkin [0081]: the requests may be prefetch requests, i.e., a request to prefetch the tensor from the memory; see also [0144] describing that the prefetch requests prefetch data from the global memory to L2 cache).

Regarding claim 11, Minkin teaches:

11. The graphics processor as in claim 1, including an out-of-bounds parameter to indicate one or more two-dimensional block data planes of the tensor to set as out-of-bounds for the request ([0074] describes the TMAU forcing the value of any requested element located outside of the tensor to a predefined value and [0078] describes that the parameters include the out-of-boundary value; that is, the out-of-boundary value indicates the requested blocks of data that are set as out-of-bounds for the request).

Minkin does not teach: wherein the request to access the tensor includes the out-of-bounds parameter.

However, Gottscho teaches a request to a DMA thread (analogous to the TMAU) that includes a descriptor including information about the DMA transaction, see [0031]. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Minkin to include the out-of-boundary value in the requests as suggested by Gottscho. One of ordinary skill in the art would have been motivated to make this modification to allow faster access to the out-of-boundary value.

Regarding claim 12, Minkin in view of Gottscho teaches:

12. The graphics processor as in claim 11, wherein the out-of-bounds parameter is a signed value, where a positive value is to specify one or more initial two-dimensional block data planes of the request as out-of-bounds and a negative value is to specify one or more final two-dimensional block data planes of the request as out-of-bounds (Minkin [0074] indicates that the out-of-bounds value may be a predefined constant value, which is a positive or negative value (since any non-zero constant is either positive or negative); see also [0073] and Fig. 4B showing that the out-of-bounds condition can occur in different areas of the tensor, which indicates that the out-of-bounds condition may be specified for initial or final planes of the tensor).

Regarding claim 13, Minkin in view of Gottscho teaches:

13. The graphics processor as in claim 12, to generate the batch of two-dimensional block access messages includes to bypass generation of a memory access message for a two-dimensional block data plane that is specified as out-of-bounds (Minkin [0074]: by forcing the value of a requested element to a special constant, the generation of memory access messages for those elements is bypassed; see also Minkin [0084] describing checking the out-of-bounds conditions before generating the requests to memory, which indicates that the memory access requests corresponding to out-of-bounds blocks are bypassed/not generated).

Regarding claim 19, Minkin teaches:

19. The data processing system as in claim 18, including a base address of the tensor (Fig. 7B, baseGlobalAddress), a batch size ([0078]: the height or width of the block to be accessed is a batch size), and a surface stride, the surface stride to specify a distance between two-dimensional block data planes of the tensor ([0078] describes a tensor stride for each dimension and Fig. 7B shows the tensor stride being used to advance the global address to the next slice; the tensor stride along a third dimension (i.e., tensorStride[2]) is the distance between the two-dimensional planes defined by the first and second dimensions).

Minkin does not teach: the request to access the tensor to include a base address of the tensor, a batch size, and a surface stride.

However, Gottscho teaches a request to a DMA thread (analogous to the TMAU) that includes a descriptor including tensor addresses, size, and stride, see [0031] and [0034]. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Minkin to include the base address, batch size, and surface stride in the requests as suggested by Gottscho. One of ordinary skill in the art would have been motivated to make this modification to allow faster access to the base address, batch size, and surface stride, since receiving this information with the request would be faster than calculating them or looking them up.

Regarding claim 20, Minkin in view of Gottscho teaches:

20. The data processing system as in claim 19, wherein to generate the batch of two-dimensional block access messages includes to generate parameters for two-dimensional block access messages within the batch of two-dimensional block access messages (Minkin [0084] describes the request generator computing the global memory addresses, which are parameters for accessing the requested tensor data).

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KASIM ALLI, whose telephone number is (571) 270-1476. The examiner can normally be reached Monday through Friday, 9am to 5pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Caldwell, can be reached at (571) 272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KASIM ALLI/
Examiner, Art Unit 2182

/JYOTI MEHTA/
Supervisory Patent Examiner, Art Unit 2183
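For context on the central dispute above: the examiner reads Minkin's Fig. 7B traversal as nested loops in which the two innermost loops cover a two-dimensional block (dimensions 0 and 1) and the third loop steps that block along the tensor's third dimension, with addresses derived from a base address and per-dimension strides. A minimal sketch of that reading (hypothetical names and a list-based model of "messages"; not Minkin's or the application's actual implementation):

```python
def generate_block_messages(base_addr, box, stride):
    """Sketch of the examiner's characterization of Minkin Fig. 7B.

    box[i]    = number of elements to access along dimension i
    stride[i] = address distance between consecutive elements along dimension i
    Returns one "two-dimensional block access message" (here, a list of
    element addresses) per step of the outer loops.
    """
    messages = []
    for d4 in range(box[4]):
        for d3 in range(box[3]):
            for d2 in range(box[2]):          # third dimension: the loop that
                block = []                     # iterates 2D blocks along dim 2
                for d1 in range(box[1]):
                    for d0 in range(box[0]):   # innermost: per-element access
                        addr = (base_addr
                                + d4 * stride[4] + d3 * stride[3]
                                + d2 * stride[2] + d1 * stride[1]
                                + d0 * stride[0])
                        block.append(addr)
                messages.append(block)         # the "batch" accumulates along
    return messages                            # dimensions 2..4
```

In this model, stride[2] plays the role of the claimed "surface stride" (claims 3 and 19): the address distance between successive two-dimensional data planes.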
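The claims 11-13 dispute concerns a signed out-of-bounds parameter: per the claim language, a positive value marks initial two-dimensional planes as out-of-bounds, a negative value marks final planes, and message generation is bypassed for marked planes (their elements instead being filled with a predefined constant). A sketch of that claimed behavior (illustrative only, with invented names; it reflects the claims as argued, not necessarily Minkin's mechanism):

```python
def plane_messages(num_planes, oob):
    """oob > 0: first oob planes are out-of-bounds;
    oob < 0: last |oob| planes are out-of-bounds.
    Returns (planes for which messages are generated,
             planes whose elements are forced to the predefined constant)."""
    if oob >= 0:
        skipped = set(range(min(oob, num_planes)))
    else:
        skipped = set(range(max(num_planes + oob, 0), num_planes))
    generated = [p for p in range(num_planes) if p not in skipped]
    return generated, sorted(skipped)
```

The bypass in claim 13 corresponds to the `skipped` set: no memory access message is built for those planes at all, which is the distinction Applicant drew against Minkin's constant-fill behavior.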

Prosecution Timeline

Aug 23, 2022
Application Filed
Oct 24, 2022
Response after Non-Final Action
Sep 30, 2025
Non-Final Rejection — §102, §103
Dec 19, 2025
Response Filed
Feb 13, 2026
Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12578963
IMPLIED FENCE ON STREAM OPEN
2y 5m to grant • Granted Mar 17, 2026
Patent 12541369
EXECUTING PHANTOM LOOPS IN A MICROPROCESSOR
2y 5m to grant • Granted Feb 03, 2026
Patent 12536131
VECTOR COMPUTATIONAL UNIT
2y 5m to grant • Granted Jan 27, 2026
Patent 12498930
STORE TO LOAD FORWARDING USING HASHES
2y 5m to grant • Granted Dec 16, 2025
Patent 12468530
ASSOCIATIVELY INDEXED CIRCULAR BUFFER
2y 5m to grant • Granted Nov 11, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
66%
Grant Probability
99%
With Interview (+38.3%)
3y 1m
Median Time to Grant
Moderate
PTA Risk
Based on 183 resolved cases by this examiner. Grant probability derived from career allow rate.
