Prosecution Insights
Last updated: April 19, 2026
Application No. 17/893,985

HARDWARE ENHANCEMENTS FOR MATRIX LOAD/STORE INSTRUCTIONS

Final Rejection — §102, §103
Filed
Aug 23, 2022
Examiner
ALLI, KASIM A
Art Unit
2183
Tech Center
2100 — Computer Architecture & Software
Assignee
Intel Corporation
OA Round
2 (Final)
66%
Grant Probability
Favorable
3-4
OA Rounds
3y 1m
To Grant
99%
With Interview

Examiner Intelligence

Grants 66% — above average
66%
Career Allow Rate
120 granted / 183 resolved
+10.6% vs TC avg
Strong +38% interview lift
+38.3%
Interview Lift
(among resolved cases with interview)
Typical timeline
3y 1m
Avg Prosecution
22 currently pending
Career history
205
Total Applications
across all art units
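The headline allow rate is simple arithmetic over the career counts reported above; a quick check (a sketch of the presumed computation, not the site's actual code):

```python
# Career counts reported above (this dashboard's own figures).
granted, resolved = 120, 183
allow_rate = granted / resolved          # ~0.6557
print(f"{allow_rate:.1%}")               # displayed above rounded to 66%
```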

Statute-Specific Performance

§101
3.7%
-36.3% vs TC avg
§103
49.4%
+9.4% vs TC avg
§102
16.8%
-23.2% vs TC avg
§112
24.2%
-15.8% vs TC avg
Black line = Tech Center average estimate • Based on career data from 183 resolved cases

Office Action

§102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

This office action is in response to the amendment filed on 12/19/2025. Claims 1-20 are pending. Claims 1, 4-5, 8-10, 12, 14, 18, and 20 are amended.

Response to Arguments

Applicant's arguments filed 12/19/2025 have been fully considered but they are not persuasive. On page 8 of the Remarks, Applicant submits:

In other words, while Minkin describes using loops to copy data at per-element granularity, the instant claims specify a batch of "two-dimensional block access messages" along "at least" (e.g., no less than) a "third dimension" of a tensor having at least three dimensions. The two-dimensional block access messages remove the need to, for example, traverse the first and second dimension to build a list of tensor elements, as in Minkin.

However, this argument is not persuasive because the requests made by the two innermost loops, corresponding to the two innermost dimensions of the tensor in Minkin, are a batch of two-dimensional block access messages, as they access two-dimensional blocks: the first dimension corresponds to the innermost loop and the second dimension corresponds to the second innermost loop. These accesses are along a third dimension because the two innermost loops are nested in a third innermost loop corresponding to the third dimension of the tensor. In other words, the third innermost loop iterates the accesses of the two innermost loops along the third dimension.

With respect to the argument on page 9 of the Remarks that Minkin does not disclose a signed parameter in the request to mark initial vs. final planes, which appears to be drawn to claims 11-12, this argument is not persuasive.
The out-of-bounds parameter of Minkin is used to indicate one or more two-dimensional block data planes to set as out-of-bounds by indicating that the elements of those planes are out-of-bounds. In the example of Fig. 4B, for the bottom left tensor the out-of-bounds parameter would indicate that the elements of the left portion of the tensor (i.e., a two-dimensional block data plane of the tensor) are out-of-bounds, which would indicate a two-dimensional block data plane of the tensor to set as out-of-bounds as required by claim 11. Further, this left portion/two-dimensional block data plane of the tensor may be an initial or final plane of the tensor; thus, any positive or negative out-of-bounds value/predefined constant (Minkin [0074]) for this portion would disclose a positive or negative value to specify an initial or final two-dimensional block data plane as out-of-bounds.

With respect to the argument on page 9 of the Remarks that Minkin forcing the value of an element to a predefined constant does not mean bypassing request generation, which appears to be drawn to claim 13, this argument is not persuasive because Minkin forces the value of an out-of-bounds element to a predefined constant precisely because it does not load the element from memory/generate a message for the element. Specifically, this argument does not consider [0084] of Minkin, which describes checking the out-of-bounds conditions before generating the requests to memory, indicating that the memory access requests corresponding to out-of-bounds blocks are not generated, as the elements would be forced to the predefined constant and would not have to be loaded from memory.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1 and 14-18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Minkin (US 2023/0289304).

Regarding claim 1, Minkin teaches:

1. A graphics processor (Fig. 1, 100) comprising: a system interface (Fig. 1, 114); general-purpose graphics execution resources coupled with the system interface (Fig. 1, streaming multiprocessors SMs 102a-n), the general-purpose graphics execution resources including a matrix accelerator, the matrix accelerator configured to perform a matrix operation on a plurality of tensors stored in a memory ([0049]: a tensor core in a functional unit of one of the SMs is a matrix accelerator that performs matrix operations; see also [0061]-[0062] describing that the SMs may request the tensor memory access unit TMAU to load tensor data from global memory, indicating that the matrix operations may be performed on tensors stored in the global memory); and circuitry configured to facilitate access to the memory by the general-purpose graphics execution resources ([0061]-[0062]: TMAU 112 is circuitry that facilitates access to the global memory by the SMs), wherein the circuitry is configured to: receive a request to access a tensor of the plurality of tensors ([0061]-[0062]: the TMAU may receive a request for tensor data, which is a request to access one of the tensors stored in global memory), the tensor having at least three dimensions ([0088]: the tensor may have five dimensions); and generate, based on the request, a batch of two-dimensional block access messages along at least a third dimension of the tensor, the batch of two-dimensional block access messages to enable access to the tensor by the matrix accelerator ([0088]-[0092] discloses (with respect to Fig. 7B) that, in response to receiving a request from an SM, the TMAU may access elements in a five-dimensional tensor using five nested loops, each loop iterating through a respective dimension, with the innermost loop loading the current element and incrementing the global address for the next element; the requests corresponding to elements accessed in the two innermost loops (which access a two-dimensional block defined by dimensions 0 and 1) are a batch of two-dimensional block access messages which are generated in a third nested loop along a third dimension; see also [0061]-[0062] disclosing that the memory access requests generated by the TMAU are based on the request from the SM).

Regarding claim 14, Minkin teaches:

14. A method comprising: receiving a request to access a tensor in memory of a general-purpose graphics processor ([0061]-[0062]: the TMAU may receive a request for tensor data, which is a request to access one of the tensors stored in global memory of the GPU 100), the tensor having at least three dimensions ([0088]: the tensor may have five dimensions); generating, based on the request, a batch of two-dimensional block access messages along at least a third dimension of the tensor ([0088]-[0092] discloses (with respect to Fig. 7B) that, in response to receiving a request from an SM, the TMAU may access elements in a five-dimensional tensor using five nested loops, each loop iterating through a respective dimension, with the innermost loop loading the current element and incrementing the global address for the next element; the requests corresponding to elements accessed in the two innermost loops (which access a two-dimensional block defined by dimensions 0 and 1) are a batch of two-dimensional block access messages which are generated in a third nested loop along a third dimension; see also [0061]-[0062] disclosing that the memory access requests generated by the TMAU are based on the request from the SM); and enabling access to the memory for a matrix accelerator of the general-purpose graphics processor based on the batch of two-dimensional block access messages ([0049] discloses a tensor core of the GPU that performs matrix operations (i.e., the tensor core is a matrix accelerator), and the memory requests/messages generated by the TMAU, see [0061]-[0062], enable access to the tensors in memory for the tensor core).

Regarding claim 15, Minkin teaches:

15. The method as in claim 14, wherein enabling access to the memory for a matrix accelerator includes loading a plurality of two-dimensional block data planes of the tensor from the memory of the general-purpose graphics processor ([0084] describes the request generator in the TMAU traversing the tensor by iterating multidimensional coordinates to generate the requests for the block of tensor data, which may load two-dimensional block data planes of the tensor from the global memory when traversing more than two dimensions; see also [0088]-[0092] and Fig. 7B describing traversing dimensions of the tensor by incrementing the base global address to the next slice).

Regarding claim 16, Minkin teaches:

16. The method as in claim 14, wherein enabling access to the memory for a matrix accelerator includes prefetching a plurality of two-dimensional block data planes of the tensor from the memory of the general-purpose graphics processor to a cache of the general-purpose graphics processor (the requests may be prefetch requests, see [0081], which prefetch data from the global memory to cache, see [0144], which may include prefetching two-dimensional block data planes for tensors when traversing the dimensions of the tensor as shown in Fig. 7B).

Regarding claim 17, Minkin teaches:

17. The method as in claim 14, wherein enabling access to the memory for a matrix accelerator includes storing a plurality of two-dimensional block data planes of the tensor to the memory of the general-purpose graphics processor (the requests may be store requests, see [0081], which store the tensor data to the global memory; see also [0145]-[0146] describing that the destination memory is treated as a multidimensional tensor and that the store request may be executed in a tile mode (as shown in Fig. 7B), which indicates that the store requests may store two-dimensional block data planes when traversing the multidimensional tensor as shown in Fig. 7B).

Regarding claim 18, Minkin teaches:

18. A data processing system comprising: a memory device (Fig. 1, global memory 116); general-purpose graphics execution resources coupled with the memory device (Fig. 1, streaming multiprocessors SMs 102a-n), the general-purpose graphics execution resources including a matrix accelerator, the matrix accelerator configured to perform a matrix operation on a plurality of tensors stored in the memory device ([0049]: a tensor core in a functional unit of one of the SMs is a matrix accelerator that performs matrix operations; see also [0061]-[0062] describing that the SMs may request the tensor memory access unit TMAU to load tensor data from global memory, indicating that the matrix operations may be performed on tensors stored in the global memory), and circuitry configured to facilitate access to the memory device by the general-purpose graphics execution resources ([0061]-[0062]: TMAU 112 is circuitry that facilitates access to the global memory by the SMs), wherein the circuitry is configured to: receive a request to access a tensor of the plurality of tensors ([0061]-[0062]: the TMAU may receive a request for tensor data, which is a request to access one of the tensors stored in global memory), the tensor having at least three dimensions ([0088]: the tensor may have five dimensions); and generate, based on the request, a batch of two-dimensional block access messages along at least a third dimension of the tensor, the batch of two-dimensional block access messages to enable access to the tensor by the matrix accelerator ([0088]-[0092] discloses (with respect to Fig. 7B) that, in response to receiving a request from an SM, the TMAU may access elements in a five-dimensional tensor using five nested loops, each loop iterating through a respective dimension, with the innermost loop loading the current element and incrementing the global address for the next element; the requests corresponding to elements accessed in the two innermost loops (which access a two-dimensional block defined by dimensions 0 and 1) are a batch of two-dimensional block access messages which are generated in a third nested loop along a third dimension; see also [0061]-[0062] disclosing that the memory access requests generated by the TMAU are based on the request from the SM).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-13 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Minkin (US 2023/0289304) in view of Gottscho (US 2022/0327075).

Regarding claim 2, Minkin teaches:

2. The graphics processor as in claim 1, including a base address of the tensor (Fig. 7B, baseGlobalAddress), a batch size, and a surface stride ([0078]: the height or width of the block to be accessed is a batch size and the tensor stride is a surface stride).

Minkin does not teach: the request to access the tensor to include a base address of the tensor, a batch size, and a surface stride.
However, Gottscho teaches a request to a DMA thread (analogous to the TMAU) that includes a descriptor including tensor addresses, size, and stride, see [0031] and [0034]. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Minkin to include the base address, batch size, and surface stride in the requests as suggested by Gottscho. One of ordinary skill in the art would have been motivated to make this modification to allow faster access to the base address, batch size, and surface stride, since receiving this information with the request would be faster than calculating them or looking them up.

Regarding claim 3, Minkin in view of Gottscho teaches:

3. The graphics processor as in claim 2, the surface stride to specify a distance between two-dimensional block data planes of the tensor (Minkin [0078] discloses that the tensor stride is specified for each dimension and Fig. 7B shows the tensor stride being used to advance the global address to the next slice; the tensor stride along a third dimension (i.e., tensorStride[2]) is the distance between the two-dimensional planes defined by the first and second dimensions).

Regarding claim 4, Minkin in view of Gottscho teaches:

4. The graphics processor as in claim 3, wherein to generate the batch of two-dimensional block access messages includes to generate parameters for two-dimensional block access messages within the batch of two-dimensional block access messages (Minkin [0084] describes the request generator computing the global memory addresses, which are parameters for accessing the requested tensor data).

Regarding claim 5, Minkin in view of Gottscho teaches:

5. The graphics processor as in claim 4, wherein to generate parameters for the two-dimensional block access messages includes to calculate an address for each of the two-dimensional block access messages within the batch of two-dimensional block access messages based on the base address of the tensor and the surface stride (Minkin [0084] describes the request generator computing the global memory addresses and Fig. 7B shows that the global addresses are based on the base tensor address "baseGlobalAddress" and the surface stride "tensorStride").

Regarding claim 6, Minkin in view of Gottscho teaches:

6. The graphics processor as in claim 5, the surface stride of the request configured according to a selected access dimension of the tensor (Minkin [0078] describes that the tensor stride is specified for each dimension of the tensor and Fig. 7B shows the tensor stride being used to calculate the global addresses of the requests in loops that iterate through dimensions of the tensors, which indicates that the surface stride of the requests is configured according to the selected access dimension of the tensor that the loop is iterating through).

Regarding claim 7, Minkin in view of Gottscho teaches:

7. The graphics processor as in claim 6, the surface stride of the request configured to be specified in cache line units (Minkin [0135] discloses that each row of cells in Fig. 9C (i.e., the set of elements along the C dimension) is a single cache line, which indicates that the tensor stride along W or H is specified in/corresponds to cache lines (as each point along the W and H dimensions corresponds to a cache line)).

Regarding claim 8, Minkin in view of Gottscho teaches:

8. The graphics processor as in claim 6, the request to access the tensor configured to include a request to load the tensor from the memory (Minkin [0081]: the requests may be load requests, i.e., a request to load the tensor from the memory).

Regarding claim 9, Minkin in view of Gottscho teaches:

9. The graphics processor as in claim 6, the request to access the tensor configured to include a request to store the tensor to the memory (Minkin [0081]: the requests may be store requests, i.e., a request to store the tensor to memory).

Regarding claim 10, Minkin in view of Gottscho teaches:

10. The graphics processor as in claim 6, the request to access the tensor configured to include a request to pre-fetch the tensor from memory to a cache memory (Minkin [0081]: the requests may be prefetch requests, i.e., a request to prefetch the tensor from the memory; see also [0144] describing that the prefetch requests prefetch data from the global memory to L2 cache).

Regarding claim 11, Minkin teaches:

11. The graphics processor as in claim 1, including an out-of-bounds parameter to indicate one or more two-dimensional block data planes of the tensor to set as out-of-bounds for the request ([0074] describes the TMAU forcing the value of any requested element located outside of the tensor to a predefined value and [0078] describes that the parameters include the out-of-boundary value; that is, the out-of-boundary value indicates the requested blocks of data that are set as out-of-bounds for the request).

Minkin does not teach: wherein the request to access the tensor includes the out-of-bounds parameter.

However, Gottscho teaches a request to a DMA thread (analogous to the TMAU) that includes a descriptor including information about the DMA transaction, see [0031]. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Minkin to include the out-of-boundary value in the requests as suggested by Gottscho. One of ordinary skill in the art would have been motivated to make this modification to allow faster access to the out-of-boundary value.

Regarding claim 12, Minkin in view of Gottscho teaches:

12. The graphics processor as in claim 11, wherein the out-of-bounds parameter is a signed value, where a positive value is to specify one or more initial two-dimensional block data planes of the request as out-of-bounds and a negative value is to specify one or more final two-dimensional block data planes of the request as out-of-bounds (Minkin [0074] indicates that the out-of-bounds value may be a predefined constant value, which is a positive or negative value (since any non-zero constant is either positive or negative); see also [0073] and Fig. 4B showing that the out-of-bounds condition can occur in different areas of the tensor, which indicates that the out-of-bounds condition may be specified for initial or final planes of the tensor).

Regarding claim 13, Minkin in view of Gottscho teaches:

13. The graphics processor as in claim 12, to generate the batch of two-dimensional block access messages includes to bypass generation of a memory access message for a two-dimensional block data plane that is specified as out-of-bounds (Minkin [0074]: by forcing the value of a requested element to a special constant, the generation of memory access messages for those elements is bypassed; see also Minkin [0084] describing checking the out-of-bounds conditions before generating the requests to memory, which indicates that the memory access requests corresponding to out-of-bounds blocks are bypassed/not generated).

Regarding claim 19, Minkin teaches:

19. The data processing system as in claim 18, including a base address of the tensor (Fig. 7B, baseGlobalAddress), a batch size ([0078]: the height or width of the block to be accessed is a batch size), and a surface stride, the surface stride to specify a distance between two-dimensional block data planes of the tensor ([0078] describes a tensor stride for each dimension and Fig. 7B shows the tensor stride being used to advance the global address to the next slice; the tensor stride along a third dimension (i.e., tensorStride[2]) is the distance between the two-dimensional planes defined by the first and second dimensions).

Minkin does not teach: the request to access the tensor to include a base address of the tensor, a batch size, and a surface stride.

However, Gottscho teaches a request to a DMA thread (analogous to the TMAU) that includes a descriptor including tensor addresses, size, and stride, see [0031] and [0034]. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Minkin to include the base address, batch size, and surface stride in the requests as suggested by Gottscho. One of ordinary skill in the art would have been motivated to make this modification to allow faster access to the base address, batch size, and surface stride, since receiving this information with the request would be faster than calculating them or looking them up.

Regarding claim 20, Minkin in view of Gottscho teaches:

20. The data processing system as in claim 19, wherein to generate the batch of two-dimensional block access messages includes to generate parameters for two-dimensional block access messages within the batch of two-dimensional block access messages (Minkin [0084] describes the request generator computing the global memory addresses, which are parameters for accessing the requested tensor data).

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KASIM ALLI, whose telephone number is (571) 270-1476. The examiner can normally be reached Monday through Friday, 9am to 5pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Caldwell, can be reached at (571) 272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KASIM ALLI/
Examiner, Art Unit 2182

/JYOTI MEHTA/
Supervisory Patent Examiner, Art Unit 2183
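For context on the central dispute above: the examiner reads Minkin's Fig. 7B traversal as nested loops in which the two innermost loops cover a two-dimensional block (dimensions 0 and 1) and the third loop steps that block along the tensor's third dimension, with addresses derived from a base address and per-dimension strides. A minimal sketch of that reading (hypothetical names and a list-based model of "messages"; not Minkin's or the application's actual implementation):

```python
def generate_block_messages(base_addr, box, stride):
    """Sketch of the examiner's characterization of Minkin Fig. 7B.

    box[i]    = number of elements to access along dimension i
    stride[i] = address distance between consecutive elements along dimension i
    Returns one "two-dimensional block access message" (here, a list of
    element addresses) per step of the outer loops.
    """
    messages = []
    for d4 in range(box[4]):
        for d3 in range(box[3]):
            for d2 in range(box[2]):          # third dimension: the loop that
                block = []                     # iterates 2D blocks along dim 2
                for d1 in range(box[1]):
                    for d0 in range(box[0]):   # innermost: per-element access
                        addr = (base_addr
                                + d4 * stride[4] + d3 * stride[3]
                                + d2 * stride[2] + d1 * stride[1]
                                + d0 * stride[0])
                        block.append(addr)
                messages.append(block)         # the "batch" accumulates along
    return messages                            # dimensions 2..4
```

In this model, stride[2] plays the role of the claimed "surface stride" (claims 3 and 19): the address distance between successive two-dimensional data planes.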
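The claims 11-13 dispute concerns a signed out-of-bounds parameter: per the claim language, a positive value marks initial two-dimensional planes as out-of-bounds, a negative value marks final planes, and message generation is bypassed for marked planes (their elements instead being filled with a predefined constant). A sketch of that claimed behavior (illustrative only, with invented names; it reflects the claims as argued, not necessarily Minkin's mechanism):

```python
def plane_messages(num_planes, oob):
    """oob > 0: first oob planes are out-of-bounds;
    oob < 0: last |oob| planes are out-of-bounds.
    Returns (planes for which messages are generated,
             planes whose elements are forced to the predefined constant)."""
    if oob >= 0:
        skipped = set(range(min(oob, num_planes)))
    else:
        skipped = set(range(max(num_planes + oob, 0), num_planes))
    generated = [p for p in range(num_planes) if p not in skipped]
    return generated, sorted(skipped)
```

The bypass in claim 13 corresponds to the `skipped` set: no memory access message is built for those planes at all, which is the distinction Applicant drew against Minkin's constant-fill behavior.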

Prosecution Timeline

Aug 23, 2022
Application Filed
Oct 24, 2022
Response after Non-Final Action
Sep 30, 2025
Non-Final Rejection — §102, §103
Dec 19, 2025
Response Filed
Feb 13, 2026
Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12578963
IMPLIED FENCE ON STREAM OPEN
2y 5m to grant • Granted Mar 17, 2026
Patent 12541369
EXECUTING PHANTOM LOOPS IN A MICROPROCESSOR
2y 5m to grant • Granted Feb 03, 2026
Patent 12536131
VECTOR COMPUTATIONAL UNIT
2y 5m to grant • Granted Jan 27, 2026
Patent 12498930
STORE TO LOAD FORWARDING USING HASHES
2y 5m to grant • Granted Dec 16, 2025
Patent 12468530
ASSOCIATIVELY INDEXED CIRCULAR BUFFER
2y 5m to grant • Granted Nov 11, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
66%
Grant Probability
99%
With Interview (+38.3%)
3y 1m
Median Time to Grant
Moderate
PTA Risk
Based on 183 resolved cases by this examiner. Grant probability derived from career allow rate.
