Last updated: April 19, 2026

Application No. 18/056,949

CROSS-THREAD REGISTER SHARING FOR MATRIX MULTIPLICATION COMPUTE

Non-Final OA §103

Filed: Nov 18, 2022

Examiner: METZGER, MICHAEL J

Art Unit: 2183

Tech Center: 2100 (Computer Architecture & Software)

Assignee: Intel Corporation

OA Round: 1 (Non-Final)

Interview Optional

+8.1% interview lift. This examiner has a relatively high allow rate; a written response may suffice.

Based on 482 resolved cases, 2023–2026

Examiner Intelligence

METZGER, MICHAEL J

Grants 90% — above average

Career Allow Rate

435 granted / 482 resolved

+35.2% vs TC avg

Moderate +8% lift

+8.1%

Interview Lift

resolved cases with interview

Typical timeline

2y 8m

Avg Prosecution

27 currently pending

Career history

509

Total Applications

across all art units

Statute-Specific Performance

§101: 6.0% (-34.0% vs TC avg)
§103: 53.6% (+13.6% vs TC avg)
§102: 14.1% (-25.9% vs TC avg)
§112: 8.7% (-31.3% vs TC avg)

Based on career data from 482 resolved cases

Office Action

§103

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1.  Claims 1-2, 4-11, 13-16, 18-21, and 23-25 are rejected under 35 U.S.C. 103 as being unpatentable over Maiyuran et al. (US 2021/0089301, herein Maiyuran) in view of Khorasani et al. (US 2018/0275991, herein Khorasani).

Regarding claim 16, Maiyuran teaches a system comprising:
a memory to store a block of data (Fig 1, [0047], memory); and
a processor coupled to the memory, the processor comprising a processor core having matrix acceleration hardware comprising a plurality of data processing units (Figs 1, 2A-2D, [0045], processors and associated units, [0108], compute accelerator), wherein the respective plurality of data processing units are to:
receive a decoded instruction ([0136], decode unit) for a first thread having a first register space, wherein the decoded instruction is for a matrix multiplication operation and comprises an indication to utilize a second register space for an operand of the decoded instruction for the first thread ([0240-0242], [0246-0247], [0250-0251], multiple sets of input matrix data for matrix multiply operations spread across first and second sets of registers to be used in a fused operation, [0141], [0144], [0159], [0240], a matrix multiplication type instruction opcode indicates the need for divided sets of matrix inputs);
access the second register space to obtain data for the operand of the decoded instruction ([0250-0251], second set of matrix multiplication input data acquired from second set of registers); and
perform the matrix multiplication operation for the first thread using the data for the operand from the second register space ([0252], execute matrix multiplication using shared processing resources).
Maiyuran fails to teach the second register space is that of a second thread.
Khorasani teaches a system comprising a processor to execute instructions using a first and second register space of first and second threads ([0024], [0030], threads assigned a corresponding register space for operations, [0040], [0058-0062], using instructions that extend a register space into shared registers assigned to other threads).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Maiyuran and Khorasani to utilize register spaces across thread barriers.  While Maiyuran does disclose the use of threaded and multithreaded execution techniques, and the use of multiple register spaces, Maiyuran does not explicitly state that these register spaces are assigned a corresponding thread to use as its register space for the instructions of that thread.  However, as both Maiyuran and Khorasani disclose techniques for utilizing multiple register spaces for data-intensive operations such as matrix multiplication, the combination would merely entail a simple substitution of known prior art elements to achieve predictable results, and thus would have been obvious to one of ordinary skill in the art.
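For readers less familiar with the claim language, the limitation at issue in claim 16 (a matrix multiplication instruction for a first thread that names a second thread's register space as the source of one operand) can be illustrated with a toy software model. This is only a sketch under stated assumptions: the names `Thread`, `MatMulInstr`, `src_b_thread`, and `execute` are hypothetical and do not come from the application, Maiyuran, or Khorasani, and real hardware would operate on register files rather than Python dictionaries.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Thread:
    """Hypothetical model of a thread with its own register space."""
    tid: int
    registers: dict = field(default_factory=dict)  # register name -> matrix (list of rows)


@dataclass
class MatMulInstr:
    """Hypothetical decoded instruction: dst = src_a x src_b."""
    dst: str
    src_a: str
    src_b: str
    # Assumed encoding of the "indication": the ID of the thread whose
    # register space holds src_b. None means the issuing thread's own space.
    src_b_thread: Optional[int] = None


def matmul(a, b):
    """Plain row-by-column matrix product on lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def execute(instr, thread, threads):
    """Run the instruction for `thread`, reading src_b from another thread if indicated."""
    owner = thread if instr.src_b_thread is None else threads[instr.src_b_thread]
    a = thread.registers[instr.src_a]   # operand from the first register space
    b = owner.registers[instr.src_b]    # operand from the second register space
    thread.registers[instr.dst] = matmul(a, b)  # result stays with the issuing thread


threads = {
    0: Thread(0, {"A": [[1, 2], [3, 4]]}),
    1: Thread(1, {"B": [[5, 6], [7, 8]]}),
}
execute(MatMulInstr(dst="C", src_a="A", src_b="B", src_b_thread=1), threads[0], threads)
print(threads[0].registers["C"])  # [[19, 22], [43, 50]]
```

In this model the second thread's registers are only read; the result is written to the first thread's own space.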

Regarding claim 18, the combination of Maiyuran and Khorasani teaches the system of claim 16, wherein a compiler is to utilize a synchronization barrier to synchronize the first thread and the second thread prior to the first thread accessing the data for the operand from the second register space of the second thread (Khorasani [0038], Maiyuran [0144], synchronization operations between threads, [0168], register write prompts synchronization event).
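The ordering that claim 18 recites (synchronize the two threads before the first thread reads the operand from the second thread's register space) can be sketched with an ordinary software barrier. This is an illustrative analogy only, assuming Python's `threading.Barrier` as a stand-in for whatever barrier the compiler would emit; the names `register_space`, `producer`, and `consumer` are hypothetical.

```python
import threading

# Two participants: the thread that owns the operand and the thread that reads it.
barrier = threading.Barrier(2)

# Hypothetical per-thread register spaces, keyed by thread ID.
register_space = {0: {}, 1: {}}
result = {}


def producer():
    # Second thread: write the operand into its own register space,
    # then arrive at the barrier.
    register_space[1]["B"] = [[5, 6], [7, 8]]
    barrier.wait()


def consumer():
    # First thread: wait at the barrier before touching the other
    # thread's registers, so the operand is guaranteed to be written.
    barrier.wait()
    result["B_seen"] = register_space[1]["B"]


t_prod = threading.Thread(target=producer)
t_cons = threading.Thread(target=consumer)
t_cons.start()
t_prod.start()
t_prod.join()
t_cons.join()
print(result["B_seen"])  # [[5, 6], [7, 8]]
```

Without the barrier, the consumer could run first and find the register empty, which is the hazard the synchronization is meant to prevent.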

Regarding claim 19, the combination of Maiyuran and Khorasani teaches the system of claim 16, wherein a compiler is to utilize a global synchronization barrier prior to execution of the first thread and the second thread to determine that the first thread and the second thread are dispatched and running (Khorasani [0038], CTA-wide syncthreads command to handle extended register sets).

Regarding claim 20, the combination of Maiyuran and Khorasani teaches the system of claim 16, wherein the indication comprises a first encoding in the instruction that comprises an identifier (ID) of the second thread, and wherein the indication comprises a second encoding to utilize a third register space of a third thread for another operand of the decoded instruction for the first thread (Khorasani [0058-0059], shared register pool lookup table indicates available register spaces according to a thread identifier).

Claims 1 and 6-8 refer to a processor embodiment of the system embodiment of claims 16 and 18-20, respectively.  Therefore, the above rejections for claims 16 and 18-20 are applicable to claims 1 and 6-8, respectively.

Regarding claim 2, the combination of Maiyuran and Khorasani teaches the processor of claim 1, wherein decoder circuitry of a graphics processor core comprising the matrix acceleration hardware is to decode an encoded instruction into the decoded instruction (Maiyuran [0136], decoding instructions from cache).

Regarding claim 4, the combination of Maiyuran and Khorasani teaches the processor of claim 1, wherein the plurality of data processing units comprise multiply-accumulate (MAC) circuits to support matrix acceleration operations, and wherein each MAC circuit comprises multiplier circuits, shifters, and at least one adder (Maiyuran [0246], execution units to perform multiplication and accumulation, [0135], [0144], performing math instructions such as add and multiply on execution units of accelerator).

Regarding claim 5, the combination of Maiyuran and Khorasani teaches the processor of claim 1, wherein the first register space and the second register space comprise general register files (GRFs) (Maiyuran [0128], GRF array).

Regarding claim 9, the combination of Maiyuran and Khorasani teaches the processor of claim 1, wherein the processor comprises a graphics processing unit (GPU) (Maiyuran [0095], general purpose GPU).

Regarding claim 10, the combination of Maiyuran and Khorasani teaches the processor of claim 1, wherein the matrix acceleration hardware comprises systolic array hardware (Maiyuran [0135], systolic array).

Claims 11 and 13-15 refer to a method embodiment of the system embodiment of claims 16 and 18-20, respectively.  Therefore, the above rejections for claims 16 and 18-20 are applicable to claims 11 and 13-15, respectively.

Claims 21 and 23-25 refer to a medium embodiment of the system embodiment of claims 16 and 18-20, respectively.  Therefore, the above rejections for claims 16 and 18-20 are applicable to claims 21 and 23-25, respectively.

Allowable Subject Matter
2.  Claims 3, 12, 17, and 22 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The dependent claims listed above incorporate limitations regarding the matrix multiplication instruction encoding an indication to allow reading but not writing of registers used by the instruction, which distinguishes the claims from the previously recited prior art.
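The distinguishing feature the examiner identifies (an indication that permits reading but not writing of the registers used by the instruction) can be pictured as a read-only view over another thread's register space. The sketch below is a software analogy only; the text of claims 3, 12, 17, and 22 is not reproduced here, and `open_remote_register_space` and its `read_only` flag are hypothetical names.

```python
from types import MappingProxyType


def open_remote_register_space(registers, read_only=True):
    """Return a view of another thread's registers.

    With read_only=True the view rejects register assignment, which
    loosely models a "read but not write" indication in the instruction.
    """
    return MappingProxyType(registers) if read_only else registers


owner_regs = {"B": [[5, 6], [7, 8]]}
view = open_remote_register_space(owner_regs)

read_ok = view["B"]  # reading through the view is allowed

try:
    view["B"] = [[0, 0], [0, 0]]  # writing through the view is rejected
    write_blocked = False
except TypeError:
    write_blocked = True

print(read_ok, write_blocked)
```

Note that the proxy only blocks assignment of register entries; the nested lists remain mutable in this toy model, so it should not be read as a complete protection scheme.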

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Li (US 2024/0403059) discloses a processor using a shared register group across threads of a warp.
Feiste (US 2023/0068637) discloses a processor for performing matrix multiplication operations and assigning register blocks based on their utilization.
Ciolkosz (US 2023/0111125) discloses a processor with a shared register space in a GPU used for matrix multiplication acceleration.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J METZGER whose telephone number is (571)272-3105. The examiner can normally be reached Monday-Friday 8:30-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHAEL J METZGER/             Primary Examiner, Art Unit 2183

Prosecution Timeline

Nov 18, 2022: Application Filed

Jan 17, 2023: Response after Non-Final Action

Jan 15, 2026: Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/496,013

Patent 12591517

FETCHING VECTOR DATA ELEMENTS WITH PADDING

2y 5m to grant (granted Mar 31, 2026)

18/358,894

Patent 12578965

Biased Indirect Control Transfer Prediction

2y 5m to grant (granted Mar 17, 2026)

18/237,511

Patent 12566610

MICROPROCESSOR WITH APPARATUS AND METHOD FOR REPLAYING LOAD INSTRUCTIONS

2y 5m to grant (granted Mar 03, 2026)

18/596,106

Patent 12566607

ROBUST, EFFICIENT MULTIPROCESSOR-COPROCESSOR INTERFACE

2y 5m to grant (granted Mar 03, 2026)

18/406,527

Patent 12561139

ENCODING AND DECODING VARIABLE LENGTH INSTRUCTIONS

2y 5m to grant (granted Feb 24, 2026)

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

90%

Grant Probability

98%

With Interview (+8.1%)

2y 8m

Median Time to Grant

Low

PTA Risk

Based on 482 resolved cases by this examiner. Grant probability derived from career allow rate.
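The projection figures can be checked against the counts reported on this page. The sketch below assumes, as one plausible reading, that the "With Interview" figure is the career allow rate plus the +8.1% interview lift; the page does not state the exact formula, so the additive step is an assumption.

```python
# Counts reported on this page for the examiner.
granted = 435
resolved = 482

# Career allow rate, which the page says the grant probability is derived from.
allow_rate = granted / resolved

# Assumption: the interview lift is added as percentage points.
interview_lift = 0.081
with_interview = allow_rate + interview_lift

grant_pct = round(allow_rate * 100)
with_interview_pct = round(with_interview * 100)

print(f"Grant probability: {grant_pct}%")
print(f"With interview: {with_interview_pct}%")
```

Under that assumption the rounded values match the 90% and 98% shown above.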