Last updated: April 19, 2026

Application No. 18/243,264

MATRIX MULTIPLICATION UNIT WITH FLEXIBLE PRECISION OPERATIONS

Non-Final OA §101 §103 §DP

Filed

Sep 07, 2023

Examiner

UNELUS, ERNEST

Art Unit

2181

Tech Center

2100 — Computer Architecture & Software

Assignee

Advanced Micro Devices, Inc.

OA Round

1 (Non-Final)

Interview Optional

+38.6% interview lift. This examiner has a relatively high allow rate; a written response may suffice.

Based on 540 resolved cases, 2023–2026

Examiner Intelligence

UNELUS, ERNEST

Grants 77% — above average

Career Allow Rate

417 granted / 540 resolved

+22.2% vs TC avg

Strong +39% interview lift

+38.6%

Interview Lift (resolved cases with an interview vs. without)

Typical timeline

3y 3m

Avg Prosecution

29 currently pending

Career history

569

Total Applications

across all art units

Statute-Specific Performance

§101

5.8%

-34.2% vs TC avg

§103

37.3%

-2.7% vs TC avg

§102

45.8%

+5.8% vs TC avg

§112

3.5%

-36.5% vs TC avg

Tech Center averages are estimates. Based on career data from 540 resolved cases.

Office Action

§101 §103 §DP

DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

	The instant application having Application No. 18/243,264 has a total of 20 elected claims pending in the application; there are 3 independent claims and 17 dependent claims, all of which are ready for examination by the examiner.

INFORMATION CONCERNING OATH/DECLARATION
Oath/Declaration
The applicant’s oath/declaration has been reviewed by the examiner and is found to conform to the requirements prescribed in 37 C.F.R. 1.63.

INFORMATION CONCERNING DRAWINGS
Drawings
The applicant’s drawings submitted are acceptable for examination purposes.

ACKNOWLEDGEMENT OF REFERENCES CITED BY APPLICANT
As required by M.P.E.P. 609(C), the applicant’s submissions of the Information Disclosure Statements dated 10/02/2023, 10/15/2024, 04/09/2025 and 09/30/2025 are acknowledged by the examiner, and the cited references have been considered in the examination of the claims now pending. As required by M.P.E.P. 609(C)(2), a copy of the PTOL-1449 initialed and dated by the examiner is attached to the instant office action.
	
REJECTIONS NOT BASED ON PRIOR ART

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 21-40 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (an abstract idea) without significantly more. For example, claim 21 is directed to performing matrix multiplication on first portions of a first matrix and first portions of a second matrix, which is an abstract idea. Additionally, when considered as a whole, the claim does not include an inventive concept sufficient to transform the abstract idea into a patent-eligible application. The claim merely recites generic computing processors (e.g., vector signal processors) without any improvement to the functioning of a computer or any specific technological implementation. The claim does not add any meaningful limitations beyond the abstract idea itself.
	
REJECTIONS BASED ON PRIOR ART

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory obviousness-type double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and  In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The USPTO internet Web site contains terminal disclaimer forms which may be used.  Please visit http://www.uspto.gov/forms/.  The filing date of the application will determine what form should be used.  A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed, approved immediately upon submission, and reduces waiting time for Terminal Disclaimer to be manually approved.  For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.  

Claims 21-40 are rejected on the ground of nonstatutory double patenting over claims 1-20 of U.S. Pat. No. 11,762,658, since the claims, if allowed, would improperly extend the “right to exclude” already granted in the patent. Although the conflicting claims are not identical, they are not patentably distinct from each other because the subject matter claimed in the instant application is at least fully disclosed in the reference patent.

Claim Rejections - 35 USC § 103

1.	 In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

2.	Claims 21-40 are rejected under 35 U.S.C. 103(a) as being unpatentable over Wilder et al. (US pub. 2010/0274990), hereinafter “Wilder”, in view of Boswell et al. (US pub. 2022/0391206), hereinafter “Boswell”.

3.         As per claims 21, 29 and 37, Wilder discloses an apparatus [see paragraph 0002, which discloses “an apparatus and method for performing SIMD (Single Instruction Multiple Data) multiply-accumulate (MAC) operations”] comprising: a plurality of vector signal processors (VSPs) (in view of applicant’s specification/claim disclosing “plurality of vector signal processors (VSPs) comprising multiply/accumulate elements”, see paragraph 0015 of Wilder, which discloses SIMD data processing circuitry to perform a plurality of iterations of a multiply-accumulate process, and paragraph 0026, which discloses a SIMD register bank for storing data elements), wherein the VSPs perform matrix multiplication on first portions of a first matrix and first portions of a second matrix (see paragraph 0048, which discloses “performing said plurality of iterations of a multiply-accumulate process, each iteration of the multiply-accumulate process comprising performing N multiply-accumulate operations in parallel in order to produce N multiply-accumulate data elements”).
Wilder fails to expressly disclose wherein subsets of the first portions of the first matrix accessed by a VSP of the plurality of VSPs are changed such that a different VSP of the plurality of VSPs accesses the subsets after the VSP performs the matrix multiplication.
Consider Boswell, paragraph 0081, which discloses “many algorithms are designed around a fundamental arithmetic operation of multiplying a first input matrix with a second input matrix and summing the result with a third input matrix (i.e., a collector matrix)… where A is an input matrix of size N×K, B is an input matrix of size K×M, and C is the collector matrix of size N×M. The collector matrix C is read in from the register file, and the results of the MMA operation are accumulated and written over the data for the collector matrix C in the register file. In one embodiment, the collector matrix C and the result matrix D (C.sub.out=D) may be different operands such that the result of the MMA operation is not written over the collector matrix C”, paragraph 0082, which discloses “the MMA operation multiplies an input matrix A 710 by an input matrix B 720, and accumulates the result in a collector matrix C 730”, and paragraph 0088, which discloses “For example, as shown in FIG. 7, a first element of the collector matrix C.sub.0,0 is generated as the result of a dot product operation between a first vector <A.sub.0,0, A.sub.0,1, A.sub.0,2, A.sub.0,3> of the input matrix A 710 and a first vector <B.sub.0,0, B.sub.1,0, B.sub.2,0, B.sub.3,0> of the input matrix B 720. The first vector of the input matrix A 710 represents a first row of the input matrix A 710. The first vector of the input matrix B 720 represents a first column of the input matrix B 720. Thus, the dot product between these two vectors is given as: C.sub.0,0 = A.sub.0,0 × B.sub.0,0 + A.sub.0,1 × B.sub.1,0 + A.sub.0,2 × B.sub.2,0 + A.sub.0,3 × B.sub.3,0 + C.sub.0,0, where the dot product operation is fundamentally the execution of four multiplication operations performed on corresponding elements of the two vectors followed by four addition operations that sum the four partial products generated by the multiplication operations along with the initial value of the element of the collector matrix.
Each of the other elements of the collector matrix C 730 is then calculated in a similar manner using different combinations of the vectors of the input matrices. For example, another element of the collector matrix C 730, element C.sub.3,2, is generated as the result of a dot product operation between a fourth vector <A.sub.3,0, A.sub.3,1, A.sub.3,2, A.sub.3,3> of the input matrix A 710 and a third vector <B.sub.0,2, B.sub.1,2, B.sub.2,2, B.sub.3,2> of the input matrix B 720. As shown in the MMA operation of FIG. 7, each vector of the input matrix A 710 is consumed by eight dot product operations configured to generate a corresponding row of elements of the collector matrix C 730. Similarly, each vector of the input matrix B 720 is consumed by eight dot product operations configured to generate a corresponding column of elements of the collector matrix C 730. While each of the 64 dot product operations to generate the elements of the collector matrix C 730 is unique as defined by using a different pair of vectors from the input matrices, each vector of the first input operand and each vector of the second input operand are consumed by multiple dot product operations and contribute to multiple individual elements of a result matrix”.
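The matrix multiply-accumulate (MMA) operation described in the quoted Boswell passages can be sketched in a few lines of Python. This is an illustration only: the function names `dot_accumulate` and `mma` are hypothetical, and the 8×4 and 4×8 operand sizes are inferred from the passage's reference to 64 dot products over vectors of four elements, not stated as such in the Office action.

```python
def dot_accumulate(row, col, c_init):
    """Dot product of two equal-length vectors, added to an initial collector value."""
    partial_products = [a * b for a, b in zip(row, col)]
    return sum(partial_products) + c_init


def mma(A, B, C):
    """Return D = A x B + C for A (N x K), B (K x M) and collector C (N x M)."""
    n, k, m = len(A), len(B), len(B[0])
    assert all(len(r) == k for r in A), "A must be N x K"
    assert len(C) == n and all(len(r) == m for r in C), "C must be N x M"
    D = []
    for i in range(n):
        out_row = []
        for j in range(m):
            col = [B[t][j] for t in range(k)]  # j-th column of B
            out_row.append(dot_accumulate(A[i], col, C[i][j]))
        D.append(out_row)
    return D


# Assumed sizes: 8 x 4 times 4 x 8 gives an 8 x 8 result (64 dot products of length 4).
A = [[i + t for t in range(4)] for i in range(8)]
B = [[t * j for j in range(8)] for t in range(4)]
C = [[1] * 8 for _ in range(8)]
D = mma(A, B, C)
```

Each row of `A` is reused for eight output elements and each column of `B` likewise, which matches the operand reuse the passage describes.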
More importantly, with respect to claim limitation “wherein subsets of the first portions of the first matrix accessed by a VSP of the plurality of VSPs are changed such that a different VSP of the plurality of VSPs accesses the subsets after the VSP performs the matrix multiplication”, see paragraphs 0081 and 0102, which teach different cores/processors accessing matrix results in a register file and a result queue.
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Boswell’s teaching of matrix multiply and accumulate (MMA) dot product operations to include steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder, into Wilder’s teaching of a method for performing SIMD multiply-accumulate operations including SIMD data processing circuitry responsive to control signals to perform data processing operations in parallel on multiple data elements, for the benefit of improving the efficiency of a processor by having register files with multiple banks such that operands can be efficiently stored in separate banks and multiple operands can be loaded from the register file into the inputs of a datapath in a single clock cycle, as taught by Boswell.

4.         As per claims 22 and 30, the combination of Wilder and Boswell discloses “The apparatus of claim 21” [see rejection to claim 21 above], wherein the plurality of VSPs further comprise a first buffer, a second buffer, and an output buffer, and wherein subsets of the first portions of the first and second matrices are copied to the first and second buffers in the plurality of VSPs prior to initiating the matrix multiplication (see paragraph 0073 of Wilder, which discloses “all of the required input data elements and coefficient data elements are read from a SIMD register bank into internal registers of the SIMD data processing circuitry prior to the computations”). 

5.         As per claims 23 and 31, the combination of Wilder and Boswell discloses “The apparatus of claim 22” [see rejection to claim 22 above], wherein, during a current iteration of the matrix multiplication, the VSPs perform matrix multiplication on the subsets of the first portions of the first and second matrices stored in the corresponding first and second buffers (see paragraph 0048 of Wilder, which discloses “performing said plurality of iterations of a multiply-accumulate process, each iteration of the multiply-accumulate process comprising performing N multiply-accumulate operations in parallel in order to produce N multiply-accumulate data elements”).
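As an illustration only, the iteration structure the rejection quotes from Wilder (N multiply-accumulate operations per iteration, using N input data elements and a single coefficient) can be sketched in Python. The function name `repeating_mac`, the lane count, and the sliding-window choice of input elements are assumptions made for this sketch; the Office action does not say how the N input elements are selected in each iteration.

```python
def repeating_mac(inputs, coeffs, n_lanes):
    """Run len(coeffs) iterations; each performs n_lanes multiply-accumulates."""
    # Assumption: iteration k reads a window of n_lanes inputs starting at offset k.
    assert len(inputs) >= n_lanes + len(coeffs) - 1, "not enough input elements"
    acc = [0] * n_lanes
    for k, coeff in enumerate(coeffs):
        window = inputs[k:k + n_lanes]  # N input data elements for this iteration
        # One coefficient is applied across all lanes (conceptually in parallel).
        acc = [a + x * coeff for a, x in zip(acc, window)]
    return acc


inputs = [1, 2, 3, 4, 5, 6]
coeffs = [1, 10, 100]
result = repeating_mac(inputs, coeffs, n_lanes=4)
```

After the last iteration, `result` holds the N accumulated values, corresponding to the "N multiply-accumulate results" mentioned in the quoted text.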

6.         As per claims 24 and 32, the combination of Wilder and Boswell discloses “The apparatus of claim 23” [see rejection to claim 23 above], wherein, during the current iteration, the subsets of the first portions of the first matrix comprise operands that are rotated between different VSPs through a crossbar switch that interconnects the plurality of VSPs after the VSPs perform the matrix multiplication for the current iteration (see paragraph 0015 of Wilder, which discloses “during each iteration, the SIMD data processing circuitry determines N input data elements from the first vector and a single coefficient data element from the second vector” and paragraph 0107 of Boswell, which discloses “the crossbar 915 and operand collectors 920 may be implemented between the register file 420 and the one or more cores 450. Furthermore, the result queue 950 may be implemented between the one or more cores 450 and the interconnect network 480, which enables the result stored in the result queue 950 to be written back to the register file 420. Consequently, the processor 900 is a PPU 200 comprising a plurality of SMs 340, each SM 340 in the plurality of SMs 340 including the register file 420 and a number of cores 450, each core 450 in the number of cores 450 including an instance of the HMMA datapath 930”).

7.         As per claims 25, 33 and 38, the combination of Wilder and Boswell discloses “The apparatus of claim 24” [see rejection to claim 24 above], further comprising: a crossbar switch that interconnects the plurality of VSPs, wherein the subsets of the first portions of the first matrix are rotated to the different VSPs via the crossbar switch (see paragraph 0099 of Boswell, which discloses “a crossbar 915 or other type of switchable interconnect may be coupled to the read ports of the register banks 910 and the inputs of the operand collectors. The crossbar 915 can be configured to route the signals from a read port associated with any of the register banks 910 to a particular operand collector 920” and paragraph 0107, which discloses “the crossbar 915 and operand collectors 920 may be implemented between the register file 420 and the one or more cores 450. Furthermore, the result queue 950 may be implemented between the one or more cores 450 and the interconnect network 480, which enables the result stored in the result queue 950 to be written back to the register file 420. Consequently, the processor 900 is a PPU 200 comprising a plurality of SMs 340, each SM 340 in the plurality of SMs 340 including the register file 420 and a number of cores 450, each core 450 in the number of cores 450 including an instance of the HMMA datapath 930”).
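The rotation recited in claims 24 and 25 can be pictured with a small Python model. This is a hypothetical sketch, not the applicant's hardware and not the structure disclosed in Wilder or Boswell: each processing element keeps its own subset of the second matrix, and the subsets of the first matrix are handed to the neighbouring element after every iteration. The names `matmul_block` and `rotate_and_multiply` and the ring-shaped rotation order are assumptions.

```python
def matmul_block(a_rows, b_cols):
    """Multiply a block of rows of A by a block of columns of B."""
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a_rows]


def rotate_and_multiply(a_subsets, b_subsets):
    """Element p keeps b_subsets[p]; the A subsets move one element per iteration."""
    p_count = len(a_subsets)
    assert p_count == len(b_subsets)
    held = list(range(p_count))  # held[p] = index of the A subset currently at element p
    results = {}                 # (a_index, b_index) -> product block
    for _ in range(p_count):
        for p in range(p_count):
            i = held[p]
            results[(i, p)] = matmul_block(a_subsets[i], b_subsets[p])
        # Rotation step: element p passes its A subset to element (p + 1) % p_count.
        held = [held[(p - 1) % p_count] for p in range(p_count)]
    return results


A = [[1, 2], [3, 4], [5, 6], [7, 8]]          # 4 x 2 matrix, stored as rows
B_cols = [[1, 0], [0, 1], [2, 0], [0, 2]]     # columns of a 2 x 4 matrix
a_subsets = [A[0:2], A[2:4]]
b_subsets = [B_cols[0:2], B_cols[2:4]]
results = rotate_and_multiply(a_subsets, b_subsets)
```

With P elements, P iterations cover every pairing of an A subset with a B subset, which is one way to read the "all combinations" language of claim 26.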

8.         As per claims 26 and 34, the combination of Wilder and Boswell discloses “The apparatus of claim 21” [see rejection to claim 21 above], wherein the VSPs perform the matrix multiplication for all combinations of the subsets of the first portions of the first and second matrices during a first round of iterations (see figures 1A-1B and paragraph 0114 of Wilder, which discloses “each repeating MAC instruction receives a first vector of input data elements, and a second vector of coefficient data elements, and generates one set of N multiply-accumulate results”).

9.         As per claims 27 and 35, the combination of Wilder and Boswell discloses “The apparatus of claim 26” [see rejection to claim 26 above], wherein the plurality of VSPs further comprise: output buffers, wherein the VSPs write accumulated results of the multiplications to the output buffer subsequent to performing the matrix multiplication in the first round of iterations and prior to beginning a second round of iterations (see paragraph 0015 of Wilder, which discloses “after performance of the plurality of iterations the SIMD data processing circuitry then outputs N multiply-accumulate results”).

10.         As per claims 28 and 36, the combination of Wilder and Boswell discloses “The apparatus of claim 27” [see rejection to claim 27 above], wherein second portions of the first and second matrices are fetched into a plurality of registers in response to the VSPs writing the accumulated results to the output buffers (see paragraph 0015 of Wilder, which discloses “during each iteration, the SIMD data processing circuitry determines N input data elements from the first vector and a single coefficient data element from the second vector”).

11.         As per claim 39, the combination of Wilder and Boswell  discloses “The method of claim 37” [see rejection to claim 37 above], further comprising: fetching the first portions of the first matrix and the first portions of the second matrix into vector general-purpose registers (VGPRs) associated with the VSPs; and copying the first portions of the first matrix and the first portions of the second matrix from the VGPRs into first and second buffers in the VSPs, respectively, prior to beginning the multiplication (see paragraph 0015 of Wilder, which discloses “during each iteration, the SIMD data processing circuitry determines N input data elements from the first vector and a single coefficient data element from the second vector”).

12.         As per claim 40, the combination of Wilder and Boswell discloses “The method of claim 37” [see rejection to claim 37 above], writing accumulated results of multiplying the first portions of the first and second matrices into an output buffer in response to completing the multiplication (see paragraph 0015 of Wilder, which discloses “during each iteration, the SIMD data processing circuitry determines N input data elements from the first vector and a single coefficient data element from the second vector”).

CLOSING COMMENTS
CONCLUSION

a. STATUS OF CLAIMS IN THE APPLICATION 

            The following is a summary of the treatment and status of all claims in the application as recommended by M.P.E.P. 707.07(i):

a (1) CLAIMS REJECTED IN THE APPLICATION 

            Per the instant office action, claims 21-40 have received a first action on the merits and are subject of a first action non-final.

b. DIRECTION OF FUTURE CORRESPONDENCES

            Any inquiry concerning this communication or earlier communications from the Examiner should be directed to Ernest Unelus whose telephone number is (571) 272-8596. The examiner can normally be reached on Monday to Friday 9:00 AM to 5:00 PM.

IMPORTANT NOTE

            If attempts to reach the above noted Examiner by telephone are unsuccessful, the Examiner's supervisor, Mr. Idriss Alrobaye, can be reached at the following telephone number: Area Code (571) 270-1023.

The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov.
Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Ernest Unelus/
Primary Examiner
Art Unit 2181

Prosecution Timeline

Sep 07, 2023

Application Filed

Jan 28, 2026

Non-Final Rejection — §101, §103, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/072,603

Patent 12585420

AUDIO SWITCH WITH TURN-OFF HELPER FOR DIGITAL INTERFACE

2y 5m to grant Granted Mar 24, 2026

18/080,604

Patent 12585605

COARSE GRAINED RECONFIGURABLE ARCHITECTURE

2y 5m to grant Granted Mar 24, 2026

18/050,852

Patent 12573798

DYNAMIC LANE REALLOCATION BASED ON BANDWIDTH NEEDS

2y 5m to grant Granted Mar 10, 2026

18/451,169

Patent 12572484

HDMI display control

2y 5m to grant Granted Mar 10, 2026

18/090,886

Patent 12561074

SYSTEM AND METHOD FOR SECURE ACCESS TO A DISTRIBUTED VIRTUAL FIRMWARE NETWORK DRIVE

2y 5m to grant Granted Feb 24, 2026

Based on the 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

77%

Grant Probability

99%

With Interview (+38.6%)

3y 3m

Median Time to Grant

Low

PTA Risk

Based on 540 resolved cases by this examiner. Grant probability derived from career allow rate.
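The career allow rate behind the grant probability follows directly from the counts given on this page (417 granted out of 540 resolved). A minimal Python sketch of that arithmetic is below; the function name `allow_rate` is illustrative, and the with-interview projection is not reproduced because the page does not say how it is computed.

```python
def allow_rate(granted, resolved):
    """Fraction of resolved cases that ended in a grant."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return granted / resolved


rate = allow_rate(417, 540)
rate_pct = round(rate * 100, 1)  # percentage, rounded to one decimal place
```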