DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-25 are rejected under 35 U.S.C. 103 as being unpatentable over Pal et al. (US 20210191724 A1, hereinafter “Pal”) in view of Heinecke et al. (US 20190227797 A1, hereinafter “Heinecke”).
As per claim 1, Pal teaches A processor comprising: processing resources comprising multiplier circuitry to: receive operands for a matrix multiplication operation, wherein the operands comprising two source matrices to be multiplied as part of the matrix multiplication operation (Pal: Fig. 20; [0235]);
and issue a multiply and add vector (MADV) instruction for the multiplication operation (Pal: Fig. 21A; [0242]),
wherein the MADV instruction to multiply two vectors of the two source matrices in a single floating point (FP) pipeline of the processor (Pal: [0248], [0253]).
However, while Pal discloses an execution pipeline for MADV instructions (Fig. 21C), Pal does not explicitly disclose detailed circuitry for performing the instruction. Thus, Pal does not teach utilizing a double accumulator access output.
Heinecke teaches utilizing a double accumulator access output (Heinecke: Fig. 15 element 1501; [0150]).
Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify, with a reasonable expectation of success, the execute stages of Pal with the multiply-accumulate circuitry of Heinecke. While Pal discloses the instruction to be processed, Pal does not disclose the details of the underlying hardware that implements it. A person of ordinary skill in the art would look to Heinecke for the details of suitable SIMD units, because both references relate to Intel processors. Thus, one would have been motivated to combine these references because both disclose performing vector multiply-add operations on Intel processors, and because the combination applies prior art elements according to known methods to yield predictable results (circuitry for performing a vector multiply-add operation).
As per claim 2, Pal/Heinecke further teaches The processor of claim 1, wherein the multiplier circuitry is further to: iterate through a loop of the MADV instruction to combine results of vector multiplications to generate a final result for the matrix multiplication operation (Pal: Fig. 22; [0251]);
and output the final result for the matrix multiplication operation (Pal: [0235]).
As per claim 3, Pal/Heinecke further teaches The processor of claim 1, wherein the MADV instruction is to generate a matrix output (Pal: [0235]).
As per claim 4, Pal/Heinecke further teaches The processor of claim 1, wherein the multiplier circuitry comprises one or more multiplier-accumulate (MAC) units to support double precision (DP) multiplication operations on at least two elements of a vector of one of the two source matrices in a same cycle (Pal: [0112], [0124]).
As per claim 5, Pal/Heinecke further teaches The processor of claim 1, wherein the MADV instruction is issued to a single floating point unit (FPU) pipeline of the processor, the FPU pipeline configured to provide double wide source input for one of the two source matrices and configured to provide the double accumulator access via a double wide output for a destination accumulator (Heinecke: Fig. 15, [0150]; Fig. 13, [0137] showing exemplary source registers that are double wide).
As per claim 6, Pal/Heinecke further teaches The processor of claim 1, wherein the multiplier circuitry is comprised in a single floating point unit (FPU) pipeline that comprises 16 channel double precision (DP) multiplier-accumulate (MAC) units, and wherein the 16 channel double precision MAC units can support a 16 channel single precision (SP) MAC to provide double speed for the MADV instruction with a SP data type (Heinecke: Fig. 5B element 528; [0093] showing the example MAC circuitry of Fig. 15 can expand to 16-wide).
As per claim 7, Pal/Heinecke further teaches The processor of claim 1, wherein the multiplier circuitry is part of an arithmetic logic unit (ALU) and comprises a plurality of adders and shifters (Heinecke: Fig. 5B element 528; [0093]; [0125] last sentence; wherein the MAC circuitry of Fig. 15 is part of an ALU).
As per claim 8, Pal/Heinecke further teaches The processor of claim 1, wherein the processor comprises a graphics processing unit (GPU) (Pal: [0087]).
As per claim 9, Pal/Heinecke further teaches The processor of claim 1, wherein the processor is at least one of a single instruction multiple data (SIMD) machine or a single instruction multiple thread (SIMT) machine (Pal: [0112]).
As per claim 10, the claim is directed to a method that implements the same or similar features as the processor of claim 1, and is therefore rejected for at least the same reasons therein.
As per claim 11, Pal/Heinecke further teaches The method of claim 10, further comprising: iterating through a loop of the MADV instruction to combine results of vector multiplications to generate a final result for the matrix multiplication operation (Pal: Fig. 22; [0251]);
and outputting the final result for the matrix multiplication operation (Pal: [0235]); wherein the MADV instruction is to generate a matrix output (Pal: [0235]).
As per claims 12-15, the claims are directed to a method that implements the same or similar features as the processor of claims 4-7, respectively, and are therefore rejected for at least the same reasons therein.
As per claim 16, the claim is directed to a system that implements the same or similar features as the method of claim 10, and is therefore rejected for at least the same reasons therein. Furthermore, Pal/Heinecke teaches a memory to store a block of data (Pal: [0231]).
As per claims 17-20, the claims are directed to a method that implements the same or similar features as the method of claims 11-14, respectively, and are therefore rejected for at least the same reasons therein.
As per claim 21, the claim is directed to a non-transitory computer-readable medium that implements the same or similar features as the processor of claim 2, and is therefore rejected for at least the same reasons therein.
As per claims 22-25, the claims are directed to a non-transitory computer-readable medium that implements the same or similar features as the processor of claims 3-6, respectively, and are therefore rejected for at least the same reasons therein.
Prior Art Made of Record
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
Chen et al. (US 20190129718 A1, hereinafter “Chen”) discloses a plurality of compute units with a plurality of lanes (Fig. 1) to compute instructions on packed vectors and scalars. Each computation lane comprises a plurality of ALUs that each access a respective portion of the destination operand (Fig. 2 element 250A-250B; [0040]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHAT N LE whose telephone number is (571) 272-0546. The examiner can normally be reached Monday-Friday 8:30AM-5PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew T Caldwell, can be reached at (571) 272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/P.N.L./
Phat Le
Examiner, Art Unit 2182
(571) 272-0546
/ANDREW CALDWELL/
Supervisory Patent Examiner, Art Unit 2182