DETAILED ACTION
This action is responsive to the application filed on 11/13/2022. Claims 1-25 are pending and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 3-5, 15-18, 20 and 24-25 are objected to because of the following informalities:
In regards to claim 3, lines 6, 7 and 10 each include an instance of the limitation stating “plurality of execution circuitry” that should be amended to “plurality of execution circuits” as to correct a grammatical issue.
In regards to claim 15, lines 2 and 5 each include a recitation of the term “indicted” that should be amended to “indicated” as to correct a typographical error.
In regards to claim 17, line 2 includes the term “performing” that should be amended to “perform” as to correct a grammatical issue.
In regards to claim 20, lines 5-6 and 9 each include an instance of the limitation stating “plurality of execution circuitry” that should be amended to “plurality of execution circuits” as to correct a grammatical issue.
In regards to claim 24, lines 6 and 9 each include a recitation of the term “indicted” that should be amended to “indicated” as to correct a typographical error.
Claims 4-5, 16-18 and 25 are dependent upon one or more claims above and therefore are similarly objected to for including the deficiencies of one or more claims above.
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., abstract idea) without significantly more.
Regarding claim 1:
Subject Matter Eligibility Analysis Step 1:
Claim 1 recites “An apparatus” and thus a machine, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 1 recites “…perform operations corresponding to a complex number matrix multiplication… the complex number matrix multiplication… to indicate a first source complex number matrix having M rows by K columns of complex numbers, a second source complex number matrix having K rows by N columns of complex numbers, the operations including, for each row m of the M rows, and for each column n of the N columns, to: generate K complex numbers by K complex multiplications of K complex numbers of the row m of the first source complex number matrix with K corresponding complex numbers of the column n of the second source complex number matrix; combine the K generated complex numbers to generate a complex number…combine the generated complex number with a complex number at, a row m of M rows and a column n of N columns of a destination complex number matrix…” which describes a process that, under its broadest reasonable interpretation, encompasses mathematical calculations. That is, other than reciting generic computing components (e.g. a cache and a processor), nothing in the claimed elements precludes the steps from practically being performed in the mind and/or with the aid of pen and paper.
For example, the claim discusses performing matrix multiplication of two matrices including complex values with an optional accumulation of values (see paragraphs [00203-00205] of applicant’s specification which disclose arithmetic equations that correspond to the mathematics indicated in the claim), thus the limitation encompasses mathematical calculations (MPEP 2106.04(a)(2)(I)(C)).
If a claim limitation, under its broadest reasonable interpretation, covers performance of a mathematical calculation in the mind or with the aid of pen and paper but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
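For illustration only, the arithmetic characterized above may be sketched as follows. The sketch is not taken from the application or from any cited reference; the function name complex_matmul, the parameter names, and the reading of “combine” as addition are assumptions made for the example.

```python
def complex_matmul(a, b, dest=None):
    """Multiply an M x K complex matrix by a K x N complex matrix.

    If dest is None each result is stored; otherwise each result is
    combined (here: added, an assumption) with the value at dest[m][n].
    """
    m_rows, k_len, n_cols = len(a), len(b), len(b[0])
    out = [[0j] * n_cols for _ in range(m_rows)]
    for m in range(m_rows):
        for n in range(n_cols):
            # K complex multiplications: row m of a with column n of b
            products = [a[m][k] * b[k][n] for k in range(k_len)]
            # combine the K generated complex numbers into one
            total = sum(products, 0j)
            # either store, or combine with the existing destination element
            out[m][n] = total if dest is None else dest[m][n] + total
    return out


a = [[1 + 2j, 3 + 0j]]      # first source, 1 x 2
b = [[0 + 1j], [2 - 1j]]    # second source, 2 x 1
stored = complex_matmul(a, b)
accumulated = complex_matmul(a, b, dest=[[1 + 1j]])
print(stored, accumulated)
```

With these small operands each step can be checked by hand: (1+2i)(i) = -2+i and (3)(2-i) = 6-3i, which sum to 4-2i.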
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 1 further recites the additional elements of:
(a) a cache; a processor coupled with the cache, the processor to perform…an instruction
(b) and either store the generated complex number at a row m of M rows and a column n of N columns of a destination
These additional elements do not integrate the abstract idea into a practical application because element (a) recites, at a high level of generality, the words “apply it” (or an equivalent) with the judicial exception, or uses mere instructions to implement the abstract idea on a computer, or merely uses a computer as a tool to perform the abstract idea (See MPEP 2106.05(f)), and element (b) recites insignificant extra-solution activity (i.e. data outputting) (See MPEP 2106.05(g)).
Therefore, claim 1 is directed to the abstract idea.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 1, taken alone and in combination, do not provide significantly more than the abstract idea itself. Element (a) uses mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, which cannot provide significantly more (see MPEP 2106.05(f)). Element (b) recites insignificant extra-solution activity of data outputting (see MPEP 2106.05(g)), which the courts have deemed to be a well-understood, routine and conventional activity that does not provide significantly more (MPEP 2106.05(d)); the courts have recognized that receiving or transmitting data over a network (Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362), as well as storing and retrieving information in memory (Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93), are well-understood, routine, and conventional functionalities.
Therefore, based on the discussion of the additional elements above, claim 1 is not patent eligible.
Claim 2, dependent upon claim 1, further recites “…wherein the processor is a central processing unit (CPU), and wherein the CPU comprises: decode circuitry to decode the… instruction; and execution circuitry coupled with the decode circuitry, the execution circuitry to perform the operations corresponding to the… instruction” which ties the abstract idea of claim 1 to mere instructions used on a computer, using the words “apply it” with the judicial exception or merely uses a computer as a tool to perform an abstract idea (See MPEP 2106.05(f)). Furthermore, using computer components such as decode and execution circuitry can be viewed as well-understood, routine and conventional because it is well-known in computer architecture to process instructions by using a pipeline which includes decode and execute circuitry (See NPL reference “Computer Architecture A Quantitative Approach”, pages C-34 to C-35 and 232). Therefore, the claim recites no additional elements which could integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claim 3, dependent upon claim 1, further recites “…wherein the processor is a graphics processing unit (GPU), and wherein the GPU comprises: decode circuitry to decode… the instruction; scheduler circuitry coupled with the decode circuitry, the scheduler circuitry to schedule… the instruction; a plurality of execution circuitry, each corresponding to a different thread of a thread group, the plurality of execution circuitry to collectively perform the operations corresponding to the instruction; and a plurality of sets of registers, each corresponding to a different one of the plurality of execution circuitry, the plurality of sets of registers to collectively store the first source, the second source, and the destination…” which ties the abstract idea of claim 1 to mere instructions used on a computer, using the words “apply it” with the judicial exception or merely uses a computer as a tool to perform an abstract idea (See MPEP 2106.05(f)) and ties the abstract idea to a particular technological environment (e.g. a GPU multi-threaded environment) (MPEP 2106.05(h)). Furthermore, using computer components such as decoder, scheduler, execution circuits and register files can be viewed as well-understood, routine and conventional because it is well-known in GPU architectures to include those types of hardware when executing SIMT instructions (See NPL reference “Computer Architecture A Quantitative Approach”, pages C-34 to C-35 and pages 291-297 and 310). Therefore, the claim recites no additional elements which could integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claim 4, dependent upon claim 3, further recites “…wherein the instruction has a synchronization indicator to synchronize the threads of the thread group by causing the threads of the thread group to wait until all other of the threads of the thread group have executed … prior to execution of other subsequent instructions” which ties the abstract idea of claim 1 to mere instructions used on a computer, using the words “apply it” with the judicial exception or merely uses a computer as a tool to perform an abstract idea (See MPEP 2106.05(f)). Furthermore, the additional limitation ties the abstract idea to a particular field of use or technological environment (SIMT multi-threaded processor environment) (MPEP 2106.05(h)). Furthermore, synchronizing threads of a thread group can be viewed as well-understood, routine and conventional because it is well-known in graphics processing to synchronize threads of a thread group (See NPL reference “Computer Architecture A Quantitative Approach”, pages 313-314). Therefore, the claim recites no additional elements which could integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claim 5, dependent upon claim 3, further recites “…wherein the instruction has an alignment indicator to indicate that the threads of the thread group are to execute the same… instruction” which ties the abstract idea of claim 1 to mere instructions used on a computer, using the words “apply it” with the judicial exception or merely uses a computer as a tool to perform an abstract idea (See MPEP 2106.05(f)). Furthermore, the additional limitation ties the abstract idea to a particular field of use or technological environment (SIMT multi-threaded processor environment) (MPEP 2106.05(h)). Furthermore, executing a same instruction across multiple threads simultaneously can be viewed as well-understood, routine and conventional because it is well-known in graphics processing to execute SIMT instructions (See NPL reference “Computer Architecture A Quantitative Approach”, pages 291-297 and 310). Therefore, the claim recites no additional elements which could integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claim 6, dependent upon claim 1, further recites “wherein the complex number matrix multiplication instruction is a single-instruction, multiple-thread (SIMT) instruction” which ties the abstract idea of claim 1 to mere instructions used on a computer, using the words “apply it” with the judicial exception or merely uses a computer as a tool to perform an abstract idea (See MPEP 2106.05(f)). Furthermore, the additional limitation ties the abstract idea to a particular field of use or technological environment (multi-threaded processor environment) (MPEP 2106.05(h)). Therefore, the claim recites no additional elements which could integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claim 7, dependent upon claim 1, further recites “…wherein the complex number matrix multiplication instruction has one or more matrix size indicators to indicate one or more of the M rows of the first source complex number matrix, the K columns of the first source complex number matrix, and the N columns of the second source complex number matrix”, which discloses a size indicator used in the instruction to indicate a size of the matrices used in the abstract idea. Thus, the limitation ties the abstract idea to using mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea which cannot provide significantly more (see MPEP 2106.05(f)) and ties the abstract idea to a particular type of data (matrix data with a particular number of columns and rows), e.g. a particular field of use or technological environment (MPEP 2106.05(h)). Therefore, the claim recites no additional elements which could integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claim 8, dependent upon claim 1, further recites “…wherein the complex number matrix multiplication instruction has a matrix layout indicator to indicate a layout of one of the first and second source complex number matrices as being either a row-major layout or a column major layout”, which discloses a matrix layout indicator used in the instruction to indicate a layout of the matrices used in the abstract idea. Thus, the limitation ties the abstract idea to using mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea which cannot provide significantly more (see MPEP 2106.05(f)) and ties the abstract idea to a particular type of data (matrix in a column or row major layout), e.g. a particular field of use or technological environment (MPEP 2106.05(h)). Therefore, the claim recites no additional elements which could integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claim 9, dependent upon claim 1, further recites “…wherein the destination complex number matrix has an integer multiple of four complex numbers” which discloses a number of elements stored in the destination matrix used in the abstract idea of claim 1. Thus, the claim recites additional embellishments of the abstract idea of claim 1 and includes no additional elements which could integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claim 10, dependent upon claim 1, further recites “…the processor is to perform operations corresponding to a real number matrix multiplication instruction, the real number matrix multiplication instruction to indicate a first source real number matrix having M' rows by K' columns of real numbers, a second source real number matrix having K' rows by N' columns of real numbers, and the operations including, for each row m' of the M' rows, and for each column n' of the N' columns, to: generate K' real numbers by K' real multiplications of K' real numbers of the row m' of the first source real number matrix with K' corresponding real numbers of the column n' of the second source real number matrix; combine the K' generated real numbers to generate a real number; and either store the generated real number at, or combine the generated real number with a real number at, a row m' of M' rows and a column n' of N' columns of a destination real number matrix”, which discloses an additional abstract idea performed using generic computing components to store a result of the abstract idea. Thus, the additional limitations tie the abstract ideas to using mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea which cannot provide significantly more (see MPEP 2106.05(f)), and recite insignificant extra-solution activities (data outputting) which are well-understood, routine and conventional (see MPEP 2106.05(d) and (g)). Therefore, the claim recites no additional elements which could integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claim 11, dependent upon claim 10, further recites “…wherein the processor includes circuitry to be configured by the complex number matrix multiplication instruction to perform the operations corresponding to the complex number matrix multiplication instruction, and wherein the circuitry is to be configured by the real number matrix multiplication instruction to perform the operations corresponding to the real number matrix multiplication instruction”, thus the limitations tie the abstract idea to using mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea which cannot provide significantly more (see MPEP 2106.05(f)). Therefore, the claim recites no additional elements which could integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claim 12, dependent upon claim 1, further recites “wherein the processor, for each said row m of the M rows of the first source complex number matrix, and for each said column n of the N columns of the second source complex number matrix, is to generate the K complex numbers concurrently” which ties the abstract idea of claim 1 to being performed in a parallel computing environment such as a GPU. Thus, the additional limitations tie the abstract idea to using mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea which cannot provide significantly more (see MPEP 2106.05(f)) and to a particular technological computing environment (e.g. parallel computing) (see MPEP 2106.05(h)). Furthermore, parallel computing can be viewed as well-understood, routine and conventional because it is well-known to execute operations concurrently (See NPL reference “Computer Architecture A Quantitative Approach”, pages 9-10 and 291-294). Therefore, the claim recites no additional elements which could integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claim 13, dependent upon claim 1, further recites “…wherein the processor, for each said row m of the M rows of the first source complex number matrix, and for each said column n of the N columns of the second source complex number matrix, is to generate a plurality of portions of the K complex numbers sequentially” which ties the abstract idea of claim 1 to being performed sequentially in a processor. Thus, the additional limitations tie the abstract idea to using mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea which cannot provide significantly more (see MPEP 2106.05(f)) and to a particular technological computing environment (e.g. sequential loop computing) (see MPEP 2106.05(h)). Furthermore, sequential computing can be viewed as well-understood, routine and conventional because it is well-known to execute operations sequentially (See NPL reference “Computer Architecture A Quantitative Approach”, pages C-34 to C-35 and 313). Therefore, the claim recites no additional elements which could integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claim 14, dependent upon claim 1, further recites “…wherein the processor is to combine the generated complex number with the complex number at the row m of the M rows and the column n of the N columns of the destination complex number matrix” which recites further details of the abstract idea recited in claim 1. Further, the claim recites additional limitations that tie the abstract idea to using mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea which cannot provide significantly more (see MPEP 2106.05(f)). Therefore, the claim recites no additional elements which could integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself.
Claims 15, 19 and 24 are similarly rejected on the same basis as claim 1 above.
Claims 16-18, 20-23 and 25 are similarly rejected on the same basis as claims 3-4, 7 and 14 above.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-2, 7-8, 10-16, 18-19 and 22-25 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zivkovic, PGPUB No. 2020/0310820, and further in view of Grochowski, PGPUB No. 2019/0258481.
In regards to claim 1, Zivkovic discloses An apparatus (See Fig. 12) comprising: a cache; and a processor coupled with the cache (See Fig. 12 and [0142]: wherein an L3 cache (element 1216) is coupled with the processor (element 1255)) the processor to perform operations corresponding to a complex number matrix multiplication instruction ([0143, 0158, 0181-0182]: wherein a processor includes a matrix accelerator to perform multiply accumulate instructions using complex numbers (see abstract, 0145-0147, 0158 and Fig. 21 for further clarity)) the complex number matrix multiplication instruction to indicate a first source complex number matrix having a row and a column of complex numbers, a second source complex number matrix having a row by a column of complex numbers ([0145-0147]: wherein a complex multiplication or alternatively multiplication and accumulation instruction indicates a first and a second source vector register each storing a 1x1 matrix of complex numbers) the operations including, for each row m, and for each column n, to: generate K complex numbers by K complex multiplications of K complex numbers of the row m of the first source complex number matrix with K corresponding complex numbers of the column n of the second source complex number matrix ([0145-0147] and See Fig. 13 resultants of multipliers (element 1302)) combine the K generated complex numbers to generate a complex number ([0145-0147]: wherein adders (1303) combine the generated complex numbers to generate a complex number (See Fig. 13)) and either store the generated complex number at, or combine the generated complex number with a complex number at, a row and a column n of a destination complex number matrix. ([0145-0147]: wherein a generated complex number result is stored at a 1x1 matrix destination register for operand z or combined with a complex number in an accumulator register)
Zivkovic does not disclose a matrix multiplication instruction indicating a first source matrix having M rows by K columns of numbers, a second source complex number matrix having K rows by N columns of complex numbers, the operations including, for each row m of the M rows, and for each column n of the N columns, to: generate K numbers by K multiplications of K numbers of the row m of the first source number matrix with K corresponding numbers of the column n of the second source number matrix; combine the K generated numbers to generate a number; and either store the generated number at, or combine the generated number with a number at, a row m of M rows and a column n of N columns of a destination number matrix. Zivkovic generally discloses performing complex number matrix operations using 1x1 matrices using vector registers, and additionally discloses performing real matrix multiplication using an NxM matrix and a 1xM matrix ([0148]); however, Zivkovic has not explicitly disclosed performing matrix multiplication using MxK and KxN sized matrices.
Grochowski discloses a matrix multiplication instruction indicating a first source matrix having M rows by K columns of numbers, a second source complex number matrix having K rows by N columns of complex numbers ([0064-0065]: wherein the matrix multiplication and/or accumulation instruction indicates a first source matrix A having a number of rows and columns and a source matrix B having a number of rows and columns, wherein the number of columns of matrix A equals the number of rows of matrix B) the operations including, for each row m of the M rows, and for each column n of the N columns, to: generate K numbers by K multiplications of K numbers of the row m of the first source number matrix with K corresponding numbers of the column n of the second source number matrix; combine the K generated numbers to generate a number; and either store the generated number at, or combine the generated number with a number at, a row m of M rows and a column n of N columns of a destination number matrix. ([0064, 0069 and 0078]: wherein source matrices A and B are multiplied to generate K number of products which are combined and then a result is stored to matrix C or a result is accumulated with data in matrix C (See Fig. 5))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the complex matrix multiply accumulate instructions of Zivkovic to use two-dimensional source matrices, such that a plurality of rows and columns are used in the operations, as in the matrix multiply and accumulate instructions of Grochowski. It would have been obvious to one of ordinary skill in the art because it would allow the instructions of Zivkovic to operate on matrices of flexible and/or arbitrary size, allowing a wide range of matrix sizes to be used (Grochowski [0032-0033]). Furthermore, the courts have deemed that changes in size are obvious, thus it would have been obvious to execute complex matrix operations using 2-D matrices (See In re Rose, 220 F.2d 459, 105 USPQ 237 (CCPA 1955); MPEP 2144.04(IV)(A)).
Claim 15 is similarly rejected on the same basis as claim 1 above as claim 15 is the method claim corresponding to the apparatus of claim 1 above.
Claim 19 is similarly rejected on the same basis as claim 1 above as claim 19 is the system claim corresponding to the apparatus of claim 1 above. (Note: Claim 19 discloses an additional limitation stating a processor coupled to a DRAM memory. This limitation is disclosed in Zivkovic [0116 and Fig. 12A]: discloses a system memory coupled to processor (element 1255), wherein system memory can be a DRAM)
Claim 24 is similarly rejected on the same basis as claim 1 above as claim 24 is the non-transitory medium claim corresponding to the apparatus of claim 1 above. (Note: Claim 24 discloses an additional limitation stating a non-transitory medium. This limitation is disclosed in Zivkovic [0133-0134 and 0229])
In regards to claim 2, the combination of Zivkovic and Grochowski discloses The apparatus of claim 1 (see rejection of claim 1 above) wherein the processor is a central processing unit (CPU), and wherein the CPU comprises: decode circuitry to decode the complex number matrix multiplication instruction (Zivkovic [0138, 0140 and Fig. 12A] | Grochowski: See Fig. 1) and execution circuitry coupled with the decode circuitry, the execution circuitry to perform the operations corresponding to the complex number matrix multiplication instruction. (Zivkovic [0138, 0143 and Fig. 12A] | Grochowski: See Fig. 1)
In regards to claim 7, the combination of Zivkovic and Grochowski discloses The apparatus of claim 1 (see rejection of claim 1 above) wherein the complex number matrix multiplication instruction has one or more matrix size indicators to indicate one or more of the M rows of the first source complex number matrix, the K columns of the first source complex number matrix, and the N columns of the second source complex number matrix. (Grochowski [0032-0033]: wherein matrix instruction includes matrix dimension indicators indicating number of rows and columns of each source matrix)
Claim 18 is similarly rejected on the same basis as claim 7 above as claim 18 is the method claim corresponding to the apparatus of claim 7 above.
Claim 23 is similarly rejected on the same basis as claim 7 above as claim 23 is the system claim corresponding to the apparatus of claim 7 above.
In regards to claim 8, the combination of Zivkovic and Grochowski discloses The apparatus of claim 1 (see rejection of claim 1 above) wherein the complex number matrix multiplication instruction has a matrix layout indicator to indicate a layout of one of the first and second source complex number matrices as being either a row-major layout or a column major layout. (Grochowski [0061]: wherein matrix instruction has a memory layout dimension indicator indicating a row-major or column major layout of one of the matrix source operands)
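For clarity of the record, the general distinction between a row-major layout and a column major layout may be sketched as follows. This is a generic illustration of the two storage orders and is not asserted to reproduce the encoding of Grochowski or of the claimed instruction; the function name flat_index and the layout strings are hypothetical.

```python
def flat_index(row, col, rows, cols, layout):
    """Offset of element (row, col) in a flat buffer for the given layout."""
    if layout == "row_major":
        return row * cols + col   # elements of a row are contiguous
    if layout == "col_major":
        return col * rows + row   # elements of a column are contiguous
    raise ValueError(f"unknown layout: {layout}")


# The same 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored both ways.
row_major = [1, 2, 3, 4, 5, 6]
col_major = [1, 4, 2, 5, 3, 6]

# Element at row 0, column 1 is found at different offsets in each buffer.
rm_value = row_major[flat_index(0, 1, 2, 3, "row_major")]
cm_value = col_major[flat_index(0, 1, 2, 3, "col_major")]
print(rm_value, cm_value)
```

The same logical element is read from offset 1 in the row-major buffer and offset 2 in the column major buffer, which is why an instruction operating on such a buffer needs to be told which order applies.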
In regards to claim 10, the combination of Zivkovic and Grochowski discloses The apparatus of claim 1 (see rejection of claim 1 above) wherein the processor is to perform operations corresponding to a real number matrix multiplication instruction, the real number matrix multiplication instruction to indicate a first source real number matrix having M' rows by K' columns of real numbers, a second source real number matrix having K' rows by N' columns of real numbers (Zivkovic [0181-0188]| Grochowski [0027, 0108 and Figs. 1 and 5]) and the operations including, for each row m' of the M' rows, and for each column n' of the N' columns, to: generate K' real numbers by K' real multiplications of K' real numbers of the row m' of the first source real number matrix with K' corresponding real numbers of the column n' of the second source real number matrix; combine the K' generated real numbers to generate a real number; and either store the generated real number at, or combine the generated real number with a real number at, a row m' of M' rows and a column n' of N' columns of a destination real number matrix. (Zivkovic [0181-0188] | Grochowski [0027, 0108 and Figs. 1 and 5])
In regards to claim 11, the combination of Zivkovic and Grochowski discloses The apparatus of claim 10 (see rejection of claim 10 above) wherein the processor includes circuitry to be configured by the complex number matrix multiplication instruction to perform the operations corresponding to the complex number matrix multiplication instruction, and wherein the circuitry is to be configured by the real number matrix multiplication instruction to perform the operations corresponding to the real number matrix multiplication instruction. (Zivkovic [0139, 0147 and 0181-0182]: wherein complex number multiply accumulate operations and real number matrix multiply accumulate operations are performed using existing complex multiply accumulate circuitry (also see abstract and Figs. 13 and 17-18))
In regards to claim 12, the combination of Zivkovic and Grochowski discloses The apparatus of claim 1 (see rejection of claim 1 above) wherein the processor, for each said row m of the M rows of the first source complex number matrix, and for each said column n of the N columns of the second source complex number matrix, is to generate the K complex numbers concurrently. (Zivkovic [0206] |Grochowski [0097-0098])
In regards to claim 13, the combination of Zivkovic and Grochowski discloses The apparatus of claim 1 (see rejection of claim 1 above).
The combination of Zivkovic and Grochowski thus far does not disclose wherein the processor, for each said row m of the M rows of the first source complex number matrix, and for each said column n of the N columns of the second source complex number matrix, is to generate a plurality of portions of the K complex numbers sequentially.
Grochowski discloses wherein the processor, for each said row m of the M rows of the first source matrix, and for each said column n of the N columns of the second source matrix, is to generate a plurality of portions of the K numbers sequentially. ([0041-0045 and 0078-0079]: wherein matrices are multiplied in sequences such that portions of generated products are generated sequentially. (see abstract))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the matrix multiply instruction of Zivkovic to be performed sequentially, such that the instruction can be stopped and restarted using intermediate matrix multiply results, as in the matrix multiply instruction of Grochowski. It would have been obvious because this may help to ensure continued forward progress in the face of possible interruptions, and may tend to be especially advantageous for long or extremely long completion times of the matrix multiplication instructions (Grochowski [0046]).
In regards to claim 14, the combination of Zivkovic and Grochowski discloses The apparatus of claim 1 (see rejection of claim 1 above) wherein the processor is to combine the generated complex number with the complex number at the row m of the M rows and the column n of the N columns of the destination complex number matrix. (Zivkovic [0147 and Fig. 13] | Grochowski [0064-0065])
Claim 16 is similarly rejected on the same basis as claim 14 above as claim 16 is the method claim corresponding to the apparatus of claim 14 above.
Claim 22 is similarly rejected on the same basis as claim 14 above as claim 22 is the system claim corresponding to the apparatus of claim 14 above.
Claim(s) 3, 6, 9 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zivkovic, Grochowski, and further in view of Pisha, USPAT No. 12,475,189.
In regards to claim 3, the combination of Zivkovic and Grochowski discloses The apparatus of claim 1 (see rejection of claim 1 above) wherein the processor comprises: decode circuitry to decode the complex number matrix multiplication instruction and execute circuitry to execute the complex number matrix multiplication instruction. (Zivkovic [0138-0139 and 0143-0147]: wherein the processor comprises decode circuitry to decode complex matrix multiply and accumulate instructions and execution circuitry (See Figs. 12A-13))
The combination of Zivkovic and Grochowski thus far does not explicitly disclose the processor of Fig. 12A being a graphics processing unit.
However, Zivkovic discloses that a processor is a graphics processing unit ([0093 and 0108]).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the processor of Fig. 12A of Zivkovic to be a graphics processing unit. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention because it would have been the simple substitution of one known element (executing instructions using a graphics processing unit) for another (executing instructions using a generic processing unit) to obtain predictable results (executing complex matrix multiplication instructions using a graphics processing unit) (MPEP 2143, Example B).
The combination of Zivkovic and Grochowski does not further disclose scheduler circuitry to schedule the complex number matrix multiplication instruction; a plurality of execution circuitry, each corresponding to a different thread of a thread group, the plurality of execution circuitry to collectively perform the operations corresponding to the complex number matrix multiplication instruction; and a plurality of sets of registers, each corresponding to a different one of the plurality of execution circuitry, the plurality of sets of registers to collectively store the first source complex number matrix, the second source complex number matrix, and the destination complex number matrix. Zivkovic discloses using a plurality of cores to execute threads simultaneously ([0140]); however, Zivkovic does not disclose using a plurality of execution circuits to collectively perform the operations corresponding to the complex number matrix multiplication instruction. Thus, another reference is brought in for that teaching.
Pisha discloses scheduler circuitry to schedule the complex matrix multiplication instruction (Column 5, lines 1-10 and 30-45 and Column 58, lines 13-14 to Column 59, lines 5-38: wherein scheduler unit schedules complex matrix multiply accumulate instruction); a plurality of execution circuitry, each corresponding to a different thread of a thread group, the plurality of execution circuitry to collectively perform the operations corresponding to the complex matrix multiplication instruction (Column 57, lines 43-55, Column 58, lines 14-40, and Column 59, lines 5-38: wherein a plurality of execution circuits are disclosed each corresponding to a different thread within a thread group to collectively perform the operations corresponding to a complex matrix multiplication (See Fig. 29)); and a plurality of sets of registers, each corresponding to a different one of the plurality of execution circuitry, the plurality of sets of registers to collectively store the first source complex number matrix, the second source complex number matrix, and the destination complex number matrix. (Column 5, lines 1-9, Column 57, lines 43-55, Column 58, lines 14-40, and Column 59, lines 5-47: wherein a plurality of sets of registers of a register file each correspond to a different one of a plurality of execution circuits and collectively store matrices needed for the mma instructions (See Fig. 29))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the graphics processing unit of Zivkovic which performs complex multiply accumulate instructions to use a SIMT architecture which executes complex matrix accumulate instructions using a graphics processing unit as taught in Pisha. It would have been obvious to one of ordinary skill in the art because a SIMT architecture provides the benefits of massive parallelism and improved processing efficiency, which can optimize graphics processing units.
Claim 20 is similarly rejected on the same basis as claim 3 above as claim 20 is the system claim corresponding to the apparatus of claim 3 above.
In regards to claim 6, the combination of Zivkovic and Grochowski discloses The apparatus of claim 1 (see rejection of claim 1 above).
The combination of Zivkovic and Grochowski does not disclose wherein the complex number matrix multiplication instruction is a single-instruction, multiple-thread (SIMT) instruction. Zivkovic does disclose executing a complex matrix multiply instruction in a multi-threading environment but does not explicitly indicate executing the instruction in a SIMT architecture.
Pisha discloses wherein the complex number matrix multiplication instruction is a single-instruction, multiple-thread (SIMT) instruction (Column 5, lines 1-5 and 31-45, Column 45, lines 16-26, Column 57, lines 53-55: wherein a complex MMA instruction is executed in a SIMT environment and thus is a SIMT instruction)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the graphics processing unit of Zivkovic which performs complex multiply accumulate instructions to use a SIMT architecture which executes complex matrix accumulate instructions using a graphics processing unit as taught in Pisha. It would have been obvious to one of ordinary skill in the art because a SIMT architecture provides the benefits of massive parallelism and improved processing efficiency, which can optimize graphics processing units.
In regards to claim 9, the combination of Zivkovic and Grochowski discloses The apparatus of claim 1 (see rejection of claim 1 above).
The combination of Zivkovic and Grochowski does not explicitly disclose wherein the destination complex number matrix has an integer multiple of four complex numbers. Grochowski does disclose that a destination matrix can be of any arbitrary size, but does not explicitly disclose that the size includes an integer multiple of four.
Pisha discloses wherein the destination complex number matrix has an integer multiple of four complex numbers. (See Figs. 2-3 and Column 5, lines 46-67: wherein a destination complex number matrix has an integer multiple of four complex numbers (8x4 or 8x8))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the destination matrix of Zivkovic and Grochowski to include a multiple of four complex numbers as the destination matrix of Pisha. It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention because it would have been the simple substitution of one known element (using a destination matrix size that is an integer multiple of four) for another (using a generic or arbitrary size destination matrix) to obtain predictable results (executing complex matrix multiplication instructions to store complex number results that are a multiple of four to a destination matrix) (MPEP 2143, Example B). Furthermore, the courts have deemed that changes in size are obvious, thus it would have been obvious to include a destination matrix that has a multiple of four numbers (See In re Rose, 220 F.2d 459, 105 USPQ 237 (CCPA 1955); MPEP 2144.04(IV)(A)).
Claim(s) 4-5, 17, 21 and 25 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zivkovic, Grochowski, Pisha and further in view of NPL reference, “Parallel Thread Execution ISA”, hereinafter referred to as PTX.
In regards to claim 4, the combination of Zivkovic, Grochowski, and Pisha discloses The apparatus of claim 3 (see rejection of claim 3 above).
The combination of Zivkovic, Grochowski, and Pisha does not disclose wherein the complex number matrix multiplication instruction has a synchronization indicator to synchronize the threads of the thread group by causing the threads of the thread group to wait until all other of the threads of the thread group have executed the complex number matrix multiplication prior to execution of other subsequent instructions.
PTX discloses wherein a matrix multiplication instruction has a synchronization indicator to synchronize the threads of the thread group by causing the threads of the thread group to wait until all other of the threads of the thread group have executed the matrix multiplication prior to execution of other subsequent instructions. (pages 275-277: wherein a warp-level mma instruction includes a .sync qualifier to synchronize threads of a warp, causing threads to wait until all other threads of the warp have executed the same mma instruction before resuming execution)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the complex matrix multiply instructions of Zivkovic, Grochowski, and Pisha to include a synchronization indicator as the matrix multiply instructions of PTX. It would have been obvious to one of ordinary skill in the art because including a synchronization indicator to allow the threads of a warp in a SIMT architecture to synchronize ensures data consistency in shared memory.
Claim 17 is similarly rejected on the same basis as claim 4 above as claim 17 is the method claim corresponding to the apparatus of claim 4 above.
Claim 21 is similarly rejected on the same basis as claim 4 above as claim 21 is the system claim corresponding to the apparatus of claim 4 above.
Claim 25 is similarly rejected on the same basis as claim 4 above as claim 25 is the non-transitory medium claim corresponding to the apparatus of claim 4 above.
In regards to claim 5, the combination of Zivkovic, Grochowski, and Pisha discloses The apparatus of claim 3 (see rejection of claim 3 above).
The combination of Zivkovic, Grochowski, and Pisha does not disclose wherein the complex number matrix multiplication instruction has an alignment indicator to indicate that the threads of the thread group are to execute the same complex number matrix multiplication instruction.
PTX discloses wherein a matrix multiplication instruction has an alignment indicator to indicate that the threads of the thread group are to execute the same matrix multiplication instruction. (pages 275-277: wherein a warp-level mma instruction includes an .aligned qualifier to indicate that threads of a warp are to execute the same mma instruction)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the complex matrix multiply instructions of Zivkovic, Grochowski, and Pisha to include an alignment indicator as the matrix multiply instructions of PTX. It would have been obvious to one of ordinary skill in the art because including an alignment indicator ensures that all threads of a warp are working with a synchronized data layout for complex operations where data is distributed across the registers of multiple threads within a warp, thus ensuring data consistency.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Sazegari, PGPUB No. 2024/0103858 for teaching matrix multiplication instructions including MxK and KxN matrix operands
Choquette, PGPUB No. 2023/0289398 for teaching a group MMA instruction performing an MMA operation for a plurality of warps
Any inquiry concerning this communication or earlier communications from the examiner should be directed to COURTNEY P SPANN whose telephone number is (571)431-0692. The examiner can normally be reached M-F, 9am-6pm, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached at 571-270-3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/COURTNEY P SPANN/Primary Examiner, Art Unit 2183