DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending in this office action and presented for examination. Claims 1-3, 5-10, 12-17, and 19-20 are newly amended by the response received September 30, 2025.
In claim 1, line 4, an “s” appears to be added at the end of “circuit” without appropriate underlining.
In claim 2, line 6, “each of” is marked with strikethrough, but did not appear in the previous set of claims.
In claim 15, line 9, an “s” appears to be added at the end of “circuit” without appropriate underlining.
In claim 16, line 6, “each of” is marked with strikethrough, but did not appear in the previous set of claims.
Examiner requests that future amendments be made in the appropriate manner conveyed in MPEP 714 to avoid confusion and potential notices of non-compliant amendment.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 1 recites the limitation “fetch, from the register file only once, a first plurality of values” in line 8 (with further recited limitations regarding the first plurality of values). However, a claim may lack written description support when a broad genus claim is presented but the disclosure only describes a narrow species with no evidence that the genus is contemplated. In the instant case, Examiner submits that the claim is a broad genus claim (by using the language “register file”, which encompasses both “scalar register file” 332 and “vector register file” 330, in the context of the remaining language of the limitation), but the original disclosure (e.g., paragraph [0014], “The circuitry of the parallel data processing circuit performs a matrix multiplication operation using source operands accessed only once from a vector register file”; paragraph [0018], “Therefore, the data of the rows and columns of matrices 110 and 120 are retrieved only once from the vector register file”; original claim 1, “fetch, from the vector register file only once, a first plurality of values”) only describes a narrow species (fetching, from the “vector” register file only once, a first plurality of values) with no evidence that the genus is contemplated.
Claims 2-7 are rejected for failing to alleviate the rejection of claim 1 above.
Claim 8 recites the limitation “fetching, by the processing circuit from a register file only once, a first plurality of values” in lines 4-5 (with further recited limitations regarding the first plurality of values). However, a claim may lack written description support when a broad genus claim is presented but the disclosure only describes a narrow species with no evidence that the genus is contemplated. In the instant case, Examiner submits that the claim is a broad genus claim (by using the language “register file”, which encompasses both “scalar register file” 332 and “vector register file” 330, in the context of the remaining language of the limitation), but the original disclosure (e.g., paragraph [0014], “The circuitry of the parallel data processing circuit performs a matrix multiplication operation using source operands accessed only once from a vector register file”; paragraph [0018], “Therefore, the data of the rows and columns of matrices 110 and 120 are retrieved only once from the vector register file”; original claim 1, “fetch, from the vector register file only once, a first plurality of values”) only describes a narrow species (fetching, from the “vector” register file only once, a first plurality of values) with no evidence that the genus is contemplated.
Claims 9-14 are rejected for failing to alleviate the rejection of claim 8 above.
Claim 15 recites the limitation “fetch, from the register file only once, the first plurality of values” in line 14 (with further recited limitations regarding the first plurality of values). However, a claim may lack written description support when a broad genus claim is presented but the disclosure only describes a narrow species with no evidence that the genus is contemplated. In the instant case, Examiner submits that the claim is a broad genus claim (by using the language “register file”, which encompasses both “scalar register file” 332 and “vector register file” 330, in the context of the remaining language of the limitation), but the original disclosure (e.g., paragraph [0014], “The circuitry of the parallel data processing circuit performs a matrix multiplication operation using source operands accessed only once from a vector register file”; paragraph [0018], “Therefore, the data of the rows and columns of matrices 110 and 120 are retrieved only once from the vector register file”; original claim 15, “fetch, from the vector register file only once, a first plurality of values”) only describes a narrow species (fetching, from the “vector” register file only once, a first plurality of values) with no evidence that the genus is contemplated.
Claims 16-20 are rejected for failing to alleviate the rejection of claim 15 above.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation “A processor comprising: a register file; a plurality of execution pipelines, having a plurality of arithmetic logic circuits, each comprising circuitry configured to execute at least two different types of instructions; and circuitry, wherein responsive to a first instruction of a first type of the at least two different instructions, the circuitry is configured to: fetch, from the register file only once, a first plurality of values; and perform, using the plurality of arithmetic logic circuits, a first operation by reusing the first plurality of values for at least two iterations of computations used to perform the first operation” in lines 1-12. However, the specification appears to conflict with the aforementioned claimed subject matter in three respects. First, while the claim recites that a plurality of execution pipelines have a plurality of arithmetic logic circuits, the specification (e.g., paragraph [0047]) appears to convey that an ALU comprises an execution pipeline. Second, while the claim appears to recite the circuitry of line 6 as a separate element from the register file in line 2 and the plurality of execution pipelines in line 3, Figure 3 shows the vector processing circuit 310A comprising vector register file 330 and vector ALU 350. Third, while the claim recites the plurality of arithmetic logic circuits reusing the first plurality of values fetched from the register file, Figure 3 does not show a plurality of arithmetic logic circuits operating on a reused first plurality of values fetched from a register file. A claim, although clear on its face, may also be indefinite when a conflict or inconsistency between the claimed subject matter and the specification disclosure renders the scope of the claim uncertain, as inconsistency with the specification disclosure or prior art teachings may make an otherwise definite claim take on an unreasonable degree of uncertainty.
Therefore, because the aforementioned claimed subject matter appears to conflict with the specification in the manner explained above, the claim is indefinite.
Claim 1 recites the limitation “a plurality of execution pipelines, having a plurality of arithmetic logic circuits, each comprising circuitry configured to execute at least two different types of instructions” in lines 3-5. However, it is indefinite as to whether it is a) the plurality of execution pipelines, or b) the plurality of arithmetic logic circuits, that each comprise the aforementioned circuitry configured to execute at least two different types of instructions.
Claims 2-7 are rejected for failing to alleviate the rejections of claim 1 above.
Claim 5 recites the limitation “the values of the first matrix” in line 8. However, it is indefinite as to whether these values are the same as, or different from, “the first plurality of values of the first matrix” as recited in claim 5, line 4. Note that claim 6 recites the limitation “the first plurality of values of the first matrix” in line 4.
Claim 5 recites the limitation “the values of the second matrix” in lines 8-9. However, it is indefinite as to whether these values are the same as, or different from, “the second plurality of values of the second matrix” as recited in claim 5, lines 4-5. Note that claim 6 recites the limitation “the second plurality of values of the second matrix” in lines 4-5.
Claim 6 is rejected for failing to alleviate the rejections of claim 5 above.
Claim 8 recites the limitation “A method, comprising: responsive to receiving, by a processing circuit, a first instruction of a first type of at least two different types of instructions: fetching, by the processing circuit from a register file only once, a first plurality of values; and performing, using a plurality of execution pipelines of the processing circuit having a plurality of arithmetic logic circuits, each comprising circuitry configured to execute the at least two different types of instructions, a first operation by reusing the first plurality of values for at least two iterations of computations used to perform the first operation” in lines 1-11. However, the specification appears to conflict with the aforementioned claimed subject matter in three respects. First, while the claim recites that a plurality of execution pipelines have a plurality of arithmetic logic circuits, the specification (e.g., paragraph [0047]) appears to convey that an ALU comprises an execution pipeline. Second, while the claim appears to recite the processing circuit of line 4 as a separate element from the register file in line 4, Figure 3 shows the vector processing circuit 310A comprising vector register file 330. Third, while the claim recites the plurality of execution pipelines reusing the first plurality of values fetched from the register file, Figure 3 does not show a plurality of execution pipelines operating on a reused first plurality of values fetched from a register file. A claim, although clear on its face, may also be indefinite when a conflict or inconsistency between the claimed subject matter and the specification disclosure renders the scope of the claim uncertain, as inconsistency with the specification disclosure or prior art teachings may make an otherwise definite claim take on an unreasonable degree of uncertainty.
Therefore, because the aforementioned claimed subject matter appears to conflict with the specification in the manner explained above, the claim is indefinite.
Claim 8 recites the limitation “a plurality of execution pipelines of the processing circuit having a plurality of arithmetic logic circuits, each comprising circuitry configured to execute the at least two different types of instructions” in lines 6-9. However, it is indefinite as to whether it is a) the plurality of execution pipelines, or b) the plurality of arithmetic logic circuits, that each comprise the aforementioned circuitry configured to execute at least two different types of instructions.
Claims 9-14 are rejected for failing to alleviate the rejections of claim 8 above.
Claim 9 recites the limitation “The method as recited in claim 8, responsive to receiving, by the processing circuit, a second instruction of a second type of the at least two different types of instructions different from the first type of the first instruction: fetching … performing …” in lines 1-9. However, the metes and bounds of this limitation are grammatically indefinite. For example, it is indefinite as to whether the method is being recited to comprise the fetching and performing steps.
Claims 10-14 are rejected for failing to alleviate the rejection of claim 9 above.
Claim 10 recites the limitation “The method as recited in claim 9, further comprising fetching, from the register file only once by the processing circuit, the first plurality of values as data of a first matrix and the second plurality of values as data of a second matrix” in lines 1-4. Claim 9, upon which claim 10 is dependent, recites the limitation “fetching, by the processing circuit from the register file only once, a second plurality of values” in lines 4-5. Claim 8, upon which claim 9 is dependent, recites the limitation “fetching, by the processing circuit from a register file only once, a first plurality of values” in lines 4-5. Therefore, it is indefinite as to whether claim 10 (in the context of claim 8) entails fetching the first plurality of values once or twice, in view of the “further” language in claim 10, line 1. Similarly, it is indefinite as to whether claim 10 (in the context of claim 9) entails fetching the second plurality of values once or twice, in view of the “further” language in claim 10, line 1.
Claims 11-14 are rejected for failing to alleviate the rejection of claim 10 above.
Claim 12 recites the limitation “the values of the first matrix” in line 8. However, it is indefinite as to whether these values are the same as, or different from, “the first plurality of values of the first matrix” as recited in claim 12, line 4. Note that claim 13 recites the limitation “the first plurality of values of the first matrix” in line 4.
Claim 12 recites the limitation “the values of the second matrix” in lines 8-9. However, it is indefinite as to whether these values are the same as, or different from, “the second plurality of values of the second matrix” as recited in claim 12, lines 4-5. Note that claim 13 recites the limitation “the second plurality of values of the second matrix” in lines 4-5.
Claim 12 recites the limitation “The method as recited in claim 10, wherein responsive to the first instruction, the method further comprises, by each of the plurality of arithmetic logic circuits: … performing a matrix multiplication operation of a fused multiply add (FMA) operation” in lines 1-6. Claim 8, upon which claim 12 is indirectly dependent, recites the limitation “responsive to receiving, by a processing circuit, a first instruction of a first type of at least two different types of instructions … performing, using a plurality of execution pipelines of the processing circuit having a plurality of arithmetic logic circuits, each comprising circuitry configured to execute the at least two different types of instructions, a first operation” in lines 2-9. Therefore, it is indefinite as to whether or not claim 12 (in the context of claim 8) entails, in response to receiving the first instruction, executing both a first operation and, separately and distinctly, a matrix multiplication operation of a FMA operation, in view of the “further” language in claim 12, line 2.
Claim 13 is rejected for failing to alleviate the rejections of claim 12 above.
Claim 13 recites the limitation “The method as recited in claim 12, wherein responsive to the second instruction, the method further comprises, by each of the plurality of arithmetic logic circuits: … performing a matrix multiplication operation of a dot product operation” in lines 1-6. Claim 9, upon which claim 13 is indirectly dependent, recites the limitation “responsive to receiving, by the processing circuit, a second instruction of a second type of the at least two different types of instructions different from the first type of the first instruction … performing, using the plurality of arithmetic logic circuits, a second operation” in lines 1-7. Therefore, it is indefinite as to whether or not claim 13 (in the context of claim 9) entails, responsive to receiving the second instruction, executing both a second operation and, separately and distinctly, a matrix multiplication operation of a dot product operation, in view of the “further” language in claim 13, line 2.
Claim 15 recites the limitation “a second processor comprising: a register file; a plurality of execution pipelines, having a plurality of arithmetic logic circuits, each comprising circuitry configured to execute at least two different types of instructions; and circuitry configured to: responsive to a first instruction of the one or more kernels with a first type of the at least two different instructions: fetch, from the register file only once, the first plurality of values; and perform, using the plurality of arithmetic logic circuits, a first operation by reusing the first plurality of values for at least two iterations of computations used to perform the first operation” in lines 6-18. However, the specification appears to conflict with the aforementioned claimed subject matter in three respects. First, while the claim recites that a plurality of execution pipelines have a plurality of arithmetic logic circuits, the specification (e.g., paragraph [0047]) appears to convey that an ALU comprises an execution pipeline. Second, while the claim appears to recite the circuitry of line 11 as a separate element from the register file in line 7 and the plurality of execution pipelines in line 8, Figure 3 shows the vector processing circuit 310A comprising vector register file 330 and vector ALU 350. Third, while the claim recites the plurality of arithmetic logic circuits reusing the first plurality of values fetched from the register file, Figure 3 does not show a plurality of arithmetic logic circuits operating on a reused first plurality of values fetched from a register file. A claim, although clear on its face, may also be indefinite when a conflict or inconsistency between the claimed subject matter and the specification disclosure renders the scope of the claim uncertain, as inconsistency with the specification disclosure or prior art teachings may make an otherwise definite claim take on an unreasonable degree of uncertainty.
Therefore, because the aforementioned claimed subject matter appears to conflict with the specification in the manner explained above, the claim is indefinite.
Claim 15 recites the limitation “a plurality of execution pipelines, having a plurality of arithmetic logic circuits, each comprising circuitry configured to execute at least two different types of instructions” in lines 8-10. However, it is indefinite as to whether it is a) the plurality of execution pipelines, or b) the plurality of arithmetic logic circuits, that each comprise the aforementioned circuitry configured to execute at least two different types of instructions.
Claim 15 recites the limitation “a first instruction of the one or more kernels with a first type of the at least two different types of instructions” in lines 12-13. However, it is indefinite as to whether it is a) the first instruction, or b) the one or more kernels, that is recited as being with a first type of the at least two different types of instructions.
Claims 16-20 are rejected for failing to alleviate the rejections of claim 15 above.
Claim 19 recites the limitation “the values of the first matrix” in line 8. However, it is indefinite as to whether these values are the same as, or different from, “the first plurality of values of the first matrix” as recited in claim 19, line 4. Note that claim 20 recites the limitation “the first plurality of values of the first matrix” in line 4.
Claim 19 recites the limitation “the values of the second matrix” in lines 8-9. However, it is indefinite as to whether these values are the same as, or different from, “the second plurality of values of the second matrix” as recited in claim 19, lines 4-5. Note that claim 20 recites the limitation “the second plurality of values of the second matrix” in lines 4-5.
Claim 20 is rejected for failing to alleviate the rejections of claim 19 above.
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
Claims 7 and 14 are rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.
Claim 7 recites the limitation “The processor as recited in claim 3, wherein the circuitry is further configured to: fetch the first matrix and the second matrix from the register file only once until each element of a resulting matrix is updated by one of the first operation and the second operation” in lines 1-5. Claim 3, upon which claim 7 is dependent, recites the limitation “the circuitry is configured to fetch, from the register file only once, the first plurality of values as data of a first matrix and the second plurality of values as data of a second matrix” in lines 1-4. Claim 2, upon which claim 3 is dependent, recites the limitation “fetch, from the register file only once, a second plurality of values” in line 5. Claim 1, upon which claim 2 is dependent, recites the limitation “fetch, from the register file only once, a first plurality of values” in line 8. Therefore, claim 7 appears to fail to include all the limitations of the claim upon which it depends, because claims 1 and 3 recite that the first plurality of values is fetched from the register file only once and claims 2 and 3 recite that the second plurality of values is fetched from the register file only once, whereas claim 7 appears to encompass the possibility that, rather than the first plurality of values and the second plurality of values being fetched from the register file only once, the first plurality of values and the second plurality of values may be fetched from the register file an additional time(s) following each element of a resulting matrix being updated by one of the first operation and the second operation.
Claim 14 recites the limitation “The method as recited in claim 10, further comprising: fetching, by the processing circuit, the first matrix and the second matrix from the register file only once until each element of a resulting matrix is updated by one of the first operation and the second operation” in lines 1-4. Claim 10, upon which claim 14 is dependent, recites the limitation “fetching, from the register file only once by the processing circuit, the first plurality of values as data of a first matrix and the second plurality of values as data of a second matrix” in lines 1-4. Claim 9, upon which claim 10 is dependent, recites the limitation “fetching, by the processing circuit from the register file only once, a second plurality of values” in lines 4-5. Claim 8, upon which claim 9 is directly dependent, recites the limitation “fetching, by the processing circuit from a register file only once, a first plurality of values” in lines 4-5. Therefore, claim 14 appears to fail to include all the limitations of the claim upon which it depends, because claims 8 and 10 recite that the first plurality of values is fetched from the register file only once and claims 9 and 10 recite that the second plurality of values is fetched from the register file only once, whereas claim 14 appears to encompass the possibility that, rather than the first plurality of values and the second plurality of values being fetched from the register file only once, the first plurality of values and the second plurality of values may be fetched from the register file an additional time(s) following each element of a resulting matrix being updated by one of the first operation and the second operation.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (Zhang) (US 20220206749 A1) in view of Chen et al. (Chen) (US 20200272687 A1).
Consider claim 1, Zhang discloses a processor ([0029], line 6, processor) comprising: a register file ([0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel); a plurality of execution pipelines, having a plurality of arithmetic logic circuits ([0032], line 8, multiple dot product data units 322-1 to 322-n; [0036], lines 1-2, FIG. 4 is a schematic block diagram of a dot product data unit 400); and circuitry, wherein responsive to a first instruction of a first type ([0053], line 12, single instruction), the circuitry is configured to: fetch, from the register file only once, a first plurality of values ([0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations); and perform, using the plurality of arithmetic logic circuits, a first operation by reusing the first plurality of values for at least two iterations of computations used to perform the first operation ([0048], lines 1-6, in Block 504, the data reuse unit 321 determines the multiple data subsets from the data set, so as to respectively input the multiple data subsets into the multiple dot product data units 322-1 to 322-n. The two data subsets inputted into the two adjacent dot product data units include a portion of the same data; [0049], lines 1-4, in Block 506, each dot product data unit of the multiple dot product data units 322-1 to 322-n performs the dot product operation on the inputted data subset, so as to generate the dot product operation result; [0050], lines 1-5, in Block 508, each dot product data unit of the multiple dot product data units 322-1 to 322-n generates the current cumulative result of the dot product data unit based on the previous cumulative result of the dot product data unit and the dot product operation result; [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 6-8, otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation).
However, Zhang does not disclose that each of the aforementioned comprises circuitry configured to execute at least two different types of instructions.
On the other hand, Chen discloses circuitry configured to execute at least two different types of instructions ([0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Chen with the invention of Zhang in order to increase processing capability by supporting different types of instructions. (Alternatively, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zhang with the invention of Chen in order to improve efficiency; see paragraph [0014] of Zhang.) Note that the overall combination thereby entails that each of the aforementioned comprises circuitry configured to execute at least two different types of instructions, and that the aforementioned first instruction is of a first type of the at least two different types of instructions.
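For illustration, the data reuse described in the cited passages of Zhang (a single read from the general register, overlapping data subsets supplied to adjacent dot product data units, and per-unit accumulation across cycles) can be pictured with a minimal Python sketch. The names `GeneralRegister` and `convolve_with_reuse`, the read counter, and the loop ordering are hypothetical and are not taken from Zhang; the 5×5 pixel matrix and 3×3 kernel sizes follow Zhang's paragraph [0056], while the element values are assumed.

```python
class GeneralRegister:
    """Hypothetical stand-in for a register that counts how often it is read."""

    def __init__(self, pixels, kernel):
        self._pixels = pixels
        self._kernel = kernel
        self.reads = 0

    def read(self):
        self.reads += 1
        return self._pixels, self._kernel


def convolve_with_reuse(register):
    # Single read; the local copies play the role of the temporarily stored data set.
    pixels, kernel = register.read()
    k = len(kernel)
    out = len(pixels) - k + 1
    result = []
    for row in range(out):
        # One accumulator per assumed "dot product unit" (one unit per output column).
        acc = [0] * out
        for kr in range(k):  # assumed: one cycle per kernel row
            for unit in range(out):
                # Adjacent units receive overlapping slices of the same stored row.
                subset = pixels[row + kr][unit:unit + k]
                acc[unit] += sum(p * w for p, w in zip(subset, kernel[kr]))
        result.append(acc)
    return result


# Assumed example data: pixel values 1..25 and an all-ones kernel.
pixels = [[5 * r + c + 1 for c in range(5)] for r in range(5)]
kernel = [[1] * 3 for _ in range(3)]
reg = GeneralRegister(pixels, kernel)
result = convolve_with_reuse(reg)
```

In this sketch the register is read exactly once, yet every stored pixel row is consumed by several units and over several cycles, which is the sense of "reusing" relied upon in the mapping above.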
Consider claim 2, the overall combination entails the processor as recited in claim 1 (see above), wherein responsive to a second instruction of a second type of the at least two different types of instructions different from the first type of the first instruction (Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations), the circuitry is further configured to: fetch, from the register file only once, a second plurality of values (Zhang, [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations); and perform, using the plurality of arithmetic logic circuits, a second operation different from the first operation by reusing the second plurality of values for at least two iterations of computations used to provide the second operation (Zhang, [0048], lines 1-6, in Block 504, the data reuse unit 321 determines the multiple data subsets from the data set, so as to respectively input the multiple data subsets into the multiple dot product data units 322-1 to 322-n. The two data subsets inputted into the two adjacent dot product data units include a portion of the same data; [0049], lines 1-4, in Block 506, each dot product data unit of the multiple dot product data units 322-1 to 322-n performs the dot product operation on the inputted data subset, so as to generate the dot product operation result; [0050], lines 1-5, in Block 508, each dot product data unit of the multiple dot product data units 322-1 to 322-n generates the current cumulative result of the dot product data unit based on the previous cumulative result of the dot product data unit and the dot product operation result; [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 6-8, otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation).
Consider claim 3, the overall combination entails the processor as recited in claim 2 (see above), wherein the circuitry is configured to fetch, from the register file only once, the first plurality of values as data of a first matrix and the second plurality of values as data of a second matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations).
Consider claim 4, the overall combination entails the processor as recited in claim 3 (see above), wherein the first operation is a fused multiply add (FMA) operation (Zhang, Figure 4, which shows a fused multiply add operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) and the second operation is a dot product operation (Zhang, [0049], line 3, dot product operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations).
Consider claim 5, the overall combination entails the processor as recited in claim 3 (see above), wherein responsive to the first instruction (Zhang, [0053], line 12, single instruction), each of the plurality of arithmetic logic circuits is configured to: receive the first plurality of values of the first matrix and the second plurality of values of the second matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations); and perform a matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. 
Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) multiplication operation of a fused multiply add (FMA) operation (Zhang, Figure 4, which shows a fused multiply add operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) using at least a first multiplier circuit and a second multiplier circuit (Zhang, Figure 4, which shows the multiplier circuits; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations), each having a size less than a size of the values of the first matrix and less than a size of the values of the second matrix (Zhang, [0059], lines 5-6, the dot product data unit 322-1 may perform a dot product operation of A1*B1+A2*B2+A3*B3).
Consider claim 6, the overall combination entails the processor as recited in claim 5 (see above), wherein responsive to the second instruction (Zhang, [0053], line 12, single instruction; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations), each of the plurality of arithmetic logic circuits is configured to: receive the first plurality of values of the first matrix and the second plurality of values of the second matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. 
Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations); and perform a matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) multiplication operation of a dot product operation (Zhang, Figure 4, which shows a dot product operation, which entails multiplication; [0036], lines 1-2, FIG. 4 is a schematic block diagram of a dot product data unit 400; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) using the first multiplier circuit and the second multiplier circuit (Zhang, Figure 4, which shows the multiplier circuits).
Consider claim 7, the overall combination entails the processor as recited in claim 3 (see above), wherein the circuitry is further configured to: fetch the first matrix and the second matrix from the register file only once (Zhang, [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) until each element of a resulting matrix is updated by one of the first operation and the second operation (Zhang, [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 1-8, in Block 512, each dot product data unit of the dot product data units 322-1 to 322-n writes the current cumulative result of the dot product data unit to the general register 310 to serve as the convolution operation result when it is determined in the Block 510 that the convolution operation is over. Otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. 
For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations); and store the first matrix and the second matrix in a plurality of storage elements for reuse by the plurality of arithmetic logic circuits (Zhang, [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 1-8, in Block 512, each dot product data unit of the dot product data units 322-1 to 322-n writes the current cumulative result of the dot product data unit to the general register 310 to serve as the convolution operation result when it is determined in the Block 510 that the convolution operation is over. Otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation).
Consider claim 8, Zhang discloses a method, comprising: responsive to receiving, by a processing circuit ([0029], line 6, processor), a first instruction of a first type ([0053], line 12, single instruction): fetching, by the processing circuit from a register file only once, a first plurality of values ([0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel); and performing, using a plurality of execution pipelines of the processing circuit having a plurality of arithmetic logic circuits ([0032], line 8, multiple dot product data units 322-1 to 322-n; [0036], lines 1-2, FIG. 4 is a schematic block diagram of a dot product data unit 400), a first operation by reusing the first plurality of values for at least two iterations of computations used to perform the first operation ([0048], lines 1-6, in Block 504, the data reuse unit 321 determines the multiple data subsets from the data set, so as to respectively input the multiple data subsets into the multiple dot product data units 322-1 to 322-n. 
The two data subsets inputted into the two adjacent dot product data units include a portion of the same data; [0049], lines 1-4, in Block 506, each dot product data unit of the multiple dot product data units 322-1 to 322-n performs the dot product operation on the inputted data subset, so as to generate the dot product operation result; [0050], lines 1-5, in Block 508, each dot product data unit of the multiple dot product data units 322-1 to 322-n generates the current cumulative result of the dot product data unit based on the previous cumulative result of the dot product data unit and the dot product operation result; [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 6-8, otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation).
However, Zhang does not disclose that each of the aforementioned comprises circuitry configured to execute at least two different types of instructions.
On the other hand, Chen discloses circuitry configured to execute at least two different types of instructions ([0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Chen with the invention of Zhang in order to increase processing capability via supporting different types of instructions. (Alternatively, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zhang with the invention of Chen in order to improve efficiency; see paragraph [0014] of Zhang.) Note that the overall combination thereby entails that each of the aforementioned comprises circuitry configured to execute at least two different types of instructions, and that the aforementioned first instruction is of a first type of the at least two different types of instructions.
Consider claim 9, the overall combination entails the method as recited in claim 8 (see above), responsive to receiving, by the processing circuit, a second instructi