Prosecution Insights
Last updated: May 29, 2026
Application No. 18/619,392

WAVE LEVEL MATRIX MULTIPLY INSTRUCTIONS

Final Rejection: §102, §103, §112
Filed: Mar 28, 2024
Priority: Apr 03, 2023 (provisional 63/493,972)
Examiner: VICARY, KEITH E
Art Unit: 2183
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Advanced Micro Devices, Inc.
OA Round: 2 (Final)
Grant Probability: 58% (Moderate)
OA Rounds: 3-4
Est. Remaining: 1y 9m
With Interview: 99%

Examiner Intelligence

Career Allowance Rate: 58% of resolved cases (393 granted / 684 resolved, +2.5% vs TC avg)
Interview Lift: +41.3% (strong; resolved cases with interview)
Avg Prosecution: 3y 11m
Currently Pending: 27
Total Applications: 728 (across all art units)

Statute-Specific Performance

§101: 7.2% (-32.8% vs TC avg)
§103: 48.9% (+8.9% vs TC avg)
§102: 7.2% (-32.8% vs TC avg)
§112: 32.3% (-7.7% vs TC avg)
Based on career data from 684 resolved cases.

Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims 1-20 are pending in this office action and presented for examination. Claims 1-3, 5-10, 12-17, and 19-20 are newly amended by the response received September 30, 2025.

In claim 1, line 4, an “s” appears to be added at the end of “circuit” without appropriate underlining. In claim 2, line 6, “each of” is marked with strikethrough, but did not appear in the previous set of claims. In claim 15, line 9, an “s” appears to be added at the end of “circuit” without appropriate underlining. In claim 16, line 6, “each of” is marked with strikethrough, but did not appear in the previous set of claims. Examiner requests that future amendments be made in the appropriate manner conveyed in MPEP 714 to avoid confusion and potential notices of non-compliant amendment.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Claim 1 recites the limitation “fetch, from the register file only once, a first plurality of values” in line 8 (with further recited limitations regarding the first plurality of values). However, a claim may lack written description support when a broad genus claim is presented but the disclosure only describes a narrow species with no evidence that the genus is contemplated. In the instant case, Examiner submits that the claim is a broad genus claim (by using the language "register file", which encompasses both “scalar register file” 332 and “vector register file 330”, in the context of the remaining language of the limitation), but the original disclosure (e.g., paragraph [0014], “The circuitry of the parallel data processing circuit performs a matrix multiplication operation using source operands accessed only once from a vector register file”; paragraph [0018], “Therefore, the data of the rows and columns of matrices 110 and 120 are retrieved only once from the vector register file”; original claim 1, “fetch, from the vector register file only once, a first plurality of values”) only describes a narrow species (fetching, from the “vector” register file only once, a first plurality of values) with no evidence that the genus is contemplated. Claims 2-7 are rejected for failing to alleviate the rejection of claim 1 above. 
Claim 8 recites the limitation “fetching, by the processing circuit from a register file only once, a first plurality of values” in lines 4-5 (with further recited limitations regarding the first plurality of values). However, a claim may lack written description support when a broad genus claim is presented but the disclosure only describes a narrow species with no evidence that the genus is contemplated. In the instant case, Examiner submits that the claim is a broad genus claim (by using the language "register file", which encompasses both “scalar register file” 332 and “vector register file 330”, in the context of the remaining language of the limitation), but the original disclosure (e.g., paragraph [0014], “The circuitry of the parallel data processing circuit performs a matrix multiplication operation using source operands accessed only once from a vector register file”; paragraph [0018], “Therefore, the data of the rows and columns of matrices 110 and 120 are retrieved only once from the vector register file”; original claim 1, “fetch, from the vector register file only once, a first plurality of values”) only describes a narrow species (fetching, from the “vector” register file only once, a first plurality of values) with no evidence that the genus is contemplated. Claims 9-14 are rejected for failing to alleviate the rejection of claim 8 above. Claim 15 recites the limitation “fetch, from the register file only once, the first plurality of values” in line 14 (with further recited limitations regarding the first plurality of values). However, a claim may lack written description support when a broad genus claim is presented but the disclosure only describes a narrow species with no evidence that the genus is contemplated. 
In the instant case, Examiner submits that the claim is a broad genus claim (by using the language “register file”, which encompasses both “scalar register file” 332 and “vector register file 330”, in the context of the remaining language of the limitation), but the original disclosure (e.g., paragraph [0014], “The circuitry of the parallel data processing circuit performs a matrix multiplication operation using source operands accessed only once from a vector register file”; paragraph [0018], “Therefore, the data of the rows and columns of matrices 110 and 120 are retrieved only once from the vector register file”; original claim 15, “fetch, from the vector register file only once, a first plurality of values”) only describes a narrow species (fetching, from the “vector” register file only once, a first plurality of values) with no evidence that the genus is contemplated. Claims 16-20 are rejected for failing to alleviate the rejection of claim 15 above.

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 1 recites the limitation “A processor comprising: a register file; a plurality of execution pipelines, having a plurality of arithmetic logic circuits, each comprising circuitry configured to execute at least two different types of instructions; and circuitry, wherein responsive to a first instruction of a first type of the at least two different instructions, the circuitry is configured to: fetch, from the register file only once, a first plurality of values; and perform, using the plurality of arithmetic logic circuits, a first operation by reusing the first plurality of values for at least two iterations of computations used to perform the first operation” in lines 1-12. However, the specification appears to conflict with the aforementioned claimed subject matter. For example, while the claim recites that a plurality of execution pipelines have a plurality of arithmetic logic circuits, the specification (e.g., paragraph [0047]) appears to convey that an ALU comprises an execution pipeline. For example, while the claim appears to recite the circuitry of line 6 as a separate element from the register file in line 2 and the plurality of execution pipelines in line 3, Figure 3 shows the vector processing circuit 310A comprising vector register file 330 and vector ALU 350. For example, while the claim recites the plurality of arithmetic logic circuits reusing the first plurality of values fetched from the register file, Figure 3 does not show a plurality of arithmetic logic circuits operating on a reused first plurality of values fetched from a register file. A claim, although clear on its face, may also be indefinite when a conflict or inconsistency between the claimed subject matter and the specification disclosure renders the scope of the claim uncertain as inconsistency with the specification disclosure or prior art teachings may make an otherwise definite claim take on an unreasonable degree of uncertainty. 
Therefore, because the aforementioned claimed subject matter appears to conflict with the specification in the manner explained above, the claim is indefinite. Claim 1 recites the limitation “a plurality of execution pipelines, having a plurality of arithmetic logic circuits, each comprising circuitry configured to execute at least two different types of instructions” in lines 3-5. However, it is indefinite as to whether a) a plurality of execution pipelines, or b) a plurality of arithmetic logic circuits, are that which comprise the aforementioned circuitry configured to execute at least two different types of instructions. Claims 2-7 are rejected for failing to alleviate the rejections of claim 1 above. Claim 5 recites the limitation “the values of the first matrix” in line 8. However, it is indefinite as to whether these values are the same as, or different from, “the first plurality of values of the first matrix” as recited in claim 5, line 4. Note that claim 6 recites the limitation “the first plurality of values of the first matrix” in line 4. Claim 5 recites the limitation “the values of the second matrix” in lines 8-9. However, it is indefinite as to whether these values are the same as, or different from, “the second plurality of values of the second matrix” as recited in claim 5, lines 4-5. Note that claim 6 recites the limitation “the second plurality of values of the second matrix” in lines 4-5. Claim 6 is rejected for failing to alleviate the rejections of claim 5 above. 
Claim 8 recites the limitation “A method, comprising: responsive to receiving, by a processing circuit, a first instruction of a first type of at least two different types of instructions: fetching, by the processing circuit from a register file only once, a first plurality of values; and performing, using a plurality of execution pipelines of the processing circuit having a plurality of arithmetic logic circuits, each comprising circuitry configured to execute the at least two different types of instructions, a first operation by reusing the first plurality of values for at least two iterations of computations used to perform the first operation” in lines 1-11. However, the specification appears to conflict with the aforementioned claimed subject matter. For example, while the claim recites that a plurality of execution pipelines have a plurality of arithmetic logic circuits, the specification (e.g., paragraph [0047]) appears to convey that an ALU comprises an execution pipeline. For example, while the claim appears to recite the processing circuit of line 4 as a separate element from the vector register file in line 4, Figure 3 shows the vector processing circuit 310A comprising vector register file 330. For example, while the claim recites the plurality of execution pipelines reusing the first plurality of values fetched from the register file, Figure 3 does not show a plurality of execution pipelines operating on a reused first plurality of values fetched from a register file. A claim, although clear on its face, may also be indefinite when a conflict or inconsistency between the claimed subject matter and the specification disclosure renders the scope of the claim uncertain as inconsistency with the specification disclosure or prior art teachings may make an otherwise definite claim take on an unreasonable degree of uncertainty. 
Therefore, because the aforementioned claimed subject matter appears to conflict with the specification in the manner explained above, the claim is indefinite. Claim 8 recites the limitation “a plurality of execution pipelines of the processing circuit having a plurality of arithmetic logic circuits, each comprising circuitry configured to execute the at least two different types of instructions” in lines 6-9. However, it is indefinite as to whether a) a plurality of execution pipelines, or b) a plurality of arithmetic logic circuits, are that which comprise the aforementioned circuitry configured to execute at least two different types of instructions. Claims 9-14 are rejected for failing to alleviate the rejections of claim 8 above. Claim 9 recites the limitation “The method as recited in claim 8, responsive to receiving, by the processing circuit, a second instruction of a second type of the at least two different types of instructions different from the first type of the first instruction: fetching … performing …” in lines 1-9. However, the metes and bounds of this limitation are grammatically indefinite. For example, it is indefinite as to whether the method is being recited to comprise the fetching and performing steps. Claims 10-14 are rejected for failing to alleviate the rejection of claim 9 above. Claim 10 recites the limitation “The method as recited in claim 9, further comprising fetching, from the register file only once by the processing circuit, the first plurality of values as data of a first matrix and the second plurality of values as data of a second matrix” in lines 1-4. Claim 9, upon which claim 10 is dependent, recites the limitation “fetching, by the processing circuit from the register file only once, a second plurality of values” in lines 4-5. Claim 8, upon which claim 9 is dependent, recites the limitation “fetching, by the processing circuit from a register file only once, a first plurality of values” in lines 4-5. 
Therefore, it is indefinite as to whether claim 10 (in the context of claim 8) entails fetching the first plurality of values once or twice, in view of the “further” language in claim 10, line 1. Similarly, it is indefinite as to whether claim 10 (in the context of claim 9) entails fetching the second plurality of values once or twice, in view of the “further” language in claim 10, line 1. Claims 11-14 are rejected for failing to alleviate the rejection of claim 10 above. Claim 12 recites the limitation “the values of the first matrix” in line 8. However, it is indefinite as to whether these values are the same as, or different from, “the first plurality of values of the first matrix” as recited in claim 12, line 4. Note that claim 13 recites the limitation “the first plurality of values of the first matrix” in line 4. Claim 12 recites the limitation “the values of the second matrix” in lines 8-9. However, it is indefinite as to whether these values are the same as, or different from, “the second plurality of values of the second matrix” as recited in claim 12, lines 4-5. Note that claim 13 recites the limitation “the second plurality of values of the second matrix” in lines 4-5. Claim 12 recites the limitation “The method as recited in claim 10, wherein responsive to the first instruction, the method further comprises, by each of the plurality of arithmetic logic circuits: … performing a matrix multiplication operation of a fused multiply add (FMA) operation” in lines 1-6. Claim 8, upon which claim 12 is indirectly dependent, recites the limitation “responsive to receiving, by a processing circuit, a first instruction of a first type of at least two different types of instructions … performing, using a plurality of execution pipelines of the processing circuit having a plurality of arithmetic logic circuits, each comprising circuitry configured to execute the at least two different types of instructions, a first operation” in lines 2-9. 
Therefore, it is indefinite as to whether or not claim 12 (in the context of claim 8) entails, in response to receiving the first instruction, executing both a first operation and, separately and distinctly, a matrix multiplication operation of a FMA operation, in view of the “further” language in claim 12, line 2. Claim 13 is rejected for failing to alleviate the rejections of claim 12 above. Claim 13 recites the limitation “The method as recited in claim 12, wherein responsive to the second instruction, the method further comprises, by each of the plurality of arithmetic logic circuits: … performing a matrix multiplication operation of a dot product operation” in lines 1-6. Claim 9, upon which claim 13 is indirectly dependent, recites the limitation “responsive to receiving, by the processing circuit, a second instruction of a second type of the at least two different types of instructions different from the first type of the first instruction … performing, using the plurality of arithmetic logic circuits, a second operation” in lines 1-7. Therefore, it is indefinite as to whether or not claim 13 (in the context of claim 9) entails, responsive to receiving the second instruction, executing both a second operation and, separately and distinctly, a matrix multiplication operation of a dot product operation, in view of the “further” language in claim 13, line 2. 
Claim 15 recites the limitation “a second processor comprising: a register file; a plurality of execution pipelines, having a plurality of arithmetic logic circuits, each comprising circuitry configured to execute at least two different types of instructions; and circuitry configured to: responsive to a first instruction of the one or more kernels with a first type of the at least two different instructions: fetch, from the register file only once, the first plurality of values; and perform, using the plurality of arithmetic logic circuits, a first operation by reusing the first plurality of values for at least two iterations of computations used to perform the first operation” in lines 6-18. However, the specification appears to conflict with the aforementioned claimed subject matter. For example, while the claim recites that a plurality of execution pipelines have a plurality of arithmetic logic circuits, the specification (e.g., paragraph [0047]) appears to convey that an ALU comprises an execution pipeline. For example, while the claim appears to recite the circuitry of line 11 as a separate element from the register file in line 7 and the plurality of execution pipelines in line 8, Figure 3 shows the vector processing circuit 310A comprising vector register file 330 and vector ALU 350. For example, while the claim recites the plurality of arithmetic logic circuits reusing the first plurality of values fetched from the register file, Figure 3 does not show a plurality of arithmetic logic circuits operating on a reused first plurality of values fetched from a register file. A claim, although clear on its face, may also be indefinite when a conflict or inconsistency between the claimed subject matter and the specification disclosure renders the scope of the claim uncertain as inconsistency with the specification disclosure or prior art teachings may make an otherwise definite claim take on an unreasonable degree of uncertainty. 
Therefore, because the aforementioned claimed subject matter appears to conflict with the specification in the manner explained above, the claim is indefinite.

Claim 15 recites the limitation “a plurality of execution pipelines, having a plurality of arithmetic logic circuits, each comprising circuitry configured to execute at least two different types of instructions” in lines 8-10. However, it is indefinite as to whether a) a plurality of execution pipelines, or b) a plurality of arithmetic logic circuits, are that which comprise the aforementioned circuitry configured to execute at least two different types of instructions.

Claim 15 recites the limitation “a first instruction of the one or more kernels with a first type of the at least two different types of instructions” in lines 12-13. However, it is indefinite as to whether it is a) a first instruction, or b) the one or more kernels, which is with a first type of the at least two different types of instructions. Claims 16-20 are rejected for failing to alleviate the rejections of claim 15 above.

Claim 19 recites the limitation “the values of the first matrix” in line 8. However, it is indefinite as to whether these values are the same as, or different from, “the first plurality of values of the first matrix” as recited in claim 19, line 4. Note that claim 20 recites the limitation “the first plurality of values of the first matrix” in line 4.

Claim 19 recites the limitation “the values of the second matrix” in lines 8-9. However, it is indefinite as to whether these values are the same as, or different from, “the second plurality of values of the second matrix” as recited in claim 19, lines 4-5. Note that claim 20 recites the limitation “the second plurality of values of the second matrix” in lines 4-5. Claim 20 is rejected for failing to alleviate the rejections of claim 19 above.

The following is a quotation of 35 U.S.C. 112(d):

(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:

Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claims 7 and 14 are rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.

Claim 7 recites the limitation “The processor as recited in claim 3, wherein the circuitry is further configured to: fetch the first matrix and the second matrix from the register file only once until each element of a resulting matrix is updated by one of the first operation and the second operation” in lines 1-5. Claim 3, upon which claim 7 is dependent, recites the limitation “the circuitry is configured to fetch, from the register file only once, the first plurality of values as data of a first matrix and the second plurality of values as data of a second matrix” in lines 1-4. 
Claim 2, upon which claim 3 is dependent, recites the limitation “fetch, from the register file only once, a second plurality of values” in line 5. Claim 1, upon which claim 2 is dependent, recites the limitation “fetch, from the register file only once, a first plurality of values” in line 8. Therefore, claim 7 appears to fail to include all the limitations of the claim upon which it depends, because claims 1 and 3 recite that the first plurality of values is fetched from the register file only once and claims 2 and 3 recite that the second plurality of values is fetched from the register file only once, whereas claim 7 appears to encompass the possibility that, rather than the first plurality of values and the second plurality of values being fetched from the register file only once, the first plurality of values and the second plurality of values may be fetched from the register file an additional time(s) following each element of a resulting matrix being updated by one of the first operation and the second operation. Claim 14 recites the limitation “The method as recited in claim 10, further comprising: fetching, by the processing circuit, the first matrix and the second matrix from the register file only once until each element of a resulting matrix is updated by one of the first operation and the second operation” in lines 1-4. Claim 10, upon which claim 14 is dependent, recites the limitation “fetching, from the register file only once by the processing circuit, the first plurality of values as data of a first matrix and the second plurality of values as data of a second matrix” in lines 1-4. Claim 9, upon which claim 10 is dependent, recites the limitation “fetching, by the processing circuit from the register file only once, a second plurality of values” in lines 4-5. Claim 8, upon which claim 9 is directly dependent, recites the limitation “fetching, by the processing circuit from a register file only once, a first plurality of values” in lines 4-5. 
Therefore, claim 14 appears to fail to include all the limitations of the claim upon which it depends, because claims 8 and 10 recite that the first plurality of values is fetched from the register file only once and claims 9 and 10 recite that the second plurality of values is fetched from the register file only once, whereas claim 14 appears to encompass the possibility that, rather than the first plurality of values and the second plurality of values being fetched from the register file only once, the first plurality of values and the second plurality of values may be fetched from the register file an additional time(s) following each element of a resulting matrix being updated by one of the first operation and the second operation.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (Zhang) (US 20220206749 A1) in view of Chen et al. (Chen) (US 20200272687 A1).

Consider claim 1, Zhang discloses a processor ([0029], line 6, processor) comprising: a register file ([0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel); a plurality of execution pipelines, having a plurality of arithmetic logic circuits ([0032], line 8, multiple dot product data units 322-1 to 322-n; [0036], lines 1-2, FIG. 4 is a schematic block diagram of a dot product data unit 400); and circuitry, wherein responsive to a first instruction of a first type ([0053], line 12, single instruction), the circuitry is configured to: fetch, from the register file only once, a first plurality of values ([0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations); and perform, using the plurality of arithmetic logic circuits, a first operation by reusing the first plurality of values for at least two iterations of computations used to perform the first operation ([0048], lines 1-6, in Block 504, the data reuse unit 321 determines the multiple data subsets from the data set, so as to respectively input the multiple data subsets into the multiple dot product data units 322-1 to 322-n. 
The two data subsets inputted into the two adjacent dot product data units include a portion of the same data; [0049], lines 1-4, in Block 506, each dot product data unit of the multiple dot product data units 322-1 to 322-n performs the dot product operation on the inputted data subset, so as to generate the dot product operation result; [0050], lines 1-5, in Block 508, each dot product data unit of the multiple dot product data units 322-1 to 322-n generates the current cumulative result of the dot product data unit based on the previous cumulative result of the dot product data unit and the dot product operation result; [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 6-8, otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation). However, Zhang does not disclose each of the aforementioned comprises circuitry configured to execute at least two different types of instructions. On the other hand, Chen discloses circuitry configured to execute at least two different types of instructions ([0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teaching of Chen with the invention of Zhang in order to increase processing capability via supporting different types of instructions. 
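For readers following the Zhang citations, the Block 502 to Block 510 flow relied on above can be pictured with a short Python sketch. It is only an illustration under assumptions: the `RegisterFile` class, the `convolve_with_reuse` function, and the read counter are hypothetical names invented here, not anything taken from Zhang, Chen, or the application, and assigning one dot product unit to each output column is an assumed simplification.

```python
class RegisterFile:
    """Hypothetical register file that counts how often each entry is read."""

    def __init__(self, entries):
        self._entries = dict(entries)
        self.reads = {name: 0 for name in self._entries}

    def read(self, name):
        self.reads[name] += 1
        return self._entries[name]


def convolve_with_reuse(regs, pixels_name, kernel_name):
    """Sketch of a convolution that reads its operands once and then reuses them."""
    # Block 502 analogue: read the data set once and hold it locally.
    pixels = regs.read(pixels_name)
    kernel = regs.read(kernel_name)
    k = len(kernel)
    n_out = len(pixels) - k + 1

    result = []
    for out_row in range(n_out):
        # One accumulator per (assumed) dot product unit.
        acc = [0] * n_out
        # Block 510 analogue: repeat the cycle until every kernel row is consumed.
        for cycle in range(k):
            row = pixels[out_row + cycle]
            for unit in range(n_out):
                # Block 504 analogue: adjacent units receive overlapping subsets.
                subset = row[unit:unit + k]
                # Block 506 analogue: dot product on the subset.
                dot = sum(p * w for p, w in zip(subset, kernel[cycle]))
                # Block 508 analogue: add to the previous cumulative result.
                acc[unit] += dot
        result.append(acc)
    return result
```

With a 5×5 input and a 3×3 kernel, the sizes mentioned in Zhang [0056], the sketch produces a 3×3 result while `regs.reads` records exactly one read per register entry, which is the "only once" behavior at issue in the claim mapping.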
(Alternatively, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zhang with the invention of Chen in order to improve efficiency; see paragraph [0014] of Zhang.) Note that the overall combination thereby entails that each of the aforementioned comprises circuitry configured to execute at least two different types of instructions, and that the aforementioned first instruction is of a first type of the at least two different types of instructions. Consider claim 2, the overall combination entails the processor as recited in claim 1 (see above), wherein responsive to a second instruction of a second type of the at least two different types of instructions different from the first type of the first instruction (Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. 
Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations), the circuitry is further configured to: fetch, from the register file only once, a second plurality of values (Zhang, [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations); and perform, using the plurality of arithmetic logic circuits, a second operation different from the first operation by reusing the second plurality of values for at least two iterations of computations used to provide the second operation (Zhang, [0048], lines 1-6, in Block 504, the data reuse unit 321 determines the multiple data subsets from the data set, so as to respectively input the multiple data subsets into the multiple dot product data units 322-1 to 322-n. The two data subsets inputted into the two adjacent dot product data units include a portion of the same data; [0049], lines 1-4, in Block 506, each dot product data unit of the multiple dot product data units 322-1 to 322-n performs the dot product operation on the inputted data subset, so as to generate the dot product operation result; [0050], lines 1-5, in Block 508, each dot product data unit of the multiple dot product data units 322-1 to 322-n generates the current cumulative result of the dot product data unit based on the previous cumulative result of the dot product data unit and the dot product operation result; [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 6-8, otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation). 
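For illustration only, the Block 502 to Block 512 flow that the rejection cites from Zhang can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Zhang's implementation: the names `RegisterFile` and `convolve_row`, the one-cycle-per-kernel-row schedule, and the read counter are hypothetical and were chosen only to make the "fetch once, reuse across iterations" reading concrete for the 5×5 pixel matrix and 3×3 convolution kernel mentioned in [0056].

```python
class RegisterFile:
    """Hypothetical stand-in for the general register; counts reads."""

    def __init__(self, contents):
        self._contents = contents
        self.reads = 0

    def read(self, name):
        self.reads += 1
        return self._contents[name]


def convolve_row(regs, out_row=0):
    """Compute one output row of a 2-D convolution with operand reuse."""
    # Block 502 (assumed mapping): read the data set once and buffer it.
    pixels = regs.read("pixels")
    kernel = regs.read("kernel")
    k = len(kernel)
    n_units = len(pixels[0]) - k + 1   # one dot product unit per output
    acc = [0] * n_units                # per-unit cumulative results

    # Loop until the convolution has ended (Block 510, assumed mapping).
    for r in range(k):
        row = pixels[out_row + r]
        # Block 504: adjacent subsets overlap, so buffered data is reused.
        subsets = [row[u:u + k] for u in range(n_units)]
        for u, subset in enumerate(subsets):
            # Block 506: dot product, e.g. A1*B1 + A2*B2 + A3*B3.
            dot = sum(a * b for a, b in zip(subset, kernel[r]))
            # Block 508: accumulate onto the previous cumulative result.
            acc[u] += dot

    # Block 512: the cumulative results are the convolution results.
    return acc


regs = RegisterFile({
    "pixels": [[i * 5 + j for j in range(5)] for i in range(5)],
    "kernel": [[1, 1, 1] for _ in range(3)],
})
print(convolve_row(regs), regs.reads)
```

In this sketch each operand is read from the register file exactly once, while the buffered pixel rows feed three overlapping subsets per cycle, which is the kind of reuse the rejection maps onto "at least two iterations of computations."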
Consider claim 3, the overall combination entails the processor as recited in claim 2 (see above), wherein the circuitry is configured to fetch, from the register file only once, the first plurality of values as data of a first matrix and the second plurality of values as data of a second matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations).
Consider claim 4, the overall combination entails the processor as recited in claim 3 (see above), wherein the first operation is a fused multiply add (FMA) operation (Zhang, Figure 4, which shows a fused multiply add operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. 
Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) and the second operation is a dot product operation (Zhang, [0049], line 3, dot product operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations).
Consider claim 5, the overall combination entails the processor as recited in claim 3 (see above), wherein responsive to the first instruction (Zhang, [0053], line 12, single instruction), each of the plurality of arithmetic logic circuits is configured to: receive the first plurality of values of the first matrix and the second plurality of values of the second matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. 
Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations); and perform a matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) multiplication operation of a fused multiply add (FMA) operation (Zhang, Figure 4, which shows a fused multiply add operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) using at least a first multiplier circuit and a second multiplier circuit (Zhang, Figure 4, which shows the multiplier circuits; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. 
For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations), each having a size less than a size of the values of the first matrix and less than a size of the values of the second matrix (Zhang, [0059], lines 5-6, the dot product data unit 322-1 may perform a dot product operation of A1*B1+A2*B2+A3*B3).
Consider claim 6, the overall combination entails the processor as recited in claim 5 (see above), wherein responsive to the second instruction (Zhang, [0053], line 12, single instruction; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations), each of the plurality of arithmetic logic circuits is configured to: receive the first plurality of values of the first matrix and the second plurality of values of the second matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. 
For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations); and perform a matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) multiplication operation of a dot product operation (Zhang, Figure 4, which shows a dot product operation, which entails multiplication; [0036], lines 1-2, FIG. 4 is a schematic block diagram of a dot product data unit 400; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. 
Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) using the first multiplier circuit and the second multiplier circuit (Zhang, Figure 4, which shows the multiplier circuits).
Consider claim 7, the overall combination entails the processor as recited in claim 3 (see above), wherein the circuitry is further configured to: fetch the first matrix and the second matrix from the register file only once (Zhang, [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) until each element of a resulting matrix is updated by one of the first operation and the second operation (Zhang, [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 1-8, in Block 512, each dot product data unit of the dot product data units 322-1 to 322-n writes the current cumulative result of the dot product data unit to the general register 310 to serve as the convolution operation result when it is determined in the Block 510 that the convolution operation is over. 
Otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations); and store the first matrix and the second matrix in a plurality of storage elements for reuse by the plurality of arithmetic logic circuits (Zhang, [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 1-8, in Block 512, each dot product data unit of the dot product data units 322-1 to 322-n writes the current cumulative result of the dot product data unit to the general register 310 to serve as the convolution operation result when it is determined in the Block 510 that the convolution operation is over. Otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation). 
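As a reading aid for claims 4 through 6, the following Python sketch shows one way a single unit could serve two instruction types (FMA and dot product) with the same multipliers. It is an assumption-laden illustration, not Chen's or Zhang's circuitry: the function names `multiply` and `execute` and the string opcodes are hypothetical, and Python integer arithmetic does not model the single-rounding behavior of a hardware fused multiply add.

```python
def multiply(a, b):
    """Stands in for one multiplier circuit (hypothetical)."""
    return a * b


def execute(op, a, b, c=None):
    """Dispatch on instruction type; both types share the multipliers."""
    products = [multiply(x, y) for x, y in zip(a, b)]
    if op == "fma":
        # Element-wise a[i] * b[i] + c[i].
        return [p + z for p, z in zip(products, c)]
    if op == "dot":
        # Sum of products, e.g. A1*B1 + A2*B2 + A3*B3.
        return sum(products)
    raise ValueError(f"unsupported instruction type: {op}")


print(execute("fma", [1, 2], [3, 4], [10, 20]))
print(execute("dot", [1, 2, 3], [4, 5, 6]))
```

The point of the sketch is only that the product stage is common to both instruction types, which is how the rejection reads "using the first multiplier circuit and the second multiplier circuit" onto both operations.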
Consider claim 8, Zhang discloses a method, comprising: responsive to receiving, by a processing circuit ([0029], line 6, processor), a first instruction of a first type ([0053], line 12, single instruction): fetching, by the processing circuit from a register file only once, a first plurality of values ([0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel); and performing, using a plurality of execution pipelines of the processing circuit having a plurality of arithmetic logic circuits ([0032], line 8, multiple dot product data units 322-1 to 322-n; [0036], lines 1-2, FIG. 4 is a schematic block diagram of a dot product data unit 400), a first operation by reusing the first plurality of values for at least two iterations of computations used to perform the first operation ([0048], lines 1-6, in Block 504, the data reuse unit 321 determines the multiple data subsets from the data set, so as to respectively input the multiple data subsets into the multiple dot product data units 322-1 to 322-n. 
The two data subsets inputted into the two adjacent dot product data units include a portion of the same data; [0049], lines 1-4, in Block 506, each dot product data unit of the multiple dot product data units 322-1 to 322-n performs the dot product operation on the inputted data subset, so as to generate the dot product operation result; [0050], lines 1-5, in Block 508, each dot product data unit of the multiple dot product data units 322-1 to 322-n generates the current cumulative result of the dot product data unit based on the previous cumulative result of the dot product data unit and the dot product operation result; [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 6-8, otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation). However, Zhang does not disclose each of the aforementioned comprises circuitry configured to execute at least two different types of instructions. On the other hand, Chen discloses circuitry configured to execute at least two different types of instructions ([0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teaching of Chen with the invention of Zhang in order to increase processing capability via supporting different types of instructions. 
(Alternatively, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zhang with the invention of Chen in order to improve efficiency; see paragraph [0014] of Zhang.) Note that the overall combination thereby entails that each of the aforementioned comprises circuitry configured to execute at least two different types of instructions, and that the aforementioned first instruction is of a first type of the at least two different types of instructions.
Consider claim 9, the overall combination entails the method as recited in claim 8 (see above), responsive to receiving, by the processing circuit, a second instruction of a second type of the at least two different types of instructions different from the first type of the first instruction (Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. 
Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations): fetching, by the processing circuit from the register file only once, a second plurality of values (Zhang, [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations); and performing, using the plurality of arithmetic logic circuits, a second operation different from the first operation by reusing the second plurality of values for at least two iterations of computations used to provide the second operation (Zhang, [0048], lines 1-6, in Block 504, the data reuse unit 321 determines the multiple data subsets from the data set, so as to respectively input the multiple data subsets into the multiple dot product data units 322-1 to 322-n. The two data subsets inputted into the two adjacent dot product data units include a portion of the same data; [0049], lines 1-4, in Block 506, each dot product data unit of the multiple dot product data units 322-1 to 322-n performs the dot product operation on the inputted data subset, so as to generate the dot product operation result; [0050], lines 1-5, in Block 508, each dot product data unit of the multiple dot product data units 322-1 to 322-n generates the current cumulative result of the dot product data unit based on the previous cumulative result of the dot product data unit and the dot product operation result; [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 6-8, otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation). 
Consider claim 10, the overall combination entails the method as recited in claim 9 (see above), further comprising fetching, from the register file only once by the processing circuit, the first plurality of values as data of a first matrix and the second plurality of values as data of a second matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations).
Consider claim 11, the overall combination entails the method as recited in claim 10 (see above), wherein the first operation is a fused multiply add (FMA) operation (Zhang, Figure 4, which shows a fused multiply add operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. 
Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) and the second operation is a dot product operation (Zhang, [0049], line 3, dot product operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations).
Consider claim 12, the overall combination entails the method as recited in claim 10 (see above), wherein responsive to the first instruction (Zhang, [0053], line 12, single instruction), the method further comprises, by each of the plurality of arithmetic logic circuits: receiving the first plurality of values of the first matrix and the second plurality of values of the second matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. 
Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations); and performing a matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) multiplication operation of a fused multiply add (FMA) operation (Zhang, Figure 4, which shows a fused multiply add operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) using at least a first multiplier circuit and a second multiplier circuit (Zhang, Figure 4, which shows the multiplier circuits; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. 
For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations), each having a size less than a size of the values of the first matrix and less than a size of the values of the second matrix (Zhang, [0059], lines 5-6, the dot product data unit 322-1 may perform a dot product operation of A1*B1+A2*B2+A3*B3).
Consider claim 13, the overall combination entails the method as recited in claim 12 (see above), wherein responsive to the second instruction (Zhang, [0053], line 12, single instruction; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations), the method further comprises, by each of the plurality of arithmetic logic circuits: receiving the first plurality of values of the first matrix and the second plurality of values of the second matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. 
For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations); and performing a matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) multiplication operation of a dot product operation (Zhang, Figure 4, which shows a dot product operation, which entails multiplication; [0036], lines 1-2, FIG. 4 is a schematic block diagram of a dot product data unit 400; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. 
Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) using the first multiplier circuit and the second multiplier circuit (Zhang, Figure 4, which shows the multiplier circuits).
Consider claim 14, the overall combination entails the method as recited in claim 10 (see above), further comprising: fetching, by the processing circuit, the first matrix and the second matrix from the register file only once (Zhang, [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) until each element of a resulting matrix is updated by one of the first operation and the second operation (Zhang, [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 1-8, in Block 512, each dot product data unit of the dot product data units 322-1 to 322-n writes the current cumulative result of the dot product data unit to the general register 310 to serve as the convolution operation result when it is determined in the Block 510 that the convolution operation is over. 
Otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations); and storing, by the processing circuit, the first matrix and the second matrix in a plurality of storage elements for reuse by the plurality of arithmetic logic circuits (Zhang, [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 1-8, in Block 512, each dot product data unit of the dot product data units 322-1 to 322-n writes the current cumulative result of the dot product data unit to the general register 310 to serve as the convolution operation result when it is determined in the Block 510 that the convolution operation is over. Otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation). 
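To make the "fetch only once ... until each element of a resulting matrix is updated" limitation of claims 7 and 14 concrete, the sketch below compares register-file traffic for a small matrix multiply with and without local storage elements. It is a hypothetical illustration only: `CountingRegisterFile`, `matmul_with_reuse`, and `matmul_naive` are invented names, and nothing here is taken from the application, Zhang, or Chen beyond the general idea of buffering operands for reuse.

```python
class CountingRegisterFile:
    """Hypothetical register file that counts element reads."""

    def __init__(self, a, b):
        self._mats = {"A": a, "B": b}
        self.element_reads = 0

    def read(self, name, i, j):
        self.element_reads += 1
        return self._mats[name][i][j]


def matmul_with_reuse(regs, m, k, n):
    """Fetch each operand element once, then reuse it from local storage."""
    a = [[regs.read("A", i, p) for p in range(k)] for i in range(m)]
    b = [[regs.read("B", p, j) for j in range(n)] for p in range(k)]
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]   # operands come from the buffer
    return c


def matmul_naive(regs, m, k, n):
    """Re-read operands from the register file for every multiply."""
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                c[i][j] += regs.read("A", i, p) * regs.read("B", p, j)
    return c


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
reuse_regs = CountingRegisterFile(A, B)
naive_regs = CountingRegisterFile(A, B)
print(matmul_with_reuse(reuse_regs, 2, 2, 2), reuse_regs.element_reads)
print(matmul_naive(naive_regs, 2, 2, 2), naive_regs.element_reads)
```

For the 2×2 case, the buffered version reads 8 operand elements in total while the naive version reads 16; the gap grows with matrix size, which is the efficiency rationale the rejection attributes to the data reuse arrangement.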
Consider claim 15, Zhang discloses a memory comprising circuitry configured to store at least a first plurality of values ([0013], line 9, memory); and a second processor ([0029], line 6, processor) comprising: a register file ([0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel); a plurality of execution pipelines, having a plurality of arithmetic logic circuits ([0032], line 8, multiple dot product data units 322-1 to 322-n; [0036], lines 1-2, FIG. 4 is a schematic block diagram of a dot product data unit 400); and circuitry configured to: responsive to a first instruction with a first type ([0053], line 12, single instruction): fetch, from the register file only once, a first plurality of values ([0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations); and perform, using the plurality of arithmetic logic circuits, a first operation by reusing the first plurality of values for at least two iterations of computations used to perform the first operation ([0048], lines 1-6, in Block 504, the data reuse unit 321 determines the multiple data subsets from the data set, so as to respectively input the multiple data subsets into the multiple dot product data units 322-1 to 322-n. 
The two data subsets inputted into the two adjacent dot product data units include a portion of the same data; [0049], lines 1-4, in Block 506, each dot product data unit of the multiple dot product data units 322-1 to 322-n performs the dot product operation on the inputted data subset, so as to generate the dot product operation result; [0050], lines 1-5, in Block 508, each dot product data unit of the multiple dot product data units 322-1 to 322-n generates the current cumulative result of the dot product data unit based on the previous cumulative result of the dot product data unit and the dot product operation result; [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 6-8, otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation). However, Zhang does not disclose each of the aforementioned comprising circuitry configured to execute at least two different types of instructions. Zhang also does not disclose the memory is configured to store a plurality of instructions, and a first processor comprising circuitry configured to launch one or more kernels comprising the plurality of instructions, the first instruction being of the one or more kernels. On the other hand, Chen discloses circuitry configured to execute at least two different types of instructions ([0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations). 
Chen also discloses memory configured to store a plurality of instructions and data ([0049], lines 14-19, in various implementations, the program instructions are stored on any of a variety of non-transitory computer readable storage mediums. The storage medium is accessible by a computing system during use to provide the program instructions to the computing system for program execution; FIG. 1, memory device(s) 140), and, alongside a second processor ([0019], lines 3-8, in one implementation, processor 105N is a data parallel processor with a highly parallel architecture. Data parallel processors include graphics processing units (GPUs), digital signal processors (DSPs), field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and so forth), a first processor comprising circuitry configured to launch one or more kernels comprising the plurality of instructions, a first instruction being of the one or more kernels ([0025], lines 1-7, CPU (not shown) of computing system 200 launches kernels to be performed on GPU 205. Command processor 235 receives kernels from the host CPU and uses dispatch unit 250 to issue corresponding wavefronts to compute units 255A-N. In one implementation, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit; [0019], lines 1-2, processor 105A is a general purpose processor, such as a central processing unit (CPU)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teaching of Chen with the invention of Zhang in order to increase processing capability via supporting and facilitating different types of instructions and multiple processors. 
(Alternatively, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zhang with the invention of Chen in order to improve efficiency; see paragraph [0014] of Zhang.) Note that the overall combination thereby entails that each of the aforementioned comprises circuitry configured to execute at least two different types of instructions, and that the aforementioned first instruction is of the one or more kernels with a first type of the at least two different types of instructions. Consider claim 16, the overall combination entails the computer system as recited in claim 15 (see above), wherein responsive to a second instruction of a second type of the at least two different types of instructions different from the first type of the first instruction (Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. 
Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations), the circuitry is further configured to: fetch, from the register file only once, a second plurality of values (Zhang, [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations); and perform, using the plurality of arithmetic logic circuits, a second operation different from the first operation by reusing the second plurality of values for at least two iterations of computations used to provide the second operation (Zhang, [0048], lines 1-6, in Block 504, the data reuse unit 321 determines the multiple data subsets from the data set, so as to respectively input the multiple data subsets into the multiple dot product data units 322-1 to 322-n. The two data subsets inputted into the two adjacent dot product data units include a portion of the same data; [0049], lines 1-4, in Block 506, each dot product data unit of the multiple dot product data units 322-1 to 322-n performs the dot product operation on the inputted data subset, so as to generate the dot product operation result; [0050], lines 1-5, in Block 508, each dot product data unit of the multiple dot product data units 322-1 to 322-n generates the current cumulative result of the dot product data unit based on the previous cumulative result of the dot product data unit and the dot product operation result; [0052], lines 1-2, in Block 510, it is determined whether the convolution operation has ended; [0053], lines 6-8, otherwise, return to the Block 504 to continue performing the cycle on the data to be calculated in the convolution operation). 
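For illustration only, and not as part of the Office action or of either cited reference, the Zhang flow relied on above (Blocks 502 through 512) can be sketched roughly as follows. The sketch assumes a stride-1 convolution with no padding and no kernel flipping, one accumulator per dot product unit, and one kernel row consumed per cycle. The names `RegisterFile` and `convolve_with_reuse` are hypothetical and do not appear in Zhang or Chen.

```python
class RegisterFile:
    """Hypothetical stand-in for a general register; it only counts reads."""

    def __init__(self, pixels, kernel):
        self._data = {"pixels": pixels, "kernel": kernel}
        self.reads = 0

    def read(self, name):
        self.reads += 1
        # Return a copy so later reuse never touches the register again.
        return [row[:] for row in self._data[name]]


def convolve_with_reuse(regs):
    """Sketch of a read-once, reuse-many convolution (assumed mapping)."""
    # Block 502 (as cited): read each operand once and hold it locally.
    pixels = regs.read("pixels")
    kernel = regs.read("kernel")

    k = len(kernel)
    out_rows = len(pixels) - k + 1
    out_cols = len(pixels[0]) - k + 1

    result = []
    for r in range(out_rows):
        acc = [0] * out_cols  # one running total per dot product unit
        for i in range(k):  # one cycle per kernel row
            for unit in range(out_cols):
                # Block 504: adjacent units receive overlapping subsets.
                subset = pixels[r + i][unit:unit + k]
                # Block 506: dot product, e.g. A1*B1 + A2*B2 + A3*B3.
                dot = sum(a * b for a, b in zip(subset, kernel[i]))
                # Block 508: add to the previous cumulative result.
                acc[unit] += dot
        # Blocks 510/512: the loop has ended for this row; write it back.
        result.append(acc)
    return result


# Example operands: a 5x5 pixel matrix and a 3x3 kernel of ones.
pixels = [[5 * r + c + 1 for c in range(5)] for r in range(5)]
kernel = [[1] * 3 for _ in range(3)]
regs = RegisterFile(pixels, kernel)
result = convolve_with_reuse(regs)
print(result, regs.reads)
```

In this sketch the read counter stays at two (one read per operand) however many dot products are computed, which is the "only once" behavior the rejection attributes to the data reuse unit. Whether Zhang's hardware schedules its cycles this way is an assumption of the sketch.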
Consider claim 17, the overall combination entails the computing system as recited in claim 16 (see above), wherein the circuitry is configured to fetch, from the register file only once, the first plurality of values as data of a first matrix and the second plurality of values as data of a second matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations). Consider claim 18, the overall combination entails the computing system as recited in claim 17 (see above), wherein the first operation is a fused multiply add (FMA) operation (Zhang, Figure 4, which shows a fused multiply add operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. 
Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) and the second operation is a dot product operation (Zhang, [0049], line 3, dot product operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations). Consider claim 19, the overall combination entails the processor as recited in claim 17 (see above), wherein responsive to the first instruction (Zhang, [0053], line 12, single instruction), each of the plurality of arithmetic logic circuits is configured to: receive the first plurality of values of the first matrix and the second plurality of values of the second matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. 
Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations); and perform a matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) multiplication operation of a fused multiply add (FMA) operation (Zhang, Figure 4, which shows a fused multiply add operation; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) using at least a first multiplier circuit and a second multiplier circuit (Zhang, Figure 4, which shows the multiplier circuits; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. 
For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations), each having a size less than a size of the values of the first matrix and less than a size of the values of the second matrix (Zhang, [0059], lines 5-6, the dot product data unit 322-1 may perform a dot product operation of A1*B1+A2*B2+A3*B3). Consider claim 20, the overall combination entails the computing system as recited in claim 19 (see above), wherein responsive to the second instruction (Zhang, [0053], line 12, single instruction; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations), each of the plurality of arithmetic logic circuits is configured to: receive the first plurality of values of the first matrix and the second plurality of values of the second matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. 
For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations); and perform a matrix (Zhang, [0029], line 4, general register 310; [0047], lines 1-3, in Block 502, the data reuse unit 321 reads from the general register 310 and temporarily stores the data set used for the multiple convolution operations; [0056], lines 1-3, the data reuse unit 321 may read from the general register 310 and temporarily store the above-mentioned 5×5 pixel matrix and the 3×3 convolution kernel; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) multiplication operation of a dot product operation (Zhang, Figure 4, which shows a dot product operation, which entails multiplication; [0036], lines 1-2, FIG. 4 is a schematic block diagram of a dot product data unit 400; Chen, [0025], lines 5-13, each compute unit 255A-N includes an adaptive multi-instruction type matrix operations unit. For example, the adaptive multi-instruction type matrix operations unit performs matrix multiplication operations, dot product operations, and fused multiply add (FMA) operations. 
Additionally, in various implementations, the adaptive, multi-instruction type matrix operations unit performs other types of matrix, arithmetic, or bitwise operations) using the first multiplier circuit and the second multiplier circuit (Zhang, Figure 4, which shows the multiplier circuits).

Response to Arguments

Applicant on page 15 argues: “The Specification has been objected to because of informalities. Applicant has made the appropriate corrections. Applicant submits no new matter has been added.” In view of the aforementioned corrections, the previously presented objections to the specification are withdrawn.

Applicant on page 15 argues: ‘The drawings are objected to as failing to comply with 37 CFR 1.121(d). In particular, it is suggested that the drawings do not show the shared L1 cache described in paragraph 50 of the Specification. A replacement sheet is provided herein for Figure 5 to label a shared cache as "Shared L1 Cache 565". Applicant submits no new matter has been added.’ In view of the aforementioned replacement sheet, the previously presented objection to the drawings is withdrawn.

Applicant on page 15 argues: ‘In the present Office Action, claim 6 is objected to due to informalities. In particular, there is an inadvertent duplication of the word "in" within the limitation. Claim 6 has been amended as requested.’ In view of the aforementioned amendment, the previously presented objection to the claims is withdrawn.

Applicant on page 16 argues: ‘It is suggested that the specification does not support "an execution pipeline comprises an arithmetic logic circuit." Applicant respectfully disagrees. For example, paragraph 36 of the Specification describes "Similarly, the hardware of lane 320C is an instantiation of the hardware of lane 320A. The components in lanes 320A-320C operate in lockstep."’ However, the reproduced portion of paragraph 36 does not appear to disclose the terminology “execution pipeline”.
Examiner notes that the terminology “execution pipeline” is used elsewhere in the specification (e.g., paragraph [0047]) to refer to an element that is different from, for example, a “lane” as conveyed in the reproduced portion of paragraph 36, and therefore Examiner submits that the terminology “execution pipeline” in the claims would not be understood to be equivalent to the “lane” in the reproduced portion of paragraph 36. Examiner notes that the meaning of every term used in any of the claims should be apparent from the descriptive portion of the specification with clear disclosure as to its import, and further notes that the terms and phrases used in claims must find clear support or antecedent basis in the description so that the meaning of the terms in the claims may be ascertainable by reference to the description. Applicant on page 16 argues: ‘Paragraph 37 of the Specification describes "the parallel computational lanes 320A-320C operate in lockstep. In various implementations, the data flow within each of the lanes 320A-320C is pipelined. Pipeline registers are used for storing intermediate results. Within a given row across lanes 320A-320C, vector arithmetic logic unit (ALU) 350 includes the same circuitry and functionality, and operates on the same instruction, but different data associated with a different thread."’ However, the reproduced portion of paragraph 37 does not appear to disclose the terminology “execution pipeline”. Examiner notes that the terminology “execution pipeline” is used elsewhere in the specification (e.g., paragraph [0047]) to refer to an element that is different from, for example, a “lane” as conveyed in the reproduced portion of paragraph 37, and therefore Examiner submits that the terminology “execution pipeline” in the claims would not be understood to be equivalent to the “lane” in the reproduced portion of paragraph 37. 
Examiner notes that the meaning of every term used in any of the claims should be apparent from the descriptive portion of the specification with clear disclosure as to its import, and further notes that the terms and phrases used in claims must find clear support or antecedent basis in the description so that the meaning of the terms in the claims may be ascertainable by reference to the description. Applicant on page 16 argues: ‘Further, paragraph 40 of the Specification describes "Therefore, the vector ALU 350 can begin operations sooner. Selection circuit 342 also includes multiplexers and possible crossbar circuitry to route source operands to particular inputs of operations being performed by vector ALU 350. In various implementations, lane 320A is organized as a multi-stage pipeline. Intermediate sequential elements, such as staging flip-flop circuits, registers, or latches, are not shown for ease of illustration."’ However, the reproduced portion of paragraph 40 does not appear to disclose the terminology “execution pipeline”. Examiner notes that the terminology “execution pipeline” is used elsewhere in the specification (e.g., paragraph [0047]) to refer to an element that is different from, for example, a “lane” as conveyed in the reproduced portion of paragraph 40, and therefore Examiner submits that the terminology “execution pipeline” in the claims would not be understood to be equivalent to the “lane” in the reproduced portion of paragraph 40. Examiner notes that the meaning of every term used in any of the claims should be apparent from the descriptive portion of the specification with clear disclosure as to its import, and further notes that the terms and phrases used in claims must find clear support or antecedent basis in the description so that the meaning of the terms in the claims may be ascertainable by reference to the description. 
Applicant on page 16 argues: ‘Each of the parallel computational lanes includes a corresponding multi-stage execution pipeline, and the ALU is in the execution pipeline. The ALU can also include one or more stages of the entire execution pipeline. Therefore, as described in paragraph 47 of the Specification, "each of the vector ALUs includes a single execution pipeline for each of the FMA operation and the dot product operation." Accordingly, Applicant submits the specification supports "an execution pipeline comprises an arithmetic logic circuit."’ However, Examiner submits that the reproduced portions addressed above do not convey that the ALU is in the “execution pipeline”, when “execution pipeline” is interpreted in a manner consistent with how that particular terminology is used in the specification. Applicant on page 17 argues: ‘It is suggested that it is indefinite as to whether claim 3 (in the context of claim 1) entails fetching the first plurality of values once or twice. Claim 3 has been amended to include … Support for the claim amendments may be found in at least Figures 1-4 and paragraphs 16, 20, 22, 28, 30, 32, 42 and 46-47. Claims 10 and 17 have been amended in a similar manner.’ In view of the amendments to claims 3 and 17, the associated previously presented indefinite rejections are withdrawn. However, claim 10 continues to recite “further” language. Applicant on page 17 argues: ‘It is suggested that the claim 5 features "the values of the first matrix" lack antecedent basis. Claim 5 has been amended to include "receive the first plurality of values of the first matrix and the second plurality of values of the second matrix." Claims 6, 12-13 and 19-20 have been amended in a similar manner.’ However, Examiner notes that claims 5, 12, and 19 continue to have instances of the “the values of the first matrix” and “the values of the second matrix”. 
Applicant on page 17 argues: ‘Regarding claim 5, it is suggested that it is indefinite whether "a fused multiply add (FMA) operation" is the same as "a fused multiply add (FMA) operation" of claim 4. Regarding claim 6, which is dependent on claim 5, it is suggested that it is indefinite whether "a dot product operation" is the same as "a dot product operation" of claim 4. Claim 5 has been amended to be dependent on claim 3. Claims 12 and 19 have been amended in a similar manner.’ In view of the aforementioned amendments, the associated previously presented indefinite rejections are withdrawn. Applicant on page 17 argues: ‘Claim 5 has also been amended to correct a typographical error to include "each having a size less than a size of the values of the first matrix and less than a size of the values of the second matrix." Claims 12 and 19 have been amended in a similar manner.’ In view of the aforementioned amendments, the associated previously presented indefinite rejections are withdrawn. Applicant on page 18 argues: “As amended, claim 1 recites … Applicant submits Zhang fails to disclose or suggest at least the above highlighted features.” Applicant on page 19 argues: “As independent claims 8 and 15 include features similar to claim 1, claims 8 and 15 are patentably distinguished from the cited art for similar reasons. As each of the dependent claims includes the features of the independent claims on which they depend, each of the dependent claims is patentably distinct for at least the above reasons.” Applicant on page 20 argues: “Therefore, for at least these further reasons, claim 2 is patentably distinguishable from the cited art. Claims 9 and 16 include similar features and are similarly patentably distinguishable.” In view of the aforementioned amendments, Examiner is withdrawing the associated rejections under 35 USC 102. 
Applicant on page 20 argues: “On page 21 of the present Office Action, it is suggested that the dot product data unit of Zhang discloses "the first operation is a fused multiply add (FMA) operation." However, Zhang discloses "FIG. 4 is a schematic block diagram of a dot product data unit 400 that performs a dot product operation on three pairs of data according to an embodiment of the disclosure." (See Zhang, para. 36). The dot product data unit 400 performs a dot product operation on three pairs of data. The dot product data unit 400 of Zhang does not perform an operation different from the dot product operation. Therefore, for at least these further reasons, claim 4 is patentably distinguishable from the cited art. Claims 11 and 18 include similar features and are similarly patentably distinguishable.” Examiner generally submits that a dot product operation may be considered to entail a fused multiply add operation. For example, FIG. 4 shows a circuit that performs what may be considered a fused multiply add — see the arrangement of multipliers and adders. Nevertheless, Examiner notes that the Chen reference cited above discloses a fused multiply add operation as well. Applicant on page 21 argues: ‘The above disclosure merely describes the dot product data unit 322-1 performs the dot product operation with a subset of a matrix corresponding to the 5 x 5 pixel matrix shown in Table 1 and a subset of a matrix corresponding to the 3 x3 convolution kernel shown in Table 2. However, Zhang nowhere discloses the dot product data unit 322-1 has "a size less than a size of the values of the first matrix and less than a size of the values of the second matrix." Regarding the use of the dot product data unit 322-1, Zhang discloses the dot product data unit 400 includes three multipliers 410-1 to 410-3 and "Each of the multipliers 410 is configured to multiply one pair of inputted data, so as to generate a product, and input the product to a corresponding adder 420. 
A pair of data includes, for example, a pixel and a weight in the convolution kernel." (See Zhang, para. 37). The three multipliers 410-1 to 410-3 of Zhang are used to perform a dot product operation, not a fused multiply add (FMA) operation. The three multipliers 410-1 to 410-3 of Zhang do not have "a size less than a size of the values of the first matrix and less than a size of the values of the second matrix." Therefore, for at least these further reasons, claim 5 is patentably distinguishable from the cited art. Claims 12 and 19 include similar features and are similarly patentably distinguishable.’ However, Examiner submits that the multiplication of matrices via multiplication on subsets of the matrices teaches the argued size-related limitations. In addition, as noted above, Examiner generally submits that a dot product operation may be considered to entail a fused multiply add operation. For example, FIG. 4 shows a circuit that performs what may be considered a fused multiply add; see the arrangement of multipliers and adders. Nevertheless, Examiner notes that the Chen reference cited above discloses a fused multiply add operation as well.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.
In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEITH E VICARY whose telephone number is (571)270-1314. The examiner can normally be reached Monday to Friday, 9:00 AM to 5:00 PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Caldwell can be reached at (571)272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KEITH E VICARY/Primary Examiner, Art Unit 2182

Prosecution Timeline

Mar 28, 2024
Application Filed
Jul 02, 2025
Non-Final Rejection mailed — §102, §103, §112
Sep 30, 2025
Response Filed
Oct 20, 2025
Final Rejection mailed — §102, §103, §112
Feb 18, 2026
Applicant Interview (Telephonic)
Feb 18, 2026
Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12608336
SINGLE INSTRUCTION MULTIPLE DISPATCHES FOR SHORT KERNELS IN A RECONFIGURABLE PARALLEL PROCESSOR
2y 9m to grant Granted Apr 21, 2026
Patent 12608208
ASYNCHRONOUS RELEASE OPERATIONS IN A MULTIPROCESSOR SYSTEM
2y 1m to grant Granted Apr 21, 2026
Patent 12602349
HANDLING DYNAMIC TENSOR LENGTHS IN A RECONFIGURABLE PROCESSOR THAT INCLUDES MULTIPLE MEMORY UNITS
2y 9m to grant Granted Apr 14, 2026
Patent 12572360
Cache Preload Operations Using Streaming Engine
3y 11m to grant Granted Mar 10, 2026
Patent 12554507
SYSTEMS AND METHODS FOR PROCESSING FORMATTED DATA IN COMPUTATIONAL STORAGE
2y 8m to grant Granted Feb 17, 2026

Prosecution Projections

3-4
Expected OA Rounds
58%
Grant Probability
99%
With Interview (+41.3%)
3y 11m (~1y 9m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 684 resolved cases by this examiner. Grant probability derived from career allowance rate.
