DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
The present application, 17/864,732, filed 07/14/2022, is a continuation of PCT/CN2020/121536, filed 10/16/2020, which claims foreign priority to CN202010245293.5, filed 03/31/2020, and to CN202010066005.X, filed 01/20/2020.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 04/11/2023, 01/16/2024 and 04/11/2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Drawings
The drawings are objected to under 37 CFR 1.83(a). The drawings must show every feature of the invention specified in the claims. Therefore, the following must be shown or the feature(s) canceled from the claim(s). No new matter should be entered.
A. “output the output floating-point number to the controller” as specified in claims 3 and 17
B. “sending, by the ALU, the output floating-point number to the controller” as specified in claim 8
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Specification
The specification is objected to as failing to provide proper antecedent basis for the claimed subject matter. See 37 CFR 1.75(d)(1) and MPEP § 608.01(o). Correction of the following is required:
A. “output the output floating-point number to the controller” in claims 3 and 17
B. “sending, by the ALU, the output floating-point number to the controller” in claim 8.
There is no description of outputting or sending the output floating-point number from the ALU to the controller.
Claim Objections
Claims 1-20 are objected to under 37 CFR 1.71(a), which requires “full, clear, concise, and exact terms” so as to enable any person skilled in the art or science to which the invention or discovery appertains, or with which it is most nearly connected, to make and use the same. The following should be corrected:
A. In claim 1, line 18, “the operating type” should read “the operation type” instead for consistency of claim terminology. Claim 15 recites a similar limitation in line 19 and is objected to for the same reason. Claims 2-7 inherit the same deficiency as claim 1 by reason of dependence. Claims 16-20 inherit the same deficiency as claim 15 by reason of dependence.
B. In claim 2 lines 1-2, “each of N adjustment circuits” should read “each of the N adjustment circuits” instead.
C. In claim 2, line 3, “an input floating-point number” should read “the input floating-point number” because an input floating-point number is already introduced in claim 1, line 11, from which claim 2 depends. Claim 16 recites a similar limitation in line 3 and is objected to for the same reason. Claim 17 inherits the same deficiency as claim 16 by reason of dependence.
D. In claim 3, line 4, “the operation output floating-point number” should read “the operation result floating-point number” instead for consistency of claim terminology. Claim 17 recites a similar limitation in line 4 and is objected to for the same reason.
E. In claim 4, line 6, “the output floating-point number” should read “the one output floating-point number” instead for consistency of claim terminology. Claims 11 and 18 recite a similar limitation in line 6 and are objected to for the same reason.
F. In claim 7 line 2, “an input floating-point number” should read “the input floating-point number” because an input floating-point number is already introduced in claim 1 line 11 from which the claim depends.
G. In claim 8, lines 18 and 21, “multiplier accumulator” should read “multiplier-accumulator” instead for consistency of claim terminology. Claims 9-14 inherit the same deficiency as claim 8 by reason of dependence.
H. In claim 12 lines 5-6, “each input floating-point number” should read “the input floating-point number” instead for better clarity.
Claim Interpretation
The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met. For example, assume a method claim requires step A if a first condition happens and step B if a second condition happens. If the claimed invention may be practiced without either the first or second condition happening, then neither step A nor step B is required by the broadest reasonable interpretation of the claim. If the claimed invention requires the first condition to occur, then the broadest reasonable interpretation of the claim requires step A. If the claimed invention requires both the first and second conditions to occur, then the broadest reasonable interpretation of the claim requires both steps A and B. The “when” clauses in claims 11-12 are contingent limitations that are not required to be performed if the condition(s) precedent are not met. See MPEP 2111.04 for more information.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 13-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 13 recites “each input floating-point number” in lines 2-3. There is insufficient antecedent basis for this limitation in the claim. It is unclear whether this is supposed to refer to each floating-point number in the first group or to something else. For purposes of examination, this is interpreted to refer to each floating-point number in the first group.
Claim 14 recites “the input floating-point number” in line 1. There is insufficient antecedent basis for this limitation in the claim. For purposes of examination, this is interpreted to refer to each floating-point number in the first group.
Further, claim 14 recites “wherein a format of the input floating-point number satisfies the Institute of Electrical and Electronics Engineers (IEEE) binary floating point arithmetic standard, and a format of the output floating-point number does not satisfy the IEEE binary floating point arithmetic standard”. Claim 8, from which claim 14 depends, recites in part “converting, by the multiplier accumulator, the operation result floating-point number to an output floating-point number of a selected type, wherein the selected type is one of the N types of floating-point numbers”. It is unclear how the output floating-point number does not satisfy the IEEE binary floating point arithmetic standard when it is of one of the N types of floating-point numbers and a format of each of the floating-point numbers in the first group satisfies the Institute of Electrical and Electronics Engineers (IEEE) binary floating point arithmetic standard. For purposes of examination, this is interpreted as “wherein a format of each floating point number in the first group satisfies the Institute of Electrical and Electronics Engineers (IEEE) binary floating point arithmetic standard, and a format of the one or more converted floating-point numbers does not satisfy the IEEE binary floating point arithmetic standard”.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4-7, 15-16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Henry et al. (US 20190042244 A1), hereinafter Henry, in view of Pugh et al. (US 20210042087 A1), hereinafter Pugh. Henry is cited in the IDS submitted on 04/11/2023.
Regarding claim 15, Henry teaches a computing device, comprising:
a memory storing data and executable instructions (Henry Fig. 8B and paragraph [0143] “While the illustrated embodiment of the processor also includes separate instruction and data cache units 834/874 and a shared L2 cache unit 876, alternative embodiments may have a single internal cache for both instructions and data, such as, for example, a Level 1 (L1) internal cache, or multiple levels of internal cache”; memory – cache); and
a hardware processor chip comprising (Henry Fig. 8B hardware processor chip – processor core 890):
a controller being configured to process N different types of floating-point numbers with N precisions, wherein each type of the N types of floating point numbers has one of the N precisions, N is an integer equal to or greater than 2 (Henry Figs. 4 and 8B and paragraphs [0092-0093, 0118 and 0137] controller - front-end unit or decode circuitry; N different types of floating-point numbers – values A and B; N precisions – precisions of A and B);
an arithmetic logic unit (ALU) comprising (Henry Fig. 8B and paragraphs [0121-0123, 0125, 0138] ALU – execution engine or execution unit ):
N adjustment circuits, each of the N adjustment circuits being configured to: obtain an input floating-point number of a corresponding type with a corresponding precision from the controller; and convert the input floating-point number from the corresponding type to one or more output floating-point numbers of an operation type with an operation precision (Henry Figs. 4 and 7A and paragraph [0093] “Each of the values is then converted into two values in bfloat16 format. At references 404 and 406, the values in A and B in FP32 are approximated with values in bfloat16 format”; paragraph [0121] “At reference 704, the execution circuitry executes the decoded instruction. The execution includes converting values for each operand, each value being converted into a plurality of lower precision values at reference 712, where an exponent is to be stored for each operand … the lower precision values are in another floating-point format that has lesser bit than the significand precision bits of the floating-point format used by the vector or matrix. When the lower precision values are in the other lower precision floating-point format”; input floating-point number of a corresponding type with a corresponding precision – A and B in FP32; one or more output floating-point numbers of an operation type with an operation precision – A1, A2, B1 and B2 in bfloat16); and
a multiplier-accumulator (Henry Figs. 4, 7 and paragraph [0123] “the execution circuitry comprises one or more dedicated multiplier-accumulator (MAC) circuits, and the one or more dedicated MAC circuits are to perform integer multiply-accumulate operations. In one embodiment, each MAC circuit is a fused multiply add (FMA) circuit. The circuits may be dedicated to low precision floating-point values in bfloat16 format in one embodiment. For example, dedicated circuits have been built for bfloat16 arithmetic operations in machine learning (ML) and artificial intelligence (AI) applications”; multiplier-accumulator - MAC circuits):
receive a group of converted floating-point numbers (Henry Figs. 4, 7 and paragraph [0123]); and
perform the operation on the group of converted floating-point numbers to produce an operation result floating-point number of the operating type and operation precision (Henry Figs. 4, 7 and paragraphs [0123, 0125] an operation result floating-point number – resulting value);
the multiplier-accumulator is further configured to:
convert the operation result floating-point number to an output floating-point number, wherein the output floating-point number is of one type of the N types of floating-point numbers (Henry Figs. 4, 7 and paragraph [0125] “The execution additionally includes generating a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and storing the floating-point value at reference 716”).
Henry does not explicitly teach N adjustment circuits, each of the N adjustment circuits being configured to: obtain an input floating-point number of a corresponding type with a corresponding precision from the controller; and convert the input floating-point number from the corresponding type to one or more output floating-point numbers of an operation type with an operation precision; a multiplier-accumulator connected to the N adjustment circuits; and receive a group of converted floating-point numbers from the N adjustment circuits as inputs for an operation.
However, on the same field of endeavor, Pugh discloses N adjustment circuits, each of the N adjustment circuits being configured to: obtain an input floating-point number of a corresponding type with a corresponding precision and convert the input floating-point number from the corresponding type to one or more output floating-point numbers of an operation type with an operation precision; and provide a group of converted floating-point numbers from the N adjustment circuits as inputs for an operation to a multiplier-accumulator connected to the N adjustment circuits (Pugh Figs. 4-5 and paragraphs [0042, 0045-0047] “Each of the bit remap logics 430A-430D remaps the inputs based on a multiplication mode and byte selection mode input … In a floating-point mode that differs from the floating-point format used by the portion 500, the bit remap logics 430A-430D convert the inputs to a format expected by the portion 500”; N adjustment circuits - bit remap logics 430A-430D).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Henry using Pugh and configure the ALU to include conversion circuitry (i.e., bit remap logics) for each input to be converted, in order to convert the inputs to the format expected by the MAC circuits and to implement a multiple mode arithmetic circuit that supports different floating-point format combinations (Pugh paragraph [0046]).
Therefore, the combination of Henry as modified in view of Pugh teaches N adjustment circuits, each of the N adjustment circuits being configured to: obtain an input floating-point number of a corresponding type with a corresponding precision from the controller; and convert the input floating-point number from the corresponding type to one or more output floating-point numbers of an operation type with an operation precision; a multiplier-accumulator connected to the N adjustment circuits; and receive a group of converted floating-point numbers from the N adjustment circuits as inputs for an operation.
Regarding claim 16, Henry as modified in view of Pugh teaches all the limitations of claim 15 as stated above. Further, Henry as modified in view of Pugh teaches wherein for each of the N adjustment circuits an exponent bit width of an output floating-point number is greater than an exponent bit width of an input floating-point number (Pugh paragraph [0046] “an example, the portion 500 expects floating-point values with a 15-bit mantissa, a one-bit sign, and an 8-bit exponent. In this example, the multiple mode arithmetic circuit supports inputs and outputs using various combinations of 16-bit mantissas, 10-bit mantissas, 12-bit mantissas, 8-bit exponents, 6-bit exponents, and 5-bit exponents”; paragraph [0077]).
Regarding claim 18, Henry as modified in view of Pugh teaches all the limitations of claim 15 as stated above. Further, Henry as modified in view of Pugh teaches wherein at least one of the N adjustment circuits is configured to: when an input floating-point number has a mantissa bit width less than or equal to a mantissa bit width of the operation type of floating-point numbers, convert the input floating-point number to one output floating-point number of the operation type, wherein a value represented by the input floating-point number is equal to a value represented by the output floating-point number (Pugh paragraphs [0046-0047] “In a floating-point mode that differs from the floating-point format used by the portion 500, the bit remap logics 430A-430D convert the inputs to a format expected by the portion 500. In an example, the portion 500 expects floating-point values with a 15-bit mantissa, a one-bit sign, and an 8-bit exponent. In this example, the multiple mode arithmetic circuit supports inputs and outputs using various combinations of 16-bit mantissas, 10-bit mantissas, 12-bit mantissas, 8-bit exponents, 6-bit exponents, and 5-bit exponents. Based on the input format and the format expected by the portion 500, the bit remap logics 430A-430D convert the input values. In this example, selection of the input floating-point format is in response to a mode selection input. The bit remap logics 430A-430D, in some example embodiments, perform sign extension. As a result, operands that are smaller than the size of the input values accepted by the arithmetic blocks (e.g., the multipliers 520A-520H) are routed using only the routing resources necessary for the operands and sign-extended by the bit remap logics 430A-430D prior to use by the arithmetic blocks”).
Regarding claim 19, Henry as modified in view of Pugh teaches all the limitations of claim 15 as stated above. Further, Henry as modified in view of Pugh teaches wherein at least one of the N adjustment circuits is configured to: when a mantissa bit width of an input floating-point number is greater than a mantissa bit width of the operation type of floating-point numbers, convert the input floating-point number into a plurality of output floating-point numbers, and a value represented by the input floating-point number is same as a value represented by a sum of the plurality of output floating-point numbers (Henry Figs. 4-5 and paragraph [0093] “At references 404 and 406, the values in A and B in FP32 are approximated with values in bfloat16 format. Values of A1 and B1 (in bfloat16 format) are approximation of the ones of A and B (in FP32 format) and have lower precision than the ones of A and B because the bfloat16 has less mantissa bits than FP32 (8 bits vs. 24 bits). A1 and B1 has values represented by the most significant 8-bit mantissa of the ones in A and B as shown at references 412 and 416. At references 414 and 416, the reminders of A and B after subtracting A1 and B1 are approximated in bfloat16 format as A2 and B2, respectively”; paragraphs [0116-0117] “One may also split each FP32 value into three Bfloat16 … Three bfloat16 have 24-bit significand precision in total, thus conversion from a FP32 value (which as 24-bit significand precision also) to three bloat16 values does not lose accuracy”).
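For context, the decomposition Henry describes at references 404-416 (an FP32 value approximated by a most-significant bfloat16 value plus a bfloat16 approximation of the remainder) can be sketched as follows. This is an illustrative sketch only and is not part of the record; it assumes truncation of the low 16 bits as the bfloat16 rounding mode, which Henry does not specify.

```python
import struct

def to_bfloat16(x: float) -> float:
    """Approximate x in bfloat16 by keeping the top 16 bits of its FP32 encoding."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def split_fp32(a: float):
    """Split an FP32 value into (a1, a2) in bfloat16 such that a1 + a2 ~= a."""
    a1 = to_bfloat16(a)       # most significant mantissa bits (cf. A1 at reference 404)
    a2 = to_bfloat16(a - a1)  # remainder after subtracting a1 (cf. A2 at reference 414)
    return a1, a2
```

The sum a1 + a2 recovers more of the original 24-bit FP32 significand than a1 alone, which is why the claimed sum of the plurality of output floating-point numbers can represent the same value as the input.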
Regarding claim 20, Henry as modified in view of Pugh teaches all the limitations of claim 15 as stated above. Further, Henry as modified in view of Pugh teaches wherein for each of the N adjustment circuits a quantity of output floating-point numbers corresponding to each input floating-point number is determined based on a mantissa bit width of the input floating-point number and the mantissa bit width of floating-point numbers of the operation type (Henry paragraph [0117] “Three bfloat16 have 24-bit significand precision in total, thus conversion from a FP32 value (which as 24-bit significand precision also) to three bloat16 values does not lose accuracy”; paragraph [0122] “The number of the converted lower precision values per value may depend on the additional operand for the QoS requirement in on embodiment. For example, when the accuracy of the arithmetic operations is expected to be high, each value may be converted to more lower precision values ( e.g., each A may be converted into A1, A2, and A3, instead”).
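The relationship Henry describes (a 24-bit FP32 significand covered without loss by three 8-bit bfloat16 significands) amounts to a ceiling division of the input mantissa width by the operation-type mantissa width. A hypothetical sketch, offered only to illustrate the mapping:

```python
import math

def num_output_values(input_mantissa_bits: int, op_mantissa_bits: int) -> int:
    """Quantity of operation-type values needed to cover the input significand."""
    return math.ceil(input_mantissa_bits / op_mantissa_bits)
```

Under this reading, FP32 (24 significand bits) into bfloat16 (8 significand bits) yields three output values, matching Henry's lossless three-way split, while a two-value split is the lower-precision approximation of Fig. 4.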
Regarding claims 1-2 and 4-6, they are directed to the hardware processor chip of claims 15-16 and 18-20, respectively. All the structural configurations of the processor chip of claims 1-2 and 4-6 are recited in the processor chip of the computing device of claims 15-16 and 18-20, respectively. The analysis of claims 15-16 and 18-20 applies equally to claims 1-2 and 4-6, respectively.
Regarding claim 7, Henry as modified in view of Pugh teaches all the limitations of claim 1 as stated above. Further, Henry as modified in view of Pugh teaches wherein for each of the N adjustment circuits a format of an input floating-point number satisfies the Institute of Electrical and Electronics Engineers (IEEE) binary floating point arithmetic standard, and a format of an output floating-point number of the operation type does not satisfy the IEEE binary floating point arithmetic standard (Henry Fig. 4 and paragraph [0094] “The floating-point arithmetic operation hardware circuits (e.g., MAC and/or FMA circuits) implementing such accumulation/accumulators do not comply with the present IEEE standards, but the mixed-precision decomposition of operations allows embodiments of the invention to perform arithmetic operations with a higher precision”).
Claims 8-14 are rejected under 35 U.S.C. 103 as being unpatentable over Henry in view of Pugh, and Lin et al. (US 20200026991 A1), hereinafter Lin.
Regarding claim 8, Henry teaches a floating-point number multiplication calculation method performed by a hardware processor chip comprising a controller and an arithmetic logic unit having N adjustment circuits and a multiplier-accumulator (Henry Figs. 4, 7A and 8B and paragraph [0123]; hardware processor chip - processor core 890; controller - front-end unit or decode circuitry; arithmetic logic unit - execution engine or execution unit; multiplier-accumulator - MAC circuits):
sending, by the controller, a first group of floating-point numbers to the arithmetic logic unit (ALU) for a first operation, wherein the controller utilizes N different types of floating-point numbers with N precisions, and the floating-point numbers in the first group are of one or more types of the N types of floating-point numbers (Henry Figs. 4, 7A and 8B and paragraphs [0092-0093, 0118, 0137] first group of floating-point numbers – values A and B; first operation – multiply-accumulate operation);
receiving, by the ALU, the first group of floating-point numbers from the controller (Henry Figs. 4, 7A and 8B and paragraphs [0121-0123, 0125, 0138] “At reference 704, the execution circuitry executes the decoded instruction. The execution includes converting values for each operand, each value being converted into a plurality of lower precision values at reference 712 … The execution further includes performing arithmetic operations among lower precision values converted from values for the plurality of the operands at reference 714. The arithmetic operations include the ones shown in FIGS. 1 and 4 and discussed in the related paragraphs herein above”);
converting, by each of the N adjustment circuits, each floating-point number of the first group directed thereto from a type of said each directed floating-point number to one or more converted floating-point numbers of an operation type of the multiplier-accumulator (Henry Figs. 4, 7A and paragraph [0121] “At reference 704, the execution circuitry executes the decoded instruction. The execution includes converting values for each operand, each value being converted into a plurality of lower precision values at reference 712”; paragraph [0093] “Each of the values is then converted into two values in bfloat16 format. At references 404 and 406, the values in A and B in FP32 are approximated with values in bfloat16 format”);
receiving, by the multiplier-accumulator, a second group of floating-point numbers from the N adjustment circuits, the second group of floating-point numbers comprising floating-point numbers of the operation type generated by the N adjustment circuits by converting the floating-point numbers of the first group (Henry Figs. 4, 7A and paragraph [0123]);
performing, by the multiplier accumulator, a second operation on the second group of floating-point numbers, wherein the second operation corresponds to the first operation, to generate an operation result floating-point number of the operation type (Henry Figs. 4, 7A and paragraph [0123] “The execution further includes performing arithmetic operations among lower precision values converted from values for the plurality of the operands at reference 714 … For example, when the execution circuitry comprises one or more dedicated multiplier-accumulator (MAC) circuits, and the one or more dedicated MAC circuits are to perform integer multiply-accumulate operations”; paragraph [0125] operation result floating-point number – resulting value);
converting, by the multiplier accumulator, the operation result floating-point number to an output floating-point number of a selected type, wherein the selected type is one of the N types of floating-point numbers (Henry Figs. 4, 7 and paragraph [0125] “The execution additionally includes generating a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and storing the floating-point value at reference 716”); and
sending, by the ALU, the output floating-point number to the controller (Henry paragraph [0125] “The floating-point value may be stored in a location specified by the instruction ( e.g., memory, cache, or register). In one embodiment, the floating-point value is stored in a location that has stored the input operands”).
Henry does not explicitly teach a hardware processor chip comprising a controller and an arithmetic logic unit having N adjustment circuits and a multiplier-accumulator; receiving, by the ALU, the first group of floating-point numbers from the controller and directing each floating-point number in the first group to a corresponding one of the N adjustment circuits configured to process floating-point numbers of a type of said each floating-point number; converting, by each of the N adjustment circuits, each floating-point number of the first group directed thereto from a type of said each directed floating-point number to one or more converted floating-point numbers of an operation type of the multiplier-accumulator; receiving, by the multiplier-accumulator, a second group of floating-point numbers from the N adjustment circuits, the second group of floating-point numbers comprising floating-point numbers of the operation type generated by the N adjustment circuits by converting the floating-point numbers of the first group; and sending, by the ALU, the output floating-point number to the controller.
However, on the same field of endeavor, Pugh discloses N adjustment circuits, each of the N adjustment circuits being configured to: obtain an input floating-point number of a corresponding type with a corresponding precision and convert the input floating-point number from the corresponding type to one or more output floating-point numbers of an operation type with an operation precision; provide a group of converted floating-point numbers from the N adjustment circuits as inputs for an operation to a multiplier-accumulator connected to the N adjustment circuits (Pugh Figs. 4-5 and paragraphs [0042, 0045-0047] “Each of the bit remap logics 430A-430D remaps the inputs based on a multiplication mode and byte selection mode input … In a floating-point mode that differs from the floating-point format used by the portion 500, the bit remap logics 430A-430D convert the inputs to a format expected by the portion 500”; N adjustment circuits - bit remap logics 430A-430D).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Henry using Pugh and configure the ALU to include conversion circuitry (i.e., bit remap logics) for each input to be converted, directing each input to a corresponding bit remap logic, converting each input, and sending the converted inputs to the execution engine for the multiply-accumulate operations, in order to convert the inputs to the format expected by the MAC circuits and to implement a multiple mode arithmetic circuit that supports different floating-point format combinations (Pugh paragraph [0046]).
Therefore, the combination of Henry as modified in view of Pugh teaches a hardware processor chip comprising a controller and an arithmetic logic unit having N adjustment circuits and a multiplier-accumulator; receiving, by the ALU, the first group of floating-point numbers from the controller and directing each floating-point number in the first group to a corresponding one of the N adjustment circuits configured to process floating-point numbers of a type of said each floating-point number; converting, by each of the N adjustment circuits, each floating-point number of the first group directed thereto from a type of said each directed floating-point number to one or more converted floating-point numbers of an operation type of the multiplier-accumulator; and receiving, by the multiplier-accumulator, a second group of floating-point numbers from the N adjustment circuits, the second group of floating-point numbers comprising floating-point numbers of the operation type generated by the N adjustment circuits by converting the floating-point numbers of the first group.
Henry as modified in view of Pugh does not explicitly teach sending, by the ALU, the output floating-point number to the controller.
However, on the same field of endeavor, Lin discloses outputting an output value from a multiply-accumulate unit to a controller (Lin Fig. 10 and paragraph [0070] “The multiplier and accumulator unit 1010 receives weights stored for the Nth synaptic layer from the memory system 502 to compute sum-of-products. The multiplier and accumulator unit provides the sum-of-products to the controller 504 as the output for the Nth synaptic layer”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Henry in view of Pugh using Lin and provide the output floating-point number to the controller such that the controller can store the output floating-point number to a location specified in the instruction (Henry paragraph [0125]).
Therefore, the combination of Henry as modified in view of Pugh and Lin teaches sending, by the ALU, the output floating-point number to the controller.
Regarding claim 9, Henry as modified in view of Pugh and Lin teaches all the limitations of claim 8 as stated above. Further, Henry as modified in view of Pugh and Lin teaches wherein for each of the N adjustment circuits an exponent bit width of an output floating-point number is greater than an exponent bit width of an input floating-point number (Pugh paragraph [0046] “an example, the portion 500 expects floating-point values with a 15-bit mantissa, a one-bit sign, and an 8-bit exponent. In this example, the multiple mode arithmetic circuit supports inputs and outputs using various combinations of 16-bit mantissas, 10-bit mantissas, 12-bit mantissas, 8-bit exponents, 6-bit exponents, and 5-bit exponents”; paragraph [0077]).
Regarding claim 10, Henry as modified in view of Pugh and Lin teaches all the limitations of claim 8 as stated above. Further, Henry as modified in view of Pugh teaches
wherein the multiplier-accumulator comprises an operation subcircuit and (Henry paragraphs [0094] and [0123] “the execution circuitry comprises one or more dedicated multiplier-accumulator (MAC) circuits, and the one or more dedicated MAC circuits are to perform integer multiply-accumulate operations”; operation subcircuit – MAC circuits);
(Henry Fig. 4 and Fig. 7A and paragraph [0125] “The execution additionally includes generating a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and storing the floating-point value at reference 716. The generation of the floating-point value includes the reconstructions shown in FIGS. 1 and 4 and discussed in the related paragraphs herein above”).
Henry does not explicitly teach wherein the multiplier-accumulator comprises an operation subcircuit and a format processing subcircuit, wherein the operation subcircuit performs the second operation on the second group of floating-point numbers of the operation type, and the method further comprises: receiving, by the format processing subcircuit, a mode signal indicating the selected type; wherein the format processing subcircuit converts the operation result floating-point number to the output floating-point number of the selected type based on the mode signal.
However, in the same field of endeavor, Pugh discloses a multiplier-accumulator that comprises a format processing subcircuit configured to receive a mode signal indicating an output type of floating-point numbers and convert the operation result floating-point number to the output floating-point number of the selected type based on the mode signal (Pugh Fig. 8 and paragraph [0072] “In a floating-point mode that differs from the floating-point format used by the portions 500-700, the logics 850A-850B convert the intermediate outputs to a format expected by the FPGA … Based on the output format and the format operated on the portions 500-700, the logics 850A-850B convert the output values”; format processing subcircuit – logic block 850A/850B).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Henry using Pugh and configure the execution circuitry to include output format circuitry that converts the internal floating-point format used by the MAC circuits back to the floating-point format (Pugh paragraph [0072]). As discussed, Henry discloses converting the resulting value from the arithmetic operations back into the floating-point format. Therefore, it would be obvious to provide circuitry implementing the conversion process.
Therefore, the combination of Henry as modified in view of Pugh teaches wherein the multiplier-accumulator comprises an operation subcircuit and a format processing subcircuit, wherein the operation subcircuit performs the second operation on the second group of floating-point numbers of the operation type, and the method further comprises: receiving, by the format processing subcircuit, a mode signal indicating the selected type; wherein the format processing subcircuit converts the operation result floating-point number to the output floating-point number of the selected type based on the mode signal.
Regarding claim 11, Henry as modified in view of Pugh and Lin teaches all the limitations of claim 8 as stated above. Further, Henry as modified in view of Pugh and Lin teaches wherein the step of converting by each of the N adjustment circuits comprises: when a mantissa bit width of an input floating-point number is less than or equal to a mantissa bit width of an output floating-point number of the operation type, converting the input floating-point number to one output floating-point number, wherein a value represented by the input floating-point number is equal to a value represented by the output floating-point number (Pugh paragraphs [0046-0047] “In a floating-point mode that differs from the floating-point format used by the portion 500, the bit remap logics 430A-430D convert the inputs to a format expected by the portion 500. In an example, the portion 500 expects floating-point values with a 15-bit mantissa, a one-bit sign, and an 8-bit exponent. In this example, the multiple mode arithmetic circuit supports inputs and outputs using various combinations of 16-bit mantissas, 10-bit mantissas, 12-bit mantissas, 8-bit exponents, 6-bit exponents, and 5-bit exponents. Based on the input format and the format expected by the portion 500, the bit remap logics 430A-430D convert the input values. In this example, selection of the input floating-point format is in response to a mode selection input. The bit remap logics 430A-430D, in some example embodiments, perform sign extension. As a result, operands that are smaller than the size of the input values accepted by the arithmetic blocks (e.g., the multipliers 520A-520H) are routed using only the routing resources necessary for the operands and sign-extended by the bit remap logics 430A-430D prior to use by the arithmetic blocks”).
Regarding claim 12, Henry as modified in view of Pugh and Lin teaches all the limitations of claim 8 as stated above. Further, Henry as modified in view of Pugh and Lin teaches wherein the step of converting by each of the N adjustment circuits comprises: when a mantissa bit width of an input floating-point number is greater than a mantissa bit width of an output floating-point number of the operation type, converting the input floating-point number into a plurality of output floating-point numbers, wherein a value represented by each input floating-point number is same as a value represented by a sum of the plurality of output floating-point numbers (Henry Figs. 4-5 and paragraph [0093] “At references 404 and 406, the values in A and B in FP32 are approximated with values in bfloat16 format. Values of A1 and B1 (in bfloat16 format) are approximation of the ones of A and B (in FP32 format) and have lower precision than the ones of A and B because the bfloat16 has less mantissa bits than FP32 (8 bits vs. 24 bits). A1 and B1 has values represented by the most significant 8-bit mantissa of the ones in A and B as shown at references 412 and 416. At references 414 and 416, the reminders of A and B after subtracting A1 and B1 are approximated in bfloat16 format as A2 and B2, respectively”; paragraphs [0116-0117] “One may also split each FP32 value into three Bfloat16 … Three bfloat16 have 24-bit significand precision in total, thus conversion from a FP32 value (which as 24-bit significand precision also) to three bloat16 values does not lose accuracy”).
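For illustration only (an editor's sketch, not part of the record of this application), the FP32-to-bfloat16 decomposition that Henry describes — approximating an FP32 value as a sum of lower-precision bfloat16 values via successive truncation of the remainder — can be demonstrated as follows; the helper names are hypothetical:

```python
import struct

def to_bf16(x: float) -> float:
    # Truncate to bfloat16 precision: keep the top 16 bits of the FP32
    # encoding (1 sign bit, 8 exponent bits, 7 stored mantissa bits).
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

def split_fp32(x: float, terms: int = 3) -> list[float]:
    # Split an FP32 value into `terms` bfloat16 values whose sum
    # represents it (A ≈ A1 + A2 + A3 in Henry's notation).
    rem = struct.unpack('<f', struct.pack('<f', x))[0]  # round input to FP32
    parts = []
    for _ in range(terms):
        p = to_bf16(rem)          # approximate the current remainder
        parts.append(p)
        rem = struct.unpack('<f', struct.pack('<f', rem - p))[0]
    return parts

a = struct.unpack('<f', struct.pack('<f', 3.14159))[0]  # a normal FP32 value
a1, a2, a3 = split_fp32(a)
assert a1 + a2 + a3 == a  # three 8-bit significands cover the 24-bit significand
```

Each truncation captures up to 8 significand bits, so for a normal FP32 value (no exponent underflow) three terms recover the full 24-bit significand without loss, consistent with Henry's paragraphs [0116-0117].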
Regarding claim 13, Henry as modified in view of Pugh and Lin teaches all the limitations of claim 8 as stated above. Further, Henry as modified in view of Pugh and Lin teaches wherein in the step of converting by each of the N adjustment circuits a quantity of output floating-point numbers corresponding to each input floating-point number is determined based on a mantissa bit width of an input floating-point number and a mantissa bit width of an output floating-point number of the operation type (Henry paragraph [0117] “Three bfloat16 have 24-bit significand precision in total, thus conversion from a FP32 value (which as 24-bit significand precision also) to three bloat16 values does not lose accuracy”; paragraph [0122] “The number of the converted lower precision values per value may depend on the additional operand for the QoS requirement in on embodiment. For example, when the accuracy of the arithmetic operations is expected to be high, each value may be converted to more lower precision values ( e.g., each A may be converted into A1, A2, and A3, instead”).
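As a minimal arithmetic sketch (an editor's illustration, not part of the record), the quantity of converted values implied by the cited passages follows directly from the two mantissa bit widths; the function name is hypothetical:

```python
import math

def num_converted_values(in_significand_bits: int, out_significand_bits: int) -> int:
    # Each lower-precision output carries `out_significand_bits` of
    # significand, so covering the input significand without loss takes
    # the ceiling of the ratio of the two widths.
    return math.ceil(in_significand_bits / out_significand_bits)

# FP32 carries a 24-bit significand; bfloat16 carries 8 bits:
assert num_converted_values(24, 8) == 3   # matches Henry's three-bfloat16 split
assert num_converted_values(24, 11) == 3  # hypothetical 11-bit target format
```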
Regarding claim 14, Henry as modified in view of Pugh and Lin teaches all the limitations of claim 8 as stated above. Further, Henry as modified in view of Pugh and Lin teaches wherein a format of the input floating-point number satisfies the Institute of Electrical and Electronics Engineers (IEEE) binary floating point arithmetic standard, and a format of the output floating-point number does not satisfy the IEEE binary floating point arithmetic standard (Henry Fig. 4 and paragraph [0094] “The floating-point arithmetic operation hardware circuits (e.g., MAC and/or FMA circuits) implementing such accumulation/accumulators do not comply with the present IEEE standards, but the mixed-precision decomposition of operations allows embodiments of the invention to perform arithmetic operations with a higher precision”).
Claims 3 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Henry in view of Pugh as applied to claims 1 and 16 above respectively, and further in view of Lin.
Regarding claim 17, Henry as modified in view of Pugh teaches all the limitations of claim 16 as stated above. Further, Henry as modified in view of Pugh teaches
wherein the multiplier-accumulator comprises an operation subcircuit (Henry paragraphs [0094] and [0123] “the execution circuitry comprises one or more dedicated multiplier-accumulator (MAC) circuits, and the one or more dedicated MAC circuits are to perform integer multiply-accumulate operations”; operation subcircuit – MAC circuits);
convert the operation result floating-point number to the output floating point number, wherein the output floating-point number is of the output type (Henry Fig. 4 and Fig. 7A and paragraph [0125] “The execution additionally includes generating a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and storing the floating-point value at reference 716. The generation of the floating-point value includes the reconstructions shown in FIGS. 1 and 4 and discussed in the related paragraphs herein above”).
Henry does not explicitly teach wherein the multiplier-accumulator comprises an operation subcircuit and a format processing subcircuit, wherein the operation subcircuit is configured to perform the operation on the group of converted floating-point numbers to generate the operation output floating-point number, and the format processing subcircuit is configured to: receive a mode signal indicating an output type of floating-point numbers; convert the operation result floating-point number to the output floating-point number, wherein the output floating-point number is of the output type indicated by the mode signal; output the output floating-point number to the controller.
However, in the same field of endeavor, Pugh discloses a multiplier-accumulator that comprises a format processing subcircuit configured to: receive a mode signal indicating an output type of floating-point numbers; convert the operation result floating-point number to the output floating-point number, wherein the output floating-point number is of the output type indicated by the mode signal (Pugh Fig. 8 and paragraph [0072] “In a floating-point mode that differs from the floating-point format used by the portions 500-700, the logics 850A-850B convert the intermediate outputs to a format expected by the FPGA … Based on the output format and the format operated on the portions 500-700, the logics 850A-850B convert the output values”; format processing subcircuit – logic block 850A/850B).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Henry using Pugh and configure the execution circuitry to include output format circuitry that converts the internal floating-point format used by the MAC circuits back to the floating-point format (Pugh paragraph [0072]). As discussed, Henry discloses converting the resulting value from the arithmetic operations back into the floating-point format. Therefore, it would be obvious to provide circuitry implementing the conversion process.
Therefore, the combination of Henry as modified in view of Pugh teaches wherein the multiplier-accumulator comprises an operation subcircuit and a format processing subcircuit, wherein the operation subcircuit is configured to perform the operation on the group of converted floating-point numbers to generate the operation output floating-point number, and the format processing subcircuit is configured to: receive a mode signal indicating an output type of floating-point numbers; convert the operation result floating-point number to the output floating-point number, wherein the output floating-point number is of the output type indicated by the mode signal.
Henry as modified in view of Pugh does not explicitly teach output the output floating-point number to the controller.
However, in the same field of endeavor, Lin discloses outputting an output value from a multiply-accumulate unit to a controller (Lin Fig. 10 and paragraph [0070] “The multiplier and accumulator unit 1010 receives weights stored for the Nth synaptic layer from the memory system 502 to compute sum-of-products. The multiplier and accumulator unit provides the sum-of-products to the controller 504 as the output for the Nth synaptic layer”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Henry in view of Pugh with Lin and provide the output floating-point number to the controller such that the controller can store the output floating-point number to a location specified in the instruction (Henry paragraph [0125]).
Therefore, the combination of Henry as modified in view of Pugh and Lin teaches output the output floating-point number to the controller.
Regarding claim 3, it is directed to the hardware processor chip of claim 17. All of the structural configurations of the processor chip of claim 3 are recited in the processor chip of the computing device of claim 17. The analysis of claim 17 applies equally to claim 3.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Song (US 2021/0208884 A1) is directed to a MAC operator comprising a plurality of data type converters each configured to receive floating-point input data and convert it to second floating-point data having a different format and precision (e.g., 16-bit FP data to 20-bit FP data); a plurality of multipliers configured to perform multiplication on the second floating-point data; an adder tree to perform accumulation on outputs of the multipliers; and a data type de-converter to convert the floating-point accumulation value back to the original floating-point data type (Fig. 72).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Carlo Waje whose telephone number is (571)272-5767. The examiner can normally be reached 9:00-6:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Trujillo can be reached at (571) 272-3677. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Carlo Waje/Examiner, Art Unit 2151 (571)272-5767