DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Action is non-final and is in response to the claims filed. Claims 1-20 are currently pending, of which claims 1-11, 18, and 20 are currently rejected. Claims 12-17 and 19 are objected to.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation “a multiply accumulator, the multiply accumulator comprising an input configured to receive a floating-point number, a first selection input, a floating-point general-purpose processing circuitry, and an output circuitry, the floating-point general-purpose processing circuitry being separately connected to the input configured to receive the floating-point number and the first selection input, and an output of the floating-point general-purpose processing circuitry being connected to an input of the output circuitry”. It is unclear whether applicant intends for the multiply accumulator to include “a first selection input, a floating-point general-purpose processing circuitry, and an output circuitry”, or for the multiply accumulator to be a separate component from the components listed. Appropriate correction is required. For prior art rejection purposes, Examiner interprets the listed components as part of the multiply accumulator.
Claims 2-10 inherit the same deficiency as claim 1 by reason of dependence. They are rejected for the same reason.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 11, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Mueller et al. (U.S. Patent Application Publication No.: US 20210182024 A1), hereinafter “Mueller”, in view of Taesik Na in NPL: “Speeding up Convolutional Neural Network Training with Dynamic Precision Scaling and Flexible Multiplier-Accumulator” (https://dl.acm.org/doi/pdf/10.1145/2934583.2934625), hereinafter “Na”, and further in view of Hao Zhang in NPL: "New Flexible Multiple-Precision Multiply-Accumulate Unit for Deep Neural Network Training and Inference" (https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8825551), hereinafter “Zhang”.
Regarding Claim 1, Mueller teaches:
A chip comprising:
a multiply accumulator (¶0025, e.g., Mixed-precision MAC performs multiply-accumulate operation), the multiply accumulator comprising an input configured to receive a floating-point number, … , a floating-point general-purpose processing circuitry, and an output circuitry (Fig. 2, e.g., shows: mixed precision floating-point multiply-add operation (multiply accumulator) that receives operands A, B, and C; Unpacking module 210, Multiplier 211, adders 212a, b, and c, Aligner 215, and Floating-point adder 213 (floating-point general-purpose processing circuitry); Adder/Rounder 214 (output circuitry)),
the floating-point general-purpose processing circuitry being separately connected to the input configured to receive the floating-point number (Fig. 2, e.g., Unpacking Module receives floating-point numbers) … , and an output of the floating-point general-purpose processing circuitry being connected to an input of the output circuitry (Fig. 2, e.g., floating-point Adder outputs result to Adder/Rounder 214 (output circuitry));
the floating-point general-purpose processing circuitry being configured to
receive a first operand, a second operand, and a third operand, each of the first operand, the second operand, and the third operand having a first bit width k1 and are inputted at the input configured to receive the floating-point number (Fig. 2, e.g., Operands A, B, and C are inputted, all being 32 bit wide (first bit width k1));
divide a fractional part of the first operand into m first suboperands of a second bit width k2 … , and divide a fractional part of the second operand into m second suboperands of the second bit width k2, the second bit width k2=k1/m, and m being a positive integer (Fig. 2, e.g., Operands A (first operand) and B (second operand) are divided into suboperands A0 and A1 for operand A, and B0 and B1 for operand B (m=2), where operands A0, A1, B0, and B1 are 16 bit wide; ¶0029);
perform a multiplication operation of fractional parts based on the m first suboperands and the m second suboperands to obtain a fractional product (¶0037, e.g., unpacked values A0,A1,B0,B1 are fed into a multiplier 211 to generate partial products);
determine a floating-point number product of the first operand and the second operand based on a sign bit and an exponent part of the first operand, a sign bit and an exponent part of the second operand, and the fractional product (¶0037, e.g., floating-point adder 212 receives signs, exponents, and partial products of operands A and B); and
perform an addition operation [using the Adder/Rounder 214] on the floating-point number product and the third operand to obtain a floating-point number sum (¶0037, e.g., Adder/Rounder 214 performs addition of operand C and partial product),
wherein the output circuitry is configured to output an operation result … according to the floating-point number sum (¶0037, e.g., Adder/Rounder 214 outputs rounded result).
Mueller does not teach:
a multiply accumulator, the multiply accumulator comprising an input configured to receive a floating-point number, a first selection input, a floating-point general-purpose processing circuitry, and an output circuitry,
the floating-point general-purpose processing circuitry being separately connected to the input configured to receive the floating-point number and the first selection input,
…
divide a fractional part of the first operand into m first suboperands of a second bit width k2 according to a floating-point operation mode indicated by the first selection input, and divide a fractional part of the second operand into m second suboperands of the second bit width k2, the second bit width k2=k1/m, and m being a positive integer;
perform an addition operation on the floating-point number product and the third operand to obtain a floating-point number sum,
wherein the output circuitry is configured to output an operation result in a specified data format according to the floating-point number sum.
However, in the same field of endeavor, Na teaches using an FSM controller that receives a mode signal to indicate the precision of input data. Na explains “FSM controller determines all inputs for configurable 16-bit MAC unit based on mode (16-bit and 32-bit mode)”. See Na: Section 4.2 Flexible MAC. Additionally, Mueller teaches how the unpacking module 210 can represent mantissas as either 2×fp16 numbers for FMMA or as 1×fp32 number for FMA. See: Mueller ¶0037.
Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which said subject matter pertains to combine the mode signal to determine precision of input data as taught by Na with the unpacking module outputting either 2×fp16 numbers for FMMA or as 1×fp32 number for FMA as taught by Mueller. One would have been motivated to combine these references because both references disclose floating-point MAC operations, and Na enhances the model of Mueller by allowing for dynamic selection of precision of inputs “which enables speeding up training for lower precision computation” (Na: Abstract). Hence, Mueller in view of Na would cause the unpacking module taught by Mueller to receive a mode signal as taught by Na.
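For illustration only, the combination described above (an unpacking stage whose interpretation of a 32-bit input word is chosen by a mode signal) can be sketched in software. This is a minimal sketch under stated assumptions and is not drawn from the circuitry of Mueller or Na: the function name `unpack_operand`, the mode strings "fp32" and "fp16", and the little-endian ordering of the two half-precision values are all hypothetical choices made for the example.

```python
import struct


def unpack_operand(word: int, mode: str) -> tuple:
    """Interpret a 32-bit word according to a hypothetical mode signal.

    mode == "fp32": the word is one single-precision value.
    mode == "fp16": the word holds two half-precision values,
                    returned as (low half, high half).
    """
    raw = struct.pack("<I", word & 0xFFFFFFFF)  # 4 raw bytes, little-endian
    if mode == "fp32":
        return struct.unpack("<f", raw)
    if mode == "fp16":
        return struct.unpack("<2e", raw)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(unpack_operand(0x3F800000, "fp32"))
    print(unpack_operand(0x40003C00, "fp16"))
```

The same 32-bit input is thus read either as one wide operand or as two narrow operands, depending solely on the mode value supplied alongside it.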
Mueller in view of Na does not teach:
perform an addition operation on the floating-point number product and the third operand to obtain a floating-point number sum,
wherein the output circuitry is configured to output an operation result in a specified data format according to the floating-point number sum.
However, Zhang teaches:
perform an addition operation on the floating-point number product and the third operand to obtain a floating-point number sum (Fig. 2, e.g., shows 3-to-2 carry save adder adding operand C and multiplication result of A and B),
wherein the output circuitry is configured to output an operation result in a specified data format according to the floating-point number sum (Page 30 Column 1, second paragraph, e.g., mode signal (first selection input) determines the precision of the rounded floating point number).
Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which said subject matter pertains to modify the Adder/Rounder 214 as taught by Mueller to perform adding and rounding in separate components, and to receive a mode signal to specify rounding precision as taught by Zhang. One would have been motivated to combine these references because both references disclose floating point multiply-add operations, and Zhang enhances the model of Mueller in view of Na by allowing the adding and rounding stages to operate in separate pipeline stages for parallel processing, and because “further energy reduction can be achieved when each operation step can use its minimum required precision instead of being forced to use a uniform precision for all steps” (Zhang: Section 1 Introduction, third paragraph).
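For illustration only, the separation of the addition step from a mode-controlled rounding step can be sketched as follows. This is a software sketch under stated assumptions, not the circuit of Mueller or Zhang: the names `round_to_format` and `multiply_add` and the format strings "fp16" and "fp32" are hypothetical, the product and sum are formed exactly with rational arithmetic rather than a carry-save adder, and the conversion passes through a double-precision value before the final rounding, which may differ from a single hardware rounding in rare cases.

```python
import struct
from fractions import Fraction

# Hypothetical mapping from an output-format selector to a struct code.
_FORMAT_CODES = {"fp16": "<e", "fp32": "<f"}


def round_to_format(x: float, fmt: str) -> float:
    """Round x to the nearest value representable in the selected format."""
    code = _FORMAT_CODES[fmt]
    return struct.unpack(code, struct.pack(code, x))[0]


def multiply_add(a: float, b: float, c: float, out_fmt: str) -> float:
    """Compute a*b + c exactly, then round in a separate step."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # addition stage
    return round_to_format(float(exact), out_fmt)    # rounding stage


if __name__ == "__main__":
    print(multiply_add(1.0, 1.0, 2.0 ** -12, "fp16"))
    print(multiply_add(1.0, 1.0, 2.0 ** -12, "fp32"))
```

With the same three operands, the selected output format alone determines whether the small addend survives rounding: it is lost at half precision and retained at single precision.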
Regarding Claim 2, Mueller in view of Na, further in view of Zhang, teaches:
The chip according to claim 1, wherein different selection signals are corresponding to different floating-point operation modes (Mueller: ¶0037, e.g., Operands can be two fp16 numbers or one fp32 number each (different floating-point operation mode); Na: Section 4.2 Flexible MAC, e.g., Mode signal selects precision of input data);
the floating-point general-purpose processing circuitry comprises a data extraction circuitry (Mueller: Fig. 2, e.g., Unpacking Module 210 (data extraction circuitry)),
and the data extraction circuitry is separately connected to the input configured to receive the floating-point number and the first selection input (Mueller: Fig. 2, e.g., Unpacking Module receives floating-point numbers; Na: Section 4.2 Flexible MAC, e.g., Mode signal selects precision of input data); and
an operation circuit indicated by the floating-point operation mode being configured to perform a multiply accumulate operation on the floating-point number of the first bit width k1, and the first bit width k1 corresponding to a quantity m of split floating-point numbers (Mueller: Fig. 2, e.g., shows multiplier 211 and adders 212a-c (operation circuit); ¶0037; ¶0025, e.g., Mixed-precision MAC performs multiply-accumulate operation);
perform division from a lower order of the fractional part of the first operand according to the second bit width k2 to obtain the m first suboperands (Mueller: ¶0029; ¶0037; Fig. 2, e.g., Mantissas (lower order of fractional part) from Operands A (first operand) and B (second operand) are divided into suboperands A0 and A1 for operand A, and B0 and B1 for operand B (m=2), where operands A0, A1, B0, and B1 are 16 bit wide); and
perform division from a lower order of the fractional part of the second operand according to the second bit width k2 to obtain the m second suboperands (Mueller: ¶0029; ¶0037; Fig. 2, e.g., Mantissas (lower order of fractional part) from Operands A (first operand) and B (second operand) are divided into suboperands A0 and A1 for operand A, and B0 and B1 for operand B (m=2), where operands A0, A1, B0, and B1 are 16 bit wide).
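For illustration only, the recited division of a fractional part into m suboperands of width k2 = k1/m, starting from the lower order, and the subsequent multiplication of fractional parts can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce Fig. 2 of Mueller: the names `split_low_order` and `multiply_by_parts` are hypothetical, and each fractional part is assumed to be held as an unsigned integer that fits in k1 bits.

```python
def split_low_order(frac: int, k1: int, m: int) -> list[int]:
    """Split a k1-bit field into m chunks of k2 = k1/m bits, lowest first."""
    if m <= 0 or k1 % m != 0:
        raise ValueError("m must be a positive divisor of k1")
    k2 = k1 // m
    mask = (1 << k2) - 1
    return [(frac >> (i * k2)) & mask for i in range(m)]


def multiply_by_parts(a_frac: int, b_frac: int, k1: int, m: int) -> int:
    """Form the fractional product from the m x m partial products."""
    k2 = k1 // m
    a_parts = split_low_order(a_frac, k1, m)
    b_parts = split_low_order(b_frac, k1, m)
    product = 0
    for i, a in enumerate(a_parts):
        for j, b in enumerate(b_parts):
            # Chunk i of A times chunk j of B carries weight 2**((i + j) * k2).
            product += (a * b) << ((i + j) * k2)
    return product


if __name__ == "__main__":
    print([hex(p) for p in split_low_order(0xABCD1234, 32, 2)])
    print(multiply_by_parts(0xABCD1234, 0x00FF00FF, 32, 2)
          == 0xABCD1234 * 0x00FF00FF)
```

With k1 = 32 and m = 2, each operand yields two 16-bit suboperands, and the four weighted partial products sum to the same value as a direct 32-bit by 32-bit multiplication.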
Regarding Claims 11 and 18, they are method claims practiced by the apparatus of claims 1 and 2, respectively. They are rejected for the same reasons as claims 1 and 2.
Regarding Claim 20, it is a media claim practiced by the apparatus of claim 1. It is rejected for the same reasons as claim 1.
Allowable Subject Matter
Claims 3-10 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
Claims 12-17 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Prior Art Made of Record
US 10747502 B2 – teaches a MAC circuit that performs multiply-accumulate operations on weight and input activation floating point data, and performs floating point-to-fixed point conversion for accumulation and converts back to floating point after accumulation operation. See Fig. 2 and corresponding description.
US 20190324723 A1 – teaches performing matrix multiplication by decomposing matrices into smaller components. See Fig. 3A and corresponding description.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARLOS H DE LA GARZA whose telephone number is (571)272-0474. The examiner can normally be reached Monday-Friday 9:30AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Caldwell, can be reached at (571) 272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.H.D./
Carlos H. De La Garza
Examiner, Art Unit 2182
(571)272-0474
/ANDREW CALDWELL/
Supervisory Patent Examiner, Art Unit 2182