DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Action is non-final and is in response to the claims filed 10/30/2025. Claims 1-25 are currently pending and stand rejected.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 10/30/2025 has been entered.
Response to Arguments
Applicant’s arguments filed on 10/30/2025 have been fully considered.
35 U.S.C. 103 – Applicant’s arguments regarding the 35 U.S.C. 103 rejections have been fully considered.
Applicant argues at the bottom of page 11 that none of the cited references teach claim 1 as amended. Applicant specifically argues “Applicant respectfully submits that none of the cited reference alone or in combination teach or suggest a processing unit with the different types of FPUs as claimed and a control unit that is "configured to, in response to data representing an accuracy demand for a machine learning workload, selecting the second FPU instead of the first FPU to perform an arithmetic operation capable of being performed on the first FPU and cause the second FPU to approximate one or more arithmetic operations in the first floating-point format not supported by the second FPU, by: performing number decomposition on a first number and a second number of the first floating-point format to represent each of the numbers as a plurality of decomposed numbers of the second floating-point format, the second floating-point format having fewer significand bits than the first floating-point format, causing the second FPU to perform arithmetic operations using the decomposed numbers based on the data representing the accuracy demand, and storing results of the one or more arithmetic operations in the memory unit in the second floating-point format."”
35 U.S.C. 103 – Applicant’s arguments regarding the 35 U.S.C. 103 rejections are persuasive. However, see new grounds of rejection necessitated by amendments.
35 U.S.C. 101 – Applicant’s arguments regarding the 35 U.S.C. 101 rejection have been fully considered, but they are not persuasive.
Applicant argues in the second paragraph of page 12 that claims 1-25 as amended are not directed to an abstract idea. Applicant specifically argues “the claims as amended are directed to patentable subject matter and are not directed to an abstract idea without significantly more.”
Examiner respectfully disagrees. The claims are directed to an abstract idea applied on generic computer components. Regarding the new limitations, having two FPUs of different floating-point formats still amounts to generic computer components, and selecting one FPU over the other to perform an arithmetic operation arises naturally from the nature of the problem (the format resulting from the number decomposition). See the 35 U.S.C. 101 rejection below.
Drawings
The drawings are objected to under 37 CFR 1.83(a). The drawings must show every feature of the invention specified in the claims. Therefore, the “second floating-point unit (FPU)” and “selecting the second FPU instead of the first FPU to perform an arithmetic operation capable of being performed on the first FPU” first recited in claim 1 must be shown or the feature(s) canceled from the claim(s). No new matter should be entered.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1, at Step 1 the claim is directed to an apparatus, which is a statutory category of invention.
At Step 2A, Prong 1, Examiner notes that the claims are directed towards a mathematical concept and/or mental process. Claim 1 recites: A processing unit comprising:
a memory unit configured to store results of one or more arithmetic operations;
a first floating-point unit (FPU) configured to perform the one or more arithmetic operations using a first floating-point format;
a second floating-point unit (FPU) configured to perform the one or more arithmetic operations in a second floating-point format;
and a control unit operatively coupled with the memory unit and the second FPU, the control unit configured to, in response to data representing an accuracy demand for a machine learning workload, selecting the second FPU instead of the first FPU to perform an arithmetic operation capable of being performed on the first FPU and cause the second FPU to approximate one or more arithmetic operations in the first floating-point format not supported by the second FPU, by:
performing number decomposition on a first number and a second number of the first floating-point format to represent each of the numbers as a plurality of decomposed numbers of the second floating-point format, the second floating-point format having fewer significand bits than the first floating-point format,
causing the second FPU to perform the arithmetic operations using the decomposed numbers based on the data representing the accuracy demand,
and storing results of the one or more arithmetic operations in the memory unit in the second floating-point format.
At Step 2A Prong 2, the additional elements are bolded above. These additional elements are merely an “apply it” scenario using generically recited computer components. See MPEP 2106.05(f).
In the “memory unit configured to store” limitation, the claim is simply using a memory unit to store the results of mathematical calculations/a mental process (storing the results of the arithmetic operations). The “first floating-point unit (FPU) configured to” limitation uses an FPU to evaluate mathematical calculations/a mental process in a first format (performs the arithmetic operations in a first floating-point format). The “second floating-point unit (FPU) configured to” limitation uses an FPU to evaluate mathematical calculations/a mental process in a different format (performs the arithmetic operations in a second floating-point format).
The “control unit configured to” limitation uses a control unit to perform mathematical calculations/a mental process (decomposing numbers from one format to another) based on mathematical relationships, or the mental process of observation and evaluation (in response to data representing an accuracy demand). The “selecting the second FPU instead of the first FPU to” limitation simply selects the FPU appropriate for the format of the mathematical calculation/mental process (arithmetic operation), and the “cause the second FPU to” limitation simply uses the second FPU to perform the mathematical calculations/mental process in the supported format (arithmetic operations in the supported format). The “causing the second FPU to perform” limitation further explains how an FPU is used to perform mathematical calculations based on mathematical relationships (perform arithmetic operations using the decomposed numbers based on the accuracy demand) and/or to perform a mental process of observation and evaluation (evaluate arithmetic operations using decomposed numbers based on an accuracy demand).
The “storing results of the one or more arithmetic operations in the memory unit in the second floating-point format” limitation, as previously stated, uses a memory unit to store the results of mathematical calculations (results of the arithmetic operations in another format). Additionally, the italicized limitation “for a machine learning workload” above is merely an additional element that generally links the use of the judicial exception to a particular technological environment or field of use. Examples of limitations that the courts have described as merely indicating a field of use or technological environment in which to apply a judicial exception include, as discussed in MPEP 2106.05(h):
iv. Specifying that the abstract idea of monitoring audit log data relates to transactions or activities that are executed in a computer environment, because this requirement merely limits the claims to the computer field, i.e., to execution on a generic computer, FairWarning v. Iatric Sys., 839 F.3d 1089, 1094-95, 120 USPQ2d 1293, 1295 (Fed. Cir. 2016); and
vi. Limiting the abstract idea of collecting information, analyzing it, and displaying certain results of the collection and analysis to data related to the electric power grid, because limiting application of the abstract idea to power-grid monitoring is simply an attempt to limit the use of the abstract idea to a particular technological environment, Electric Power Group, LLC v. Alstom S.A., 830 F.3d 1350, 1354, 119 USPQ2d 1739, 1742 (Fed. Cir. 2016);
Under Step 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claims 12 and 21 recite similar language as claim 1 and are rejected for at least the same reasons therein. Herein, claims 12 and 21 are directed towards the statutory categories of machines or manufacture, and a method, thus also satisfying Step 1. Moreover, none of the additional elements regarding the generic computer components (i.e., a user interface configured to receive user input) are more than high level generic computer components that amount to mere instructions to apply the abstract idea on a generic computer. See MPEP 2106.05(f). The “relating to an error tolerance” limitation in claim 12 is equivalent to the mathematical relationship of an accuracy demand.
Claim 2 is directed to the mathematical calculation of the summation of two values (mathematical relationship and calculations) and/or the mental process of adding two values (observation and evaluation using pen and paper). Under Steps 2A prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claim 3 is directed to the mathematical concept of significance of exponent values (mathematical relationships and calculations) and/or mental process of determining significance of numbers (observation and evaluation). Under Steps 2A prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claim 4 is directed to the mathematical concept of calculating products or sum of products (mathematical relationships and calculations) and/or mental process of determining term (observation and evaluation). Under Steps 2A prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claim 5 is directed to the mathematical concept of determining number of products (mathematical relationships and calculations) and/or mental process of determining number of terms based on accuracy demand (mental process, observation and evaluation). Under Steps 2A prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claim 6 is directed to a mathematical concept of different decomposed numbers (mathematical relationships and calculations) and/or mental process of determining one number being different from another (mental process, observation and evaluation). Under Steps 2A prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claim 7 is directed to a mathematical concept of two values having the same exponent (mathematical relationships and calculations) and mental process of determining if two numbers are the same (mental process, observation and evaluation). Under Steps 2A prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claim 8 is directed to the mathematical concepts of decomposing floating point numbers into three numbers (mathematical relationships and calculations). Under Steps 2A prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claim 9 is directed to the mathematical concept of storing and configuring the results of mathematical calculations (storing results of arithmetic operations and configuring them) and/or the mental process of configuring the results of the evaluation of arithmetic operations (observation and evaluation of arithmetic operations and configuration of the results). Under Steps 2A prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception. Even if not considered a mathematical concept/mental process, configuring data to be utilized in machine learning workloads would then be an additional element that is generally linking the use of the judicial exception to a particular technological environment or field of use. Examples of limitations that the courts have described as merely indicating a field of use or technological environments in which to apply a judicial exception include, as discussed in MPEP 2106.05(h):
iv. Specifying that the abstract idea of monitoring audit log data relates to transactions or activities that are executed in a computer environment, because this requirement merely limits the claims to the computer field, i.e., to execution on a generic computer, FairWarning v. Iatric Sys., 839 F.3d 1089, 1094-95, 120 USPQ2d 1293, 1295 (Fed. Cir. 2016); and
vi. Limiting the abstract idea of collecting information, analyzing it, and displaying certain results of the collection and analysis to data related to the electric power grid, because limiting application of the abstract idea to power-grid monitoring is simply an attempt to limit the use of the abstract idea to a particular technological environment, Electric Power Group, LLC v. Alstom S.A., 830 F.3d 1350, 1354, 119 USPQ2d 1739, 1742 (Fed. Cir. 2016);
Under Step 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claim 10 is directed to a mathematical concept and/or a mental process of observation and evaluation (determining an accuracy demand). Under Steps 2A prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claim 11 is directed to a mathematical concept and/or a mental process of observation and evaluation (determining an accuracy demand). Under Steps 2A prong 2 and 2B, the claim does not recite any additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claim 13 recites similar language as claim 9 and is rejected for at least the same reasons therein. Under Steps 2A prong 2 and 2B, none of the additional elements regarding the generic computer components (i.e., servers storing data) integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claims 14 and 22 recite similar language as claim 2 and are rejected for at least the same reasons therein. Herein, claims 14 and 22 are directed towards the statutory categories of machines or manufacture, and a method, thus also satisfying Step 1. Moreover, there are no additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claims 15 and 23 recite similar language as claim 3 and are rejected for at least the same reasons therein. Herein, claims 15 and 23 are directed towards the statutory categories of machines or manufacture, and a method, thus also satisfying Step 1. Moreover, there are no additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claims 16 and 24 recite similar language as claim 4 and are rejected for at least the same reasons therein. Herein, claims 16 and 24 are directed towards the statutory categories of machines or manufacture, and a method, thus also satisfying Step 1. Moreover, there are no additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claims 17 and 25 recite similar language as claim 5 and are rejected for at least the same reasons therein. Herein, claims 17 and 25 are directed towards the statutory categories of machines or manufacture, and a method, thus also satisfying Step 1. Moreover, there are no additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claim 18 recites similar language as claim 6 and is rejected for at least the same reasons therein. Herein, claim 18 is directed towards the statutory categories of machines or manufacture, thus also satisfying Step 1. Moreover, there are no additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claim 19 recites similar language as claim 7 and is rejected for at least the same reasons therein. Herein, claim 19 is directed towards the statutory categories of machines or manufacture, thus also satisfying Step 1. Moreover, there are no additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claim 20 recites similar language as claim 8 and is rejected for at least the same reasons therein. Herein, claim 20 is directed towards the statutory categories of machines or manufacture, thus also satisfying Step 1. Moreover, there are no additional elements that integrate the abstract idea into a practical application nor do they amount to significantly more than the judicial exception.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-9 and 21-25 are rejected under 35 U.S.C. 103 as being unpatentable over DiCecco et al. (US 12197887 B2), hereinafter “DiCecco”, in view of Ferrere (US 12299412 B2), hereinafter “Ferrere”, in view of Mellempudi et al. (US 20210110508 A1), hereinafter “Mellempudi”, in view of Stef Graillat, “Alternative Split Functions and Dekker’s Product” (NPL, https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9154489), hereinafter “Graillat”, and further in view of Andrew S. Tanenbaum, Structured Computer Organization (NPL, https://csc-knu.github.io/sys-prog/books/Andrew%20S.%20Tanenbaum%20-%20Structured%20Computer%20Organization.pdf), hereinafter “Tanenbaum”.
With regards to Claim 1, DiCecco teaches:
A processing unit (Fig. 4, e.g., dynamic precision floating-point decomposition dot product circuitry 400) comprising: …
a first floating-point unit (FPU) configured to perform the one or more arithmetic operations using a [second] floating-point format (Fig. 10, e.g., shows floating-point dot product engine 406a (first FPU) receiving inputs; Fig. 5, e.g., multiplier circuits 502 perform multiplication of inputs i' and w' (one or more arithmetic operations); Column 6 Lines 12-15, e.g., Inputs are decomposed (converted to second floating point format) and inputted to floating-point dot product engine 406 (FPU));
a second floating-point unit (FPU) configured to perform one or more arithmetic operations in a second floating-point format (Fig. 10, e.g., shows floating-point dot product engine 406b (second FPU) receiving inputs; Fig. 5, e.g., multiplier circuits 502 perform multiplication of inputs i' and w' (one or more arithmetic operations); Column 6 Lines 12-15, e.g., Inputs are decomposed (converted to second floating point format) and inputted to floating-point dot product engine 406 (FPU)); and
a control unit operatively coupled with … the second FPU, the control unit configured to, in response to data representing an accuracy demand (Fig. 4, e.g., input selectors 404-1 and 404-2 (control unit) receive select signal (accuracy demand); Column 8 Lines 17-20, e.g., Multiplexing circuit 820 selects from one of the mantissa portions depending on the select signal (accuracy demand); Column 8 Lines 32-34, e.g., Select range (accuracy demand) determines the precision needed; Fig. 8, e.g., Multiplexing circuit 820 is inside mantissa selector; Fig. 6, e.g., Mantissa selector is inside input selector (control unit)) for a machine learning workload (Column 2 Lines 16-21, e.g., Circuit for dynamic high and low precision floating-point computations is implemented to support machine learning), … and cause the second FPU to approximate one or more arithmetic operations in the first floating-point format not supported by the second FPU (Column 8 Lines 53-65, e.g., Inputs need to be rounded to be in the supported precision), by:
performing number decomposition on a first number and a second number of the first floating-point format (Column 6 Lines 12-15, e.g., Inputs i and w are decomposed (converted to second floating point format) and inputted to floating-point dot product engine 406 (FPU) as shown in Fig. 4) … the second floating-point format having fewer significand bits than the first floating-point format (Column 6 Lines 46-50, e.g., Mantissa selectors 604 output ik' that corresponds to a smaller precision of the corresponding input ik),
causing the second FPU to perform arithmetic operations using the decomposed numbers based on the data representing the accuracy demand (Column 6 Lines 46-50, e.g., Mantissa selectors 604 output ik' that corresponds to a smaller precision of the corresponding input ik. Lower precision yields less outputs, hence less values to be multiplied; Column 8 Lines 29-51, e.g., Select range and select signal (accuracy demand) determine the precision of the outputs of the mantissa selectors (inputted to floating point dot product engine block 406 (FPU)); Fig. 5, e.g., multiplier circuits 502 perform multiplication of inputs i' and w' (arithmetic operations); Fig. 4, e.g., outputs from input selectors (which include mantissa selectors) are input to floating point dot product engine block 406 (FPU)),
DiCecco does not teach:
a memory unit configured to store results of one or more arithmetic operations;
and a control unit operatively coupled with the memory unit…
a first floating-point unit (FPU) configured to perform the one or more arithmetic operations using a first floating-point format;
The control unit configured to … selecting the second FPU instead of the first FPU to perform an arithmetic operation capable of being performed on the first FPU and cause the second FPU to approximate one or more arithmetic operations … by: performing number decomposition on a first number and a second number of the first floating-point format to represent each of the numbers as a plurality of decomposed numbers of the second floating-point format,
and storing results of the one or more arithmetic operations in the memory unit in the second floating-point format.
However, Ferrere teaches:
a memory unit configured to store results of one or more arithmetic operations (Fig. 3, e.g., Output unit 312 stores output of Processing Unit; Column 11 Lines 43-58, e.g., Output unit 312 stores floating point number in a different format after addition/subtraction performed by processing unit 308);
and storing results of the one or more arithmetic operations in the memory unit in the second floating-point format (Fig. 3, e.g., Output unit 312 stores output of Processing Unit; Column 11 Lines 43-58, e.g., Output unit 312 stores floating point number in a different format after addition/subtraction performed by processing unit 308).
Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which said subject matter pertains to combine the output unit 312 that stores the output of arithmetic operations as taught by Ferrere with the dynamic precision floating-point decomposition dot product circuitry 400 as taught by DiCecco. One would have been motivated to combine these references because both references disclose processing floating point operations in a second format, and Ferrere enhances the model of DiCecco by allowing for data flow control and synchronization. DiCecco in view of Ferrere together teach “a control unit operatively coupled with the memory unit.”
DiCecco in view of Ferrere do not teach:
a first floating-point unit (FPU) configured to perform the one or more arithmetic operations using a first floating-point format;
The control unit configured to … selecting the second FPU instead of the first FPU to perform an arithmetic operation capable of being performed on the first FPU and cause the second FPU to approximate one or more arithmetic operations … by: performing number decomposition on a first number and a second number of the first floating-point format to represent each of the numbers as a plurality of decomposed numbers of the second floating-point format,
However, Mellempudi teaches:
a first floating-point unit (FPU) configured to perform the one or more arithmetic operations in a first floating-point format (¶0078, e.g., For example, in one embodiment, a first portion of the GPGPU cores 262 include a single precision FPU and an integer ALU while a second portion of the GPGPU cores include a double precision FPU; Fig. 2D, e.g., shows GPGPU Cores 262 which include single precision and double precision FPUs)
The control unit configured to … selecting the second FPU instead of the first FPU to perform an arithmetic operation capable of being performed on the first FPU (¶0222, e.g., logic units (including FPUs) within a compute unit perform computations selectively at one of multiple precisions; Fig. 18, e.g., shows FPUs within compute units)
Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which said subject matter pertains to combine the different FPUs using different bit-precisions as taught by Mellempudi with the floating-point dot product engine 406a as taught by DiCecco in view of Ferrere. One would have been motivated to combine these references because both references disclose floating point vector computations, and Mellempudi enhances the model of DiCecco in view of Ferrere by allowing for larger precision computations to be computed within the same cycle for faster processing.
DiCecco in view of Ferrere in view of Mellempudi do not teach:
The control unit configured to … selecting the second FPU instead of the first FPU to perform an arithmetic operation capable of being performed on the first FPU and cause the second FPU to approximate one or more arithmetic operations … by: performing number decomposition on a first number and a second number of the first floating-point format to represent each of the numbers as a plurality of decomposed numbers of the second floating-point format,
However, in the same field of endeavor, Graillat teaches how floating point values can be split to represent a higher precision using an algorithm. Graillat explains “We aim at splitting a floating-point number a into two numbers ah and al such that a = ah + al and ah is an approximation to a that fits in a given (significantly less than p) number of bits, with the consequence that al also fits in a small number of bits” Page 42 First paragraph. Graillat also shows algorithm 1 for splitting a floating point number into a high portion and a low portion (ah, al) (See Page 42 Algorithm 1).
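For context only, the split function Graillat describes belongs to the family of Veltkamp-style splittings. The following is an illustrative Python sketch of the classic version of such a split (an assumption-laden example for IEEE 754 double precision, not a reproduction of Graillat's Algorithm 1):

```python
def veltkamp_split(a: float, s: int = 27) -> tuple[float, float]:
    """Split a into ah + al exactly, where ah keeps the high-order
    bits of the significand and al holds the remaining low-order
    bits. s = 27 halves the 53-bit significand of an IEEE 754
    binary64 value, so each part fits in roughly half the bits."""
    c = (2.0 ** s + 1.0) * a  # rounding here isolates the high bits
    ah = c - (c - a)          # high-order part of a
    al = a - ah               # exact residual: no rounding error
    return ah, al
```

Because each part occupies only about half the significand, products and sums of the parts can be evaluated at a lower precision without rounding error, which is the property that lets a narrow-format unit approximate wide-format arithmetic.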
Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which said subject matter pertains to combine the algorithm for splitting floating point values as taught by Graillat with the dynamic precision floating-point decomposition dot product circuitry 400 as taught by DiCecco in view of Ferrere in view of Mellempudi. One would have been motivated to combine these references because both references disclose floating point computations in a second format, and Graillat enhances the model of DiCecco in view of Ferrere in view of Mellempudi by making it possible "to mimic an arithmetic that has roughly twice the precision of the underlying floating-point arithmetic" (Graillat: Introduction, First paragraph).
DiCecco in view of Ferrere in view of Mellempudi in view of Graillat do not teach:
The control unit configured to … cause the second FPU to approximate one or more arithmetic operations … by: performing number decomposition on a first number and a second number of the first floating-point format to represent each of the numbers as a plurality of decomposed numbers of the second floating-point format,
However, in the same field of endeavor, Tanenbaum teaches how software can be implemented in hardware, as the two are “logically equivalent”. Tanenbaum explains “Any operation performed by software can also be built directly into the hardware and any instruction executed by the hardware can also be simulated in software.” (Tanenbaum, Page 11)
Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which said subject matter pertains to combine the hardware implementation of software as taught by Tanenbaum with the algorithm for splitting floating-point numbers and the input selectors as taught by DiCecco in view of Ferrere in view of Mellempudi in view of Graillat. One would have been motivated to combine these references because both references disclose floating point operations in software and hardware, and Tanenbaum enhances the model of DiCecco in view of Ferrere in view of Mellempudi in view of Graillat by implementing floating point instructions in hardware for enhanced speed and reliability (See Tanenbaum, Pages 11-12). DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum teach claim 1 in its entirety.
With regards to Claim 2, DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum teach:
The processing unit of claim 1, the control unit further configured to: cause the second FPU to approximate a sum of the first number and the second number of the first floating-point format by determining a sum of at least two of the decomposed numbers of the second floating-point format (DiCecco: Column 6 Lines 46-50, e.g., Mantissa selectors 604 output ik' that corresponds to a smaller precision of the corresponding input ik; Fig. 5, e.g., Low precision floating point dot product engine 406 (FPU) performs sum of products of inputs with lower precision than that of the original inputs).
With regards to Claim 3, DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum teach:
The processing unit of claim 2, wherein the at least two of the decomposed numbers are determined based on significance of exponent values of the decomposed numbers (DiCecco: Column 8 Lines 29-51, e.g., Select range and select signal (accuracy demand) determine the precision of the outputs of the mantissa selectors (included in floating point dot product engine block 406 (FPU)); Column 7 Lines 18-21, e.g., Exponent selector 602 (included in floating point dot product engine 406 (FPU)) determines the new exponent based on the precision selected).
With regards to Claim 4, DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum teach:
cause the second FPU to approximate a product of the numbers of the first floating-point format using the product or the sum of the products of the decomposed numbers in the one or more arithmetic operations (DiCecco: Column 6 Lines 46-50, e.g., Mantissa selectors 604 output ik' that corresponds to a smaller precision of the corresponding input ik. Lower precision yields fewer outputs, hence fewer values to be multiplied (approximation)).
DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum do not teach:
the control unit further configured to: determine a number of terms to calculate for approximating a product of the first number and the second number of the first floating-point format,
cause the second FPU to calculate one or more terms according to the determined number of terms, each term comprising either a product of the decomposed numbers of the second floating-point format or a sum of a plurality of products of the decomposed numbers of the second floating-point format,
However, Graillat teaches:
… determine a number of terms to calculate for approximating a product of the first number and the second number of the first floating-point format (Graillat: Page 45, algorithm 3, e.g., floating point numbers a (ah and al) and b (bh and bl) are multiplied and added by functions in lines 4-7 (terms)),
… calculate one or more terms according to the determined number of terms, each term comprising either a product of the decomposed numbers of the second floating-point format or a sum of a plurality of products of the decomposed numbers of the second floating-point format (Graillat: Page 45, algorithm 3, e.g., Operations in Lines 4-7 are calculated),
Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which said subject matter pertains to combine the Algorithm for performing multiply add operations using two floating point numbers split in half as taught by Graillat with the Input selectors (control unit) and the floating point dot product engine block 406 (FPU) as taught by DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum. One would have been motivated to combine these references because both references disclose floating point computations in a second format, and Graillat enhances the model of DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum by making it possible "to mimic an arithmetic that has roughly twice the precision of the underlying floating-point arithmetic" (Graillat: Introduction, First paragraph). DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum teach Claim 4 in its entirety.
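For illustration only (not part of the record), the "terms" mapped above, i.e., products of the decomposed numbers that are then summed, are commonly realized as a Dekker-style two-product built on the splitting. The Python sketch below is a hedged illustration, not asserted to be Graillat's Algorithm 3 verbatim; it assumes binary64 round-to-nearest arithmetic with no overflow or underflow, and the helper names are hypothetical:

```python
def split(a: float, s: int = 27) -> tuple[float, float]:
    # Veltkamp-style split: a == ah + al exactly
    c = (2.0 ** s + 1.0) * a
    ah = c - (c - a)
    return ah, a - ah

def two_product(a: float, b: float) -> tuple[float, float]:
    """Return (p, e) with p = fl(a*b) and p + e equal to the exact
    product a*b, computed from the four terms formed by the
    decomposed (split) operands."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    # four terms: ah*bh, ah*bl, al*bh, al*bl (products of decomposed numbers)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e
```

Selecting how many of these terms to evaluate (e.g., dropping the al*bl term) trades accuracy for work, which parallels the claimed determination of a number of terms based on the accuracy demand.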
With regards to Claim 5, DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum teach:
The processing unit of claim 4, wherein the number of terms is statically or dynamically determined based on the accuracy demand (DiCecco: Column 2 Lines 7-9, e.g., Amount of bits of precision are dynamically determined).
With regards to Claim 6, DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum teach:
The processing unit of claim 1, wherein the first floating-point format has a first number of exponent bits, and the second floating-point format has a second number of exponent bits that is different from the first number of exponent bits (DiCecco: Column 6 Lines 46-50, e.g., Mantissa selectors 604 output ik' (second format) that corresponds to a smaller precision of the corresponding input ik (first format); Column 7 Lines 18-21, e.g., Exponent selector 602 (included in floating point dot product engine 406 (FPU)) determines the new exponent based on the precision selected).
With regards to Claim 7, DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum teach:
The processing unit of claim 1, wherein the first floating-point format and the second floating-point format have a same number of exponent bits (DiCecco: Column 6 Lines 6-9, e.g., Inputs can be fed directly to Floating point dot product engine 406 (FPU), hence first and second format would have the same number of exponent bits).
With regards to Claim 8, DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum teach:
The processing unit of claim 1, wherein the first floating-point format includes at least three times as many significand bits as the second floating-point format, wherein each of the numbers of the first floating-point format is decomposable into three numbers of the second floating-point format (DiCecco: Column 8 Line 53 - Column 9 Line 36, e.g., If inputs are 12 bits long and the select range is 3, the select signal can select one of the 3 portions of the input in the second floating-point format).
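For illustration only (not part of the record), a three-way decomposition of the kind recited in claim 8 can be obtained by applying the splitting twice, so that a wide significand is represented as the exact sum of three narrower parts. The Python sketch below is a hedged illustration under the assumptions of binary64 round-to-nearest arithmetic and no overflow; the helper names are hypothetical, and each part is intended to carry roughly a third (about 18 bits) of the 53-bit significand:

```python
def veltkamp_split(a: float, s: int) -> tuple[float, float]:
    # a == high + low exactly; high keeps at most (53 - s) significand bits
    c = (2.0 ** s + 1.0) * a
    ah = c - (c - a)
    return ah, a - ah

def decompose3(a: float) -> list[float]:
    """Represent a binary64 value as three narrower parts whose sum
    is exactly a."""
    p1, rest = veltkamp_split(a, 35)   # p1: top ~18 bits
    p2, p3 = veltkamp_split(rest, 35)  # p2: next ~18 bits; p3: remainder
    return [p1, p2, p3]
```

Because each low part is an exact remainder, the three parts recombine to the original value without error, mirroring the claim's requirement that a first-format number be decomposable into three second-format numbers.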
With regards to Claim 9, DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum teach:
The processing unit of claim 1, wherein the results stored in the memory unit are configured to be utilized in machine learning workloads including machine learning training or machine learning inference (DiCecco: Column 2 Lines 16-21, e.g., Circuit for dynamic high and low precision floating-point computations is implemented to support machine learning).
With regards to Claims 21-25, they are method versions of the claimed processing unit above (claims 1-5, respectively), wherein all claim limitations have also been addressed and/or covered in the cited areas. Accordingly, these claims are rejected for at least the same reasons set forth therein.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over DiCecco, in view of Ferrere, in view of Mellempudi, in view of Graillat, in view of Tanenbaum, and further in view of Ware et al. (US 20210132905 A1), hereinafter “Ware”.
With regards to Claim 10, DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum teach the processing unit of Claim 1 referenced above. They further teach:
wherein the accuracy demand is automatically and dynamically determined (DiCecco: Column 2 Lines 7-21)
DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum do not teach:
based on the second FPU exceeding a threshold number of arithmetic operations to perform.
However, Ware teaches:
based on … exceeding a threshold number of arithmetic operations to perform (¶0018, e.g., Values exceeding the precision need to be rounded; Fig. 1B, e.g., values that are right shifted more than 7 bits (threshold number) would need to be rounded).
Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which said subject matter pertains to combine the instruction of determining rounding accuracy based on exceeding a threshold of shift operations as taught by Ware with the floating-point dot product engine 406 (FPU) as taught by DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum. One would have been motivated to combine these references because both references disclose floating point operations using a second format, and Ware enhances the model of DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum by rounding values when shifted out of range to comply with the precision (Ware: ¶0038 and ¶0063).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over DiCecco, in view of Ferrere, in view of Mellempudi, in view of Graillat, in view of Tanenbaum, and further in view of Langhammer et al. (US 20210216318 A1), hereinafter “Langhammer”.
With regards to Claim 11, DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum teach the processing unit of claim 1. They do not teach:
wherein the accuracy demand is determined based on user input.
However, Langhammer teaches:
wherein the accuracy demand is determined based on user input (¶0235, e.g., User selects accuracy based on tolerance for computational error; ¶0296, e.g., User interface structures are used to allow the user to select accuracy).
Therefore, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to which said subject matter pertains to combine the user interface structures as taught by Langhammer with the circuit for high and low precision floating point computations for supporting machine learning as taught by DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum because both references disclose performing floating point computations in a second format, and Langhammer enhances the model of DiCecco in view of Ferrere in view of Mellempudi in view of Graillat in view of Tanenbaum by allowing for the user to select the desired precision.
Claims 12 and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over DiCecco, in view of Ferrere, in view of Mellempudi, in view of Graillat, in view of Tanenbaum, and further in view of Langhammer.
With regards to Claim 12, DiCecco teaches:
A computing system comprising:
… and a processing unit (Fig. 4, e.g., dynamic precision floating-point decomposition dot product circuitry 400) … , the processing unit comprising:
a first floating-point unit (FPU) configured to perform the one or more arithmetic operations using a [second] floating-point format (Fig. 10, e.g., shows floating-point dot product engine 406a (first FPU) receiving inputs; Fig. 5, e.g., multiplier circuits 502 perform multiplication of inputs i' and w' (one or more arithmetic operations); Column 6 Lines 12-15, e.g., Inputs are decomposed (converted to second floating point format) and inputted to floating-point dot product engine 406 (FPU))
a second floating-point unit (FPU) configured to perform the one or more arithmetic operations in a second floating-point format (Fig. 10, e.g., shows floating-point dot product engine 406b (second FPU) receiving inputs; Fig. 5, e.g., multiplier circuits 502 perform multiplication of inputs i' and w' (one or more arithmetic operations); Column 6 Lines 12-15, e.g., Inputs are decomposed (converted to second floating point format) and inputted to floating-point dot product engine 406 (FPU)); and
a control unit operatively coupled with … the second FPU, the control unit configured to, in response to … accuracy demand (Fig. 4, e.g., input selectors 404-1 and 404-2 (control unit) receive select signal (accuracy demand); Column 8 Lines 17-20, e.g., Multiplexing circuit 820 selects from one of the mantissa portions depending on the select signal (accuracy demand); Column 8 Lines 32-34, e.g., Select range (accuracy demand) determines the precision needed; Fig. 8, e.g., Multiplexing circuit 820 is inside mantissa selector; Fig. 6, e.g., Mantissa selector is inside input selector (control unit)), and cause the second FPU to approximate one or more arithmetic operations in the first floating-point format not supported by the second FPU (Column 8 Lines 53-65, e.g., Inputs need to be rounded to be in the supported precision), by:
performing number decomposition on a first number and a second number of the first floating-point format (Column 6 Lines 12-15, e.g., Inputs i and w are decomposed (converted to second floating point format) and inputted to floating-point dot product engine 406 (FPU) as shown in Fig. 4) … the second floating-point format having fewer significand bits than the first floating-point format (Column 6 Lines 46-50, e.g., Mantissa selectors 604 output ik' that corresponds to a smaller precision of the corresponding input ik),
selecting a number of arithmetic operations to perform using the decomposed numbers based on the accuracy demand (Fig. 5, e.g., multiplier circuits 502 perform multiplication of inputs i' and w' (arithmetic operations); Column 6 Lines 46-50, e.g., Mantissa selectors 604 output ik' that corresponds to a smaller precision of the corresponding input ik. Lower precision yields fewer outputs, hence fewer values to be multiplied; Column 8 Lines 29-51, e.g., Select range and select signal (accuracy demand) determine the precision of the outputs of the mantissa selectors (inputted to floating point dot product engine block 406 (FPU) for multiplication)),
causing the second FPU to perform the selected number of arithmetic operations using the decomposed numbers based on the … accuracy demand (Column 6 Lines 46-50, e.g., Mantissa selectors 604 output ik' that corresponds to a smaller precision of the corresponding input ik. Lower precision yields fewer outputs, hence fewer values to be multiplied; Column 8 Lines 29-51, e.g., Select range and select signal (accuracy demand) determine the precision of the outputs of the mantissa selectors (inputted to floating point dot product engine block 406 (FPU)); Fig. 5, e.g., multiplier circuits 502 perform multiplication of inputs i' and w' (arithmetic operations); Fig. 4, e.g., outputs from input selectors (which include mantissa selectors) are input to floating point dot product engine block 406 (FPU)),
DiCecco does not teach:
a user interface configured to receive user input indicating a user selected accuracy demand relating to an error tolerance of a workload for a machine learning operation;
and a processing unit operatively coupled with the user interface, the processing unit comprising:
a memory unit configured to store results of one or more arithmetic operations;
a first floating-point unit (FPU) configured to perform the one or more arithmetic operations using a first floating-point format
and a control unit operatively coupled with the memory unit … in response to the user desired accuracy demand, selecting the second FPU instead of the first FPU to perform an arithmetic operation capable of being performed on the first FPU and cause the second FPU to approximate one or more arithmetic operations … by:
performing number decomposition on the first number and a second number of the first floating-point format to represent each of the numbers as a plurality of decomposed numbers of the second floating-point format,
… based on the user selected accuracy demand,
and storing results of the one or more arithmetic operations in the memory unit in the second floating-point format.
However, Langhammer teaches:
a user interface configured to receive user input indicating a user selected accuracy demand relating to an error tolerance of a workload (¶0235, e.g., User selects accuracy based on tolerance for computational error; ¶0296, e.g., User interface structures are used to allow the user to select accuracy).
Therefore, combining the user interface structure from Langhammer with the circuit for high and low precision floating point computations for supporting machine learning as taught by DiCecco (See Column 2 Lines 16-21) would fully cover the limitations “a user interface configured to receive user input indicating a user selected accuracy demand relating to an error tolerance of a workload for a machine learning operation;” as well as “and a processing unit operatively coupled with the user interface”, “in response to the user desired accuracy demand”, and “based on the user selected accuracy demand”.